Help

Re: Strange Characters in CSV Export File

2118 0
cancel
Showing results for 
Search instead for 
Did you mean: 
Mens_Shed
5 - Automation Enthusiast
5 - Automation Enthusiast

When I export values from a table I get three weird characters at the start of only the first line which contains the field names.

My first line looks like this

Member,Forename,Middle Names,Surname,DOB,Address 1,Address 2,Town,County,Postcode,Home Phone,Mobile Phone,ICE Contact,ICE Number,Email Address,Occupation,Qualifications,Interests,Disabilities

Is this a bug?

12 Replies 12

You don’t say how you’re saving the export or under what OS you’re running, but that’s a byte order mark. Seemingly, you’re viewing a file saved as UTF-8 in an ISO-8559-1 encoded browser or editor. How to eliminate it depends on how you’re saving it and how you’re examining it: You may have an option to save the export without BOM, or you may be able to change the encoding for the saving application to ISO-8559-1.

Good morning and thanks for the reply.

  To create the file I simply use Airtable's 'Download CSV' menu choice which has no options and store the resulting CSV file on my Windows laptop.

  When I open the file in Notetab the characters are not visible but when I process the file using a local server side scripting language, the characters appear. (All I am doing at this stage is to isolate the field names and then parse out the corresponding field values whilst removing the " characters as my scripting language has a problem interpreting fields enclosed by that character.)

goipdbkbcpeecacc.png

I can probably strip out these characters but just wondered why they are there as this has not happened before in all the years I have been working with CSV export files.

I have checked the process on three different browsers and all give the same result.

Thanks again.

The standard encoding for HTML 4.01 was ISO-8859-1, while the standard for HTML 5 is UTF-8; that’s likely the reason you’re only now seeing BOMs.

You should check to see if the reason you’re not seeing the BOM in NoteTab is that they don’t actually exist. I think Windows Notepad expects to see a BOM at the beginning of UTF-8 files and, if one doesn’t exist, actually will add the mark at the beginning of the file. Could your scripting language be doing something similar? Does the language support encoding configuration?

Hello again

  I renamed  a fresh copy of the downloaded CSV file to test.txt and opened it using Open Office Writer and the characters are there so must have been written by the Airtable download process?
Member,Forename,Middle Names,Surname,DOB,Address 1,Address 2,Town,County,Postcode,Home Phone,Mobile Phone,ICE Contact,ICE Number,Email Address,Occupation,Qualifications,Interests,Disabilities
John Tuach

Sounds most like an artifact of OS/browser or OS/editor (or OS/browser/editor) inconsistencies. When I open my test exports from Airtable using a hex editor, the BOM character isn’t present. What OS are you using?

I am on Windows 7.

The process I used was to (a) download a CSV file from Airtable (b) rename it to test.txt and then © opened it with Open Office Writer so no browser was involved.

I would offer to attach the raw file for your perusal but not allowed.

Appreciate your help but so that I can carry on with writing the required script, I have just worked round the issue by chopping first three characters off the start of the first line in the file.

Risky I know but hey, what’s life without any risk!

Oh, are you using the Airtable Windows app?

Mens_Shed
5 - Automation Enthusiast
5 - Automation Enthusiast

Yes - I access Airtable via

through my laptop and Windows 7.

OK, now I’ve managed to confuse myself.

Where before I seemingly couldn’t get Airtable to preface a CSV export with a BOM, now I can’t seem to get it to stop doing that. This is true for the Windows app and the webapp via a browser. Even more mysteriously, this appears to have been standard export behavior for at least the past year; I have Airtable-exported CSV files from the time I was developing my Wardrobe Manager base that contain embedded BOMs.

The strangest thing, though, is that I also have a few exported CSVs without BOMs, including a few from earlier today. I don’t think the files were saved a second time after the initial export, but I’m at a loss as to how they could be missing the BOM. (Could this be a Win 7/Win 10 thing? Perhaps the BOM-less ones were exported using my other PC.)

I’m going to pursue this last possibility some more once I’m in arms’ reach of my Win 10 machine. Your solution will probably have to involve your scripting language. It might be worth doing some web searches on “[your scripting language]” and either “UTF-8” or “BOM”. (For instance, there are a number of sites addressing PowerShell and unwanted BOMs.)

Sorry to confuse everyone involved…

Mens_Shed
5 - Automation Enthusiast
5 - Automation Enthusiast

Hello again

I must have downloaded literally hundreds of text files from many different sites over the past 20 years and used Notetab, and my scripting language to process them and have never seen these characters in a CSV file (or any other type of text file) before.

My money is on Airtable introducing them when the file is constructed.

I wonder if AIrtable staffers are looking at this thread?

Remember, this is an issue that exists only for UTF-8 files saved under Windows. For all intents and purposes, in the web environment, this means files saved from HTML5-enabled sites. Since HTML5 wasn’t officially published until October 2014, the first 17 years are moot.

I just stepped through the random CSV files I happen to have on this PC; about half have a BOM and half don’t.

Note that Windows’ built-in apps will always add a BOM when saving a UTF-8 file, so it’s unlikely this won’t be an issue for you in the future. Notetab appears to have a troubled history with BOMs; the current version doesn’t appear to need them, but I haven’t been able to determine if it will strip them out on a file save. (Notepad++ seemingly can.)

I still think the problem is that your scripting language should ignore a BOM at the beginning of a text file. That said, I’m not sure how essential the BOM is these days, as most up-to-date apps can identify UTF-8 without it. Conceivably, Airtable could drop it — but you’re probably better off using one of the dozens of utilities or Unicode-knowledgeable editors to trim it off.

Zach_Young
5 - Automation Enthusiast
5 - Automation Enthusiast

I’m on a Mac and see this all the time, and am finally having to deal with the exported BOM in my export→process→import pipeline.

Here’s my perspective on this weird little byte sequence in Airtable CSV exports. All quoted text comes from Wikipedia: Byte order mark, and any emphasis is mine.

[It] is a particular usage of the the special Unicode character, U+FEFF BYTE ORDER MARK

It might also be identified as ZWNBSP:

If the BOM character appears in the middle of a data stream, Unicode says it should be interpreted as a “zero-width non-breaking space” [ZWNBSP]. In Unicode 3.2, this usage is deprecated in favor of the “Word Joiner” character, U+2060. This allows U+FEFF to be used only as a BOM.

Technically, the BOM is optional for UTF-8:

The Unicode Standard permits the BOM in UTF-8 but does not require or recommend its use.

but Windows requires it:

Microsoft compilers and interpreters, and many pieces of software on Microsoft Windows such as Notepad treat the BOM as a required magic number rather than use heuristics. These tools add a BOM when saving text as UTF-8, and cannot interpret UTF-8 unless the BOM is present or the file contains only ASCII.

My process involves submitting parts of the CSV to a third party for processing and getting back their results, and they strip the BOM, even though:

The standard also does not recommend removing a BOM when it is there, so that round-tripping between encodings does not lose information, and so that code that relies on it continues to work.

Yep, later stages of my pipeline fail for flagging a false difference when the stripped BOM isn’t there anymore. So, I’m going through and stripping it at the beginning now.

And I’m guessing it’s there in the first place because Airtable put it there by design to be cross-compatible w/Windows.