Issue with CSV import with regard to certain characters


#1

Hi,

I’m trying to import data from a CSV file into Airtable. The data is a product catalog that I want to manage in Airtable going forward. After uploading, though, I’m seeing the � character in some data points, and I’m not sure why this is happening or how to fix it. I would appreciate any advice.

For example, data in my csv file is exactly as per the below:
“…A lazy isle to which nature has given singular trees, savory fruits, men with bodies vigorous and slender, and women in whose eyes shines a startling candor…” - (baudelaire, les fleurs du mal, parfum exotique, 5-8)

What’s appearing in Airtable after importing the file is:
��A lazy isle to which nature has given singular trees, savory fruits, men with bodies vigorous and slender, and women in whose eyes shines a startling candor�� - (baudelaire, les fleurs du mal, parfum exotique, 5-8)

Another example, this is a line I am importing:
Invented by scientists during 35 years of research, StriVectin’s proprietary NIA-114 technology

And this is what is appearing in Airtable:
Invented by scientists during 35 years of research, StriVectin�s proprietary NIA-114 technology

Thanks!


#2

Hi

I am new to the Airtable Community, but I have been reading a lot of the posts. I searched for the keywords “CSV import and characters”, and there is a list of other postings describing similar issues.

They might help you get started, and perhaps someone with more experience with this issue can point you in the right direction.

Sorry that I could not be more helpful, but I hope this reply is somewhat useful.

Thank you,
MK


#3

What you are seeing most likely points to a problem with the encoding of the product catalog data. The � character is the Unicode replacement character (U+FFFD), used as a placeholder when an application encounters bytes that are invalid in the encoding it expects. Here, it appears that whatever created the product catalog stored the curly double and single quotes (probably) and the horizontal ellipsis in an encoding other than the one the importer expects. A legacy Windows code page such as Windows-1252 being read as UTF-8 is a common cause of exactly this symptom, though I can’t confirm that is what happened here. You can clean up your source data using something like sed or a programming editor. (If you’re on Windows, I swear by Notepad++ because of its ability to wrangle very large files with little sweat and without introducing unintentional modifications.) Alternatively, you might be able to tweak your browser’s encoding settings, but I’m not certain doing so would have any effect on the Airtable routine handling data importation.
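
If you would rather script the cleanup than do it in an editor, here is a minimal Python sketch of the re-encoding idea. It assumes the CSV was saved as Windows-1252 (`cp1252`) and that the importer wants UTF-8; both are guesses on my part, and the function names are made up for illustration. Check the result on a copy of your file first.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def reencode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from the ASSUMED source encoding and re-encode as UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid in source_encoding,
    which is a hint that the guess about the encoding was wrong.
    """
    return raw.decode(source_encoding).encode("utf-8")


# In cp1252, byte 0x92 is the right single curly quote.
sample = b"StriVectin\x92s proprietary NIA-114 technology"
print(looks_like_utf8(sample))                     # the raw bytes are not valid UTF-8
print(reencode_to_utf8(sample).decode("utf-8"))    # curly apostrophe preserved
```

For a real file, you would read it with `open(path, "rb").read()`, pass the bytes through `reencode_to_utf8`, and write the result to a new file in binary mode.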


#4

I am having the same problem. I have a large .xlsm file that I want to import to AT. I save the relevant worksheet to a CSV file using Excel 2007 on a Windows 10 machine. When I import the sheet to AT, text fields have a lot of Unicode error symbols. Excel offers several CSV types in its Save As options (MS-DOS, Mac), and I’ve tried them all to no avail. I suspect that the text I copied and pasted from the web has something embedded that I don’t see. Any suggestions so I don’t have to retype every entry? Tnx.


#5

Sorry, no idea, as I’ve not had this problem myself. (I’m dealing with a customer base that contains some replacement characters, but only in data already populated when I received it.) When I do see it, it’s usually some sort of punctuation glitch that appears in regularly repeating and presumably search-and-replaceable phrases. If that’s the case with your dataset, I’d suggest pulling the CSV into a reasonably clean file editor (not a word processor) and replacing the offending code with an encoding-friendly equivalent. (Once again, I recommend Notepad++.)
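
The same search-and-replace can be scripted. Below is a rough Python sketch that swaps typographic punctuation for plain ASCII equivalents, cell by cell. It assumes the text has already been decoded correctly (that is, you see real curly quotes rather than �), and the replacement table and function name are my own choices rather than anything Airtable prescribes. It works on parsed cells instead of the raw file so that a curly double quote turned into a straight one cannot break the CSV quoting.

```python
import csv
import io

# Typographic characters mapped to plain ASCII stand-ins (illustrative list).
ASCII_EQUIVALENTS = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2026": "...",  # horizontal ellipsis
})


def clean_csv_text(csv_text: str) -> str:
    """Parse CSV text, replace typographic punctuation in every cell,
    and return the re-serialized CSV."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([cell.translate(ASCII_EQUIVALENTS) for cell in row])
    return out.getvalue()


print(clean_csv_text("id,desc\n1,StriVectin\u2019s proprietary technology\n"))
```

The trade-off is that you lose the original typography; if you want to keep the curly quotes, fixing the file’s encoding is the better route.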

If that isn’t the case, and you do believe you have non-displaying embedded characters, my suggested solution is, surprise, Notepad++. Install the 32-bit version and the hexeditor plugin (hexed doesn’t work with 64-bit NPP); this should let you examine the CSV at the byte level, revealing which hidden characters you’ll need to search-and-replace. (Another recommended tool for byte-level examination is Frhed, the Free Hex-editor. It can’t perform edits as easily as NPP, but, IIRC, Frhed allowed me to discover Airtable was embedding double-quotes around certain copy-and-pasted fields that even NPP kept hidden. It also allowed me to examine the mysterious three-byte value 0xEF 0xBB 0xBF a user was complaining about in data exported from Airtable; another side-effect of mismatched encodings, it proved to be a byte order mark which, although valid in UTF-8 encoded files, often trips up apps not expecting it…
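
If installing a hex editor is a hassle, a few lines of Python can do a similar byte-level check. This is only a sketch with made-up function names: one helper lists every byte above 0x7F with its offset, and the other strips a leading UTF-8 byte order mark (the 0xEF 0xBB 0xBF sequence) if one is present.

```python
import codecs


def find_non_ascii(raw: bytes) -> list[tuple[int, str]]:
    """Return (offset, hex value) for every byte outside the ASCII range."""
    return [(i, hex(b)) for i, b in enumerate(raw) if b > 0x7F]


def strip_utf8_bom(raw: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark, if there is one."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


data = b"\xef\xbb\xbfid,name\n1,StriVectin\x92s\n"
print(find_non_ascii(data))   # offsets of the BOM bytes and the stray 0x92
print(strip_utf8_bom(data))   # same bytes without the three-byte BOM
```

A lone byte such as 0x92 between ordinary letters usually suggests a single-byte Windows code page rather than UTF-8, since UTF-8 encodes non-ASCII characters as multi-byte sequences.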