Skip to main content

Background: I have a feed that inserts data into an Airtable base using the API. Sometimes the feed inserts non-printing characters into one of the fields.

Specifically, it inserts the UTF-8 Byte Order Mark (BOM), EF BB BF, in the middle of a string. The field it gets inserted into is a URL field.

The BOM isn’t visible in Airtable, but when we try to use the link, we get 404 because of the %EF%BB%BF in the middle of the link.

For example, the link as displayed in Airtable would be something like “https://community.airtable.com/topic/2341”, but what is actually stored is “https://community.airtable.com/topic/%EF%BB%BF2341” . 

Searching for %EF%BB%BF in Airtable using filters or automations gives no results.

Question: How can I search for and remove the BOM from field values where it appears?

Hm could you create a view or something to share this data?  I want to experiment with this but need an example of the URL to do so!


Hi ​@nroshak,

 

I hope you’re doing well. If you are already using the Airtable API, you could create a function on your code to clean up the data before sending it to Airtable. Or if you want to keep that code the same and use a cleaning automation on the Airtable side, here is a script you can use that should take care of this. You can set it up as a script automation that will run when a new record is created. Hope that helps!

let config = input.config();
const recordId = config.recordId; // passed in from automation trigger
const tableName = config.tableName; // passed in or hardcode below if always same table

if (!recordId) {
throw new Error("No recordId provided. Pass it from the trigger step.");
}
if (!tableName) {
throw new Error("No tableName provided. Pass it from the trigger step or hardcode.");
}

const table = base.getTable(tableName);

// Editable text-like field types to check
const TEXT_FIELD_TYPES = new Set(
'url'
]);

// Patterns to remove: URL-encoded BOM and literal BOM
const BAD_REGEXES = E
/%ef%bb%bf/gi, // URL-encoded BOM, any case
/\uFEFF/g // literal BOM character
];

// Helper function to remove unwanted substrings/characters
function cleanString(s) {
let out = s;
for (const rx of BAD_REGEXES) {
out = out.replace(rx, '');
}
return out;
}

// Find editable string fields
const candidateFields = table.fields.filter(f => {
return !f.isComputed && TEXT_FIELD_TYPES.has(f.type);
});

// Fetch the target record
const record = await table.selectRecordAsync(recordId);
if (!record) {
throw new Error(`Record with ID ${recordId} not found.`);
}

let fieldsToUpdate = {};
for (const field of candidateFields) {
const raw = record.getCellValue(field);
if (typeof raw !== 'string') continue;
const cleaned = cleanString(raw);
if (cleaned !== raw) {
fieldsToUpdatepfield.name] = cleaned;
}
}

if (Object.keys(fieldsToUpdate).length > 0) {
await table.updateRecordAsync(recordId, fieldsToUpdate);
console.log(`Cleaned BOM from record ${recordId} in fields: ${Object.keys(fieldsToUpdate).join(', ')}`);
} else {
console.log(`No BOM found in record ${recordId}.`);
}


You could definitely use an automation and script as ​@airvues suggests but unless I’m missing something couldn’t you just use a formula field called say ‘Clean URL’ that targets your Source URL field and strips the BOM value out?

 

IF({Source URL},
  SUBSTITUTE({Source URL}, "%EF%BB%BF", ""),
  BLANK()
)

 

Or if the BOM is not always appearing in your source field as percentage encoded then you can use this slightly enhanced version which handles the same thing if it’s detectable as a Unicode character - UNICHAR(65279).

 

IF({Source URL},
  SUBSTITUTE(
    SUBSTITUTE(
      SUBSTITUTE({Source URL} & "", "%EF%BB%BF", ""),
      "%ef%bb%bf", ""
    ),
    UNICHAR(65279),
    ""
  ),
  BLANK()
)


Thank you, Enrique! I will try that. J., the problem is that formulas don’t match on the BOM. I can’t search for it or replace it by formula within Airtable.


@nroshak Gotcha. In the extended formula above though the fallback match on UNICHAR(65279) should work as that is the character that will be invisible initially but then appear as the url % encoded equivalent when you try to use the URL.


@nroshak Gotcha. In the extended formula above though the fallback match on UNICHAR(65279) should work as that is the character that will be invisible initially but then appear as the url % encoded equivalent when you try to use the URL.

J, I think this might be an Excel function? I get the error 

Unknown function names: UNICHAR

However! Thank you so much for the idea. Looking around for similar functions in Airtable, I found ENCODE_URL_COMPONENT:

IF(REGEX_MATCH(ENCODE_URL_COMPONENT({Link}),"%EF%BB%BF"),"BOM","OK")

I should be able to use this to identify and strip the BOM. Thank you!

Natalka


Ah yes of course ​@nroshak but that's a great idea to use encode_url_component on the source value so it forces the % vlaues to appear and then you can strip them out. Nice one. 👌