Escaping of data in formulas for API calls?

Is there a preferred way to escape data inside formulas when using the API? For example if we pass a formula into an API call like '{name} = ' + name where “name” is a variable that might contain user-generated inputs, then what’s the best way to escape the “name” variable?

The best I can find is to escape quotes? Using an escape character in a formula?

But is there a more standard library (I happen to be using Python) or escaping approach that can be applied (like would I use the csv or json module)? Or is this documented somewhere?

Welcome to the Airtable community!

The resulting formula needs to exactly match what the formula field would look like in the user interface.

For example if you wanted to filter for people named Peter the result would need to be

{name} = "Peter"

If the name might have quotes in it like Henry M. “Hank” Paulson, you may need to escape those quotes if they are straight quotes. (This forum software can mess up display of curly/straight quotes.)

{name} = "Henry M. \"Hank\" Paulson"

Or you can alternate quote styles

{name} = 'Henry M. "Hank" Paulson'

But you do not need a special escaping library, since the only thing you would need to escape would be quotes. Just do a one line regex replace to escape whatever quote style you choose.

Of course, when you pass the filterByFormula to the API, you may need to do urlencoding, but that is a totally different matter and would happen after you build the formula.

2 Likes

In python something like this could help you deal with variables:
air_filter = ‘{{name}}="{}"’.format(your_variable_name)

Thanks for the pointers, @kuovonne !

What if the text has a newline in it? For example, if there’s a formula where you want to check equality to a “Long text” value that has newlines in it? Should the newline be escaped or is it not allowed in formulas? At least if I try with the airtable-python-wrapper library it generates a bad query and raises an INVALID_FILTER_BY_FORMULA error.

Newline characters in Airtable formulas need to be represented as "\n" versus an actual new line character. I think this (or something similar) is standard in most programming languages.

I’ll take this syntax:

write`Today is ${this.day}`

Over import beautifulsoup4 any day of the week, than you very much!

Thanks for the pointers, @kuovonne !

From your suggestions and some additional experimentation, I went ahead and coded up a helper function in Python for escaping variables. Anyone interested can take a look here: How to escape variables in formulas for the Airtable API - Peter Coles (Please comment if I’ve missed any edge cases!)

This is the solution I came up with: link.

It’s not based on any official documentation, as I could find nothing about this, I just looked at all of the examples to try to identify which characters could change the structure of the query, decided it’s probably quotes and backslashes, and then escaped those characters.

It’s alarming that there is no official escape function we can use, and there is no official docs on how to escape, and since the source code is private, we can’t analyze the parser to figure it out, either :disappointed:

I replied to each of the comments above, but this forum software won’t let me mention more than 2 people or include more than 3 links, so I’m going to have to break this response up into multiple responses. Sorry for spamming, IDK what else to do.

@kuovonne’s answer (link) seems decent, but I’m pretty we need to escape backslashes, too. I haven’t thought through how to attack a query that doesn’t escape them. :thinking: At the very least, you could put a backslash as the last character to break the formula, not sure if you could get to the point of arbitrary injection, though.

And her comment (link) that newlines should be represented as an escape character followed by the letter “n” is incorrect, according to my testing it should be a literal newline (ASCII 10). You can scrutinize my test which I will post after the last reply (sorry, the forum software thinks I’m posting too many at-mentions and links, so I have to break my reply up).

@CH3V4L13R’s Python solution (link) is vulnerable to injection attacks

@Dominik_Bosnjak’s Javascript solution… (link) I can’t tell if it’s correct or not. It looks like interpolation, which would make it a security vulnerability, but it’s doing it in a template literal tagged with write, so JavaScript will pass it to the write function as a parameter (docs). The write function here could be escaping it properly. I don’t know what that function is, @Dominik_Bosnjak, can you link to the docs for that function so we can see if it escapes the inputs, and if so, how it does this?

@Peter_Coles in the code in your blog (link), you replace \n with \\n, which is incorrect (see my test below), same for replacing \t with \\t, and single quotes should not be escaped either. It also needs to escape backslashes.


Here is the test i ran, I tried all the variations that people suggested or implied (eg 2 character newline, escaping single quotes, etc) and they did not work, I played with them until they did work and then used that to build the escape function you see:

require 'uri'
require 'rack/utils'

def escape(str)
  # This puts a backslash before backslashes and double quotes
  str.gsub(/[\\"]/, "\\\\\\&")
end

base = ENV.fetch 'AIRTABLE_BASE'
key  = ENV.fetch 'AIRTABLE_API_KEY'
uri  = URI "https://api.airtable.com/v0/#{base}/testing%20escape%20functions"

[ "abc def",
  "abc\ndef",
  "abc\tdef",
  'abc "def" ghi',
  "abc 'def' ghi",
  "abc \\ def",
].each do |str|
  puts "===== #{str.inspect} -> #{escape(str).inspect} ====="
  uri.query = Rack::Utils.build_query filterByFormula: %({Value}="#{escape(str)}")
  system "curl", uri.to_s, "-s", "-H", "Authorization: Bearer #{key}"
  puts
  puts
end

Here are the results:

$ ruby test_airtable.rb
===== "abc def" -> "abc def" =====
{"records":[{"id":"recBibipUDVVydD0K","fields":{"Value":"abc def"},"createdTime":"2021-09-25T14:42:13.000Z"}]}

===== "abc\ndef" -> "abc\ndef" =====
{"records":[{"id":"recUzj6267kQDoS1b","fields":{"Value":"abc\ndef"},"createdTime":"2021-09-25T13:56:38.000Z"}]}

===== "abc\tdef" -> "abc\tdef" =====
{"records":[{"id":"rec1pMhx7dsFRSPO9","fields":{"Value":"abc\tdef"},"createdTime":"2021-09-25T15:01:59.000Z"}]}

===== "abc \"def\" ghi" -> "abc \\\"def\\\" ghi" =====
{"records":[{"id":"recBINRpQ8W99yOfa","fields":{"Value":"abc \"def\" ghi"},"createdTime":"2021-09-25T13:56:38.000Z"}]}

===== "abc 'def' ghi" -> "abc 'def' ghi" =====
{"records":[{"id":"recx8KmvWuDkhEPPT","fields":{"Value":"abc 'def' ghi"},"createdTime":"2021-09-25T15:04:59.000Z"}]}

===== "abc \\ def" -> "abc \\\\ def" =====
{"records":[{"id":"recMLYuYH3bNzH3X9","fields":{"Value":"abc \\ def"},"createdTime":"2021-09-25T13:56:38.000Z"}]}

I think Airtable really needs to publish an official escape function that we can all reference. It’s alarming that we’re here trying to piece it together. In 2021, injection attacks have been the number 3 security vulnerability in web applications, but I’m pretty sure it was #1 last year (I can’t link it because this forum software is not allowing me to have more than 3 links, but you can find my source by searching “OWASP top 10”).

It scares me how little this is discussed and how often I see interpolation proposed. As devs, we need to train our spider-sense to trigger on interpolation into structured strings like queries and JSON, and also to trigger on interpolation of user data in general, and the combination of those 2 should ring all the alarm bells, because that’s how you get your databases stolen and your companies ransomed.

Forgot the details here, but the basic escaping comment confused me: escape is definitely a thing in any global scope - real, virtual, whatrever. It’s not crazy-powerful on its own, but it’s fairly scalable, from my experience.

Any JS runtime should interpret this naked call the same:

escape('\\')

Where are you running this to be having issues?

Just tried it in the Scripting app’s “fake” global scope as well, parses strings from another base API just fine.

But assuming I’m just not seeing how deeply rooted this issue of yours was, I’ll just say that I admire the tenaciousness. I’d have given up after 15 minutes and simply added another formula field to the source (or mirror) that formats whatever data I want to poll exactly as I want it. No quadruple backslashes, no quintuple grave marks, no any sequence of punctuation, if they really ground your gears haha.