Help

Re: Escaping of data in formulas for API calls?

4673 0
cancel
Showing results for 
Search instead for 
Did you mean: 
Peter_Coles
4 - Data Explorer
4 - Data Explorer

Is there a preferred way to escape data inside formulas when using the API? For example if we pass a formula into an API call like '{name} = ' + name where “name” is a variable that might contain user-generated inputs, then what’s the best way to escape the “name” variable?

The best I can find is to escape quotes? Using an escape character in a formula?

But is there a more standard library (I happen to be using Python) or escaping approach that can be applied (like would I use the csv or json module)? Or is this documented somewhere?

15 Replies 15

Welcome to the Airtable community!

The resulting formula needs to exactly match what the formula field would look like in the user interface.

For example if you wanted to filter for people named Peter the result would need to be

{name} = "Peter"

If the name might have quotes in it like Henry M. “Hank” Paulson, you may need to escape those quotes if they are straight quotes. (This forum software can mess up display of curly/straight quotes.)

{name} = "Henry M. \"Hank\" Paulson"

Or you can alternate quote styles

{name} = 'Henry M. "Hank" Paulson'

But you do not need a special escaping library, since the only thing you would need to escape would be quotes. Just do a one line regex replace to escape whatever quote style you choose.

Of course, when you pass the filterByFormula to the API, you may need to do urlencoding, but that is a totally different matter and would happen after you build the formula.

CH3V4L13R
5 - Automation Enthusiast
5 - Automation Enthusiast

In python something like this could help you deal with variables:
air_filter = ‘{{name}}="{}"’.format(your_variable_name)

Thanks for the pointers, @kuovonne !

What if the text has a newline in it? For example, if there’s a formula where you want to check equality to a “Long text” value that has newlines in it? Should the newline be escaped or is it not allowed in formulas? At least if I try with the airtable-python-wrapper library it generates a bad query and raises an INVALID_FILTER_BY_FORMULA error.

Newline characters in Airtable formulas need to be represented as "\n" versus an actual new line character. I think this (or something similar) is standard in most programming languages.

I’ll take this syntax:

write`Today is ${this.day}`

Over import beautifulsoup4 any day of the week, than you very much!

Thanks for the pointers, @kuovonne !

From your suggestions and some additional experimentation, I went ahead and coded up a helper function in Python for escaping variables. Anyone interested can take a look here: How to escape variables in formulas for the Airtable API - Peter Coles (Please comment if I’ve missed any edge cases!)

Josh_Cheek
5 - Automation Enthusiast
5 - Automation Enthusiast

This is the solution I came up with: link.

It’s not based on any official documentation, as I could find nothing about this, I just looked at all of the examples to try to identify which characters could change the structure of the query, decided it’s probably quotes and backslashes, and then escaped those characters.

It’s alarming that there is no official escape function we can use, and there is no official docs on how to escape, and since the source code is private, we can’t analyze the parser to figure it out, either :disappointed:

I replied to each of the comments above, but this forum software won’t let me mention more than 2 people or include more than 3 links, so I’m going to have to break this response up into multiple responses. Sorry for spamming, IDK what else to do.

@kuovonne’s answer (link) seems decent, but I’m pretty we need to escape backslashes, too. I haven’t thought through how to attack a query that doesn’t escape them. :thinking: At the very least, you could put a backslash as the last character to break the formula, not sure if you could get to the point of arbitrary injection, though.

And her comment (link) that newlines should be represented as an escape character followed by the letter “n” is incorrect, according to my testing it should be a literal newline (ASCII 10). You can scrutinize my test which I will post after the last reply (sorry, the forum software thinks I’m posting too many at-mentions and links, so I have to break my reply up).

@CH3V4L13R’s Python solution (link) is vulnerable to injection attacks

@Dominik_Bosnjak’s Javascript solution… (link) I can’t tell if it’s correct or not. It looks like interpolation, which would make it a security vulnerability, but it’s doing it in a template literal tagged with write, so JavaScript will pass it to the write function as a parameter (docs). The write function here could be escaping it properly. I don’t know what that function is, @Dominik_Bosnjak, can you link to the docs for that function so we can see if it escapes the inputs, and if so, how it does this?

@Peter_Coles in the code in your blog (link), you replace \n with \\n, which is incorrect (see my test below), same for replacing \t with \\t, and single quotes should not be escaped either. It also needs to escape backslashes.


Here is the test i ran, I tried all the variations that people suggested or implied (eg 2 character newline, escaping single quotes, etc) and they did not work, I played with them until they did work and then used that to build the escape function you see:

require 'uri'
require 'rack/utils'

def escape(str)
  # This puts a backslash before backslashes and double quotes
  str.gsub(/[\\"]/, "\\\\\\&")
end

base = ENV.fetch 'AIRTABLE_BASE'
key  = ENV.fetch 'AIRTABLE_API_KEY'
uri  = URI "https://api.airtable.com/v0/#{base}/testing%20escape%20functions"

[ "abc def",
  "abc\ndef",
  "abc\tdef",
  'abc "def" ghi',
  "abc 'def' ghi",
  "abc \\ def",
].each do |str|
  puts "===== #{str.inspect} -> #{escape(str).inspect} ====="
  uri.query = Rack::Utils.build_query filterByFormula: %({Value}="#{escape(str)}")
  system "curl", uri.to_s, "-s", "-H", "Authorization: Bearer #{key}"
  puts
  puts
end

Here are the results:

$ ruby test_airtable.rb
===== "abc def" -> "abc def" =====
{"records":[{"id":"recBibipUDVVydD0K","fields":{"Value":"abc def"},"createdTime":"2021-09-25T14:42:13.000Z"}]}

===== "abc\ndef" -> "abc\ndef" =====
{"records":[{"id":"recUzj6267kQDoS1b","fields":{"Value":"abc\ndef"},"createdTime":"2021-09-25T13:56:38.000Z"}]}

===== "abc\tdef" -> "abc\tdef" =====
{"records":[{"id":"rec1pMhx7dsFRSPO9","fields":{"Value":"abc\tdef"},"createdTime":"2021-09-25T15:01:59.000Z"}]}

===== "abc \"def\" ghi" -> "abc \\\"def\\\" ghi" =====
{"records":[{"id":"recBINRpQ8W99yOfa","fields":{"Value":"abc \"def\" ghi"},"createdTime":"2021-09-25T13:56:38.000Z"}]}

===== "abc 'def' ghi" -> "abc 'def' ghi" =====
{"records":[{"id":"recx8KmvWuDkhEPPT","fields":{"Value":"abc 'def' ghi"},"createdTime":"2021-09-25T15:04:59.000Z"}]}

===== "abc \\ def" -> "abc \\\\ def" =====
{"records":[{"id":"recMLYuYH3bNzH3X9","fields":{"Value":"abc \\ def"},"createdTime":"2021-09-25T13:56:38.000Z"}]}

I think Airtable really needs to publish an official escape function that we can all reference. It’s alarming that we’re here trying to piece it together. In 2021, injection attacks have been the number 3 security vulnerability in web applications, but I’m pretty sure it was #1 last year (I can’t link it because this forum software is not allowing me to have more than 3 links, but you can find my source by searching “OWASP top 10”).

It scares me how little this is discussed and how often I see interpolation proposed. As devs, we need to train our spider-sense to trigger on interpolation into structured strings like queries and JSON, and also to trigger on interpolation of user data in general, and the combination of those 2 should ring all the alarm bells, because that’s how you get your databases stolen and your companies ransomed.

Forgot the details here, but the basic escaping comment confused me: escape is definitely a thing in any global scope - real, virtual, whatrever. It’s not crazy-powerful on its own, but it’s fairly scalable, from my experience.

Any JS runtime should interpret this naked call the same:

escape('\\')

Where are you running this to be having issues?

Just tried it in the Scripting app’s “fake” global scope as well, parses strings from another base API just fine.

But assuming I’m just not seeing how deeply rooted this issue of yours was, I’ll just say that I admire the tenaciousness. I’d have given up after 15 minutes and simply added another formula field to the source (or mirror) that formats whatever data I want to poll exactly as I want it. No quadruple backslashes, no quintuple grave marks, no any sequence of punctuation, if they really ground your gears haha.

To interpolate data into a query, that data needs to be escaped according to the rules of the query language. If you don’t do this, then the data can affect the structure of the query.

For example, a malicious user can enter their email like so, causing the formula that is trying to filter to their user to instead return true to all users:

const name  = `Josh Cheek`
const email = `josh@example.com", ""="`
const query = `OR({Name} = "${name}", {Email} = "${email}")`
console.log(query)
// >> OR({Name} = "Josh Cheek", {Email} = "josh@example.com", ""="")

Here, the query language is the formula language that Airtable has created. The global escape function that comes from JavaScript has no knowledge of Airtable, it’s created for some other purpose. For example, a double quote which is properly escaped should be converted into \", but the escape function you’re referencing converts it to %22, because it’s actually escaping for URLs. And apparently it does a bad job of it, because the ECMAScript standard itself basically says to not use it (see the comment at the end of its definition, here, and at the beginning of section “B”, which defines escape).

@Josh_Cheek I agree with you that the libraries should have an official escape function and that without one injection attacks are way more likely. I was hoping to see a clearer answer when I posted this question and that’s why I wrote my blog post.

I have to disagree with you though about my code being incorrect. :stuck_out_tongue_winking_eye: My example was tested in Python3 using the airtable-python-wrapper against a live API. The escaping of newline characters and tabs is necessary, otherwise—as I previously mentioned—it raises an error:

Should the newline be escaped or is it not allowed in formulas? At least if I try with the airtable-python-wrapper library it generates a bad query and raises an INVALID_FILTER_BY_FORMULA error.

Given your example is in Ruby and makes system calls, does it handle newline and tab characters differently?

I have to disagree with you though about my code being incorrect. :stuck_out_tongue_winking_eye: My example was tested in Python3 using the airtable-python-wrapper against a live API. The escaping of newline characters and tabs is necessary, otherwise—as I previously mentioned—it raises an error:

Hmm, I thought it had to be a literal newline, maybe I was mistaken, or maybe they changed it since then. It seems it will interpret "\n" and "\\n" as a newline, so that one is optional. It doesn’t do the same for the tab, though, so the tab should not be escaped.

Here is my test using your code (I deleted the comments and blank lines, but did not change the code):

# ===== Code from here: https://mrcoles.com/escape-variables-airtable-api/ =====
from typing import Any, Optional, Union
from decimal import Decimal

class AirtableBadTypeException(Exception):
    def __init__(self, val: Any):
        self.val = val
        super().__init__(f"unexpected type variable type: {type(val)}")

def airtable_escape_variable(val: Optional[Union[str, int, float, bool]]) -> str:
    if val is None:
        return "BLANK()"
    if isinstance(val, str):
        escaped_val = airtable_escape_str(val)
        return f'"{escaped_val}"'
    if isinstance(val, bool):
        return str(1 if val else 0)
    if isinstance(val, (int, float, Decimal)):
        return str(val)
    raise AirtableBadTypeException(val)

def airtable_escape_str(val: str, wrap_is_double_quotes=True) -> str:
    val = (
        val.replace("\r", "")  # strip windows carriage returns
        .replace("\\", "\\\\")  # escape backslashes
        .replace("\n", "\\n")  # escape line feeds
        .replace("\t", "\\t")  # escape tabs
    )
    if wrap_is_double_quotes:
        return val.replace('"', '\\"')
    else:
        return val.replace("'", "\\'")


# ===== My code to test it =====
import os
import urllib.request
import urllib.parse

base = os.environ.get('AIRTABLE_BASE')
key  = os.environ.get('AIRTABLE_API_KEY')
url  = f'https://api.airtable.com/v0/{base}/testing%20escape%20functions'

# These are the values that I have in airtable. If I escape them correctly,
# they will match and return the record with it as a value
airtable_values = [
  "abc def",
  "abc\ndef",
  "abc\tdef",
  'abc "def" ghi',
  "abc 'def' ghi",
  "abc \\ def",
  "abc\\\ndef",
  "abc\\ndef",
]

for airtable_value in airtable_values:
    escaped = airtable_escape_variable(airtable_value)
    print(f'===== {repr(airtable_value)} -> {repr(escaped)} =====')
    params = urllib.parse.urlencode({'filterByFormula': '{Value}=%s' % escaped})
    request = urllib.request.Request(f'{url}?{params}')
    request.add_header("Authorization", f'Bearer {key}')
    print(urllib.request.urlopen(request).read().decode('utf-8'))
    print()

Here is the data I’m testing it against:

$ curl "https://api.airtable.com/v0/$AIRTABLE_BASE/testing%20escape%20functions" \
       -H "Authorization: Bearer $AIRTABLE_API_KEY" \
       --silent \
       | jq -c '.records[].fields'
{"Value":"abc\tdef"}
{"Value":"abc \"def\" ghi"}
{"Value":"abc def"}
{"Value":"abc\\\ndef"}
{"Value":"abc \\ def"}
{"Value":"abc\ndef"}
{"Value":"abc 'def' ghi"}
{"Value":"abc\\ndef"}

And here is the output, to my surprise, the newline worked, to yours, the tab doesn’t, so I guess we’re all surprised here :stuck_out_tongue_closed_eyes: Anyway, if you remove the .replace("\t", "\\t") it will work.

$ python3 omg.py
===== 'abc def' -> '"abc def"' =====
{"records":[{"id":"recBibipUDVVydD0K","fields":{"Value":"abc def"},"createdTime":"2021-09-25T14:42:13.000Z"}]}

===== 'abc\ndef' -> '"abc\\ndef"' =====
{"records":[{"id":"recUzj6267kQDoS1b","fields":{"Value":"abc\ndef"},"createdTime":"2021-09-25T13:56:38.000Z"}]}

===== 'abc\tdef' -> '"abc\\tdef"' =====
{"records":[]}

===== 'abc "def" ghi' -> '"abc \\"def\\" ghi"' =====
{"records":[{"id":"recBINRpQ8W99yOfa","fields":{"Value":"abc \"def\" ghi"},"createdTime":"2021-09-25T13:56:38.000Z"}]}

===== "abc 'def' ghi" -> '"abc \'def\' ghi"' =====
{"records":[{"id":"recx8KmvWuDkhEPPT","fields":{"Value":"abc 'def' ghi"},"createdTime":"2021-09-25T15:04:59.000Z"}]}

===== 'abc \\ def' -> '"abc \\\\ def"' =====
{"records":[{"id":"recMLYuYH3bNzH3X9","fields":{"Value":"abc \\ def"},"createdTime":"2021-09-25T13:56:38.000Z"}]}

===== 'abc\\\ndef' -> '"abc\\\\\\ndef"' =====
{"records":[{"id":"recKB7WYh4Rn7fyCB","fields":{"Value":"abc\\\ndef"},"createdTime":"2022-01-13T03:07:21.000Z"}]}

===== 'abc\\ndef' -> '"abc\\\\ndef"' =====
{"records":[{"id":"recy5ApyF2LWREZPj","fields":{"Value":"abc\\ndef"},"createdTime":"2022-01-13T03:12:12.000Z"}]}
Josh_Cheek
5 - Automation Enthusiast
5 - Automation Enthusiast

sigh My post got marked as potential spam and is awaiting moderation. So if you come here to read it, and don’t see it, then that’s why and you’ll have to wait until a moderator approves it. This forum software really hates me :disappointed:

Josh_Cheek
5 - Automation Enthusiast
5 - Automation Enthusiast

Also, looking at your code, I realized that I missed the case of "\r". I added an example with that in it:

$ curl "https://api.airtable.com/v0/$AIRTABLE_BASE/testing%20escape%20functions" \
       -H "Authorization: Bearer $AIRTABLE_API_KEY" \
       --silent \
       | jq -c '.records[].fields' \
       | grep '\\\\r'
{"Value":"abc\rdef"}

However, I have not found a way to escape the carriage return in such a way that the API will return it. It seems like a bug to me :confused: I’ve opened an issue asking about it (here).