Help

Re: Escaping of data in formulas for API calls?

5528 0
cancel
Showing results for 
Search instead for 
Did you mean: 
Peter_Coles
4 - Data Explorer
4 - Data Explorer

Is there a preferred way to escape data inside formulas when using the API? For example if we pass a formula into an API call like '{name} = ' + name where “name” is a variable that might contain user-generated inputs, then what’s the best way to escape the “name” variable?

The best I can find is to escape quotes? Using an escape character in a formula?

But is there a more standard library (I happen to be using Python) or escaping approach that can be applied (like would I use the csv or json module)? Or is this documented somewhere?

15 Replies 15

Forgot the details here, but the basic escaping comment confused me: escape is definitely a thing in any global scope - real, virtual, whatrever. It’s not crazy-powerful on its own, but it’s fairly scalable, from my experience.

Any JS runtime should interpret this naked call the same:

escape('\\')

Where are you running this to be having issues?

Just tried it in the Scripting app’s “fake” global scope as well, parses strings from another base API just fine.

But assuming I’m just not seeing how deeply rooted this issue of yours was, I’ll just say that I admire the tenaciousness. I’d have given up after 15 minutes and simply added another formula field to the source (or mirror) that formats whatever data I want to poll exactly as I want it. No quadruple backslashes, no quintuple grave marks, no any sequence of punctuation, if they really ground your gears haha.

To interpolate data into a query, that data needs to be escaped according to the rules of the query language. If you don’t do this, then the data can affect the structure of the query.

For example, a malicious user can enter their email like so, causing the formula that is trying to filter to their user to instead return true to all users:

const name  = `Josh Cheek`
const email = `josh@example.com", ""="`
const query = `OR({Name} = "${name}", {Email} = "${email}")`
console.log(query)
// >> OR({Name} = "Josh Cheek", {Email} = "josh@example.com", ""="")

Here, the query language is the formula language that Airtable has created. The global escape function that comes from JavaScript has no knowledge of Airtable, it’s created for some other purpose. For example, a double quote which is properly escaped should be converted into \", but the escape function you’re referencing converts it to %22, because it’s actually escaping for URLs. And apparently it does a bad job of it, because the ECMAScript standard itself basically says to not use it (see the comment at the end of its definition, here, and at the beginning of section “B”, which defines escape).

@Josh_Cheek I agree with you that the libraries should have an official escape function and that without one injection attacks are way more likely. I was hoping to see a clearer answer when I posted this question and that’s why I wrote my blog post.

I have to disagree with you though about my code being incorrect. :stuck_out_tongue_winking_eye: My example was tested in Python3 using the airtable-python-wrapper against a live API. The escaping of newline characters and tabs is necessary, otherwise—as I previously mentioned—it raises an error:

Should the newline be escaped or is it not allowed in formulas? At least if I try with the airtable-python-wrapper library it generates a bad query and raises an INVALID_FILTER_BY_FORMULA error.

Given your example is in Ruby and makes system calls, does it handle newline and tab characters differently?

I have to disagree with you though about my code being incorrect. :stuck_out_tongue_winking_eye: My example was tested in Python3 using the airtable-python-wrapper against a live API. The escaping of newline characters and tabs is necessary, otherwise—as I previously mentioned—it raises an error:

Hmm, I thought it had to be a literal newline, maybe I was mistaken, or maybe they changed it since then. It seems it will interpret "\n" and "\\n" as a newline, so that one is optional. It doesn’t do the same for the tab, though, so the tab should not be escaped.

Here is my test using your code (I deleted the comments and blank lines, but did not change the code):

# ===== Code from here: https://mrcoles.com/escape-variables-airtable-api/ =====
from typing import Any, Optional, Union
from decimal import Decimal

class AirtableBadTypeException(Exception):
    def __init__(self, val: Any):
        self.val = val
        super().__init__(f"unexpected type variable type: {type(val)}")

def airtable_escape_variable(val: Optional[Union[str, int, float, bool]]) -> str:
    if val is None:
        return "BLANK()"
    if isinstance(val, str):
        escaped_val = airtable_escape_str(val)
        return f'"{escaped_val}"'
    if isinstance(val, bool):
        return str(1 if val else 0)
    if isinstance(val, (int, float, Decimal)):
        return str(val)
    raise AirtableBadTypeException(val)

def airtable_escape_str(val: str, wrap_is_double_quotes=True) -> str:
    val = (
        val.replace("\r", "")  # strip windows carriage returns
        .replace("\\", "\\\\")  # escape backslashes
        .replace("\n", "\\n")  # escape line feeds
        .replace("\t", "\\t")  # escape tabs
    )
    if wrap_is_double_quotes:
        return val.replace('"', '\\"')
    else:
        return val.replace("'", "\\'")


# ===== My code to test it =====
import os
import urllib.request
import urllib.parse

base = os.environ.get('AIRTABLE_BASE')
key  = os.environ.get('AIRTABLE_API_KEY')
url  = f'https://api.airtable.com/v0/{base}/testing%20escape%20functions'

# These are the values that I have in airtable. If I escape them correctly,
# they will match and return the record with it as a value
airtable_values = [
  "abc def",
  "abc\ndef",
  "abc\tdef",
  'abc "def" ghi',
  "abc 'def' ghi",
  "abc \\ def",
  "abc\\\ndef",
  "abc\\ndef",
]

for airtable_value in airtable_values:
    escaped = airtable_escape_variable(airtable_value)
    print(f'===== {repr(airtable_value)} -> {repr(escaped)} =====')
    params = urllib.parse.urlencode({'filterByFormula': '{Value}=%s' % escaped})
    request = urllib.request.Request(f'{url}?{params}')
    request.add_header("Authorization", f'Bearer {key}')
    print(urllib.request.urlopen(request).read().decode('utf-8'))
    print()

Here is the data I’m testing it against:

$ curl "https://api.airtable.com/v0/$AIRTABLE_BASE/testing%20escape%20functions" \
       -H "Authorization: Bearer $AIRTABLE_API_KEY" \
       --silent \
       | jq -c '.records[].fields'
{"Value":"abc\tdef"}
{"Value":"abc \"def\" ghi"}
{"Value":"abc def"}
{"Value":"abc\\\ndef"}
{"Value":"abc \\ def"}
{"Value":"abc\ndef"}
{"Value":"abc 'def' ghi"}
{"Value":"abc\\ndef"}

And here is the output, to my surprise, the newline worked, to yours, the tab doesn’t, so I guess we’re all surprised here :stuck_out_tongue_closed_eyes: Anyway, if you remove the .replace("\t", "\\t") it will work.

$ python3 omg.py
===== 'abc def' -> '"abc def"' =====
{"records":[{"id":"recBibipUDVVydD0K","fields":{"Value":"abc def"},"createdTime":"2021-09-25T14:42:13.000Z"}]}

===== 'abc\ndef' -> '"abc\\ndef"' =====
{"records":[{"id":"recUzj6267kQDoS1b","fields":{"Value":"abc\ndef"},"createdTime":"2021-09-25T13:56:38.000Z"}]}

===== 'abc\tdef' -> '"abc\\tdef"' =====
{"records":[]}

===== 'abc "def" ghi' -> '"abc \\"def\\" ghi"' =====
{"records":[{"id":"recBINRpQ8W99yOfa","fields":{"Value":"abc \"def\" ghi"},"createdTime":"2021-09-25T13:56:38.000Z"}]}

===== "abc 'def' ghi" -> '"abc \'def\' ghi"' =====
{"records":[{"id":"recx8KmvWuDkhEPPT","fields":{"Value":"abc 'def' ghi"},"createdTime":"2021-09-25T15:04:59.000Z"}]}

===== 'abc \\ def' -> '"abc \\\\ def"' =====
{"records":[{"id":"recMLYuYH3bNzH3X9","fields":{"Value":"abc \\ def"},"createdTime":"2021-09-25T13:56:38.000Z"}]}

===== 'abc\\\ndef' -> '"abc\\\\\\ndef"' =====
{"records":[{"id":"recKB7WYh4Rn7fyCB","fields":{"Value":"abc\\\ndef"},"createdTime":"2022-01-13T03:07:21.000Z"}]}

===== 'abc\\ndef' -> '"abc\\\\ndef"' =====
{"records":[{"id":"recy5ApyF2LWREZPj","fields":{"Value":"abc\\ndef"},"createdTime":"2022-01-13T03:12:12.000Z"}]}
Josh_Cheek
5 - Automation Enthusiast
5 - Automation Enthusiast

sigh My post got marked as potential spam and is awaiting moderation. So if you come here to read it, and don’t see it, then that’s why and you’ll have to wait until a moderator approves it. This forum software really hates me :disappointed:

Josh_Cheek
5 - Automation Enthusiast
5 - Automation Enthusiast

Also, looking at your code, I realized that I missed the case of "\r". I added an example with that in it:

$ curl "https://api.airtable.com/v0/$AIRTABLE_BASE/testing%20escape%20functions" \
       -H "Authorization: Bearer $AIRTABLE_API_KEY" \
       --silent \
       | jq -c '.records[].fields' \
       | grep '\\\\r'
{"Value":"abc\rdef"}

However, I have not found a way to escape the carriage return in such a way that the API will return it. It seems like a bug to me :confused: I’ve opened an issue asking about it (here).