Standard way to prevent formula injections when using AirTable `select` and `filterByFormula`

Luciano_Mammino · ‎Aug 05, 2022

NOTE I originally wrote this in a rush hoping that the information provided here would be sufficient to start a conversation. It turned out it was not. Please refer to this comment below for a better explanation (and A DEMO) of the issue: Standard way to prevent formula injections when using AirTable `select` and `filterByFormula` - #15 ...

Hello,

When using your APIs to get data dynamically from a table, is there any secure way to handle formulas that need to contain user input data?

How can we avoid a user trying to alter the formula by crafting an injection (like a SQL injection but for your formula language)?

I couldn’t find anything in the docs, nor any utility in your (JavaScript) SDK…

So far I had to come up with my own escape function…

In case you need an example, this is my formula:

{code} = 'someCodePassedByUser'

If the user passes the following code

' >= 0 & '

I end up with

{code} = '' >= 0 & ''

which is always TRUE!

Considering that injections are the 3rd item in the OWASP security top10, I would consider this a VERY BIG SECURITY FLAW for people using airtable as a backend. :skull_and_crossbones: :skull_and_crossbones: :skull_and_crossbones:

You should mention this in the docs and provide a standard way to sanitize user input for formulas.

A function built in the JavaScript SDK would be ideal…

Luciano_Mammino · ‎Aug 05, 2022

Lol, i am glad they don’t think that way when they revision potential security issues in things like openssl or chrome which have been in the market for longer than airtable and with way more users. Just because you haven’t found a problem in a long time it doesn’t prove that the product is free of defects (or opportunities for improvements if we want to call them that way).

To be clear, i am not trying to take the piss on airtable. On the contrary, i think it’s a great product which deserves the success it has. And that’s the precise reason way i think it’s important to tackle this kind of things.

I am not getting paid to spend my time reporting this stuff, but it’s the responsible thing to do, so I’d appreciate if we could have a slightly more mature conversation here…

Thanks

kuovonne · ‎Aug 05, 2022

No-one on this thread is getting paid for being here.

However, this community forum is not the best place to report changes you want in the product. There is no guarantee that anyone from Airtable will read a particular thread. If you want to be sure that someone at Airtable will read your comments, I recommend contacting support.

On the other hand, if you want to share your experiences with other user and explain how you have handled the situation, this forum is an excellent place where many people share information.

It sounds like you hav written your own escape function. Is it something like this?

const sanitizedCodeFromUser = dirtyCodeFromUser.replace("'", "\'")
const formulaBeforeEncoding = `{code} = '${sanitizedCodeFromUser}'`

While the concept of an injection attack is real, it is the responsibility of the person writing the code that includes user input to sanitize that user input. The REST API trusts the formula provided because the request came with valid credentials. Due to the nature of an injection attack, the language cannot protect against this type of attack by itself.

On the other hand, what is Airtable’s responsibility to educate new coders about the topic of injection attacks? This is less clear. The documentation was originally designed (many years ago) with the expectation that it would be used by experienced coders who were already familiar with coding and security practices, and only needed to learn Airtable’s specific vocabulary. Yet, more and more people are using code with Airtable without having any prior coding knowledge.

In my personal experience, Airtable has been very responsive when it comes to support requests to clarify documentation so that people will stop getting themselves in trouble. Documentation changes are much easier to implement than product changes.

kuovonne · ‎Aug 05, 2022

Your posts match my understanding of the technical aspects of this issue. Although, I might not fully understand the issue either.

On the other hand, I think Luciano is looking at a tree, while you see a forest.

Luciano_Mammino · ‎Aug 06, 2022

Thanks, @kuovonne! This is a much more constructive answer:

My escape function looks pretty much like the one you suggested (with some additional whistles and bells):

function escape (value) {
  if (value === null || typeof value === 'undefined') {
    return 'BLANK()'
  }

  if (typeof value === 'string') {
    const escapedString = value
      .replace(/"/g, '\\"')
      .replace(/'/g, "\\'")
      .replace(/\r/g, '')
      .replace(/\\/g, '\\\\')
      .replace(/\n/g, '\\n')
      .replace(/\t/g, '\\t')

    return `"${escapedString}"`
  }

  if (typeof value === 'number') {
    return String(value)
  }

  if (typeof value === 'boolean') {
    return value ? '1' : '0'
  }

  throw Error('Invalid value received')
}

In any case, I think you got the issue.

What I am suggesting is not a change in Airtable itself (this is not an inherent Airtable problem), but a change in the documentation and the SDK.

In the SDK, update the filterByFormula documentation (table select operation), making it very clear that injection is possible if dealing with user input and that it’s up to the developer to sanitize that input before including it in the formula.
Do the same in the REST API documentation
Provide an official sanitization implementation that might take care of all the possible edge cases (similarly to what most databases or databases library actually do, for instance mysqli-escape-string)

Developers can seriously shoot themselves in the leg with this kind of issue. Query injection has been one of the biggest sources of security vulnerability for the last 10 years on the web), so we should do everything we can to warn users, support them and prevent this issue from happening to them.

Luciano_Mammino · ‎Aug 06, 2022

Demo :rocket:

Context: I am building an invite-only website where users can access the website only if they have a valid invite code (which is passed as part of the URL in a query string parameter).

Examples of a valid invite code: https://secret-pizza-party-fgpypfb66-lmammino.vercel.app/?code=ce7d5886-2166-4aee-8038-548f68b9739d
Example of an invalid invite code: https://secret-pizza-party-fgpypfb66-lmammino.vercel.app/?code=invalid

Now, an attacker might assume I am using some database to validate the code and they might try to forge several malicious invite codes to try to perform a query injection that matches an arbitrary record in the database and gives them access…

Eventually, they come up with the following code: ' >= 0 & ', they encode it and use it in the URL, resulting in the following URL: https://secret-pizza-party-fgpypfb66-lmammino.vercel.app/?code=%27%20%3E%3D%200%20%26%20%27

…and they got access without having to know an invite code!

Now, this is a silly example, but imagine if someone builds something similar to allow their customers to access sensitive information (invoices, medical records, shipment details, list of purchased products, etc)…

You should get a feeling for how serious this could end up being for a company using an AirTable table as a backend database…

I have personally witnessed companies losing thousands of customers, getting sued, and losing a significant amount of money and credibility on something like this!

Why is this happening?

I am storing all my invite data in an Airtable table that contains the following records:

code,name,favouriteColor,weapon
14b25700-fe5b-45e8-a9be-4863b6239fcf,Leonardo,blue,Twin Katana
ce7d5886-2166-4aee-8038-548f68b9739d,Michelangelo,orange,Nunchaku
ef7ab7b7-33d5-43b9-ad73-e73bb8fd8e77,Raffaello,red.Twin Sai
b0cbb4a4-8a31-4bc1-bee9-d6fe39c1a6b3,Donatello,purple,Bo

When the user accesses the page:

The frontend application extracts the code from the query string
It makes a request to a backend API to get that code validated
The backend, in turn, makes a request to the Airtable table using the JavaScript SDK (WEAK LINK)
If the code can be found in the table, then the entire record is returned to the frontend and the user will be able to visualise the page
If not, the users will see an error

The backend is the weak link here, because of the way it queries the Airtable table:

The code looks like this:

const airtable = new Airtable({ apiKey: process.env.AIRTABLE_API_KEY })
const base = airtable.base(process.env.AIRTABLE_BASE_ID)

// get an invite by invite code (promisified)
export function getInvite (inviteCode) {
  return new Promise((resolve, reject) => {
    base('invites')
      // runs a query on the `invites` table
      .select({
        filterByFormula: `{invite} = '${inviteCode}'`, // <-- ⚠️ PROBLEM!,
        maxRecords: 1
      })
      // reads the first page of results
      .firstPage((err, records) => {
        if (err) {
          // propagate errors
          console.error(err)
          return reject(err)
        }

        // if the record could not be found
        // we consider it an error
        if (!records || records.length === 0) {
          return reject(new Error('Invite not found'))
        }

        // otherwise we create an invite object from the first record
        // (there should be only one with the give code) and return it
        const result = {
          code: String(records[0].fields.invite),
          name: String(records[0].fields.name),
          favouriteColor: String(records[0].fields.favouriteColor),
          weapon: String(records[0].fields.weapon)
        }

        resolve(result)
      })
  })
}

The problem is that, in this code, we are constructing a formula for the filterByFormula by doing a simple string interpolation and this interpolation uses unsanitized user-provided data!

When an attacker uses the code ' >= 0 & ' we end up creating the following formula for the filterByFormula field

{invite} = '' >= 0 & ''

Which is effectively an expression that evaluates to TRUE for every single record.

Any arbitrary record (depending of the scan order of the table) will be selected and returned to the frontend!

The solution

AFAIK, unfortunately, Airtable does not provide an “official” way of dealing with this, nor the SDK offers alternative ways to build formulas (e.g. a builder pattern or something else that would serve as a more secure way to create a formula containing user input).

So it’s up to the developer to do the following:

Realise that this kind of vulnerability is possible
Write their own function to sanitize user input before embedding it into a formula (very hard to do correctly and comprehensively!)

What I am suggesting

Airtable should make very very clear that this is possible everywhere where injections can happen!
Airtable should provide some official ways to escape user input

At that point, as a developer I should be able to update my code and do something like this:

import Airtable, {escape} from 'airtable'

// ...
const formula = `{invite} = ${escape(inviteCode)}`
// ...

or something like:

import Airtable, {formula} from 'airtable'

// ...
const myFormula = formula.field('invite').eq(inviteCode).toString() // this should internally take care of sanitising any input
// ...

I hope this helps

kuovonne · ‎Aug 06, 2022

Thank you for recognizing that this is not a problem in the Airtable product. Your original post did not make this clear.

As you recognize this as a documentation problem, I recommend that you make your suggestion directly to Airtable support.

I do not think this is realistic. Airtable formulas are very complex, and there is no way for the library to know how you want to structure your formula. Your example formula is a very simple one looking for an exact match on a value. However, formulas can be long complex things that never look for an exact match. Being restricted to a ‘formula builder’ would be too limiting.

I think this is exactly what Bill and I are saying developers should do. At most, I think Airtable should tell newbie coders to sanitize user input.

If you know the final formula and are well versed in Airtable’s formula language, sanitizing the user input is not hard to do.

kuovonne · ‎Aug 06, 2022

Luciano_Mammino:

I am storing all my invite data in an Airtable table that contains the following records:

code,name,favouriteColor,weapon
14b25700-fe5b-45e8-a9be-4863b6239fcf,Leonardo,blue,Twin Katana
ce7d5886-2166-4aee-8038-548f68b9739d,Michelangelo,orange,Nunchaku
ef7ab7b7-33d5-43b9-ad73-e73bb8fd8e77,Raffaello,red.Twin Sai
b0cbb4a4-8a31-4bc1-bee9-d6fe39c1a6b3,Donatello,purple,Bo

Teenage Mutant Ninja Turtles?
:ninja: :turtle:

I remember watching the cartoon when I was younger but I don’t remember their colors/weapons.

Bill_French · ‎Aug 06, 2022

Ha ha! Yes, it’s a very good point. However, can we also say that …

Just because you publish (in caps) an apparently new and VERY BIG SECURITY FLAW that (a) the flaw is in Airtable and (b) Airtable has ignored such flaw at the peril of its customers – that this is also true?

This seems to sensationalize an issue that has been around since before the monkeys started swinging bones in the opening scene of 2001: A Space Odyssey. That was 1968 and oddly coincides with early cyber-injection research at IBM. The first SQL injection attack occurred another 40 years later but wasn’t actually discovered until 2009.

… it doesn’t prove that the product is free of defects

No product is free of defects. However, this suggests you are calling out a security defect in the Airtable product. I disagree. I think Airtable would as well. So, you might not want to say it this way.

Airtable as a Back End

From the first day I laid eyes on Airtable, I understood that this product was ill-prepared to act as a back-end database platform. The API was extremely young and largely untested in this use case. Furthermore, there were intimations in this forum from the developers that while it could be utilized as a back-end service for broadly accessible front ends, it wasn’t designed for this. Anyone can see that exposing a backend with an API throttled to just five requests per second was not ideally suited for any sort of open exposure to multiple web users that could dynamically exert load spikes.

And even today, SDK wrappers woefully lack many of the sophisticated features required to defend unsuspecting developers and low-code enthusiasts from self-inflicted wounds.

And this is where I get a little nervous. For every individual attempt to cover all the bases against a query-based injection attack, there are at least 100 times as many hackers that know exactly how to find one edge case you probably didn’t think of. And this is precisely why I (and almost every developer) entirely avoid creating queries in the open. This practice is tantamount to using eval() and the broad realm of security issues that come along with this practice.

I can’t speak for Airtable, but I think this is a bad idea because it suggests Airtable is suitable and fully sanctioned as a back-end database. Perhaps Airtable wants to be that [someday], but my sense is this is contrary to the present architecture and, more importantly, their SaaS business model, which carefully matches paid users with an instance of the service designed to handle the expected load.

The Remedy

Don’t do this. :winking_face:

Instead, proxy all Airtable data through technologies that are (a) designed to function as a backend and (b) vastly impervious to injection attacks. I tend to use Firebase because it provides sub-second query performance that automatically scales and the by-product is the Firebase SDKs and supported libraries (such as AngularFire, EmberFire and ReactFire) ensure that they never embed the information from the database into HTML pages unescaped. No disrespect intended, but I much prefer leaning on a large, well-funded team that has a lot to lose when it comes to security.

Airtable has matured enough now such that near-real-time events are capable of moving data to and from other systems almost effortlessly. And there are security requirements with any type of data conveyance, but webhook architectures are pretty solid and capable of withstanding even the most persistent hacking attempts.

If you’re curious how this approach might look, these posts (here and here) provide some insights and exposes yet another service that Airtable will soon avoid - being a CDN for every imaginable web content use case. :winking_face:

kuovonne · ‎Aug 06, 2022

To me, this is the crux of the issue. Developing in Airtable has become so easy that there is a proliferation of new developers who don’t know what they don’t know. As a result, they have self-inflicted wounds that they don’t even know exist.

Airtable has typically leaned more towards giving the impression that developing in the Airtable platform is easy. Teaching potential developers that they need watch out for these things makes developing look harder/scarier. I don’t know what a good balance is.

I don’t think this is a realistic remedy for the general public that wants a web portal.

The learning curve for Airtable and the Airtable REST API is far, far easier than the learning curve for Firebase.

For use case that do not need to scale beyond Airtable’s limits, Airtable is simply soooo easy to use.

Bill_French · ‎Aug 06, 2022

Certainly, what I advocate is not practical for the general public. But let’s unpack what the general public really wants. For the vast majority of Airtable users, is a web portal what they want? This seems like it’s slightly out of the Airtable wheelhouse. Most of the “portal” products that have emerged were largely reactions to Airtable’s inability to present access to certain slices of data without paying for licenses for every data consumer.

Is it Airtable’s responsibility to provide a broad data publishing capability for which there are only more costs and zero additional revenues to offset those costs?

I think it’s irrational to burden Airtable with potentially sizeable web hosting and data serving responsibilities and security responsibilities without compensation. As such, I view these requirements as out-of-scope for their SaaS business model and largely the responsibility of those who would extend Airtable to do something that it never intended to do for a fixed monthly per-seat price.

Yep, and so what would be your proposed remedy?