Standard way to prevent formula injections when using AirTable `select` and `filterByFormula`

I think that would be wise. You are conflating terms like SDK and API, and I sense I have no clear understanding of an apparently new and VERY BIG SECURITY FLAW that remained concealed until this moment in Airtable’s tenth year with hundreds of thousands of users. It would be like bringing the carcass of Big Foot to a Big Foot convention if you are right.

Lol, i am glad they don’t think that way when they revision potential security issues in things like openssl or chrome which have been in the market for longer than airtable and with way more users. Just because you haven’t found a problem in a long time it doesn’t prove that the product is free of defects (or opportunities for improvements if we want to call them that way).

To be clear, i am not trying to take the piss on airtable. On the contrary, i think it’s a great product which deserves the success it has. And that’s the precise reason way i think it’s important to tackle this kind of things.

I am not getting paid to spend my time reporting this stuff, but it’s the responsible thing to do, so I’d appreciate if we could have a slightly more mature conversation here…

Thanks

No-one on this thread is getting paid for being here.

However, this community forum is not the best place to report changes you want in the product. There is no guarantee that anyone from Airtable will read a particular thread. If you want to be sure that someone at Airtable will read your comments, I recommend contacting support.

On the other hand, if you want to share your experiences with other user and explain how you have handled the situation, this forum is an excellent place where many people share information.

It sounds like you hav written your own escape function. Is it something like this?

const sanitizedCodeFromUser = dirtyCodeFromUser.replace("'", "\'")
const formulaBeforeEncoding = `{code} = '${sanitizedCodeFromUser}'`

While the concept of an injection attack is real, it is the responsibility of the person writing the code that includes user input to sanitize that user input. The REST API trusts the formula provided because the request came with valid credentials. Due to the nature of an injection attack, the language cannot protect against this type of attack by itself.

On the other hand, what is Airtable’s responsibility to educate new coders about the topic of injection attacks? This is less clear. The documentation was originally designed (many years ago) with the expectation that it would be used by experienced coders who were already familiar with coding and security practices, and only needed to learn Airtable’s specific vocabulary. Yet, more and more people are using code with Airtable without having any prior coding knowledge.

In my personal experience, Airtable has been very responsive when it comes to support requests to clarify documentation so that people will stop getting themselves in trouble. Documentation changes are much easier to implement than product changes.

1 Like

Your posts match my understanding of the technical aspects of this issue. Although, I might not fully understand the issue either.

On the other hand, I think Luciano is looking at a tree, while you see a forest.

Thanks, @kuovonne! This is a much more constructive answer:

My escape function looks pretty much like the one you suggested (with some additional whistles and bells):

function escape (value) {
  if (value === null || typeof value === 'undefined') {
    return 'BLANK()'
  }

  if (typeof value === 'string') {
    const escapedString = value
      .replace(/"/g, '\\"')
      .replace(/'/g, "\\'")
      .replace(/\r/g, '')
      .replace(/\\/g, '\\\\')
      .replace(/\n/g, '\\n')
      .replace(/\t/g, '\\t')

    return `"${escapedString}"`
  }

  if (typeof value === 'number') {
    return String(value)
  }

  if (typeof value === 'boolean') {
    return value ? '1' : '0'
  }

  throw Error('Invalid value received')
}

In any case, I think you got the issue.

What I am suggesting is not a change in Airtable itself (this is not an inherent Airtable problem), but a change in the documentation and the SDK.

  1. In the SDK, update the filterByFormula documentation (table select operation), making it very clear that injection is possible if dealing with user input and that it’s up to the developer to sanitize that input before including it in the formula.
  2. Do the same in the REST API documentation
  3. Provide an official sanitization implementation that might take care of all the possible edge cases (similarly to what most databases or databases library actually do, for instance mysqli-escape-string)

Developers can seriously shoot themselves in the leg with this kind of issue. Query injection has been one of the biggest sources of security vulnerability for the last 10 years on the web), so we should do everything we can to warn users, support them and prevent this issue from happening to them.

Demo :rocket:

Context: I am building an invite-only website where users can access the website only if they have a valid invite code (which is passed as part of the URL in a query string parameter).

Now, an attacker might assume I am using some database to validate the code and they might try to forge several malicious invite codes to try to perform a query injection that matches an arbitrary record in the database and gives them access…

Eventually, they come up with the following code: ' >= 0 & ', they encode it and use it in the URL, resulting in the following URL: https://secret-pizza-party-fgpypfb66-lmammino.vercel.app/?code=%27%20%3E%3D%200%20%26%20%27

…and they got access without having to know an invite code!

Now, this is a silly example, but imagine if someone builds something similar to allow their customers to access sensitive information (invoices, medical records, shipment details, list of purchased products, etc)…

You should get a feeling for how serious this could end up being for a company using an AirTable table as a backend database…

I have personally witnessed companies losing thousands of customers, getting sued, and losing a significant amount of money and credibility on something like this!

Why is this happening?

I am storing all my invite data in an Airtable table that contains the following records:

code,name,favouriteColor,weapon
14b25700-fe5b-45e8-a9be-4863b6239fcf,Leonardo,blue,Twin Katana
ce7d5886-2166-4aee-8038-548f68b9739d,Michelangelo,orange,Nunchaku
ef7ab7b7-33d5-43b9-ad73-e73bb8fd8e77,Raffaello,red.Twin Sai
b0cbb4a4-8a31-4bc1-bee9-d6fe39c1a6b3,Donatello,purple,Bo

When the user accesses the page:

  • The frontend application extracts the code from the query string
  • It makes a request to a backend API to get that code validated
  • The backend, in turn, makes a request to the Airtable table using the JavaScript SDK (WEAK LINK)
  • If the code can be found in the table, then the entire record is returned to the frontend and the user will be able to visualise the page
  • If not, the users will see an error

The backend is the weak link here, because of the way it queries the Airtable table:

The code looks like this:

const airtable = new Airtable({ apiKey: process.env.AIRTABLE_API_KEY })
const base = airtable.base(process.env.AIRTABLE_BASE_ID)

// get an invite by invite code (promisified)
export function getInvite (inviteCode) {
  return new Promise((resolve, reject) => {
    base('invites')
      // runs a query on the `invites` table
      .select({
        filterByFormula: `{invite} = '${inviteCode}'`, // <-- ⚠️ PROBLEM!,
        maxRecords: 1
      })
      // reads the first page of results
      .firstPage((err, records) => {
        if (err) {
          // propagate errors
          console.error(err)
          return reject(err)
        }

        // if the record could not be found
        // we consider it an error
        if (!records || records.length === 0) {
          return reject(new Error('Invite not found'))
        }

        // otherwise we create an invite object from the first record
        // (there should be only one with the give code) and return it
        const result = {
          code: String(records[0].fields.invite),
          name: String(records[0].fields.name),
          favouriteColor: String(records[0].fields.favouriteColor),
          weapon: String(records[0].fields.weapon)
        }

        resolve(result)
      })
  })
}

The problem is that, in this code, we are constructing a formula for the filterByFormula by doing a simple string interpolation and this interpolation uses unsanitized user-provided data!

When an attacker uses the code ' >= 0 & ' we end up creating the following formula for the filterByFormula field

{invite} = '' >= 0 & ''

Which is effectively an expression that evaluates to TRUE for every single record.

Any arbitrary record (depending of the scan order of the table) will be selected and returned to the frontend!

The solution

AFAIK, unfortunately, Airtable does not provide an “official” way of dealing with this, nor the SDK offers alternative ways to build formulas (e.g. a builder pattern or something else that would serve as a more secure way to create a formula containing user input).

So it’s up to the developer to do the following:

  1. Realise that this kind of vulnerability is possible
  2. Write their own function to sanitize user input before embedding it into a formula (very hard to do correctly and comprehensively!)

What I am suggesting

  • Airtable should make very very clear that this is possible everywhere where injections can happen!
  • Airtable should provide some official ways to escape user input

At that point, as a developer I should be able to update my code and do something like this:

import Airtable, {escape} from 'airtable'

// ...
const formula = `{invite} = ${escape(inviteCode)}`
// ...

or something like:

import Airtable, {formula} from 'airtable'

// ...
const myFormula = formula.field('invite').eq(inviteCode).toString() // this should internally take care of sanitising any input
// ...

I hope this helps

Thank you for recognizing that this is not a problem in the Airtable product. Your original post did not make this clear.

As you recognize this as a documentation problem, I recommend that you make your suggestion directly to Airtable support.

I do not think this is realistic. Airtable formulas are very complex, and there is no way for the library to know how you want to structure your formula. Your example formula is a very simple one looking for an exact match on a value. However, formulas can be long complex things that never look for an exact match. Being restricted to a ‘formula builder’ would be too limiting.

I think this is exactly what Bill and I are saying developers should do. At most, I think Airtable should tell newbie coders to sanitize user input.

If you know the final formula and are well versed in Airtable’s formula language, sanitizing the user input is not hard to do.

Teenage Mutant Ninja Turtles?
:ninja: :turtle:

I remember watching the cartoon when I was younger but I don’t remember their colors/weapons.

1 Like

Ha ha! Yes, it’s a very good point. However, can we also say that …

Just because you publish (in caps) an apparently new and VERY BIG SECURITY FLAW that (a) the flaw is in Airtable and (b) Airtable has ignored such flaw at the peril of its customers – that this is also true?

This seems to sensationalize an issue that has been around since before the monkeys started swinging bones in the opening scene of 2001: A Space Odyssey. That was 1968 and oddly coincides with early cyber-injection research at IBM. The first SQL injection attack occurred another 40 years later but wasn’t actually discovered until 2009.

… it doesn’t prove that the product is free of defects

No product is free of defects. However, this suggests you are calling out a security defect in the Airtable product. I disagree. I think Airtable would as well. So, you might not want to say it this way.

Airtable as a Back End

From the first day I laid eyes on Airtable, I understood that this product was ill-prepared to act as a back-end database platform. The API was extremely young and largely untested in this use case. Furthermore, there were intimations in this forum from the developers that while it could be utilized as a back-end service for broadly accessible front ends, it wasn’t designed for this. Anyone can see that exposing a backend with an API throttled to just five requests per second was not ideally suited for any sort of open exposure to multiple web users that could dynamically exert load spikes.

And even today, SDK wrappers woefully lack many of the sophisticated features required to defend unsuspecting developers and low-code enthusiasts from self-inflicted wounds.

And this is where I get a little nervous. For every individual attempt to cover all the bases against a query-based injection attack, there are at least 100 times as many hackers that know exactly how to find one edge case you probably didn’t think of. And this is precisely why I (and almost every developer) entirely avoid creating queries in the open. This practice is tantamount to using eval() and the broad realm of security issues that come along with this practice.

I can’t speak for Airtable, but I think this is a bad idea because it suggests Airtable is suitable and fully sanctioned as a back-end database. Perhaps Airtable wants to be that [someday], but my sense is this is contrary to the present architecture and, more importantly, their SaaS business model, which carefully matches paid users with an instance of the service designed to handle the expected load.

The Remedy

Don’t do this. :wink:

Instead, proxy all Airtable data through technologies that are (a) designed to function as a backend and (b) vastly impervious to injection attacks. I tend to use Firebase because it provides sub-second query performance that automatically scales and the by-product is the Firebase SDKs and supported libraries (such as AngularFire, EmberFire and ReactFire) ensure that they never embed the information from the database into HTML pages unescaped. No disrespect intended, but I much prefer leaning on a large, well-funded team that has a lot to lose when it comes to security.

Airtable has matured enough now such that near-real-time events are capable of moving data to and from other systems almost effortlessly. And there are security requirements with any type of data conveyance, but webhook architectures are pretty solid and capable of withstanding even the most persistent hacking attempts.

If you’re curious how this approach might look, these posts (here and here) provide some insights and exposes yet another service that Airtable will soon avoid - being a CDN for every imaginable web content use case. :wink:

1 Like

To me, this is the crux of the issue. Developing in Airtable has become so easy that there is a proliferation of new developers who don’t know what they don’t know. As a result, they have self-inflicted wounds that they don’t even know exist.

Airtable has typically leaned more towards giving the impression that developing in the Airtable platform is easy. Teaching potential developers that they need watch out for these things makes developing look harder/scarier. I don’t know what a good balance is.

I don’t think this is a realistic remedy for the general public that wants a web portal.

The learning curve for Airtable and the Airtable REST API is far, far easier than the learning curve for Firebase.

For use case that do not need to scale beyond Airtable’s limits, Airtable is simply soooo easy to use.

Certainly, what I advocate is not practical for the general public. But let’s unpack what the general public really wants. For the vast majority of Airtable users, is a web portal what they want? This seems like it’s slightly out of the Airtable wheelhouse. Most of the “portal” products that have emerged were largely reactions to Airtable’s inability to present access to certain slices of data without paying for licenses for every data consumer.

Is it Airtable’s responsibility to provide a broad data publishing capability for which there are only more costs and zero additional revenues to offset those costs?

I think it’s irrational to burden Airtable with potentially sizeable web hosting and data serving responsibilities and security responsibilities without compensation. As such, I view these requirements as out-of-scope for their SaaS business model and largely the responsibility of those who would extend Airtable to do something that it never intended to do for a fixed monthly per-seat price.

Yep, and so what would be your proposed remedy?

I think we can debate as long as we want whether this use case is realistic or not or whether airtable is suitable for this use case or not, but i don’t think that debate really helps anyone, except maybe our egos, @Bill.French

As long as airtable offers apis, people will end up using them in all sorts of ways (especially true for a product that has been long enough in the market and with so many users).

Security is everyone responsibility and it’s one of those things where it’s better to be safe than sorry.

I’ll do my best to report this to the documentation team.

Thanks for indulging me in the conversation

Regards

1 Like

I agree in theory. In practice I think people will take as much as they can get.

I don’t have one. But I think people who don’t want to be developers should take care in who they hire as developers, and people who want to be developers need to be in environments where they can learn what they do not know they need to learn.

Very true.

Also true.

Thank you.

You’re welcome. Anytime I have a conversation that involves Bill, I learn something.

We can all agree that if there’s a way to achieve something with an API, well-intentioned users and developers will explore these pathways. But, this is no different than using a chain saw. There are many things it can do; safe, responsible use ultimately falls to the consumer. If you’re not certain that you can operate it, hire a professional, but do not place the burden of safety 100% on the manufacturer.

Airtable has a duty to provide a security context for access to its data. I believe they have and continue to meet this obligation with modern and appropriate techniques. They do not have a duty to provide 100% of the education required to build everything possible with an API in any or many languages. And any attempt to do so may encourage people to use it in ways it was probably never intended to be used.

This is your first post in this community. It contains skulls, crossbones, and an indirect indictment of Airtable’s security. Whose ego did your opening statement serve exactly? :wink:

1 Like

For the records, I did not mean to offend anyone’s sensibility in any way. Nor I intended to attack or belittle Airtable as a product and as a company in any way. So, sincere apologies if my post came across in some negative way and sincere apologies if anyone got offended by my writing style. That was not my intent.

With that being said, I hope people can leave out the form and focus on the substance of this thread.

It’s OK if we don’t agree on the severity of the issue here, but there is definitely an injection threat when using select with filterByFormula and user-provided data.

If we managed to do our little part to make things better against this threat, even by a little bit, we might have done something good with our time…

For me, when the manufacturer markets its chainsaw as a tool that is easy for-non professionals to use, even if the customer has never used power tools before and doesn’t know where to go to hire a professional, that manufacturer has a higher burden to educate the consumer than the manufacturer who markets to only professional contractors.

If you continue to use Airtable, you will see that most threads in this community are not nearly as heated as this one. Most people here are friendly, helpful, and passionate about Airtable. Soft skills can be as difficult to learn and practice as the more technical ones, especially if they don’t come naturally. (I am guilty of this too!)

1 Like

This is a huge security issue with Airtable and the dismissal of it is baffling. I will suggest my org gets off Airtable as quickly as possible.

1 Like

I am baffled by your post. It appears like you only joined this forum a few minutes before posting this. The solution to the issue in this thread is also very simple, with two solutions already posted.

Why join a forum in order to post that you are leaving the platform? Why leave a platform due to a perceived security risk that even the original poster admits is not a flaw specific to Airtable?

If you are trying to make a statement to Airtable, prior comments in this thread state that anyone from Airtable is unlikely to read this thread. If you have an issue with Airtable, you can vote with your wallet, or you can contact support. Both of those actions are more likely to be noticed by Airtable compared to posting in this thread.

3 Likes

I think I’ve made it clear that we both agree the risk of compromised access to data in any platform is high when you do things that allow it to be high. Please stop saying we don’t agree. We agree. It’s bad, and my recommendation 3500 words ago was don’t do that. :wink:

If you take that a step further and expose a web app that allows dynamic query construction without a security context in the web app itself, you’ve elevated the risk. If you comingle records intended for multiple classes of users, expose them to an API, and then fail to wrap such access with a security layer singling out the specific class intended, you are creating an even greater risk that class (a) gets a look at class (b’s) data. You don’t have to use filterByFormula to create this additional security risk.

Initially, you framed the narrative as a broad security flaw related to Airtable. You have since agreed that it’s not really Airtable; it’s any poorly designed web app. I think we both agree that the security architecture of web apps that lean on APIs from ANY platform matters greatly, and the complexities and risks of getting it wrong are far greater than simply focusing on filterByFormula.

Where I think we do not agree is the very premise of using filterByFormula in use cases external to the Airtable platform;

  • I think it’s no different than the risks posed by javascript eval() and should be avoided if at all possible. I have found many ways to do this, and I tend to believe the best part in a machine is no part at all. More parts lead to a greater risk of breakage and a whole host of problems. filterByFormula is simply one that exacerbates the likelihood of a security issue, but it also has the propensity of easily creating unanticipated queries which fail, and this is especially the case when allowing user input to craft such filters.

  • You’ve made it clear that filterByFormula is fine if you undertake the challenge of properly escaping various characters. Some developers have no qualms about using eval() as well.

Both our viewpoints are a matter of engineering preference and ironically resolve deep in the shadow of a #no/low-code platform. Since you clearly understand this problem well, why don’t you just build a better SDK? What’s stopping you from showing Airtable and the user community a better approach?

If you agree to stop sensationalizing this as an Airtable security flaw, I’ll agree to like your future posts where you frame narratives that are reasoned and helpful to API users without creating unnecessary fear, uncertainty, and doubt. Deal?

1 Like

You and you org don’t need to leave Airtable. If you switch to another provider, chances are that they will have similar security threats.

Also, the dismissal is not by Airtable, but only by some users in the community. I reached out to Airtable support on other channels and I am sure they’ll be interested in discussing this…

Finally, keep in mind that what we are debating here is not whether airtable is secure or not, but if airtable could do more to mitigate the risk of users shooting themselves in the leg while using certain features of the platform…

1 Like