Does anyone have a recommended way to handle spam that comes in through publicly shared/embedded Airtable forms? We've had forms published for over a year that never got spam before, and then between Dec 20, 2022 and Dec 23, 2022 we got 49,245 spam submissions. It disrupted our workflow and prompted messages that we are over our limit. It generally feels invasive and pretty horrible. Any tips are welcome!
Bots now use AI to answer questions, and they're really getting good at it. They can also ferret out hidden fields using headless browsers - it's an arms race; you cannot win that war in this way. If you have so many bot entries in a database that it reaches the limits of the data table itself, the vendor is the problem, not your form or the bots.
As I said earlier, Airtable is the problem. It has no defenses against a bot army that has found your form and is determined to gain access to your system through relentless probes.
Your only out is to use a third-party forms provider that has features to defend against bots.
If you are using Zapier + Airtable, you could add the OOPSpam app to your flow.
An example flow:
New Record Airtable -> OOPSpam -> Insert Record Airtable (or Send Outbound Email).
Note: I work at OOPSpam 🙂
Yep - so, with this approach, if a form attracted ten million bot posts and only one legitimate post, the Airtable instance would have to process ten million and one new records to capture one legitimate record? Wouldn't that pretty much kill the Airtable service?
Furthermore, take a scenario where 100 posts were made to the form and 80 of them were spam. With the Zapier process added, please tell me how many Airtable API requests would be required to capture the 20 legitimate records?
So the flow is triggered for each submitted form. In the example flow I linked above, there is only one Airtable API call (New Record) for each submission.
It is true that if you go with New Record -> OOPSpam -> Insert then it will call the Airtable API twice. Now looking back at this, it doesn't make sense to trigger the flow for a new record and then insert it back to Airtable because the record already exists in Airtable. Unless we want to insert/update with some new information like spam score.
We also use Airtable for our contact form. Our flow looks like this (simplified)
Webhook -> OOPSpam (check for spam) -> Only Continue If -> Insert Airtable (1 call for a legitimate record)
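To make the "Only Continue If" step concrete, here is a minimal Python sketch of the gate in that flow. It is not the Zapier or OOPSpam implementation: `handle_submission`, `looks_like_spam`, and `insert_record` are hypothetical names, and the spam check and the Airtable insert are passed in as plain functions so that nothing here assumes a real API.

```python
from typing import Callable, Mapping

Submission = Mapping[str, str]


def handle_submission(
    submission: Submission,
    looks_like_spam: Callable[[Submission], bool],
    insert_record: Callable[[Submission], None],
) -> bool:
    """Run one webhook payload through the gate.

    Returns True if the submission was stored, False if it was dropped.
    Both callables are hypothetical stand-ins: one for the spam check,
    one for the single Airtable insert call.
    """
    if looks_like_spam(submission):
        # "Only Continue If" fails: stop here, so no Airtable call is made.
        return False
    # Legitimate submission: exactly one insert call.
    insert_record(submission)
    return True
```

The point of the design is that spam never reaches the base, so the only Airtable calls are the inserts for legitimate records.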
So it all depends on the workflow. If there is only one Airtable trigger, then it will take 100 calls for 100 submissions. If there is another call (to update/insert a record), then it would take an extra 20.
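The call counts for the 100-submission scenario can be checked with a few lines of Python. This is only a sketch of the arithmetic described in this thread; the function name and the three flow labels are made up for illustration.

```python
def airtable_calls(total_submissions: int, spam_submissions: int, flow: str) -> int:
    """Count Airtable API calls for one batch of form submissions.

    Flow labels (hypothetical, named after the flows discussed above):
      "trigger_only"       - New Record trigger, nothing written back
      "trigger_then_write" - New Record trigger, then insert/update for legitimate records
      "webhook_gate"       - webhook first, insert only after the spam check passes
    """
    if not 0 <= spam_submissions <= total_submissions:
        raise ValueError("spam_submissions must be between 0 and total_submissions")
    legitimate = total_submissions - spam_submissions
    if flow == "trigger_only":
        return total_submissions
    if flow == "trigger_then_write":
        return total_submissions + legitimate
    if flow == "webhook_gate":
        return legitimate
    raise ValueError(f"unknown flow: {flow}")


for name in ("trigger_only", "trigger_then_write", "webhook_gate"):
    print(name, airtable_calls(100, 80, name))
```

With 100 submissions and 80 spam, this prints 100, 120, and 20 for the three flows.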
In a hypothetical situation where you get 1M bot requests, there should be other measures to prevent the attack, like DNS/hosting-level security and rate limiting. For most cases this flow will work just fine, considering the Airtable API rate limit is 5 requests per second per base.
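As a rough illustration of what 5 requests per second means in practice, this Python sketch computes the minimum time needed to push a given number of calls through one base. It assumes the limit quoted above and ignores retries, latency, and any other quota; the function name is hypothetical.

```python
import math


def min_seconds_for_calls(api_calls: int, requests_per_second: int = 5) -> int:
    """Lower bound on wall-clock seconds to make `api_calls` requests.

    The default of 5 is the per-base rate limit quoted in this thread.
    """
    if api_calls < 0 or requests_per_second <= 0:
        raise ValueError("api_calls must be >= 0 and requests_per_second > 0")
    return math.ceil(api_calls / requests_per_second)


print(min_seconds_for_calls(100))     # the 100-submission example
print(min_seconds_for_calls(49_245))  # the spam burst from the original post
```

The first call gives 20 seconds; the second gives 9,849 seconds, or about 2.7 hours at one call per spam record.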
>>> For most cases this flow will work just fine, considering Airtable API rate limit is 5 requests per second per base.
The API rate is not the issue. Most Airtable users must abide by a new limit - calls per month.
It's unlikely that a single form will attract a million requests. But it is likely that for every legitimate form submission, some order of magnitude more will be bots doing what bots do. This approach will eat into the new API quota, and as we all know, Airtable APIs are married to the user's instance, so they can also affect user performance.
It's no secret -- I'm not a fan of using external automation platforms unless absolutely necessary. As such, I would approach this differently.
One approach - internal automation; when a record is added, use AI to identify any records that were probably created by a bot. There are numerous ways to target nefarious submissions with a deterministic prompt, including learner-shot examples that examine certain fields. The learner shots could be dynamic, based on a known and vetted collection of legitimate records. This approach would be performant and would not require API calls or sending all your data to another platform.
Sorry for the late reply.
It is true! Sending your data to third parties is not preferable. That said, OOPSpam is privacy-friendly and doesn't require an IP address or email. Also, maintaining a spam detection infrastructure like the one you mentioned is a lot of work. Airtable may not be interested in building the system just for the forms.
To add to the internal automation discussion: internal automations allow a Run script action, so it is possible to bypass Zapier and make an API call directly to OOPSpam from within the Airtable automation.
>... maintaining a spam detection infrastructure like the one you mentioned is a lot of work.
It's actually less work.
> Airtable may not be interested in building the system just for the forms.
This is not just about forms. An API process is just as capable of injecting bad data into your system. Airtable has proven over the years that it is not interested in building many good feature ideas. As such, no-codeists must look for solutions that involve things they can do to mitigate issues like this. I believe they will increasingly lean on generative AI to meet these requirements.
> ... please provide an example
Sure. Imagine an AI field that applies a generative AI prompt whenever a new record is added. If the AI inference returns false, an automation deletes the record. The prompt might look something like this.
You are an expert who can recognize records that are spam. By definition, spam records contain data values that are vastly unlike legitimate records. I will provide you with the field values of a record you will use to gauge legitimacy. I will also provide you with a small set of example records that are legitimate. You will assess the current record and output "true" if legitimate and "false" if illegitimate.
Examples:
`<fieldNames>: <dataValues> [output]: <true|false>`
Record: <dataValues>
[output]
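For readers who want to see how the example slots might be filled dynamically from vetted records, here is a minimal Python sketch. It only assembles the prompt text and interprets a "true"/"false" reply; it does not call any AI service. The function names, the field names in the usage note, and the `;` separator are all assumptions made for illustration, not part of Airtable's AI field.

```python
from typing import Mapping, Sequence

INSTRUCTIONS = (
    "You are an expert who can recognize records that are spam. "
    "I will provide a small set of example records that are legitimate, "
    "followed by the current record. "
    'Output "true" if the record is legitimate and "false" if illegitimate.'
)


def format_fields(record: Mapping[str, str]) -> str:
    """Render a record as 'field: value' pairs on one line."""
    return "; ".join(f"{name}: {value}" for name, value in record.items())


def build_spam_prompt(
    vetted_examples: Sequence[Mapping[str, str]],
    record: Mapping[str, str],
) -> str:
    """Assemble the prompt from known-good examples plus the record to judge."""
    lines = [INSTRUCTIONS, "", "Examples:"]
    for example in vetted_examples:
        # Every vetted example is legitimate, so its expected output is "true".
        lines.append(f"{format_fields(example)} [output]: true")
    lines += ["", f"Record: {format_fields(record)}", "[output]:"]
    return "\n".join(lines)


def is_legitimate(model_output: str) -> bool:
    """Treat anything other than a clean 'true' as illegitimate."""
    return model_output.strip().lower() == "true"
```

A call such as `build_spam_prompt([{"Name": "Ada", "Message": "Question about pricing"}], {"Name": "xx", "Message": "CLICK HERE"})` yields the text to send to the model; the automation would then delete the record when `is_legitimate` returns False.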
Go slowly with me, please. I can build a form, and I know basic formulas.
I can build a hidden field or a field that unhides.
But you said conditioning is useless .... 2-5 ... if an incorrect value is inserted, the form record will be deleted.
But I'm not understanding your condition.