Oct 26, 2022 06:36 AM
Hi Airtable Community,
I have a base that receives and manipulates data arriving via a webhook. Each webhook call brings in a single record, and that record needs to be processed completely before the next one can be considered. In other words, the automation needs to run to completion on one record before it starts again for the next record.
The automation consists of one create record action followed by five script actions, and is triggered by the receipt of a webhook. The automation runs as expected when tested; however, I have started to notice missing data assignments that should have been completed by one or two of the script actions. I suspect this may be because a number of webhooks arrive in quick succession, which leads to automation runs kicking off before the previous one has completed.
Does this sound plausible? And if so, are there any suggestions on how to resolve this issue?
Cheers,
Ryan.
Oct 26, 2022 06:51 AM
This is a race condition. Each automation run is unaware of the progress (or even existence) of any other automation runs.
Is there a way to throttle the system that is triggering the webhook? Is there a way to rework the system to send all the records together in a single batch?
Could you have the webhook automation simply create a new record in a staging table? Then have another automation that actually processes the data, with logic to ensure that only one record is processed at a time. For example, the last action of this second automation could be to set a checkbox on the next record to be processed, which triggers the automation.
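Something along these lines for that last script action, as a rough sketch only: it assumes a staging table named "Staging" with a "Process next" checkbox that triggers the processing automation, a "Processed" checkbox marked when a record is done, and a "Received at" date field that keeps the queue in arrival order. All of those names, and the recordId input variable, are just placeholders.

```javascript
// Last script action of the processing automation (sketch only).
// Assumed schema: a "Staging" table with a "Process next" checkbox that
// triggers this automation, a "Processed" checkbox set when a record is done,
// and a "Received at" date field that keeps the queue in arrival order.
// Table and field names are placeholders; recordId is an input variable
// mapped from the triggering record.
const staging = base.getTable("Staging");
const {recordId} = input.config();

// Mark the record we just finished and clear its trigger checkbox.
await staging.updateRecordAsync(recordId, {"Processed": true, "Process next": false});

// Find the oldest record that has not been processed yet.
const query = await staging.selectRecordsAsync({
    fields: ["Processed", "Process next", "Received at"],
    sorts: [{field: "Received at", direction: "asc"}],
});
const next = query.records.find(r => r.id !== recordId && !r.getCellValue("Processed"));

// Setting the checkbox re-triggers this automation for the next record.
if (next) {
    await staging.updateRecordAsync(next.id, {"Process next": true});
}
```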
Oct 26, 2022 11:21 AM
I believe @kuovonne hit on the most obvious way to throttle the incoming data events: by altering the system generating the webhook calls. But this is not always possible. Platforms are generally designed to be multi-threaded and non-blocking, which means their objective is to avoid serial processes, and they typically won't let you slow them down.
Bingo!
The only way to cope with an aggressive service where ordered processing is required is with an events buffer. This approach is not without drawbacks, the biggest of which is a doubling of automation runs.
In this approach, the event source is unimpeded and Airtable’s multi-threaded webhook architecture is capable of keeping up with the pace of change events flooding in. But if one event depends on the completion of a previous event, that’s where the trouble begins.
If you first log every event to a table and then use that table with yet another automation to process what’s in the buffer, you now have the ability to decide which events take priority as well as throttle any events that have dependencies on events already in the buffer queue.
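As a rough sketch of the logging half, assuming the webhook body is mapped to script input variables and a buffer table with "Payload", "Event type", "Received at", and "Status" fields (all hypothetical names):

```javascript
// Webhook-triggered automation (sketch): do nothing except append the raw
// event to a buffer table so processing order can be decided later.
// Table and field names are placeholders; payload and eventType are input
// variables mapped from the webhook body.
const buffer = base.getTable("Event Buffer");
const {payload, eventType} = input.config();

await buffer.createRecordAsync({
    "Payload": payload,                      // raw JSON string from the webhook
    "Event type": eventType,                 // e.g. create / update / delete (text field)
    "Received at": new Date().toISOString(), // date field, arrival order for the queue
    "Status": {name: "Queued"},              // single select: Queued / Processing / Done
});
```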
Oct 26, 2022 12:32 PM
Thanks @Bill.French for providing better vocabulary for what I was describing and confirming that I was on the right track.
Upon further thought, I think the buffering table/system would best work with a control table and a linked record field. The initial automation would create the new record in the buffering table and link the record to the control table. A system of back-and-forth rollups would identify the next record in the queue to process.
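For example, the initial automation's script might look something like this; it assumes a single record in a "Control" table and a linked record field named "Control" on the buffer table, with all names purely illustrative:

```javascript
// Initial (webhook) automation, sketch: create the buffer record and link it
// to the single control record so rollups on the control table can see the
// whole queue. Table and field names are illustrative; payload is an input
// variable mapped from the webhook body.
const buffer = base.getTable("Event Buffer");
const control = base.getTable("Control");

// Assume the control table holds exactly one record.
const controlQuery = await control.selectRecordsAsync({fields: []});
const controlRecord = controlQuery.records[0];

const {payload} = input.config();
await buffer.createRecordAsync({
    "Payload": payload,
    "Control": [{id: controlRecord.id}], // linked record field back to the control table
});
```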
Oct 26, 2022 01:33 PM
Perhaps, but I think it depends on what dependencies (if any) exist between the events. I have some systems like this that have no dependencies at all but still require serial processing. Imagine a case where 50,000 events arrive in a day to update just 1,200 target records. A system with a control table would be unable to keep up if there were a lot of relationships and linked records.
Oct 26, 2022 02:37 PM
Again, it depends.
It is possible (but ugly) to have a buffer of tens of thousands of records with a control table and back-and-forth rollups. The control table could have multiple control records and the system of scripts would identify the head control record and the tail control record.
Oct 26, 2022 02:48 PM
Typically, events have a timestamp, and that dictates the processing order. Ideally, the buffer will empty from time to time, but it may not, depending on the event pace.
Lots of variables to consider.
Oct 26, 2022 03:56 PM
If the buffer table/queue is empty and then two records enter the queue at the exact same microsecond, how would you have an automation know which to process first, while also being sure that the second one is processed next? Sticking only with Airtable features, and processing the buffer records as they arrive, versus waiting for a batch-processing script/automation that will never run in parallel with itself.
Oct 26, 2022 04:33 PM
Typically, two [related] events will not share the same instant in time. Good example - an event that “creates” an object in the source platform cannot also create the same source object a second time at exactly the same instant. It might create an object and then update the same object 20ms later. But, even if they are time stamped by the originating system with identical time markers, the event classes will be different and the event IDs will be different, so determining which takes precedence is pretty straightforward.
In most event/webhook APIs, events are both classed (like create, update, delete) and include immutable IDs of some sort that typically correspond to that system's native object IDs.
One would assume there’s a sorting logic that orders the buffer either in a view or in code. But in some cases, it might simply be first-in-first-out logic - depends.
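For illustration, a buffer-processing script could order the queue by timestamp, then by event class, then by event ID, something along these lines (the field names and the class precedence are assumptions):

```javascript
// Sketch: pick the next queued event when timestamps alone can't decide.
// Table/field names and the class precedence are assumptions.
const buffer = base.getTable("Event Buffer");
const query = await buffer.selectRecordsAsync({
    fields: ["Received at", "Event type", "Event ID", "Status"],
});

// Lower rank is processed first when two events share a timestamp.
const classRank = {create: 0, update: 1, delete: 2};
const rankOf = (record) => {
    const rank = classRank[record.getCellValueAsString("Event type").toLowerCase()];
    return rank === undefined ? 99 : rank;
};

const queued = query.records
    .filter(r => r.getCellValueAsString("Status") === "Queued")
    .sort((a, b) => {
        const byTime = new Date(a.getCellValue("Received at")) - new Date(b.getCellValue("Received at"));
        if (byTime !== 0) return byTime;
        const byClass = rankOf(a) - rankOf(b);
        if (byClass !== 0) return byClass;
        return a.getCellValueAsString("Event ID").localeCompare(b.getCellValueAsString("Event ID"));
    });

const nextEvent = queued[0]; // first in, first out, with tie-breaks applied
```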
One of the reasons to use webhooks is to create a near-instantaneous flow of data from the originating system to Airtable. If your requirements don’t need this real-time reflection of data in Airtable, batch processing is a fine way to go. In fact, webhooks might not be an ideal pathway if your users are tolerant of latency.
Oct 26, 2022 04:59 PM
Okay, so I guess it could be done without a control table.
The first webhook automation that creates the buffer record also reads the records in the view, and if the newly created record is the first record in the view, it sets the checkbox to run the second automation. If it isn't the first record in the view, it doesn't set the checkbox.
The second automation is triggered by the checkbox. It does its thing, removes the triggering record from the view (probably deletes it), and then sets the checkbox on the next record to be processed (if there are any records left in the view).
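In script form, that first check might look roughly like this, assuming a "Queue" view sorted oldest-first and a "Process now" checkbox; the table, view, field, and input variable names are all placeholders:

```javascript
// First (webhook) automation, last script action (sketch): only kick off
// processing if the record this run just created is first in the queue view.
// Table, view, and field names are placeholders; newRecordId is an input
// variable mapped from the create-record step.
const buffer = base.getTable("Event Buffer");
const queueView = buffer.getView("Queue"); // sorted oldest-first, hides finished records

const {newRecordId} = input.config();
const query = await queueView.selectRecordsAsync({fields: []});

if (query.records.length > 0 && query.records[0].id === newRecordId) {
    // Nothing is ahead of us, so start the processing chain.
    await buffer.updateRecordAsync(newRecordId, {"Process now": true});
}
```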