Nov 19, 2022 08:54 PM
Today, I set aside some time to explore batching techniques for use with await table.updateRecordsAsync(), figuring out the maximum number of records I could update within the 30 seconds allowed for an Automation script.
I wrote a script that updates a Status field from “Todo” to “Done” in batches of 50 records at a time. It updated approximately 7,000 records before the 30 second Automation timeout stopped my script.
The same script works as expected within a Scripting Extension app, where it takes ~4 minutes for a table with 50,000 records.
Although this was my first attempt at a batch Automation script, and perhaps a little rough around the edges, functionally it worked without error - and from this I’m assuming that roughly 7,000 records is the most anyone could hope to update within one Automation script.
//Script can update approx 7,000 records before reaching the 30 second timeout within an Automation.
//Target Table
let table = base.getTable("Table 1");
let query = await table.selectRecordsAsync({fields: ["Name", "Status"], sorts: [{field: "Name", direction: "asc"}]});
let records = query.records;
/* records looks something like:
[
    { id: "rec4da1mVM153RKjc", name: "Record 1" },
    { id: "recypvNNLFf2sQ2Ct", name: "Record 2" },
    { id: "rec4UtiNA4ntT1eSg", name: "Record 3" },
    { id: "reciewbaoRBfo1DvM", name: "Record 4" } ... ]
*/
//Prepare base data with desired update
let myData = records.map( element => ({
id: element.id,
fields: {
"Status": { name: "Done"}
}
}));
/* myData now looks like:
[
    { id: 'rec4da1mVM153RKjc', fields: { Status: { name: 'Done' } } },
    { id: 'recypvNNLFf2sQ2Ct', fields: { Status: { name: 'Done' } } },
    ... ]
*/
//Divide prepared base data into batches of 50 records
let myBatch = [];
let batchSize = 50;
let batchStart = 0;
let batchEnd = batchSize;
//e.g. 7,000 records / 50 = 140 full batches
let totalIterations = Math.floor(myData.length / batchSize);
//Process all full 50-record batches
for (let i = 0; i < totalIterations; i++) {
    myBatch[i] = myData.slice(batchStart, batchEnd);
    batchStart += batchSize;
    batchEnd += batchSize;
}
//Process the last odd-sized batch, if any records remain
if (myData.length % batchSize > 0) {
    myBatch[totalIterations] = myData.slice(batchStart);
}
//Write the batched updates to the table.
for (let i = 0; i < myBatch.length; i++) {
await table.updateRecordsAsync(myBatch[i]);
}
It’s comforting to know that this script will execute manually with the expected results via the Scripting App.
Keen for any pointers, tips or advice that anyone may have on this topic. I found that there were a couple of “gotcha” issues whilst writing, such as not being able to use await table.updateRecordsAsync() within a forEach().
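For anyone curious, here’s a minimal sketch of that forEach() gotcha (my illustration, not part of the original script): forEach() ignores the promises that an async callback returns, so the loop “finishes” while the updates are still in flight - whereas a plain for loop awaits each one properly.
// Doesn't work as expected: forEach() fires every callback immediately and
// ignores the returned promises, so the script moves on before updates settle.
myBatch.forEach(async (batch) => {
    await table.updateRecordsAsync(batch);
});

// Works: each mutation is awaited before the next one starts.
for (let batch of myBatch) {
    await table.updateRecordsAsync(batch);
}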
I’ll need to make time to read over these forums for batch techniques, as they’re the kind of thing that, when you actually need to employ them, you’ll at least want to have some theory understood and some practice done before attempting to update a 50,000 record base :grinning_face_with_sweat: .
Oh, and for the Scripting app, due to the time it takes to execute, I used the below loop to keep the user informed of where the process is at:
for (let i = 0; i < myBatch.length; i++) {
    await table.updateRecordsAsync(myBatch[i]);
    output.clear();
    //myData items only have id and fields keys, so report a running record count
    output.text(`Processed ${Math.min((i + 1) * batchSize, myData.length)} of ${myData.length} records`);
}
So yeah… That’s my Sunday Airtable adventure, I hope you liked it. The end. :open_book:
Nov 23, 2022 01:14 AM
Ha! Who am I kidding? With my code as it was (… :face_vomiting: ), this isn’t the end - it’s only the beginning. :smiling_face_with_sunglasses:
Knowing that there was room for improvement, I spent today’s lunch figuring out a cleaner way to convert an array into nested batches.
Basically, how do I turn this;
[0,1,2,3,4,5,6,7,8,9]
into this
[[0,1,2],[3,4,5],[6,7,8],[9]];
The solution I worked out today is much cleaner.
let myBatch = [], size = 50;
while (myData.length > 0) myBatch.push(myData.splice(0, size));
for (let i = 0; i < myBatch.length; i++) {
    await table.updateRecordsAsync(myBatch[i]);
}
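Side note on the splice() approach: it consumes myData as it builds the batches, so the original array is empty by the time the updates run - anything that reads myData.length afterwards (like the progress message from my earlier script) will report 0. A non-mutating sketch with slice() keeps the source array intact:
let myBatch = [], size = 50;
for (let start = 0; start < myData.length; start += size) {
    // slice() copies a window of the array without modifying myData
    myBatch.push(myData.slice(start, start + size));
}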
Interestingly, it maxed out at about 6,000 records in the 30 seconds - more or less the same processing rate.
I can’t help but wonder how I can gear an Automation to keep at a task until all records have been processed. :thinking: I’m assuming it’s not possible to return a value (such as an array of unprocessed records) upon striking the time limit error…
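Thinking out loud (untested - just a sketch): rather than striking the time limit error, maybe the script could watch the clock and stop itself early, passing the leftovers forward with output.set(), which automation scripts use to expose values to later steps. The 25 second safety margin below is a guess on my part:
let startTime = Date.now();
let leftoverIds = [];
for (let i = 0; i < myBatch.length; i++) {
    // Bail out before the 30 second Automation limit kills the script
    if (Date.now() - startTime > 25000) {
        leftoverIds = myBatch.slice(i).flat().map(record => record.id);
        break;
    }
    await table.updateRecordsAsync(myBatch[i]);
}
// A later action in the same automation could pick these up as input
output.set('unprocessedRecordIds', leftoverIds);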
Dec 05, 2022 09:44 PM
I've experimented with trying to process large volumes of records. The rate limit for submitting mutations is 15 per second. With each mutation containing a maximum of 50 records, that is a theoretical limit of 750 records per second. However, submitting batches that quickly while being careful not to exceed the 15 mutations/second rate limit is a bit of a pain. Plus, if you await anything, things slow down considerably. And you probably should await some things if you want to be sure things worked, or at least get an error when they didn't.
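For illustration, here's a sketch of one way to pace un-awaited batches under that limit (my own example - it assumes setTimeout is available in the scripting sandbox, and the numbers are approximations):
const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));

let pending = [];
while (myData.length) {
    // Fire up to 15 mutations in this one-second window without awaiting each
    for (let i = 0; i < 15 && myData.length; i++) {
        pending.push(table.updateRecordsAsync(myData.splice(0, 50)));
    }
    if (myData.length) await sleep(1000); // stay under 15 mutations/second
}
await Promise.all(pending); // any failed batch rejects here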
I've also found that it is difficult to determine how long a particular mutation takes to process if you do await it. Deleting records is usually very quick. But creating or updating records with lots of data and lots of complex formulas/rollups can take quite some time.
Probably the biggest issue that I've run into is that flooding Airtable with thousands of mutations in a very short period of time slows down the entire base (and possibly other bases in the workspace). I find that these mega batches are better reserved for extraordinary circumstances, when you can tell everyone else to stay out of the base while you run the script as a scripting action.
When needing to process thousands of records in an automation, I prefer to use a single script that identifies which records need to be processed, then processes as many as it can at a reasonable pace, and ends if it gets near the 30 second time limit (or other limits that can be easily determined, such as the number of fetches). Then I copy the script into multiple different actions in the same automation. Each instance of the script will process as many records as it reasonably can. Any leftover records will be processed the next time the script runs, either in the next scripting action (with an identical script) or the next time the automation runs.
Oh, and the format that I usually use is ...
while (myData.length) {
await table.updateRecordsAsync(myData.splice(0, 50))
}
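Combined with the time-boxing described above, a sketch of one of those repeatable scripting actions might look like this (illustrative only - the table and field names, the "Todo" filter, and the 25 second margin are my assumptions):
let table = base.getTable("Table 1");
let startTime = Date.now();

// Re-identify unprocessed records each run, so an identical script in the
// next action (or the next automation run) picks up wherever this one stops
let query = await table.selectRecordsAsync({fields: ["Status"]});
let myData = query.records
    .filter(record => record.getCellValueAsString("Status") === "Todo")
    .map(record => ({id: record.id, fields: {"Status": {name: "Done"}}}));

while (myData.length) {
    // End gracefully before the 30 second limit; leftovers wait for the next run
    if (Date.now() - startTime > 25000) break;
    await table.updateRecordsAsync(myData.splice(0, 50));
}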