Filtering with Loops in Javascript - How to Avoid Insanity

January 21, 2021

Bill_French

As developers, we get roped into codifying a vast and seemingly endless escalation of conditional processing to find and filter records and related tables. Where does it end? Apparently, it doesn't, because as soon as you create the perfect script that loops across many filtering constraints, the client (or management) is going to ask for a “slight” modification, or you will invariably discover an unanticipated case that leaves you standing in a corner holding a paintbrush with no way out.

This kind of work, like most of our script projects, comes down to two essential tasks:

1. Filter out the data that we need
2. Perform computations on the filtered data

Typically, we use loops to achieve #1. #2 is easy if we get #1 right. But #1 is never simple and rarely performant given enough conditional complexity and records.

What’s better than this approach? Fewer loops for sure because they are slow, and they make our lives shi**y.

The Remedy…

The term “filter”, especially in a JavaScript sense, is subject to abuse. Why not replace it with “search”? A JSON hash index or a true inverted index is exactly the right chainsaw to cut down the looping and give direct access to the lists of record IDs the loop is intended to produce.

I use hash indexes all the time, but increasingly, I use Lunr to establish inverted indexes that are very agile and extremely fast. Imagine finding 200 record IDs from a collection of 30,000 rows where three fields match specific values and the fourth is “fuzzy”. Looping requires lots of code and plenty of time. An inverted index can do this in 20 milliseconds.
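To make the inverted-index idea concrete, here is a minimal hand-rolled sketch. It is not Lunr and has none of Lunr's fuzzy, wildcard, or boosting features; it simply maps each `field:value` term to the set of record IDs containing it, so a match on several fields becomes a set intersection instead of a loop over every row. The record shape, field names, and the `buildInvertedIndex`/`search` helpers are hypothetical.

```javascript
// Minimal inverted index: "field:value" term -> Set of record IDs.
// A simplified stand-in for illustration only, not Lunr's API.
function buildInvertedIndex(records, fields) {
  const index = new Map();
  for (const rec of records) {
    for (const field of fields) {
      const term = `${field}:${String(rec[field]).toLowerCase()}`;
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(rec.id);
    }
  }
  return index;
}

// Return IDs of records matching every { field: value } criterion by
// intersecting the posting sets, starting from the smallest one.
function search(index, criteria) {
  const postings = Object.entries(criteria).map(
    ([field, value]) => index.get(`${field}:${String(value).toLowerCase()}`) || new Set()
  );
  if (postings.length === 0) return [];
  postings.sort((a, b) => a.size - b.size);
  const [smallest, ...rest] = postings;
  return [...smallest].filter((id) => rest.every((set) => set.has(id)));
}

// Hypothetical sample rows
const rows = [
  { id: "rec1", region: "West", status: "Active", tier: "Gold" },
  { id: "rec2", region: "West", status: "Active", tier: "Silver" },
  { id: "rec3", region: "East", status: "Active", tier: "Gold" },
  { id: "rec4", region: "West", status: "Closed", tier: "Gold" },
];

const invIndex = buildInvertedIndex(rows, ["region", "status", "tier"]);
console.log(search(invIndex, { region: "West", status: "Active", tier: "Gold" }));
// prints [ 'rec1' ]
```

The index is built in one pass; after that, each query touches only the posting sets for its terms rather than every row.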

And when you factor in the advantages of term presence, wildcard searches, and field-specific boosting, the loop that builds the index is time extremely well spent. Plus, you can create and cache-forward index documents independent of your business logic.

One last comment concerning performance: there are many computations and metrics that we perform repeatedly for business analytics. Why not compute these metrics once and store them in the search index, so that when the CEO wants to see how her business is performing, you've reduced the overhead to a simple search query rendered at the edge in 500 ms?
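A minimal sketch of the “compute once, look up later” idea, using a plain hash index rather than a full search index: per-customer metrics are aggregated in a single pass, so a later request becomes a key lookup. The `orders` data, the metric names, and `buildMetricsIndex` are hypothetical.

```javascript
// Hypothetical order rows; in practice these would come from your data source.
const orders = [
  { customer: "ABC Transportation", amount: 1200 },
  { customer: "ABC Transportation", amount: 300 },
  { customer: "Zeta Freight", amount: 450 },
];

// One pass: aggregate metrics per customer and store them keyed by name.
function buildMetricsIndex(rows) {
  const metrics = {};
  for (const { customer, amount } of rows) {
    const m = metrics[customer] || (metrics[customer] = { orderCount: 0, totalAmount: 0 });
    m.orderCount += 1;
    m.totalAmount += amount;
  }
  // Derive secondary metrics once, after the totals are known.
  for (const m of Object.values(metrics)) {
    m.averageOrder = m.totalAmount / m.orderCount;
  }
  return metrics;
}

const oMetrics = buildMetricsIndex(orders);
// Later, answering "how is ABC doing?" is a lookup, not a recomputation.
console.log(oMetrics["ABC Transportation"]);
// prints { orderCount: 2, totalAmount: 1500, averageOrder: 750 }
```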

While an inverted full-text index offers many advantages, there's a simpler technique that improves script performance and drastically simplifies your code when nested loops seem the only viable route.

Hash Index

This is simply a JSON document intended to make lookups fast and simple; it's literally a copy of a collection of items in which each object key provides direct access to the data related to that key.

Let’s say you had a thousand records in a table and you needed to loop through it to find a given record whose name matched a specific customer name – and you need to perform this looping lookup repeatedly nested inside another enumeration. You have three options -

1. Use a filterByFormula query for each lookup: ugly, slow, convoluted.
2. Loop through every record seeking a match: very slow, CPU-intensive, and it adds code complexity.
3. Perform a “seek” into a pre-built hash index: almost instant, elegant.
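Options 2 and 3 can be contrasted in a few lines of plain JavaScript. The rows here are plain objects standing in for table records (an assumption; Airtable records would need `getCellValue()` calls), `Array.prototype.find` plays the role of the per-lookup loop, and the helper names are hypothetical.

```javascript
// Plain objects standing in for table rows (hypothetical data).
const customers = [
  { name: "ABC Transportation", id: "1001", contact: "Jimmy Johns" },
  { name: "Zeta Freight", id: "1002", contact: "Ann Lee" },
];

// Option 2: scan the whole list on every lookup -> O(n) per call.
function findContactByScan(rows, name) {
  const match = rows.find((row) => row.name === name);
  return match ? match.contact : undefined;
}

// Option 3: one pass to build a hash index, then O(1) average-time seeks.
const byName = {};
for (const row of customers) {
  byName[row.name] = row;
}
function findContactBySeek(name) {
  const entry = byName[name];
  return entry ? entry.contact : undefined;
}

console.log(findContactByScan(customers, "Zeta Freight")); // prints Ann Lee
console.log(findContactBySeek("Zeta Freight"));            // prints Ann Lee
```

Both return the same answer; the difference is that option 2 pays for a full scan on every call, while option 3 pays for one scan up front.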

Approach 3 requires a single pass across all the records containing the customer names (that's the downside). That pass creates a collection of objects in a single JSON document (e.g., oCustomerNameHash, global to your app) that looks something like this:

{
   "ABC Transportation" : {
      "customer_id" : "1001",
      "customer_contact" : "Jimmy Johns",
      other attributes needed for the index...
   },
   ... objects representing the rest of the customers ...
}

When you want to know the contact name for “ABC Transportation”, the code is simple, effortless, and requires only 6 milliseconds.

let customerContact = oCustomerNameHash["ABC Transportation"].customer_contact;

Even if the hash index has 50,000 items in it, the response time will be sub-20 milliseconds. This single line of code eliminates a nested loop and the complexity of doing so. It is possible because you performed the loop in advance of needing to perform the lookups. Essentially, you are caching forward an intelligent and performant class of data objects that can be reused throughout your script.

This makes it possible to drastically reduce code and complexity, while cutting the loop iterations required to a single pass across the data instead of the many passes that nested filtering loops require.
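Here is the nested-loop pattern and its hash-index replacement side by side, as a hedged sketch with made-up data: enriching each order with its customer's contact. The nested version scans the customer list for every order (roughly orders × customers comparisons); the indexed version makes one pass over customers and then one seek per order. `enrichNested` and `enrichIndexed` are hypothetical names.

```javascript
// Hypothetical data: orders reference customers by ID.
const customers = [
  { id: "1001", name: "ABC Transportation", contact: "Jimmy Johns" },
  { id: "1002", name: "Zeta Freight", contact: "Ann Lee" },
];
const orders = [
  { orderNo: "A-1", customerId: "1002" },
  { orderNo: "A-2", customerId: "1001" },
  { orderNo: "A-3", customerId: "9999" }, // no matching customer
];

// Nested loops: the inner scan runs once per order.
function enrichNested(orders, customers) {
  return orders.map((order) => {
    let contact = null;
    for (const c of customers) {
      if (c.id === order.customerId) { contact = c.contact; break; }
    }
    return { ...order, contact };
  });
}

// Hash index: one pass to build, then a constant-time seek per order.
function enrichIndexed(orders, customers) {
  const byId = {};
  for (const c of customers) byId[c.id] = c;
  return orders.map((order) => {
    const c = byId[order.customerId];
    return { ...order, contact: c ? c.contact : null };
  });
}

console.log(enrichIndexed(orders, customers).map((o) => o.contact));
// prints [ 'Ann Lee', 'Jimmy Johns', null ]
```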

Once built, a hash index like this often proves useful in several places, and it stays ready to perform lookups for the duration of your script process. You may also discover you need to perform lookups using additional keys, which suggests multiple hash indexes for the same data.

Imagine you also wanted to look up customer name by customer ID, or customer contact by customer ID. This would require a new hash index like oCustomerIDHash. Instead of building that second hash in a separate loop, you can build the added seek capability into the same constructor loop, so there's no need to loop once for every index dimension you might need.

For example, you can mix both indexed dimensions into one hash index document; this time we'll call it oCustomerHash. By simply creating two keys for each customer (one for the customer name and one for the customer ID), we have begun to create a complex indexing model. The lookup cost stays constant, and the agility is magnified.

The objects in the index would include two entries for each customer.

{
   "ABC Transportation" : {
      "customer_id" : "1001",
      "customer_contact" : "Jimmy Johns",
      other attributes needed for the index...
   },
   "1001" : {
      "customer_name" : "ABC Transportation",
      "customer_contact" : "Jimmy Johns",
      other attributes needed for the index...
   }
}

Now you can query by customer name to get customer ID or by customer ID to get customer name instantly - no nested looping required.

let customerName = oCustomerHash["1001"].customer_name;
let customerID   = oCustomerHash["ABC Transportation"].customer_id;
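One caution with mixing both dimensions under the same keys: a customer whose name happens to equal another customer's ID (say, a customer literally named “1002”) would overwrite that entry. A hedged alternative, still built in a single pass, keeps each dimension under its own sub-object. The `byName`/`byId` layout and `buildCustomerHash` are illustrative choices, not part of the approach described above.

```javascript
// Hypothetical rows standing in for customer records.
const rows = [
  { name: "ABC Transportation", id: "1001", contact: "Jimmy Johns" },
  { name: "Zeta Freight", id: "1002", contact: "Ann Lee" },
];

// Build both index dimensions in ONE pass, each in its own namespace,
// so a customer name can never collide with a customer ID.
function buildCustomerHash(records) {
  const hash = { byName: {}, byId: {} };
  for (const r of records) {
    hash.byName[r.name] = { customer_id: r.id, customer_contact: r.contact };
    hash.byId[r.id] = { customer_name: r.name, customer_contact: r.contact };
  }
  return hash;
}

const oCustomerHash = buildCustomerHash(rows);
console.log(oCustomerHash.byId["1001"].customer_name);               // prints ABC Transportation
console.log(oCustomerHash.byName["ABC Transportation"].customer_id); // prints 1001
```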

Building the Index

This is one approach that can be used to construct the index from an existing collection of records.

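// Assumption: this runs as an Airtable script and the customers live in a
// table named "Customers" (hypothetical name; use your own table).
let table = base.getTable("Customers");
let query = await table.selectRecordsAsync({
    fields: ["Customer Name", "Customer ID", "Customer Contact"]
});
let oRecords = query.records;

let oCustomerHash = {};
// for...of yields each record object itself (not an index), so call
// getCellValue() on `record` directly rather than oRecords[record]
for (let record of oRecords)
{
    // add the customer name object
    oCustomerHash[record.getCellValue("Customer Name")] = {
        "customer_id"      : record.getCellValue("Customer ID"),
        "customer_contact" : record.getCellValue("Customer Contact"),
    };
    // add the customer id object
    oCustomerHash[record.getCellValue("Customer ID")] = {
        "customer_name"    : record.getCellValue("Customer Name"),
        "customer_contact" : record.getCellValue("Customer Contact"),
    };
}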
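One caveat about a constructor like this: object keys are unique, so if two records share a customer name, the later record silently overwrites the earlier one. When a key can repeat, one option is to store an array of matches per key. This sketch uses plain objects standing in for Airtable records and a `Map` for the index (a plain object would work similarly); `buildMultiIndex` is a hypothetical helper.

```javascript
// Group records under a possibly non-unique key: key -> array of records.
function buildMultiIndex(records, keyOf) {
  const index = new Map();
  for (const rec of records) {
    const key = keyOf(rec);
    const bucket = index.get(key);
    if (bucket) bucket.push(rec);
    else index.set(key, [rec]);
  }
  return index;
}

// Hypothetical rows: two different customers share a name.
const rows = [
  { id: "1001", name: "ABC Transportation", city: "Denver" },
  { id: "1002", name: "ABC Transportation", city: "Boise" },
  { id: "1003", name: "Zeta Freight", city: "Reno" },
];

const byName = buildMultiIndex(rows, (r) => r.name);
const matches = byName.get("ABC Transportation") || [];
console.log(matches.map((r) => r.id)); // prints [ '1001', '1002' ]
```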

Olpy_Acaflo
July 5, 2021

Thanks a lot Bill for the reference to Lunr!
Another one of your essential contributions to my journey in Airtable & JS APIs!
I wish you the best in the heart of the Kuiper Belt!

olπ

r_s
April 5, 2022

@Bill.French, thank you for taking the time to put this very helpful guide together!

Rose_Haft1
May 17, 2022

Hi @Bill.French - It isn't clear what “oRecords” is. Can you explain a bit more about how you are generating this list of records and object types?

Bill.French:

let oCustomerHash = {};
for (let record of oRecords)
{
    // add the customer name object
    oCustomerHash[oRecords[record].getCellValue("Customer Name")] = {
        "customer_id"      : oRecords[record].getCellValue("Customer ID"),
        "customer_contact" : oRecords[record].getCellValue("Customer Contact"),
     }
    // add the customer id object
     oCustomerHash[oRecords[record].getCellValue("Customer ID")] = {
        "customer_name"    : oRecords[record].getCellValue("Customer Name"),
        "customer_contact" : oRecords[record].getCellValue("Customer Contact"),
    }
}

Rose_Haft1
May 17, 2022

@Bill.French - Also, are we supposed to get an error about using record as an index?

Rose_Haft1
May 17, 2022

Hi @Bill.French - How is it possible to store the hash in Airtable via createRecordAsync? The hash creation is currently crashing my script…

Bill_French
Author
May 18, 2022

The hash table is a JSON object and cannot be stored without serializing it (stringifying). But, why store it? Just build it dynamically when you need it. If you store it and the data it contains changes, it will be outdated and largely useless.
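If you do decide to persist an index despite the staleness risk described above, the serialization step looks roughly like this. It is a minimal plain-JavaScript sketch with hypothetical data; writing the resulting string into a text field (for example with createRecordAsync) is left out, and any field you would use for it is an assumption.

```javascript
// A small hash index (hypothetical data).
const oCustomerHash = {
  "ABC Transportation": { customer_id: "1001", customer_contact: "Jimmy Johns" },
  "1001": { customer_name: "ABC Transportation", customer_contact: "Jimmy Johns" },
};

// Serialize to a string that could be stored in a text field...
const serialized = JSON.stringify(oCustomerHash);

// ...and parse it back into a usable object before performing lookups.
const restored = JSON.parse(serialized);
console.log(restored["1001"].customer_name); // prints ABC Transportation

// Note: JSON only round-trips plain data. A Map, for example,
// serializes to "{}" unless you convert it to a plain object first.
console.log(JSON.stringify(new Map([["a", 1]])));                     // prints {}
console.log(JSON.stringify(Object.fromEntries(new Map([["a", 1]])))); // prints {"a":1}
```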

I would need to see the code to avoid guessing what your issues are.

The code block you are referencing builds two hash indices inside a single JSON document - one for looking up customers by ID and another for looking up customers by name. In a moderately-sized data table of say 25k customers, these indexes are created in about 2 to 4 seconds and once built, seeks into them are typically < 1 to 2 ms. As such, you can reference these inside a loop without any significant performance hit - e.g., 1,000 to 2,000 lookups per second is possible.

I recommend you build a hash index and then examine it with a console.log() output in a simple script block.

Rose_Haft1
May 18, 2022

While I appreciate what you're saying, what I meant is: do you use await selectRecordsAsync? What form does this take? Do you have an example? I was able to get it to work using loops, but not for…of.

The reason I want to store it is that it has crashed the code block, and I do not want that to happen in the future.

Bill_French
Author
May 18, 2022

There should be no reason a script block would crash by making a single JSON object unless the data you are storing into it is massive. I’d have to look at the cause of the crash to truly understand how best to advise you.

Yes - any method of reading data can be used to create a collection of JSON objects. The examples are in the article.

Step 1 - build the hash index
Step 2 - perform lookups inside loops without nested looping

Ujval_Shah
May 26, 2022

What is “oRecords”? @Bill.French

Bill_French
Author
May 26, 2022

It is a collection of table records returned from Airtable’s SDK. You can glean what this is by looking at this reference to one of the Airtable fields:

oRecords[record].getCellValue("Customer ID")

Rose_Haft1
June 8, 2022

Hi @Bill.French - I managed to get the hash to work, but now I am getting a timeout failure in Airtable. Is it possible to create the hash in another way?

Bill_French
Author
June 8, 2022

Is it possible the timeout is unrelated to the manner in which the hash index is created?

Have you benchmarked the time it takes the script to create the hash index? In my tests and the examples I published, even a 50,000 item hash index takes just a few seconds to build. As such, I’m sceptical that the hash indexing process is the cause of the timeouts.

And there are other mitigating factors when tackling a timeout issue in Airtable script. Is this code running in an automation step? Is it an inbound webhook? A script block?

Rose_Haft1
June 8, 2022

There are a couple of if statements in there. I will see what I can move into another process. Thanks!

Justin_Barrett
June 8, 2022

“IF” statements aren’t going to cause any kind of delay. The most likely delays are from reading/writing data to/from Airtable. Such operations require using the “await” keyword, and depending on the complexity, frequency, and placement of such operations, those could be the cause of the delay leading to the timeout. For example, if you’re running a loop and are querying a table’s records inside the loop, that will eat up a ton of time because you’re waiting for the query results once for each loop iteration. A more efficient way is to run the query once before the loop begins, and then use the result inside the loop.

These are just general tips, of course. If you could share your actual code, we could probably see directly what’s leading to the timeout.

Bill_French
Author
June 8, 2022

Rose_Haft1:

There are a couple of if statements in there. I will see what I can move into another process. Thanks!

Unlikely to help but no one can be certain without seeing the actual process - even a process narrative would help.

Ideally, you need to benchmark the segments of the process to see where the delays and slowness really exist. Just log the current time/seconds to the console throughout the script to see where resources take the longest.
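A minimal way to do that segment timing in plain JavaScript, using `Date.now()` and `console.log()` (both available in Airtable scripts and in Node). The `makeTimer` helper, the segment labels, and the synthetic 50,000-row workload are hypothetical; swap in your real steps.

```javascript
// Record elapsed milliseconds for each labeled segment of a script.
function makeTimer() {
  const start = Date.now();
  let last = start;
  const marks = [];
  return {
    mark(label) {
      const now = Date.now();
      marks.push({ label, segmentMs: now - last, totalMs: now - start });
      last = now;
    },
    report() {
      for (const m of marks) {
        console.log(`${m.label}: ${m.segmentMs} ms (total ${m.totalMs} ms)`);
      }
      return marks;
    },
  };
}

const timer = makeTimer();

// Synthetic workload standing in for "load records" (hypothetical).
const rows = Array.from({ length: 50000 }, (_, i) => ({ id: String(i), name: `Customer ${i}` }));
timer.mark("load rows");

// Build a hash index in one pass.
const byId = {};
for (const row of rows) byId[row.id] = row;
timer.mark("build hash index");

// Perform many seeks into the index.
let found = 0;
for (let i = 0; i < 50000; i++) if (byId[String(i)]) found++;
timer.mark("50,000 lookups");

const marks = timer.report();
```

The segment that dominates the report is where optimization effort belongs; the actual numbers will vary by machine and environment.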

I’ll bet a fine latte the hash index is not what’s slowing this down to the point of time-out. 😉

kuovonne
June 8, 2022

I agree that the creation of the hash index should be very quick. If building the hash takes more than a couple of seconds at most, it probably isn't being built efficiently.

Does the script run in the Scripting app? How long does it take there?

Nick_Richards
September 28, 2022

Thanks for the info @Bill.French. If you had to update a bunch of records, would you use the hash index in some way? Once a quarter, I need to update a single field for all records but I hit the timeout before it updates them all. How can I speed this up? My script is 8 lines long, one for loop across all records and updating a single field. Doesn’t seem like it should take this long.

Nick_Richards
September 28, 2022

Oh, and it’s just 150 records, so not a big dataset either.

kuovonne
September 28, 2022

You are probably updating the records one at a time instead of in batches. If you want to improve your scripting skills, you should learn to update records in batches.

But the easiest way to deal with it is probably to convert the script to Scripting Extension instead of an automation. Scripting Extension doesn’t have the same time limits. If you are worried that you will forget to run the script, have the automation send an email nagging to get it done.

Hashes are most useful when you need to find records based on a field value. They can speed things up if a script takes a lot of time searching for records. But if your script is taking most of its time writing updates, a hash won’t help that part.
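Batching looks roughly like the sketch below. The `chunk` helper is plain JavaScript; the commented Airtable lines assume a table named “Clients” and a single-select “Status” field being reset to “Not Started” (all hypothetical names), and rely on `updateRecordsAsync` accepting at most 50 records per call.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// In an Airtable script the updates would be sent one batch per request,
// e.g. (hypothetical table and field names):
//
//   const table = base.getTable("Clients");
//   const query = await table.selectRecordsAsync({ fields: ["Status"] });
//   const updates = query.records.map((r) => ({
//     id: r.id,
//     fields: { "Status": { name: "Not Started" } },
//   }));
//   for (const batch of chunk(updates, 50)) {
//     await table.updateRecordsAsync(batch);
//   }

// Plain-JS check: 150 updates become 3 requests instead of 150.
const fakeUpdates = Array.from({ length: 150 }, (_, i) => ({ id: `rec${i}` }));
console.log(chunk(fakeUpdates, 50).map((b) => b.length)); // prints [ 50, 50, 50 ]
```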

Bill_French
Author
September 28, 2022

It depends on the nature of the process. If an update process depends on lots of lookups, it’s likely a hash index will be beneficial.

I would add these finer points to use cases where hash indexes are helpful.

When you need to perform lookups repeatedly. The big dividend for hash indices is paid out when the index can be used dozens, hundreds, or even tens of thousands of times.
Finding records through a blistering-fast index key is a common use case, and it's deeply beneficial because you only need to load the data once despite needing hundreds or thousands of possible seeks into that data. However, hash indexes are useful for all sorts of tabular data, not just Airtable records. Imagine loading the entire US counties database over an HTTP request and then performing lookups to gather data properties from that external database. A hash index lets you instantly cross-reference by a simple post-code lookup that takes a few milliseconds.

Nick_Richards
September 28, 2022

kuovonne:

You are probably updating the records one at a time instead of in batches. If you want to improve your scripting skills, you should learn to update records in batches.

@kuovonne Yes, this is what happened. I realized after that I was saving each one. I changed it to batching shortly after I asked the above question and it’s much quicker. If I do a Scripting Extension, am I still capped at updating 50 records at a time? And I’m assuming there’s no way to trigger a scripting extension from an automation? Or have these run automatically?

Nick_Richards
September 28, 2022

@Bill.French Thanks! I’ll certainly be keeping this one in my back pocket. Seems super useful.

kuovonne
September 28, 2022

Yes, you are still capped at sending 50 records in a single request, and 15 requests per second.

Both scripting extension and scripting actions can run scripts. You cannot call scripting extension from an automation, but you can often adapt a script from one environment to run in the other.

Since you switched to batching updates, you should have no problem keeping the script as an automation script.

Bill_French
Author
September 28, 2022

Yes, and once you're batching, there are few other dials left to turn for performance, which means higher throughput must come from everything that happens around the 50-item batches.

One approach to faster performance is to not need fast throughput in the first place. You must assess things like -

Is it possible to do this process slowly without impacting users or the usability of your solution?
Reflect on the pathway that got you into this challenge in the first place. Is the data model painting you into a corner where massive batch updates are necessary?
Have you exhausted the possibility of a fully event-driven [real-time] architecture? This is to suggest that by processing data items as they occur, you spread the updates over minutes, seconds, and microseconds to eliminate the need to ever batch process anything.

Nick_Richards
September 28, 2022

@Bill.French Interesting questions that I need to ask myself.

So I have a bunch of companies that I do specific tasks for. Every quarter, I do the same task and process for each customer. I use Airtable to track the status during these quarterly tasks. Each quarter, I want to reset the status for each company to the beginning of the process.

So, is there a better way to do this reset? I don’t typically mass update records. But I love hearing how others tackle problems.
