Help

Re: Filtering with Loops in Javascript - How to Avoid Insanity

4081 0
cancel
Showing results for 
Search instead for 
Did you mean: 

As developers, we get roped into codifying a vast and seemingly endless escalation of conditional processing to find and filter records and related tables. Where does it end? Apparently, it doesn’t because as soon as you create the perfect script that loops across many filtering constraints, the client (or management) is going ask for a “slight” modification or you will invariably discover an unanticipated case where you’re standing in a corner holding a paintbrush without egress.

This and vastly many of our script projects are fundamentally based on two essential tasks -

  1. Filter out the data that we need
  2. Perform computations on the filtered data

Typically, we use loops to achieve #1. #2 is easy if we get #1 right. But #1 is never simple and rarely performant given enough conditional complexity and records.

What’s better than this approach? Fewer loops for sure because they are slow, and they make our lives shi**y.

The Remedy…

The term “filter”, and especially in a javascript sense, is subject to abuse. Why not exchange this term with “search”. Indeed, a JSON hash index or an actual inverted index is exactly the right chainsaw to mitigate the looping and increase direct access to lists of record IDs that match what the loop is intended to provide.

I use hash indexes all the time, but increasingly, I use Lunr to establish inverted indexes that are very agile and extremely fast. Imagine finding 200 record IDs from a collection of 30,000 rows where three fields match specific values and the fourth is “fuzzy”. Looping requires lots of code and plenty of time. An inverted index can do this in 20 milliseconds.

And when you factor in the advantages of term presence, wildcard searches, and field-specific boosting, the loop that builds the index is time extremely well spent. Plus, you can create and cache-forward index documents independent of your business logic.

One last comment concerning perfomance - there are many computations and metrics that we perform repeatedly for business analytics. Why not compute these metrics and store them in the search index so that when the CEO wants to see how her business is performing, you’ve reduced the overhead to a simple search query rendered at the edge in 500ms?

While an inverted full-text index offers many advantages, there’s a simple approach you can use to improve script performance while creating an elegant approach that drastically simplifies your code when nested loops seem the only viable route.

Hash Index

This is simply a JSON document intended to make it fast and simple to perform lookups; it’s literally a copy of a collection of items where the object key represents a way to directly access the data related to that key.

Let’s say you had a thousand records in a table and you needed to loop through it to find a given record whose name matched a specific customer name – and you need to perform this looping lookup repeatedly nested inside another enumeration. You have three options -

  1. Perform a FilterByFormula approach for each lookup; ugly, slow, convoluted.
  2. Loop through every record seeking a match; very slow and CPU-intensive, added code complexity.
  3. Perform a “seek” into a pre-built hash index; almost instant, elegant.

Approach 3 requires a single pass across all the records containing the customer names (that’s the downside). That pass would create a collection of objects in a single JSON document (ex., oCustomerNameHash that is global to your app) and looks something like this:

{ "ABC Transportation" : {
   "customer_id" : "1001",
   "customer_contact" : "Jimmy Johns",
   other attributes needed for the index...
   }
}
... objects representing rest of the customers ...

When you want to know the contact name for “ABC Transportation”, the code is simple. effortless, and requires only 6 milliseconds.

let customerContact = oCustomerNameHash["ABC Transportation"].customer_contact;

Even if the hash index has 50,000 items in it, the response time will be sub-20 milliseconds. This single line of code eliminates a nested loop and the complexity of doing so. It is possible because you performed the loop in advance of needing to perform the lookups. Essentially, you are caching forward an intelligent and performant class of data objects that can be reused throughout your script.

This makes it possible to vastly compress the code and complexity while also sparing the number of loop introspections required to a single pass across that data instead of vastly many passes which are required in nested filtering loops.

With a hash index like this, you may often find multiple cases where the hash index can be utilized, so it has the advantage of being ready to perform lookups for the duration of your script process. Furthermore, you may discover you need to perform lookups using additional keys which suggests multiple hash indexes for the same data.

Imagine you wanted to also lookup customer name by customer ID or customer contact by customer ID. This would require a new hash index like oCustomerIDHash. Instead of building a second hash, this added seek capability can be built into the same constructor loop so there’s no need to loop for every index dimension you might need.

Example… you can mix both indexed dimentions into one hash index document - this time we’ll call it oCustomerHash. By simply creating two keys for each customer - one for the customer name, and one for the customer ID – we have begun to create a complex indexing model. The lookup process is constant and the agility is magnified.

The objects in the index would include two entries for each customer.

{ "ABC Transportation" : {
   "customer_id" : "1001",
   "customer_contact" : "Jimmy Johns",
   other attributes needed for the index...
   }
}

{ "1001" : {
   "customer_name" : "ABC Transportation",
   "customer_contact" : "Jimmy Johns",
   other attributes needed for the index...
   }
}

Now you can query by customer name to get customer ID or by customer ID to get customer name instantly - no nested looping required.

let customerName = oCustomerHash["1001"].customer_contact;
let customerID   = oCustomerHash["ABC Transportation"].customer_id;

Building the Index

This is one approach that can be used to construct the index from an existing collection of records.

let oCustomerHash = {};
for (let record of oRecords)
{
    // add the customer name object
    oCustomerHash[oRecords[record].getCellValue("Customer Name")] = {
        "customer_id"      : oRecords[record].getCellValue("Customer ID"),
        "customer_contact" : oRecords[record].getCellValue("Customer Contact"),
     }
    // add the customer id object
     oCustomerHash[oRecords[record].getCellValue("Customer ID")] = {
        "customer_name"    : oRecords[record].getCellValue("Customer Name"),
        "customer_contact" : oRecords[record].getCellValue("Customer Contact"),
    }
}
31 Replies 31

Okay - but the entire meme of this topic is to not underthink it. :winking_face: This is what it looks like when I don’t underthink.

I really like this TOPIC and @Bill.French 's point of view:
to have gradually educated us to think about flying over our problems with a drone
instead of only searching the streets house by house, as a fireman would say when helping after a flood.

To stay in the metaphor, Bill FRENCH also teaches us how to build seawalls, deviate (data) streams
so that we don’t have to suffer floods again and again.
I’m very grateful for this teaching because it’s not what I found most easily elsewhere:
You learn languages, well known patterns, exceptions, workaround or hacky stuff
in several communities that talk about javascript, data, tables and frontend.
But flying over the landscape with a drone before starting to enter the streets and houses,
That’s really the line of Bill French’s teaching since I’m coming on the airtable community.

Then I agree with @kuovonne that it is not necessary to learn to fly a drone
if putting a ladder on a single house to look at it from its roof is enough.
But this Topic still needs the drone I think.
It did us good and it reminds us regularly.

Only one regret?
In all the airtable resources, community included, I don’t think I made a mistake
by not finding any example of script using hash index in the airtable context.
Am I wrong?

Why don’t I share this nice script exploiting hash index myself ?
Oh My god ! If you knew in which pit (from my own initial ignorance of GCP) I’m currently diving while breathing a " high depth " mix :grinning_face_with_big_eyes:
I will come back to you as soon as possible in another section (…CDN…) not to name it with a show&tell but I’m not ready yet since I’m only a part-time scripter and largely selftaught as soon as I get out of the airtable community incubator.

Bests,

oLπ