My script never stops

I’m definitely not an expert in asynchronous programming, but it seems to me like you are telling your script to await the results of the everyMonth() async function, but then never telling it to do anything with those results.

Is there a reason that you need to have your everyMonth() function be asynchronous?

You might try just removing the async and await keywords from your script entirely. I’m not sure I see any need for them in your script. And if you can’t do that because the everyMonth() function is surrounding other async functions, then maybe just try doing away with the wrapping everyMonth() function and let everything inside of it run inline in the script environment.

@Jeremy_Oglesby I just made it a function as I’m planning much more to the script, so to easier get around in it.
I’ve done this before and it definitely worked fine.
I just tried to take it out, but nothing changed though.

1 Like

You definitely have a lot of nested loops. I saw at least three loops, including two loops on the same query result.

I recommend two things:

  1. Throw in some console.log() statements in the loops to teach what is going one. Maybe with all the loops it just takes super long to process everything.

  2. Look for a different way of calculating what you want that doesn’t require all those nested loops.

I wouldn’t worry about the asynchronous call in this case. It looks like you do only one async call and if isn’t in a loop. Of course, I might have missed something as I am reading this on a 4” screen.

It’s the last loop that seems to be going on forever…
I’ll attempt another way of achieving roughly the same.

Out of curiosity what determines the speed that the script finishes except from amount of data to run through?
Whats the 80/20 - own CPU power, airtable serverspeed, internet speed or something else?

All of those things can affect the speed of a script. And, you don’t have much control over those things.

However, you do have control over the number one thing that affects the speed of the script—the actual code itself. There are many ways to do things in code. The more complicated the task, the more skill it takes to pick how to accomplish that task.

Here is an example. Suppose you need to bake enough cookies so that every student in the class gets three cookies. You could loop through the class roster and add three cookies for each student. The more students there are, the longer it will take to calculate the number of cookies. Or you could multiply the number of students by three, and the size of the class won’t really affect the calculation time. However, this faster method requires knowing that multiplying exists and that it produces the same result.

Now you situation is much more complicated, and you probably will need to retain some of your loops. However, maybe you don’t need all of the loops or maybe they don’t need to be nested. I can’t say for sure what specific change to you code will make it run faster. That would require a lot of time to (1) find out your exact goals and (2) understand your current algorithm.

Gosh, I hate this … (sorry - no answer in my message - just opinions because that’s what I do on Saturdays).

As developers we get roped into codifying a vast and seemingly endless escalation of conditional processing. Where does it end? Apparently it doesn’t because as soon as you resolve this, the client (or management) is going ask for a “slight” modification or you will invariably discover an unanticipated case where you’re standing in a corner holding a paintbrush without egress.

This and vastly many of our projects are fundamentally based on two essential tasks -

  1. Filter out the data that we need
  2. Perform computations on the filtered data

Typically, we use loops to achieve #1. #2 is easy if we get #1 right. But #1 is never simple and rarely performant given enough conditional complexity.

What’s better than this approach? Fewer loops for sure because they are slow, and they make our lives shi**y.

The Remedy…

The term “filter”, and especially in a javascript sense, is subject to abuse. Why not exchange this term with “search”. Indeed, a JSON hash index or an actual inverted index is exactly the right chainsaw to mitigate the looping and increase direct access to lists of record IDs that match what the loop is intended to provide.

I use hash indexes all the time, but increasingly, I use Lunr to establish inverted indexes that are very agile and extremely fast. Imagine finding 200 record IDs from a collection of 30,000 rows where three fields match specific values and the fourth is “fuzzy”. Looping requires lots of code and plenty of time. An inverted index can do this in 20 milliseconds.

And when you factor in the advantages of term presence, wildcard searches, and field-specific boosting, the one loop that builds the index is time extremely well spent. Plus, you can create and cache-forward index documents independent of your business logic.

One last comment - there are many computations and metrics that we perform repeatedly for business analytics. The amount of electricity consumed by the world to sustain analytics is likely many times the GDP of small industrial countries. Why not compute these metrics and store them in the search index so that when the CEO wants to see how her business is performing, you’ve reduced the overhead to a simple search query rendered at the edge in 500ms.

3 Likes

I am now engrossed in reading lunrjs documentation as my wife is elbowing me to (not so)subtly remind me that I am supposed to be installing new bath and shower trim in the hall bathroom…

Thanks a lot @Bill.French

1 Like

Had you indexed that task, you would have been able to find it quickly. Just sayin…

1 Like

Hi @Bill.French, @kuovonne & @Jeremy_Oglesby

Thank you for all your feedback.
I’m very green doing any-form of coding. Airtable has been the gateway drug to finally give into the temptation at giving JS a try.
So this is super valuable information.

I think the immediate problem has been as @kuovonne points out, that I might not initially have spend the time upfront really figuring out what I’m trying to do. Instead I just been building as I go. Which have ended up with my code being 1000 times too convoluted and complicated.

I’m very intrigued by @Bill.French and a JSON hash index - however might be a little to over my head at the moment - considering I’m using loops left front and center. However is there a place I can learn more about an inverted hash index and how to implement it? I’ve looked across the internet and it haven’t become much wiser.

As a general rule of thumb do you (@Bill.French , @kuovonne & @Jeremy_Oglesby ) recommend to always make designated views that has filtered out most of the irrelevant records before doing a selectRecordsAsync()?
As I think I have had this idea that I should try to solve this with code, so wonder how you guys look at this?

Sorry for all these stupid questions…

I agree that Airtable is a great place for the code curious to test out the waters.

That’s fine. It happens. Now that you know it happened, go back and figure out a plan. As you think and learn, give yourself permission to dump your existing code and start fresh. Starting fresh doesn’t mean that the time writing the original code was wasted—you were learning from the experience.

Bill explained it to me in a different post many months ago. Google "hash index” and don’t worry about the JSON part. It is just a key-value pair object. It doesn’t ever have to be converted to a JSON string

It is a trade off. If you use a view to pre-filter the records you run the risk of someone deleting the filter or changing the conditions. If you lock the view and the collaborator who locked the view ever leaves, you are stuck with a view that you can’t change, even if the needs of your base change. On the other hand, if you really need the performance enhancement of a filtered view, it may be worth the trade off.

Litterally what I did.
I think I got it all summed up in a few lines instead of two loops.

I think it’s the JSON bit that throws me. I’ll do some more research on it. Otherwise I might have to ask him.

Currently I’m the owner of the base, so I can lock it off completely. But as you mention there is definitely trade-offs. I hope I can grasp some of Bill’s hash-index method as that does sound like what I’m looking for.

1 Like

Two ideas here -

  1. Hash index
  2. Inverted index

#1 Hash Index

This is simply a JSON document intended to make it fast and simple to perform lookups; it’s literally a copy of a collection of items where the key represents a way to directly access the data related to that key.

Let’s say you had a thousand records in a table and you needed to loop through it to find a given record whose name matched a specific customer name. You have three options -

  1. Perform a FilterByFormula approach for each lookup; ugly, slow, convoluted.
  2. Loop through every record seeking a match; very slow and CPU-intensive, added code complexity.
  3. Extract the data from a pre-built hash index; almost instant, elegant.

Approach 3 requires a single pass across all the records containing the customer names (that’s the downside). That pass would create a collection of objects in a single JSON document (ex., oCustomerNameHash that is global to your app) that looks something like this:

{ "ABC Transportation" : {
   "customer_id" : "1001",
   "customer_contact" : "Jimmy Johns",
   other attributes needed for the index...
   }
}

When you want to know the contact name for “ABC Transportation”, the code is simple. effortless, and requires only 6 milliseconds.

let customerContact = oCustomerNameHash["ABC Transportation"].customer_contact;

Even if the hash index has 50,000 items in it, the response time will be sub- 20 milliseconds. This single line eliminates a nested loop and the complexity of doing so. It is possible because you performed the loop in advance of needing to perform the lookups. This makes it possible to vastly compress the code and complexity while also sparing the number of loop introspections required to just one pass instead of vastly many passes.

With a hash index like this, you often find multiple cases where oCustomerNameHash can be utilized. Furthermore, you may need multiple hash indexes for the same data. Imagine you wanted to also lookup by customer ID. This would require a new hash index like oCustomerIDHash. This can be built using the same loop so there’s no need to loop for every index you need.

Lastly, you can mix both indexes into one hash index document by simply creating two keys for each customer - one for the customer name, and one for the customer ID. The lookup process is constant.

#2 Inverted Index

This is a very different idea and ascribed to full-text indices. It makes it possible to perform lookups similar to a hash index but it uses an inverted architecture to make it possible to index multiple fields and perform global search for tokens or field-level search for specific tokens.

I believe you want to focus on the hash index for your process.

3 Likes

@Bill.French Thank you a million times for taking precious time to write this out.
Invaluable information…

2 Likes

Bill provides an excellent writeup on using a hash index. I’d like to add a few additional comments for anyone new to coding.

A hash index in an Airtable script is actually a JavaScript object. JSON is a format for storing structured data as text. The JavaScript object can be converted to JSON using JSON.stringify(obj) and the result will be a text string. Note that other programming languages can also convert their data structures to JSON, and JSON can represent many data structures that are not hash indexes. Neither the concept of a hash index nor JSON are specific to JavaScript.

Just to be clear, filterByFormula is a specific feature of Airtable’s REST API where you past an Airtable formula to the API as part of retrieving records. The actual filtering occurs on Airtable’s servers, not in your own code. The concept of filterByFormula does not existing in an Airtable script. Instead, Airtable scripts retrieve all the records in the table or view, and then your code must use JavaScript to find the desired records.

Keep in mind that building the index takes processing time. You must loop through all of the records once to build the index. If you need to look up only one value, creating a hash index might be overkill. However, if you find yourself needing to lookup multiple values (such as in a loop), a hash index can definitely speed things up dramatically.

1 Like

Excellent finer points - thanks for the clarifications!

Indeed it is! And it’s a nuance that we often conflate. In this case, the guideposts for @Kim_Trager1 were already getting lengthy so I cut this corner intentionally. When I describe JSON documents, I have the tendency to imagine them as javascript objects until such time as they are stored and then they become serialized javascript objects.

But it does - nothing prevents us from calling into the Airtable API from a script block. It’s probably not wise, nor is it performant, but that’s the point of the message. It’s a pathway to gather specific data; just a bad one an especially the case when enumerating records.

However, imagine a case where the records exist in another base. Is there a way in a Script Block to access Airtable records in another base? I don’t think there is, ergo - my mention of the API.

Might be? I think it is certainly overkill! This approach is only useful if the number combined milliseconds to create the index is < the number of milliseconds to perform direct record references for all the lookups. This is actually pretty easy to measure but to save readers some time - a hash index will almost always win.

1 Like

I think we are getting in the weeds here, but sometimes that’s fun.

I decided to point out the difference because someone new to coding who tries to google “hash table” for the topic at hand will get better search results versus googling “JSON hash index”.

Let me clarify, the filterByFormula concept that exists in the REST API does not exist in the Scripting API. Yes, a script can use filterByFormula when calling the REST API, but even in this case, filterByFormula is still a feature of the REST API, and not of the scripting api. There are lots of APIs out that that scripting can access. I would not call features in an api that a script interacts with a feature of the scripting platform itself.

Sometimes you might need to find a single value that is not a value stored in a cell. If you wanted to find out the value that occurs most frequently in the data set, a hash index would the way to go, even though you want a single value. Or someone new to coding might be adapting existing code with a hash index already in place because it is already written and he doesn’t fully understand the system yet.

1 Like

Yes, we are in the weeds.

Indeed. This is why I didn’t refer to it as such. I simply made the observation that there are three ways to access the data from a Script block and the REST API - while one such possible pathway - is not typically a good idea. But it is a pathway and it may be the only pathway if your script block needs to build a hash index from a table in a far off base, right?

1 Like

Another use case is when records in a table need to be updated by the script because it affects later iterations in a loop, but you don’t want to wait for a) the record to update, and then b) the full record list to query again.

I just finished coding this type of setup for a client. Long story short, it involves matching people in two groups: Group A and Group B. The As can only match to a single B, while the Bs can accept multiple matching As. However, each B can only accept limited number of matches, and the threshold is different for each B. In the table for the Bs, the count of matched As is handled by a count field targeting links to A records. Because part of the matching process includes finding all Bs who haven’t hit their match threshold, it would have added a massive delay to the main loop if I were to link an A to a B, then requery all the Bs again to get the change from the count field before trying to match the next A with the remaining available Bs. Instead, I built a hash index of all the relevant data for the Bs and manually updated each B’s count in that index after a match was made. That made the matching process incredibly fast.

3 Likes

Brilliant! I have used hash indices from time-to-time as a way to urgently shape a virtualized data reality in advance of the physical storage catching up. This makes it possible to perform earlier alerting, for example, long before other business logic is able to see the data pattern through queries.

Lately I’ve been on a performance tear - not sure what’s driving this, but I think clients are increasingly asking for data-sciency approaches in operational requirements. Thier data, of course, is growing for all of them and techniques like this will only become more important I think.

I’ve become so performance-biased that I recently switched to anklet socks to shave off about 3.5 seconds per foot dressing in the morning.

4 Likes

This topic was solved and automatically closed 3 days after the last reply. New replies are no longer allowed.