Jan 15, 2021 02:46 PM
I’m trying to make monthly reports of a few thousands entries.
However my script never stops, and not sure what the mistake is.
I’m pretty new to JS and not sure if it’s my loop within a loop that is the problems?
let wip, wipOrderView ,tableWIP, queryWIP
let cost, costReviewView, costTable, date, customer, garment, switchs, totalFactCost, costTime, estTotalCost, price
let currentMonth
wip='Work in Progress'
wipOrderView='Monthly_order_review'
cost = 'Costing'
costReviewView = 'Monthly_Overview'
date='Date'
customer='Customer'
garment = 'Garment'
switchs = 'Switch'
totalFactCost = 'Total fact cost'
costTime = 'Time'
estTotalCost = 'estTotalCost'
price = 'priceScripting'
tableWIP = base.getTable(wip).getView(wipOrderView)
costTable = base.getTable(cost).getView(costReviewView)
/////////////////////////////////////EVERY MONTHLY REVIEW //////////////////////////////////
let everyMonth = async () =>{
output.text(`This will take a little time as the system is compiling thousands of records...`)
let monthlyCostQuery, months, allUniqueMonths, allUniqeGarments, garmentTotals
monthlyCostQuery = await costTable.selectRecordsAsync()
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'November', 'December']
// Find all unique [Years and Months] and unique [Customer:Garment]
allUniqueMonths = []
allUniqeGarments=[]
monthlyCostQuery.records.forEach(el=> {
//Filter out unique [Year:Month]
let getMonth = new Date(el.getCellValue(date)).getMonth()
let getYear = new Date(el.getCellValue(date)).getFullYear()
let dateObj = {'year':getYear, 'month':months[getMonth]}
if(!allUniqueMonths.some(obj => obj['year']===getYear && obj['month']===months[getMonth])){
allUniqueMonths.push(dateObj)
}
//Filter out unique [Customer:Garment:EstimateCost:Price]
let getCustomer = el.getCellValueAsString(customer)
let getGarment = el.getCellValueAsString(garment)
let estCost = el.getCellValue(estTotalCost)[0]
let garPrice = 0
//rollup numbers - avoid if no number rolled up
if(el.getCellValue(price)){
garPrice = el.getCellValue(price)[0]
}
let garmentObj = {'customer':getCustomer, 'garment': getGarment, 'totalEst': estCost, 'price':garPrice}
if(!allUniqeGarments.some(obj=> obj['customer']===getCustomer && obj['garment']===getGarment && obj['totalEst']===estCost && obj['price'])){
allUniqeGarments.push(garmentObj)
}
})
//get total time, cost, staffcost, materialcost
garmentTotals = []
allUniqeGarments.forEach(obj=>{
let custArr = obj['customer']
let garmArr = obj['garment']
let estArr = obj['totalEst']
let priceArr = obj['price']
let totalTime = 0
let costStaff = 0
let costMaterial = 0
monthlyCostQuery.records.forEach(el => {
let custCost = el.getCellValueAsString(customer)
let garmCost = el.getCellValueAsString(garment)
let estCost = el.getCellValue(estTotalCost)[0]
let priceCost = 0
//rollup number issue when no number is rolled up
if(el.getCellValueAsString(price)){
priceCost = el.getCellValue(price)[0]
}
if(custArr === custCost && garmArr===garmCost && estArr === estCost && priceArr === priceCost){
if(el.getCellValueAsString(switchs)==='Material' || el.getCellValueAsString(switchs)==='Leather' ){
costMaterial += el.getCellValue(totalFactCost)
}else if(el.getCellValueAsString(switchs)==='Staff'){
costStaff += el.getCellValue(totalFactCost)
totalTime += el.getCellValue(costTime)
}
}
})
garmentTotals.push({'customer': custArr, 'garment':garmArr,'price':priceArr, 'timeTotal': totalTime, 'estimate':estArr, 'totalCost':+costStaff+costMaterial, 'staffCost':costStaff, 'materialCost':costMaterial})
})
console.log(garmentTotals)
}
await everyMonth()
Solved! Go to Solution.
Jan 19, 2021 08:09 AM
I agree that Airtable is a great place for the code curious to test out the waters.
That’s fine. It happens. Now that you know it happened, go back and figure out a plan. As you think and learn, give yourself permission to dump your existing code and start fresh. Starting fresh doesn’t mean that the time writing the original code was wasted—you were learning from the experience.
Bill explained it to me in a different post many months ago. Google "hash index” and don’t worry about the JSON part. It is just a key-value pair object. It doesn’t ever have to be converted to a JSON string
It is a trade off. If you use a view to pre-filter the records you run the risk of someone deleting the filter or changing the conditions. If you lock the view and the collaborator who locked the view ever leaves, you are stuck with a view that you can’t change, even if the needs of your base change. On the other hand, if you really need the performance enhancement of a filtered view, it may be worth the trade off.
Jan 19, 2021 08:48 AM
Litterally what I did.
I think I got it all summed up in a few lines instead of two loops.
I think it’s the JSON bit that throws me. I’ll do some more research on it. Otherwise I might have to ask him.
Currently I’m the owner of the base, so I can lock it off completely. But as you mention there is definitely trade-offs. I hope I can grasp some of Bill’s hash-index method as that does sound like what I’m looking for.
Jan 19, 2021 09:26 AM
Two ideas here -
#1 Hash Index
This is simply a JSON document intended to make it fast and simple to perform lookups; it’s literally a copy of a collection of items where the key represents a way to directly access the data related to that key.
Let’s say you had a thousand records in a table and you needed to loop through it to find a given record whose name matched a specific customer name. You have three options -
Approach 3 requires a single pass across all the records containing the customer names (that’s the downside). That pass would create a collection of objects in a single JSON document (ex., oCustomerNameHash that is global to your app) that looks something like this:
{ "ABC Transportation" : {
"customer_id" : "1001",
"customer_contact" : "Jimmy Johns",
other attributes needed for the index...
}
}
When you want to know the contact name for “ABC Transportation”, the code is simple. effortless, and requires only 6 milliseconds.
let customerContact = oCustomerNameHash["ABC Transportation"].customer_contact;
Even if the hash index has 50,000 items in it, the response time will be sub- 20 milliseconds. This single line eliminates a nested loop and the complexity of doing so. It is possible because you performed the loop in advance of needing to perform the lookups. This makes it possible to vastly compress the code and complexity while also sparing the number of loop introspections required to just one pass instead of vastly many passes.
With a hash index like this, you often find multiple cases where oCustomerNameHash can be utilized. Furthermore, you may need multiple hash indexes for the same data. Imagine you wanted to also lookup by customer ID. This would require a new hash index like oCustomerIDHash. This can be built using the same loop so there’s no need to loop for every index you need.
Lastly, you can mix both indexes into one hash index document by simply creating two keys for each customer - one for the customer name, and one for the customer ID. The lookup process is constant.
#2 Inverted Index
This is a very different idea and ascribed to full-text indices. It makes it possible to perform lookups similar to a hash index but it uses an inverted architecture to make it possible to index multiple fields and perform global search for tokens or field-level search for specific tokens.
I believe you want to focus on the hash index for your process.
Jan 20, 2021 06:54 AM
@Bill.French Thank you a million times for taking precious time to write this out.
Invaluable information…
Jan 21, 2021 11:21 AM
Bill provides an excellent writeup on using a hash index. I’d like to add a few additional comments for anyone new to coding.
A hash index in an Airtable script is actually a JavaScript object. JSON is a format for storing structured data as text. The JavaScript object can be converted to JSON using JSON.stringify(obj)
and the result will be a text string. Note that other programming languages can also convert their data structures to JSON, and JSON can represent many data structures that are not hash indexes. Neither the concept of a hash index nor JSON are specific to JavaScript.
Just to be clear, filterByFormula
is a specific feature of Airtable’s REST API where you past an Airtable formula to the API as part of retrieving records. The actual filtering occurs on Airtable’s servers, not in your own code. The concept of filterByFormula
does not existing in an Airtable script. Instead, Airtable scripts retrieve all the records in the table or view, and then your code must use JavaScript to find the desired records.
Keep in mind that building the index takes processing time. You must loop through all of the records once to build the index. If you need to look up only one value, creating a hash index might be overkill. However, if you find yourself needing to lookup multiple values (such as in a loop), a hash index can definitely speed things up dramatically.
Jan 21, 2021 11:38 AM
Excellent finer points - thanks for the clarifications!
Indeed it is! And it’s a nuance that we often conflate. In this case, the guideposts for @Kim_Trager1 were already getting lengthy so I cut this corner intentionally. When I describe JSON documents, I have the tendency to imagine them as javascript objects until such time as they are stored and then they become serialized javascript objects.
But it does - nothing prevents us from calling into the Airtable API from a script block. It’s probably not wise, nor is it performant, but that’s the point of the message. It’s a pathway to gather specific data; just a bad one an especially the case when enumerating records.
However, imagine a case where the records exist in another base. Is there a way in a Script Block to access Airtable records in another base? I don’t think there is, ergo - my mention of the API.
Might be? I think it is certainly overkill! This approach is only useful if the number combined milliseconds to create the index is < the number of milliseconds to perform direct record references for all the lookups. This is actually pretty easy to measure but to save readers some time - a hash index will almost always win.
Jan 21, 2021 12:07 PM
I think we are getting in the weeds here, but sometimes that’s fun.
I decided to point out the difference because someone new to coding who tries to google “hash table” for the topic at hand will get better search results versus googling “JSON hash index”.
Let me clarify, the filterByFormula
concept that exists in the REST API does not exist in the Scripting API. Yes, a script can use filterByFormula
when calling the REST API, but even in this case, filterByFormula
is still a feature of the REST API, and not of the scripting api. There are lots of APIs out that that scripting can access. I would not call features in an api that a script interacts with a feature of the scripting platform itself.
Sometimes you might need to find a single value that is not a value stored in a cell. If you wanted to find out the value that occurs most frequently in the data set, a hash index would the way to go, even though you want a single value. Or someone new to coding might be adapting existing code with a hash index already in place because it is already written and he doesn’t fully understand the system yet.
Jan 21, 2021 12:35 PM
Yes, we are in the weeds.
Indeed. This is why I didn’t refer to it as such. I simply made the observation that there are three ways to access the data from a Script block and the REST API - while one such possible pathway - is not typically a good idea. But it is a pathway and it may be the only pathway if your script block needs to build a hash index from a table in a far off base, right?
Jan 21, 2021 07:47 PM
Another use case is when records in a table need to be updated by the script because it affects later iterations in a loop, but you don’t want to wait for a) the record to update, and then b) the full record list to query again.
I just finished coding this type of setup for a client. Long story short, it involves matching people in two groups: Group A and Group B. The As can only match to a single B, while the Bs can accept multiple matching As. However, each B can only accept limited number of matches, and the threshold is different for each B. In the table for the Bs, the count of matched As is handled by a count field targeting links to A records. Because part of the matching process includes finding all Bs who haven’t hit their match threshold, it would have added a massive delay to the main loop if I were to link an A to a B, then requery all the Bs again to get the change from the count field before trying to match the next A with the remaining available Bs. Instead, I built a hash index of all the relevant data for the Bs and manually updated each B’s count in that index after a match was made. That made the matching process incredibly fast.
Jan 22, 2021 02:33 AM
Brilliant! I have used hash indices from time-to-time as a way to urgently shape a virtualized data reality in advance of the physical storage catching up. This makes it possible to perform earlier alerting, for example, long before other business logic is able to see the data pattern through queries.
Lately I’ve been on a performance tear - not sure what’s driving this, but I think clients are increasingly asking for data-sciency approaches in operational requirements. Thier data, of course, is growing for all of them and techniques like this will only become more important I think.
I’ve become so performance-biased that I recently switched to anklet socks to shave off about 3.5 seconds per foot dressing in the morning.