Help

The Community will be undergoing maintenance on Friday January 10 at 2:00pm - Saturday January 11 at 2:00pm EST, and will be "read-only." For assistance during this time, please visit our Help Center.

Re: Multi-record calculations (c.f. 'running balance,' 'row number,' and the like)

6056 0
cancel
Showing results for 
Search instead for 
Did you mean: 

tl;dr version: A frequent query and/or complaint concerns users who wish to access data from a record for use in a calculation performed in a different record from the same table. Often this takes the form of a user asking ‘how to retrieve a value based on row number’; other times the user wishes to perform a calculation that assumes knowledge of the previous record — for instance, maintaining a running balance. A recently published base contains four approaches to this problem, three involving sequential records offset by varying numbers of days, and a fourth allowing retrieval of arbitrary data from the previous and previous-but-one record.


Lately, I’ve been playing around with cross-record or multi-record calculations — that is, the ability to retrieve a value from a prior record for use with calculations in the current record. This is a recurring issue for users who conflate Airtable’s spreadsheet-like appearance with presumed spreadsheet-like functionality and wonder why they can’t simply access the field in a specific row by row number. Another common related request is for information on how to calculate a running balance. Unfortunately, while both of these tasks are possible under Airtable (for a given value of ‘possible,’ at least), neither is easily or intuitively achieved.

Towards that end, I have created a demo base providing routines and techniques facilitating cross-record calculations. The base contains four different demonstrations, three involving the keeping of a running balance and based on dates, the fourth dealing with retrieval of arbitrary data from prior records without reference to date. The base contains five tables, one for each demonstration, plus a fifth, shared table used in calculations.

Examples contained in the base

  1. Calculating a running balance based on daily entries.
  2. Calculating a running balance based on an entry every workday
  3. Calculating a running balance based on irregular or inconsistent entries.
  4. Retrieving a value from the previous and previous-but-one records.

Strictly speaking, Examples 1 and 2 are unnecessary, as the method given in Example 3 not only supports the first and second use cases, it offers a much more robust implementation, far more tolerant of potential problems with data entry (most notably, skipped dates) than are the first two. I have chosen to retain them because (1) Examples 3 and 4 are most easily understood in comparison with Examples 1 and 2, and (2) conceivably, there may be use cases where a missed date is unacceptable, and an algorithm that breaks down, forcing intervention, would be preferable to one that self-heals.

Caveats and Conditions

  1. All methods require every record in the primary table to be linked to a single record in a secondary, calculation-focused table. This complicates data entry, as an explicit link must be created for each record generated. While not necessarily an onerous burden — creating the link for a newly entered record requires but two mouse-clicks — it does represent additional effort.[1]

  2. Any field or fields needing to be accessible cross-record must be passed explicitly by means of a rollup field that contains a value for every record in the table. In addition, the application must have some way of knowing where the value ends. Both requirements make these techniques less suitable for very large values or for values containing random or arbitrary data.

  3. Even when the data from a previous record are accessible by a later one, other obstacles may prevent their being used in calculations. For instance, none of the three ‘running balance’ examples is capable of calculating an actual running balance. Instead, the value passed from the prior record must first be copy-and-pasted into a different field in order to complete the calculation; otherwise, Airtable blocks creation of one of the three fields essential to the process from fear of generating a circular reference. (Although in reality this method does not cause such an error, it would be unreasonable to expect Airtable’s error-checking routines to be sophisticated enough to differentiate between this highly probable circular reference and an actual one.)

    It should also be noted there are many other algorithms requiring access to a previous day’s data (for instance, to calculate a daily relative price change) that can be implemented using these techniques without triggering Airtable’s protective urges.

  4. The three ‘running balance’ demonstrations all assume an integer balance between -99,999 and 999,999 — that is, a numeral of no more than six digits, including the minus sign. Values of fewer than six digits are zero-padded to simplify data extraction. The choice of six digits is a purely arbitrary one and may easily be changed; however, to alter the number of digits, add decimal support, or implement any similar change, both the encoding and the decoding routines will require updating. (For instance, in the ‘running balance’ examples, {DateBalance} contains the encoding formula and {PrevBalance} the decoding one.)

    For that matter, for Examples 1 and 2, at least, the decision to use a fixed number of digits is similarly arbitrary. (Although possible to implement using variable-length values, using one of a known length instead greatly simplifies the algorithm that drives the third example.) Users wishing to create a version of any ‘running balance’ example not dependent upon a fixed-length value should reference the formulas incorporated in Example 4, which passes a variable-length value.

Basic Mechanism

All four examples hang from the same basic framework: Some sort of processing takes place within the current record, ultimately arriving at a value specific to that record. In many instances, this may represent a derived value specific to that day; other times the value may depend on other fields within the record. Once the value has been calculated, it is concatenated with another value that uniquely identifies the record with a colon (’:’) as separator; in the demo base, this field is known as either {DateBalance} (Examples 1 – 3) or {IndexValue} (Example 4).

The record, along with all others in its table, is linked to the single record in the [Calc] table. Based on the reciprocal link, a rollup field within [Calc] retrieves the value of all {DateBalance}/{IndexValue} fields in the table, aggregating them with the ARRAYJOIN() function. The result is a field containing a comma-separated list of key:value pairs from the original table, with each pair representing a unique, record-specific key coupled with the field value for that specific record.

Back in the main table, a second rollup field is used to retrieve the field value for a different record, using an aggregation formula that searches the key:value list for a pair in which the key is appropriately offset from that of the current record. For instance, in Example 1, which maintains a running balance based on sequential, daily entries, the formula searches for a key equivalent to ({Today}-1). (To provide for as great a potential active timespan as possible. the ‘running balance’ routines normalize all dates to ‘days since the Unix Epoch, January 1, 1970.’) Similarly, Example 2 searches for the key representing the workday prior to TODAY().

Example 3 works a little differently: It searches for today’s key and then decrements through the key:value list an amount equal to the length of one key:value pair to arrive at the previous entry. And Example 4 eschews dates entirely; instead, it uses an index value derived from an Airtable autonumber field, seeking the key offset appropriately from the current record’s index.

Regardless of the mechanism used to determine which key:value pair to retrieve, once the key has been located, it is a trivial task to extract the corresponding value and apply it to the formula — unless doing so would create the potential for a circular reference. (For instance, the ‘running balance’ examples all store the current record’s key concatenated with the value of the running balance for that record. These concatenated fields are rolled up into a long string of values; the string is searched for the key corresponding to the previous day; and the concatenated value is extracted as the field {PrevBalance}, as it represents the value of {Rolling Balance} for the previous day.In theory, one would take the balance as of yesterday, add to it today’s profit or loss, and arrive at the new rolling balance as updated for today.

Circular References

The problem is that by making {PrevBalance} a component of {Rolling Balance}, and {Rolling Balance} an integral part of {DateBalance}, from which {PrevBalance} is derived, we run the risk of {PrevBalance} becoming, as the song has it, “[Its] Own Grandpa.” Accordingly, Airtable will not allow one to define interconnected fields in such a way as to risk a circular reference.

While much of the time such functionality offers a welcome bulwark against possible coding errors, in this particular case it is basically a pain in the rear. For while we acknowledge the risk, we also know we are not creating a circular reference — to do so we would have to match the current record’s key and extract its associated data — yet there is no way by which one may bypass or disable Airtable’s protective routines.

Accordingly, the previous record’s running balance is extracted into a field named {PrevBalance}, the value of which must be manually transferred into the numeric {Previous Balance} field of the same record. (This may be accomplished using the Crtl-C/Ctrl-V copy-and-paste key sequence; the value may be copied by selecting {PrevBalance}'s fill handle and dragging it to {Previous Balance}; or the process may be automated using a third-party middleware utility, such as Zapier.)

Determining the Length of the Value Portion of the Pair

As mentioned earlier, the three ‘running balance’ examples all expect the ‘value’ portion of the key:value pair to be a fixed length of six digits; the encode routines, when creating key:value pairs, zero-pad string representation of value to ensure it is the proper length. While using a fixed length certainly simplifies Example 3’s implementation, it makes little difference either way for Example 1 or 2, and Example 4’s dependence upon arbitrary data makes fixed-length values an impossibility.

The method used by Example 4 to decide how many characters comprise the value portion of the pair can easily be applied to other examples, as well. Once the desired key has been found, the formula extracts the sequence of characters starting at the point immediately following the key and continuing through the character immediately preceding the first instance of the ARRAYJOIN() separator character encountered. Undoubtedly, that description sounds far more daunting on a first read-through than it actually is: Essentially, the routine starts with the character following the colon separating the key and value and reads each character until it finds a comma, which is the default ARRAYJOIN() separator.

For instance, were we to use this algorithm in Example 1 on March 7, 2018, we would first search for the key ‘17597:’. (March 7, 2018, is day 17,598 since the epoch; since we want the value for the previous day, March 6, we search for a key one less than that for today.) As you can see in the base, the value of {DayByDay::DateBalance} for March 6 is 17597:007176. As each key:value pair, when rolled up by the [Calc] table, is immediately followed by a comma, this means the portion of the key:value list important to us reads 17597:007176,. We start at the ‘0’ immediately following the colon and read through the ‘6’ immediately before the comma, which gives us 007176 — or $7,176.00, the value of {Running Balance} as of March 6.

Arbitrary Intervals, Arbitrary Data

As implemented, the algorithms driving Examples 1 and 2 are extremely unforgiving. Example 1 expects there to be one, and only one, entry made every day, 365 days a year. Similarly, Example 2 expects there to be one, and only one, entry made each workday of the year. Accordingly, if there are any weekday holiday observed during the year, it is essential each invocation of the WORKDAY() function made in Example 2’s formulas includes the optional parameter consisting of a comma-separated list of all such observed holidays in ISO date format; otherwise, any observed but unlisted holiday — and any day without an entry under Example 1 — will break the algorithm. Furthermore, it will remain broken until someone intervenes manually to put things back to order, either by hard–coding workarounds into the encode/decode formulas or by generating dummy entry records to fill in the gaps.

Now, this very well might be desired behavior; certainly, if missing an entry shuts down the entire base, there’s little chance of a skipped day going unnoticed. On the other hand, if for your application missing a day is an annoyance, not a catastrophe, you may not feel like having to make an heroic effort after every snow day or power outage. If that is the case — or if the interval between entries is irregular or unpredictable, the algorithm implemented in Example 3 might be a better fit.

As mentioned earlier, it works most smoothly with values of known length. Unlike the routines used in the first two examples, the decoding formula here doesn’t search for the target record’s key but instead finds the one for the current record. It then decrements its pointer one greater than the length of a value, positioning itself at the start of the previously stored value, which it then retrieves. Since it selects the target value based entirely on its position relative to the current record, the routine need not know the target’s index, creation date, or any such identifying characteristic.

Example 4 also foregoes any data-driven attempt to identify the target record, instead relying, as did the previous example, entirely on sequence and relative positioning. However, since it supports variable-length values, it cannot locate the target value based on its relative offset from the current record’s key. Instead, it uses an Airtable autonumber field to generate sequential keys, searching for the target based on its sequence number relative to the current record’s. As the length of the target value is unknown, the routine uses the method described earlier to determine the final character to retrieve.

Since the relationship among records is completely disconnected from any data they contain, the technique used in Example 4 supports dynamic reordering of the key:value pair. For instance, consider the table [ToDo], each record of which represents a task needing to be performed. Typically such lists are created during a period of brainstorming, with outstanding obligations recorded in the order in which they were recalled, rather than by priority. Clearly, any attempt to manage the list according to this ‘found’ sequence runs the risk of prioritizing insignificant actions over more-critical ones.

However, Airtable allows the user to rearrange a table’s records by selecting them and, with the mouse button still depressed, dragging them to their preferred position in the list. Once a new order has been imposed, the records’ sequencing can be updated to match by first changing the autonumber field to one of a different type — typically number or single-line text — and then returning its configuration to autonumber. This renumbers fields according to how they are ordered in the current view, with the top-most record receiving autonumber 1, the next-top-most 2, and so one. Since the Example 4 algorithm determines the relationship among records as based upon their autonumber sequence, such renumbering also changes their relative positioning: The first key:value pair in the list will be that of the highest-ranked record; the key:value pair that follows it will be the second-ranked record; and so on.

Issues and Gotchas

It’s probably important to note once again these routines do not magically transform an Airtable table into a spreadsheet navigable in two dimensions. One still cannot access a specific cell based upon an arbitrary column and row assignment; instead, one must do so according to some quality known about the target record: a unique, data-based key, perhaps, or its relative or absolute position in record sequence.

Any scheme that makes use of fixed-length fields, either for convenience or from necessity, must ensure the chosen length will contain the largest potential value contemplated — including a decimal indicator and minus sign, if applicable. Implementations supporting variable-length values, on the other hand, must ensure the ARRAYJOIN() function uses a separator character that will not be found within any stored value. Finally, any approach based upon internal, undocumented Airtable data structures — for instance, the order in which rolled up key:value pairs are stored and displayed — is always at risk of being broken by a future Airtable update. It is up to the user to decide whether such risks outweighs the possible benefits of reckless coding.

4 Replies 4
Brett_Snelgrove
6 - Interface Innovator
6 - Interface Innovator

Thank you for you this. I will need to read thru it several time to fully understand it and the sample base.

Do you think this approach will solve this request/issue I flagged recently here?

Thanks in advance!

Russell_Bishop1
8 - Airtable Astronomer
8 - Airtable Astronomer

Probably the most impressive thing I’ve seen done with Airtable - to the point where I really hope this would be surpassed by native functionality.

Thank you for your genius @W_Vann_Hall

Russell_Bishop1
8 - Airtable Astronomer
8 - Airtable Astronomer

To help with Caveat 1., I would recommend grouping your Table by Link, so that every new entry is automatically linked to ‘.’ (or your name of the Link table record).

Also continuing this thread – I had to solve a similar problem but needed to allow for multiple records a day (with TIME as well as DATE).

I changed the DATETIME_FORMAT from ‘X’ to ‘YYMMDDHHmm’ and it seems to work perfectly, although I’m not actually sure that the Rollup is reading the whole key?

Unfortunately I can’t quite decipher that whole formula.