Re: Formula to calculate Median

Aparna_Kothary · ‎Aug 20, 2018

Is is possible to calculate the Median of a group of values in Airtable?

W_Vann_Hall · ‎Aug 21, 2018

Kinda.

In my Black Mirror base, in the [Episodes] table, field 95 or 96 is a formula field {Median}. However, the way I ‘calculated’ it was to create a concatenated string of values arranged in numeric order, extract the centermost value string, and convert it with VALUE(). In my case, all the values were expressed as two-digit integer values (left-padded with a zero if necessary), which made everything extremely simple. It works, but it’s not a universally applicable solution…

Aparna_Kothary · ‎Aug 21, 2018

Thanks so much. I looked at the base and found the field where the values are concatenated but can’t see the field where it pulls out the median. Can you explain a bit more how you pulled out the “centermost value strong”

W_Vann_Hall · ‎Aug 22, 2018

I should preface this entire reply by mentioning if all you wish to do is identify the median value, rather than extract it for use in other calculations, the summary field at the foot of all columns in a grid view contains a ‘Median’ option that will display the median of all values for the field as shown in the current view. That same tab offers other options currently not available as an Airtable formula function, including range, standard deviation, and a plotted histogram.

Sure! Again, the approach I followed certainly isn’t one that will apply to most uses, but perhaps my description will help you find a method suitable for your needs.

In that same [Episodes] table, you’ll find a sequence of 19 linked-record fields labeled {19|19} through {19|01}. These represent possible rankings for each of the 19 episodes of Black Mirror aired to date. You’ll also find there are 19 records in that table, each — unsurprisingly — representing one of the episodes.

Taken together, these 19 fields (columns) by 19 records (rows) represent a matrix of rankings by episode. To complete the matrix, I took the ordered rankings list from each of the 13 sites publishing a “Black Mirror Episodes: From Worst to Best” article and created a link from the appropriate ranking field to the corresponding site record in the [19 Episodes] table. (For instance, the wonderful but wonderfully idiosyncratic blog Data Meets Media ranked the very first episode aired, “The National Anthem,” as the 14th-best; accordingly, in field {19|14}, you’ll find a link to the ‘Data Meets Media’ record of [19 Episodes].) When I was done, each [Episodes] record contained 13 links to the [19 Episodes] table, distributed across those 19 linked-record fields.

To be able to analyze and report upon the similarities and differences across the various rankings, I had to find some way to represent this matrix numerically. Clearly, with 19 episodes but only 13 reviewing sites, not every cell in that 19 x 19 matrix could be filled. Furthermore, while there was very little critical consensus overall, reviewers were able to find common ground regarding certain episodes; for instance, five sites agreed the worst episode of the first four seasons was “The Waldo Moment.” The first step, then, in applying any sort of numeric evaluation — average or median ranking, standard deviation, and so on — would be to convert each row of the matrix into a sequence of values, weighted according to ranking.

This I do in the formula field {Distribution2}, which immediately precedes {Median} in [Episodes]. (A similar field, {Distribution}, records how many sites assigned a specific ranking to an episode. For instance, {Distribution} for “The Waldo Moment” has a value of 5221000000110000010, indicating 5 sites ranked it 19th, 2 sites ranked it 18th, 2 17th, 1 16th, and so on, up thorough the single site ranking it 2nd.) For {Distribution2}, I handle things a little differently: For each possible ranking for a given episode, if no review site assigned it that rank, I tack an empty string — '' — onto the value so far. Otherwise, for as many links exist for that ranking, I add the textual representation of that numeric rank. Accordingly, {Distribution2} for “The Waldo Moment” is 19191919191818171716090802. If I use a period to separate the rankings — 19.19.19.19.19.18.18.17.17.16.09.08.02 — one can easily see it contains 5 19s, 2 18s, 2 17s, and so forth. As there are 13 values concatenated in {Distribution2}, finding the median is as simple as extracting the middle two characters from the string and converting them to a number — which is what the {Median} formula field does.

Again, this is a very specific use case. Because of the way my range of values was determined, they were already arranged in numeric order. In addition, I could safely assume each value to be positive and two digits in length. I suspect your biggest problem will be getting your values into proper numeric order, as rollup fields follow the sequence in which linked records were added.

One possibility is to generate the links after sorting the values. This base offers a quick demonstration of the technique. In the [Main] table, the values in {Number} were entered at random. The records were then sorted. With the records in numeric order, I linked them all to the [CalcMedian] table by defining a linked-record field and then pasting a period ('.') into each cell. This creates the links in the same sequence as the sorted records.

Once the links were established, in the [CalcMedian] table I defined a rollup field {Sequence} that rolled up the {TwoDigit} field in [Main] with the aggregation function ARRAYJOIN(values,''), resulting in this sequence: 070808091012121313. Finally, I created a formula field, {Median}, with this formula:

IF(
    MOD(
        LEN(Sequence)/2,
        2
        )=1,
    VALUE(
        MID(
            Sequence,
            (LEN(Sequence)/2),
            2
            )
        ),
    (VALUE(
        MID(
            Sequence,
            (LEN(Sequence)/2)-1,
            2
            )
        )+VALUE(
            MID(
                Sequence,
                (LEN(Sequence)/2)+1,
                2
                )
            )
        )/2
    )

The first branch of the IF() statement arrives at a median for an array with an odd number of elements; the second handles arrays with an even number of elements. For instance, if you delete record #9 from [Main], {Median} will be recalculated as 9.5. (This formula assumes two-digit values.)

That’s probably about eight times as much detail as you wanted, but maybe there’s something there you can use…

Kendy511 · ‎Mar 22, 2021

Do we have any other solution for the values were expressed as more than two-digit integer values ?