Parse JSON File via Formula

Hi All,

I’m parsing JSON files into Airtable Long Text fields, and then setting up Formula Fields to relay the contents of that JSON. For example, the JSON data looks similar to this;

{"NAME":"Colour","VAL":"Red","ID":"1"},
{"NAME":"Shape","VAL":"Square","ID":"2"},
{"NAME":"Size","VAL":"500","ID":"3"},
{"NAME":"Scale","VAL":"2","ID":"4"}

At the moment, I’m using a fairly hacky method to display the data in a formula field, for example;

ABS(SUBSTITUTE(
RIGHT(MID({Log Content},FIND("Size", {Log Content}),17),4)
, '"',''))

But this is buggy as. Obviously if Size is more or less than 3 digits, the output fails or works simply by chance.

What would be the correct way to write a formula that retrieves the entire value between the inverted commas, regardless of the length?

I don’t know if you can write a formula for that, although maybe a REGEX formula can extract what you’re looking for. Or a custom JavaScript could probably do it.

I don’t know either REGEX or JavaScript, so I would personally just use the JSON Parsing tool of Make, which is a low-code automation tool that integrates with Airtable.

I don’t think we need an Airtable script or a 3rd party service to process this.

Here’s my next crack at a formula using Regex_Extract(), and then killing off the tail by substituting into oblivion.

Again, I think this is too hacky, although more robust than my original, so long as you know;

  • The maximum number of digits, including the potential of decimals
  • The trailing object code that needs to be substituted with a blank.
ABS(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
MID(
REGEX_EXTRACT(
{Log Content},
'Size","VAL":".*'),14,8)
, ',','')
, '"','')
, 'I','')
, 'D','')
, ':','')
)

However hacky… It’s surprisingly easy to maintain and adjust - I will see if I can make it more dynamic - I think using a FIND() variable to replace the hard-coded “14” might be better.

Another almost working method, using Regex_Replace - however it doesn’t work if there’s a decimal in the target;

VALUE(REGEX_REPLACE(MID({Log Content},FIND("Size", {Log Content}) ,22), "\\D", ""))

EDIT:
Getting closer!!

REGEX_REPLACE(
MID({Name},FIND('"Size","VAL":"',{Name})+ LEN('"Size","VAL":"'),8),"[^.0123456789]", "")

I’ve used formula fields to parse JSON-like text. I rely heavily on regular expressions for the task. That is also why I use JSON-like text, but not actual JSON. I only use it with JSON-like text that I create and fully control.

Here is part of a formula that I use to create my JSON-like text.
image

Here is the result of that formula:
image

Here is one formula where I parse out a rollup of that JSON-like text.
image

Thanks so much @kuovonne - JSON-like parsing should work, and thank you kindly for sharing your example. I will see if I can adjust it to suit.

Something I’m trying now - First step is the simplification of the data;

image

REGEX_REPLACE(
REGEX_REPLACE({JSON}, 
'["NAME","VAL","ID"]',''),
'["{:","}"]',' ')

This returns;

Colour Red 1
Shape Square 2
Size 500 3
Scale 2 4

So, next I need to figure out on how to search the word “Colour” and then return the following full word. And search the word “Size” and return the following number (including possibility of decimal).

1 Like

Use a regular expression testing website to get the bones of the regular expression. Then shoehorn the regular expression into the limitations of Airtable’s regular expression library.

Yah, good idea - although I struggle with then translating my solution back into the Airtables, such as this example where I’m removing ID":"#"},"

image

Airtable returns #Error!

Having a read and play with Regex this morning, this method works OK too, but I’m sure it can be further streamlined;

REGEX_REPLACE(
REGEX_REPLACE(
REGEX_EXTRACT({JSON Data},
'Colour","VAL":".*"ID'),
'Colour","VAL":"',""),
'","ID',""
)
REGEX_REPLACE(
REGEX_REPLACE(
REGEX_EXTRACT({JSON Data},
'Shape","VAL":".*"ID'),
'Shape","VAL":"',""),
'","ID',""
)

Both “Red”, “Square” and “Purple” , “Triangle” are returned respectively.

I believe my problem is from having the entire string arguments returned along with the desired center text, when realistically I just want the center text from the return of my argument. For example;

REGEX_EXTRACT(Data,'VAL":".*"ID')

I 'm only needing the .* text to return, not the wrappers.

Woo hoo, I finally found my solution! I needed to employ Regex Grouping using (), and Airtable returns the first group.

REGEX_EXTRACT(Data,'Size","VAL":"(.*)",')

So again, using this input;

{"NAME":"Colour","VAL":"Red","ID":"1"},
{"NAME":"Shape","VAL":"Square","ID":"2"},
{"NAME":"Size","VAL":"500","ID":"3"},
{"NAME":"Scale","VAL":"2","ID":"4"}

We can now easily extract values of variables into their own field using the same formula, but simply swapping out the variable name, Colour, Shape, Size etc.

Remember to wrap Integers or Decimals with ABS() too if not wanting a text string for numbers.

This is tidy as, I love it! Thanks @kuovonne and @ScottWorld - hopefully this might help you both too.

EDIT: Lol, the plot thickens. My test data had new lines, but unfortunately my real world data is one very long string, and now I have to figure out why Regex selects the very last ", in the string, rather than the next ", :sob:

Ahh ha! Added a ? to the wild card, which makes the search “non-greedy”.

REGEX_EXTRACT(Data,'Size","VAL":"(.*?)",')
2 Likes

Congrats! Persistence pays off! Getting a working REGEX pattern can be a battle. You now are armed to figure out many more REGEX problems.

I recommend wrapping the digits in VALUE() instead, because it was designed to convert text to number. It can also preserve negative numbers.

Quality test data that matches the myriad situations found in your real world data is super important, especially with REGEX. Cannot overemphasize the importance of testing.

Actually, Airtable’s VALUE() function is extremely problematic because it is actually designed to perform a large variety of mathematical functions & mathematical substitutions. This function causes way more trouble than it is worth. It’s too bad that the Airtable employees didn’t create a REAL text-to-number function for us to use.

1 Like

Yeah. But if you have a well crafted REGEX pattern, you can be sure of grabbing only digits and avoiding having VALUE() do weird math.

1 Like

I find that ABS() triggers the integer/decimal needed to allow numeric formatting. I’ll read up on Value, but suspect they’re both same-same for what I’m needing.

One other technique I’m also using is Substitute() on the JSON data to remove ‘ apostrophe characters, because I will go cross-eyed trying to regex data that also contains apostrophes :rofl:

Yeah. There are multiple ways of doing things with code. They both work for this particular use case, for now. But I think it is useful to know the differences between the functions.

ABS() returns the absolute value of a number. Basically that means it turns negative numbers positive. So you can never get a negative number from ABS().

The fact that ABS() does text to number conversion is an undocumented feature, which means that Airtable could change that behavior without notice.

On the other hand, VALUE() was specifically designed to convert text to a number, so it can return negative numbers, and it’s ability to convert text a number is not at risk of suddenly disappearing.

It is true that VALUE() does some very strange computations in some cases, but you can avoid that if you ensure that the text being passed in contains only digits, an optional negative sign at the beginning, and a single, optional decimal point.

1 Like

This topic was solved and automatically closed 3 days after the last reply. New replies are no longer allowed.