Solved

Parse JSON File via Formula

Forum|Forum|3 years ago
April 23, 2022
13 replies
357 views

+24

Karlstens
Brainy

Hi All,

I’m parsing JSON files into Airtable Long Text fields, and then setting up Formula Fields to relay the contents of that JSON. For example, the JSON data looks similar to this;

{"NAME":"Colour","VAL":"Red","ID":"1"},
{"NAME":"Shape","VAL":"Square","ID":"2"},
{"NAME":"Size","VAL":"500","ID":"3"},
{"NAME":"Scale","VAL":"2","ID":"4"}

At the moment, I’m using a fairly hacky method to display the data in a formula field, for example;

ABS(SUBSTITUTE(
RIGHT(MID({Log Content},FIND("Size", {Log Content}),17),4)
, '"',''))

But this is buggy as. Obviously if Size is more or less than 3 digits, the output fails or works simply by chance.

What would be the correct way to write a formula that retrieves the entire value between the inverted commas, regardless of the length?

Best answer by Karlstens

Having a read and play with Regex this morning, this method works OK too, but I’m sure it can be further streamlined;

REGEX_REPLACE(
REGEX_REPLACE(
REGEX_EXTRACT({JSON Data},
'Colour","VAL":".*"ID'),
'Colour","VAL":"',""),
'","ID',""
)

REGEX_REPLACE(
REGEX_REPLACE(
REGEX_EXTRACT({JSON Data},
'Shape","VAL":".*"ID'),
'Shape","VAL":"',""),
'","ID',""
)

Both “Red”, “Square” and “Purple” , “Triangle” are returned respectively.

I believe my problem is from having the entire string arguments returned along with the desired center text, when realistically I just want the center text from the return of my argument. For example;

REGEX_EXTRACT(Data,'VAL":".*"ID')

I 'm only needing the .* text to return, not the wrappers.

Woo hoo, I finally found my solution! I needed to employ Regex Grouping using (), and Airtable returns the first group.

REGEX_EXTRACT(Data,'Size","VAL":"(.*)",')

So again, using this input;

{"NAME":"Colour","VAL":"Red","ID":"1"},
{"NAME":"Shape","VAL":"Square","ID":"2"},
{"NAME":"Size","VAL":"500","ID":"3"},
{"NAME":"Scale","VAL":"2","ID":"4"}

We can now easily extract values of variables into their own field using the same formula, but simply swapping out the variable name, Colour, Shape, Size etc.

Remember to wrap Integers or Decimals with ABS() too if not wanting a text string for numbers.

This is tidy as, I love it! Thanks @kuovonne and @ScottWorld - hopefully this might help you both too.

EDIT: Lol, the plot thickens. My test data had new lines, but unfortunately my real world data is one very long string, and now I have to figure out why Regex selects the very last ", in the string, rather than the next ", :sob:

Ahh ha! Added a ? to the wild card, which makes the search “non-greedy”.

REGEX_EXTRACT(Data,'Size","VAL":"(.*?)",')

+35

ScottWorld
Genius
Forum|Forum|3 years ago
April 23, 2022

I don’t know if you can write a formula for that, although maybe a REGEX formula can extract what you’re looking for. Or a custom JavaScript could probably do it.

I don’t know either REGEX or JavaScript, so I would personally just use the JSON Parsing tool of Make, which is a low-code automation tool that integrates with Airtable.

+24

Karlstens
Author
Brainy
Forum|Forum|3 years ago
April 23, 2022

I don’t know if you can write a formula for that, although maybe a REGEX formula can extract what you’re looking for. Or a custom JavaScript could probably do it.

I don’t know either REGEX or JavaScript, so I would personally just use the JSON Parsing tool of Make, which is a low-code automation tool that integrates with Airtable.

I don’t think we need an Airtable script or a 3rd party service to process this.

Here’s my next crack at a formula using Regex_Extract(), and then killing off the tail by substituting into oblivion.

Again, I think this is too hacky, although more robust than my original, so long as you know;

The maximum number of digits, including the potential of decimals
The trailing object code that needs to be substituted with a blank.

ABS(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
MID(
REGEX_EXTRACT(
{Log Content},
'Size","VAL":".*'),14,8)
, ',','')
, '"','')
, 'I','')
, 'D','')
, ':','')
)

However hacky… It’s surprisingly easy to maintain and adjust - I will see if I can make it more dynamic - I think using a FIND() variable to replace the hard-coded “14” might be better.

Another almost working method, using Regex_Replace - however it doesn’t work if there’s a decimal in the target;

VALUE(REGEX_REPLACE(MID({Log Content},FIND("Size", {Log Content}) ,22), "\\D", ""))

EDIT:
Getting closer!!

REGEX_REPLACE(
MID({Name},FIND('"Size","VAL":"',{Name})+ LEN('"Size","VAL":"'),8),"[^.0123456789]", "")

+29

kuovonne
Brainy
Forum|Forum|3 years ago
April 23, 2022

I’ve used formula fields to parse JSON-like text. I rely heavily on regular expressions for the task. That is also why I use JSON-like text, but not actual JSON. I only use it with JSON-like text that I create and fully control.

Here is part of a formula that I use to create my JSON-like text.

Here is the result of that formula:

Here is one formula where I parse out a rollup of that JSON-like text.

+24

Karlstens
Author
Brainy
Forum|Forum|3 years ago
April 25, 2022

Here is part of a formula that I use to create my JSON-like text.

Here is the result of that formula:

Here is one formula where I parse out a rollup of that JSON-like text.

Thanks so much @kuovonne - JSON-like parsing should work, and thank you kindly for sharing your example. I will see if I can adjust it to suit.

Something I’m trying now - First step is the simplification of the data;

REGEX_REPLACE(
REGEX_REPLACE({JSON}, 
'["NAME","VAL","ID"]',''),
'["{:","}"]',' ')

This returns;

Colour Red 1
Shape Square 2
Size 500 3
Scale 2 4

So, next I need to figure out on how to search the word “Colour” and then return the following full word. And search the word “Size” and return the following number (including possibility of decimal).

+29

kuovonne
Brainy
Forum|Forum|3 years ago
April 25, 2022

Thanks so much @kuovonne - JSON-like parsing should work, and thank you kindly for sharing your example. I will see if I can adjust it to suit.

Something I’m trying now - First step is the simplification of the data;

REGEX_REPLACE(
REGEX_REPLACE({JSON}, 
'["NAME","VAL","ID"]',''),
'["{:","}"]',' ')

This returns;

Colour Red 1
Shape Square 2
Size 500 3
Scale 2 4

Use a regular expression testing website to get the bones of the regular expression. Then shoehorn the regular expression into the limitations of Airtable’s regular expression library.

+24

Karlstens
Author
Brainy
Forum|Forum|3 years ago
April 25, 2022

Use a regular expression testing website to get the bones of the regular expression. Then shoehorn the regular expression into the limitations of Airtable’s regular expression library.

Yah, good idea - although I struggle with then translating my solution back into the Airtables, such as this example where I’m removing ID":"#"},"

Airtable returns #Error!

+24

Karlstens
Author
Brainy
Forum|Forum|3 years ago
April 30, 2022

Having a read and play with Regex this morning, this method works OK too, but I’m sure it can be further streamlined;

REGEX_REPLACE(
REGEX_REPLACE(
REGEX_EXTRACT({JSON Data},
'Colour","VAL":".*"ID'),
'Colour","VAL":"',""),
'","ID',""
)

REGEX_REPLACE(
REGEX_REPLACE(
REGEX_EXTRACT({JSON Data},
'Shape","VAL":".*"ID'),
'Shape","VAL":"',""),
'","ID',""
)

Both “Red”, “Square” and “Purple” , “Triangle” are returned respectively.

REGEX_EXTRACT(Data,'VAL":".*"ID')

I 'm only needing the .* text to return, not the wrappers.

+24

Karlstens
Author
Brainy
Answer
Forum|Forum|3 years ago
April 30, 2022

Having a read and play with Regex this morning, this method works OK too, but I’m sure it can be further streamlined;

REGEX_REPLACE(
REGEX_REPLACE(
REGEX_EXTRACT({JSON Data},
'Colour","VAL":".*"ID'),
'Colour","VAL":"',""),
'","ID',""
)

REGEX_REPLACE(
REGEX_REPLACE(
REGEX_EXTRACT({JSON Data},
'Shape","VAL":".*"ID'),
'Shape","VAL":"',""),
'","ID',""
)

Both “Red”, “Square” and “Purple” , “Triangle” are returned respectively.

REGEX_EXTRACT(Data,'VAL":".*"ID')

I 'm only needing the .* text to return, not the wrappers.

Woo hoo, I finally found my solution! I needed to employ Regex Grouping using (), and Airtable returns the first group.

REGEX_EXTRACT(Data,'Size","VAL":"(.*)",')

So again, using this input;

{"NAME":"Colour","VAL":"Red","ID":"1"},
{"NAME":"Shape","VAL":"Square","ID":"2"},
{"NAME":"Size","VAL":"500","ID":"3"},
{"NAME":"Scale","VAL":"2","ID":"4"}

We can now easily extract values of variables into their own field using the same formula, but simply swapping out the variable name, Colour, Shape, Size etc.

Remember to wrap Integers or Decimals with ABS() too if not wanting a text string for numbers.

This is tidy as, I love it! Thanks @kuovonne and @ScottWorld - hopefully this might help you both too.

Ahh ha! Added a ? to the wild card, which makes the search “non-greedy”.

REGEX_EXTRACT(Data,'Size","VAL":"(.*?)",')

+29

kuovonne
Brainy
Forum|Forum|3 years ago
April 30, 2022

Woo hoo, I finally found my solution! I needed to employ Regex Grouping using (), and Airtable returns the first group.

REGEX_EXTRACT(Data,'Size","VAL":"(.*)",')

So again, using this input;

{"NAME":"Colour","VAL":"Red","ID":"1"},
{"NAME":"Shape","VAL":"Square","ID":"2"},
{"NAME":"Size","VAL":"500","ID":"3"},
{"NAME":"Scale","VAL":"2","ID":"4"}

We can now easily extract values of variables into their own field using the same formula, but simply swapping out the variable name, Colour, Shape, Size etc.

Remember to wrap Integers or Decimals with ABS() too if not wanting a text string for numbers.

This is tidy as, I love it! Thanks @kuovonne and @ScottWorld - hopefully this might help you both too.

Ahh ha! Added a ? to the wild card, which makes the search “non-greedy”.

REGEX_EXTRACT(Data,'Size","VAL":"(.*?)",')

Congrats! Persistence pays off! Getting a working REGEX pattern can be a battle. You now are armed to figure out many more REGEX problems.

I recommend wrapping the digits in VALUE() instead, because it was designed to convert text to number. It can also preserve negative numbers.

Quality test data that matches the myriad situations found in your real world data is super important, especially with REGEX. Cannot overemphasize the importance of testing.

+35

ScottWorld
Genius
Forum|Forum|3 years ago
April 30, 2022

Congrats! Persistence pays off! Getting a working REGEX pattern can be a battle. You now are armed to figure out many more REGEX problems.

I recommend wrapping the digits in VALUE() instead, because it was designed to convert text to number. It can also preserve negative numbers.

Quality test data that matches the myriad situations found in your real world data is super important, especially with REGEX. Cannot overemphasize the importance of testing.

Actually, Airtable’s VALUE() function is extremely problematic because it is actually designed to perform a large variety of mathematical functions & mathematical substitutions. This function causes way more trouble than it is worth. It’s too bad that the Airtable employees didn’t create a REAL text-to-number function for us to use.

+29

kuovonne
Brainy
Forum|Forum|3 years ago
April 30, 2022

Yeah. But if you have a well crafted REGEX pattern, you can be sure of grabbing only digits and avoiding having VALUE() do weird math.

+24

Karlstens
Author
Brainy
Forum|Forum|3 years ago
May 1, 2022

I find that ABS() triggers the integer/decimal needed to allow numeric formatting. I’ll read up on Value, but suspect they’re both same-same for what I’m needing.

One other technique I’m also using is Substitute() on the JSON data to remove ‘ apostrophe characters, because I will go cross-eyed trying to regex data that also contains apostrophes :rofl:

+29

kuovonne
Brainy
Forum|Forum|3 years ago
May 1, 2022

I find that ABS() triggers the integer/decimal needed to allow numeric formatting. I’ll read up on Value, but suspect they’re both same-same for what I’m needing.

One other technique I’m also using is Substitute() on the JSON data to remove ‘ apostrophe characters, because I will go cross-eyed trying to regex data that also contains apostrophes :rofl:

Yeah. There are multiple ways of doing things with code. They both work for this particular use case, for now. But I think it is useful to know the differences between the functions.

ABS() returns the absolute value of a number. Basically that means it turns negative numbers positive. So you can never get a negative number from ABS().

The fact that ABS() does text to number conversion is an undocumented feature, which means that Airtable could change that behavior without notice.

On the other hand, VALUE() was specifically designed to convert text to a number, so it can return negative numbers, and it’s ability to convert text a number is not at risk of suddenly disappearing.

It is true that VALUE() does some very strange computations in some cases, but you can avoid that if you ensure that the text being passed in contains only digits, an optional negative sign at the beginning, and a single, optional decimal point.

Sign up

Login with SSO

Login to the community

Login with SSO

Scanning file for viruses.

This file cannot be downloaded