Apr 22, 2022 05:20 PM
Hi All,
I’m parsing JSON files into Airtable Long Text fields, and then setting up Formula Fields to relay the contents of that JSON. For example, the JSON data looks similar to this;
{"NAME":"Colour","VAL":"Red","ID":"1"},
{"NAME":"Shape","VAL":"Square","ID":"2"},
{"NAME":"Size","VAL":"500","ID":"3"},
{"NAME":"Scale","VAL":"2","ID":"4"}
At the moment, I’m using a fairly hacky method to display the data in a formula field, for example;
ABS(SUBSTITUTE(
RIGHT(MID({Log Content},FIND("Size", {Log Content}),17),4)
, '"',''))
But this is buggy as. Obviously if Size is more or less than 3 digits, the output fails or works simply by chance.
What would be the correct way to write a formula that retrieves the entire value between the inverted commas, regardless of the length?
Solved! Go to Solution.
Apr 29, 2022 07:24 PM
Woo hoo, I finally found my solution! I needed to employ Regex Grouping using (), and Airtable returns the first group.
REGEX_EXTRACT(Data,'Size","VAL":"(.*)",')
So again, using this input;
{"NAME":"Colour","VAL":"Red","ID":"1"},
{"NAME":"Shape","VAL":"Square","ID":"2"},
{"NAME":"Size","VAL":"500","ID":"3"},
{"NAME":"Scale","VAL":"2","ID":"4"}
We can now easily extract values of variables into their own field using the same formula, but simply swapping out the variable name, Colour, Shape, Size etc.
Remember to wrap Integers or Decimals with ABS() too if not wanting a text string for numbers.
This is tidy as, I love it! Thanks @kuovonne and @ScottWorld - hopefully this might help you both too.
EDIT: Lol, the plot thickens. My test data had new lines, but unfortunately my real world data is one very long string, and now I have to figure out why Regex selects the very last ", in the string, rather than the next ", :sob:
Ahh ha! Added a ? to the wild card, which makes the search “non-greedy”.
REGEX_EXTRACT(Data,'Size","VAL":"(.*?)",')
Apr 22, 2022 07:19 PM
I don’t know if you can write a formula for that, although maybe a REGEX formula can extract what you’re looking for. Or a custom JavaScript could probably do it.
I don’t know either REGEX or JavaScript, so I would personally just use the JSON Parsing tool of Make, which is a low-code automation tool that integrates with Airtable.
Apr 22, 2022 10:52 PM
I don’t think we need an Airtable script or a 3rd party service to process this.
Here’s my next crack at a formula using Regex_Extract(), and then killing off the tail by substituting into oblivion.
Again, I think this is too hacky, although more robust than my original, so long as you know;
ABS(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
MID(
REGEX_EXTRACT(
{Log Content},
'Size","VAL":".*'),14,8)
, ',','')
, '"','')
, 'I','')
, 'D','')
, ':','')
)
However hacky… It’s surprisingly easy to maintain and adjust - I will see if I can make it more dynamic - I think using a FIND() variable to replace the hard-coded “14” might be better.
Another almost working method, using Regex_Replace - however it doesn’t work if there’s a decimal in the target;
VALUE(REGEX_REPLACE(MID({Log Content},FIND("Size", {Log Content}) ,22), "\\D", ""))
EDIT:
Getting closer!!
REGEX_REPLACE(
MID({Name},FIND('"Size","VAL":"',{Name})+ LEN('"Size","VAL":"'),8),"[^.0123456789]", "")
Apr 23, 2022 12:27 PM
I’ve used formula fields to parse JSON-like text. I rely heavily on regular expressions for the task. That is also why I use JSON-like text, but not actual JSON. I only use it with JSON-like text that I create and fully control.
Here is part of a formula that I use to create my JSON-like text.
Here is the result of that formula:
Here is one formula where I parse out a rollup of that JSON-like text.
Apr 25, 2022 02:58 PM
Thanks so much @kuovonne - JSON-like parsing should work, and thank you kindly for sharing your example. I will see if I can adjust it to suit.
Something I’m trying now - First step is the simplification of the data;
REGEX_REPLACE(
REGEX_REPLACE({JSON},
'["NAME","VAL","ID"]',''),
'["{:","}"]',' ')
This returns;
Colour Red 1
Shape Square 2
Size 500 3
Scale 2 4
So, next I need to figure out on how to search the word “Colour” and then return the following full word. And search the word “Size” and return the following number (including possibility of decimal).
Apr 25, 2022 04:12 PM
Use a regular expression testing website to get the bones of the regular expression. Then shoehorn the regular expression into the limitations of Airtable’s regular expression library.
Apr 25, 2022 04:35 PM
Yah, good idea - although I struggle with then translating my solution back into the Airtables, such as this example where I’m removing ID":"#"},"
Airtable returns #Error!
Apr 29, 2022 06:55 PM
Having a read and play with Regex this morning, this method works OK too, but I’m sure it can be further streamlined;
REGEX_REPLACE(
REGEX_REPLACE(
REGEX_EXTRACT({JSON Data},
'Colour","VAL":".*"ID'),
'Colour","VAL":"',""),
'","ID',""
)
REGEX_REPLACE(
REGEX_REPLACE(
REGEX_EXTRACT({JSON Data},
'Shape","VAL":".*"ID'),
'Shape","VAL":"',""),
'","ID',""
)
Both “Red”, “Square” and “Purple” , “Triangle” are returned respectively.
I believe my problem is from having the entire string arguments returned along with the desired center text, when realistically I just want the center text from the return of my argument. For example;
REGEX_EXTRACT(Data,'VAL":".*"ID')
I 'm only needing the .* text to return, not the wrappers.
Apr 29, 2022 07:24 PM
Woo hoo, I finally found my solution! I needed to employ Regex Grouping using (), and Airtable returns the first group.
REGEX_EXTRACT(Data,'Size","VAL":"(.*)",')
So again, using this input;
{"NAME":"Colour","VAL":"Red","ID":"1"},
{"NAME":"Shape","VAL":"Square","ID":"2"},
{"NAME":"Size","VAL":"500","ID":"3"},
{"NAME":"Scale","VAL":"2","ID":"4"}
We can now easily extract values of variables into their own field using the same formula, but simply swapping out the variable name, Colour, Shape, Size etc.
Remember to wrap Integers or Decimals with ABS() too if not wanting a text string for numbers.
This is tidy as, I love it! Thanks @kuovonne and @ScottWorld - hopefully this might help you both too.
EDIT: Lol, the plot thickens. My test data had new lines, but unfortunately my real world data is one very long string, and now I have to figure out why Regex selects the very last ", in the string, rather than the next ", :sob:
Ahh ha! Added a ? to the wild card, which makes the search “non-greedy”.
REGEX_EXTRACT(Data,'Size","VAL":"(.*?)",')
Apr 30, 2022 05:07 AM
Congrats! Persistence pays off! Getting a working REGEX pattern can be a battle. You now are armed to figure out many more REGEX problems.
I recommend wrapping the digits in VALUE()
instead, because it was designed to convert text to number. It can also preserve negative numbers.
Quality test data that matches the myriad situations found in your real world data is super important, especially with REGEX. Cannot overemphasize the importance of testing.