Hello everyone! We recently launched some exciting new additions to our formula field with three new regex (regular expression) functions:
REGEX_MATCH(string, regex)
REGEX_EXTRACT(string, regex)
REGEX_REPLACE(string, regex, replacement)
These functions can be used to match character combinations within text strings. For those interested, Airtable’s REGEX functions are implemented using the RE2 regular expression library.
You can learn more about these functions in this article or check out some example use cases shared in the posts below.
Page 2 / 2
Hi all !
I’am try to use REGEX_REPLACE to replace , by \n and it’s works in this case:
@kuovonne’s suggestion of a rollup sounds like it’s what you want considering that you want each item on its own line. For the rollup’s aggregation formula, use the ARRAYJOIN() function to join all items with “\n” as the separator.
Correction: you’ll need two backslashes. The backslash itself is used by the regex interpreter to escape the character after it in the regex string. However, to actually represent the backslash as a backslash, it needs to be escaped by itself. Confusing, I know, but I’m pretty sure this will work where my previous suggestion likely won’t:
Correction: you’ll need two backslashes. The backslash itself is used by the regex interpreter to escape the character after it in the regex string. However, to actually represent the backslash as a backslash, it needs to be escaped by itself. Confusing, I know, but I’m pretty sure this will work where my previous suggestion likely won’t:
I would like to create a regular expression which captures a date/time string in the format (with optional leading zeroes for month, day and hour in 24-hour format):
M/D/YYYY Hss
Why is it that when I use the following expression, Airtable only returns the month instead of the whole string?
The above expression is intended to be very precise about the number ranges which I would like to accept for years, months, days, hours, minutes, seconds, etc.
However, when I apply the following expression, Airtable returns the entire date/time string:
The latter expression is obviously not preferred since it accepts date/time strings such as:
89/67/9349 55:47:58
instead of date/time strings I expect such as:
8/7/2019 5:47:58
If I am using the first expression, is there something I can change to make it return the whole string instead of the first group match?
Also, the golang variant at Regex101.com wasn’t able to work this out, because it advised the top expression (Airtable replaced the single backslashes with double backslashes on its own though). Is there a better website we can use to test regular expressions for use in Airtable? Ideally, I’d like one which uses double backslashes as well, but the more important criterion is that it is able to give me the correct regular expression to be using in cases like the above.
Thank you!
Hi Airtable Community,
I would like to create a regular expression which captures a date/time string in the format (with optional leading zeroes for month, day and hour in 24-hour format):
M/D/YYYY Hss
Why is it that when I use the following expression, Airtable only returns the month instead of the whole string?
The above expression is intended to be very precise about the number ranges which I would like to accept for years, months, days, hours, minutes, seconds, etc.
However, when I apply the following expression, Airtable returns the entire date/time string:
The latter expression is obviously not preferred since it accepts date/time strings such as:
89/67/9349 55:47:58
instead of date/time strings I expect such as:
8/7/2019 5:47:58
If I am using the first expression, is there something I can change to make it return the whole string instead of the first group match?
Also, the golang variant at Regex101.com wasn’t able to work this out, because it advised the top expression (Airtable replaced the single backslashes with double backslashes on its own though). Is there a better website we can use to test regular expressions for use in Airtable? Ideally, I’d like one which uses double backslashes as well, but the more important criterion is that it is able to give me the correct regular expression to be using in cases like the above.
Thank you!
Welcome to the community, @Ockenden_Tech! :grinning_face_with_big_eyes: Where are these date/time strings coming from? Are they contained in larger blocks of text? Do you want to parse the string once captured? I’m wondering if a combination of functions might be more effective, but knowing more about the data source and your end goal will help us guide you in the right direction. If you can provide an example or two of a string containing these date/time strings, that would be very helpful.
These date/time strings originally come from Google Sheets, and are stored in another field on the same table. Their original form was as an array, and they have already been converted to string format using ARRAYJOIN.
Yes, they are contained in larger blocks of text. I might want to parse the string, but at the moment, I am just trying to get the whole date/time string.
These date/time strings originally come from Google Sheets, and are stored in another field on the same table. Their original form was as an array, and they have already been converted to string format using ARRAYJOIN.
Yes, they are contained in larger blocks of text. I might want to parse the string, but at the moment, I am just trying to get the whole date/time string.
An example string would be:
Submission time: 9/17/2021 12:22:39
Name: John Smith
Thank you.
@Ockenden_Tech Thanks for that example. For something like that, I’d be more inclined to use the regex to look for the pieces around the part that you want to extract, so that you know where the date/time starts and ends, and then remove the date/time as a solid chunk. In this case, I’d target the colon-space combo right before the date, and the newline after the time. Try this:
REGEX_EXTRACT(string, "(?:.*: )(:^\n]*)(?:.*)")
@Justin_Barrett thanks again for your prompt response! Actually, the regular expression I gave before can also capture the date/time string:
The point I had been making though was that it would also extract “date/time” strings from text such as:
Submission time: 89/67/9349 55:47:58
Name: Jane Doe
I believe the regular expression you provided would do the same.
This is only an example, but I guess the wider point I am trying to make is: where can we find a good reference on the specific regular expression language that Airtable uses? Because it is simply not just the golang variant.
You see, I seem to be getting that round brackets () function very differently in Airtable compared to in the regular expression languages (or language variants) that I have been seeing on sites like Regex101.com. In standard regular expression languages, round brackets (), when used in conjunction with a pipe |, are meant to be used for grouping two permissible options, e.g. (\w|\d), but I believe that Airtable takes things a step further and thinks that just because the round brackets () are there, that the expression in between should be treated as a capturing group. I think this is the reason why for the first regular expression I provided, only the first matching group is shown, and not the rest. How is one supposed to define the boundaries of the alternating options? Would one need to use \b on either sides instead of round brackets ()?
On another but related note, in standard regular expression languages, (?:regex) is used to signify a non-capturing group, and (?i)regex is used to signify case-insensitivity. How is one supposed to express these concepts in Airtable, without the use of round brackets ()?
@Justin_Barrett thanks again for your prompt response! Actually, the regular expression I gave before can also capture the date/time string:
The point I had been making though was that it would also extract “date/time” strings from text such as:
Submission time: 89/67/9349 55:47:58
Name: Jane Doe
I believe the regular expression you provided would do the same.
This is only an example, but I guess the wider point I am trying to make is: where can we find a good reference on the specific regular expression language that Airtable uses? Because it is simply not just the golang variant.
You see, I seem to be getting that round brackets () function very differently in Airtable compared to in the regular expression languages (or language variants) that I have been seeing on sites like Regex101.com. In standard regular expression languages, round brackets (), when used in conjunction with a pipe |, are meant to be used for grouping two permissible options, e.g. (\w|\d), but I believe that Airtable takes things a step further and thinks that just because the round brackets () are there, that the expression in between should be treated as a capturing group. I think this is the reason why for the first regular expression I provided, only the first matching group is shown, and not the rest. How is one supposed to define the boundaries of the alternating options? Would one need to use \b on either sides instead of round brackets ()?
On another but related note, in standard regular expression languages, (?:regex) is used to signify a non-capturing group, and (?i)regex is used to signify case-insensitivity. How is one supposed to express these concepts in Airtable, without the use of round brackets ()?
True, but what’s the likelihood that an invalid date/time like that would come through?
That’s how round brackets are defined with the Golang variant on Regex101.com. Look in the lower-right corner at the Quick Reference section:
As you change language variants (“flavors” per the site), you’ll see that reference change.
Airtable’s variant uses the same syntax for the same things, so those exact same options will work in Airtable.
As I said, I’m pretty sure that Golang is the closest match. It used to be labeled something else, and the original post by Airtable staff about the regex functions mentions that former name when recommending Regex101.com as a test site, but every test I do on Regex101.com, I do with Golang. Every time I’ve copied the result to Airtable, it works without issues or changes.
Finally finding time to look into REGEX further, and considering the recent spike in development going on at Airtable, can REGEX please be revisited by Airtable Devs so that what executes on the website https://regex101.com(which is referred to as a practice site by Airtable themselves) can actually be executed within the Airtable formula field?
As a new user - the first thing I did was head to that website as per the Airtable recommendation, then test a simple REGEX word count by trying to parse a log;
REGEX_MATCH({Log_String}," - START RUN - ")
But the fact that Airtable REGEX_MATCH is only returning the first word match, which is NOT how Regex is designed, it should be returning all string matches - well… this limitation absolutely needs attention from Airtable development so that Regex meets the standard and delivers expected outcomes.
Where I could simply write;
" " & (REGEX_MATCH({Log_String}," - START RUN - ")
I have this… horrid… looking…
IF(((LEN({Log_String}) - LEN(SUBSTITUTE({Log_String}," - START RUN - ","")))/LEN(" - START RUN - "))," " & ((LEN({Log_String}) - LEN(SUBSTITUTE({Log_String}," - START RUN - ","")))/LEN(" - START RUN - ")),BLANK())
Finally finding time to look into REGEX further, and considering the recent spike in development going on at Airtable, can REGEX please be revisited by Airtable Devs so that what executes on the website https://regex101.com(which is referred to as a practice site by Airtable themselves) can actually be executed within the Airtable formula field?
As a new user - the first thing I did was head to that website as per the Airtable recommendation, then test a simple REGEX word count by trying to parse a log;
REGEX_MATCH({Log_String}," - START RUN - ")
But the fact that Airtable REGEX_MATCH is only returning the first word match, which is NOT how Regex is designed, it should be returning all string matches - well… this limitation absolutely needs attention from Airtable development so that Regex meets the standard and delivers expected outcomes.
Where I could simply write;
" " & (REGEX_MATCH({Log_String}," - START RUN - ")
I have this… horrid… looking…
IF(((LEN({Log_String}) - LEN(SUBSTITUTE({Log_String}," - START RUN - ","")))/LEN(" - START RUN - "))," " & ((LEN({Log_String}) - LEN(SUBSTITUTE({Log_String}," - START RUN - ","")))/LEN(" - START RUN - ")),BLANK())
Per the formula field reference page, the REGEX functions use the RE2 library, which is slightly different from the REGEX testing websites.
It is unlikely that the underlying REGEX library will be changed. You might want to use the RE2 library reference when building your expressions.
Per the formula field reference page, the REGEX functions use the RE2 library, which is slightly different from the REGEX testing websites.
It is unlikely that the underlying REGEX library will be changed. You might want to use the RE2 library reference when building your expressions.
In testing on an RE2 website, I noticed that it returned an array for matches - so I’m not too sure why Airtable devs returned just a boolean instead of the match count array length.
Until we hear back from Airtable development, I’m going to see what workarounds that I can employ via automated scripting - I have a hunch that this will return the match results I’m expecting, so all I need to do is to automate the script with a field updated trigger comparison.
EDIT:
Looking into a Script solution, I’ve put a working test into play that I think (with a bit more development) will work well. In summary, I’m gearing a Script to check all field names within a table for any that have a prefix of . Upon finding them the field name is used as a search term for matching. The below screenshot shows that “ Test_1” is being matching the term “Test_1” four times within the Target.
Now I just need to create some loop structures so that all the fields with are parsed, and that if the result time-stamp is before the File Name time-stamp then to execute the script.
True, but what’s the likelihood that an invalid date/time like that would come through?
That’s how round brackets are defined with the Golang variant on Regex101.com. Look in the lower-right corner at the Quick Reference section:
As you change language variants (“flavors” per the site), you’ll see that reference change.
Airtable’s variant uses the same syntax for the same things, so those exact same options will work in Airtable.
As I said, I’m pretty sure that Golang is the closest match. It used to be labeled something else, and the original post by Airtable staff about the regex functions mentions that former name when recommending Regex101.com as a test site, but every test I do on Regex101.com, I do with Golang. Every time I’ve copied the result to Airtable, it works without issues or changes.
@Justin_Barrett thanks for your reply. But as stated in my original post, I verified the regular expressions I posted above on Regex101.com under the golang variant, and they were approved; however, when I tried to use them in Airtable, they didn’t work.
That’s how round brackets are defined with the Golang variant on Regex101.com . Look in the lower-right corner at the Quick Reference section:
Right, but in this picture, (...) is specified on a standalone basis. What I was querying was the case when round brackets are meant to symbolise a group of options (...|...). I believe that in Golang, the presence of the outer (...) doesn’t transform the expression into a capturing group, i.e. other matching parts of the regular expression continue to be returned; however, in Airtable only the matches within the (...) are returned, even when the round brackets only serve to define a group, as in (...|...). This also makes it difficult to use expressions like (?:...) shown in your picture, and others I mentioned in my posts above.
Hence why I was questioning whether the Airtable regex language was indeed golang or something else? I have tried the RE2 variant as well, and it also doesn’t line up with Airtable.
Can the Airtable developers provide an exact reference guide to Airtable’s regex language?
Thanks.
@Justin_Barrett thanks for your reply. But as stated in my original post, I verified the regular expressions I posted above on Regex101.com under the golang variant, and they were approved; however, when I tried to use them in Airtable, they didn’t work.
That’s how round brackets are defined with the Golang variant on Regex101.com . Look in the lower-right corner at the Quick Reference section:
Right, but in this picture, (...) is specified on a standalone basis. What I was querying was the case when round brackets are meant to symbolise a group of options (...|...). I believe that in Golang, the presence of the outer (...) doesn’t transform the expression into a capturing group, i.e. other matching parts of the regular expression continue to be returned; however, in Airtable only the matches within the (...) are returned, even when the round brackets only serve to define a group, as in (...|...). This also makes it difficult to use expressions like (?:...) shown in your picture, and others I mentioned in my posts above.
Hence why I was questioning whether the Airtable regex language was indeed golang or something else? I have tried the RE2 variant as well, and it also doesn’t line up with Airtable.
Can the Airtable developers provide an exact reference guide to Airtable’s regex language?
Thanks.
The documentation only indicates that Airtable uses the RE2 library, but only the devs can say for sure how/if that library has been modified to be used by Airtable’s formula system. I recommend reaching out to Airtable support directly (in the app: Help → Contact support). While Airtable staffers do frequent the forum, they prefer to be approached directly with support requests, and questions like this definitely lean more in that direction.
This seemed like a good place to ask my regex question, which has two parts. One error and second how to make more than one search string (if possible).
I’m attempting to extract a variety of strings from a file path and then use those strings to fill in a linked or multiselect field with automation. I’ve had some success with a little support from the gurus at regex101, but since I’m getting an error on this expression when used in an airtable formula I thought it more appropriate to ask for help here.
file path
/3D Projects/21-951 LH Hot Cyclone/BFA/7353 LH Hot Cyclone to App 5-24-21/7353 LH Hot Cyclone to App 5-24-21.pdf
regex - from regex101
\bBFA\b =>BFA
airtable formula
REGEX_EXTRACT({file path},"\bBFA\b") => BFA --automation to set Status to BFA (success)
my problem is if the path does not contain BFA
/3D Projects/21-951 LH Hot Cyclone/7353 LH Hot Cyclone to App 5-24-21/7353 LH Hot Cyclone to App 5-24-21.pdf
REGEX_EXTRACT({file path},"\bBFA\b") => #ERROR!
The second thing is that I’d like to search for “BFA or APP or DES”, so expanding to look for 3 strings.
Any help is greatly appreciated.
This seemed like a good place to ask my regex question, which has two parts. One error and second how to make more than one search string (if possible).
I’m attempting to extract a variety of strings from a file path and then use those strings to fill in a linked or multiselect field with automation. I’ve had some success with a little support from the gurus at regex101, but since I’m getting an error on this expression when used in an airtable formula I thought it more appropriate to ask for help here.
file path
/3D Projects/21-951 LH Hot Cyclone/BFA/7353 LH Hot Cyclone to App 5-24-21/7353 LH Hot Cyclone to App 5-24-21.pdf
regex - from regex101
\bBFA\b =>BFA
airtable formula
REGEX_EXTRACT({file path},"\bBFA\b") => BFA --automation to set Status to BFA (success)
my problem is if the path does not contain BFA
/3D Projects/21-951 LH Hot Cyclone/7353 LH Hot Cyclone to App 5-24-21/7353 LH Hot Cyclone to App 5-24-21.pdf
REGEX_EXTRACT({file path},"\bBFA\b") => #ERROR!
The second thing is that I’d like to search for “BFA or APP or DES”, so expanding to look for 3 strings.
Any help is greatly appreciated.
Once again I found an answer to part of my question thanks to another post. The following solves the error part of my question. Is this the best way to handle this?
IF(
REGEX_MATCH({file path},“BFA”),
REGEX_EXTRACT({file path},“BFA”)
)
This seemed like a good place to ask my regex question, which has two parts. One error and second how to make more than one search string (if possible).
I’m attempting to extract a variety of strings from a file path and then use those strings to fill in a linked or multiselect field with automation. I’ve had some success with a little support from the gurus at regex101, but since I’m getting an error on this expression when used in an airtable formula I thought it more appropriate to ask for help here.
file path
/3D Projects/21-951 LH Hot Cyclone/BFA/7353 LH Hot Cyclone to App 5-24-21/7353 LH Hot Cyclone to App 5-24-21.pdf
regex - from regex101
\bBFA\b =>BFA
airtable formula
REGEX_EXTRACT({file path},"\bBFA\b") => BFA --automation to set Status to BFA (success)
my problem is if the path does not contain BFA
/3D Projects/21-951 LH Hot Cyclone/7353 LH Hot Cyclone to App 5-24-21/7353 LH Hot Cyclone to App 5-24-21.pdf
REGEX_EXTRACT({file path},"\bBFA\b") => #ERROR!
The second thing is that I’d like to search for “BFA or APP or DES”, so expanding to look for 3 strings.
Any help is greatly appreciated.
Answered part 2 of my question with this.
IF(
REGEX_MATCH({file path},(?:BFA|APP|DES)),
REGEX_EXTRACT({file path},(?:BFA|APP|DES))
)
Answered part 2 of my question with this.
IF(
REGEX_MATCH({file path},(?:BFA|APP|DES)),
REGEX_EXTRACT({file path},(?:BFA|APP|DES))
)
Thanks for sharing your solution, but you’re missing quotes in that final example:
I’d like to transform an attachments (images) column into a delimited list of URLs. I can do it with Regex using look aheads (i.e. the ?= syntax mentioned above) but since this feature is not supported by RE2, I’m wondering if anyone has a work around?
I would probably nest this in more Replace functions(s) to get rid of the () and swap the comma to a pipe etc.
I’d like to transform an attachments (images) column into a delimited list of URLs. I can do it with Regex using look aheads (i.e. the ?= syntax mentioned above) but since this feature is not supported by RE2, I’m wondering if anyone has a work around?
I would probably nest this in more Replace functions(s) to get rid of the () and swap the comma to a pipe etc.
@Matt_Kennedy1 Nesting definitely takes care of it. Here’s what I came up with, nested from inside to outside:
Replace the closing parenthesis after each URL, optionally adding everything up to the opening parenthesis of the next, with a single space. This gets rid of all filenames and all parentheses except for the very beginning.
Replace everything at the beginning—up to and including the opening parenthesis—with nothing.
Trim the string to get rid of the single space on the end.
Substitute the remaining single spaces between URLs with the desired separator.