Help

New: REGEX formula functions

cancel
Showing results for 
Search instead for 
Did you mean: 
Jason
Airtable Employee
Airtable Employee

regex_functions

Hello everyone! We recently launched some exciting new additions to our formula field with three new regex (regular expression) functions:

  1. REGEX_MATCH(string, regex)
  2. REGEX_EXTRACT(string, regex)
  3. REGEX_REPLACE(string, regex, replacement)

These functions can be used to match character combinations within text strings. For those interested, Airtable’s REGEX functions are implemented using the RE2 regular expression library.

You can learn more about these functions in this article or check out some example use cases shared in the posts below.

50 Comments
Justin_Barrett
18 - Pluto
18 - Pluto

@Karlstens For any regex tokens that you want to be seen as “themselves” in the regex string, escape them with a backslash. Try this:

IF(REGEX_MATCH({Tasks}, "\[ \]"), "Incomplete", "Complete"))
Justin_Barrett
18 - Pluto
18 - Pluto

Correction: you’ll need two backslashes. The backslash itself is used by the regex interpreter to escape the character after it in the regex string. However, to actually represent the backslash as a backslash, it needs to be escaped by itself. Confusing, I know, but I’m pretty sure this will work where my previous suggestion likely won’t:

IF(REGEX_MATCH({Tasks}, "\\[ \\]"), "Incomplete", "Complete"))
Karlstens
11 - Venus
11 - Venus

Double backslash, understood! Cheers Justin.

Ockenden_Tech
5 - Automation Enthusiast
5 - Automation Enthusiast

Hi Airtable Community,

I would like to create a regular expression which captures a date/time string in the format (with optional leading zeroes for month, day and hour in 24-hour format):
M/D/YYYY H:mm:ss

Why is it that when I use the following expression, Airtable only returns the month instead of the whole string?

REGEX_EXTRACT(string,"(0?[1-9]|1[0-2])\\/(0?[1-9]|[1-2]\\d|3[0-1])\\/(19|20)\\d\\d\\s(0?[1-9]|1[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]")

The above expression is intended to be very precise about the number ranges which I would like to accept for years, months, days, hours, minutes, seconds, etc.

However, when I apply the following expression, Airtable returns the entire date/time string:
REGEX_EXTRACT(string,"\\d{1,2}\\/\\d{1,2}\\/\\d{4}\\s\\d{1,2}:[0-5][0-9]:[0-5][0-9]")

The latter expression is obviously not preferred since it accepts date/time strings such as:
89/67/9349 55:47:58

instead of date/time strings I expect such as:
8/7/2019 5:47:58

If I am using the first expression, is there something I can change to make it return the whole string instead of the first group match?

Also, the golang variant at Regex101.com wasn’t able to work this out, because it advised the top expression (Airtable replaced the single backslashes with double backslashes on its own though). Is there a better website we can use to test regular expressions for use in Airtable? Ideally, I’d like one which uses double backslashes as well, but the more important criterion is that it is able to give me the correct regular expression to be using in cases like the above.

Thank you!

Justin_Barrett
18 - Pluto
18 - Pluto

Welcome to the community, @Ockenden_Tech! :grinning_face_with_big_eyes: Where are these date/time strings coming from? Are they contained in larger blocks of text? Do you want to parse the string once captured? I’m wondering if a combination of functions might be more effective, but knowing more about the data source and your end goal will help us guide you in the right direction. If you can provide an example or two of a string containing these date/time strings, that would be very helpful.

Ockenden_Tech
5 - Automation Enthusiast
5 - Automation Enthusiast

@Justin_Barrett thanks so much for your prompt reply!

These date/time strings originally come from Google Sheets, and are stored in another field on the same table. Their original form was as an array, and they have already been converted to string format using ARRAYJOIN.

Yes, they are contained in larger blocks of text. I might want to parse the string, but at the moment, I am just trying to get the whole date/time string.

An example string would be:

Submission time: 9/17/2021 12:22:39
Name: John Smith

Thank you.

Justin_Barrett
18 - Pluto
18 - Pluto

@Ockenden_Tech Thanks for that example. For something like that, I’d be more inclined to use the regex to look for the pieces around the part that you want to extract, so that you know where the date/time starts and ends, and then remove the date/time as a solid chunk. In this case, I’d target the colon-space combo right before the date, and the newline after the time. Try this:

REGEX_EXTRACT(string, "(?:.*: )([^\n]*)(?:.*)")

Screen Shot 2021-10-15 at 11.12.47 PM

Ockenden_Tech
5 - Automation Enthusiast
5 - Automation Enthusiast

@Justin_Barrett thanks again for your prompt response! Actually, the regular expression I gave before can also capture the date/time string:

REGEX_EXTRACT(string,"\\d{1,2}\\/\\d{1,2}\\/\\d{4}\\s\\d{1,2}:[0-5][0-9]:[0-5][0-9]")

The point I had been making though was that it would also extract “date/time” strings from text such as:

Submission time: 89/67/9349 55:47:58
Name: Jane Doe

I believe the regular expression you provided would do the same.

This is only an example, but I guess the wider point I am trying to make is: where can we find a good reference on the specific regular expression language that Airtable uses? Because it is simply not just the golang variant.

You see, I seem to be getting that round brackets () function very differently in Airtable compared to in the regular expression languages (or language variants) that I have been seeing on sites like Regex101.com. In standard regular expression languages, round brackets (), when used in conjunction with a pipe |, are meant to be used for grouping two permissible options, e.g. (\w|\d), but I believe that Airtable takes things a step further and thinks that just because the round brackets () are there, that the expression in between should be treated as a capturing group. I think this is the reason why for the first regular expression I provided, only the first matching group is shown, and not the rest. How is one supposed to define the boundaries of the alternating options? Would one need to use \b on either sides instead of round brackets ()?

On another but related note, in standard regular expression languages, (?:regex) is used to signify a non-capturing group, and (?i)regex is used to signify case-insensitivity. How is one supposed to express these concepts in Airtable, without the use of round brackets ()?

Justin_Barrett
18 - Pluto
18 - Pluto

True, but what’s the likelihood that an invalid date/time like that would come through?

That’s how round brackets are defined with the Golang variant on Regex101.com. Look in the lower-right corner at the Quick Reference section:

Screen Shot 2021-10-18 at 9.00.36 AM

As you change language variants (“flavors” per the site), you’ll see that reference change.

Airtable’s variant uses the same syntax for the same things, so those exact same options will work in Airtable.

As I said, I’m pretty sure that Golang is the closest match. It used to be labeled something else, and the original post by Airtable staff about the regex functions mentions that former name when recommending Regex101.com as a test site, but every test I do on Regex101.com, I do with Golang. Every time I’ve copied the result to Airtable, it works without issues or changes.

Karlstens
11 - Venus
11 - Venus

Finally finding time to look into REGEX further, and considering the recent spike in development going on at Airtable, can REGEX please be revisited by Airtable Devs so that what executes on the website https://regex101.com (which is referred to as a practice site by Airtable themselves) can actually be executed within the Airtable formula field?

As a new user - the first thing I did was head to that website as per the Airtable recommendation, then test a simple REGEX word count by trying to parse a log;

REGEX_MATCH({Log_String}," - START RUN - ")

But the fact that Airtable REGEX_MATCH is only returning the first word match, which is NOT how Regex is designed, it should be returning all string matches - well… this limitation absolutely needs attention from Airtable development so that Regex meets the standard and delivers expected outcomes.

Where I could simply write;

"️ " & (REGEX_MATCH({Log_String}," - START RUN - ")

I have this… horrid… looking…

IF(((LEN({Log_String}) - LEN(SUBSTITUTE({Log_String}," - START RUN - ","")))/LEN(" - START RUN - ")),"️ " & ((LEN({Log_String}) - LEN(SUBSTITUTE({Log_String}," - START RUN - ","")))/LEN(" - START RUN - ")),BLANK())