Skip to main content

New: REGEX formula functions


Forum|alt.badge.img+20

Hello everyone! We recently launched some exciting new additions to our formula field with three new regex (regular expression) functions:

  1. REGEX_MATCH(string, regex)
  2. REGEX_EXTRACT(string, regex)
  3. REGEX_REPLACE(string, regex, replacement)

These functions can be used to match character combinations within text strings. For those interested, Airtable’s REGEX functions are implemented using the RE2 regular expression library.

You can learn more about these functions in this article or check out some example use cases shared in the posts below.

50 replies

Forum|alt.badge.img+20
  • Author
  • Inspiring
  • 375 replies
  • January 12, 2021

Validating and normalizing phone numbers

If you need to validate a set of phone numbers against specific criteria, you can use REGEX_MATCH() like in the formula below.

REGEX_MATCH( {Possible Phone Number}, '^([+]?[0-9]( |-)?)?(\\(?[0-9]{3}\\)?|[0-9]{3})( |-)?([0-9]{3}( |-)?[0-9]{4}|[a-zA-Z0-9]{7})$')

This regular expression handles a number of different phone number formats:

  • Country codes
  • Using dashes or spaces as delimiters
  • Parentheses around the area code
  • Using letters instead of numbers

Next, if it’s a valid phone number, we can normalize it using REGEX_REPLACE to make it easier to work with in the rest of our Airtable base:

IF({Is Valid Phone Number?}, UPPER(REGEX_REPLACE( {Possible Phone Number}, '[^A-Za-z0-9]', '')), ERROR('Invalid phone number'))

If the phone number passed validation, we normalize it by using REGEX_REPLACE() to replace non-alphanumeric characters with an empty string, resulting in just the dialable digits. We also use the UPPER() formula to make the casing consistent.

Normalizing phone numbers into a prettier format is an exercise left to the reader :slightly_smiling_face:


Forum|alt.badge.img+20
  • Author
  • Inspiring
  • 375 replies
  • January 12, 2021

Validating email addresses

Similar to the other example above, you can use REGEX_MATCH() to validate a list of email addresses as well.

REGEX_MATCH( {Email address}, "(\\W|^)[\\w.\\-]{0,25}@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}(\\W|$)")

This regex handles basic validation for email addresses:

  • Starts with a username that is composed of letters, numbers, or characters like underscore, period or dash.
  • An at-sign (@).
  • A domain name, which is a string of letters, numbers, or characters like underscore, period, or dash, followed by a period, followed by another string of letters, numbers, or characters like underscore, period or dash.

Email validation is a very complicated subject, but this regex is a good starting point for most use cases!


Forum|alt.badge.img+20
  • Author
  • Inspiring
  • 375 replies
  • January 12, 2021

Extracting domain from URLs

As one more example, you can use REGEX_EXTRACT to extract the domain name from a URL field:

REGEX_EXTRACT( {URL}, '^(?:https?:\\/\\/)?(?:[^@\n]+@)?(?:www\\.)?([^:\\/\n?]+)')

URLs start with ‘http’ or ‘https’, possibly a username followed by an ‘@’, the domain, and then the rest of the URL. Here, we strip off the subdomain if it’s ‘www’, but otherwise preserve it:


Forum|alt.badge.img+20
  • Author
  • Inspiring
  • 375 replies
  • January 12, 2021

Regular expressions are a powerful capability for working with textual data, and these examples are just a taste for what’s possible. Stack Overflow has many highly-voted examples of usecases for regexes, and regex101.com’s RE2 flavor is a great way of testing your regexes that is compatible with Airtable.

And lastly, for those musically inclined, I’m just going to leave this here…


ScottWorld
Forum|alt.badge.img+33
  • Brainy
  • 8795 replies
  • January 12, 2021

Thanks, @Jason! Very exciting stuff, and thanks for all the examples! :slightly_smiling_face:

I’ve discovered that this is a great REGEX function for extracting email addresses from a string of text:
[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}

However, as discussed in this thread, the REGEX_EXTRACT function only returns the first matching string, instead of all matching strings from a field.

Is this something that the Airtable team is planning to address in the future?

In the meantime, this tool from MiniExtensions allows for extracting multiple strings with REGEX:


Forum|alt.badge.img+19
  • Inspiring
  • 3264 replies
  • January 12, 2021
ScottWorld wrote:

Thanks, @Jason! Very exciting stuff, and thanks for all the examples! :slightly_smiling_face:

I’ve discovered that this is a great REGEX function for extracting email addresses from a string of text:
[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,63}

However, as discussed in this thread, the REGEX_EXTRACT function only returns the first matching string, instead of all matching strings from a field.

Is this something that the Airtable team is planning to address in the future?

In the meantime, this tool from MiniExtensions allows for extracting multiple strings with REGEX:


This is precisely why none of these functions should have been developed in the first place.

The right approach to introducing RegEx() and other less obvious but needed functions such as Split() and Math() is to introduce a single function that allows us to call arbitrary Script Blocks that perform computations and return values.

If Airtable would do this – and only this – a vast number of functions would begin to emerge. There would be aftermarket function libraries that you plugin and use. The functions would be able to use all of the power of javascript functionality including every aspect of RegEx(), arrays, and multi-pass processing. There would be no limitations to what we might create and distribute to other users.

It’s great to see that Airtable is always chugging ahead, but it’s unfortunate they haven’t created a universal ability to leverage that which already exists - a fully-baked javascript environment without limitations.

I just don’t get it - they could look like geniuses or they could do something that is once-again, half-baked and extremely limited.


dilipborad
Forum|alt.badge.img+20
  • Brainy
  • 216 replies
  • January 12, 2021

Thanks,
Now it adds more power/command in search and replace data.

These things are also working fine with API. while we fetch data from AirTable.
Now we have less complexity in search records from the table.

Previously :
filterByFormula=AND({FIELD_NAME}=“FULL_VALUE”)
But if there is a problem with this. Because it matches the exact string only.

Now :
REGEX_MATCH({FIELD_NAME}, ‘PARTIAL_VALUE’)
If you add any word that contains that field, that’s also matches and returns in the result.


Forum|alt.badge.img+19
  • Inspiring
  • 3264 replies
  • January 25, 2021
dilipborad wrote:

Thanks,
Now it adds more power/command in search and replace data.

These things are also working fine with API. while we fetch data from AirTable.
Now we have less complexity in search records from the table.

Previously :
filterByFormula=AND({FIELD_NAME}=“FULL_VALUE”)
But if there is a problem with this. Because it matches the exact string only.

Now :
REGEX_MATCH({FIELD_NAME}, ‘PARTIAL_VALUE’)
If you add any word that contains that field, that’s also matches and returns in the result.


This is a fine improvement and will certainly get some use (even by me), but I can think of many RegEx queries that I cannot perform given the method in which they implement this.

I think it’s good to provide users with “helper” functions that soften the complexity of RegEx, but why not allow the primitive itself as well?


Is there a way to use the REGEX function to determine if the contents of a text field are in ENGLISH language ONLY.

something like
REGEX_MATCH({title},"^[a-zA-Z0-9$@!%*?&#^-_. +]+") ,

where if the contents of the field {title} has some other language set like Japanese , or CHinese or Arabic or French etc , it would return a FALSE else TRUE.


dilipborad
Forum|alt.badge.img+20
  • Brainy
  • 216 replies
  • February 6, 2021
Hari_Balasubram wrote:

Is there a way to use the REGEX function to determine if the contents of a text field are in ENGLISH language ONLY.

something like
REGEX_MATCH({title},"^[a-zA-Z0-9$@!%*?&#^-_. +]+") ,

where if the contents of the field {title} has some other language set like Japanese , or CHinese or Arabic or French etc , it would return a FALSE else TRUE.


Yes, it should be working.
Check this regex101: build, debug and share regex


Russell_Bishop1
Forum|alt.badge.img+19
Jason11 wrote:

Validating email addresses

Similar to the other example above, you can use REGEX_MATCH() to validate a list of email addresses as well.

REGEX_MATCH( {Email address}, "(\\W|^)[\\w.\\-]{0,25}@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}(\\W|$)")

This regex handles basic validation for email addresses:

  • Starts with a username that is composed of letters, numbers, or characters like underscore, period or dash.
  • An at-sign (@).
  • A domain name, which is a string of letters, numbers, or characters like underscore, period, or dash, followed by a period, followed by another string of letters, numbers, or characters like underscore, period or dash.

Email validation is a very complicated subject, but this regex is a good starting point for most use cases!


This is so helpful for preventing Automations throwing errors!


Forum|alt.badge.img+2
  • Participating Frequently
  • 6 replies
  • February 18, 2021

Is it possible to extract an entire URL from a larger block of text using this formula?


Justin_Barrett
Forum|alt.badge.img+20
ZYGOTE_DEVELOPM wrote:

Is it possible to extract an entire URL from a larger block of text using this formula?


Not that exact formula, but if you do a web search for “regex extract url” there are lots of options that come up. However, they would need to be tested to find one that works with the flavor of Regex used by Airtable. Sadly I don’t have time to test, so I can’t offer a more specific solution.


Forum|alt.badge.img+3

Hi,
Wondering if REGEX_REPLACE could handle replacing mid-sentence new lines with “”. The solutions I have use look behind or possession (neither supported by RE2), or replacement with capture groups (not supported by REGEX_REPLACE sfaik).


Forum|alt.badge.img+3
Jason_Woodruff1 wrote:

Hi,
Wondering if REGEX_REPLACE could handle replacing mid-sentence new lines with “”. The solutions I have use look behind or possession (neither supported by RE2), or replacement with capture groups (not supported by REGEX_REPLACE sfaik).


ha, ha, capture groups do work!

REGEX_REPLACE(Text,"(\\S)[ \\t]*(?:\\r\n|\n)[ \\t]*(\\S)","$1 $2")


  • New Participant
  • 1 reply
  • February 25, 2021

Hm. I seem to be getting some unexpected results. Could be my regex game is weak.

Formula:

REGEX_REPLACE(REGEX_REPLACE('-- --  ---', '\\s+','x'), '-+','y')

Expectation:

yxyxy

Result:

yxy xy

Or with the more concise

    REGEX_REPLACE('-- --  ---', '\\s+','x')

Result:

   --x-- x---

kuovonne
Forum|alt.badge.img+27
  • Brainy
  • 6002 replies
  • February 26, 2021

@Greg_Joyce I get your “expected result” when I create a formula field with your formula.


Can’t get REGEX_EXTRACT to work as expected when trying to extract multiple things from a string. My goal is to extract Initials from a Name field. So “John Smith” would extract “JS”.

I’ve verified all these Regex works with an online tester, but can’t get them to work in Airtable:

REGEX_EXTRACT(Name,"\b(\w)")

REGEX_EXTRACT(Name,"(\b[a-zA-Z])[a-zA-Z]* ?")

REGEX_EXTRACT(Name,"([[:alpha:]])[[:alpha:]]+")

Value of Name:
“John Smith”

Current results:
“J”

Expected results:
“JS”


dilipborad
Forum|alt.badge.img+20
  • Brainy
  • 216 replies
  • March 2, 2021
Matt_Coppernoll wrote:

Can’t get REGEX_EXTRACT to work as expected when trying to extract multiple things from a string. My goal is to extract Initials from a Name field. So “John Smith” would extract “JS”.

I’ve verified all these Regex works with an online tester, but can’t get them to work in Airtable:

REGEX_EXTRACT(Name,"\b(\w)")

REGEX_EXTRACT(Name,"(\b[a-zA-Z])[a-zA-Z]* ?")

REGEX_EXTRACT(Name,"([[:alpha:]])[[:alpha:]]+")

Value of Name:
“John Smith”

Current results:
“J”

Expected results:
“JS”


You need to write 2 expressions and merge them like this.
REGEX_EXTRACT({Name},"^(.{1})") & REGEX_EXTRACT({Name},"\\s+([a-zA-Z]){1}")

I hope this is works for you.


kuovonne
Forum|alt.badge.img+27
  • Brainy
  • 6002 replies
  • March 2, 2021
Matt_Coppernoll wrote:

Can’t get REGEX_EXTRACT to work as expected when trying to extract multiple things from a string. My goal is to extract Initials from a Name field. So “John Smith” would extract “JS”.

I’ve verified all these Regex works with an online tester, but can’t get them to work in Airtable:

REGEX_EXTRACT(Name,"\b(\w)")

REGEX_EXTRACT(Name,"(\b[a-zA-Z])[a-zA-Z]* ?")

REGEX_EXTRACT(Name,"([[:alpha:]])[[:alpha:]]+")

Value of Name:
“John Smith”

Current results:
“J”

Expected results:
“JS”


In Airtable, REGEX_EXTRACT extracts only the first match, not all matches.


I’d just REGEX REPLACE for this to match everything but the first letter and replace it with “”. As follows:

REGEX_REPLACE({Field},"[a-z]+[a-z\s]","")

This matches whole words, minus the first character
REGEX_REPLACE({Field},’(\B\w)*’,’’)

But yes, none of the REGEX commands use the global flag and will stop at the first match


  • New Participant
  • 1 reply
  • May 5, 2021

I’m getting an error using REGEX_REPLACE even though the formula works on regex101. I’m trying to add or to the last item in a series.

My formula in Airtable:
REGEX_REPLACE(String,"/,(?=[^,]*$)",", or")

Any ideas?


Justin_Barrett
Forum|alt.badge.img+20
jj23 wrote:

I’m getting an error using REGEX_REPLACE even though the formula works on regex101. I’m trying to add or to the last item in a series.

My formula in Airtable:
REGEX_REPLACE(String,"/,(?=[^,]*$)",", or")

Any ideas?


Welcome to the community, @jj23! :grinning_face_with_big_eyes: When testing on Regex101, you need to use the Golang variant (originally labeled “RE2” when these functions were first released), which apparently doesn’t support the “?=” syntax. The “PCRE2” variant where this does work is apparently not the same variant of RE2 as the Golang version.

As a side note, you don’t need to copy the opening forward slash (or, for Golang, the graves). Those surrounding symbols are only required when pasting the regex directly into the relevant scripting language. For Airtable, you’re passing the regex as a string, so it doesn’t need the extra formatting characters.

After a little fiddling, I got this to work:

REGEX_REPLACE(String, ",([^,]*$)", ", or$1")


Russell_Bishop1
Forum|alt.badge.img+19

Absolutely glorious @Justin_Barrett , thank you. Now my “Artists” field displays beautifully:

“Sunchase, NickBee & Teddy Killerz”


Justin_Barrett
Forum|alt.badge.img+20
Russell_Bishop1 wrote:

Absolutely glorious @Justin_Barrett , thank you. Now my “Artists” field displays beautifully:

“Sunchase, NickBee & Teddy Killerz”


@Russell_Bishop1 Thanks, but props go to @jj23 for working out most of the regex. My cleanup work was minimal by comparison.