Skip to main content



Hello everyone! We recently launched some exciting new additions to our formula field with three new regex (regular expression) functions:





  1. REGEX_MATCH(string, regex)


  2. REGEX_EXTRACT(string, regex)


  3. REGEX_REPLACE(string, regex, replacement)




These functions can be used to match character combinations within text strings. For those interested, Airtable’s REGEX functions are implemented using the RE2 regular expression library.



You can learn more about these functions in this article or check out some example use cases shared in the posts below.

Validating and normalizing phone numbers



If you need to validate a set of phone numbers against specific criteria, you can use REGEX_MATCH() like in the formula below.





REGEX_MATCH( {Possible Phone Number}, '^([+]?[0-9]( |-)?)?(\\(?[0-9]{3}\\)?|[0-9]{3})( |-)?([0-9]{3}( |-)?[0-9]{4}|[a-zA-Z0-9]{7})$')





This regular expression handles a number of different phone number formats:





  • Country codes


  • Using dashes or spaces as delimiters


  • Parentheses around the area code


  • Using letters instead of numbers






Next, if it’s a valid phone number, we can normalize it using REGEX_REPLACE to make it easier to work with in the rest of our Airtable base:





IF({Is Valid Phone Number?}, UPPER(REGEX_REPLACE( {Possible Phone Number}, ' ^A-Za-z0-9]', '')), ERROR('Invalid phone number'))





If the phone number passed validation, we normalize it by using REGEX_REPLACE() to replace non-alphanumeric characters with an empty string, resulting in just the dialable digits. We also use the UPPER() formula to make the casing consistent.



Normalizing phone numbers into a prettier format is an exercise left to the reader 🙂


Validating email addresses



Similar to the other example above, you can use REGEX_MATCH() to validate a list of email addresses as well.





REGEX_MATCH( {Email address}, "(\\W|^)[\\w.\\-]{0,25}@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}(\\W|$)")





This regex handles basic validation for email addresses:





  • Starts with a username that is composed of letters, numbers, or characters like underscore, period or dash.


  • An at-sign (@).


  • A domain name, which is a string of letters, numbers, or characters like underscore, period, or dash, followed by a period, followed by another string of letters, numbers, or characters like underscore, period or dash.




Email validation is a very complicated subject, but this regex is a good starting point for most use cases!




Extracting domain from URLs



As one more example, you can use REGEX_EXTRACT to extract the domain name from a URL field:





REGEX_EXTRACT( {URL}, '^(?:https?:\\/\\/)?(?:[^@\n]+@)?(?:www\\.)?([^:\\/\n?]+)')





URLs start with ‘http’ or ‘https’, possibly a username followed by an ‘@’, the domain, and then the rest of the URL. Here, we strip off the subdomain if it’s ‘www’, but otherwise preserve it:




Regular expressions are a powerful capability for working with textual data, and these examples are just a taste for what’s possible. Stack Overflow has many highly-voted examples of usecases for regexes, and regex101.com’s RE2 flavor is a great way of testing your regexes that is compatible with Airtable.



And lastly, for those musically inclined, I’m just going to leave this here…




Thanks, @Jason! Very exciting stuff, and thanks for all the examples! 🙂



I’ve discovered that this is a great REGEX function for extracting email addresses from a string of text:


eA-Za-z0-9._%+-]+@+A-Za-z0-9.-]+\.\A-Za-z]{2,63}



However, as discussed in this thread, the REGEX_EXTRACT function only returns the first matching string, instead of all matching strings from a field.



Is this something that the Airtable team is planning to address in the future?



In the meantime, this tool from MiniExtensions allows for extracting multiple strings with REGEX:




Thanks, @Jason! Very exciting stuff, and thanks for all the examples! 🙂



I’ve discovered that this is a great REGEX function for extracting email addresses from a string of text:


eA-Za-z0-9._%+-]+@+A-Za-z0-9.-]+\.\A-Za-z]{2,63}



However, as discussed in this thread, the REGEX_EXTRACT function only returns the first matching string, instead of all matching strings from a field.



Is this something that the Airtable team is planning to address in the future?



In the meantime, this tool from MiniExtensions allows for extracting multiple strings with REGEX:








This is precisely why none of these functions should have been developed in the first place.



The right approach to introducing RegEx() and other less obvious but needed functions such as Split() and Math() is to introduce a single function that allows us to call arbitrary Script Blocks that perform computations and return values.



If Airtable would do this – and only this – a vast number of functions would begin to emerge. There would be aftermarket function libraries that you plugin and use. The functions would be able to use all of the power of javascript functionality including every aspect of RegEx(), arrays, and multi-pass processing. There would be no limitations to what we might create and distribute to other users.



It’s great to see that Airtable is always chugging ahead, but it’s unfortunate they haven’t created a universal ability to leverage that which already exists - a fully-baked javascript environment without limitations.



I just don’t get it - they could look like geniuses or they could do something that is once-again, half-baked and extremely limited.


Thanks,


Now it adds more power/command in search and replace data.



These things are also working fine with API. while we fetch data from AirTable.


Now we have less complexity in search records from the table.



Previously :


filterByFormula=AND({FIELD_NAME}=“FULL_VALUE”)


But if there is a problem with this. Because it matches the exact string only.



Now :


REGEX_MATCH({FIELD_NAME}, ‘PARTIAL_VALUE’)


If you add any word that contains that field, that’s also matches and returns in the result.


Thanks,


Now it adds more power/command in search and replace data.



These things are also working fine with API. while we fetch data from AirTable.


Now we have less complexity in search records from the table.



Previously :


filterByFormula=AND({FIELD_NAME}=“FULL_VALUE”)


But if there is a problem with this. Because it matches the exact string only.



Now :


REGEX_MATCH({FIELD_NAME}, ‘PARTIAL_VALUE’)


If you add any word that contains that field, that’s also matches and returns in the result.




This is a fine improvement and will certainly get some use (even by me), but I can think of many RegEx queries that I cannot perform given the method in which they implement this.



I think it’s good to provide users with “helper” functions that soften the complexity of RegEx, but why not allow the primitive itself as well?


Is there a way to use the REGEX function to determine if the contents of a text field are in ENGLISH language ONLY.



something like


REGEX_MATCH({title},"^[a-zA-Z0-9$@!%*?&#^-_. +]+") ,



where if the contents of the field {title} has some other language set like Japanese , or CHinese or Arabic or French etc , it would return a FALSE else TRUE.


Is there a way to use the REGEX function to determine if the contents of a text field are in ENGLISH language ONLY.



something like


REGEX_MATCH({title},"^[a-zA-Z0-9$@!%*?&#^-_. +]+") ,



where if the contents of the field {title} has some other language set like Japanese , or CHinese or Arabic or French etc , it would return a FALSE else TRUE.




Yes, it should be working.


Check this regex101: build, debug and share regex


Validating email addresses



Similar to the other example above, you can use REGEX_MATCH() to validate a list of email addresses as well.





REGEX_MATCH( {Email address}, "(\\W|^)[\\w.\\-]{0,25}@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}(\\W|$)")





This regex handles basic validation for email addresses:





  • Starts with a username that is composed of letters, numbers, or characters like underscore, period or dash.


  • An at-sign (@).


  • A domain name, which is a string of letters, numbers, or characters like underscore, period, or dash, followed by a period, followed by another string of letters, numbers, or characters like underscore, period or dash.




Email validation is a very complicated subject, but this regex is a good starting point for most use cases!




This is so helpful for preventing Automations throwing errors!


Is it possible to extract an entire URL from a larger block of text using this formula?


Is it possible to extract an entire URL from a larger block of text using this formula?


Not that exact formula, but if you do a web search for “regex extract url” there are lots of options that come up. However, they would need to be tested to find one that works with the flavor of Regex used by Airtable. Sadly I don’t have time to test, so I can’t offer a more specific solution.


Hi,


Wondering if REGEX_REPLACE could handle replacing mid-sentence new lines with “”. The solutions I have use look behind or possession (neither supported by RE2), or replacement with capture groups (not supported by REGEX_REPLACE sfaik).


Hi,


Wondering if REGEX_REPLACE could handle replacing mid-sentence new lines with “”. The solutions I have use look behind or possession (neither supported by RE2), or replacement with capture groups (not supported by REGEX_REPLACE sfaik).


ha, ha, capture groups do work!



REGEX_REPLACE(Text,"(\\S)\ \\t]*(?:\\r\n|\n)| \\t]*(\\S)","$1 $2")


Hm. I seem to be getting some unexpected results. Could be my regex game is weak.



Formula:



REGEX_REPLACE(REGEX_REPLACE('-- --  ---', '\\s+','x'), '-+','y')



Expectation:



yxyxy



Result:



yxy xy



Or with the more concise



    REGEX_REPLACE('-- --  ---', '\\s+','x')



Result:



   --x-- x---

@Greg_Joyce I get your “expected result” when I create a formula field with your formula.




Can’t get REGEX_EXTRACT to work as expected when trying to extract multiple things from a string. My goal is to extract Initials from a Name field. So “John Smith” would extract “JS”.



I’ve verified all these Regex works with an online tester, but can’t get them to work in Airtable:



REGEX_EXTRACT(Name,"\b(\w)")



REGEX_EXTRACT(Name,"(\bRa-zA-Z]),a-zA-Z]* ?")



REGEX_EXTRACT(Name,"(XT:alpha:]])"(:alpha:]]+")



Value of Name:


“John Smith”



Current results:


“J”



Expected results:


“JS”


Can’t get REGEX_EXTRACT to work as expected when trying to extract multiple things from a string. My goal is to extract Initials from a Name field. So “John Smith” would extract “JS”.



I’ve verified all these Regex works with an online tester, but can’t get them to work in Airtable:



REGEX_EXTRACT(Name,"\b(\w)")



REGEX_EXTRACT(Name,"(\bRa-zA-Z]),a-zA-Z]* ?")



REGEX_EXTRACT(Name,"(XT:alpha:]])"(:alpha:]]+")



Value of Name:


“John Smith”



Current results:


“J”



Expected results:


“JS”


You need to write 2 expressions and merge them like this.


REGEX_EXTRACT({Name},"^(.{1})") & REGEX_EXTRACT({Name},"\\s+(Ea-zA-Z]){1}")



I hope this is works for you.


Can’t get REGEX_EXTRACT to work as expected when trying to extract multiple things from a string. My goal is to extract Initials from a Name field. So “John Smith” would extract “JS”.



I’ve verified all these Regex works with an online tester, but can’t get them to work in Airtable:



REGEX_EXTRACT(Name,"\b(\w)")



REGEX_EXTRACT(Name,"(\bRa-zA-Z]),a-zA-Z]* ?")



REGEX_EXTRACT(Name,"(XT:alpha:]])"(:alpha:]]+")



Value of Name:


“John Smith”



Current results:


“J”



Expected results:


“JS”


In Airtable, REGEX_EXTRACT extracts only the first match, not all matches.


I’d just REGEX REPLACE for this to match everything but the first letter and replace it with “”. As follows:



REGEX_REPLACE({Field},"ea-z]+[a-z\s]","")



This matches whole words, minus the first character


REGEX_REPLACE({Field},’(\B\w)*’,’’)



But yes, none of the REGEX commands use the global flag and will stop at the first match


I’m getting an error using REGEX_REPLACE even though the formula works on regex101. I’m trying to add or to the last item in a series.



My formula in Airtable:


REGEX_REPLACE(String,"/,(?=,^,]*$)",", or")



Any ideas?


I’m getting an error using REGEX_REPLACE even though the formula works on regex101. I’m trying to add or to the last item in a series.



My formula in Airtable:


REGEX_REPLACE(String,"/,(?=,^,]*$)",", or")



Any ideas?


Welcome to the community, @jj23! :grinning_face_with_big_eyes: When testing on Regex101, you need to use the Golang variant (originally labeled “RE2” when these functions were first released), which apparently doesn’t support the “?=” syntax. The “PCRE2” variant where this does work is apparently not the same variant of RE2 as the Golang version.



As a side note, you don’t need to copy the opening forward slash (or, for Golang, the graves). Those surrounding symbols are only required when pasting the regex directly into the relevant scripting language. For Airtable, you’re passing the regex as a string, so it doesn’t need the extra formatting characters.



After a little fiddling, I got this to work:



REGEX_REPLACE(String, ",(E^,]*$)", ", or$1")




Absolutely glorious @Justin_Barrett , thank you. Now my “Artists” field displays beautifully:



“Sunchase, NickBee & Teddy Killerz”


Absolutely glorious @Justin_Barrett , thank you. Now my “Artists” field displays beautifully:



“Sunchase, NickBee & Teddy Killerz”


@Russell_Bishop1 Thanks, but props go to @jj23 for working out most of the regex. My cleanup work was minimal by comparison.


Reply