Skip to main content

Hi!, I am trying to use the REGEX formula, but I cannot use it propertly. I need to extract names from a string. To make it easier the names are written after a hashtag #. This are examples


#TOM write the essay of a client

Complete the packing list for #Peter


From an old post I managed to use this formula REGEX_EXTRACT({Transaction Name}, “#^\s]+”))

written by Justin Barret


This formula kind of works, but is giving me the answer with the hasthag in front

#TOM

#Peter


I needed it without the hashtag. Any clue about how to do it?

I am getting lots of trouble to understand the sintaxis of the REGEX formulas.


Thanks!!

Hey @Tristan-ARG!


The regex behavior in Airtable’s formulas is disappointing, and if you’re trying to write more in-depth regex patterns, you will quickly run up against issues.


If you want to skip straight to the fixed formula and pattern, here’s a shortcut lol.


If you want a bit more of a walkthrough on how your pattern is behaving, there’s a bit below kinda explaining it.



How Your Regex Pattern Is Behaving


The first thing to note, is that your regex formula is actually working as intended.


So, # ^\s]+ is your pattern.

Let’s break it down.


Let’s say we have this string that we want to extract from:

Hello! I want to circulate a hashtag such as #Tom!




  1. #


    • ⭐ This will literally match the actual character #.

    • This is why your extract is pulling in the hashtag.




What We’ve Extracted So Far: “#”






  1. ^\\s]


    • ⭐ This will match any character only if it is not a whitespace character.

    • A quick side note here. Airtable’s regex syntax requires two \\.

    • When you have something contained in square brackets, it will generally mean to look for a collection of characters that you have inside of the brackets.

      • e.g.
        Similarly, rbsw] will look for any character as long as it is either b, s, or w.



    • Now, in regex, the carrot (^) denotes when you don’t want something.

      • e.g. >^atn] will match any character that is anything except for a, t, and n.






What We’ve Extracted So Far:#T






  1. +

    • This will take the token you put it next to, and repeat its behavior. (More context below). In this case, it will repeat the behavior of the t^\\s].

    • This is called a quantifier. There are a few types, but this one will only return a match if it returns one or more times. This is distinct from ? which indicates zero or one, and * which indicates zero or more times.

      • e.g. The pattern \w? will look for a single letter, assuming it even exists.

        The pattern \w* will look for a single letter, and if it finds one, it will match everything next to it until it runs into something that is not a letter. (We call this a greedy quantifier, as it will match as many things as it can consecutively).






What We’ve Extracted So Far:#Tom





Final Solution


You’ll want to transform the string that the formula returns to you to remove the hashtag.


This formula should do it:



IF(
{Transaction Name},
SUBSTITUTE(
REGEX_EXTRACT(
{Transaction Name},
“#n^\s]+”
),
"#",
""
)
)

This formula is built with this control flow in mind:



  1. If the {Transaction Name} field has a value:

    • Return the regex extract from the field, then

    • Take the “#” character, and remove it from the extracted string.



  2. If the {Transaction Name} field has no value:

    • Do nothing. (Will return blank)





Wow!! this is amazing, thanks for taking time to write such a detailed and well organized reply!. It worked like a charm!. I actually was trying to understand the syntaxis and the only place where I actually understood it was in this post hehe. Thanks again!!!


Reply