Help

Extract text from a string - REGEX

Topic Labels: Formulas
Solved
Jump to Solution
18668 2
cancel
Showing results for 
Search instead for 
Did you mean: 
Tristan-ARG
6 - Interface Innovator
6 - Interface Innovator

Hi!, I am trying to use the REGEX formula, but I cannot use it propertly. I need to extract names from a string. To make it easier the names are written after a hashtag #. This are examples

#TOM write the essay of a client
Complete the packing list for #Peter

From an old post I managed to use this formula REGEX_EXTRACT({Transaction Name}, “#[^\s]+”))
written by Justin Barret

This formula kind of works, but is giving me the answer with the hasthag in front
#TOM
#Peter

I needed it without the hashtag. Any clue about how to do it?
I am getting lots of trouble to understand the sintaxis of the REGEX formulas.

Thanks!!

1 Solution

Accepted Solutions
Ben_Young1
11 - Venus
11 - Venus

Hey @Tristan-ARG!

The regex behavior in Airtable’s formulas is disappointing, and if you’re trying to write more in-depth regex patterns, you will quickly run up against issues.

If you want to skip straight to the fixed formula and pattern, here’s a shortcut lol.

If you want a bit more of a walkthrough on how your pattern is behaving, there’s a bit below kinda explaining it.

How Your Regex Pattern Is Behaving

The first thing to note, is that your regex formula is actually working as intended.

So, #[^\s]+ is your pattern.
Let’s break it down.

Let’s say we have this string that we want to extract from:
Hello! I want to circulate a hashtag such as #Tom!

  1. #
    • :star: This will literally match the actual character #.
    • This is why your extract is pulling in the hashtag.

What We’ve Extracted So Far: “#”


  1. [^\\s]
    • :star: This will match any character only if it is not a whitespace character.
    • A quick side note here. Airtable’s regex syntax requires two \\.
    • When you have something contained in square brackets, it will generally mean to look for a collection of characters that you have inside of the brackets.
      • e.g. [a-z] will look for lowercase characters in the range of the entire English alphabet.
        Similarly, [bsw] will look for any character as long as it is either b, s, or w.
    • Now, in regex, the carrot (^) denotes when you don’t want something.
      • e.g. [^atn] will match any character that is anything except for a, t, and n.

What We’ve Extracted So Far:#T


  1. +
    • This will take the token you put it next to, and repeat its behavior. (More context below). In this case, it will repeat the behavior of the [^\\s].
    • This is called a quantifier. There are a few types, but this one will only return a match if it returns one or more times. This is distinct from ? which indicates zero or one, and * which indicates zero or more times.
      • e.g. The pattern \w? will look for a single letter, assuming it even exists.
        The pattern \w* will look for a single letter, and if it finds one, it will match everything next to it until it runs into something that is not a letter. (We call this a greedy quantifier, as it will match as many things as it can consecutively).

What We’ve Extracted So Far:#Tom


Final Solution

You’ll want to transform the string that the formula returns to you to remove the hashtag.

This formula should do it:


IF(
    {Transaction Name},
    SUBSTITUTE(
        REGEX_EXTRACT(
            {Transaction Name},
            “#[^\s]+”
        ),
        "#",
        ""
    )
)

This formula is built with this control flow in mind:

  1. If the {Transaction Name} field has a value:
    • Return the regex extract from the field, then
    • Take the “#” character, and remove it from the extracted string.
  2. If the {Transaction Name} field has no value:
    • Do nothing. (Will return blank)

See Solution in Thread

2 Replies 2
Ben_Young1
11 - Venus
11 - Venus

Hey @Tristan-ARG!

The regex behavior in Airtable’s formulas is disappointing, and if you’re trying to write more in-depth regex patterns, you will quickly run up against issues.

If you want to skip straight to the fixed formula and pattern, here’s a shortcut lol.

If you want a bit more of a walkthrough on how your pattern is behaving, there’s a bit below kinda explaining it.

How Your Regex Pattern Is Behaving

The first thing to note, is that your regex formula is actually working as intended.

So, #[^\s]+ is your pattern.
Let’s break it down.

Let’s say we have this string that we want to extract from:
Hello! I want to circulate a hashtag such as #Tom!

  1. #
    • :star: This will literally match the actual character #.
    • This is why your extract is pulling in the hashtag.

What We’ve Extracted So Far: “#”


  1. [^\\s]
    • :star: This will match any character only if it is not a whitespace character.
    • A quick side note here. Airtable’s regex syntax requires two \\.
    • When you have something contained in square brackets, it will generally mean to look for a collection of characters that you have inside of the brackets.
      • e.g. [a-z] will look for lowercase characters in the range of the entire English alphabet.
        Similarly, [bsw] will look for any character as long as it is either b, s, or w.
    • Now, in regex, the carrot (^) denotes when you don’t want something.
      • e.g. [^atn] will match any character that is anything except for a, t, and n.

What We’ve Extracted So Far:#T


  1. +
    • This will take the token you put it next to, and repeat its behavior. (More context below). In this case, it will repeat the behavior of the [^\\s].
    • This is called a quantifier. There are a few types, but this one will only return a match if it returns one or more times. This is distinct from ? which indicates zero or one, and * which indicates zero or more times.
      • e.g. The pattern \w? will look for a single letter, assuming it even exists.
        The pattern \w* will look for a single letter, and if it finds one, it will match everything next to it until it runs into something that is not a letter. (We call this a greedy quantifier, as it will match as many things as it can consecutively).

What We’ve Extracted So Far:#Tom


Final Solution

You’ll want to transform the string that the formula returns to you to remove the hashtag.

This formula should do it:


IF(
    {Transaction Name},
    SUBSTITUTE(
        REGEX_EXTRACT(
            {Transaction Name},
            “#[^\s]+”
        ),
        "#",
        ""
    )
)

This formula is built with this control flow in mind:

  1. If the {Transaction Name} field has a value:
    • Return the regex extract from the field, then
    • Take the “#” character, and remove it from the extracted string.
  2. If the {Transaction Name} field has no value:
    • Do nothing. (Will return blank)
Tristan-ARG
6 - Interface Innovator
6 - Interface Innovator

Wow!! this is amazing, thanks for taking time to write such a detailed and well organized reply!. It worked like a charm!. I actually was trying to understand the syntaxis and the only place where I actually understood it was in this post hehe. Thanks again!!!