Help

The Community will be temporarily unavailable starting on Friday February 28. We’ll be back as soon as we can! To learn more, check out our Announcements blog post.

Extract text from a string - REGEX

Topic Labels: Formulas
Solved
Jump to Solution
22342 2
cancel
Showing results for 
Search instead for 
Did you mean: 
Tristan-ARG
6 - Interface Innovator
6 - Interface Innovator

Hi!, I am trying to use the REGEX formula, but I cannot use it propertly. I need to extract names from a string. To make it easier the names are written after a hashtag #. This are examples

#TOM write the essay of a client
Complete the packing list for #Peter

From an old post I managed to use this formula REGEX_EXTRACT({Transaction Name}, “#[^\s]+”))
written by Justin Barret

This formula kind of works, but is giving me the answer with the hasthag in front
#TOM
#Peter

I needed it without the hashtag. Any clue about how to do it?
I am getting lots of trouble to understand the sintaxis of the REGEX formulas.

Thanks!!

1 Solution

Accepted Solutions
Ben_Young1
11 - Venus
11 - Venus

Hey @Tristan-ARG!

The regex behavior in Airtable’s formulas is disappointing, and if you’re trying to write more in-depth regex patterns, you will quickly run up against issues.

If you want to skip straight to the fixed formula and pattern, here’s a shortcut lol.

If you want a bit more of a walkthrough on how your pattern is behaving, there’s a bit below kinda explaining it.

How Your Regex Pattern Is Behaving

The first thing to note, is that your regex formula is actually working as intended.

So, #[^\s]+ is your pattern.
Let’s break it down.

Let’s say we have this string that we want to extract from:
Hello! I want to circulate a hashtag such as #Tom!

  1. #
    • :star: This will literally match the actual character #.
    • This is why your extract is pulling in the hashtag.

What We’ve Extracted So Far: “#”


  1. [^\\s]
    • :star: This will match any character only if it is not a whitespace character.
    • A quick side note here. Airtable’s regex syntax requires two \\.
    • When you have something contained in square brackets, it will generally mean to look for a collection of characters that you have inside of the brackets.
      • e.g. [a-z] will look for lowercase characters in the range of the entire English alphabet.
        Similarly, [bsw] will look for any character as long as it is either b, s, or w.
    • Now, in regex, the carrot (^) denotes when you don’t want something.
      • e.g. [^atn] will match any character that is anything except for a, t, and n.

What We’ve Extracted So Far:#T


  1. +
    • This will take the token you put it next to, and repeat its behavior. (More context below). In this case, it will repeat the behavior of the [^\\s].
    • This is called a quantifier. There are a few types, but this one will only return a match if it returns one or more times. This is distinct from ? which indicates zero or one, and * which indicates zero or more times.
      • e.g. The pattern \w? will look for a single letter, assuming it even exists.
        The pattern \w* will look for a single letter, and if it finds one, it will match everything next to it until it runs into something that is not a letter. (We call this a greedy quantifier, as it will match as many things as it can consecutively).

What We’ve Extracted So Far:#Tom


Final Solution

You’ll want to transform the string that the formula returns to you to remove the hashtag.

This formula should do it:


IF(
    {Transaction Name},
    SUBSTITUTE(
        REGEX_EXTRACT(
            {Transaction Name},
            “#[^\s]+”
        ),
        "#",
        ""
    )
)

This formula is built with this control flow in mind:

  1. If the {Transaction Name} field has a value:
    • Return the regex extract from the field, then
    • Take the “#” character, and remove it from the extracted string.
  2. If the {Transaction Name} field has no value:
    • Do nothing. (Will return blank)

See Solution in Thread

2 Replies 2
Ben_Young1
11 - Venus
11 - Venus

Hey @Tristan-ARG!

The regex behavior in Airtable’s formulas is disappointing, and if you’re trying to write more in-depth regex patterns, you will quickly run up against issues.

If you want to skip straight to the fixed formula and pattern, here’s a shortcut lol.

If you want a bit more of a walkthrough on how your pattern is behaving, there’s a bit below kinda explaining it.

How Your Regex Pattern Is Behaving

The first thing to note, is that your regex formula is actually working as intended.

So, #[^\s]+ is your pattern.
Let’s break it down.

Let’s say we have this string that we want to extract from:
Hello! I want to circulate a hashtag such as #Tom!

  1. #
    • :star: This will literally match the actual character #.
    • This is why your extract is pulling in the hashtag.

What We’ve Extracted So Far: “#”


  1. [^\\s]
    • :star: This will match any character only if it is not a whitespace character.
    • A quick side note here. Airtable’s regex syntax requires two \\.
    • When you have something contained in square brackets, it will generally mean to look for a collection of characters that you have inside of the brackets.
      • e.g. [a-z] will look for lowercase characters in the range of the entire English alphabet.
        Similarly, [bsw] will look for any character as long as it is either b, s, or w.
    • Now, in regex, the carrot (^) denotes when you don’t want something.
      • e.g. [^atn] will match any character that is anything except for a, t, and n.

What We’ve Extracted So Far:#T


  1. +
    • This will take the token you put it next to, and repeat its behavior. (More context below). In this case, it will repeat the behavior of the [^\\s].
    • This is called a quantifier. There are a few types, but this one will only return a match if it returns one or more times. This is distinct from ? which indicates zero or one, and * which indicates zero or more times.
      • e.g. The pattern \w? will look for a single letter, assuming it even exists.
        The pattern \w* will look for a single letter, and if it finds one, it will match everything next to it until it runs into something that is not a letter. (We call this a greedy quantifier, as it will match as many things as it can consecutively).

What We’ve Extracted So Far:#Tom


Final Solution

You’ll want to transform the string that the formula returns to you to remove the hashtag.

This formula should do it:


IF(
    {Transaction Name},
    SUBSTITUTE(
        REGEX_EXTRACT(
            {Transaction Name},
            “#[^\s]+”
        ),
        "#",
        ""
    )
)

This formula is built with this control flow in mind:

  1. If the {Transaction Name} field has a value:
    • Return the regex extract from the field, then
    • Take the “#” character, and remove it from the extracted string.
  2. If the {Transaction Name} field has no value:
    • Do nothing. (Will return blank)
Tristan-ARG
6 - Interface Innovator
6 - Interface Innovator

Wow!! this is amazing, thanks for taking time to write such a detailed and well organized reply!. It worked like a charm!. I actually was trying to understand the syntaxis and the only place where I actually understood it was in this post hehe. Thanks again!!!