Hey @Tristan-ARG!
The regex behavior in Airtable’s formulas is disappointing, and if you’re trying to write more in-depth regex patterns, you will quickly run up against issues.
If you want to skip straight to the fixed formula and pattern, here’s a shortcut lol.
If you want a bit more of a walkthrough on how your pattern is behaving, there’s a bit below kinda explaining it.
How Your Regex Pattern Is Behaving
The first thing to note, is that your regex formula is actually working as intended.
So, # ^\s]+
is your pattern.
Let’s break it down.
Let’s say we have this string that we want to extract from:
Hello! I want to circulate a hashtag such as #Tom!
#
This will literally match the actual character #.
- This is why your extract is pulling in the hashtag.
What We’ve Extracted So Far: “#”
^\\s]
This will match any character only if it is not a whitespace character.
- A quick side note here. Airtable’s regex syntax requires two
\\
.
- When you have something contained in square brackets, it will generally mean to look for a collection of characters that you have inside of the brackets.
- e.g.
Similarly, rbsw] will look for any character as long as it is either b, s, or w.
- Now, in regex, the carrot (
^
) denotes when you don’t want something.
- e.g. >^atn] will match any character that is anything except for a, t, and n.
What We’ve Extracted So Far: “#T”
+
- This will take the token you put it next to, and repeat its behavior. (More context below). In this case, it will repeat the behavior of the
t^\\s]
.
- This is called a quantifier. There are a few types, but this one will only return a match if it returns one or more times. This is distinct from
?
which indicates zero or one, and *
which indicates zero or more times.
- e.g. The pattern
\w?
will look for a single letter, assuming it even exists.
The pattern \w*
will look for a single letter, and if it finds one, it will match everything next to it until it runs into something that is not a letter. (We call this a greedy quantifier, as it will match as many things as it can consecutively).
What We’ve Extracted So Far: “#Tom”
Final Solution
You’ll want to transform the string that the formula returns to you to remove the hashtag.
This formula should do it:
IF(
{Transaction Name},
SUBSTITUTE(
REGEX_EXTRACT(
{Transaction Name},
“#n^\s]+”
),
"#",
""
)
)
This formula is built with this control flow in mind:
- If the
{Transaction Name}
field has a value:
- Return the regex extract from the field, then…
- Take the “#” character, and remove it from the extracted string.
- If the
{Transaction Name}
field has no value:
- Do nothing. (Will return blank)
Wow!! this is amazing, thanks for taking time to write such a detailed and well organized reply!. It worked like a charm!. I actually was trying to understand the syntaxis and the only place where I actually understood it was in this post hehe. Thanks again!!!