Help

Regex extract pattern

Topic Labels: Formulas
Solved
Jump to Solution
1570 2
cancel
Showing results for 
Search instead for 
Did you mean: 
Ricardo
7 - App Architect
7 - App Architect

Depsite @Justin_Barrett’s help last time, I still cannot figure out simple regex patterns.

This time, I’m combining in a roll-up year and value and am trying to extract the value based on a date.

IF(
  AND(Date, Values),
  REGEX_EXTRACT(
    Values,
    "(?:" & DATETIME_FORMAT(Date,"YYYY") & " )([^,])"))

For example
Values: 2023 530,2022 480,
Date: 2022
Result: 480

Preferably each string would be Year:Value,Year:Value… but I was afraid the double dot may me some reserved character and dropped it.

1 Solution

Accepted Solutions
Justin_Barrett
18 - Pluto
18 - Pluto

Your formula is pretty close. If you’re looking to extract the “480” from the example, you would need to add an asterisk in your second group, like this:

IF(
  AND(Date, Values),
  REGEX_EXTRACT(
    Values,
    "(?:" & DATETIME_FORMAT(Date,"YYYY") & " )([^,]*)"
  )
)

Here’s why that’s important. The original tokens you used—[^,]—are perfect for extracting a single character that’s not a comma, but that would only extract a single character. Adding the asterisk after the closing square brace effectively says to repeat the previous token as many times as possible; i.e. capture as many non-comma characters as you can. Once it hits the comma, it’ll stop.

The colon (double dot) character is used as part of some tokens in regular expressions—like the ?: prefix in a group to indicate that the group contents should be matched but not captured—but any reserved token can be used safely as long as it is escaped. To escape any token, add a backslash immediately before it. (In some contexts this would just be a single backslash, but because the backslash itself is also used to escape characters in strings, this means that you need two backslashes when building a regular expression string in Airtable: the first as the escape character in the string itself, and the second being the backslash that becomes an escape character in the regular expression.)

Yeah, it’s a little quirky, but it’s still my preferred way to do complex string parsing in Airtable.

See Solution in Thread

2 Replies 2

Hi @Ricardo, I think you don’t need Regex for that (Airtable’s implementation of Regex is questionable anyways, no flags for instance). I’ve written a formula that extracts the values as well. You would just have to add the entire block of code behind the & for every other year besides 2022 and 2023.

MID(String, FIND("2023", String) + 4,(SEARCH(",",String,FIND("2023", String)) - (FIND("2023", String) + 4)))& " "&MID(String, FIND("2022", String) + 4,(SEARCH(",",String,FIND("2022", String)) - (FIND("2022", String) + 4)))

In this example, “String” is the name of the field with the following content: “2023 530,2022 480,”

Let me know if you have any questions!

PS: This is now separated by a whitespace, but you can of course also comma-separate those values if you want

Justin_Barrett
18 - Pluto
18 - Pluto

Your formula is pretty close. If you’re looking to extract the “480” from the example, you would need to add an asterisk in your second group, like this:

IF(
  AND(Date, Values),
  REGEX_EXTRACT(
    Values,
    "(?:" & DATETIME_FORMAT(Date,"YYYY") & " )([^,]*)"
  )
)

Here’s why that’s important. The original tokens you used—[^,]—are perfect for extracting a single character that’s not a comma, but that would only extract a single character. Adding the asterisk after the closing square brace effectively says to repeat the previous token as many times as possible; i.e. capture as many non-comma characters as you can. Once it hits the comma, it’ll stop.

The colon (double dot) character is used as part of some tokens in regular expressions—like the ?: prefix in a group to indicate that the group contents should be matched but not captured—but any reserved token can be used safely as long as it is escaped. To escape any token, add a backslash immediately before it. (In some contexts this would just be a single backslash, but because the backslash itself is also used to escape characters in strings, this means that you need two backslashes when building a regular expression string in Airtable: the first as the escape character in the string itself, and the second being the backslash that becomes an escape character in the regular expression.)

Yeah, it’s a little quirky, but it’s still my preferred way to do complex string parsing in Airtable.