Help extracting a string from a rollup (FIND,MID,LEFT,RIGHT)

Ricardo · ‎Jun 28, 2022

Hello community, my record has a date and looks up a roll-up like the below example. I’m trying to figure out the right formula to extract the string below:

Rollup: …5. 04.05 Vila Falo,6. 05.05 Alvero,8. 07.05 Gjirokastra…
Record date: 05.05
Current formula: MID({Rollup}&“”,FIND(DATETIME_FORMAT(Date,‘DD.MM’),{Rollup}&“”)+5,XXX)
Output: Alvero,8. 07.05 Gjirokastra…
Desired output: Alvero

XXX is the part where my monkey brain fails to come up with a solution to determine the length of the string to be extracted. Your advice would be appreciated. :slightly_smiling_face:

Justin_Barrett · ‎Jun 28, 2022

This is where regular expressions come in handy. Regular expressions excel at finding patterns, which is ideal for cases where the length will vary, but you have consistent patterns that you can use to determine where to stop. In this case, you would look for the date matching the record date, then extract all characters until you get to a comma. This worked for me:

IF(
  AND(Date, Rollup),
  REGEX_EXTRACT(
    Rollup,
    "(?:" & SUBSTITUTE(DATETIME_FORMAT(Date,"DD.MM"), ".", "\\.") & " )([^,]*)")
)

(NOTE: I used a single line text field for the “Rollup” field, but it should still work with an actual rollup field as long as the output is the same as you outlined above)

Here’s how this breaks down…

At the start of the regular expression, we first want to find—but not actually extract—the matching date. The syntax for this find-but-ignore process looks like this: (?:XXX), where XXX is the text/pattern you want to find. In this case we want the date followed by a single space. However the “DD.MM” date format that you’re using includes a reserved REGEX token: the period. In REGEX, a period normally matches any single character, but we want to actually match a period. To do this, we need to escape the period by preceding it with a single backward slash. To actually include a backward slash, though, we need another backward slash in front of it.

To mark the group that we actually want to both match and extract, we surround it with parentheses. Inside those parentheses we need to tell the REGEX parser to match as many characters as possible up to—but not including—the next comma. Individual characters are matched by putting them inside square braces. To match anything except those listed characters, you include the caret symbol— ^ —before the rest. To then indicate that the match should be made as many times as possible, you add an asterisk after the square braces. All of this explains this combo at the end of the regular expression: ([^,]*) In short, extract a group that matches any non-comma character zero or more times. Once it hits a comma, it stops.

When it’s all finished, the expression matches—but doesn’t capture—the record date followed by a single space, then actually captures as many non-comma characters as possible.

See Solution in Thread

Mohamed_Swella1 · ‎Jun 28, 2022

Hi @Ricardo ,

Im sure someone else can come up with an easier formula, but this one should work

LEFT(RIGHT(Rollup,{formula copy copy}),FIND(“,”,Rollup,FIND(DATETIME_FORMAT(Date,‘DD.MM’),{Rollup}&“”)+5)-(FIND(DATETIME_FORMAT(Date,‘DD.MM’),{Rollup}&“”)+5))

Justin_Barrett · ‎Jun 28, 2022

This is where regular expressions come in handy. Regular expressions excel at finding patterns, which is ideal for cases where the length will vary, but you have consistent patterns that you can use to determine where to stop. In this case, you would look for the date matching the record date, then extract all characters until you get to a comma. This worked for me:

IF(
  AND(Date, Rollup),
  REGEX_EXTRACT(
    Rollup,
    "(?:" & SUBSTITUTE(DATETIME_FORMAT(Date,"DD.MM"), ".", "\\.") & " )([^,]*)")
)

(NOTE: I used a single line text field for the “Rollup” field, but it should still work with an actual rollup field as long as the output is the same as you outlined above)

Here’s how this breaks down…

At the start of the regular expression, we first want to find—but not actually extract—the matching date. The syntax for this find-but-ignore process looks like this: (?:XXX), where XXX is the text/pattern you want to find. In this case we want the date followed by a single space. However the “DD.MM” date format that you’re using includes a reserved REGEX token: the period. In REGEX, a period normally matches any single character, but we want to actually match a period. To do this, we need to escape the period by preceding it with a single backward slash. To actually include a backward slash, though, we need another backward slash in front of it.

To mark the group that we actually want to both match and extract, we surround it with parentheses. Inside those parentheses we need to tell the REGEX parser to match as many characters as possible up to—but not including—the next comma. Individual characters are matched by putting them inside square braces. To match anything except those listed characters, you include the caret symbol— ^ —before the rest. To then indicate that the match should be made as many times as possible, you add an asterisk after the square braces. All of this explains this combo at the end of the regular expression: ([^,]*) In short, extract a group that matches any non-comma character zero or more times. Once it hits a comma, it stops.

When it’s all finished, the expression matches—but doesn’t capture—the record date followed by a single space, then actually captures as many non-comma characters as possible.

Ricardo · ‎Jun 29, 2022

This is amazing! Thanks for explaining how it works, this is better than the original Airtable guide to REGEX functions. The syntax is quite complex and to write I need to look-up and process every possible expression and its meaning, but that’s very powerful. :slightly_smiling_face:

I used a single line text field for the “Rollup” field, but it should still work with an actual rollup field as long as the output is the same as you outlined above

I added an &“” to the rollup to turn the array into a string, I suppose ARRAYJOIN() would do the same.

Justin_Barrett · ‎Jun 29, 2022

Correct. I had assumed that you’d already converted the array into a string as part of the rollup field’s aggregation formula (the commas that show up by default if the aggregation formula is simply values are for display only; they’re not actually part of the field output). If your rollup aggregation formula is ARRAYJOIN(values, ","), then you won’t need to add the & "" to the formula when referencing the rollup field. Either way, the formula will work as long as the commas are actually there.