Re: How to find text in text string

ARTHUR_BENNETT · ‎Dec 11, 2017

What formula can I run against a text field that will display all text characters up to a “,”?

Example: TEXT1, TEXT2

Result: TEXT1

Justin_Barrett · ‎Sep 21, 2021

Correct! :partying_face:

That is a regular expression. Regular expressions are wonderful for working on strings based on patterns. They can take some time to understand, which is why I recommend using a site like regex101.com to test regular expressions before using them in Airtable (be sure to pick the “Golang” option on the left if you use that site; that’s the flavor closest to what Airtable’s regex interpreter uses).

Let’s break it down into a few basic concepts first…

Parentheses are used to define groups. In that expression above, there are three defined groups. Sometimes you can locate things without grouping them first, but in this situation, using groups makes isolating specific things a lot easier.

Whenever a group begins with ?:, that means that everything else in the group should be located, but not returned. In other words, that group is just there as a reference to help find other items.

When you see a number in curly braces—e.g. {2}—that means to match the preceding item (or group, as in this case) that many times. You can also use multiple numbers to make the same match a range of times. For example, {2,5} indicates to match the preceding item at least 2 times, but not more than 5.
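To see the curly-brace counts in action, here is a small sketch. It uses Python's `re` module as a stand-in (an assumption on my part; Airtable's interpreter is closer to Golang's, but these particular constructs should behave the same way). The helper name `first_match` is made up for illustration.

```python
import re

# (?:ab){2} -> the group "ab" repeated exactly two times
exactly_two = re.compile(r"(?:ab){2}")
# \d{2,5}   -> a digit repeated at least 2 and at most 5 times
two_to_five = re.compile(r"\d{2,5}")

def first_match(pattern, text):
    """Return the first substring matching pattern, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(first_match(exactly_two, "ababab"))   # abab
print(first_match(two_to_five, "1234567"))  # 12345
print(first_match(two_to_five, "7"))        # None
```

Note that the range form is greedy: given seven digits it takes five, the most it is allowed.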

Now let’s dive into the first group. Temporarily stripping away the parentheses and the “ignore me” prefix, we have this:

[^,]*, *

Square brackets indicate that any one of the contained characters should be matched. However, this character collection begins with a caret ^ symbol, meaning that any character except those that follow the caret should be matched. In short, match any character that’s not a comma.

The asterisk * says to match the preceding item zero or more times. When combined with the square bracket collection, this says to look for zero or more characters that are not commas.
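As a quick illustration, the "zero or more non-comma characters" piece on its own is enough to pull out everything before the first comma, which is what the original question asked for. This is a sketch in Python's `re` (used here as a stand-in, not Airtable's own engine), and `leading_non_commas` is a hypothetical helper name.

```python
import re

# [^,]* -> zero or more characters that are not a comma
NOT_COMMA = re.compile(r"[^,]*")

def leading_non_commas(text):
    """Return the run of non-comma characters at the start of text."""
    # match() starts at the beginning of the string; because * allows
    # zero characters, the pattern always matches (possibly with "").
    return NOT_COMMA.match(text).group(0)

print(leading_non_commas("TEXT1, TEXT2"))  # TEXT1
print(repr(leading_non_commas(",TEXT2")))  # ''
```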

Then there’s a literal comma, which matches that single character. That’s followed by a space and another asterisk, meaning to match a space zero or more times.

The whole first group then translates into this: match—but don’t collect—any quantity of non-comma characters that are followed by a literal comma and zero or more spaces. When combined with the number in curly braces after it, it says to find that grouping exactly two times.
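Here is a rough sketch of that first group plus its {2} count, run by itself so you can see exactly what it consumes. Again this is Python's `re` standing in for Airtable's interpreter, and `skipped_prefix` is just an illustrative name.

```python
import re

# (?:[^,]*, *){2} -> two stretches of "non-commas, comma, optional spaces",
# matched but not captured
SKIP_TWO = re.compile(r"(?:[^,]*, *){2}")

def skipped_prefix(text):
    """Return the text consumed by the first two comma-terminated items."""
    m = SKIP_TWO.match(text)
    return m.group(0) if m else None

print(repr(skipped_prefix("one, two, three, four")))  # 'one, two, '
print(skipped_prefix("one, two"))                     # None (only one comma)
print(SKIP_TWO.groups)                                # 0 capturing groups
```

The last line shows that the ?: prefix really does keep the group from being collected: the compiled pattern reports no capturing groups at all.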

The next group— ([^,]*) — should be easier to understand now: match any quantity of non-comma characters. Because this group doesn’t have the ?: prefix, its contents will be captured (i.e. extracted).

The final group is similar to the first. It’s a non-capturing group that matches zero or more of any character (the period represents any single character).

To sum it up, that expression finds—but doesn’t capture—the first two stretches of text ending in commas (and, optionally, spaces), collects the third item up to—but not including—the comma after it, and then ignores everything else.
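Putting the three groups together, the whole thing can be tried out as below. The pattern here is reassembled from the pieces described in this post, so treat it as my assumption of what the full expression looks like rather than a verbatim copy. Python's `re` is a stand-in for Airtable's engine, and `third_item` is a hypothetical helper.

```python
import re

# Assumed full expression, rebuilt from the three groups described above:
#   (?:[^,]*, *){2}  skip the first two items (not captured)
#   ([^,]*)          capture the third item
#   (?:.*)           match, but ignore, whatever is left
THIRD_ITEM = re.compile(r"(?:[^,]*, *){2}([^,]*)(?:.*)")

def third_item(text):
    """Return the third comma-separated item, or None if there are fewer."""
    m = THIRD_ITEM.match(text)
    return m.group(1) if m else None

print(third_item("one, two, three, four"))  # three
print(third_item("one,two,three"))          # three
print(third_item("one, two"))               # None
```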

As you’ve correctly guessed, modifying it to capture any Nth item is just a matter of changing the number in the curly braces to N - 1.
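The N - 1 rule can be sketched as a small function that builds the pattern for any position. This assumes the same reassembled expression as described above, uses Python's `re` rather than Airtable's interpreter, and `nth_item` is a made-up name for illustration.

```python
import re

def nth_item(text, n):
    """Return the n-th (1-based) comma-separated item of text, or None."""
    if n < 1:
        raise ValueError("n must be 1 or greater")
    # Skip n - 1 comma-terminated items, then capture the next one.
    pattern = r"(?:[^,]*, *){%d}([^,]*)(?:.*)" % (n - 1)
    m = re.match(pattern, text)
    return m.group(1) if m else None

print(nth_item("a, b, c, d", 1))  # a
print(nth_item("a, b, c, d", 4))  # d
print(nth_item("a, b", 3))        # None
```

For N = 1 the count becomes {0}, so nothing is skipped and the capture starts at the very beginning of the string.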

Frank_Reagan · ‎Sep 22, 2021

Thank you so much @Justin_Barrett! This worked and is SO much easier and more efficient than my workaround :grinning_face_with_big_eyes: I appreciate your explanation and resources for REGEX and I am looking forward to learning & practicing it more!

Alexey_Gusev · ‎Sep 23, 2021

Additional thanks for that info. Regular expressions are something new for me, and when I tried to write my own, it passed the test on the site but refused to work here. Even when I tried changing random options, I never expected that it should be Golang.