Formula to break a string of text onto 2 separate strings of text

J_B_S · ‎Mar 11, 2019

I have a string of text that I want to break into two separate strings of text.
Here’s an example:

This is an example string of text / I want to break it into two separate strings of text

I want all of the words that precede the “/” to be one string of text. And then I want all of the words that come after the “/” to be a separate string of text. I want to completely eliminate the “/”.

So, to recap, we start with this:

This is an example string of text / I want to break it into two separate parts

And then we end-up with this:

This is an example string of text
I want to break it into two separate parts

Rodrigo_Flores · ‎Dec 04, 2021

This is working so good with First and Last Name, but, i’m wondering… how about the “Middle2” in your table. Im on a Latinamerican country and We Usually name like this (example name):
Carlos Roberto Flores González

How can i extract “Flores” from it?

Thanks in advance.

Justin_Barrett · ‎Dec 04, 2021

Welcome to the community, @Rodrigo_Flores! :grinning_face_with_big_eyes: This will extract the next-to-last item from a string (using spaces for separation):

IF(Name, REGEX_EXTRACT(Name, "([^ ]+) ([^ ]+)$"))

Rodrigo_Flores · ‎Dec 04, 2021

Wow, it worked! :winking_face: many thanks Justin. Just to be aware, how can i study the logic of those functions “([^ ]+) ([^ ]+)$”, or, what’s the logic behind that… :smiling_face_with_sunglasses: you rock!

thanks!

Justin_Barrett · ‎Dec 04, 2021

First off, the $ at the end anchors the pattern to the end of the string. Without that, the regex engine would search for a match at the start of the string.

Parentheses are used to define search groups. In the expression that I wrote, there are two matching groups separated by a space, meaning that both groups and the separating space would need to be matched for anything to be returned.

Square braces are wrapped around collections of individual characters that should be matched. However, the caret ^ character at the front tells the regex engine to not match any of the following characters. In this case, we want to not match a space.

The + token after the square brace collection says to match the preceding token—any non-space character—one or more times.

Taken altogether, this collection—([^ ]+)—will match any non-space character one or more times, effectively selecting a single word. (Note: There is a separate \w token for specifically selecting “word” characters, but it’s very narrowly-focused in the characters that it matches. The version that I’m using is more broad, literally matches anything that’s not a space.)

Taking everything into account, the full expression says, “Find the pattern WORD-SPACE-WORD at the end of the string.”

With that in mind, we now hit an interesting bit of idiosyncratic behavior: Airtable’s regex implementation only returns the contents of the first defined group. So even though we’re telling the regex engine to find two words separated by a space, only the first word is actually returned, giving us the second-to-last word in the string.

To learn more, I recommend bookmarking a site like regex101.com (where you’ll need to select the Golang “flavor” to most-closely match the regex engine that Airtable uses), and just playing around.

Rodrigo_Flores · ‎Dec 08, 2021

Thank you Justin, do me a favor… take care, you’re awesome! :relaxed:

ACRA_Data · ‎Mar 10, 2022

This conversation seems related to my current issue, but I’m looking for something that splits text out to columns from a comma separated list of kids’ first names in one field. Sometimes there are no kids sometime one, two, and even one instance of 7 kids. So I want to have at least seven new fields populated with the text between commas: Child FN 1, Child FN 2, Child FN 3, Child FN 4, Child FN 5, Child FN 6, Child FN 7. Any suggestions?

ACRA_Data · ‎Mar 11, 2022

@Justin_Barrett, I’m curious if you had some suggestions. Thanks.

Justin_Barrett · ‎Mar 11, 2022

Welcome to the community, @ACRA_Data! :grinning_face_with_big_eyes: This can be solved using a variation of the multi-line extraction technique from above.

First, add a {Child Count} field. The method above uses a formula to count lines, but because the children in your base are in linked records, you can use a count field that counts the links.

For {Child 1}, the formula could be this:

IF({Child Count}, REGEX_EXTRACT({Children First Names} & "", "[^,]*"))

{Child 2} would be this:

IF({Child Count} > 1, REGEX_EXTRACT({Children First Names} & "", "(?:[^,]*,)([^,]*)"))

{Child 3} would be this:

IF({Child Count} > 2, REGEX_EXTRACT({Children First Names} & "", "(?:[^,]*,){2}([^,]*)"))

The formulas for {Child 4} through {Child 7} would continue the pattern begun in {Child 3}. With each successive formula, you would increase the following numbers by one:

The number in the comparison against {Child Count}
The number in curly brackets in the middle of the regular expression

ACRA_Data · ‎Mar 13, 2022

Thanks @Justin_Barrett! This is super helpful!

Riddhi_Mehta-Ne · ‎Apr 01, 2022

Hello All,
I’m new to this community and appreciate all the discussion thus far. But I don’t see a solution to my particular problem quite yet.

I’m trying to do something similar to @ACRA_Data, but rather than having the first names in separate columns, I would like them to be in the same column as a multiple select field, rather than a single text field. I think I need to change something in the REGEX_EXTRACT function, but can’t figure it out.

The column on the left in the image below shows you what I have, and the column on the right is what I am working towards. The column on the left can have words separated by commas ranging from 1 to 10. Any help is very much appreciated.