Re: Extract every first letter from a strings with regex without a loop

Kim_Trager1 · ‎Aug 09, 2021

I’m trying to make shortenings of company names using a regular expression, by taking taking the first letter every part of the name.
Fx ‘Great Garden Centre’ → GGC
I’m not sure if this is possible without a loop, but I’m also no buff at using regex.

I’ve sofar I can only extract the first letter using this:

REGEX_EXTRACT(Company, "\\b[a-zA-Z]")

Is this possible without looping through every word?

Martin_Kopischk · ‎Aug 11, 2021

An even simpler formula would be

REGEX_REPLACE(Notes, "\\B\\w+\\W*", "")

This will delete all word characters that are not at the start of a word up to, and including, eventual subsequent non-word characters. “Great Garden Company” becomes “GGC” and “Not so great Company” becomes “NsgC”.

Note that, as of this writing, Airtable regular expressions are not Unicode aware, so that letters outside the English Latin subset will be considered non-word characters and thus will not be abbreviated.

See Solution in Thread

Ela_Automatela_ · ‎Aug 10, 2021

Hi there! What you did will only extract the letter G from the example you’ve provided. It’s because according to documentation REGEX_EXTRACT returns only the first occurence that matches your regex. I thought it’s not possible but I’ve played a while and I was able to achieve it by below formula:

SUBSTITUTE(REGEX_REPLACE(Notes, "[a-z]", ""), " ", "")

So with REGEX_REPLACE I’m replacing every small letter with empty string first. This doesn’t seem to work as expected giving me “G G C” with spaces between letters. For that I’ve wrapped it with SUBSTITUTE that replaces spaces (" “) with nothing (”") so you get “GGC”.

Kim_Trager1 · ‎Aug 11, 2021

Ahhh thats quite a nice way around it, but I assume I’ll need always to write first letters in caps. But guess it’ll have to do. Thank you for your help and clear explanation.

Ela_Automatela_ · ‎Aug 11, 2021

If you don’t want to worry about capital letters or not, this should do it:

SUBSTITUTE(REGEX_REPLACE(Notes, "\\B[a-z]", ""), " ", "")

I’ve checked it and it seems to work. I’m not an expert in regex though, I just try and see what happens :grinning_face_with_big_eyes:

Martin_Kopischk · ‎Aug 11, 2021

An even simpler formula would be

REGEX_REPLACE(Notes, "\\B\\w+\\W*", "")

This will delete all word characters that are not at the start of a word up to, and including, eventual subsequent non-word characters. “Great Garden Company” becomes “GGC” and “Not so great Company” becomes “NsgC”.

Note that, as of this writing, Airtable regular expressions are not Unicode aware, so that letters outside the English Latin subset will be considered non-word characters and thus will not be abbreviated.

Kim_Trager1 · ‎Aug 12, 2021

That is just perfect - thank you @Martin_Kopischke

kuovonne · ‎Aug 12, 2021

This is a nice solution.

However, one of the problems with creating a regular expression like this is dealing with punctuation:

Annie’s Bagels
Pick 'n Save
'Tis the Season
company.com

A slight tweak will seal with the first case.

REGEX_REPLACE(Notes, "\\B[\\w']+\\W*", "")

But there are so many cases where you just can’t predict punctuation that is is very hard to have a robust regular expression to deal with all of these situations.

Martin_Kopischk · ‎Aug 12, 2021

Absolutely, you are right on all counts. There is only so much you can achieve with a regex considering the bewildering range of possible inputs, including incorrect forms (case in point: your regex will actually not match your first example the way you would expect it to, because that apostrophe is a typographically correct one while the regex only matches on ASCII single quotes), never mind the fact that any letter not accounted for by ASCII throws a spanner into the works, as mentioned. Personally, I’d recommend not relying on such an automatism at all and setting the abbreviation manually instead, but that is outside the scope of OP’s question, and I prefer to avoid answering “you shouldn’t be doing that” / “why would you want to do that” when I can :winking_face: .

Kim_Trager1 · ‎Aug 13, 2021

For whatever reason I always dip headless into the automation bucket and completely disregard that sometimes the best way is just do it manually.

Martin_Kopischk · ‎Aug 13, 2021

One of the nicer aspects of Airtable is that you can actually combine both approaches for an existing data set. Define a formula column using a variant of my regex to get a first raw set of abbreviations, then convert that column into text. Airtable will keep the calculated values while converting, which means you can now fine tune the existing abbreviations, and any new entry will then add their abbreviation manually.

Or, if you prefer, you can keep the calculated column and add an override text column where users can manually tune (or deviate from) it, then base all display and operations on a third formula column that selects the override value if present, or the automatic one if not (I do that in some scenarios where the automatic value is right in at least 75 % of the cases). It all depends on factors like existing data set, frequency of new inputs, qualification of users inputting data etc.