Extract every first letter from a strings with regex without a loop

I’m trying to make shortenings of company names using a regular expression, by taking taking the first letter every part of the name.
Fx ‘Great Garden Centre’ → GGC
I’m not sure if this is possible without a loop, but I’m also no buff at using regex.

I’ve sofar I can only extract the first letter using this:

REGEX_EXTRACT(Company, "\\b[a-zA-Z]")

Is this possible without looping through every word?

Hi there! What you did will only extract the letter G from the example you’ve provided. It’s because according to documentation REGEX_EXTRACT returns only the first occurence that matches your regex. I thought it’s not possible but I’ve played a while and I was able to achieve it by below formula:

SUBSTITUTE(REGEX_REPLACE(Notes, "[a-z]", ""), " ", "")

So with REGEX_REPLACE I’m replacing every small letter with empty string first. This doesn’t seem to work as expected giving me “G G C” with spaces between letters. For that I’ve wrapped it with SUBSTITUTE that replaces spaces (" “) with nothing (”") so you get “GGC”.

2 Likes

Ahhh thats quite a nice way around it, but I assume I’ll need always to write first letters in caps. But guess it’ll have to do. Thank you for your help and clear explanation.

1 Like

If you don’t want to worry about capital letters or not, this should do it:

SUBSTITUTE(REGEX_REPLACE(Notes, "\\B[a-z]", ""), " ", "")

I’ve checked it and it seems to work. I’m not an expert in regex though, I just try and see what happens :smiley:

1 Like

An even simpler formula would be

REGEX_REPLACE(Notes, "\\B\\w+\\W*", "")

This will delete all word characters that are not at the start of a word up to, and including, eventual subsequent non-word characters. “Great Garden Company” becomes “GGC” and “Not so great Company” becomes “NsgC”.

Note that, as of this writing, Airtable regular expressions are not Unicode aware, so that letters outside the English Latin subset will be considered non-word characters and thus will not be abbreviated.

4 Likes

That is just perfect - thank you @Martin_Kopischke

This is a nice solution.

However, one of the problems with creating a regular expression like this is dealing with punctuation:

  • Annie’s Bagels
  • Pick 'n Save
  • 'Tis the Season
  • company.com

A slight tweak will seal with the first case.

REGEX_REPLACE(Notes, "\\B[\\w']+\\W*", "")

But there are so many cases where you just can’t predict punctuation that is is very hard to have a robust regular expression to deal with all of these situations.

3 Likes

Absolutely, you are right on all counts. There is only so much you can achieve with a regex considering the bewildering range of possible inputs, including incorrect forms (case in point: your regex will actually not match your first example the way you would expect it to, because that apostrophe is a typographically correct one while the regex only matches on ASCII single quotes), never mind the fact that any letter not accounted for by ASCII throws a spanner into the works, as mentioned. Personally, I’d recommend not relying on such an automatism at all and setting the abbreviation manually instead, but that is outside the scope of OP’s question, and I prefer to avoid answering “you shouldn’t be doing that” / “why would you want to do that” when I can :wink:.

3 Likes

For whatever reason I always dip headless into the automation bucket and completely disregard that sometimes the best way is just do it manually.

1 Like

One of the nicer aspects of Airtable is that you can actually combine both approaches for an existing data set. Define a formula column using a variant of my regex to get a first raw set of abbreviations, then convert that column into text. Airtable will keep the calculated values while converting, which means you can now fine tune the existing abbreviations, and any new entry will then add their abbreviation manually.

Or, if you prefer, you can keep the calculated column and add an override text column where users can manually tune (or deviate from) it, then base all display and operations on a third formula column that selects the override value if present, or the automatic one if not (I do that in some scenarios where the automatic value is right in at least 75 % of the cases). It all depends on factors like existing data set, frequency of new inputs, qualification of users inputting data etc.

1 Like

On a more personal note, I wouldn’t call this “headless”: the allure of automatisms is great, and when they work, they feel like magic. It’s when you start hitting the edge cases you will come to the insight that doing the work, albeit annoying, is often superior to magic that goes wrong. At least that’s what happened to me: I have rolled back most automatisms in our system over time, sometimes falling back to the override pattern I described, but most often just saying “f**k it, not worth the hassle” (disclaimer: we have a very small, specialised team doing data entry and the volume of new inputs at any point in time is small enough for me to keep tabs on inconsistencies and lags in the data; YMMV).

2 Likes

This topic was solved and automatically closed 3 days after the last reply. New replies are no longer allowed.