Re: Extract every first letter from a strings with regex without a loop

Kim_Trager1 · ‎Aug 09, 2021

I’m trying to make shortenings of company names using a regular expression, by taking taking the first letter every part of the name.
Fx ‘Great Garden Centre’ → GGC
I’m not sure if this is possible without a loop, but I’m also no buff at using regex.

I’ve sofar I can only extract the first letter using this:

REGEX_EXTRACT(Company, "\\b[a-zA-Z]")

Is this possible without looping through every word?

Martin_Kopischk · ‎Aug 13, 2021

On a more personal note, I wouldn’t call this “headless”: the allure of automatisms is great, and when they work, they feel like magic. It’s when you start hitting the edge cases you will come to the insight that doing the work, albeit annoying, is often superior to magic that goes wrong. At least that’s what happened to me: I have rolled back most automatisms in our system over time, sometimes falling back to the override pattern I described, but most often just saying “f**k it, not worth the hassle” (disclaimer: we have a very small, specialised team doing data entry and the volume of new inputs at any point in time is small enough for me to keep tabs on inconsistencies and lags in the data; YMMV).

Steady_Admin · ‎Mar 06, 2023

This is working really well. Do you have any advice on how to extract the first TWO characters from each word? Having trouble finding documentation online about this. Thanks!

Zach_Young · ‎Mar 06, 2024

Like @Martin_Kopischk pointed out, Airtable regex doesn't support Unicode categories in its character classes: both `\bB` and `\wW` only consider ASCII characters. Being able to specify something like `\p{Uppercase_Letter}` might be very handy.

Airtable regex does support grouping and group references in the substitution. With that you can specify more generally what you want to keep instead of trying to specify what you want to get rid of.

(Oh, very basic regex primer: the period (dot) means "any character".)

REGEX_REPLACE(Field, '(.).*?\s', '$1')

Means find the first character and capture (remember) it:

(.)

find any remaining characters up to the first space char (could be space, tab, or any line breaking char)

.*?\s

then, replace all that with the single captured char

'$1'

Because only the first character was captured, the following characters for each word are removed when substituting in the captured char.

For a string like 'Great Garden Centre' that yields only 'GG' because 'Centre' doesn't have a space after it. That can easily be addressed by just tacking on a space to the end of the field in the formula:

REGEX_REPLACE(Field&' ', '(.).*?\s', '$1')

@Steady_Admin, this can be adapted for your need by just extending the capture group to include the first two chars:

REGEX_REPLACE(Field, '(..).*?\s', '$1')

I made a comparison of the various proposals, here.

Zach_Young · ‎Mar 29, 2024

foo

foo
bar

bar