Use Regex to extract info form text


#1

First Airtable. is. awesome. I which I knew about it before to help organize my work.
Something I feel is missing though is a more powerful approach to deal with string-based functions.
For instance, it would be great to be able to use RegEx in the FIND or SUBSTITUTE functions.
A use case, e.g., is the extraction of part of a URL to use it as a primary key. It would allow me to name automatically a bunch of website links I wanna store.

I.e., there’s a REGEXEXTRACT function in Google Sheet, this is extremely useful for numerous of applications.


#2

@Marc_Evrard Thanks for mentioning this Google Sheet formula. For some reason I had never seen that before. I do think that the things you mention can be realized by using FIND + LEFT/MID/RIGHT + LEN and so on. I’m kinda curious to see some of the cases.


#3

Hi Andre,
Thanks for the suggestion.
For instance the function =REGEXEXTRACT(<url text>, ".*\://?([^\/]+)") in Google Sheet would extract the sub-domain and domain names of a given URL. I’m not sure how I could produce such a concise solution using a bunch of FIND, LEFT, MID, RIGHT and LEN functions. RegEx may not be particularly readable, but they are extremely powerful.
Best


#4

Just a quick follow up on this, I did solve the problem with the following formula:

MID(URL, 
    FIND("//", URL) + 2, 
    FIND("/", 
         URL, 
         FIND("//", URL) + 2
        ) - (FIND("//", URL) + 2)
   )

Some people might actually find it more readable that the regex above. Though, for instance, this problem was a rather simple one when compared to all the possibilities offered by regular expressions (some examples).
By the way, if you think about any better way to proceed with this, feel free to comment further.