Help

Help composing REGEX formula to extract strings from URL

Topic Labels: Formulas
Solved
Jump to Solution
2950 5
cancel
Showing results for 
Search instead for 
Did you mean: 
Stephen_Piscura
5 - Automation Enthusiast
5 - Automation Enthusiast

Hi there, I recognize the power of regex expressions, but I’m just pathetic at composing them, so I’m reaching out for help from someone who actually understands the syntax.

I have a table in which all entries contain a URL with this structure:

https://scholarship.ode.state.oh.us/Student/Home/Details/555555?stdntID=444444

From record to record, only two components of the URL change:

  1. The number immediately following Details/ (i.e. 555555)
  2. The number immediately following stdntID= (i.e. 444444)

I need to write two formulas, one for each of the components named above, to extract these unique numbers. If anyone could help in this regard, goodness what a kindness it would be.

Thank you for taking the time to read and consider.

1 Solution

Accepted Solutions
kuovonne
18 - Pluto
18 - Pluto

Try these formulas. They might be a bit more than you need, but they should work.

IF(
  REGEX_MATCH({My URL Field}, "Details/\\d+"),
  SUBSTITUTE(
    REGEX_EXTRACT({My URL Field}, "Details/\\d+"),
    "Details/", 
    ""
  )
)

IF(
  REGEX_MATCH({My URL Field}, "stdntID=\\d+"),
  SUBSTITUTE(
    REGEX_EXTRACT({My URL Field}, "stdntID=\\d+"),
    "stdntID=", 
    ""
  )
)

Here is some more information about Airtable’s REGEX functions:

  • Airtable adds the extra backslashes to escape the backslash. I’m not sure why they need to be escaped, but that is not the source of your problem.

  • Airtable RegEx does not support look ahead / look behind.

  • Airtable REGEX_EXTRACT returns an error if it cannot find a match.

  • Airtable REGEX_EXTRACT returns only the first match.

See Solution in Thread

5 Replies 5
augmented
10 - Mercury
10 - Mercury

Try these:

Details

\d+(?=?)

student id

\d+(.\d+)?$

Thank you, @augmented, for taking a swing at it!

When I attempt your solution, i’m noticing extra backslashes are added when I save the formula, like this:

IF(
  {My URL Field}, REGEX_EXTRACT({My URL Field}, "\\d+(?=?)")
)

(returns #ERROR!)

and like this:

IF(
  {My URL Field}, REGEX_EXTRACT({My URL Field}, "\\d+(.\\d+)?$")
)

(returns blanks cells)

The formulas as of now are not extracting the numbers as hoped, but I’m wondering if it’s because of those extra backslashes being added.

augmented
10 - Mercury
10 - Mercury

Interesting. I’ve never used REGEX in Airtable, but outside it many times. The extra backslashes are definitely not wanted.

I just tried to replicate your example in Airtable and I couldn’t avoid the extra backslashes. I’m not even sure that REGEX is fully implemented in Airtable. No way to know without getting rid of the auto backslashes.

Really sorry that I can’t help.

Really appreciate your kind efforts nevertheless, @augmented! I may attempt to mark this as a bug, if I can.

kuovonne
18 - Pluto
18 - Pluto

Try these formulas. They might be a bit more than you need, but they should work.

IF(
  REGEX_MATCH({My URL Field}, "Details/\\d+"),
  SUBSTITUTE(
    REGEX_EXTRACT({My URL Field}, "Details/\\d+"),
    "Details/", 
    ""
  )
)

IF(
  REGEX_MATCH({My URL Field}, "stdntID=\\d+"),
  SUBSTITUTE(
    REGEX_EXTRACT({My URL Field}, "stdntID=\\d+"),
    "stdntID=", 
    ""
  )
)

Here is some more information about Airtable’s REGEX functions:

  • Airtable adds the extra backslashes to escape the backslash. I’m not sure why they need to be escaped, but that is not the source of your problem.

  • Airtable RegEx does not support look ahead / look behind.

  • Airtable REGEX_EXTRACT returns an error if it cannot find a match.

  • Airtable REGEX_EXTRACT returns only the first match.