Help composing REGEX formula to extract strings from URL

Hi there, I recognize the power of regex expressions, but I’m just pathetic at composing them, so I’m reaching out for help from someone who actually understands the syntax.

I have a table in which all entries contain a URL with this structure:

https://scholarship.ode.state.oh.us/Student/Home/Details/555555?stdntID=444444

From record to record, only two components of the URL change:

  1. The number immediately following Details/ (i.e. 555555)
  2. The number immediately following stdntID= (i.e. 444444)

I need to write two formulas, one for each of the components named above, to extract these unique numbers. If anyone could help in this regard, goodness what a kindness it would be.

Thank you for taking the time to read and consider.

Try these:

Details

\d+(?=?)

student id

\d+(.\d+)?$

Thank you, @augmented, for taking a swing at it!

When I attempt your solution, i’m noticing extra backslashes are added when I save the formula, like this:

IF(
  {My URL Field}, REGEX_EXTRACT({My URL Field}, "\\d+(?=?)")
)

(returns #ERROR!)

and like this:

IF(
  {My URL Field}, REGEX_EXTRACT({My URL Field}, "\\d+(.\\d+)?$")
)

(returns blanks cells)

The formulas as of now are not extracting the numbers as hoped, but I’m wondering if it’s because of those extra backslashes being added.

Interesting. I’ve never used REGEX in Airtable, but outside it many times. The extra backslashes are definitely not wanted.

I just tried to replicate your example in Airtable and I couldn’t avoid the extra backslashes. I’m not even sure that REGEX is fully implemented in Airtable. No way to know without getting rid of the auto backslashes.

Really sorry that I can’t help.

Really appreciate your kind efforts nevertheless, @augmented! I may attempt to mark this as a bug, if I can.

Try these formulas. They might be a bit more than you need, but they should work.

IF(
  REGEX_MATCH({My URL Field}, "Details/\\d+"),
  SUBSTITUTE(
    REGEX_EXTRACT({My URL Field}, "Details/\\d+"),
    "Details/", 
    ""
  )
)

IF(
  REGEX_MATCH({My URL Field}, "stdntID=\\d+"),
  SUBSTITUTE(
    REGEX_EXTRACT({My URL Field}, "stdntID=\\d+"),
    "stdntID=", 
    ""
  )
)

Here is some more information about Airtable’s REGEX functions:

  • Airtable adds the extra backslashes to escape the backslash. I’m not sure why they need to be escaped, but that is not the source of your problem.

  • Airtable RegEx does not support look ahead / look behind.

  • Airtable REGEX_EXTRACT returns an error if it cannot find a match.

  • Airtable REGEX_EXTRACT returns only the first match.

1 Like

This topic was solved and automatically closed 3 days after the last reply. New replies are no longer allowed.