Skip to main content
Solved

Help composing REGEX formula to extract strings from URL

  • February 10, 2021
  • 5 replies
  • 90 views

Hi there, I recognize the power of regex expressions, but I’m just pathetic at composing them, so I’m reaching out for help from someone who actually understands the syntax.

I have a table in which all entries contain a URL with this structure:

https://scholarship.ode.state.oh.us/Student/Home/Details/555555?stdntID=444444

From record to record, only two components of the URL change:

  1. The number immediately following Details/ (i.e. 555555)
  2. The number immediately following stdntID= (i.e. 444444)

I need to write two formulas, one for each of the components named above, to extract these unique numbers. If anyone could help in this regard, goodness what a kindness it would be.

Thank you for taking the time to read and consider.

Best answer by kuovonne

Try these formulas. They might be a bit more than you need, but they should work.

IF(
  REGEX_MATCH({My URL Field}, "Details/\\d+"),
  SUBSTITUTE(
    REGEX_EXTRACT({My URL Field}, "Details/\\d+"),
    "Details/", 
    ""
  )
)

IF(
  REGEX_MATCH({My URL Field}, "stdntID=\\d+"),
  SUBSTITUTE(
    REGEX_EXTRACT({My URL Field}, "stdntID=\\d+"),
    "stdntID=", 
    ""
  )
)

Here is some more information about Airtable’s REGEX functions:

  • Airtable adds the extra backslashes to escape the backslash. I’m not sure why they need to be escaped, but that is not the source of your problem.

  • Airtable RegEx does not support look ahead / look behind.

  • Airtable REGEX_EXTRACT returns an error if it cannot find a match.

  • Airtable REGEX_EXTRACT returns only the first match.

5 replies

Forum|alt.badge.img+18
  • Inspiring
  • February 10, 2021

Try these:

Details

\d+(?=?)

student id

\d+(.\d+)?$


  • Author
  • Participating Frequently
  • February 11, 2021

Try these:

Details

\d+(?=?)

student id

\d+(.\d+)?$


Thank you, @augmented, for taking a swing at it!

When I attempt your solution, i’m noticing extra backslashes are added when I save the formula, like this:

IF(
  {My URL Field}, REGEX_EXTRACT({My URL Field}, "\\d+(?=?)")
)

(returns #ERROR!)

and like this:

IF(
  {My URL Field}, REGEX_EXTRACT({My URL Field}, "\\d+(.\\d+)?$")
)

(returns blanks cells)

The formulas as of now are not extracting the numbers as hoped, but I’m wondering if it’s because of those extra backslashes being added.


Forum|alt.badge.img+18
  • Inspiring
  • February 11, 2021

Interesting. I’ve never used REGEX in Airtable, but outside it many times. The extra backslashes are definitely not wanted.

I just tried to replicate your example in Airtable and I couldn’t avoid the extra backslashes. I’m not even sure that REGEX is fully implemented in Airtable. No way to know without getting rid of the auto backslashes.

Really sorry that I can’t help.


  • Author
  • Participating Frequently
  • February 12, 2021

Interesting. I’ve never used REGEX in Airtable, but outside it many times. The extra backslashes are definitely not wanted.

I just tried to replicate your example in Airtable and I couldn’t avoid the extra backslashes. I’m not even sure that REGEX is fully implemented in Airtable. No way to know without getting rid of the auto backslashes.

Really sorry that I can’t help.


Really appreciate your kind efforts nevertheless, @augmented! I may attempt to mark this as a bug, if I can.


kuovonne
Forum|alt.badge.img+29
  • Brainy
  • Answer
  • February 12, 2021

Try these formulas. They might be a bit more than you need, but they should work.

IF(
  REGEX_MATCH({My URL Field}, "Details/\\d+"),
  SUBSTITUTE(
    REGEX_EXTRACT({My URL Field}, "Details/\\d+"),
    "Details/", 
    ""
  )
)

IF(
  REGEX_MATCH({My URL Field}, "stdntID=\\d+"),
  SUBSTITUTE(
    REGEX_EXTRACT({My URL Field}, "stdntID=\\d+"),
    "stdntID=", 
    ""
  )
)

Here is some more information about Airtable’s REGEX functions:

  • Airtable adds the extra backslashes to escape the backslash. I’m not sure why they need to be escaped, but that is not the source of your problem.

  • Airtable RegEx does not support look ahead / look behind.

  • Airtable REGEX_EXTRACT returns an error if it cannot find a match.

  • Airtable REGEX_EXTRACT returns only the first match.