Help

Upcoming database upgrades. to improve our reliability at 03:30 UTC on Feb. 25 / 7:30pm PT on Feb. 24. Some users may briefly experience slow load times or error messages. Learn more here

Re: Help composing REGEX formula to extract strings from URL

Solved
Jump to Solution
172 0
cancel
Showing results for 
Search instead for 
Did you mean: 

Hi there, I recognize the power of regex expressions, but I’m just pathetic at composing them, so I’m reaching out for help from someone who actually understands the syntax.

I have a table in which all entries contain a URL with this structure:

https://scholarship.ode.state.oh.us/Student/Home/Details/555555?stdntID=444444

From record to record, only two components of the URL change:

  1. The number immediately following Details/ (i.e. 555555)
  2. The number immediately following stdntID= (i.e. 444444)

I need to write two formulas, one for each of the components named above, to extract these unique numbers. If anyone could help in this regard, goodness what a kindness it would be.

Thank you for taking the time to read and consider.

1 Solution

Accepted Solutions

Try these formulas. They might be a bit more than you need, but they should work.

IF(
  REGEX_MATCH({My URL Field}, "Details/\\d+"),
  SUBSTITUTE(
    REGEX_EXTRACT({My URL Field}, "Details/\\d+"),
    "Details/", 
    ""
  )
)

IF(
  REGEX_MATCH({My URL Field}, "stdntID=\\d+"),
  SUBSTITUTE(
    REGEX_EXTRACT({My URL Field}, "stdntID=\\d+"),
    "stdntID=", 
    ""
  )
)

Here is some more information about Airtable’s REGEX functions:

  • Airtable adds the extra backslashes to escape the backslash. I’m not sure why they need to be escaped, but that is not the source of your problem.

  • Airtable RegEx does not support look ahead / look behind.

  • Airtable REGEX_EXTRACT returns an error if it cannot find a match.

  • Airtable REGEX_EXTRACT returns only the first match.

See Solution in Thread

5 Replies 5

Try these:

Details

\d+(?=?)

student id

\d+(.\d+)?$

Thank you, @augmented, for taking a swing at it!

When I attempt your solution, i’m noticing extra backslashes are added when I save the formula, like this:

IF(
  {My URL Field}, REGEX_EXTRACT({My URL Field}, "\\d+(?=?)")
)

(returns #ERROR!)

and like this:

IF(
  {My URL Field}, REGEX_EXTRACT({My URL Field}, "\\d+(.\\d+)?$")
)

(returns blanks cells)

The formulas as of now are not extracting the numbers as hoped, but I’m wondering if it’s because of those extra backslashes being added.

Interesting. I’ve never used REGEX in Airtable, but outside it many times. The extra backslashes are definitely not wanted.

I just tried to replicate your example in Airtable and I couldn’t avoid the extra backslashes. I’m not even sure that REGEX is fully implemented in Airtable. No way to know without getting rid of the auto backslashes.

Really sorry that I can’t help.

Really appreciate your kind efforts nevertheless, @augmented! I may attempt to mark this as a bug, if I can.

Try these formulas. They might be a bit more than you need, but they should work.

IF(
  REGEX_MATCH({My URL Field}, "Details/\\d+"),
  SUBSTITUTE(
    REGEX_EXTRACT({My URL Field}, "Details/\\d+"),
    "Details/", 
    ""
  )
)

IF(
  REGEX_MATCH({My URL Field}, "stdntID=\\d+"),
  SUBSTITUTE(
    REGEX_EXTRACT({My URL Field}, "stdntID=\\d+"),
    "stdntID=", 
    ""
  )
)

Here is some more information about Airtable’s REGEX functions:

  • Airtable adds the extra backslashes to escape the backslash. I’m not sure why they need to be escaped, but that is not the source of your problem.

  • Airtable RegEx does not support look ahead / look behind.

  • Airtable REGEX_EXTRACT returns an error if it cannot find a match.

  • Airtable REGEX_EXTRACT returns only the first match.