Help

Re: REGEX Extract - Help with Formula

490 0
cancel
Showing results for 
Search instead for 
Did you mean: 
Nicholas_Boehm
4 - Data Explorer
4 - Data Explorer

Hello folks,

I am trying to extract two numeric values from text. One of the values is directly after “$” and varies in length. The next is directly before a word “endpoint” and also varies in length. Here is an example of text:

• $24,200 + Professional Services ARR • 650 endpoints • Blah came to Blah wanting to move away from Blah • Huge shoutout to Blah for helping with Blah and answering all their technical follow up questions.

In this instance, I need 24,200 and 650 each in their own columns.

I am totally lost on how to do this and have been trying for hours. Any help is seriously appreciated.

2 Replies 2
Bill_Felix
6 - Interface Innovator
6 - Interface Innovator

whew, this won’t be pretty… :thinking:

I think the way to think about this is to ‘split’ the input into two parts; and then ‘extract’ the dollar/number you want out of it.

so with your column named ‘input’, create a field named ‘dollarAmount’ with this formula:

REGEX_EXTRACT(LEFT(input,SEARCH(' • ', input)), '[\\$ ]+?(\\d+([,\\.\\d]+)?)' )

and then create a field named ‘endpointAmount’ with this formula:

VALUE(LEFT(RIGHT(input,LEN(input) - LEN(LEFT(input,SEARCH(' • ', input)))),SEARCH(' endpoints' ,RIGHT(input,LEN(input) - LEN(LEFT(input,SEARCH(' • ', input)))))))

so will this surely not work every time, because your unstructured data is sure to outsmart me given enough time. But this will get you a decent part of the way.

I agree - that’s the right approach given the constraints of working in a formula. Two observations -

  1. Sometimes it makes sense to avoid this drama by crafting a system that generates discrete values further upstream in the process. Is this possible? Often-times, we create very complex and brittle solutions that are fundamentally unneeded if we did something else better. Elon Musk - “The best part is no part at all.

  2. Numbers in a string are also known as entities, and there are some clever AI javascript approaches that allow us to extract entitles quite easily. Are you amenable to performing such data reclassification using script automation?