Re: Extract Text from a Data string

1648 0
Showing results for 
Search instead for 
Did you mean: 
4 - Data Explorer
4 - Data Explorer

I have read the solutions for extracting text but still can’t seem to get it right.

Need extract text between the dashes (-). The text between the dashes could be any number of characters, if there is a limited it is unknown.

- ###### - ##### - ##### -

4 Replies 4
10 - Mercury
10 - Mercury

Hi Rick. I would just substitute in an empty string or a space, depending on your preference.

SUBSTITUTE({string field name}, '-','')

Sound good?

what would removing the spaces get me.


Would like to create 5 columns from each data set between the dashes. The length of each data set is not fixed.

This is a very similar use case to this thread:

That example was looking for numbers presented in the pattern of
(length)( x )(width)( x )(height)

Your pattern appears to be alphanumeric characters separated by spaces and dashes, or
(alphanumeric text 1)( - )(alphanumeric text 2)( - )(alphanumeric text 3)( - )(alphanumeric text 4)( - )(alphanumeric text 5)

The REGEX pattern for that is:


^ if you want to explicitly define the separator as one or more consecutive dashes. (\\-+) means one or more - characters.


^ if you want to explicitly define the separator as one or more consecutive non-alphanumeric characters (which would include both spaces and dashes). (\\W+) means one or more non-alphanumeric characters

Therefore a full formula for the first of the five fields could be either:

   SUBSTITUTE({string field name}, " ", ""), 

Note: the SUBSTIUTE was included, assuming you want to remove the spaces.


   {string field name}, 

^ this is the recommended formula. If there are not always 5 groups of text, then replace all the + with * to change the formula from “one or more” to “zero or more”.

For either method, the REGEX pattern has 9 “capture groups” (each group is what’s in-between the parentheses). Each of your five formula fields that extracts the alphanumeric text will have the exact same REGEX_REPLACE() formula apart from the last argument to reference the relevant capture group.

Your first field will have "$1" as the last argument (as shown above), the second field will have "$3", the third will have "$5", etc.

Someday I will “get” regex but until that day…thank you.

I am having an issue with the extraction, the first $1 work great, but after that, $3, $5 I am getting improper results.

I did realize my example was slightly incorrect the first part of the pattern is always a number, not text.