Aug 26, 2023 07:33 AM - edited Aug 26, 2023 06:16 PM
I have a column that has strings of option lists, and I would like to extract only one option "COLOR=" ,between the comma & " "(space), from them. These option lists sometimes include different language letter so It must extract english only.
Here we have four examples of the string that shows the pattern.
=>>>> I would like to see >>>>
I tried REGEX-EXTRACT({option column},"COLOR=[a-zA-Z]+.*[a-zA-Z]+)"
but obviously it includes after comma as well. something with '*,+' was recommended but do not know how to combine these..
Thanks for the help in advanced!
Aug 26, 2023 04:46 PM
How about this?
REGEX_REPLACE(REGEX_EXTRACT({option column},".*(COLOR=[^,]*)")," \\W*", "")
Aug 26, 2023 05:26 PM
Hmm, if I'm understanding your requirements correctly, I think you had it almost right:
REGEX_EXTRACT({option column},"COLOR=[a-zA-Z ]+")
You can see the regex tested against your sample data here: https://regexr.com/7j738
Note that since it will also match spaces at the end, you can add a TRIM() function to get rid of extra spaces:
TRIM(REGEX_EXTRACT({option column},"COLOR=[a-zA-Z ]+"))
or, to do it within one function:
REGEX_EXTRACT({option column},"COLOR=[a-zA-Z ]+[a-zA-Z]"))
Btw, if you ever need just the Korean characters, instead of [A-Za-z], use:
[\u1100-\u11FF\u3130-\u318F\uA960-\uA97F\uAC00-\uD7AF\uD7B0-\uD7FF]
Aug 26, 2023 06:15 PM
this totally works! thank you so much!🌈