Help

Re: regex extract - string between the commas & english only

729 0
cancel
Showing results for 
Search instead for 
Did you mean: 
Jessica_Shin
5 - Automation Enthusiast
5 - Automation Enthusiast

I have a column that has strings of option lists, and I would like to extract only one option "COLOR=" ,between the comma & " "(space), from them. These option lists sometimes include different language letter so It must extract english only.

Here we have four examples of the string that shows the pattern.

  1. COLOR=WHITE 화이트, DETAIL=NONE
  2. SLEEVE=SHORT 쇼트, COLOR=SKY BLUE
  3. SLEEVE=LONG, COLOR=GRAY, TOP=DOUBLE
  4. COLOR=LIGHT YELLOW 밝은 노랑

=>>>> I would like to see >>>>

  1. COLOR=WHITE
  2. COLOR=SKY BLUE
  3. COLOR=GRAY
  4. COLOR=LIGHT YELLOW

I tried REGEX-EXTRACT({option column},"COLOR=[a-zA-Z]+.*[a-zA-Z]+)"

but obviously it includes after comma as well. something with '*,+' was recommended but do not know how to combine these..

Thanks for the help in advanced!

3 Replies 3
Sho
11 - Venus
11 - Venus

How about this?

REGEX_REPLACE(REGEX_EXTRACT({option column},".*(COLOR=[^,]*)")," \\W*", "")

Hmm, if I'm understanding your requirements correctly, I think you had it almost right:

REGEX_EXTRACT({option column},"COLOR=[a-zA-Z ]+")

 You can see the regex tested against your sample data here: https://regexr.com/7j738

Note that since it will also match spaces at the end, you can add a TRIM() function to get rid of extra spaces:

TRIM(REGEX_EXTRACT({option column},"COLOR=[a-zA-Z ]+"))

or, to do it within one function: 

REGEX_EXTRACT({option column},"COLOR=[a-zA-Z ]+[a-zA-Z]")) 

Btw, if you ever need just the Korean characters, instead of [A-Za-z], use:

[\u1100-\u11FF\u3130-\u318F\uA960-\uA97F\uAC00-\uD7AF\uD7B0-\uD7FF]

 

this totally works! thank you so much!🌈