Oct 15, 2024 02:25 AM
I have string fields and I want to extract the first two words (minus any punctuation or special symbols). Ideally, they would remove the white space and capitalize each word as well.
So a field with “Midnight: A Tale” = “MidnightA”.
”What’s New in Books” = “WhatsNew”
I’ve looked all over to find a way to get two words, but can only find out how to get one. I got into RegEx a bit, but couldn’t figure out how to account for the punctuation/symbols. If someone has a slick way to extract the first two words of a string, I’d be eternally grateful!
Solved! Go to Solution.
Oct 15, 2024 03:39 AM
Hello,
Someone may have an easier solution, but you could try the following.
Here is the result that I get (column "first 2 words"):
Formula "Word1" extracts the first word:
Formula "Remainder" extrats the rest of the sentence if it exists:
Formula "word2" extracts the second word from the remainder:
Formula "First 2 words" concatenates Word1 and Word2 and removes characters " ' "and " : " :
You will need to complete the REGEX_REPLACE function with all special characters that you wish to remove (as i did with the dot for example)
Regards,
Pascal
Oct 15, 2024 03:39 AM
Hello,
Someone may have an easier solution, but you could try the following.
Here is the result that I get (column "first 2 words"):
Formula "Word1" extracts the first word:
Formula "Remainder" extrats the rest of the sentence if it exists:
Formula "word2" extracts the second word from the remainder:
Formula "First 2 words" concatenates Word1 and Word2 and removes characters " ' "and " : " :
You will need to complete the REGEX_REPLACE function with all special characters that you wish to remove (as i did with the dot for example)
Regards,
Pascal
Oct 15, 2024 05:17 AM
Here’s a formula that accomplishes your requirements. Make sure to replace {YourField} with the actual name of your field containing the strings.
UPPER(
LEFT(
REGEX_REPLACE(
REGEX_EXTRACT({YourField}, '^(\\w+)[^\\w]*(\\w+)'),
'[^A-Za-z0-9]', ''
),
1
)
) &
UPPER(
MID(
REGEX_REPLACE(
REGEX_EXTRACT({YourField}, '^(\\w+)[^\\w]*(\\w+)'),
'[^A-Za-z0-9]', ''
),
2,
1
)
)
Hope this helps.
Oct 15, 2024 09:07 AM
Thanks for the help! This formula returns the first two letters of the field, rather than the first two words.
Oct 15, 2024 09:12 AM
Thank you, Pascal, this worked perfectly! For others trying it, I did include an additional field to capitalize the second word before concatenating them.
IF({File second word}, UPPER(LEFT({File second word}, 1)) & (REGEX_EXTRACT({File second word}, "(?:.)([^ ]*)")))