Apr 26, 2020 09:56 PM
Another challenging one…
Any advice appreciated!
I have a long text field containing multiple customer reviews from Amazon, separated by strings of special characters.
For example:
"
//Lorem ipsum lorem ipsum//
//Lorem ipsum fake ipsum//
//Lorem counterfeit lorem ipsum//
//Lorem ipsum lorem ipsum//
"
I want to write a formula that will search this field for specific keywords such as “fake” and “counterfeit”.
When the formula finds a match to these keywords, it will grab all the text between the special character markers wherein the keyword was found.
So, for example, using the above long text value, my formula would output:
"
Lorem ipsum fake ipsum
Lorem counterfeit lorem ipsum
"
Thanks for your help!
Apr 27, 2020 05:55 AM
This is going to be a little bit of a challenge and will depend on a few factors: whether the number of line is consistent (e.g. there are always 4 lines and whether it’s repeating), whether you’re always searching for the same words (fake, counterfeit) and whether they can show up multiple times.
To point you in the right direction. Here I first check whether the word exists then try to construct a MID by finding the start and end position of the string you’re trying to extract.
You can find whether the word you’re looking for is in the string by using the find() formula. If it is found, it’ll return a number, otherwise it’ll return an error.
Once you’ve found the word you’re looking for, you can find the end of the substring with find again find("//", {string}, {position where you found the key words}).
The tricky part is how do we find the beginning? Is it always preceeded by the word Lorem (or a set number of characters), you can just subtract it from where we find the keyword. Otherwise, you can extract every instance of “//” and then use the one that is closest to the position of the keyword you’re looking for.
Hope this helps!
Apr 27, 2020 08:03 PM
It does help indeed.
Thanks, @Aron!
To answer your questions:
The number of lines is not consistent
The preceding keyword is not always consistent.
I think I can get most of the way there with find.
Can anyone show me an example formula of how they would do this?
Many thanks