Extract given number of words from a field of comma separated words


#1

Hi all.

I have a field which contains a list of 50 words/phrases that are all separated by commas, in order of importance with the most important being first. Let’s called this field ‘50W’

I have some other fields that I want to auto populate/extract from ‘50W’. One of them is called ‘1W’ which should extract only the first word/phrase, meaning everything in front of the first comma.

Then I have another field called ‘10W’ which i want to extract the first 10 words/phrases, everything in front of the 10th comma.

It would be great if this was possible but I can’t seem to do it whatever i try.

Does anyone have any ideas I can try?

Many thanks.

Phill


#2

I think a Formula with FIND() and LEFT() is the way to go, but I can’t find the way to get the 10W value :thinking:


#3

Thanks Elias.

Both of these appear to require a predefined character count, which is fine when the words/phrase have known or identical lengths, but in my case the words/phrases are different in every case.

For example, here would be the first 5 words/phrases of the field ‘50W’:

one, two words, three words, four words, five words, etc

Some of them may well contain entries with the same number of words but this doesn’t matter.

Thank you for your ideas.

Phill


#4

I created a similar routine to extract hashtags; maybe you can modify it accordingly, 'cause I never want to see it again. :wink:


#5

My goodness!! That’s pretty intense. I don’t even want to think how that it took you to work all that out…

Needless to say, for my 50 words, I won’t be attempting that for only 600 records… it’ll be quicker to do it all manually for sure :slight_smile:

Really well done though. Nice work.


#6

Oh, your task is relatively simple — assuming the variation are all ‘the first X’ words — since all it takes is a LEFT() and an ugly SWITCH() statement.

The following assumes (1) your list of fifty words is in the field {50W}; (2) the field that determines whether to use the '1W' or '2W' or '10W' formula is called, simply, {W}, and (3) you need variations from '1W' through '10W'.

Here’s the formula:

LEFT(
    {50W},
    SWITCH(
        W,
        '1W',FIND(',',{50W})-1,
        '2W',FIND(',',{50W},FIND(',',{50W})+1)-1,
        '3W',FIND(',',{50W},FIND(',',{50W},FIND(',',{50W})+1)+1)-1,
        '4W',FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W})+1)+1)+1)-1,
        '5W',FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W})+1)+1)+1)+1)-1,
        '6W',FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W})+1)+1)+1)+1)+1)-1,
        '7W',FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W})+1)+1)+1)+1)+1)+1)-1,
        '8W',FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W})+1)+1)+1)+1)+1)+1)+1)-1,
        '9W',FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W})+1)+1)+1)+1)+1)+1)+1)+1)-1,
       '10W',FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W},FIND(',',{50W})+1)+1)+1)+1)+1)+1)+1)+1)+1)-1
        ))

And here’s a demonstration base showing how it all works together.


#7

Yeah, @Katherine_Duh, I have to say I’m really enjoying that SWITCH() statement. :wink:


#8

@W_Vann_Hall this is VERY impressive. I really appreciate you taking the time to work on this. I love it. Although i’m still a bit new to all this so it’s stretching some of my brain power understanding what is happening.

I’ve had a little play around, and wonder if it might be possible to expand the functionality of your idea to be able to call out a given number of keywords on a record by record basis.

In this i have created a base to demonstrate what my ‘ideal world’ base would be able to do. ‘50W’ is different for every record and doesn’t always contain exactly 50 words, it could be anything from 0-50 or more. although we can assume i will not exceed 50 words.

The count col just shows how many words are in 50W for reference.

Then in the ‘how many words to show’ col, i put in a value from 1-50.

‘Ideal Result’ shows what i would like it to show.

‘Extract’ is the magic bit that creates the contents of Ideal Result automatically without me having to copy paste as i did with the contents of ‘Ideal Result’ :slight_smile:

there might be an occasion where i want to see 15 words, but currently i might have less than that entered in ‘50W’ . In this case it can just show as many as are available and this will update as 50W is populated.

Do you think this is doable or am I simply dreaming?

Many thanks.

Phill


#9

Actually, the formula I provided took that functionality into consideration. It starts with

LEFT(
    {50W},
    SWITCH(
        W,
        '1W',FIND(',',{50W})-1,
        '2W',FIND(',',{50W},FIND(',',{50W})+1)-1,
           [...]

In your example base, you could replace W with {HowManyWordsToShow?} and '1W' with 1, '2W' with 2, and so on. That would let the number of words extracted to be selected on a per-record basis.

That leaves the chore of creating a SWITCH() statement supporting the extraction of one word up through whatever maximum number of words might be desired. The issue is Airtable’s current lack of any sort of looping mechanism. Without such functionality, the only way to perform an action against each of an indeterminate number of instances is to build in explicit support for every possible number of instances one might encounter and then select the applicable branch.

This limitation also determines how to implement your 'extract this many words up to the maximum words currently in 50W’ algorithm. Actually, since when asked to extract more words than are currently defined, the routine seemingly fails by gobbling all of 50W, you might not have to change anything — but if you’d rather have a solution based on intention rather than circumstance, your extraction routines could include something along these lines:

IF(
    {Count}>={HowManyWordsToShow?},
    {HowManyWordsToShow?},
    {Count}
    )

#10

@W_Vann_Hall

Absolutely genius. I love it… especially your use of the word ‘chore’ for creating the SWITCH() statement… that took me about half an hour, and it works VERY well with only one issue. …

If i request the exact same amount of keywords, as contained in 50W (and consequently the Count field), the Extract result field is blank and whatever i change i can’t seem to fiddle it to work.

Any ideas?

Thank you.

Phill


#11

Oh, hell, I forgot to make that fix.

You’ll probably want to make this change in an external editor (and if you created that 50-step SWITCH() using Airtable’s internal editor, you’re a far better man than I…):

  • replace {50W} with {50W}&','

I think that’s the issue: The routine is looking for the Xth comma in a list containing only (X-1) commas. You might want to test it first by making that change in one of the branches and running it against a word list with the same number of words as W for that branch.


#12

Thank you so much, this is AMAZING!!!

Indeed I am not a better man than you as I didn’t brave entering the 50-step switch using Airtable’s internal editor… I am not insane… although having realised that would have been silly, I take some comfort in the knowledge that although i’m not the better man, I’m not daft… :slight_smile:

Your contributions to my dilemma have really enabled me to add functionality to my base which I didn’t actually believe was possible. You are a very valuable resource to the Airtable community and I hope the folks at Airtable acknowledge this, as your advice, input and suggestions make their tool more valuable.

Thank you VERY much @W_Vann_Hall

Phill


#13

I’ve learned much from this thread. Thanks all. But I can’t help but wonder, would you not be better off importing the word/phrase collections as individual fields in a record? That way, you could manipulate them many ways with little effort. I don’t understand why it would be better to store these values as text blobs when you already have a seperator you can use to import them as atomic values?


#14

For me, i need the words & phrases formatted as comma separated values as these get uploaded to various places. Each place i upload them to requires (or permits) differing quantities of keywords/phrases.

Additionally, often times, when i am researching & finding my words, they are already created as comma separated words.

So I could nail down the words one by one, each to their own fields… but it would take me longer to do this (unless i had a formula to strip away the commas) and it wouldn’t make too much sense, as i’d need another formula later on to put the commas back in for each upload.

so i decided the best approach was to pull in all the words (comma separated) into a ‘master’ field (containing all 50 words) and then extract from there as many as i need for each case.

sounds a bit weird i know, but as Tupac once sang… ‘That’s just the way it is’ … :slight_smile:

Glad you got some value from this thread. I certainly did. I have also been telling other FB group members to use this community forum as it’s an amazing resource… even if you just read through existing threads (as you have done) you can get all sorts of great ideas…

it wasn’t you i told to come here and read through old posts was it? :slight_smile:

cheers.


#15

@W_Vann_Hall
I have a bunch of recipes (food) in pages and PDF format that I am trying to insert comma’s between #, Unit, ingredient (so that they will populate in different columns in the database). Would your hashtag extractor work at putting a comma between those values?