Extracting Hashtags (was: Formula Assistance)


#1

Building out a social media calendar where one of the fields indicates the Copy that will be going in a post. Was hoping to create a separate formula field that automatically searched the Copy field for any hashtags and listed them there.

Support’s initial response sent me the formula below as a start, but it only pulls in the first # within the copy section. Wasn’t sure if anyone had any ideas/suggestions on how to expand it to pull/list all of them. Thanks in advance!

IF(
FIND(" “,{Notes},FIND(”#",{Notes})),
MID({Notes},FIND("#",{Notes}),FIND(" “,{Notes},FIND(”#",{Notes}))-FIND("#",{Notes})),
RIGHT({Notes}, LEN({Notes})-FIND("#",{Notes})+1)
)


#2

Unfortunately, Airtable currently does not support processing loops. If you know in advance the maximum number of hashtags one might encounter in your copy, you can hardcode a routine to extract up to that number of tags, but at the moment you cannot create a formula that will simply step through your text and extract however many hashtags it finds. (Note that each repetition of a tag counts as a separate instance, so your maximum number is not necessarily the number of unique hashtags contained.)

The most difficult aspect of this task is somehow maintaining a pointer to where one should start searching for the next hashtag. For instance, the formula for finding the fourth occurrence of '#' in {Notes} would be

FIND(
    '#',
    Notes,
    FIND(
        '#',
        Notes,
        FIND(
            '#',
            Notes,
            FIND(
                '#',
                Notes
                )+1
            )+1
        )+1
    )

Keep in mind, however, that simply identifies where the fourth octothorpe (a.k.a. ‘pound sign’) can be found. To extract the fourth hashtag (assuming a hashtag begins with '#' and ends with the next space character) requires the following code:

IF(
    FIND(
        '#',
        Notes,
        FIND(
            '#',
            Notes,
            FIND(
                '#',
                Notes,
                FIND(
                    '#',
                    Notes
                    )+1
                )+1
            )+1
        ),
    MID(
        Notes,
        FIND(
            '#',
            Notes,
            FIND(
                '#',
                Notes,
                FIND(
                    '#',
                    Notes,
                    FIND(
                        '#',
                        Notes
                        )+1
                    )+1
                )+1
            ),
        FIND(
            ' ',
            Notes,
            FIND(
                '#',
                Notes,
                FIND(
                    '#',
                    Notes,
                    FIND(
                        '#',
                        Notes,
                        FIND(
                            '#',
                            Notes
                            )+1
                        )+1
                    )+1
                )
            )-FIND(
                '#',
                Notes,
                FIND(
                    '#',
                    Notes,
                    FIND(
                        '#',
                        Notes,
                        FIND(
                            '#',
                            Notes
                            )+1
                        )+1
                    )+1
                )
            )
    )

All of that is required to determine if there are at least four hashtags in {Notes} and, if so, to extract the fourth hashtag. Now imagine what your complete IF() statement would look like if you wanted to support a maximum of ten hashtags — or how about thirty?

Actually, if I were to implement this, I’d use two formula fields, one to determine the number of tags {Notes} contains and the other to extract and concatenate the tags. That would at least allow me to get rid of the opening IF() clause to determine whether or not a fourth (or fifth, or sixteenth…) hashtag exists.

My first field, {NbrTags}, would be a formula field with the formula

LEN(Notes)-LEN(SUBSTITUTE(Notes,'#',''))

(That in effect counts how many octothorpes can be found in {Notes} by deleting them and seeing how many characters shorter {Notes} is without octothorpes.)

My second field would depend on how you want the extracted tags returned. In this case, I want them returned, comma-separated, like so:

#to, #many, #hash, #find

The code to accomplish that seemingly simple task? Take a deep breath…

IF(
    NbrTags>=1,
    MID(
        Notes,
        FIND(
            '#',
            Notes
            ),
        FIND(
            ' ',
            Notes,
            FIND(
                '#',
                Notes
                )
            )-FIND(
                '#',
                Notes
                )
        )&IF(
            NbrTags>=2,
            ', '&MID(
                    Notes,
                    FIND(
                        '#',
                        Notes,
                        FIND(
                            '#',
                            Notes
                            )+1
                        ),
                    FIND(
                        ' ',
                        Notes,
                        FIND(
                            '#',
                            Notes,
                            FIND(
                                '#',
                                Notes
                                )+1
                            )-FIND(
                                '#',
                                Notes,
                                FIND(
                                    '#',
                                    Notes
                                    )+1
                                )
                        )
                    )&IF(
                        NbrTags>=3,
                        ', '&MID(
                                Notes,
                                FIND(
                                    '#',
                                    Notes,
                                    FIND(
                                        '#',
                                        Notes,
                                        FIND(
                                            '#',
                                            Notes
                                            )+1
                                        )+1
                                    ),
                                FIND(
                                    ' ',
                                    Notes,
                                    FIND(
                                        '#',
                                        Notes,
                                        FIND(
                                            '#',
                                            Notes,
                                            FIND(
                                                '#',
                                                Notes
                                                )+1
                                            )+1
                                        )
                                    )-FIND(
                                        '#',
                                        Notes,
                                        FIND(
                                            '#',
                                            Notes,
                                            FIND(
                                                '#',
                                                Notes
                                                )+1
                                            )+1
                                        )
                                )&IF(
                                    NbrTags>=4,
                                    ', '&MID(
                                            Notes,
                                            FIND(
                                                '#',
                                                Notes,
                                                FIND(
                                                    '#',
                                                    Notes,
                                                    FIND(
                                                        '#',
                                                        Notes,
                                                        FIND(
                                                            '#',
                                                            Notes
                                                            )+1
                                                        )+1
                                                    )+1
                                                ),
                                            FIND(
                                                ' ',
                                                Notes,
                                                FIND(
                                                    '#',
                                                    Notes,
                                                    FIND(
                                                        '#',
                                                        Notes,
                                                        FIND(
                                                            '#',
                                                            Notes,
                                                            FIND(
                                                                '#',
                                                                Notes
                                                                )+1
                                                            )+1
                                                        )+1
                                                    )
                                                )-FIND(
                                                    '#',
                                                    Notes,
                                                    FIND(
                                                        '#',
                                                        Notes,
                                                        FIND(
                                                            '#',
                                                            Notes,
                                                            FIND(
                                                                '#',
                                                                Notes
                                                                )+1
                                                            )+1
                                                        )+1
                                                    )
                                            )
                                    )
                        )
            )
    )

Aren’t you glad I simplified things by using {NbrTags}?

I’ll leave it as an exercise to expand that formula to support 10, 15, 30 tags in {Notes}. :wink:


Formula to Separate Concatenated Items?
If() with and() or()
#3

Wow – thank you for the help! So I should’ve been more clear. I’m simply hoping to extract and list all hashtags in this new formula field (don’t need to know how many there are). Ideally this will provide quick visibility into which #s (starts with # and ends with a SPACE) are in the field, rather than having to read through each one every time.

It looks like the last suggestion (2 fields) is the safest though – I’ll try it out.

Thanks SO much for the help,

Michael


#4

Unfortunately, the routine doesn’t figure out how many there are because it thinks you want to know; it does so because it has to look for each hashtag at a time. (That is, it can’t simply “extract and list all hashtags,” because there’s no way to tell Airtable to keep doing something until it reaches the end. Instead, you have to create a routine that extracts the first hashtag in the field and then extracts the second hashtag in the field and then extracts the third hashtag in the field… and so on. Each extraction requires an dedicated piece of the formula designed to locate and extract a specific hashtag. If there are more hashtags than the formula explicitly provides for, those additional hashtags are ignored.)

Using the second field, {NbrTags}, appears to shorten the formula by 20%, in terms of length, and probably by 40%, in terms of complexity…