Help

Re: Extracting Hashtags (was: Formula Assistance)

1295 0
cancel
Showing results forĀ 
Search instead forĀ 
Did you mean:Ā 
Michael_Boss
4 - Data Explorer
4 - Data Explorer

Building out a social media calendar where one of the fields indicates the Copy that will be going in a post. Was hoping to create a separate formula field that automatically searched the Copy field for any hashtags and listed them there.

Supportā€™s initial response sent me the formula below as a start, but it only pulls in the first # within the copy section. Wasnā€™t sure if anyone had any ideas/suggestions on how to expand it to pull/list all of them. Thanks in advance!

IF(
FIND(" ā€œ,{Notes},FIND(ā€#",{Notes})),
MID({Notes},FIND("#",{Notes}),FIND(" ā€œ,{Notes},FIND(ā€#",{Notes}))-FIND("#",{Notes})),
RIGHT({Notes}, LEN({Notes})-FIND("#",{Notes})+1)
)

4 Replies 4

Unfortunately, Airtable currently does not support processing loops. If you know in advance the maximum number of hashtags one might encounter in your copy, you can hardcode a routine to extract up to that number of tags, but at the moment you cannot create a formula that will simply step through your text and extract however many hashtags it finds. (Note that each repetition of a tag counts as a separate instance, so your maximum number is not necessarily the number of unique hashtags contained.)

The most difficult aspect of this task is somehow maintaining a pointer to where one should start searching for the next hashtag. For instance, the formula for finding the fourth occurrence of '#' in {Notes} would be

FIND(
    '#',
    Notes,
    FIND(
        '#',
        Notes,
        FIND(
            '#',
            Notes,
            FIND(
                '#',
                Notes
                )+1
            )+1
        )+1
    )

Keep in mind, however, that simply identifies where the fourth octothorpe (a.k.a. ā€˜pound signā€™) can be found. To extract the fourth hashtag (assuming a hashtag begins with '#' and ends with the next space character) requires the following code:

IF(
    FIND(
        '#',
        Notes,
        FIND(
            '#',
            Notes,
            FIND(
                '#',
                Notes,
                FIND(
                    '#',
                    Notes
                    )+1
                )+1
            )+1
        ),
    MID(
        Notes,
        FIND(
            '#',
            Notes,
            FIND(
                '#',
                Notes,
                FIND(
                    '#',
                    Notes,
                    FIND(
                        '#',
                        Notes
                        )+1
                    )+1
                )+1
            ),
        FIND(
            ' ',
            Notes,
            FIND(
                '#',
                Notes,
                FIND(
                    '#',
                    Notes,
                    FIND(
                        '#',
                        Notes,
                        FIND(
                            '#',
                            Notes
                            )+1
                        )+1
                    )+1
                )
            )-FIND(
                '#',
                Notes,
                FIND(
                    '#',
                    Notes,
                    FIND(
                        '#',
                        Notes,
                        FIND(
                            '#',
                            Notes
                            )+1
                        )+1
                    )+1
                )
            )
    )

All of that is required to determine if there are at least four hashtags in {Notes} and, if so, to extract the fourth hashtag. Now imagine what your complete IF() statement would look like if you wanted to support a maximum of ten hashtags ā€” or how about thirty?

Actually, if I were to implement this, Iā€™d use two formula fields, one to determine the number of tags {Notes} contains and the other to extract and concatenate the tags. That would at least allow me to get rid of the opening IF() clause to determine whether or not a fourth (or fifth, or sixteenthā€¦) hashtag exists.

My first field, {NbrTags}, would be a formula field with the formula

LEN(Notes)-LEN(SUBSTITUTE(Notes,'#',''))

(That in effect counts how many octothorpes can be found in {Notes} by deleting them and seeing how many characters shorter {Notes} is without octothorpes.)

My second field would depend on how you want the extracted tags returned. In this case, I want them returned, comma-separated, like so:

#to, #many, #hash, #find

The code to accomplish that seemingly simple task? Take a deep breathā€¦

IF(
    NbrTags>=1,
    MID(
        Notes,
        FIND(
            '#',
            Notes
            ),
        FIND(
            ' ',
            Notes,
            FIND(
                '#',
                Notes
                )
            )-FIND(
                '#',
                Notes
                )
        )&IF(
            NbrTags>=2,
            ', '&MID(
                    Notes,
                    FIND(
                        '#',
                        Notes,
                        FIND(
                            '#',
                            Notes
                            )+1
                        ),
                    FIND(
                        ' ',
                        Notes,
                        FIND(
                            '#',
                            Notes,
                            FIND(
                                '#',
                                Notes
                                )+1
                            )-FIND(
                                '#',
                                Notes,
                                FIND(
                                    '#',
                                    Notes
                                    )+1
                                )
                        )
                    )&IF(
                        NbrTags>=3,
                        ', '&MID(
                                Notes,
                                FIND(
                                    '#',
                                    Notes,
                                    FIND(
                                        '#',
                                        Notes,
                                        FIND(
                                            '#',
                                            Notes
                                            )+1
                                        )+1
                                    ),
                                FIND(
                                    ' ',
                                    Notes,
                                    FIND(
                                        '#',
                                        Notes,
                                        FIND(
                                            '#',
                                            Notes,
                                            FIND(
                                                '#',
                                                Notes
                                                )+1
                                            )+1
                                        )
                                    )-FIND(
                                        '#',
                                        Notes,
                                        FIND(
                                            '#',
                                            Notes,
                                            FIND(
                                                '#',
                                                Notes
                                                )+1
                                            )+1
                                        )
                                )&IF(
                                    NbrTags>=4,
                                    ', '&MID(
                                            Notes,
                                            FIND(
                                                '#',
                                                Notes,
                                                FIND(
                                                    '#',
                                                    Notes,
                                                    FIND(
                                                        '#',
                                                        Notes,
                                                        FIND(
                                                            '#',
                                                            Notes
                                                            )+1
                                                        )+1
                                                    )+1
                                                ),
                                            FIND(
                                                ' ',
                                                Notes,
                                                FIND(
                                                    '#',
                                                    Notes,
                                                    FIND(
                                                        '#',
                                                        Notes,
                                                        FIND(
                                                            '#',
                                                            Notes,
                                                            FIND(
                                                                '#',
                                                                Notes
                                                                )+1
                                                            )+1
                                                        )+1
                                                    )
                                                )-FIND(
                                                    '#',
                                                    Notes,
                                                    FIND(
                                                        '#',
                                                        Notes,
                                                        FIND(
                                                            '#',
                                                            Notes,
                                                            FIND(
                                                                '#',
                                                                Notes
                                                                )+1
                                                            )+1
                                                        )+1
                                                    )
                                            )
                                    )
                        )
            )
    )

Arenā€™t you glad I simplified things by using {NbrTags}?

Iā€™ll leave it as an exercise to expand that formula to support 10, 15, 30 tags in {Notes}. :winking_face:

Wow ā€“ thank you for the help! So I shouldā€™ve been more clear. Iā€™m simply hoping to extract and list all hashtags in this new formula field (donā€™t need to know how many there are). Ideally this will provide quick visibility into which #s (starts with # and ends with a SPACE) are in the field, rather than having to read through each one every time.

It looks like the last suggestion (2 fields) is the safest though ā€“ Iā€™ll try it out.

Thanks SO much for the help,

Michael

Unfortunately, the routine doesnā€™t figure out how many there are because it thinks you want to know; it does so because it has to look for each hashtag at a time. (That is, it canā€™t simply ā€œextract and list all hashtags,ā€ because thereā€™s no way to tell Airtable to keep doing something until it reaches the end. Instead, you have to create a routine that extracts the first hashtag in the field and then extracts the second hashtag in the field and then extracts the third hashtag in the fieldā€¦ and so on. Each extraction requires an dedicated piece of the formula designed to locate and extract a specific hashtag. If there are more hashtags than the formula explicitly provides for, those additional hashtags are ignored.)

Using the second field, {NbrTags}, appears to shorten the formula by 20%, in terms of length, and probably by 40%, in terms of complexityā€¦

@W_Vann_Hall fantastic ! šŸ«¶