Function to count the number of words in a string or text field

#1

I haven’t been able to find an answer to this on the site, but there doesn’t seem to be a function to count the number of words in a text field. There’s a function to count the number of characters – LEN.

One workaround could be to the count the number of spaces in a text string, but I don’t see a function to do that, either. SEARCH and FIND return the first position of the search string inside the text, not the total number of times the search string occurs.

A word count function is very important for using AirTable as a content management system.

1 Like

#2

OK, in case anyone else is searching for this – my workaround is to divide the length of a text by 5.5: LEN(text_field)/5.5. (The average word length in English is 4.5 characters, plus another character for the space between words.)

It gets somewhat close to the real number, though, of course, it counts punctuation marks and other special characters.

0 Likes

#3

You can count the number of spaces and sum 1 with this technique:

0 Likes

#4

Thanks – great advice!

I took out all the space characters, then subtracted the length of the new string form the old string, to get the number of spaces in the text.

LEN(Text)-LEN(SUBSTITUTE(Text," “,”"))

The result were a slight undercount, but within 5% of the word count that I got from Google docs

Then I realized I was also missing paragraph breaks.

Here’s the formula I used to count how many paragraph breaks there are in the text (which you should dive by two if you double-space between paragraphs):

LEN(Text)-LEN(SUBSTITUTE(Text,"\n",""))

I added both of these together into a single formula:

(LEN(Text)-LEN(SUBSTITUTE(Text," “,”"))) + (LEN(Text)-LEN(SUBSTITUTE(Text,"\n","")))

Now I’m a little tiny bit over – less than 1% compared to the Google Docs word count function.

Now I’m in business! Thanks!

I could shorten that formula a little bit with basic algebra:

2*LEN(Text)-LEN(SUBSTITUTE(Text," “,”")) -LEN(SUBSTITUTE(Text,"\n","")

Thanks, Elias!

1 Like

#5

I realized that I could nest SUBSTITUTE functions to solve the double-spacing between paragraphs problem:

LEN(SUBSTITUTE(SUBSTITUTE(Text,"\n\n","\n"),"\n",""))

1 Like

#6

Perfect! I know, a bit tricky and long formula but working in the end :smiley:

0 Likes

#7

Let me try to make this a little more foolproof.

Number of words =
total character count with single spaces between words (no line breaks, no double spaces)
minus…
total character count with no spaces or line breaks
plus…
1 (so it counts the last word)

See if this gets you guys closer to the word count you’re seeing in Word:

IMPORTANT: Replace the “__” with two spaces below. This stupid forum isn’t capable of displaying two spaces in a row:

LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE({English Text},"\n"," "),"__"," "),"__"," ")) - LEN(SUBSTITUTE(SUBSTITUTE({English Text}," ",""),"\n","")) + 1

It’s still going to be off if you have more than 4 spaces in a row anywhere, but if that’s an issue you can just keep nesting the SUBSTITUTE that turns two spaces into one space. If Airtable supported regular expressions, this would be much easier. :face_with_raised_eyebrow:

0 Likes

#8

No, you had it right: Tagging the formula as code in Markdown will preserve spaces. It just doesn’t seem like it does because you’re more closely paying attention to the input window than the output one. Here’s your formula with ‘__’ replaced with the intended double-space:

LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE({English Text},"\n"," "),"  "," "),"  "," ")) 
- LEN(SUBSTITUTE(SUBSTITUTE({English Text}," ",""),"\n","")) + 1

And here it is indented (somewhat quirkily, as I’m still never happy with my attempts to show complex expressions containing multiple sections at he same hierarchical level)"

LEN(
    SUBSTITUTE(
        SUBSTITUTE(
            SUBSTITUTE(
                {English Text},
                "\n",
                " "
                ),
            "  ",
            " "
            ),
        "  ",
        " "
        )
    ) -
  LEN(
    SUBSTITUTE(
        SUBSTITUTE(
            {English Text},
            " ",
            ""
            ),
        "\n",
        ""
        )
    ) + 
  1
1 Like