Data mining - Trying to create a sheet with keywords (rows) that show how many times they appear and where

I have a decently large data set, and I am trying to mine from it information to see what has value. On that I am trying to create a new sheet that has a list of keywords. I would like to then have a column next to each keyword that displays how many times that word appears in my main sheet (or a single column).

I have seen several topics that cover something similar but nothing that seems to fit the bill.

Can anyone help? Thanks!

Yep, this is not really practical in formulas or roll-ups unless it is a very limited set of terms, so that leaves you with two options -

  1. Block Scripts (that are run manually, but internal to the base)
  2. API-based Apps (that are run automatically, and external to the base)

In each case, javascript is generally powerful enough to extract the keywords and organize them into new data tables/records or even dashboards. However, the biggest challenge is understanding the definition of a keyword.

Imagine all these other challenges were solved - describe what a “keyword” is and the nature of the content you need to extract such keywords from. An example would be great to see.

Thanks for the quick response! :slight_smile:

The keywords I have. Want to create a sheet where I can add a change the list of Keywords and then have the table update with the number of appearances of that keyword. So for example

Column A (Keywords) Column B (appearances)
Acme 1250
Beta 900
Charlie 650
etc…

Okay, good to know - two more questions:

  1. Manual or automated?
  2. Example of the content being searched for the existence of keywords? Is it a rich-text field? A Long string? Attached document?

Keywords are 100% manual. But would prefer the number of appearances to be automated. So add in new keyword via copy and past (or type) and column B out puts the number of times that keyword is mentioned. :slight_smile:

Right now the list of Keywords are single string with 90% being a single word. I would be happy with something that only allows single words. But short phrases would be idea.

Not an issue - the list of strings could be single words of a string of words; looking for existence is a simple matter of counting string (a) inside content (b).

Okay - so now I’m confused. Earlier you said there was a list of keywords and phrases that needed to be tested against each content item presumably in a list of rows where the content being tested is a text field? rich-text field? An attached field?

Now you’re saying something a little different - enter a keyword in col B and test against another [text?] field on that same row?

The former requires a program; the latter requires only a formula field.

Sorry that I wasn’t clear.

First to answer your question: Plain Text field (not rich text)

Second, I think we were on the same page first. I have a sheet that has a long list of rows of data. In that data we have 2 columns which have the keywords I want to count. So for reference Lets call it a 3 column table. First column A has the record identifier (serial #), Column B has a summary (short string of plain text) Column C has a long summary (long string of plain text).

Now ideally I would like to create a new sheet that has 2 Columns (Sheet 2). Column A has a list of Keywords. These are not pulled from that data set. I enter them in to see if the data set has any mentions (call it a search). Column B (sheet 2) to display a number of instances that the keyword from the same row in Sheet 2 appears in Sheet 1 under column A and B combined.

Is that clearer? And huge thanks!

1 Like

Yep - much better. We’re dialing it in.

Okay - I have a few ideas - will need some time to formulate an approach, but I’ll circle back in a while.

Your amazing, where do I send the beer?

1 Like

Not so fast - I haven’t actually solved this riddle yet. :wink:

Is this sort’a what you’re looking for?

Sorry about the delay, the notifier email went to my spam folder. That is exactly what I am looking for!!!

Okay - good to hear!

Share with me (in a direct message) your Airtable account email address and I’ll share the Base to you to examine the analytics. Once you confirm this is what your goal is, I’ll drop the script block to you, install it into the base and add more content and keywords to further test.

That’s better then what I was looking for!

Hey,

This is amazing and exaclty the stort of thing that I’m trying to do. Is there any way that you would be able to share it with me please?

Thanks

This is also the exact solution I am looking for…Any chance of getting a copy of the script block?

I’d like to know too if this is available and where.

@Kay_Aleksic, @Michael_Tranchina - here’s the script.

/*

   ***********************************************************
   ScriptBloqs - Datamining Text
   Copyright (c) 2020 by Global Technologies Corporation
   ALL RIGHTS RESERVED
   ***********************************************************
   
*/

//
// get the table names and records
//
output.markdown('# Data Mining');
output.markdown('## Updating Metrics...');

// Load all of the terms records in the keywords and phrases table
let termsTable      = base.getTable("Keywords & Phrases");
let queryResults    = await termsTable.selectRecordsAsync();
let oTermsRecords   = queryResults.records;

// Load all of the content records in the table
let contentTable    = base.getTable("Content");
queryResults        = await contentTable.selectRecordsAsync();
let oContentRecords = queryResults.records;

// process the terms list
let oSourceGrid  = {};
let oResultsGrid = {};
let i = 0;
for (let record of oTermsRecords)
{

    var thisTerm = record.getCellValue("Key Terms").toString().toLowerCase();
    // output.inspect(thisTerm);

    let summaryCount   = 0;
    let contentCount   = 0;
    let inBothCount    = 0;
    let aSerialNumbers = [];

    for (let record of oContentRecords)
    {

        var thisSerialNumber = record.getCellValue("Serial #");
        var thisSummary      = record.getCellValue("Summary");
        var thisContent      = record.getCellValue("Content");

        var inSummary = (thisSummary.toString().toLowerCase().indexOf(thisTerm) > -1) ? true : false;
        var inContent = (thisContent.toString().toLowerCase().indexOf(thisTerm) > -1) ? true : false;

        if (inSummary)
            summaryCount ++;

        if (inContent)
            contentCount ++;

        if ((inSummary) && (inContent))
            inBothCount ++;

        if ((inSummary) || (inContent))
          aSerialNumbers.push(thisSerialNumber);

        oSourceGrid[thisSerialNumber] = {
            "Summary"       : thisSummary,
            "Content"       : thisContent
        }


    }

    oResultsGrid[i] = {
        "Key Term"       : thisTerm,
        "In Both Count"  : inBothCount,
        "Summary Count"  : summaryCount,
        "Content Count"  : contentCount,
        "Serial Numbers" : aSerialNumbers.toString()
    }

    output.text("Creating entry for " + thisTerm + "!");
    await termsTable.updateRecordAsync(record.id, {
        "In Both Count"  : inBothCount,
        "Summary Count"  : summaryCount,
        "Content Count"  : contentCount,
        "Serial Numbers" : aSerialNumbers.toString()
    })

    i ++;

}

output.clear();
output.markdown('# Data Mining');
output.markdown('## Metrics Updated');

output.markdown('### Content');
output.table(oSourceGrid);

output.markdown('### Metrics');
output.table(oResultsGrid);