Split text to columns

Gabriel_Aldamiz · ‎Apr 04, 2020

Hi everyone, I’m new here, and haven’t found the response to my problem anywhere.

One of my columns contains text with several tags. I need to split the text in that column, into several columns in which each column would contain one tag.

Can anyone help with that?

Thanks!!

Justin_Barrett · ‎Apr 04, 2020

Welcome to the community, @Gabriel_Aldamiz-eche! :grinning_face_with_big_eyes: Could you please provide an example of the data you’re working with? Because you’re a new community user, you won’t be able to directly post screenshots, but even if you just give us a quick-and-dirty layout of some of the data from your base, that will help a lot. The more specific, the better!

Gabriel_Aldamiz · ‎Apr 05, 2020

Thanks Justin,

About Input: I receive text which contains several words, and that text goes to Column 1.
Column 1: “travel family paris summer”

Output: I need to convert the text in Column 1 to several columns, one word per column:
Column 2: “travel”
Column 3: “family”
Column 4: “paris”
Column 5: “summer”

Please let me know if I’ve clarified my problem:)

Gabi

Justin_Barrett · ‎Apr 05, 2020

Thanks for the update, Gabi. One further question: are there always going to be four words in the text you receive, or will the number of words vary? If it’s consistent, that makes the solution easier, but it’s probably still possible if the word count changes.

Gabriel_Aldamiz · ‎Apr 05, 2020

For sure there will be a different amount of words. From zero to any small amount, for example, 10 (i’m limiting the number of characters).

Thanks Justin!

Mike_Pennisi · ‎Apr 06, 2020

Hi there, @Gabriel_Aldamiz-eche! I think I understand what you’re trying to do,
so I wrote a script that should help.

Click here to see the code

/**
 * Airtable Field Split Script
 *
 * For each record in a given Airtable Base, read the value of one cell,
 * interpret that value as a space-separated list (e.g. the value "a b c"
 * becomes "a", "b", and "c"), and store each value in a corresponding
 * destination cell.
 *
 * How to adapt this script
 *
 * 1. Replace the value assigned to the `tableName` constant with the name of
 *    the table in your Base
 * 2. Replace the value of the `sourceName` constant with the field which
 *    contains the source value (that is: the value to be split).
 * 3. Replace the value of the `destinationNames` constant with a JavaScript
 *    array of string values, each describing the name of a field to store one
 *    of the split values.
 */
const tableName = 'Split Demo';
const sourceName = 'My Things';
const destinationNames = ['Thing 1', 'Thing 2', 'Thing 3'];

// You shouldn't need to modify the code following this line to use this
// script.

async function main() {
    if (typeof tableName !== 'string') {
        throw new Error('The `tableName` constant must be a JavaScript string value.');
    }
    if (typeof sourceName !== 'string') {
        throw new Error('The `sourceName` constant must be a JavaScript string value.');
    }
    if (!Array.isArray(destinationNames)) {
        throw new Error('The `destinationNames` constant must be a JavaScript array value.');
    }
    if (!destinationNames.length) {
        throw new Error('The `destinationNames` array must have at least one element.');
    }
    if (!destinationNames.every((name) => typeof name === 'string')) {
        throw new Error(
            'Every element of the `destinationNames` array must be a JavaScript string value'
        );
    }

    let table = base.getTable(tableName);
    let result = await table.selectRecordsAsync();

    for (let record of result.records) {
        let values = (record.getCellValueAsString(sourceName) || '').split(/\s+/);

        if (values.length > destinationNames.length) {
            output.markdown(
                `**WARNING**: encountered ${values.length} values, but only ` +
                    `${destinationNames.length} destination fields are available. ` +
                    `(values: ${values})`
            );

            values.length = destinationNames.length;
        }

        // Create an object suitable for use in `updateRecordAsync`. It keys
        // are the names of the "destination" fields as specified at the
        // beginning of this scripts, and its values are the corresponding
        // values from the "source" field.
        //
        // For example, given a "source" field with the text "foo bar" and a
        // set of "destination" fields named "first", "second", and "third",
        // `toUpdate` will be the following JavaScript object:
        //
        //     {
        //       "first": "foo",
        //       "second": "bar",
        //       "third": ""
        //     }
        let toUpdate = destinationNames.reduce((memo, name) => {
            memo[name] = values.shift();
            return memo;
        }, {});

        await table.updateRecordAsync(record, toUpdate);
    }
}

try {
    await main();
} catch (error) {
    output.markdown(`**ERROR**: ${error}`);
}

/**
 * Quality assurance testing plan:
 *
 * 1. Create a table named "Split QA" with the following text fields: "source",
 *    "d1", "d2", and "d3"
 * 2. Populate the table with the following records:
 *
 *    source    | d1 | d2 | d3
 *    ----------|----|----|---
 *              | x  | x  | x
 *    abcde     | x  | x  | x
 *    a bcde    | x  | x  | x
 *    a b cde   | x  | x  | x
 *    a b c de  | x  | x  | x
 *    a b c d e | x  | x  | x
 *
 * 3. Install the Scripting block
 * 4. Insert this source code into the scripting block
 * 5. Modify the `tableName` constant to be `"Split QA"`
 * 6. Update the `sourceName` constant to be `"source"`
 * 7. Update the `destinationNames` constant to be `["d1", "d2", "d3"]`
 * 8. Run the script
 *
 * Expected: every record should be updated, and a message should be printed to
 * the scripting block's output container.
 *
 * Table:
 *
 *    source    | d1    | d2   | d3
 *    ----------|-------|------|---
 *              |       |      |
 *    abcde     | abcde |      |
 *    a bcde    | a     | bcde |
 *    a b cde   | a     | b    | cde
 *    a b c de  | a     | b    | c
 *    a b c d e | a     | b    | c
 *
 * Message:
 *
 *     WARNING: encountered 4 values, but only 3 destination fields are available. (values: a,b,c,de)
 *     WARNING: encountered 5 values, but only 3 destination fields are available. (values: a,b,c,d,e)
 */

You’ll need to install the Scripting Block to use that code. If that’s new for
you, check out Airtable’s guide for how to get started with the Scripting
Block:

Justin_Barrett · ‎May 17, 2020

Sorry for the delayed reply, @Gabriel_Aldamiz-eche. Work in April kept me hopping, and I lost track of this conversation in the process. If you (or anyone else) wants a formula-based solution for this, here’s what I came up with. I only extracted up to 4 words, but as you’ll see below, the process to go beyond that is fairly straightforward:

The {Word Count} formula field calculates how many words have been provided:

IF(Words, LEN(Words) - LEN(SUBSTITUTE(Words, " ", "")) + 1, 0)

The {Words Expanded} field is a little misleading based on how it displays in the table because Airtable displays large blocks of spaces as a single space. Here’s the formula behind it:

SUBSTITUTE(Words, " ", REPT(" ", 20))

And here’s the actual output, using the first record as an example:

first                    second                    third

In short, it replaces the single-space separator between words with a block of 20 spaces. The number 20 is used not only here, but in all the remaining formulas as well. It can be a different number if you wish, but it needs to be significantly larger than the largest single word you expect someone to enter.

The formula for {Word 1} kicks off the whole process of word extraction by taking the leftmost 20 characters of that expanded string—which would be the first word and a bunch of spaces after it—and then trimming off the extra spaces to leave only the word:

IF({Word Count} > 0, TRIM(LEFT({Words Expanded}, 20)))

The formulas for {Word 2} onward are nearly identical to each other because they each refer to the previous extracted word. Here’s {Word 2}:

IF({Word Count} > 1, TRIM(MID({Words Expanded}, 20 + FIND({Word 1}, {Words Expanded}) + LEN({Word 1}), 20)))

{Word 3} looks like this (only minor changes to the formula in {Word 2}:

IF({Word Count} > 2, TRIM(MID({Words Expanded}, 20 + FIND({Word 2}, {Words Expanded}) + LEN({Word 2}), 20)))

…and so on for as many other word fields as you feel you need.

Jesse_Bayer · ‎Oct 28, 2020

This is very helpful but in a current table, we hae some of which has very length sentences. Up to 113 words! Does this mean 113 formulas?
The sentences consist of repeat words (It lists days of the week and times; Thursday 10:00 AM, Sunday 6:00 AM, and so on.) Is there a way to this more efficiently?

Justin_Barrett · ‎Oct 28, 2020

With that many words, a script in the Scripting app (or an automation’s script action) is a far better way to split things up, handle repeat words and punctuation, etc.

However, with a possible word count over 100, my gut question is: do you really want/need 100+ fields, or is there a better way to present the data based on how it will be used by other parts of your process?

Jesse_Bayer · ‎Oct 29, 2020

Hi Justin,
That’s a great question and I don’t want over 100 columns. I should have probably started with the goal first:
Ideally, we would have a multi-select tool to organize days of week and moving forward, the input will be using the multi-select tool on the webform.
This desired formula is to format the existing 893 records from Google Sheets.
We need to separate the days of the week from the string, in one cell, per record. The string is now Day of Week Time. A single record may have up to 7 days of the week: (Sunday 7:30 AM Monday 7:30 AM Tuesday 7:30 AM Wednesday 7:30 AM Saturday 8:00 PM etc.) This is a meeting schedule database for an on-line directory; one type of meeting may be on several days of the week at different times. :upside_down_face: