Help

Re: Split text to columns

7555 1
cancel
Showing results for 
Search instead for 
Did you mean: 
Gabriel_Aldamiz
4 - Data Explorer
4 - Data Explorer

Hi everyone, I’m new here, and haven’t found the response to my problem anywhere.

One of my columns contains text with several tags. I need to split the text in that column, into several columns in which each column would contain one tag.

Can anyone help with that?

Thanks!!

19 Replies 19

Welcome to the community, @Gabriel_Aldamiz-eche! :grinning_face_with_big_eyes: Could you please provide an example of the data you’re working with? Because you’re a new community user, you won’t be able to directly post screenshots, but even if you just give us a quick-and-dirty layout of some of the data from your base, that will help a lot. The more specific, the better!

Thanks Justin,

About Input: I receive text which contains several words, and that text goes to Column 1.
Column 1: “travel family paris summer”

Output: I need to convert the text in Column 1 to several columns, one word per column:
Column 2: “travel”
Column 3: “family”
Column 4: “paris”
Column 5: “summer”

Please let me know if I’ve clarified my problem:)

Gabi

Thanks for the update, Gabi. One further question: are there always going to be four words in the text you receive, or will the number of words vary? If it’s consistent, that makes the solution easier, but it’s probably still possible if the word count changes.

Gabriel_Aldamiz
4 - Data Explorer
4 - Data Explorer

For sure there will be a different amount of words. From zero to any small amount, for example, 10 (i’m limiting the number of characters).

Thanks Justin!

Hi there, @Gabriel_Aldamiz-eche! I think I understand what you’re trying to do,
so I wrote a script that should help.

Click here to see the code
/**
 * Airtable Field Split Script
 *
 * For each record in a given Airtable Base, read the value of one cell,
 * interpret that value as a space-separated list (e.g. the value "a b c"
 * becomes "a", "b", and "c"), and store each value in a corresponding
 * destination cell.
 *
 * How to adapt this script
 *
 * 1. Replace the value assigned to the `tableName` constant with the name of
 *    the table in your Base
 * 2. Replace the value of the `sourceName` constant with the field which
 *    contains the source value (that is: the value to be split).
 * 3. Replace the value of the `destinationNames` constant with a JavaScript
 *    array of string values, each describing the name of a field to store one
 *    of the split values.
 */
const tableName = 'Split Demo';
const sourceName = 'My Things';
const destinationNames = ['Thing 1', 'Thing 2', 'Thing 3'];

// You shouldn't need to modify the code following this line to use this
// script.

async function main() {
    if (typeof tableName !== 'string') {
        throw new Error('The `tableName` constant must be a JavaScript string value.');
    }
    if (typeof sourceName !== 'string') {
        throw new Error('The `sourceName` constant must be a JavaScript string value.');
    }
    if (!Array.isArray(destinationNames)) {
        throw new Error('The `destinationNames` constant must be a JavaScript array value.');
    }
    if (!destinationNames.length) {
        throw new Error('The `destinationNames` array must have at least one element.');
    }
    if (!destinationNames.every((name) => typeof name === 'string')) {
        throw new Error(
            'Every element of the `destinationNames` array must be a JavaScript string value'
        );
    }

    let table = base.getTable(tableName);
    let result = await table.selectRecordsAsync();

    for (let record of result.records) {
        let values = (record.getCellValueAsString(sourceName) || '').split(/\s+/);

        if (values.length > destinationNames.length) {
            output.markdown(
                `**WARNING**: encountered ${values.length} values, but only ` +
                    `${destinationNames.length} destination fields are available. ` +
                    `(values: ${values})`
            );

            values.length = destinationNames.length;
        }

        // Create an object suitable for use in `updateRecordAsync`. It keys
        // are the names of the "destination" fields as specified at the
        // beginning of this scripts, and its values are the corresponding
        // values from the "source" field.
        //
        // For example, given a "source" field with the text "foo bar" and a
        // set of "destination" fields named "first", "second", and "third",
        // `toUpdate` will be the following JavaScript object:
        //
        //     {
        //       "first": "foo",
        //       "second": "bar",
        //       "third": ""
        //     }
        let toUpdate = destinationNames.reduce((memo, name) => {
            memo[name] = values.shift();
            return memo;
        }, {});

        await table.updateRecordAsync(record, toUpdate);
    }
}

try {
    await main();
} catch (error) {
    output.markdown(`**ERROR**: ${error}`);
}

/**
 * Quality assurance testing plan:
 *
 * 1. Create a table named "Split QA" with the following text fields: "source",
 *    "d1", "d2", and "d3"
 * 2. Populate the table with the following records:
 *
 *    source    | d1 | d2 | d3
 *    ----------|----|----|---
 *              | x  | x  | x
 *    abcde     | x  | x  | x
 *    a bcde    | x  | x  | x
 *    a b cde   | x  | x  | x
 *    a b c de  | x  | x  | x
 *    a b c d e | x  | x  | x
 *
 * 3. Install the Scripting block
 * 4. Insert this source code into the scripting block
 * 5. Modify the `tableName` constant to be `"Split QA"`
 * 6. Update the `sourceName` constant to be `"source"`
 * 7. Update the `destinationNames` constant to be `["d1", "d2", "d3"]`
 * 8. Run the script
 *
 * Expected: every record should be updated, and a message should be printed to
 * the scripting block's output container.
 *
 * Table:
 *
 *    source    | d1    | d2   | d3
 *    ----------|-------|------|---
 *              |       |      |
 *    abcde     | abcde |      |
 *    a bcde    | a     | bcde |
 *    a b cde   | a     | b    | cde
 *    a b c de  | a     | b    | c
 *    a b c d e | a     | b    | c
 *
 * Message:
 *
 *     WARNING: encountered 4 values, but only 3 destination fields are available. (values: a,b,c,de)
 *     WARNING: encountered 5 values, but only 3 destination fields are available. (values: a,b,c,d,e)
 */

You’ll need to install the Scripting Block to use that code. If that’s new for
you, check out Airtable’s guide for how to get started with the Scripting
Block:

Sorry for the delayed reply, @Gabriel_Aldamiz-eche. Work in April kept me hopping, and I lost track of this conversation in the process. If you (or anyone else) wants a formula-based solution for this, here’s what I came up with. I only extracted up to 4 words, but as you’ll see below, the process to go beyond that is fairly straightforward:

Screen Shot 2020-05-16 at 11.37.49 PM

The {Word Count} formula field calculates how many words have been provided:

IF(Words, LEN(Words) - LEN(SUBSTITUTE(Words, " ", "")) + 1, 0)

The {Words Expanded} field is a little misleading based on how it displays in the table because Airtable displays large blocks of spaces as a single space. Here’s the formula behind it:

SUBSTITUTE(Words, " ", REPT(" ", 20))

And here’s the actual output, using the first record as an example:

first                    second                    third

In short, it replaces the single-space separator between words with a block of 20 spaces. The number 20 is used not only here, but in all the remaining formulas as well. It can be a different number if you wish, but it needs to be significantly larger than the largest single word you expect someone to enter.

The formula for {Word 1} kicks off the whole process of word extraction by taking the leftmost 20 characters of that expanded string—which would be the first word and a bunch of spaces after it—and then trimming off the extra spaces to leave only the word:

IF({Word Count} > 0, TRIM(LEFT({Words Expanded}, 20)))

The formulas for {Word 2} onward are nearly identical to each other because they each refer to the previous extracted word. Here’s {Word 2}:

IF({Word Count} > 1, TRIM(MID({Words Expanded}, 20 + FIND({Word 1}, {Words Expanded}) + LEN({Word 1}), 20)))

{Word 3} looks like this (only minor changes to the formula in {Word 2}:

IF({Word Count} > 2, TRIM(MID({Words Expanded}, 20 + FIND({Word 2}, {Words Expanded}) + LEN({Word 2}), 20)))

…and so on for as many other word fields as you feel you need.

Jesse_Bayer
4 - Data Explorer
4 - Data Explorer

This is very helpful but in a current table, we hae some of which has very length sentences. Up to 113 words! Does this mean 113 formulas?
The sentences consist of repeat words (It lists days of the week and times; Thursday 10:00 AM, Sunday 6:00 AM, and so on.) Is there a way to this more efficiently?

With that many words, a script in the Scripting app (or an automation’s script action) is a far better way to split things up, handle repeat words and punctuation, etc.

However, with a possible word count over 100, my gut question is: do you really want/need 100+ fields, or is there a better way to present the data based on how it will be used by other parts of your process?

Hi Justin,
That’s a great question and I don’t want over 100 columns. I should have probably started with the goal first:
Ideally, we would have a multi-select tool to organize days of week and moving forward, the input will be using the multi-select tool on the webform.
This desired formula is to format the existing 893 records from Google Sheets.
We need to separate the days of the week from the string, in one cell, per record. The string is now Day of Week Time. A single record may have up to 7 days of the week: (Sunday 7:30 AM Monday 7:30 AM Tuesday 7:30 AM Wednesday 7:30 AM Saturday 8:00 PM etc.) This is a meeting schedule database for an on-line directory; one type of meeting may be on several days of the week at different times. :upside_down_face:

Thanks for the breakdown. Just to make sure I’m clear, you want to start with a field containing something like this:

Sunday 7:30 AM Monday 7:30 AM Tuesday 7:30 AM Wednesday 7:30 AM Saturday 8:00 PM

…and use a formula to give you this:

Sunday Monday Tuesday Wednesday Saturday

Is that correct?

If so, here’s a follow-up question: will there ever be a case where a given weekday will repeat? For example, could there be more than one meeting on Sunday? If so, would you want to see “Sunday Sunday” or just “Sunday”?

Another question: could these meetings take place at virtually any time (i.e. any time between, say, 7 am and 10 pm), or do they only occur in more distinct blocks (e.g. between 7 and 10 am, and between 7 and 10 pm)?

Jesse_Bayer
4 - Data Explorer
4 - Data Explorer

Hi Justin, your first question gave me some pause regarding the future of this field. We currently have set no limit on the amount of entries for this field; to answer you question, the potential permutations for weekdays is 10080. This would never happen, but theoretically every minute of every day * 7 days a week. I’ve looked at how other similar sites have handled this and they have limited the initial meeting entry to only 7 options. Even still, we allow for “updates” to an existing entry and, again, there is no limit. :man_shrugging: Regarding time “blocks” there are none. It’s only start time.

Thanks for the update. I was hoping to find a more defined pattern that would make the formula easier to create, but unfortunately I don’t see that based on your description.

I think you misunderstood my question. I wasn’t asking about defining time blocks for the events. I was asking if the events could start at any time of day, or if you’re only allowing them to start within certain windows of time (windows is probably a better word than blocks for this case). The reason that I asked about time windows is because the method I’d use involves systematically removing specific pieces like numbers. If the event start times are only allowed within certain ranges—e.g. an event can’t start earlier than 7am or later than 10am—and perhaps at certain intervals, that would narrow down the list of numbers we have to replace. It doesn’t sound like that will work, so we’ll just have to replace them all.

With that in mind, here’s the formula (be sure to scroll the box to see the full thing):

SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(Input, 'AM', '')
, 'PM', '')
, ':', '')
, '0', '')
, '1', '')
, '2', '')
, '3', '')
, '4', '')
, '5', '')
, '6', '')
, '7', '')
, '8', '')
, '9', '')
, '  ', ' ')
, '  ', ' ')

Screen Shot 2020-11-05 at 9.06.03 PM

Hi @Mike_Pennisi! I can’t thank you enough for this code that you provided. I can make it run great in a script that I manually run, but when I go to put it in an automation for the exact same table and fields, I get an error for the very last part: output.markdown(**ERROR**: ${error}).

TypeError: Invalid arguments passed to recordQueryResult.getRecord(recordId):
• recordId should be a string, not undefined

The automation I want to setup is that any new record has this script run on it. Could that be the issue in that it’s not trying to do the whole table, and instead, just 1 record within the table?

Freggin Life Saver! Thank you!!!

Welcome to the community, @Monica_Spaunhorst1! :grinning_face_with_big_eyes: I appreciate the nod, but that solution is only as good as the formula functions that were available when it was made. Regular expression functions were added more recently and would be a more efficient way to pull this off. I don’t have time to outline the specifics, but the solution that I shared in the thread below could be modified for a use case where the separators are spaces instead of line breaks.

Ryan_de_Metz
5 - Automation Enthusiast
5 - Automation Enthusiast

@Justin_Barrett a follow-up question here if I may.

I have a string of (max) 3 product IDs that get pulled in via Zapier for product IDs as well as the cost for each ID that gets purchased. Up to now, I used a formula to split them as left and right as I only had a max of 2 records. In some cases, only 1 or 2 records will be passed and can be unique each time.

The sample of the data that is sent via Zapier currently looks like

378118,378113 and 597.00,200.00

The formula I am using for splitting the B data set is;

LEFT( RIGHT({Product ID purchased}, LEN({Product ID purchased}) - LEN({Product ID A}) - 1 ), FIND( ",", RIGHT({Product ID purchased}, LEN({Product ID purchased}) - LEN({Product ID A}) - 1 ) & "," )-1 )

I assume I need to use a MID function to get the middle data out but seem to be spinning my heels here. Can you help me out of my misery?

Perhaps. When you get three IDs, is it just something like this:

378118,378113,378121

…or is there more to it?

Yes, that’s how they go and will only be numbers for that data. The price data will also only be numbers but will include .00 as well.

Hey Mike, 

Would it be possible to split using commas instead of space? 

Thank you!