Solved

Splitting content in long-form text with a formula

February 1, 2023

jm22
New Participant

TIA - this community is amazing and so helpful when I can't get my head around the trickier ones. I am learning more and more, though!

I have a bunch of method steps for our recipes:

COOK THE SALMON
Pat dry salmon with paper towel, remove the skin and cut into bite size pieces, discarding the skin. Add salmon, olives and courgette to the pot. Simmer, gently stirring occasionally, for 5-6mins, or until the salmon is cooked through and all the veggies are tender. Remove from heat. Check seasoning. "

MAKE THE AVOCADO DIP
Cut lemon into wedges.
Halve avocado, remove the stone and scoop out flesh with a spoon. Place avocado into a third medium bowl and mash with fork until smooth. Mix in aioli and (1/2 tbsp/1 tbsp) lemon juice. Season with salt, pepper and more lemon juice to taste. Set aside.

TIP: If you prefer, add avocado to the salad in STEP 5 and serve aioli on the side. "

MAKE THE HERBY VINAIGRETTE
Finely chop (half/whole) pack coriander, including the stalks. Add to small bowl with (1 tsp/2 tsp) honey, (2 tbsp/4 tbsp) olive oil and (1 tbsp/2 tbsp) white wine vinegar. Stir together until combined. Season to taste with salt and pepper. Set aside.

TIP: You may use your preferred sweetener instead of honey."

You can see they are all different.

I want to have one column for the heading (in caps) and another column for the method directions.

Do you think I'll have any joy with this one?

thanks!

Justin_Barrett
Inspiring
Answer
February 1, 2023

As long as your heading is always on a single line (not spanning multiple lines using line breaks), then this will do the trick.

First formula field: "Heading"

IF(Recipe, REGEX_EXTRACT(Recipe, "[^\n]*"))

Second formula field: "Recipe Body"

IF(Recipe, TRIM(SUBSTITUTE(Recipe, Heading, "")))
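For anyone who wants to sanity-check the logic outside Airtable, here is a minimal Python sketch of the same two steps. It is not Airtable code: the function name split_recipe and the sample text are made up for illustration, and str.replace stands in for SUBSTITUTE (both remove every occurrence of the heading, not only the first).

```python
import re

def split_recipe(recipe):
    """Mimic the two formula fields: first line as heading, the rest as body."""
    if not recipe:
        return "", ""
    # "[^\n]*" matches everything up to, but not including, the first line break
    heading = re.match(r"[^\n]*", recipe).group(0)
    # Remove the heading text, then trim the leftover leading line break
    body = recipe.replace(heading, "").strip()
    return heading, body

sample = "COOK THE SALMON\nPat dry salmon with paper towel.\nRemove from heat."
heading, body = split_recipe(sample)
print(heading)
print(body)
```

One thing the sketch makes visible: if the heading text also appears word for word inside the body, that copy is removed too.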

jm22
Author
New Participant
February 7, 2023

Worked like a total charm! thanks!

CSM_PF
Participating Frequently
February 23, 2024

Is it possible to do the same thing across several columns?

I'd like to put each line in a different column.

Justin_Barrett
Inspiring
February 23, 2024

@CSM_PF Is this what you're looking for?

If so, that's pretty easy to set up. First add a formula field to count the number of line breaks in the source text (I'll explain the purpose behind this later):

LEN(Input) - LEN(SUBSTITUTE(Input, "\n", ""))

I named this field "Item Count", but feel free to name it whatever you want as long as the later formula fields reference it correctly.
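The counting trick is just "length before minus length after removing the line breaks". A small Python sketch of the same arithmetic, with a hypothetical three-line input:

```python
def count_line_breaks(text):
    # Each removed "\n" shortens the text by exactly one character,
    # so the difference in length equals the number of line breaks
    return len(text) - len(text.replace("\n", ""))

sample = "Ben\ntest-bug\nBudget"  # hypothetical input with three lines
print(count_line_breaks(sample))  # 2
```

Three lines give a count of 2, which is why the later checks compare against 0, 1, 2 and so on.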

The first extraction formula (Output 1) is the easiest:

IF(
  Input,
  REGEX_EXTRACT(Input, "[^\n]*")
)

This basically gets everything up to—but not including—the first line break, but only if the field isn't empty. If the input is empty, this field will also be empty.

From that point on, we're going to use the previous extracted text to find the next piece to extract following a consistent pattern.

Output 2 looks like this:

IF(
  {Item Count} > 0,
  REGEX_EXTRACT(Input, "(?:" & {Output 1} & "\n)([^\n]*)")
)

This basically says that if we've got at least one line break, find—but ignore—the first extracted text and the line break following it, then extract everything up to—but not including—the next line break.
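Here is roughly what that pattern does, sketched in Python rather than Airtable. The sample text is hypothetical, and match.group(1) plays the role of the captured part that the formula returns.

```python
import re

text = "Ben\ntest-bug\nBudget"  # hypothetical three-line input

# Output 1: everything before the first line break
output_1 = re.match(r"[^\n]*", text).group(0)

# Output 2: skip Output 1 plus its line break (non-capturing group),
# then capture everything up to the next line break
match = re.search("(?:" + output_1 + "\n)([^\n]*)", text)
output_2 = match.group(1) if match else None

print(output_1, "|", output_2)
```

Because the previous line is spliced into the pattern as-is, a line containing regex metacharacters such as "(" or "." would be read as pattern syntax. In Python, wrapping it in re.escape() avoids that; I have not checked what the equivalent would be in Airtable.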

Output 3 does the same thing, but increasing the line break count check and referencing Output 2:

IF(
  {Item Count} > 1,
  REGEX_EXTRACT(Input, "(?:" & {Output 2} & "\n)([^\n]*)")
)

Repeat this pattern for as many fields as you need.

As for why we do the line count in the first place, this is the easiest way to prevent an error if the REGEX_EXTRACT() function can't find a given chunk. For example, this accounts for situations where one record might have 5 pieces to extract, and another only 3, and prevents those later fields from showing #ERROR in their output.
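To illustrate the guard, here is a Python sketch of the whole chain, assuming plain text with no regex metacharacters and no repeated lines. The function name chained_outputs is invented for this example; None stands in for an empty field.

```python
import re

def chained_outputs(text, max_fields):
    """Extract one line per field, leaving later fields empty when lines run out."""
    item_count = len(text) - len(text.replace("\n", ""))
    results = []
    previous = None
    for i in range(max_fields):
        if i == 0:
            # Output 1: only filled when the input is not empty
            value = re.match(r"[^\n]*", text).group(0) if text else None
        elif item_count > i - 1:
            # Output 2 checks > 0, Output 3 checks > 1, and so on
            m = re.search("(?:" + previous + "\n)([^\n]*)", text)
            value = m.group(1) if m else None
        else:
            # Not enough line breaks: skip the regex entirely
            value = None
        results.append(value)
        previous = value
    return results

print(chained_outputs("a\nb\nc", 5))
```

With three lines and five fields, the last two fields are skipped by the count check instead of running a pattern that cannot match.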

Justin_Barrett
Inspiring
February 23, 2024

Literally seconds after posting the above solution, I realized that there's a slight problem with it. In your sample text, you have two lines that both contain "Test again". That's going to throw off the pattern matching when it comes to finding the second of those two lines, and all the lines that come after them. Let me see if I can quickly find a workaround (I'm short on time this morning).
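The problem is easy to reproduce in Python with a hypothetical four-line input where lines 2 and 3 are identical:

```python
import re

text = "Budget\nTest again\nTest again\nModerate"

# Output 4 is built from Output 3 ("Test again"), but that text
# first appears on line 2, so the search anchors there instead
output_3 = "Test again"
found = re.search("(?:" + output_3 + "\n)([^\n]*)", text).group(1)

print(found)  # Test again (the line we wanted was "Moderate")
```

The search always stops at the first place the previous text appears, so every field after the duplicate ends up one step behind.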

CSM_PF
Participating Frequently
February 23, 2024

Don't worry, there will normally never be identical lines. It was just an example!

CSM_PF
Participating Frequently
February 23, 2024

I'll try your solution straight away!

Justin_Barrett
Inspiring
February 23, 2024

Okay, I found a way to make it work. It's not as clean as the first version, but it's accurate, which is more important.

The first two formulas for Output 1 and Output 2 stay the same. From that point on, you need to add all previous outputs and their separating newlines into the find-but-ignore part. Here are the formulas for Output 3 and Output 4 as examples:

IF(
  {Item Count} > 1,
  REGEX_EXTRACT(Input, "(?:" & {Output 1} & "\n" & {Output 2} & "\n)([^\n]*)")
)

IF(
  {Item Count} > 2,
  REGEX_EXTRACT(Input, "(?:" & {Output 1} & "\n" & {Output 2} & "\n" & {Output 3} & "\n)([^\n]*)")
)

As you can see, the formula will become longer with each successive field, but it gets the job done.
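A direct Python translation of those two formulas, run against a hypothetical input with a repeated line, shows why the longer prefix helps. Output 1 and Output 2 are written in by hand here to keep the sketch short.

```python
import re

text = "Budget\nTest again\nTest again\nModerate"
output_1 = "Budget"
output_2 = "Test again"

# Output 3: ignore Output 1 and Output 2 (with their line breaks), capture the next line
output_3 = re.search(
    "(?:" + output_1 + "\n" + output_2 + "\n)([^\n]*)", text
).group(1)

# Output 4: the ignored prefix now also includes Output 3
output_4 = re.search(
    "(?:" + output_1 + "\n" + output_2 + "\n" + output_3 + "\n)([^\n]*)", text
).group(1)

print(output_3, "|", output_4)
```

Since the ignored part always spells out the text from the very beginning, a repeated line can no longer pull the match back to an earlier position.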

CSM_PF
Participating Frequently
February 23, 2024

Thanks @Justin_Barrett it works.

Could you tell me the right formula to split column A: Ben#///#test-bug#///#Feb 22, 2024 3:54pm UTC#///#Budget#///#Test again#///#Test again#///#Moderate into the following columns?

Column B : Ben
Column C : test-bug
Column D : Feb 22, 2024 3:54pm UTC
Column E : Budget
Column F : Test again
Column G : Test again
Column H : Moderate

Thanks for your help

Justin_Barrett
Inspiring
February 24, 2024

@CSM_PF The solution for that isn't much different than my originals above.

For the item count field, the formula changes to this. Because the separators are five characters long, the result needs to be divided by 5:

(LEN(Input) - LEN(SUBSTITUTE(Input, "#///#", ""))) / 5
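Worked through in Python on the sample string from the question, the arithmetic looks like this:

```python
sample = (
    "Ben#///#test-bug#///#Feb 22, 2024 3:54pm UTC#///#Budget"
    "#///#Test again#///#Test again#///#Moderate"
)
sep = "#///#"

# Removing six 5-character separators shortens the text by 30 characters
removed = len(sample) - len(sample.replace(sep, ""))
item_count = removed // len(sep)

print(removed, item_count)  # 30 6
```

Seven sections are joined by six separators, so the count comes out as 6.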

If you can guarantee that each line's content will be unique, you can use the original versions in my first reply. Here's the modified version of the formula for Output 1:

IF(
  Input,
  REGEX_EXTRACT(Input, "[^#]*")
)

Output 2 now looks like this:

IF(
  {Item Count} > 0,
  REGEX_EXTRACT(Input, "(?:" & {Output 1} & "#///#)([^#]*)")
)

The rest of the pattern should be easy to figure out.

If you can't guarantee uniqueness among the different sections, then you'll have to use the second version with similar modifications.
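As a rough check of that second version with the "#///#" separator, here is a Python sketch run on the sample string from the question. The function name split_sections is made up, and re.escape() is a Python-side safety measure that the formulas in this thread do not have.

```python
import re

def split_sections(text, sep, max_fields):
    """Cumulative-prefix extraction: each field ignores all earlier sections."""
    item_count = (len(text) - len(text.replace(sep, ""))) // len(sep)
    outputs = []
    for i in range(max_fields):
        if i == 0:
            # Output 1: everything before the first "#"
            value = re.match(r"[^#]*", text).group(0) if text else None
        elif item_count > i - 1 and None not in outputs:
            # Ignore every earlier section plus its separator, capture the next one
            prefix = "".join(re.escape(p) + re.escape(sep) for p in outputs)
            m = re.search("(?:" + prefix + ")([^#]*)", text)
            value = m.group(1) if m else None
        else:
            value = None
        outputs.append(value)
    return outputs

sample = (
    "Ben#///#test-bug#///#Feb 22, 2024 3:54pm UTC#///#Budget"
    "#///#Test again#///#Test again#///#Moderate"
)
for column in split_sections(sample, "#///#", 8):
    print(column)
```

Both "Test again" sections and the final "Moderate" come out in the right positions, and the eighth field stays empty because there are only six separators.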

CSM_PF
Participating Frequently
February 25, 2024

@Justin_Barrett it works perfectly except for the last data field in my example - Column H: Moderate - do you know why?

CSM_PF
Participating Frequently
February 25, 2024

This returns the same data as the previous column, i.e. "Test Again".

CSM_PF
Participating Frequently
February 25, 2024

I found a solution for the last column with the following formula:

IF( Input, REGEX_EXTRACT(Input, "[^#]*$") )

Justin_Barrett
Inspiring
February 25, 2024

@CSM_PF Glad that you found a solution!

As to why it repeated, that's likely because of the duplicate data as I said before. If you have two sections both with the same content—e.g. "Test Again"—the first version won't work. You'll need to use the second version because it progressively adds each previous section to the "to ignore" part of the regular expression. Your solution gets around that by specifically targeting what's at the end of the string after the final "#" symbol.
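The end-anchored pattern can be tried in Python on the same sample string. The "$" ties the match to the end of the text, so only the run of non-"#" characters after the last separator qualifies:

```python
import re

sample = (
    "Ben#///#test-bug#///#Feb 22, 2024 3:54pm UTC#///#Budget"
    "#///#Test again#///#Test again#///#Moderate"
)

# "[^#]*$" = as many non-"#" characters as possible, ending at the end of the string
last_section = re.search(r"[^#]*$", sample).group(0)

print(last_section)  # Moderate
```

This only ever returns the final section, so it suits the last column but not the ones in between.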
