splitting content in long form text in a formula

jm22 · ‎Jan 31, 2023

TIA - this community is amazing and so helpful when I can't get my head around these more tricky ones. but I am learning more and more!

I have a bunch of method steps for our recipes:

COOK THE SALMON
Pat dry salmon with paper towel, remove the skin and cut into bite size pieces, discarding the skin. Add salmon, olives and courgette to the pot. Simmer, gently stirring occasionally, for 5-6mins, or until the salmon is cooked through and all the veggies are tender. Remove from heat. Check seasoning. "

MAKE THE AVOCADO DIP
Cut lemon into wedges.
Halve avocado, remove the stone and scoop out flesh with a spoon. Place avocado into a third medium bowl and mash with fork until smooth. Mix in aioli and (1/2 tbsp/1 tbsp) lemon juice. Season with salt, pepper and more lemon juice to taste. Set aside.

TIP: If you prefer, add avocado to the salad in STEP 5 and serve aioli on the side. "

MAKE THE HERBY VINAIGRETTE
Finely chop (half/whole) pack coriander, including the stalks. Add to small bowl with (1 tsp/2 tsp) honey, (2 tbsp/4 tbsp) olive oil and (1 tbsp/2 tbsp) white wine vinegar. Stir together until combined. Season to taste with salt and pepper. Set aside.

TIP: You may use your preferred sweetener instead of honey."

You can see they are all different.

I want to have one column for the heading (in caps) and another column for the method directions.

Do you think I'll have any joy with this one?

thanks!

Justin_Barrett · ‎Jan 31, 2023

As long as your heading is always on a single line (not spanning multiple lines using line breaks), then this will do the trick.

First formula field: "Heading"

IF(Recipe, REGEX_EXTRACT(Recipe, "[^\n]*"))

Second formula field: "Recipe Body"

IF(Recipe, TRIM(SUBSTITUTE(Recipe, Heading, "")))

See Solution in Thread

Justin_Barrett · ‎Jan 31, 2023

As long as your heading is always on a single line (not spanning multiple lines using line breaks), then this will do the trick.

First formula field: "Heading"

IF(Recipe, REGEX_EXTRACT(Recipe, "[^\n]*"))

Second formula field: "Recipe Body"

IF(Recipe, TRIM(SUBSTITUTE(Recipe, Heading, "")))

jm22 · ‎Feb 07, 2023

Worked like a total charm! thanks!

CSM_PF · ‎Feb 23, 2024

Is it possible to make the same thing on different columns

I'd like to put a line in a different column each time

Justin_Barrett · ‎Feb 23, 2024

@CSM_PF Is this what you're looking for?

If so, that's pretty easy to set up. First add a formula field to count the number of line breaks in the source text (I'll explain the purpose behind this later):

LEN(Input) - LEN(SUBSTITUTE(Input, "\n", ""))

I named this field "Input Count", but feel free to name it whatever you want as long as the later formula fields reference it correctly.

The first extraction formula (Output 1) is the easiest:

IF(
  Input,
  REGEX_EXTRACT(Input, "[^\n]*")
)

This basically gets everything up to—but not including—the first line break, but only if the field isn't empty. If the input is empty, this field will also be empty.

From that point on, we're going to use the previous extracted text to find the next piece to extract following a consistent pattern.

Output 2 looks like this:

IF(
  {Item Count} > 0,
  REGEX_EXTRACT(Input, "(?:" & {Output 1} & "\n)([^\n]*)")
)

This basically says that if we've got at least one line break, find—but ignore—the first extracted text and the line break following it, then extract everything up to—but not including—the next line break.

Output 3 does the same thing, but increasing the line break count check and referencing Output 2:

IF(
  {Item Count} > 1,
  REGEX_EXTRACT(Input, "(?:" & {Output 2} & "\n)([^\n]*)")
)

Repeat this pattern for as many fields as you need.

As for why we do the line count in the first place, this is the easiest way to prevent an error if the REGEX_EXTRACT() function can't find a given chunk. For example, this accounts for situations where one record might have 5 pieces to extract, and another only 3, and prevents those later fields from showing #ERROR in their output.

Justin_Barrett · ‎Feb 23, 2024

Literally seconds after posting the above solution, I realized that there's a slight problem with it. In your sample text, you have two lines that both contain "Test again". That's going to throw off the pattern matching when it comes to finding the second of those two lines, and all the lines that come after them. Let me see if I can quickly find a workaround (I'm short on time this morning).

CSM_PF · ‎Feb 23, 2024

Don't worry, there will never be equivalent lines normally. It was just an example!

CSM_PF · ‎Feb 23, 2024

I'll try your solution straight away!

Justin_Barrett · ‎Feb 23, 2024

Okay, I found a way to make it work. It's not as clean as the first version, but it's accurate, which is more important.

The first two formulas for Output 1 and Output 2 stay the same. From that point on, you need to add all previous outputs and their separating newlines into the find-but-ignore part. Here are the formulas for Output 3 and Output 4 as examples:

IF(
  {Item Count} > 1,
  REGEX_EXTRACT(Input, "(?:" & {Output 1} & "\n" & {Output 2} & "\n)([^\n]*)")
)

IF(
  {Item Count} > 2,
  REGEX_EXTRACT(Input, "(?:" & {Output 1} & "\n" & {Output 2} & "\n" & {Output 3} & "\n)([^\n]*)")
)

As you can see, the formula will become longer with each successive field, but it gets the job done.

CSM_PF · ‎Feb 23, 2024

Thanks @Justin_Barrett it works.

Could you tell me what is the good formula to separate the column A : Ben#///#test-bug#///#Feb 22, 2024 3:54pm UTC#///#Budget#///#Test again#///#Test again#///#Moderate in the following colums

Column B : Ben
Column C : test-bug
Column D : Feb 22, 2024 3:54pm UTC
Column E : Budget
Column F : Test again
Column G : Test again
Column H : Moderate

Thanks for your help