Help

splitting content in long form text in a formula

Topic Labels: Formulas
Solved
Jump to Solution
1313 14
cancel
Showing results for 
Search instead for 
Did you mean: 
jm22
5 - Automation Enthusiast
5 - Automation Enthusiast

TIA - this community is amazing and so helpful when I can't get my head around these more tricky ones. but I am learning more and more! 

I have a bunch of method steps for our recipes:

COOK THE SALMON
Pat dry salmon with paper towel, remove the skin and cut into bite size pieces, discarding the skin. Add salmon, olives and courgette to the pot. Simmer, gently stirring occasionally, for 5-6mins, or until the salmon is cooked through and all the veggies are tender. Remove from heat. Check seasoning. "

MAKE THE AVOCADO DIP
Cut lemon into wedges.
Halve avocado, remove the stone and scoop out flesh with a spoon. Place avocado into a third medium bowl and mash with fork until smooth. Mix in aioli and (1/2 tbsp/1 tbsp) lemon juice. Season with salt, pepper and more lemon juice to taste. Set aside.

TIP: If you prefer, add avocado to the salad in STEP 5 and serve aioli on the side. "

MAKE THE HERBY VINAIGRETTE
Finely chop (half/whole) pack coriander, including the stalks. Add to small bowl with (1 tsp/2 tsp) honey, (2 tbsp/4 tbsp) olive oil and (1 tbsp/2 tbsp) white wine vinegar. Stir together until combined. Season to taste with salt and pepper. Set aside.

TIP: You may use your preferred sweetener instead of honey."

You can see they are all different. 

 

I want to have one column for the heading (in caps) and another column for the method directions. 

Do you think I'll have any joy with this one? 

 

thanks! 

 

1 Solution

Accepted Solutions
Justin_Barrett
18 - Pluto
18 - Pluto

As long as your heading is always on a single line (not spanning multiple lines using line breaks), then this will do the trick.

First formula field: "Heading"

 

IF(Recipe, REGEX_EXTRACT(Recipe, "[^\n]*"))

 

Second formula field: "Recipe Body"

 

IF(Recipe, TRIM(SUBSTITUTE(Recipe, Heading, "")))

 

Screen Shot 2023-01-31 at 9.50.39 PM.png

 

See Solution in Thread

14 Replies 14
Justin_Barrett
18 - Pluto
18 - Pluto

As long as your heading is always on a single line (not spanning multiple lines using line breaks), then this will do the trick.

First formula field: "Heading"

 

IF(Recipe, REGEX_EXTRACT(Recipe, "[^\n]*"))

 

Second formula field: "Recipe Body"

 

IF(Recipe, TRIM(SUBSTITUTE(Recipe, Heading, "")))

 

Screen Shot 2023-01-31 at 9.50.39 PM.png

 

jm22
5 - Automation Enthusiast
5 - Automation Enthusiast

Worked like a total charm! thanks! 

CSM_PF
5 - Automation Enthusiast
5 - Automation Enthusiast

Is it possible to make the same thing on different columns

I'd like to put a line in a different column each time

Capture d’écran 2024-02-23 à 14.04.25.png

@CSM_PF Is this what you're looking for?

Screen Shot 2024-02-23 at 6.15.47 AM.png

If so, that's pretty easy to set up. First add a formula field to count the number of line breaks in the source text (I'll explain the purpose behind this later):

LEN(Input) - LEN(SUBSTITUTE(Input, "\n", ""))

I named this field "Input Count", but feel free to name it whatever you want as long as the later formula fields reference it correctly.

The first extraction formula (Output 1) is the easiest:

IF(
  Input,
  REGEX_EXTRACT(Input, "[^\n]*")
)

This basically gets everything up to—but not including—the first line break, but only if the field isn't empty. If the input is empty, this field will also be empty.

From that point on, we're going to use the previous extracted text to find the next piece to extract following a consistent pattern.

Output 2 looks like this:

IF(
  {Item Count} > 0,
  REGEX_EXTRACT(Input, "(?:" & {Output 1} & "\n)([^\n]*)")
)

This basically says that if we've got at least one line break, find—but ignore—the first extracted text and the line break following it, then extract everything up to—but not including—the next line break.

Output 3 does the same thing, but increasing the line break count check and referencing Output 2:

IF(
  {Item Count} > 1,
  REGEX_EXTRACT(Input, "(?:" & {Output 2} & "\n)([^\n]*)")
)

Repeat this pattern for as many fields as you need.

As for why we do the line count in the first place, this is the easiest way to prevent an error if the REGEX_EXTRACT() function can't find a given chunk. For example, this accounts for situations where one record might have 5 pieces to extract, and another only 3, and prevents those later fields from showing #ERROR in their output.

Literally seconds after posting the above solution, I realized that there's a slight problem with it. In your sample text, you have two lines that both contain "Test again". That's going to throw off the pattern matching when it comes to finding the second of those two lines, and all the lines that come after them. Let me see if I can quickly find a workaround (I'm short on time this morning).

CSM_PF
5 - Automation Enthusiast
5 - Automation Enthusiast

Don't worry, there will never be equivalent lines normally. It was just an example!

CSM_PF
5 - Automation Enthusiast
5 - Automation Enthusiast

I'll try your solution straight away!

Okay, I found a way to make it work. It's not as clean as the first version, but it's accurate, which is more important.

The first two formulas for Output 1 and Output 2 stay the same. From that point on, you need to add all previous outputs and their separating newlines into the find-but-ignore part. Here are the formulas for Output 3 and Output 4 as examples:

IF(
  {Item Count} > 1,
  REGEX_EXTRACT(Input, "(?:" & {Output 1} & "\n" & {Output 2} & "\n)([^\n]*)")
)
IF(
  {Item Count} > 2,
  REGEX_EXTRACT(Input, "(?:" & {Output 1} & "\n" & {Output 2} & "\n" & {Output 3} & "\n)([^\n]*)")
)

As you can see, the formula will become longer with each successive field, but it gets the job done.

Screen Shot 2024-02-23 at 6.54.57 AM.png

CSM_PF
5 - Automation Enthusiast
5 - Automation Enthusiast

Thanks @Justin_Barrett it works.

Could you tell me what is the good formula to separate the column A : Ben#///#test-bug#///#Feb 22, 2024 3:54pm UTC#///#Budget#///#Test again#///#Test again#///#Moderate in the following colums

Column B : Ben
Column C : test-bug
Column D : Feb 22, 2024 3:54pm UTC
Column E : Budget
Column F : Test again
Column G : Test again
Column H : Moderate

Thanks for your help

@CSM_PF The solution for that isn't much different than my originals above.

For the item count field, the formula changes to this. Because the separators are five characters long, the result needs to be divided by 5:

(LEN(Input) - LEN(SUBSTITUTE(Input, "#///#", ""))) / 5

If you can guarantee that each line's content will be unique, you can use the original versions in my first reply. Here's the modified version of the formula for Output 1:

IF(
  Input,
  REGEX_EXTRACT(Input, "[^#]*")
)

Output 2 now looks like this:

IF(
  {Item Count} > 0,
  REGEX_EXTRACT(Input, "(?:" & {Output 1} & "#///#)([^#]*)")
)

The rest of the pattern should be easy to figure out.

If you can't guarantee uniqueness among the different sections, then you'll have to use the second version with similar modifications.

CSM_PF
5 - Automation Enthusiast
5 - Automation Enthusiast

@Justin_Barrett it works perfectly except for the last data field in my example - Column H: Moderate - do you know why?

CSM_PF
5 - Automation Enthusiast
5 - Automation Enthusiast

This returns the same data as the previous column, i.e. "Test Again".

CSM_PF
5 - Automation Enthusiast
5 - Automation Enthusiast

Found a solution with the following formula :

IF( Input, REGEX_EXTRACT(Input, "[^#]*$") )

@CSM_PF Glad that you found a solution!

As to why it repeated, that's likely because of the duplicate data as I said before. If you have two sections both with the same content—e.g. "Test Again"—the first version won't work. You'll need to use the second version because it progressively adds each previous section to the "to ignore" part of the regular expression. Your solution gets around that by specifically targeting what's at the end of the string after the final "#" symbol.