Splitting text to alternate columns separated by ;

Jack_Richardson · ‎Apr 07, 2021

Hi, I’m trying to formulate so I can split the text in one column (separated by a :winking_face: into multiple. Please can someone help!

Justin_Barrett · ‎Apr 07, 2021

Welcome to the community, @Jack_Richardson! :grinning_face_with_big_eyes: This can be done using four similar regular expression functions. Here’s the formula for {Name 1}:

IF(
  {One Line Separated by ;},
  TRIM(
    REGEX_REPLACE(
      {One Line Separated by ;},
      "(.*)(?:;)(.*)(?:;)(.*)(?:;)(.*)",
      "$1"
    )
  )
)

This works by capturing in groups—the parts wrapped in parentheses—but only matching and not capturing the semicolons. The REGEX_REPLACE() function effectively says, “Match these groups by the specified patterns, then replace the whole thing with just the first captured group (represented by $1)”. For the next field, change $1 in the final string with $2, which will return only the second group. Keep that pattern going for the other fields. If you need to capture more (it looks like you left out a possible URL group), just extend the regex pattern in the first string.

The only possible hiccup is that this only works if you always have the same number of groups to capture. If the number of groups changes for each record—maybe someone omitted their position or email—you won’t get correct results. There are ways around this, but it makes the formula a bit more complex. Holler if you need to go in that direction.

Jack_Richardson · ‎Apr 07, 2021

Thanks @Justin_Barrett ! Unfortunately, I think I will have to go down that direction as some may not be in the same form as others. Ie. some may not even have a position etc.

Also, attaching a screenshot of my results - it doesn’t seem to have split by the first delimiter here, any thoughts?

Justin_Barrett · ‎Apr 07, 2021

Because you’ve got five sections to grab, the regex will be longer. When I add a URL segment to my source text, I get the same thing:

With five sections, the regex string should be this for all five:

(.*)(?:;)(.*)(?:;)(.*)(?:;)(.*)(?:;)(.*)

You can then add a URL field to extract the fourth part ($4), which would make {Experience} the fifth part ($5).

When someone omits a section like the position, what does the output string look like? Does it leave the delimiters there with nothing between them, or does it just build a string with only four sections? I’m hoping it’s the former; it would make more sense, and it would mean that the regex would still work:

However, if the delimiters are omitted when content is omitted, it will definitely cause problems.