Help

Re: Root domain formula

Solved
Jump to Solution
2991 1
cancel
Showing results for 
Search instead for 
Did you mean: 
Jen_Baguhin
5 - Automation Enthusiast
5 - Automation Enthusiast

Is there any way to change different versions of domains to its root domains?

example:

If we add

Column 1
https://www.domain.com/
https://domain.com/
https://www.domain.com
https://domain.com
http://www.domain.com/
http://domain.com/
http://www.domain.com
http://domain.com

Column 2 Result in all rows of above
domain.com

1 Solution

Accepted Solutions
Joachim_Brindea
6 - Interface Innovator
6 - Interface Innovator

This removes subdomains, folders, etc and doesn't output anything if there is no url.

IF({url}="","",REGEX_EXTRACT( {url}, '^(?:https?:\\/\\/)?(?:[^@\n]+@)?(?:www\\.)?(?:[^.]+\\.)?([^:\\/\n?]+\\.[^:\\/\n?]+)'))

  

See Solution in Thread

15 Replies 15
Helmi
6 - Interface Innovator
6 - Interface Innovator

I’m not a big formula pro yet but I tried it out and it seems to work:

MID(
  URL, 
  IF(
    SEARCH('//www.', URL),
    SEARCH('//www.', URL) + 6,
    IF(
      SEARCH('//', URL),
      SEARCH('//', URL) + 2,
      IF(
        SEARCH('www.', URL),
        SEARCH('www.', URL) + 4,
        1
      )
    )
  ),
  LEN(URL)
)

here’s what it looks for me when testing:

image.png

I’m sure there are more efficient ways to do this and there might be some things left over that you don’t want but maybe a good starter still.

To add one little bit to @Helmi’s solution, this will also catch the ‘/’ at the end if it’s there:

MID(
  URL, 
  IF(
    SEARCH('//www.', URL),
    SEARCH('//www.', URL) + 6,
    IF(
      SEARCH('//', URL),
      SEARCH('//', URL) + 2,
      IF(
        SEARCH('www.', URL),
        SEARCH('www.', URL) + 4,
        1
      )
    )
  ),
  IF(
    SEARCH('com/', URL),
    LEN(URL) - 
    IF(
      SEARCH('//www.', URL),
      SEARCH('//www.', URL) + 6,
      IF(
        SEARCH('//', URL),
        SEARCH('//', URL) + 2,
        IF(
          SEARCH('www.', URL),
          SEARCH('www.', URL) + 4,
          1
        )
      )
    ),
    LEN(URL)
  )
)

hmm but then only if it’s .com, right? I’d better not rely on that. But good idea trying to catch that / - I was also thinking about trying to remove paths in the URL

Jen_Baguhin
5 - Automation Enthusiast
5 - Automation Enthusiast

WORKED! :slightly_smiling_face: thank you!

Kane_Jamison
4 - Data Explorer
4 - Data Explorer

Here’s what we use to turn raw urls into domains (retaining subdomain if there is one):

LEFT(SUBSTITUTE(SUBSTITUTE({URL cell},“http://”,""),“https://”,""),FIND("/",SUBSTITUTE(SUBSTITUTE({URL cell},“http://”,""),“https://”,""))-1)

This removes http(s)://, then looks for the first forward slash, and deletes anything after that.

It could fail if you have a slash-less URL such as domain:8080 or something like a parameter occurring without a trailing slash on the TLD.

Brad_Siefert
4 - Data Explorer
4 - Data Explorer

I came up with a much more robust, but also sorta dumb formula because I couldn’t get @Kane_Jamison’s solution to work.

IF(
  LEFT(
    SUBSTITUTE(
      SUBSTITUTE(  
        SUBSTITUTE({URL},"https://",""),"http://",""
      ),"www.",""
    ),
    FIND("/",
      SUBSTITUTE(
        SUBSTITUTE(  
          SUBSTITUTE({URL},"https://",""),"http://",""
        ),"www.",""
      )
    )-1
  ),
  LEFT(
    SUBSTITUTE(
      SUBSTITUTE(  
        SUBSTITUTE({URL},"https://",""),"http://",""
      ),"www.",""
    ),
    FIND("/",
      SUBSTITUTE(
        SUBSTITUTE(  
          SUBSTITUTE({URL},"https://",""),"http://",""
        ),"www.",""
      )
    )-1
  ),
  SUBSTITUTE(
    SUBSTITUTE(  
      SUBSTITUTE({URL},"https://",""),"http://",""
    ),"www.",""
  )
)
Yonathan_Cohen
5 - Automation Enthusiast
5 - Automation Enthusiast

My formula works perfectly, use it without moderation, just REPLACE “main domain name” by the URL_column_name that you want to clear:

LEFT(
  SUBSTITUTE(
    SUBSTITUTE(
      SUBSTITUTE({main domain name}, "http://",""), "https://",""), "www.", ""),
  IF(
    FIND("/",
      SUBSTITUTE(
        SUBSTITUTE(
          SUBSTITUTE({main domain name}, "http://",""), "https://",""), "www.", ""))=0, 
    LEN(
      SUBSTITUTE(
        SUBSTITUTE(
          SUBSTITUTE({main domain name}, "http://",""), "https://",""), "www.", "")), 
    FIND("/",
      SUBSTITUTE(
        SUBSTITUTE(
          SUBSTITUTE({main domain name}, "http://",""), "https://",""), "www.", ""))-1))

airtable formula to extract domain name from URL airtable

This is a bit more advanced, it will also remove the first part of such an URL: en.website.com
For example:
=>https://www.en.orchestrasales.com/home/view/
will be turned into “orchestrasales.com

LEFT(
  SUBSTITUTE(
    SUBSTITUTE(
      SUBSTITUTE(
          SUBSTITUTE(
              SUBSTITUTE({main domain name}, "http://",""), "https://",""), "www.", ""), "fr.", ""), "en.", ""),
  IF(
    FIND("/",
      SUBSTITUTE(
        SUBSTITUTE(
          SUBSTITUTE(
              SUBSTITUTE(
                  SUBSTITUTE({main domain name}, "http://",""), "https://",""), "www.", ""), "fr.", ""), "en.", ""))=0, 
    LEN(
      SUBSTITUTE(
        SUBSTITUTE(
          SUBSTITUTE(
              SUBSTITUTE(
                  SUBSTITUTE({main domain name}, "http://",""), "https://",""), "www.", ""), "fr.", ""), "en.", "")), 
    FIND("/",
      SUBSTITUTE(
        SUBSTITUTE(
          SUBSTITUTE(
              SUBSTITUTE(
                  SUBSTITUTE({main domain name}, "http://",""), "https://",""), "www.", ""), "fr.", ""), "en.", ""))-1))

Hi @Jeremy_Oglesby I don’t suppose I could ask you to add to this?

I have multiple top level domains in my data set e.g .com .org .ie etc etc

How can I modify the tld seach string below to pick up multiple tld values?

IF(
SEARCH(‘com/’, URL),
LEN(URL) -

Would really appreciate any help

Thanks,

Brian

Hey, @Brian_Moran

Sorry, I’m just seeing this. Luckily, in the 4 years since I wrote that formula you are referencing, Airtable has added some REGEX functions that should help with that.

Perhaps try replacing the entire SEARCH('com/', URL) function with this:

REGEX(URL, "[com|org|net|ie|io]/")

If that works, you can continue to add any other tld’s you might need inside those brackets, separated by a ‘pipe’ character |.

Thanks Jeremy! I’ll give that a try and report back!

Chris_Lally
4 - Data Explorer
4 - Data Explorer

I ended up needing to just get rid of the www. or https:// and such AND a trailing “/” IF there was one, hope it helps!

LEFT(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(URL, "http://",""), "https://",""), "www.", ""),
IF(RIGHT(URL,1)="/",
LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(URL, "http://",""), "https://",""), "www.", ""))-1,
LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(URL, "http://",""), "https://",""), "www.", ""))
))

Thanks! I used this and it worked great. It saved me a bunch of time to look up something someone shared instead of reinventing it myself.

Joachim_Brindea
6 - Interface Innovator
6 - Interface Innovator

This removes subdomains, folders, etc and doesn't output anything if there is no url.

IF({url}="","",REGEX_EXTRACT( {url}, '^(?:https?:\\/\\/)?(?:[^@\n]+@)?(?:www\\.)?(?:[^.]+\\.)?([^:\\/\n?]+\\.[^:\\/\n?]+)'))

  

This worked very well, thank you