Dec 05, 2022 09:54 AM
Hi All
I'm currently scraping data from a few event listing websites into Airtable to create a database of events for the LGBTQ community.
On a few of the websites, I can't extract the full event date from the listing itself; it only appears in the URL.
An example of the URL is:
https://www.admiral-duncan.co.uk/soho/02-01-2023/karaoke-showdown-with-candy-heals
Is there an easy way to pull the date out of the URL?
The other issue is that the URLs differ in length from site to site, but the date is written the same way each time:
https://www.the2brewers.com/london/08-12-2022/panto
Any suggestions?
Thanks
Dec 05, 2022 12:06 PM
Hi there,
I'm no formula expert, but this should do the trick, as long as the dates are indeed formatted the same way:
REGEX_EXTRACT({URLField},"\d\d-\d\d-\d\d\d\d")
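To sanity-check the pattern outside Airtable, here is a minimal Python sketch that runs the same regular expression against the two URLs from the question. The function name extract_date is only illustrative, and it assumes Python's re module treats this simple pattern the same way Airtable's regex engine does.

```python
import re

# Same pattern as in the formula: two digits, two digits, four digits, joined by hyphens.
DATE_PATTERN = re.compile(r"\d\d-\d\d-\d\d\d\d")

def extract_date(url):
    """Return the first NN-NN-NNNN substring found in url, or None if there is none."""
    match = DATE_PATTERN.search(url)
    return match.group(0) if match else None

print(extract_date("https://www.admiral-duncan.co.uk/soho/02-01-2023/karaoke-showdown-with-candy-heals"))
print(extract_date("https://www.the2brewers.com/london/08-12-2022/panto"))
```

Because the pattern searches anywhere in the text, the differing URL lengths don't matter, and a stray digit such as the "2" in "the2brewers" is not picked up.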
Dec 05, 2022 12:13 PM
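You could even go a step further and change the formatting of the date to read "January 02, 2023" for example. The dates in these URLs look day-first (DD-MM-YYYY), so parse the extracted text with DATETIME_PARSE before formatting it; otherwise 02-01-2023 may be read as February 1:
DATETIME_FORMAT(DATETIME_PARSE(REGEX_EXTRACT({URLField},"\d\d-\d\d-\d\d\d\d"),'DD-MM-YYYY'),'MMMM DD, YYYY')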
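One thing worth checking is the day/month order. The example URLs come from UK venues, and the 08-12-2022 panto listing suggests 8 December, so the sketch below assumes the dates are day-first (DD-MM-YYYY). It is a rough Python illustration of parsing with an explicit format before reformatting; format_url_date is a hypothetical helper name, and the month names assume the default English locale.

```python
from datetime import datetime

def format_url_date(date_text):
    """Parse a day-first DD-MM-YYYY string and return it as 'Month DD, YYYY'."""
    # Assumption: the first pair of digits is the day, the second is the month.
    parsed = datetime.strptime(date_text, "%d-%m-%Y")
    return parsed.strftime("%B %d, %Y")

print(format_url_date("02-01-2023"))
print(format_url_date("08-12-2022"))
```

Under that assumption, 02-01-2023 comes out as a January date rather than a February one, which is why the input format is worth stating explicitly.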