Mar 13, 2023 05:08 PM
I have addresses in multiple lines and am using REGEX EXTRACT in Airtable to separate the components (street address, suburb, state, post code) before sending them on for shipping labels.
I've used REGEX_EXTRACT({text input}, "^(?:.*\n){1}") to extract the first line and this works in most cases except when the street address is over two lines (see example B).
Can someone please help me with a REGEX EXTRACT that extracts every line except the last line (whether the address is 2 or 3 lines)?
Or is there a simpler solution I'm not aware of?
Example A
1 Test Street
Richmond, Perth 3021
Example B
1 Test Street
Unit 6
Richmond, Perth 3021
For reference on how I'm dealing with the rest of the components...
I've used REGEX_EXTRACT({text input}, "^.\r?\n(.)") to extract the last line (Subrub, State, Postcode) and then used REGEX_EXTRACT({text input}, ".*(\d{4})") to extract the post code.
Once I have the post code I'm matching it against a database to find the Suburb and State. I did this because I believe it's too difficult to do a REGEX that will extract the Suburb, State and Postcode individually. I think it's too difficult because both the Suburb and State can be one, two or three words long. If you have a better solution I'm all ears!
Solved! Go to Solution.
Mar 13, 2023 07:28 PM
Hmm, could you try:
REGEX_EXTRACT(Address,'(?s)^(.*)\n[^\n]*$')
Should do what you want?
---
I'm not too familiar with the suburb and state stuff I'm afraid. If you could provide some examples I can see what I can figure out!
Mar 14, 2023 03:31 AM
Here is a solution that extracts suburb and state, allowing for extra line breaks, spaces and differing capitalis ation:
AddressPiped: "¬" & REGEX_REPLACE(REGEX_REPLACE(Address,"[\n\\r]+", "¬"), " +", " ")
LastLine: LOWER(REGEX_EXTRACT(AddressPiped,"¬([^¬]+)$"))
LastLineWithShortState: REGEX_REPLACE(REGEX_REPLACE(LastLine, "victoria ([0-9]{4})", "vic $1"),"new south wales ([0-9]{4})","nsw $1")
State: REGEX_EXTRACT(LastLineWithShortState," (nsw|vic) [0-9]{4}")
Suburb: REGEX_EXTRACT(LastLineWithShortState,"^(.+) " & State & " [0-9]{4}")
LastLineProper: REGEX_EXTRACT(AddressPiped,"¬([^¬]+)$")
SuburbProper: LEFT(LastLineProper, LEN(Suburb))
SuburbProperFinal: UPPER(LEFT(SuburbProper,1)) & MID(SuburbProper,2,300)
StateFull: SWITCH(State,"nsw","New South Wales", "vic", "Victoria")
I've only done this for new south wales and victoria so far, the fields LastLineWithShortState, State and StateFull need finishing with the other states.
Mar 13, 2023 07:28 PM
Hmm, could you try:
REGEX_EXTRACT(Address,'(?s)^(.*)\n[^\n]*$')
Should do what you want?
---
I'm not too familiar with the suburb and state stuff I'm afraid. If you could provide some examples I can see what I can figure out!
Mar 13, 2023 11:18 PM
Thanks Adam, you're a legend!
The last line of each address is made up of the Suburb, State and Postcode.
Currently I'm only extracting the postcode and then matching it against a database to find the Suburb and State. I'm doing this because I don't know how to split the three elements.
This is solution is mostly fine however sometime it returns a Suburb that is not the one in the orginal address. This is because a single post code may apply to multiple suburbs. For example, the postcode 3121 is for Richmond, Cremorne, Burnley etc.
Removing the 4 digit post code is easy enough. It is always at the end and is always 4 digits. However splitting the Suburb and State is trickier. I think it would have to be something around the State given there are only 8 of them and thousands of Suburbs.
The8 States/Territories in Australia (where we deliver):
VIC - Victoria
NSW - New South Wales
QLD - Queensland
NT - Northern Territory
SA - South Australia
ACT - Australia Capital Territory
WA - Western Australia
TAS - Tasmania
Customers will write it either as an abbreviation listed or as the full name, (NSW, Nsw, nsw, new south wales, New South Wales, New south wales), which I think ads to the complexity.
Note: There are no commas or any characters splitting the elements.
Examples:
Richmond VIC 3121
Upwey Victoria 3158
Sydney New south wales 2000
St Albans Nsw 2021
Open to any thoughts on this and thanks again for solving the last one!
Stefaan
Mar 14, 2023 03:31 AM
Here is a solution that extracts suburb and state, allowing for extra line breaks, spaces and differing capitalis ation:
AddressPiped: "¬" & REGEX_REPLACE(REGEX_REPLACE(Address,"[\n\\r]+", "¬"), " +", " ")
LastLine: LOWER(REGEX_EXTRACT(AddressPiped,"¬([^¬]+)$"))
LastLineWithShortState: REGEX_REPLACE(REGEX_REPLACE(LastLine, "victoria ([0-9]{4})", "vic $1"),"new south wales ([0-9]{4})","nsw $1")
State: REGEX_EXTRACT(LastLineWithShortState," (nsw|vic) [0-9]{4}")
Suburb: REGEX_EXTRACT(LastLineWithShortState,"^(.+) " & State & " [0-9]{4}")
LastLineProper: REGEX_EXTRACT(AddressPiped,"¬([^¬]+)$")
SuburbProper: LEFT(LastLineProper, LEN(Suburb))
SuburbProperFinal: UPPER(LEFT(SuburbProper,1)) & MID(SuburbProper,2,300)
StateFull: SWITCH(State,"nsw","New South Wales", "vic", "Victoria")
I've only done this for new south wales and victoria so far, the fields LastLineWithShortState, State and StateFull need finishing with the other states.
Mar 14, 2023 03:36 AM - edited Mar 14, 2023 03:37 AM
Glad I could help!
re: the states
Hmm, yeah this is tricky. Given that it's either written as an abbreiviation or as a full name, I'd just hardcode it and call it a day, really
Link to base
And you can duplicate the base to see the formulas used!
Mar 14, 2023 03:40 AM
I posted a reply to this but edited it, it has now disappeared...
Mar 14, 2023 03:51 AM
I'll try again:
Here is a solution that extracts suburb and state, allowing for extra line breaks/spaces, and differing spelling of state and capitalisation (and also if the state name is in the suburb!):
AddressPiped: "¬" & REGEX_REPLACE(REGEX_REPLACE(Address,"[\n\\r]+", "¬"), " +", " ")
LastLine: LOWER(REGEX_EXTRACT(AddressPiped,"¬([^¬]+)$"))
LastLineWithShortState: REGEX_REPLACE(REGEX_REPLACE(LastLine, "victoria ([0-9]{4})", "vic $1"),"new south wales ([0-9]{4})","nsw $1")
State: REGEX_EXTRACT(LastLineWithShortState," (nsw|vic) [0-9]{4}")
Suburb: REGEX_EXTRACT(LastLineWithShortState,"^(.+) " & State & " [0-9]{4}")
LastLineProper: REGEX_EXTRACT(AddressPiped,"¬([^¬]+)$")
SuburbProper: LEFT(LastLineProper, LEN(Suburb))
SuburbProperFinal: UPPER(LEFT(SuburbProper,1)) & MID(SuburbProper,2, 300)
StateFull: SWITCH(State,"nsw","New South Wales", "vic", "Victoria")
The fields LastLineWithShortState, State and StateFull need finishing off with the other states.
Mar 14, 2023 03:54 AM
Here is a solution that extracts suburb and state, allowing for extra line breaks/spaces, and differing spelling of state and capitalisation (and also if the state name is in the suburb!).
AddressPiped: "¬" & REGEX_REPLACE(REGEX_REPLACE(Address,"[\n\\r]+", "¬"), " +", " ")
LastLine: LOWER(REGEX_EXTRACT(AddressPiped,"¬([^¬]+)$"))
LastLineWithShortState: REGEX_REPLACE(REGEX_REPLACE(LastLine, "victoria ([0-9]{4})", "vic $1"),"new south wales ([0-9]{4})","nsw $1")
State: REGEX_EXTRACT(LastLineWithShortState," (nsw|vic) [0-9]{4}")
Suburb: REGEX_EXTRACT(LastLineWithShortState,"^(.+) " & State & " [0-9]{4}")
LastLineProper: REGEX_EXTRACT(AddressPiped,"¬([^¬]+)$")
SuburbProper: LEFT(LastLineProper, LEN(Suburb))
SuburbProperFinal: UPPER(LEFT(SuburbProper,1)) & MID(SuburbProper,2, 300)
StateFull: SWITCH(State,"nsw","New South Wales", "vic", "Victoria")
The fields LastLineWithShortState, State and StateFull need finishing off with the other states.