Aug 29, 2018 05:11 PM
Possibly of interest to some, the following is an implementation of Soundex pseudo-homophone encoding. As there are variations of the algorithm, care should be taken if using its output to search externally encoded databases. (I think this conforms to U.S. Census encoding, but I’ve found contradicting statements.) To use, copy and paste the following into the formula configuration window of a formula field. It expects to use an existing field in the table named {Word}
as input, so adjust field names accordingly. (There are two instances of the field name {Word}
in the formula.)
Yes, it’s far from a perfect algorithm; yes, it’s rooted somewhat intractably in American English; yes, there are alternative encodings that may be more appropriate for many uses: Feel free to modify, enhance, and improve as you see fit — preferably releasing your improvements to the community afterwards.
LEFT(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
(
LEFT({Word},1)&
(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
UPPER({Word}),
'A','0'
),
'E','0'
),
'I','0'
),
'O','0'
),
'U','0'
),
'Y','0'
),
'H','0'
),
'W','0'
),
'B','1'
),
'F','1'
),
'P','1'
),
'V','1'
),
'C','2'
),
'G','2'
),
'J','2'
),
'K','2'
),
'Q','2'
),
'S','2'
),
'X','2'
),
'Z','2'
),
'D','3'
),
'T','3'
),
'L','4'
),
'M','5'
),
'N','5'
),
'R','6'
),
'1111','1'
),
'111','1'
),
'11','1'
),
'2222','2'
),
'222','2'
),
'22','2'
),
'3333','3'
),
'333','3'
),
'33','3'
),
'4444','4'
),
'444','4'
),
'44','4'
),
'5555','5'
),
'555','5'
),
'55','5'
),
'6666','6'
),
'666','6'
),
'66','6'
),
'0',''
)
)
),
'B1','B'
),
'F1','F'
),
'P1','P'
),
'V1','V'
),
'C2','C'
),
'G2','G'
),
'J2','J'
),
'K2','K'
),
'Q2','Q'
),
'S2','S'
),
'X2','X'
),
'Z2','Z'
),
'D3','D'
),
'T3','T'
),
'L4','L'
),
'M5','M'
),
'N5','N'
),
'R6','R'
)&'000',4)
Aug 29, 2018 05:40 PM
That’s a lot of substitutes :winking_face:
Impressive
Aug 29, 2018 06:19 PM
Amazingly, I didn’t have this going through my head…
https://www.youtube.com/watch?v=eswQl-hcvU0
…until your reply. :winking_face:
Sep 05, 2018 06:47 AM
It would be great if they had a powerfull regex replace function.
Sep 05, 2018 09:45 AM
Yeah, but then I’d have to go and learn regex… :winking_face:
Sep 06, 2018 06:13 AM
If you manage to build soundex with all that formulas, then you’ll feel comfortable with regexp :-). Much less typing.