Help

Re: Soundex implementation

895 0
cancel
Showing results for 
Search instead for 
Did you mean: 

Possibly of interest to some, the following is an implementation of Soundex pseudo-homophone encoding. As there are variations of the algorithm, care should be taken if using its output to search externally encoded databases. (I think this conforms to U.S. Census encoding, but I’ve found contradicting statements.) To use, copy and paste the following into the formula configuration window of a formula field. It expects to use an existing field in the table named {Word} as input, so adjust field names accordingly. (There are two instances of the field name {Word} in the formula.)

Yes, it’s far from a perfect algorithm; yes, it’s rooted somewhat intractably in American English; yes, there are alternative encodings that may be more appropriate for many uses: Feel free to modify, enhance, and improve as you see fit — preferably releasing your improvements to the community afterwards.

LEFT(
SUBSTITUTE(
 SUBSTITUTE(
  SUBSTITUTE(
   SUBSTITUTE(
    SUBSTITUTE(
     SUBSTITUTE(
      SUBSTITUTE(
       SUBSTITUTE(
        SUBSTITUTE(
         SUBSTITUTE(
          SUBSTITUTE(
           SUBSTITUTE(
            SUBSTITUTE(
             SUBSTITUTE(
              SUBSTITUTE(
               SUBSTITUTE(
                SUBSTITUTE(
                 SUBSTITUTE(
(
LEFT({Word},1)&
(
SUBSTITUTE(
 SUBSTITUTE(
  SUBSTITUTE(
   SUBSTITUTE(
    SUBSTITUTE(
     SUBSTITUTE(
      SUBSTITUTE(
       SUBSTITUTE(
        SUBSTITUTE(
         SUBSTITUTE(
          SUBSTITUTE(
           SUBSTITUTE(
            SUBSTITUTE(
             SUBSTITUTE(
              SUBSTITUTE(
               SUBSTITUTE(
                SUBSTITUTE(
                 SUBSTITUTE(
                  SUBSTITUTE(
                   SUBSTITUTE(
                    SUBSTITUTE(
                     SUBSTITUTE(
                      SUBSTITUTE(
                       SUBSTITUTE(
                        SUBSTITUTE(
                         SUBSTITUTE(
                          SUBSTITUTE( 
                           SUBSTITUTE(
                            SUBSTITUTE(
                             SUBSTITUTE(
                              SUBSTITUTE(
                               SUBSTITUTE(
                                SUBSTITUTE(
                                 SUBSTITUTE(
SUBSTITUTE(
 SUBSTITUTE(
  SUBSTITUTE(
   SUBSTITUTE(
    SUBSTITUTE(
     SUBSTITUTE(
      SUBSTITUTE(
       SUBSTITUTE(
        SUBSTITUTE(
         SUBSTITUTE(
          SUBSTITUTE(
UPPER({Word}),
           'A','0'
           ),
          'E','0'
          ),
         'I','0'
         ),
        'O','0'
        ),
       'U','0'
       ),
      'Y','0'
      ),
     'H','0'
     ),
    'W','0'
    ),
   'B','1'
   ),
  'F','1'
  ),
 'P','1'
 ),
                                  'V','1'
                                  ),
                                 'C','2'
                                 ),
                                'G','2'
                                ),
                               'J','2'
                               ),
                              'K','2'
                              ),
                             'Q','2'
                             ),
                            'S','2'
                            ),
                           'X','2'
                           ),
                          'Z','2'
                          ),
                         'D','3'
                         ),
                        'T','3'
                        ),
                       'L','4'
                       ),
                      'M','5'
                      ),
                     'N','5'
                     ),
                    'R','6'
                    ),
                   '1111','1'
                   ),
                  '111','1'
                  ),
                 '11','1'
                 ),
                '2222','2'
                ),
               '222','2'
               ),
              '22','2'
              ),
             '3333','3'
             ),
            '333','3'
            ),
           '33','3'
           ),
          '4444','4'
          ),
         '444','4'
         ),
        '44','4'
        ),
       '5555','5'
       ),
      '555','5'
      ),
     '55','5'
     ),
    '6666','6'
    ),
   '666','6'
   ),
  '66','6'
  ),
 '0',''
 )
)
),
                  'B1','B'
                  ),
                 'F1','F'
                 ),
                'P1','P'
                ),
               'V1','V'
               ),
              'C2','C'
              ),
             'G2','G'
             ),
            'J2','J'
            ),
           'K2','K'
           ),
          'Q2','Q'
          ),
         'S2','S'
         ),
        'X2','X'
        ),
       'Z2','Z'
       ),
      'D3','D'
      ),
     'T3','T'
     ),
    'L4','L'
    ),
   'M5','M'
   ),
  'N5','N'
  ),
 'R6','R'
 )&'000',4)
5 Replies 5
openside
10 - Mercury
10 - Mercury

That’s a lot of substitutes :winking_face:

Impressive

Amazingly, I didn’t have this going through my head…

https://www.youtube.com/watch?v=eswQl-hcvU0

…until your reply. :winking_face:

Jonas_Felix
5 - Automation Enthusiast
5 - Automation Enthusiast

It would be great if they had a powerfull regex replace function.

Yeah, but then I’d have to go and learn regex… :winking_face:

If you manage to build soundex with all that formulas, then you’ll feel comfortable with regexp :-). Much less typing.