Soundex implementation


#1

Possibly of interest to some, the following is an implementation of Soundex pseudo-homophone encoding. As there are variations of the algorithm, care should be taken if using its output to search externally encoded databases. (I think this conforms to U.S. Census encoding, but I’ve found contradicting statements.) To use, copy and paste the following into the formula configuration window of a formula field. It expects to use an existing field in the table named {Word} as input, so adjust field names accordingly. (There are two instances of the field name {Word} in the formula.)

Yes, it’s far from a perfect algorithm; yes, it’s rooted somewhat intractably in American English; yes, there are alternative encodings that may be more appropriate for many uses: Feel free to modify, enhance, and improve as you see fit — preferably releasing your improvements to the community afterwards.

LEFT(
SUBSTITUTE(
 SUBSTITUTE(
  SUBSTITUTE(
   SUBSTITUTE(
    SUBSTITUTE(
     SUBSTITUTE(
      SUBSTITUTE(
       SUBSTITUTE(
        SUBSTITUTE(
         SUBSTITUTE(
          SUBSTITUTE(
           SUBSTITUTE(
            SUBSTITUTE(
             SUBSTITUTE(
              SUBSTITUTE(
               SUBSTITUTE(
                SUBSTITUTE(
                 SUBSTITUTE(
(
LEFT({Word},1)&
(
SUBSTITUTE(
 SUBSTITUTE(
  SUBSTITUTE(
   SUBSTITUTE(
    SUBSTITUTE(
     SUBSTITUTE(
      SUBSTITUTE(
       SUBSTITUTE(
        SUBSTITUTE(
         SUBSTITUTE(
          SUBSTITUTE(
           SUBSTITUTE(
            SUBSTITUTE(
             SUBSTITUTE(
              SUBSTITUTE(
               SUBSTITUTE(
                SUBSTITUTE(
                 SUBSTITUTE(
                  SUBSTITUTE(
                   SUBSTITUTE(
                    SUBSTITUTE(
                     SUBSTITUTE(
                      SUBSTITUTE(
                       SUBSTITUTE(
                        SUBSTITUTE(
                         SUBSTITUTE(
                          SUBSTITUTE( 
                           SUBSTITUTE(
                            SUBSTITUTE(
                             SUBSTITUTE(
                              SUBSTITUTE(
                               SUBSTITUTE(
                                SUBSTITUTE(
                                 SUBSTITUTE(
SUBSTITUTE(
 SUBSTITUTE(
  SUBSTITUTE(
   SUBSTITUTE(
    SUBSTITUTE(
     SUBSTITUTE(
      SUBSTITUTE(
       SUBSTITUTE(
        SUBSTITUTE(
         SUBSTITUTE(
          SUBSTITUTE(
UPPER({Word}),
           'A','0'
           ),
          'E','0'
          ),
         'I','0'
         ),
        'O','0'
        ),
       'U','0'
       ),
      'Y','0'
      ),
     'H','0'
     ),
    'W','0'
    ),
   'B','1'
   ),
  'F','1'
  ),
 'P','1'
 ),
                                  'V','1'
                                  ),
                                 'C','2'
                                 ),
                                'G','2'
                                ),
                               'J','2'
                               ),
                              'K','2'
                              ),
                             'Q','2'
                             ),
                            'S','2'
                            ),
                           'X','2'
                           ),
                          'Z','2'
                          ),
                         'D','3'
                         ),
                        'T','3'
                        ),
                       'L','4'
                       ),
                      'M','5'
                      ),
                     'N','5'
                     ),
                    'R','6'
                    ),
                   '1111','1'
                   ),
                  '111','1'
                  ),
                 '11','1'
                 ),
                '2222','2'
                ),
               '222','2'
               ),
              '22','2'
              ),
             '3333','3'
             ),
            '333','3'
            ),
           '33','3'
           ),
          '4444','4'
          ),
         '444','4'
         ),
        '44','4'
        ),
       '5555','5'
       ),
      '555','5'
      ),
     '55','5'
     ),
    '6666','6'
    ),
   '666','6'
   ),
  '66','6'
  ),
 '0',''
 )
)
),
                  'B1','B'
                  ),
                 'F1','F'
                 ),
                'P1','P'
                ),
               'V1','V'
               ),
              'C2','C'
              ),
             'G2','G'
             ),
            'J2','J'
            ),
           'K2','K'
           ),
          'Q2','Q'
          ),
         'S2','S'
         ),
        'X2','X'
        ),
       'Z2','Z'
       ),
      'D3','D'
      ),
     'T3','T'
     ),
    'L4','L'
    ),
   'M5','M'
   ),
  'N5','N'
  ),
 'R6','R'
 )&'000',4)

Formula to replace special characters - slug creation
#2

That’s a lot of substitutes :wink:

Impressive


#3

Amazingly, I didn’t have this going through my head…

https://www.youtube.com/watch?v=eswQl-hcvU0

…until your reply. :wink:


#4

It would be great if they had a powerfull regex replace function.


#5

Yeah, but then I’d have to go and learn regex… :wink:


#6

If you manage to build soundex with all that formulas, then you’ll feel comfortable with regexp :-). Much less typing.