Generate Random[ish], Unique IDs

Note: I finally found (and finished) the quick little project I started back in March — right before all my teeth exploded. :wink:


Here’s another niche application: Code to generate random[ish] unique numbers in an arbitrary range — say, from 1 to 10000[ish].

What it does

Generates a unique output number for every input number within the range. The relationship between the two numbers is not immediately obvious — and is difficult (although far from impossible) to discern, even if a user manages to see several numbers output in sequence.

Why you might want to do it

If all you need is a unique, random identifier, you’re in luck: That’s what RECORD_ID() is.¹ The downside is that something like recdalXVhg9AkUMG9 can be unwieldy; at times you may need to generate, say, a 4- to 6-digit value that is (a) unique, (b) random, and (c) not impossible to hold easily in short-term memory. (Think of the temporary authorization codes or PINs one is sent via SMS or email when trying to validate an email address or mobile number.)

The post that got me started on this was from someone who wanted a way to generate a 4-digit random[ish] key value so his customers wouldn’t be able to ‘fish’ for related records by incrementing or decrementing part numbers. Similarly, you might want to assign a time-limited random PIN, possibly to validate a transaction on another system, without having to worry someone would be able to game the system [easily]. Or maybe you just want to be able to issue a unique customer or order ID without letting the competition know your sign-up or order rate…

How it works

Unfortunately, given Airtable’s current lack of persistence, there’s no easy way to assign pseudorandom IDs as one usually does: Create a record; generate a [pseudo]random number based on the ID previously assigned; save it to the record; store the last value generated for use in generating the next ID; rinse and repeat.² Instead, what we need to do is (a) use an already-existing unique value for each record and then (b) munge it in a way that is [reasonably] unpredictable yet still retains the original’s uniqueness.

The important thing to remember here is the ID is not a random value but merely a transmogrified version of another number: You might enter 6000 and 6001 and receive an output of 6158 and 2139, but every time you enter 6000 you will get 6158, and every time you enter 6001 you will get 2139. If you’re using the random number to hide the underlying value — and, obviously, you are; otherwise, why bother? — you’ll clearly want (1) to keep your seed value a secret and (2) to use a different seed value every time you make use of this code.

How it works, mathematically

I dunno: I stole the algorithm. (You can find a detailed explanation here.)

At a practical level, you need to select a seed value; that value also sets the maximum number of unique IDs that can be generated. (For instance, if your seed value is 10,000 — which it can’t be, for reasons I’ll explain in a moment — your code would output 10,000 unique values; the value output for an input of 10,001, though, would be the same as the one output for a value of 1.) That seed value is then plugged into a surprisingly simple formula, along with the value of an autonumber field, and the formula returns the transmogrified ID. (In this case, the autonumber is your input value, and the faux-random ID is your output.)

Now, here’s why the maximum value can’t be 10,000: The value chosen for your seed must meet two criteria:

  1. It has to be a prime number.
  2. The value of (Seed MOD 4) must equal 3.

OK, so right now you’re thinking, ‘Come on, Vann — how long is it going to take me to find a number that passes both of those tests?!?’

My answer: ‘About a minute.’

Finding the right prime number

Here’s how:

  1. Decide what your maximum value should be. Sometimes that will be determined by the number of digits you need; if you can only have 4 digits, then you need a maximum of 9999 or less. If you can get by with 5 or 6 digits, though, your random sequence will be harder to break. There will be gaps in your numbering — remember, Airtable currently supports a maximum of 50,000 records per base, so with 5 digits you’ll have at least 49,999 unused IDs; with 6 digits, 949,000 will go unused. However, among the 50,000 or fewer IDs allocated, you can be assured there will be no duplicates.
  2. Go to Wolfram Alpha (wolframalpha.com).
  3. Enter ‘previous prime [Maximum]’ — for instance, if you want to use a maximum value of 600,000, enter ‘previous prime 600000’.
  4. Wolfram will return the largest prime number less than [Maximum] – in this case, 599,999.
  5. Enter ‘[Prime] mod 4’ — e.g., ‘599999 mod 4’.
  6. Wolfram will return the result of that operation — in this case, 3.
  7. If the result of Step 6 is 3, stop: You have your seed value.

Otherwise, if the result of ’[Prime] mod 4' is anything other than 3, repeat Steps 3 through 7 until the result is 3 — but instead of [Maximum], use the value of the previously found [Prime]. For instance

previous prime 425000
        --> 424,997
424997 mod 4
        --> 1
previous prime 424997
        --> 424,967
424967 mod 4
        --> 3

Plugging it into the formula

The formula for generating the unique, random-ish output is deceptively simple. In the following, {Nbr} is an autonumber for the record and {Prime} is the prime value you found in the previous step:

IF(
    ({Nbr}*2)<{Prime},
    MOD({Nbr}*{Nbr},{Prime}),
    {Prime}-MOD(({Nbr}*{Nbr}),{Prime})
    )

And that’s it.

Tweaking the sequence

You might want to make a few modifications to the formula, either to shift the output into a slightly different range or to obscure the sequence further.

One downside to this algorithm is that the output returns a sequence of squares — 1, 4, 9, 16, and so on — until {Nbr} equals the square root of {Prime}. While in and of itself it doesn’t necessarily reveal the underlying algorithm, it does make predicting the next item in the sequence trivial, at least at first. There are several possible ways around this issue.

  1. Skip the first 10 to 50 entries — or however many you feel you need to avoid an obvious progression. To do so, create and delete new records in your table so the first ‘real’ record has a higher autonumber.
  2. Add an arbitrary number to either {Nbr} or the output value. Unless you’re operating at the upper boundary of the range — for instance, if you need a value of 4 or fewer digits, and you’re already using a {Prime} close to 10,000 — you can do so without worry.
  3. Shuffle the digits of the resulting random value. For instance, to shuffle a 6-digit response, you can do this:

First, convert the output to a fixed-length {String}:

REPT(
    '0',
    6-LEN(
        {Random}&''
        )
    )&{Random}

Next, shuffle the digits:

MID(
    {String}&'',
    4,
    1)&
MID(
    {String}&'',
    2,
    1
    )&
MID(
    {String}&'',
    6,
    1
    )&
MID(
    {String}&'',
    3,
    1
    )&
MID(
    {String}&'',
    1,
    1
    )&
MID(
    {String}&'',
    5,
    1
    )

Note: This may change how randomly distributed the output ID may be — but, then, you aren’t using this to provide security, are you?

The demonstration bases

You don’t need an example base to implement this code: The formulas are given in the post. However, I’ve provided links to 1,000-record and 10,000-record bases that also contain code from my data deduplication routines; this allows you to demonstrate the random output is, in fact, unique, and it lets you experiment with various ranges and adjustments to the output while making sure you haven’t compromised the uniqueness.

Warnings and caveats

  1. Because for any given {Prime}, numbers are output in the same sequence, you want to be sure not to use the same {Prime} as a seed more than once. You probably also want to choose uncommon values, rather than ones nearest a commonly used range. (For instance, don’t use 10007, the nearest prime to 10000 – and don’t use 424967, the value used in the demo bases! :wink:
  2. Because this code is driven by autonumber, if you change a record’s autonumber, you change the generated ID.³ This is probably not something you want to do on a live base.
  3. Most importantly, this code is not suitable for use in a security routine. It should be limited to gatekeeper applications, where both your and user’s exposure is limited.⁴

Notes

  1. Well, if you’re a CS or math major, you wouldn’t call it ‘unique,’ and you might not call it random — but for the vast majority of us, it is as near unique and random as we’ll ever need. RECORD_ID() appears to be (once you clip off the ‘rec’ prefix) a 14-digit base₆₂ number, which would mean it has 12,401,769,434,657,526,912,139,264 possible values. Even if the random-number generator powering it is slightly unrandom, the likelihood of generating the same value twice is, as they say, ‘negligible’.

  2. I should mention here the Batch Fill Block, a Pro-level feature, does an excellent job of creating random IDs for a bunch of records at a time; since those random numbers are saved in a data entry field, they are persistent — that is, they aren’t dynamically updated, as Airtable formula fields are. The code presented here, though, is better-suited for ongoing, real-time generation of a single, faux-random value.

  3. How do you change a record’s autonumber? Change {Nbr} to another field type, such as number; sort the records according to some criteria; change {Nbr} back to an autonumber field. Records are assigned a new autonumber, based upon the current sort.

  4. What do I mean by a ‘gatekeeper’ application? Let’s say you are managing a database of fan club members for a popular musical group, and you want to offer members first chance to buy tickets for an upcoming gig at a small venue. As one possible method of gatekeeping, you could generate unique PINs per member and then configure your ecommerce site to require matched email and PIN for purchase. The point here wouldn’t be security, as such, but simply an attempt at flow control by making it harder (but not impossible) for non-members to buy tickets. Your exposure would be limited, in any case, because you could still detect and reverse any non-member purchases after the fact.

4 Likes