
Nano ID:  A tiny, secure URL-friendly unique string ID generator for JavaScript - iskin
https://github.com/ai/nanoid
======
bmn__
For years already I've been using UUID v4 (generated from a good random
source) then base64url encoded, resulting in a 22 characters/bytes long
identifier.

The nanoid implementation does not really bring anything new to the table
except having fewer SLOC (which I personally don't find important). It is also
incompatible with
[https://tools.ietf.org/html/rfc4648#section-5](https://tools.ietf.org/html/rfc4648#section-5)
for no good reason. It is noticable again and again that devs publishing to
npm are not up to snuff, I wish I knew why, as this does not happen with other
dynamic languages/their code repos.

~~~
watty
Good points but this comment reeks of arrogance. It absolutely does occur in
other languages, why would you assume it doesn't?

~~~
sametmax
To be fair, most other high level languages have functions in their stdlib to
do common stuff like hashing, generating unique id, getting data from the OS
random generator, etc.

JS stdlib has always be lacking in almost every possible aspects, and hence
people have to reinvent the wheel again and again.

------
sigi45
I would still use uuid v4. Databases support this data type and reduce its
storage needs. You can also compact it if it is too long for your use case or
you like to save traffic.

~~~
zepolen
But that goes against the modern principle of reimplement everything.

~~~
artursapek
How are we supposed to learn anything if we never try to do it ourselves?

~~~
rootlocus
If you want to learn how UUIDs work, implement the standard.

~~~
artursapek
This isn't about UUIDs. I think the project creator in the OP was just
learning lower level things like working with randomness and bytes... I don't
see the problem. Experimenting is fun. Leave them alone.

~~~
scient
A bad a excuse for a bad library. See this time and time again in NodeJS
though.

~~~
dchest
This is a good library.

~~~
rootlocus
However pretty, I wouldn't use something non-standard for generating IDs
unless I was doing something embedded which required careful memory management
under hardware constraints, where I most definitely wouldn't be using JS.

~~~
rootlocus
@dchest So you frequently use varchar for your table index?

~~~
dchest
Why do you need VARCHAR? If you use SQL, you can use a fix char type to store
fixed-length random strings as IDs. If you use SQL which supports native UUID
type, sure, go ahead and use it. But the world is not limited to SQL databases
or even any databases.

In three project I'm working on right now, two identify objects by content
HMAC (one of them uses an SQL database), the other uses a mix of incrementing
integers and a random string for different use cases.

~~~
rootlocus
> two identify objects by content HMAC (one of them uses an SQL database), the
> other uses a mix of incrementing integers and a random string for different
> use cases.

If those two are using SHA-1, like git, you are still using a standard. The
third one however sounds like a potential mess.

~~~
dchest
_If those two are using SHA-1, like git, you are still using a standard_

Lovely, so I can officially claim I use a _standard_ that is required by many
commenters here if I just hash random bytes! Should I use a NIST-approved
standard or maybe GOST hash function will work? Can I use the output of
ChaCha20? If so, I'll be happy to read 16 bytes from /dev/urandom.

 _The third one however sounds like a potential mess._

Thanks for your analysis!

~~~
rootlocus
> Lovely, so I can officially claim I use a standard that is required by many
> commenters here if I just hash random bytes!

Stop pretending that you don't understand what a cryptological hash function
is, or that using a strong crypto function is the same as rolling your own, or
that using the hash function on your input is the same as running it on random
bytes.

> Thanks for your analysis!

Thanks for the anecdotal evidence.

------
falsedan
> _`random % alphabet` is a popular mistake to make when coding an ID
> generator. The spread will not be even; there will be a lower chance for
> some symbols to appear compared to others—so it will reduce the number of
> tries when brute-forcing._

I don't understand: is the default implementation of random in node.js flawed?
Surely it should produce a uniform distribution…

edit nevermind, found the link to
[https://gist.github.com/joepie91/7105003c3b26e65efcea63f3db8...](https://gist.github.com/joepie91/7105003c3b26e65efcea63f3db82dfba)
just above that paragraph

~~~
jimktrains2
0,1,2,3,4 % 3

0 => 0

1 => 1

2 => 2

3 => 0

4 => 1

So, a uniform input will produce more 0s and 1s than it will 2s.

~~~
heavenlyblue
Only if the alphabet size is not a multiple of two.

~~~
jimktrains2
More generally, only if your input is an integer multiple of your alphabet.

However, there are methods of converting uniform distributions that will
preserve uniformity. I'll have to find a good reference.

~~~
heavenlyblue
Well, it's obvious for me.

Issue is that the root reply seemed to have taken it as a JS implementation
issue (due to math.random) as it's quite obvious why you should not do the
modulus (or when you can), if you know the underlying maths.

So did I, as a matter of fact.

You - on the other hand had given the answer to the wrong confusion.

~~~
falsedan
> _Issue is that the root reply seemed to have taken it as a JS implementation
> issue_

I'm the root reply. jimktrains2's explanation exactly addressed my confusion.

To get a uniform distribution over an alphabet from random, I would do:

    
    
        var bucket_size = (random.MAX_VALUE - random.MIN_VALUE) / count(alphabet)
        return alphabet[random/bucket_size]

~~~
jimktrains2
Actually, that doesn't work either

max = 4, min = 0, count = 3

bucket_size = 4 / 3

0,1,2,3,4

0 / (4/3) = 0

1 / (4/3) = 3/4 = 0

2 / (4/3) = 6/3 = 2

3 / (4/3) = 9/4 = 2

4 / (4/3) = 12/4 = 3 = invalid index

but even if you adjust this to count - 1

bucket_size = 4 / 2 = 2

0 = 0/2 = 0

1 = 1/2 = 0

2 = 2/2 = 1

3 = 3/2 = 1

4 = 4/2 = 2

So, 0s and 1s are still more likely than 2 to appear.

One of the easiest, and slowest, ways is just to discard any trial in which
your uniform RNG generates a number above the largest index and try again.

For a larger alphabet, you essentially just collect multiple trials, treat
them as a single trial, and do similar tactics.

~~~
falsedan
> _bucket_size = 4 / 3_

I have an off-by-one error… your random function has a range of 5, so it
should be max - min + 1:

    
    
        bucket_size = 5 / 3
    

then

    
    
        0 / (5/3) = 0
        1 / (5/3) = 3/5 = 0
        2 / (5/3) = 6/5 = 1
        3 / (5/3) = 9/5 = 1
        4 / (5/3) = 12/5 = 2
    

> _So, 0s and 1s are still more likely than 2 to appear._

Yeah, I planned for 65535 randoms and 52 alphabet. It would still mildly bias
to the first range mod alphabet_size elements.

------
jmull
I like that this generates the ID directly by choosing random symbols.

But I don't like that it's equivalent to UUID v4 but different. The difference
doesn't buy anything significant over the standard.

With a pretty small change this could directly generate base64 encoded UUID v4
(without going through a two-step process of generating a UUID and then
encoding it: Most characters could be generated randomly from the base64
alphabet. The characters at certain indexes would need to use a restricted set
of base64 characters to set certain bits to the values specified by UUID v4.

------
pfooti
I really like the ulid library. Does what I want, compact representation, and
lexicographically sorting ulids in time order is great. The character subset
is URL friendly, so this is my go-to when I'm making nonces if I am not
worried about super-double-secure prngs.

[https://github.com/alizain/ulid](https://github.com/alizain/ulid)

------
Confiks
For a toy project, I'm currently using a 64-bit integer of which the first bit
is a sign bit (I'm using Postgres' bigint), the next 32-bits make a UNIX epoch
timestamp in seconds, and the rest of the 31 bits make a random number. In
this way, the ID is very easily sortable by time, although it's a bit of a
gimmick.

The scheme allows for 2 billion possible records per second, although if you
consider birthday collisions, it comes down to a lot less (but I could re-
insert on duplicates). I could later also use some of the random bits to
inscribe extra data into the ID.

The bits from the two halves are then interspersed to make sure the first half
looks random, and transformed to a 11-character id, for example
/^[0-9A-Za-z]{11}$/. I can reject some IDs beforehand if they match a list of
bad strings. The result is receiving an id such as "Vxe61yU5Cci" or
"WgOUBEO88gb".

~~~
mort96
What happens in 2038? I hope you're never representing those 32 bits of UNIX
timestamp as a signed integer anywhere. If you're making sure to always keep
it unsigned, what happens in year 2106?

EDIT: I'm not criticizing you, and a valid counter argument would be that you
make absolutely sure to always store the timestamp unsigned, that it doesn't
need to work until 2106 because it's a toy project, or that an overflow
wouldn't really matter because you make sure to keep it unique even after an
overflow, but I get scared any time storing a UNIX timestamp in 32 bits is
mentioned, and am wondering if you've considered it.

~~~
bpicolo
> but I get scared any time storing a UNIX timestamp in 32 bits is mentioned

You're scared because an average app hasn't considered a problem that's 21
years off? Seems like a premature worry

~~~
random023987
> You're scared because an average app hasn't considered a problem that's 21
> years off? Seems like a premature worry

'Those who cannot remember the past are condemned to repeat it.'. --George
Santayana

~~~
bpicolo
Yeah? Did people need to prepare for the y2k bug in 1979?

~~~
mort96
Well yeah, weren't a lot of the broken systems big cobol systems from that
era?

You never know what software ends up being still used for critical systems 20
years down the road, so it's stupid to knowingly make a new system which is
guaranteed to break in 20 years.

That said, it sounds like this guy has thought about it, and is making sure to
use all 32 bits to store the timestamp.

------
maxpert
Wait so `%` is bad but `& 63` is fine? Ain't you doing a mod as well??? It's
just that you have power of two (64 in this case) that you are trying to
exploit here... anyone who knows basics of computer science can tell you it's
same! If it's about reducing code size I can demonstrate smaller code:

module.exports = function (size) {

    
    
      return random(size || 22)
               .map(r => url[r & 63])
               .join('');
    

};

I am always disappointed to see these kind of packages, as mentioned somewhere
in comments above

> It is noticable again and again that devs publishing to npm are not up to
> snuff, I wish I knew why, as this does not happen with other dynamic
> languages/their code repos.

And I totally agree!

~~~
dchest
Check the code below:

    
    
          var masks = [15, 31, 63, 127, 255]
          ...
          var mask = masks.find(function (i) {
              return i >= alphabet.length - 1
          })
          ...
    
          var byte = bytes[i] & mask
          if (alphabet[byte]) {
            id += alphabet[byte]
    

Masking results in a random uniform number that is a power of two and greater
or equal than the alphabet length and then indexes that are greater than
alphabet length are rejected, which is a common way of avoiding modulo bias.

 _I am always disappointed to see these kind of packages, as mentioned
somewhere in comments above_

Because you didn't understand how it worked and assumed that it's bad?

~~~
daveFNbuck
I believe the criticism here is about index.js, which is what all the other
claims are about.

    
    
      var bytes = random(size)
      for (var i = 0; i < size; i++) {
        id += url[bytes[i] & 63]
      }
    

This part of the code just uses an alphabet size of 64, which makes the
approach actually used in the exported function identical to modding the byte
by the alphabet size.

~~~
dchest
Yes, with a power-of-two alphabet length there's no bias. The poster seems to
be confused about modulo bias claims, which apply to non-power-of-two
alphabets, not a particular way of calculating modulus.

------
dchest
My version:
[https://gist.github.com/dchest/751fd00ee417c947c252](https://gist.github.com/dchest/751fd00ee417c947c252)

~~~
dchest
(replying to myself as I can no longer edit:) My favorite feature in it is
.entropy(n) method: when you want some fixed amount of unpredictable bits, but
don't want to calculate how long the random string should be in your encoding
to contain that many bits.

------
danielbankhead
I designed something similar (in terms of URL-friendliness), but for
distributed systems:

[https://github.com/AltusAero/bronze](https://github.com/AltusAero/bronze)

It's designed for use cases where collision-resistance (both time-based
[UUID1] and random [UUID4]) is valued more than the size of the id. Can be
used as a module (Node.js & browser) and via CLI.

------
bernadus_edwin
Silly question, why js not provide method generate uuid without library? I
mean this is 2017. One method without param input, wont hurt anybody

~~~
dchest
UUID is a misguided standard that should have been killed long ago if not for
compatibility reasons, so I'm glad it's not in JS.

------
StavrosK
I wrote a Python version of this six years ago, it's been working beautifully
and I use it for IDs in all my APIs:

[https://github.com/skorokithakis/shortuuid](https://github.com/skorokithakis/shortuuid)

That way, elements are neither enumerable nor guessable.

------
duke360
tilde (~) doesn't looks a wise choice to me as may have special meaning in
certain context, fortunately the alphabet are redefinible. i would also prefer
if at least some part fo the nanoid would be machine dependent. anyway, it
works :)

------
bluefox
If you need a function that returns integers in some range [0..n)
equiprobably, just write that. Why conflate a bunch of concepts and have weird
limitations?

Oh, and by the way, fuck HN for deleting comments.

------
dfreire
You can also check
[https://github.com/dylang/shortid](https://github.com/dylang/shortid)

~~~
arnioxux
This is a perfect example of what _NOT_ to do. They reimplemented a seeded
linear congruential generator for their random number generator:

[https://github.com/dylang/shortid/blob/master/lib/random/ran...](https://github.com/dylang/shortid/blob/master/lib/random/random-
from-seed.js)

But since LCGs are easy to solve, after seeing a few values you can solve for
the seed and generate all past and future values.

But you don't even need to do anything fancy for that. Since their
implementation of LCG only has a state size of 233280 different values, you
can just brute force it. (also means that their rng could only ever generate
233280 different numbers to begin with)

Why the fuck do they have 2.5k stars!?

EDIT: They already have an open issue for it but project seems unmaintained:
[https://github.com/dylang/shortid/issues/70](https://github.com/dylang/shortid/issues/70).
Stay far far away.

~~~
dfreire
You're absolutely right, and this is huge...

------
loop0
How does it compares to hashids ([http://hashids.org/](http://hashids.org/))?

~~~
qilo
Hashids are reversible mappings to/from integers. Nano IDs are random strings.

------
magnat
String generated using default parameters has ~0.1% chance of containing four
letter english profanity.

~~~
ludicast
I'd consider that a feature not a bug :).

------
ijustdontcare
what about uuid, a well researched and established standard?

~~~
namelost
UUIDs are not secure unless they are version 4 (random) and generated from a
secure randomness source.

~~~
wruza
The same applies for any unique ids in general.

~~~
dchest
Yes, but _the holy UUID standard_ specifies multiple ways of generating them
from sticks and stones, invented before people knew how to properly generate
random bytes.

------
Kiro
Why wouldn't this be a one-liner?

~~~
dchest
It would be a pretty long line, containing a function wrapping
crypto.randomBytes, window.crypto.getRandomValues,
window.msCrypto.getRandomValues selected at runtime, and then a few lines to
select characters avoiding modulo bias. Try?

~~~
mort96
That sounds like a pretty interesting code golf-ish challenge actually, would
attempt if I had more than 3% battery on my laptop atm :(

