Needed this functionality in a Rust application I'm writing so I ported the Go c...

bojanz · on Dec 31, 2020

What are the benefits of Snowflake compared to ULID, which already has Rust implementations[1][2]?

abahlo · on Dec 31, 2020

Sonyflakes are generally more useful in massive scale environments as you won't run into conflicts due to the "machine-id" part (which defaults to the lower 16 bits of the private IP address).

There are more differences (128-bit vs. 64-bit, RNG vs. time-based), and I think they have different use cases.
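To make the "machine-id" part concrete, here is a minimal sketch of how a Sonyflake-style 64-bit ID can be packed, assuming the layout documented by the Go project (39 bits of elapsed time in 10 ms units, 8 bits of sequence, 16 bits of machine ID). The names `machine_id_from_ip` and `compose_id` are hypothetical helpers for illustration, not the API of the Go library or of the Rust port.

```rust
use std::net::Ipv4Addr;

// Field widths assumed from the Sonyflake documentation:
// 39 bits of time, 8 bits of sequence, 16 bits of machine ID.
const SEQUENCE_BITS: u32 = 8;
const MACHINE_ID_BITS: u32 = 16;

/// Default-style machine ID: the lower 16 bits (last two octets)
/// of a private IPv4 address.
fn machine_id_from_ip(ip: Ipv4Addr) -> u16 {
    let [_, _, hi, lo] = ip.octets();
    u16::from_be_bytes([hi, lo])
}

/// Pack the three fields into one 64-bit ID (hypothetical helper).
/// `elapsed_10ms` is expected to fit in 39 bits; this sketch does not check it.
fn compose_id(elapsed_10ms: u64, sequence: u8, machine_id: u16) -> u64 {
    (elapsed_10ms << (SEQUENCE_BITS + MACHINE_ID_BITS))
        | (u64::from(sequence) << MACHINE_ID_BITS)
        | u64::from(machine_id)
}
```

Two hosts with different last-two-octet suffixes can then never produce the same ID, even in the same time slot with the same sequence number.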

eutectic · on Dec 31, 2020

You shouldn't ever have a collision with 128 bits of entropy.

j-pb · on Dec 31, 2020

More structure actually increases the chance of collisions, because at that scale the chance of bit errors dominates the equation. More randomness protects against that.

ComputerGuru · on Dec 31, 2020

ULID doesn’t have a “machine id” or “node id” component. I think you are mixing it up with one of the UUID variants.

anomaloustho · on Dec 31, 2020

Not sure if this is also a limitation of Snowflake, but ULIDs are only unique down to a certain timescale (milliseconds, I believe).

sylvain_kerkour · on Dec 31, 2020

Not exactly. By default they can be sorted only down to a millisecond, but you can use a monotonic generator to keep them sorted even if more than one ULID is generated within a millisecond.

Other than that, they have 80 bits of randomness, enough to be unique even if millions are generated per second.

https://github.com/ulid/spec
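A minimal sketch of the monotonic behaviour described in the spec, assuming the standard ULID layout (48-bit millisecond timestamp followed by 80 random bits): within the same millisecond the previous random part is incremented by one, and generation fails if that overflows. The `MonotonicUlid` type is hypothetical and takes the clock reading and fresh randomness as parameters so it stays dependency-free; it is not the API of any existing crate, and it does not handle a clock that moves backwards.

```rust
// The lower 80 bits of a ULID hold the random component,
// the upper 48 bits hold the millisecond timestamp.
const RANDOM_BITS: u32 = 80;
const RANDOM_MASK: u128 = (1u128 << RANDOM_BITS) - 1;
const TIME_MASK: u64 = (1u64 << 48) - 1;

/// Hypothetical generator state: the last timestamp and random part issued.
#[derive(Default)]
struct MonotonicUlid {
    last_ms: u64,
    last_random: u128,
}

impl MonotonicUlid {
    /// Returns the next ULID as a u128, or None if the random part
    /// would overflow within a single millisecond.
    fn next(&mut self, now_ms: u64, fresh_random: u128) -> Option<u128> {
        let random = if now_ms == self.last_ms {
            // Same millisecond: bump the previous random part by one.
            let bumped = self.last_random + 1;
            if bumped > RANDOM_MASK {
                return None;
            }
            bumped
        } else {
            // New millisecond: start from fresh randomness.
            fresh_random & RANDOM_MASK
        };
        self.last_ms = now_ms;
        self.last_random = random;
        Some((u128::from(now_ms & TIME_MASK) << RANDOM_BITS) | random)
    }
}
```

In a real generator `now_ms` would come from the system clock and `fresh_random` from a cryptographically secure RNG.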

echelon · on Dec 31, 2020

Nice work!

I'm curious, if you don't mind my asking. What are you building that requires that amount of scale? These are heavy hitting IDs (but of course you know that).

abahlo · on Dec 31, 2020

Thanks and I don't mind at all!

To be honest, for this application I didn't want to expose a database serial (otherwise you'd know you're user #100, for example) or use UUIDs, so it's less about scale and more about obscuring IDs. The library is well suited for huge-scale scenarios nonetheless.

LaundroMat · on Dec 31, 2020

Why did you not want to use UUIDs?

abahlo · on Dec 31, 2020

I already use UUIDs for various fields and didn't want the id to be confused with other fields. More of a clarity/style decision than technical.

tinus_hn · on Dec 31, 2020

Unfortunately this means you get to make the same mistake as originally made with UUID: the static machine ID is a privacy leak.
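To illustrate the concern, here is a small sketch of how anyone holding such an ID could read the machine ID back out, assuming the Sonyflake layout in which the machine ID occupies the lowest 16 bits and defaults to the last two octets of the private IP. Both function names are hypothetical.

```rust
/// Read the low 16 bits of a Sonyflake-style ID, where the machine ID lives.
fn machine_id_of(id: u64) -> u16 {
    (id & 0xFFFF) as u16
}

/// Turn a machine ID back into the last two octets of the IP
/// it was derived from (under the default machine-ID scheme).
fn ip_suffix(machine_id: u16) -> [u8; 2] {
    machine_id.to_be_bytes()
}
```

Every ID issued by the same host carries the same suffix, so IDs can be grouped by the machine that generated them.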

abahlo · on Dec 31, 2020

Is it? Not sure private IPs are so sensitive, esp. the lower parts.

tinus_hn · on Dec 31, 2020

If it’s unique it’s a privacy leak. I’m not going to debate that, it’s common knowledge.

UUID was changed in the 90s to be a hash of this instead, and later to just be a completely random number. There are so many bits that the likelihood of a single duplicate being generated before the sun has swollen enough to consume the earth is slim, so you don’t actually need these schemes to provide a unique number.
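The "so many bits" argument can be put into numbers with the usual birthday approximation, p ≈ 1 − exp(−n² / (2·2^b)) for n IDs drawn uniformly from b random bits. The sketch below assumes perfectly uniform randomness; the function name is hypothetical, and 122 is the number of random bits in a version 4 UUID.

```rust
/// Approximate probability of at least one collision among `n` IDs,
/// each carrying `random_bits` uniformly random bits (birthday bound).
fn collision_probability(n: f64, random_bits: i32) -> f64 {
    let space = 2f64.powi(random_bits);
    let x = n * n / (2.0 * space);
    // 1 - e^(-x), computed via exp_m1 to stay accurate for tiny x.
    -(-x).exp_m1()
}
```

For example, `collision_probability(1e12, 122)` stays far below one in a trillion, while the same call with far fewer random bits grows quickly, which is why purely random schemes need a wide random field.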

social_quotient · on Dec 31, 2020

Maybe they need the distribution part and not the scale. But I’m curious now too.