
Thanks for the clarification; it led me to do a little searching.

https://mariadb.com/kb/en/mariadb/guiduuid-performance/

It has some helpful tidbits, such as how the primary key is implicitly copied into every other index, which makes UUIDs quite expensive memory-wise if you have a lot of referencing tables and a lot of indexes.

One recommendation I'm not sure about is compressing the UUID and reordering it for time sequence.

I get the compression part (remove the dashes, convert to binary with UNHEX), but I'm concerned that reordering the UUID so the time sections are together (to order your writes better) might actually decrease entropy and make collisions more likely.

Does anyone here know much about UUID generation, and whether reordering the time sections would make a difference across a table of roughly 100M rows?

I think I might just go with an internal incrementing key and an external UUID. That solves the problem of knowing the key before insert, with few downsides.
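For what it's worth, here is a rough sketch of that internal-integer-plus-external-UUID layout. It uses Python's built-in sqlite3 only so it runs anywhere; the table name `items`, the column `public_id`, and the helpers `insert_item` / `find_item` are all made up for illustration, and a real MySQL/MariaDB schema would look somewhat different.

```python
import sqlite3
import uuid

# In-memory database purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # short internal key, insertion ordered
    " public_id BLOB NOT NULL UNIQUE,"        # 16-byte UUID exposed to the outside
    " name TEXT NOT NULL)"
)


def insert_item(name: str) -> uuid.UUID:
    """Insert a row; the caller knows the external key before the insert happens."""
    public_id = uuid.uuid4()
    conn.execute(
        "INSERT INTO items (public_id, name) VALUES (?, ?)",
        (public_id.bytes, name),
    )
    return public_id


def find_item(public_id: uuid.UUID):
    """Look a row up by its external UUID; returns (id, name) or None."""
    return conn.execute(
        "SELECT id, name FROM items WHERE public_id = ?",
        (public_id.bytes,),
    ).fetchone()
```

Other tables would reference `id`, so only the one unique index on `public_id` carries the 16-byte value.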

The time-swap thing is now implemented in MySQL 8 (the link in the MariaDB post is broken):

https://dev.mysql.com/doc/refman/8.0/en/miscellaneous-functi...




The probability of a collision can be estimated using the "Birthday Paradox" approximation: http://planetmath.org/approximatingthebirthdayproblem

Say you can tolerate a 1 in 1 billion probability of collision:

With completely random 128-bit UUIDs, you would need slightly over 800 trillion rows to have a 1 in 1 billion chance of a collision. Equation: sqrt(2 * 2^128 * ln(1 / (1 - 1/1000000000)))

Adapting the UUID so that the first 8 bytes hold the number of milliseconds since the 1970 epoch would leave 64 bits of random data. You would need to insert over 192,000 rows IN A GIVEN MILLISECOND to have a 1 in 1 billion chance of collision. Equation: sqrt(2 * 2^64 * ln(1 / (1 - 1/1000000000)))
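If you want to check those two numbers yourself, here is a quick sketch of the same approximation in Python. The function name `rows_for_collision_probability` is just something I made up, and it assumes the random bits are uniformly distributed.

```python
import math


def rows_for_collision_probability(random_bits: int, p: float) -> float:
    """Approximate row count at which the collision probability reaches p.

    Birthday approximation: n ~ sqrt(2 * N * ln(1 / (1 - p))), with N = 2**random_bits.
    """
    # -log1p(-p) equals ln(1 / (1 - p)) and stays accurate for very small p.
    return math.sqrt(2 * (2 ** random_bits) * -math.log1p(-p))


# Fully random 128-bit value vs. 64 random bits per millisecond bucket.
full_random = rows_for_collision_probability(128, 1e-9)
per_millisecond = rows_for_collision_probability(64, 1e-9)

print(f"128 random bits: {full_random:.3e} rows")
print(f" 64 random bits: {per_millisecond:,.0f} rows per millisecond")
```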


> might actually decrease entropy? and make collisions likely.

It's a deterministic rearrangement: there's no loss of information, it's just a permutation of the same bits. It has no impact on entropy or on the likelihood of a collision.
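To make the "just a permutation" point concrete, here is a small sketch that moves the time fields of a version 1 UUID to the front and then undoes it. The helper names `swap_time_fields` and `unswap_time_fields` are my own; the byte order is my understanding of what MySQL 8's UUID_TO_BIN(uuid, 1) does (first and third groups swapped), so treat that part as an assumption.

```python
import uuid


def swap_time_fields(u: uuid.UUID) -> bytes:
    """Reorder the 16 bytes so the high time bits come first."""
    b = u.bytes  # layout: time_low(4) time_mid(2) time_hi_and_version(2) clock_seq(2) node(6)
    return b[6:8] + b[4:6] + b[0:4] + b[8:]


def unswap_time_fields(b: bytes) -> uuid.UUID:
    """Exact inverse of swap_time_fields."""
    # Swapped layout: time_hi(2) time_mid(2) time_low(4) rest(8)
    return uuid.UUID(bytes=b[4:8] + b[2:4] + b[0:2] + b[8:])


original = uuid.uuid1()
swapped = swap_time_fields(original)

# Same 16 bytes in a different order, and fully reversible.
print(original.hex)
print(swapped.hex())
print(unswap_time_fields(swapped) == original)
```

Because the mapping is reversible, two different UUIDs can never map to the same swapped value, so the collision odds are exactly what they were before the swap.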


The UUID time-swapping trick seems like a bad idea outside of special cases. Overloading the UUID in this way makes you dependent on generating UUIDs correctly so that the ordering works. You can get insertion ordering by using an auto-increment integer primary key which also results in a relatively short key for secondary indexes. This is much easier to understand and cannot be messed up by application errors.



