Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I'm curious, did you ever do testing on whether converting a UUID to a a pair of 64-bit ints and using a composite primary key worked well? I imagine it's annoying to deal with at both the schema and application level, but it is space efficient. Depending on how optimized numeric/decimal types are, those might by a good fit as well. You said other things degrade perf before that matters, so it may be moot, but wasting space for what's going to be in an index seems likely to have performance problems that affect other things subtly (such as ballooning your index memory usage).

If we're storing UUID's in canonical form that's 36 bytes of information (32+4 for separators, and if you're stripping separators why not convert to a numeric form anyway...). Even if we assume we're only storing 32 characters, that's 256 bits, so we're doubling the storage requirements and memory usage for an index. Assuming you're using UUID's because you have a lot of records and want to keep them unique, that space presumably matters (especially if you're using it multiple places because there are relations). If you don't have a huge amount of records, a sufficiently large auto-incrementing int field that increments by a random amount between 1 and 255, and a randomized bit shuffle codified into a reversible function (extend int to 128 bits, use previously decided bit shuffle to rearrange bits. Fully reversible) would solve that need and make everyone who has to deal with the DB's life easier. Just convert the ints to and from UUID representations between your view and controller.

I've never had to actually deal with UUID's at the DB level, so there could be some major downsides to these that I'm not thinking of. At this point, I'm just not sure what storing a UUID in a database gets you.



The two biggest things it gets us is an identifier that is unique across the database and also something that the application layer can generate and then provide to the database - instead of inserting a record and finding out what the id is after the fact. Secondarily it means that someone cannot guess what the next record is by trying to increment or decrement an integer key but like you say there are ways around that.

We do plan to eventually convert the text versions into real uuids but our performance/size isnt an issue at the moment and its a lot of risky engineering work to do.


> it gets us is an identifier that is unique across the database and also something that the application layer can generate and then provide to the database - instead of inserting a record and finding out what the id is after the fact

That's a good point, and makes sense. Also, as gzrm noted in another comment, you can make use of the different parts of the UUID to encode specific information, which may be helpful (but at the loss of uniqueness bits).


With respect to uniqueness, you can avoid collisions using, for example, some kind of sharding scheme, similar to what Instagram did:

https://engineering.instagram.com/sharding-ids-at-instagram-...


It's been quite a while since I've needed to, but when I was setting up HA databases, I just set the auto increment amount to some number more than the number of databases, and set the starting offset for each database to sequential ints (1,2,3...). This results in databases that can't generate duplicate ids. As a bonus, you know what database was used to create each record.


That's one way if you're using sequences to generate ids. If you're encoding information in UUIDs, you also have to take into account how many bits you want to allocate to the database (shard) id. One method I've used is to hard-code the shard id as the return value in a function: effectively a constant. (Edit to add: This serves the same purpose as setting the starting offset of the sequence.) You do have a bit of custom code on each database instance in this case, which is its own tradeoff.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: