> Except you literally can't do random distribution AND be compliant with UUIDv7...

lazide · on July 22, 2023

The point I'm making is all these shenanigans are completely unnecessary, don't really help, and make everything extremely hard to manage, reason about, and get performance from - all to try to force usage of a specific key format (UUID) in a situation which it is not designed for, and for which it is not suited.

It's square peg, round hole.

And folks working on Exabyte sized indexed datasets generally already get this. So I'm not sure why i'm even having this discussion? I'm not even getting paid for this!

Ciao!

Dylan16807 · on July 22, 2023

"it allows easy/cheap write combining" is not "completely unnecessary". What the heck, at least be consistent.

And it's not shenanigans! You could shard based on the first bytes of a key, or you could shard based on the last bytes of the key. Neither one should be harder. Neither one is shenanigans.

> It's square peg, round hole.

Going entirely random is an even worse peg.

wood_spirit · on July 23, 2023

Wow a long thread of back and forth and confusion :)

Fwiw I’m with Dylan on this one!

I have direct experience of absolutely humongous data processing using random bits for shard selection where each shard uses sorted storage and benefits from the sortability of the time bits so, with just the smallest buffering, all inserts are basically super fast appends.

This is super normal in my experience. And I can’t wait for the new UUID formats to land and get widely supported in libs to simplify discussions with event producers :)