How do you generate unique IDs, and how do you plan against ceilings?
5 points by bosky101 on March 9, 2008 | 9 comments
Some of you may be familiar with the amazing problem Flickr had recently, when they hit the ceiling of their numbering scheme for images.

In the Erlang world, apart from having to hack our own auto-incrementor for Mnesia, you also get to do interesting stuff like using tuples that aren't serialised as unique IDs.

->{ node() , erlang:now() }

-> myReadLastIdAndIncrementByOne(table_name)

-> erlang:phash2 ( <List> )

-> erlang:term_to_binary(<List | Tuple | Integer>)

1) How do you generate unique IDs for your app?

2) Assuming you're planning to grow really big (don't all YC'ers? :) ), how do you plan to handle such ceilings?



Your myReadLastIdAndIncrementByOne looks disturbing. Are you sure mnesia:dirty_update_counter/3 doesn't work for your needs?

As for ids in mnesia, you aren't (realistically) at risk of overflowing an integer storage size since you get automatic promotion to bignums. {node(), now()} works fine if you are working in a distributed context and don't need sequential IDs. Also erlang:md5/1, or crypto:sha/1 can give you unique IDs to use over distributed nodes. I wouldn't recommend erlang:phash2 because it only yields 27 bit to 32 bit hashes.

I'm not sure what good term_to_binary would be if you are keeping everything in mnesia.

I would recommend looking into mnesia:dirty_update_counter/3 first and then expand to either {node(), now()} or erlang:md5/1 or crypto:sha/1 if you find updating the counter at distributed nodes is a problem.
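
For reference, a minimal sketch of the three approaches suggested above, assuming a reasonably current OTP release. The module name id_sketch, the table id_counter, and the function names are made up for illustration. erlang:now/0 is deprecated on modern OTP, so the node-scoped ID below swaps in erlang:system_time/1 plus erlang:unique_integer/1; that substitution is an assumption, not what the thread used.

```erlang
-module(id_sketch).
-export([setup/0, next_id/1, node_id/0, hash_id/1]).

%% Start Mnesia with a RAM-only schema and create a hypothetical
%% counter table holding one {id_counter, Name, Value} row per sequence.
setup() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(id_counter,
                                       [{attributes, [name, value]}]),
    ok.

%% Sequential IDs: add 1 to the counter stored under Name, without a
%% transaction, and return the new value. Erlang integers are promoted
%% to bignums, so the counter itself has no fixed ceiling.
next_id(Name) ->
    mnesia:dirty_update_counter(id_counter, Name, 1).

%% Non-sequential IDs that need no coordination between nodes.
%% Assumption: this stands in for the thread's {node(), now()}.
node_id() ->
    {node(),
     erlang:system_time(microsecond),
     erlang:unique_integer([positive])}.

%% Fixed-size 128-bit ID derived from any term. erlang:md5/1 only
%% accepts iodata, so the term is serialised with term_to_binary/1 first.
hash_id(Term) ->
    erlang:md5(term_to_binary(Term)).
```

After calling id_sketch:setup() once, repeated calls to id_sketch:next_id(image) hand out successive integers, and hash_id/1 always returns a 16-byte binary. This is also one place where term_to_binary is useful even with everything in mnesia: it turns a tuple or list into something md5 will accept.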


Thanks for the feedback.

Yes, I'm aware of update_counter; it works great for incrementing my N. But what if I have a tuple or list as the ID?

Should I be using Mnesia in the first place? Would you use {node(), now()} if you weren't on a distributed system? How do I get notified when I reach a ceiling? What's the latency in searching only within a specific node?

I'm sure you can't rule out situations where one type doesn't fit all. I just threw a couple of different methods out there. Thanks for letting me know about md5 and sha.

Moreover, I wanted to highlight tuples as IDs. Let's take this very URL.

->{ node() , 132377} and simply read 132377

->{ node() , erlang:md5(132377) } and decrypt while showing?

-> How would you store this very URL?


I simply wouldn't worry about it.

If you have a problem like that you have the resources to solve it.

If you are starting a business and worry about problems like that you are not focusing on the right things.


Two reasonable choices.

1. Cast your ids to strings and handle them as strings everywhere.

2. Just use 64 bits. If you have a quintillion of anything it won't be you fixing the problem any more.
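
To put a rough number on the 64-bit ceiling, here is the arithmetic for an unsigned 64-bit counter. The rate of one million new IDs per second is an assumption picked for illustration, not a figure from the thread.

```latex
2^{64} = 18{,}446{,}744{,}073{,}709{,}551{,}616 \approx 1.8 \times 10^{19}
% assumed rate: 10^6 new IDs per second
\frac{2^{64}}{10^{6}\ \text{IDs/s}} \approx 1.8 \times 10^{13}\ \text{s} \approx 5.8 \times 10^{5}\ \text{years}
```

A signed 64-bit column halves that, which still leaves a few hundred thousand years at the same rate.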


IIRC, the application I am working on right now uses 128 bit ids. In fact, I believe that they are just randomly generated. Since each row we fetch is already quite large, the extra 64 bits isn't much of a cost, but the ability to generate new IDs without any overhead or communication between processes is very beneficial. (The app is distributed between as many as 300 machines, all of which try their best to keep in sync with each other in realtime.) We just have to be careful about how the id generators are seeded.
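
The app described above is not shown in the thread, so this is only a sketch of the same idea in Erlang, assuming the crypto application is available. The names rand_id, new/0 and to_hex/1 are hypothetical. Drawing from crypto's strong generator is one way to sidestep the seeding concern, since there is no per-process seed to get wrong.

```erlang
-module(rand_id).
-export([new/0, to_hex/1]).

%% 16 bytes = 128 bits from a cryptographically strong generator.
%% No counter and no communication between processes or nodes.
new() ->
    crypto:strong_rand_bytes(16).

%% Render an ID as lowercase hex, two characters per byte,
%% e.g. for use in URLs or logs.
to_hex(Bin) ->
    lists:flatten([io_lib:format("~2.16.0b", [B]) || <<B>> <= Bin]).
```

Each call to rand_id:new() returns a fresh 16-byte binary, and rand_id:to_hex/1 turns it into a 32-character string.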


300 machines? Nice! By the way, did you use distributed Erlang? And how do you make sure they're unique?


Actually, it's not Erlang, but Smalltalk. We don't have to make sure they're unique, that's the whole point. There are far fewer than 4 billion objects in the system. The chances of a collision are minuscule.
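
A rough way to see how small: a birthday-style upper bound, assuming the 128-bit IDs are drawn uniformly at random (which is why the seeding matters) and taking 4 billion, roughly 2^32, as a generous object count.

```latex
% n random IDs drawn uniformly from N = 2^{128} values (union bound over pairs)
P(\text{collision}) \le \binom{n}{2}\frac{1}{N} < \frac{n^{2}}{2N}
% worst case for "fewer than 4 billion objects": n = 2^{32}
P(\text{collision}) < \frac{(2^{32})^{2}}{2 \cdot 2^{128}} = 2^{-65} \approx 2.7 \times 10^{-20}
```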


Have you looked into guids?


Don't use hashes for primary keys because they are verbose and they cause a huge amount of cache churn ( http://news.ycombinator.com/item?id=122869 ). Use a composite key instead.



