
How do you generate unique IDs, and how do you plan against ceilings? - bosky101
Some of you may be familiar with the problem Flickr had recently when they hit the ceiling of their numbering scheme for images.

In the Erlang world, apart from having to hack our own auto-incrementer for Mnesia, you also get to do interesting stuff like using unserialised tuples as unique IDs.

-> { node(), erlang:now() }

-> myReadLastIdAndIncrementByOne(table_name)

-> erlang:phash2(<List>)

-> erlang:term_to_binary(<List | Tuple | Integer>)

1) How do you generate unique IDs for your app?

2) Assuming you're planning to grow really big (don't all YC'ers :) ), how do you plan to handle such ceilings?
======
seiji
Your myReadLastIdAndIncrementByOne looks disturbing. Are you sure
mnesia:dirty_update_counter/3 doesn't work for your needs?

As for ids in mnesia, you aren't (realistically) at risk of overflowing an
integer storage size since you get automatic promotion to bignums. {node(),
now()} works fine if you are working in a distributed context and don't need
sequential IDs. Also, erlang:md5/1 or crypto:sha/1 can give you unique IDs to
use over distributed nodes. I wouldn't recommend erlang:phash2 because it only
yields 27-bit to 32-bit hashes.

I'm not sure what good term_to_binary would be if you are keeping everything
in mnesia.

I would recommend looking into mnesia:dirty_update_counter/3 first and then
expanding to either {node(), now()} or erlang:md5/1 or crypto:sha/1 if you find
that updating the counter at distributed nodes is a problem.
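
For reference, a minimal sketch of the counter approach. The module name `id_counter` and the table name `counter` are hypothetical, not from the thread, and the table is assumed to be a plain RAM table on the local node. mnesia:dirty_update_counter/3 increments the stored integer and returns the new value in a single call, so no separate read-then-write helper is needed, and because Erlang integers are arbitrary precision the counter has no fixed ceiling.

```erlang
-module(id_counter).
-export([init/0, next_id/1]).

%% One row per named sequence: {counter, Name, Value}.
-record(counter, {name, value = 0}).

%% Start mnesia and create the counter table if it does not exist yet.
%% Assumption: default options (a set table held in RAM on this node).
init() ->
    ok = mnesia:start(),
    case mnesia:create_table(counter,
                             [{attributes, record_info(fields, counter)}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, counter}} -> ok
    end.

%% Increment the sequence called Name by one and return the new value.
%% A missing key is created on first use.
next_id(Name) ->
    mnesia:dirty_update_counter(counter, Name, 1).
```

Calling `id_counter:next_id(image)` repeatedly after `id_counter:init()` hands out increasing integers for the `image` sequence.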

~~~
bosky101
Thanks for the feedback.

Yes, I'm aware of update_counter; it works great for incrementing my N. But
what if I have a tuple or a list as the ID?

Should I be using mnesia in the first place? Would you use {node(),now()} if
you weren't on a distributed system? How do I get notified when I reach a
ceiling? What's the latency in searching only within a specific node?

I'm sure you can't assume that one type fits all situations. I just threw a
couple of different methods out there. Thanks for letting me know about md5
and sha.

Moreover, I wanted to highlight tuples as IDs. Let's take this very URL.

-> { node(), 132377 } and simply read 132377

-> { node(), erlang:md5(132377) } and decrypt while showing?

-> How would you store this very URL?
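
A small sketch of the two tuple forms above may help. The module and function names are hypothetical. Two caveats apply: erlang:md5/1 expects iodata, so an integer has to be converted first, and an MD5 digest is one-way, so it cannot be "decrypted" back to 132377 for display. If the number has to be shown, the plain form is the one that keeps it readable.

```erlang
-module(tuple_id).
-export([plain/1, hashed/1]).

%% {Node, N}: the integer stays readable and can be shown as-is.
plain(N) when is_integer(N) ->
    {node(), N}.

%% {Node, Digest}: a 16-byte MD5 digest of the decimal text of N.
%% The digest is one-way; the original N cannot be recovered from it.
hashed(N) when is_integer(N) ->
    {node(), erlang:md5(integer_to_list(N))}.
```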

------
mixmax
I simply wouldn't worry about it.

If you have a problem like that you have the resources to solve it.

If you are starting a business and worry about problems like that you are not
focusing on the right things.

------
lacker
Two reasonable choices.

1. Cast your ids to strings and handle them as strings everywhere.

2. Just use 64 bits. If you have a quintillion of anything it won't be you
fixing the problem any more.

~~~
stcredzero
IIRC, the application I am working on right now uses 128 bit ids. In fact, I
believe that they are just randomly generated. Since each row we fetch is
already quite large, the extra 64 bits isn't much of a cost, but the ability
to generate new IDs without any overhead or communication between processes is
very beneficial. (The app is distributed between as many as 300 machines, all
of which try their best to keep in sync with each other in realtime.) We just
have to be careful about how the id generators are seeded.
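
As an Erlang illustration of the idea (not the generator this application actually uses), a random 128-bit ID can be drawn without any coordination between processes. The module name `rand_id` is hypothetical. crypto:strong_rand_bytes/1 is used here as an assumption about what a reasonable source would be; it draws from a cryptographically strong generator, which sidesteps seeding a generator by hand.

```erlang
-module(rand_id).
-export([new/0]).

%% Return a random non-negative integer below 2^128.
%% No shared state and no communication with other nodes is involved.
new() ->
    <<Id:128>> = crypto:strong_rand_bytes(16),
    Id.
```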

~~~
bosky101
300 machines? Nice! BTW, did you use distributed Erlang? And how do you make
sure they're unique?

~~~
stcredzero
Actually, it's not Erlang, but Smalltalk. We don't have to make sure they're
unique, that's the whole point. There are far fewer than 4 billion objects in
the system. The chances of a collision are minuscule.
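
To put a rough number on "minuscule", here is a sketch of the usual birthday-style upper bound, assuming the IDs are uniformly random: with N IDs drawn from a space of 2^Bits values, the probability of any collision is at most N(N-1)/2 divided by 2^Bits. The module and function names are hypothetical.

```erlang
-module(collision).
-export([bound/2]).

%% Upper bound on the probability that any two of N uniformly random
%% Bits-bit IDs collide: (number of pairs) / (size of the ID space).
bound(N, Bits) when is_integer(N), N >= 0, is_integer(Bits), Bits > 0 ->
    Pairs = N * (N - 1) / 2,
    Space = math:pow(2, Bits),
    Pairs / Space.
```

For 4 billion objects and 128-bit IDs, `collision:bound(4000000000, 128)` comes out on the order of 1.0e-20.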

------
mwerty
Have you looked into guids?

~~~
xirium
Don't use hashes for primary keys because they are verbose and they cause a
huge amount of cache churn ( <http://news.ycombinator.com/item?id=122869> ).
Use a composite key instead.

