How do you generate unique IDs, and how do you plan against ceilings?

seiji · on March 9, 2008

Your myReadLastIdAndIncrementByOne looks disturbing. Are you sure mnesia:dirty_update_counter/3 doesn't work for your needs?

As for ids in mnesia, you aren't (realistically) at risk of overflowing an integer storage size, since integers are automatically promoted to bignums. {node(), now()} works fine if you are working in a distributed context and don't need sequential IDs. erlang:md5/1 or crypto:sha/1 can also give you unique IDs to use across distributed nodes. I wouldn't recommend erlang:phash2 because it only yields 27-bit to 32-bit hashes.

I'm not sure what good term_to_binary would be if you are keeping everything in mnesia.

I would recommend looking into mnesia:dirty_update_counter/3 first, and then expanding to either {node(), now()}, erlang:md5/1, or crypto:sha/1 if you find that updating the counter across distributed nodes is a problem.

bosky101 · on March 9, 2008

Thanks for the feedback.

Yes, I'm aware of update_counter; it works great for incrementing my N. But what if I have a tuple or list as the id?

Should I be using mnesia in the first place? Would you use {node(), now()} if you weren't on a distributed system? How do I get notified when I reach a ceiling? What's the latency in searching only within a specific node?

I'm sure you can't rule out situations where one type fits all. I just threw a couple of different methods out there. Thanks for letting me know about md5 and sha.

Moreover, I wanted to highlight tuples as ids. Let's take this very URL.

->{ node() , 132377} and simply read 132377

->{ node() , erlang:md5(132377) } and decrypt while showing?

-> how would you store this very url ?

mixmax · on March 9, 2008

I simply wouldn't worry about it.

If you have a problem like that you have the resources to solve it.

If you are starting a business and worry about problems like that you are not focusing on the right things.

lacker · on March 9, 2008

Two reasonable choices.

1. Cast your ids to strings and handle them as strings everywhere.

2. Just use 64 bits. If you have a quintillion of anything, it won't be you fixing the problem any more.
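
To put a rough number on the 64-bit option, here is a back-of-the-envelope sketch in Python. The allocation rate of one million ids per second is purely an assumed figure for illustration and does not come from this thread.

```python
# Rough arithmetic: how long would a sequential 64-bit counter last?
# ASSUMPTION: ids are handed out at a hypothetical one million per second.
ID_SPACE = 2 ** 64                      # number of distinct 64-bit values
IDS_PER_SECOND = 1_000_000              # assumed rate, for illustration only
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # ignoring leap years

years_to_exhaust = ID_SPACE / (IDS_PER_SECOND * SECONDS_PER_YEAR)
print(f"{ID_SPACE:,} ids last about {years_to_exhaust:,.0f} years")
```

Under that assumed rate the counter lasts on the order of half a million years, which is the sense in which the ceiling is somebody else's problem.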

stcredzero · on March 10, 2008

IIRC, the application I am working on right now uses 128 bit ids. In fact, I believe that they are just randomly generated. Since each row we fetch is already quite large, the extra 64 bits isn't much of a cost, but the ability to generate new IDs without any overhead or communication between processes is very beneficial. (The app is distributed between as many as 300 machines, all of which try their best to keep in sync with each other in realtime.) We just have to be careful about how the id generators are seeded.

bosky101 · on March 10, 2008

300 machines? Nice! By the way, did you use distributed Erlang? And how do you make sure they are unique?

stcredzero · on March 12, 2008

Actually, it's not Erlang, but Smalltalk. We don't have to make sure they're unique; that's the whole point. There are far fewer than 4 billion objects in the system. The chances of a collision are minuscule.
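
The "minuscule" claim can be sanity-checked with the standard birthday approximation. The sketch below is in Python rather than Smalltalk, and it assumes the ids are uniformly random 128-bit values; the function names and the use of the `secrets` module (which draws from the operating system's entropy source instead of a hand-seeded generator) are illustrative choices, not the application's actual code.

```python
import math
import secrets

ID_BITS = 128

def new_id() -> int:
    """Return a random 128-bit id (assumes the OS entropy source is sound)."""
    return secrets.randbits(ID_BITS)

def collision_probability(n: int, bits: int = ID_BITS) -> float:
    """Birthday approximation: P(any collision) ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))

# 4 billion is the upper bound mentioned in the comment above.
p = collision_probability(4_000_000_000)
print(f"collision chance with 4 billion ids: about {p:.1e}")
```

With 4 billion ids the estimate comes out on the order of 10^-20, so no coordination between machines is needed as long as the random source is good.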

mwerty · on March 9, 2008

Have you looked into guids?

xirium · on March 9, 2008

Don't use hashes for primary keys because they are verbose and they cause a huge amount of cache churn ( http://news.ycombinator.com/item?id=122869 ). Use a composite key instead.