

Twitter Announces "Snowflake" for Unique Tweet IDs - jolie
http://engineering.twitter.com/2010/06/announcing-snowflake.html

======
paulsmith
Interesting, they have their own custom epoch, the "twepoch":

    // Tue, 21 Mar 2006 20:50:14.000 GMT
    val twepoch = 1142974214000L

According to the README they can fit 69 years worth of timestamps in 41 bits
with the custom epoch, since they don't care about any times that happened
before Twitter launched.
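The 41-bit/69-year figure is easy to sanity-check; a back-of-envelope sketch (not Twitter's code, just the arithmetic):

```scala
// Back-of-envelope check: how far does a 41-bit millisecond
// counter reach past a custom epoch?
object BitBudget {
  val maxTimestampMs = (1L << 41) - 1              // largest 41-bit value, in ms
  val msPerYear      = 1000L * 60 * 60 * 24 * 365  // ignoring leap years
  val years          = maxTimestampMs.toDouble / msPerYear

  def main(args: Array[String]): Unit = {
    println(f"$years%.1f years")   // roughly 69.7
    assert(years > 69 && years < 70)
  }
}
```

With a Unix-epoch timestamp instead, those same 41 bits would have spent about 36 of those years on the 1970-2006 range, which is the savings kingryan mentions below.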

~~~
kingryan
That's the time of the first tweet. It saves us 30-something years of id
space.

~~~
dschobel
Totally OT but do you guys still know the content of the first tweet?

Apparently the best google-fu effort only yields #20:
<http://twitter.com/jack/status/20>

~~~
jmhodges
That is, as far as the archive goes, the very first public tweet.

------
seldo
The most interesting part of this post to me is the implication that despite a
lot of noise about Twitter switching to Cassandra, they haven't actually done
it yet -- if they had, they'd need snowflake in production, and they say that
it's not.

~~~
kingryan
It turns out big problems are big. We're moving existing data to
cassandra and putting a lot of new (mostly internal) data on cassandra.

~~~
seldo
I didn't mean to imply that it was easy, merely that I'd got the impression
from presentations that the migration had already happened.

------
petervandijck
Makes me think: when you scale databases up, you always seem to have to
loosen constraints. Similar to how the laws of nature change when you change
scale in nature, e.g. quantum mechanics when you go really small, or weird
laws of nature when you look really far away in the universe.

~~~
RyanMcGreal
Consistency, Availability, Partition Tolerance: pick any two.

~~~
eagleal
Looks like our Universe chose the last two.

~~~
Groxx
I dunno, quantum mechanics / oddness with atomic-and-smaller implies partition
tolerance wasn't one of the choices.

Maybe Availability was all that was chosen? And it's doomed in a mere some-
billion years anyway...

~~~
eagleal
Still, we have to confirm these theories. I just hope I'll be alive when all
the Universe's physics has been explained (I mean that we can use the
discovered laws and change them, knowing every single outcome), and we'll
create other Universes.

------
brown9-2
Their approach to this is genius: they have some ideas for designs, some
things they've tested out - but while it's still alpha and not yet in
production they open source it, allowing lots of other developers to take a
look, find problems with their ideas, and hopefully make it even better.

------
jallmann
There is an implication of a very interesting math problem just waiting to be
investigated. Unique IDs are a natural fit for something like a hash function,
but now consider the "roughly sortable" requirement -- how about a homomorphic
hash (invent one!), or a hash with a weak/predictable avalanche property.

Of course, that is probably too much for twitter who just needs to Get It
Done, but I find such things interesting to think about.

------
timf
It would be an interesting problem to really try to get the time-sorting
bound down to tens of ms. With multiple datacenters you might see up to ~10ms
of drift with NTPv4, so your starting point is already pretty crippled. And
how would you even test? :-)

~~~
kingryan
NTP seems to work fairly well for this. It takes into account network
latencies and attenuates over time.

~~~
timf
Well, quoting from: <http://www.eecis.udel.edu/~mills/ntp.html>

"Used in the Internet of today with computers ranging from personal
workstations to supercomputers, NTP provides accuracies generally in the range
of a millisecond in LANs and up to a few tens of milliseconds in the global
Internet"

But this is news to me, interesting:

"When kernel support for precision timing signals, such as a pulse-per-second
(PPS) signal, is available, the accuracy can be improved ultimately to the
order of one nanosecond in time and one nanosecond per second in frequency."

------
est
so much for the new features, now where can I get my 3201st tweet?

------
paradox95
I think this is big because releasing this open source moves Twitter toward
being an R&D/software type of company that contributes to more than just its
own system. They are moving in the same direction as companies such as Google
and Facebook.

------
mootothemax
What an interesting problem! Definitely not something I had ever given much
thought to in all honesty. Do Twitter now have enough Tweets flying out at any
given moment in time that they need to have multiple ID-generating servers?

Not sure about anyone else but it makes my mind boggle!

------
kordless
This would be somewhat useful for logging data as well!

------
jrockway
Kind of pointless. Just use a UUID for the ID and a date for the thing you
sort by. Twitter stores a "posted at" time, so why not sort by that? Using an
ID column for a time-based sort when you are storing the time anyway is silly.

Oh well, at least it was a fun hack.

~~~
petervandijck
Creating an index on the posted_at time may be harder than it sounds, with
"tens of thousands of tweets per second" and a distributed database. The
"within a second" approach (i.e. loosening up the ordering constraint) sounds
pretty brilliant to me.

~~~
jrockway
But you don't need to index on time; you can sort on the client. This is what
twitter clients already do, except they sort an integer instead of a datetime.

Remember: Twitter is nothing new. We have done massively distributed messaging
since the 70s. It's called e-mail.
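A sketch of that client-side sort, using a stand-in Tweet type (not from any real client library): when ids are generated in time order, sorting by the integer id and sorting by posted-at time produce the same timeline.

```scala
// Stand-in Tweet type to show the point: with time-ordered ids,
// sorting by id and sorting by posted-at time agree.
case class Tweet(id: Long, postedAtMs: Long, text: String)

object ClientSort {
  def main(args: Array[String]): Unit = {
    val timeline = List(
      Tweet(id = 30, postedAtMs = 1500, text = "third"),
      Tweet(id = 10, postedAtMs = 1000, text = "first"),
      Tweet(id = 20, postedAtMs = 1200, text = "second"))

    val byId   = timeline.sortBy(_.id)
    val byTime = timeline.sortBy(_.postedAtMs)
    assert(byId == byTime)   // identical ordering either way
  }
}
```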

~~~
frognibble
Because the APIs fetch recent tweets, the service needs an index on time.

Although it should be easy for client applications to sort on time instead of
id, that's not what all applications do. Twitter chose not to break clients
that sort on id.

~~~
jrockway
You could just fake the id later. Why ruin your architecture because your
users are lazy?

------
adriand
Just wondering if the Twitter users here are able to use the Twitter website
consistently. I cannot. I frequently am unable to view user profiles, load up
my Twitter page in order to post tweets, or actually post a tweet after
submitting the form. As a result my Twitter usage has dropped dramatically.
Anyone else having the same issues?

