

Twitter IDs to roll past 53 bits in a couple days (may break javascript apps) - there
http://groups.google.com/group/twitter-development-talk/browse_thread/thread/6a16efa375532182?pli=1

======
ars
For those wondering where 53 bits comes from, it's the size of the mantissa
(significand, fraction) in a 64bit IEEE float.

You can use 64bit floats (double) to store exact integers up to 53 bits.

This works in a lot of languages on non 64bit hardware, php for example.
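
A quick way to see the 53-bit limit, sketched in Python, whose `float` is the same 64-bit IEEE double. The helper name `survives_double` is made up for this illustration.

```python
# Python's float is a 64-bit IEEE 754 double, like a JavaScript number.
LIMIT = 2 ** 53  # 9007199254740992

def survives_double(n: int) -> bool:
    """True if the integer n round-trips through a double unchanged."""
    return int(float(n)) == n

print(survives_double(LIMIT - 1))  # True
print(survives_double(LIMIT))      # True
print(survives_double(LIMIT + 1))  # False: rounds to 2**53
```

Every integer up to 2^53 is exact; above that, only some integers are representable and the rest are rounded to a neighbour.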

------
simonw
My rule of thumb is to use strings for all IDs that belong to external
systems. A bunch of people got burned in the same way a few years ago when
Flickr photo IDs rolled past 32 bits.

~~~
drawkbox
Still surprised people use incrementing ints for ids over uuids for web apis
that may become immense.

<http://www.ietf.org/rfc/rfc4122.txt>

uuids are made just for this purpose, string based, never bigger than 40
characters (with dashes and curly braces).
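
For a sense of the sizes involved, a small sketch using Python's standard `uuid` module. The choice of a random (version 4) UUID is an assumption here; the RFC defines several other versions.

```python
import uuid

# uuid4() is a random (version 4) UUID as described in RFC 4122.
new_id = uuid.uuid4()

plain = str(new_id)          # 32 hex digits plus 4 dashes
braced = "{" + plain + "}"   # the curly-brace form often seen for GUIDs

print(plain, len(plain))     # length 36
print(braced, len(braced))   # length 38
```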

Most products use uuids, or guids, which is Microsoft's name for them.

~~~
jonhohle
I prefer my IDs universally unique. I worry that globally unique IDs won't
scale as we begin to colonize other planets. (yes, it's a joke.)

~~~
tjpick
Stating it is a joke kind of ruined the joke.

------
BigZaphod
When the first 32-bit troubles occurred, they should have just switched to a
string-based ID, IMO.

~~~
mbreese
That was my first thought... who on earth thought that a number (albeit
64-bit) would be enough for Twitter? Who even thought that a 32bit int would
have been enough?

I didn't know that JavaScript couldn't handle numbers bigger than 53 bits, but
honestly, these should have been strings from the beginning.

~~~
jonknee
64 bits is enough to have everyone on Earth send over a billion Tweets and
still have enough room to find a new solution. That sounds like more than
enough to me.
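
The arithmetic behind that, as a rough sketch. The population figure is an assumption (a round estimate for 2010), not something from the announcement.

```python
ID_SPACE = 2 ** 64
WORLD_POPULATION = 6_900_000_000  # assumption: rough 2010 estimate

# Whole number of IDs available per person if the space were split evenly.
per_person = ID_SPACE // WORLD_POPULATION
print(per_person)  # roughly 2.67 billion
```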

~~~
Retric
The problem is they added a timestamp and people assumed they would not need
the full 64 bit ID.

IMO, it's not a bad idea on their part. A 32 bit UNIX timestamp * 2 ^ 32 + a
32 bit sequential id lets them track up to 4.2 billion tweets a second and
should work just fine up to the year 2106.

Edit: As to why it's a good idea, you can have different systems handing out
IDs without stepping on each other’s toes or even talking to each other. _The
full ID is composed of a timestamp, a worker number, and a sequence number._
Granted, I would probably put the sequence number ahead of the worker number
so sub second tweets are better ordered vs. being ordered strictly based on
the system that generated them.
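
A minimal sketch of the 32 + 32 packing described above, in Python. The function names `make_id` and `split_id` are invented for illustration, and this is the simplified two-field layout from the comment, not necessarily what Twitter deployed.

```python
from datetime import datetime, timedelta, timezone

def make_id(unix_seconds: int, sequence: int) -> int:
    """Pack a 32-bit timestamp and a 32-bit sequence into one 64-bit ID."""
    assert 0 <= unix_seconds < 2 ** 32 and 0 <= sequence < 2 ** 32
    return unix_seconds * 2 ** 32 + sequence

def split_id(tweet_id: int) -> tuple:
    """Recover (unix_seconds, sequence) from a packed ID."""
    return tweet_id >> 32, tweet_id & (2 ** 32 - 1)

# Last second an unsigned 32-bit UNIX timestamp can represent.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
last = epoch + timedelta(seconds=2 ** 32 - 1)
print(last.year)  # 2106
```

Because the timestamp sits in the high bits, any ID from a later second compares greater than any ID from an earlier one, whatever the sequence values are.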

~~~
mbreese
Yeah, I guess this is why I didn't understand how they could use 64 bit
numbers to begin with... I couldn't see how they were going to be able to
generate them all without leaving huge gaps in the number-space. If you used
64bit ints you'd be unable to have one machine do the generating, so you'd
have to have some sort of offset for the worker who generated the ID at the
very least.

And once you've done that, why not just go all out and use UUIDs?

------
aaronsw
JavaScript apps have been broken for a couple weeks now with links to things
like:

<http://twitter.com/mattyglesias/statuses/7.1157195777E+15>
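
How those links were generated isn't stated here. One plausible route, sketched in Python with a status ID invented purely for illustration, is an ID being formatted as a float somewhere along the way:

```python
# Hypothetical 16-digit status ID, invented for illustration only.
status_id = 7115719577712345

# Formatting it as a float with 10 digits after the point drops the low digits.
as_text = "%.10E" % status_id
print(as_text)  # 7.1157195777E+15

# Parsing the text back does not recover the original ID.
print(int(float(as_text)))  # 7115719577700000
```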

------
jmcnevin
Being in that majority of people who haven't built Twitter-scale systems in
the past, could someone explain to me why Twitter is moving to a new form of
twitter ID in the first place? What is wrong with their current system?

Also, it seems strange that they would include the new ID in string AND
integer form in their JSON. I realize that they don't want to break existing
javascript apps, but isn't there a significant bandwidth cost in adding that
sort of kludge to the API when you're serving a quadrillion of these api
requests every day?
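
To illustrate why both forms might be sent, here is a sketch in Python that imitates a JavaScript client by parsing every JSON integer as a double. The payload, the ID value, and the field name `id_str` are assumptions for this example.

```python
import json

# Hypothetical payload; "id_str" is assumed to be the string field's name.
payload = '{"id": 10765432100123456789, "id_str": "10765432100123456789"}'

# parse_int=float makes Python read integers as doubles, as JavaScript would.
tweet = json.loads(payload, parse_int=float)

print(int(tweet["id"]) == 10765432100123456789)    # False: low digits were rounded
print(tweet["id_str"] == "10765432100123456789")   # True: the string is intact
```

Old clients that read the numeric field keep working for small IDs, while clients that can't hold a 64-bit integer can switch to the string field.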

~~~
dangrossman
How do you generate sequential IDs a quadrillion times a day if they're coming
from 100 different computers? Lots of very fast coordination.

In the new system, they don't have to coordinate every server just to make
sure IDs are sequential. They just use the time stamp and some machine-
specific information.
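
A rough sketch of what coordination-free generation could look like, in Python. The class name `IdGenerator` and the bit widths (10-bit worker number, 12-bit sequence, millisecond timestamp) are assumptions made for this example, not details taken from the announcement.

```python
import time

class IdGenerator:
    """Hands out increasing IDs without talking to other workers.

    Layout (an assumption for this sketch): milliseconds since the UNIX
    epoch in the high bits, then a 10-bit worker number, then a 12-bit
    per-millisecond sequence.
    """
    WORKER_BITS = 10
    SEQUENCE_BITS = 12

    def __init__(self, worker: int):
        assert 0 <= worker < 2 ** self.WORKER_BITS
        self.worker = worker
        self.last_ms = -1
        self.sequence = 0

    def next_id(self) -> int:
        # Never let the timestamp go backwards, even if the clock does.
        now_ms = max(int(time.time() * 1000), self.last_ms)
        if now_ms == self.last_ms:
            self.sequence += 1
            if self.sequence == 2 ** self.SEQUENCE_BITS:
                # Sequence exhausted: borrow the next millisecond.
                now_ms += 1
                self.sequence = 0
        else:
            self.sequence = 0
        self.last_ms = now_ms
        return ((now_ms << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self.worker << self.SEQUENCE_BITS)
                | self.sequence)

a, b = IdGenerator(worker=1), IdGenerator(worker=2)
print(a.next_id(), b.next_id())  # distinct IDs, no shared state
```

Two workers can never collide because their worker numbers differ, and each worker's own IDs only increase. IDs from different workers are only roughly ordered by time.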

------
sedachv
I'm going to take the opposite position of most commenters here and decry the
fact that almost all programming languages get arithmetic wrong.

Adding two positive numbers should never result in a negative number. Adding a
one to an integer should result in the next largest integer.

There are new languages created all the time that don't have built-in support
for arbitrary precision integers or rational numbers. They might have some
neat ideas, but if your language can't even get arithmetic right, it's
garbage.
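
Both failure modes can be reproduced from Python, which itself uses arbitrary precision integers. In this sketch, `ctypes.c_int32` stands in for a language with fixed-width 32-bit signed integers.

```python
import ctypes

# Python ints are arbitrary precision, so 2**31 - 1 plus one is just 2**31.
big = (2 ** 31 - 1) + 1
print(big)  # 2147483648

# ctypes.c_int32 keeps only the low 32 bits, mimicking a fixed-width signed int.
wrapped = ctypes.c_int32(big).value
print(wrapped)  # -2147483648

# With doubles, adding one can silently do nothing.
print(2.0 ** 53 + 1 == 2.0 ** 53)  # True
```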

~~~
prodigal_erik
Arithmetic is a relatively self-contained part of a language. The problem
isn't that JavaScript's arithmetic is lame, it's that they didn't leave any
way for the programmer to fix it—even if you make rational bignums out of
arrays of doubles (or whatever), you can't make them drop-in replacements by
overloading operators. It's a strange oversight for a language which lets
_instances_ override methods.
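
For contrast, a sketch in Python (not JavaScript) of what a drop-in replacement looks like when operators can be overloaded. The `average` function is a made-up example; `Fraction` is the standard library's rational type.

```python
from fractions import Fraction

def average(values):
    """Generic code written with ordinary + and / operators."""
    total = values[0]
    for v in values[1:]:
        total = total + v
    return total / len(values)

print(average([0.1, 0.2, 0.3]))  # float arithmetic, subject to rounding
print(average([Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)]))  # 1/5
```

The same function works on both types because `Fraction` defines its own `+` and `/`.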

~~~
pedrocr
>It's a strange oversight for a language which lets instances override
methods.

That's only because everything is an instance in Javascript. There are no
classes, it's prototype OO.

------
beagle3
That's truly, truly dumb on Twitter's part. They should have started with
using the whole numeric range and/or switched to string ids long ago.

As I have suggested in a post 9 months ago, and as I would have designed this
system at any date since 2002 or so, the twit ID would be composed of userid
bits + time bits. In the post below I suggested 32+32, but other divisions are
acceptable depending on your "bot user" policy. Such an ID would at the same
time be sufficient (up to 4G users, up to 5 tweets/sec/user AVERAGE, up until
2030). You can have two times as many users for just "2 tweets/sec average".
Facebook only has 500M, so 4G should be sufficient.
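
A sketch of the 32 + 32 split in Python. The names are invented for illustration, and the tick size of 0.2 seconds is an assumption read off the "5 tweets/sec/user" figure. The starting point of the tick counter is left unspecified.

```python
USER_BITS = 32
TIME_BITS = 32
TICKS_PER_SECOND = 5  # assumption: one tick per 0.2 s gives "5 tweets/sec/user"

def make_id(user_id: int, tick: int) -> int:
    """User ID in the high bits, time tick in the low bits."""
    assert 0 <= user_id < 2 ** USER_BITS and 0 <= tick < 2 ** TIME_BITS
    return (user_id << TIME_BITS) | tick

def user_of(tweet_id: int) -> int:
    """The owning user (and so the shard) can be read straight from the ID."""
    return tweet_id >> TIME_BITS

# How long 32 bits of ticks last at this resolution.
years = 2 ** TIME_BITS / TICKS_PER_SECOND / (365.25 * 24 * 3600)
print(round(years, 1))  # 27.2
```

With this layout, IDs sort by user first and by time second, so they increase within one user's tweets but not across users.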

Such a construct makes the entire system significantly simpler and more robust
to "meaningful" failure. I haven't seen a single thing Twitter has done right
in the technical sense.

They do deserve marketing and bizdev credit.

[http://www.reddit.com/r/programming/comments/b2u6t/twitter_o...](http://www.reddit.com/r/programming/comments/b2u6t/twitter_opensource_a_list_of_all_open_source/c0kq7wr)

~~~
morgo
-1. With hindsight it's easy to make a comment like yours, but many clients rely on the status_id to be monotonic.

You would be changing behavior in a way that breaks apps. Twitter doesn't want
to do that.

~~~
beagle3
-1 all you like. It's engineering, not hindsight -- when building a system, I always ask myself "how does this scale" which usually translates to "on what attribute does this shard".

Look at e.g. YouTube and many other sites around the same time. They knew what
they were doing; Twitter didn't.

I _have_ actually designed such a system in 1999, that used 48 bits, and it
worked perfectly well. (Only had 28 bits for the user-id, which would have
been broken at the 250M users -- alas the system never had more than 5M;
this was in the years 1999-2003).

The only way you can shard absolutely monotonic is (effectively) randomly,
which is an option however you assign ids; but other assignments let you build
a much cheaper, much more robust system.

~~~
morgo
I don't see how your comparison is remotely fair to Twitter. Youtube is just a
website, if they wanted to change how they allocate video numbers, nobody
except for rogue bots will notice.

Twitter is an API. Even if they have the knowledge of how to fix past
mistakes, they need to ask for feedback, give plenty of notice, and set a
deadline of when the old version is cut off.

Maybe they didn't do everything perfectly on day one, but I don't think there
are any APIs the size of Twitter. Cut them some slack, they're not morons.

~~~
beagle3
The point was exactly that YouTube, which had a simpler problem to solve at
around the same time (or earlier) did it properly.

There do exist independent youtube clients (though not as many as Twitter's),
but using the encoding they did, youtube has made it so that it is never going
to be an issue, whereas for twitter it has already been a significant issue
twice (that I'm aware of).

It's very easy to dismiss sound engineering in retrospect as luck or as "how
could anyone have known".

------
there
[http://groups.google.com/group/twitter-development-talk/brow...](http://groups.google.com/group/twitter-development-talk/browse_thread/thread/71c25e20ddd3e3f0)

------
robryan
They have been doing the same thing for the next-page string that you have to
pass back to get the next page on an API request. At least, I assume this has
been the case since the new API came about.

