
Chess.com stopped working on 32bit iPads because 2^31 games have been played - NewGier
https://www.chess.com/forum/view/general/impossible-de-jouer-depuis-deux-jours-quel-est-le-probleme
======
eponeponepon
It's fascinating... the Y2K problem never came to fruition because - arguably
- of the immense effort put in behind the scenes by people who understood what
might have happened if they hadn't. The end result has been that the entire
class of problems is overlooked, because people see it as having been a fuss
over nothing.

I sometimes think it would've been better if a few things had visibly failed
in January 2000.

~~~
jfultz
If you were watching closely and knew what to look for in the first couple of
months of 2000, the failures _were_ there. But they were generally minor and
easy to overlook as Y2K problems.

I spotted something like half a dozen failures in various systems I interact
with which I strongly suspected, based upon the timing, were likely Y2K
problems that slipped through testing. For example, I received duplicate bills
for one of my credit cards for the January 2000 billing period, and then a
subsequent apology for the duplicate bills. They never said Y2K, but the
timing was very suggestive.

It's pretty much exactly what I expected from most companies...the big stuff
had largely been dealt with, but a few things slipped through which they
could dismiss with some hand-waving. The thing that surprised me was that
there didn't really seem to be any high profile disasters (like a company that
couldn't ship products, an airline that couldn't issue tickets, or whatever)
at all...I figured there'd be at least a couple.

~~~
collinmanderson
in JavaScript, new Date().getYear() returns 117 :)

~~~
rimliu
And in early 2000 the web was full of "Copyright 19100".
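
For anyone wondering where "19100" comes from: a getYear()-style call returns
the year minus 1900, and many pages simply glued a literal "19" in front of the
result. A minimal Python sketch of that bug; `legacy_get_year` is a
hypothetical stand-in for the old behaviour, not a real API.

```python
from datetime import date

def legacy_get_year(d: date) -> int:
    """Hypothetical stand-in for a getYear()-style call: years since 1900."""
    return d.year - 1900

def buggy_copyright(d: date) -> str:
    # The bug: assumes the offset is always two digits and prepends "19".
    return "Copyright 19" + str(legacy_get_year(d))

def fixed_copyright(d: date) -> str:
    # The fix: add the offset numerically instead of concatenating strings.
    return "Copyright " + str(1900 + legacy_get_year(d))

print(buggy_copyright(date(1999, 12, 31)))  # Copyright 1999
print(buggy_copyright(date(2000, 1, 1)))    # Copyright 19100
print(fixed_copyright(date(2000, 1, 1)))    # Copyright 2000
```

The same offset explains the 117 above: 2017 - 1900.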

------
cm2187
Self-confidence as a programmer is when starting a new project, storing the
transaction ID as a long rather than an int...

~~~
rdtsc
> Self-confidence as a programmer is when starting a new project, storing the
> transaction ID as a long rather than an int...

uint64_t even

Or a UUID as others have suggested.

Technically the C spec doesn't really say exactly how many bits int, long and long
long should be. If you want specific sizes and your code to be somewhat
portable use the specific bit sizes to make that clear. There are also types
for size-like things (size_t) and pointer and offset like things.
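
One way to see the difference on your own machine is Python's standard ctypes
module, which exposes the platform's C types. This is only a sketch: the sizes
printed by the first loop depend on the platform and compiler, so no expected
output is given for them, while the fixed-width types should be the same
everywhere.

```python
import ctypes

# Platform-dependent: the C standard only guarantees minimum ranges,
# so these widths can differ between operating systems and ABIs.
for name in ("c_int", "c_long", "c_longlong", "c_size_t"):
    ctype = getattr(ctypes, name)
    print(f"{name:12s} {ctypes.sizeof(ctype) * 8} bits")

# Fixed-width types: the width is part of the name and does not vary.
fixed_bits = {
    name: ctypes.sizeof(getattr(ctypes, name)) * 8
    for name in ("c_int32", "c_uint32", "c_int64", "c_uint64")
}
print(fixed_bits)
```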

~~~
fpgaminer
> If you want specific sizes

I would go further and say you should _always_ use specific sizes, unless
forced otherwise. There's no reason not to.

~~~
majewsky
There's a usecase for lower-bounded types such as int_least32_t, where the
compiler may choose a larger type if it offers better performance. However, if
you're using that, the test suite should run all relevant tests for multiple
actual sizes of that particular type (through strategic use of #define, for
example).

~~~
masklinn
> There's a usecase for lower-bounded types such as int_least32_t, where the
> compiler may choose a larger type if it offers better performance.

If you're looking for the best performance, you shouldn't use leastX types,
you should use fastX types (e.g. int_fast32_t for the "fastest integer type
available in the implementation, that has at least 32 bits").

The difference between "leastX" and "fastX" is that "leastX" is the _smallest_
type in the implementation which has at least X bits. So if the implementation
has 16, 32 and 64b ints and is a 32b architecture, least8 would give you a 16b
int but fast8 might give you a 32b one.

------
chesserik
Hey all. Thanks for noticing :P Obviously this is embarrassing and I'm sorry
about it. As a non-developer I can't really explain how or why this happened,
but I can say that we do our best and are sorry when that falls short.

- Erik, CEO, Chess.com

~~~
Aloha
2 billion is a very large number that was probably not envisioned as reachable
in the near future - as a programmer I'd argue this is a pretty easy mistake
to make, and that while (slightly) embarrassing, it's a good learning moment.

It's also really awesome that you're here, and that you guys were so honest
about the nature of the bug - this is really something that should be
encouraged.

~~~
chesserik
Maybe we should start a blog about all of the interesting bugs and challenges
we encounter. It certainly is white-knuckle pretty often when running at
scale. The number of devices, connections, features... I'm aging prematurely
:P

~~~
eru
A few articles would definitely be appreciated. Might even help with
recruiting fresh blood.

------
SomeHacker44
"This was obviously an unforeseen bug that was nearly impossible to
anticipate..."

Snarky... Except that there were probably years of games to notice that you
were approaching a "magic number" like 2^31.

~~~
CGamesPlay
I actually read that quote as fully sarcastic.

~~~
blktiger
As expected, sarcasm always translates correctly into textual form.

~~~
i_cant_speel
It's weird that you say that, because I always felt sarcasm didn't translate
well to text.

~~~
jazoom
I think it was sarcasm

~~~
mirimir
Sarcasm is recursive.

~~~
brlewis
Yours is the first comment in this chain that I can say pretty confidently
isn't sarcasm. So it kind of breaks the chain, making the sarcasm in this
chain non-recursive. Which means maybe you were being sarcastic after all?
Actually I don't even know if my own comment is sarcasm or not.

~~~
mirimir
Your comment comes across as sincere. Mine was sincere, but an overstatement.
I should have said that sarcasm tends to be recursive, until broken by
sincerity. Anyway, here's my take on the chain:

SomeHacker44 -- sincere

CGamesPlay -- sincere

blktiger -- sarcastic

i_cant_speel -- mildly sarcastic

jazoom -- sincere

~~~
edward_rolf
Yes because what we have always known about sarcasm and what this thread is a
perfect example of is how you can define something as sarcastic/not sarcastic
just by how it subjectively "comes across".

~~~
mirimir
Yes, you can guess. But assessment depends strongly on context. And Poe's Law
still applies. People can write messages that seem sincere, and then later
claim sarcasm. As in "I was only joking". Or people can write sincerely, but
come across as sarcastic, or _vice versa_, and yet be ambiguous enough that
readers can't tell. That's where the /s flag helps. Done intentionally, such
ambiguous messages can probe the reader's state of mind. Or set traps.

But maybe you're just being sarcastic ;)

------
pram
I recently experienced a nasty bug with BLOB in MySQL. The software vendor was
storing a giant json which contained the entire config in a single cell. It
ran fine for months, and then when it was restarted it totally broke. Reason
was: the json had been truncated the entire time in the database, so it was
gone forever. It was only working because it used the config stored in memory
on the local system. Nasty!

~~~
fpgaminer
For those, like me, that didn't recall off the top of their heads: BLOB in
MySQL can usually only hold ~64KB.

EDIT: Though I am curious why MySQL doesn't throw an error when you try to
store more than 64KB in BLOB?

~~~
wnoise
Huh, I thought BLOB was a backronym for Binary _Large_ OBject, not Binary
Medium Object.

~~~
p4lindromica
There are LONGBLOB, TINYBLOB, etc just like the equivalent TEXT fields
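
On why nothing errored: as far as I know that depends on the server's SQL mode
(strict mode rejects an oversized value, non-strict mode truncates it with only
a warning). Either way, a cheap guard in application code can fail loudly
before the write. A minimal sketch; the byte limits are the documented maxima
for the BLOB family, while `check_fits` and the sample config are hypothetical.

```python
import json

# Maximum payload sizes in bytes for MySQL's BLOB family (2^8-1 .. 2^32-1).
BLOB_LIMITS = {
    "TINYBLOB": 2**8 - 1,
    "BLOB": 2**16 - 1,
    "MEDIUMBLOB": 2**24 - 1,
    "LONGBLOB": 2**32 - 1,
}

def smallest_blob_type(payload: bytes) -> str:
    """Return the smallest BLOB type that holds payload without truncation."""
    for name, limit in BLOB_LIMITS.items():
        if len(payload) <= limit:
            return name
    raise ValueError("payload too large for any BLOB type")

def check_fits(payload: bytes, column_type: str) -> None:
    """Hypothetical pre-insert guard: raise instead of silently truncating."""
    limit = BLOB_LIMITS[column_type]
    if len(payload) > limit:
        raise ValueError(
            f"{len(payload)} bytes exceeds {column_type} limit of {limit}"
        )

# A made-up config blob of roughly 70 KB, just over the plain BLOB limit.
config = json.dumps({"setting": "x" * 70_000}).encode("utf-8")
print(smallest_blob_type(config))  # MEDIUMBLOB
```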

------
russellbeattie
This problem is more related to a programming underestimation than the actual
limitations of a 32bit CPU (which can happily process numbers or IDs that are
arbitrarily big if you have the memory for it and program it correctly).

That said, this is definitely indicative of what's going to happen in just 20
years, 6 months and 20 days from now. I mean, we're still cranking out 32bit
CPUs in the billions, running more and more devices, and devs still aren't
thinking beyond a few years out. I know of code that I wrote 12 years ago
still happily cranking away in production, and there may be some I wrote even
longer than that out there... and I guarantee I hadn't given two thoughts
about the year 2038 problem back then, and I doubt many devs are giving it
much thought today.

It's truly going to be chaos.
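
For reference, the 2038 date falls straight out of the arithmetic. A small
Python sketch, assuming a timestamp stored as a signed 32-bit count of seconds
since 1970-01-01 UTC; `wrap_int32` is an illustrative helper that mimics
two's-complement wraparound.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1

def wrap_int32(n: int) -> int:
    """Interpret n as a signed 32-bit value (two's-complement wraparound)."""
    return (n + 2**31) % 2**32 - 2**31

# The last second a signed 32-bit timestamp can represent.
last_good = EPOCH + timedelta(seconds=INT32_MAX)
# One second later the counter wraps to the most negative value.
after_wrap = EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))

print(last_good)   # 2038-01-19 03:14:07+00:00
print(after_wrap)  # 1901-12-13 20:45:52+00:00
```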

~~~
protomyth
The sad part is people are going to look at the lack of a year 2000 event and
assume 2038 is going to be a "dud", when they fail to see all the damn work
that went into making sure Y2K was a dud including a significant portion of IT
hours and probably a lot of extra support laid in.

I expect 2038 to be a rare hell because of the nature of the devices. Y2K was
an IT problem, but 2038 will be an embedded system problem and that's going to
be a much more painful thing to audit. Moving from the server room to inside
equipment and walls is going to be fun.

------
jakub_g
Long long time ago, I created a poll on a small website I was maintaining. I
didn't expect much traffic and, so, not thinking too much about it, I put the
ID column to be a TINYINT (i.e. max value = 255)...

That was a valuable lesson.

(I actually generated most entries myself while testing stuff - live in prod
of course - and while there were probably fewer than 255 votes, the
AUTO_INCREMENT did its job and produced an overflow).

~~~
ryandrake
> "Long long time ago"

Seems you have learned your lesson :-)

------
throwaway2016a
Reminds me of the havoc caused when Twitter tweet IDs rolled over, forcing
every third-party developer to update their apps (and at the time there were a
lot of those).

Twitter saw it coming and forced the issue by announcing that at a certain date
and time they would manually jump the ID numbers rather than wait for it to
happen at some unpredictable time.

~~~
syncsynchalt
They didn't roll over, they exceeded 2^53-1 which is the max Number which
doesn't truncate when treated as an integer in js. The solution was to treat
it as a string.

(Or we're thinking of different events, I apologize if so)

~~~
gilgoomesh
Twitter must have been misleading when they communicated the reasons for this
change since they did _not_ exceed 2^53-1, nor do they expect to exceed this
in the near future.

From a (former) Twitter dev:

> Given the current allocation rate, they'll probably never overflow
> Javascript's precision nor get anywhere near the 64-bit integer space.

[https://twittercommunity.com/t/discussion-for-moving-
to-64-b...](https://twittercommunity.com/t/discussion-for-moving-to-64-bit-
twitter-user-ids/9890/2)

~~~
syncsynchalt
Your link discusses 2^64, which applies to languages that have native integer
types.

The 2^53 problem was for Javascript, which has no native integer type, and is
thus limited by the mantissa size of Number (which is defined as an IEEE
double-precision float).

Twitter ids are unsigned 64-bit, since they're generated using Snowflake. That
link must pre-date the move to snowflake ids, and is speaking to the count of
tweets instead.
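
The 2^53 limit is easy to reproduce outside a browser, since Python floats are
the same IEEE double-precision values as a JavaScript Number. A minimal sketch;
`survives_double` is an illustrative helper, and the JSON payload only shows
the general send-it-as-a-string workaround.

```python
import json

MAX_SAFE = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER

def survives_double(n: int) -> bool:
    """True if n is unchanged by a round trip through a 64-bit float."""
    return int(float(n)) == n

print(survives_double(MAX_SAFE))   # True
print(survives_double(2**53 + 1))  # False: the nearest double is 2**53

# Workaround: ship the ID as a string next to (or instead of) the number,
# so a client that parses numbers as doubles can still read it exactly.
big_id = 2**53 + 1
payload = json.dumps({"id": big_id, "id_str": str(big_id)})
print(payload)
```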

~~~
MikeHolman
I'd be incredibly surprised if they overflowed 9 quadrillion tweets. That's
like a million tweets per person on earth.

~~~
syncsynchalt
Haha, sorry, that thought was in reference to the link saying they hadn't
overflowed 2^31 yet. Two billion seems believable for tweets as of 2013.

------
ericfriday
This reminds me that YouTube changed its view counter from 32-bit integers to
64-bit integers due to the popularity of 'Gangnam style'
[https://www.wired.com/2014/12/gangnam-style-youtube-
math/](https://www.wired.com/2014/12/gangnam-style-youtube-math/)

~~~
smitherfield
That was a joke; it was always a 64-bit integer.

~~~
akerro
WHAT?

~~~
andrepd
It was a joke, it was always a 64-bit integer.

~~~
akerro
I really want this not to be a joke :(

------
shurcooL
Do we know when chess.com launched? If so, we can calculate the average number
of games being played per second.

~~~
CDRdude
Wikipedia says "June 2007", which I'll approximate to 10 years. That gives us
6.8 games per second.
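
The arithmetic, as a quick Python check. It assumes exactly 2^31 games over a
flat 10 years, so it is a lifetime average and says nothing about the current
rate.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

games = 2**31
years = 10  # the approximation above: June 2007 to mid-2017

rate = games / (years * SECONDS_PER_YEAR)
print(f"{rate:.1f} games per second")  # 6.8 games per second
```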

------
rasz
Were they ever expecting a negative number of games? Why a signed integer?

~~~
Inityx
Because signed is default for some reason in most languages, and most
developers aren't taught to think critically about how decisions like simple
datatypes might affect scalability.

~~~
libria
The problem is momentum. I could use unsigned int everywhere, but then I have
to constantly typecast to int and back anywhere I use a library expecting
signed ints. If we all switched to unsigned int by default, then everything
would make more sense but we'll all live in typecasting hell during the
migration.

~~~
astrange
Unsigned by default doesn't make more sense than signed by default. The
behavior near 0 is surprising; if you underflow you either get a huge value
(anything not Swift) or you crash (Swift).

It was a mistake to use them for sizes in C++. Google code style requires
using int64 to count sizes instead of uint32 for good reasons.

------
vxxzy
How many other examples like this have occurred throughout computing history?

~~~
eicossa
IIRC the whole "Gandhi is gonna nuke the rest of the world" meme came from
such a bug occurring in the world-domination strategy game Civilization 2.

~~~
Bakary
They had meant to give him the lowest aggression rating possible, but
accidentally inputted -1, which then looped back the other way to the highest
rating possible. Nukes soon followed.

~~~
govg
Was it exactly that? I remember it being that you had to create a senate or
something (that decreased character aggression by 2 points). Gandhi started
with 1, so if you did it with Gandhi, then it'd underflow to the highest
possible aggression.
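
The underflow in that version of the story is simple to sketch. This assumes
the rating lives in an unsigned 8-bit value, which is how the story is usually
told and not something confirmed here; both helper names are made up.

```python
def sub_u8(value: int, amount: int) -> int:
    """Subtract with unsigned 8-bit wraparound (result stays in 0..255)."""
    return (value - amount) % 256

def sub_u8_saturating(value: int, amount: int) -> int:
    """Subtract but clamp at zero, which avoids the wraparound."""
    return max(0, value - amount)

gandhi = 1  # starting aggression in the story
print(sub_u8(gandhi, 2))             # 255
print(sub_u8_saturating(gandhi, 2))  # 0
```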

------
chesserik
Fun to read some of the other stories where this bit them too (PacMan, WoW, and
eBay)! Anyway, new app has been approved by Apple and should be rolling out
soooooooooon....

Thanks for all the comments! Always lots to learn from.

------
abalone
So they probably just need to use longs instead of ints. But I'm curious, if
you were really stuck with a 32-bit limit on data types, what's your preferred
workaround? I'm thinking I'd add another field that represents a partition.
Are there other "tricks"?

~~~
reformatt
If you could only use 32-bit data types, you can get 64-bits from using 2
integers together like a long number. So the right integer would hit the max,
start over at 0, and increment the left integer. Then, using this idea you can
create a class of numbers that can have however many bits you want by using
more ints.
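
A minimal Python sketch of that two-word counter, assuming each word is an
unsigned 32-bit value; the function names are illustrative.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF  # largest value an unsigned 32-bit word can hold

def increment(high: int, low: int) -> Tuple[int, int]:
    """Add 1 to a 64-bit counter stored as two unsigned 32-bit words."""
    low = (low + 1) & MASK32
    if low == 0:  # the low word wrapped, so carry into the high word
        high = (high + 1) & MASK32
    return high, low

def combine(high: int, low: int) -> int:
    """Reassemble the two words into the single 64-bit value they encode."""
    return (high << 32) | low

print(increment(0, MASK32))            # (1, 0)
print(combine(*increment(0, MASK32)))  # 4294967296
```

Chaining more words with the same carry rule gives an arbitrarily wide integer.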

~~~
abalone
Cool, yeah, that's what I meant by partitioning, which I guess is more of a
database term.

------
key8700
eBay (almost) had this problem and I cannot find any articles about it online.
They were rapidly approaching 2^31-1 auctions. So they switched to a larger
integer, the switchover went badly, and they were mostly down for 4 days, if
my memory serves. This would be like 10+ years ago I think.

------
vitomd
A lot of comments, but no one has mentioned what a great time this is for chess.
So many games online, ready to be analysed and learned from. After Deep
Blue people thought that it was the end of chess, but it's only getting
better. Computers are helping players to improve.

Chess.com is a great site, as are lichess.org and chessable.com. If you like
chess you should check them out.

------
_pmf_
That's the most successful reason for failure.

------
inieves
The title is probably wrong, off by one.

You probably mean 2^31 -1.

~~~
Piskvorrr
Help me, Off-By-One Kenobi! You're one of my two last hopes!

------
mtkd
These are always the best problems to have

------
spullara
The other one to watch out for is the 53-bit javascript integer limit. That
caused the twitpocalypse when Twitter tweet IDs hit it. They had to switch to
strings in the JSON representation.

~~~
gilgoomesh
The 2009 Twitpocalypse concerned overflow of 31-bit precision. Twitter has not
yet hit 53-bits for raw number of Tweets, in fact, they passed 32-bits in 2014
and might not have reached 33-bits, yet.

Moving to strings for Javascript was really just safety planning for the
future since:

> Given the current allocation rate, they'll probably never overflow
> Javascript's precision nor get anywhere near the 64-bit integer space.

[https://twittercommunity.com/t/discussion-for-moving-
to-64-b...](https://twittercommunity.com/t/discussion-for-moving-to-64-bit-
twitter-user-ids/9890/2)

~~~
spullara
That discussion is about user IDs, not tweet IDs.

Tweet ID from today: 875423039323688960

Number of bits of precision necessary to represent it exactly: 60

Overflowed 53-bit precision long long ago. You can read about it here:
[https://dev.twitter.com/overview/api/twitter-ids-json-and-
sn...](https://dev.twitter.com/overview/api/twitter-ids-json-and-snowflake)

------
phonon
And I was just reading Heroku/Django discussing the same issue this morning!

[https://groups.google.com/forum/m/#!topic/django-
developers/...](https://groups.google.com/forum/m/#!topic/django-
developers/imBJwRrtJkk)

------
cwfrank
Issues like this are not uncommon on Chess.com. I've been playing there since
2008 or 2009. If you read recent comments about issues as they pertain to the
recent "v3" release ... as much is to be expected.

------
callumjones
> For f sake how are we supposed to Anderstand that. I suppose your French fry
> maker is broken ?

Didn't expect Chess.com and YouTube to have a crossover of users? Surprised
there isn't active moderation on a site this size.

~~~
Zanta
In my experience the chat on chess.com harbors a similar demographic to that
of most video games. You'd think that chess would attract a more mature player
base, but nope.

~~~
marcelluspye
The only time I've observed people in real life acting like people in video
games is at a chess tournament. Constantly trash-talking until they lose, then
accusations of cheating. You certainly don't see this type at all (or even most)
chess events; I think the lack of an entry fee drew them out into the open.

~~~
cooper12
At my local library the loudest people aren't those on their phones or
laptops, but the chess players. It really surprised me considering chess can
be played completely non-verbally other than calling "check". Every time I'm
there, they constantly argue (most of the time it's because someone wants to
take back a move), trash talk to get on each other's nerves, yell across the
table to other players in games, and talk loudly as if they were in a park. On
one hand I think it's great that the library provides a community space and
lets people use their chess sets, but on the other hand as someone who goes
there for quiet, it's very irritating. (I wish they had a game room or
something where they could go wild) Once upon a time libraries had mythical
status as a place of silence, to the point where people would shush each other
for the smallest noises... I actually stopped going to that library because of
noise issues and in general because of its size and limited seating.

------
yoz-y
What would be the best way to test for this kind of issue in advance? Testing
at theoretical limits at all endpoints?

~~~
damagednoob
I'm not sure this falls under testing. If you start with an empty database
each time you start the test you may never hit this issue.

I think this is more of a capacity planning issue.

~~~
yoz-y
My concern is that even if one plans for a sufficient capacity, there still
needs to be testing done to verify that the code actually works if the
capacity is nearing the theoretical limit. In this example the database id was
transformed into a 32 bit integer somewhere in the application code.

Usually when I hit some sort of unexpected bug in production I try to think
about what type of testing will prevent similar problems in the future.
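
One option is a boundary test that feeds IDs around 2^31 through whatever code
handles them, instead of relying on a fresh database that starts at 1. A
minimal Python sketch; `to_wire_int32` is a hypothetical stand-in for client
code that squeezes an ID into a signed 32-bit field, not Chess.com's actual
code.

```python
import struct

INT32_MAX = 2**31 - 1

def to_wire_int32(game_id: int) -> bytes:
    """Hypothetical client code: pack an ID into a signed 32-bit field."""
    return struct.pack("<i", game_id)

def fits_int32(game_id: int) -> bool:
    """True if the ID survives the hypothetical 32-bit packing step."""
    try:
        to_wire_int32(game_id)
        return True
    except struct.error:
        return False

# Boundary cases: the last IDs that fit and the first one that does not.
for game_id in (INT32_MAX - 1, INT32_MAX, INT32_MAX + 1):
    print(game_id, fits_int32(game_id))
```

Seeding a staging database so the next ID starts just below the limit would
exercise the same boundary end to end.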

------
fsiefken
Will the Lichess app and platform have this issue? And if not, why not?

~~~
shmageggy
Looks like lichess is using strings for IDs so they will not have this issue

[https://github.com/ornicar/lila/blob/master/app/controllers/...](https://github.com/ornicar/lila/blob/master/app/controllers/Game.scala)

------
nicky0
> an unforeseen bug that was nearly impossible to anticipate

Hmmm... :)

------
prh8
Real world example of why Apple is killing 32 bit apps on iOS.

~~~
dfox
This has nothing to do with CPU architecture.

~~~
_puk
"The reason that some iOS devices are unable to connect to live chess games is
because of a limit in 32bit devices which cannot handle gameIDs above
2,147,483,647."

~~~
paulddraper
That is a combination of code and architecture, not architecture alone.

------
mattkenefick
"Obviously unforeseen.. impossible to predict." Really? You don't know how to
properly store ID numbers?

IMPOSSIBLE to predict.

~~~
phailhaus
That's clearly sarcasm.

