
Gangnam Style breaks YouTube viewer count - stephenheron
https://plus.google.com/+youtube/posts/BUXfdWqu86Q
======
gavinpc
Can't quite tell if this is a joke, but here's a related "story about a bug"
from Doug Crockford [0]:

    
    
        I made a bug once, and I need to tell you about it.  So, in 2001, I wrote a
        reference library for JSON, in Java, and in it, I had this line
        
            private int index
        
        that created a variable called "index" which counted the number of characters in
        the JSON text that we were parsing, and it was used to produce an error message.
        Last year, I got a bug report from somebody.  It turns out that they had a JSON
        text which was several gigabytes in size, and they had a syntax error past two
        gigabytes, and my JSON library did not properly report where the error was — it
        was off by two gigabytes, which, that's kind of a big error, isn't it?  And the
        reason was, I used an int.
        
        Now, I can justify my choice in doing that.  At the time that I did it, two
        gigabytes was a really big disk drive, and my use of JSON still is very small
        messages.  My JSON messages are rarely bigger than a couple of K.  And — a
        couple gigs, yeah that's about a thousand times bigger than I need, I should be
        all right.  No, turns out it wasn't enough.
        
        You might think well, one bug in 12 years you're doing pretty good.  And I'm
        saying no, that's not good enough.  I want my programs to be perfect.  I don't
        want anything to go wrong.  And in this case it went wrong simply because *Java
        gave me a choice that I didn't need, and I made the wrong choice*.
    

[0]
[https://www.youtube.com/watch?v=bo36MrBfTk4&t=38m](https://www.youtube.com/watch?v=bo36MrBfTk4&t=38m)
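
For anyone who wants to see the wraparound concretely, here is a minimal sketch. It is in C++ rather than Java only because the other snippets in this thread are C++. The names `advance32` and `advance64` are made up for illustration, and the unsigned arithmetic stands in for Java's wrapping `int` addition, since signed overflow is undefined behavior in C++.

```cpp
#include <cstdint>

// Simulates a Java-style 32-bit `int` position counter. The addition is
// done in unsigned math (defined to wrap modulo 2^32) and converted back.
// The conversion back to int32_t wraps on all mainstream compilers and is
// guaranteed to since C++20.
std::int32_t advance32(std::int32_t index, std::uint32_t step) {
    const std::uint32_t wrapped = static_cast<std::uint32_t>(index) + step;
    return static_cast<std::int32_t>(wrapped);
}

// The same counter held in 64 bits keeps counting past 2 GiB.
std::int64_t advance64(std::int64_t index, std::uint32_t step) {
    return index + static_cast<std::int64_t>(step);
}
```

Advancing a 32-bit index that sits at 2,147,483,647 by one character yields a large negative number, which is why an error position past two gigabytes comes out wrong; the 64-bit version simply continues.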

EDIT: is there a reference for formatting comments? I've never been able to
find one.

~~~
danbruc
He did not need the choice but others do. And he is wrong when he says it
makes no difference whether you use a byte or eight of them. Yes, it will take
the same amount of time to add two of them but it will also cost eight times
more cache space and memory bandwidth to move them around. It may not be an
issue if you have a single number or ten of them but it certainly becomes one
if you have an array with millions or billions of them.
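
A rough sketch of the arithmetic behind that, with hypothetical helper names and an assumed 64-byte cache line (common on current x86 hardware, but an assumption nonetheless):

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size in bytes; real hardware may differ.
constexpr std::size_t kAssumedCacheLineBytes = 64;

// Bytes of element storage needed for `count` values of type T.
template <typename T>
constexpr std::size_t payload_bytes(std::size_t count) {
    return count * sizeof(T);
}

// How many values of type T fit into one assumed cache line.
template <typename T>
constexpr std::size_t values_per_cache_line() {
    return kAssumedCacheLineBytes / sizeof(T);
}
```

A million `std::int8_t` values take about 1 MB while a million `std::int64_t` values take about 8 MB, and each cache line fetched carries 64 of the former but only 8 of the latter.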

~~~
rtpg
There are use cases where you need the choice, but most people do not need the
choice.

Most programs written in the real world (enterprise-y Java apps) do not need
strong control on GC, choice of integer types, or many other things offered to
them. Reducing choice will increase code/tool quality.

I think that we should make the uncommon choice reallllly hard to put into
place. Make it a pain to configure the GC, give specific integer types really
long names. Just stop people from premature optimization and leave these tools
to people who know what they're doing.

~~~
danbruc
If you are unable to make a good decision between different number types, you
had better not write software, IMHO. How do you reason about the operations you
apply to your data if you are ignorant of the possible values?

~~~
rtpg
I write software, and the number of times I even have to explicitly manipulate
numbers in a given week is very close to zero. Even iteration is done through
iterators instead of indexes, so writing a plus sign is done pretty sparingly.

Data manipulation beyond "pull out of database" or "submit user input to
database" is a lot rarer in enterprise software like this than in scientific
computing. I'm not saying it's bad to be aware of it, but software is more
than numbers.

~~~
danbruc
I develop enterprise software, too, and I definitely think it matters there,
too. You better make sure your database columns have the correct number type
or you will get in trouble if your inventory numbers or monetary values start
showing rounding errors.

~~~
ionforce
You are comparing two different types of numbers.

What was originally compared was different sized integral types.

What you are comparing is two numeric types that have a large semantic gulf
(fractions vs integers).

So your point is disingenuous in the context of the former.

~~~
danbruc
I just wanted to address your point that enterprise software usually does not
involve dealing with numbers.

When it comes to integers you are right: signed and 32 bits is a viable
choice in north of 90% of all cases. And when I wrote that you should be able
to make a good decision about the number type to use, I was already thinking
of all the number types; however, I did not express this well. But then I
really don't see a lot of difference between being able to choose between
integer, floating point and decimal types on the one hand and various integer
types on the other hand.

------
ChuckMcM
The interesting meta-point though is that an audience of 20 million viewers is
a big hit [1] so a billion views is 20M people watching it 50 times or, 200M
people watching it 5 times. And 2 billion views is double that.

Put in perspective, that is probably in excess of the number of times the most
favored "I Love Lucy" show [2] has been seen. Or put another way, you've got a
music video with the same eyeball impact as the highest rated television show
ever.

That says to me that either advertising on YouTube is a bargain or advertising
on TV is way overpriced :-)

[1] [http://tvbythenumbers.zap2it.com/2014/02/10/the-walking-dead...](http://tvbythenumbers.zap2it.com/2014/02/10/the-walking-dead-mid-season-premiere-delivers-15-8-million-viewers/235855/)

[2]
[http://en.wikipedia.org/wiki/I_Love_Lucy](http://en.wikipedia.org/wiki/I_Love_Lucy)

~~~
derefr
Or advertising on TV seriously under-represents the total number of
impressions over time through alternate consumption streams. Right now,
supposedly "unpopular" shows are cancelled, and then immediately get a
successful Kickstarter from what turns out to be millions of fans who happened
to be watching only through Netflix, or iTunes, or DVD box sets.

(Of course, none of these streams show the same ads the original broadcast
does—but if you're a clever ad agency, you're already doing product-placement
instead of interstitials most of the time anyway.)

~~~
hsod
> Right now, supposedly "unpopular" shows are cancelled, and then immediately
> get a successful Kickstarter from what turns out to be millions of fans who
> happened to be watching only through Netflix, or iTunes, or DVD box sets.

Can you name any examples of this?

The closest thing I can think of is Veronica Mars which was Kickstarted many
years later and raised ~5 million dollars from 91,000 backers to make a single
movie.

I think perhaps the "alternate consumption streams" viewers are not as
lucrative as you think.

~~~
coldtea
> _The closest thing I can think of is Veronica Mars which was Kickstarted
> many years later and raised ~5 million dollars from 91,000 backers to make a
> single movie._

That everyone regretted backing.

~~~
akx
[citation needed]

~~~
coldtea
Or you can just watch the movie.

~~~
sesqu
I read a few reviews of the movie when it came out. All were positive.

The movie may not be good (my personal opinion), but clearly it met
expectations.

~~~
wmeredith
It was fan service. I think it pleased all the backers. As a standalone, it
was not a great movie, but I don't think it was supposed to be.

------
xanderjanz
Should have gone with unsigned ints, YouTube!

EDIT: Which is the solution they apparently implemented, converting signed to
unsigned at some higher layer.
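
To illustrate the kind of conversion described there (this is only a sketch with a hypothetical function name, not YouTube's actual code): a 32-bit signed counter that has wrapped can be read back as unsigned, which recovers counts up to 4,294,967,295. The sample value is the count quoted elsewhere in this thread, 2151501252.

```cpp
#include <cstdint>

// Reinterprets a wrapped 32-bit signed counter as an unsigned count.
// Conversion to an unsigned type is defined by the C++ standard as
// reduction modulo 2^32, so negative stored values map to 2^31 .. 2^32-1.
std::uint32_t as_unsigned_count(std::int32_t stored) {
    return static_cast<std::uint32_t>(stored);
}
```

A stored value of -2143466044 comes back as 2151501252, while values that never overflowed are unchanged. This only buys one more bit; past roughly 4.29 billion the counter needs a wider type.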

~~~
timothya
From the Google C++ Style Guide:

_"You should not use the unsigned integer types such as uint32_t, unless
there is a valid reason such as representing a bit pattern rather than a
number, or you need defined overflow modulo 2^N. In particular, do not use
unsigned types to say a number will never be negative. Instead, use assertions
for this."_ [0]

[0]: [http://google-styleguide.googlecode.com/svn/trunk/cppguide.h...](http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Integer_Types)

~~~
coolgeek
From the coolgeek style guide:

"Never use a signed type for a number that can never be negative"

One of my pet peeves is developers using int (instead of unsigned ints) for
primary keys in database tables.

~~~
sytelus
+1. Every time I see _for(int i=0;...;i++)_ I wonder why we have developed this
habit of defaulting all ints to signed and considering uint taboo (most coding
guidelines ask you not to use them unless "you know what you are doing"). Most
of the time we use integers for counting, so uint should have been more
natural. I did this in one of my libraries I was writing from scratch and I
was happy for a while, but then I got into trouble because there is a lot of
code out there with interfaces expecting signed ints even though they should
be using uint. So ultimately the legacy forced me back to defaulting to
signed int.

~~~
yongjik
Well, in my case, from time to time I have to do stuff like this:

    
    
        for (int i = x.size() - 1; i >= 0; i--) ...
        for (int i = 0; i < x.size() - 1; i++) if (x[i] < x[i+1]) ...
    

Both will blow up badly with unsigned ints.

(Well, to be fair, both will blow up with signed ints if x.size() is greater
than 2G, so it's a matter of expectations.)
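
A small sketch of why the second loop misbehaves with an unsigned index, and one way to write it that is safe for empty containers. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// What the bound `x.size() - 1` evaluates to in unsigned arithmetic.
// For size 0 it wraps around to the largest std::size_t value.
std::size_t naive_bound(std::size_t size) {
    return size - 1;
}

// Counts adjacent ascending pairs. Writing the condition as
// `i + 1 < x.size()` avoids the subtraction, so an empty vector
// simply skips the loop.
std::size_t ascending_pairs(const std::vector<int>& x) {
    std::size_t count = 0;
    for (std::size_t i = 0; i + 1 < x.size(); ++i) {
        if (x[i] < x[i + 1]) {
            ++count;
        }
    }
    return count;
}
```

With an empty vector, the naive bound becomes a huge positive number and the loop would index far out of range; moving the `1` to the other side of the comparison sidesteps that.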

~~~
Aldo_MX

      for( size_t i = x.size(); i-- > 0; ) ...

~~~
delinka
Now i is no longer an index, it's index-plus-one.

~~~
shdon
No, it's not. The check, including the decrement, runs before the loop body,
so i has the correct value inside the loop.

Still, the trick makes it look suspect and that's an argument against using
it.
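
A quick way to convince yourself is to record the values of i the body actually sees. This is a sketch with a made-up function name:

```cpp
#include <cstddef>
#include <vector>

// Returns the indices visited by the `i-- > 0` idiom for a container of
// size n. The comparison tests the old value of i, and the decrement has
// already happened by the time the body runs.
std::vector<std::size_t> reverse_indices(std::size_t n) {
    std::vector<std::size_t> visited;
    for (std::size_t i = n; i-- > 0; ) {
        visited.push_back(i);  // i takes the values n-1, n-2, ..., 0
    }
    return visited;
}
```

For n = 3 the body sees 2, 1, 0, and for n = 0 it never runs. After the loop exits, i has wrapped around, which is harmless here because i is scoped to the loop.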

~~~
Aldo_MX
> Still, the trick makes it look suspect and that's an argument against using
> it.

This is true. The code is confusing to people not used to it. A workaround
could be to hide this code inside a macro, so people not interested in digging
into the code would take the macro's word:

    
    
      #define REVERSE_LOOP( x, i ) for( size_t i = (x).size(); i-- > 0; )
    

But unfortunately, that doesn't help with the fear that people have of
unsigned types.

------
jawedkarim
When YouTube launched in April 2005, the initial source code was based on
another completely unrelated website that I had worked on before, written in
PHP and running on Apache and MySQL. It’s always fascinating how
implementations of complex systems evolve.

~~~
diroussel
What was the original site for?

~~~
pavel_lishin
Maybe some sort of exchange site for Magic the Gathering cards?

------
SapphireSun
I love that they added an easter egg to the actual video. If you hover over
the counter, it briefly shows you the negative overflow value.

[https://www.youtube.com/watch?v=9bZkp7q19f0](https://www.youtube.com/watch?v=9bZkp7q19f0)

EDIT: I just realized that YouTube also posted a comment to that effect just
below the video. :P

------
leephillips
The interesting question to me is why this particular video is so wildly
popular. I don't generally go in for music videos, but I find this one
fascinating and have watched it a dozen times. I read an article that tried to
explain to non-Koreans like me the meaning of it all, and apparently there are
several layers of parody and social satire. I think I love it for its
combination of attitude, surrealism, bizarre humor, and self-mockery, plus the
music that seems to fit magically.

~~~
prawn
The explanations of Korean parody/satire are largely irrelevant to its success
given its popularity elsewhere, surely? I think it's the bizarre visuals that
had it spread (why I tweeted it when it first emerged), then catchiness plus a
repeatable dance move. It's the Macarena of its time in that regard.

Being Korean might've given it crossover appeal into much of Asia? Just a
guess.

~~~
lmm
> The explanations of Korean parody/satire are largely irrelevant to its
> success given its popularity elsewhere, surely?

I think they gave people who would otherwise have looked down on a silly craze
an excuse to enjoy the video.

------
Aldo_MX
Next milestone: 19th January 2038 03:14:07 GMT

~~~
ravenkat
Ah, so many systems are going to fail on that day for storing epoch time in 32 bits.
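
For reference, that date is just the largest signed 32-bit value interpreted as seconds since the Unix epoch. A minimal sketch (the function name is made up; `std::gmtime` uses a shared internal buffer, so this is not thread-safe):

```cpp
#include <cstdint>
#include <ctime>
#include <limits>

// Breaks the largest 32-bit signed Unix timestamp (2^31 - 1 seconds after
// 1970-01-01 00:00:00 UTC) into UTC calendar fields.
std::tm last_32bit_second() {
    const std::time_t t = std::numeric_limits<std::int32_t>::max();
    std::tm out{};
    if (const std::tm* utc = std::gmtime(&t)) {
        out = *utc;  // copy out of gmtime's shared buffer
    }
    return out;
}
```

The fields come out as 19 January 2038, 03:14:07 (`tm_year` is years since 1900 and `tm_mon` is zero-based). One second later, a signed 32-bit counter wraps to a negative value, which lands in 1901.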

~~~
Someone1234
Aren't most Linux servers already 64 bit? And we aren't even close to 2020.

I'm sure some software will need to be re-written between now and 2038, but I
don't think it will be quite as bad as Y2K just because that was only a 15
year gap (Sometimes less), whereas this is over 24 years.

I just think a lot of software will be naturally replaced between now and
then. And while there will be a slight mad scramble to fix stuff at the last
minute, I don't think it is Y2K-2.

~~~
RadioactiveMan
It'll definitely be interesting to see how many 32-bit embedded systems remain
in use in 2038 - and what effect the overflow will have on their
functionality.
[https://en.wikipedia.org/wiki/Year_2038_problem#Vulnerable_s...](https://en.wikipedia.org/wiki/Year_2038_problem#Vulnerable_systems)

~~~
kevin_thibedeau
Just as many as all of the 8-bit systems in use today. There is no need, in
the vast majority of cases, for wide data busses in embedded applications.
16-bit is going to die out, though, like the 4-bit and bitslice processors.

------
rodgort
Mea culpa. I can't remember why I didn't fix that when I reloaded the entire
schema. At least I widened the video ids.

------
Animats
This is a minor problem. In the 1980s, the number of tradable things with
ticker symbols in US markets passed 32767, and some new issues had to be
delayed until it was fixed.

------
jmount
Nifty example. Billionaires, trillion dollar budgets, billion-view celebrities,
fast CPUs, and large memories: all reasons I am done with 32 bit architectures
(old article of mine, but only on large memories [http://www.win-vector.com/blog/2012/09/i-am-done-with-32-bit...](http://www.win-vector.com/blog/2012/09/i-am-done-with-32-bit-machines/) ).

~~~
diego
32-bit architectures have nothing to do with the size of different data types
that have existed forever. We had 64-bit longs in 8-bit cpus.

Also, there are perfectly valid applications that require numbers of 8, 16, 32
or 64 bits (or variable encodings with arbitrary precision). Petabytes,
embedded microcontrollers, etc.

~~~
jmount
Sorry I was unclear. 32-bit architecture can mean a lot of different things
(bus sizes, address word sizes, and so on). Mostly I am done with small
pointers (having to use segments to address all of your memory, or not being
able to memory-map a disk sucks) and small counters (only being able to put
signed 32 bit integers into a collection sucks).

------
rkachowski
I saw this a few days ago. At first I thought it was an easter egg on
YouTube's part - saying "so many views we overflow!"

But it's real?! It seems incredibly absurd that it could actually overflow.
How are signed values useful for a count of views? How are you going to have
negative views?

~~~
divegeek
Java doesn't have unsigned integer types. Google is mostly a Java shop.

~~~
hobo_mark
That is mostly false.

------
antimora
It looks like it also broke the formatting on the number of the viewers:
"2151501252". This string does not have thousands separators.

Direct link to the video:
[https://www.youtube.com/watch?v=9bZkp7q19f0](https://www.youtube.com/watch?v=9bZkp7q19f0)

~~~
lstamour
It's a joke. Hover your mouse over it to see why ;-)

------
DigitalSea
Wow, this is cool. One video was able to exceed a 32 bit integer thus
requiring a change to a 64 bit integer, all caused by one man and one video.

~~~
acqq
It's just the 2 billion limit that was crossed; 32 bits can count up to 4
billion. Afterwards, they certainly don't have to change to 64 bits, just add
a few bits more.
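
To put numbers on that: signed 32-bit tops out at 2,147,483,647 and unsigned 32-bit at 4,294,967,295. A tiny sketch (hypothetical helper name) that reports how many bits an unsigned count actually needs:

```cpp
#include <cstdint>

// Number of bits needed to store `value` as an unsigned integer
// (0 for the value 0).
int bits_needed(std::uint64_t value) {
    int bits = 0;
    while (value != 0) {
        ++bits;
        value >>= 1;
    }
    return bits;
}
```

The count quoted in this thread, 2151501252, needs 32 bits, one more than a signed 32-bit integer has available for magnitude. A 33rd bit is not needed until the count passes 4,294,967,295.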

~~~
srtjstjsj
There's no reason to pick any number between 33 and 63.

~~~
acqq
If you do it at home, for your 10 videos, there isn't. At YouTube scale, they
can certainly benefit from using a different number of bits in storage and in
transit than in the CPU. Only on the CPU, and only if you actually want to use
the value in calculations, is 64 bits the best step after 32. See also the
discussion here:
[https://news.ycombinator.com/item?id=8691291](https://news.ycombinator.com/item?id=8691291)

------
ecesena
I'd be curious to know how they discovered it. Were they monitoring it? Did
someone report it? Did an alarm trigger? ...

------
thibauts
Why the hell would you want to store a counter as a signed int in the first
place?

~~~
maaku
Java?

------
adad95
There is an Easter egg in the video counter. Hover over it with your mouse.
[https://www.youtube.com/watch?v=9bZkp7q19f0](https://www.youtube.com/watch?v=9bZkp7q19f0)

------
alejandc
unbelievable

------
jfmercer
I will always upvote anything related to Gangnam Style. Always.

------
tn13
Am I the only one who thinks that Google is posting this bug(!) just to make
the Google+ post popular?

~~~
prawn
Probably not the only reason, but it'd be a bonus viral thing for Google+ and
YouTube.

------
IgorPartola
uint_32 strikes again! And one day we'll stop using it in favor of int_64, and
all unique identifiers will be strings, and all will be well.

I remember when Twitter had rolled over their tweet IDs because they were
using an int type that was too short. Should have gone with variable length
strings to avoid that problem.

~~~
rinon
Var-length strings are often too slow to manipulate and store. Better choice
to use a fixed but large (64-bit) integer.

~~~
michaelgrosner2
Or just use a GUID.

~~~
Someone1234
That's 16 bytes instead of 8 bytes for a uint64 that still grants you
18,446,744,073,709,551,615 variations. Seems overkill, particularly if you
actually generate GUIDs "correctly" in which case you've allocated 16 bytes
but will only ever use a sub-set of them[0].

[0]
[https://en.wikipedia.org/wiki/Globally_unique_identifier#Alg...](https://en.wikipedia.org/wiki/Globally_unique_identifier#Algorithm)

~~~
aikah
Isn't MongoDB using "hashes" for ids by the way? I wonder how efficient it is
and why they chose to do that.

~~~
kevinschumacher
They're not hashes. There are a few things embedded in the ObjectId (from the
docs - [http://docs.mongodb.org/manual/reference/object-id/](http://docs.mongodb.org/manual/reference/object-id/)):

    
    
      ObjectId is a 12-byte BSON type, constructed using:
      
      a 4-byte value representing the seconds since the Unix epoch,
      a 3-byte machine identifier,
      a 2-byte process id, and
      a 3-byte counter, starting with a random value.
    

You can actually convert the _id to/from a timestamp, which lets you do cool
things like never keep a timestamp field (you can convert a datetime to an
ObjectId and use that for comparison).
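
A rough sketch of that conversion on raw bytes, not using any MongoDB driver API. The function names are hypothetical, and it assumes the 4-byte timestamp is stored big-endian at the start of the 12 bytes, which is what makes byte-wise comparison follow time order.

```cpp
#include <array>
#include <cstdint>

using ObjectIdBytes = std::array<std::uint8_t, 12>;

// Reads the leading 4 bytes as a big-endian count of seconds since the
// Unix epoch (byte order is an assumption of this sketch).
std::uint32_t objectid_seconds(const ObjectIdBytes& id) {
    return (static_cast<std::uint32_t>(id[0]) << 24) |
           (static_cast<std::uint32_t>(id[1]) << 16) |
           (static_cast<std::uint32_t>(id[2]) << 8) |
           static_cast<std::uint32_t>(id[3]);
}

// Builds the smallest ObjectId-shaped value for a given second: timestamp
// in the first 4 bytes, remaining 8 bytes zero. Useful as a lower bound
// when comparing ids against a point in time.
ObjectIdBytes objectid_floor(std::uint32_t seconds) {
    ObjectIdBytes id{};  // zero-initialized
    id[0] = static_cast<std::uint8_t>(seconds >> 24);
    id[1] = static_cast<std::uint8_t>(seconds >> 16);
    id[2] = static_cast<std::uint8_t>(seconds >> 8);
    id[3] = static_cast<std::uint8_t>(seconds);
    return id;
}
```

Because the timestamp leads and is big-endian, comparing two of these arrays lexicographically orders them by creation second, so "everything created after time T" becomes "every id greater than or equal to `objectid_floor(T)`".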

~~~
MichaelGG
If the 4 bytes are signed, then did they just intentionally introduce a year
2038 problem?

12 bytes is enough to just store a random value, with low chance of collisions
until you've got around 100 trillion items. It confuses me why anyone would
want to waste bytes of an ID on low entropy values like machine and process
ID.
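
For a feel of the numbers, here is a sketch of the usual birthday-bound approximation, assuming the 96 bits are drawn uniformly at random. The function name is made up.

```cpp
#include <cmath>

// Approximate probability that at least two of n uniformly random
// `bits`-bit values collide: p ~= 1 - exp(-n^2 / 2^(bits + 1)).
double collision_probability(double n, int bits) {
    const double twice_space = std::ldexp(1.0, bits + 1);  // 2^(bits + 1)
    // -expm1(-x) computes 1 - exp(-x) accurately for small x.
    return -std::expm1(-(n * n) / twice_space);
}
```

Under that approximation, 100 trillion random 96-bit ids give a collision probability of about 6%, and the chance is negligible at a few million ids.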

Or any combination of high res time plus random works nicely.

OTOH, Mongo's not exactly been a bastion of engineering excellence.

------
dogma1138
Every time I check the most viewed videos on YouTube I get depressed and lose
all faith in humanity. Landing on a comet gets you 250K views, announcing the
discovery of the Higgs gets you less than 100K, the latest twerking video or
PewDiePie 2M at least...

~~~
tedunangst
Have you considered that the repeat viewing value of the Higgs boson
announcement diminishes rapidly?

~~~
throwaway1979
Not sure why the pessimistic poster is being downvoted. Factoring for repeat
viewers, it is sad how little society values scientific achievements. That
said, I was up in the wee hours of the morning watching the LHC start up and
probably put more than 20 views on that song. Have ye hope :D

~~~
cLeEOGPw
You're comparing a little news report about a recent scientific finding (which
you don't even need to watch, since there are tons of other, better sources for
that) with a viral entertainment video that is specifically made to get as many
viewers as possible. These are not the same type of thing, so they can't be
compared.

