
17 Year Old creates website that predicts future of Digg - rjvir
http://gigaom.com/2010/08/24/hey-digg-this-17-year-old-knows-what-you-are-thinking/
======
lowglow
# Predicts with 67% accuracy

if any(topic in DiggContent for topic in ('pretty infographic',
        'sexy video or picture', 'funny video or picture', 'xkcd',
        'liberal outrage', 'cute animals')):
    move DiggContent to FrontPage

------
RyanMcGreal
Apparently Digg in the future will be spitting out HTTP 500 errors.

------
bryanh
I bet a large majority of success would be based on that piece's past
performance (Reddit, Delicious, HN, repeat submissions, etc.). An "algorithm"
like that is probably pretty trivial... It would be interesting if it actually
has novel concepts powering the beast.

~~~
jacquesm
There was this post yesterday that actually showed that there is no strong
correlation between how articles do when posted to multiple sites.

In fact, if something got posted to digg and did well there that's a pretty
good indicator that on HN it will get killed.

~~~
luu
_In fact, if something got posted to digg and did well there that's a pretty
good indicator that on HN it will get killed._

Isn't that a strong (negative) correlation?

~~~
jacquesm
Hehe, good point. But it's not a hard rule though.

------
zbanks
The stories that he's wrong about are really important here, and would
determine if this is news or not.

If his site has a higher quality of content, good for him! It's like the
netflix prize, except no cash.

If it's worse than digg's page, then he hasn't improved anything.

Also, I'd be curious to see how "63% accuracy" is defined. In an ecosystem
where 1% of stories get through, whether this number is based on false-
positives or false-negatives will make a big difference. (He could be
underselling himself!)
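
To make the base-rate point concrete, here is a rough sketch with made-up
numbers: 10,000 submissions, of which 1% reach the front page. Both predictors
and all counts are hypothetical. The only thing it is meant to show is that
plain accuracy is dominated by the 99% of stories that never make it, so the
quoted 63% is more likely something like the share of predicted hits that
actually hit (precision).

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions (hit or miss) that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Share of predicted front-page stories that really got there."""
    return tp / (tp + fp)


# Hypothetical pool: 10,000 submissions, 100 of them (1%) reach the front page.
TOTAL_HITS, TOTAL_MISSES = 100, 9_900

# Predictor A never predicts a hit, so every real hit is a false negative.
acc_never = accuracy(tp=0, fp=0, tn=TOTAL_MISSES, fn=TOTAL_HITS)

# Predictor B flags 100 stories, 63 of which reach the front page.
tp, fp = 63, 37
fn = TOTAL_HITS - tp      # real hits it failed to flag
tn = TOTAL_MISSES - fp    # real misses it correctly left alone
acc_b = accuracy(tp, fp, tn, fn)
prec_b = precision(tp, fp)

print(f"never-predict accuracy: {acc_never:.4f}")   # 0.9900
print(f"predictor B accuracy:   {acc_b:.4f}")       # 0.9926
print(f"predictor B precision:  {prec_b:.2f}")      # 0.63
```

Under these assumptions the useless "never" predictor scores 99% accuracy, so
a 63% figure only makes sense as a per-prediction hit rate.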

~~~
rmc
There are usually 2 numbers used to measure the 'accuracy' of a test (be it a
"This link will reach the digg front page" or "This person has HIV" etc.).
Those numbers are the false positive rate (you said they'd get to the front
page, and they didn't), and the false negative rate (you said they wouldn't
get to the front page, and they did).

It's common for these numbers to be related. Decrease one number and the other
goes up. E.g. you could get a 0% false negative rate by just saying "Yes this
link will get to the front page" for all pages, however your false positive
rate would be massive, close to 100% (since you're predicting that every
link gets to the front page). This test would be useless because of the high
false positive rate. A breakthrough occurs when you are able to have both a
low false positive rate and a low false negative rate. The holy grail of any
test is one that would have a 0% false negative and a 0% false positive rate.
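
A minimal sketch of the two rates, assuming the standard definitions (false
positives divided by all actual negatives, false negatives divided by all
actual positives). The pool of 1,000 submissions with 10 front-page stories is
invented for illustration, as is the helper name `error_rates`. It runs the two
degenerate predictors that answer "yes" or "no" for every link.

```python
def error_rates(predicted, actual):
    """Return (false_positive_rate, false_negative_rate) for two bool lists."""
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    negatives = sum(not a for a in actual)
    positives = sum(a for a in actual)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


# Hypothetical pool: 1,000 submissions, 10 of which reach the front page.
actual = [True] * 10 + [False] * 990

always_yes = [True] * len(actual)   # "every link will hit the front page"
always_no = [False] * len(actual)   # "no link will hit the front page"

print(error_rates(always_yes, actual))  # (1.0, 0.0)
print(error_rates(always_no, actual))   # (0.0, 1.0)
```

Each degenerate predictor zeroes out one error rate by maxing out the other,
which is why a single "accuracy" number says little on its own.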

Usually it's a trade-off between false positives and false negatives. Most
western justice systems would rather have a high false negative rate than a
high false positive rate: "Better 10 guilty men walk free, than 1 innocent man
goes to jail".

His statement about "63% accuracy" is ambiguous. What is he referring to? What
are the false positive and false negative rates?

~~~
what
I think he means 37% false positives. This guess is based on the archive
section, where he lists hits and misses. <http://digginthefuture.com/archive>

------
corruption
6 out of 10 stories in the new links section can be predicted by whether the
top users vote on them. Really, only 6 out of 10? This seems extremely low
given the claims of gaming (if I'm understanding it correctly).

~~~
bravo_sierra
How is 6/10 extremely low?

~~~
corruption
Well, people claim that digg is entirely run by the top users from what I
understand, and if only 6/10 stories that make it to the front page are
predicted by the top users' voting patterns, there's not much to that claim,
right?

EDIT:
<http://www.seomoz.org/blog/top-100-digg-users-control-56-of-diggs-homepage-content>

~~~
cfpg
If you look at the top users most of them submit thousands of stories.

For example, mklopez[1] submitted 16k stories, of which only 10% hit the
frontpage, and LtGenPanda[2] submitted 2,481 with a 42% popular ratio. You can
see these stats at the bottom right of the user profile on Digg.
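
As a quick back-of-the-envelope check on those two profiles, the sketch below
turns the quoted ratios into rough front-page counts. It assumes "16k" means
16,000 and that the popular ratio is simply front-page stories divided by
submissions; `estimated_hits` is a made-up helper name.

```python
def estimated_hits(submitted, popular_ratio):
    """Rough number of front-page stories implied by a popular ratio."""
    return round(submitted * popular_ratio)


# Figures quoted above; "16k" is taken as 16,000 (an assumption).
submitters = {"mklopez": (16_000, 0.10), "LtGenPanda": (2_481, 0.42)}

for name, (submitted, ratio) in submitters.items():
    hits = estimated_hits(submitted, ratio)
    print(f"{name}: ~{hits} front-page stories out of {submitted}")
```

On these assumptions the lower-ratio submitter still ends up with more
front-page stories in absolute terms (about 1,600 versus about 1,042).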

[1]: <http://digg.com/users/mklopez>

[2]: <http://digg.com/users/LtGenPanda>

~~~
jacquesm
I think nickb is still the top submitter to HN, with about 4250 submissions.
Runner-up is cwan with 3750, and then edw519 with 3180. I wonder what the top
3 on digg looks like.

------
AndrewMoffat
reddit.com?

~~~
shammydog
It will be interesting to see how often stories on reddit appear on the
frontpage of Digg. Clearly, several stories will make it, but it seems
unlikely that it would be near 63% or even half of that.

~~~
bravo_sierra
I can't find the source ATM, but apparently it's 10-20% daily, down from
30-40% a year ago.

Scroll to the bottom of this for something tangible:
<http://www.raterush.com/pages/digg-reddit>

