

Tell HN: The HN submission race made visual - jacquesm

A lot of the people who frequent HN have remarked on the high volume of content from TechCrunch and other frequently submitted sites.

The reason for this is that there is a relatively small group of people who all 'race' to get articles from these outlets submitted.

The first one to make it is the one who gets points from all the other submitters, mostly because in their haste to be 'first' they forget to check whether the link has already been submitted.

It's the HN equivalent of the /. meme of 'first post', only with a karma boost as an incentive.

This leads to lots of borderline articles getting lots of time on the homepage, which in turn is a small but persistent factor in crowding out the more interesting stuff.

To protect the guilty and the innocent alike I've removed the usernames from the following report. The sort was by number of points per domain per user, so every line reflects a single individual submitting a certain domain.

So, for instance, the first individual has submitted 716(!) links from the same domain (and first!).
======
jacquesm
I've put the list in a separate comment because the submission box really
doesn't like this list for some reason.

    
    
       +------+----------------------------+-------------+---------------------+----------+                                                                                        
       | pts  | domain                     | submissions | pointspersubmission | redacted |
       +------+----------------------------+-------------+---------------------+----------+                                                                                        
       | 6234 | techcrunch.com             |         716 |              8.7067 | redacted |                                                                                        
       | 4453 | nytimes.com                |         628 |              7.0908 | redacted |                                                                                        
       | 3853 | ycombinator.com            |          42 |             91.7381 | redacted |                                                                                        
       | 2519 | paulgraham.com             |          16 |            157.4375 | redacted |                                                                                        
       | 2506 | techcrunch.com             |         206 |             12.1650 | redacted |                                                                                        
       | 1971 | techcrunch.com             |         363 |              5.4298 | redacted |                                                                                        
       | 1888 | thestandard.com            |         446 |              4.2332 | redacted |                                                                                        
       | 1692 | techcrunch.com             |         358 |              4.7263 | redacted |                                                                                        
       | 1471 | technologizer.com          |         543 |              2.7090 | redacted |                                                                                        
       | 1347 | sivers.org                 |          23 |             58.5652 | redacted |                                                                                        
       | 1315 | nytimes.com                |         139 |              9.4604 | redacted |                                                                                        
       | 1308 | techcrunch.com             |         207 |              6.3188 | redacted |                                                                                        
       | 1180 | jgc.org                    |          59 |             20.0000 | redacted |                                                                                        
       | 1178 | techcrunch.com             |         102 |             11.5490 | redacted |                                                                                        
       | 1158 | alleyinsider.com           |         413 |              2.8039 | redacted |                                                                                        
       | 1120 | catonmat.net               |          26 |             43.0769 | redacted |                                                                                        
       | 1090 | techcrunch.com             |         232 |              4.6983 | redacted |                                                                                        
       | 1082 | sfgate.com                 |         386 |              2.8031 | redacted |                                                                                        
       | 1076 | codinghorror.com           |          55 |             19.5636 | redacted |                                                                                        
       | 1036 | techcrunch.com             |         141 |              7.3475 | redacted |                                                                                        
       |  983 | singularityhub.com         |          83 |             11.8434 | redacted |                                                                                        
       |  921 | sethgodin.typepad.com      |          69 |             13.3478 | redacted |                                                                                        
       |  886 | nytimes.com                |          83 |             10.6747 | redacted |                                                                                        
       |  844 | paulbuchheit.blogspot.com  |          19 |             44.4211 | redacted |                                                                                        
       |  822 | gabrielweinberg.com        |          36 |             22.8333 | redacted |                                                                                        
       |  799 | zedshaw.com                |          14 |             57.0714 | redacted |                                                                                        
       |  784 | danieltenner.com           |           8 |             98.0000 | redacted |                                                                                        
       |  775 | centernetworks.com         |         195 |              3.9744 | redacted |                                                                                        
       |  757 | wired.com                  |         123 |              6.1545 | redacted |                                                                                        
       |  722 | antoniocangiano.com        |          95 |              7.6000 | redacted |                                                                                        
       |  718 | daemonology.net            |          22 |             32.6364 | redacted |                                                                                        
       |  708 | treehugger.com             |         280 |              2.5286 | redacted |                                                                                        
       |  699 | itworld.com                |         222 |              3.1486 | redacted |                                                                                        
       |  694 | readwriteweb.com           |         173 |              4.0116 | redacted |                                                                                        
       |  692 | igvita.com                 |          42 |             16.4762 | redacted |                                                                                        
       |  689 | wired.com                  |          45 |             15.3111 | redacted |                                                                                        
       |  685 | googleblog.blogspot.com    |          89 |              7.6966 | redacted |                                                                                        
       |  683 | whattofix.com              |          84 |              8.1310 | redacted |                                                                                        
       |  671 | asserttrue.blogspot.com    |         100 |              6.7100 | redacted |                                                                                        
       |  659 | devcentral.f5.com          |         303 |              2.1749 | redacted |                                                                                        
       |  652 | codinghorror.com           |          21 |             31.0476 | redacted |                                                                                        
       |  650 | 37signals.com              |          48 |             13.5417 | redacted |                                                                                        
       |  646 | david.weebly.com           |          21 |             30.7619 | redacted |                                                                                        
       |  637 | inc.com                    |           5 |            127.4000 | redacted |                                                                                        
       |  637 | joelonsoftware.com         |          13 |             49.0000 | redacted |                                                                                        
       |  632 | centernetworks.com         |         102 |              6.1961 | redacted |                                                                                        
       |  616 | andrewchen.typepad.com     |          46 |             13.3913 | redacted |                                                                                        
       |  615 | techcrunch.com             |          40 |             15.3750 | redacted |                                                                                        
       |  613 | techcrunch.com             |         120 |              5.1083 | redacted |                                                                                        
       |  611 | ajaxian.com                |         199 |              3.0704 | redacted |                                                                                        
       |  610 | steve-yegge.blogspot.com   |           7 |             87.1429 | redacted |                                                                                        
       |  599 | slash7.com                 |           9 |             66.5556 | redacted |                                                                                        
       |  592 | readwriteweb.com           |         157 |              3.7707 | redacted |                                                                                        
       |  587 | tom.preston-werner.com     |           5 |            117.4000 | redacted |                                                                                        
       |  586 | readwriteweb.com           |         107 |              5.4766 | redacted |                                                                                        
       |  553 | readwriteweb.com           |         130 |              4.2538 | redacted |                                                                                        
       |  539 | xconomy.com                |          94 |              5.7340 | redacted |                                                                                        
       |  538 | boston.com                 |          57 |              9.4386 | redacted |
       |  538 | googleblog.blogspot.com    |           8 |             67.2500 | redacted |
       |  526 | blog.asmartbear.com        |           8 |             65.7500 | redacted |
       |  523 | markevanstech.com          |         207 |              2.5266 | redacted |
       |  519 | datacenterknowledge.com    |          91 |              5.7033 | redacted |
       |  515 | economist.com              |          51 |             10.0980 | redacted |
       |  514 | 37signals.com              |          24 |             21.4167 | redacted |
       |  507 | infoworld.com              |         164 |              3.0915 | redacted |
       |  490 | linux-mag.com              |          90 |              5.4444 | redacted |
       |  487 | howtoforge.com             |         228 |              2.1360 | redacted |
       |  486 | esciencenews.com           |         103 |              4.7184 | redacted |
       |  484 | businessinsider.com        |         159 |              3.0440 | redacted |
       |  482 | particletree.com           |          15 |             32.1333 | redacted |
       |  474 | techcrunch.com             |          30 |             15.8000 | redacted |
       |  465 | github.com                 |          17 |             27.3529 | redacted |
       |  463 | paulgraham.com             |           3 |            154.3333 | redacted |
       |  462 | scripting.com              |         118 |              3.9153 | redacted |
       |  461 | mattmaroon.com             |          10 |             46.1000 | redacted |
       |  454 | nytimes.com                |          64 |              7.0938 | redacted |
       |  452 | 37signals.com              |          17 |             26.5882 | redacted |
       |  449 | tipjoys2cents.blogspot.com |          13 |             34.5385 | redacted |
       |  446 | blogs.zdnet.com            |         162 |              2.7531 | redacted |
       |  444 | blog.last.fm               |           2 |            222.0000 | redacted |
       |  440 | nytimes.com                |          37 |             11.8919 | redacted |
       |  435 | mattmazur.com              |          15 |             29.0000 | redacted |
       |  432 | blogs.harvardbusiness.org  |          47 |              9.1915 | redacted |
       |  431 | news.com.com               |         236 |              1.8263 | redacted |
       |  430 | arstechnica.com            |          77 |              5.5844 | redacted |
       |  426 | bits.blogs.nytimes.com     |          72 |              5.9167 | redacted |
       |  422 | redeye.firstround.com      |          41 |             10.2927 | redacted |
       |  422 | paulstamatiou.com          |          25 |             16.8800 | redacted |
       |  421 | 25hoursaday.com            |          48 |              8.7708 | redacted |
       |  420 | venturebeat.com            |         166 |              2.5301 | redacted |
       |  416 | howtosplitanatom.com       |          57 |              7.2982 | redacted |
       |  415 | rondam.blogspot.com        |          17 |             24.4118 | redacted |
       |  415 | foundread.com              |         135 |              3.0741 | redacted |
       |  414 | valleywag.com              |         161 |              2.5714 | redacted |
       |  410 | nytimes.com                |          35 |             11.7143 | redacted |
       |  408 | reynoldsftw.com            |          88 |              4.6364 | redacted |
       |  406 | economist.com              |          92 |              4.4130 | redacted |
       |  404 | nytimes.com                |          91 |              4.4396 | redacted |
       |  397 | nytimes.com                |          37 |             10.7297 | redacted |
       |  396 | adam.blog.heroku.com       |          17 |             23.2941 | redacted |
       +------+----------------------------+-------------+---------------------+----------+

~~~
robg
Can you reveal where I rank among the nytimes rows? I'm looking through face
palms hoping I'm not the 628 submissions. But it wouldn't surprise me
either... it's been my daily morning read for over a decade and I've been coming
around here for about three years.

Is that an excuse? :)

In my defense, I can't say I've ever had a "First!" urge. I assume it's just
because I'm early to rise and on the East Coast.

~~~
ErrantX
my first thought was that some of these could well be innocent "click
bookmarklet" first thing in the morning.

~~~
robg
Yeah, the bookmarklet made it very easy to submit with one hand on the mouse
and the other on the coffee.

------
profquail
It would be a cool feature for HN if pg made it so that the number of times a
domain is submitted directly correlates to its decay rate. Thus, articles
from really popular sites would fall off the front page faster, opening up
spots for articles from newer (to us) domains.
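
Here is a minimal sketch of what such a domain-dependent decay could look like. It borrows the (p - 1) / (t + 2)^1.5 shape quoted elsewhere in this thread; the names `domain_gravity` and `score`, the logarithmic scaling, the `weight` value, and treating the age as hours are all assumptions made up for illustration, not anything HN actually does.

```python
import math


def domain_gravity(domain_submissions: int, base: float = 1.5, weight: float = 0.1) -> float:
    """Hypothetical decay exponent that grows slowly with how often a domain is submitted."""
    return base + weight * math.log10(1 + domain_submissions)


def score(points: int, age_hours: float, domain_submissions: int) -> float:
    """Rank score whose decay exponent depends on the domain's submission count (assumed)."""
    return (points - 1) / (age_hours + 2) ** domain_gravity(domain_submissions)


# Same points and age, but one domain has been submitted far more often.
print(score(10, 5.0, domain_submissions=716))
print(score(10, 5.0, domain_submissions=5))
```

With these made-up parameters the heavily submitted domain scores lower at the same age and point count, so it would leave the front page sooner.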

~~~
jacquesm
That's a really good plan.

It is very tricky to do such things without introducing subtle feedback loops
though.

------
10ren
That initial minor boost is enough to throw a submission high up the frontpage
- even 2 points, given close enough together in time, can do it.

 _simple solution_ : remove the automatic upvote given when an article is
submitted that has already been submitted. If people want to upvote it, it's
negligible effort to do so once they get to the story on HN. _bonus_ :
submitting an article is a quick hack to find it on HN. Often one would also
want to upvote the article, but not always, e.g. one might be seeking the
comments to help evaluate the article. Counting these "submissions" as
"upvotes" is inaccurate.

 _summary_ : auto-upvoting of submissions is a needless, inaccurate and
distorting convenience.

~~~
johns
The submitter's vote isn't counted in the algorithm as far as I know. The
original algorithm (since changed), I believe, is (p - 1) / (t + 2)^1.5, with p
being the number of points. So you can see the initial vote is subtracted.
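
A quick sketch of that formula in Python, to make the subtraction visible. The function name `rank_score` is made up here, and treating t as the submission's age in hours is an assumption; the formula itself is only the one recalled above, which may not match what HN runs today.

```python
def rank_score(points: int, age_hours: float, gravity: float = 1.5) -> float:
    """Score from the formula quoted above: (p - 1) / (t + 2) ** gravity."""
    return (points - 1) / (age_hours + 2) ** gravity


# A fresh submission holding only the submitter's own point scores zero.
print(rank_score(1, 0.0))  # 0.0

# Two extra points early on versus the same two points a day later.
print(rank_score(3, 1.0))
print(rank_score(3, 24.0))
```

Because the denominator grows with age, the same two extra points count for far more in the first hour than after a day.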

~~~
jacquesm
It isn't with the first one, but it is with all subsequent submissions.

------
petercooper
I wrote a scrappy script to process your report and find the "best" domains to
consistently get "first post" on by multiplying total submissions by average
points received per submission (i.e. total karma benefit). I'll ignore any
sites that got fewer than 300 submissions. Result:

    
    
      techcrunch.com       - 22229
      nytimes.com          - 10178
      readwriteweb.com     - 2482
      thestandard.com      - 1888
      centernetworks.com   - 1510
      technologizer.com    - 1470
      alleyinsider.com     - 1158
      sfgate.com           - 1081
      treehugger.com       - 708
      itworld.com          - 698
      devcentral.f5.com    - 658
      markevanstech.com    - 523
      howtoforge.com       - 487
      news.com.com         - 431
    

TechCrunch is by far in the lead. But why not? They publish good stuff
(usually) that HN readers like to vote up. Same for all the others in the list
too.
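
For anyone who wants to redo this kind of tally, here is a rough sketch of the aggregation described above. It is not the original script. The function name `karma_by_domain`, the row parsing, and applying the 300-submission cutoff to a domain's combined submissions are all assumptions, so it will not necessarily reproduce the figures listed.

```python
from collections import defaultdict

# A few rows copied from the report, in the same pipe-delimited layout.
SAMPLE = """\
| 6234 | techcrunch.com             |         716 |              8.7067 | redacted |
| 4453 | nytimes.com                |         628 |              7.0908 | redacted |
| 2506 | techcrunch.com             |         206 |             12.1650 | redacted |
| 2519 | paulgraham.com             |          16 |            157.4375 | redacted |
"""


def karma_by_domain(report: str, min_submissions: int = 300) -> list[tuple[str, int]]:
    """Sum submissions * average points per domain, highest total first."""
    submissions = defaultdict(int)
    karma = defaultdict(float)
    for line in report.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) < 4 or not cells[0].isdigit():
            continue  # skip table borders and the header row
        domain, count, average = cells[1], int(cells[2]), float(cells[3])
        submissions[domain] += count
        karma[domain] += count * average
    totals = [
        (domain, round(total))
        for domain, total in karma.items()
        if submissions[domain] >= min_submissions  # assumed: cutoff on the domain total
    ]
    return sorted(totals, key=lambda item: item[1], reverse=True)


for domain, total in karma_by_domain(SAMPLE):
    print(f"{domain:<20} - {total}")
```

On the four sample rows this keeps techcrunch.com and nytimes.com and drops paulgraham.com, which has only 16 submissions.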

~~~
bootload
_"... TechCrunch is by far in the lead. But why not? They publish good stuff
..."_

Articles from TechCrunch span the topical or breaking news spectrum. On rare
occasions, there are posts that really should make the headlines. I'm thinking
of some posts Arrington made last year. Often I've tried reading TC articles a
month or a week later and there is no substance to them. Marshmallow news. Not
my cup of tea.

~~~
petercooper
Not your cup of tea and that's cool but.. a lot of people like ephemeral news.
TC posts aren't essays that'll make good book material, but in terms of a
cutting insight or an exclusive scoop that matters to a lot of people _right
now_ , TC is pretty good.

Also, this site is called Hacker _News_ , rather than Hacker Essays or Hacker
Articles. While I also _prefer_ the essays, articles, and deep blog posts, I
also couldn't say that TC stuff isn't relevant here because it _is_ usually
"news."

~~~
bootload
_"... a lot of people like ephemeral news. TC posts aren't essays that'll make
good book material, but in terms of a cutting insight or an exclusive scoop
that matters to a lot of people right now, TC is pretty good. ..."_

I'm not really disagreeing but I'm wary of posts from TC & a few others
because the posts tend towards sensation, scoops and controversy before fact.

------
yannis
Very interesting, is it possible to check quickly if the points follow
Benford's law? <http://en.wikipedia.org/wiki/Benfords_law> as a first
approximation to see if people are gaming HN?
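
A first-digit check of that kind could be sketched as follows. The helper names are made up, and the sample is just the first ten point totals from the report above. Benford's law predicts a share of log10(1 + 1/d) for leading digit d, but a list truncated to the top scorers would not be expected to follow it closely, so a mismatch on this data would not by itself show that anyone is gaming HN.

```python
import math
from collections import Counter


def benford_expected(digit: int) -> float:
    """Share of values expected to start with `digit` under Benford's law."""
    return math.log10(1 + 1 / digit)


def leading_digit_shares(values: list[int]) -> dict[int, float]:
    """Observed share of each leading digit 1..9 among the positive values."""
    digits = [int(str(value)[0]) for value in values if value > 0]
    if not digits:
        raise ValueError("need at least one positive value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# First ten point totals from the report (far too few for a real test).
points = [6234, 4453, 3853, 2519, 2506, 1971, 1888, 1692, 1471, 1347]
observed = leading_digit_shares(points)
for d in range(1, 10):
    print(d, f"observed={observed[d]:.2f}", f"expected={benford_expected(d):.3f}")
```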

~~~
jacquesm
Is that how Benford's law works?

I thought that Benford's law would allow you to distinguish between a set of
points that is made up and a real one, not between a real set made by real
users and one by real users + users gaming the system.

------
vaksel
I bet I'm somewhere high for TechCrunch (possibly #1... although 716 seems
high), I used to submit a ton of their stuff. Don't do that as often, since
I'm busy with my own site now.

edit: apparently it's not me, since searchyc only shows 91 submissions...and a
few of them are "Ask HN" types

edit #2: apparently it IS me, and searchyc just doesn't work that well.
(thanks for the email Jacques). I'm actually surprised that only accounted for
6K votes, I figured it'd be closer to 15K

~~~
jacquesm
So much for the accuracy of searchyc then.

~~~
vaksel
maybe I did something wrong

what I did was search for vaksel as a user, and then hit within results and
entered techcrunch.

------
johnfn
You know, after reading this, it would be interesting to compile a list of the
top averaging websites as sorted by points on HN. That would be a neat way to
find new sites that we collectively find interesting.

~~~
jcl
You mean, like this?: <http://top.searchyc.com/>

------
araneae
I've briefly considered writing a bot to just submit everything from reddit to
HN.

And then I realized that would make y'all cry.

