
Show HN: HN Domain Leaderboard - refrigerator
https://hnleaderboard.com/?
======
minimaxir
So that others can play with the data, here's a reverse engineering of the
BigQuery OP used to create the leaderboard:

    
    
    #standardSQL
    -- Rank domains by (mean score + 2 * 75th-percentile score) * ln(post count).
    SELECT
      domain,
      COUNT(*) AS num_posts,
      perc_75,
      AVG(score) AS avg_score,
      (AVG(score) + 2*perc_75) * LOG(COUNT(*)) AS calc_score
    FROM (
      SELECT
        -- Strip only a leading "www." (anchored, with the dot escaped);
        -- the unanchored pattern 'www.' would also match inside a hostname.
        REGEXP_REPLACE(NET.HOST(url), r'^www\.', '') AS domain,
        score,
        PERCENTILE_CONT(score, 0.75)
          OVER (PARTITION BY REGEXP_REPLACE(NET.HOST(url), r'^www\.', '')) AS perc_75
      FROM
        `bigquery-public-data.hacker_news.full`
      WHERE
        type = 'story'
        AND url IS NOT NULL )
    GROUP BY
      domain,
      perc_75
    ORDER BY
      calc_score DESC
    

Top 10000 results:
[https://docs.google.com/spreadsheets/d/1Z9atmizTAPkgFiBte2eQ...](https://docs.google.com/spreadsheets/d/1Z9atmizTAPkgFiBte2eQiQgxEAiMyMB7Q99fzMzfIJs/edit?usp=sharing)

(it's apparently not a perfect match since there appears to be a minimum # of
posts requirement for domains [e.g. without that requirement,
[https://news.ycombinator.com/from?site=pardonsnowden.org](https://news.ycombinator.com/from?site=pardonsnowden.org)
is #3], which should be added to the description of the leaderboard)

~~~
refrigerator
I think that's pretty much it, though my domain regex is much uglier! Yeah
you're right about a minimum # of posts cutoff — I set it to 25. Forgot to add
this to the description but have added it now. Thanks!
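
For anyone who wants to check a single domain by hand, here is a minimal Python sketch of the formula from the query above with the 25-post cutoff applied. The function names `percentile_cont` and `calc_score` are made up for illustration, and the interpolation is assumed to match BigQuery's `PERCENTILE_CONT` (linear interpolation between neighbouring values); this is not the site's actual code.

```python
import math


def percentile_cont(values, q):
    """Linearly interpolated percentile (assumed to mirror PERCENTILE_CONT)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)      # fractional index into the sorted list
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def calc_score(scores, min_posts=25):
    """(mean + 2 * 75th percentile) * ln(n), or None below the post cutoff."""
    n = len(scores)
    if n < min_posts:
        return None                   # domain is excluded from the leaderboard
    avg = sum(scores) / n
    return (avg + 2 * percentile_cont(scores, 0.75)) * math.log(n)
```

A domain with a single very popular post, like the pardonsnowden.org case mentioned above, returns `None` here instead of ranking near the top.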

------
foob
Very cool, thanks for sharing! I did a somewhat similar analysis a while back
[1], and I found that many of the top domains either had a YC affiliation or
corresponded to extremely well-known companies or organizations. This made me
interested in finding lesser known blogs that also produce high quality
content. I tried to identify these by putting a limit on the number of unique
users who had submitted content from each domain. My thinking here was that
something like the GitHub blog would have submissions from many users, while
smaller personal blogs would probably be mostly self-promoted. Using this
approach, I was able to turn up some pretty interesting blogs that I had never
heard of before.
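
As a rough illustration of that filter, the sketch below keeps only domains submitted by a small number of distinct users. The input shape (pairs of domain and submitter), the function name, and the default threshold of 3 are all assumptions made for the example, not the values used in the linked post.

```python
from collections import defaultdict


def domains_with_few_submitters(posts, max_submitters=3):
    """Return domains whose stories came from at most max_submitters users.

    posts is an iterable of (domain, submitter) pairs (hypothetical shape).
    """
    submitters = defaultdict(set)
    for domain, user in posts:
        submitters[domain].add(user)  # sets count each submitter once
    return sorted(
        domain
        for domain, users in submitters.items()
        if len(users) <= max_submitters
    )
```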

I think it could really increase the usefulness of HN Domain Leaderboard if
you added some additional filtering capabilities. Filtering based on the
category would probably be pretty easy because you have that information there
already, but perhaps also consider some measure of how broadly promoted each
domain is. The time range option is already pretty cool, and I'll bet that a
few more options would make it even more fun to play around with.

[1] - [https://intoli.com/blog/pareto-optimal-blogs/](https://intoli.com/blog/pareto-optimal-blogs/)

~~~
refrigerator
Thanks! Really interesting blog post and cool idea for surfacing small high-
quality blogs.

I'd been planning to add filtering on categories from the start, but this was
meant to be a weekend project and my motivation had started to drop, so I just
wanted to put it out there. Will add extra filters in the next few days!

------
aphextron
I'd really like to see the opposite of this: domains that have been flagged
multiple times and have a high submissions-to-upvotes ratio so that I can
filter them out.
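
The ratio part of that request could be sketched roughly as follows. It covers only submissions versus upvotes, since flag counts are assumed not to be available; the input shape (pairs of domain and score) and the function name are hypothetical.

```python
from collections import defaultdict


def submissions_to_upvotes(posts):
    """Map each domain to (number of submissions) / (total score).

    posts is an iterable of (domain, score) pairs (hypothetical shape).
    A high ratio means many submissions that earned few upvotes.
    """
    counts = defaultdict(int)
    votes = defaultdict(int)
    for domain, score in posts:
        counts[domain] += 1
        votes[domain] += score
    return {
        # Domains with no upvotes at all get an infinite ratio.
        domain: counts[domain] / votes[domain] if votes[domain] > 0 else float("inf")
        for domain in counts
    }
```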

~~~
peterwwillis
While we're asking for features, I would like the opposite of that: a way to
see only flagged domains, but ideally filtering out the spam/junk in some way.

------
ghayes
It would be great if you could add top posts from each of these domains. I am
really interested to see the top content I may have missed from a few of these
domains.

~~~
O_H_E
Until he adds that, we could rely on the official HN search

[https://hn.algolia.com](https://hn.algolia.com)

------
aaronhoffman
This is a little out of date but may be of interest here. This is a
visualization of the top 10,000 HN posts
[https://www.sizzleanalytics.com/Boards/sizzle/Hacker-News-To...](https://www.sizzleanalytics.com/Boards/sizzle/Hacker-News-Top-Posts-All-Time/dfb2af8e-67fa-47a7-892c-435de6321378)

------
mzzter
I would have thought bravenewgeek.com would make it onto the leaderboard since
his posts [1] are typically high quality.

[1]
[https://news.ycombinator.com/from?site=bravenewgeek.com](https://news.ycombinator.com/from?site=bravenewgeek.com)

------
sli
Kind of amazing that the Rust blog, something relatively new, is the top
domain of all time.

------
glaberficken
Ah! I was searching around for exactly this just a week ago and gave up. Could
you add more granular date filters (past month, past week, etc.)? Thanks for
doing it!

------
bhhaskin
Interesting that there are no news-related domains in the list. I wonder if
that is due to the number of posts those domains have that never gain any
traction.

~~~
lefstathiou
Seems off to me. Anecdotally, I am constantly reading Bloomberg news articles
linked on the front page of HN.

~~~
finnnk
Switch to the view that shows just the last year and you'll see lots of news
sites.

This matches what I've been sensing (rightly or wrongly) in that the mix of
stories has shifted from computer science and engineering to have a more
business and general interest mix since I first started reading HN regularly.
This goes for the sources as well as the stories themselves.

~~~
bhhaskin
I agree with your observations. It is getting further and further away from
its roots.

------
raymondgh
How did you determine the domain categories?

~~~
refrigerator
I actually hand-labeled the categories, but only for roughly the top 100
domains in each time period. The categories are a little bit arbitrary (where
is the line between individual and blog, or blog and publication?) but I was
mainly interested in seeing the distinction between individuals and companies.

------
alexchamberlain
Is mean a valid statistic for this dataset?

I suspect that the score a link gets is highly variable and doesn't follow a
known distribution, so taking a straight mean may not be valid or, at the very
least, may give a heavily skewed result.

That being said, cool idea, well executed.
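
To make the skew concern concrete, here is a small sketch with made-up scores (not real HN data): one viral post pulls the mean far above what a typical post from the domain receives, while the median barely moves.

```python
import statistics

# Hypothetical scores for one domain: nine ordinary posts and one viral hit.
scores = [1, 1, 2, 2, 3, 3, 4, 5, 6, 900]

mean_score = statistics.mean(scores)      # dominated by the single outlier
median_score = statistics.median(scores)  # reflects the typical post

print(f"mean={mean_score}, median={median_score}")
```

This is the usual argument for pairing the mean with a rank-based statistic such as a percentile.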

~~~
refrigerator
[https://hnleaderboard.com/about](https://hnleaderboard.com/about) :)

~~~
et-al
Love the site. It's clear and to the point.

One thing, if I may add, is that you need a link from the "about" page back to
the main page.

Thanks again for sharing!

~~~
refrigerator
Glad you like the site :) Good point — I'll add that in.

------
downandout
Interesting that so many of the top sites are "individual". I always thought
that self promotion was shunned on places like HN, but I guess if you do it in
the "right" way, it can be a successful tactic.

~~~
grzm
The articles for those sites aren’t necessarily being submitted by the people
who run them. In most cases it’s likely others who found the content useful. I
suppose if you squint anything you post on your own site is self-promotion,
but that’s not the impression I get from your comment. Am I mistaken?

~~~
downandout
They all include somewhat useful articles, but some of them (once you arrive)
are also quite self promoting. I think the odds are quite slim that _this_
many others randomly stumbled across articles on obscure domains and submitted
them repeatedly, so yes I think the owners of them are in many cases
responsible for them having been submitted to HN.

But I wasn’t actually disparaging any of them, or suggesting anything
inappropriate has happened. I guess I’m just surprised that HN as a community,
with such a high degree of anti-commercialism on the site, has given these
sites this much exposure.

------
ninjakeyboard
Aphyr needs more upvotes :)

EDIT: never mind - on the three year view he's in the top 10

------
matte_black
I thought this would be a leaderboard of which users get the most votes for
comments on different topics.

------
dsacco
Karpathy got into the top 20 most upvoted domain submissions? I don't even
remember that many.

------
ecesena
nit: blog.pinboard.in is classified as individual

~~~
whorleater
Considering it's one person, it might as well be individual.

------
skullum
hnleaderboard insecure connection rip

~~~
billysielu
Also blocked by Cisco Umbrella. "This site is blocked due to a security threat
that was discovered by the Cisco Umbrella security researchers."

