
Show HN: Top PDFs and Papers Submitted to Hacker News - cloudkj
https://www.hackernewspapers.com/
======
minimaxir
Hmm. Reverse-engineering this page with BigQuery is surfacing a lot more
results than the page itself (for 2019 at least).

[https://docs.google.com/spreadsheets/d/1he6ca0BBYbj2ZEOEpu8L...](https://docs.google.com/spreadsheets/d/1he6ca0BBYbj2ZEOEpu8L7Rqab8XLJoqBhkXYdSDVreE/edit?usp=sharing)

    
    
        #standardSQL
        SELECT id, title, url, score
        FROM `bigquery-public-data.hacker_news.full`
        WHERE timestamp > '2019-01-01'
        -- Escape the dots so they match a literal "." rather than any character
        AND REGEXP_CONTAINS(url, r'\.pdf|arxiv\.org')
        ORDER BY score DESC
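
A note on the URL pattern in a query like this: in a regular expression an unescaped `.` matches any character, so `.pdf` also matches a URL containing, say, `xpdf`. The Python sketch below compares an unescaped and an escaped pattern. The sample URLs are made up for illustration and are not taken from the dataset.

```python
import re

# Unescaped pattern: "." matches any single character.
LOOSE = re.compile(r".pdf|arxiv.org")
# Escaped pattern: "." must be a literal dot.
STRICT = re.compile(r"\.pdf|arxiv\.org")

# Hypothetical sample URLs, not real submissions.
urls = [
    "https://example.com/paper.pdf",
    "https://arxiv.org/abs/0000.00000",
    "https://example.com/xpdf-viewer",
    "https://example.com/arxiv-org-mirror",
]

loose_hits = [u for u in urls if LOOSE.search(u)]
strict_hits = [u for u in urls if STRICT.search(u)]

print(len(loose_hits), "loose matches;", len(strict_hits), "strict matches")
```

The loose pattern picks up the last two URLs as well, which may account for some of the extra results.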

~~~
seedie
Thanks to your post I just learned about the BigQuery public datasets:
[https://www.reddit.com/r/bigquery/wiki/datasets](https://www.reddit.com/r/bigquery/wiki/datasets)

------
sytelus
One of the huge issues on HN is the lack of good algorithms for ranking.
Currently all posts suffer from insufficient initial exposure to receive enough
votes, and thus false negatives are extremely high on HN. Just go to the new
page right now, take a look at the most recent postings, and you will find that
the _majority_ of good, genuine posts never make it beyond their initial single
upvote. A very small subset gets re-posted a few times, and only randomly will
some get attention. Unfortunately, papers are probably the biggest victims, so
you could just as well have collected all the links, done a kind-of-random
sort, and still gotten fairly decent results. The bottom line: never rely on
HN's flawed ranking. Use it as a weak signal and combine it with Reddit,
citations, tweet counts, etc. if you want to work around the vast sea of false
negatives that HN currently is.
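
One way the "weak signal" idea could look in code is sketched below. The weights, the field names, and the log scaling are all assumptions chosen for illustration, not a tested ranking formula.

```python
import math

# Hypothetical weights: HN score is treated as the weakest signal.
WEIGHTS = {"hn_score": 0.5, "reddit_score": 1.0, "citations": 1.0, "tweets": 1.0}


def combined_score(signals: dict) -> float:
    """Weighted sum of log-scaled counts; missing signals count as zero."""
    return sum(
        weight * math.log1p(signals.get(name, 0))
        for name, weight in WEIGHTS.items()
    )


# Made-up example data: one paper HN ignored, one that HN alone liked.
overlooked = {"hn_score": 1, "citations": 500}
hn_only = {"hn_score": 300}

ranked = sorted(
    [("overlooked", overlooked), ("hn_only", hn_only)],
    key=lambda item: combined_score(item[1]),
    reverse=True,
)
print([name for name, _ in ranked])
```

Log scaling keeps a single large count from swamping the other signals.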

~~~
gmiller123456
I think this is what most social news sites get wrong about their points
systems. Someone who submits an article gets points for each vote it gets. But
the few heroes that waded through the submissions to initially upvote them get
nothing. Even though they likely did a lot more work than the submitter.

I'd think a better algorithm would be that if you upvote an article in the
"new" section that eventually makes it to the main page, you'd get at least as
many points as the submitter.
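
A minimal sketch of that rule, assuming a hypothetical `Story` record that tracks who upvoted while the story was still on "new". None of these names come from HN's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    submitter: str
    early_voters: list = field(default_factory=list)  # upvoted while on "new"
    late_votes: int = 0  # votes received after reaching the front page
    reached_front_page: bool = False


def award_karma(story: Story) -> dict:
    """Give the submitter one point per vote; if the story reached the
    front page, give each early voter the same amount as the submitter."""
    total_votes = len(story.early_voters) + story.late_votes
    karma = {story.submitter: total_votes}
    if story.reached_front_page:
        for voter in story.early_voters:
            karma[voter] = karma.get(voter, 0) + total_votes
    return karma


hit = Story("alice", ["bob", "carol"], late_votes=40, reached_front_page=True)
miss = Story("dave", ["erin"])
print(award_karma(hit))
print(award_karma(miss))
```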

~~~
smitop
That incentivizes upvoting everything, since there is no punishment for
upvoting posts that don't make it to the frontpage. That could be solved by
making votes a limited resource, but that approach has many downsides.
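
As a rough illustration of "votes as a limited resource", here is a sketch of a per-user daily budget. The class name, the default budget, and the reset policy are all hypothetical.

```python
class VoteBudget:
    """Tracks how many upvotes each user has left for the current day."""

    def __init__(self, votes_per_day: int = 5):
        self.votes_per_day = votes_per_day
        self.remaining = {}  # user -> votes left today

    def try_upvote(self, user: str) -> bool:
        """Spend one vote if the user has any left; return whether it counted."""
        left = self.remaining.get(user, self.votes_per_day)
        if left == 0:
            return False
        self.remaining[user] = left - 1
        return True

    def reset(self) -> None:
        """Call once per day to restore every user's full budget."""
        self.remaining.clear()


budget = VoteBudget(votes_per_day=2)
results = [budget.try_upvote("bob") for _ in range(3)]
print(results)
```

One visible downside: once the budget is spent, a user cannot upvote anything else that day, however good it is.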

------
burtonator
We did one for Polar at the end of 2018:

[https://getpolarized.io/2019/01/08/top-pdfs-of-2018-hackerne...](https://getpolarized.io/2019/01/08/top-pdfs-of-2018-hackernews.html)

If you want an awesome PDF reader to read these you should check out Polar :)

Yes. There's a Linux version! ;)

[https://getpolarized.io/](https://getpolarized.io/)

------
Yetanfou
Could you add links to the relevant threads on HN to this listing, e.g. by
linking the HN score there? This would give access to both the PDF for those
who just want to read the thing as well as to the discussion for those who
wonder what made it reach such dazzling heights on HN.

~~~
cloudkj
Thanks for the suggestion. I went ahead and made the change to link the score
to the original discussion. It will take a few moments for the CDNs to
propagate the changes.

One caveat is that some stories have multiple submissions; I just linked to
the one with the highest score for now, but will need to iterate a bit to
better handle multiple submissions.
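
The "link to the highest-scoring submission" step could be sketched roughly as follows. The record layout (`id`, `url`, `score`) and the sample rows are assumptions for illustration, not the site's actual code or data.

```python
def best_submission_per_url(submissions):
    """Keep only the highest-scoring submission for each URL,
    returned in descending score order."""
    best = {}
    for sub in submissions:
        current = best.get(sub["url"])
        if current is None or sub["score"] > current["score"]:
            best[sub["url"]] = sub
    return sorted(best.values(), key=lambda s: s["score"], reverse=True)


# Made-up rows: the same paper submitted twice, plus one other paper.
submissions = [
    {"id": 1, "url": "https://example.com/a.pdf", "score": 3},
    {"id": 2, "url": "https://example.com/a.pdf", "score": 250},
    {"id": 3, "url": "https://example.com/b.pdf", "score": 120},
]

top = best_submission_per_url(submissions)
print([(s["id"], s["score"]) for s in top])
```

This treats URLs as exact strings, so variants of the same link (trailing slashes, `http` vs `https`) would still count as separate stories.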

------
danielecook
I put together something similar but for articles people are linking to on
arXiv here:

[https://hntrending.com/links/all/arxiv/index.html](https://hntrending.com/links/all/arxiv/index.html)

~~~
Bolderman
Seems like it hasn't been updated for a few weeks. But it's a cool site!

------
ZoomStop
A link at the top for 2019 would be helpful, and maybe change the text color
for the year being displayed? When navigating between years, returning to 2019
isn't intuitive.

~~~
cloudkj
Thanks for the suggestion. I added a link for 2019, and the style for the
active year is now bolded. I didn't want to deviate too much from the default
color scheme of the theme I'm using, but that may change in the future.

------
epicwhaleburger
Related: [https://fermatslibrary.com/](https://fermatslibrary.com/)

------
itdog
It would be convenient if there were a search box on the page.

------
baby
Now if we could categorize this so that I can get the top crypto :)

------
karmakaze
It would be great if these could be tagged into a few categories.

~~~
cloudkj
That sounds like a fun and useful follow-up. My initial thoughts would be to
do some unsupervised topic modeling with LDA. Happy to consider other
suggestions.

~~~
mathnmusic
Why not add these research papers to learn-awesome, which is building a topic
graph?
[https://github.com/learn-awesome/learn-awesome](https://github.com/learn-awesome/learn-awesome)

