Show HN: Top PDFs and Papers Submitted to Hacker News (hackernewspapers.com)
250 points by cloudkj 3 months ago | 21 comments



Hmm. Reverse-engineering this page with BigQuery is surfacing a lot more results than the page itself (for 2019 at least).

https://docs.google.com/spreadsheets/d/1he6ca0BBYbj2ZEOEpu8L...

    #standardSQL
    SELECT id, title, url, score
    FROM `bigquery-public-data.hacker_news.full`
    WHERE timestamp > '2019-01-01'
    AND REGEXP_CONTAINS(url, r'\.pdf|arxiv\.org')
    ORDER BY score desc
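
In a regular expression, an unescaped `.` matches any character, so a pattern like `.pdf` also matches URLs that merely contain "pdf" after some other character. Here is a minimal Python sketch of the difference. The sample URLs are made up for illustration, and Python's `re` is assumed to behave like BigQuery's regex engine for these simple patterns.

```python
import re

# Hypothetical sample URLs, made up for illustration only.
urls = [
    "https://example.com/paper.pdf",
    "https://arxiv.org/abs/1234.5678",
    "https://example.com/why-pdfs-are-hard",
    "https://example.com/blog/post",
]

# Unescaped dots: '.' matches any single character.
loose = re.compile(r".pdf|arxiv.org")
# Escaped dots: '.' must be a literal dot.
strict = re.compile(r"\.pdf|arxiv\.org")

# search() matches anywhere in the string, like a "contains" check.
loose_hits = [u for u in urls if loose.search(u)]
strict_hits = [u for u in urls if strict.search(u)]

print(len(loose_hits), len(strict_hits))
```

The loose pattern also picks up the "why-pdfs-are-hard" URL, which is not a PDF link, so escaping the dots could account for some of the extra rows.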


Thanks to your post I just learned about the BigQuery public datasets: https://www.reddit.com/r/bigquery/wiki/datasets


One of the huge issues on HN is the lack of good ranking algorithms. Currently, posts suffer from insufficient initial exposure to receive enough votes, so the false-negative rate on HN is extremely high. Just go to the new page right now and look at the most recent postings: you will find that the majority of good, genuine posts never make it beyond their initial single upvote. A very small subset gets reposted a few times, and only randomly do some get attention. Unfortunately, papers are probably the biggest victims, so you might as well have collected all the links, done a kind-of-random sort, and still gotten fairly decent results. The bottom line: never rely on HN's flawed ranking. Use it as a weak signal and combine it with Reddit, citations, tweet counts, etc. if you want to work around the vast sea of false negatives that HN currently is.


I think this is what most social news sites get wrong about their points systems. Someone who submits an article gets points for each vote it gets, but the few heroes who waded through the submissions to upvote it initially get nothing, even though they likely did a lot more work than the submitter.

I'd think a better algorithm would be that if you upvote an article in the "new" section that eventually makes it to the main page, you'd get at least as many points as the submitter.
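
A rough sketch of that idea in Python, purely hypothetical: the function name, its arguments, and the "same points as the submitter" rule are my own assumptions for illustration, not how HN or any other site actually computes karma.

```python
def award_karma(submitter, early_voters, story_points, reached_front_page):
    """Hypothetical rule: early upvoters earn as much as the submitter,
    but only if the story eventually reaches the front page."""
    karma = {submitter: story_points}
    if reached_front_page:
        for voter in early_voters:
            # A voter who is also the submitter accumulates both awards.
            karma[voter] = karma.get(voter, 0) + story_points
    return karma


hit = award_karma("alice", ["bob", "carol"], 100, reached_front_page=True)
miss = award_karma("alice", ["bob", "carol"], 100, reached_front_page=False)
print(hit, miss)
```

Note that nothing in this sketch penalizes a vote on a story that never makes it, so it only rewards early votes and never discourages them.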


That incentivizes upvoting everything, since there is no punishment for upvoting posts that don't make it to the front page. That could be solved by making votes a limited resource, but that approach has many downsides.


Back when pg used to run the community [0], he'd think deeply about problems [1] and later solve them [2], though not without controversy [3]. news.yc was better for having such a strong personality at the helm, someone who'd withstand the backlash and hold on to their convictions. I guess we need someone of that caliber to step up and usher in the post-pg era [4], as it were.

What I do to find my share of interesting links is search https://hn.algolia.com with different keywords and sift through those discussions, follow links in the comments, follow the commenters' previous submissions and posts, follow other submissions from the same website, gather new keywords, and search again... it is kind of a rabbit hole once you start. You quickly begin to realise how great it is that folks continue to submit links that don't end up hitting the front page, as the aggregation of links is valuable even when the discussion is missing [5].

That said, I have stumbled upon a lot of interesting things that simply aren't shared on news.yc at all, though I guess this is by design. I just hope folks don't stop sharing links just because their previous submissions didn't generate discussion or upvotes on news.yc.

[0] https://news.ycombinator.com/item?id=7494708

[1] https://news.ycombinator.com/item?id=2403696

[2] http://paulgraham.com/hackernews.html

[3] https://news.ycombinator.com/item?id=3122233

[4] https://hn.algolia.com/?query=%22Ask%20PG:%22&sort=byPopular...

[5] Some would argue that few news.yc comments are worth their weight in upvotes anyway: https://danluu.com/hn-comments/


The team is doing a fine job. It's much harder than in the early days. Many want to game the system, so it's an SEO vs. anti-spam job to keep this place running smoothly.

Getting to the front page isn't predictable; it's like creating a hit song. Tons of great music goes unnoticed. There's bias, and well-known names have an advantage.


We did one for Polar at the end of 2018:

https://getpolarized.io/2019/01/08/top-pdfs-of-2018-hackerne...

If you want an awesome PDF reader to read these, you should check out Polar :)

Yes. There's a Linux version! ;)

https://getpolarized.io/


Could you add links to the relevant HN threads to this listing, e.g. by linking the HN score there? This would give access both to the PDF, for those who just want to read the thing, and to the discussion, for those who wonder what made it reach such dazzling heights on HN.


Thanks for the suggestion. I went ahead and made the change to link the score to the original discussion. It will take a few moments for the CDNs to propagate the changes.

One caveat is that some stories have multiple submissions; I just linked to the one with the highest score for now, but will need to iterate a bit to better handle multiple submissions.
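
For anyone curious, picking the highest-scoring submission per URL can be sketched roughly like this in Python. This is a minimal illustration under assumed field names (`id`, `url`, `score`) and made-up sample data, not the site's actual code.

```python
from typing import NamedTuple


class Submission(NamedTuple):
    id: int
    url: str
    score: int


def best_per_url(submissions):
    """Keep only the highest-scoring submission for each URL."""
    best = {}
    for sub in submissions:
        current = best.get(sub.url)
        if current is None or sub.score > current.score:
            best[sub.url] = sub
    return best


# Hypothetical sample data for illustration.
subs = [
    Submission(1, "https://example.com/a.pdf", 12),
    Submission(2, "https://example.com/a.pdf", 250),
    Submission(3, "https://example.com/b.pdf", 40),
]
best = best_per_url(subs)
print(best["https://example.com/a.pdf"].id)
```

This treats URLs as exact strings, so the same paper submitted under slightly different URLs (http vs. https, trailing query parameters) would still show up as separate entries.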


A good suggestion. For those who want to look up the conversation before that suggestion is (hopefully) implemented, you can use HN's search to search by submission URL.


I put together something similar, but for articles people are linking to on arXiv, here:

https://hntrending.com/links/all/arxiv/index.html


Seems like it hasn't been updated for a few weeks. But it's a cool site!


A link at the top for 2019 would be helpful, and maybe a different text color for the year being displayed? When navigating between years, returning to 2019 isn't intuitive.


Thanks for the suggestion. I added a link for 2019, and the active year is now shown in bold. I didn't want to deviate too much from the default color scheme of the theme I'm using, but that may change in the future.



It would be convenient if there were a search box on the page.


Now if we could categorize this so that I can get the top crypto :)


It would be great if these could be tagged into a few categories.


That sounds like a fun and useful follow-up. My initial thought would be to do some unsupervised topic modeling with LDA. Happy to consider other suggestions.


Why not add these research papers to learn-awesome, which is building a topic graph? https://github.com/learn-awesome/learn-awesome



