
Show HN: Top PDFs Posted to Hacker News in 2019 Computed from Internet Archive - burtonator
https://getpolarized.io/2020/01/19/over-500-top-pdf-posted-to-hacker-news-2019.html
======
burtonator
I computed this list of top PDFs using the Internet Archive data.

Basically, it crawls the front page of HN and finds any .pdf link, records the
number of upvotes, then sorts by score.
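For readers curious how that might look, here is a minimal Python sketch of the extract-and-sort step. It is not burtonator's actual code. It assumes each archived front-page snapshot has already been downloaded as an HTML string and uses 2019-era HN markup (`athing` rows, `storylink` anchors, `score_<id>` spans); the class and function names (`FrontPageParser`, `pdf_scores`, `rank`) and the sample HTML are hypothetical.

```python
import re
from html.parser import HTMLParser


class FrontPageParser(HTMLParser):
    """Collects story links and point counts from one HN front-page snapshot.

    Assumes 2019-era HN markup (a hypothetical simplification):
      <tr class="athing" id="ID"> ... <a class="storylink" href="URL">
      <span class="score" id="score_ID">N points</span>
    """

    def __init__(self):
        super().__init__()
        self.links = {}   # item id -> story URL
        self.points = {}  # item id -> upvote count
        self._row_id = None
        self._score_id = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        if tag == "tr" and "athing" in classes:
            self._row_id = a.get("id")
        elif tag == "a" and "storylink" in classes and self._row_id:
            self.links[self._row_id] = a.get("href") or ""
        elif tag == "span" and "score" in classes:
            span_id = a.get("id") or ""
            if span_id.startswith("score_"):
                self._score_id = span_id[len("score_"):]

    def handle_data(self, data):
        # The score span's text looks like "123 points" (or "1 point").
        if self._score_id is not None:
            m = re.match(r"\s*(\d+)\s+points?", data)
            if m:
                self.points[self._score_id] = int(m.group(1))

    def handle_endtag(self, tag):
        if tag == "span":
            self._score_id = None


def is_pdf(url):
    # Ignore any fragment or query string before checking the extension.
    path = url.split("#", 1)[0].split("?", 1)[0]
    return path.lower().endswith(".pdf")


def pdf_scores(html):
    """Map each .pdf story URL in one snapshot to its point count."""
    parser = FrontPageParser()
    parser.feed(html)
    parser.close()
    return {url: parser.points.get(item_id, 0)
            for item_id, url in parser.links.items() if is_pdf(url)}


def rank(scores):
    """Sort (url, points) pairs by score, highest first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


SAMPLE = """
<table>
<tr class="athing" id="1"><td><a href="https://example.com/paper.pdf" class="storylink">Paper</a></td></tr>
<tr><td><span class="score" id="score_1">120 points</span></td></tr>
<tr class="athing" id="2"><td><a href="https://example.com/blog" class="storylink">Blog</a></td></tr>
<tr><td><span class="score" id="score_2">300 points</span></td></tr>
<tr class="athing" id="3"><td><a href="https://example.org/Spec.PDF" class="storylink">Spec</a></td></tr>
<tr><td><span class="score" id="score_3">450 points</span></td></tr>
</table>
"""

for url, points in rank(pdf_scores(SAMPLE)):
    print(points, url)
```

The non-PDF story is skipped even though it outscores one of the PDFs, and the extension check is case-insensitive so `.PDF` links are kept. Using the stdlib `HTMLParser` keeps the sketch dependency-free; a real crawler would also need to fetch snapshots and cope with markup changes across the year.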

I did the same thing last year, and it was a great way to find awesome/unique
PDFs.

Ironically, the first post on the list was about web scraping being legal, and
here I am computing the list via web scraping!

~~~
inetknght
> _it crawls the front page of HN and finds any .pdf link, records the number
> of upvotes, then sorts by score_

How do you deal with the numbers changing over time?

~~~
burtonator
I take the max in a 24-hour period.

I don't always have snapshots for all hours, though, so there might be some skew.
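One way to read "the max in a 24-hour period" is sketched below in Python. It is an assumption, not the author's confirmed method: here a story's score is the highest point count seen in any snapshot taken within 24 hours of its first sighting. The input tuples, the `max_within_window` name, and the sample data are hypothetical; the only external fact relied on is that Wayback Machine snapshot timestamps are 14-digit `YYYYMMDDhhmmss` strings.

```python
from datetime import datetime, timedelta

# Hypothetical interpretation: a story's score is the highest point count
# seen in any snapshot taken within 24 hours of its first sighting.
WINDOW = timedelta(hours=24)


def max_within_window(observations):
    """observations: iterable of (wayback_timestamp, url, points) tuples.

    Wayback Machine timestamps are 14-digit strings: YYYYMMDDhhmmss.
    """
    parsed = sorted(
        (datetime.strptime(ts, "%Y%m%d%H%M%S"), url, points)
        for ts, url, points in observations
    )
    first_seen = {}
    best = {}
    for when, url, points in parsed:
        start = first_seen.setdefault(url, when)  # earliest snapshot wins
        if when - start <= WINDOW:
            best[url] = max(best.get(url, 0), points)
    return best


OBSERVATIONS = [
    ("20190301120000", "https://example.com/a.pdf", 50),
    ("20190301180000", "https://example.com/a.pdf", 210),
    ("20190303120000", "https://example.com/a.pdf", 400),  # > 24 h later: ignored
    ("20190302090000", "https://example.com/b.pdf", 90),
]

print(max_within_window(OBSERVATIONS))
```

Because the result can only reflect hours that were actually archived, a story whose true peak fell between snapshots gets a lower score than it really had, which is the skew described above.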

------
jayaram
There's also
[http://www.hackernewspapers.com/](http://www.hackernewspapers.com/)

------
Invictus0
@dang: please note this user is constantly spamming self-promotion links to
his site.

~~~
ASalazarMX
Fellow user here. Is it forbidden? This user mostly submits content from his
blog (sometimes twice), but I can't find anything against self-promotion in
the guidelines.

This is the closest guideline:

> Don't solicit upvotes, comments, or submissions. Users should do those
> things when they run across something they personally find interesting, not
> because someone has content to promote.

