Ask HN: Can I download my HN data?
76 points by conroy on Dec 19, 2018 | 27 comments
GitHub recently announced the ability to download all your GitHub data[0]. Facebook, Twitter, Instagram, and Google all offer similar services. Is there a way to do this for my comments, votes, and submissions on Hacker News?

[0] https://blog.github.com/2018-12-19-download-your-data/

FYI it's from 2015 but there's a dump in the BigQuery public datasets: https://bigquery.cloud.google.com/dataset/fh-bigquery:hacker... (along with a bunch of other interesting things)

I wrote a small Python library to assist in querying the HN search API a little while back:


I also have an example of how to get all of a single user's comments:


You could use a similar method for submissions. I'm not sure how getting all the stories or comments you've voted on would work, however.
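
For anyone who wants to try this without a library, here is a minimal sketch. It assumes the public HN Search (Algolia) endpoint `search_by_date` and its `author_<username>` tag filter. It is not the library or example linked above, and `fetch_json` and `fetch_all` are names made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the public HN Search API hosted by Algolia.
API = "https://hn.algolia.com/api/v1/search_by_date"


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_all(username, kind="comment", fetch=fetch_json, per_page=100):
    """Collect every hit of `kind` ("comment" or "story") by one author.

    Pages through the results until the API reports no more pages.
    `fetch` is injectable so the paging logic can be tested offline.
    """
    hits, page = [], 0
    while True:
        query = urllib.parse.urlencode({
            "tags": f"{kind},author_{username}",
            "hitsPerPage": per_page,
            "page": page,
        })
        data = fetch(f"{API}?{query}")
        hits.extend(data.get("hits", []))
        page += 1
        if page >= data.get("nbPages", 0):
            return hits
```

Calling `fetch_all("conroy")` should return the comments and `fetch_all("conroy", kind="story")` the submissions. The search API may cap how deep you can page, so a very active account might need several passes over different date ranges. Votes are not covered by this.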

The stuff you can't scrape is the interesting stuff.

What stuff can you not scrape?

Sessions, IP addresses and User Agents associated with oneself.

Do we even know whether HN stores this information?

Well, no.

How somebody voted/flagged on posts.

If you’re in the EU you can refer to the GDPR and request all the data they have on you.

Sure you can, but HN is not GDPR-compliant, so you'll just get pointed at the public API.

How can HN just choose to be noncompliant? Aren't there penalties?

I don't understand the jurisdiction of GDPR very well, but I thought it applied to all EU users.

It does apply to all EU users and they risk fines if they don’t comply.

A law is only as good as its enforcement.

It does not apply as HN is not in the EU. If an EU citizen does not want their data collected then they can choose not to participate, as the EU has no jurisdiction over HN.

That isn't how the GDPR works. There are many GDPR primers on the web, here is a random one. https://www.recode.net/2018/5/16/17360944/gdpr-us-business-e...

The EU claims that's not how it works. Everyone else claims that is how it works. It's highly unlikely that the EU will actually be able to enforce it globally.

> It's highly unlikely that the EU will actually be able to enforce it globally

If you want to do business in some way with the EU, or have your business officers visit the EU, then that is how it works. The EU took a leaf out of the USA "global jurisdiction" book.

Yeah if Y Combinator chooses to never do business in the EU (considering one of their companies is Afrostream who does business in the EU, this may be up for debate), they may be able to get away with it. My company only targets US citizens and EU citizens would never get any value of any kind in any way at all from my business, so GDPR is not on my radar. But YC might have a harder time making that claim.

Only if they get hit with a notice to comply and then they have a grace period in which to comply, so that's how.

And if they don't comply, what then? None of the penalties can actually be enforced if they have no presence in Europe, unless the US decides to cooperate, which seems unlikely.

There's a process by which US courts can enforce foreign judgments. I wouldn't be surprised if supervisory authorities apply to do this for intransigent US companies.

Well this is why I always say "there are no rules, only realities".

You can be noncompliant if you have no presence inside the EU, i.e. if HN has nothing inside the EU where the EU can actually send a fine. Also, I doubt that the EU would penalize HN, because you can delete your username, which depersonalizes your account, and HN does not really store that much personal data.

GDPR requests provide 30 days to respond. It's quite possible they have a manual script they can run to grab your data / delete your account, but it isn't exposed via the web.

As others say, you can get the comments and submissions via the API.

To start with you can try this for the public information: http://hnuser.herokuapp.com/user/conroy/json

I wrote hnuser (http://hnuser.herokuapp.com) a long time ago to show karma over time. I haven't looked at it in a while, but at first glance it still seems to work. It also exists as an npm package if you want to run it from the command line.

Not directly, but you may find their API useful: https://github.com/HackerNews/API
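
A rough sketch of an export built on that API, assuming the documented `user/<id>` and `item/<id>` endpoints under `https://hacker-news.firebaseio.com/v0`. The helper names `get_json` and `export_user` are hypothetical, not part of the API.

```python
import json
import urllib.request

# Base URL of the official HN API (Firebase-hosted).
BASE = "https://hacker-news.firebaseio.com/v0"


def get_json(path):
    """GET one resource from the API and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE}/{path}.json", timeout=30) as resp:
        return json.load(resp)


def export_user(username, get=get_json):
    """Return the user's public profile plus their items grouped by type.

    `get` is injectable so the grouping logic can be tested offline.
    """
    profile = get(f"user/{username}")
    if profile is None:  # the API answers null for unknown users
        raise ValueError(f"no such user: {username}")
    items = {}
    # "submitted" lists the ids of the user's stories, comments and polls.
    for item_id in profile.get("submitted", []):
        item = get(f"item/{item_id}")
        if item:  # skip ids that resolve to null
            items.setdefault(item.get("type", "unknown"), []).append(item)
    return {"profile": profile, "items": items}
```

This makes one request per item, so it will be slow for accounts with a long history. It only covers public data, so votes and flags are not included.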

Scrape it? It's in your profile.

