
Show HN: Export HN Favorites to a CSV File - gabrielsroka
I wrote a short JavaScript snippet to export HN Favorites to a CSV file.

It runs in your browser like a browser extension. It scrapes the HTML and navigates from page to page.

Setup and usage instructions are in the file.

Check out [https://gabrielsroka.github.io/getHNFavorites.js](https://gabrielsroka.github.io/getHNFavorites.js) or, to view the source code, see getHNFavorites.js at [https://github.com/gabrielsroka/gabrielsroka.github.io](https://github.com/gabrielsroka/gabrielsroka.github.io)
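For readers who prefer Python, the CSV-writing half of the idea can be sketched like this (a minimal sketch with illustrative names, not the actual snippet; the real JavaScript lives at the URL above):

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize scraped [{'title': ..., 'url': ...}] rows to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=['title', 'url'])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Example with one hand-made row; real input would come from scraping.
print(rows_to_csv([{'title': 'Example HN story', 'url': 'https://example.com'}]))
```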
======
sbr464
I made an API for favorites a while back.

[https://github.com/reactual/hacker-news-favorites-api](https://github.com/reactual/hacker-news-favorites-api)

Here's an example query:

[https://reactual.api.stdlib.com/hnfavs/?id=sbr464&limit=1](https://reactual.api.stdlib.com/hnfavs/?id=sbr464&limit=1)

~~~
gabrielsroka
Ok, now that's really smart!

Would a client using your API paginate to fetch, say, 50 pages? When I tried
it using ?limit=50, I got a 504 error.

Thanks!

(Edit, never mind, I see you explain it in the readme.)

~~~
sbr464
I made it pretty quickly, so that limit is too large (that's 50 × 30 = 1,500 items). You may need to provide an offset and make a few smaller requests.
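Splitting 50 pages (50 × 30 = 1,500 items) into a few smaller requests might look like this (the offset/limit arithmetic only; the readme defines the actual API interface):

```python
def batches(total_items, limit=300):
    """Yield (offset, limit) pairs covering total_items in chunks."""
    for offset in range(0, total_items, limit):
        yield offset, min(limit, total_items - offset)

# 50 pages x 30 items per page = 1500 items, fetched 300 at a time.
print(list(batches(50 * 30)))
# → [(0, 300), (300, 300), (600, 300), (900, 300), (1200, 300)]
```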

------
jaytaylor
This is cool, I love HN metadata, too :)

Plug for a related golang tool I wrote and use, which exports favorites and upvotes as structured JSON:

[https://github.com/jaytaylor/hn-utils](https://github.com/jaytaylor/hn-utils)

Just

    
    
        go get github.com/jaytaylor/hn-utils/...

------
simonw
It's a shame favorites aren't exposed in the official HN API:
[https://github.com/HackerNews/API](https://github.com/HackerNews/API) - this is a smart workaround.

~~~
dang
Our plan is for the next version of HN's API to simply serve a JSON version of
every page. I'm hoping to get to that this year.

~~~
bhl
Are there plans for an export tool, e.g. a user downloading all their comments
and upvoted submissions? I tend to use the submission upvote button more than
the favorite one, and an export tool wouldn't require a user API key for non-
private info.

~~~
simonw
My tool here can export all of a user's comments:
[https://github.com/dogsheep/hacker-news-to-sqlite](https://github.com/dogsheep/hacker-news-to-sqlite)

Not upvoted submissions yet, as those aren't in the API I'm using.

------
dvfjsdhgfv
This is smart. I'm adding this to my HN favorites.

~~~
gabrielsroka
Thanks dvfjsdhgfv. If there's sufficient interest, I can easily turn it into a
Chrome extension.

(Edit: haha, I see what you just did there. A little recursive humor.)

------
catchmeifyoucan
This is great! My biggest problem was that I couldn't search through my upvoted items to find an article I liked again. I used Google custom search and cleaned the data into flat URLs.

[https://www.heyraviteja.com/post/projects/deep-search-hn/](https://www.heyraviteja.com/post/projects/deep-search-hn/)

~~~
catchmeifyoucan
oops - I didn’t realize that favorites != upvoted.

------
zerop
Could this be done with the "Scrape Similar" Chrome plugin?

~~~
gabrielsroka
Thanks for the tip. I gave the "Scraper" extension a try, and 1) I got an error, and 2) it only seems to scrape one page -- it doesn't paginate (or did I miss something?).

I used the jQuery selector `a.storylink`.

------
rtcoms
Is there any way to find the most-favorited items on HN?

------
app4soft
Could someone convert it to a _Python_ script?

~~~
gabrielsroka
Part of the advantage of running JavaScript in your browser is that you might
already be authenticated and it can use your session. But, fetching your HN
favorites doesn't require authentication.

    
    
      #!/usr/bin/env python3
      import requests
      from bs4 import BeautifulSoup
    
      for p in range(1, 17):
          r = requests.get(f'https://news.ycombinator.com/favorites?id=app4soft&p={p}')
          s = BeautifulSoup(r.text, 'html.parser')
          print([{'title': a.text, 'url': a['href']} for a in s.select('a.storylink')])

~~~
app4soft
Thanks!

One more question: what is the best way to stop it when it reaches the last page?

> _for p in range(1, 17):_

Actually, _p=17_ [0] is empty (_p=16_ is the maximum as of now).

Maybe the script should scrape pages from `1` to `infinity` UNTIL it detects the following message on the page [0]:

> _app4soft hasn't added any favorite submissions yet._

[0] [https://news.ycombinator.com/favorites?id=app4soft&p=17](https://news.ycombinator.com/favorites?id=app4soft&p=17)
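That stopping rule can be sketched as a substring check on the fetched page (the exact empty-page message is assumed from the quote above):

```python
def is_last_page(html, user='app4soft'):
    """True when the favorites page shows the 'no favorites yet' message."""
    return f"{user} hasn't added any favorite submissions yet." in html

print(is_last_page("<td>app4soft hasn't added any favorite submissions yet.</td>"))
# → True
```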

~~~
gabrielsroka
In Python, `range(1, 17)` produces the numbers 1-16. I hard-coded it just for
your favorites.

A better way to solve it is to look at the `len()` of the results, and stop
when it gets to 0:

    
    
      import requests
      from bs4 import BeautifulSoup
    
      p = 1
      while True:
          r = requests.get(f'https://news.ycombinator.com/favorites?id=app4soft&p={p}')
          s = BeautifulSoup(r.text, 'html.parser')
          faves = [{'title': a.text, 'url': a['href']} for a in s.select('a.storylink')]
          if len(faves) == 0:
              break
          print(faves)
          p += 1

~~~
app4soft
Great!

~~~
gabrielsroka
I think this one is a little cleaner. I used some of the ideas in sbr464's
code.

    
    
        import requests
        from bs4 import BeautifulSoup
    
        path = 'favorites?id=app4soft'
        while path:
            r = requests.get('https://news.ycombinator.com/' + path)
            s = BeautifulSoup(r.text, 'html.parser')
            print([{'title': a.text, 'url': a['href']} for a in s.select('a.storylink')])
            more = s.select_one('a.morelink')
            path = more['href'] if more else None

~~~
app4soft
Go on! ;)

------
abdullahkhalids
What is HN's GDPR compliant way of requesting a copy of all stored data? Email
dang?

~~~
tzs
Considering that there is no mention of GDPR in the HN FAQ or on the "legal"
page, my guess is that their position is that GDPR does not apply.

According to Article 3 of the GDPR, it applies to:

1. Processing that takes place in the context of processors and controllers that are in the Union, regardless of whether or not the processing itself takes place in the Union.

2. Processing the data of subjects who are in the Union by controllers or processors who are not in the Union, if the processing is related to offering goods or services to such subjects in the Union or the processing is related to monitoring the behavior of such subjects that takes place in the Union.

I don't know how HN is structured, but I've not seen any indication that they
are in the Union, so #1 probably does not apply.

#2 applies if they are doing processing related to "offering goods or services
to such subjects in the Union" or "monitoring the behavior of such subjects
that takes place in the Union".

One of the recitals elaborates on the first branch of that:

> In order to determine whether such a controller or processor is offering
> goods or services to data subjects who are in the Union, it should be
> ascertained whether it is apparent that the controller or processor
> envisages offering services to data subjects in one or more Member States in
> the Union. Whereas the mere accessibility of the controller’s, processor’s
> or an intermediary’s website in the Union, of an email address or of other
> contact details, or the use of a language generally used in the third
> country where the controller is established, is insufficient to ascertain
> such intention, factors such as the use of a language or a currency
> generally used in one or more Member States with the possibility of ordering
> goods and services in that other language, or the mentioning of customers or
> users who are in the Union, may make it apparent that the controller
> envisages offering goods or services to data subjects in the Union.

Does HN "envisage" offering services to people in the Union? Or are they a
site that is merely accessible from the Union without envisaging offering
services there?

There's a recital that elaborates on the second branch, too:

> In order to determine whether a processing activity can be considered to
> monitor the behaviour of data subjects, it should be ascertained whether
> natural persons are tracked on the internet including potential subsequent
> use of personal data processing techniques which consist of profiling a
> natural person, particularly in order to take decisions concerning her or
> him or for analysing or predicting her or his personal preferences,
> behaviours and attitudes.

Does the data HN stores about its users satisfy this? And if it does, is the
behavior being monitored taking place in the Union?

