
Direct URL to project details: https://devpost.com/software/tagger-news

A few comments:

1. To other commenters, as with the HN Vue demo a week ago (https://news.ycombinator.com/item?id=14284877), the project is a technical proof-of-concept; the aesthetics aren't the primary focus.

2. The Algolia API is better for scraping because it allows for bulk requests, unlike the official API (my old 2014 script still works I think: https://github.com/minimaxir/get-all-hacker-news-submissions...)

3. How much time did it take to manually label the training/test set before training the RF classifier? Even with topic modeling for extrapolating tags, accurately labeling 20,000 submissions is a substantial task.
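
On point 2, here is a rough Python sketch of what bulk fetching through the Algolia HN Search API can look like. It assumes the public `search_by_date` endpoint with its `tags`, `hitsPerPage`, and `numericFilters` parameters; the helper names `build_url` and `fetch_stories` are made up for illustration, and this is not the linked 2014 script.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: Algolia's HN Search API, newest items first.
API_URL = "https://hn.algolia.com/api/v1/search_by_date"


def build_url(before_ts=None, hits_per_page=1000):
    """Build a request URL for one page of stories older than before_ts."""
    params = {"tags": "story", "hitsPerPage": hits_per_page}
    if before_ts is not None:
        # created_at_i is the submission time as a Unix timestamp.
        params["numericFilters"] = f"created_at_i<{before_ts}"
    return f"{API_URL}?{urlencode(params)}"


def fetch_stories(max_pages=1):
    """Walk backwards in time, one bulk request per page (hypothetical helper)."""
    stories, before_ts = [], None
    for _ in range(max_pages):
        with urlopen(build_url(before_ts)) as resp:
            hits = json.load(resp)["hits"]
        if not hits:
            break
        stories.extend(hits)
        # Note: a strict "<" can skip stories sharing the last timestamp.
        before_ts = hits[-1]["created_at_i"]
    return stories
```

Each request returns a whole page of stories, whereas the official API needs one request per item, which is why this route tends to be faster for scraping.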




One of the devs here.

1. That's the way we were thinking about it :)

2. Oh, excellent! We hadn't found that or we'd have used it, and we'll start working with it.

3. Tomorrow I'm going to blog about how we approached the machine learning. Short version: we manually came up with regular expressions to classify a training set based on titles. When we experimented with manual annotations on titles, the vast majority of the time we were looking for only a few key words. There's no question that this adds biases and will not be entirely accurate, but manual inspection convinced us it was a good enough approach for our hackathon, and most of the articles we identified with the resulting algorithm would not have been found by the title regex alone.

You can see the table of regular expressions [here](https://github.com/dodger487/analyze_hn/blob/master/topics.c...) and a bunch of (pretty unstructured) analysis code [here](https://github.com/dodger487/analyze_hn/blob/master/hn-analy...).
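
To make the regex-labeling idea concrete, here is a minimal Python sketch. The patterns and tag names below are invented for illustration and are not the ones in the linked table; `label_title` is a hypothetical helper, not a function from the repo.

```python
import re

# Hypothetical topic patterns (assumptions, not the project's real table).
TOPIC_PATTERNS = {
    "Apple": re.compile(r"\b(apple|iphone|macbook)\b", re.IGNORECASE),
    "Microsoft": re.compile(r"\b(microsoft|windows 10|azure)\b", re.IGNORECASE),
    "Python": re.compile(r"\b(python|django)\b", re.IGNORECASE),
}


def label_title(title):
    """Return the sorted list of topics whose pattern matches the title."""
    return sorted(
        topic
        for topic, pattern in TOPIC_PATTERNS.items()
        if pattern.search(title)
    )


titles = [
    "Apple unveils new iPhone",
    "Deploying Django on Windows 10",
    "The Awful Reign of the Red Delicious",
]

# Build weak labels for a training set: (title, topics) pairs.
training_labels = [(title, label_title(title)) for title in titles]
```

Titles that match no pattern get an empty label list; a classifier trained on the article text is what would then pick up topics the title regex misses.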


This is awesome! Congrats.

https://github.com/HackerNews/API

The Firebase API is excellent. I have been using that to keep http://searchhn.com up to date in real time.

Also, BigQuery is updated every day with all comments and posts. https://bigquery.cloud.google.com/dataset/bigquery-public-da...

This is what I started with to update the Searchera (https://searchera.io) index, which powers Searchhn.


Oh, that was silly of us not to use BigQuery! I was just able to use that to download a full million stories (though we still would have had the rate-limiting step of downloading the articles).

During a hackathon it can be hard to tell when to keep searching for an easy solution like that, as opposed to going with something slow you know will work; sometimes the search turns out to be a dead end.

Thanks for the recommendations!


I've now blogged in more detail about building Tagger News. Check it out here: https://news.ycombinator.com/item?id=14343854


Hey mate, you should follow this guide step by step when you deploy a Django app: https://docs.djangoproject.com/en/1.11/howto/deployment/chec...

BTW, congrats on the project, well done!


"The Awful Reign of the Red Delicious (2014)" (theatlantic.com) is tagged 'Microsoft' and 'Apple'.

Might wanna tweak that...



