
Good stuff here. I have tried to build something similar before but nothing came out of it.

1. Repeated content.

I don't want to read what I have already read somewhere else. I built a simple LSH-based filter. I experimented with a few ways to sort out the text and process it. It worked.
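
A minimal sketch of what an LSH-based "already read" filter could look like, assuming MinHash signatures over word shingles with banding. The comment above does not say which LSH variant was used, so the class name `SeenFilter`, the shingle size, the number of hashes and bands, and the threshold are all illustrative assumptions:

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lowercased, whitespace-normalised)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """Seeded 64-bit hash; each seed acts as one independent hash function."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(text, num_hashes=64, k=3):
    """MinHash signature: the minimum hash of the shingle set under each seed."""
    sh = shingles(text, k)
    return tuple(min(_hash(seed, s) for s in sh) for seed in range(num_hashes))


class SeenFilter:
    """Hypothetical duplicate filter: band the signature, bucket by band, then verify."""

    def __init__(self, num_hashes=64, bands=16, threshold=0.5):
        if num_hashes % bands != 0:
            raise ValueError("num_hashes must be divisible by bands")
        self.num_hashes = num_hashes
        self.bands = bands
        self.rows = num_hashes // bands
        self.threshold = threshold  # assumed cut-off on estimated Jaccard similarity
        self.buckets = defaultdict(list)  # (band index, band values) -> signatures

    def _band_keys(self, sig):
        for b in range(self.bands):
            yield (b, sig[b * self.rows:(b + 1) * self.rows])

    def is_duplicate(self, text):
        sig = minhash_signature(text, self.num_hashes)
        for key in self._band_keys(sig):
            for other in self.buckets.get(key, []):
                # Fraction of matching positions estimates Jaccard similarity.
                estimate = sum(a == b for a, b in zip(sig, other)) / self.num_hashes
                if estimate >= self.threshold:
                    return True
        return False

    def add(self, text):
        sig = minhash_signature(text, self.num_hashes)
        for key in self._band_keys(sig):
            self.buckets[key].append(sig)


seen = SeenFilter()
seen.add("Startup raises funding to build a faster open source database engine")
```

Only items that share at least one band bucket are compared, which is what keeps the lookup cheap as the number of stored articles grows. More bands catch looser matches at the cost of more candidate comparisons.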

2. Filter controversy.

I tried a Bayesian filter initially and moved to logistic regression using tf-idf. I settled on Bayesian because my dataset became very expansive. I used a news-site corpus and manual entries from reddit/HN. I also tried dictionary-based sentiment analysis, but it worked only in very specific cases. I do like some controversial and pessimistic content.
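
A rough sketch of the kind of Bayesian text filter described above, assuming a multinomial naive Bayes over whitespace tokens with Laplace smoothing. The comment does not give the actual features or labels, so the class name, the label names, and the tiny training set below are made up purely for illustration:

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Hypothetical multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.doc_counts = Counter()              # label -> number of documents
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def log_scores(self, text):
        """Return the unnormalised log posterior for each label."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, invented training data; a real filter would need a much larger labelled corpus.
clf = NaiveBayes()
clf.train("outrage over shocking scandal", "controversial")
clf.train("shocking fight erupts online", "controversial")
clf.train("new database release notes", "ok")
clf.train("benchmarking a database index", "ok")
```

Training is just counting, so it stays cheap when the dataset grows, which fits the reason given above for settling on the Bayesian approach.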

3. Filter clickbaits.

I couldn't filter the clickbait and gave up. There is a ton of clickbait on HN that I loved after reading it, but a ton of it is terrible and a huge waste of time. There is no reliable way to tell the two apart based on the article either. Length is not a good feature, negativity is not a good feature (I like to read strong opinions from, say, the founder of an open source analytics company criticising a big company for malpractices and explaining how they fix those), sentence complexity is not a good feature, and there are a ton more.

4. Relying on user input is bad.

I read a ton of nonsense every day that I could go without knowing. I click on those links, and that is not me saying you should show me more of that. I don't want to do the manual work of training something either. It's friction and I don't like it.

Anyway, good luck with your site!




Thanks for your feedback. I appreciate the tip on the LSH filter. One thing that has helped so far is restricting the content manually and with heuristics, as it prevents clickbait/controversial/malicious content. If I can implement collaborative filtering in future, I'll need to think about how to weight 'stars' from users to prevent malicious behaviour and improve recommendations, but I am getting ahead of myself :)



