I built this small app in my spare time to aggregate books recommended on Hacker News. I personally find books recommended on HN super helpful, so this is my way of contributing back.
This book aggregation idea is not new. A bunch of sites have done similar things [1, 2, 3].
Yet one common limitation of those sites is limited recall (i.e., they fail to capture a comprehensive set of book mentions), so they don't paint an accurate picture of what the top books are. They all rely on narrow rules, e.g., looking for Amazon links. As you can see from my app, people often do not include Amazon links when recommending a book.
I wondered, why can't we just match book names? Well, not so easy. Some books have pretty short names, e.g. Meditations, or Steve Jobs. Some book names might as well be the name of a movie, e.g. Ready Player One. Simply matching the names of the books would produce a whole lot of irrelevant results.
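To make the problem concrete, here is a minimal sketch of naive name matching. It is not the app's code: the function name `naive_title_matches` and the sample comments are made up for illustration. Every sample comment triggers a match, yet only the last one is actually a book recommendation.

```python
import re

# The three titles mentioned above; the comments below are invented examples.
TITLES = ["Meditations", "Steve Jobs", "Ready Player One"]

def naive_title_matches(comment, titles=TITLES):
    """Return every title that occurs in the comment as a whole phrase (case-insensitive)."""
    found = []
    for title in titles:
        pattern = r"\b" + re.escape(title) + r"\b"
        if re.search(pattern, comment, flags=re.IGNORECASE):
            found.append(title)
    return found

comments = [
    "Steve Jobs famously returned to Apple in 1997.",               # the person, not the book
    "I watched Ready Player One last weekend.",                     # the movie, not the book
    "Marcus Aurelius' Meditations is worth rereading every year.",  # an actual recommendation
]

for comment in comments:
    print(naive_title_matches(comment), "<-", comment)
```

String matching alone cannot tell these cases apart, because the difference is in the surrounding context rather than in the title itself.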
This is where Deep Learning comes into play. Recent advances in large NLP models (transformers and BERT in particular) have made machine language understanding unprecedentedly accurate. This let me fine-tune a BERT model on a few thousand labeled HN comments and predict accurately whether each word in a comment is part of a book title, a task commonly known as Named Entity Recognition (NER).
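As a rough illustration of what per-word prediction buys you, the sketch below turns word-level labels into title strings. It assumes a BIO tagging scheme with hypothetical labels `B-BOOK`, `I-BOOK` and `O`; the post does not say which label set the model actually uses, and the labels here are written by hand rather than produced by a model.

```python
def extract_titles(tokens, labels):
    """Group consecutive tokens tagged B-BOOK / I-BOOK into title strings.

    Assumed (hypothetical) label scheme:
      B-BOOK = first word of a title, I-BOOK = continuation, O = not a title.
    An I-BOOK with no preceding B-BOOK is ignored.
    """
    titles, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B-BOOK":
            if current:                      # a new title starts right after another
                titles.append(" ".join(current))
            current = [token]
        elif label == "I-BOOK" and current:
            current.append(token)
        else:
            if current:                      # the current title just ended
                titles.append(" ".join(current))
            current = []
    if current:                              # title running to the end of the comment
        titles.append(" ".join(current))
    return titles

# Hand-written labels standing in for model output.
tokens = ["I", "loved", "Ready", "Player", "One", "more", "than", "the", "movie"]
labels = ["O", "O", "B-BOOK", "I-BOOK", "I-BOOK", "O", "O", "O", "O"]
print(extract_titles(tokens, labels))
```

The point is that the model labels words in context, so the same three words can be tagged as a title in one comment and left as `O` in another.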
As a result, my app is able to present a whole lot more results while maintaining desirable accuracy. For example, NER works pretty well on the tough examples I mentioned ([4, 5, 6]). Compared to prior sites, my app captures 9-50X more mentions and thus presents a much more complete picture of what books are recommended on HN.
Furthermore, I've made sure that the comments are presented well in the UI because the recommendations are just as useful as the books. I highlighted the mentioned book name, and used a custom NLP-based ranking function to sort the comments. These are non-trivial improvements over prior sites, which I hope you can find useful.
Nevertheless, this app is not without limitations: 1) matching book names fails when two books have the same or similar names; 2) although not often, this approach wrongly classifies some short stop-word names; and 3) sometimes NER fails to see that the commenter actually hates the book. These problems can be alleviated with more Deep Learning. For 1), one can use BERT to detect the authors mentioned, which can then be used as a filtering criterion. 2) and 3) should be fixable with more training data (currently there are only ~4,000 hand-labeled HN comments).
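The author-based fix for 1) could look roughly like the sketch below. It is only an idea, not something the app does today: `filter_by_author`, the record layout, and the placeholder titles and authors are all hypothetical, and the list of mentioned authors is assumed to come from a separate author-detection step.

```python
def filter_by_author(candidates, mentioned_authors):
    """Keep only candidate books whose author is mentioned in the comment.

    candidates: list of dicts with "title" and "author" keys (hypothetical layout).
    mentioned_authors: author names detected in the comment by some other step.
    Falls back to all candidates when no author matches, so nothing is silently dropped.
    """
    mentioned = {name.casefold() for name in mentioned_authors}
    matched = [book for book in candidates if book["author"].casefold() in mentioned]
    return matched if matched else candidates

# Two placeholder books that share a title.
candidates = [
    {"title": "Same Title", "author": "Author A"},
    {"title": "Same Title", "author": "Author B"},
]

print(filter_by_author(candidates, ["author b"]))  # narrows to Author B's book
print(filter_by_author(candidates, []))            # no author mentioned: still ambiguous
```

Falling back to the full candidate list is a design choice made here for illustration; dropping ambiguous mentions entirely would trade recall for precision.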
Lastly, I'd like to especially thank my gf who helped me label ~1,000 comments, which boosted the model accuracy by 5 percent! I also want to thank the people who create and maintain the Hacker News BigQuery dataset. And of course, thanks to everyone on HN who recommends books to others.
Hope you enjoy this app! Feedback and suggestions are welcome :)
P.S. The Amazon links are NOT sponsored. This app is free of monetization.