I was just telling a friend about a post I had read about buying cheap tools to start out. I couldn't find it at all googling, and the HN search didn't work. I saw this post, searched "cheap tools quality", and immediately found it. Thank you!
I love this. The search box had an option to autocomplete terms I had recently searched, I clicked on one (a term I was interested in, obviously) and immediately found an interesting HackerNews post. Nice!
I think HackerNews and other link aggregators (e.g. reddit) have a kind of recency problem, where there is a lot of great content, but people only see the recent stuff. This seems like a great way to uncover some of the latent value of old HackerNews content.
Is there a way to suggest content too? e.g. If you liked that, you'd probably also like X, Y, and Z.
>I think HackerNews and other link aggregators (e.g. reddit) have a kind of recency problem, where there is a lot of great content, but people only see the recent stuff.
With Reddit and HackerNews, I want a relative ranking index. I can search by the top content, but something 5th today could have more votes than the top submission of 2015 because of forum/subreddit growth.
I want them ranked by something like the ratio of views to upvotes or upvotes compared to total upvotes for that day.
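A rough sketch of the "upvotes compared to total upvotes for that day" idea, in Python. The field names (`title`, `day`, `points`) and the sample numbers are made up for illustration; this is not how HN or Reddit actually rank anything.

```python
from collections import defaultdict


def rank_by_daily_share(posts):
    """Rank posts by their share of all points awarded on the same day.

    `posts` is a list of dicts with hypothetical keys 'title', 'day', 'points'.
    """
    # Sum the points handed out on each day.
    totals = defaultdict(int)
    for post in posts:
        totals[post["day"]] += post["points"]

    # Score each post by its fraction of that day's total.
    scored = []
    for post in posts:
        day_total = totals[post["day"]]
        share = post["points"] / day_total if day_total else 0.0
        scored.append({**post, "share": share})

    return sorted(scored, key=lambda p: p["share"], reverse=True)


# Illustrative data only: a small 2015 front page vs. a much busier 2020 one.
sample = [
    {"title": "Top post of a quiet day", "day": "2015-03-01", "points": 100},
    {"title": "Runner-up, quiet day", "day": "2015-03-01", "points": 100},
    {"title": "5th place on a busy day", "day": "2020-07-25", "points": 300},
    {"title": "Rest of the busy day", "day": "2020-07-25", "points": 2700},
]
ranked = rank_by_daily_share(sample)
```

With this normalization the 100-point post from 2015 (half of that day's points) outranks the 300-point post from 2020 (a tenth of that day's points), which is the effect being asked for.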
> I want them ranked by something like the ratio of views to upvotes or upvotes compared to total upvotes for that day.
Yeah, this is definitely possible with public data. I did something similar on reveddit [1] for removed reddit content. Hovering over the graph shows the item with the highest vote ratio [3], and clicking skips to that point in time. Code is here [2] for anyone interested and I apologize in advance..
Absolutely. HN user EvanMiller had a lot to say about this, 11 years ago. His tl;dr is that the ranking score should be "the Lower bound of Wilson score confidence interval for a Bernoulli parameter."
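For anyone who wants to play with it, here is a minimal Python sketch of that lower bound. It assumes you have a count of positive ratings and a total (for example upvotes out of views, which HN does not publish), and the default `z = 1.96` is my assumption for a roughly 95% interval.

```python
import math


def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a Bernoulli parameter.

    positive: number of positive ratings (assumed input, e.g. upvotes)
    total:    number of ratings overall (assumed input, e.g. views)
    z:        normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if total == 0:
        return 0.0
    p = positive / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * total)) / total)
    # Clamp at zero to absorb floating-point error when p is 0.
    return max(0.0, (centre - margin) / (1 + z2 / total))
```

The point of the bound is that 90 out of 100 scores higher than 9 out of 10, even though the raw ratio is the same, because the larger sample gives more confidence.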
I believe HN's ranking system is extremely creative and works great for the main page and day-to-day use (my understanding is that it additionally makes use of time-decay terms for comments and stories). Like you said, it's really just historical search (i.e. algolia) that seems broken.
Agreed! The problem I see is that the score does not correlate well with the perceived quality of the community. I've been researching that for some time now and am in the process of preparing a blog article with data analysis and solution approaches.
I wasn't really saying anything about hndex, apart from that it indexes the contents of external articles submitted to HN, as you said.
The correction I was making was about HN Algolia search (which is linked from the search box at the bottom of each HN page), which only indexes content on news.ycombinator.com itself – i.e., article titles, comments and text-only posts like Show/Ask HN – but not the external content in submitted articles.
This is really useful! You might want to consider hiding the “More”-button if the current page isn’t filled up, so as to not just have an empty page when clicking it.
For some reason I couldn’t find my own blog post[0], even when searching for the embarrassing typo I made in the title - acommodating.
Very nice. Works as expected.
I stumbled upon two things that could be improved:
- Add a search button for convenient mobile use, or for when a user copy-pastes things into the search field using the mouse
- Add a comment counter on the result page. Since you index every article, a lot of them have no or very few comments.
Oh, and just a warning: depending on the jurisdiction, providing the cache could be problematic under copyright laws, since it's basically a copy of the article.
That's awesome! I searched for MuleSoft, not expecting anything in particular, and found a random fact (at the end) that certainly wasn't in the title of the article.
Bug report: in Safari on dark mode, the text in the search box is almost the same color as the background (white on white).
I searched for "red light therapy" and none of the articles matched, not even the most recent article on red light therapy that was on the homepage. Same for "red light".
Neat project. It’s a bit depressing how many of the article links don’t work anymore. For example, only the official signal vs noise article works on the first page of results for Basecamp: https://hndex.org/?q=Basecamp
How does ranking work? Apologies if I’ve missed the explanation.
Articles which have been posted to HN often end up in the archives of Archive.org and Archive.is in my experience.
The OP site also has a “cached” link for each article, don’t know if you saw that.
Also, to the maker of HNdex, consider adding links to Archive.org and Archive.is next to the cached link you have, so that readers can click through and check if they have a version of it, in case there are images, etc.
They easily could (I think it would take the average HN user no more than a couple of hours to implement this functionality), but they won't, because they're very reluctant to make changes.
This is also why HN still looks like a site from the 90s, instead of New Reddit (thank God).
It’s not all bad. We still have HN as it is now. Status quo is its own kind of success. HN is essentially feature complete, so I don’t blame them for not trying to fix what ain’t broke.
Looks like this search engine searches the _articles_ and links that have been shared, whereas hn.algolia.com only searches title, author, and the text of a post if it is a text post.
Ohhhh I first thought this was a search engine for hacker news and I was wondering why the hell anyone would reimplement this, but this is fricking cool. Tbh I wanted to implement something similar for more than hackernews, but this is a nice thing to have. I especially love that you implemented a cache :)
Seems nice. Sort and filter functionality would probably add to this, but I will bookmark this for sure and try it out as a search engine for tech topics in general.
Holy crap this is fast! At first I was mildly disappointed it wasn't a live search like hn.algolia.com, but I was seriously not expecting the results page to load so quickly. I guess goog et al. trained me to expect garbage latency in internet search forms.
This seems to be a good complement to algolia, given that it searches through the linked pages instead of comments.
Minor nitpick: would it be possible to make it give a 'past' link, to search for all discussions on a result? Some of the 'comments' take you to duplicate posts with no comments instead of the more popular cases.
Very cool implementation. I am wondering whether you have considered using something like https://github.com/cliqz-oss/adblocker, which can sit on top of a headless browser and does not require bridging to an extension.
Nice tool, though I tried a few searches, and a few searches of <same text> + ' hackernews' on google, and I gotta say I like the google results better. Search options for comments/titles/users, date range, ask/show/jobs, sorting by date/votes (weighted?), etc would be nice additions.
Great idea, I appreciate how fast it returns results! It just needs more control over the search parameters and figuring out why articles like the example I posted above aren't working, and you've got yourself a nice HN search.
Here we are in 2020, and web search is not what I’d hoped it would be. I would love to see more niche, small search tools like this rather than better general-purpose search. Building upon that, maybe an aggregator of search results from many small niche search engines. Actually, just writing that sentence reminded me of Searx [0], which I have to admit I haven’t tried in earnest but really should.
There's also the Falcon[1] Chrome extension, which does full-text indexing of your browser history, so if you read something and can't either Google it or find it in the browser's history, Falcon will broaden the search scope.
Why? Because those old posts have a "what's new" set of links. One of which contains my name. I'd suggest only searching the `<main>` element, perhaps?
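Something along these lines is what I mean, sketched with Python's standard `html.parser`. I have no idea how the site's crawler actually works, so the class and function names here are made up, and pages without a `<main>` element would need some fallback that this sketch does not include.

```python
from html.parser import HTMLParser


class MainTextExtractor(HTMLParser):
    """Collect only the text found inside <main>, ignoring nav and sidebars."""

    def __init__(self):
        super().__init__()
        self.main_depth = 0   # > 0 while inside a <main> element
        self.skip_depth = 0   # > 0 while inside <script>/<style> within <main>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.main_depth += 1
        elif self.main_depth and tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "main" and self.main_depth:
            self.main_depth -= 1
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.main_depth and not self.skip_depth and text:
            self.chunks.append(text)


def indexable_text(html: str) -> str:
    """Return the text of the <main> element(s); empty string if there is none."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up page: the name only appears in the "what's new" sidebar links.
page = (
    "<body><nav><a href='/new'>What's new: a post by Jane Doe</a></nav>"
    "<main><h1>Old post</h1><p>Actual article text.</p></main></body>"
)
text = indexable_text(page)
```

Here `text` contains only the heading and the article paragraph, so a search for the name in the sidebar would no longer match this page.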
This produces such a unique set of results for things.
Fantastic project and well worth creating!
As another commenter has noted - this totally disobeys recency bias and throws up interesting articles for a topic.
Edit - it is disappointing how many of these links 404. But even if that's the case, the headlines and intros are a set of time capsules of sorts nonetheless.
Could you add post points and when the article was posted (or how long ago)? It would help, as some topics do not age as well as others. Having sorting (most popular, most recent) and range search (last year, custom range) would also help those who want to narrow their search.
Yes, with so much context missing from the results it's pretty confusing.
Ideally for this to be useful I'd want date on HN, date of article (although I see that'd perhaps be hard to extract from unstructured pages) and also the HN points, as those are a major proxy to quality usually.
As others have said, it has a nice crisp UI and I like that it's so quick.
I would love to see the source and/or a write up about the stack used here. I've thought about making something similar, but got distracted while in the planning phase.
I don't know why but to me the background is #000 black with #fff text by default...
It's such an intense "dark mode" that I couldn't read the first full article of interest I found because the contrast was killing me, and then coming back to regular HN left me nearly blinded for a few secs
There are far fewer books in the last six or so weeks' editions. I don't know why this is: whether some scraper is not working, or whether no books are actually being mentioned.
The post in question: https://www.johndcook.com/blog/2020/07/25/worst-tool-for-the...