Show HN: Hacker News with Tags (histre.com)
99 points by kirubakaran on May 12, 2023 | 54 comments
Hi, I’m Kirubakaran. I’m building histre - a knowledge tool for individuals and teams.

One of the features of histre is to auto-organize your knowledge. I thought that a fun way to demo that could be to apply that to the Hacker News front page.

This page mirrors HN with tags automatically applied: https://histre.com/hn/

You can filter by or exclude multiple tags. For example, if you’re tired of posts related to ai and politics, this will remove them https://histre.com/hn/?tags=+all-ai-politics
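
To make the filter syntax concrete, here is a minimal sketch of how an expression like `+all-ai-politics` could be interpreted. The semantics are an assumption inferred from the example URLs (`+` includes a tag, `-` excludes it, `all` matches every story), and `parse_tag_filter` and `matches` are hypothetical names, not histre's actual code.

```python
import re

def parse_tag_filter(expr):
    """Split a filter such as '+all-ai-politics' into include and exclude sets."""
    include, exclude = set(), set()
    # Each token is a sign followed by a tag name containing no '+' or '-'.
    for sign, tag in re.findall(r"([+-])([^+-]+)", expr):
        (include if sign == "+" else exclude).add(tag)
    return include, exclude

def matches(story_tags, include, exclude):
    """Keep a story if it has an included tag (or 'all' is set) and no excluded tag."""
    tags = set(story_tags)
    if tags & exclude:
        return False
    return "all" in include or bool(tags & include)

include, exclude = parse_tag_filter("+all-ai-politics")
print(matches(["rust", "compiler"], include, exclude))  # kept
print(matches(["ai", "research"], include, exclude))    # filtered out
```

Two caveats with this sketch: a tag such as `c++` cannot be expressed with this tokenization, and the function expects the raw string as written in the URL, since standard query-string parsers decode `+` as a space.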

The tags for the posts are picked by gpt-3.5

You can get these tags inside Hacker News itself with these open-source browser extensions for Chrome and Firefox:

Source: https://gitlab.com/histre/hn-tags

Chrome: https://chrome.google.com/webstore/detail/hacker-news-tags/i...

Firefox: https://addons.mozilla.org/en-US/firefox/addon/hacker-news-t...

People use https://histre.com/ to keep track of all kinds of web research, make highlights, collaborate with their teams, generate documentation from chat conversations, automatically extract information from pages and create comparison tables, etc. I’m excited to be building a comprehensive knowledge tool.

If you can play with it and share your thoughts, I’d really appreciate it.




Some cool uses for this:

- Remove AI and Crypto from HN https://histre.com/hn/?tags=+all-ai-cryptocurrency

- Don't miss any Lisp news https://histre.com/hn/?tags=+emacs+lisp


It seems pretty remarkable to me that despite how relevant and ubiquitous AI is becoming (e.g. generative search is coming to Google soon, and from their latest I/O conf it's clearly a big part of nearly all other G products), Hacker News discussions on the topic are of such low quality that it's desirable to block them out entirely. Any sufficiently advanced technology discussion is indistinguishable from tragic.


Maybe it's the quantity, not the quality, that makes it desirable to filter it out. I have examples where I've filtered highly topical news just because it's so taxing to be constantly reminded of it.


It's just like the goddamn metaverse.

Since it is strongly tied into crypto shenanigans, now that we're in a crypto slump the discussions have (mercifully) slowed down. AI is still in its current hype cycle, and it will take some time for the next "winter".


A slump hype-wise, maybe, but the market is doing surprisingly well.


I already use your HN integration to share my upvotes with friends and see their upvotes inline on the HN site [0]. Here's my feed of things I've upvoted [1]. It's the only HN enhancer I use besides hnreplies [2] to get alerted when someone replies to my comments. Congrats on this new feature! Although, I wish it looked orange like HN. Also I wish it was more condensed like HN so I don't have to scroll to see all the first page posts.

[0] https://histre.com/integrations/

[1] https://histre.com/collections/b9fqzrrh/alans-hacker-news-up...

[2] https://hnreplies.com


Thanks for using that feature! Here is more information on that: https://histre.com/features/share-hackernews-upvotes/

Re making the page dense like HN: I got the same feedback from a couple of other people too. I can do that.


I like the idea here, but content isn’t getting tagged in quite enough detail to make it useful for me personally (yet). I would LOVE the ability to exclude all Twitter and Tesla posts from my feed, but it seems those posts aren’t often tagged as such.

Thanks for sharing!


Thank you! The set of tags that I pass to gpt to pick currently doesn't have company names, but I can totally see how it would be useful. I'll improve that. I've added #tesla and #twitter for now, but I'll have something more comprehensive soon.


Why is "Tesla cancels Model S and Model X deliveries in Australia and other RHD markets" labelled simply as "news"? I would tag it Tesla, cancellation, Australia, markets.

"New Directions in Cloud Programming [pdf]" is labelled pdf, programming. Should also include "cloud".

"EU's Cyber Resilience Act contains a poison pill" is tagged "security", should include "legislation", "opensource".


I'm making gpt-3.5 pick from a given list of tags. It will happily create all those tags you suggested, but I thought that if we don't make it pick from a list, then the tags could get too long-tail and as a result, not so useful. What do you think?
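
A rough sketch of what "pick from a given list" could look like: the allowed tags go into the prompt, and the model's answer is filtered against the same list so that off-list tags never reach the page. The prompt wording, the short tag subset, and the function names are all assumptions for illustration. The actual call to the model is left out.

```python
ALLOWED_TAGS = {"ai", "law", "news", "opensource", "pdf", "programming", "security", "tesla"}

def build_prompt(title, headings, allowed=ALLOWED_TAGS):
    """Build a prompt that restricts the model to a fixed tag vocabulary."""
    tag_list = ", ".join(sorted(allowed))
    return (
        "Pick the most relevant tags for this page, using only tags from the list.\n"
        f"Allowed tags: {tag_list}\n"
        f"Title: {title}\n"
        f"Headings: {'; '.join(headings)}\n"
        "Answer with a comma-separated list of tags."
    )

def clean_model_tags(raw_answer, allowed=ALLOWED_TAGS):
    """Normalize the model's answer and drop anything outside the allowed list."""
    seen, result = set(), []
    for part in raw_answer.split(","):
        tag = part.strip().lower().lstrip("#")
        if tag in allowed and tag not in seen:
            seen.add(tag)
            result.append(tag)
    return result

# An off-list suggestion ("legislation") and a duplicate are both dropped.
print(clean_model_tags("Security, #opensource, legislation, security"))
```

Filtering after the fact matters because a model can ignore the instruction and invent tags anyway; the allow-list check keeps the vocabulary closed regardless.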


I agree - but you should also add some dynamic tags - <company>, <industry> and <country> - for example are very useful.


Great idea, it would be cool to have attributes like that. I'll implement this.


I agree, the list should be somehow curated, like dev.to


You need a way to add new tags to that list. Ask GPT to suggest some new tags not on that list, then you approve those to go on the list?
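
One way this could work, as a sketch: collect the off-list tags the model suggests across many stories and only surface the ones that recur for manual approval. The threshold and the name `candidate_tags` are made up for illustration.

```python
from collections import Counter

def candidate_tags(suggestions, allowed, min_count=3):
    """Return suggested tags not yet on the list, most frequent first."""
    counts = Counter(
        tag.strip().lower()
        for tag in suggestions
        if tag.strip().lower() not in allowed
    )
    return [tag for tag, n in counts.most_common() if n >= min_count]

suggested = ["Legislation", "cloud", "legislation", "cloud", "cloud", "legislation", "ai"]
print(candidate_tags(suggested, allowed={"ai"}))
```

Requiring a minimum count keeps one-off, long-tail suggestions out of the review queue.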


Great idea, will do!


Also "I am excited to welcome Linda Yaccarino as the new CEO of Twitter" (current front-page) is tagged 'javascript', which isn't particularly relevant.


Would love to know how it decided it was JavaScript-related.


When that page was fetched using the Python requests library, it returned

    <h1>JavaScript is not available.</h1>
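
A possible guard against this, sketched under the assumption that the tagger can fall back to the HN title when the fetched HTML is only a "no JavaScript" placeholder. The phrase list and function names are hypothetical, and a phrase match like this can produce false positives on pages that merely mention the phrase.

```python
import re

# Phrases that suggest the response is a placeholder rather than real content.
PLACEHOLDER_PATTERNS = [
    re.compile(r"javascript is not available", re.I),
    re.compile(r"enable javascript", re.I),
]

def looks_like_js_placeholder(html):
    """Return True if the HTML contains a known 'no JavaScript' message."""
    return any(p.search(html) for p in PLACEHOLDER_PATTERNS)

def text_for_tagging(story_title, html):
    """Use only the HN title when the fetched page is a placeholder."""
    if looks_like_js_placeholder(html):
        return story_title
    return f"{story_title}\n{html}"

print(looks_like_js_placeholder("<h1>JavaScript is not available.</h1>"))
```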


Interesting!


I made a very similar product, but your layout is much more polished.

I use it as a personal KB too. I also crawl the top HN pages instead of mirroring them separately. I consider it another source of interesting links.

https://handlr.sapico.me/ Here is HN: http://handlr.sapico.me/Home/ByTag?Name=hn

Good luck, it's a difficult market for this type of product. I still think it's the perfect use-case, but people outside of HN/reddit? don't really "get it".

How old/big is your dataset, and how is your db handling it? I've got lots of data + tags and it tends to get a bit slower (I've optimized it a bit already, but it still needs some work).

Note: I'm dogfooding it.

PS: I think GPT-2 for the tags would be decent enough and much cheaper.


Your app looks great! Re data: we just get the front page every few minutes from https://github.com/HackerNews/API The table is almost 10MB right now, so pretty small
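
For anyone curious, polling that API can be sketched roughly like this. The endpoints are the documented Firebase ones; treating the first 30 IDs of `topstories` as "the front page" is an assumption, and `front_page` with its injectable `fetch` parameter is a hypothetical helper, not our actual code.

```python
import json
from urllib.request import urlopen

API = "https://hacker-news.firebaseio.com/v0"

def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)

def front_page(limit=30, fetch=fetch_json):
    """Return the first `limit` stories from the top-stories ranking."""
    ids = fetch(f"{API}/topstories.json")[:limit]
    stories = []
    for story_id in ids:
        item = fetch(f"{API}/item/{story_id}.json")
        if item:  # a missing item is returned as null
            stories.append(
                {"id": item["id"], "title": item.get("title"), "url": item.get("url")}
            )
    return stories
```

Calling `front_page()` on a timer every few minutes and upserting the result is enough to keep a small table current. Passing a fake `fetch` makes the function testable without network access.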


What algorithm are you using for tags? I'm using nested sets on MS SQL, FYI.

For data, there's a cron endpoint, so it's just a matter of hitting it to import from external sources.

In the case of Hacker News, I'm importing an RSS endpoint, which is probably only one of the 100 or so feeds I've been importing over the last 4 years.


Cool, thanks for sharing.

I'm not using any hierarchy in the tags. It is just a many-to-many relationship between stories and tags.
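
As an illustration, a flat many-to-many layout can be sketched with SQLite like this. The table and column names are assumptions, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE story (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Join table: one row per (story, tag) pair, no hierarchy.
CREATE TABLE story_tag (
    story_id INTEGER NOT NULL REFERENCES story(id),
    tag_id INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (story_id, tag_id)
);
""")

def tag_story(story_id, title, tags):
    """Insert a story and link it to each tag, creating tags as needed."""
    conn.execute("INSERT OR IGNORE INTO story (id, title) VALUES (?, ?)", (story_id, title))
    for name in tags:
        conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
        conn.execute(
            "INSERT OR IGNORE INTO story_tag (story_id, tag_id) "
            "SELECT ?, id FROM tag WHERE name = ?",
            (story_id, name),
        )

def stories_with_tag(name):
    """Return titles of all stories carrying the given tag, oldest id first."""
    rows = conn.execute(
        "SELECT s.title FROM story s "
        "JOIN story_tag st ON st.story_id = s.id "
        "JOIN tag t ON t.id = st.tag_id "
        "WHERE t.name = ? ORDER BY s.id",
        (name,),
    )
    return [title for (title,) in rows]

tag_story(1, "Don't miss any Lisp news", ["lisp", "emacs"])
tag_story(2, "A Lisp compiler", ["lisp", "compiler"])
print(stories_with_tag("lisp"))
```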


I would prefer to have crowdsourced user-written tags instead of AI. That would let us make genuinely useful tags, like "author is an ignoramus" or "article takes ten pages to get to the point." More of a SponsorBlock for web links.


I like that. In fact I did consider it, but I figured AI tags would be a good starting point as it doesn't have the two-sided marketplace / chicken-and-egg problem. Perhaps there's a way to do that on top of this.


Congrats on making it onto HN. I like histre, and it has huge potential as a general tagging and annotation manager for the web. Imagine two-way sync: if I use the histre extension and someone else has already tagged a link or page I am visiting, I would find that categorization useful while reading the page. Histre should be transparent, yet present everywhere, since it can be used to annotate a shared history.

Congrats and keep going! :)


Thank you so much Senthil! I really appreciate that. I agree that collective knowledge management has so much potential, yet to be unlocked.


Neat! One suggestion is that the tags are wayyyy off to the right hand side, making them less noticeable when scanning through posts. I think if you moved them underneath the title somewhere, it would make them a lot more "accessible" and likely to be used. That way I'll also see the tags as I'm browsing, which can add context subconsciously.


Thanks for the suggestion, I'll improve the layout. Do you like how it is rendered inside HN when you use the browser extension?


This looks like a very useful tool, but the contrast ratio is so low that I have a hard time looking at it. I don't know if there's a way to measure that for accessibility but I'd appreciate some sort of more contrast-y theme.


Thanks, I'll fix the contrast


I think this tool is super handy. Bookmarking it. I also liked Histre, but I'd suggest adding more information to the Histre home page about what it can do. I'm not getting the full picture from the home page. All the best!


Thanks for the feedback. I'll improve the home page.


It looks great!

I was wondering if there are any plans to extend this feature to other sections and comments.

Also, I was curious about your plans for expenses if you're using GPT-3.


Thank you! I do plan to extend this, if people are interested.

GPT-3.5 is pretty cheap and I'm able to keep the number of tokens small. https://histre.com/ has other paid features that users might find useful, so it might work out as a freemium offering.


Thank you for launching with Firefox support!


Thanks, I'm glad you appreciate it! It wasn't as straightforward as it could've been as Chrome only accepts Manifest v3 now, and Firefox has some issues with Manifest v3. The extension is simple enough that we transform v3 to v2 for Firefox in the Makefile.
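
To give an idea of what such a transform involves, here is a sketch in Python rather than the actual Makefile rule. It covers a few well-known differences between Manifest V3 and V2 (the version number, `host_permissions`, `action` vs `browser_action`, and background service workers). Which keys the real extension needs is an assumption, and `manifest_v3_to_v2` is a hypothetical helper.

```python
import copy

def manifest_v3_to_v2(v3):
    """Return a Manifest V2 version of a simple Manifest V3 dict."""
    v2 = copy.deepcopy(v3)
    v2["manifest_version"] = 2
    # V2 has no separate host_permissions key; host patterns live in permissions.
    hosts = v2.pop("host_permissions", [])
    if hosts or "permissions" in v2:
        v2["permissions"] = v2.get("permissions", []) + hosts
    # The toolbar button is `action` in V3 and `browser_action` in V2.
    if "action" in v2:
        v2["browser_action"] = v2.pop("action")
    # A V3 background service worker becomes a V2 background script.
    background = v2.get("background", {})
    if "service_worker" in background:
        v2["background"] = {"scripts": [background["service_worker"]]}
    return v2

v3 = {
    "manifest_version": 3,
    "name": "example",
    "permissions": ["storage"],
    "host_permissions": ["https://news.ycombinator.com/*"],
}
print(manifest_v3_to_v2(v3))
```

This only works because the extension is simple. Anything relying on V3-only APIs would need more than a key rewrite.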


These tags are generated by some kind of classification algorithm? Are you just using the headlines to generate them?


I extract the title, headings (h1,h2,h3), and some meta data from the page content and send that along with the prompt to gpt-3.5 to pick the relevant tags from a set of tags.
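
Roughly, the extraction step can be sketched with the standard library's `html.parser`. This is an illustration, not the production code: `OutlineParser` and `extract_outline` are hypothetical names, and which meta fields are actually sent along is an assumption.

```python
from html.parser import HTMLParser

class OutlineParser(HTMLParser):
    """Collect the title, h1-h3 headings, and named <meta> values."""
    WANTED = {"title", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.meta = {}
        self._current = None  # tag whose text is being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content"):
                self.meta[key] = attrs["content"]
        elif tag in self.WANTED:
            self._current, self._buffer = tag, []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace inside the collected text.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            elif text:
                self.headings.append(text)
            self._current = None

def extract_outline(html):
    """Return the small outline of a page that would go into the prompt."""
    parser = OutlineParser()
    parser.feed(html)
    return {"title": parser.title, "headings": parser.headings, "meta": parser.meta}
```

Sending only this outline keeps the token count small and stays within the model's length limit even for very long pages.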


So that is how you deal with the length limit? Did you just make up a list of tags?


Yes, I played around with sending the first n chars of the web page text etc., but found that sending the headings is enough to pick the tags.

I extracted the list from here as the starting point: https://lobste.rs/tags I spend a lot of time on HN haha, so I was able to expand on that list and I think the current list is pretty comprehensive. I can share the full list if you're interested.


I'd love to see the full list. Also, is there a way to filter posts using the chrome extension to exclude some tags?


Excluding via browser extension is doable. We'd need to:

1. either add a ui element to each tag to let the user exclude, or create a text input perhaps at the bottom of the page where the user can enter the tags they want excluded

2. save the above in local storage

3. after the tags for the page are fetched from the backend at https://gitlab.com/histre/hn-tags/-/blob/main/tags.js#L60 loop over the stories and hide the ones that have any of the tags

PRs welcome :-)

Here is the full list of tags as of now:

  a11y, acquisition, ai, algorithm, android, announce, api, apl, art, assembly, audio, auth, bitcoin, book, browser, c, c++, clojure, cogsci, compiler, compression, compsci, cryptocurrency, cryptography, css, culture, database, debugging, design, devops, distributed, dotnet, drugs, economy, editor, education, elixir, elm, emacs, email, energy, environment, erlang, ethereum, event, exploit, finance, fortran, freebsd, games, geography, golang, graphics, hardware, haskell, health, hiring, historical, interview, investment, ios, ipv6, java, javascript, job, julia, knowledge, kotlin, language, law, layoff, legal, linux, lisp, lua, mac, math, medical, ml, mobile, music, netbsd, networking, news, nix, nodejs, nuclear, openbsd, opensource, osdev, parallel, pdf, performance, perl, person, philosophy, php, physics, plt, politics, practices, privacy, productivity, programming, prolog, psychology, python, release, research, reversing, rss, ruby, rust, scala, scaling, science, security, shell, show, slides, space, startup, swift, systemd, tesla, testing, transcript, twitter, unix, video, vim, virtualization, visualization, wasm, web, webapp, windows, zig


How do you decide which tags get included in the ontology?


I extracted tags from here https://lobste.rs/tags and expanded on that just from having spent a lot of time on HN


The “ml” tag on lobste.rs is about SML and OCaml, not machine learning (which is under “ai”), while here it seems to be about both. I find the original lobste.rs classification more useful.


I agree 100%. Since I'm getting gpt-3.5 to apply the tags, it happened to choose ML for Machine Learning. But I can fix this by adding more context to the prompt.


Sadly the “ml” tag is applied both to posts about machine learning and to posts about the programming language. I’m interested in the latter, but not the former.


Great point, I'll fix this. I'll ask for disambiguation in the prompt passed to gpt. I'd have expected this to work, but it needs to be better: https://histre.com/hn/?tags=+ml-ai


This seems like the subject of topic modelling. Is anyone working on that? If so message me.


Great. I miss the tags a huge lot.


Too bad the font is unreadably small on mobile.


Ah! Thank you for the feedback, I'll fix this soon



