Show HN: Hacker News with Tags

kirubakaran · on May 12, 2023

Some cool uses for this:

- Remove AI and Crypto from HN https://histre.com/hn/?tags=+all-ai-cryptocurrency

- Don't miss any Lisp news https://histre.com/hn/?tags=+emacs+lisp
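
For anyone curious how a filter string like the ones above could be interpreted, here is a minimal sketch. It is a guess at the semantics, not histre's actual code: it assumes `+tag` means include, `-tag` means exclude, and `all` is a special name that matches every story. The function names `parse_tag_filter` and `matches` are made up for illustration. It works on the raw query value, since standard URL decoding would turn `+` into a space.

```python
import re


def parse_tag_filter(raw: str) -> tuple[set[str], set[str]]:
    """Split a raw filter such as '+all-ai-cryptocurrency' into (include, exclude)."""
    include, exclude = set(), set()
    # Assumed syntax: a sign followed by a lowercase alphanumeric tag name.
    # Tags with punctuation (e.g. 'c++') would need a richer grammar.
    for sign, tag in re.findall(r"([+-])([a-z0-9]+)", raw):
        (include if sign == "+" else exclude).add(tag)
    return include, exclude


def matches(story_tags: set[str], include: set[str], exclude: set[str]) -> bool:
    """A story is shown if it has no excluded tag and satisfies the include set."""
    if story_tags & exclude:
        return False
    # Assumption: 'all' matches everything; otherwise any included tag is enough.
    return "all" in include or bool(story_tags & include)
```

Under these assumptions, `+emacs+lisp` shows a story tagged with either `emacs` or `lisp`.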

rhtgrg · on May 12, 2023

It seems pretty remarkable to me that despite how relevant and ubiquitous AI is becoming (e.g. generative search is coming to Google soon, and from their latest I/O conf it's clearly a big part of nearly all other G products), Hacker News discussions on the topic are of such low quality that it's desirable to block them out entirely. Any sufficiently advanced technology discussion is indistinguishable from tragic.

hxugufjfjf · on May 12, 2023

Maybe it’s the quantity, not the quality, that makes it desirable to filter it out. I have examples where I’ve filtered out highly topical news just because it’s so taxing to be constantly reminded of it.

dormento · on May 12, 2023

It's just like the goddamn metaverse.

Since it is strongly tied into crypto shenanigans, now that we're in a crypto slump the discussions have (mercifully) slowed down. AI is still in its current hype cycle, and it will take some time for the next "winter".

Yiin · on May 13, 2023

slump hype-wise maybe, but market is doing surprisingly well.

welder · on May 12, 2023

I already use your HN integration to share my upvotes with friends and see their upvotes inline on the HN site [0]. Here's my feed of things I've upvoted [1]. It's the only HN enhancer I use besides hnreplies [2] to get alerted when someone replies to my comments. Congrats on this new feature! Although, I wish it looked orange like HN. Also I wish it was more condensed like HN so I don't have to scroll to see all the first page posts.

[0] https://histre.com/integrations/

[1] https://histre.com/collections/b9fqzrrh/alans-hacker-news-up...

[2] https://hnreplies.com

kirubakaran · on May 12, 2023

Thanks for using that feature! Here is more information on that: https://histre.com/features/share-hackernews-upvotes/

Re making the page dense like HN: I got the same feedback from a couple of other people too. I can do that.

escapedmoose · on May 12, 2023

I like the idea here, but content isn’t getting tagged in quite enough detail to make it useful for me personally (yet). I would LOVE the ability to exclude all Twitter and Tesla posts from my feed, but it seems those posts aren’t often tagged as such.

Thanks for sharing!

kirubakaran · on May 12, 2023

Thank you! The set of tags that I pass to gpt to pick currently doesn't have company names, but I can totally see how it would be useful. I'll improve that. I've added #tesla and #twitter for now, but I'll have something more comprehensive soon.

qwerty456127 · on May 12, 2023

Why is "Tesla cancels Model S and Model X deliveries in Australia and other RHD markets" labelled simply as "news"? I would tag it Tesla, cancellation, Australia, markets.

"New Directions in Cloud Programming [pdf]" is labelled pdf, programming. Should also include "cloud".

"EU's Cyber Resilience Act contains a poison pill" is tagged "security", should include "legislation", "opensource".

kirubakaran · on May 12, 2023

I'm making gpt-3.5 pick from a given list of tags. It will happily create all those tags you suggested, but I thought that if we don't make it pick from a list, then the tags could get too long-tail and as a result, not so useful. What do you think?
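
A rough sketch of the pick-from-a-list idea, for illustration only. The prompt wording, the short tag list, and the helper names `build_prompt` and `keep_allowed` are all assumptions, and the actual model call is left out on purpose. The point is that whatever the model replies, anything outside the fixed list is dropped, which keeps the tags from going long-tail.

```python
# A small illustrative subset; the real list is much longer.
ALLOWED_TAGS = {"ai", "law", "news", "opensource", "programming", "security"}


def build_prompt(title: str, allowed: set[str]) -> str:
    """Ask the model to choose only from the fixed vocabulary (wording is hypothetical)."""
    return (
        "Pick the tags that apply to this story. "
        "Answer with a comma-separated list using only these tags: "
        + ", ".join(sorted(allowed))
        + "\nTitle: " + title
    )


def keep_allowed(model_reply: str, allowed: set[str]) -> list[str]:
    """Normalize the reply and discard tags that are not in the allowed list."""
    picked = []
    for part in model_reply.split(","):
        tag = part.strip().lower().lstrip("#")
        if tag in allowed and tag not in picked:
            picked.append(tag)
    return picked
```

With this filter, a reply that mentions "legislation" simply loses that tag unless it is first added to the list.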

8ytecoder · on May 12, 2023

I agree - but you should also add some dynamic tags - <company>, <industry> and <country> - for example are very useful.

kirubakaran · on May 12, 2023

Great idea, it would be cool to have attributes like that. I'll implement this.

croisillon · on May 12, 2023

I agree, the list should be somehow curated, like dev.to

welder · on May 12, 2023

You need a way to add new tags to that list. Ask GPT to suggest some new tags not on that list, then you approve those to go on the list?

kirubakaran · on May 12, 2023

Great idea, will do!

giraj · on May 12, 2023

Also "I am excited to welcome Linda Yaccarino as the new CEO of Twitter" (current front-page) is tagged 'javascript', which isn't particularly relevant.

hammyhavoc · on May 13, 2023

Would love to know how it decided it was JavaScript-related.

kirubakaran · on May 13, 2023

When that page was fetched using the Python requests library, it returned

    <h1>JavaScript is not available.</h1>
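
One way to guard against this kind of mis-tagging would be to check the fetched HTML for a "needs JavaScript" placeholder before sending anything to the model. This is only a heuristic sketch; the patterns and the name `looks_like_js_placeholder` are assumptions, and a genuine article about enabling JavaScript would trip it too.

```python
import re

# Phrases that commonly appear on pages served to clients without JavaScript.
# The list is illustrative, not exhaustive.
PLACEHOLDER_PATTERNS = [
    re.compile(r"javascript is (not available|required|disabled)", re.I),
    re.compile(r"enable javascript", re.I),
]


def looks_like_js_placeholder(html: str) -> bool:
    """Return True if the page text looks like a no-JavaScript fallback page."""
    return any(p.search(html) for p in PLACEHOLDER_PATTERNS)
```

When it returns True, the tagger could fall back to the HN title alone instead of the page headings.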

hammyhavoc · on May 13, 2023

Interesting!

NicoJuicy · on May 13, 2023

I made a very similar product, but your layout is much more polished.

Using it as a personal kb too, I also crawl the top HN pages instead of mirroring it separately. I consider it another source of interesting links.

https://handlr.sapico.me/ Here is HN: http://handlr.sapico.me/Home/ByTag?Name=hn

Good luck, it's a difficult market for this type of product. I still think it's the perfect use-case, but people outside of HN/reddit? don't really "get it".

How old/big is your dataset and how is your db handling it? I've got lots of data+tags and it tends to get a bit slower (I've optimized it a bit already, but it still needs some work).

Note : dogfooding it

PS: I think GPT-2 for the tags would be decent enough and much cheaper.

kirubakaran · on May 13, 2023

Your app looks great! Re data: we just get the front page every few minutes from https://github.com/HackerNews/API The table is almost 10MB right now, so pretty small

NicoJuicy · on May 14, 2023

What algorithm are you using for tags? I'm using nested sets on MS SQL, FYI.

For data, there's a Cron endpoint. So it's just a matter of hitting it to import from external sources.

In the case of Hacker News, I'm importing an RSS endpoint, which is probably only 1 of the 100 or so feeds I've been importing over the last 4 years.

kirubakaran · on May 14, 2023

Cool, thanks for sharing.

I'm not using any hierarchy in the tags. It is just a many-to-many relationship between stories and tags.
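
For reference, a flat many-to-many layout like that can be sketched with three tables. The table and column names below are hypothetical (the actual schema isn't shown in this thread), and SQLite is used only so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE story (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Join table: one row per (story, tag) pair, no hierarchy between tags.
    CREATE TABLE story_tag (
        story_id INTEGER NOT NULL REFERENCES story(id),
        tag_id INTEGER NOT NULL REFERENCES tag(id),
        PRIMARY KEY (story_id, tag_id)
    );
""")
conn.executemany("INSERT INTO story VALUES (?, ?)", [
    (1, "New Directions in Cloud Programming [pdf]"),
    (2, "EU's Cyber Resilience Act contains a poison pill"),
])
conn.executemany("INSERT INTO tag VALUES (?, ?)", [
    (1, "pdf"), (2, "programming"), (3, "security"),
])
conn.executemany("INSERT INTO story_tag VALUES (?, ?)", [(1, 1), (1, 2), (2, 3)])


def stories_with_tag(db: sqlite3.Connection, name: str) -> list[str]:
    """Return the titles of all stories carrying the given tag."""
    rows = db.execute(
        """
        SELECT s.title
        FROM story s
        JOIN story_tag st ON st.story_id = s.id
        JOIN tag t ON t.id = st.tag_id
        WHERE t.name = ?
        ORDER BY s.id
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]
```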

causality0 · on May 12, 2023

I would prefer to have crowdsourced user-written tags instead of AI. That would let us make genuinely useful tags, like "author is an ignoramus" or "article takes ten pages to get to the point." More of a SponsorBlock for web links.

kirubakaran · on May 12, 2023

I like that. In fact I did consider it, but I figured AI tags would be a good starting point as it doesn't have the two-sided marketplace / chicken-and-egg problem. Perhaps there's a way to do that on top of this.

orsenthil · on May 12, 2023

Congrats on making it onto HN. I like histre, and it has huge potential as a general tagging and annotation manager for the web. Imagine a two-way sync: for example, I use the histre extension, and someone else has automatically tagged a link or page I am visiting; I would find that categorization useful while reading the page. Histre should be transparent, and yet in front of everything, as it can be used to annotate a shared history.

Congrats and keep going! :)

kirubakaran · on May 12, 2023

Thank you so much Senthil! I really appreciate that. I agree that collective knowledge management has so much potential, yet to be unlocked.

phailhaus · on May 12, 2023

Neat! One suggestion is that the tags are wayyyy off to the right hand side, making them less noticeable when scanning through posts. I think if you moved them underneath the title somewhere, it would make them a lot more "accessible" and likely to be used. That way I'll also see the tags as I'm browsing, which can add context subconsciously.

kirubakaran · on May 12, 2023

Thanks for the suggestion, I'll improve the layout. Do you like how it is rendered inside HN when you use the browser extension?

etra0 · on May 12, 2023

This looks like a very useful tool, but the contrast ratio is so low that I have a hard time looking at it. I don't know if there's a way to measure that for accessibility but I'd appreciate some sort of more contrast-y theme.

kirubakaran · on May 12, 2023

Thanks, I'll fix the contrast

zerop · on May 12, 2023

I think this tool is super handy. Bookmarking it. I also liked Histre, but I would suggest adding more information to the Histre home page about what it can do. I'm not getting the full picture from the home page. All the best!

kirubakaran · on May 12, 2023

Thanks for the feedback. I'll improve the home page.

hackerloom · on May 12, 2023

It looks great!

I was wondering if there are any plans to extend this feature to other sections and comments.

Also, I was curious about your plans for expenses if you're using GPT-3.

kirubakaran · on May 12, 2023

Thank you! I do plan to extend this, if people are interested.

GPT-3.5 is pretty cheap and I'm able to keep the number of tokens small. https://histre.com/ has other paid features that users might find useful, so it might work out as a freemium offering.

AlchemistCamp · on May 12, 2023

Thank you for launching with Firefox support!

kirubakaran · on May 12, 2023

Thanks, I'm glad you appreciate it! It wasn't as straightforward as it could've been as Chrome only accepts Manifest v3 now, and Firefox has some issues with Manifest v3. The extension is simple enough that we transform v3 to v2 for Firefox in the Makefile.
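
For the curious, a v3-to-v2 transform for a simple extension can be sketched roughly like this. This is not the actual Makefile step, just an illustration in Python that covers a few well-known key differences (`manifest_version`, `host_permissions`, `action`, `web_accessible_resources`). The function name and the sample manifest are made up.

```python
import copy


def manifest_v3_to_v2(v3: dict) -> dict:
    """Return a Manifest v2 style copy of a simple Manifest v3 dict."""
    v2 = copy.deepcopy(v3)  # leave the caller's manifest untouched
    v2["manifest_version"] = 2

    # v2 has no separate host_permissions key; host patterns go in permissions.
    hosts = v2.pop("host_permissions", [])
    if hosts:
        v2["permissions"] = v2.get("permissions", []) + hosts

    # v3 "action" corresponds to v2 "browser_action".
    if "action" in v2:
        v2["browser_action"] = v2.pop("action")

    # v3 lists web_accessible_resources as objects; v2 expects a flat list of paths.
    war = v2.get("web_accessible_resources")
    if war and isinstance(war[0], dict):
        v2["web_accessible_resources"] = [
            path for entry in war for path in entry.get("resources", [])
        ]
    return v2


# Hypothetical manifest, only to show the shape of the input.
example_v3 = {
    "manifest_version": 3,
    "name": "example-extension",
    "permissions": ["storage"],
    "host_permissions": ["https://news.ycombinator.com/*"],
    "action": {"default_title": "Tags"},
}
example_v2 = manifest_v3_to_v2(example_v3)
```

Extensions that rely on a background service worker or other v3-only APIs would need more than a key rename.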

PaulHoule · on May 12, 2023

These tags are generated by some kind of classification algorithm? Are you just using the headlines to generate them?

kirubakaran · on May 12, 2023

I extract the title, headings (h1,h2,h3), and some meta data from the page content and send that along with the prompt to gpt-3.5 to pick the relevant tags from a set of tags.
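
A minimal sketch of that extraction step, using only the standard library's `html.parser`. This is not the code histre runs: the class name `PageSummary`, the helper `summarize`, and the particular meta names kept are assumptions made for the example.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collect the title, h1-h3 headings, and a few meta tags from a page."""

    CAPTURE = {"title", "h1", "h2", "h3"}
    META_NAMES = {"description", "keywords", "og:title", "og:description"}

    def __init__(self):
        super().__init__()
        self.fields = {"title": [], "headings": [], "meta": {}}
        self._current = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.CAPTURE:
            self._current = tag
            self._buffer = []
        elif tag == "meta":
            attr = dict(attrs)
            name = attr.get("name") or attr.get("property")
            if name in self.META_NAMES and attr.get("content"):
                self.fields["meta"][name] = attr["content"]

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse whitespace so nested inline tags don't leave gaps.
            text = " ".join("".join(self._buffer).split())
            if text:
                key = "title" if tag == "title" else "headings"
                self.fields[key].append(text)
            self._current = None


def summarize(html: str) -> dict:
    """Return the small set of fields that would go into the tagging prompt."""
    parser = PageSummary()
    parser.feed(html)
    return parser.fields
```

Sending only these fields keeps the prompt short regardless of how long the page body is.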

PaulHoule · on May 12, 2023

So that is how you deal with the length limit? Did you just make up a list of tags?

kirubakaran · on May 12, 2023

Yes, I played around with sending the first n chars from the web page text etc., but found that sending headings is enough to pick the tags.

I extracted the list from here as the starting point: https://lobste.rs/tags I spend a lot of time on HN haha, so I was able to expand on that list and I think the current list is pretty comprehensive. I can share the full list if you're interested.

suvasco · on May 13, 2023

I'd love to see the full list. Also, is there a way to filter posts using the chrome extension to exclude some tags?

kirubakaran · on May 13, 2023

Excluding via browser extension is doable. We'd need to:

1. either add a ui element to each tag to let the user exclude, or create a text input perhaps at the bottom of the page where the user can enter the tags they want excluded

2. save the above in local storage

3. after the tags for the page are fetched from the backend at https://gitlab.com/histre/hn-tags/-/blob/main/tags.js#L60 loop over the stories and hide the ones that have any of the tags

PRs welcome :-)
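
The logic of the steps above can be sketched as follows. The extension itself is JavaScript, so this Python version only illustrates the parsing and filtering, not code that could be dropped into tags.js. The function names and the shape of `story_tags` (story id mapped to its tags) are assumptions.

```python
def parse_excluded(text: str) -> set[str]:
    """Turn a user's text input such as 'Twitter, #tesla' into a set of tag names."""
    tags = set()
    for part in text.split(","):
        tag = part.strip().lower().lstrip("#")
        if tag:
            tags.add(tag)
    return tags


def stories_to_hide(story_tags: dict[int, list[str]], excluded: set[str]) -> list[int]:
    """Return the ids of stories that carry at least one excluded tag."""
    return [
        story_id
        for story_id, tags in story_tags.items()
        if excluded.intersection(tags)
    ]
```

In the extension, the parsed set would be what gets saved to local storage, and the returned ids would be the rows to hide.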

Here is the full list of tags as of now:

  a11y, acquisition, ai, algorithm, android, announce, api, apl, art, assembly, audio, auth, bitcoin, book, browser, c, c++, clojure, cogsci, compiler, compression, compsci, cryptocurrency, cryptography, css, culture, database, debugging, design, devops, distributed, dotnet, drugs, economy, editor, education, elixir, elm, emacs, email, energy, environment, erlang, ethereum, event, exploit, finance, fortran, freebsd, games, geography, golang, graphics, hardware, haskell, health, hiring, historical, interview, investment, ios, ipv6, java, javascript, job, julia, knowledge, kotlin, language, law, layoff, legal, linux, lisp, lua, mac, math, medical, ml, mobile, music, netbsd, networking, news, nix, nodejs, nuclear, openbsd, opensource, osdev, parallel, pdf, performance, perl, person, philosophy, php, physics, plt, politics, practices, privacy, productivity, programming, prolog, psychology, python, release, research, reversing, rss, ruby, rust, scala, scaling, science, security, shell, show, slides, space, startup, swift, systemd, tesla, testing, transcript, twitter, unix, video, vim, virtualization, visualization, wasm, web, webapp, windows, zig

smilliken · on May 12, 2023

How do you decide which tags get included in the ontology?

kirubakaran · on May 12, 2023

I extracted tags from here https://lobste.rs/tags and expanded on that just from having spent a lot of time on HN

GrumpySloth · on May 12, 2023

The “ml” tag on lobste.rs is about SML and OCaml, not machine learning (which is under “ai”), while here it seems to be about both. I find the original lobste.rs classification more useful.

kirubakaran · on May 12, 2023

I agree 100%. Since I'm getting gpt-3.5 to apply the tags, it happened to choose ML for Machine Learning. But I can fix this by adding more context to the prompt.

GrumpySloth · on May 12, 2023

Sadly the “ml” tag is applied both to posts about machine learning and to posts about the programming language. I’m interested in the latter, but not the former.

kirubakaran · on May 12, 2023

Great point, I'll fix this. I'll ask for disambiguation in the prompt passed to gpt. I'd have expected this to work, but it needs to be better: https://histre.com/hn/?tags=+ml-ai

bluepoint · on May 12, 2023

This seems like the subject of topic modelling. Is anyone working on that? If so message me.

qwerty456127 · on May 12, 2023

Great. I miss the tags a huge lot.

classified · on May 12, 2023

Too bad the font is unreadably small on mobile.

kirubakaran · on May 12, 2023

Ah! Thank you for the feedback, I'll fix this soon