I extract the title, the headings (h1, h2, h3), and some metadata from the page content and send that along with the prompt to gpt-3.5, which picks the relevant tags from a set of tags.
Yes, I played around with sending the first n chars of the web page text etc., but found that sending the headings is enough to pick the tags.
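For anyone curious, here's a rough sketch of what that extraction step could look like. It's not my actual code, just an illustration using Python's standard library. The class and function names, the choice of `description`/`keywords` as the meta fields, the prompt wording, and the short tag list are all made up for the example.

```python
from html.parser import HTMLParser

# Hypothetical tag vocabulary for illustration; the real list is much longer.
CANDIDATE_TAGS = ["programming", "security", "ai", "databases", "web"]


class PageSummaryParser(HTMLParser):
    """Collects the title, h1-h3 headings, and a couple of meta tags."""

    HEADING_TAGS = {"h1", "h2", "h3"}
    META_NAMES = {"description", "keywords"}  # assumed choice of meta fields

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []    # list of (tag, text) pairs in document order
        self.meta = {}
        self._current = None  # tag whose text is currently being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in self.HEADING_TAGS:
            self._current = tag
            self._buffer = []
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in self.META_NAMES and attr.get("content"):
                self.meta[name] = attr["content"].strip()

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        # Collapse runs of whitespace inside the captured text.
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.title = text
        elif text:
            self.headings.append((tag, text))
        self._current = None


def build_prompt(html, tags=CANDIDATE_TAGS):
    """Builds the text that would be sent to the model (no API call here)."""
    parser = PageSummaryParser()
    parser.feed(html)
    parser.close()
    lines = [f"Title: {parser.title}"]
    lines += [f"{level}: {text}" for level, text in parser.headings]
    lines += [f"meta {name}: {value}" for name, value in parser.meta.items()]
    return (
        "Pick the tags that best describe this page. "
        "Answer only with tags from this list: " + ", ".join(tags) + "\n\n"
        + "\n".join(lines)
    )
```

The string returned by `build_prompt` is what would go to the model. The body text is dropped entirely, which keeps the prompt short compared with sending the first n chars of the page.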
I used the list at https://lobste.rs/tags as the starting point. I spend a lot of time on HN haha, so I was able to expand on that list, and I think the current list is pretty comprehensive. I can share the full list if you're interested.
Excluding via browser extension is doable. We'd need to:
1. either add a UI element to each tag to let the user exclude it, or create a text input, perhaps at the bottom of the page, where the user can enter the tags they want excluded