As a long-time reader of HN, I'm sometimes frustrated that top HN content often comes from popular websites or covers the same topics over and over (Zoom security issues, Facebook leaks, ...). I wanted to find out whether it was possible to get only the 'original content' posts of HN (from the not-as-popular blogs and sites), because to me this is the most interesting part of HN.
So I started analyzing new HN posts, and made a few discoveries:
- Some HN users post a lot of content, and post several links in a row, which pushes your post down if they publish right after you
- You get a 30 min time window at busy hours (and a 1h time window at non-busy hours) between the time a new link is posted and the time it disappears from the 1st page of new links
- There is a second-chance pool for good stuff if a moderator detects it
As a result, HN is flooded with content that isn't really useful, and it is not always easy for original content to get noticed (even though I think HN does a very good job compared to any other link aggregator you can find).
So I tried to build a tool to filter out things like news websites, words that I want to blacklist, and users whose posts haven't been relevant to me.
That way I'm able to remove almost 80% of the content, and I can go through the list of all the links of the day before going to bed.
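To make the three kinds of filters concrete, here is a minimal sketch of how such a check could look. It is not the tool's actual code: the blocklist values, the `is_filtered` name, and the link fields (`by`, `title`, `url`) are assumptions chosen for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical example values, not the tool's real lists.
BLOCKED_DOMAINS = {"news.example.com"}
BLOCKED_WORDS = {"zoom", "facebook"}
BLOCKED_USERS = {"some_user"}

def is_filtered(link):
    """Return True if a link matches a blocked domain, user, or title word."""
    host = urlparse(link.get("url") or "").netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host in BLOCKED_DOMAINS:
        return True
    if link.get("by") in BLOCKED_USERS:
        return True
    # Match whole words only, so "Zooming" does not trigger the "zoom" entry.
    words = set(re.findall(r"[a-z0-9]+", (link.get("title") or "").lower()))
    return bool(words & BLOCKED_WORDS)
```

Matching on whole words is a deliberate choice in this sketch: substring matching would catch more variants but would also flag unrelated titles.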
For those who are interested, this is just a cron job that queries the HN API every 3 minutes and inserts the new links into a db, plus a web server that renders the last 500 links.
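A rough sketch of what such a polling job could look like, assuming SQLite as the db and the public HN API endpoints `newstories` and `item/<id>`. The table layout and the function names (`init_db`, `ingest`, `latest`) are made up for this example and are not taken from the actual tool.

```python
import json
import sqlite3
import urllib.request

API = "https://hacker-news.firebaseio.com/v0"  # public HN API base URL

def fetch_json(path):
    """Fetch one JSON document from the HN API (needs network access)."""
    with urllib.request.urlopen(f"{API}/{path}.json", timeout=10) as resp:
        return json.load(resp)

def init_db(conn):
    # Assumed schema: one row per story.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        "id INTEGER PRIMARY KEY, author TEXT, title TEXT, url TEXT, posted_at INTEGER)"
    )

def ingest(conn, fetch=fetch_json):
    """Store stories that are not in the db yet; return how many were added."""
    added = 0
    for story_id in fetch("newstories"):
        if conn.execute("SELECT 1 FROM links WHERE id = ?", (story_id,)).fetchone():
            continue  # already stored by an earlier run
        item = fetch(f"item/{story_id}")
        if not item or item.get("type") != "story":
            continue  # skip missing items and anything that is not a story
        conn.execute(
            "INSERT INTO links (id, author, title, url, posted_at) VALUES (?, ?, ?, ?, ?)",
            (item["id"], item.get("by"), item.get("title"), item.get("url"), item.get("time")),
        )
        added += 1
    conn.commit()
    return added

def latest(conn, limit=500):
    """Return the newest stored links, newest first."""
    return conn.execute(
        "SELECT id, author, title, url FROM links "
        "ORDER BY posted_at DESC, id DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

A crontab schedule of `*/3 * * * *` would run a script like this every 3 minutes, and the web server side would only need to call something like `latest(conn)` and render the rows.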
You can see more on how the filters work here: http://hn.luap.info/about and you can also see which links have been filtered here: http://hn.luap.info/links_flagged
I absolutely love that you have more than 30 articles per page.
"Enough Machine Learning to Make Hacker News Readable Again"
Found it! (this was made by hn user
I think I used it for "737" and "Boeing" for a while.
I can't tell the difference between most removed things and the things left alone - either in terms of quality or thematically.
Having a page that tries to summarize the workings of the filter on the inputs is pretty great though - more people who propose alternative rankings/filters should think of ways to do that.
It's definitely idiosyncratic, as one would expect. A more interesting question is 'does it produce interesting results'. To my eyes and tastes, not really. The filter easily misses piles of the sort of 'news' it is trying to avoid and the quality of the rest of what passes doesn't appear to be any better (to put it mildly) than the HN front page.
Mercilessly culling even slightly frequent submitters (this includes people who, say, mis-posted something and then quickly made another post to correct the problem) is a pretty fun idea though. I wonder what you'd end up with if you applied this iteratively over a long period of time.
My goal is more to 'compete with' /newest, in the sense that I don't think it is easily possible to get the best content directly out of an algorithm. If you have some ideas, I would be interested in testing them.
Maybe, as you said, it doesn't produce interesting results, but I have the motivation to go through the full list every night and I always find some interesting content, whereas I never had the motivation to go through several pages of /newest.
Oh! That makes an awful lot of sense, thanks. I wonder if you'd have got less confused feedback if you'd described it like that initially, I think a lot of the commentators (including me) somewhat misunderstood what you're trying to do.
> www.apple.com/macbook-pro-13/
I feel like that counts as news/not original content/a popular site
That is peculiar.
The difficulty of the task is that if you don't want to miss ANY quality content, you will end up filtering almost nothing. I took the risk of missing a few good posts if that reduces the number of links to go through overall. But this is not an optimum.
> You get a 30 min time window at busy hours (and a 1h time window at non-busy hours) between the time a new link is posted and the time it disappears from the 1st page of new links
Should /newest list the last-N-hours of new stories, instead of the 30 newest stories?
- all new links (default)
- links from infrequent domains (the domain appears fewer than 2 times in the last 2 days)
- links from infrequent posters (fewer than 2 posts in the last 2 days)
- 1 link max per user
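The non-default views in the list above could be sketched roughly as follows. This is only an illustration: the link fields (`by`, `url`, `time` as Unix seconds) and all function names are assumptions, and keeping the newest link per user is one possible reading of "1 link max per user".

```python
from collections import Counter
from urllib.parse import urlparse

TWO_DAYS = 2 * 24 * 3600  # the 2-day window from the list, in seconds

def domain_of(url):
    """Return the host of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def recent(links, now, window=TWO_DAYS):
    """Keep only links posted within the window ending at `now`."""
    return [link for link in links if now - link["time"] <= window]

def infrequent_domains(links, now, limit=2):
    """Links whose domain appears fewer than `limit` times in the window."""
    pool = recent(links, now)
    counts = Counter(domain_of(link["url"]) for link in pool)
    return [link for link in pool if counts[domain_of(link["url"])] < limit]

def infrequent_posters(links, now, limit=2):
    """Links whose poster has fewer than `limit` posts in the window."""
    pool = recent(links, now)
    counts = Counter(link["by"] for link in pool)
    return [link for link in pool if counts[link["by"]] < limit]

def one_per_user(links):
    """Keep a single link per user (assumption: the newest one)."""
    seen = set()
    kept = []
    for link in sorted(links, key=lambda item: item["time"], reverse=True):
        if link["by"] not in seen:
            seen.add(link["by"])
            kept.append(link)
    return kept
```

With a threshold of "fewer than 2", a domain or poster is kept only if it shows up exactly once in the window, which makes these views quite strict.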
A list of the last N hours of stories would be good, but a bit overwhelming.
I'd kill for a general form of that that worked uniformly on sites like HN, reddit, etc., and perhaps random forums, comment sections, and so on. The new interfaces are nice in many ways, but that was a true killer feature, and it's pretty much lost.
The goal would be to focus on small and light websites which is what I like the most. I doubt it would work effectively though.
Thanks for making this!
It's an online RSS reader that I built, and one of its features might be of interest to you: it automatically groups related news articles with the items in your RSS feeds.
That means that for most articles in your feeds, you have the original item from the website you subscribed to, but you also have a list of articles from different sources talking about the same story.
A nice side effect is that it can help avoid filter bubbles by giving more context to the stories you read.
He posted more than 2 links in the last hour
He posted more than 5 links in the last 5 days
He has posted more than 5 links in the last 30 days, and 30% of those posts were flagged
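Read as code, the three rules above might look like the sketch below. Two things are assumptions here, since the rules do not say: that any single rule is enough to filter a user, and that "30%" means "at least 30%". The `is_frequent_poster` name and the post fields (`time` in Unix seconds, `flagged` as a bool) are hypothetical.

```python
HOUR = 3600
DAY = 24 * HOUR

def count_since(posts, now, window):
    """Number of posts made within `window` seconds before `now`."""
    return sum(1 for post in posts if now - post["time"] <= window)

def is_frequent_poster(posts, now):
    """Apply the three rules to one user's posts; any match filters the user."""
    if count_since(posts, now, HOUR) > 2:
        return True  # rule 1: more than 2 links in the last hour
    if count_since(posts, now, 5 * DAY) > 5:
        return True  # rule 2: more than 5 links in the last 5 days
    last_month = [post for post in posts if now - post["time"] <= 30 * DAY]
    if len(last_month) > 5:
        # rule 3: more than 5 links in 30 days and at least 30% of them flagged
        flagged = sum(1 for post in last_month if post["flagged"])
        return flagged / len(last_month) >= 0.30
    return False
```

Note that under this reading someone who posts three links within an hour once a month is filtered by the first rule alone, whatever the quality of the links.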
First, not everyone here is male. I'm a woman and a demographic outlier in other ways. If you want stuff that's "different," in theory, you are looking for people like me and your criteria would probably flag me plus your implicit assumption that everyone here is male de facto reinforces the very thing you say you want to combat: Homogeneity.
I don't post links daily anymore. I did at one time when I was homeless and trying to find 2-4 good stories to post daily was my cheap hobby because it amused me to try to make it to the leader board while I was a homeless woman and it was a hobby within my budget. I made it to the leader board under my old handle about a month after I got back into housing and then I changed handles cuz reasons.
When I do post links, I tend to post a few links within about an hour because I'm checking the news as part of my daily routine and if I see anything interesting, the odds are good that's when I will see it. And I do that in part because I am a demographic outlier and I have a pretty terrible track record of trying to predict ahead of time what will fly on HN and what won't.
So I try to look for a certain level of quality and that's about it. I really, really suck at trying to predict what HN wants to read.
I also post a lot of my own stuff, which ironically gets me flak at times. Some people complain that the only thing I post is my own writing, which isn't actually true. So that kind of feedback makes me feel like I "should" be posting a certain amount of stuff not by me in order to be acceptable to the community. Though, in practice, as my life gets busier, I simply fail to post as many articles to HN because I simply don't have the time to do that.
But some people are interested in some of the things I write and some of what I write does well and makes it to the front page. Among other things, I still write about homelessness and some people here are actually interested in my perspective on that topic. So I do continue to post my stuff here and let it sink or swim based on votes because I suck at predicting what will do well.
I'm not asking or even suggesting you change your process in some way. I'm just telling you what I see from my perspective and I'm doing that because I'm an indie writer who takes Patreon and tips to support my work. Most of my sites have no ads on them and I handle things the way I do so I can give a fresh perspective on topics.
I post my own blog writing because other people almost never post my stuff. That's extremely rare and my stuff would never see the light of day if I didn't post it myself.
So to my ear it sounds kind of like you are looking for people like me and your formula for flagging stuff probably already has me flagged. Which you may be perfectly happy with. You may know who I am and you may be reading this going "Good! You are one of the people I'm tired of hearing from!"
You do with that feedback whatever the heck you want. I don't need a reply or an explanation or a justification. I don't care.
Have a good evening.
(and my previous comment intentionally committed the same mistake)
Either give objective change requests or try to explain your point. ("your" as in an arbitrary reader, not you jgwil2)
While we are typing English, many places still do not mind using a default gender in their native tongue, and a non-native speaker can end up writing like that when writing in English. Heck, even native English speakers haven't fully settled on using "they" or which of the other variations the grand-uncle to this post mentioned.
I have no clue about the original poster's nationality, but I consider it rude to presume people are using this kind of language maliciously, especially on the Internet where you can't know what is acceptable in someone's culture.
Alternatives are "they", awkward constructs like "he or she", or explicitly naming the subject every time it is mentioned ("the user").