Hi, I think discovering blogs and articles by other engineers is quite hard, particularly non-commercial content.
I made blogosphere last summer in order to improve discovery. It’s seeded with personal blogs that have at some point appeared on the HN front page.
The intent of the site is to recommend blogs and articles to you, with the objective that it’ll get better the more you use it.
The site uses a co-occurrence matrix to recommend articles and bloggers to you based on the articles you ‘star’ and topics you follow. If you’re not logged in, the site will show you popular articles. It doesn’t track what you read, and doesn’t have Google Analytics. I have some ideas for improving recommendations (e.g. “people like you also like...”, and improving the auto-tagging of articles), but I thought I’d get feedback on whether people want this first.
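For readers unfamiliar with the idea, here is a rough sketch of how a co-occurrence recommender could work. It is not the site's code: it assumes co-occurrence is counted over articles starred by the same user, and the function names and article ids are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(star_lists):
    """Count how often each pair of articles was starred by the same user."""
    cooc = defaultdict(Counter)
    for stars in star_lists:
        for a, b in combinations(sorted(set(stars)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def recommend(cooc, my_stars, k=3):
    """Rank unstarred articles by summed co-occurrence with my_stars."""
    scores = Counter()
    for article in my_stars:
        for other, count in cooc.get(article, {}).items():
            if other not in my_stars:
                scores[other] += count
    return [article for article, _ in scores.most_common(k)]


# Hypothetical data: one list of starred article ids per user.
star_lists = [
    ["rust-intro", "lsh-dedup", "sqlite-tips"],
    ["rust-intro", "lsh-dedup"],
    ["sqlite-tips", "home-lab"],
]
cooc = build_cooccurrence(star_lists)
print(recommend(cooc, {"rust-intro"}))
```

The same counting could presumably be done over topics instead of stars; the post does not say which pairs the real matrix tracks.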
The site is at an MVP stage, so there will be bugs and missing features, but feedback on whether this is interesting to you would be useful.
Wow, this is brilliant. I know my way around a computer and have programmed before, well enough to understand the discussions around here, but I am decidedly not a "tech person" by any means. I think this concept would be really great if expanded to fields outside tech.
> it’s seeded with personal blogs that have at some point appeared on the HN front-page.
Also, there is this newsletter (which I manage), https://hnblogs.substack.com/, which sends you the day's non-commercial blog posts from HN every day, if that can help seed your algo.
I started with the contents of my own RSS feed, and then crawled Hacker News for popular blogs, and then crawled the feeds of those blogs (in order to get posts that didn't get upvoted on HN).
The time-consuming part was manually auditing blogs for quality (and to ensure it's a personal blog, not e.g. a BBC feed). I came up with some heuristics (e.g. blogs with github.io subdomains that got upvoted were often good, so I blanket-approved those) to speed things along, but it did take a while.
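A heuristic like the github.io one could be sketched as below. This is only an illustration: the `triage` function, the points threshold, and the feed URLs are assumptions, not how the audit was actually done.

```python
from urllib.parse import urlparse


def triage(feed_url, hn_points, min_points=100):
    """Auto-approve upvoted github.io blogs; send everything else to a human.

    min_points is a hypothetical threshold for "got upvoted".
    """
    host = (urlparse(feed_url).hostname or "").lower()
    if host.endswith(".github.io") and hn_points >= min_points:
        return "approve"
    return "manual-review"


# Hypothetical feeds for illustration.
print(triage("https://alice.github.io/feed.xml", hn_points=250))
print(triage("https://bignews.example/rss.xml", hn_points=900))
```

Anything the rule does not cover still falls through to manual review, which matches the "speed things along" role described above.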
There are various recommendation approaches I'd like to try. If the site had more users (beyond the ~4 friends I showed it to) I'd like to try out a collaborative filtering approach (https://en.wikipedia.org/wiki/Collaborative_filtering).
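As a rough idea of what user-based collaborative filtering might look like here, the sketch below scores unseen articles by the similarity of the users who starred them. The user names, article ids, and the choice of cosine similarity over binary stars are all assumptions made for the example.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sets of starred items (binary vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(stars, user, k=3):
    """Suggest items starred by similar users that `user` has not starred."""
    mine = stars[user]
    scores = {}
    for other, theirs in stars.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim == 0:
            continue
        for item in theirs - mine:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


# Hypothetical star data: user -> set of starred article ids.
stars = {
    "ann": {"a", "b", "c"},
    "bob": {"a", "b", "d"},
    "cat": {"e", "f"},
    "dan": {"a", "d", "e"},
}
print(recommend(stars, "ann"))
```

With only a handful of users the similarity scores are very noisy, which is presumably why this needs a larger user base before it is worth trying.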
Looks good. I wonder if there'd be any interest in combining with my side project, which is a search engine for user-submitted personal and independent websites (and also search as a service). I could do with a nicer user interface and more structured data like yours, and perhaps you could benefit from features like user-submitted sites and scheduled (plus on-demand for verified sites) reindexing. Details in my profile, with links to the (open source) code on the site.
Good stuff here. I have tried to build something similar before but nothing came out of it.
1. Repeated content.
I don't want to read what I have already read from a different place. I built a simple LSH-based filter. I experimented with a few ways to sort out text and process it. It worked.
2. Filter controversy.
I tried a Bayesian filter initially and moved to logistic regression using tf-idf. I settled on the Bayesian filter because my dataset became very expansive. I used a news-site corpus and manual entries from reddit/HN. I also tried dictionary-based sentiment analysis, but it worked only in very specific cases. I do like some controversial and pessimistic content.
3. Filter clickbaits.
I couldn't filter the clickbait and gave up. There are a ton of clickbait posts on HN that I loved after reading them, but a ton of them are terrible and a huge waste of time. There is no reliable way to distinguish them based on the article either. Length is not a good feature, negativity is not a good feature (I like to read strong opinions from, say, the founder of an open source analytics company criticising a big company for malpractices and how they fix those), sentence complexity is not a good feature, and there are a ton more.
4. Relying on user input is bad.
I read a ton of nonsense every day that I could go without knowing. I click on those links, but that is not me saying you should show me more of that. I don't want to do the manual work of training something either. It's friction and I don't like it.
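For point 2 above, a Bayesian filter along these lines might look like the following minimal multinomial naive Bayes with add-one smoothing. The labels and training sentences are invented for the example; the comment does not describe the actual features or corpus format.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def train(self, text, label):
        words = tokens(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus log likelihood with add-one (Laplace) smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokens(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training examples.
nb = NaiveBayes()
nb.train("outrage as company bans critics", "controversial")
nb.train("shocking scandal sparks furious backlash", "controversial")
nb.train("notes on tuning a postgres index", "calm")
nb.train("building a small compiler in a weekend", "calm")
print(nb.classify("furious backlash over scandal"))
```

A classifier like this only learns word frequencies, so it would share the limitation described above: it cannot tell a strong but worthwhile opinion from empty outrage.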
Thanks for your feedback. I appreciate the tip on the LSH filter. One thing that has helped so far is restricting the content manually and with heuristics, as it prevents clickbait/controversial/malicious content. If I can implement collaborative filtering in future, I'll need to think about how to weight 'stars' from users to prevent malicious behaviour and improve recommendations, but I am getting ahead of myself :)
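For anyone curious what an LSH filter for repeated content might involve, here is a minimal MinHash sketch with banding. It is an assumption about the general technique, not the commenter's implementation; the signature length, band count, and shingle size are arbitrary choices.

```python
import hashlib
import re

NUM_HASHES = 32  # signature length (assumed value)
BANDS = 8        # 8 bands of 4 rows each (assumed value)


def shingles(text, n=3):
    """Set of overlapping n-word shingles from the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def minhash(text):
    """MinHash signature: the smallest hash of any shingle, per seed."""
    grams = shingles(text)
    return [
        min(hashlib.sha1(f"{seed}:{g}".encode()).digest() for g in grams)
        for seed in range(NUM_HASHES)
    ]


class DuplicateFilter:
    def __init__(self):
        self.buckets = set()

    def seen_before(self, text):
        """True if any band of the signature matches an earlier text."""
        sig = minhash(text)
        rows = NUM_HASHES // BANDS
        keys = {(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(BANDS)}
        duplicate = not keys.isdisjoint(self.buckets)
        self.buckets |= keys
        return duplicate


# Hypothetical article texts.
dedup = DuplicateFilter()
article = "The quick brown fox jumps over the lazy dog near the river bank"
print(dedup.seen_before(article))   # first sighting
print(dedup.seen_before(article))   # exact repeat
```

Near-duplicates are caught only probabilistically: more bands with fewer rows makes the filter more eager to flag matches, and fewer bands with more rows makes it stricter.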
Some time ago, when this discussion came up, someone suggested a search engine. It was not so much a search engine as a random site finder, and the database consisted only of user-submitted sites that had been expressly approved.
I found random posts from people's garages and amazing projects, one of them being a dad's nanosecond counting clocks to prove time dilation to his kids over the weekend.
I was pretty sure I bookmarked it, but I don't find it anywhere now.
A random site finder sounds like StumbleUpon, which was my go-to source for wasting time 15 years ago. You could visit random pages for hours on end and every single one would be fascinating. You could set up an account and choose categories if you wanted, or you could just do it without an account and see what it gave you.
Web rings are another thing that comes to mind. They were super popular in the 90s and often very niche. A site joined a ring and placed a widget on their page and you could click it to jump from site to random site in the same ring.
I think about these things and wonder why there seem to be fewer tools these days for that kind of random discovery. Then again, maybe I'm just nostalgic for a time in my life in which I could spend more time wasting time at a computer.
There are tools for discovery and they are better than ever. Think wilby.me and awesome-* lists.
But there is so much noise, so very very much noise and utter crap out there that it is terribly hard to find what we are looking for.
I sympathise with you. I used to be proud of my Google-fu, saying that if I came across something on the internet, I could find it again, but that's just not true anymore. It's not finding a needle in a haystack anymore; now it's finding a needle in Mount Craperest.
Some quick feedback on the mobile layout: there isn't a lot of the actual blog post text visible due to the margins. I would suggest allowing the URL and text to span the full width of the page. Also, I would increase the margin between the title and the icon and star button, and potentially remove the word "star" to give more breathing room. It might also be nice to make the sidebar buttons an overlay to give more width as well.
Millionshort.com allows you to leave out the top x sites (100/1000/100000/1000000) from a search engine's results. Often it will give me some personal blogs in the results. It's also nice for finding random information about some topics.
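The "leave out the top x sites" idea can be sketched as a simple filter over search results. This is only an illustration under assumptions: the function name, the popularity ranking, and the URLs are hypothetical, and nothing here reflects how Million Short is actually built.

```python
from urllib.parse import urlparse


def drop_top_sites(result_urls, domains_by_popularity, top_n):
    """Return result_urls minus any whose host is among the top_n domains."""
    blocked = set(domains_by_popularity[:top_n])
    kept = []
    for url in result_urls:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host not in blocked:
            kept.append(url)
    return kept


# Hypothetical popularity ranking (most popular first) and search results.
ranking = ["bigportal.example", "megasite.example", "smallblog.example"]
results = [
    "https://www.bigportal.example/lsh",
    "https://smallblog.example/notes-on-lsh",
    "https://megasite.example/q?lsh",
]
print(drop_top_sites(results, ranking, top_n=2))
```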
Great tool, Thanks! After 15 minutes trying a few obscure searches, great -not- seeing them dominated by familiar sources. (Deeper research does -not- need endless repeats of introductory stuff!)
One thing I thought would be cool would be an API for similar blogs or articles, pretty much opening up an internal part of blogosphere. I think one could use that to provide a webring.
I love your site, Bill. It’s a great idea. I use HN as a source, but the submission rate is so high that I often miss great posts.
Could I suggest my blog as a possible addition please? Markgreville.ie
Sorry, I haven't added a way for users to add blogs yet. If you could send me a link to an RSS/Atom feed I'd be happy to add more blogs. Auditing blogs for quality is quite time consuming so it'd be a good feature to add. Feel free to DM me (link in bio) if you'd prefer to send me a link privately.