Show HN: Blogosphere – Discover independent technical blogs (bilbof.com)
200 points by BillFranklin on April 3, 2021 | 48 comments



Hi, I think discovering blogs and articles by other engineers is quite hard. Particularly non-commercial content.

I made blogosphere last summer to improve discovery. It’s seeded with personal blogs that have at some point appeared on the HN front-page.

The intent of the site is to recommend blogs and articles to you, with the objective that it’ll get better the more you use it.

The site uses a co-occurrence matrix to recommend articles and bloggers to you based on the articles you ‘star’ and topics you follow. If you’re not logged in, the site will show you popular articles. It doesn’t track what you read, and doesn’t have google analytics. I have some ideas for improving recommendations (e.g. "people like you also like...", and improving the auto-tagging of articles), but I thought I’d get feedback on whether people want this first.
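
For anyone curious what a co-occurrence recommender can look like, here is a minimal sketch. It is not the site's actual code: the post doesn't say what is counted as co-occurring, so this assumes pairs of articles starred by the same user, and the user ids and article slugs are made up.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(stars_by_user):
    """Count how often each pair of articles is starred by the same user."""
    cooc = defaultdict(lambda: defaultdict(int))
    for starred in stars_by_user.values():
        for a, b in combinations(sorted(set(starred)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def recommend(cooc, my_stars, top_n=3):
    """Score unstarred articles by summed co-occurrence with my starred ones."""
    scores = defaultdict(int)
    for article in my_stars:
        for other, count in cooc.get(article, {}).items():
            if other not in my_stars:
                scores[other] += count
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:top_n]]


# Hypothetical data: user id -> starred article slugs.
stars_by_user = {
    "u1": ["rust-async", "sqlite-internals", "lsm-trees"],
    "u2": ["rust-async", "sqlite-internals"],
    "u3": ["sqlite-internals", "lsm-trees", "css-grid"],
}
cooc = build_cooccurrence(stars_by_user)
print(recommend(cooc, {"rust-async"}))  # ['sqlite-internals', 'lsm-trees']
```

The same counting would work with tags or topics in place of articles, which may be closer to what a site with few users needs.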

The site is at an MVP stage, so there will be bugs and missing features, but feedback on whether this is interesting to you would be useful.


> It doesn’t track what you read, and doesn’t have google analytics.

Just wanted to say thanks for minimizing tracking of users.


Wow, this is brilliant. I know my way around a computer and have programmed before, well enough to understand the discussions around here, but I am decidedly not a "tech person" by any means. I think this concept would be really great if expanded to fields outside tech.


https://findka.com/ is a similar service

> it’s seeded with personal blogs that have at some point appeared on the HN front-page.

Also, this newsletter https://hnblogs.substack.com/ (which I manage) sends you the non-commercial blog posts from HN every day, if that can help seed your algorithm.


I wonder, do you handpick every blog post or did you write some sort of a crawler?


I started with the contents of my own RSS feed, and then crawled Hacker News for popular blogs, and then crawled the feeds of those blogs (in order to get posts that didn't get upvoted on HN).

The time-consuming part was manually auditing blogs for quality (and to ensure it's a personal blog, not e.g. a BBC feed). I came up with some heuristics (e.g. blogs with github.io subdomains that got upvoted were often good, so I blanket approved those) to speed things along, but it did take a while.
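
A triage heuristic along these lines could be sketched as below. Only the github.io rule comes from the comment above; the blocklist entry, the upvote threshold, and the function name are illustrative assumptions, not the author's actual rules.

```python
from urllib.parse import urlparse

# Assumed rule set: only the github.io suffix is taken from the comment;
# the blocked host and the points threshold are made-up examples.
AUTO_APPROVE_SUFFIXES = (".github.io",)
BLOCKED_HOSTS = {"feeds.bbci.co.uk"}
MIN_POINTS = 50


def triage(feed_url, hn_points):
    """Return 'approve', 'reject', or 'manual' for a candidate feed."""
    host = (urlparse(feed_url).hostname or "").lower()
    if host in BLOCKED_HOSTS:
        return "reject"
    if host.endswith(AUTO_APPROVE_SUFFIXES) and hn_points >= MIN_POINTS:
        return "approve"
    # Everything else still goes to a human for a quality check.
    return "manual"


print(triage("https://alice.github.io/feed.xml", 120))
print(triage("https://example.com/atom.xml", 500))
```

Anything the rules can't decide falls through to manual review, so the heuristics only shrink the audit queue rather than replace it.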


This would make for an interesting... technical blog post.


Have you tried to apply some ML stuff?


Back in August I had a look at using Latent Dirichlet Allocation (http://blog.echen.me/2011/08/22/introduction-to-latent-diric...) for improving how I auto-tag articles. It looked promising but I haven't deployed it yet.

There are various recommendation approaches I'd like to try. If the site had more users (beyond the ~4 friends I showed it to) I'd like to try out a collaborative filtering approach (https://en.wikipedia.org/wiki/Collaborative_filtering).
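
As a rough idea of what that could look like, here is a small user-based collaborative filtering sketch over 'star' data. It is one simple variant (cosine similarity on binary stars), not something the site runs, and the user names and article ids are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sets of starred article ids."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend_for(user, stars, top_n=3):
    """Rank articles the user has not starred by similarity-weighted votes."""
    mine = stars[user]
    scores = {}
    for other, theirs in stars.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim == 0.0:
            continue  # no overlap, so this user's stars carry no signal
        for article in theirs - mine:
            scores[article] = scores.get(article, 0.0) + sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:top_n]]


# Hypothetical data: user -> set of starred article ids.
stars = {
    "ann": {"a1", "a2", "a3"},
    "bob": {"a1", "a2", "a4"},
    "cat": {"a5", "a6"},
    "dan": {"a3", "a5"},
}
print(recommend_for("ann", stars))  # ['a4', 'a5']
```

With only a handful of users most similarities are zero, which is why this approach needs more usage data before it beats simple popularity.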


I took a course about Bayesian stats and ML in college. We covered LDA. It's pretty awesome! I hope you can get it out and that it goes well.


Looks good. I wonder if there'd be any interest in combining with my side project, which is a search engine for user-submitted personal and independent websites (and also search as a service). I could do with a nicer user interface and more structured data like yours, and perhaps you could benefit from features like user-submitted sites and scheduled (plus on-demand for verified sites) reindexing. Details in my profile, with links to the (open source) code on the site.


Nice work, I really like this idea!


Good stuff here. I have tried to build something similar before but nothing came out of it.

1. Repeated content.

I don't want to read what I have already read somewhere else. I built a simple LSH-based filter. I experimented with a few ways to sort out text and process it. It worked.

2. Filter controversy.

I tried a Bayesian filter initially and moved to logistic regression using tf-idf. I settled on Bayesian because my dataset became very expansive. I used a news-site corpus and manual entries from reddit/HN. I used sentiment analysis using a dictionary but it worked only in very specific cases. I do like some controversial and pessimistic content.

3. Filter clickbaits.

I couldn't filter the clickbait and gave up. There are a ton of clickbait posts on HN that I loved after I read them, but a ton of them are terrible and a huge waste of time. There's no reliable way to distinguish them based on the article either. Length is not a good feature, negativity is not a good feature (I like to read strong opinions from, say, the founder of an open source analytics company criticising a big company for malpractices and how they fix those), sentence complexity is not a good feature, and a ton more.

4. Relying on user input is bad.

I read a ton of nonsense every day that I could go without knowing. I click on those links, and that is not me saying you should show me more of that. I don't want to do the manual work of training something either. It's friction and I don't like it.

Anyway, good luck on your site!
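
For reference, the repeated-content filter from point 1 can be approximated with MinHash plus banded LSH. This is a generic sketch of that technique using only the standard library, not the commenter's code; the shingle size, band/row counts, and sample texts are arbitrary assumptions.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Set of k-word shingles from lowercased text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash64(seed, shingle):
    """Deterministic 64-bit hash of a shingle under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{shingle}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(shingle_set, num_hashes=32):
    """Signature: for each seed, the minimum hash over all shingles."""
    return tuple(min(_hash64(seed, s) for s in shingle_set)
                 for seed in range(num_hashes))


def candidate_pairs(docs, bands=16, rows=2):
    """Docs that share any identical signature band become candidate pairs."""
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash(shingles(text), bands * rows)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        ordered = sorted(ids)
        pairs.update((x, y) for i, x in enumerate(ordered) for y in ordered[i + 1:])
    return pairs


# Made-up sample texts: "a" and "b" are the same post from two places.
docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy dog",
    "c": "sqlite stores every table as a b-tree on disk",
}
print(candidate_pairs(docs))
```

Here the two identical texts should come back as the only candidate pair. In practice candidates would then be confirmed with an exact Jaccard comparison, since banding trades some false positives for speed.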


Thanks for your feedback. I appreciate the tip on the LSH filter. One thing that has helped so far is restricting the content manually and with heuristics, as it prevents clickbait/controversial/malicious content. If I can implement collaborative filtering in future, I'll need to think about how to weight 'stars' from users to prevent malicious behaviour and improve recommendations, but I am getting ahead of myself :)


This discussion came up some time ago, and someone suggested a search engine. It was less a search engine than a random site finder, and its database consisted only of user-submitted sites that had been expressly approved.

I found random posts from people's garages and amazing projects, one of them being a dad's nanosecond counting clocks to prove time dilation to his kids over the weekend.

I was pretty sure I bookmarked it, but I can't find it anywhere now.


Random site finder sounds like StumbleUpon, which was my go-to source for wasting time 15 years ago. You could visit random pages for hours on end and every single one would be fascinating. You could set up an account and choose categories if you wanted or you could just do it without an account and see what it gave you.

Web rings are another thing that comes to mind. They were super popular in the 90s and often very niche. A site joined a ring and placed a widget on their page and you could click it to jump from site to random site in the same ring.

I think about these things and wonder why there seems to be fewer tools these days for that kind of random discovery. Then again, maybe I'm just nostalgic for a time in my life in which I could spend more time wasting time at a computer.


There are tools for discovery and they are better than ever. Think wiby.me and awesome-* lists.

But there is so much noise, so very very much noise and utter crap out there that it is terribly hard to find what we are looking for.

I sympathise with you. I used to be proud of my Google-fu, saying that if I came across something on the internet, I could find it again, but that's just not true anymore. It's not finding a needle in a haystack anymore; now it's finding a needle in Mount Craperest.


There was recently this discussion: https://news.ycombinator.com/item?id=26506126 (in particular https://news.ycombinator.com/item?id=26507503 and comments)

Another recent one that came up is https://news.ycombinator.com/item?id=26618000

Is the one you're looking for listed in one of the threads?


Thank you! Found it in the first link you shared!

It was wiby.me

This was the time dilation post: http://www.leapsecond.com/great2005/tour/


I'm glad it helped.



Found it in discussion linked in sibling comment:

Search engine : https://wiby.me/

Time dilation post : http://www.leapsecond.com/great2005/tour/


blogsurf.io ?


No, this wasn't it. My google-fu is also failing me.


I just started using it to find articles, and after reading a few, I already love it!

Thanks a lot for making this.


Some quick feedback on the mobile layout, there isn't a lot of the actual blog post text visible due to the margins. I would suggest allowing the url and text to span the full width of the page. Also I would increase the margin between the title and the icon and star button, and potentially remove the word "star" to give more breathing room. It might also be nice to make the sidebar buttons an overlay to give more width as well.


Someone please reboot the webring idea, include a search feature and kill google!

Something I'd love: a search engine for personal websites.


Millionshort.com allows you to leave out the top x sites (100/1000/100000/1000000) from a search engine. Often it will give me some personal blogs in the results. It's also nice for finding random information about some topics.


Great tool, Thanks! After 15 minutes trying a few obscure searches, great -not- seeing them dominated by familiar sources. (Deeper research does -not- need endless repeats of introductory stuff!)


Drew DeVault's blog uses openring to add a webring to his articles:

https://git.sr.ht/~sircmpwn/openring https://drewdevault.com/


I had a crack at that with the blog directory (https://bilbof.com/blogosphere/blogs), but it's pretty limited!

One thing I thought would be cool would be an API for similar blogs or articles, pretty much opening up an internal part of blogosphere. I think one could use that to provide a webring.


Refreshingly MVP broken design. Love it (non sarcastic, shows a focus on the value add).

But I can’t create an account on mobile (iPhone Pro Max). :(


Thanks for taking a look, sorry the login on your mobile isn't working, I've added that to my bug tracker.

My focus so far has been making sure recommendations are good enough, but it would be nice if one could use the site on a smartphone.


It's like a niche Technorati.


I love your site Bill. It’s a great idea, I use HN as a source, but the submission rate is so high that I often miss great posts. Could I suggest my blog as a possible addition please? Markgreville.ie


I have my favorites here https://bobbydreamer.com/irevere


I tried it. Already benefitted. Thanks for putting this together. Let us know how we can contribute, if at all, to the features/code.

Thanks again.


I'd like to see an RSS page for the Home and Popular feeds since I've moved all my news sources to my RSS reader.


It does not contain my blog :P


Hi Adnan! Sorry for the delay, I've added your blog: https://bilbof.com/blogosphere/blogs/33759.


Thanks, Sir! :)


How do you add a blog?


Sorry, I haven't added a way for users to add blogs yet. If you could send me a link to an RSS/Atom feed I'd be happy to add more blogs. Auditing blogs for quality is quite time-consuming, so it'd be a good feature to add. Feel free to DM me (link in bio) if you'd prefer to send me a link privately.



Love it, great work!

One recommendation I have is Jacob Kaplan-Moss' blog: https://jacobian.org/index.xml


Thanks, I've added Jacob's blog! https://bilbof.com/blogosphere/blogs/33758


Thank you for this, I'll proceed to add them to my RSS feed



