Show HN: OpenOrb, a curated search engine for Atom and RSS feeds (idiot.sh)
259 points by lowercasename 7 months ago | 56 comments
Alternative search engines are neat, as are RSS feeds. OpenOrb is a self-hosted app which allows visitors to search over a list of blogs you love. If you put your 10 favourite blogs in there, it'll search just those blogs and not show you any sponsored content or machine-generated garbage (unless... you follow blogs written by machines?)

Personal RSS feed readers can usually do this sort of thing, but RSS readers aren’t meant to be shared, so you can think of the search engine as a 'curated feed list as a public service'.

I wrote a longer blog post about OpenOrb here: https://raphael.computer/blog/openorb-curated-search-engine/




I really like the idea! At some point I put up a miniflux instance and it has surprisingly been a breath of fresh air for my content consumption. What miniflux and my setup lack is a way to retrieve stuff I read, and OpenOrb might fit the use case... I will try it out!


https://github.com/miniflux/v2 in case anyone else was also wondering


What do you mean by "retrieve stuff I read"?


sometimes I stumble into stuff I’m sure I’ve already read something about in an article, but if I didn’t bookmark it or make a note of it, it’s very hard to find again (miniflux flushes content after a certain age)


You can set the age to arbitrary points in the past, if storage isn't a concern. I've actually found miniflux's search feature fairly solid for dredging up old stuff I've read!


I've set the following in Miniflux to stop it deleting things:

CLEANUP_ARCHIVE_READ_DAYS=-1
CLEANUP_ARCHIVE_UNREAD_DAYS=-1


Ah, I see, thanks for the explanation! I'm asking because I'm working on a similar project, which lets you both search and save blog posts permanently.


Speaking for me personally, I've always felt "search my history" should be implemented in the browser, not as an external tool. "Search and save blog posts" seems like a subset of the real problem.


but does this filter out the rss feeds that are just a headline and then a "click here to read the whole story" link?

that's what killed rss. it wasn't google reader going away, it was the ad-weaponizing of the feeds themselves
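
For anyone who wants to filter such feeds out themselves, here is a rough sketch of one possible heuristic. It is not something OpenOrb is known to do. The function names, the word-count thresholds and the teaser phrases are all assumptions made up for illustration, and it only looks at RSS 2.0 `<item>` elements.

```python
import re
import xml.etree.ElementTree as ET

# Namespaced tag used by many RSS feeds for the full post body.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"
# Hypothetical teaser phrases; tune these for the feeds you follow.
TEASER = re.compile(r"read more|continue reading|click here", re.I)
TAGS = re.compile(r"<[^>]+>")


def looks_truncated(item, min_words=80):
    """Guess whether an RSS <item> carries only a headline or teaser."""
    body = item.findtext(CONTENT_ENCODED) or item.findtext("description") or ""
    text = TAGS.sub(" ", body)  # crude tag stripping, good enough for counting
    words = text.split()
    if len(words) < min_words:
        return True
    # Medium-length bodies that end in a teaser phrase are also suspicious.
    return len(words) < 3 * min_words and bool(TEASER.search(text[-200:]))


def truncated_ratio(rss_xml):
    """Fraction of items in an RSS document that look truncated."""
    flags = [looks_truncated(item) for item in ET.fromstring(rss_xml).iter("item")]
    return sum(flags) / len(flags) if flags else 0.0
```

A curated list could then skip any feed whose ratio is above some cutoff. Short but complete posts will be misflagged, so treat the result as a hint rather than a verdict.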


Someone compiled this a while ago which is a pretty good starter list for content discovery: https://github.com/outcoldman/hackernews-personal-blogs

I've imported most of them into https://app.recessfeed.com/ and found some nice ones to follow through that


I really like the idea of feed/entry search but it seems to not return very relevant results... if I search for "software defined radio" with or without quotes I get lots of results that don't have those terms in them


> If you put your 10 favourite blogs in there, it'll search just those blogs...

10 feeds will not give you much recall. I have 50K+ feeds, 1M+ posts, and it just starts to give somewhat respectable results.


Have you dumped your feed list anywhere?


No, but you can try the search here:

https://roastidio.us/search


If you'd like to read RSS in your new tab: https://tabhub.github.io/


This is a cool idea.

When I searched for "history" it returned only technical articles, and heavily favored Dan Luu's website.

Are technical blogs the primary focus?


I believe the instance currently has very few blogs indexed: https://openorb.idiot.sh/feeds

But you can deploy your own instance and add any blogs you want.


Given that techy people have a strong disposition to have a blog, more so than other demographics, there's an implicit bias toward the technical within the blogosphere, especially in its diminished state.



I've been thinking that I needed one of these, but you've already made it happen. That's really great.


Tangentially on this note, if anyone is interested, I can produce a list of every RSS feed known to the marginalia search crawler. It's a pretty noisy list, but anything I can do to help the spread, discovery and adoption of RSS I'm happy to help with, so just let me know.

I have a tool in place to export this data to help power the experimental RSS preview feature[1], but haven't had the inspiration to do much with that yet.

[1] e.g. https://search.marginalia.nu/site/jvns.ca

--edit-- Ok so there was interest. Give me a moment, I'll need to run an extraction script. Check back in a few hours or bookmark https://downloads.marginalia.nu/exports/


I would be very keen to have access to that list and to, ideally, have a go at cleaning it up and producing a topical subset for broader use in certain fields I'm interested in (e.g. all the "developer blogs", say). I offer an OPML file of several hundred engineering/dev related blogs at https://engineeringblogs.xyz/ but I'm starting to think a little bigger.
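
In case it helps anyone reuse an OPML list like that, here is a minimal sketch of pulling the feed URLs out of one with the Python standard library. It assumes the common convention of `<outline>` elements carrying an `xmlUrl` attribute; the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_text):
    """Return every xmlUrl found on <outline> elements, in document order."""
    root = ET.fromstring(opml_text)
    # Outlines can be nested inside folder outlines, so walk the whole tree.
    return [
        outline.attrib["xmlUrl"]
        for outline in root.iter("outline")
        if outline.attrib.get("xmlUrl")
    ]
```

Folder outlines without an `xmlUrl` are simply skipped, so grouped and flat OPML files both work.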


I'd be keen on this and would import all of them into Recess (https://app.recessfeed.com) - also working on RSS adoption and discovery!


Alright, about half a million RSS feeds available at: https://downloads.marginalia.nu/exports/ [select feeds.csv]

The data is, as mentioned, pretty noisy. It's a best-effort guess as to which is the canonical RSS feed for the particular domain. There doesn't appear to be any convention for specifying this, so when there are multiple feeds, a fair bit of guesswork is involved. Expect a fair number of dead URLs and lots of spam from CRMs that generate uninteresting feeds.
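
To illustrate the kind of guesswork involved, here is a small sketch, not the actual Marginalia extraction code. It collects `<link rel="alternate">` feed links from a page and picks one. The ranking rule (skip anything that looks like a comments feed, then prefer the shortest URL) and all names are assumptions made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate" ...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and attr.get("href"):
            self.candidates.append(urljoin(self.base_url, attr["href"]))


def guess_canonical_feed(html, base_url):
    """Best-effort pick among advertised feeds; returns None if there are none."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    # Hypothetical ranking: comment feeds last, then shorter URLs first.
    ranked = sorted(parser.candidates, key=lambda u: ("comment" in u.lower(), len(u)))
    return ranked[0] if ranked else None
```

Any rule like this will be wrong for some sites, which matches the "pretty noisy" caveat above.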


Isn't there a way to integrate this type of info into the actual search engine? I.e., search for type:rss or atom and return the links to the RSS feeds?

[edit] I mean, to have it closer to what OP showed.


I know the Google search console lets you upload a site map, which can be an RSS feed, so the information is readily available. I suspect Google isn't incentivised to promote RSS, especially after they killed Google Reader.


I started a submission-based platform (bao.social, but it's not currently resolving) as a side project because I missed the accessibility of RSS. I'd be keen on the list, or even just connecting with you and OP.


Feel free to shoot me an email if you want to have a chat, bounce ideas or whatever. That goes for other people as well ;-)

I'm a bit busy with finalizing my grant-funded work in the immediate future so reply times may be a bit slow, but such is life.


That would be _so_ cool! What an amazing resource that would be.


I think the community would be interested in the list and you'd get a lot of downloads if you offered it up.


Nice! I was thinking about the same kind of tool a while back, and developed a community-based curated feed reader with full-text search. It's not public yet (sign ups are behind an invitation code), but search works for guests: https://minifeed.net/global


This is super nice, and it looks like it's going to have some really great features, well beyond OpenOrb's! Excited to keep an eye on this.


Thanks for the kind words! I'm still a bit hesitant to make a "Show HN" at this time, but there are indeed potentially interesting implemented and planned features, like:

- full text search across all blogs (implemented) and across blogs user subscribed to (planned)

- subscribing to users to see the blogs they follow in your "friendfeed" (implemented)

- favorites, with contents saved to permanent storage (implemented)

- custom lists of blogs and posts (planned)

- comments (not sure about this one yet)


You can find many RSS feeds and links in my repository:

https://github.com/rumca-js/Internet-Places-Database/tree/ma...

It also contains domain lists, which include a tag indicating whether a domain is personal or not.


Great to see it’s hosted on a free software forge too, not locking in contributions!

Not sure I always agree that feeds should have the full post tho. This not only (obviously) bloats the size of the feed, but there are valid reasons to want to drive users to your site--especially if you have demos or you write about code & have your code blocks syntax highlighted (statically, never do this with JavaScript) as it provides a better reading experience. You can technically put styling in Atom/RSS but even then, a lot of readers won’t be applying the styling. That said, I definitely appreciate the full post if your site is full of trackers, ads, marketing garbage or other bloat since I can skip the site. Is this some site engineers giving us the nod on a better UX? I read a gridiron football news site & boy does that feed take a site from unusable to pleasant (good photography).


As a feed consumer I am always happy if a feed contains the full content, but I am not sure if the feed must also include all articles that a site ever published. That would basically make the feed a serialized version of the whole website (which is indeed what a few feeds that I subscribe to do by including sections that are common on personal sites like about/contact/now as items of their feed - but those are the minority). That would actually be fine as long as the archive is small or, beyond a certain size, the feed is paginated. But I am under the impression that most feed generators do not have pagination in mind, and I don't know how well the individual aggregators and readers handle it on the consuming end.
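
For reference, on the consuming end a paged Atom feed in the style of RFC 5005 can be walked by following `rel="next"` links. This is only a sketch under some assumptions: `fetch` is a caller-supplied function that returns the XML text for a URL, `next` hrefs are treated as absolute, and the function name and `max_pages` cap are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def iter_paged_entries(first_url, fetch, max_pages=50):
    """Yield Atom <entry> elements from a feed and its rel="next" pages."""
    url, seen = first_url, set()
    # Stop on a missing next link, a loop, or after max_pages documents.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        root = ET.fromstring(fetch(url))
        yield from root.iter(ATOM + "entry")
        # Only feed-level links matter here, not links inside entries.
        next_links = [
            link.get("href")
            for link in root.findall(ATOM + "link")
            if link.get("rel") == "next"
        ]
        url = next_links[0] if next_links else None
```

Whether a given reader or aggregator actually does this is a separate question, which is the open point above.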


For a brief moment, I thought this was related to https://OrbStack.dev


I wonder when RSS will experience its "Google Search in 1997" moment? Right now it's beginning to nibble at Yahoo Directory days


That would be 2005 when Google Reader launched. RSS for people who didn't know what RSS was.


No, I mean: "Google" moment as in what Google originally was. Let me edit my original comment to say "Google Search" moment.

Basically, when Google came on the scene in 1997, it blew away Yahoo Directory. Do I have my dates right? Hahaha :)


RSS, if anything, is in decline rather than in its ascent, because in many ways it offers access to content in a way that diminishes ad views.

It's not impossible that it could come back from this state, and indeed, outside of this issue, there's nothing wrong with it as a system, and podcasts make heavy use of it. But it's worth being aware of this headwind.


rss is the advertising.

It allows me to conveniently keep track of tens of thousands of websites.

If you don't have a feed, no problem. I'll just read something else.

With few exceptions I can't be bothered to keep looking at a web page hoping something new has happened.


RSS advertises the content, but not the actual sponsors of such content (i.e. commercial ads). It's also pretty hard to make it track readers.

That's why the likes of Meta and Google just don't like it.


you want to advertise your sponsors in your advertising??


No I totally know it's in ascent, that's my point! Haha! :) Hmm, how to express what I'm saying more clearly -- seems it's been missed? Haha! :)

I mean, like RSS seems like where the web was in 1996 - on the ascent! - waiting for its "Google Search" moment, whereas these types of RSS curations in this product and others like it recently are a little bit like Yahoo Directory!


> No I totally know it's in ascent, that's my point! Haha! :)

How do you "know" this? Show some proof! RSS has two well-known use cases: news and podcasts. It is fighting a pitched battle against players with deep pockets who want you to consume content where they can monetize it with ads.

Google Reader survived for as long as it did because such a service is incredibly cheap to run. Google only ended it to push people to Google+. Many of the various competing providers that popped up during that period are still around, but I would not say it is flourishing.

This is what Google thinks of RSS:

https://trends.google.com/trends/explore?date=all&geo=US&q=R...

Note a rise and plateau centered around 2005 and a brief peak in 2013 (when Google killed Reader).


I agree with your view, but if we put down our old greybeard hats for a minute - isn't it nice to see a new generation of people potentially getting excited about RSS? The parent comment is clearly by an optimistic youngster, who has just discovered an awesome technology that (he thinks) could change the world. And maybe it can! Just because we've seen it beaten once (well, a few times), it doesn't mean it's dead, and maybe, just maybe, there is something we can't see that will be the real RSS killer app.

Take podcasting - when RSS was first devised, nobody thought of such a use-case; it just happened that the media-attachment hacks tacked on top of it merged, at a particular time and place, with some other emerging tech (the iPod), creating something so good that it's still around.


I keep wanting to build a PICS rating service and (humorously) add it to my feeds.

https://www.w3.org/PICS/

The use as parental control is clutter to me. Referring to content selection as filtering was a terrible idea.

The content labels can exist on a different website, they can live in the html document, and there is an rss element specifically for it.

You make up your own rating label and put a score behind it. There was an example, for instance, that rated by Canadianness:

https://www.w3.org/TR/REC-PICS-labels/#Example

As usual they spend a lot of words explaining something simple.

In its almost most simple form:

(PICS-1.1 "http://www.example.org/ratingservice" label for "http://www.example.com/foobar" rating (javascript 5 php 4 mysql 6 bloomfilter 10))

Looks a lot like hand crafted weights to me. If widely adopted you could make a fascinating tag cloud from your 100k rss subscriptions.
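
As a sketch of how little parsing that simple form needs, here is a small Python function that turns the label above into a dict of weights. It only handles this one flat shape, it accepts both `label`/`rating` as written above and the `labels`/`ratings` spelling that I believe the spec uses, and the function name is made up for illustration.

```python
import re

# Matches only the flat single-service, single-target form shown above.
PICS_SIMPLE = re.compile(
    r'\(PICS-1\.1\s+"(?P<service>[^"]+)"\s+labels?\s+for\s+'
    r'"(?P<target>[^"]+)"\s+ratings?\s+\((?P<pairs>[^)]*)\)\s*\)'
)


def parse_pics_label(label):
    """Return (service_url, target_url, {category: score}) for a simple label."""
    match = PICS_SIMPLE.search(label)
    if match is None:
        raise ValueError("not a label in the simple form handled here")
    tokens = match.group("pairs").split()
    # Tokens alternate between category name and numeric score.
    ratings = {name: float(score) for name, score in zip(tokens[0::2], tokens[1::2])}
    return match.group("service"), match.group("target"), ratings
```

Summing those dicts across all the posts in your subscriptions would give the weights for the tag cloud.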


> The parent comment is clearly by an optimistic youngster

My only objection is when the "youngster" had his viewpoint questioned, his response was "no I totally know it's in ascent". Objective evidence points in the opposite direction.


Youth be youth, innit ;)


Hahah yeah based on your misinterpreted comment thread hahaah! :)


Ahahaha, yeah the original cykros comment was "If anything RSS is in ascent" because he took my parent comment to mean it was going down somehow, then I replied and sometime later he changed his comment. Then you guys came along hahahaha! So funny hahaha :)

You know mercury is retrograde right now so there's a lotta confusion on here hahahah! :)


It's so funny how people take this stuff, it's so funny. Yeah I was responding to that vibe but it's more like the greybeards who love RSS who keep it alive, but what I see is it keeps building over time.

Hahahaah and it's so funny to see contrarian or animosity against RSS on HN because usually it's the total opposite. I guess it's just 'cause I'm saying it people love to disagree, right? Hahahah! :) so hilarious hahah omg :)


Hahaha, this is so funny because the original cykros comment was like "RSS is in ascent", and I was just agreeing with that in the course of the discussion. And then he changed hahaha! :)

So like the whole conversation became a non sequitur after that so I understand if you're confused. Hahahaha! :) The issues with async I guess hahaha so funny :)


I think you are being slightly over enthusiastic here.


Hahah! :) yeah in the context of the negativity on this thread and the changes in the comments it might seem like that but in a different context it will look right hahaah! :)



