Show HN: Marginalia – Exploration Mode (marginalia.nu)
236 points by marginalia_nu on Jan 23, 2022 | 53 comments
I've been a bit obsessed with the idea of flipping through the internet a bit like you would a magazine, of undirected browsing as a discovery mechanism, and I think I'm approaching something that's beginning to feel pretty fun.

The link at the top will return results out of a pool of approximately 10,000 domains, you can refresh to get new ones. You can also explore in a directed fashion by using the 'Similar Domains'-buttons. These are not random.

A sampler, beyond the random sites offered with the head link

https://search.marginalia.nu/explore/www.amiga-news.de

https://search.marginalia.nu/explore/www.aaronsw.com

https://search.marginalia.nu/explore/therealbitcoin.org

I don't have thumbnails for all 500k domains in the database yet, but I think it's getting to a number where it's reasonably useful.




This feels like what StumbleUpon was (a positive correlation :) ) -- would you be willing to add what criteria "similar" is based on in the info box upper left? For example I have no clue what this below domain is, so would be curious as to what the algorithm uses as "similar" to show me more (keywords? links? domain names? hosting providers? tech used? country located? etc.) https://search.marginalia.nu/explore/cblgh.org


It's mostly adjacency in the link graph. I use a mix of direct neighbors and Personalized PageRank to produce the list.
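For readers unfamiliar with the technique, here is a minimal sketch of Personalized PageRank by power iteration on a toy link graph. It is not the site's implementation: the graph, the `.example` domain names, the function names, the damping factor, and the way direct neighbors are mixed in (simply ranked ahead of everything else) are all assumptions made for illustration.

```python
def personalized_pagerank(links, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank over {node: [outlinked nodes]}.

    Assumes every seed appears somewhere in the graph.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    # Teleport vector: all restart probability goes to the seed domains.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: hand its rank back to the seeds.
                for s in seeds:
                    nxt[s] += damping * rank[node] / len(seeds)
        rank = nxt
    return rank


def similar_domains(links, domain, top_n=5):
    """Hypothetical mix: direct neighbors first, then by PPR score."""
    neighbors = set(links.get(domain, []))
    scores = personalized_pagerank(links, {domain})
    candidates = [n for n in scores if n != domain]
    candidates.sort(key=lambda n: (n in neighbors, scores[n]), reverse=True)
    return candidates[:top_n]


# Toy link graph with made-up domains.
graph = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["a.example"],
}
print(similar_domains(graph, "a.example", top_n=2))
```

Because the random walk always restarts at the seed domain, scores fall off with distance from it in the link graph, which is why the results feel "adjacent" rather than globally popular.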


> This feels like what StumbleUpon was (a positive correlation :)

Also like Delicious Bookmarks in its heyday.


Yeah, there were a lot of these discovery services back in the day. It's a shame they eventually all died and/or became astroturfing platforms.


What’s to stop a new one from releasing?


Probably profitability. I feel what happened to the last bunch was they lived on hopes and prayers until they started to run out of investor money, and then they self-cannibalized in a last ditch effort to break even and then they didn't and now they're gone.

Although I run my site non-profit with a damn near zero burn-rate so for me it's all the same if I have 3 visitors per day or 3 million.


A couple of questions:

1. Are you accepting collaborators? 2. Do you have any plans to share the source?

I'd love to contribute somehow, either a) by donating some of my time and skills to improving the software, or b) by donating some infrastructure to expand the capacity / redundancy of your search engine.


I'm a bit limited by how little time I can put on this, which is the main reason why I don't currently have any plans on open sourcing it. Like I couldn't responsibly run that project, and just dumping it on github wouldn't be useful for anyone. Besides, most value comes from the database, which sadly would be fairly harmful if it came to circulate among search engine marketers.

Some sort of collaboration might be doable, but again, I don't actually have a lot of bandwidth to actually manage it. Right now I'm doing some API-level collaboration with Teclis.

I am sort of beginning to see hardware limitations though, so if you have ideas for how to get more I'd be interested to hear what you propose.


Are you running this on-premise or in a colocation / cloud environment?


> undirected browsing as a discovery mechanism

this seems to be a thoroughly underrated way of discovery and it's so disappointing when websites focus solely on search, forcing users to know and articulate what they came for.

browsing through these random sites you list is indeed a very fun -- and liberating -- experience. thank you for putting this together!


I do think a lot of recommendation algorithms these days are a bit too good at finding things that are similar to what we like. Which means you never discover things that you'll like, but are not similar to the things you've tried before. It becomes incredibly samey after a while.

The great joy of, say, flipping through a magazine or browsing a library is that they are passive, and don't know who you are, and can't adapt what you read based on what you're likely to read. So you might read something unexpected, you might discover something you didn't even know about yourself.


thing is they're often really really bad too. e.g. when i 'explore' albums on youtube music it force feeds me a limited selection of new releases based on popularity, perceived genre preference, and my geographic location, probably some other stuff as well. less than 5% of those recommendations end up being of any interest to me.

meanwhile all i really desire is a full list of releases ordered by date and just let me sift through that myself, but there seems to be no way to get that list, at least not through the regular user interface.

it's very frustrating.


For the last few months, I’ve consumed every piece of media reviewed by the FT Weekend. It’s been a mixed bag, but I’ve made some wonderful discoveries.


thanks for the suggestion -- is this what you're referring to? https://www.ft.com/arts/music/albums

i'll probably add it to my sources, but still a comprehensive list of new releases would be a dream come true.


Yes, that’s the music part. I find a comprehensive list would be overwhelmingly long, and dominated by poor quality.

I also follow their recommendations of film, radio, and TV.


undoubtedly so, but i enjoy the process of curating media myself, sifting through loads of dirt and finding those hidden gems.

a list in the style of 0day music scene releases but legal, instantly streamable, and limited to current releases would be a perfect playground in that sense, yet i can't seem to find anything remotely similar even if it exists somewhere.


Try rateyourmusic.com's list: https://rateyourmusic.com/new-music/ (sort by: date)

Also their charts by genre/year are quite cool:

https://rateyourmusic.com/charts/


more comprehensive than anything i managed to find so far -- thanks for pointing me to this, it's a useful resource!


> browsing through these random sites you list is indeed a very fun -- and liberating -- experience. thank you for putting this together!

Yeah! I really liked the concept too.

This reminds me of the early Stumble Upon or even channel surfing cable tv back when it was analog!


I really liked StumbleUpon before it kinda turned to shit. I also kinda miss the feeling of not having everything be tuned for user engagement. It's a big part why there is no vote-arrows, thumbs up, stars, et cetera involved here. You shake the snow globe and get what you get.


You posted on related topics a few weeks back with your Marginalia projects and I spent an hour browsing your sites. Making the "small web" and its creative weirdness visible again pulls on my nostalgia strings. Good work!


A year ago I made a Chrome extension that took the top 10k links on HN and randomly sent the user there upon pressing the UI button. It was heavily influenced by the original StumbleUpon. I got stuck because of CORS issues with iframes, as I wanted to further recreate some of that original experience.

My next step was creating a drop down menu for the extension with info on tags, HN data, discussion, favoriting, rating, etc. I've moved far away from Chrome extensions as mobile usage has become predominant and extensions get more and more locked down.

I like the proxy route this version takes. I imagine scale could become difficult to manage because of the weirdness of proxying. Caching seems like it would be an interesting problem as well. Would caching site data bring up legal issues? What would good server-side cache expiration times be?

Anyways, I like this way of browsing the web. Discovery continues to be a hard problem, especially as more walled gardens emerge.


This is really cool. I have 7 tabs of quirky barely known websites open after maybe less than 5 minutes of interacting with Marginalia. This is so much fun!


One really useful thing about marginalia (research project) and kagi (production grade search engine) is that they have proved that Google's, Bing's (and DDG's) problems haven't been because the old stuff had disappeared but because it wasn't prioritized.

I had begun suspecting that the reason for the mediocre results lately was that the web was in fact broken, but now I know I used to be right: it was just a matter of what they chose to prioritize.


any final fantasy series related fansites? for me there's always at least 1-2 on the list whenever i reload.


Haha, the fansite-sphere is one of like 6-7 hotspots the random function favors. Maybe there's a smidge too many right now, but I've tried to get a good mix of bits and bobs with hopefully a little bit for everyone.


I almost kinda didn't bother because of the cloudflare DDoS protection. I know that can be petty, but I wouldn't have waited if it was from a Google results page for example.


I just turned it up a notch right now for a moment, a lot of people are really aggressively bot-scraping new HN submissions for whatever reason. It's like a minor DoS every time you submit a link.

That's fine for a blog I guess, but here I perform a non-trivial calculation for each request, so I'd rather not have bot spam. (This is hosted on a computer in my living room, so I can't just scale it up)


>> This is hosted on a computer in my living room

Since you serve Swedish weather info from marginalia I'm assuming you live in Sweden, is that correct? Could you very briefly explain how you host and serve pages from your living room and what your bandwidth is?

Does your ISP get cranky when you see DOS type of traffic?

I'm a fellow hobbyist search engine dev, also from Sweden. Whenever I demonstrate my search engine by hosting in the cloud the expenses get so big I have to go offline after a short while and I've therefore been contemplating personal, living room hosting.


> Since you serve Swedish weather info from marginalia I'm assuming you live in Sweden, is that correct?

It is indeed.

> Could you very briefly explain how you host and serve pages from your living room and what your bandwidth is?

100/100 mbit municipal broadband, through Bahnhof.

> Does your ISP get cranky when you see DOS type of traffic?

Haven't heard a word from them, although you'd be surprised how far I am from saturating my line. Your average bittorrent enthusiast probably uses a lot more. I do try to not be a nuisance though. Cloudflare helps take the edge off things, as does running a local DNS cache.

> I'm a fellow hobbyist search engine dev, also from Sweden. Whenever I demonstrate my search engine by hosting in the cloud the expenses get so big I have to go offline after a short while and I've therefore been contemplating personal, living room hosting.

You might also consider server rental. Can get away with SEK 2-4k/month. My server, including UPS and other expenses is like SEK 40k, plus I expect to burn through an SSD once a year or so.


Very helpful, thank you. Which part of Sweden are you in, by the way?


Up north.


> I perform a non-trivial calculation for each request

Any reason why caching wouldn't work here? Do the results have to be different on each request instead of being cached for a short while (10 seconds)?


Oh yeah, you could probably do some sort of caching to that effect. This is just a fun toy I hacked together, so it's not super optimized.
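A minimal sketch of the short-lived caching suggested above, assuming a plain in-process dictionary is enough. The `TTLCache` class and its method names are made up for illustration, and the injectable clock is only there so expiry can be tested without sleeping.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can fake the passage of time
        self._store = {}    # key -> (time stored, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the expensive calculation
        value = compute()
        self._store[key] = (now, value)
        return value


# Usage: the lambda stands in for the per-request calculation.
cache = TTLCache(ttl_seconds=10.0)
page = cache.get_or_compute("explore/random", lambda: "rendered results")
```

The trade-off for a random-results page is that every visitor sees the same selection until the entry expires, so a refresh within the window would not produce new sites.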


I mean fair enough. I gave you the time since I came from HN and knew the risk/reward for good content was strong.

If it was more than a toy, it would need to be less aggressive.


It's awesome! I like most suggestions, it feels a bit like https://wiby.me/surprise but generally less weird.

I've often wanted to have a go at making my own search engine, and I think I would penalize any form of advertising (especially big ad networks, referral links) or tracking (Google Analytics, etc.) as these can create (or reveal) perverse incentives. This would likely get rid of most of the "SEO spam" that we see nowadays. Reading the about page[1], this seems like what you are doing here, but I can't really tell as it's light on details.

Q: would this be able to handle foreign-language sites? I don't yet have a blog/personal website, but if I did, I guess it would be mixed-language. Should I submit some of my friends' blogs, even though they might not be entirely (or at all) written in English?

A relatively new sort of search-engine junk, especially visible in non-English results from big search engines is also auto-generated (or probably machine-translated) websites, full of nonsensical content. They might be translated from genuine sites in other languages, I'm not sure. It would seem hard to fend these off, but luckily, fighting perverse incentives such as advertisement revenue probably gets rid of them too.

I also wondered if this was a curated list, and if the list was available somewhere, but it seems it's just a good old spider, and I guess that exposing too much info about the metrics might enable some to game the system? Not that marginalia is big enough to make it an attractive target, of course!

[1]: https://memex.marginalia.nu/projects/edge/about.gmi


I'm keeping a few of the details intentionally sketchy, but in general, I do think it's relatively resilient to manipulation. I'm using a Personalized PageRank which uses the opinions of a secret subset of websites to calculate a ranking. I've also selected those websites to not be particularly likely to be bribed.

Bilingual sites should be fine, I think. It will reject individual pages that don't have enough English text on them, but as long as it finds pages with English relatively easily they ought to get indexed.
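To illustrate what a per-page "enough English text" check could look like, here is one common heuristic: the share of words that are frequent English function words. How the crawler actually decides is not stated in this thread, so the stopword list, both thresholds, and the function names are assumptions.

```python
import re

# Small hand-picked set of frequent English function words (an assumption).
ENGLISH_STOPWORDS = frozenset({
    "the", "and", "of", "to", "a", "in", "is", "that", "it", "for",
    "with", "as", "on", "was", "are", "this", "be", "by", "not", "or",
})


def tokenize(text):
    # Runs of letters only, including non-ASCII letters such as "ä".
    return re.findall(r"[^\W\d_]+", text.lower())


def english_ratio(text):
    """Fraction of words that are common English function words."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(w in ENGLISH_STOPWORDS for w in words) / len(words)


def looks_english(text, min_words=20, min_ratio=0.2):
    """Accept a page only if it is long enough and English-looking enough."""
    return len(tokenize(text)) >= min_words and english_ratio(text) >= min_ratio


english = ("The quick brown fox is in the garden and it was not the first "
           "time that this fox was on the lawn with a bird")
swedish = ("Det här är en svensk text om väder och vind som inte borde "
           "räknas som engelska alls")
print(looks_english(english), looks_english(swedish))
```

Under a check like this a bilingual site gets indexed as long as some of its pages pass, which matches the behaviour described above.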


Hey this works great. Found some new and interesting sites.


Feels pretty cool, more like the traditional internet.

After a few minutes I found that I would prefer a page that is not left-aligned, something like

  #article {
    margin: 0 auto;
  }
A minor change that makes it much more comfortable to use IMHO.


Hmm, are you on mobile, desktop? What's your browser and screen resolution?

I did relatively recently redesign the whole stylesheet, so there's probably a few minor problems to iron out.


Desktop, Firefox 3840x2160 with window.devicePixelRatio = 1.5

So I run into the max-width of 160ch (which feels good), but I have a lot of whitespace on the right.


Hmm, yeah. I think I see what you mean. Good call. I've pushed a new CSS.


Cool, it looks great now. Thanks :-)


Quirky sites alleviate my disdain for humanity, somewhat. Thanks.


This is really cool, it's like being back in 1998 again when browsing the internet was exciting, bookmarked! I'll be exploring this for a while I think


The <h1> banner is "Search the internet" but are we only searching www servers?

Can we use marginalia.nu to search for servers offering other protocols like ftp?


It's been my ambition to support Gemini and Gopher down the line.


It's gem after gem after gem, this is brilliant


It looks like a book shop and I like the idea.

A nitpick though: Shouldn't the "capture in progress" pages be excluded from the random search?


Yeah, the whole thing isn't super polished, still a work in progress. There's also a few thumbnails that were captured mid-loading I'd like to improve down the line.

Right now it's a mix between domains I simply haven't captured a thumbnail for yet, and domains that for some reason won't be captured (errors, etc). Once I reduce the first category, I'll look for a way of hiding the second category.


At three of the four links I've clicked the page doesn't match the preview at all. How often do you update previews?


I don’t get it - I clicked on about 10 sites and none of them look anything like the screenshot picture?


I wanted to provide an example of what the content of the websites looks like, which you'll rarely find on the front page. So the screenshots are of URLs that are actually indexed by my search index. If you use the 'Info' link you can usually find the particular page. On the flip side, actually linking to those URLs may land you on a privacy policy or some weird deep link.

Dunno, maybe it's a confusing choice.





