Rediscovering the Small Web (2020) (neustadt.fr)
256 points by colinprince on Aug 31, 2024 | 64 comments


Our contribution to the small web: https://kagi.com/smallweb

The site and the list of blogs are open source, and the list is growing steadily by about 10 blogs each day (almost 15,000 at this point).

Every recent post from sites in Kagi Small Web is indexed and given preference in Kagi Search results.

How it works: https://blog.kagi.com/small-web

edit: The project just had its one thousandth commit!


https://wiby.me/ is also excellent. Someone else linked it elsewhere in the thread, but it's worth riding the coattails of the top post for anyone interested.


> Our contribution to the small web: https://kagi.com/smallweb

After opening this web page, I pressed down arrow a few times to scroll the page. At first, I didn't understand why it only scrolled a few pixels.

It looks like there's a scrollable area within a scrollable area. The outermost scrollable area only scrolls a few pixels.

This is a badly designed web page.


It's StumbleUpon without the spam problem. I like it.


> How it works: https://kagi.com/small-web

I tried to access it. It displays a different web page in a frame, which has an invalid certificate (among other things, it is expired and is for the wrong domain name), and then when I bypass the certificate error, I get a 404 error.


Mission accomplished: sounds just like the web when I was in high school! ;-)


I know the friction to add websites is the point, but might I recommend a way to add our own websites without having to promote two others? My rinkydink website qualifies, but the other small websites I know are all already on the list.


We make it high effort as we want to prevent low effort submissions - we only have limited resources available to review.


Here's my contribution to the small web: http://t3x.org


Wow, I remember discovering your page in the late 90s. Never thought I'd find it again!


Being found is the greatest problem that small web sites have these days. Glad you found it again! :)


Hey man I'm into meditation too. Nice to meet you.


This is such a useful set of links! Thank you!


Pretty much everything that is linked to is also on T3X.ORG. :)


Careful: if you show that noscript/basic (X)HTML does a good enough job, you will get attacked by hackers shadow-paid by big tech (or by idiotic ones) to force you to use their grotesquely massive and complex JavaScript web engines.

...


I also dislike the skinner box that is today's web, but do you really think it's somebody's job to attack you for having your site be a document?


Well, maybe not a static document, but as soon as you have some basic HTML forms doing a good enough job, I would not be surprised if it gets attacked by hackers shadow-paid by big tech (or by idiotic ones) to push forward their massive and complex JavaScript web engines, which they have control over.

Look at whom the crime benefits in the end.


What? This can easily be averted by adding a captcha to the form (server-side validation so no JS needed) and/or some sort of rate limiter or firewall, e.g. blocking any IP address that sends too many requests too quickly.
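
A minimal sketch of the rate-limiter idea, assuming a sliding window per client IP. The class name, the limits, and the injectable clock are illustrative choices, not something from any particular server or framework:

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each client IP.

    Hypothetical helper for illustration; the defaults are arbitrary.
    """

    def __init__(self, max_requests=5, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # ip -> timestamps of recent accepted requests

    def allow(self, ip):
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many recent requests; the server could answer 429
        hits.append(now)
        return True
```

The form handler would call `allow(client_ip)` before processing a submission and reject the request when it returns `False`. Everything runs server-side, so the page itself stays free of JavaScript.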


One of the best internet experiences I had in a while is reading (and writing!) posts on bearblog.dev, check out their discover feed. Wholesome place.

In similar spirit, check out https://ooh.directory


If I'm excused for self-promoting, I've also made some tools for discovery:

e.g.

https://search.marginalia.nu/explore/random

https://search.marginalia.nu/site/t3x.org?view=info


Here's another self-promotion.

https://alexsci.com/rss-blogroll-network/

This uses OPML blogrolls to crawl blog-to-blog recommendations. I seeded it with the blogs I follow and various planets (https://indieweb.org/planet) and then recursively followed recommendations to build an organic network. Lots of the content is tech-related, indieweb, and smallweb. It's grown to 17 languages and over 4000 RSS/atom feeds.

As an example, the linked blog has a page here [1] and it was discovered by a recommendation by [2].

[1] https://alexsci.com/rss-blogroll-network/discover/feed-12ac5...

[2] https://alexsci.com/rss-blogroll-network/discover/feed-8ecf9...
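
For readers curious what "recursively following recommendations" can look like, here is a rough sketch, not the project's actual code. It assumes each blog's blogroll is available as OPML text through a hypothetical `fetch_blogroll` callable, and it records which feed recommended each newly discovered feed:

```python
import xml.etree.ElementTree as ET
from collections import deque


def feeds_in_opml(opml_text):
    """Return the xmlUrl of every <outline> element in an OPML document."""
    root = ET.fromstring(opml_text)
    return [o.attrib["xmlUrl"] for o in root.iter("outline") if "xmlUrl" in o.attrib]


def crawl(seed_feeds, fetch_blogroll, max_feeds=1000):
    """Breadth-first walk of blog-to-blog recommendations.

    fetch_blogroll(feed_url) is a hypothetical callable (an assumption of this
    sketch) that returns the OPML text of that blog's blogroll, or None if the
    blog does not publish one.
    Returns {feed_url: recommending_feed_url}, with None for the seeds.
    max_feeds is a soft cap: it is checked before each blogroll is expanded.
    """
    discovered = {feed: None for feed in seed_feeds}
    queue = deque(seed_feeds)
    while queue and len(discovered) < max_feeds:
        feed = queue.popleft()
        opml = fetch_blogroll(feed)
        if opml is None:
            continue
        for recommended in feeds_in_opml(opml):
            if recommended not in discovered:
                discovered[recommended] = feed  # remember who recommended it
                queue.append(recommended)
    return discovered
```

Keeping the "recommended by" link is what makes it possible to show, for any feed, the recommendation through which it was found.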


This is cool!

Is the aggregate list supposed to update regularly?

https://alexsci.com/rss-blogroll-network/rss.xml


Yes, semi-regularly. I did a fresh update since your message.

To reduce storage size, only the title/link/metadata of the latest post from each feed is saved. I run the crawler manually, aiming for weekly, but sometimes less frequently. So this won't catch every post and it lags behind quite a bit. I'm hitting some hosting sites faster than I'd like, especially ones that support custom domain names, so I'm planning on fixing up the rate-limiting strategy before I put it in a daily cron job.

There's a plan for ArchiveTeam to use the RSS feed as another way of discovering blog posts to archive. I don't think it's generally useful to point your feed reader at it as there's quite a diverse collection of content.
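
One possible shape for the per-host rate limiting mentioned above, as a sketch only. The `HostThrottle` name and the 5-second interval are assumptions, and keying by hostname will not group custom domains that sit on the same hosting platform (keying by resolved IP address would be one way to approach that, also an assumption):

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Space out requests so each host is hit at most once per `min_interval` seconds."""

    def __init__(self, min_interval=5.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock  # injectable for testing
        self._next_allowed = {}  # hostname -> earliest time the next request may start

    def delay_for(self, url):
        """Return how long to sleep before fetching `url`, and reserve that slot."""
        host = urlsplit(url).hostname or ""
        now = self.clock()
        start = max(now, self._next_allowed.get(host, now))
        self._next_allowed[host] = start + self.min_interval
        return start - now
```

A crawler loop would call `time.sleep(throttle.delay_for(url))` before each fetch, so feeds on different hosts proceed immediately while repeated hits to one host are spread out.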


I just tried using Google to search for sites I see in ooh.directory, and it's very hard to get them to surface. I can take exact phrases from them, like "Scaling a Digital Panel Voltmeter": without quotes, neither Google nor Bing finds the site, and with quotes, only Bing does (https://zzncx.top/posts/scaling-a-digital-panel-voltmeter/).

Personal blogs with real information just can't be found anymore.


Marginalia, mentioned in a sibling comment, does exactly what you're looking for. I'm seeing the site as the top result without using quotes:

https://search.marginalia.nu/search?query=Scaling+a+Digital+...


It's kinda hit and miss with regards to these types of queries.

I've got better phrase matching in the pipe though, give it a few weeks and it should do this even better.


Excellent point made


The Reddit thread [1] in which the author introduces Bearblog explains the sorry state of today's Internet a bit. "What's the point of blogging if I can't track users and harvest their email addresses?"

[1] https://www.reddit.com/r/Blogging/comments/i8fmuc/%CA%95%E1%...


For me, the point of my pointless blogging is to sell dozens of books across the world with my words in them. That way, I feel like I’ve achieved immortality. No joke.


In the same spirit, here is a site devoted to getting off the centralized platforms:

https://landchad.net/


Wonderful collection of how-to’s to run your own server. Thanks for sharing.

Might I suggest (in the interest of privacy) that you give donors the option to use a Silent Payment address instead of a naked BTC address? I noticed you have a Monero address as well, so I assume you care about privacy.


Wow, this looks clean!


The web was so much more fun in the 90s.


Fun parts of the web still exist today, they're just struggling to be noticed. Arguably the biggest change since then is in signal to noise ratio.


And the algorithms we live by.

Google does not easily surface those websites. Social networks suppress posts with links.


The problem is that most of the content I'm interested in is posted in the not-so-fun parts of the web, so I feel forced to participate.


A lot of the good stuff got sucked into walled gardens. People’s personal home pages or tacky MySpace pages were definitely more fun than the current semiprofessional content scroll. Forums like this very one were mostly subsumed into Reddit. Never mind the death of the BBS (not actually the internet, I realise).


I feel like the internet needs a giant directory of indie websites. So you can actually surf around and find them.

The big modern search engines almost have to be intentionally hiding these websites because they're nearly impossible to find without using an alternative engine like wiby.me or search.marginalia.nu.


I was just going to post a comment similar to this. We've swung towards walled gardens of piles of content instead of graphs of individually curated links.

Exactly that "surfing", "webring", or "StumbleUpon" style of actually browsing within a larger context, rather than searching or push-promoting within that pile of content.


The walled gardens are better for many of the internet's main uses.

If I need to find out what vodka to buy I Google with site:reddit.com and pick the post that's obviously written by alcoholics. The small web can't touch that.


I don’t think Google hides small sites so much as people are really good at SEO for Google specifically.

Like my blog has literally 0 SEO and you’ll never find it, but a friend of mine has a blog where he does not post very often, but spends a lot of effort on SEO and it’s very easy to run into his blog.

The SEO meta destroyed small blogs.


It's impossible to say whether this is something they do, but it's worth noting that Google also has an economic incentive to mostly show commercial/ad-ridden results, as leading users to blogs with no AdSense on them makes them less money; so it would at least be in their interest to let the search results look like they do.

To fully understand Google, you need to look at them not as a service that brings websites to people, but as one that directs people to websites.


“we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.”

-- Sergey Brin and Larry Page

http://www.zdnet.com/article/google-advertising-and-search-e...


Appendix A in this paper is a gem as well:

http://infolab.stanford.edu/pub/papers/google.pdf


Contains the quote above and “The goals of the advertising business model do not always correspond to providing quality search to users.”

So what we observe in the deterioration of Google search was predicted by its creators, who made the deliberate decision to let this happen by accepting advertising.


Google went public in 2004; after that, I don't think any amount of founder idealism could have saved it from shareholder pressure. If anything, it's remarkable they held out as long as they did.


It’s so unfortunate that no one inside Google is taking any decision to clearly make things worse. It’s simply the structure of their business that is fundamentally wrong, and their founders had correctly identified the problem right at the start.


I think the game of SEO works in the favor of advertisers naturally.

Google doesn’t need to downrank small sites, it just happens.

Maybe it’s just semantics


It's not like it's impossible to combat search engine spam. The by far most effective tool is to just go after affiliate links and ad-heavy websites. Penalize those websites and 99% of the search engine spam vanishes.

Though as noted, this is not in Google's economic interest to do.



ooh.directory is fantastic. I particularly liked the stance that only a few sites are added each week, which allows one to "digest" these sites. Sadly, no new sites have been added for nearly three months. I assume this is just an instance of "life happens" (it is a single-person venture, after all), but if there were a dozen similar attempts at handpicking and cataloguing the "good web", it would not hurt.


If anyone also misses StumbleUpon, there's something similar: https://cloudhiker.net


My contribution to the small web is a lightweight blogging platform: https://lmno.lol My blog is at https://lmno.lol/alvaro

You can drag and drop your entire blog from a single markdown file https://indieweb.social/@xenodium/112265481282475542

You can read the blogs from anywhere, even terminal (no JS needed).

No need to sign up or log in to try it out. I haven't officially launched, but if you'd like to start blogging now, I'll be happy to share an invite code.


Seems like a small web deserves a small client. Why use a "big web" client to read the small web? "Big web" clients are funded by advertising or advertising companies.

Bias disclosure: I have used a text-only client for the last 30 years.


My list of shared links is here: https://www.heyhomepage.com/?module=timeline&view=sharedlist

It's basically all the sites and feeds I follow daily with the Hey Homepage built-in RSS reader. You can browse the list and click around, or download it as an OPML file.

RSS = Really Social Sites; OPML = Other People's Meaningful Links


There was a push during Covid for Gemini pages. I did that for a while, but the lack of real formatting and not being able to cross-link articles became a stopper.

You can get to some of them here:

Collaborative Directory of Geminispace: gemini://cdg.thegonz.net/

But you need a Gemini reader


One site in this vein that I hope never goes away is https://rpgclassics.com/

I discovered it as a young lad, lost while playing some RPGs on emulators in the early 2000s.


i've been publishing things as html2 pages, but not interconnected in any way. so each page (or sometimes group of pages) will be dedicated to an exploration of a single subject. i then send those pages to people who i think might be interested in them. that's all; they otherwise don't see the greater internet. of course people are free to add them to link aggregators, etc., and i don't police this practice. i simply don't care for my output to be consumed by the general public, or by llms, or by corporate media, or by whoever is not my friend or in my immediate circle of friends


Thank you for this. It has inspired me to delete my Reddit account and create an HN account. This gives me hope that the web can survive the social media era.


Well this is social media too. Beware of shifting complexities!


This is now the 7th time someone has shared this link on HN. It must be worth a read.


it was mobile devices, app-ification and social media that really started to kill the small web, kind of ironically.

and if you're a front end developer, it was Apple launching the meta viewport tag in 2007 that killed the simple front end.


"Today is September 11323, 1993"


yep. truly eternal.



