The Return of the 90s Web

waltbosz · on June 18, 2020

What I miss most from the early days of the Internet is the content. It was all created with love.

My theory is that the high barrier to entry of online publishing kept all but the most determined people from creating content. As a result, the little content that was out there was usually good.

With today's monetized blogs, it is often content for content's sake. People don't try, or they write about topics they are not really interested in just to have a new post. Or often the writing is bad.

Maybe today's problem isn't the blogs, but the SEO that puts the crap blogs at the top of the search results. Or maybe I'm misremembering and the old content was crap too, or maybe my standards are higher than they were in my teenage years.

kickscondor · on June 18, 2020

People are still creating great stuff along these lines - you just won't find it through Google or Facebook or most of Reddit. Complex, interesting hypertext creations and web sites are still everywhere. But try typing "interesting hypertext" into Google or Facebook and see where it gets you. You can't search for something that's off the beaten track.

This is where directories come back in. Check some of these out:

* https://marijnflorence.neocities.org/linkroll/

* https://neonaut.neocities.org/directory/

* https://webring.xxiivv.com/ (which led me to this gem: https://dreamwiki.sixey.es/)

Competing with Google in search has become an insurmountable task. Personal directories attack from the opposite direction (human curation, no algorithm) in a way that actually puts Google far behind. It's kind of exciting and unexpected.

nextaccountic · on June 18, 2020

What we really need is a new Google, built on open principles (decentralized / peer to peer, fully free software, backed by a nonprofit), and focused on indexing the long tail of insightful content that is neglected by Google because it lacks SEO, popularity, links, and other metrics that Google finds interesting but we don't necessarily.

StillBored · on June 18, 2020

For a long time I assumed that google indexed pretty much everything, and it was only a question of providing a specific enough set of search terms to drag up older content.

But what you hint at might be more correct these days. They are running a reverse wayback machine in that anything not changed in the last year gets removed. If you click the advanced search, it's "updated within" and the max timeframe is a year.

In fact it seems the date range example doesn't even work: https://developers.google.com/custom-search/docs/structured_...

If I fiddle with it, it returns a result, but I see a hit from just a few days ago at the top...

koreth1 · on June 18, 2020

> They are running a reverse wayback machine in that anything not changed in the last year gets removed.

Sometimes I wish that were true! Try Googling for, say, PostgreSQL documentation and the top result will often be for a 10-year-old version of the software.

pphysch · on June 19, 2020

Nitpick, that's kind of a non example because the official Postgres docs let you swap versions more or less seamlessly. IME I will click the top result then click the correct version for my Postgres SE queries. 2 clicks, 0 scrolling.

rpdillon · on June 19, 2020

You're right. I tend to click back, then revise my search to include the version, and then click the result, so Google gets the message that when I search for Postgres docs, I want the most recent version. I have no idea if this actually works, but I heard Google uses bounces to determine relevancy, so I thought it was worth a shot.

aarong11 · on June 19, 2020

Might as well piss in the wind. The number of back-links from different sources probably has a much larger effect.

koreth1 · on June 19, 2020

Sure, but 1 click and 0 scrolling would be better!

TuringTest · on June 19, 2020

Try the "I'm Feeling Lucky" button!

ZeikJT · on June 19, 2020

Before the omnibar became popular I used to use google as my homepage and I'd type website keywords then "I'm feeling lucky" to get to my frequent websites. I think bookmarks took up too much space vertically so this was my solution /shrug

raverbashing · on June 19, 2020

Well it might be easy to switch but it is a good example

Why is it that Google is thinking the older page is more relevant? Does PageRank outrank content (and Google is oblivious to similar pages that have different versions?)

ajmurmann · on June 19, 2020

One would expect that with their resources they could figure out for which topics recency matters and for which it doesn't.

FalconSensei · on June 19, 2020

that's why I always filter by last year

slightwinder · on June 19, 2020

> For a long time I assumed that google indexed pretty much everything it,

They did that for a long time, but some years ago the index grew so big, they started restricting it. I think the general timeframe is 10 years or less till the last update.

> If you click the advanced search its "updated within" and the max timeframe is a year.

Because it makes no sense to go further. For older content you can define individual date ranges. And yes, it works fine for me. Tested a search for 2015 just now, first page had entries all from 2015.

> In fact it seems the date range example doesn't even work: https://developers.google.com/custom-search/docs/structured_....

All those examples are not working. Wasn't custom search retired some years ago?

stockstreaks · on June 19, 2020

This is really interesting, I never thought of it this way. Thanks for sharing.

majormajor · on June 18, 2020

Any new google will become the current google because it will be gamed. I agree with the parent poster about hand-curated directories being the key to finding better stuff. It's necessarily much more distributed - because one person can't index everything - and therefore much harder to algorithmically game.

Google has conditioned us into thinking that "an algorithm that automagically separates the wheat from the chaff" is the only way to do things. It worked for them for a while, but the adversarial forces of marketing, spam, malware, etc are very creative and fast-moving, and that's a lot for an algorithm to try to constantly reckon-with, so best case it'll probably stay a stalemate.

But that's not the only way things can be.

esfandia · on June 19, 2020

So we have hand-curated directories, and we have social networks. Put them together and you could have a scalable, crawlable, searchable, customizable and non-gamable decentralized indexing system. Just don't know why nobody has come up with this yet. Maybe lack of (monetary) incentives? Maybe something for IETF or W3C to initiate?

rkagerer · on June 19, 2020

Just for fun, imagine an alternate reality where nobody cracked the problem of an effective automated indexing algorithm. Hand-curated directories emerge, some structured by taxonomic categorization, others using keyword tagging, still others reliant on user-based rankings. The most successful grow in popularity and consolidate, becoming highly lucrative properties with recognizable brand names. As they grow to gargantuan size, research in the field explodes and innovators race to come up with improved methods to connect people to content they seek. Sergey Dewey and Larry Linnaeus make a breakthrough they call SageRank. Instead of computer code, it leverages "social algorithms" and game theory to incentivize participant behavior. Vladmir Bezos sinks billions into a clandestine effort to game the system. Once the story breaks, public backlash rallies into a worldwide, anti-"fake links" campaign. In one little corner of the internet, some schmuck says, "Imagine an alternate reality where two nerds came up with an impartial computer program to crawl the whole web..."

RNCTX · on June 19, 2020

Social networks are also gamed.

Since Twitter is their preferred platform, go put the activity of journalist Twitter accounts into a relational DB and start searching for who always boosts who. You'll find patterns. Of course there's nothing inherently wrong with this, but at the end of the day I don't need to know what a dozen NY Times journos think of a NY Times oped which is clearly written in bad faith, pushing a false narrative about a particular news event.

Non-gameable ultimately means people who influence the results can have no monetary interest in the results.

example:

Twitter thread...

https://twitter.com/jiatolentino/status/1263208982614814722

...vs reality...

https://wearyourvoicemag.com/jia-tolentino-parents-teachers-...

japanuspus · on June 19, 2020

This is alive and doing well: https://wiby.me/

codetrotter · on June 19, 2020

I tried to search for irish setter but none of the first results were related to irish setter dogs.

lsh · on June 20, 2020

On the third page of search results for 'irish setter' I found this: http://blackpeopleloveus.com/

That search engine is like a gold mine! I searched for "black people love us" in DDG and it was the first result, followed by an article written this year explaining how it came about and .. the web felt like such a smaller place back in 2002 and I just don't remember this at all.

Which makes me wonder about how newspapers and free to air tv kept the culture pretty shallow before exploding with the internet and now, possibly contracting again as our filter bubbles shrink? Just an errant thought.

But so much novelty and interesting stuff at wiby.me - search for 'trump' and the first result is just surreal.

anaxag0ras · on June 19, 2020

Social networks are becoming walled gardens so it's kind of tough to crawl them. I mean look at Facebook and Instagram, the biggest social networks. They have robots.txt configured to disallow any crawlers.

Facebook also used to have RSS feeds for public pages and posts. Now they've not only removed that feature but also have heavy restrictions for third-party apps.
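For anyone wondering what "configured to disallow any crawlers" means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text, the bot name `MyIndexerBot`, and the `example.com` URLs are made up for illustration; this is not a copy of Facebook's or Instagram's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that turns away every crawler (illustrative only).
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that only fences off one directory.
BLOCK_PRIVATE = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


walled = may_crawl(BLOCK_ALL, "MyIndexerBot", "https://example.com/public-page")
open_web = may_crawl(BLOCK_PRIVATE, "MyIndexerBot", "https://example.com/public-page")
print(walled, open_web)
```

A polite crawler checks this before every fetch, which is why a blanket `Disallow: /` effectively removes a site from any independent index that plays by the rules.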

kernel_pancake · on June 21, 2020

Facebook and the like made the decision to optimise for distraction instead of connection.

Services connected to the Fediverse, an alternative framework that focusses on connectivity, are slowly growing. It's only a matter of time before they are more successful than the walled gardens.

olah_1 · on June 19, 2020

That’s exactly what this guy is doing with Iris. It’s early days but definitely worth checking out.

https://www.hackernoon.com/what-is-wrong-with-the-internet-a...

api · on June 19, 2020

I was thinking the same thing. Search engines for the whole web are over. The concept doesn't stand up to spam of various kinds.

basch · on June 18, 2020

Agree on the long tail. And maybe a way to exclude bigger sites, and newer results (the time range search doesn't really work anymore.)

I had a search today, and 7 of the top 10 results were from today. What I was looking for was NOT news, it was historical. If I wanted news, I would click the news tab. Having 7/10ths of the results come from today makes using google to search all of the web ever near useless, as today's noise is noisier than ever.

I don't even care if they are defaults, but buttons to "exclude big sites" "exclude the news" or "exclude fresh results" would make search so much better.

spc476 · on June 19, 2020

Here you go: https://millionshort.com/

dredmorbius · on June 19, 2020

An equivalent which provides sources older than the past hour, day, week, month, year, decade, century, or millennium would be an opportunity here.

basch · on June 19, 2020

Google has that, it's just stopped working. You'll end up with articles from 1997 that say they updated 4h ago. And the amount of seo garbage that gets through has polluted the remaining results.

What would be really cool was if google could show what the results for a search looked like on a given day. Not the current algorithm, not any sites indexed since then, but what it looked like at the time. Going back to use 2010 Google would be a dream.

aspenmayer · on June 19, 2020

Part of me wonders whether this is technically possible. I’m sure it’s an answerable question.

basch · on June 19, 2020

There would be at least two ways to do it, neither of which is likely realistic at google scale. One would be to cache results, so that any search searched before could be retrieved. This is limited in that you can't make new searches on old data, and possibly stores private information that shouldn't be accessible to others.

The other way would be to have the index and algorithm versioned, where you can target any instance of the algorithm against any version of the data.
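A toy sketch of that second approach, to make it concrete: keep dated snapshots of the index and named versions of the ranking function, and let a query pick one of each. Everything here (`TimeTravelSearch`, `count_ranker`, the `.example` pages) is hypothetical and in-memory; it says nothing about how a real engine stores its index.

```python
from bisect import bisect_right
from datetime import date
from typing import Callable, Dict, List

Index = Dict[str, str]                      # url -> page text
Ranker = Callable[[Index, str], List[str]]  # (index, query) -> ranked urls


class TimeTravelSearch:
    """Run any stored ranker version against any stored index snapshot."""

    def __init__(self) -> None:
        self._snapshots: Dict[date, Index] = {}
        self._rankers: Dict[str, Ranker] = {}

    def add_snapshot(self, day: date, index: Index) -> None:
        self._snapshots[day] = dict(index)  # copy so later crawls can't rewrite history

    def add_ranker(self, version: str, ranker: Ranker) -> None:
        self._rankers[version] = ranker

    def search(self, query: str, as_of: date, ranker_version: str) -> List[str]:
        days = sorted(self._snapshots)
        pos = bisect_right(days, as_of)  # latest snapshot taken on or before as_of
        if pos == 0:
            return []  # nothing had been indexed yet
        return self._rankers[ranker_version](self._snapshots[days[pos - 1]], query)


def count_ranker(index: Index, query: str) -> List[str]:
    """Toy ranker: order pages by how often the query term occurs."""
    term = query.lower()
    hits = [(text.lower().split().count(term), url) for url, text in index.items()]
    return [url for n, url in sorted(hits, key=lambda h: (-h[0], h[1])) if n > 0]


engine = TimeTravelSearch()
engine.add_ranker("v1", count_ranker)
engine.add_snapshot(date(2010, 1, 1), {"a.example": "postgres docs postgres", "b.example": "cats"})
engine.add_snapshot(date(2020, 1, 1), {"a.example": "postgres docs postgres",
                                       "c.example": "postgres postgres postgres seo"})

results_2010 = engine.search("postgres", date(2010, 6, 1), "v1")
results_2020 = engine.search("postgres", date(2020, 6, 18), "v1")
print(results_2010, results_2020)
```

The catch at real scale is obvious from the sketch: every snapshot is a full copy of the index, so storage grows with the number of dates you want to be able to revisit.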

aspenmayer · on June 20, 2020

Your second method is more in line with what I had in my mind as what you were getting at, and is pretty much the context of my original reply.

I am sure it’s technically possible going forward, but it would be interesting if such capabilities could be enabled for historical versions of the index and algorithm. Combined with anonymized historical zeitgeist data, some interesting digital archaeology could be attempted.

All the more reason to run your own crawler! What’s the state of the art for this area right now in self hosted solutions? Can you version your index and algorithm like we’re discussing and do these kinds of search-data time-traveling?

dredmorbius · on June 19, 2020

If you have an indicator of content on a specific date and can confirm no significant change ...

Though document fingerprinting is hard. Especially w/ fungible page elements.

Internet Archive has an angle here.

aspenmayer · on June 21, 2020

> Internet Archive has an angle here.

Are you referring to WARC type tooling or what? I don’t want to put words in your mouth. I’m a complete learner on this topic. I think gwern has written a bit about this broadly? I’m curious to know more about this, if you have time to share more.

https://www.gwern.net/Search

https://www.gwern.net/Archiving-URLs

https://en.wikipedia.org/wiki/Web_ARChive

slightwinder · on June 19, 2020

> What we really need is a new Google, built on open principles (decentralized / peer to peer, fully free software, backed by a nonprofit),

We have them, they suck.

> and focused on indexing the long tail of insightful content that is neglected by Google because it lacks SEO

How would you even define that? SEO is changing all the time. And google is fighting it all the time.

And how would you prevent SEO focusing on that new searchengine? If it becomes big enough, people will optimize for it.

Aeolun · on June 19, 2020

I think the goal is to exclude anything that was created with a popular CMS (to exclude some content marketing), or in the top 100k websites.

Then we might go back to something sort of interesting.

slightwinder · on June 19, 2020

That is a good starting point, but also something SEO can very easily hack. Obfuscating the usual hints of a popular CMS is not hard.

I wonder whether doing a parallel search on google and filtering their top results out of your own results would be a feasible solution? Add a filter on the top 500 websites, and on whether known ad sources are used, and you might slowly get there.

Maybe instead of a smart search engine it would be better to focus on a dumb one which gives access to all the metadata of a page too, and allows people to optimize for themselves. Full text alone is not the only relevant content for good results. Google knows it and uses the metadata, but gives the end user very limited access to it.
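The filtering idea above (drop whatever the mainstream engine already surfaces, anything on a big-site list, and anything that loads known ad sources) might be sketched like this. The lists, the result format with a `resource_urls` field, and all `.example` hosts are assumptions for illustration; a real version would need actual top-site and ad-host lists.

```python
from urllib.parse import urlparse

# Hypothetical lists; stand-ins for a real "top 500" list and an ad-host blocklist.
TOP_SITES = {"bigsite.example", "news.example"}
AD_HOSTS = {"ads.example", "tracker.example"}


def host_of(url: str) -> str:
    """Lower-cased hostname with a leading 'www.' stripped."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def filter_results(results, mainstream_urls, top_sites=TOP_SITES, ad_hosts=AD_HOSTS):
    """Keep results that are off the beaten track.

    results: list of dicts with 'url' and 'resource_urls' (scripts, iframes,
             images the page loads), as collected by your own crawler.
    mainstream_urls: URLs a parallel mainstream search returned near the top.
    """
    kept = []
    for result in results:
        if result["url"] in mainstream_urls:
            continue  # already easy to find elsewhere
        if host_of(result["url"]) in top_sites:
            continue  # big site
        if any(host_of(res) in ad_hosts for res in result["resource_urls"]):
            continue  # page pulls in a known ad source
        kept.append(result)
    return kept


results = [
    {"url": "https://www.bigsite.example/article", "resource_urls": []},
    {"url": "https://smallblog.example/post", "resource_urls": ["https://ads.example/banner.js"]},
    {"url": "https://handmade.example/setters.html", "resource_urls": ["https://handmade.example/style.css"]},
    {"url": "https://other.example/page", "resource_urls": []},
]
kept = filter_results(results, mainstream_urls={"https://other.example/page"})
print([r["url"] for r in kept])
```

As noted above, every one of these checks is easy to dodge once someone cares to, so this is a starting filter rather than a defense.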

ajmurmann · on June 19, 2020

I LOVE the idea of a search engine that lowers the rank of results with ads. This might be the key to success in finding projects of love rather than commercial click holes.

Aeolun · on June 19, 2020

Oh. That’s also an idea, but maybe we’d still get tons of ad-less blogs that are just link farms to other sites.

Still, I guess that’s only viable because Google rewards lots of links. If you just disable link relevance that part of gaming the system will be gone too.

derefr · on June 19, 2020

How about TF(PageRank)/IDF(PageRank)? As in, start by ranking the individual result URLs with their PageRank for the given query; but then normalize those rankings by the PageRank each TF(PageRank) result URL’s domain/origin has for all known queries (this is the IDF part.)

Then, the more distinct queries a given website ranks for (i.e. the more SEO battles it wins/the more generally optimal it is at “playing the game”), the less prominently any individual results from said website would be ranked for any given query.

So big sites that people link to for thousands of different reasons (Wikipedia, say) wouldn’t disappear from the results entirely; but they would rank below some person’s hand-written HTML website they made 100% just to answer your question, which only gets linked to on click-paths originating on sites that contain your exact search terms.

This would incentivize creating pages that are actually about one particular thing; while actively punishing not just SEO lead-gen bullshit; not just keyword-stuffed landing pages we see in most modern corporate sites; but also content centralization in general (i.e. content platforms like Reddit, Github, Wikipedia, etc.) while leaving unaffected actual hosting by these platforms, of the kind that puts individual sites on their own domains (e.g. Github Pages, WordPress.com, Tumblr, specialty Wikis, etc.)
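One possible reading of the TF/IDF idea above, as a small sketch: take each result's per-query score as given, then scale it by a smoothed inverse of how many distinct known queries the result's domain ranks for. The smoothing formula, the function names, and the sample scores are assumptions chosen for illustration; the comment does not pin down an exact formula.

```python
import math
from collections import defaultdict
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    return urlparse(url).hostname or url


def domain_breadth(scores_by_query):
    """For each domain, count the distinct known queries it ranks for."""
    breadth = defaultdict(int)
    for results in scores_by_query.values():
        for domain in {domain_of(url) for url in results}:
            breadth[domain] += 1
    return breadth


def rerank(query, scores_by_query):
    """Re-order one query's results, discounting domains that rank for everything.

    scores_by_query: {query: {url: per-query score}}, where the score stands in
    for the "TF" part (e.g. a query-specific PageRank).
    """
    n_queries = len(scores_by_query)
    breadth = domain_breadth(scores_by_query)

    def adjusted(url, score):
        # Smoothed IDF (assumption): never reaches zero, so broad sites sink
        # in the ranking instead of vanishing from it.
        idf = math.log((1 + n_queries) / (1 + breadth[domain_of(url)])) + 1.0
        return score * idf

    results = scores_by_query[query]
    return sorted(results, key=lambda url: adjusted(url, results[url]), reverse=True)


scores_by_query = {
    "irish setter": {"https://bigwiki.example/irish-setter": 0.9,
                     "https://setterfan.example/": 0.6},
    "postgres docs": {"https://bigwiki.example/postgres": 0.8},
    "tarp tent": {"https://bigwiki.example/tarp": 0.7},
    "midi router": {"https://bigwiki.example/midi": 0.7},
}
ranked = rerank("irish setter", scores_by_query)
print(ranked)
```

In this made-up data the single-purpose site overtakes the site that ranks for every query, while the big site still appears lower down, which matches the "wouldn't disappear from the results entirely" intent.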

———

A fun way to think of this is that it’s similar to using a ladder ranking system (usually used for competitive games) to solve the stable-marriage problem on a dating site.

In such a system, you have two considerations:

• you want people to find someone who’s highly compatible with them, i.e. someone who ranks for their query

• you want to optimize for relationship length; and therefore, you want to lower the ranking of matches that, while theoretically compatible, would result in high relationship stress/tension.

Satisfying just the first constraint is pretty simple (and gets you a regular dating site.) To satisfy the second constraint, though, you need some way of computing relationship stress.

One large (and more importantly, “amenable to analysis”) source of relationship stress, comes from matches between highly-sought-after and not-highly-sought-after people, i.e. matches where one partner is “out of the league of” the other partner.

So, going with just that source for now (as fixing just that source of stress would go a long way to making a better dating site), to compute it, you would need some way to 1. globally rank users, and then 2. measure the “distance” between two users in this ranking.

The naive way of globally ranking users is with arbitrary heuristics. (OKCupid actually does this in a weak sense, sharding its users between two buckets/leagues: “very attractive” and “everyone else.”)

But the optimal way of globally ranking users, specifically in the context of a matching problem, is (AFAICT) with IDF(PageRank): a user’s “global rank” can just be the percentage of compatibility-queries that highly rank the given user. This is, strictly speaking, a measure of the user’s “optionality” in the dating pool: the number of potential suitors looking at them, that they can therefore choose between.

If you put the user on a global ladder by this “optionality” ranking; and normalize the returned compatibility-query result ranking by the resulting users’ rankings on this global “optionality” ladder; then you’re basically returning a result set (partially) optimized for stability-of-relationship: compatibility over delta-optionality.

———

All this leads back to a clean metaphor: highly-SEOed websites—or just large knots of Internet centralization—are like famous attractive people. “Everyone” wants to get with them; but that means that they’re much less likely to meet your individual needs, if you were to end up interacting with them. Ideally, you want a page that’s “just for you.” A page with low optionality, that can’t help but serve your particular needs.

bayonetz · on June 19, 2020

Hey, nice comment! Please do this please.

jandrese · on June 18, 2020

Of course the problem is that if your site becomes popular it will be overrun by SEOs pushing monetized vapid content. This is how people earn money, they aren't going to stop just because you want to return to a time before they ruined the internet.

asdff · on June 18, 2020

Results could be better filtered. SEO spam isn't very nuanced; it all reads the same and is frequently plagiarized. I wonder if you can crudely use something like turnitin to filter for original content.

Maybe you could try and make a model SEO article for your own search engine, just taking your existing SEO results and figuring out which parameters are contributing the most to their ranking, then filtering out results that contain these parameters that worked well in your model. Rinse and repeat as SEO writers try and step up their arms race, but they should always end up being foiled by your changes to the search engine after optimizing your own perfect SEO model regularly.
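The crude originality filter suggested above could look something like this: break each page into overlapping word triples ("shingles") and drop any page that overlaps too heavily with one already kept. This is a generic near-duplicate check, not how turnitin works internally; the 0.5 threshold and the sample pages are assumptions.

```python
def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word tuples from the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Share of shingles two pages have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_unoriginal(pages, threshold: float = 0.5):
    """Keep a page only if it is not too similar to an earlier kept page.

    pages: list of (url, text) in the order they were first seen.
    """
    kept = []
    for url, text in pages:
        signature = shingles(text)
        if all(jaccard(signature, seen) < threshold for _, seen in kept):
            kept.append((url, signature))
    return [url for url, _ in kept]


pages = [
    ("orig.example", "ten best ways to train your irish setter at home today"),
    ("copy.example", "ten best ways to train your irish setter at home now"),
    ("other.example", "notes on building a waterproof tarp tent from scratch"),
]
original_urls = filter_unoriginal(pages)
print(original_urls)
```

Comparing every page against every kept page does not scale past a toy corpus, and lightly reworded spam slips under any fixed threshold, so the arms-race caveat above still applies.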

majormajor · on June 18, 2020

If it was that easy, someone would be able to do it.

Instead every algorithmic content delivery platform, from Google to Twitter to Youtube to Facebook, is constantly chasing just to keep from being underwater against the spammers.

jandrese · on June 18, 2020

SEO is exactly as sophisticated as it needs to be to get to the top of your rankings.

chongli · on June 19, 2020

SEO sites have an Achilles' heel though: they need to make money somehow, and 99% it's going to be ads. So filter them out based on that. Create a space for a noncommercial web.

jefftk · on June 19, 2020

They can also make money by selling things. Much harder to detect algorithmically than ads, and I don't know if you want to filter it out?

dredmorbius · on June 19, 2020

Or by scams, hoaxes, fraud, blackmail, pump-and-dump, propaganda, ....

Less than ads and subscriptions, but enough to fund a lot of crap.

russellendicott · on June 19, 2020

At some point this monetization game must end and we just have to pay for things we want. If there was a search engine that met my needs I would gladly pay $15/month for access.

api · on June 19, 2020

I agree, but it is unbelievably hard to compete with free.

jb3689 · on June 19, 2020

> What we really need is a new Google

Maybe, maybe not. How useful is Google (or search in general) to you?

For me search is more of a convenience tool than for finding sites that have information. There are questions I need answered but without search would have an easy time figuring out (e.g. "how many cups are in a pint?"). Sometimes I want opinions, but I almost always am going to the same sources. Sometimes I use search because I'm too lazy to click around using a site's own search. The only things that are actually useful for me from search are for specific expert knowledge that I want in a structured manner (e.g. "what do I need to consider when buying a house?"), and those queries are incredibly few

I feel like search is slowly becoming irrelevant

ryan-allen · on June 19, 2020

> Maybe, maybe not. How useful is Google (or search in general) to you?

I think I use google in the same way as you.

Most of the time, I could go to the websites directly (MDN, Stackoverflow, HN), but sometimes I'm trying to find something I don't know about, by trying different terms. I usually do this when I want a particular product but I don't know what it's called or what it is "smallest itx case without gpu", "midi router no power supply", "waterproof tarp diy tent setup".

I installed a browser extension called uBlacklist as recommended by someone here a couple of weeks ago, so now 90% of the time Google search is like my Ctrl-P for MDN, Stackoverflow, etc since I've managed to filter out the sites I don't want to see results from.

WalterBright · on June 19, 2020

1. The internet is far too large to use human curation, an algorithm is needed.

2. An algorithm can always be gamed.

You're stuck.

rkagerer · on June 19, 2020

Can you substantiate #1?

The content on the sites I visit is created by humans. Until automation genuinely overtakes us, I'm not ready to accept at face value the scale of the internet has grown so large that humans couldn't tackle the problem.

WalterBright · on June 19, 2020

Google says there are 2 billion web sites. Each one may consist of a huge number of web pages (see wikipedia.org).

All I can say is, good luck with your human curation startup.

rkagerer · on June 19, 2020

Sure, it also estimates more than twice that many people in the world with access to the internet. And Wikipedia is already well-curated by humans.

If there were orders of magnitude more pages than humans, I'd agree. But I'd also ask: Who created them all?

rkagerer · on June 19, 2020

Granted it's silly, but I'm fascinated by this thought experiment.

It's not easy to quantify the amount of useful content on the internet. The 2bn figure above seems to stem from registered domains, and depending who you ask [1][2] around three quarters of them are "inactive" (e.g. landing page for a parked domain).

At the other end of the spectrum, Google's index surpassed 130 trillion pages four years ago [3]. Point in favour of my opponent!

If everyone connected to the internet indexed one page a day over the course of their lifetime we might just about do it. And anyone creating a new page would need to [arrange to] index it themselves.

[1] https://www.internetlivestats.com/total-number-of-websites/

[2] https://hostingtribunal.com/blog/how-many-websites/#:~:text=....

[3] https://searchengineland.com/googles-search-indexes-hits-130...
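The back-of-envelope estimate above works out roughly as follows. The 130 trillion page count is the figure cited in [3]; the 4 billion user count is an assumption taken from "more than twice" the 2bn figure mentioned earlier in the thread.

```python
INDEXED_PAGES = 130e12        # pages in Google's index, per [3]
INTERNET_USERS = 4e9          # assumption: "more than twice" 2 billion
PAGES_PER_PERSON_PER_DAY = 1  # the thought experiment's indexing rate

days_needed = INDEXED_PAGES / (INTERNET_USERS * PAGES_PER_PERSON_PER_DAY)
years_needed = days_needed / 365.25

print(f"{days_needed:,.0f} days, about {years_needed:.0f} years")
# prints: 32,500 days, about 89 years
```

So "over the course of their lifetime" is about right under these assumptions, and only if the web stopped growing in the meantime.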

WalterBright · on June 19, 2020

Wikipedia is one website. There are 2 billion. I don't believe wikipedia's methods scale to the internet.

Also, any setup that allows everybody to be a moderator will be promptly gamed.

What you want is what yahoo used to do - a hand-curated search engine. It worked when the internet was small, but got buried under the eventual avalanche of web sites.

summerlight · on June 19, 2020

We can say the exact same thing about all the technical debt that we've accumulated in our code base. But I've never seen that technical debt properly paid off :D

ricardo81 · on June 19, 2020

#1 can somewhat be substantiated by some technical info, without knowing the true number of pages.

- Using DNS "zone files", the DNS database for TLDs (which are not available for all, but most) show there's circa 200 million domains registered at any moment

- A large percentage of these are parked, i.e. no unique content.

- Many domains are "tasted", i.e. bought, are alive for a few days then disappear, so potentially you waste time crawling them

- Lots of sites are database driven and can result in millions of pages that can be created in a day

- URL rewriting means you can have an almost infinite number of pages on any one site

- Soft 404s and duplicate content can be hard to spot and can waste resources in gathering/removing them

There's paid for resources like Majestic/Ahrefs/Moz that crawl the web to see who's linking to who and they all contain trillions of URLs.

I think the most detrimental fact is that pages often disappear or change, I don't have a recent number but I'm fairly certain there's a 10-15% chance that any link you see this year, will be gone next year. "Link rot". Hard to build a DMOZ style directory on that scale with that problem.
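To see how quickly that compounds, here is a quick calculation. It assumes the 10-15% yearly loss is constant and independent from year to year, which is a simplification rather than a measured decay curve.

```python
def surviving_fraction(annual_loss: float, years: int) -> float:
    """Fraction of links still alive after the given number of years."""
    return (1.0 - annual_loss) ** years


for annual_loss in (0.10, 0.15):
    for years in (1, 5, 10):
        alive = surviving_fraction(annual_loss, years)
        print(f"{annual_loss:.0%} loss per year, {years:>2} years: {alive:.0%} of links remain")
```

At those rates a directory that is never re-checked has only about 35% (at 10% loss) or about 20% (at 15% loss) of its links working after ten years.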

I don't think it is unmanageable, it just needs to be seen from different perspectives and managed by different groups of people.

clarry · on June 19, 2020

On the other hand, most of that is stuff that human curation can rightfully reject as not worth linking and indexing. We don't need to index the entire web, we need to index the good stuff (that is likely to stay there anyway and be worth archiving).

WalterBright · on June 19, 2020

How does one know what the good stuff is without looking at it?

indigochill · on June 19, 2020

#1 doesn't matter because there's no rule that the entire internet needs to be visible. People only care about the sites they find value in. Search engines are one way to quickly locate those sites.

throwingtheaway · on June 19, 2020

That any algorithm can be gamed does not imply that the least manipulable algorithm can be made arbitrarily bad.

You only need to choose an algorithm so that the amount of trash is less than you can handle.

WalterBright · on June 19, 2020

If it's so simple, why hasn't anyone managed to do it?

sacado2 · on June 19, 2020

I'm surprised by this content, because Google actually changed the web for the better in this regard. Anybody remember the ezinearticles era? Where people kept publishing the same short articles filled with just a few keywords and nothing more, over and over again, just to get backlinks? Back then you couldn't make a search on any topic without falling on page after page of those light articles filled with keywords only.

Those article directories were eventually murdered by Google within the blink of an eye maybe a decade ago, and quite frankly on any given topic nowadays it's way easier to find good content rather than SEO filler. Google's algorithms will nowadays favor fresh (aka published or updated recently), long content over short, "popular" (aka lots of links on it), duplicated content.

MaxBarraclough · on June 19, 2020

It's a pity the semantic web never took off. It might have greatly reduced the need for sophisticated centralised search-engines.

fit2rule · on June 19, 2020

I think the thing that happened, is the OS vendors spent too much time addicted to the web themselves, and forgot that one of the duties of the core OS of any device is to serve data from it, safely and securely, to any other device.

Instead, most stacks are designed to enforce a walled garden, from which very little is shareable unless you go through the gateway (the web client app) to some approved destination.

Semantic web and total availability at the personal-computer level, are aspects of OS design I wish had been paid more attention.

Basically what we have now is a very, very expensive system of dumb terminals.

greggman3 · on June 19, 2020

The semantic web will never work because there is no way to enforce it. If the topic "horses" becomes popular people will just start tagging their spam pages with "horse" to try to get people to look at their pages.

As an example go look at the tags on soundcloud. People tag their songs with whatever they think will get people to take a listen.

zozbot234 · on June 19, 2020

> The semantic web will never work because there is no way to enforce it. If the topic "horses" become popular people will just start tagging their spam pages with "horse" to try to get people to look at their pages.

There are plenty of trusted sources that would not spam their pages in this way. And if you spam too much, you risk getting dropped from search indexes, directories, links etc. because your source is just not useful.

MaxBarraclough · on June 19, 2020

Dishonesty is a problem the web always faces, whether it's lying to a person or to a search-engine.

It's not as if the semweb people weren't aware of the problem, and it's not self-evidently a fatal blow to the idea.

slfnflctd · on June 18, 2020

It's a fuzzy concept, but I think you're pointing toward something there's a need for. PageRank is/was useful in a lot of ways, it's just not enough by itself. Its weaknesses have been ever more apparent, and it has become less effective over time.

There are so many possible viable methods for ranking search results! Particularly now with higher level textual analysis using AI/ML/[buzzword], and perhaps more importantly, the resurgence of interest in curated content. People are getting better at discerning curated-for-revenue vs. curated-with-love.

the_other · on June 18, 2020

> It's a fuzzy concept, but I think you're pointing toward something there's a need for. PageRank is/was useful in a lot of ways, it's just not enough by itself. Its weaknesses have been ever more apparent, and it has become less effective over time.

Would you speak to why you think this way about PageRank? What are its shortcomings?

To me, who only paid surface-level attention to this, it seemed like Google results were best when PageRank was the dominant metric. As they moved more and more in the direction of prioritising news, commerce and the aspects we call SEO, “number of links pointing TO the resource” became less and less important in the ranking. And as that happened, the quality of results dropped, and the content silo-ing rose up.

PageRank was peer-review. SEO is “who shouts the loudest”.

Jtsummers · on June 19, 2020

SEO with Google was about gaming PageRank. A lot of work happened in the 00s trying to prevent that, but it requires an army of moderators and others to identify good/bad links and link sources. Is HN a good source? Sure. Are HN comments a good source? Not once spammers realized they could drop links in comments to boost their site's results. How do you tell the difference? Plain PageRank is weak against this, and that's what motivated a lot of changes to the ranking system.
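
To make the "gaming PageRank" point concrete, here is a minimal sketch of plain power-iteration PageRank on a toy link graph. The page names, the damping factor of 0.85 and the five-page link farm are all assumptions made up for illustration; this is not Google's production ranking, just the textbook form of the idea.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict of page -> list of outlinks."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over everyone.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: nobody links to "spam".
honest = {
    "hub": ["good", "other"],
    "good": ["hub"],
    "other": ["good"],
    "spam": ["hub"],
}

# Same web after a spammer adds five throwaway pages that link only to "spam".
farmed = dict(honest)
for i in range(5):
    farmed[f"farm{i}"] = ["spam"]

before = pagerank(honest)
after = pagerank(farmed)
print(f"spam rank before: {before['spam']:.4f}, after: {after['spam']:.4f}")
```

In this toy graph the spam page's score rises once the farm pages exist, even though no honest page ever linked to it. Plain PageRank cannot tell a farm link from a genuine one, which is the weakness described above.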

tonyarkles · on June 19, 2020

Nofollow helps a fair bit but requires cooperation from all high ranking sites to ensure that they don’t leak their PageRank juice.
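
As a rough sketch of what honoring nofollow could look like on the crawler side, the snippet below splits a page's links into those that would pass link credit and those marked rel="nofollow". It uses only Python's standard html.parser; the class name, function name and sample HTML are hypothetical, and real crawlers certainly do more than this.

```python
from html.parser import HTMLParser


class FollowedLinkParser(HTMLParser):
    """Collects <a href> targets, separating rel="nofollow" links from the rest."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a space-separated token list, e.g. rel="nofollow ugc".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed, nofollow) link lists for one HTML document."""
    parser = FollowedLinkParser()
    parser.feed(html)
    parser.close()
    return parser.followed, parser.nofollow


# Made-up page: one editorial link, one link dropped in a comment.
sample = """
<p>Article: <a href="https://example.org/essay">essay</a></p>
<p>Comment: <a rel="nofollow ugc" href="https://spam.example/buy">cheap stuff</a></p>
"""
followed, skipped = split_links(sample)
print("counted:", followed)
print("ignored:", skipped)
```

Only the links in the first list would feed a link graph like the one PageRank uses. The catch is the one already mentioned: it only works if the high-ranking site actually adds the attribute to user-submitted links.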

hedora · on June 19, 2020

They had to abandon page rank because it was widely gamed.

As far as I can tell, the main reason Google succeeded was that other search engines let advertisers buy placement for keywords (and didn’t label paid links). I heard from an industry insider that was able to strip the paid links that the engine they worked on gave results that were very similar to Google’s.

The second big reason was that pagerank was a useful signal that hadn’t already been gamed to the point of uselessness. I think this let a tiny team blindside an entrenched industry.

That’s not to say there’s no technical insight behind the page rank algorithm, but it was only a useful signal for a few years.

jabl · on June 19, 2020

Before Google/pagerank took over, sites were successfully gaming the rankings of search engines like Altavista by adding every keyword they could think of in the html header, or hidden in the page with an invisible font etc.

It got to the point Altavista became more or less useless, and when Google showed up on the market they quickly took it over.

Seems the time is ripe for a new revolution. Doesn't have to be a better search engine, could be something completely different.

ricardo81 · on June 19, 2020

I don't think there needs to be a new Google or a requirement to be distributed.

Simply having half a dozen or more search engines per country/language, with their own indexes and algorithms should help see the web more fully.

ATM in English, Bing and Google have the largest indexes and Mojeek has its own index though smaller. DDG, Ecosia and others are just the Bing index re-ordered.

I enjoyed the OP, though. And I think niche directories/blogrolls would be progress. The current centralised web is a result of everyone dancing to the tune/rules of the large platforms.

CJefferson · on June 18, 2020

The problem with any such system is there is money from capturing eyeballs, so once any system gets popular a lot of people will dedicate time to spamming it. I don't know how to avoid that.

akavel · on June 19, 2020

Hm; a sibling comment about p2p made me start thinking about some kind of "authority" system, with personal tracking. I know it'd be a lot of work for everyone, but it still sounds maybe doable to me. Like if everything posted to the "directories" was signed by its author, and you'd be able to mark entries (links) as valuable or not for yourself, which would be tracked in your local authors DB, by tweaking particular author's total score. Then you could share your list with friends, who could merge it into theirs, but still marked as coming from you. I know, kinda complex (and reminiscent of gpg's "web of trust"), but maybe still could be an alternative to centralized services like google? Also the emerging p2p networks like dat/hypercore or ipfs or ssb already auto-sign people's published data, so maybe this could be tapped into...
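
A very rough sketch of the local "authors DB" part of this idea, under heavy assumptions: signature checking is left out entirely (entries are assumed to be already verified as coming from their author), scores are simple +1/-1 tallies, and a friend's opinions count at half weight. The TrustDB name, its methods and the 0.5 weight are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TrustDB:
    """One person's local author scores, plus lists merged in from friends."""

    owner: str
    scores: dict = field(default_factory=dict)    # author -> my own tally
    imported: dict = field(default_factory=dict)  # friend -> {author: tally}

    def mark(self, author, valuable):
        """Record that I found one of this author's entries valuable (or not)."""
        self.scores[author] = self.scores.get(author, 0) + (1 if valuable else -1)

    def merge_from(self, friend_db):
        """Import a friend's list, kept separate and labeled with their name."""
        self.imported[friend_db.owner] = dict(friend_db.scores)

    def score(self, author, friend_weight=0.5):
        """My own tally plus a discounted sum of what my friends think."""
        own = self.scores.get(author, 0)
        borrowed = sum(s.get(author, 0) for s in self.imported.values())
        return own + friend_weight * borrowed


mine = TrustDB("me")
mine.mark("alice", True)
mine.mark("alice", True)
mine.mark("bob", False)

friend = TrustDB("carol")
friend.mark("alice", True)
friend.mark("dave", True)

mine.merge_from(friend)
for author in ("alice", "bob", "dave"):
    print(author, mine.score(author))
```

Keeping imported lists under the friend's name, rather than folding them into your own tallies, is what lets you later drop or down-weight one friend's opinions without losing your own marks.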

bduerst · on June 19, 2020

Even those types of authority systems can be gamed as well, unless you're also building your own recaptcha pattern analyzer too.

The problem with decentralization is that it creates a power vacuum that is filled by the most interested actors. Even bitcoin, with its decentralization-by-design, is actually centralized to a handful of miners in China.

If the goal is to rebuild something that is anything else other than profit based, you need to make sure the organization running it is strictly non-profit.

Cilvic · on June 19, 2020

Maybe there is a way to align spamming and curation? I mean what if all those people working on SEO would actually work on curation?

mrfusion · on June 19, 2020

If you have all the other stuff you might not really need peer to peer. Or maybe work in peer to peer after getting established?

Also I like the idea of making search like Wikipedia where people can edit results. Obviously you’d need super genius level safeguards to protect against scammers but Wikipedia does it ok-ish.

raspyberr · on June 19, 2020

wiby.me can be useful for finding sites.

pythonwizz · on June 19, 2020

So, basically Yacy?

https://yacy.net/

renewiltord · on June 18, 2020

The alternative to Google for discovery is Reddit. Reddit is the modern web directory:

- Directories are Communities defined by their link rules

- Human-curated

- Easy to start a new one

- SSO across all communities

- Built-in forum technology

- Unified comment technology for every website

You can get communities like reddit.com/r/Sizz for instance or larger ones like /r/esp8266 or massive ones like /r/sanfrancisco or planet-sized ones like /r/pics. And reddit itself plays the role of a meta-directory, with little directory networks (SFW-Porn being "the pretty pictures of the world" directory network) sitting between Reddit itself and the subreddits.

Reddit is an amazing amazing thing.

kickscondor · on June 18, 2020

Except:

- Most subreddits are hostile to self-promotion of Web stuff. If you're unknown, you're going to have to be socially involved there enough that people know your name. (Though I agree that small subs like /r/esp8266 are begging for self-made content.)

- Related: people don't know your name because it's in small gray text. You're just another comment.

- You need upvotes. A personal directory requires only one upvote.

- You need to be on-topic. Your work may not fit Reddit's categories.

- Reddit mods are generally more like forum mods than librarians.

- Agreevotes do not equal quality. I don't want to overstate this, but I like that I'm not seeing vote counts on personal directories.

Reddit is cool - but it has its own rules and its own culture that goes with it. I personally wouldn't call it 'the modern web directory'. I do think it's less hostile to the Web than many other platforms - and certain subs like /r/InternetIsBeautiful and /r/SpartanWeb do good work.

loco5niner · on June 19, 2020

> Most subreddits are hostile to self-promotion of Web stuff. If you're unknown, you're going to have to be socially involved...

Great! I'm there for the content, and self-promotion is usually the worst content.

I like that HackerNews tags this stuff with ShowHN so I can decide whether I want to look at it or not.

mrfusion · on June 19, 2020

There’s no problem submitting blog articles though. Even your own.

akdas · on June 19, 2020

I've had my blog post removed from r/algorithms for self-promotion.

Mediterraneo10 · on June 19, 2020

Reddit is a feed, and therefore a link that was posted a month ago is no longer going to be seen by anyone. Basically, whatever appears to users between now and the moment when things drop off the page is a crap shoot, and Reddit will be inferior to a real web directory that can amass a larger collection of interesting links and, importantly, preserve them all together at one time.

nkozyra · on June 18, 2020

I don't find reddit particularly navigable nor user friendly. It's a social media platform, not a discovery or search one.

Even the communities I care about have limited utility.

mech422 · on June 19, 2020

I thought it was just me... Every time I end up on a reddit (from a search result), it's a click fest trying to see anything except 'top comments' and ultimately ends up very unsatisfying.

claudiawerner · on June 21, 2020

Reddit is horrible, and I say this as someone with combined ~27k comment karma on two accounts, and years of use. The mods are too happy to delete posts they don't personally approve of (whether they break sub or Reddit rules or not), and I recently got suspended on what appears to me to be a false charge according to the rule itself (a debate on whether some fanfiction should be illegal or not (it shouldn't be) got me suspended - a topic, it is worth noting, is covered by several serious academics). Bitter rant incoming.

The annoying part is that not only does a sub mod get the privilege of deleting posts at will, but there is no appeals process for that, and Reddit doesn't listen to suspension appeals at all.

Aside from that, the downvote/upvote culture is bad (even if I am usually in its favour) and encourages dogpiling and groupthink. Ironically, Reddit with its "don't downvote for disagreement" element of Reddiquette is worse at this than HN with its "downvoting for disagreement is fine" policy.

The site redesign, infinite scroll, a handful of mods controlling many major subs, nonsensical or inconsistent administration and rules, unjust dishing out of punishment, advertisements in the main feed, widespread outrage bait, and endless drama means Reddit is no different to platforms like Twitter where short, witty and possibly fallacious content thrives.

I can count on a hand the number of times I've actually valued information I've obtained from Reddit comments or submissions. That does not justify the amount of time and energy I've poured into the website, which I could have better spent simply not using social media (which Reddit now is). These days, if I had to use social media, I'd pick the Fediverse over Reddit every day of the week. It takes a lot of time to realize that highly upvoted comments (and every comment) is really just "someone's opinion, man" which the current zeitgeist dictates people will agree with.

On Reddit, politics is entertainment (r/PublicFreakout etc.), mocking and hate is central (r/SubredditDrama, r/unpopularopinion), administration is done in the interest of advertisers and personal opinions (phrases such as "weird hill to die on"), and moderation attracts people who would rather bask in the power afforded to them more than people who would rather carefully curate and foster discussion. The one sub which works to its purpose is r/ChangeMyView.

Spending time arguing with random people on the Internet is mentally taxing, very unlikely to achieve a change in opinion of the persons involved or the observers, and terrible for stimulating and interesting discussion. Next time I want to argue a point, I'll get a blog with a comment section, or these days, without one. If my friend told me they were going to register a Reddit account, I'd tell them everything I've just said in this post.

account42 · on June 23, 2020

> The mods are too happy to delete posts

While that does happen, my problem is more often that mods of smaller subreddits are inactive or unwilling to moderate them so they end up being filled with low effort memes instead of actually interesting content.

gspr · on June 19, 2020

2020 has given us a lot of crazy stuff, so I guess I won't be too surprised if it also gives us the return of webrings :-)

I'm calling it now, the hottest startups will be "disrupting search using artisanally crafted rings of websites"!

tmh88j · on June 19, 2020

>I guess I won't be too surprised if it also gives us the return of webrings

You're in luck! https://news.ycombinator.com/item?id=23549471

skoodge · on June 19, 2020

Honestly, I don't think that's as crazy as it sounds. AI/ML-driven feeds often promise more than they actually deliver and by now more and more people are starting to realize that AI/ML is not the holy grail it was made out to be, at least in situations where you cannot throw massive resources at the problem.

Maybe not exactly in the form of webrings, but who knows, why wouldn't it be time for the pendulum to swing from the whole AI hype back in the other direction? There is a lot to be said for conscious curation on your terms and your devices vs algorithmic decisions made in the cloud for you.

ozfive · on June 19, 2020

Damn, this is all classy. I started writing html in 1996. My first job was at a boutique web shop building vertical portals in Coldfusion. Thanks for the great links. It's inspired me to put up a plain old html/JavaScript homepage again!

kickscondor · on June 19, 2020

Hey this is great. Post the link here if you like.

stockstreaks · on June 19, 2020

Developers should come together and build an open-source curated platform where 'only good - interesting, well-written' articles on all topics can be found.

Actually on the Hacker News guidelines it kind of describes this. And although it seems like the articles posted here are higher quality, they eventually get lost after 2-3 days.

SilasX · on June 18, 2020

Wow, those sites are blazingly fast and usable compared to most that I see these days. (With the possible exception of the third not making it visually obvious that the list items are links and having to figure it out from the context.)

killjoywashere · on June 19, 2020

Well, sheeeeit. If we're going for fever dreams, don't forget the electric sheep! https://electricsheep.org/

yosser · on June 19, 2020

It's still all indexed by Google. For example, all you need to do is search for a unique-seeming piece of text from your dreamwiki gem, and the site will appear on the first page of Google's results:

https://www.google.com/search?q=WHO+ROAMS+THE+KIRUGU+NIGHT&o...

Blaiz0r · on June 19, 2020

Yeah, strangely the web ring has come back into value.

tomc1985 · on June 18, 2020

If DMOZ could have held on for a few more years...

kickscondor · on June 18, 2020

I'm not certain DMOZ is the way to go. The big centralized directories are too hard to keep current. They get slammed with submissions. And you end up with so many editors that no one has a sense of ownership.

I mean - maybe it's possible. Perhaps a really focused team could figure it out. (The 'awesome' directories have kind of figured that out, by having specialized directories.) But these personal directories are really sweet because they don't have to cover a topic. They can just be a collection of good stuff, who knows what.

zozbot234 · on June 18, 2020

Federation largely solves these problems. The biggest interop issue is sticking to a common/interoperable classification wrt. topic hierarchies, etc. and even then that's quite doable.

kevin_thibedeau · on June 18, 2020

A solution would need crowdsourced collective vetting of sites along with a reputation system to keep out spam and bad actors without devolving into Wikipedia style personal fiefdoms.

tdeck · on June 19, 2020

I remember checking out DMOZ 15 years ago and it was already irrelevant.

duskwuff · on June 18, 2020

DMOZ started stagnating around 2005. By the time it closed, it was already considered irrelevant, and much of the index had been unmaintained for years -- there was really nothing left to hold on to.

mtgp1000 · on June 18, 2020

I think what you're saying about reduced barriers to entry has lowered the standard of all popular media.

It used to be expensive to publish anything - especially the further back in time you go. So classics for example typically represent particularly bright writers, as having something published before the printing press, and widely disseminated, was simply unlikely to happen.

But today anyone can create an account on YouTube or stream on twitch and it doesn't matter if the content is of any particular quality or veracity, so long as the common man sees what he wants to see.

I think there's a major secondary effect, in that now that we are surrounded by low quality media, the average person's ability to recognize merit in general is lessened.

chrisjarvis · on June 18, 2020

The secondary effect you mention is absolutely the case. There is unlimited media and unlimited platforms on which to consume it. "Content" is truly a commodity now. I would like to try to make watching movies/tv a special thing again for myself, as opposed to little more than background noise. I think this will require careful curation and research, rather than just trusting an algorithm.

dragandj · on June 18, 2020

Yes, there is unlimited media and content, but the thing is that most of this content is either total crap, or polished content that was too much optimized for the median viewer. There is great, non-polished but authentic content for every niche, but it is very, very difficult to find it. Such content is not a commodity, but unfortunately, it seems that the average content is good enough for the average viewer...

kickscondor · on June 18, 2020

I don't quite follow - if low quality media is everywhere, doesn't high quality media stand out?

Perhaps you're saying that so much low quality media drowns out the high quality media - such that it can't be found. The ratio is off, right?

reaperducer · on June 18, 2020

> if low quality media is everywhere, doesn't high quality media stand out?

Because there is so much low-quality content, it's become nearly impossible to find the high-quality content. Needles in haystacks.

kickscondor · on June 19, 2020

Well, yes - that's my second sentence.

worble · on June 18, 2020

>if low quality media is everywhere, doesn't high quality media stand out?

You would think so, but more often than not, most people don't want high quality. What happens is that the media that panders to the lowest common denominator stands out the most, since that's what the majority focuses on.

curiouser2 · on June 19, 2020

The psychology of the "hot take" on twitter is a prime example of this. Is it less friction to read a 3 page blog post that critically analyzes a subject and takes into account different viewpoints that all have merit, or to read someone's 280 character reaction?

I am guilty myself, I often find myself jumping to the comments section even here on HN to understand what people are taking away from an article without even finishing it.

sacado2 · on June 19, 2020

And that's where we need more journalistic (does that word exist?) writing. Even by non-journalists, I mean.

Write long-form content, start with the most important information in the first paragraph, and give more and more detail in the following paragraphs. Someone who thrives on short content will be happy with the first paragraph. Someone who wants to delve into the details will read each and every word. Heck, your first paragraph could even be a tweet containing a link to the long form.

This is clickbait taken backwards. You will get very few clicks as you already delivered the main information for free, but those who clicked will be there for a good reason.

dragandj · on June 18, 2020

It stands out, but it's so far away from you when you search for it, that it's below the horizon...

pjc50 · on June 19, 2020

You can't determine the quality of media without consuming it. So the whole Akerlof "market for lemons" process applies: low quality cheap content dominates.

reaperducer · on June 18, 2020

> I think what you're saying about reduced barriers to entry has lowered the standard of all popular media.

You're more right than you know.

When there were only a handful of television channels, the content was higher quality than what we have now.

When there were only a couple of dozen cable channels, the content was higher quality than the endless reruns we have now.

When publishing a book went through the big publishing houses, the quality of what was available was higher than it is now where anyone can self-publish and pretend to be an expert.

See also: radio.

Content can only be created so fast. There are only so many talented content creators out there. While the number of media channels has exploded, the number of good content creators has not kept pace. Keep adding paint thinner, and eventually you can see through the paint.

The internet was supposed to give everyone an equal voice. All it ended up doing is elevating the dreck and nutjobs to have equal footing with people who know what they're doing and what they're talking about. The quality is drowned out by the tidal wave of low-grade content.

koreth1 · on June 19, 2020

> When there were only a handful of television channels, the content was higher quality than what we have now.

As someone who watched TV in the 1970s, before cable was a thing, I have to disagree here. I think we look back today and see "MASH" and "Columbo" still holding up great after 40+ years and think of it as representative. But nearly all TV was formulaic dreck back then, just like it is now. And even if the average quality level is worse now (which I'm not convinced is the case, but let's assume) the quantity is much higher and there's a large amount of really good stuff to choose from on the right side of the bell curve.

It's true that content can only be created so fast, but it can also only be consumed so fast. Once you have access to enough high-quality content to fill all the spare time you want to spend watching TV or reading or gaming or whatever, having more of it to choose from doesn't improve your experience much.

krapp · on June 18, 2020

>My theory is that the high barrier to entry of online publishing kept all but the most determined people from creating content. As a result, the little content that was out there was usually good.

The barrier wasn't that high. Making a site on Geocities, Tripod or Angelfire wasn't that difficult. Writing 90's style HTML wasn't exactly writing a kernel in C, and most of those services had WYSIWYG editors and templates anyway. Few of the people publishing to the web in the 90s were programmers, so the technical knowledge required was minimal.

And plenty of people are publishing high quality content on the modern web, even on blogs and centralized platforms. I follow writers, scientists and game developers on Twitter, watch a lot of good content on Youtube, read a lot of interesting conversations on Reddit. The fact that people publishing content nowadays don't have to write an entire website from scratch has little to do with their personal passions (or lack thereof), whether they're interesting or (and ye gods how I've come to hate this) "quirky." That's like saying writers can't write anything worth reading unless they also understand mechanical typesetting.

As far as the old content goes, of course most of it was crap. Sturgeon's Law applies to every creative medium. Most blogs were uninteresting, many personal sites were just boring pages full of links or stuff no one but the author and maybe their few friends cared about. In both cases, between the old and new web, a bias for the past (as HN tends to have) leads people to only remember the best of the former and correlate it with the worst of the latter.

pwdisswordfish2 · on June 19, 2020

We are asked to believe that advertising is required for there to be any content on the Web.

However, comments like this seem to be proof that is not true.

I have personal memories of what the Web was like in 1993, but there are so many people today feeding off the advertiser bosom that what are the chances anyone will listen? No one wants to hear about what the network was like before it was sold off as ad space. Young people are told "there is no other way. We must have a funding model". Even his article is rambling about "the problem of monetization". No ads and "poof", the Web will disappear. Yeah, right. More like the BS jobs will go away. This network started off as non-commercial.

There was plenty of high quality content on the 90's Web. Even more on the 90's Internet. That is because there were plenty of high quality people using it, well before there was a single ad. Academics, military, etc. It all faded into obscurity so fast. Overshadowed by crap. The Web has become the intellectual equivalent of a billboard or newspaper. The gateway to today's Web is through an online advertising services company. They will do whatever they have to do in order to protect their gatekeeper position.

duxup · on June 18, 2020

I remember visiting big corporate websites and there was always a little corner for the 'webmaster' often with a photo of the server the site was running on... or a cat... or something like that.

Geocities was a beautiful mess as ... it was just folks trying to figure out HTML and post silly stuff, but it was genuine.

netcan · on June 19, 2020

My favourite spot was a little later, early & mid 2000s. The barriers had been lowered, but publishing still required some motivation. There was a ton of content to discover, a lot of discovery channels and "information wants to be free" still felt like a prevailing wind.

I think part of the reason was, as you say, lower standards. We were being exposed to content that didn't have an outlet before that. The music was new, black, polished chrome... to borrow a Jim Morrison line.

A bigger part is discovery though. Blogrolls & link pages were a thing. One good blog usually led you to 3 or 4 others.

These days, most content is pushed, often by recommendation engines. Social media content is dominated by quick reaction posts, encouraged by "optimization."

The medium is the message. In 98, the medium was html pages FTPed to some shoddy shared host to be read by geeks with PCs. In 2003, it was blog posts. In 2020 it's facebook & twitter.

nicbou · on June 19, 2020

I think we have far more quality content than ever before. YouTube is a goldmine of high quality content, and so are the other publishing platforms.

The signal to noise ratio might have gotten worse, and discovery might be flawed, but the absolute quantity of quality content has never been higher.

swader999 · on June 19, 2020

Like this site, I learned how to build a strong bike rim from Sheldon. https://sheldonbrown.com/

hedora · on June 19, 2020

I used Sheldon Brown’s site when I rebuilt my first decent roadbike (well, it was $100, including the parts I put into it; “decent” is relative).

julianeon · on June 18, 2020

I had just started college and I remember going to the computer lab and clicking around for hours at a time, at night. Just going from blog to blog, reading interesting stuff. You didn't have to have a particular goal in mind - one blog would lead to another interesting blog would lead to another one, endlessly. They would all be engagingly written, to a high standard of quality.

Like you, I know things have changed, but I still can't imagine I could do that today, going from blog to blog, without running low on material within ~60 minutes.

EDIT: I see the webring links here now, I may try them.

gen220 · on June 19, 2020

I came of age in the early 2000s, and Wikipedia was this for me. Eventually, it ran out of meaningful depth to me and in college I hopped to surfing books. You can surf on books til the end of time.

I think hypertext as a medium has a lot going for it that books don’t, but I don’t think we’ve figured out distribution, quality control, and discovery sufficiently to make the internet so stimulatingly surfable.

There was a period of time a few centuries ago when adults looked at kids reading romance novels the way that adults today look at kids scrolling through TikTok. I think all mediums go through cycles.

On the other side (when the wild internet’s commercial viability wanes, and people can no longer make easy money hosting mediocre, terrible, SEO-driven drivel), I think a lot of the good content will survive the great filter, and that’s when we’ll be able to appreciate it for what it is/was. The next 20 years might be rough, but the work of this generation’s Dickenses and Pushkins and Gogols will survive.

moksly · on June 19, 2020

People are still creating massive amounts of cool content, it’s just really hard to find. I play blood bowl, I care a lot about the statistics, and I’ve searched a lot on the topic over the past few years.

The best result I could find concerning some official data from FUMBBL (a place you can play blood bowl) was a blog entry from 2013.[1] My circle of friends and the different leagues I play in have been using that as reference for years. We’ve searched and searched to find the data source to no avail.

The other day I’m randomly site: searching for something completely unrelated and find a source for live FUMBBL data[2]. You’d think that was the first search engine result related to blood bowl statistics on any search engine, as it’s really the best damn source I’ve ever seen, but it’s not.

I know you were probably referring to something a little more interest based. Well I once sat next to a retired biology professor at a wedding, and it turned out he ran an interest site, detailing all the plants specific to the Danish island Bornholm. I don’t care much about plants, but it was exactly a 90s-style page. Unfortunately I didn’t save the link (I don’t care much about plants), because I haven’t been able to find it since, despite searching for his name.

So I think it’s still there, it’s just not easy to find it.

[1] http://ziggyny.blogspot.com/2013/04/fumbbl-high-tv-blackbox-...

[2] http://fumbbldata.azurewebsites.net/stats.html

vb6sp6 · on June 18, 2020

> My theory is that the high barrier to entry

I have the same feelings about social media. It used to be that you only had to listen to your stupid Uncle at Thanksgiving. Now he constantly spews his garbage on Facebook

fit2rule · on June 19, 2020

>monetized blogs

I believe that we have the 'web' today because big decisions were made about how little control the end-user (i.e., consumer) should have over the content made by producers, and that the #1 priority for all technology involved in the web has been to separate producer from consumer as stringently as possible.

If we had the ability to safely and easily share a file that we create on our own local computer, using our own local computer, to any other computer in the world - we would have a nice balancing act of user-created content and world-wide consumption.

Instead, we have walled gardens, and the very first part of the wall is the operating system running on the user's computer - it is being twisted and contorted in such ways as to make it absolutely impossible for the average user (i.e. the computer owner/user) to easily share information.

Instead, we have web clients and servers, and endless, endless 'services' that are all solving the same thing for their customers: organising documents in a way people can read them. And all the other things.

And it's all so commercial, because there is a huge gate in the way, and it is the OS Vendors. They are intentionally stratifying the market by making the barrier to entry - i.e. one's own computing device - untenable to serve the purpose.

Imagine a universe where OS vendors didn't just give up to the web hackers, in the early days, and instead of making advertising platforms, pushed their OS to allow anyone, anywhere, to serve their documents to other people, easily, directly from their own system. I.e. we didn't have a client/server age, but rather leaped immediately to peer-to-peer, because in this alternative universe, there were managers at places like Microsoft that could keep the old guard and the new young punks from battling with each other .. which is how we get this mess, incidentally.

There really isn't any reason why we all have to meet at a single website and give our content away. We could each be sharing our own data directly from our own devices, if the OS were being designed in a way to allow it. We have the ability to make this happen - it has been intentionally thwarted in order to create crops.

Give me a way to turn my own computer, whether it is a laptop or a server or my phone, into a safe and easy to use communications platform, and we'll get that content, created with love, back again.

It's sort of happening, with things like IPFS, but you do have to go looking for the good stuff .. just like the good ol' days ..

Nasrudith · on June 19, 2020

The operating system isn't needed for that at all, and blaming the vendors is fundamentally misplaced. Sending directly to other computers via direct IP connections is technically possible if you don't operate any firewalls or similar to block it. And of course non-fixed and shared IP addresses would complicate the system.

You would find out quickly why you don't actually want that when your own "unisystem" designated port server/client gets hammered with unsolicited requests and exploit attempts. Easy is a matter of interfaces, although other design choices would come with costs. Safe, however, would be far harder. If you set it to a whitelist, well, you lose discoverability instantly.
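
To make the whitelist trade-off concrete, here is a minimal sketch of what an allowlist check on incoming connections might look like. The names `ALLOWED_NETWORKS` and `is_allowed` are hypothetical, and the ranges are documentation-reserved example addresses, not anything from the comment above.

```python
import ipaddress

# Hypothetical allowlist: documentation-reserved example ranges, not real peers.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_allowed(remote_addr: str) -> bool:
    """Return True only if remote_addr falls inside an allowlisted network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Anything that does not parse as an IP address is rejected.
        return False
    return any(addr in net for net in ALLOWED_NETWORKS)
```

Anyone not already on the list is rejected before they can see anything, which is exactly the loss of discoverability described above.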

fit2rule · on June 20, 2020

I don't agree with your point at all. I believe the vendors are fundamentally to blame for the situation with the web today .. the walled garden was intentionally created to protect the bigger consumers.

And none of the issues you state as the reasons why we can't have nice things are actually valid reasons. OS Vendors could solve the problem of serving content from one's own local PC quite effectively - the issue is not the technology, but rather the ethics of the industry, which prefers to have massive fields of consumers to farm from..

inimino · on June 18, 2020

My only correction is that there was a lot of content out there! We didn't call it that, of course, because we're people and not corporations, so we just called it articles, blogs, rants and musings. A lot of it is still out there and a lot more is on the wayback machine!

MaxBarraclough · on June 18, 2020

> content for content's sake

I think there's some truth to this. Some junior developers make it a goal to be seen as a respected blogger, so they feel the need to write something, even if they have nothing to say.

izietto · on June 18, 2020

My theory is that the more companies you throw in the less humanity you can find

foolmeonce · on June 19, 2020

I think the problem is that hiring across the entire "intellectual worker" market is a market for lemons that specifically ruins anything of quality.

Some blogs were great (i.e. created to solve the problem of too much interest in what one is up to even to answer internal questions individually) and signaled a few great minds that can be hired at a discount.. Those engineers told also pretty good co-workers in stage 1, by stage ~3, managers tell their underperforming direct reports to blog whatever they understand about what their group is doing in the hopes that they (either improve or better yet:) become a burden somewhere else.

rawoke083600 · on June 19, 2020

I think you are right about the "high barrier" being a filter for good work, yes!

Also I miss the wide and wonderful designs and color schemes of the 90's :) Long before Bootstrap or "material design"!

dsparkman · on June 19, 2020

I so agree. I miss good old-fashioned text-based content. Simple, readable, and fast. There was a reason that the default links were blue and underlined. Visited links were purple. You could scan a site and quickly determine where you had been before.

tech-historian · on June 19, 2020

If you want the 90s web.. you can find it here:

https://www.versionmuseum.com/websites

TheMightyLlama · on June 19, 2020

The whole idea of SEO in order to get clicks for your advertising based revenue model feels “bad” to me. Content is created which is controversial because that will get the most eyeballs on page. The side effect is that we veer towards a broken society the moment we go down that route.

I had an idea of a search engine that allowed you to permanently remove domains or pages with certain keywords as a paid service.
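
A rough sketch of the filtering half of that idea, assuming the engine already returns a list of result URLs. The function name `filter_results` is made up for illustration, and keyword matching is done against the URL text only, as a stand-in for matching page content.

```python
from urllib.parse import urlsplit


def filter_results(urls, blocked_domains, blocked_keywords=()):
    """Drop URLs on a blocked domain (or subdomain) or containing a blocked keyword."""
    blocked = {d.lower() for d in blocked_domains}
    keywords = [k.lower() for k in blocked_keywords]
    kept = []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        # Match the domain itself and any subdomain of it.
        host_blocked = any(host == d or host.endswith("." + d) for d in blocked)
        # Assumption: keywords are checked against the URL, not the page body.
        keyword_blocked = any(k in url.lower() for k in keywords)
        if not host_blocked and not keyword_blocked:
            kept.append(url)
    return kept
```

The per-user blocklists would have to be stored somewhere, which is presumably where the paid service comes in.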

tikiman163 · on June 19, 2020

I think you have good points, but I would also add to it that the high barrier to entry also prevented people from being copy cats. A simple messenger was a major accomplishment, so when something worked there weren't 10,000 copies of it by the end of the week, so nearly all content you found was actually the original and not just some repost or copy paste job.

eloisant · on June 19, 2020

I feel like you can find better quality today. Sure you have to dig a little more.

But at the time it was so magical compared to pre-web where the only content you could find was professionally published magazines or books, and suddenly you had all this niche content about stuff that wasn't worth publishing, all in a few clicks.

pentae · on June 19, 2020

This. Often you'll find this low quality SEO content is just a result of people hiring low-cost copywriters often from developing countries, the 'uncanny valley' of written content

blackrock · on June 19, 2020

I miss the lack of ads.

But they had all those annoying pop ups and pop unders.

Nowadays, Google just tracks you everywhere you go. Even when you’re browsing incognito.

pull_my_finger · on June 19, 2020

Well you can do your part by not using google analytics/captcha/amp or other google spy services on your own sites.

beamatronic · on June 18, 2020

There was less copyright concern back then too. Remember “Make James Earl Jones speak”? Or the Hamster Dance?

MichaelMoser123 · on June 18, 2020

I remember that most of the early internet was under construction. The noise to signal ratio wasn't very high - but that might have been due to some universal constant of physics.

JKCalhoun · on June 19, 2020

My theory is that there was no "money in it" back then.

;-)

JohnBooty · on June 18, 2020

I can't wait for server-side rendering to take its place in the sun again.

There are many use cases for which a client-side framework like React is essential.

But I feel the vast majority of use cases on the web would be better off with server-side rendering.

And...

There are issues of ethics here.

You are kidding yourself to an extent when you say that you are building a "client-side web app." It is essentially an application targeted at Google's application platform, Chromium. Sure, React (or whatever) runs on FF and Safari too. For now. Maybe not always. They are already second-class citizens on the web. They will probably be second-class citizens of your client-side app unless your team has the resources to devote equal time and resources to non-Chromium browsers. Unless you work in a large shop, you probably don't.

Server-side rendering is not always the right choice, but I also do see it as a hedge against Google's, well, hegemony.

Polylactic_acid · on June 18, 2020

The less stuff on the server the better imo. Whenever I can get away with it I use a static site generator or I use vuejs with a json file containing all the data for the site. Being able to just drop a static set of files in to a webserver without any risk of security issues in my code is great. I also hate the tools for backend rendering: if you need any kind of interactivity, it becomes so much easier to have just built it all in vue/react, with no downsides other than not running in someone's CLI web browser.
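
For the static site generator route, a minimal sketch might look like the following. It assumes a hypothetical JSON data file holding a list of pages with `slug`, `title` and `body` keys. The file layout and the `build_site` name are illustrative, not any particular generator's format.

```python
import html
import json
from pathlib import Path

PAGE_TEMPLATE = """<!doctype html>
<html>
<head><meta charset="utf-8"><title>{title}</title></head>
<body>
<h1>{title}</h1>
<p>{body}</p>
</body>
</html>
"""


def build_site(data_file: Path, out_dir: Path) -> list[Path]:
    """Render one static HTML file per entry in the JSON data file."""
    pages = json.loads(data_file.read_text(encoding="utf-8"))
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for page in pages:
        # Escape the text so the data file cannot inject markup into the page.
        rendered = PAGE_TEMPLATE.format(
            title=html.escape(page["title"]),
            body=html.escape(page["body"]),
        )
        # Assumption: slugs come from a trusted file and are safe file names.
        target = out_dir / f"{page['slug']}.html"
        target.write_text(rendered, encoding="utf-8")
        written.append(target)
    return written
```

The output directory is plain files, so it can be dropped onto any web server with no code of your own running there.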

djha-skin · on June 19, 2020

At my old company they were moving from client-side to server side because they had 30 different clients -- Roku, desktop, smartphone, Xbox, etc. -- and all of them had to reproduce the same logic. At the time I left they were trying to centralize all of that logic on the server and then just put the lipstick on the pig at the end.

lawn · on June 19, 2020

Phoenix LiveView gives you excellent interactivity with backend rendering.

JohnBooty · on June 19, 2020

I desperately want to check that out when I have some free time.

(In other news, I would desperately like some free time)

ksec · on June 19, 2020

There are quite a few others inspired by LiveView. On ASP.Net, PHP Laravel, Python Django and Stimulus Reflex on Ruby Rails.

y-c-o-m-b · on June 19, 2020

> I use vuejs with a json file containing all the data for the site

Seconded. It makes hosting a breeze; especially when you can just throw it on GitHub Pages within minutes. I also like being dependent on only a text editor and a browser ... a combo still performant on just about any device from the last 20 years if not longer. No need to install full-featured IDEs and the various dependencies.

timwis · on June 20, 2020

> Being able to just drop a static set of files in to a webserver without any risk of security issues in my code is great.

What security risks are removed by using a client side app instead of a server side one?

Polylactic_acid · on June 21, 2020

No database, no code running on your system other than nginx which I trust a whole lot more than myself.

ashleyn · on June 21, 2020

None, you're just shifting the risk to the user.

peruvian · on June 19, 2020

There’s an exponentially growing recruiting, training, and hiring industry based on developers who only know how to write fat JS frontends. I think everyone has probably experienced high growth in their company’s frontend team. Many developers choose to write them not out of expertise but to have them in their resume and for job security.

JohnBooty · on June 19, 2020

    developers who only know how to write fat JS frontends

Yeah. This is often a big challenge for small companies / small teams.

Nearly every web app is now two apps, and it's increasingly infeasible for any developer to have a mastery of both backend and frontend stacks.

Not necessarily a terrible problem when you have dozens of developers, but a lot of dev teams are 1-3 people. Instead of web dev circa 2010 where you might reasonably have a 3-person team of people who can each individually work anywhere in the stack, now your already-small team is bifurcated.

In many ways this is an inevitable price of progress... 100 years ago you had one doctor in town and they could understand some reasonable percentage of the day's medical knowledge. Today, we can work medical miracles, but it requires extreme specialization.

At least in the development industry, we can choose not to pay that price when it's not necessary.

arcturus17 · on June 19, 2020

I don’t understand why members of a dev team back in the day would’ve been more capable of being full-stack than today...

The front/back divide also existed back then, with barely any possibility of a front person ever touching the back-end (a possibility that exists nowadays, without going into its merits or demerits).

For a reasonably ambitious and industrious individual nowadays it’s not unreasonable to become really good at one client and one server technology. There’s more to it than writing server-side code with HTML templating for presentation, for sure, but it remains well within the grasp of many people.

dsparkman · on June 19, 2020

It is because we understood the full-stack.

There were no "frontend" and "backend" developers. There were designers and developers. Designers created designs. They were usually delivered as PDFs, because the bulk of them came from print design backgrounds.

Their designs were then implemented by developers. Senior level developers tended to do more of the application level heavy lifting (server-side scripting & db), with junior level developers working on converting designs to html, then to templates. By the time a junior developer moved on to app code, they were well versed and had mastered HTML and all the weird edge cases. They knew HTML.

The first real wave of "frontend" and "backend" developers came on the scene when you had designers learn Flash. They started driving more complex applications and there was more bifurcation.

Granted even in small teams of the era, you had developers prefer "front" or "back". We tended to value "A jack of all trades is a master of none, but oftentimes better than a master of one".

dumb1224 · on June 19, 2020

I missed the era with Flash though. I remember the original Gorillaz website, which was a full Flash-based game / exploration / gallery type of thing. It was so cool, and I've not seen anything like it since. I ended up spending hours and hours on that site... extremely patiently, because of the slow internet speed at that time in my college dorm in China (2002-ish I think?). The quality was astonishingly good.

slightwinder · on June 19, 2020

> I don’t understand why members of a dev team back in the day would’ve been more capable of being full-stack than today...

Because the stack in the old days was smaller and the level of quality was far lower. In the '90s there was no CSS and JavaScript, nor did anyone care about accessibility, multiple languages, security or interactive features. In the old days we had real documents with links and simple structure, not apps and mature frameworks. It was something on the level you get today with simple markup like Markdown or org-mode.

te_chris · on June 19, 2020

Phoenix LiveView solves this and the overhead of Ex is much lower than the overhead of maintaining separate codebases and APIs, in our experience.

arcturus17 · on June 19, 2020

Many of us also choose to write it because it’s fun and the results can be excellent.

esperent · on June 19, 2020

Nobody denies React and co. are excellent tools. The frustration occurs when people who enjoy these tools claim that, if you choose not to use them, or suggest they're not fit for a certain task, you're WRONG. In the case of React in particular, this happens a lot. Then of course the people who don't want to use a framework feel frustrated, and next thing you know it's a flame war. I've seen this happen multiple times on Twitter.

JustSomeNobody · on June 19, 2020

This happened when Mongo started getting popular, too. Mongo supporters would get nasty if you suggested using a relational database. That, eventually, cooled off. Hopefully this "React or nothing" phase will too.

iphone_elegance · on June 19, 2020

I run into very few people who claim this.. oh you mean on Twitter? well there's the problem.

esperent · on June 19, 2020

Twitter is where a large percentage of web devs hang out. If something is problematic, you can't dismiss it as "but that's just Twitter". If it's a common mode of discourse on Twitter, then it's legitimately a part of developer culture.

In any case, discussions around front end frameworks and especially React are scarcely any better here. Although they are usually politer at least.

whywhywhywhy · on June 19, 2020

> Twitter is where a large percentage of web devs hang out

This isn't true, Twitter is a place where a large percentage of Twitter users who are interested in web development talk about it.

Twitter is not a lens on the entire internet, it is a lens on a bubble of a bubble.

globular-toast · on June 19, 2020

> Twitter is where a large percentage of web devs hang out

Source?

tpmx · on June 18, 2020

Back in the 90s I was working on a commercial server-side rendering "content management solution". Revision control and team workflow (with quite nice conflict resolution UIs). Templating using XSLT (sadly, because it's insane). Super expressive and solid stuff in general though. We used the actual cvs binary as a version control backend.

I left this part of the business space for the browser business at this time, but I assumed that the server-side rendering stuff would keep evolving. It didn't.

Then the "PHP CMS" wave came, and dumbed everything down.

Then the "reduce the server to an API" and "let insanely complex javascript frameworks/apps re-invent the browser navigation model" came... and here we are.

techhackblob · on June 19, 2020

It's been a ride watching all this abstraction and design patterns make everything more complex

untog · on June 18, 2020

I think there's a lot of truth in what you're saying and the core problem is that we somehow decided there was one correct way to make a web site, and that was to use React.

Are you creating a complex webapp? Use React. Go nuts! But are you making a mostly static page (blog, marketing site, whatever)? Then don't use React. It adds entirely unnecessary bloat and complication.

gpapilion · on June 18, 2020

The industry has gone back and forth here forever between thick and thin clients, and I view this as an extension. Largely we all use thick clients now (PCs, phones, and things with way too much compute power), and the move to Chrome or Chromium-based browsers made the behavior predictable. The pendulum swinging back is really an acknowledgement that the advantages provided by client-side rendering don't always outweigh the networking costs. Data visualization is one area where I wonder whether the JavaScript methods provide a real advantage over server-side rendering.

zozbot234 · on June 18, 2020

Except that phones have only gotten thinner and thinner since the 1990s... And PCs (laptops, at least) are not far behind.

alphazino · on June 18, 2020

https://wikipedia.org/wiki/Thin_client https://wikipedia.org/wiki/Fat_client

jacobr · on June 19, 2020

If I only evaluated our tech stack on a pure “what’s the most efficient way to deliver HTML to the user” basis, I’d not choose React, but I don’t know of any other framework that ticks the other boxes required in a large organization.

- Code sharing - how do you share reusable snippets of code that include both the SSR logic and the JS that is also required on the frontend? React’s component model is fantastic here, since a team can develop a component independently

- Skill set - getting 50 React developers to write HTML and JS should be fine if other problems were solved, but often the suggested solution is obscure things like Elm or Elixir

- Even if most of what a company builds is static marketing content, other parts can be more app-like and having developers be able to share code and use the same basic technology is a great productivity booster

rudolph9 · on June 19, 2020

https://nextjs.org/ is a React wrapper framework that has some nice SSR features. It makes for a uniform dev experience where the majority of the app can be rendered server or client side. When I last worked with it, about a year ago, only the first page load was rendered server side, but the first load is the one that arguably affects the user experience the most.

DrFell · on June 19, 2020

Despite the literal phrasing, server-side rendering, as in SSRing a SPA, is not even close to what was happening in the 90's. That was much simpler, pure, and fun.

pdxandi · on June 19, 2020

I love Vercel (formerly nextjs). It's such a great platform and their free tier services do a lot of the heavy lifting of configuring a website and managing deployments.

Kingdutch · on June 19, 2020

Slight correction to avoid confusion between Open Source product and PaaS(?) company in people's searches: Next.js the product is still named Next.js. The company behind it, Vercel, was formerly Zeit :) Their command line tool "Now" has been renamed to Vercel.

bob1029 · on June 19, 2020

I am aggressively pursuing a universal 100% server-side UI framework (not just targeting the web). If AMD has demonstrated anything to me, it's that the server is the place to do as much work as you can. Offloading work to client devices feels like stealing to me at this point. Websites are very abusive with your resources.

chooseaname · on June 19, 2020

> Offloading work to client devices feels like stealing to me at this point.

Kinda is. It makes my mobile work harder and thus uses more battery.

s_y_n_t_a_x · on June 19, 2020

React isn't bound to the web, there are quite a few React Native targets (ios, android, macos, windows, qt, gtk).

I recommend React when making an app, web or otherwise.

I recommend vanilla HTML + CSS with optional JS when making a website.

purerandomness · on June 18, 2020

I recently watched the "Helvetica" documentary that was posted here a few days ago [0], where they briefly mention "Grunge Typography" [1], a seemingly dead-end branch of typography that, for some strange reason, became pretty popular for a short period of time.

After some years, however, a consensus formed amongst designers that what they had created was a pile of illegible garbage, and they realized there was no other way than to completely dismiss that branch, go back to the roots, and evolve from a few steps back.

I feel the same kind of consensus is slowly forming around ideas like SPAs, client-side rendering and things like CSS-in-JS.

We saw the same happen with NoSQL and many other ideas before that.

We recently deployed an entire SaaS only using server-side rendering and htmx [2] to give it an SPA-like feel and immediate interactivity where needed. It was a pleasure to develop, it's snappy and we could actually rely on the Browser doing the heavy lifting for things like history, middle click, and not break stuff. I personally highly recommend it and see myself using this approach in many upcoming projects.

[0] https://www.hustwit.com/helvetica/

[1] https://www.theawl.com/2012/08/the-rise-and-fall-of-grunge-t...

[2] https://htmx.org/ (formerly "Intercooler")
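
For readers who have not seen the pattern, here is a minimal sketch of server-side rendering with htmx-style fragment updates, written as a plain WSGI app. It is not the SaaS described above. The `/clock` route, the `app` and `render_clock` names, and the `/static/htmx.min.js` path are all assumptions for illustration. Only the `hx-get`, `hx-target` and `hx-swap` attributes come from htmx itself.

```python
import html
from datetime import datetime, timezone

FULL_PAGE = """<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>Server-rendered demo</title>
<!-- Assumption: htmx is served from your own static path (not handled here). -->
<script src="/static/htmx.min.js"></script>
</head>
<body>
<div id="clock">{fragment}</div>
<button hx-get="/clock" hx-target="#clock" hx-swap="innerHTML">Refresh</button>
</body>
</html>
"""


def render_clock() -> str:
    """Render the small HTML fragment that htmx swaps into the page."""
    now = datetime.now(timezone.utc).strftime("%H:%M:%S UTC")
    return f"<span>{html.escape(now)}</span>"


def app(environ, start_response):
    """WSGI app: '/' returns the full page, '/clock' returns only the fragment."""
    path = environ.get("PATH_INFO", "/")
    if path == "/clock":
        body = render_clock()
    elif path == "/":
        body = FULL_PAGE.format(fragment=render_clock())
    else:
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"not found"]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]
```

To try it locally, something like `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` should do. The first render works without any JavaScript, and the button only asks the server for a fresh fragment.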

hoorayimhelping · on June 18, 2020

>We recently deployed an entire SaaS only using server-side rendering and htmx [2] to give it an SPA-like feel and immediate interactivity where needed. It was a pleasure to develop, it's snappy and we could actually rely on the Browser doing the heavy lifting for things like history, middle click, and not break stuff.

I'm glad we're coming back to server side rendering with some JavaScript for interactivity, but from about 2005 - 2015 this was simply known as web development. You didn't need to worry about breaking the back button or middle mouse or command clicks because they just worked.

I feel like with React, we made the actual JavaScript code simpler at the expense of everything else in the system becoming way more complex.

yoz-y · on June 18, 2020

Depends on what you build. I find a REST server + client side rendered frontend much simpler to grok than a server side rendered page. Mostly because the separation between UI and data is really clear and all of the CLI interfacing comes free as well. There is certainly a way to split this well with SSR, but it's also easier to fall into the trap of tightly coupling the parts.

TheRealPomax · on June 19, 2020

Sure, but you can just as easily fall into the trap of massively over-engineering your client-side rendering because you just grab packages off the shelf for every little thing instead of going "it's a little bit more work, but there is no reason to do any of this when the browser already does it". That is how you end up with a 100 kB bundle for what is effectively just a standard form, one that would work better, with less code, if it just used a normal form with built-in validation for almost everything, with the final acceptance/rejection determined by the API endpoint.

It really depends on which SSR approach we're comparing to which client-rendering approach, and who you're optimizing for.
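
As a sketch of that split: the browser's built-in checks (for example the `required` attribute and `type="email"` on the inputs) handle the first pass, and the endpoint makes the final call. The field names and the `validate_signup` function below are hypothetical, and the email pattern is deliberately loose.

```python
import re

# Loose sanity check only; the browser's own validation is the first pass.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_signup(form: dict) -> dict:
    """Final accept/reject decision for a submitted form, made on the server."""
    errors = {}
    name = form.get("name", "").strip()
    email = form.get("email", "").strip()
    if not name:
        errors["name"] = "Name is required."
    if not EMAIL_RE.match(email):
        errors["email"] = "Enter a valid email address."
    return {"ok": not errors, "errors": errors}
```

The server-side check has to exist either way, since client-side validation can always be bypassed. The open question is how much code is shipped to duplicate it in the browser.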

dewey · on June 18, 2020

I have fond memories of creating images with Grunge fonts in some pirated copy of Photoshop and then positioning them with HTML tables and Dreamweaver.

themodelplumber · on June 18, 2020

Those were good times in a lot of ways!

Also I may be an outlier, but IMO grunge as a textural expression still benefits lots of contemporary design projects. In fact if you know how to work within broader principles of design, maybe you stop caring as much about what's current, because that's just one of many outcomes that may or may not be appropriate for the message...