My theory is that the high barrier to entry of online publishing kept all but the most determined people from creating content. As a result, the little content that was out there was usually good.
With today's monetized blogs, it is often content for content's sake. People don't try, or they write about topics they aren't really interested in just to have a new post. Or the writing is simply bad.
Maybe today's problem isn't the blogs, but the SEO that puts the crap blogs at the top of the search results. Or maybe I'm misremembering and the old content was crap too, or maybe my standards are higher than they were in my teenage years.
This is where directories come back in. Check some of these out:
* https://webring.xxiivv.com/ (which led me to this gem: https://dreamwiki.sixey.es/)
Competing with Google in search has become an insurmountable task. Personal directories attack from the opposite direction (human curation, no algorithm) in a way that actually puts Google far behind. It's kind of exciting and unexpected.
But what you hint at might be more correct these days. They are running a reverse Wayback Machine, in that anything not changed in the last year gets removed. If you click through to the advanced search, it's "updated within", and the max timeframe is a year.
In fact it seems the date range example doesn't even work: https://developers.google.com/custom-search/docs/structured_...
If I fiddle with it, it returns a result, but I see a hit from just a few days ago at the top...
Sometimes I wish that were true! Try Googling for, say, PostgreSQL documentation and the top result will often be for a 10-year-old version of the software.
Why is it that Google is thinking the older page is more relevant? Does PageRank outrank content (and Google is oblivious to similar pages that have different versions?)
They did that for a long time, but some years ago the index grew so big that they started restricting it. I think the general timeframe is 10 years or less since the last update.
> If you click the advanced search its "updated within" and the max timeframe is a year.
Because it makes no sense to go further. For older content you can define individual date ranges. And yes, it works fine for me. I tested a search for 2015 just now; the first page had entries all from 2015.
> In fact it seems the date range example doesn't even work: https://developers.google.com/custom-search/docs/structured_....
None of those examples work. Wasn't custom search retired some years ago?
Google has conditioned us into thinking that "an algorithm that automagically separates the wheat from the chaff" is the only way to do things. It worked for them for a while, but the adversarial forces of marketing, spam, malware, etc. are very creative and fast-moving, and that's a lot for an algorithm to constantly reckon with, so best case it'll probably stay a stalemate.
But that's not the only way things can be.
Since Twitter is their preferred platform, go put the activity of journalist Twitter accounts into a relational DB and start searching for who always boosts whom. You'll find patterns. Of course there's nothing inherently wrong with this, but at the end of the day I don't need to know what a dozen NY Times journos think of a NY Times oped which is clearly written in bad faith, pushing a false narrative about a particular news event.
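As a rough sketch of that kind of analysis (the schema, table name, and all account names here are invented for illustration), a few lines of SQLite are enough to surface who boosts whom most often:

```python
import sqlite3

# Hypothetical "boosts" table: one row per retweet/boost, (booster, boosted).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE boosts (booster TEXT, boosted TEXT)")
conn.executemany(
    "INSERT INTO boosts VALUES (?, ?)",
    [("alice", "bob"), ("alice", "bob"), ("alice", "bob"),
     ("carol", "bob"), ("alice", "carol")],
)

# Who always boosts whom: count each (booster, boosted) pair, biggest first.
rows = conn.execute(
    """SELECT booster, boosted, COUNT(*) AS n
       FROM boosts
       GROUP BY booster, boosted
       ORDER BY n DESC"""
).fetchall()
print(rows[0])  # ('alice', 'bob', 3) -- the tightest boost pair
```

With real data you'd load the accounts' timelines instead of the toy rows, but the query stays the same.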
Non-gameable ultimately means people who influence the results can have no monetary interest in the results.
That search engine is like a gold mine! I searched for "black people love us" in DDG and it was the first result, followed by an article written this year explaining how it came about, and... the web felt like such a smaller place back in 2002, and I just don't remember this at all.
Which makes me wonder how newspapers and free-to-air TV kept the culture pretty shallow before it exploded with the internet, and now, possibly, it's contracting again as our filter bubbles shrink? Just an errant thought.
But so much novelty and interesting stuff at wiby.me - search for 'trump' and the first result is just surreal.
Facebook also used to have RSS feeds for public pages and posts. Now they've not only removed that feature but also placed heavy restrictions on third-party apps.
Services connected to the Fediverse, an alternative framework that focuses on connectivity, are slowly growing. It's only a matter of time before they are more successful than the walled gardens.
I had a search today, and 7 of the top 10 results were from today. What I was looking for was NOT news, it was historical. If I wanted news, I would click the news tab. Having 7/10ths of the results come from today makes using Google to search the whole web, across all time, nearly useless, as today's noise is noisier than ever.
I don't even care whether they're defaults or not, but buttons to "exclude big sites", "exclude the news", or "exclude fresh results" would make search so much better.
What would be really cool would be if Google could show what the results for a search looked like on a given day. Not the current algorithm, not any sites indexed since then, but what it looked like at the time. Going back to use 2010 Google would be a dream.
The other way would be to have the index and algorithm versioned, where you can target any instance of the algorithm against any version of the data.
I am sure it’s technically possible going forward, but it would be interesting if such capabilities could be enabled for historical versions of the index and algorithm. Combined with anonymized historical zeitgeist data, some interesting digital archaeology could be attempted.
All the more reason to run your own crawler! What’s the state of the art for this area right now in self hosted solutions? Can you version your index and algorithm like we’re discussing and do these kinds of search-data time-traveling?
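As a minimal sketch of the versioning idea above (all snapshot dates, sites, and scores are made up), you can keep dated snapshots of an inverted index alongside interchangeable ranking functions, so any algorithm version can be run against any index version:

```python
# Toy versioned index: inverted-index snapshots keyed by date.
# Each term maps to {site: link/score count}. All data is invented.
snapshots = {
    "2010-06-01": {"postgres": {"old-blog.example": 3, "docs.example": 1}},
    "2020-06-01": {"postgres": {"seo-farm.example": 9, "docs.example": 2}},
}

def rank_by_count(postings):
    # "Algorithm v1": highest score first.
    return sorted(postings, key=postings.get, reverse=True)

def rank_alphabetical(postings):
    # Stand-in for a second algorithm version.
    return sorted(postings)

def search(term, date, algorithm):
    # Time-travel search: pick an index snapshot AND an algorithm version.
    return algorithm(snapshots[date].get(term, {}))

print(search("postgres", "2010-06-01", rank_by_count))  # "2010 Google"
print(search("postgres", "2020-06-01", rank_by_count))  # today's results
```

A real self-hosted crawler would store snapshots on disk rather than in a dict, but the separation of index version from algorithm version is the whole trick.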
Though document fingerprinting is hard. Especially w/ fungible page elements.
Internet Archive has an angle here.
Are you referring to WARC type tooling or what? I don’t want to put words in your mouth. I’m a complete learner on this topic. I think gwern has written a bit about this broadly? I’m curious to know more about this, if you have time to share more.
We have them, they suck.
> and focused on indexing the long tail of insightful content that is neglected by Google because it lacks SEO
How would you even define that? SEO is changing all the time, and Google is fighting it all the time.
And how would you prevent SEO from focusing on that new search engine? If it becomes big enough, people will optimize for it.
Then we might go back to something sort of interesting.
I wonder whether doing a parallel search on Google and filtering their top results out of your own results would be a feasible solution? Add a filter on the top 500 websites, and on whether known ad sources are used, and you might slowly get there.
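A bare-bones version of that filter (all domains here are hypothetical placeholders) might just drop any result whose domain shows up in Google's own top results or on a top-500 list:

```python
from urllib.parse import urlparse

# Hypothetical blocklists: a top-500-sites list and Google's top results.
top_500 = {"bigsite.example", "megaportal.example"}
google_top = ["https://bigsite.example/page", "https://blog.example/post"]
google_domains = {urlparse(u).netloc for u in google_top}

def long_tail(results):
    """Keep only results whose domain Google doesn't already surface."""
    blocked = top_500 | google_domains
    return [u for u in results if urlparse(u).netloc not in blocked]

mine = ["https://bigsite.example/other",
        "https://tiny-gem.example/essay",
        "https://blog.example/post"]
print(long_tail(mine))  # only the small, unranked site survives
```

The ad-source check would be a second pass over each page's content, but the domain-subtraction step alone already biases results toward the long tail.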
Maybe instead of a smart search engine it would be better to build a dumb one which gives access to all the metadata of a page too, and allows people to optimize for themselves. Full text alone is not the only relevant input for good results. Google knows this and uses metadata, but exposes very little of it to the end user.
Still, I guess that’s only viable because Google rewards lots of links. If you just disable link relevance that part of gaming the system will be gone too.
Then, the more distinct queries a given website ranks for (i.e. the more SEO battles it wins/the more generally optimal it is at “playing the game”), the less prominently any individual results from said website would be ranked for any given query.
So big sites that people link to for thousands of different reasons (Wikipedia, say) wouldn’t disappear from the results entirely; but they would rank below some person’s hand-written HTML website they made 100% just to answer your question, which only gets linked to on click-paths originating on sites that contain your exact search terms.
This would incentivize creating pages that are actually about one particular thing; while actively punishing not just SEO lead-gen bullshit; not just keyword-stuffed landing pages we see in most modern corporate sites; but also content centralization in general (i.e. content platforms like Reddit, Github, Wikipedia, etc.) while leaving unaffected actual hosting by these platforms, of the kind that puts individual sites on their own domains (e.g. Github Pages, WordPress.com, Tumblr, specialty Wikis, etc.)
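A tiny sketch of that ranking rule (site names, scores, and the log damping factor are all invented for illustration): discount each site's per-query score by how many distinct queries it already wins.

```python
import math

# Hypothetical counts of distinct queries each site ranks for.
queries_won = {"wikipedia.example": 100_000, "handwritten.example": 3}

def adjusted(site, raw_score):
    # Log damping: a site winning 100k queries isn't 100k times worse,
    # but it does drop below a page that exists only for this topic.
    return raw_score / math.log2(2 + queries_won.get(site, 0))

# Raw relevance for one query: the big site actually scores higher...
results = {"wikipedia.example": 0.9, "handwritten.example": 0.8}
ranked = sorted(results, key=lambda s: adjusted(s, results[s]), reverse=True)
print(ranked)  # ...but the single-purpose page ranks first after damping
```

Big sites still appear (the discount is logarithmic, not a ban), which matches the goal above: demote centralization without erasing it.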
A fun way to think of this is that it’s similar to using a ladder ranking system (usually used for competitive games) to solve the stable-marriage problem on a dating site.
In such a system, you have two considerations:
• you want people to find someone who’s highly compatible with them, i.e. someone who ranks for their query
• you want to optimize for relationship length; and therefore, you want to lower the ranking of matches that, while theoretically compatible, would result in high relationship stress/tension.
Satisfying just the first constraint is pretty simple (and gets you a regular dating site.) To satisfy the second constraint, though, you need some way of computing relationship stress.
One large (and more importantly, “amenable to analysis”) source of relationship stress, comes from matches between highly-sought-after and not-highly-sought-after people, i.e. matches where one partner is “out of the league of” the other partner.
So, going with just that source for now (as fixing just that source of stress would go a long way to making a better dating site), to compute it, you would need some way to 1. globally rank users, and then 2. measure the “distance” between two users in this ranking.
The naive way of globally ranking users is with arbitrary heuristics. (OKCupid actually does this in a weak sense, sharding its users between two buckets/leagues: “very attractive” and “everyone else.”)
But the optimal way of globally ranking users, specifically in the context of a matching problem, is (AFAICT) with IDF(PageRank): a user’s “global rank” can just be the percentage of compatibility-queries that highly rank the given user. This is, strictly speaking, a measure of the user’s “optionality” in the dating pool: the number of potential suitors looking at them, that they can therefore choose between.
If you put the user on a global ladder by this “optionality” ranking; and normalize the returned compatibility-query result ranking by the resulting users’ rankings on this global “optionality” ladder; then you’re basically returning a result set (partially) optimized for stability-of-relationship: compatibility over delta-optionality.
All this leads back to a clean metaphor: highly-SEOed websites—or just large knots of Internet centralization—are like famous attractive people. “Everyone” wants to get with them; but that means that they’re much less likely to meet your individual needs, if you were to end up interacting with them. Ideally, you want a page that’s “just for you.” A page with low optionality, that can’t help but serve your particular needs.
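The optionality idea above can be sketched in a few lines (all people and compatibility scores are invented; real systems would need far more signals): global rank is the fraction of everyone's queries that rate a candidate highly, and match scores are discounted by the optionality gap.

```python
# compat[a][b]: how well candidate b matches seeker a's query (0..1).
compat = {
    "ann": {"bo": 0.9, "cy": 0.8},
    "di":  {"bo": 0.9, "cy": 0.2},
    "el":  {"bo": 0.8, "cy": 0.1},
}

def optionality(person):
    # Fraction of all seekers whose queries rank this person highly:
    # the IDF(PageRank)-style "global ladder" position.
    hits = sum(1 for prefs in compat.values() if prefs.get(person, 0) >= 0.5)
    return hits / len(compat)

def stable_score(seeker, candidate):
    # Compatibility minus delta-optionality: penalize "out of your league".
    gap = abs(optionality(candidate) - optionality(seeker))
    return compat[seeker][candidate] - gap

# "bo" is everyone's top pick (optionality 1.0); "cy" is overlooked.
# For "ann", the in-demand "bo" gets discounted below the compatible "cy".
print(stable_score("ann", "bo"), stable_score("ann", "cy"))
```

Mapped back to search: "bo" is the highly-SEOed mega-site, "cy" is the hand-written page that exists just for your query.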
Maybe you could try to make a model SEO article for your own search engine: take your existing SEO-heavy results and figure out which features contribute the most to their ranking, then filter out results that score highly on those features. Rinse and repeat as SEO writers try to step up their arms race, but they should always end up being foiled by your changes to the search engine, as long as you re-optimize your own model SEO article regularly.
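A crude version of that loop (the marker list and threshold are invented stand-ins for whatever features your model actually learns) just scores pages on known SEO tells and drops anything too optimized:

```python
# Hypothetical SEO tells; a real model would learn these from data.
SEO_MARKERS = ("best", "top 10", "ultimate guide", "in 2020")

def seo_score(title):
    """Count how many known SEO markers appear in a title."""
    t = title.lower()
    return sum(marker in t for marker in SEO_MARKERS)

def filter_results(titles, threshold=1):
    """Keep only pages below the 'too optimized' threshold."""
    return [t for t in titles if seo_score(t) < threshold]

pages = ["Top 10 Best Ultimate Guide to Widgets in 2020",
         "Notes on widget internals"]
print(filter_results(pages))  # the keyword-stuffed page is dropped
```

The arms race part of the comment corresponds to periodically regenerating `SEO_MARKERS` from whatever currently ranks too well.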
Instead every algorithmic content delivery platform, from Google to Twitter to Youtube to Facebook, is constantly chasing just to keep from being underwater against the spammers.
Less than ads and subscriptions, but enough to fund a lot of crap.
Maybe, maybe not. How useful is Google (or search in general) to you?
For me search is more of a convenience tool than a way of finding sites that have information. There are questions I need answered that I wouldn't have an easy time figuring out without search (e.g. "how many cups are in a pint?"). Sometimes I want opinions, but I'm almost always going to the same sources. Sometimes I use search because I'm too lazy to click around using a site's own search. The only things that are actually useful for me from search are specific expert knowledge that I want in a structured manner (e.g. "what do I need to consider when buying a house?"), and those queries are incredibly few.
I feel like search is slowly becoming irrelevant
I think I use google in the same way as you.
Most of the time, I could go to the websites directly (MDN, Stackoverflow, HN), but sometimes I'm trying to find something I don't know about, by trying different terms. I usually do this when I want a particular product but I don't know what it's called or what it is "smallest itx case without gpu", "midi router no power supply", "waterproof tarp diy tent setup".
I installed a browser extension called uBlacklist as recommended by someone here a couple of weeks ago, so now 90% of the time Google search is like my Ctrl-P for MDN, Stackoverflow, etc since I've managed to filter out the sites I don't want to see results from.
2. An algorithm can always be gamed.
The content on the sites I visit is created by humans. Until automation genuinely overtakes us, I'm not ready to accept at face value the scale of the internet has grown so large that humans couldn't tackle the problem.
All I can say is, good luck with your human curation startup.
If there were orders of magnitude more pages than humans, I'd agree. But I'd also ask: who created them all?
It's not easy to quantify the amount of useful content on the internet. The 2bn figure above seems to stem from registered domains, and depending on who you ask, around three quarters of them are "inactive" (e.g. a landing page for a parked domain).
At the other end of the spectrum, Google's index surpassed 130 trillion pages four years ago. Point in favour of my opponent!
If everyone connected to the internet indexed one page a day over the course of their lifetime we might just about do it. And anyone creating a new page would need to [arrange to] index it themselves.
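The arithmetic behind that claim roughly checks out, using round figures from the thread (130 trillion indexed pages, and assuming ~4.5 billion people online):

```python
# Rough feasibility check: pages per person, and how long one-page-a-day takes.
pages = 130e12                         # Google's index, per the figure above
people = 4.5e9                         # assumed internet population
pages_each = pages / people            # ~29,000 pages per person
years_at_one_per_day = pages_each / 365
print(round(pages_each), round(years_at_one_per_day))  # 28889 79
```

About 79 years at one page a day per person: "just about do it over the course of a lifetime" is right on the nose, with no slack for growth or churn.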
Also, any setup that allows everybody to be a moderator will be promptly gamed.
What you want is what Yahoo used to do: a hand-curated search engine. It worked when the internet was small, but got buried under the eventual avalanche of web sites.
- Using DNS "zone files", the DNS databases for TLDs (which are not available for all TLDs, but most) show there are circa 200 million domains registered at any moment
- A large percentage of these are parked, i.e. no unique content.
- Many domains are "tasted", i.e. bought, are alive for a few days then disappear, so potentially you waste time crawling them
- Lots of sites are database driven and can result in millions of pages that can be created in a day
- URL rewriting means you can have an almost infinite number of pages on any one site
- Soft 404s and duplicate content can be hard to spot and can waste resources in gathering/removing them
There are paid-for resources like Majestic/Ahrefs/Moz that crawl the web to see who's linking to whom, and they all contain trillions of URLs.
I think the most detrimental fact is that pages often disappear or change, I don't have a recent number but I'm fairly certain there's a 10-15% chance that any link you see this year, will be gone next year. "Link rot". Hard to build a DMOZ style directory on that scale with that problem.
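Compounding the thread's own 10-15% annual figure shows how fast a hand-built directory decays:

```python
# Fraction of a directory's links still alive after `years` at a given
# annual link-rot rate (simple compounding, rates from the comment above).
def surviving_fraction(annual_rot, years):
    return (1 - annual_rot) ** years

print(round(surviving_fraction(0.10, 5), 2))  # 0.59 alive after 5y at 10%/y
print(round(surviving_fraction(0.15, 5), 2))  # 0.44 alive after 5y at 15%/y
```

Roughly half of a DMOZ-scale directory's links would be dead within five years, which is why curation at that scale needs constant re-verification, not a one-time build.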
I don't think it is unmanageable, it just needs to be seen from different perspectives and managed by different groups of people.
You only need to choose an algorithm so that the amount of trash is less than you can handle.
Those article directories were eventually murdered by Google within the blink of an eye maybe a decade ago, and quite frankly, on any given topic nowadays it's way easier to find good content rather than SEO filler. Google's algorithms nowadays favor fresh (i.e. published or updated recently), long content over short, "popular" (i.e. lots of links pointing at it), duplicated content.
Instead, most stacks are designed to enforce a walled garden, from which very little is shareable unless you go through the gateway (the web client app) to some approved destination.
Semantic web and total availability at the personal-computer level, are aspects of OS design I wish had been paid more attention.
Basically what we have now is a very, very expensive system of dumb terminals.
As an example, go look at the tags on SoundCloud. People tag their songs with whatever they think will get people to take a listen.
There are plenty of trusted sources that would not spam their pages in this way. And if you spam too much, you risk getting dropped from search indexes, directories, links etc. because your source is just not useful.
It's not as if the semweb people weren't aware of the problem, and it's not self-evidently a fatal blow to the idea.
There are so many possible viable methods for ranking search results! Particularly now with higher level textual analysis using AI/ML/[buzzword], and perhaps more importantly, the resurgence of interest in curated content. People are getting better at discerning curated-for-revenue vs. curated-with-love.
Would you speak to why you think this way about PageRank? What are its shortcomings?
To me, who only paid surface-level attention to this, it seemed like Google results were best when PageRank was the dominant metric. As they moved more and more in the direction of prioritising news, commerce and the aspects we call SEO, “number of links pointing TO the resource” became less and less important in the ranking. And as that happened, the quality of results dropped, and the content silo-ing rose up.
PageRank was peer-review. SEO is “who shouts the loudest”.
As far as I can tell, the main reason Google succeeded was that other search engines let advertisers buy placement for keywords (and didn't label paid links). I heard from an industry insider who was able to strip the paid links that the engine they worked on then gave results very similar to Google's.
The second big reason was that PageRank was a useful signal that hadn't already been gamed to the point of uselessness. I think this let a tiny team blindside an entrenched industry.
That’s not to say there’s no technical insight behind the page rank algorithm, but it was only a useful signal for a few years.
It got to the point Altavista became more or less useless, and when Google showed up on the market they quickly took it over.
Seems the time is ripe for a new revolution. Doesn't have to be a better search engine, could be something completely different.
Simply having half a dozen or more search engines per country/language, with their own indexes and algorithms should help see the web more fully.
ATM in English, Bing and Google have the largest indexes and Mojeek has its own index though smaller. DDG, Ecosia and others are just the Bing index re-ordered.
I enjoyed the OP, though. And I think niche directories/blogrolls would be progress. The current centralised web is a result of everyone dancing to the tune/rules of the large platforms.
The problem with decentralization is that it creates a power vacuum that is filled by the most interested actors. Even Bitcoin, with its decentralization-by-design, is actually centralized to a handful of miners in China.
If the goal is to rebuild something that is anything else other than profit based, you need to make sure the organization running it is strictly non-profit.
Also I like the idea of making search like Wikipedia where people can edit results. Obviously you’d need super genius level safeguards to protect against scammers but Wikipedia does it ok-ish.
- Directories are Communities defined by their link rules
- Easy to start a new one
- SSO across all communities
- Built-in forum technology
- Unified comment technology for every website
You can get communities like reddit.com/r/Sizz for instance or larger ones like /r/esp8266 or massive ones like /r/sanfrancisco or planet-sized ones like /r/pics. And reddit itself plays the role of a meta-directory, with little directory networks (SFW-Porn being "the pretty pictures of the world" directory network) sitting between Reddit itself and the subreddits.
Reddit is an amazing amazing thing.
- Most subreddits are hostile to self-promotion of Web stuff. If you're unknown, you're going to have to be socially involved there enough that people know your name. (Though I agree that small subs like /r/esp8266 are begging for self-made content.)
- Related: people don't know your name because it's in small gray text. You're just another comment.
- You need upvotes. A personal directory requires only one upvote.
- You need to be on-topic. Your work may not fit Reddit's categories.
- Reddit mods are generally more like forum mods than librarians.
- Agreevotes do not equal quality. I don't want to overstate this, but I like that I'm not seeing vote counts on personal directories.
Reddit is cool - but it has its own rules and its own culture that goes with it. I personally wouldn't call it 'the modern web directory'. I do think it's less hostile to the Web than many other platforms - and certain subs like /r/InternetIsBeautiful and /r/SpartanWeb do good work.
Great! I'm there for the content, and self-promotion is usually the worst content.
I like that HackerNews tags this stuff with ShowHN so I can decide whether I want to look at it or not.
Even the communities I care about have limited utility.
The annoying part is that not only does a sub mod get the privilege of deleting posts at will, but there is no appeals process for that, and Reddit doesn't listen to suspension appeals at all.
Aside from that, the downvote/upvote culture is bad (even if I am usually in its favour) and encourages dogpiling and groupthink. Ironically, Reddit with its "don't downvote for disagreement" element of Reddiquette is worse at this than HN with its "downvoting for disagreement is fine" policy.
The site redesign, infinite scroll, a handful of mods controlling many major subs, nonsensical or inconsistent administration and rules, unjust dishing out of punishment, advertisements in the main feed, widespread outrage bait, and endless drama mean Reddit is no different to platforms like Twitter, where short, witty and possibly fallacious content thrives.
I can count on one hand the number of times I've actually valued information I've obtained from Reddit comments or submissions. That does not justify the amount of time and energy I've poured into the website, which I could have better spent simply not using social media (which Reddit now is). These days, if I had to use social media, I'd pick the Fediverse over Reddit every day of the week. It takes a lot of time to realize that highly upvoted comments (and every comment) are really just "someone's opinion, man" which the current zeitgeist dictates people will agree with.
On Reddit, politics is entertainment (r/PublicFreakout etc.), mocking and hate is central (r/SubredditDrama, r/unpopularopinion), administration is done in the interest of advertisers and personal opinions (phrases such as "weird hill to die on"), and moderation attracts people who would rather bask in the power afforded to them more than people who would rather carefully curate and foster discussion. The one sub which works to its purpose is r/ChangeMyView.
Spending time arguing with random people on the Internet is mentally taxing, very unlikely to achieve a change in opinion of the persons involved or the observers, and terrible for stimulating and interesting discussion. Next time I want to argue a point, I'll get a blog with a comment section, or these days, without one. If my friend told me they were going to register a Reddit account, I'd tell them everything I've just said in this post.
While that does happen, my problem is more often that mods of smaller subreddits are inactive or unwilling to moderate them so they end up being filled with low effort memes instead of actually interesting content.
I'm calling it now: the hottest startups will be "disrupting search using artisanally crafted rings of websites"!
You're in luck! https://news.ycombinator.com/item?id=23549471
Maybe not exactly in the form of webrings, but who knows, why wouldn't it be time for the pendulum to swing from the whole AI hype back in the other direction? There is a lot to be said for conscious curation on your terms and your devices vs algorithmic decisions made in the cloud for you.
Actually on the Hacker News guidelines it kind of describes this. And although it seems like the articles posted here are higher quality, they eventually get lost after 2-3 days.
I mean - maybe it's possible. Perhaps a really focused team could figure it out. (The 'awesome' directories have kind of figured that out, by having specialized directories.) But these personal directories are really sweet because they don't have to cover a topic. They can just be a collection of good stuff, who knows what.
It used to be expensive to publish anything - especially the further back in time you go. So classics for example typically represent particularly bright writers, as having something published before the printing press, and widely disseminated, was simply unlikely to happen.
But today anyone can create an account on YouTube or stream on twitch and it doesn't matter if the content is of any particular quality or veracity, so long as the common man sees what he wants to see.
I think there's a major secondary effect, in that now that we are surrounded by low quality media, the average person's ability to recognize merit in general is lessened.
Perhaps you're saying that so much low quality media drowns out the high quality media - such that it can't be found. The ratio is off, right?
Because there is so much low-quality content, it's become nearly impossible to find the high-quality content. Needles in haystacks.
You would think so, but more often than not, most people don't want high quality. What happens is that the media that panders to the lowest common denominator stands out the most, since that's what the majority focus on.
I am guilty myself, I often find myself jumping to the comments section even here on HN to understand what people are taking away from an article without even finishing it.
Make long-form content: start with the most important information in the first paragraph, and add more and more detail in the following paragraphs. Someone who prefers short content will be happy with the first paragraph. Someone who wants to delve into the details will read each and every word. Heck, your first paragraph could even be a tweet containing a link to the long form.
This is clickbait taken backwards. You will get very few clicks as you already delivered the main information for free, but those who clicked will be there for a good reason.
You're more right than you know.
When there were only a handful of television channels, the content was higher quality than what we have now.
When there were only a couple of dozen cable channels, the content was higher quality than the endless reruns we have now.
When publishing a book went through the big publishing houses, the quality of what was available was higher than it is now where anyone can self-publish and pretend to be an expert.
See also: radio.
Content can only be created so fast. There are only so many talented content creators out there. While the number of media channels has exploded, the number of good content creators has not kept pace. Keep adding paint thinner, and eventually you can see through the paint.
The internet was supposed to give everyone an equal voice. All it ended up doing is elevating the drek and nutjobs to have equal footing with people who know what they're doing and what they're talking about. The quality is drowned out by the tidal wave of low-grade content.
As someone who watched TV in the 1970s, before cable was a thing, I have to disagree here. I think we look back today and see "MASH" and "Columbo" still holding up great after 40+ years and think of it as representative. But nearly all TV was formulaic dreck back then, just like it is now. And even if the average quality level is worse now (which I'm not convinced is the case, but let's assume) the quantity is much higher and there's a large amount of really good stuff to choose from on the right side of the bell curve.
It's true that content can only be created so fast, but it can also only be consumed so fast. Once you have access to enough high-quality content to fill all the spare time you want to spend watching TV or reading or gaming or whatever, having more of it to choose from doesn't improve your experience much.
The barrier wasn't that high. Making a site on Geocities, Tripod or Angelfire wasn't that difficult. Writing 90's style HTML wasn't exactly writing a kernel in C, and most of those services had WYSIWYG editors and templates anyway. Few of the people publishing to the web in the 90s were programmers, so the technical knowledge required was minimal.
And plenty of people are publishing high quality content on the modern web, even on blogs and centralized platforms. I follow writers, scientists and game developers on Twitter, watch a lot of good content on Youtube, read a lot of interesting conversations on Reddit. The fact that people publishing content nowadays don't have to write an entire website from scratch has little to do with their personal passions (or lack thereof), whether they're interesting or (and ye gods how I've come to hate this) "quirky." That's like saying writers can't write anything worth reading unless they also understand mechanical typesetting.
As far as the old content goes, of course most of it was crap. Sturgeon's Law applies to every creative medium. Most blogs were uninteresting, many personal sites were just boring pages full of links or stuff no one but the author and maybe their few friends cared about. In both cases, between the old and new web, a bias for the past (as HN tends to have) leads people to only remember the best of the former and correlate it with the worst of the latter.
However, comments like this seem to be proof that is not true.
I have personal memories of what the Web was like in 1993, but there are so many people today feeding off the advertiser bosom, what are the chances anyone will listen? No one wants to hear about what the network was like before it was sold off as ad space. Young people are told "there is no other way. We must have a funding model". Even this article rambles about "the problem of monetization". No ads and "poof", the Web will disappear. Yeah, right. More like the BS jobs will go away. This network started off as non-commercial.
There was plenty of high quality content on the 90's Web. Even more on the 90's Internet. That is because there were plenty of high quality people using it, well before there was a single ad. Academics, military, etc. It all faded into obscurity so fast. Overshadowed by crap. The Web has become the intellectual equivalent of a billboard or newspaper. The gateway to today's Web is through an online advertising services company. They will do whatever they have to do in order to protect their gatekeeper position.
Geocities was a beautiful mess as ... it was just folks trying to figure out HTML and post silly stuff, but it was genuine.
I think part of the reason was, as you say, lower standards. We were being exposed to content that didn't have an outlet before that. The music was new, black, polished chrome... to borrow a Jim Morrison line.
A bigger part is discovery though. Blogrolls & link pages were a thing. One good blog usually led you to 3 or 4 others.
These days, most content is pushed, often by recommendation engines. Social media content is dominated by quick reaction posts, encouraged by "optimization."
The medium is the message. In 98, the medium was html pages FTPed to some shoddy shared host to be read by geeks with PCs. In 2003, it was blog posts. In 2020 it's facebook & twitter.
The signal to noise ratio might have gotten worse, and discovery might be flawed, but the absolute quantity of quality content has never been higher.
Like you, I know things have changed, but I still can't imagine I could do that today, going from blog to blog, without running low on material within ~60 minutes.
EDIT: I see the webring links here now, I may try them.
I think hypertext as a medium has a lot going for it that books don’t, but I don’t think we’ve figured out distribution, quality control, and discovery sufficiently to make the internet so stimulatingly surfable.
There was a period of time a few centuries ago when adults looked at kids reading romance novels the way that adults today look at kids scrolling through TikTok. I think all mediums go through cycles.
On the other side (when the wild internet's commercial viability wanes, and people can no longer make easy money hosting mediocre, SEO-driven drivel), I think a lot of the good content will survive the great filter, and that's when we'll be able to appreciate it for what it is/was. The next 20 years might be rough, but the work of this generation's Dickenses and Pushkins and Gogols will survive.
The best result I could find concerning some official data from FUMBBL (a place you can play blood bowl) was a blog entry from 2013. My circle of friends and the different leagues I play in have been using that as reference for years. We’ve searched and searched to find the data source to no avail.
The other day I’m randomly site: searching for some thing completely unrelated and find a source for live FUMBBL data. You’d think that was the first search engine result related to blood bowl statistics on any search engine, as it’s really the best damn source I’ve ever seen, but it’s not.
I know you were probably referring to something a little more interest-based. Well, I once sat next to a retired biology professor at a wedding, and it turned out he ran an interest site detailing all the plants specific to the Danish island of Bornholm. I don't care much about plants, but it was exactly a 90s-style page. Unfortunately, for that same reason, I didn't save the link, and I haven't been able to find it since, despite searching for his name.
So I think it’s still there, it’s just not easy to find it.
I have the same feelings about social media. It used to be that you only had to listen to your stupid Uncle at Thanksgiving. Now he constantly spews his garbage on Facebook.
I believe that we have the 'web' today because big decisions were made about how little control the end-user (i.e., consumer) should have over the content made by producers, and that the #1 priority for all technology involved in the web has been to separate producer from consumer as stringently as possible.
If we had the ability to safely and easily share a file that we create on our own local computer, using our own local computer, with any other computer in the world - we would have a nice balancing act of user-created content and world-wide consumption.
Instead, we have walled gardens, and the very first part of the wall is the operating system running on the user's computer - it is being twisted and contorted in such ways as to make it absolutely impossible for the average user (i.e. the computer's owner) to easily share information.
Instead, we have web clients and servers, and endless, endless 'services' that are all solving the same thing for their customers: organising documents in a way people can read them. And all the other things.
And it's all so commercial, because there is a huge gate in the way, and it is the OS vendors. They are intentionally stratifying the market by making the barrier to entry - i.e. one's own computing device - unfit to serve the purpose.
Imagine a universe where OS vendors didn't just give up to the web hackers, in the early days, and instead of making advertising platforms, pushed their OS to allow anyone, anywhere, to serve their documents to other people, easily, directly from their own system. I.e. we didn't have a client/server age, but rather leaped immediately to peer-to-peer, because in this alternative universe, there were managers at places like Microsoft who could keep the old guard and the new young punks from battling with each other .. which is how we got this mess, incidentally.
There really isn't any reason why we all have to meet at a single website and give our content away. We could each be sharing our own data directly from our own devices, if the OS were being designed in a way to allow it. We have the ability to make this happen - it has been intentionally thwarted in order to create crops.
Give me a way to turn my own computer, whether it is a laptop or a server or my phone, into a safe and easy to use communications platform, and we'll get that content, created with love, back again.
It's sort of happening, with things like IPFS, but you do have to go looking for the good stuff .. just like the good ol' days ..
You would find out quickly why you don't actually want that when your own "unisystem" designated port server/client gets hammered with unsolicited requests and exploit attempts. Easy is a matter of interfaces although other design choices would come with costs. Safe however would be far harder. If you set it to a whitelist well you lose discoverability instantly.
And none of the issues you state as the reason why we can't have nice things are actually valid reasons. OS vendors could solve the problem of serving content from one's own local PC quite effectively - the issue is not the technology, but rather the ethics of the industry, which prefers to have massive fields of consumers to farm from.
I think there's some truth to this. Some junior developers make it a goal to be seen as a respected blogger, so they feel the need to write something, even if they have nothing to say.
Some blogs were great (i.e. created to solve the problem of too much interest in what one is up to, so internal questions didn't have to be answered individually) and signaled a few great minds who could be hired at a discount. Those engineers also attracted pretty good co-workers in stage 1; by stage ~3, managers tell their underperforming direct reports to blog whatever they understand about what their group is doing, in the hopes that they either improve or, better yet, become a burden somewhere else.
Also I miss the wild and wonderful designs and color schemes of the 90s :) Long before Bootstrap or "material design"!
I had an idea of a search engine that allowed you to permanently remove domains or pages with certain keywords as a paid service.
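The core of that idea is just a per-user post-filter over whatever a search backend returns - a sketch (all names and the result shape are made up for illustration):

```javascript
// Filter raw search results against a user's personal blocklist of
// domains and keywords. `results` items look like { url, title, snippet }.
function filterResults(results, prefs) {
  const keywords = (prefs.blockedKeywords || []).map(k => k.toLowerCase());
  const domains = prefs.blockedDomains || [];
  return results.filter(({ url, title, snippet }) => {
    const host = new URL(url).hostname;
    // Block the domain itself and any subdomain of it.
    if (domains.some(d => host === d || host.endsWith("." + d))) return false;
    const text = (title + " " + snippet).toLowerCase();
    return !keywords.some(k => text.includes(k));
  });
}
```

The hard part of the business isn't this function, of course - it's getting access to a decent index to filter in the first place.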
But at the time it was so magical compared to pre-web where the only content you could find was professionally published magazines or books, and suddenly you had all this niche content about stuff that wasn't worth publishing, all in a few clicks.
But they had all those annoying pop ups and pop unders.
Nowadays, Google just tracks you everywhere you go. Even when you’re browsing incognito.
There are many use cases for which a client-side framework like React is essential.
But I feel the vast majority of use cases on the web would be better off with server-side rendering.
There are issues of ethics here.
You are kidding yourself to an extent when you say that you are building a "client-side web app." It is essentially an application targeted at Google's application platform, Chromium. Sure, React (or whatever) runs on FF and Safari too. For now. Maybe not always. They are already second-class citizens on the web. They will probably be second-class citizens of your client-side app unless your team has the resources to devote equal time and resources to non-Chromium browsers. Unless you work in a large shop, you probably don't.
Server-side rendering is not always the right choice, but I also do see it as a hedge against Google's, well, hegemony.
(In other news, I would desperately like some free time)
Seconded. It makes hosting a breeze; especially when you can just throw it on GitHub Pages within minutes. I also like being dependent on only a text editor and a browser ... a combo still performant on just about any device from the last 20 years if not longer. No need to install full-featured IDEs and the various dependencies.
What security risks are removed by using a client side app instead of a server side one?
developers who only know how to write fat JS frontends
Nearly every web app is now two apps, and it's increasingly infeasible for any developer to have a mastery of both backend and frontend stacks.
Not necessarily a terrible problem when you have dozens of developers, but a lot of dev teams are 1-3 people. Instead of web dev circa 2010 where you might reasonably have a 3-person team of people who can each individually work anywhere in the stack, now your already-small team is bifurcated.
In many ways this is an inevitable price of progress... 100 years ago you had one doctor in town and they could understand some reasonable percentage of the day's medical knowledge. Today, we can work medical miracles, but it requires extreme specialization.
At least in the development industry, we can choose not to pay that price when it's not necessary.
The front/back divide also existed back then, with barely any possibility of a front person ever touching the back-end (a possibility that exists nowadays, without going into its merits or demerits).
For a reasonably ambitious and industrious individual nowadays it’s not unreasonable to become really good at one client and one server technology. There’s more to it than writing server-side code with HTML templating for presentation, for sure, but it remains well within the grasp of many people.
There was not "frontend" and "backend" developers. There were designers and developers. Designers created designs. They were usually delivered as PDFs, because the bulk of them came from print design backgrounds.
Their designs were then implemented by developers. Senior-level developers tended to do more of the application-level heavy lifting (server-side scripting & db), with junior-level developers working on converting designs to HTML, then to templates. By the time a junior developer moved on to app code, they were well versed and had mastered HTML and all its weird edge cases. They knew HTML.
The first real wave of "frontend" and "backend" developers came on the scene when designers started learning Flash. They started driving more complex applications and there was more bifurcation.
Granted even in small teams of the era, you had developers prefer "front" or "back". We tended to value "A jack of all trades is a master of none, but oftentimes better than a master of one"
In any case, discussions around front end frameworks and especially React are scarcely any better here. Although they are usually politer at least.
This isn't true; Twitter is a place where a large percentage of Twitter users who are interested in web development talk about it.
Twitter is not a lens on the entire internet, it is a lens on a bubble of a bubble.
I left this part of the business space for the browser business at this time, but I assumed that the server-side rendering stuff would keep evolving. It didn't.
Then the "PHP CMS" wave came, and dumbed everything down.
Are you creating a complex webapp? Use React. Go nuts! But are you making a mostly static page (blog, marketing site, whatever)? Then don't use React. It adds entirely unnecessary bloat and complication.
- Code sharing - how do you share reusable snippets of code that include both the SSR logic and the JS that is also required on the frontend? React’s component model is fantastic here: a team can develop a component independently
- Skill set - getting 50 React developers to write HTML and JS should be fine if other problems were solved, but often the suggested solution is obscure things like Elm or Elixir
- Even if most of what a company builds is static marketing content, other parts can be more app-like and having developers be able to share code and use the same basic technology is a great productivity booster
Kinda is. It makes my mobile work harder and thus uses more battery.
I recommend React when making an app, web or otherwise.
I recommend vanilla HTML + CSS with optional JS when making a website.
After some years, however, a consensus formed amongst designers that what they had created was a pile of illegible garbage, and that there was no other way than to completely dismiss that branch, go back to the roots, and evolve from a few steps back.
I feel the same kind of consensus is slowly forming around ideas like SPAs, client-side rendering and things like CSS-in-JS.
We saw the same happen with NoSQL and many other ideas before that.
We recently deployed an entire SaaS using only server-side rendering and htmx to give it an SPA-like feel and immediate interactivity where needed. It was a pleasure to develop, it's snappy, and we could actually rely on the browser doing the heavy lifting for things like history and middle click, and not break stuff. I personally highly recommend it and see myself using this approach in many upcoming projects.
 https://htmx.org/ (formerly "Intercooler")
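For a flavor of the approach, a minimal sketch - the `hx-*` attributes are htmx's documented API, but the endpoint and element ids here are made up:

```html
<!-- Clicking the button asks the server for a rendered HTML fragment
     and swaps it into #news. No client-side rendering, no JSON API. -->
<script src="https://unpkg.com/htmx.org"></script>

<button hx-get="/news/latest" hx-target="#news" hx-swap="innerHTML">
  Load latest news
</button>
<div id="news"><!-- server-rendered fragment lands here --></div>
```

The server just returns a snippet of HTML for `/news/latest`, which is exactly what a server-side-rendered stack is already good at producing.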
It really depends on which SSR approach we're comparing to which client-rendering approach, and who you're optimizing for.
Also I may be an outlier, but IMO grunge as a textural expression still benefits lots of contemporary design projects. In fact if you know how to work within broader principles of design, maybe you stop caring as much about what's current, because that's just one of many outcomes that may or may not be appropriate for the message...
Really? Citation needed?
I guess I have never heard graphic designers say anything about David Carson except that he's one of the most innovative and influential designers in the past 30 years. IMO his graphic design was amazing and perfect for the context (music and surf magazines). I loved getting my monthly issue of Raygun and marveling at what neat designs they had done this time.
The decline of it is pretty easy to explain: the same thing that killed print altogether also killed print music magazines (i.e. the internet).
The sentiment I paraphrased was stated by designer Massimo Vignelli, although he didn't single out David Carson.
I'm invested in Elixir, and there are some interesting, different trade-offs being made there for highly interactive things with Phoenix LiveView. And the Lumen compiler could, in the future, remove the need for JS entirely, since one could write Elixir compiled to WASM for whatever interactivity is needed.
My bet is still mostly on server side rendering and static as much as possible. Current JS does have the JAMstack ideas that I find a healthier direction.
Back before the age of SPAs we had those "dynamic" apps which updated parts of the DOM by firing requests to the backend, getting a piece of rendered HTML and just throwing it there.
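That pattern, stripped to its essence (the function name is mine, and the fetch function is injectable here only so the sketch is self-contained; in a page you'd use the global fetch):

```javascript
// The pre-SPA "dynamic" pattern: fetch server-rendered HTML for one
// region of the page and splice it straight into the DOM.
async function applyFragment(element, url, fetchFn = globalThis.fetch) {
  const response = await fetchFn(url);
  element.innerHTML = await response.text();
}

// Usage in a page:
// applyFragment(document.querySelector("#comments"), "/comments?page=2");
```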
It was an absolute nightmare to maintain and I'm happy we're past that.
As for server rendering, I've been getting good results with Sapper.js. Here's an example of a webcomic page I started doing for a friend of mine, but it never took off:
It's a pure HTML static web page, which started off as a server-side-rendered app, but with Sapper's export tool I didn't actually need to deploy the backend, since the content of each page is deterministic with respect to the URL.
My friend is not technical, but understands what FTP is and is able to edit a JSON file, so that's how we went about this.
> It was an absolute nightmare to maintain and I'm happy we're past that.
Yeah, this model was adopted by noted failure GitHub and look where it got them.
Hell, I used to work for one such business, namely CKEditor.
It's really, really basic, but I was impressed with the feedback I received; many people remarked on how slick and fast it was. And indeed, I went looking for professional photographer websites, and what a huge mess most of them are: incredibly heavy frameworks for very basic functionality, splash screens to hide the loading times, etc. It's the Electron-app syndrome: it's simpler to do it that way, so who cares if it's like 5 orders of magnitude less efficient than it should be? Just download more RAM and bandwidth.
Mine is a bunch of m4 macros used to preprocess static HTML files, and a shell script that generates the thumbnails with ImageMagick. I wonder if I could launch the new fad in the webdev community. What, you still use React? That's so 2019. Try m4 instead, it's web-scale!
 I actually use this one and it's just fine - great, in fact - but seeing M4 does make me feel like a lot of people may have spent a lot of time reinventing a wheel.
M4 is old and clunky but it gets the job done without having to install half a trillion Node.js dependencies. I also know that my script will still work just fine 5 years from now (or even 50 years in all likelihood).
That being said, don't trash your favourite JS framework right away, while m4 is perfectly fine for simple tasks I don't even want to imagine the layers of opaque and recursive macros you'd end up having to maintain for any moderately complex project. It's like shell scripts, it's very convenient but it doesn't really scale.
I'm not doing anything too clever with Pug, but there are certainly a bunch of things that it makes quite easy that would otherwise be awkward or complex.
Lots of things work really nicely as well: includes, sections, configuration, and it certainly cuts down on typing. I'm absolutely not contemplating a switch from Pug to M4.
I just find it interesting that there's this thing that's been hanging around for decades that would do at least a partial job and is still decent for simpler use cases.
Sadly this holds to an unexpected degree for Mustache.
But I definitely wouldn't recommend using m4 for anything beyond a quick hack, unless you're an m4 wizard (or willing to become one) and you trust that whoever is going to work on the project will be one too. It's definitely clunky and not very user-friendly.
(Note: I still think React is an awesome library! I'm sure there are devs that are super productive with it too. It just wasn't the best fit for me and my company)
It was the worst decision they ever made. The site they ended up with was incredibly slow, and given the relatively few pages on the site, you never really make up for that initial load in time saved later.
It's also incredibly hard to write well, requires a special third party service to show anything in Google and is incredibly hard to manage.
They don't realise this of course, and are now attempting to solve the management and initial load issues by splitting the app up into three distinct apps. It won't help.
React + React DOM is something like 35kb gzipped. That’s not nothing (don’t forget caching) and pushing the initial render to the client (though not strictly necessary) does incur a bit of a penalty, but I think the benefits outweigh the drawbacks in many more use cases than people give credit.
The real problem is two-fold: The first, as I stated above, is wrapping the entire application in a client-side implementation. As many people are pointing out this is often unnecessary. You don’t need to go full “SPA” in order to benefit from the vdom.
The second (related) reason is when developers just start adding 3rd party dependencies without considering their impact (or if they are necessary). React is a library, and for the features you get it’s really not that big. If that’s all you are using to add that extra sparkle to some of your pages I firmly believe you are getting the absolute most “bang for your buck”.
The ultimate approach is Beaker Browser though. You can actually just write your whole site in Markdown (/index.md, /posts/batman-review.md, /posts/covid-resources.md) and then write a nice wrapper for them at /.ui/ui.html. This means you can edit posts with the built-in editor - and people can 'view source' to see your original Markdown! It's like going beyond the 90s on an alternate timeline.
(A sample of this is this wiki: hyper://1c6d8c9e2bca71b63f5219d668b0886e4ee2814a818ad1ea179632f419ed29c4/. Hit the 'Editor' button to see the Markdown source.)
Writing for money and reservation of copyright are, at bottom, the ruin of literature. No one writes anything that is worth writing, unless he writes entirely for the sake of his subject. What an inestimable boon it would be, if in every branch of literature there were only a few books, but those excellent! This can never happen, as long as money is to be made by writing. It seems as though the money lay under a curse; for every author degenerates as soon as he begins to put pen to paper in any way for the sake of gain. The best works of the greatest men all come from the time when they had to write for nothing or for very little....
Brain Pickings articulates my reasons well, though really, just read the source:
Well, we're making progress towards reducing if not eliminating profit through authorship.
It just seems very idealistic to expect most websites to cater to a browser that hasn’t received an update in 8 years running on a processor that has a fraction of the power of any Raspberry Pi 2 or above.
Modern iPads are served the “real” web by default, further emphasizing why you would benefit from upgrading at least once a decade.
Add in advertising. And a browser that you might have trouble controlling script execution on.
Add in tracking. Remember that story about how eBay fingerprints your browser in part by mapping the open websockets you have? That costs power and cycles. And the cookies have been piling up since the 90s.
Speaking of storage, it's now much more common to use localStorage. Now anyone with a website on the internet can store appreciable amounts of stuff on your computer if you visit their website. And they can read and write that storage as much as they want while you're on their site, without regard for your computer's performance.
And all of this is just considered normal and regular. This isn't even getting towards abusive behaviour like running a crypto miner in a browser or something. This is just web applications continually expanding their resource entitlements.
The web is a great honking tire fire. Many articles and books have been written about this, many of them are summarily dismissed by web developers as ivory tower nonsense. But the trajectory of system requirements for displaying mainly-text documents is growing at an unsustainable pace, and there is eventually going to be some kind of reckoning.
I have a $3000 1 year old laptop, and sometimes slack gets so slow I have to kill the browser process and start over again. The issue is not hardware.
Keep in mind, the 1st-gen iPad was single-core and only had 256MB of RAM, which was the same RAM capacity as the then-current iPod touch. Compare it to any PC with a single-core Intel CPU and the experience will be largely the same, except of course the Intel CPU will require 10x more energy to run.
And yes, it did have a really great CPU for its time. Its core CPU architecture remained largely unchanged for 2 more generations. The iPad 2 added an extra core (A5), and with the iPad 3rd gen, Apple gave it more GPU power in the SoC (A5X).
The web has regressed when an old iPad (or PC) can no longer reliably view it. It's not like words got harder to display in the intervening decade.
A blog post, news article, or a tweet shouldn't require a quad core CPU and gigabytes of RAM to be readable.
Those are some rose colored glasses. The first page (technobuffalo) loaded in this video takes 10 to 15 seconds:
Based on what was said, that page had already been loaded once, and it still took that long.
Loading a Google Search took only about two seconds, because of how extremely minimal and optimized the search page was back then.
Loading The NY Times took about 8 to 12 seconds, depending on where you draw the line.
So, at the time, maybe we were used to webpages taking a while to load on mobile devices, and it seemed very reasonable. I’m sure there were some extremely simple websites that loaded quickly, but The NY Times was one that Apple promoted heavily at the time (apparently) as demonstrating what a good experience the iPad browser was.
Nowadays, we hold the web and our devices to much higher performance standards.
It’s not an apples to apples comparison because the content is entirely different, but the context of this thread is that modern websites are significantly harder to load and render.
Even though content is supposedly much more resource intensive today, my 2018 iPad Pro loaded the technobuffalo home page initially in less than 3 seconds, and subsequent page clicks are even faster.
This iPad loads google search results even faster than the original iPad.
This iPad loads the New York Times in under 3 seconds, with subsequent page clicks taking the same or less time.
Based on the numbers, a contemporaneous 2010 iPad was 3x to 5x slower at browsing the 2010 web than my 2018 iPad is when it comes to browsing the 2020 web, and that’s a roughly two year old iPad design, so it should be even more at a disadvantage. My iPad is also rendering more than 5x as many pixels while doing that.
In conclusion, the original iPad was severely underpowered.
> A blog post, news article, or a tweet shouldn't require a quad core CPU and gigabytes of RAM to be readable.
Conceptually, I agree, but every image and video we use for content in websites now is significantly higher resolution and quality than they were back then. If you just want to read text, then you’re correct.
The video you linked showed this capability. That TechnoBuffalo page was a pathological case for the iPad rendering and it was still interactive fairly quickly even if all of the resources weren't finished loading. I had the original iPad and browsing worked just fine on it. Even when pages took a long (multiple seconds) time to load they were scrollable and interactive. I could read the content as everything loaded.
It's not shocking that your modern iPad renders pages faster than the model released a decade prior. Not only does it have far more power and memory but the network (both last mile and far end) is faster. It's also got an extra decade of development on WebKit. The web is more bloated but the modern iPad has ramped up its power to compensate.
Look at Reddit versus old.reddit.com. The "modern" Reddit page has poor interactivity even on my current iPad. The old.reddit.com site, which is similar in complexity to 2010's Reddit, renders damn near instantly and has no interactivity issues.
> Conceptually, I agree, but every image and video we use for content in websites now is significantly higher resolution and quality than they were back then. If you just want to read text, then you’re correct.
The main point was that the 2010 iPad was being given the most favorable conditions, and it still lost horribly, because even compared to contemporaneous devices, it was very underpowered, unlike current iPads:
- your claim is that websites are substantially heavier now (which I agree with), putting the 2018 iPad at a disadvantage
- the 2010 iPad was browsing early 2010 websites in that video, so we've had 10 years of bloatification since then
- the 2018 iPad Pro is browsing 2020 websites, websites built years after it was released, so surely more "bloated" than they were in 2018
- being 2010 websites, they were probably much simpler to render
- the 2010 iPad's screen had 5x fewer pixels to contend with
Nowhere was I saying that the 2010 iPad was super slow to render 2020 webpages in all their bloaty goodness. That would be an obvious conclusion. If the 2010 iPad's performance was so good at the time, but only became slower as the web became much more bloated, why was it still so much slower at browsing 2010 websites than my 2018 iPad is at browsing 2020 websites?
The 2010 iPad was actually slow from the beginning, as the video proves. Since it was slow back then, it shouldn't be surprising that it's slower and more painful now that websites want to support higher resolution experiences by default. Yes, they could put effort into giving old devices a lower res experience, but why? That old browser is one giant security vulnerability at this point, and no one should be browsing any websites they don't control on that thing.
Even with all those advantages being in the 2010 iPad's court, it was still 3x to 5x slower than a 2018 iPad browsing 2020 websites at 5x the resolution. This is not even a 2020 iPad Pro -- this is a 2018 iPad Pro. Imagine how much worse a 2008 iPad would have been at browsing 2010 websites, if it had existed.
You say that it's "not shocking" that they ramped up the power so it can browse better, but the point is that we're loading substantially heavier websites today significantly faster.
How is that possible? Because the 2010 iPad was severely underpowered. If it had been running on a chip that was equivalent to laptop processors of the era (as my iPad Pro's chip is), then it would likely have loaded the 2010 websites about as quickly as my iPad is loading 2020 websites.
> Once a page was loaded and rendered the scrolling, tapping links, and interacting with forms was all usable fast. Even while the content was loading you could interact with the page.
Yes, it's very impressive how much interactivity Apple was able to give the 2010 iPad with its really terrible processor, once the loading finished.
That interactivity isn't because the chip was any good. I remember very clearly that it was because you were basically scaling a 1024x768 PNG while you zoomed in and out. Once you let go, the iPad would take a second to re-render the page at the new zoom level, but you were stuck staring at a blurry image for a second after zooming in. The GPU was really good at scaling a small image up and down. The CPU was not so good at rendering websites.
It was also very easy at the time to scroll past the end of the pre-rendered image buffer, and you would just stare at a checkerboard while you waited on the iPad to catch up and render the missing content. iOS actually drastically limited the scrolling speed in Safari for many years to make it harder for you to get to the checkerboard, but it was still easy enough.
> Look at Reddit versus old.reddit.com. The "modern" Reddit page has poor interactivity even on my current iPad. The old.reddit.com site, which is similar in complexity to 2010's Reddit, renders damn near instantly and has no interactivity issues.
New Reddit is one of the worst websites on the entire internet right now, if not actually the worst popular website in existence. I really don't understand how that hasn't been scrapped at this point. It's not representative of modern web experiences, except possibly in your mind. YouTube, The NY Times, Facebook, Amazon... these are all modern web experiences that work great on anything approaching reasonable hardware.
That said... a modern frontend that doesn't run properly on a Core 2 Intel machine with no adblocking should be massacred.