A few helpful search engines:
A recent movement to build personal Yahoo!-style directories:
* https://href.cool/ (my own project)
The above resources are focused on general blogging and personal websites - for software and startups, I would refer to the appropriate 'awesome' directories. (https://github.com/sindresorhus/awesome or https://awesomelists.top)
If you know of any more, please list them - a small group of us are collecting these and trying to encourage new projects.
Here's another big art repository:
And a very well-documented collection (a "wiki") of paintings, also non-profit:
Another interesting one is Aaaaarg (according to Monoskop's wiki, originally with one less "a", acronym of Artists, Architects, and Activists Reading Group):
Basically it's a collaborative environment for reading, annotating and discussing texts. The content is submitted by users and (thus) of high quality.
I think you need an invite to access the community. Also, the domain used to be aaaaarg.org, but I think they faced copyright issues of some kind and had to find an alternative domain. (Not sure about this; excellent new suffix, though!)
EDIT: More precise description:
I’ve been listening to the BBC’s Introducing Mixtape podcast for a while. I also use Spotify and really enjoy its recommendations, but the 6 Music podcast is just stellar.
As paywalls restore quality journalism, I believe a renaissance for curated content is possible.
As in, do you believe this uptick in quality journalism has already happened/is happening? And what associates it with paywalls? Presumably you'd have to be seeing quality journalism behind paywalls for this to be the case?
Welp, apparently this is the rare case where ‘avant-garde’ isn't the same as ‘experimental’ or ‘underground.’
For the past two weeks I have been trying to find an old website by searching for "old mysterious site search engines" and "how to search deep parts of web" and "search engine tricks old site" and Google has not returned anything, even when I limited the time span to 2005-2006.
I made the same search on Wiby and it returned Searchlores (maintained by the hacker Fravia) as the first result! I was so happy to find that website again because I hadn't been on it for 10 years. Unfortunately I just found out that Fravia passed away in 2009 because of cancer :(...
Wiby seems like a search engine Fravia would have enjoyed.
There are many ways you could seed that search, but as totally-not-a-movie-buff I decided to check IMDb's list of lowest rated movies and chose a title further down the list (The Wicker Man), on the theory that only dedicated people would be talking about movies that are bad, but not bad enough to be the worst. Searching for blogs (as identified by inurl:blog) mentioning "The Wicker Man" does turn up a few promising results.
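For anyone who wants to repeat that kind of search, here is a minimal sketch of building the query URL. The helper name `blog_search_url` is made up for illustration, and the endpoint is just the usual `https://www.google.com/search?q=` form seen in links elsewhere in this thread.

```python
from urllib.parse import urlencode


def blog_search_url(phrase: str, base: str = "https://www.google.com/search") -> str:
    """Build a search URL for pages with 'blog' in the URL that mention a phrase."""
    # inurl:blog restricts results to URLs containing "blog";
    # the double quotes ask for the phrase exactly as written.
    query = f'inurl:blog "{phrase}"'
    return f"{base}?{urlencode({'q': query})}"


print(blog_search_url("The Wicker Man"))
```

Swapping `base` for another engine's search endpoint should work as long as that engine reads the query from a `q` parameter, which is an assumption to check per engine.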
In my opinion there has to be widespread fatigue of Google just somehow managing to return a large chunk of something like 1,000 sites for pretty much any search. It's in part SEO, but it's also like the article mentions - Google makes money from ads. These sites they spam at you generate substantial revenue for Google - no name sites do not. Being the world's largest advertising corporation and search engine is one hell of a conflict of interest in terms of delivering what the user wants, instead of delivering what Google wants.
My DDG-from-shell Bash function:
Focus is either on the submission field or gets there on first tab. Search button is focused on next tab, and visible.
Wiby seems amazing - the first three surprise me links were a human powered ornithopter, lego maniacs and a guide to knife throwing. Thank you for sharing!
This is a very new group that has sprung up in the last few months.
Glad this is becoming more organized and I will follow along with interest.
It brings back a little of the wonder of the old web.
A smaller subset somehow seems bigger, more infinite.
Incidentally (and rhetorically), how have I not heard of micro.blog? It looks amazing! This whole thread has become a goldmine of interesting things.
I'm personally not a fan of directories that try to tackle the _entire_ web - it's just too sprawling. So I tend to not recommend them; you have to drill down pretty deep to get anywhere. I think 'awesome' directories (and Reddit wikis) have proven how well niche directories can work - and so I like to encourage folks to build their own directories that encompass their personal view of the web. They act like those 'little libraries' you see on the roadside or at pubs - but for the web.
Thanks for bringing these sites to attention, can't wait to browse them when traffic is lighter.
The author says the article was removed in 2006 ("[...] posts, were not accessible anymore") and then he re-posted the article at a new domain in 2013. That means any copy/crawl/repost of the article from 2006-2012 is now the oldest living, and thus "original", version of the article. His 2013 repost was seen as just another blog-spam copy.
Google is not forgetting the old web unless we see evidence of content disappearing from the index that has been consistently hosted at the same domain & URL since its original posting. Unless you properly 301 your URLs to new locations and consistently host your content, it's a guessing game for the crawler to determine where the original content has moved to.
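To make the "properly 301 your URLs" point concrete, here is a rough sketch of a permanent redirect using only Python's standard library. The paths and target URLs in `MOVED` are invented placeholders, and a real site would normally do this in its web server configuration rather than in a script like this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from old paths to the URLs where the content now lives.
MOVED = {
    "/blog/2006/old-post.html": "https://example.org/archive/old-post/",
}


def resolve(path: str):
    """Return (status, location) for a requested path."""
    target = MOVED.get(path)
    if target is not None:
        return 301, target  # 301 = Moved Permanently
    return 404, None


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, location = resolve(self.path)
        self.send_response(status)
        if location:
            # The Location header tells browsers and crawlers where to go next.
            self.send_header("Location", location)
        self.end_headers()

    do_HEAD = do_GET


def serve(port: int = 8000) -> None:
    """Run the redirect server until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling `serve()` and requesting the old path should return a 301 with the new address in the `Location` header, which is the signal a crawler needs to follow the move.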
No matter how you search for the content on Google, nothing comes up:
DuckDuckGo has it:
I checked the Wayback Machine and the content has constantly been on that URL for over 10 years.
This is the first example of an old forum page I tried after reading the article. So I tend to think it's true. Google is discarding the "classic" web.
Yes. Not just over the past 2-4 months, but over the past five years or so.
It's become so bad that Google is no longer the most useful search engine for me.
He said they haven't noticed any regressions. I said I figured that would be the case but I can definitely feel the difference as a daily user.
I tend to believe that if user complaints about new problems or regressions increase over statistical noise - there is a problem.
Well said. This is a big problem. We see a similar problem with the use of telemetry data as well.
I had no idea about Hummingbird though.
I had assumed that Google search had gone downhill because it started trying to "personalize" my search results. That wasn't a great explanation though, as I don't use a Google account.
Hummingbird seems a much more likely explanation.
I do a full clear on my web browser (cookies / offline storage / history, everything) and then open YouTube in a private browsing window and it asks me which of my two Gmail accounts I want to log in with. I'd guess it's just a combo of external IP and browser fingerprint, but it's creepy.
I understand this being default behavior, but there really needs to be a way to disable it.
On the other hand the non-exact hits that it returns push me from time to time in the right direction.
Having said this, I don't know of course if A) I'm too old (40) and the mindset of the younger search-people has now changed and/or B) Google just doesn't index tech forums as much as it used to and/or C) there are just fewer forum-posts and/or D) my problems became more complex (don't think so) and/or etc.
I tried (and still try from time to time) to use DDG and Bing but without success.
Does anybody else have the same impression?
So ... although I feel like we might be having the same issue, I'm not sure I'm using DDG correctly enough to say it's a problem.
Any search term not wrapped in quotes can be randomly ignored today. It can inject keywords it thinks you want (but really don't). Google is great for searching modern sites like Stack Overflow, but it seems to have lost interest in servicing power users.
It could be something as innocent as training a new neural net or testing a buggy version of the algorithm on subsets of users. But it could also be as sinister as driving traffic to those in bed with Google, silencing opposition, or effectively whitewashing the entire internet...
They are maximising ad revenue, not the search relevance/usefulness.
Stopped using "google" as a verb a long, long time ago, in favor of just saying "search". I don't think that's ever confused anyone.
It's my opinion that a large portion of the websites on the front page of any search (quora and pinboard anyone) are completely bought and paid for.
I frequently suspect they're starting to optimize more for $ than they were before, and ML just gives them more ways to make that number go up another % or so... but it often comes with impossible-to-predict and wildly inhuman edge cases. It's a pretty common trend when companies start focusing on small number increases - each A/B test shows improvement, but the product as a whole worsens and it drives people away in time.
Maybe some similar change is coming to Search.
To me they started spiraling down when they started to give too much power to designers. Form over content is a terrible idea for a search engine ...
Aspiring science fiction authors, or Neal Stephenson, should write a novel about a world where ML tuned models optimize everything to be just good enough not to churn customers while maximizing margins. (Also applicable to non-profit items like politicians and universities)
So then I do the quotes thing, especially quoting phrases that 100% for sure must exist on some web pages, along with all my other keywords and pretty soon I'm at "no pages found". Pull back just a little, and it's page after page of entirely unrelated-to-what-I-want blogspam.
Looks like only page 6 is indexed for some reason. The site owner would be able to check the webmaster tools on Google to see why.
the end user searching for the content
the webmaster or author of the content
the search provider
If I'm searching for something that I know exists and I can't find it, there is no excuse. The search provider failed to do its job.
There is no "but the webmaster should have done this and that". He was hit by a bus 10 years ago and we should be happy the content is still available.
A good search provider would link a vanished website to archive org if the content is exactly what the customer wanted.
Long long ago, when posting interesting links in comments didn't trigger commercial hysteria, people would cite bits of text and have a link to the full text. Later this became simply citing a chunk of text. I used to drop a few lines from the citation into the search engine and find the original work.
As I'm writing this there are exactly 45 search results above the one that should have been displayed.
There is no excuse here, such as HN not ranking highly enough: they did index the page, and the other results didn't match the query better.
If we do this with 4 exact lines from a less popular site it will end up some place on page 20 of the search results.
Another example, I really don't care for indexing but here is an article that I always (jokingly) refer to as my greatest work.
The exact title:
A really weird result. Safe to say nothing matching is there.
The first many words from the text:
It doesn't find it.
Then we check if it is even indexed...
And there it is! Why does it even crawl the page?
It also lists websites that have the number 8616 on them and ones with both the word "blog" and "here" in the text.
I'm not supposed to laugh?
you can force the site with "site:... ": https://www.google.com/search?q="metallica+only+played+2+son...
It doesn't find page 5 with these terms but finds page 6.
There is probably an issue within the page 5 itself.
How is Google supposed to find that out?!
Which goes against the original mission of Google to "organize the world's information and make it universally accessible".
A "bug" could be an option, but I don't expect that to be the reason. It's too easy to find examples of forgotten content. And I don't think a bug of that magnitude in Google's core business would go unnoticed.
Which core business are you referring to?
It forced me to solve a bunch of CAPTCHAs too.
Also on the "Million Short" search engine mentioned by kickscondor:
I never saw that one. Do they have their own crawler?
> Rumors spread that large link pages (for surfing) might be considered “link farms” (and yes on SEO sites they were but these things eventually trickle down to little personal site webmasters too) so these started to be phased out. Then the worry was Blogrolls might be considered link farms so they slowly started to be phased out. Then the biggie: when Google deliberately filtered out all the free hosted sites from the SERP’s (they were not removed completely just sent back to page 10 or so of the Google SERP’s) and traffic to Tripod and Geocities plummeted. Why? Because they were taking up space in the first 20 organic returns knocking out corporate and commercial sites and the sites likely to become paying customers were complaining.
SEO seems to have become a huge obstacle course that smaller websites can't play.
> Then the worry was Blogrolls might be considered link farms so they slowly started to be phased out. Then the biggie: when Google deliberately filtered out all the free hosted sites from the SERP’s...
That's all observable fact.
> Why? Because they were taking up space in the first 20 organic returns knocking out corporate and commercial sites and the sites likely to become paying customers were complaining.
I think the more reasonable, less diabolical motive was that the blogs and free hosted sites were largely link farms that no one wanted to visit.
It sucks for the few legitimate pages on those platforms, but that is what happens when the legitimate page is the rare gem in a minefield of automated copies of other blogs, just with SEO links and ads inserted.
It's like a comments section: without moderation or captchas or both, a "thriving local community" on, say, a small town news site can be overwhelmed by automated pharmaceuticals spam. Then the newspaper kills the comment section, not out of any malice towards the original community but because they don't want to deal with the spam.
And yeah, dealing with spam and black hat SEO does take resources. If you (or worse, your chosen blog host) don't keep the weeds down, soon your pasture will be overrun and burned off.
Where I don’t agree with you is in the portrayal of the Web as largely comprised of link farms and “few legitimate pages”. I spend a lot of my time cataloging the hidden corners of the Web and it is mostly individuals working on their personal Web projects. Spam is simple to identify (much more so than ‘clickbait’) and the reason people don’t read personal websites any more isn’t that interesting and mind-blowing projects on the Web are too rare. (I don’t have statistics to back this up, but I feel like they are more common on the Web than on social media.)
That sounds interesting. Do you have a list of some interesting projects that you're willing to share?
Thank you for asking. If you know of any sweet links, pass them along!
The problem is that Blogspam is now a (legitimate) industry much bigger than Google can manage.
Google Search became a playground for marketing firms to dump content made by low-paid freelancers with algorithmically chosen keywords, links and headers. It's SEO on a large scale. Everything is monitored via analytics and automatically posted to WordPress. Every time Google tweaks its algorithm to catch it, they're able to A/B test and then change thousands of texts all at once.
Personal blogs can't even dream about competing with that.
In fact, those companies are actively competing with personal blogs by themselves: via tools like SEMRush and social media monitoring, they know which blogs are trending and use their tools to produce copycat content re-written by freelancers and powered by their SEO machine.
I know a startup that is churning out 10 thousand blog posts per day on clients' blogs, each costing from 2 to 5 dollars for a freelancer to write according to algorithmically defined parameters.
Just wait until they get posts written via OpenAI-style machine learning: the quality will be even lower.
Not only that: there's no need for black hat SEO anymore. Blog posts from random clients have links to other clients' blogs, and the linking is algorithmically generated in order to maximize views and satisfy Google's algorithm. They have a gigantic pool of seemingly unconnected blogs to link to, so why not use it.
The irony is that companies buy this kind of blogspam to skip paying AdSense. Why pay when you can get organic search results? So not only are they damaging the usefulness of the SERP, they're directly eating into Google's bottom line. These blogs also have ZERO paid advertising inside them, since they're advertising themselves.
That's the reason Bing, DuckDuckGo and Yandex still have "old web" results.
That puts Google in a very difficult position and IMO they're not wrong to fight it.
But look at it another way: you have lots of humans writing - and it's all of varying quality. Why not let the humans decide what's good? The early Web was curated by humans, who kept directories, Smart.com 'expert' pages, websites and blogrolls that tried to show where quality could be found. Google's bot war (and the idea that Google is the sole authority on quality) eliminated these valuable resources as collateral damage.
Maybe the problem is that PageRank (or whatever they call it these days) has run its course. I mean, it's supposed to gauge "what humans think is good", but it's failing miserably. It's indeed time for a more curated, artisanal, web.
What gives me pause here is all the anecdotes in this thread about other engines getting results right. If the real answer is "PageRank has been successfully flooded by bots", then everyone would have bad results.
What I suspect, off nearly no evidence, is that Google is using ad tracking to inform a notion of search relevancy. My nearly unjustified belief is that that system is the one being flooded by bots.
Piracy is gone, but you will find hundreds of automatically generated credit card phishing sites full of Google Ads, sometimes promising pirated versions but serving a trojan, sometimes showing a credit card form. Some of them are on the first page, sometimes before legitimate websites.
But if their efforts in fighting it are a large part of the reason that Google search results are getting downright bad, then they're wrong in how they're fighting it.
What I mean is: I don't think their fight is misguided or evil this time, they're trying to keep the result pages useable for end users. They're just doing a terrible job of it. (Or: they're doing a worse job than spammers)
Isn't Google responsible for making Internet advertising accessible and widespread? They developed and launched AdWords (2000) and AdSense (2003).
Absolutely right. I recently started a blog, and was disheartened to learn that I have to sign up for accounts with several search engines, conform to their standards and rules, give them a bunch of data... and still sometimes have mysterious issues with indexing with no real recourse. How much time and effort do I really want to spend to play the SEO game? I have a job, projects, and hobbies, I don't have the time or patience to play their game of "let's fuck with things randomly until you get indexed and ranked higher". That was fun for a few hours, but I'm done with it.
And, of course, consider having a blogroll of the sites you follow, which is all of our little way of contributing to the effort of finding each other. :)
If you have an older post that's great but not changed, it'll become less prominent. So go in, edit in some changes, and now it's fresh and ready to be indexed prominently again.
If this is how it goes, I guess it helps in a way. The articles we care about get attention and don't drop off. But there's so much of the old web we might lose in the haystack.
 : https://www.tbray.org/ongoing/When/201x/2018/01/15/Google-is...
We've come to rely on Google too much, so much that if you are not on Google you don't exist. That's a problem for researchers who are looking for articles to cite.
- it was previously available on Google Scholar
- it cannot be retrieved, or the search on Google Scholar gives a misleading result (for example it gives another article, as explained in )
Please help to make a list of scholar dropouts! Thank you.
 https://news.ycombinator.com/item?id=19604722 HN comment with evidence
 https://news.ycombinator.com/item?id=19604955 HN reply with another evidence
Fingers crossed they don't get dropped from the main index too.
This rationalization doesn't change the fact that it's increasingly hard or impossible to find certain things on Google, that they are effectively biased against certain types of websites and certain types of pages (even when the content is perfectly good) and that other search engines seem to be able to deal with these issues much better.
I can, very loosely and anecdotally, confirm.
My personal website has been online for about 20 years and I just picked some deep strings of text and searched for them and Google has the whole thing indexed just fine ...
I first posted it on
then I had a 301 redirect there for a couple of years to
until I stopped paying for the .de domain. About 5 years ago I made another 301 redirect to
which is still in place. DDG finds it but not Google and actually neither does Bing.
They're so big that it's worth blackhats spending significant resources to game their algorithm. That induced them to implement a spam filter which is now discarding the ham along with the spam.
Which means that smaller search engines that aren't being targeted by spammers are now giving better results. That is a major long-term problem for Google if they can't avoid throwing the baby out with the bathwater like this.
People only use Google because it has historically had the best results. They'll get some way on inertia now, but that doesn't last forever. They need to fix this or they're ultimately in trouble, and we could be heading for a landscape where being a search engine above a threshold size is a liability.
It's odd to put forward the hypothesis that DuckDuckGo is now better at search (aggregation) than Google is at search. But that seems to be where we have landed.
I think Google has been explicit about this (I may be wrong, but I seem to remember thinking about this because Google themselves said it). Essentially, I believe, they are no longer concerned about being a way to navigate all the material found on the internet. Instead, they are concerned with answering the question posed by each search attempt.
A few years ago they made a push to answer questions to the point it was in their product description on their "how Google search works" page. To quote it exactly, it used to say their objective is to "return timely, high-quality, on-topic, answers to people's questions."
And that's kind of the whole problem and why there is space for a search that actually returns results from the web in a clear and logical way.
I've been thinking about this, and it seems very plausible to me. Which means that Google Search isn't really "search" anymore -- which explains why it's become so bad at that!
Too bad. I remember when Google had the best search engine going. It was a real game-changer. Those days are long gone.
Google certainly doesn't seem to value feedback at all. It's practically impossible to get in touch with a human to ask for help and Google's feedback forms have always felt like a black hole.
I can't recall the exact search term, but I kept looking for some site I visited some time ago, and no combination of words could get it to actually find the actual site. I finally just gave up and found it in my browser history.
To be fair DDG rarely works for me in that way, either. I think that kind of old-school, precise search engine's just dead now. It seems like everyone's indices are a lot "fuzzier" and full of holes, like they're discarding large parts of pages from the index if those parts don't look important to the algo. Not just deprioritizing, but tossing those pieces out entirely. Except the algo's very wrong.
On the other hand, for day to day work at least for me it is still indispensable. Googling a random exception out of an unexpected stack trace works far better than with DDG for instance.
As it is with most big companies that profit from ad revenue. They seem to consider performance indicators to be sufficient to know if a new feature is good or bad, instead of worrying about written customer feedback.
I didn't make an estimate because I literally do not care about the monetary cost. Why should Google be exempted from basic standards of customer service just because it's profitable to do so? That's the exact opposite of being a productive member of society.
Fact is, users aren't perfect and will always need help to use services provided by others. A company which does not help its own users because it's less profitable to do so is an unethical company and should not have any business whatsoever in a civil society.
If that makes the company not profitable then perhaps the company should actually sell their services for a cost instead of "free", or perhaps become a non-profit (and reduce taxes), or become absorbed by the government (and citizens' taxes provide the "clearly beneficial but not profitable" services), or... you know... stop being profitable and stop being an unethical business.
or whatever. Fiddle around until you find "verbatim" and choose it.
Maybe somewhere there is a Google disk with the hash of an exact phrase the author typed into the search box. But statistically, that hash won't be found in hot memory vector space when cosine similarity runs on a nearby server. Finding the phrase would require a batch job that runs much longer than the engineered time limit Google imposes on search queries. Without a "let me know in 24 hours" option, Google's search will partition data into what should and shouldn't be accessible. That partition will always be according to Google's business goals. All the information may be indexed, but only the fraction of the index beneficial to Google will ever be accessible to ordinary users.
The crux of the story is that there is no business case for Google to return the author's web pages in search results even if the Wayback Machine implies that Google could.
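For readers unfamiliar with the two lookup styles being contrasted here, a toy sketch may help. It says nothing about how Google actually stores or serves its index. The documents are invented, and the bag-of-words vectors are a deliberately crude stand-in for whatever embeddings a real engine would use.

```python
import hashlib
import math
from collections import Counter


def phrase_key(phrase: str) -> str:
    """Hash of a normalized phrase, usable as an exact-match lookup key."""
    normalized = " ".join(phrase.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented example documents.
DOCS = ["the quick brown fox", "a quick brown dog", "search engines forget old pages"]
EXACT_INDEX = {phrase_key(d): d for d in DOCS}


def exact_lookup(query: str):
    """Hash lookup: either the exact phrase is there or nothing comes back."""
    return EXACT_INDEX.get(phrase_key(query))


def nearest(query: str) -> str:
    """Similarity lookup: always returns the closest document, relevant or not."""
    return max(DOCS, key=lambda d: cosine(query, d))
```

The exact lookup either hits or returns nothing, while the similarity lookup always hands back something "nearby". That difference is roughly what the complaints in this thread about fuzzy results describe.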
In a way I’m glad to hear that Google is delisting the older content because I thought I was doing something wrong.
But it’s still frustrating for my visitors because every few months I get a message about how they can’t believe all the information there is on the site that they’ve searched for for years but never found through search engines but it’s all right there on this one site. (It’s something of a regional history site.)
I guess those sites have involuntarily become part of the “dark web.”
I know very little about Google's SOP, but had the impression they periodically rescan stuff.
Duck Duck Go finds it and shows it. According to the logs, Google spiders it, but chooses not to show it.
Here is one specific example out of dozens I've seen. There is a short satirical rant "published" on Pastebin called The Java Way. Posted in 2015. Unfindable on Google. It was indexed and findable around the time it was posted.
First result on DDG:
The worst part is that Pastebin uses Google for its own search.
No information is available for this page.
Sometimes, the same idea is available in a book, in a TED talk, and in a podcast. Some of us are curating such resources categorized by topic / format / year / difficulty / estimated time. Our GitHub repo received 100+ stars in less than a week, so I thought it would be a good time to show it to HN. I'd love to get some feedback and critique from the HN community where I have learned and discovered so much.
Here's the Show HN post: https://news.ycombinator.com/item?id=19604295
Agreed. And if we came up with a standard format for such lists you could make them searchable and we could end up with distributed searchable curated indexes that are not centrally controlled. And that is quite compelling compared to centralized fully algorithmic search systems run by mega-corps.
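As a strawman for what such a standard format could look like, here is a minimal sketch. The field names (`curator`, `links`, `url`, `title`, `tags`) and the example entries are assumptions made up for illustration, not an existing standard. Each directory would presumably be published as a JSON file on its owner's site.

```python
# Two hypothetical directories, shown as the Python equivalent of the JSON files.
LIST_A = {
    "curator": "alice",
    "links": [
        {"url": "https://example.org/ornithopter", "title": "Human-powered ornithopter", "tags": ["flight", "diy"]},
        {"url": "https://example.org/knives", "title": "Guide to knife throwing", "tags": ["hobby"]},
    ],
}
LIST_B = {
    "curator": "bob",
    "links": [
        {"url": "https://example.org/ornithopter", "title": "Human-powered ornithopter", "tags": ["engineering"]},
        {"url": "https://example.org/lego", "title": "Lego maniacs", "tags": ["diy", "toys"]},
    ],
}


def merge(*directories):
    """Combine directories into one index keyed by URL, remembering who listed each link."""
    index = {}
    for directory in directories:
        for link in directory["links"]:
            entry = index.setdefault(
                link["url"], {"title": link["title"], "tags": set(), "curators": []}
            )
            entry["tags"].update(link.get("tags", []))
            entry["curators"].append(directory["curator"])
    return index


def search(index, term):
    """Return URLs whose title or tags contain the term, most-curated first."""
    term = term.lower()
    hits = [
        (url, entry)
        for url, entry in index.items()
        if term in entry["title"].lower() or any(term in tag.lower() for tag in entry["tags"])
    ]
    # Links listed by more curators rank higher; ties are broken by URL.
    hits.sort(key=lambda pair: (-len(pair[1]["curators"]), pair[0]))
    return [url for url, _ in hits]
```

Ranking by how many curators list a link is just one possible choice. The point is that anyone can fetch the lists they trust and build their own index, with no central party deciding what is in it.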
It sometimes feels like the web is completely frozen and all content has moved into closed gardens. I switched to DDG a while ago for this and a bunch of other reasons, but I wonder if someone else has noticed this. Anyone?
Anyways, I find myself using Bing more and more often these days, because the search results dig more deeply into the 'obscure'.
I'm not at all upset by this. It seems to me that as Google's results are not completely satisfactory, more people will make use of various alternatives. Maybe one day, search will become decentralized again, somewhat like it was in the 1990s, when you regularly made use of many search engines, like Altavista, Lycos, Excite, and Yahoo.
I would imagine that there must still be metasearch sites out there somewhere that submit your query to several search engines. I need to find one again and would appreciate recommendations.
I agree with the observation that this is about shifting everything to current data, because people overwhelmingly care about things that happened a few days ago. There used to be a long tail of users searching for old data and references, but I suspect they're fading away. Biasing the index towards recency also has legal advantages for Google, because delisting old content makes it less likely to receive takedown requests in connection with "right to be forgotten" legislation.
What do you do with these pages after you've crawled them? You need to build an index out of them, and serve that index out of some kind of low latency storage (DRAM, Flash). That makes increasing the index size very expensive. The index size has to be limited, and selecting the right pages to include in the index is thus a core quality feature for a search engine.
I still suspect that this whole thing is more about bias (and personalization, be it correct or incorrect) in the results.
It's actually more complicated than just a single static index, which is also why it's unrealistic to expect a search engine to be deterministic at scale.
How would a search engine distinguish between the two kinds of queries, tens of thousands of times a second?
And how would one architect such a two-tiered system, particularly with an eye toward cascading failures?
How much extra state (internal connections, memory for partial results, etc.) would such a new search type create?
How do you deal with a new kind of hot spots now?
What if millions of people suddenly activate such an option?
What if a botnet does it?
I'd break up the indices into digestible chunks, perhaps chronologically by year/month crawled, and then run all queries simultaneously (in parallel) against all those index chunks and combine the results at the end. Infinitely scalable and can be tweaked to ensure specific response times.
And there'd definitely be no need to set some arbitrary date cut-off; just add a few more virtual machines. I'd bet that's what Google was doing, and then scaled back those machines to save money and boost profits.
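The chunk-and-merge idea in this comment can be sketched roughly as follows. This is a toy under stated assumptions: the shards, terms, scores and URLs are invented, threads stand in for separate machines, and it says nothing about whether the approach holds up at Google's scale.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical index chunks keyed by crawl year: term -> list of (score, url).
SHARDS = {
    2005: {"fravia": [(0.9, "https://example.org/2005/search-tips")]},
    2012: {
        "fravia": [(0.4, "https://example.org/2012/obituary")],
        "wiby": [(0.7, "https://example.org/2012/small-web")],
    },
    2019: {"fravia": [(0.6, "https://example.org/2019/thread")]},
}


def search_shard(shard, term):
    """Look the term up in a single index chunk."""
    return shard.get(term, [])


def search_all(term, top_k=10):
    """Query every chunk in parallel, then merge the partial results by score."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term), SHARDS.values()))
    all_hits = (hit for partial in partials for hit in partial)
    return heapq.nlargest(top_k, all_hits)


print(search_all("fravia"))
```

Every query fans out to every chunk here, so the cost of a search grows with the number of chunks even when most of them have nothing to contribute.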
Still, you can't keep partial results around forever, unless you want to make searches a lot more expensive, having to add a lot of capacity just to deal with the buffer bloat. Each query touches at least a thousand machines. Adding "a few more virtual machines" isn't going to cut it, especially if you have to handle tens of thousands of requests per second.
Information that people regularly access for whatever reason will tend to remain relatively visible. But, yeah, relatively obscure older content is just going to get drowned out unless you know exactly where and how to look. One might argue with Google's criteria around relevance. However, that older information is going to get harder and harder to find just in the natural course of things.
This has been a problem for three years, it's incredibly frustrating, and also demotivating.
I too have complained about Google's search results going downhill. I was told I was "just too technical".
Whatever the case, the web is not the same as it was in the early 2000s and it really is sucking if you are wanting to search for something.
It happens far less frequently than in other places of the web, but I've seen it happen often enough with some of my comments.
It took me a few days to safely upgrade to a new Ubuntu version with a new enough Python to successfully run letsencrypt, without also breaking the weird custom apache configuration rules that had accreted over the years.
An article from 2011, it's not particularly damaging anyway, just the kind of things that happen in tech:
If you search the title of the article:
- DuckDuckGo: First result
- Bing: First result
- Yahoo: First result
- Dogpile: First result
- Yippy: First result
- Google: Does not show (I've gone through the 4 pages of results with no luck). To find it you need to use the "site:zdnet.com" option, and then it's the first result.
Their search engine algorithm is designed to favour rich media content and websites that are growing, because this is how Google grows and learns.
Favouring the "old web" would not help Google's business model which favours growth.
It wasn't that long ago that any Google search would return a big list of blog entries; personal, non-commercial blogs, that is. That was the case with YouTube, as well. I remember people making and uploading videos just for the sake of it. Even I did that (I had a January 2006 account), and the purpose was only one: sharing what you liked to engage in. Nothing more. I guess that's part of the long-gone, old web. When I browse the web nowadays, I feel like I am constantly being sold something, because I actually am.
That old web still exists! It tends to get drowned out by all of the commercial sites, and you won't find more than a hint or two of it through Google, but it is still there...
This might be the straw for my personal camel's back.
Might be just that. More information: https://www.google.com/amp/s/searchengineland.com/googles-de...
Only the relevance search has results for Usenet posts, you can't order by date, and other than using date ranges, there's no way to only see Usenet posts.
For example searching for "gamer" before 1/1/2000:
Maybe archive.org could take this on.
> I also find misleading the title of BoingBoing’s report of this story: “Google’s forgetting the early web”. The two posts mentioned here are not “early web”, nor really “old”.
While the title of this author's post is "Indeed, it seems that Google IS forgetting the old Web"
And today, if the big search engines decide something will (no longer) be indexed, they can make it effectively unreachable.
I think you mean "unfindable", not "unreachable". It may seem pedantic, but I think there's a critical difference there.