Ask HN: Is the past disappearing on the web?
569 points by dusted 3 months ago | 316 comments
I have the habit of looking at the date of things I consume online; it gives me a sense of relevance and context, both when I'm looking for things that are "from now" but, more importantly, when I'm looking for things in a temporal context: for instance, programming for an old compiler, or finding out how to do something with an old piece of hardware or electronics.

I feel like I'm encountering more and more sites and articles where I can't seem to find the date. Google will return irrelevant results from today rather than relevant results from 10 years ago.

I feel it's getting worse, is it just me?




> I feel like I'm encountering more and more sites and articles where I can't seem to find the date.

It seems to me that it's become standard practice for marketing-type blogs on corporate websites to remove the date from their posts. I think it's because (from personal experience) the company will go through a burst of "blog productivity", create a load of content, but then not touch it for years; they don't want that content to look out of date or their website to look stagnant.

Removing the date from their posts, or any other content, hides how old it is and therefore obscures how active they are at creating new content.

Most companies try to use their blogs to attract new customers, a new customer may visit their website once or twice and will never see the blog again, it's not important that they do. They don't want it to look stale.

As a counter example, an interesting thread from yesterday [0] was about how CloudFlare use their blog not as a marketing tool but for technical content and attracting employees. They use their blog very regularly, and so keep the dates on it, showing how fresh it is.

0: https://news.ycombinator.com/item?id=30070422


There’s a popular concept in the content industry of “evergreen content”, meaning posts that are always relevant or useful regardless of when they were originally posted. The idea is that articles will have a longer “shelf life” if they are not tied to a particular date or recent event.

Of course, producing true evergreen content takes more effort than just removing the publish date but that’s one easy way to fake it.


Originally posted: <date>

Most recent update: <date>

Edit: it’s only a matter of time till an article stops getting updated. Of course I doubt people in the “content industry” (tell me that’s not a real thing) care what happens to anyone else after they stop updating their ‘evergreen’ page.


A past employer almost went with this approach, but with a technical audience it's hard to cover all the edge cases with RSS. People get cranky when they get pinged because an article was updated superficially, but you also want people notified when the article has basically been completely rewritten.

Safer to just publish a new article when there's a sufficient amount of content to update. Link the articles together somehow. Maybe add a disclaimer to older (and not updated) articles that they might no longer be valid

Also solves the problem others have mentioned where they don't trust the date on articles. If you have a solid previous article to link to, you're more likely to build trust that the new one really is new


I don't see how this is a problem with RSS. RSS (and the other feed formats) have a concept of an entry ID. If users don't want to see updates, they can show only items with a new entry ID.

To get even more nuanced: generally, a superficial update shouldn't change the "updated" time, which gives another level of control.
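The entry-ID/updated distinction above can be sketched as a reader-side policy. This is a minimal sketch with made-up data shapes, not any particular feed reader's implementation:

```python
# Sketch of a feed-reader policy: surface an item only when it is genuinely
# new (unseen entry ID) or its "updated" timestamp was actually bumped.
def items_to_show(entries, seen):
    """entries: list of dicts with 'id' and 'updated' (ISO date strings).
    seen: dict mapping entry id -> the 'updated' value we last showed."""
    fresh = []
    for e in entries:
        prev = seen.get(e["id"])
        if prev is None:
            fresh.append(e)            # new entry ID: always show
        elif e["updated"] > prev:
            fresh.append(e)            # same ID, bumped timestamp: a real revision
        # same ID, same 'updated': superficial edit, skip silently
        seen[e["id"]] = e["updated"]
    return fresh

seen = {"a": "2022-01-01"}
entries = [{"id": "a", "updated": "2022-01-01"},
           {"id": "b", "updated": "2022-01-02"}]
print([e["id"] for e in items_to_show(entries, seen)])  # ['b']
```

The policy only works if publishers respect the semantics: keep the ID stable across edits and bump `updated` only for substantive changes.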


I see some articles which always have a "last updated" of today. You can refresh a day later and watch it change.


people in the “content industry” (tell me that’s not a real thing)

I think if there is a thing of any sort, there's an industry for it.

And if it sounds dumb to an engineer type, it's more likely there's a lot of it.


It’s too general though, isn’t it? It would be like the “doing stuff” industry being a thing. There might be people who use the term, but that doesn’t mean the term points to any actual phenomenon in the real world.


It's a huge industry. There are services to buy writers, buy content, trade content, link content, guest post so on and so forth.


That was my reaction; it’s scary, right? Like, need an article on safely installing a light switch? Call the content industry. Managing stored PII and passwords? Content industry. Medical information? Content industry.


I see that a lot these days with "most recent update" that I'm pretty sure is a lie. Or maybe they changed some tiny thing to make it not a technical lie.


Marketing is all about lying. Making up fake reviews, customer testimonials.

Exaggerating (which is lying) about the product.

Explaining how good it will make you feel or, better yet, showing photos of very happy people using the products.

Acting = lying, photoshopped photos = lying, fictitious scenarios of use = lying.

Lying is required to write a story (when you read a fiction book, you want the lie about reality, for it is fun to imagine.)

This is a lie all parties engage in willingly.

But marketing types have a sole purpose. To lie, but make it look like truth.

So they don't care one iota about really updating an article or not.


Movie with CGI such as Star Wars = lying?


Fiction is lies only when you want to pass them off as the truth:

"That's right -- and when you get to the human world, the Nothing will cling to you. You'll be like a contagious disease that makes humans blind, so they can no longer distinguish between reality and illusion. Do you know what you and your kind are called there?"

"No," Atreyu whispered.

"Lies!" Gmork barked

― Michael Ende, The Neverending Story


There's in the same dialog this warning about advertising and politics:

“When it comes to controlling human beings there is no better instrument than lies. Because, you see, humans live by beliefs. And beliefs can be manipulated. The power to manipulate beliefs is the only thing that counts

... Who knows what use they’ll make of you? Maybe you’ll help them to persuade people to buy things they don’t need, or hate things they know nothing about, or hold beliefs that make them easy to handle, or doubt the truths that might save them.”


Needs a revisit; that seems appropriate and deep.


It is! Ende was raised under the esoteric Anthroposophy philosophy, and I suspect much of his work is dedicated to debunking its nonsensical beliefs. The second half of The Neverending Story is a philosophical treatise in itself (it hurts the book as an action story, but gives you a lot to ponder).

His children's books certainly have a deep second reading as adults.


All fiction is a lie, by its very definition.

After all, it claims the unreal as real. Fiction, lies.

The difference with fiction in books and movies, made for entertainment, is that the listener of the lie knows it is a lie, and listens for entertainment.

It is not a lie, pretending to be true.


I would argue that a lie involves deliberate deception, not just untruth. Fiction, then, is not a lie, as by definition it is something imagined or invented, and therefore not created to deceive.


I think the word "deceive" was coined precisely to mark that difference in intent.


Yep, the whole thing is made up!


and then someone figures out how to put a `<?php echo date('l, F jS, Y'); ?>` into the "Most recent update:" line ... voila! Perennially up-to-date!


I first heard about "evergreen content" from the local news industry. To add examples of what it looks like (to demonstrate the extra effort required as you mentioned), here are a couple recent examples from The Guardian (because there's no paywall):

1. "Top 10 novels inspired by Greek myths" (not tied to a recent event): https://www.theguardian.com/books/2022/jan/26/top-10-novels-...

2. "How to make pea and ham soup - recipe": https://www.theguardian.com/food/2022/jan/26/how-to-make-pea...

These are unlikely to pull lots of traffic compared with a more timely piece (e.g. an interview with an author for a recent book release, in the first case; the second may be an exception as it's a regular column). However, the usefulness of evergreen content for a local print publication was to fill the print edition during a slow news week (not enough timely news to fill the pages).


The ultimate piece of evergreen content I’ve heard of is Peter Stark’s “Frozen Alive” for Outside Magazine.[0] It’s a story about what hypothermia does to you, told as a second-person narrative. It was first published 25 years ago, was a hit in 1997, and since at least 2016 it has been one of Outside Magazine’s most-read articles every year.[1][2]

[0] https://www.outsideonline.com/2152131/freezing-death

[1] https://niemanstoryboard.org/stories/peter-stark-and-as-free...

[2] https://www.npr.org/2019/01/31/690468853/why-peter-starks-fr...


> They don't want it to look stale.

But that can also be extremely counterproductive when the content you publish has a natural expiry date - and in many areas of expertise (pretty much anything other than pure marketing talk) things change over time. A potential client seeing obsolete information might rightfully presume that the company is out of the loop or just plain unprofessional. If you have a date on the page, that's far less likely to happen.

COVID recommendations and measures are one such (rather extreme) example, where many big players endangered their credibility because they failed to properly mark outdated content.


That's why The Guardian is a pretty great source for news: they add a banner on all old articles saying it's X years old, so beware, the information might be stale.


It also serves as its own form of memory hole: if readers can't work out the date from context after a few searches, they are likely to just give up.


I must say that I put more trust in blog posts that put a notice like "Updated: 10/2021" at the top. It communicates to me that the topic was important at some time in the past and that someone is still taking care of the content and updating it from time to time.

Stability and old content can be good. Not everything is being updated and not everything needs to be updated. I'm all for putting dates on pages and blog posts :)


When they're dishonest anyway, saying it was updated today or recently via a script is just as easy.

They just lie. I was trying to find out why VLC stopped working for me a few months ago [0] and landed on a terrible site that suggested a lot of cargo-cult driver updating and whatnot. A few "comments" thanked the author for the comprehensive (and totally fake) information.

This kind of crap seems to be winning. My other recent quest, finding a university site about consciousness, psychedelia, Emerson and 60s music that I used to visit 20 years ago, was also unsuccessful.

[0] https://code.videolan.org/videolan/vlc/-/issues/25976



On that note, Reddit seems to be faking their last modified date metadata for SEO purposes. I’ve seen many old threads (where all the comments are 5 years old with no edits) show up in my Google search results with today’s date below the link. This coincided with and is likely related to the removal of automatic thread archiving 4 months ago.


This also signals an end game, where the last vestiges of genuine, customer-facing improvement are over.

With nothing left, one delves into short term growth numbers.

Because of this, some indexes may de-prioritize reddit in date-based searches and freshness rankings. And some end users will click, get annoyed more often, and stop clicking on reddit links.

They clearly have nothing left, and no other idea how to improve reddit. They have signaled they are well past peak growth.


I understand you. I've noticed this kind of behavior on some review sites. They put the latest month in the title, but who knows when they actually updated it?


Just today I was reading a blog post about some framework (KubeFlow) not being “ready for prime time yet”, and even though the article had great technical details, the fact that it lacked a date made it so much less valuable: it’s highly relevant whether this conclusion was drawn last month or two years ago.

I understand why this happens, but part of me really wishes we could stop this. Maybe there is some archive.org extension that can show me “this page first appeared on $date”.


If you found it through google, or search it on google, you can see when it was cached.


One can even envision an extension which fetches the caching dates automatically and shows them right beside the search results.


Open developer tools and type `document.lastModified` in the console. Though this is painful to do for every web page, it's useful in these instances. Maybe someone with experience developing Chrome extensions can build one that shows this, plus other useful page info, with a click on the toolbar.


"last modified" only works for static content, and not pages that are rendered dynamically or where the header is set based on cache settings.

Even for static content, the date may be wrong as the output may have been generated multiple times since it was first created.
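When a server does send a Last-Modified header, the HTTP date it carries is easy to parse with Python's standard library; whether the value reflects the actual content age is a separate question, as noted above. A small sketch:

```python
from email.utils import parsedate_to_datetime

# Parse an HTTP-date as found in a Last-Modified header (RFC 7231 format).
# The header value here is a made-up example, not fetched from a real site.
header = "Wed, 26 Jan 2022 10:00:00 GMT"
last_modified = parsedate_to_datetime(header)
print(last_modified.year, last_modified.month, last_modified.day)  # 2022 1 26
```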


Thanks for the info


Wordpress-based blogs (and maybe others, too) often have a timestamp hidden in an HTML meta tag even if the date is otherwise hidden from regular view.
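As a sketch of digging that hidden timestamp out with only Python's standard library: the `article:published_time` Open Graph property is common on WordPress themes, but by no means guaranteed, so treat this as a best-effort heuristic.

```python
from html.parser import HTMLParser

class PublishedTimeFinder(HTMLParser):
    """Collects the content of <meta property="article:published_time" ...>."""
    def __init__(self):
        super().__init__()
        self.published = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property") == "article:published_time":
            self.published = a.get("content")

# A made-up page head, standing in for fetched HTML.
page = '<head><meta property="article:published_time" content="2021-10-05T08:00:00+00:00"></head>'
finder = PublishedTimeFinder()
finder.feed(page)
print(finder.published)  # 2021-10-05T08:00:00+00:00
```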


Why would companies who are trying to deceive not go to the trouble of changing the Last-Modified header? At any rate, lots of sites don't set Last-Modified anyway, which makes it January 1, 1970, GMT.


I have the same feeling. Actually, I've seen it in real life. A company I consulted for a few years ago created lots of "evergreen content." Now, they had a reason to do so, but...

It would be cool if archive.org (or any others) had an API that made it easy and quick to look up "first seen" timestamp for any given URL.


Here is the API you are looking for: -- https://archive.org/wayback/available?url=example.com&timest... -- this will show the oldest timestamp of an archive that the Wayback Machine has. The trick is to set the timestamp to be /really/ old; it will then show the first snapshot it has.
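A minimal sketch of using that endpoint from Python. The request-building part runs offline here; the JSON below is a hand-written sample of the response shape, not a live result:

```python
import json
from urllib.parse import urlencode

# Build a request against the Wayback "available" endpoint; passing a very
# old timestamp makes the "closest" snapshot the earliest one on record.
def wayback_query(url, timestamp="19900101"):
    return "https://archive.org/wayback/available?" + urlencode(
        {"url": url, "timestamp": timestamp})

# Shape of the JSON the endpoint returns (hand-written sample, not live data):
sample = json.loads('''{"archived_snapshots": {"closest":
    {"available": true, "timestamp": "19960101000000",
     "url": "http://web.archive.org/web/19960101000000/http://example.com"}}}''')
first_seen = sample["archived_snapshots"]["closest"]["timestamp"]
print(first_seen[:4])  # 1996
```

Note the caveat: this gives you when the Wayback Machine first crawled the URL, which bounds the page's age from above but says nothing about edits since then.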


I don't think you can tell whether the text of the page has changed since then without looking manually, though. I'm not sure how you'd solve this technically; it's probably more than just checking whether the HTML bytes are identical, since small changes probably happen all the time to "the same article".

Or it might be good enough, because this kind of content is rarely updated at the same url?


I wish Wayback had an indicator for how much the page has changed from one capture to the next. A simple count of the diffs should give you a good idea.


That is irrelevant, at least for me. :)


Why are you interested in when the URL was first there, but not if it has entirely different content than when it was first there?


Because I'm only interested in when Wayback first saw the page. Nothing magical. :)


Very nice, and thanks for that! I'm unable to find usage rules, though.


BBC television programmes used to give the year in Roman numerals in the copyright notice, I think. It has been suggested that this was to make it harder for people to notice how old the programme was. The same technique wouldn't work so well on a web site, because you couldn't do what the BBC did and leave the Roman numeral on screen for only about 0.5 s, so that most people don't have time to decipher it. Also, the dates at the end of the last century were particularly hard to read in Roman numerals. Here's an example. Time yourself:

© MCMXCVIII
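For the impatient, the subtractive Roman-numeral rules are easy to mechanize; a small decoder sketch:

```python
# Decoding the BBC-style copyright year: subtractive Roman numeral rules.
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(s):
    total = 0
    for i, ch in enumerate(s):
        v = VALUES[ch]
        # A smaller value before a larger one is subtracted (IV = 4, CM = 900).
        if i + 1 < len(s) and VALUES[s[i + 1]] > v:
            total -= v
        else:
            total += v
    return total

print(roman_to_int("MCMXCVIII"))  # 1998
```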


> It has been suggested that this was to make it harder for people to notice how old the programme was

This is tempting but I don't know how much credence to give it.

It's more to do with a style that was taken on and kept. My father still writes the month in a date using a Roman numeral, e.g. 25/XII/21.

The date format has been around for many years. There weren't that many BBC programmes made, even over the course of many years. Trying to convince viewers that a programme was not "old" would have been difficult anyway: it might be in black and white, and the quality certainly would have looked dated.


I think the convention started with movies.


I got it in seconds because I used your context of it being a date "at the end of the last century": see the VIII and immediately guess "1998".


Common in movies, before home media or even broadcast was A Thing, and in some other places. I don't think that's the explanation.


I'm not sure how true this urban legend is but it is not too hard once you get the hang of it

(And dates aside, how the program ages is more important)


Like a true millennial I just entered it into Google and was surprised at the result.


I was disappointed by [MCMXCVIII]; [MCMXCVIII in decimal] does better, but I really hoped for more like the unit conversion panel.


1998


Just to add an interesting example of a middle ground, the fly.io blog post [0] currently top of HN [1] has the published date "hidden" at the bottom of the article. Their content is technical content that can go out of date but is also useful marketing content. The post is from August 2021 but has been posted to HN today.

0: https://fly.io/blog/run-ordinary-rails-apps-globally/

1: https://news.ycombinator.com/item?id=30083764


FWIW, I started hating putting dates on most of my stuff because I simply got sick of people incessantly asking "is this still up to date?" because the date happened to be from even just a few months ago much less a year ago.


Why not add, alongside the date, a note telling people how long the information will remain valid? It doesn't have to be a date; it can be "indefinitely" or, in the case of software, "until the next major version". This works best when you also revisit old content occasionally and update that note. I think you drew the wrong conclusion from those inquiries.


> I simply got sick of people incessantly asking "is this still up to date?"

I don't know what or where 'your stuff' is, but isn't that a fair question for a lot of topics? Writing about data structures or political protests might stay valid for decades, but a lot of technical writing about languages, platforms, even products and companies can age very quickly, and unlike news or culture might be of less interest to people in the future.

Many people find technical writing while searching Google for a solution to a problem. I love the articles that state a date and the version of whatever platform they're relevant for (even better when someone adds "this was written for version X.Y; things have changed in version X.Z, see the doc at..." etc.).


And if you remove the date, people will still ask the same question. Inevitably.


It's a valid question. You don't have to respond, but at least if the date is there, users themselves have a fighting chance of looking up what's changed since then.

It's entirely possible a tutorial made for software 3 months ago or something might have outdated information if there was a new, backwards incompatible release, or that a news article might be missing information that was revealed only a few days ago, etc.


Yeah, that can be a problem, I used to put a Created and an Updated date, but now I just have a Last updated date.


I notice this sometimes in academic papers too. I would like to see a date, but one is usually not presented. Is this for a similar reason?


That seems surprising - many citation formats involve the date?


The problem is the downloaded pages of the papers themselves, which often completely lack proper bibliographic data (or even header/footer info other than a page number). Compare this with a page from a corporate tech report, where each page might have the title and document number somewhere.


They used to come in magazines, with the date on the front, and authorship metadata on the index.

Nowadays they come as individual papers, but the publishers kept the format... because, why change anything when you have guaranteed profits?


In academic content, the lack of a date is the most pernicious!


Depending on the source, it might not be published yet, or the PDF you grabbed might be a pre-print while the 'officially' published article is behind a journal paywall.

Generally, I take the publication dates of the cited works as representative of the age of the paper.


I've heard of some marketing things updating the date periodically without changing the contents in order to trick people and/or search engines into thinking it's new.


These days they'll change the title.

An article with the title "The ten best pencils you can buy [January 2022]" written in 2017.


  The ten best pencils you can buy {{ datetime.date.today().strftime('%B %Y') }}


Normal SEO advice is to update your dates and copy so Google thinks it's "fresh".


one strange example for me --

AWS has tons of training content they make freely available to their partners and wider community.

In many cases, though, there are no dates! (YouTube shows the date of upload, but AWS' own training sites lack such markers. We have to guess based on the copyright year in the slide footers.)

It is clear that they invest heavily in creating new training content. In fact they essentially repeat the same content multiple times in many live tech talks and partnercasts etc. So there is no dearth of new content. It is also well known that they release new features very frequently -- so knowing how recent the content is helps a lot. But they still do this -- seemingly deliberately.

They recently overhauled their whole digital learning portal: renamed it AWS Skill Builder, rebuilt it on the Docebo LMS/CMS, changed a lot of things, but didn't make any effort to add a published date to any of the courses.

Frustrating.


Their API docs are incorrect half the time because they rarely update them. Finding documentation on their dynamic parameter system that all APIs use dumps you out on one page with every possible parameter domain.


At Snyk (https://snyk.io) we're actually working on a new blog process to address this. Essentially, a problem almost every technical blog has is: when we publish articles, are they ephemeral, or are they evergreen?

If you treat blog posts as ephemeral, it means you'll write them once, ensure they're accurate, then leave them there forever. Unfortunately, with technology stuff, that rarely works. Technologies change, libraries break, facts now might be different in two years, etc.

One of the things we're currently working on is tagging all of our technical content so that once a year it pops up in a review board somewhere and someone reviews it for accuracy, updates it if necessary, etc.

This way, technical stuff will still be useful to readers (hopefully) a couple of years from now.
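The yearly-review idea could be as simple as a scheduled job over post metadata; a sketch with invented field names (this is not Snyk's actual system):

```python
from datetime import date, timedelta

# Flag posts whose last review is more than a year old, so they land on
# the review board's queue. Field names ('slug', 'last_reviewed') are made up.
def due_for_review(posts, today, max_age_days=365):
    cutoff = today - timedelta(days=max_age_days)
    return [p["slug"] for p in posts if p["last_reviewed"] < cutoff]

posts = [
    {"slug": "kubeflow-intro", "last_reviewed": date(2020, 3, 1)},
    {"slug": "npm-audit-guide", "last_reviewed": date(2021, 11, 15)},
]
print(due_for_review(posts, today=date(2022, 1, 27)))  # ['kubeflow-intro']
```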


That sounds more like something you would do on a wiki than a blog...


This is absolutely infuriating when it's meant to be informative, especially about something in a fast-changing landscape, like how-tos on Kubernetes, for instance.


> like how-tos on Kubernetes

Honestly, although I've occasionally derived benefit from these, I think I'm reaching a tipping point where the plethora of how-to articles, with their ads and newsletter pop-ups and everything, feels less productive than plain old documentation and taking the time to fundamentally understand the tech so that I no longer need the how-to. Mainly because I trust Kubernetes to keep their documentation up to date, but who knows how current a random how-to article is, never mind the marketing bloat they're usually polluted with.


I agree. The date itself is irrelevant if the content stands on its own (if it mentions the version being used, for example, or spells out its arguments explicitly). If bad SEO articles stop being read (independently of date), better content, with proper versions listed and real argumentation (like official documentation), could take their place.


A very easy solution, which I use when searching for something with an expiry date, is to look for the date first; if it's not there, I immediately leave the site without bothering to read anything. If this is regular practice and the domain is frequently at the top, after a few times I just stop visiting it altogether.


It's a sad state of affairs that marketing needs drive Google. I can hear the "duh" response in people's minds (yes, we know how Google gets its revenue), but this is the index to the world's data. The world's data portal. It has serious repercussions that it's tuned to suit the needs of online sales and marketing sensibilities.


> the company will go though a burst of "blog productivity"

One of the most insightful comments I ever read on HN pointed out that marketing folks are good at selling things, period, and that includes selling things internally. So when nascent companies are wondering why the product doesn't sell itself, the first thing they do is hire a Director of Marketing. That person promises deliverables from day 1, and what's more deliverable/visible than a blog? Then they leave -- in my experience, the typical marketing exec's tenure at a startup is about a year -- and no one else feels like putting in the work. Also, by that time, most people have seen that the blog never really drove engagement in the first place.


Increasingly I feel like this is one of those pieces of metadata that we should be moving out of the page.

I would suggest moving it into the browser (i.e. read a meta tag or header) but the obvious problem is that they’ll just be forged and it would almost immediately become pointless.

Search engines could help here. If Google were to provide a last cached date (or a date of the last significant change) in the search result that would be far more useful. They certainly have this information from crawling, and it would be difficult to forge as constant substantial changes to game the system would be both expensive for the author, and harmful to the page’s ranking.


> I feel like I'm encountering more and more sites and articles where I can't seem to find the date.

Moreover, since static pages are no longer the norm, you can't even retrieve the date the file was created or modified; the server always returns a timestamp from when the page was rendered for the browser.

Getting EXIF data from images might provide a clue, but the image creation/edits often do not correlate with the text...

If anyone knows how to extract such date info, it'd be helpful


I don't understand why this isn't an SEO penalty. I feel like early on Google actively looked for things like this when ranking sites.


SEO wise most blogs will expose metadata that shows publish date and last update date.


I've experienced some major frustration due to Azure's documentation doing exactly this. Though it may not be limited to Azure. There's just a date on every document and usually it's a month or two old. Many of the examples I've found via Google don't even compile or aren't relevant any more.

I hope this is a trend that can be reversed.


> It seems to me that its become standard practice on marketing type blogs for corporate websites to remove the date from their posts.

Worse still, I’ve encountered sites that automatically update their edit dates to be current as a way to optimize SEO. I’ve found articles with decades old information claiming to have been written mere hours or days prior.


Many news sites seem to set the content date to today so they always appear in searches, even though the content is years old.


I wonder how much of this is misguided SEO, thinking that if they leave the date off Google will assume it's fresh content and always surface it in search results, and not realize that Google knows exactly when they first crawled a page.


That is exactly it. People are using recency as a proxy for quality.


Even more true for open source software. If it isn't recently updated, people assume the project is dead.


To be fair, bit-rot is a real thing for software, because the world keeps changing. In most cases it takes effort and upkeep to keep everything working.

Annoying, for sure, but this is the reality of software and technology as the landscape continues to evolve over time.


Anecdote: One of the best friends of my oldest daughter recently moved to a new rented flat. When I saw the address for a playdate, I recognised it and said to my daughter, "I think you went to a birthday party at this address when you were 3-5 years old, when another of your friends must have lived here." To corroborate this, I looked up the calendar on my phone, where I'm pretty organised about putting dates and locations of events. At that point, however, I discovered that Google Calendar only keeps 2 years of history!

So modern technology is literally erasing our pasts. Not just calendar entries, but messaging systems (people used to keep handwritten letters for decades), and possibly even photos (if we're not careful about preserving them).

Edit: See my clarification of the 2 years in comment https://news.ycombinator.com/item?id=30084620 below. I still think the point remains - we do not own or value our digital data in the same way as physical objects, and there is a much heightened risk of that data disappearing as a result, either by the owners of the platforms the data is stored on archiving the data or by us not valuing it enough to preserve exports and backups through long periods of time.


> modern technology is literally erasing our pasts

It's not modern technology that's erasing our pasts, but cloud-based services owned by someone else. That's why I've always been keen on local software - for instance, I'm currently upgrading my stale desktop mindmap software (http://innovationgear.com/mind-mapping-software/) ;)


Modern technology erases it too. I am already seeing the drives I used in college to store my raw camera files and LR catalogs degrade to the point of the data being unreadable, as I try this month to move them to fresh storage. For some of my oldest photos, I only have whatever processed JPGs I happened to upload to a cloud provider that stuck around this long. The silver-based negatives from my film work of that time, though, are still fine.


Might be a long shot, but GRC's SpinRite has saved a couple of drives of mine. It won't work if there's a problem with the physical connection to the drive, but it can fix almost everything else (at least momentarily), and it actually grants a nice speed boost in the process for drives that have gotten too messy.


Aside from all the flashy, handwavy marketing "technospeak", what SpinRite actually does under the hood is re-read each bit from the disk thousands of times and see whether it gets more 0s or 1s.

It works well when a drive is past the point where its internal error correction code no longer works, but you just want to beat the right answer out of it with a baseball bat.

These days professional disk recovery by a lab costs under $1000, and labs recommend not using brute-force tools like SpinRite, because they can exacerbate the problem before the drive gets into their hands.
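The re-read-and-vote idea described above amounts to majority voting over repeated noisy reads; a toy model for illustration, not SpinRite's actual internals:

```python
# Toy model of majority-vote recovery: re-read each bit position many times
# and keep whichever value shows up most often across the read attempts.
def majority_vote(reads):
    """reads: list of equal-length bit strings from repeated read attempts."""
    recovered = []
    for bits in zip(*reads):               # iterate column by column
        ones = bits.count("1")
        recovered.append("1" if ones > len(bits) / 2 else "0")
    return "".join(recovered)

# Five noisy reads of the same four bits; the true value wins each column.
reads = ["1011", "1011", "1001", "1111", "1011"]
print(majority_vote(reads))  # 1011
```

This only helps when read errors are independent and relatively rare; a systematically failing head or connection defeats it, which matches the caveat above.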


I just checked, and this isn't true: I have Google Calendar entries from over 10 years ago. Where did you get the idea that they delete anything older than 2 years?


There is a setting in one's Google account to keep one's data forever or automatically delete it after a certain time period, perhaps they changed it?

(side note: the title of this post is a little sensational imo. the past has always "disappeared". Don't forget that you will die some day and eventually there will be no trace of your existence)


Where is this setting?


In the "Activity Controls"[0] section of one's google account, there's an option for "auto-delete".

Now that I'm looking for it, I don't see Calendar listed as one of the products controlled here, but it might be grouped under GMail or something.

[0] https://myactivity.google.com/activitycontrols


Uh, my gmail stops at ~2009 or so...

I lost a bunch of emails from my late mother, which I went to look for, and found out that Gmail had chopped off the history.


Yikes! Just checked and I have emails going back to 2000 (imported, account is from 2007) in my Gmail. It is a workspace account though.


> Not just calendar entries, but messaging systems

Logged in to my yahoo account that I haven't used in more than 10 years to look at some convos I've had with a good friend. There was nothing there. Yahoo forums are full of people that want to recover messages from their deceased loved ones, and won't be able to.

I find it fascinating that the only traces remaining of our lives will probably be archived on some NSA server somewhere... Flashbacks of that X-Files episode with the underground bunkers full of DNA.


A client of mine had his email account accessed by somebody in Nigeria (could see date and location of last login). That person then deleted everything after accessing the account.


That's why manual backing up of your chatlogs is such an amazing (and often cringey) thing. Just don't back them up anywhere online.

I've got conversations from 15+ years ago with girls I liked, friends I loved and everything in between. It's hard to watch, but if I ever feel nostalgic I can bring them up in a flick.

I'm not even 30 yet.


Not to start a war, but iCal has saved everything since I first started using an iPhone in 2013.

edit: Make that 2011


There used to be an old restriction where the (official) mobile apps would only sync 1 year into the past. I don't experience this anymore, so it appears to have been lifted. Maybe you're using some legacy client?


Interesting anecdote! It facilitates rewriting the past, influencing the present, and controlling the future. We need to be mindful of where we invest our records.


I've moved a bunch of times in my life. Each time, I move a box full of junk that's been sitting there since the last move. I never open it or go through it, and I always forget about it until the next move. Certainly there must be some attachment to the junk, but I never think about it until I move. I'm not sure that my digital junk is any different.



Google's search quality has taken a serious nosedive in the last couple of years - the last 6 months in particular.

I think they implemented some new form of search term widening which is far too strong, so the results you want are often buried among pages and pages of results for the general category of things that you searched for, rather than close matches for your keywords. Combined with the recency bias that other people have talked about, you end up with much less useful results for precise searching.

This coincides with a large increase in the number of surveys that my partner has been getting through the Google Rewards program that ask whether or not a recently used search term gave relevant results. Obviously that's just anecdotal, but it does feel like there are substantial changes in the algorithm, and not necessarily for the better.


This has been going on for a decade in my region, but it seems recently other people on HN have been hit too.

With that said, welcome to the club. I feel sorry for you all who have to go through this now; I've had a decade to adapt.


I've been seeing this too. So after reading your comment and the one above, I did a search to see what operators are still supported by Google and stumbled across this page: https://ahrefs.com/blog/google-advanced-search-operators/

It's nice to see that AND and OR still work, but from a few tests, it seems most of us should be prefacing our searches with "allintext:" (without the quotation marks). It works well to dramatically improve my search results. I think I'll write myself a Google search page that automatically prepends it to my searches.
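As a sketch of that idea, here's a hypothetical helper that prepends the operator to a standard Google query URL (the helper name and approach are mine, not any official API; only the well-known q parameter is assumed):

```python
from urllib.parse import quote_plus

def allintext_url(query: str) -> str:
    """Build a Google search URL with allintext: prepended, so every
    term must appear in the body text of the results."""
    return "https://www.google.com/search?q=" + quote_plus("allintext:" + query)
```

A tiny redirect page or bookmarklet could call something like this on whatever you type.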


From that article:

> “search term”: Force an exact-match search.

This is no longer working in my experience, at least for full phrases (for example, error messages). Same in DuckDuckGo.


Agreed. Quotation mark functionality in Google seems to be sadly dysfunctional.

I guess I buried the lede. I probably should have just focused on the allintext: operator in my comment. That's the operator that makes a huge difference in my searches and that I wanted to share.


I got the message loud and clear :-)

Now I just wonder if that is Kagi's magic trick

- and how long it will take before Google "repairs away" this feature too :-(


Wow!

I have been a really advanced Google user and somehow I have missed that particular one!

Edit:

1. Reading through it, I am fairly certain some of these have been broken too. define: in particular is one that I used to use but was sure was broken. Today it worked again though.

2. At least double quotes still don't work (or maybe they include text from links pointing to the site or something)

3. How did you find this? I've been looking for something like this recently :-)


I use duckduckstart.com and I don't recall adding a bang to the search, so it was probably in the startpage.com results when I searched for "Google search operators."


Most of these aren't working anymore... extremely frustrating


They are optimizing for a certain type of search. It seems to me that type of search favors recency and the broadest results possible.

While it seems to be getting worse to me, for most people this is probably what they want. Searching for "Javascript .splice(,) ES5" and "Vermont State Flag" are two pretty different types of searches.

It's actually not that hard to get the results you want for a more specific search if you learn the boolean operators and use them effectively.


Just the other day I was trying to find general information about how a cuckoo clock works, and I had to wade through multiple pages of companies selling cuckoo clocks, companies selling clocks, clock oil, watch sellers... I have uBlock Origin, but the "non-ad but still loaded with commercial results" experience was terrible.

Guess I need to add a blacklist extension so I can start paring down the BS responses.


There are recipes I've been cooking for 15 years that are on forum posts. I've searched for them all this time with google, and now they have disappeared.

I can find them by manually navigating around the archives of the forums, but to most modern search engines they do not exist.

We are losing a lot with the direction that these search engines are going in.


I've been messing with other search engines a lot: Kagi and Marginalia. Nothing too good or bad so far. I really want to be done with Google.

DDG seems like a clone of Google, so it's not much better.


> Google will return irrelevant results from today rather than relevant results from 10 years ago.

Tip: leave Google behind for now.

That site has been very useful over the last few years, but only in the same way as my very cheap electric saw: because I didn't have access to anything better.

For someone who has tried good tools like Festo, Milwaukee, Hitachi, or old Google, it is just a painful reminder of the past and how good life used to be.

It works but hasn't sparked joy for close to a decade.

After Kagi and Marginalia came into my life, my life improved significantly.

Note: I'm not saying Googlers are evil or dumb now but I will point out that engineers there have incentives stacked against them.


It’s very clear Google has censorship and biases. If you search for some subjects you will get vague unrelated results, but legitimate results from say Bing. An example of this is anything related to the sex industry.

I find this with coronavirus as well.


All search engines have some form of implicit bias. An unbiased search engine would be beyond useless at actually finding anything but extremely well specified queries. The trick is to tune the bias to favor results that are interesting and relevant.

This is also why having just one big search engine is a bad idea.


At the very least, though, I would expect my search engine to rank results higher if they contain the words I was looking for, without my having to wrap every single word in +"...". This seems to be something that few search engines are capable of. Is this very basic thing labeled as "advanced" now?


That's how most search algos work out of the box. Google used to work that way. I agree; it's getting worse. It's not obvious if this is being done purposely but the result is that Google is basically unusable for a lot of searches now.


Google's moved to vector search, which doesn't lend itself to keyword prioritization in the same way classic keyword search does. I think that's 90% of the problem.


So finally I have a name for the utter madness that hit Google a decade ago?

Is vector search the thing that makes it ignore my search query and search for something else?

Also, is this the problem Bing has?

And is it much, much cheaper, such that they chose to use something so utterly, ridiculously broken?

Edit: I'm obviously exaggerating heavily here but this has cost me so much time and frustration.

I agree with others: if pages exist that contain the exact matches, why not return them first?


Not bias, deliberate disappearance of specific categories of ideas. Independent blogs and forums not owned by large silicon valley companies have disappeared, for example. So have all discussions of current events, except for "mainstream" news publications.

Whether this is because of a short term greed motivation to maximize adsense yield or a larger conspiracy of global information control isn't clear yet. But it amounts to the same in practice.


Can you give examples of search queries, and blogs and forums that you would expect to show up from those queries?


Here’s an example on Google search:

‘site:4chan.org wuhan’ returns no results; ‘site:4chan.org wuhan institute of virology’ returns 1,710 results.


Examples?


I don't know what you're trying to tell us by comparing software tools with hardware tools. Quality hardware tools from the last century are still doing what they were designed for, and in a professional context they even work better than most current tools, as the old hardware was designed to last.


Point is last fall I had both

- inferior power tools (because I couldn't afford good ones last I bought)

- inferior search tools (because Google has broken itself)

The similarities are that I was stuck with bad tooling and when you know how big the difference is it hurts.

For people who never took advantage of how good Google used to be or who have never used good tools it probably wouldn't hurt as much.


What’s better than Google?



I just tried "python convert datetime to unix timestamp" on Marginalia and it came up with nothing. As for Kagi, I signed up for the beta and I'm on a waitlist.

Still nothing better than Google for me.
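Incidentally, the answer that query is hunting for is a stdlib one-liner in Python:

```python
from datetime import datetime, timezone

# datetime.timestamp() returns seconds since the Unix epoch; attach an
# explicit timezone to avoid surprises from naive local-time datetimes.
dt = datetime(2022, 1, 1, tzinfo=timezone.utc)
unix_ts = int(dt.timestamp())
print(unix_ts)  # 1640995200
```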


Read the how-to on Marginalia. Edit: it is right beneath the search box and says: [...] A concrete example: How do I cook steak? will probably not be helpful, Steak Recipe will give better results (just Steak is pretty good too).

After that try: python datetime timestamp

or: python convert time

or something similar. Edit: see also Marginalia's answer next to me.

As I wrote earlier: Marginalia is mostly for fun (although lately I have gotten better results for at least one simple technical query).

Edit: if a search engine respects your query you can always broaden your query. If a search engine ignores your query you cannot do anything.


Is my memory failing me, or is that much closer to how search used to work in the early 00s? I have vague memories of being flabbergasted by someone even trying something like "how to XYZ?" in a search engine.


Yeah, you used to have to think of specific, unique keywords that would appear in the page you were searching for. A lot of people couldn't break the habit of asking questions, so search engines just started accepting it and trying to optimize for it.


> A lot of people couldn't break the habit of asking questions, so search engines just started accepting it and trying to optimize for it.

Which would have been totally fine if they hadn't nerfed it for those of us who knew how to use it.


No, you are right.

This is the way anyone with a clue would approach search until Google broke it: search for words or phrases within the page.


I always search as such:

"perfect steak recipe"

"perfect grilled chicken"

and such... typically resulting in the best results.

"R53 mini cooper fuel filter"

reduce the words to the core of the question, and get good results.


I distinctly remember phrasing my searches as questions. I was young, and this was back pre-Google, so I was using Ask Jeeves. Naturally, if you're asking someone something you would phrase it as a question, so in my mind it made sense to ask it that way.


Wasn't that the whole premise of Ask Jeeves? And then much later, Wolfram Alpha awed with that same 'plain language' functionality?


I got access to the Kagi beta a week or so back, and have been daily-driving it as my default search engine ever since.

I've only ever "fallen back" to Google for a small handful of searches since then, and only out of curiosity for what it might look like in comparison to the Kagi results.

Although it's only been a small amount of time, I'm sticking with Kagi for now because the search results are at least on par with Google's, and significantly less encumbered by Google's SERP dark patterns.


Yeah it's not quite the sort of query it's good at. It's more geared toward discovery than answering questions. You can answer that particular question if you tune the query a bit though.

https://search.marginalia.nu/search?query=python+datetime+un...

Has this as the first result, though:

http://www.logophile.org/blog/2010/08/15/python-datetime-con...


You're right. It works better when I think "hard keyword search" than what I'm used to with DDG and Google. I tried "python super classmethod" and got here: https://legacy.python.org/dev/peps/pep-3135/ which perfectly answered my question.


Isn't it beautiful that something so amazing can run on a standard tower PC in a living room in Sweden?

Holding up against a HN hug of death last month, providing an absolutely refreshing search experience!


I just tried on Kagi, this was the result: https://imgur.com/a/Ds1YTf5

I've been using Kagi for a little while now, and honestly, it's been pretty great.


Kagi for serious stuff (work, health). I've no idea how, since they build mostly on Google and Bing, but something they do makes an enormous difference.

Marginalia for fun stuff.


> Our searching includes anonymized requests to traditional search indexes like Google and Bing as well as vertical sources like Wikipedia and DeepL or other APIs. We also have our own non-commercial index (Teclis), news index (TinyGem), and an AI for instant answers.

Kagi seems to use both Google and bing results and will cost a minimum of $10 a month. Seems like the worst of all worlds to me. It’s a good option I suppose for those who want to pay.


That's the magic part, it uses Bing and Google but manages to get good results!

I've recently seen queries answered on Kagi that returns utter rubbish when I paste the exact same queries into Google or DDG.


I don’t see what’s magical about taking results from other sites lol. What happens if bing and Google block them?

I’m skeptical of their privacy policy as well, given that you need an account and have to pay, it’s unlikely they don’t know your searches.


They are paying for API access, not scraping results. Why would they be blocked?


Where did you see that they’re paying for API access? Their FAQ says that they’re making anonymized requests but don’t go into more details than that.

That being said, even if they did pay if they have any traction at all they’d still be blocked. Why would their competitors help them?

In any case I doubt it will take off, people who are poor aren’t going to pay 10 bucks a month just to search when there are dozens of free competitors.

Happy to be proven wrong.


If they were just illegitimately scraping results, they could not provide any kind of consistency in quality, as they would be getting blocked on and off. And the small search engines like DDG, Startpage, Ecosia, etc. have been making these kinds of deals for more than a decade, with more users than Kagi is likely to ever have, without having their contracts terminated due to "too much traction".


Late but I have texted a number of times with the Kagi team and they do pay for API access.


>That's the magic part, it uses Bing and Google

That's the part I want to avoid. Build your own crawlers and stop relying on Google. Google already has enough control over the internet, I do not want a search engine that's just downstream of them.


Ok. I see. But I cannot do that.

I can however use a search engine that doesn't mock me and second guess me all the time and that tries to anonymize me.

But I agree very much with you.

If someone launches a search engine that works as well as Kagi and doesn't use either Google or Bing, I'll happily pay a premium for that on top of what I have already said I am willing to pay for Kagi.

Edit: In fact I already support a search engine with an independent index.


They are still a startup. If they can get the users/money, eventually depending on Google will be a risk/too expensive, and they'll start crawling themselves.


Define better first.

If by better you mean supporting a competitive market that doesn't strengthen the monopoly Google has on search and their results which, as years go by, are augmented for their benefit but not yours, then anything is better than Google.

If by better you mean increased chances that the first few results will contain the answer you need, then Google is still king. That being said, when it comes to tech, with most answers found in open git issues or stackoverflow, i find brave search sufficient. So much so that i haven't used any of Google's services in over a year.

The real question is, what are you willing to compromise? There's an in-between for the two extremes above and the choice is subjective.

An exciting alternative is https://neeva.com/ I'll happily pay money for a privacy oriented search engine as i do for email, cloud storage and NextCloud. We shouldn't expect things to be given to us for free. I work, they work, we can do a fair trade with some iou derivatives.


Neeva signups are bizarrely region locked. I get that there may be some local results for some kinds of searches, but that's easily handled with a banner in results for geo searches that states "local results aren't available in your area yet", rather than locking much of the world out of the service.


> If by better you mean increased chances that the first few results will contain the answer you need, then Google is still king.

Hasn't been true since at least December. Kagi is better now by almost an order of magnitude.

Of course my Google is not your Google. Still, I have experimented so much with Google over so many years - logged in, logged out, from multiple addresses - that I am confident in saying it hasn't been itself for a decade, and finally now we have a better alternative.


> Of course my Google is not your Google.

And that's the main problem with the new Google.


> neeva

A website that requires me to sign up to see what their premium account may cost ... this sounds pretty scammy to me.


Also not available in my region.

Straight from the playbook of big film ;-)


I was excited for Neeva at first. But then I saw the premium includes some kind of NFT trash attached to it. I'll take my money elsewhere. I signed up for the Kagi beta and am hoping it will be better.


A warning for those who remove dates from their content and care about invalidating bad patents and rejecting bad patent applications: if there's no date, a patent examiner likely can't use it as prior art.

I used to work as a patent examiner and I was disappointed when I found web content describing an element of a patent application I was working on, but there was no date that could be used to be certain the document was available before the priority date of the application.

You can use the Wayback Machine and similar archivers to get a date, but frequently the archivers didn't capture the page or didn't capture it in time in my experience (even if it likely was published in time, I can't establish that legally).

Before I quit, I spent some time saving a ton of webpages in one of the areas I was working on (water heaters) just because I wasn't sure how long I'd be at the USPTO and I could be certain of the date given that I myself archived the documents. It was a long-term investment, but could have been quite useful if (for example) a company tries to patent something they previously sold a long time ago and forgot about. The Wayback Machine often had spotty coverage of corporate webpages so I couldn't see all their products at a particular time.


Check the source HTML; sometimes CMSes put dates there.


I've certainly noticed Google ignoring results more than 3-4 years old when something barely related, but more recent, matches. I call it the "recency trap". Because of that, I've found myself more and more systematically setting the date range of the desired results (which isn't always useful, as many sites reply with misleading metadata).

To give a recent example (and given the current events): a number of Russian officials blamed the sinking of the Kursk on NATO (either on purpose or by accident), and I recall such statements from back then, but via Google it's been almost impossible to find a primary source. Most results were about the 2021 statements from a retired admiral who was involved back then and still insists on that; the relevant content from 2000/2001 was certainly tough to find.

Part of it is because this is 2000/2001 and many links rotted away, another part because the existing links usually don't respect basic SEO, and finally because Google, in my experience, very strongly prioritizes now/recent content.


At least Google still allows you to set a specific time window for your search results. Given their strong recency bias, this is often the only way to find older resources.
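That time window can also be encoded directly in the search URL; a sketch, assuming Google's long-standing but undocumented tbs parameter (which could change without notice):

```python
from urllib.parse import quote_plus

def dated_search_url(query: str, start: str, end: str) -> str:
    """Google search URL restricted to a custom date range.
    Dates are in M/D/YYYY form, carried in the tbs parameter."""
    tbs = f"cdr:1,cd_min:{start},cd_max:{end}"
    return ("https://www.google.com/search?q=" + quote_plus(query)
            + "&tbs=" + quote_plus(tbs))
```

Handy for bookmarking a "pre-2010 web only" search without clicking through the Tools menu each time.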


I'm glad the date range tool still exists, I use it now and then and it's generally good. What's bad is that it's not available on the mobile site. But what's ugly is how unreliable and cluttered the results are. For example:

"Dec 21, 2001 — House Democrats plan to vote Wednesday to impeach President Trump for his role in inciting the deadly Capitol attack as President-elect Joe Biden prepares ..."

https://www.google.com/search?q=president+Trump&biw=980&bih=... (be sure to switch to desktop mode if on mobile)


Google has a pretty big recency bias. From a content-producer perspective it makes sense: if you put up new content, you want to see traffic to that content. From a consumer perspective it's questionable at best. Given the Lindy effect, odds are the quality of old content is higher than average.

I also do kinda think we should be thinking more about what legacy we leave than we presently do. HTML has some serious problems with that regard, especially in terms of link rot, and especially now that we treat it as a way to build platforms. Archive.org is great and all, but is it enough? How will SPAs fare when the backend server is down in 30 years? How much value will be lost?


> HTML has some serious problems with that regard

I think the problem is not with HTML but with HTTP as a location-addressed protocol. For future-proofing and DDoS/censorship mitigation, content-addressed storage (DAT/IPFS/Torrent) makes sense. I would love to seed my favorite blogs if I were given the possibility to do so; in this sense, a web browser based on IPNS would be rad. Too bad the only one I know of is bundled with JS (instead of operating a paradigm shift) and produced by an adware company.

> How will SPAs fare when the backend server is down in 30 years? How much value will be lost?

You don't even need to wait 30 years for SPAs to be broken. By the next API update they may stop working, and subtle changes in browser sandboxing could kill them just as quick.


> For future-proofing, and DDOS/censorship mitigation, content-addressed storage (DAT/IPFS/Torrent) makes sense.

Well great, now you can't even change the content one tiny bit (not even definitively benign changes like fixing a typo or a bug in your CSS or whatever) without invalidating all links pointing to it.

Okay, so as long as anybody still has that version cached and is seeding it, at least the link still works (though over time that number might drop to zero and the link break after all). But now everybody who comes in gets the old version and might not even know that an update exists.

While I acknowledge that the silent updates possible with HTTP for an URL's contents can be a curse as much as they are a blessing, I'm not sure if "absolutely no updates ever" are the right answer, either.


Very good point, though you're missing a layer of indirection in your thinking. I should have made my point more explicit!

Usually you would subscribe to a cryptographic pointer (the publisher's public key) whose value is stored in a DHT. Then the pointer can be updated to point to new revisions of the content/website. IPNS is a famous implementation of that for the IPFS protocol. So as a client you can seed a specific revision of the content, or all of them, or just seed the latest version.
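That indirection can be sketched with a toy in-memory store (illustrative only; real IPFS/IPNS uses multihash CIDs and signed records in a DHT, not plain dicts):

```python
import hashlib

store: dict[str, bytes] = {}   # immutable, content-addressed blobs
names: dict[str, str] = {}     # mutable pointers, IPNS-style

def put(data: bytes) -> str:
    """Store a blob under the hash of its own bytes: same content,
    same address; any change to the content changes the address."""
    address = hashlib.sha256(data).hexdigest()
    store[address] = data
    return address

def publish(name: str, data: bytes) -> None:
    """Repoint a stable name at a new immutable revision."""
    names[name] = put(data)

def resolve(name: str) -> bytes:
    """Follow the pointer to the latest revision."""
    return store[names[name]]
```

Old revisions stay reachable by their content address, while subscribers of the name always get the latest one.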


When I think about Google search, I think of a search engine, not a content producer. Probably they're trying to become the latter while starting to suck at the former.


Even more frustrating than this are the sites which automatically append the current year to their article titles.

"Best CMS frameworks (2022)", for example, and yet the content is out of date.


A particularly egregious example of that are the product review sites - "Best wireless headphones (2022)" and most of the products listed are no longer available.


It drives me batty. It's another instance of solving the problem by adding "Reddit" to the end of the search string. Of course, this means Reddit will shortly be swamped with AI spam bots driving crap to the top of forums. Presumably that's already an issue.


It's not swarmed with bots yet, but I've found reddit to be less reliable in the last year with quite a few suspicious comments popping up for various products.

Reddit is still better but I worry in another year that will no longer be the case.


I find a lot of those 'review sites' are just lightweight b.s. marketing sites designed to generate product referral fees by providing a convenient link to Amazon or wherever else, rather than a real attempt at providing a good review.


The past is always disappearing. I know of no data recording device that we could use to store anything for more than a million years except DNA (for fun check out the Arch Mission Foundation that attempts to use DNA for backups of human knowledge) [0] or 5D nanostructured glass [1].

This post reminded me of a great Kurzgesagt video [2] that went briefly into how much of the past life on earth we have no information on and will never be able to know. Incidentally it took me a few seconds to find that video. Before the internet if I was trying to lookup a clip I had seen a month ago on TV I don't know even where I would have begun searching...

However I think we are getting increasingly better at preserving information and making it easy to access with tools like the internet archive, and cloud backups for your photos. This is despite the sheer quantity of data (such as the number of photos you take) growing at an exponential rate. Would you have been able to easily find instructions for a machine that was decades old before the internet?

So the past is disappearing but possibly at a decreasing rate.

0: https://en.wikipedia.org/wiki/Arch_Mission_Foundation

1: https://en.wikipedia.org/wiki/5D_optical_data_storage

2: https://www.youtube.com/watch?v=xaQJbozY_Is


I take a lot of notes. The links I place in them often 404 just a few years later. I'm grateful for wayback machine. But it doesn't capture everything. Sometimes I'm just left with a link. And some links don't reveal any information about the source (e.g. a youtube link gives no info about what a removed video was). I've started adding a little more plain text to my notes to help protect against this (e.g. source: "3m20s into Bob's Rails tutorial on Webpacker") just in case it disappears I'll have something to latch on to.


I pay for pinboard.in for this reason.


Hadn't come across this before. At first glance, I thought the quotes were satirical, but no, they're very real:

> "You don't really know that the cool project you signed up for is in a skyscraper in Silicon Valley, or like me: one dude in his underpants somewhere who has five windows open to terminal servers"

- Economist Magazine [1]

[1] https://www.economist.com/babbage/2011/04/04/stick-a-pin-in-...


It is developed and run(?) by our own dear idlewords.


It's not you; the search engines have become weaponised.

The most recent example: Sue Gray, a top British civil servant. Until a few months ago, her career controversies were visible in search results when searching for her. Since it was announced she would carry out an investigation into parties at 10 Downing Street, it's become impossible to see her career controversies in search results.

Eli Pariser also highlighted changes going on with Google back in 2011, as you can see from this talk, but I think that was just the start. https://www.ted.com/talks/eli_pariser_beware_online_filter_b...

IMO, the search engines have now got a lot worse in what they show in search results, as in the Sue Gray example above.

Society is becoming like Fahrenheit 451 https://en.wikipedia.org/wiki/Fahrenheit_451


I think there's a recency bias for Google search results, that cuts both ways. I searched "Eric Clapton" and "Neil Young" and both pages have a mix of expected results (Wikipedia, social media, official website, Spotify) and news/tabloid/opinion pieces about their recent covid takes. I've certainly had searches in the past where I couldn't find the Wikipedia link because there was too much noise about a recent controversy.


I would like to highlight that there are currently other search engine options. We can choose to use DuckDuckGo or even Bing, neither of which I have ever blatantly caught hiding things.


Is the instance of Sue Gray related at all to the strength of English libel law? When UK controversies get buried, that often seems to be the reason.


Sue Gray has done other reports, when I think of her I think of 50 shades of grey.


Yes, and even worse: I have a strong suspicion some websites (automatically) update the date in titles and other places.

Especially "best [product] in [year]" articles and lists; they somehow are always about the current year, even in early January, and even if they only cover outdated things.

The dirty tricks of SEO content seem to work quite well atm. It's probably pretty hard to determine whether something is relevant or not.


I am slowly starting to consider that on the internet NOTHING is relevant, and EVERYTHING is manipulated.


You are not wrong... Getting attention matters, it's lucrative


Some essays stand the test of time, though. Not everything is irrelevant, but I'd say 99% is. The bar for publishing is very very low.


I've definitely seen sites simply showing the current date at a place where you'd normally expect the article's original publishing date.


Also, YouTube videos sometimes don't have a posting date in their description, not sure why. And I hate when platforms use relative time like '1 year ago' instead of the posting date.


Could you share an example of a YouTube video that does not have an upload date?

I watch quite a lot of YouTube and very frequently look up the date that videos were uploaded. I've never found this information to be missing.


I stumbled across this recently and was puzzled.

If I find that link / another example I will share here. <placeholder>

But this definitely is a feature of YouTube where content creators have an option to hide the date. I googled and found the documentation for this feature.


https://support.google.com/youtube/thread/4006094/how-can-i-...:

> Q: How can I hide the date the video was published? (Posted 4/12/19)

> A: That is not an option that is available.*


Not sure if this is OP's case, but YouTube has the rare video with no metadata visible whatsoever (tested on the web player):

https://www.youtube.com/watch?v=Qg0_4EmHLCo


Woow, thanks very much for this piece of interestingness. What even is this?

Even ad rolls (if you dig out the video id) have some sort of interesting "noone will see this" title :)

And random apps on the Play Store also have video titles.

Adding the video to Watch Later by replaying the request using cURL seems to work, at which point I can add the video to a playlist to find it later (yay), and also click through to see the uploader: https://www.youtube.com/user/GameloftVideos

Except for the bit where it says "This channel is not available" using the YouTube design from a few years ago and displays absolutely nothing, the account is perfectly normal, nothing to see here.

Wow.

What even is this?? [Edit: just realized I forgot I said that above. I'm leaving it in :P]


This is very annoying. I write for an AI site, and have requested a change to the CMS, which currently replaces the publication date with the 'updated' date, leaving no visible evidence of the original date. If you open up the HTML, the original date is in there as a token, but the casual reader won't know, and Google is annoyingly unpredictable as to whether it will display the date in SERPs (it almost always knows the date, whether it shows it or not).

I have taken now to including 'Original publication date' as a codicil in my pieces.
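For what it's worth, those leftover tokens are often machine-readable. Here's a minimal sketch of digging them out with the standard library, assuming the CMS follows common conventions like the Open Graph `article:published_time` meta tag or HTML `<time datetime>` elements (which is by no means guaranteed; the exact attribute varies by CMS):

```python
from html.parser import HTMLParser

class DateHintParser(HTMLParser):
    """Collect machine-readable publication-date hints that pages often
    keep in markup even when no date is displayed: the Open Graph
    article:published_time meta tag and <time datetime="..."> elements."""
    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property") == "article:published_time":
            self.hints.append(("meta", a.get("content")))
        elif tag == "time" and "datetime" in a:
            self.hints.append(("time", a["datetime"]))

# Illustrative page fragment, not a real site:
html = """<html><head>
<meta property="article:published_time" content="2016-03-01T09:00:00Z">
</head><body><time datetime="2016-03-01">March 1</time></body></html>"""

p = DateHintParser()
p.feed(html)
print(p.hints)  # -> [('meta', '2016-03-01T09:00:00Z'), ('time', '2016-03-01')]
```

Schema.org `datePublished` in JSON-LD blobs is another common hiding place, but that needs a JSON parse rather than a tag scan.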


Have you tried resizing the window? I see this sometimes too, and figured out that it usually happens when the other field values are too long. It seems the date simply has the lowest priority among all the buttons and the view count, and gets hidden.


One way to date a post is to copy the URL into https://archive.org/.

But I've shared similar frustrations, yes.
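The Wayback Machine also has an "availability" JSON API that can automate this lookup. A minimal sketch follows; note that passing a very early timestamp to fish for the oldest snapshot is a heuristic (the API returns the *closest* snapshot to the timestamp), not a documented guarantee:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def wayback_lookup_url(page_url, timestamp="1996"):
    """Build an Internet Archive availability-API query. Passing an early
    timestamp asks for the snapshot closest to it, i.e. roughly the oldest."""
    return ("https://archive.org/wayback/available?"
            + urlencode({"url": page_url, "timestamp": timestamp}))

def earliest_snapshot_date(api_response):
    """Pull the snapshot timestamp (YYYYMMDDhhmmss) out of an availability
    response dict; returns None if nothing is archived."""
    closest = api_response.get("archived_snapshots", {}).get("closest")
    return closest.get("timestamp") if closest else None

# Live use would be:
#   with urlopen(wayback_lookup_url("http://example.com/post")) as r:
#       data = json.load(r)
# The response has this shape (values below are illustrative, not real):
sample = {"archived_snapshots": {"closest": {
    "available": True,
    "timestamp": "20090101000000",
    "status": "200",
    "url": "http://web.archive.org/web/20090101000000/http://example.com/"}}}

print(earliest_snapshot_date(sample))  # -> 20090101000000
```

The first-archived date is only an upper bound on the publication date, of course; the page could be much older than the Archive's first crawl of it.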


A plugin that, either automatically or on request, checks sites for the oldest known archive date would be interesting, though I worry it'd both encourage blocking IA bots and tax their resources.


I remember checking caching headers to learn about content date at some point.
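That would be the `Last-Modified` response header. A small sketch of the idea, with the caveat that servers and CDNs often report deploy or cache time there rather than when the content was actually written:

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen

def last_modified(url):
    """HEAD-request a URL and parse its Last-Modified header, if present.
    Caveat: servers and CDNs often report deploy or cache time here,
    not the date the content was authored."""
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=10) as resp:
        header = resp.headers.get("Last-Modified")
    return parsedate_to_datetime(header) if header else None

# The header format itself (an RFC 7231 HTTP-date) parses offline:
dt = parsedate_to_datetime("Wed, 21 Oct 2015 07:28:00 GMT")
print(dt.year, dt.month)  # -> 2015 10
```

For static sites served straight off a filesystem this can be a surprisingly good signal; for anything behind a CMS or CDN, treat it as weak evidence at best.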

It's not just you. There are also sites being gone and content getting lost. I think by now pretty much all text content should simply never disappear anymore, yet the Web Archive seems to be the only one doing something right (Google's cache seemed to lose its persistence at some point, but I may be imagining it; that's like two data points).

I wish the Web Archive would skip videos and instead fetch more obscure websites, but I'm guessing that telling what's spam and what isn't is not easy.

A shared, distributed history cache of visited websites could be nice, but in the short time I spent thinking about it I couldn't figure out a way to make it work that would convince me to install it myself.


I've noticed a bias for recency that a lot of people share, or take for granted (sometimes myself included). I was surprised to see that this idea, that things only get better with time, goes back to Aristotle.

https://plato.stanford.edu/entries/progress/


I think Google prioritizes newer content because AFAIK modifying the date of a page to make it look like it's been published more recently is a common SEO trick.


TikTok made this more popular. Every other social media platform had a date on the post or the video. TikTok had none. It still doesn't. That kept videos fresh over time.


Yes, I discovered this back when Reddit was just a few years old. Before then I was able to search Google for a specific Reddit thread and find it. Now there is no chance. I've been screenshotting and downloading things that I like for years now.


Traditionally, a date has been a negative SEO signal, in that Google prefers to surface "new" things rather than "older" things. As a result, web sites will pick one of three strategies: remove all date indicators from the content; show the date while the content is "new" but remove it after the content has been up for more than a set period of time; or use a JS function to always render yesterday's date as the date of the material.

Google isn't about "reference" data (who clicks on ads when they are looking for a history of stories about topic X?), so the archival and reference function falls to meta services like Wikipedia, where a human curates the history and provides links back to that history.

Of course such links get very few visitors and often the place hosting the content will simply retire it rather than spend a couple of nanocents on leaving it up, and the result is link rot.

Yes, I am cynical about how Google is now an agent for "destroying the world's information" when at one point they were simply trying to organize it.


I often use before:YYYY-MM-DD and after:YYYY-MM-DD keywords in google search, and some websites mess with that data so that their content looks like it was created today when it obviously wasn't.


I’ve started noticing this when using Google to search for Reddit threads. Threads posted years ago now appear to have been posted in the last few days due to Reddit hacking Google relevancy metrics.

Probably Reddit’s doing, but it’s made finding older topics impossible and I partially blame Google for letting companies abuse their service in this way.


I think this started back at the height of Web 2.0, when Google began favouring more recent (often copy-and-paste) blog content over older content, regardless of prestige or authority. If I'm not mistaken, age and/or recent updates have now officially been part of the algorithm for quite a while.


Yes, the old, decentralized web is disappearing and being replaced by centralized big tech. I remember when Google search came out and it was amazing. Now it restricts anything "not approved": personal websites, GeoCities, Angelfire, MySpace. Then blogs and Facebook appeared for people to put their content there. Then Reddit came and ate the forums. Reddit never used to show up in search results, but many helpful forums would; now it is the opposite. Then Google ate up the bloggers. The entire point is to direct content to these big-tech-approved sites. Try wiby.me to find some of the old sites. There is also a movement to decentralize: take your stuff off GitHub, off Reddit, off Facebook, and onto your own site. Form webrings like in the good old days.


It's not just you - it's always Year Zero [1] on Google SERPs now. When you're looking for information on a topic or historical event, you're not going to easily find classical sources, contemporary primary sources, or bloggers - it's always content from the last two years, from what Google considers authoritative / trusted sources. Searching has become a skill again; you have to use your existing knowledge to drill down to associated terms and also search books, archives, social media, and so on, up to asking literal people.

[1] https://en.wikipedia.org/wiki/Year_Zero_(political_notion)


It's to a huge degree Google giving crap results, and not so much actual pages disappearing. The other day I searched for "f.position.vsub is not a function"[1] and after the first 3 results it starts giving completely irrelevant results, like the definition of a function according to mathematics (https://en.wikipedia.org/wiki/Function_(mathematics) ). Just wild garbage.

[1] https://www.google.com/search?q=f.position.vsub+is+not+a+fun...


It also reminds me of something Thomas Piketty said: prices of things used to be mentioned prominently in novels, but nowadays that doesn't happen unless pricing itself is a salient part of the plot. The reason given was inflation. Once inflation took hold, there was no accuracy or authenticity to the prices of objects, and they slowly disappeared from novels, movies, etc.

The same thing happens with dates: with billions of events generated and captured every second, the actual date/time of an event gets demoted to the level of the thousands of other attributes captured. So it will only be mentioned when the date itself is the point of the article.


The past does seem to disappear in Google: https://news.ycombinator.com/item?id=23977375 https://news.ycombinator.com/item?id=19604135 Maybe people are evolving to erase dates as a countermeasure?

I sometimes wonder how much of this is just a ratchet of things like banning spam: on a long enough time horizon, the survival rate of everything goes to zero?


Related anecdote: the 2008 flash crash of United Airlines stock [0]. A six-year-old article about United Airlines' 2002 bankruptcy was republished from the Chicago Tribune in The South Florida Sun-Sentinel (apparently without the date), then got picked up by Google News, then by an investment firm, then ultimately by Bloomberg, leading to a flash crash from $12 to $3.

0: https://www.wired.com/2008/09/six-year-old-st/


I think this is partly a second-order consequence of the reasonable expectation that, due to the rate of innovation and general churn in technology, computer-related information goes stale quickly and hence more recent sources should be prioritised. Having a date stamp on the page then becomes a liability (I for one am guilty of adding a current year number to my search terms for some queries to try to get more accurate results).

Another reason might be that SEO got really good several years ago so older content just can't compete.


I've noticed that too. For some explicit or implicit reason, designers are making UIs without the date. Of course, this leaves readers unable to orient themselves in a temporal context. I wonder if they are mindlessly influenced by the transient nature of social media design. Things being "conversation" oriented can gain from peer interaction, but for highly technical content, an informal UX will prevent serious engineering from _flowing_ (emerging smoothly), or at least make it harder than needed.


I think this is the effect of various companies trying to capture as much of our attention as possible because it makes sense for their bottom line. We are all victims of recency bias, so it makes sense for companies to prioritize more recent content. If it wasn't the case, the various social media apps wouldn't be as addictive as they are.

I am not aware of a social media app feed that puts quality above recency (not counting the plugins that enable that, like Twemex [1]). Instead they keep us in what David Perell calls "Never-Ending Now" [2]. We endlessly consume temporary, short-lived content and we are mostly blind to the past.

Google search is not social media, but I wouldn't be surprised if Google ranked more recent content higher, given how they have changed the YouTube algorithm.

1: https://chrome.google.com/webstore/detail/twemex-sidebar-for...

2: https://perell.com/essay/never-ending-now/


Google has some neat tools: click the "Tools" menu and select "Any time"; you can enter a custom range there: https://imgur.com/a/XQSMT9n

Not that I ever needed to use it, but it's there.

Google does know an indirect date for a page even if it's not written explicitly. The first crawl date should be saved, and if no better indicator exists, I'd assume they use that timestamp.


I have a feeling that OP has used this feature and that it's part of their complaint.

I've been using it for many years by now, and have essentially given up on it recently.

I'd often set it to past year/month and get articles, reviews, etc. that were clearly dated many years previous. I don't know if the sites are responsible for gaming the search, or Google themselves, but the result is the same. I have some doubt that PC hardware review sites are gaming dates on their old articles, although I accept it could be in their interest to do so.


I'm surprised that you never used it?

This is pretty much the only reason why I still sometimes use Google - because DuckDuckGo only added this feature recently.


I'm in the minority of the HN crowd who has very few issues with Google search quality (although I do see some spam in my native language). My account is getting to be adolescent-aged, so it might have learned a thing or two about how to serve me best.


But this isn't about search quality - it's about finding some web page (otherwise obscured by pages with similar keywords) that you know could only have been created at a specific time.


Count your blessings.

My account is around 15 years old and has gotten worse with time.


Like anything else, the date that Google considers something to be published at can be, and is, gamed.


I've made the same observation. It's pretty annoying and an example of how the web is in some cases becoming less useful. For Google in particular, more and more often I find myself skipping the first wave of results because they aren't very helpful; they're just crafted to show up in search results. I guess now that so many of us are already in the Google ecosystem, there's less incentive to cater to users.


There is something worse than content without a date: content with a fake date, changed to make it appear relevant when it isn't.

There is nothing more frustrating than typing "XXXX 2022", clicking a link, and seeing "XXXX-2020" in the URL.

People legit don't change their article but update the title to stay on top of the SEO game. Usually found on generic searches that drive big traffic. I freaking hate that so much.


That’s especially common in blogspam where they programmatically update the year to the current year when compiling a stolen best-of list.

