Per Google, shortened links “won’t work after August 25 and we recommend transitioning to another URL shortener if you haven’t already.”
Am I missing something, or doesn’t this basically obviate the entire gesture of keeping some links active? If your shortened link is embedded in a document somewhere and can’t be updated, Google is about to break it, no?
About to break it if it didn't seem 'actively used' in late 2024, yes. But if your document was being frequently read and the link actively clicked, it'll (now) keep working.
But as I said in a sibling comment to yours, I don't see the point of the distinction. Why not just continue them all? Surely the mostly unused ones are even cheaper to serve.
This leaves me wondering what the point is. What could it possibly cost to keep redirecting existing shortlinks that they consider unused/low-activity already anyway?
(In addition to the higher-activity ones the parent's link says they'll now continue to redirect.)
In another submission someone speculated the reason might be the unending churn of the Google tech stack that just makes low-maintenance stuff impossible.
I built a URL shortener years ago for fun. I don't have the resources that Google has, but I just hacked it together in Erlang using Riak KV, and it horizontally scaled across at least three computers (I didn't have more at the time).
Unless I'm just super smart (I'm not), it's pretty easy to write a URL shortener as a key-value system, and pure key-value stuff is pretty easy to scale. I cannot imagine Google isn't doing something as efficient as, or more efficient than, what I did.
Google also has the advantage that they now only need a read-only key-value store, and they know the frequency distribution for lookups. This is now the kind of problem many programmers would be happy to spend a weekend optimizing to get an average lookup time down to tens of nanoseconds.
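To make the key-value point concrete, here's a minimal sketch of a read-only redirector using nothing but Python's standard library. The short codes, target URLs, and port are made up for illustration; a real deployment would load the table from durable storage and sit behind a load balancer, but the core of the problem really is just a map lookup:

    # Minimal read-only shortlink redirector: the whole problem is one dict lookup.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical mapping; in practice this would be loaded from storage at startup.
    SHORTLINKS = {
        "/abc123": "https://example.com/some/long/path",
        "/xyz789": "https://example.org/another/page",
    }

    class RedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            target = SHORTLINKS.get(self.path)
            if target is None:
                self.send_error(404, "Unknown short code")
                return
            self.send_response(301)              # permanent redirect
            self.send_header("Location", target)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), RedirectHandler).serve_forever()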
I don't think it would even cost me very much to host all these links on a GCP or AWS thing; probably not more than a couple hundred dollars a year.
Obviously raw server costs aren't the only costs associated with something like this; you'd still need to pay software people to keep it on life support. But considering how simple URL shorteners are to implement, I still don't think it would be that expensive.
ETA:
I should point out that even something kind of half-assed could be built with Cloud Functions and Bigtable really easily; it wouldn't win any contests for low latency, but it would be exceedingly simple code, have sufficient uptime guarantees, and be much less likely to piss off the community.
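Something along these lines, assuming the Python runtime, the Functions Framework, and the google-cloud-bigtable client; the project, instance, table, and column names here are placeholders I made up, not anything Google actually uses:

    import functions_framework
    from google.cloud import bigtable

    # Client created at module scope so it's reused across invocations.
    # "my-project", "shortlinks-instance", "shortlinks", and the "cf"/"url"
    # column are hypothetical names for the sake of the sketch.
    client = bigtable.Client(project="my-project")
    table = client.instance("shortlinks-instance").table("shortlinks")

    @functions_framework.http
    def redirect(request):
        # Treat the request path (minus the leading slash) as the short code.
        code = request.path.lstrip("/")
        row = table.read_row(code.encode("utf-8"))
        if row is None:
            return ("Unknown short link", 404)
        target = row.cells["cf"][b"url"][0].value.decode("utf-8")
        return ("", 301, {"Location": target})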
If I had any idea how to reach out to higher-ups at Google I would offer to contract and build it myself, but that's certainly not necessary; they have thousands of developers, most of whom could write this themselves in an afternoon.
I don't understand the data on ArchiveTeam's page, but it seems like they have 35 terabytes of data (286.56 TiB)? It's a lot larger than I'd have thought.
FYI, "TiB" means terabytes with a base of 1024, ie. the units you'd typically use for measuring memory rather than the units you'd typically see drive vendors using. The factor of 8 you divided by only applies to units based on bits rather than bytes, and those units use "b" rather than "B", and are only used for capacity measurements when talking about individual memory dies (though they're normal for talking about interconnect speeds).
Either way, we're talking about a dataset that fits easily in a 1U server with at most half of its SSD slots filled.
The binary units like GiB and TiB are technically supposed to be gibibytes and tebibytes. I thought it was a bit silly when they first popped up, but now I find them adorkably endearing, and a good way to disambiguate something that's often left vague at your expense.
In my experience, nobody actually says "Tebibytes" out loud; it's just that silly. In writing, when the precision is necessary, the abbreviation "TiB" does see some actual use.
If that's the unit, I do say it out loud, but yes - everyone gives me weird looks every time and just assumes I'm mispronouncing terabytes, yet nobody corrects me.