Because then you can access the archived destination if you already know the sho...

mdaniel · 2025-08-18T03:55:18 1755489318

You are aware of which thread you're discussing this in, right? The one where a bunch of like-minded souls enumerated all the address space in a few weeks?

The sibling link above that queries Wayback's warc index shows that at least the first several codes are only 6 alphanumeric characters wide, so it's no wonder ArchiveTeam got through them in reasonable time
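For a rough sense of scale, here is a minimal sketch of the keyspace arithmetic for 6-character alphanumeric codes. Whether the codes are case-sensitive is an assumption not settled in this thread, so both alphabets are shown; `keyspace_size` is a hypothetical helper name used only for illustration.

```python
import string


def keyspace_size(alphabet: str, length: int) -> int:
    """Number of distinct codes of exactly `length` characters over `alphabet`."""
    return len(alphabet) ** length


# Assumption: codes use either lowercase+digits (36 symbols)
# or mixed-case+digits (62 symbols). The thread only says "6 alnum wide".
LOWER_ALNUM = string.ascii_lowercase + string.digits
MIXED_ALNUM = string.ascii_letters + string.digits

lower_total = keyspace_size(LOWER_ALNUM, 6)
mixed_total = keyspace_size(MIXED_ALNUM, 6)

print(lower_total)  # 2176782336
print(mixed_total)  # 56800235584
```

Either way the space is a few billion to a few tens of billions of codes, which is small enough for a distributed volunteer effort to walk exhaustively.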

Picking one at random, it seems the super sekrit deets you're safeguarding include buyrussia21.co.kr which, yes, is for sure very, very secret

brokensegue · 2025-08-18T04:07:42 1755490062

i asked them why they did this. the answer surprisingly is because they fear if they release the full dumps they will get blocked because of the AI scraping wars.

cedws · 2025-08-18T04:59:56 1755493196

Feels like a bit of a kick in the teeth that I contributed towards archiving something that I don’t even get access to. What happens if they disappear? The dataset is gone forever.

qingcharles · 2025-08-22T01:21:01 1755825661

This does seem off. Especially as I can navigate to any of those URLs myself. Hell, if I wanted to spin up 50 virtual servers and go crazy I could probably pay a few thousand bucks to re-scrape the thing myself.

brokensegue · 2025-08-18T13:36:56 1755524216

You get access to it via the wayback machine

mdaniel · 2025-08-18T15:04:55 1755529495

This whole thread is starting to read like some kind of misguided practical joke. I also recognize that it may seem like this is directed toward you, but I'm not shooting the messenger; I'm just anchoring my reply under this new information. Sorry about that.

But, ok, let's continue in good faith

scenario 1: they don't want to uncork the .warc files because it will potentially leak the means and methods of the Archive Warrior or its usages

scenario 2: they don't want to expose the target of the redirects because it will feed the boundaries of the ravenous AI slurp machines

If it's scenario 1, then CSV exists and allows mapping from the 00aa11 codes to the "location:" header, no means and methods necessary

If it's scenario 2, then what the hell were they expecting to happen? Embargo the .warc until the AI hype blows over so their great-grandchildren can read about how the Internet was back in the day? I guess the real question is "archive for whom?" because right now, unless they have a back-channel way to feed the Wayback Machine's boundary using the .warc files, and thus secretly populate the Wayback without wholesale feeding the AI boundary, this whole thing is just mysterious
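As an illustration of the CSV idea in scenario 1, here is a minimal sketch that writes short-code to `Location` pairs with nothing else from the crawl. The function name, column names, and the sample row are all hypothetical; the code `00aa11` is just the placeholder used above, and the target URL is a stand-in, not real data from the archive.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_redirect_csv(pairs: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write (short code, Location header value) rows as CSV; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["code", "location"])  # hypothetical column names
    count = 0
    for code, location in pairs:
        writer.writerow([code, location])
        count += 1
    return count


# Illustrative input only: a placeholder code and a stand-in URL.
sample = [("00aa11", "https://example.com/")]

buf = io.StringIO()
n = write_redirect_csv(sample, buf)
print(buf.getvalue())
```

A file like this would carry only the mapping itself, with no request headers, timings, or other crawl details.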

brokensegue · 2025-08-18T17:52:03 1755539523

i think you're missing some key information. the warcs do not just contain the location header information. and their methods are fully public/open source so scenario 1 makes no sense.

sure maybe the warcs will be unlocked at some point in the future. this is a fairly small volunteer effort. i doubt there is some "unlock in 100 years" feature on IA.

nicolas_17 · 2025-08-18T19:52:57 1755546777

Yes exactly, Wayback Machine can use the warc files despite them being blocked for direct download.

globular-toast · 2025-08-18T06:57:25 1755500245

Who fears they will get blocked by whom?

brokensegue · 2025-08-18T13:38:15 1755524295

ArchiveTeam fears getting blocked by hosts wanting to protect their data from AI companies (presumably because they want to extract money from them)

yreg · 2025-08-19T11:20:10 1755602410

Yeah what they did is probably the best way to handle it.