How to become a pirate archivist (annas-blog.org)
579 points by pilimi_anna on Oct 17, 2022 | 99 comments



I'm curious how Sci-hub's approach compares to the What.cd/Redacted approach.

IIUC Sci-hub has scooped up science docs through a good enough UX that it was able to leverage the goodwill of science folks to upload docs (plus whatever other methods it has used to scoop up docs), and it uses a public blitzkrieg-style distribution mechanism. I.e., I guess if one had a big enough harddrive and a fast enough internet connection, one could start downloading the lib right now and see if they win the race against the copyright holders.

On the other hand, the What.cd/Redacted approach seems to use Bittorrent ratios to create a private-tracker economy. New users get a few gigs free download on joining. But apparently because a) there's a 1:1 upload/download ratio, and b) a few first-mover fat cats are sitting on enormous ratios, this means there is a scramble by everyone else to upload new FLACs to build up their ratio so they can continue to be able to download FLACs. It seems that would mean the library-in-its-entirety cannot be easily replicated at will. Yet the tracker was apparently already nuked off the internet as What.cd and reappeared later as Redacted. Was any data lost between the two services?

Oh yeah, there's also apparently another approach in rutracker, which seems to be blitzkrieg to add content and publish, at the (apparent?) cost of quality of content.

It's really a shame that the nerdy, completist domain of digital archiving through torrents isn't covered by fair use. Perhaps we could exclude the most recent 10 years of music so that the hopeful young musician streamers can get paid a few hundred dollars for millions of streams and then receive the silver lining of fair use protection against a label refusing to release one of their albums.


Sci-Hub was interesting because until 2021, it could automatically add any article not in its database by querying proxy servers set up at universities that actually subscribed to the journals, which would then download the PDF from the journal’s website and forward it to Sci-Hub. (This was the approach most academics took in the pre-Sci-Hub days; they’d email friends at other universities and ask them if they had access to a given article.)

Sadly, Sci-Hub took down this “magic proxy” to try and win a court case in India, which its creator thinks might legitimize it elsewhere. It’s a huge shame, because it means that many obscure articles are now inaccessible via Sci-Hub.

Ironically, a pure proxy-based Sci-Hub that didn’t host any articles on its own might actually be legal in certain countries, since it’s not actually hosting any copyrighted content itself. It definitely would be a lot easier to host and a lot harder to shut down; indeed, it could be completely decentralized.


The Pirate Bay has never hosted any content; yet many, many cases have been decided against them. Why would that work for sci-hub?


I didn’t say that all countries would deem Sci-Hub legal, just that some might. Indeed, The Pirate Bay is perfectly legal in several countries.


Courts can make dumb decisions, either by bias or bad arguments. Proper lawyer-ing can cover for bad decisions some of the time, and perhaps losing some case and winning others is more important.

realistically everyone wants to know where they stand. There's certain "arranged" understandings when it comes to science (and all academic) publishing, and navigating that to provide a uniform interface to the actual documents is the "hard part" - although realistically, a file is a file, and distribution should be "easy".

recently i needed to check what domains i had purchased through various domain name providers (namecheap, godaddy), and i nearly lost access to my godaddy account because of the TOTP/verification stages. I consider cellphone numbers ephemeral, as i consider all phone numbers. I've been told that a dozen numbers in twice as many years is too many, but i refuse to get led around by the nose by cellular providers. So TOTP only works if i have access to the - few - phone numbers i had access to when i signed up. Several services (including some mentioned) used to have voice TOTP, but no longer do, and use an SMS bridge. Some want me to have an app on a phone *with a(n) (e)SIM*. my email service is attached to a domain name that i very nearly lost access to management for. I was complaining about an old gmail address i had and i decided to look at my overall domain health and found it lacking.

someone who is - forgive this - "balls deep" in a degree just wants whatever they publish to work for their advisors and graders. Obviously research assistants and research students in a masters or doctorate program are held to a slightly higher standard of "publish or perish" because of the way grants are written and assigned. It's all garbage, and has been for at least 15 years.

sorry for my meandering, but i felt these two annoyances are related.


ohhhh, so that's why, when my people got their PhD and Master's degrees from universities, sci-hub suddenly stopped working correctly. I put out the word that it was possible to get any study ever and people asked me this year for a bunch of studies that, prior to this year, should have been the definition of ease.

Months later, I'm still waiting on sci-hub or anyone to get access to the studies.

The real WTF is science publishing. Never mind reproducibility of the studies, just getting the study in the first place is a predicament. At least genesis still works for 85% of requests i get.


Redacted's ratio economy is fucked. There's groups of people using high speed seedboxes that grab every new upload to build up ratio in a normal way, if you can call it that.

Everyone else is left with scraps, trickling data out to whoever comes later. Or maybe you get lucky and you're the lone seeder, you get a 1:1 copy. You just better hope that was on a 24-Bit FLAC release, given how big they can get.

There's many many threads on https://old.reddit.com/r/trackers about RED's economy problems. OPS has a bonus point system and suffers less, but has worse content than RED. My opinion only though

rutracker is all about putting content out there. A lot of stuff is mislabelled or lower quality, less retention, identified wrong, seeded slowly. But you can sure find a lot of oddities there.

> was any data lost

yes, absolutely. I've hundreds of albums not on RED, and I can't be bothered to reupload them. it's a total waste of time when you know they'd be unseeded in a week if not for me hanging on.

time better spent finding new music and sharing everything on soulseek instead.


Interesting. I haven't used Redacted, but was an avid user of Oink's Pink Palace, the OG music tracker. Ratio requirements were partially there to encourage people to upload stuff. The site was defined by the number (and quality) of uploads it had. If you upload a 24-bit FLAC yourself, you are guaranteed at least its file size added to your ratio. Seedboxes were pretty much accepted as a requirement if you didn't want to upload anything, i.e. each user was expected to contribute in some way.

That said, things are probably different nowadays. In Oink's day everyone was on ADSL at home. Was paying €20 a month for a 100/100 OVH box; today I have 1000/100 at home and can instead spend that €20 on Spotify and Bandcamp.


What saves the economy is the generous freeleech and gift token events. (Currently one running if you haven’t logged in this week.)

It all serves the purpose of getting users to be good citizens participating in the community, and not just snatch and run.


In theory. In practice, most just can't be bothered using private trackers because of this. Hard to get into, hard to keep up - you have to be a dedicated archivist just to use them. Which is the polar opposite of why people have been torrenting in the past - piracy had less friction than using licensed content.

Rutracker works differently, they aren't afraid of snatch-and-run. They managed to build a community out of those dedicated archivists. So the volunteers keep the rare content afloat, and the tracker can also be open and usable by ordinary people as well (which keeps popular content afloat). This approach also has its flaws, of course.


> generous freeleech and gift token events

That's what saved the what.cd economy, which had generous freeleech and gift token events. RED does it differently, and has stingy and ineffective freeleech and gift token events. On wcd, staff picks were freeleech, and that ensured people were able to download and seed, and there was free liquidity.

RED's lack of large freeleech events has meant that most people on the site don't have much buffer, and can't just download what they like. It's functionally a forced recession due to lack of liquidity.

Basically an internet version of this: https://yathartharora.substack.com/p/53-the-babysitting-rece...


> Yet the tracker was apparently already nuked off the internet as What.cd and reappeared later as Redacted.

You imply there is some continuity in the operation between these two trackers, but I don’t believe that’s the case. What.cd shut down. Subsequently, redacted (passtheheadphones) and apollo started, appealing to the same userbase. Neither of those trackers were privy to what.cd’s databases.


in the sets of what.cd and waffles.fm and Redacted, approximate the intersection of the sets.

I like music a lot. I have a lot of vinyl and weird CDs, too. However, i can't be assed to rip to whatever draconian style-guide some of these private trackers want. So it's a matter of being a member of several servers and finding something that either isn't listed or seeded and "filling" or creating a torrent with some other tracker's set of files.

this is rewarding the wrong behavior. If i remember some song i heard in 1996, i should just be able to get it. It would be nice if all of the people who were involved in the creation and publication of the song got rewarded, somehow, but that's just not how art works in capitalism. I say this as someone who has personally released 11 CDs and a further 6 CDs in collaboration, of music. I haven't been paid a penny or more for anything i've ever produced in "art". I don't consider this a downside. People who know me and know i write music appreciate my music. People who don't know me will miss out. That's all there is to it.


Unfortunately, a lot of data is currently yet to be restored. Some of that music might not surface again for decades to come.


> Was any data lost between the two services?

Yes, absolutely, a lot.

Even though some people have automated their setup very well and have been downloading (and uploading on the newer trackers) a lot, it's just a giant amount of content, it's unlikely for any one person to have it all, and coordination was very limited back then.

The birthday release of the torrent db of what.cd includes 2.6m torrents in 1.2m groups (aka individual releases), in total weighing in at 588TB (or 421TB if you discard mp3, but there was content that hadn't been available in FLAC). That's doable on an SWE salary today if you're dedicated, but what.cd was shut down in 2016, and you'd still need to deal with the ratio system during collection.
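To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It only uses the figures quoted above and assumes decimal terabytes; the variable names are just for illustration.

```python
# Figures quoted in the comment above (decimal units assumed).
TOTAL_TB = 588          # everything, including mp3
NON_MP3_TB = 421        # after discarding mp3
TORRENTS = 2_600_000
GROUPS = 1_200_000      # "groups" = individual releases

TB = 10**12
MB = 10**6

avg_torrent_mb = TOTAL_TB * TB / TORRENTS / MB
avg_group_mb = TOTAL_TB * TB / GROUPS / MB
mp3_share = (TOTAL_TB - NON_MP3_TB) / TOTAL_TB

print(f"average torrent size: {avg_torrent_mb:.0f} MB")
print(f"average size per release group: {avg_group_mb:.0f} MB")
print(f"mp3 share of total size: {mp3_share:.1%}")
```

So the average torrent is a couple of hundred megabytes, and mp3 accounts for a bit over a quarter of the total size.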


after wcd fell some private sftp and dc+ servers came up for about 100 of the top seeding and hoarding members who were all familiar with each other. people regularly shared new content and filled out missing music in their collections. i don't think it exists anymore but it might. i did not have access but knew several people that did who got me content to complete music sets

the combined amount of content on those servers was probably around 100TB but most likely more


Oh, certainly, with some coordination there's a lot you can achieve, and it would probably be much more than 100TB. A friend of mine had barely above 1TB seeding and iirc that was somewhere in the top 450-500, so top 100 probably had much more.

It's still a lot to store (and maintain) individually, so unless they're still going with that private exchange, each tracker end could mean someone not migrating to the next installment and their part of the library being lost. The Eye is doing a lot of public-ish work in that area (seen with very mixed feelings in the tracker scene, because apparently they're not just archivists but also jerks).


> Was any data lost between the two services?

Undoubtedly.


> What.cd/Redacted approach seems to use Bittorrent ratios to create a private-tracker economy. New users get a few gigs free download on joining.

Isn't that "bittorrent ratio" easy to cheat? I remember the good old times when I had to download some popular files just for ratio; then, at the beginning of the '10s, everybody started to cheat and some of the biggest trackers turned to permanent freeleech.


>Isn't that "bittorrent ratio" easy to cheat?

Yes[0] but you have to be careful else you might get caught

0: https://wiki.installgentoo.com/wiki/Ghostleech


Other users also report their upload and download. Ergo, if you say you uploaded 10gig, it would show up in other users' download. This is tracked and checked, and you'll be kicked if you try to play the system this way.
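A rough sketch of what such a cross-check could look like on the tracker side. This is not any real tracker's code: the function name, the report layout, and the 5% slack are all assumptions for illustration. The idea is simply that nobody can have uploaded more than everyone else in the swarm reports having downloaded.

```python
def flag_impossible_uploads(reports, slack=1.05):
    """Flag peers whose claimed upload exceeds what the rest of the swarm downloaded.

    reports: {peer_id: {"uploaded": bytes, "downloaded": bytes}} for ONE torrent.
    slack:   hypothetical tolerance for protocol overhead and stale reports.
    """
    total_downloaded = sum(r["downloaded"] for r in reports.values())
    flagged = []
    for peer_id, r in reports.items():
        # A peer can only have uploaded to the *other* peers in the swarm.
        downloaded_by_others = total_downloaded - r["downloaded"]
        if r["uploaded"] > downloaded_by_others * slack:
            flagged.append(peer_id)
    return sorted(flagged)


GIB = 2**30
swarm = {
    "alice": {"uploaded": 10 * GIB, "downloaded": 0},   # claims 10 GiB up
    "bob": {"uploaded": 0, "downloaded": 2 * GIB},      # only 2 GiB was ever downloaded
    "carol": {"uploaded": 2 * GIB, "downloaded": 0},
}
print(flag_impossible_uploads(swarm))
```

Here only "alice" is flagged, since the swarm as a whole reports just 2 GiB downloaded. A check like this cannot tell which of several plausible uploaders is lying, only that a claim is impossible.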


Bounty systems are a pretty common way to get insane upload ratios so that you can archive.


>insane upload ratios

not with most RED bounties you can fill. if you only sort by biggest bounties, you get albums that realistically can't be filled, or that would require serious money and effort to find: rare Asia-specific releases and stuff.

you can do specific requests if you have accounts for streaming platforms but nobody makes bounty requests for those


I think Alexandra Elbakyan actually did not want to be revealed as the librarian behind Sci-Hub, it was her poor opsec that led to her being identified.

Basically her servers were set up to emit detailed error messages from PHP, including full path of faulting source file, which was under directory /home/ringo-ring, which could be traced to a username she had online on an unrelated site, attached to her real name. Before this revelation, she was anonymous.
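To illustrate how little it takes, here is a minimal sketch that pulls home-directory usernames out of error text. The error message and the username "exampleuser" are made up for the example; scanning your own logs and error pages this way is one way to check what you are leaking. On PHP the usual mitigation is turning off display_errors in production.

```python
import re

# Matches /home/<name>/ and captures <name>.
HOME_DIR = re.compile(r"/home/([^/\s]+)/")


def leaked_usernames(error_text):
    """Return the sorted, de-duplicated usernames visible in /home/<name>/ paths."""
    return sorted(set(HOME_DIR.findall(error_text)))


# Hypothetical error output, not a real log line.
sample = (
    "Warning: include(config.php): failed to open stream "
    "in /home/exampleuser/www/index.php on line 12"
)
print(leaked_usernames(sample))
```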


> which was under directory /home/ringo-ring, which could be traced to a username

Ha, my home dir is always called "me" or "and". Try googling that.


This is deep infosec. Instead of security through obscurity, it's security through ubiquity.


People might be able to, now!


Don't worry, I'm a "me" too.


Smart. Here's how to rename on OSX (might break a lot of tooling, but whatever).

https://support.apple.com/en-us/HT201548


IIRC, on new installations of NeXTSTEP (based on 4.3BSD Unix), the username of the single installed desktop user was "me".


Electron star - seems oddly specific, though maybe not spelling it same way you do.


yeah that's a regrettable mistake. It was registered in the early days, OPSEC wasn't considered then. The Internet was supposed to connect people, not hunt them.


All good, OpSec is hard, which was the point I was trying to make. Great that you take it seriously, just be a little more humble. Everyone slips; it's only a matter of time, either in the past or in the future.


i like how people are trying to correlate your HN account with - ostensibly - wild /home/ directories. This is why i've recently moved off of this username, going forward. 20 years as genewitch has attached a lot of bad "OPSEC" to this username, and coupled with the fact that i have federal licenses means that people can just google my entire life story.

my last name, without any other information, has one tenth of the bits of information that "genewitch" does.


I’ve narrowed it down to the east coast of the US.


Did not know that detail. Will add to the post, thank you.


And from your addendum:

So, use random usernames on the computers you use for this stuff, in case you misconfigure something.

...or a username that is so common as to be meaningless, like "user", "Administrator", or even "root".
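For the random option, a tiny sketch using Python's secrets module. The function name, length, and alphabet are arbitrary choices for illustration; the first character is forced to be a letter because many systems reject usernames that start with a digit.

```python
import secrets
import string


def random_username(length=10):
    """Generate a throwaway username: one lowercase letter, then letters/digits."""
    alphabet = string.ascii_lowercase + string.digits
    first = secrets.choice(string.ascii_lowercase)
    rest = "".join(secrets.choice(alphabet) for _ in range(length - 1))
    return first + rest


print(random_username())
```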


The next step is to change your legal name to Zhang Wei or at least a John Smith.


差不多 (chabuduo / chàbuduō, roughly "close enough")


You're actually supposed to use a specific username for each process. That's what permissions are for.


"ec2-user"


You get a little obscurity with random though


I was able to find an archived page of hers using ringo-ring back in 2010, which I believe predates Sci-Hub’s launch in 2011; I was not able to find a reference to “home/ringo-ring” anywhere else, but given the prior information, it at the very least seems plausible.


As an active hoarder I think there is a problem with "6. Distribution: Packaging it up in torrents, announcing it somewhere, getting people to spread it.".

I miss a p2p application with torrent packaging and Kademlia like per-file advertising and discovery, where I could point it to my hefty NAS directory of random things and they could be wired to released torrents. This way we could make torrents live much longer, even partially complete ones. As a super extra option, the app could even notify me to load a DVD because somebody asked for a file which I indexed and advertised previously.

Over the years my program preferences changed and file locations changed: I have moved files around, taken them offline, burned them to DVDs, and deleted some parts of torrents just to keep the interesting stuff. Now these torrents are lost, or at least my seeding contribution is gone. But I almost never change the content of these files, so their checksums stay the same forever and they could still be discoverable.
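The checksum point can be sketched in a few lines: index files by a hash of their content rather than by path, so a file that was moved or renamed still maps to the same key. SHA-256 and the function names here are my own assumptions, not what any torrent client uses.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's content, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_index(root):
    """Map content digest -> list of paths under root that have that content."""
    index = {}
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            index.setdefault(file_digest(p), []).append(str(p))
    return index
```

Rebuilding the index after reorganising the directory yields the same digests with new paths, which is exactly the property a per-file discovery layer would need.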

Digital preservation needs a better distribution system.


The "v2" torrent file format allows most of this. Some clients have support already.

All you'd have to do is make a torrent of your whole hard drive and then seed that. You don't need to publish the torrent anywhere.

If anyone else in the world is downloading any other torrent that happens to contain a file you have, they will end up connecting to your machine to download it.


A v2 torrent allows clients to identify duplicate files across torrents via the "pieces root" entry in the metadata, so if they're downloading from torrent A and B, and each share file C, they can utilize peers from either swarm.
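A minimal sketch of that matching step, assuming the torrent metadata has already been bdecoded into plain Python dicts with string keys. The nested layout follows my reading of the v2 "file tree" (a file is a node whose empty-string key holds "length" and "pieces root"); the function names and the fake 32-byte roots are made up.

```python
def pieces_roots(file_tree, prefix=()):
    """Yield (path, pieces_root) for every file in a v2-style file tree."""
    for name, node in file_tree.items():
        if name == "":
            # Leaf entry: the properties of the file named by `prefix`.
            root = node.get("pieces root")  # absent for empty files
            if root is not None:
                yield "/".join(prefix), root
        else:
            yield from pieces_roots(node, prefix + (name,))


def shared_files(tree_a, tree_b):
    """Return (path_in_a, path_in_b) pairs for files with identical pieces roots."""
    roots_a = {root: path for path, root in pieces_roots(tree_a)}
    return sorted(
        (roots_a[root], path)
        for path, root in pieces_roots(tree_b)
        if root in roots_a
    )


# Fake roots standing in for real 32-byte merkle roots.
ROOT_C, ROOT_X = b"C" * 32, b"X" * 32
torrent_a = {"albumA": {"c.flac": {"": {"length": 5, "pieces root": ROOT_C}},
                        "x.flac": {"": {"length": 7, "pieces root": ROOT_X}}}}
torrent_b = {"c-renamed.flac": {"": {"length": 5, "pieces root": ROOT_C}}}
print(shared_files(torrent_a, torrent_b))
```

Note that this only works because both sets of metadata are already in hand.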

But there's no way for other clients to know that there exists another torrent containing the file they are interested in if they only have the metadata for torrent A. In other words, there's no lookup mechanism for a "pieces root" to know that torrent B exists and contains file C.

If you were to make a v2 torrent of your entire drive, other clients won't know to download from your torrent. They'd need to have the contents of the metadata to know it would contain a file they are interested in, and have no way of knowing which metadata contains the desired "pieces root" entries without downloading all of them.

I'm very interested in this problem space, if you are aware of clients/mechanisms that allow for this I would love to hear them.


I don't think you can solve this problem, without destroying the system by releasing too much data.

If I find a torrent of a file, and can look up all the seeders across all torrents for this file... then so can the FBI.


> there's no lookup mechanism for a "pieces root"

I guess they'd just have to publish each pieces root in the DHT. And obviously you would let the user decide which directories to share...

I guess that'll have to wait for torrent v3...


This.

Also, advertising every file in torrent (of unsuspecting user) to entire world would be a HUGE privacy flaw.


But you already advertise (via the DHT, which offers no secrecy) the torrent file... and that contains all the data needed to get all the files within the torrent.

So this data is already being published, just in a less queryable form.


Hm, is there a security concern here?

Someone could use this to

A) Remotely check the presence of any specific file on your machine.

B) Exfiltrate the contents of any file they know the hash of (or possibly more specifically, the hash of each piece? I don't know the protocol details).

Fine if you have a dedicated "I expect the contents to be public" drive or directory, but not something I'd want to do on my OS drive.


I have seen people obviously sharing their entire hard drive on P2P platforms, and it's usually been someone's C drive and a massive security issue. I don't think some clients do enough to make it clear to people that they need to be careful not to share more than they intend to or should.


On modern windows, ideally there are no sensitive files accessible to a user program, because everything should be in the secret store/protectedstorage.

Obviously a lot of programs still store cached session keys and passwords in plaintext somewhere in your user directory though...


There's a lot more to be concerned about than session keys and passwords. Tax documents, health records, photos, private journal documents, just to name a few.


besides that a user's Photos and Documents directories can be rather sensitive


and assuming nobody puts their passwords.txt on desktop and keeps unprotected wallet.dat around.


I did work on a proof of concept program to accomplish this for my own content library. It would scan a directory to find files and compare them with locally stored metadata. For v2 torrents this is trivial to do via a "pieces root" lookup, for v1 torrents it involves basically checking that each piece matches, and since pieces may not align with the file then it's not possible to guarantee that it's the same file without having all of the other files in the torrent.
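To make the v1 alignment problem concrete: v1 pieces are fixed-size slices of all files concatenated together, so a piece can straddle two files. The sketch below (hypothetical helper, not part of libtorrent) computes which piece indices lie entirely inside one file and can therefore be checked with that file alone.

```python
def verifiable_pieces(file_offset, file_length, piece_length, total_length):
    """Piece indices that lie entirely within one file of a v1 torrent.

    file_offset:  byte offset of the file within the concatenated payload
    total_length: total payload size (the final piece may be shorter)
    """
    first = -(-file_offset // piece_length)          # ceil division
    file_end = file_offset + file_length
    if file_end == total_length:
        # Last file in the torrent: it also owns the final, possibly short, piece.
        end = -(-file_end // piece_length)
    else:
        end = file_end // piece_length
    return range(first, max(first, end))


# Two files, 6 and 10 bytes, with 4-byte pieces: piece 1 straddles both files.
print(list(verifiable_pieces(0, 6, 4, 16)))
print(list(verifiable_pieces(6, 10, 4, 16)))
```

Any piece outside the returned range needs bytes from a neighbouring file, which is why a lone file can never be fully confirmed against v1 hashes unless it happens to start and end on piece boundaries.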

I built it with libtorrent and after loading in all of the torrents (multiple TBs of data), it would promptly and routinely crash. I couldn't find the cause of the error; it doesn't seem it was designed to run with thousands of torrents.

One problem that I've yet to build a solution for is finding the metadata to use for the lookup phase. I haven't been able to find a publicly available database of torrent metadata. If you have an info hash then itorrents.org will give you the metadata, if it exists. I started scraping metadata via DHT announcements, but it's not exactly fast, and each client would have to do this unless they can share the database of metadata between them (I have an idea on how to accomplish this via BEP 46).


I have a solution to this, it's the successor to Magnetico.


Could you please share a link to your solution? I would be interested to take a look


>One problem that I've yet to build a solution for is finding the metadata to use for the lookup phase.

I think BEP 51 followed by BEP 9 is all you need.
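For anyone who hasn't looked at it, the first half is just a KRPC query sent to DHT nodes over UDP. Below is a minimal sketch that only builds the bencoded "sample_infohashes" message, going by my reading of BEP 51. The tiny bencoder, the placeholder node ID and target, and the transaction ID are all for illustration; sending it, parsing the reply, and the BEP 9 metadata fetch are left out.

```python
def bencode(value):
    """Minimal bencoder for ints, bytes, str, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def sample_infohashes_query(node_id, target, transaction_id=b"aa"):
    """Build a KRPC 'sample_infohashes' query (20-byte node_id and target)."""
    return bencode({
        "t": transaction_id,
        "y": "q",
        "q": "sample_infohashes",
        "a": {"id": node_id, "target": target},
    })


msg = sample_infohashes_query(b"\x01" * 20, b"\x02" * 20)
print(msg)
```

Each info hash returned by such a query would then need a separate metadata exchange with a peer to get the actual file list.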


This is how I was originally achieving this. As I said, it's very slow. I don't think it would be a good solution on its own because it would require that every client be constantly sampling all DHT nodes, and downloading all metadata and indexing it for a potential future lookup. It's a huge amount of additional load on the DHT.

I think a better solution would be some way for clients to query the DHT for a specific "pieces root", but I don't know if all clients publishing "pieces root" for the torrents they know about would also be a good idea. Some kind of distributed metadata database where clients can query would be ideal.


Would you mind sharing the source? Sounds like something others could build on.


The source is available here: https://github.com/chhs1/content-seeder


But why? The current Kademlia implementation built in eMule/aMule does exactly this. Primary data advertised is file name string, there is also a support for several meta fields. Transfer proto sucks, but I don't think it really matters in the case of ultra-rare content.


As for that `ultra-rare content`: eMule is a mess if you need a collection of files. Of course you can share a ZIP, but they somehow get repackaged too often. Torrents group the stuff nicely, kind of like an album.

I've dreamed of a federated meta-client which mixes all available p2p networks in the wild and can download a missing torrent part from eMule or Soulseek.


> I miss a p2p application with torrent packaging and Kademlia like per-file advertising and discovery, where I could point it to my hefty NAS directory of random things and they could be wired to released torrents.

That was the problem with p2p: no one bothered to actually package their releases, instead just slinging them onto the internet with no care given as to whether they were complete or even correctly labeled.

Sites like pirate bay had everything. Torrents are a super fast and error-resistant way to distribute content; it's the perfect system. The problem was that the content is illegal, so it was nuked by companies with more money and power than anyone can fight against.

The problem has never been technical or about distribution; the issue is the law.


I'm not in the pirate archivist space, but sections 3 and 5 are relevant to my interests. I've had great luck with ZAP (https://github.com/zaproxy/zaproxy#readme) glued to a copy of Firefox (because it allows monkeying with the _browser_'s proxy without having to alter the system one as other browsers do) for archiving all content seen while surfing around a site. It even achieves the stated goal of preserving the HTML (etc) in a database since ZAP uses hsqldb

Then, section 5 reads like an advertisement for Scrapy since it is just stellar at following all pagination links and then either emitting the extracted payload as your own data structure and/or by telling Scrapy you want to download some media as-is. It will, by default, put the local content in a directory of your choice and hash the url to make the local filename. A separate json file serves as the "accounting" between the things it downloaded and their hashed on-disk filename
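The hashed-filename idea is easy to sketch without Scrapy. This is not Scrapy's code: the choice of SHA-1, keeping the original extension, and the manifest layout are my assumptions for illustration.

```python
import hashlib
import json
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_name(url):
    """Derive a stable on-disk filename from a URL: hash of the URL plus its extension."""
    ext = PurePosixPath(urlparse(url).path).suffix
    return hashlib.sha1(url.encode("utf-8")).hexdigest() + ext


def manifest(urls):
    """JSON 'accounting' that maps each source URL to its hashed local filename."""
    return json.dumps({u: local_name(u) for u in urls}, indent=2)


print(manifest(["https://example.org/papers/a.pdf", "https://example.org/covers/b.jpg"]))
```

Because the name depends only on the URL, re-running a crawl maps the same URL to the same file, which makes it easy to skip things that were already downloaded.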

Scrapy is also able to glue 3 and 5 together because it has a pluggable (everything, heh) dupe detection hook and also HTTP cache support that can be backed by anything, including the aforementioned hsqldb operating in network mode. Scrapy is also very test friendly, since each method accepts a well known python object and emits either a follow-on request, zero or more extracted objects, or nothing if pagination has ended. Thus, one need not rerun the whole scraping process just to test if a bug has been fixed, or during development

I can appreciate there may be other scraping frameworks, but of the ones I've tried Scrapy makes everything that I've asked it to do simple and transparent


This is extremely relevant to my interests. I will try ZAP with a VM that has the explicit purpose of mirroring all content i view "online" within that VM.

there's the web archival projects - that i cannot remember right now - that have some sort of proxy front end, but realistically, it should be possible to record the "content" portion of all web interactions, without relying on such dalliances as OCR and screengrabs or even OBS studio or a screen recorder.

Sometimes i go on a deep dive of some concept, and when i am done i feel i have a decent enough understanding to explain the concept to an adult, and sometimes i do a deep enough dive to explain to a 6 year old. I'd like to archive the entire "session" that got me there. Ideally as plaintext, but never have i wanted video documentation. I only ever use video to prove to someone that their service is acting up, since audio/visual desktop captures can do that, without cheating and provably.


«That secrecy, however, comes with a psychological cost. Most people love being recognized for the work that they do, and yet you cannot take any credit for this in real life.»

Feels like the anonymous torrent seeder who keeps seeding a file for years just for the sake of keeping it alive. It's not easy, but some people seem to be able to derive full pleasure from accomplishing the task itself, whether recognition happens or not.


It's probably not that uncommon that people working for companies, governments, or criminal organizations can't talk about their work in public.

One group I remember in particular are mathematicians working for the NSA, etc., who are not permitted to publish their research, then they watch as other mathematicians rediscover their work and get the credit.


NSA mathematicians still get some recognition though:

* Fellow NSA employees, your coworkers and boss

* Cash money, my personal favourite form of recognition

* Your close friends and trusted loved ones will know the broad outlines of what you're doing.

It doesn't make you world-famous but it wouldn't be as lonely as a job that needed total secrecy.


I believe the word we seek is humble. Hopefully the lack of wide, popular recognition leads to humble NSA employees. Obviously, the leaks smeared their reputations. The coverage focuses on the negative outliers. You seldom, if ever, hear about all the shit they were able to prevent.


Is this really a bad thing, though? I mean, personally, recognition seems important. I've withheld patenting certain inventions that became commercial products a half dozen times in my life.

That some specific instance of a discovery or whatever becomes the mainstream version is, well, it's irrelevant. Who discovered calculus? It doesn't actually matter because calculus works without some belief system and worship. Traffic routing algorithms? yes, if the person is alive and kicking, being able to lay claim to some algorithm or novel solution is a CV bullet point, but, and i say this with the utmost respect: most people are one hit wonders. If they can ride that "fame" to higher pay or respect, cool. But in the grand scheme, it's irrelevant. Ideas should be spread far and wide, so that people who have a greater understanding can explain the ideas to those without an understanding.

Capitalism is the problem.


One of my proudest achievements-that-dont-count-for-much-outside-of-internet-nerd-innercircles is maintaining a seed ratio of >7 over a decade and a half in a world where most folks are happy with their seed ratio of 0.1


I seed until 2.0 or i move the files to their proper locations, whichever comes last. I don't set time limits or anything. I have the unadulterated bandwidth to do this, even if it takes a year or more. My main seedbox actually is on the fritz, and i have a small feeling of guilt that if i am too lazy to fix the box that has the .torrent files, that dozens of people will miss out.

I hand-ripped and released the netflix wii disc once upon a time, and my seed ratio on that was astronomically high.


I think you can still get a reddit account without giving them even an email address, and definitely not a phone number.


Or you can use a fake email address like from mailinator.


I've found Mailinator to be pretty much useless nowadays, despite having been the go to place for, what, a decade? It limited its free domain to only mailinator.com, which is mostly blocked anyway.

There are plenty of alternatives out there though.


You should check out Mailinator's new free tier... You'll get a private domain, access to the API, and avoid the filters that come with the public mailinator.com domain.

Here's a link: https://bit.ly/3cs4v5E


Yes, you are right: you can get a reddit account without a phone number


>> That secrecy, however, comes with a psychological cost.

For someone concerned about OpSec, being acknowledged is a minor, if not completely unimportant, issue. The grind of maintaining OpSec is mind-numbing for most, in my experience, especially over an extended duration. One minor slip ends it all, and the risk of slipping increases with the significance of the related operations: more eyes increase the odds someone will notice something, they’ll be forced into unfamiliar situations, etc.

Beyond that, research shows that odds of being discovered grow as more people know:

https://www.bbc.com/news/science-environment-35411684.amp
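A toy way to see the effect, under the strong assumption that each person who knows leaks independently with the same small probability. This is not the model from the linked article, just the simplest version of the idea; the numbers are made up.

```python
def leak_probability(people, p_each):
    """Chance that at least one of `people` leaks, if each leaks independently with p_each."""
    return 1 - (1 - p_each) ** people


for n in (1, 10, 100):
    print(n, round(leak_probability(n, 0.01), 3))
```

Even with a 1% chance per person, the chance of at least one leak passes 60% once a hundred people are in on it.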


While reading this I realized that my first exposure to 'Pirate Archivists' was the bums in Fahrenheit 451 who memorize books so they can't be burned.

I never realized that was my first true introduction to piracy. Really enjoyed the write up!


200 years ago the Grimm brothers collected tales that had been memorized and shared for generations in the German oral tradition and made them into a book of fairy tales. 70 years ago Walt Disney made some animated movies based on these fairy tales. Today sharing a copy of Cinderella is Piracy.


> sharing a copy of Cinderella is Piracy

with Disney artwork.


Moreover, the Grimms' copyright has expired while Disney's copyright is set to never expire.


Disney being the artists whose labor and skill they labelled their own and claimed ownership of in perpetuity.

Sure, it's "Disney's" artwork.


I have a problem with Disney's

> claimed ownership of in perpetuity

but I don't have a problem with the labour and skill part: Disney paid for that labour and skill. The pay was awfully low and abusive, but that is just like every other corporation, especially in the USA.


If Disney gets to renegotiate its licensing rights with the government, the original authors of the work claimed by Disney should be allowed to, too. But that concept exhibits a "should" attribute of fairness, so obviously it's illogical and ridiculous.


I keep feeling we shouldn't accept the term "piracy" anymore. The problem, the big problem, is on the so-called "legal" side: the purpose of this system is not about remunerating authors anymore, it is about some big economic groups hoarding goods (and gaining power by doing that). But that's heavily against the common interest. Quite a few years ago I met with a member of my country's senate with a solid proposition to end the "piracy" problem. I got an email asking for more info about my proposal. That was the end of it.

PS. Maybe instead of "pirates", we should call ourselves "keepers".


The "DNA" of pirate-ish-ness has been baked into the US cultural zeitgeist since the 1860s. Freebooters, filibusters, the US excursions into Central America in the 1850s, and the overall Dutch influence on our language and culture still capture some part of us, whether we want them to or not. We're not averse to pirates, even though pirates are universally bad. Unless it's Jack Sparrow. I guess.


As the founder of emuparadise some 22 years ago, I can relate. I got into retrogames because I never got to play those games growing up in India. I thought, well let me archive these games and make them available for everyone else to play.

It was wildly successful. At its zenith EmuParadise was ranked 700 or so as per Alexa on the entire internet. We're talking millions of visitors per day and thousands of active users every single second. I ran it all by myself with an entire team of moderators, contributors, etc.

It did have ads. Heck, our server bills were in the range of tens of thousands of dollars a month. How could I pay for that without having ads on the site? Then we're in commercial copyright infringement territory. Basically if you get sued, you can go to prison, and you will be bankrupted for sure. At the time there were no torrents, no IPFS, no distributed hosting solutions in any case.

As time went by the stress became enormous. Of course threatening letters and DMCA takedown notices were the norm. And the fact that the site was hugely popular and government agencies such as the FBI could get involved at the behest of Nintendo et al just made it worse. But also keeping it online, through various CDNs, trying to keep it anonymously run at all times (my OpSec was terrible starting out, it started in the year 2000), keeping servers online and uptime to almost 100% and bandwidth flowing and hard drives spinning and RAID arrays working. It was a whole lot of everything all at once and I was just one guy doing it all.

After another website, LoveROMs, got sued by Nintendo in 2018 (for $12MM), I decided I had had enough. Reading stories like the kickasstorrents guy getting arrested while on holiday with his wife and kid, and LoveROMs getting sued, I decided that this was the end of the road for me. I pulled all the games from the site. Eighteen years of work down the drain.

My mental health had suffered tremendously, I was depressed and anxious almost all the time. The sight of a police officer on the street would set me panicking. The cost was too high.

Was it a blast? Oh yes it was. I used to receive thousands of emails from grateful people. Cancer patients who reminisced in their last days playing video games from their childhood, soldiers at war whose only escape was a few rounds of Bomberman (the irony is not lost on me), and so many more beautiful stories of nostalgia and connection.

But current copyright law is going to destroy all this art and culture. There is no real legal way to preserve it. And people like me may do it for a long while, but at what cost to ourselves? I firmly believe that a 7-10 year copyright (extendible even somehow? debatable) would be fair and would let authors get what they need out of their creations. It would help us preserve all this beautiful art and culture that we have enjoyed and share it with future generations.

I would love for a human kid living on a distant exoplanet in the far future to be able to play Chrono Trigger and wonder about the history of the earth and our stories.


I just want to say thank you for creating emuparadise. As a kid (~2005) searching for ROMs online, I remember finding a ton of ad-ridden fake sites, endless demands for "voting" on link aggregator sites, and malware downloads. Emuparadise was like a breath of fresh air compared to those sites, and it basically instantly became my favorite ROM site. While not perfect, it actually had all the games I was looking for, and the community actually seemed to care. I was able to play so many classic games that way that I otherwise never would have had access to. (Including Chrono Trigger :)

Emuparadise is also the site that introduced me to BitTorrent, and my very first torrent was downloaded from there. That would get me interested in file sharing. In some way, it's partly responsible for why I'm interested in archiving and links like the OP. I'm sure I'm not the only one.

So, thank you for creating such a wonderful library and community back then! It was a great part of my childhood and adolescence, and it showed me how important preservation and sharing can be.


Oh yes, I always tried to keep it friendly towards our users. There were too many sites out there that just kept you going in loops forever to get to whatever you wanted. Thanks for noticing that :)

Our bittorrent tracker was very short-lived. It did well and had some pretty good sets on it! But in 2010, when bittorrent was getting a really bad name right after The Pirate Bay case, it was easy to get torrent trackers shut down.

One day I got a downtime alert; I think it was mid-2010. I checked the site, gone. The server, unresponsive. I got in touch with the host. He said he'd check with the data center. After a while he got back to me and said: "German police came in and seized your servers." There had been no notice, no warning, no nothing. Just boom, and gone. I asked him: "How can they do this? What do we do?" He said: "Nothing, they just come and take whatever they want every now and then."

I hired a lawyer in Frankfurt to go and check on the case. He said that they had closed the case with no further process because the person in question was unknown. And he ended that email with: "But Nintendo may try something else".

Until that moment, I had no idea that Nintendo was behind the server seizure. I was relieved that the case was closed. Anyway, I still went ahead and resurrected the site sans bittorrent tracker. YOLO and all that.

For the next 8 years, we never really had much trouble except the usual DMCA takedown notice here and there and a threatening legal letter sometimes. But pressure kept piling up. I did consider myself a small and unimportant fish to fry (compared to, say, The Pirate Bay or even current-gen videogame piracy websites), but that didn't stop them from going after LoveROMs.

There was always the chance that one day they would just catch me at an airport or immigration (like the kickasstorrents dude) or something, and that would be it. Or the police would just knock at my door. I mean, they would have to know who I was provably, but I don't think it would be that hard for a government agency. It was just a matter of time before the powers that be would lobby the government to get at me.

I didn't want to live my life like that any more.


The creator of Emuparadise? Feels like meeting a celebrity! Certainly one of my childhood-defining websites. And no, I did not delete my ROMs after 48h :D

I'm glad you didn't suffer consequences from it. Thank you so much for your work!


Feels as though an organization such as this should have more domain appropriate points of contact than Twitter or Reddit.

A very interesting thing nonetheless.


I just dropped in here to praise archivists and their merit in general. I value the preservation of content (regardless of its perceived quality) much more than I worry about the legal or even ethical problems associated with it.

Anecdote: Remember when Microsoft Corp. declared that they love open source software and launched the CodePlex platform, and then lost their business interest in it (when they bought GitHub), so they completely erased the CodePlex archives? I was able to reach several long-forgotten projects I was interested in thanks to the invaluable work of independent volunteer archivists. (It was quite a tough manual job for me: I had to download the database, then locate the desired archive segment, and only then could I transfer the required files via the BitTorrent protocol.)


Having a local copy of an entire ebook archive is one way we can find information without having to use the Internet. Thus we can avoid being subjected to mass surveillance, which is excellent. I wonder if the archive is full text searchable?

Finally an alternative to this Orwellian nightmare we call the Internet. Can't wait to have a copy at home, and there will probably be times where I'll be pulling the plug on the router with relief. And it's one more step towards reducing my Internet usage, thus keeping the government and corporations out of my life.


This article has me curious as to most people's "op-sec" around personal piracy practices, e.g. torrenting. Do people take requests from family members? How restrictive are you with these behaviors, especially when backed by something like Plex (which presumably just directly erodes any other opsec you may be practicing)?


Would you be open to using an apostrophe ’ in your header instead of the straight single quote? I dig the Comic Sans, but it would look so much better. Cheers lol



