For context: Amazon Cloud Drive is a Google Drive/Dropbox analog, except that it provides "unlimited" storage space and bandwidth for $5/month. The built-in user interface requires you to manually upload/download files through a web browser, but the service also supports an API for programmatic access. So tools like rclone and acd_cli were developed to let you do bulk transfers and/or mount your storage as a network filesystem, optionally with encryption (which defeats any deduplication Amazon might attempt).
Now Amazon is suffering the obvious consequence of offering unlimited storage: people are using it to store tens of terabytes of media and/or backups at very low cost. In an attempt to kill off heavy users, they shut down registration of new API keys several months ago, and now they're systematically revoking the API keys used by popular open-source tools.
Yeah, this is a hard one. There are users on /r/DataHoarder/ that claim to have uploaded literally 100s of (encrypted) TBs to Cloud Drive, which is plainly unsustainable and abusive.
On the other hand, they've also killed the product for a lot of more legitimate users as well. The Amazon web interface and apps for Cloud Drive are obnoxiously terrible, and Rclone really is just a better way to use it. I've been using it to sync 10s of GBs of photos between all my different computers, but with Rclone unavailable, I'll have to fall back to Google Drive, S3, or some other option (the unlimitedness of Cloud Drive was good peace of mind).
I'll be keeping an eye on it over the next few weeks to see whether shipping a binary with OAuth secrets was actually the reason for the ban, or just a pretext for getting the Rclone users off the service (personally, I suspect the latter).
I just use it to upload daily encrypted backups of my mail-server (< 500MB per month)... so I wouldn't mind if they set some reasonable limit to encrypted uploads (say 1-10TB).
I feel like anyone who's actually uploading personal content and who isn't uploading media files that are amenable to deduplication would be comfortable with some threshold as well.
Since encrypted data is indistinguishable from random noise, I think poor compressibility is actually zero compressibility, isn't it?
There are tools to search for TrueCrypt / other encrypted partitions on disks, so it's a solved problem to detect encrypted data.
It would be unfortunate if services ban the ability to upload encrypted secrets, though. On the other hand, that'd be good for Tarsnap. I wonder how much it'd cost to store 10TB on it?
The first billion digits of Pi might look pretty random, but there is a short program which generates them - which can be considered a compressed form. In general, it is impossible to decide how well a given string can be compressed.
https://en.wikipedia.org/wiki/Kolmogorov_complexity
> Since encrypted data is indistinguishable from random noise, I think poor compressibility is actually zero compressibility, isn't it?
I don't think so. Random noise can take any form - even that of a string composed entirely of zeros, which would be trivially compressed. It's just very unlikely that it'll actually be compressible.
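A quick sketch of the point being made here (using `zlib` as one representative compressor; any general-purpose compressor behaves similarly): a run of zeros collapses to almost nothing, while random bytes typically come out slightly *larger* after compression due to framing overhead.

```python
import os
import zlib

SIZE = 1024 * 1024  # one megabyte

# A megabyte of zeros: a random source *could* emit this, and it's trivially compressible.
zeros = bytes(SIZE)

# A megabyte of (pseudo)random bytes: almost certainly incompressible.
noise = os.urandom(SIZE)

print(len(zlib.compress(zeros)))  # a few KB at most
print(len(zlib.compress(noise)))  # at least SIZE - deflate falls back to stored blocks
```

So "random" doesn't *guarantee* incompressibility; it just makes a compressible output astronomically unlikely.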
Randomness is a property of the generation process, not of the output. You have to actually generate random numbers if you want a random number that meets some criteria. And even with impossibly vast resources, you will never find a random megabyte that compresses well.
Well, sure, that's what I wrote: "it's very unlikely that it'll actually be compressible." But it's incorrect to claim that random numbers are by definition non-compressible.
It was in the context of encryption. Even a 64-byte random string is very unlikely to be compressible, and 64 bytes is about as small as an encrypted partition will ever be.
It'd take longer than the universe has left to find a string of N zeroes by generating random numbers, for sufficiently large N. And N is surprisingly small.
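To put numbers on "surprisingly small" (my own back-of-the-envelope, not from the thread - the tries-per-second rate is an arbitrary assumption): the chance that N uniformly random bytes are all zero is 256^-N, so even N = 16 is already hopeless.

```python
n = 16                      # just 16 bytes of zeros
p = 256.0 ** -n             # probability of drawing them at random: ~2.9e-39

draws_per_sec = 1e9         # assume a generous billion random draws per second
expected_secs = (1 / p) / draws_per_sec
universe_age_secs = 4.35e17 # ~13.8 billion years in seconds

print(expected_secs / universe_age_secs)  # on the order of 10**12 universe lifetimes
```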
Yes zero. As zero as zero can possibly be, measuring with the most precise instruments possible. There's an infinitely better chance of both of us being struck by lightning and imagining you found such a number randomly.
It wouldn't be zero in certain math worlds. It is zero in the real universe.
The nice thing about Rclone is that it already supports Google Drive (and a host of other providers) so the technology isn't the problem.
I'm lamenting Amazon Cloud Drive in particular because it was the best deal. $60 a year for unlimited storage and with no caveats (or so it seemed before last week).
For those who don't recognize it, that's from the movie "Real Genius". For those who don't quite remember it that way, it appears to be from an early draft of the script. By the time the movie was actually made, McDonald's was changed to Frito-Lay, Hopsfield was changed to Hollyfeld, and the above dialog was altered slightly.
That scene is based on a real life incident which in fact involved a McDonald's sweepstakes and Caltech students entering over a million times: [1]
Since everyone knew that the school in the movie, Pacific Tech, was meant to be a thinly disguised Caltech (it only became Pacific Tech when Caltech objected), and McDonald's had not been happy with the Caltech sweepstakes prank and probably would not want it brought up, my guess is that one or both of McDonald's and Caltech asked for the change.
Changing it to Frito-Lay is an interesting choice, because six years before the McDonald's sweepstakes, a group of Caltech students tried mass entry on a Frito-Lay sweepstakes, but apparently were not as successful.
Crap. I knew something seemed off about it. Thank you!
====
Lazlo: No. These are entries into the Frito-Lay Sweepstakes. "No purchase necessary, enter as often as you want" - so I am.
Chris: That's great! How many times?
Lazlo: Well, this batch makes it one million six hundred and fifty thousand. I should win thirty-two point six percent of the prizes, including the car.
Chris: That kind of takes the fun out of it, doesn't it?
Lazlo: They set up the rules, and lately I've come to realize that I have certain materialistic needs.
abuse: use (something) to bad effect or for a bad purpose; misuse.
Abuse doesn't mean breaking the rules. It's like going to an all-you-can-eat buffet and staying for a week. Or taking a job with unlimited vacation time and coming to work once a month.
I bet the ToS spells out limits, or barring that, includes a clause that grants Amazon the right to shut you down if they unilaterally decide you're "abusing" the system.
$5/month for unlimited access via API was really a steal. This would buy you only 217 GB via standard S3 storage, and then you'd have to pay an exorbitant $0.090 per GB for egress data transfer; you'd burn through your $5 after downloading 55 GB.
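The figures above check out against those S3 rates (~$0.023/GB-month for standard storage and $0.090/GB for egress - both taken as given from the comment):

```python
budget = 5.00               # monthly spend, USD
s3_storage_per_gb = 0.023   # S3 Standard storage, USD per GB-month
s3_egress_per_gb = 0.090    # data transfer out, USD per GB

print(int(budget / s3_storage_per_gb))  # 217 GB stored for the month
print(int(budget / s3_egress_per_gb))   # 55 GB downloaded before the $5 is gone
```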
I'm a bit skeptical about how much deduplication saves them. I have a couple hundred gigs on ACD, unencrypted, most being family photos/videos - I don't think they're going to find any portion of those that's remotely common to other users - at least, I hope not!
If it's file-level deduplication, sure. But if it's chunk-level, I think you may find that you actually have many chunks identical to many other users' chunks.
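A minimal sketch of what chunk-level dedup looks like (fixed-size 4 KB chunks and SHA-256 content keys chosen here for simplicity; real backup systems often use content-defined chunk boundaries instead, so this is illustrative only):

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks; real systems often use content-defined boundaries

def store(data: bytes, pool: dict) -> list:
    """Split data into chunks, keep each under its content hash, return the recipe."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()
        pool.setdefault(key, chunk)  # identical chunks stored once, across all users
        recipe.append(key)
    return recipe

pool = {}
a = store(b"\x00" * 20000, pool)                # "user A": 5 chunks, only 2 unique
b = store(b"\x00" * 8192 + b"hi" * 100, pool)   # "user B" shares A's zero chunks
print(len(pool))  # → 3 unique chunks actually stored, not 8
```

Encrypting before upload makes every user's chunks unique, which is exactly why it defeats this.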
> In an attempt to kill off heavy users, they shut down registration of new API keys several months ago, and now they're systematically revoking the API keys used by popular open-source tools.
Actually this seems to have been more triggered by a serious issue with acd_cli's authentication server that resulted in its users seeing other people's files:
I think Dropbox will shut down their API in July of this year. I got an email warning me about six months ago - since then I moved our internal system to FTP...