Amazon Drive removing unlimited storage plan (amazon.com)
352 points by some-guy on June 8, 2017 | 312 comments

It's frustrating to watch cloud companies subsidize their storage, in order to break into the market with a product that is too good to be true.

This strategy is the worst possible scenario for our entire industry. Users feel slighted, and the trustworthiness of the cloud in general is gradually eroded, as the scenario plays itself out over and over.

It feels like ransomware. Pay more or we'll delete your files.

When Dropbox for Business launched, it was $12.50 / user / month for "all the storage you need". Just recently Dropbox announced pricing changes, which will take effect in 2018. The new unlimited plan is $20.00 / user / month. And for those unwilling to pay, there's now a fixed storage tier, which is slightly cheaper than the original price, but it's capped at 1 TB.

Microsoft OneDrive included unlimited storage with any Office 365 subscription. After millions of users bought in, Microsoft dropped the maximum storage to 1 TB. Users were then given the choice of deleting their files, or moving elsewhere.

Mozy had an unlimited plan, and then dropped it and raised prices. SugarSync had an unlimited plan, but eventually dropped it.

Barracuda offered virtually unlimited cloud storage with their Copy.com service. A few years in and Barracuda shut down the entire service. Users were given very little notice, and had to move elsewhere.

Bitcasa, one of the original "unlimited" cloud storage providers (and a TechCrunch Disrupt Battlefield finalist) crashed and burned three years in. Again, users were given very little notice, and there was talk of a class-action lawsuit.

Time and time again we're seeing startups burn through their capital subsidizing the storage as some sort of brilliant marketing plan. It's not.

Disclaimer, I work at https://www.sync.com

The trouble here is, I think, "unlimited" tends to just mean "a lot" and for Average Joe that's fine. If you go to an all-you-can-eat buffet, you will eventually be unwelcome after you have 20 plates stacked on your table. Is that a business problem or a customer abuse problem?

The ugly fact is the Unlimited Amazon Drive has been abused by media pirates and data-hoarders to store up to 20TB (and sometimes more) at what any reasonable person would say is an unreasonable cost. Not to mention, much of this media is accessed and streamed often. Wander over to /r/PlexShares to get an idea: imagine if someone had 20TB of 4K and BluRay movies stored on your service @ a very generous $60/yr, streaming that data 24/7 to 10, 15, maybe 20 people via Plex (with each streamer paying the media manager $10 a month) all over the world. While torrenting all day. Suddenly that sounds more like abuse IMO.

Of course, I agree that you shouldn't call something "Unlimited" if it isn't. But it's not a one-sided issue and I thought I'd bring that up. I don't personally know anyone, in real life, who uses Amazon Drive. Most people don't even know it exists. The only time I see it discussed, especially the "unlimited" tier, is on /r/datahoarders and /r/seedboxes as an exploitable deal.

As far as I can tell, Prime Photos will remain unlimited. I wonder how long it'll be before media hoarders hide their content in Google/Prime Photos with a convenient CLI tool?

> The ugly fact is the Unlimited Amazon Drive has been abused by media pirates and data-hoarders to store up to 20TB

Somebody on reddit bragged about reaching more than 1PB [0]

> Of course, I agree that you shouldn't call something "Unlimited" if it isn't.

I am maybe nitpicking, but it really was unlimited. Nobody is being billed for exceeding a given threshold or stopped from using it. The service is being discontinued and people will not be able to renew it. They never said "forever" :-).

It's like I got a special deal from my gym for unlimited access. If next year they won't offer it anymore I cannot say "it was not unlimited".

> As far as I can tell, Prime Photos will remain unlimited. I wonder how long it'll be before media hoarders hide their content in Google/Prime Photos with a convenient CLI tool?

In another thread on reddit somebody was already talking about that, so I guess it won't be long.

[0] https://www.reddit.com/r/DataHoarder/comments/5s7q04/i_hit_a...

> It's like I got a special deal from my gym for unlimited access. If next year they won't offer it anymore I cannot say "it was not unlimited".

I don't think this analogy works. If you used a lot of space, or really any space over the amount that they change to as the upper limit, that data is now at risk. Some places will make it read only until you move it off (up to a certain amount). Most will simply delete it after a certain amount of time if you don't move it.

Your gym can't really take back an amount of your previous, unlimited usage of the gym. If you have legit uses where you're way over the limit this can pose a real issue. I ran into this when storing photos and videos on Microsoft's OneDrive after they lowered the crazy high amount they were given...I eventually decided to pay the higher fees to cover my data so I wouldn't lose it until I had more time to move it all off.

I no longer use OneDrive.

> It's like I got a special deal from my gym for unlimited access. If next year they won't offer it anymore I cannot say "it was not unlimited".

That is a very anti-consumer way of looking at it. Storage, especially for businesses, is not like a gym membership. These bait and switch tactics are harmful for the consumer as well as the industry itself.

You place a certain trust with data storage companies. A lot of media companies can easily have 30 terabytes of data to back up or share. They kill the unlimited plan with essentially a price hike. Now I am wondering "when is the next price hike coming"?

Amazon has already cut their CLI interface and their web interface is terrible. I would rather just keep it on a NAS with NextCloud.

> That is a very anti-consumer way of looking at it. Storage, especially for businesses, is not like a gym membership. These bait and switch tactics are harmful for the consumer as well as the industry itself.

The point of the comparison was to stress the fact that unlimited amount of space doesn't necessarily mean for an unlimited amount of time.

> Storage, especially for businesses, is not like a gym membership.

I thought that the unlimited plan was only for personal use. If it was open to businesses I understand why it became a money sink for Amazon so quickly.

I don't think you can call it bait and switch. That would imply that they offered you unlimited but only gave you a limited amount. They did allow unlimited storage; they have now decided to remove this product and offer something else in its place. The consumer can choose to stay or leave. It's difficult to leave, but it's the same kind of deal when your apartment's lease is not renewed. I can't really see this as anti-consumer.

I assumed bait and switch now encompassed this marketing tactic. If you know the exact term this tactic is called please let me know.

> The consumer can choose to stay or leave. It's difficult to leave, but it's the same kind of deal when your apartment's lease is not renewed. I can't really see this as anti-consumer

Your argument is simply: consumers deal with something like this for an unrelated industry so it is not anti-consumer. That is not a good argument.

edit: also, renting is a very poor example. There are laws that govern how much rent can be raised that vary based on jurisdiction. If renting wasn't anti-consumer why would such laws exist? No such laws exist against gouging for data storage, which undermines your argument.

> You place a certain trust with data storage companies.

I think the point is that now we know not to.

Noah has been ranting about this long before Amazon and Onedrive did this.

I don't have a whole lot of sympathy for the argument that "we meant a lot but UNLIMITED was a much better marketing term, so we said that."

You are missing the point. Amazon's service was truly unlimited

> Amazon's service was truly unlimited

It is not and never was. Check TOS

> 5.2 Suspension and Termination. Your rights under the Agreement will automatically terminate without notice if you fail to comply with its terms. We may terminate the Agreement or restrict, suspend, or terminate your use of the Services at our discretion without notice at any time, including if we determine that your use violates the Agreement, is improper, substantially exceeds or differs from normal use by other users, or otherwise involves fraud or misuse of the Services or harms our interests or those of another user of the Services. If your Service Plan is restricted, suspended, or terminated, you may be unable to access Your Files and you will not receive any refund of fees or any other compensation.


But did they ever kick anyone off? They didn't kick off people using hundreds of terabytes or more. And abuse language is there on every ToS. I'd say it truly was unlimited while they were selling it.

> But did they ever kick anyone off?

Sure they did. Saw plenty of people complaining about it on r/DataHoarders months before this big cut-off happened.

>it's not a one-sided issue

It's totally a one sided issue. Abuse and fraud are not specific to Amazon, they are attempted at every business and industry under the sun.

It's just not credible to believe they initially thought the offer was long term sustainable without any limitations, protective terminations for abuse, etc. This is no more complex than what it appears. A deliberate, and effective, marketing campaign that would eventually have to come to an end.

Maybe they could at least use some kind of old school caveat like "while supplies last", or "until the rate of user acquisition is strategically outweighed by the subsidies required to absorb our losses".

I think without abuse, for most users it was sustainable. If I had to guess, the average HDD size being sold on a budget PC today is closer to 250GB (I've been seeing plenty with 32GB EMMC drives).

That's 4 PCs' worth of content without going over the limit, and most non-technical users are probably using even less than that.

They could probably account for outliers using way more, but as long as most users were "average" users backing up word docs and family pictures they'd be fine.

The problem is "data hoarders" latched on much harder than average users. I wouldn't be surprised if most of those "average" users aren't affected by this change at all. I can't imagine my non-technical parents (for example) generating 1TB of data to back up very easily at all.

But they aren't "abusing" the system if the system offers "unlimited" bandwidth/storage/whatever. Companies are being very dishonest because none of them actually mean "unlimited" when they say "unlimited". They should have just said 1TB from the beginning. But they were trying to fool people into signing up because they used -- disingenuously -- the word "unlimited". So screw the companies. They shouldn't say things they don't mean.

My point is more, to the average consumer it really was "unlimited". When you can back up 4 lifetimes' worth of your PC with a service, for "practical" purposes it is unlimited.

Well what I'm saying is that companies should then offer 1TB of storage, not "unlimited" and then yank the rug out from underneath people that are actually using the "unlimited" storage they were advertised. The storage is not unlimited if any consumer can get to a point where they're using too much storage. The companies then call these people "abusive", but I personally don't think it makes it any less scummy of these companies if we can find some value of "unlimited" (like "practically unlimited") that sort of fits the story of them pulling out of their promise. It's the companies that are being abusive, not the consumer.

I guess we just don't agree on that.

To me unlimited is impossible right off the bat if you want to take it literally (there must be some finite limit to how much free storage Amazon has). Both sides in the agreement have some definition of unlimited that is less than unlimited, and to me hosting TBs of pirated content and porn goes past any reasonable definition of unlimited and turns into abuse. If only the minority of users with legitimate TBs of data of their own creation to back up had used it I doubt AWS would have had trouble profiting without storage limits. But with people abusing it (or using it as piracy storage or mass internet backups if you want to claim that's not abuse) I don't see why they shouldn't have put an end to the plan.

All-you-can-eat buffets work because each person has to pay separately and the amount any person can eat in one sitting has hard physical limits that are fairly low.

Unlimited in the computing world usually has practical limits many orders of magnitude above normal usage, making outliers much, much more expensive. But you can't substantially increase the price because you can't afford to lose your normal customers and be left with only outliers.

I have seen ACD accounts storing more than 1000 TB of data (mostly collected by scraping porn sites) for a mere $60 a year.

If you offer unlimited, expect people to take advantage.

Some data hoarders can be accommodated since the service only cares about average usage per user.

Also data deduplication works even for data hoarders if they are hoarding media commonly shared on the internet.

Given the above points, I think there is no basis to call it "abuse".
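To make the deduplication point concrete, here is a minimal sketch of content-addressed storage. It assumes whole-file SHA-256 hashing, and the `DedupStore` class and all names in it are hypothetical; nothing here is meant to describe how Amazon actually stores data.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self):
        self.blobs = {}  # sha256 hex digest -> file bytes
        self.files = {}  # (user, filename) -> sha256 hex digest

    def put(self, user, filename, data):
        digest = hashlib.sha256(data).hexdigest()
        # Keep the bytes only if this exact content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.files[(user, filename)] = digest
        return digest

    def logical_bytes(self):
        """Total size as the users see it."""
        return sum(len(self.blobs[d]) for d in self.files.values())

    def physical_bytes(self):
        """Total size the service actually has to keep."""
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
movie = b"same popular file" * 1000
store.put("alice", "movie.mkv", movie)
store.put("bob", "film.mkv", movie)           # same content, stored once
store.put("carol", "movie.enc", movie[::-1])  # stand-in for per-user encrypted bytes
print(store.logical_bytes(), store.physical_bytes())
```

The two identical uploads share one blob. The third upload, which stands in for a file encrypted before upload, has different bytes and so cannot be deduplicated.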

They are encrypting it to avoid detection of this TOS violation.

Are there any other alternatives to Amazon that still offer unlimited storage?

You forgot one more reason this is crappy behavior. It unfairly crowds out startups who want to go with a sustainable business model from the outset.

I realize "fairness" is not an excuse for inability to compete with the marketing of other companies. But if a company wants to test customer appetite for a service with pricing that is upfront honest, stable, and bound to only get better over time, there's never enough time to give it a proper shot.

This is just the technical equivalent of dumping or subsidizing, and has become fairly common in quite a number of industries to keep out competitors. The complete lack of regulation enforcement has been a problem for at least the last 40 years. To me it also seems to be a hallmark of the SV venture capital scene, where the majority of the capital isn't used to build a better product but to flood the market for a few years to gain a controlling share.

You should never trust unlimited anything. It's never really unlimited. They should really just offer an "as big as a reasonable person could want" package. For a household that wants to back up some phones and PCs and the family photo album, you can offer 5-10TB of storage and they'll never fill it up. But eventually you know that a handful of power users are going to abuse the system to the point that you have to stop.

I like how Google did their storage. They started off with something that was (at the time) an insane amount of storage for email, like 1GB. Then they slowly increased that over time as storage costs decreased. It was never unlimited, but it seemed unlimited compared to their competitors, who were often at 1/10th or even 1/100th the space.

Granted that's more difficult to do today as more players have embraced the concept. Google Photos (formerly Picasa) came out with unlimited storage for photos, and Amazon Cloud Drive is a response to that. ACD still offers unlimited photos, but they should never have tried offering unlimited "other files" storage. They have been selling cloud storage to technically adept people for years. I don't know why they thought they could offer a rather similar service, mark it as unlimited, and not expect people to use the cheap unlimited option over their per-use pricing plan.

But I say that they're not "abusing" the system as long as the company is offering "unlimited" service. They're actually not offering "unlimited" service, so it's the companies that are abusing the buyers, not the other way around. If companies offered like you said, a "large enough" package with some limit, instead of "unlimited", then I would absolutely agree that they're abusing the system.

I disagree on the effect this has on consumers. I am a paying customer of several cloud storage services, 2 of which you mention.

The effect on me of the temporary unlimited tiers is that I get to know my data usage requirements for a while before I have to choose a tier.

It's really a very welcome intro to cloud storage. I can easily see that even with truly unlimited storage I only used xxGB and therefore can choose the appropriate tier with confidence that my costs won't jump.

One surprise about these unlimited deals is that I'd rather pay Apple for photo storage than use a third-party service that is free (or included in Prime, e.g.). I tried multiple free photo storage services, but the convenience of Apple's offering far outweighs the cost saving.

If you want unlimited cloud storage the G Suite $10 / month plan is still unlimited. (Actually it says 2 PB in Google Photos when backing up original size photos & videos.)

"unlimited" is such a stupid word. TANSTAAFL. There should be an advertising law against the word "unlimited".

FWIW, backblaze still has unlimited storage. Of course it's cold storage, so abusing that is harder.

To be fair, this is not "pay more or we'll delete your files."

That would be a terrible way to treat customers. Customers will simply lose the ability to upload new files. All their existing content will still be there.

> You have a 180-day grace period to either delete content to bring your total content within the free quota, or to sign up for a paid storage plan. After 180 days in an over-quota status, content will be deleted (starting with the most recent uploads first) until your account is no longer over quota.

Looks like that's not the route Amazon took.

Yup. Totally agreed.

But duuuude. How much did you pay for that domain?! NOICE.

Back in 2013 we originally launched with the domain name "Sync.us" (we didn't have a choice, the .com was taken). Instant regret, as it made branding difficult, and our marketing efforts simply drove traffic to the many, more recognizable sync branded TLDs.

Then a few months later, one of our team members inadvertently visited Sync.com (a common Freudian domain name slip for all of us), and discovered the domain was up for auction. Instead of subsidizing our storage, like most of our competition was doing at the time, we diverted a substantial portion of our marketing budget towards winning the auction ;-)

It wasn't cheap.

I found out about OneDrive for Business which on some plans lets you bump up beyond 5TB by calling support. I wonder how that works.

If you're cynical you'd think "of course they'd offer a free version and then make you pay for it." But that's basically the cloud storage market in a nutshell. All the cloud storage providers are in a race to the bottom, and once they have your data they want to upsell you on other cloud technologies. If you just need to store data, that's actually great news, because these providers are working hard on the optimizations that make it possible to give you storage as cheap as they can.

Cheap options are Glacier at $0.004/GB, Backblaze at $0.005/GB, GCS Coldline at $0.007/GB. That's the monthly cost for bare cloud storage with no egress. Anything cheaper than about $50/year for 1TB will fall in one of two categories:

1. Subsidized by the provider, but probably not for long. This has happened so many times I can't even remember them all, Amazon is just the most recent.

2. Strictly DIY, probably with a much higher chance of data loss than you realize.
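As a rough check on the "$50/year for 1TB" figure, the per-GB monthly prices quoted above can be converted to a yearly cost per terabyte. This sketch assumes decimal terabytes (1 TB = 1000 GB) and counts storage only, with no egress or request fees; the function name is made up for illustration.

```python
# Monthly per-GB prices as quoted in the comment above (USD).
PRICES_PER_GB_MONTH = {
    "Glacier": 0.004,
    "Backblaze": 0.005,
    "GCS Coldline": 0.007,
}

GB_PER_TB = 1000  # assumption: decimal terabytes


def yearly_cost(price_per_gb_month, terabytes=1.0):
    """Storage-only cost per year in USD; ignores egress and request fees."""
    return price_per_gb_month * GB_PER_TB * terabytes * 12


for name, price in PRICES_PER_GB_MONTH.items():
    print(f"{name}: ${yearly_cost(price):.2f} per TB per year")
```

At the quoted prices, Glacier comes out at about $48 per TB per year, which is roughly the $50 floor mentioned above.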

Before you mention "Cheap options" and Amazon Glacier, read this: https://news.ycombinator.com/item?id=10921365

"How I ended up paying $150 for a single 60GB download from Amazon Glacier"

> If you’re retrieving more than 5% of your stuff, expect to pay a fee, “starting at $0.011 per gigabyte”.

The key words here are "starting at". As in, no cap on how much they can and will charge you for your data. Cause you know, 60GB is realllly expensive to transfer :(


What Glacier seems to be for, is more like compliance paperwork and other "We are required to save this for X years" kind of paperwork. Because most of those types of documents, you don't care about and only legally have to keep available. And if a court case comes along, the Glacier retrieval price can just be added on top of court fees (which in that case, Glacier is probably a pittance even at the $10k level).

There's a lot of articles about these things that seem to have the same summary: "I didn't read the documentation for how pricing works and then I was surprised by how pricing worked." Some people were shocked by Glacier's egress pricing and others were shocked by the cost per file. Both of these are approximately zero, most of the time, for most people, if they know what they're doing. If you don't know what you're doing and haven't read the docs you shouldn't be using Glacier.

> Cause you know, 60GB is realllly expensive to transfer :(

Network and disk IO capacity planning is hard. You can't just buy GBs of egress at $0.01 and then resell them at $0.01, because you're paying for maximum capacity and turning around and selling average usage. Similarly, when you have a bunch of data shoved into unused sections of disk, you can't just read them back out without affecting whatever else is reading from the same disk. If you want to sell something close to cost, you need to reflect the pricing structure of what you're paying to your customers.

So if you upload a bunch of data to a super-cold storage system that's even named after a geological formation that stays frozen for centuries, to remind you how cold it is, and then you make it hot by trying to download it all at once, you'd expect it to be more expensive.

Just do your regular cost-benefit analysis and it should be fine, based on how long you plan on storing the data and how likely or often you think you'll need to restore it.

The root problem here is that Amazon Glacier is as opaque as mud about pricing. Or did the words "Starting At" not pique your buzzword bullshit detector? It certainly did mine.

Pricing means sifting through a legalese dense block of text to try to come up with even a way to estimate the costs. That is a problem. I should be given a rough cost expectation up front. Sure, it's not going to be exact and all, but this article is about a cost factor of 180x. That's the difference between an iPad and a Ferrari.

Other backup systems of various types are a lot clearer about how much I should expect to pay. Or at least, I get within +/- 10%. Glacier? "Yeah, Don't worry about it till we bill ya!" That's the scam with all these cloud services really. Once we have your nuts in a vice, we can squeeze and extract, cause you don't have any other choice, now do you?

I think the information you're working with might be out of date, I don't see the words "starting at" anywhere. Glacier egress pricing is $0.01 per GB and $0.05 per 1,000 requests. That's for standard retrievals. Not so complicated.

Before Feb 2017, it did work with the peak hourly request fees. Those were well documented. Apparently peak transfer rates were "confusing". I didn't think so, after all, that's similar to how I pay for internet access at home (I pay based on capacity, not based on bytes transferred).

Glacier pricing has changed since that post. It's now a much simpler model, around $0.10/GB to download. The original pricing was very complicated though!


So how would you categorize CrashPlan? They offer unlimited storage for $60/year. I've had around 8TB backed up there for the last 2-3 years. They have existed since 2001 (no idea how long the unlimited feature has existed).

On the other hand, supporting your point #1, I used to have several TB on StreamLoad which 10ish years ago used to also offer unlimited... until they suffered a critical hardware failure and lost tons of customer data. They then changed to MediaMax, then suddenly disappeared.

Perhaps you know of others, or this list is incomplete, but Wikipedia has a list of file services, of which there aren't many who offer an unlimited plan:


CrashPlan is "unlimited" but it's not a storage service, it's a backup service. From what I understand, you have to use their client software, and that software will use RAM proportional to the total size of the data you're storing. In practice this puts a hard limit on the amount of data which is proportional to the amount of RAM you have.

So heavy users are not only capped, they're also subsidized by the light users.

I've got about 12TB on Crashplan, and the client's never used more than about 2GB. I doubt your "cap" applies in practice.

So about 170 MB/TB, if this scales linearly for some reason.

Some users were apparently bragging about having 1PB+ on ACD. That would mean the RAM usage would actually be a useful upper bound, as I doubt many of these guys have 170GB of RAM in their media server PCs.
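The back-of-the-envelope estimate above can be written out as follows. It is only a sketch: it takes the single reported data point (about 2 GB of client RAM for 12 TB backed up) and assumes purely linear scaling, which nobody in the thread has confirmed. The function name is hypothetical.

```python
def projected_ram_gb(data_tb, observed_ram_gb=2.0, observed_data_tb=12.0):
    """Linear extrapolation of client RAM use from one observed data point."""
    ram_per_tb = observed_ram_gb / observed_data_tb  # about 0.17 GB per TB
    return ram_per_tb * data_tb


# 1 PB taken as 1000 TB (decimal units, an assumption).
print(f"{projected_ram_gb(1000):.0f} GB of RAM for 1 PB")
```

Under those assumptions 1 PB lands in the region of 170 GB of RAM, matching the figure in the comment.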

In addition to the comment below about CrashPlan requiring you to use their client software, which limits abusability -- I am also seeing signs of strain from CrashPlan. They have always had a policy that claimed they would delete data from machines that hadn't touched the service in a long time, but they had never implemented this in practice before; just last year they finally started sending out threatening letters to people saying they're going to delete data on short notice. (And they didn't really back down when people got angry.) This tells me the storage expense is starting to hurt them.

Yep, I cancelled my CrashPlan account when they pulled that. IIRC they gave me under a week to either download the data for 5-6 machines, or lose it forever.

> So how would you categorize CrashPlan? They offer unlimited storage for $60/year. I've had around 8TB backed up there for the last 2-3 years.

CrashPlan's upload speed is a joke for people located outside North America. I had an account for ~3 years, but found it difficult to upload more than 6TB. (There are rumours that CrashPlan throttles uploads.)

On the other hand, I've managed to back up most of my ~15TB NAS (and more) on Amazon Cloud Drive.

I'm in the UK and evaluated CrashPlan a couple of years ago - IIRC, it was so slow it was going to take several weeks to upload my 1TB of data.

Crashplan is a backup service. Different category of company.

Ever hear of Sia? That's a new decentralized option gunning for Amazon. Website is sia.tech.

It's very difficult to take Sia seriously. For one thing, Sia only has 25T of data. That's total data, in the entire global network, including parity blocks. And when I checked prices today, it was more expensive than Glacier—but I can only imagine how much the prices would swing if you suddenly bought 2.5T of storage.

Sia's decentralization is not unique. Amazon, Google, and Microsoft cloud storage are also decentralized. The idea of reselling unused bytes on disks that you already own is also not unique—again, Amazon, Google, and Microsoft already do this.

Sia's decentralization is unique. It's unique not because it spreads data all over the world, but because there's no central player that controls prices or decides to shut things down. Amazon, Google, Microsoft, they can change terms or disable services whenever they want. Sia is a lot more robust, because it's governed via a blockchain, and you put your data on 30+ individual parties, making it extremely unlikely that a sufficient number of them all shut down simultaneously.

This situation with retiring unlimited storage on ACD would not be possible with Sia. Amazon is changing terms and conditions, Sia is a blockchain where no party has the power to do that.


To address your pricing comment, there are more than 1000 TB for sale on the Sia network. Adding 2.5 TB to the network would not move the price at all, that's really not how pricing works on Sia.

The reason that it's more expensive than it used to be is because prices are set entirely in Siacoin. When the siacoin prices rise, hosts need to manually re-adjust their prices to keep the same USD price. As of writing, there are no tools that will let this happen automatically. The siacoin price has doubled 6 times in 6 months. The result is that a lot of the defaults are now grossly expensive, and hosts need to be very familiar with the pricing mechanisms to respond accordingly. Most have not, though the ones that have are seeing much higher utilization than the ones that haven't.

We will be releasing stuff in the next month to help hosts set prices more intelligently, that should move the prices back to the competitive spot that they have historically been at. The high prices right now are merely a result of market confusion among the hosts.
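The repricing problem described above is simple arithmetic. The sketch below uses made-up starting numbers (the exchange rate and the host's price are purely hypothetical) to show what six doublings do to a price that is fixed in siacoin, and what a host would have to change it to in order to hold the same USD price.

```python
def usd_price(price_in_sc, usd_per_sc):
    """USD price implied by a price that is set in siacoin."""
    return price_in_sc * usd_per_sc


def adjusted_sc_price(target_usd, usd_per_sc):
    """Siacoin price a host must set to keep a given USD price."""
    return target_usd / usd_per_sc


doublings = 6
growth = 2 ** doublings  # six doublings is a 64x increase

# Hypothetical starting point, not real market data.
old_rate = 0.0001       # USD per siacoin
host_price_sc = 20000   # siacoin per TB per month

new_rate = old_rate * growth
before = usd_price(host_price_sc, old_rate)
stale = usd_price(host_price_sc, new_rate)
fixed = adjusted_sc_price(before, new_rate)
print(f"before: ${before:.2f}  stale: ${stale:.2f}  re-adjusted: {fixed:.1f} SC")
```

A host that never touches its siacoin price ends up 64 times more expensive in USD terms, which is the effect attributed above to stale defaults.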

Hi Taek, Sia recently came on my radar, and it seems like a pretty interesting use of the blockchain, and I've started following it.

Is there a good place to keep an eye on the roadmap for the technology (with info like what you had in the last paragraph)?

> The siacoin price has doubled 6 times in 6 months.

Presumably this represents entirely speculative activity driven by the other fashionable blockchains?

I'm sure that's part of it, but you can't really avoid that. But there are a ton of altcoins out there now, and most of them don't have interesting tech behind them or much to distinguish them from any of the others. Sia seems to have interesting tech behind it, to me at least.

Have to be careful with Glacier: the retrieval costs can be very high and are extremely hard to calculate.

Amazon S3 Infrequent Access has higher retrieval costs, though not as crazy as Glacier, and a delete penalty. Deleting a file after 1 day will cost 30x normal S3 storage fees.

Likewise, Coldline sounds great, but they have higher retrieval fees (not so bad) and a delete penalty. The Coldline delete penalty will cost 90x the regular storage cost if a file is deleted the next day. I don't recommend Coldline unless you know you won't be deleting files before 90 days.
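The 30x and 90x figures above follow if the delete penalty is modeled as a minimum billed storage duration (30 days for S3 Infrequent Access, 90 days for Coldline, as the comments imply). The sketch below works under that assumption only; the function names are hypothetical and retrieval fees are left out.

```python
def billed_days(days_stored, minimum_days):
    """Days charged when early deletion is billed up to a minimum duration."""
    return max(days_stored, minimum_days)


def cost_multiplier(days_stored, minimum_days):
    """Billed storage cost relative to paying only for the days actually used."""
    return billed_days(days_stored, minimum_days) / days_stored


print(cost_multiplier(1, 30))    # deleted after 1 day, 30-day minimum
print(cost_multiplier(1, 90))    # deleted after 1 day, 90-day minimum
print(cost_multiplier(120, 90))  # kept past the minimum, no penalty
```

Once a file has been kept for longer than the minimum, the multiplier drops to 1, which is why the advice above is to avoid these tiers only for data that gets deleted early.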

If it is backup, it doesn't need to be super safe. The risk that you lose both your primary data, (optionally your local backup) and your online backup at the same time is pretty insignificant, given that the online backup will be uncorrelated as in a different physical location (primary and local backup very correlated I agree).

> The risk that you lose both your primary data, (optionally your local backup) and your online backup at the same time is pretty insignificant, given that the online backup will be uncorrelated as in a different physical location (primary and local backup very correlated I agree).

Actually, that reasoning is exactly what I wanted to talk about. When the primary is lost, there is a much higher chance than you'd expect that a secondary contains unrecoverable errors. This is why RAID 5 arrays fail to rebuild after a disk failure so often—they're supposed to be able to tolerate a single disk failing, but they can't tolerate a disk failing and any other IO error at the same time. Part of this is due to how short the timeout is for failed reads in RAID setups, but I've still seen a lot of RAID 5 arrays fail, and I've seen a few RAID 6 arrays fail too.

On top of that, there's the high chance of configuration errors in DIY systems.

Agree. Having a script doing a full read of all the data every couple of months and sending a report by email (that you would notice if not sent) is a sort of must have.

I am still looking for a good way of doing this.

Weekly integrity check of all backed up data. E-Mail which informs of result. Web interface which shows overview of results of historical checks. External service which sends E-Mail if integrity check failed to run (e.g. https://deadmanssnitch.com/).

FreeNAS gets fairly close to this out of the box. My home server runs ZFS with RAIDZ2. By default, I think, there's a weekly cronjob to scrub the ZFS pool (integrity check everything, as I understand it), and then the results of that scrub are emailed to me.

I don't believe it has a web interface with historical checks, although I could be wrong. That said, it might be stored in a log file somewhere.

I also don't have an external service that would send me an email if it failed to run. That said, I would get an email if cronjobs had a mysterious error; otherwise, if the server itself was dead, my data would not be accessible on my home network, so I'd notice.

If the home server dies tragically, well, I hope Google Cloud Storage is doing similar integrity checks -- that's where my offsite backups are.

Integrity is harder because you have to sort of maintain a signature of every file on the side. What I do is to just have a script that reads all data. If some data is unreadable an exception will be thrown and a text message will be sent to me. And a confirmation email is sent otherwise.

In .NET it's only a few lines of code. Haven't thought of deadmansnitch but it's a good idea. Would just take one more line of code.
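For anyone who wants the same thing outside .NET, a minimal Python sketch of the "just read everything" approach might look like the following. It is not the script described above; the function names are hypothetical, and sending the text message or email is left out on purpose.

```python
import os
from typing import List, Tuple


def read_everything(root: str, chunk_size: int = 1 << 20) -> Tuple[int, List[str]]:
    """Read every regular file under root; return (bytes_read, unreadable_paths)."""
    if not os.path.isdir(root):
        raise NotADirectoryError(root)
    total = 0
    failures: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, dangling symlinks and the like
            try:
                with open(path, "rb") as handle:
                    while True:
                        chunk = handle.read(chunk_size)
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError as exc:
                # An I/O error here is exactly what the full read is meant to surface.
                failures.append(f"{path}: {exc}")
    return total, failures


def build_report(root: str) -> str:
    """Plain-text summary that could be mailed out after each run."""
    total, failures = read_everything(root)
    status = "FAILED" if failures else "OK"
    lines = [f"Full read of {root}: {status}, {total} bytes read"]
    lines.extend(failures)
    return "\n".join(lines)
```

Calling `build_report("/path/to/backup")` from a scheduled job and mailing the result on every run, success or failure, gives you the "you would notice if it was not sent" property. Note that this only catches unreadable data, not silent corruption.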

git-annex has an fsck command that can test the data, and an rclone remote, so you can use any configuration of "cloud" storage to hold the chunks.

I would argue the opposite. When you reach for your backup, you are in a dire situation. If it fails you then, things will be very gloomy.

The problem is sometimes you only find out your backup is corrupted when you need it.

If you don't test your backups, then they're not backups.

yup, the assertion that both failing is highly unlikely is under the assumption that you are checking the status of each copy regularly.

In fact for any disk, it is important to have a script that reads all the data at least every couple of months. It will force bad sectors to be identified, or notify you that you have a bad disk, before things get worse.

Yeah, agree. And I am not talking about enterprise level of reliability. More like personal / small office.

But for anything above a few TB, running your own hardware will be way cheaper if you take say a 5y horizon. Disks are so large today that you don't need a big config. In fact you might even keep two copies to reduce the risk.

Of course there is the occasional trip to the datacentre. I made that mistake. I pay more in uber than I saved by picking a datacentre far away.

Do you need co-location though? I dropped an HP Microserver at a friend's place, which backs up snapshots of my main backup server at home daily. Seems to work well so far, at quite limited costs. Since my main backup server also backs up his NAS we both win. Since all backups are encrypted there is also little privacy risk from theft (or snooping on one another).

With 1Gbps symmetrical connections slowly becoming more frequent this would be a much cheaper alternative (assuming you only run it for backups, I also run a mail server and some websites). But in London it is not very practical. I know very few people who have really fast broadband.

My backup contains files and old versions of files I no longer have on the primary storage. It is much more than a 1:1 copy, and can not be recreated.

I need to feel safe that I can reasonably undo my changes on the primary storage, even if it takes me years to realise what I did.

This is why I keep two backups. Until now, the secondary offsite copy was on ACD...

That's not a backup. That's an archive. You need to back up archives the same way you do any other data you care about.

If it is backup, then it needs to be scrubbed regularly to detect and repair bit rot.

Ah, the old "give them a bunch of storage and then ask for more money to keep storing it" meme.


My annual cost will jump from $60 to $180. That's too much for simple offline backup, so it's time to start looking for options again :(

Glacier may be a more affordable option, but my experience a few years ago was terrible.

Any suggestions? Google Drive is also pricey ($240); Crashplan is incompatible with NAS, and tarsnap is out of the question (>$6,000/year).

I personally run syncthing on several devices, and don't worry about the cloud. It's self-hosted, devices replicate files between themselves, and there's no real limit other than hard drive space. It runs on just about anything too; several of my backup systems are Raspberry Pis.

It can be a bit weird to set up initially, and is a lot less magical in the interest of putting you in control for privacy reasons, but the flexibility added is pretty useful. I have a music folder that I sync to my phone without needing to pull the rest of my backups along with it, since they wouldn't fit anyway. Several of my larger folders aren't backed up on every single device for similar reasons, but some of my really important smaller folders (documents, photos, regular backups of my website's database) go on everything just because it can.

Anyway, check it out. Highly recommended all around: https://syncthing.net/

If you accidentally delete some files (even all the files!), won't Syncthing delete all the "backups"?

I don't use Syncthing, I use an rsync script I wrote over 10 years ago, using the --link-dest option to keep incremental backups for around 2 years.

This relies on Zsh's fancy globbing, but the gist of it is:

    date=$(date +%Y%m%d-%H%M)

    [for loop over users]

    older=( $backups/$user/*(N/om) )

    rsync --archive --recursive \
        --fuzzy --partial --partial-dir=$backups/$user/.rsync-partial \
        --log-file=$tempfile --link-dest=${^older[1,20]} \
        --files-from=$configdir/I-$user \
        --exclude-from=$configdir/X-$user \
        $user@$from:/ $backups/$user/$date/

Syncthing has options to store versions of files so that scenario is easily avoided: https://docs.syncthing.net/users/versioning.html

Unfortunately, in my experience, Syncthing's versioning mechanisms leave much to be desired compared to what I'm used to from Dropbox. AFAIK all of Syncthing's versioning schemes only keep versions of files that have been changed _on other devices_, and not those that have changed on the device itself, whereas what I'm looking for is an option to keep a synchronized version history for all files on all devices, and the ability to more intuitively roll back and roll forward the state of any file to any revision without having to mess with manually moving and replacing files and reading timestamps (better yet would be the ability to do so for entire directories, but I realize this would probably be very difficult to accomplish across devices in a decentralized manner).

I used a similar script for a long time but I'm now using rsnapshot.

For me, one of the main benefits of cloud-based backup is that it's off-site - so if my house burns down, my data is still safe.

Think about a Media Safe. Some are really expensive, but this one is not too bad. Just a really small storage area. https://www.amazon.com/First-Alert-2040F-Water-Media/dp/B000...

What about break-ins? Someone enters your place and steals your NAS (and the Media Safe)...


That's my primary use case for Amazon Drive. I have a robust rsync of the workstations and laptops to a NAS, and then to a second (incremental-only, no delete) NAS. Works great, but if the house burns down, or if someone breaks in and steals the computers, I want to ensure there's a copy somewhere.

If that is your main concern you could always put it on an external drive and put it in a bank safe deposit box. I've thought about doing that for at least the very important things, perhaps even printing some important pictures too.

You just need another house to burn down.

Don't you have friends or relatives at a reasonable distance who can set up mutual backups on each other's home servers?

Yes, but nobody else with an FTTC internet connection with an unlimited bandwidth allowance (I'm in the UK). I have 2TB of data, so speed is important.

I don't have a single friend that has a home server. Most adults don't even own computers anymore, just phones and perhaps an iPad.

A good scenario is building a backup server/NAS solution that you can put in a little cubby at your friend's place. There's trust involved that you're not using their internet to hack the government, and you have to be mindful of their bandwidth/power costs. So not a rackmount server or even a tower, but something much smaller and very appliance-looking. A NUC sitting atop a WD Passport or their "My Book".

If it provides them a benefit like an in-house plex server, even better.

Another option would be to rent a safety deposit box at a bank for $25 per year and store your backups there as flash drives. Cheap and very secure.

Of course it requires you going to the bank regularly to update the backup.

I've moved mostly to syncing through Syncthing for my devices too, but I'm curious what people use for sharing files with others and accessing files through a browser on machines you don't control?

So you're one house fire away from losing all your data forever.

There's Siacoin, a cryptocurrency/blockchain built around the idea of decentralized encrypted p2p storage.

Storage is dirt cheap as of now: the median contract price is $12/TB·mo, but network storage utilization is currently only 2%, so actual deals settle at about $2/TB·mo. The downside is that the exchange rate of their coin is highly volatile, or at least it was during the last month.

http://siapulse.com/page/network (Prices tab)

Do these decentralized storage networks provide any guarantees in terms of durability, redundancy and availability? I've been looking into Siacoin, Filecoin, Storj and the like, but lack of clarity around some important concerns have so far prevented me from taking them seriously as a backup solution:

1. Performing a restore in a timely fashion on a large dataset seems like a tall order if these networks don't impose any minimums for the upstream bandwidth of the hosts.

2. Files can completely disappear from the network if the machines that are hosting them happen to go dark for whatever reason, which seems to be a much more likely occurrence for some random schnub hosting files for beer money than it would be for traditional storage providers that have SLAs and reputations to uphold.

Maybe these concerns are unfounded, and some or all of these networks already have measures in place to address them? I'd appreciate it if someone more familiar with these networks could enlighten me if that's the case.

In addition to redundancy, Sia has the concept of collateral, which is basically money locked in a smart contract that says "I'm willing to bet this money that I'm not going to lose your files". I.e., hosts lose the money if they fail to store your files.

Different hosts have different amounts of collateral, and it's both an important security measure and a market mechanism.

Also, Sia is completely decentralized (unlike StorJ for example), so nobody can intervene in a way that might result in lost files.

Speaking as a Sia developer, I can address your concerns.

> these networks don't impose any minimums for the upstream bandwidth of the hosts.

Sia today primarily handles that through gross redundancy. If you are using the default installation, you're going to be putting your files on 50 hosts. A typical host selection is going to include at least a few sitting on large pipes. Downloads on Sia today typically run at about 80 Mbps. (The graph is really spiky though; it'll spike between about 40 Mbps and 300 Mbps.)

We have updates in the pipeline that will allow you to speedtest hosts before signing up with them, and will allow you to continually monitor their performance over time. If they cease to be fast enough for your specific needs, you'll drop them in favor of a new host. ETA on that is probably ~August.

> Files can completely disappear from the network if the machines that are hosting them happen to go dark for whatever reason

We take host quality very seriously, and it's one of the reasons that our network has 300 hosts while our competitors are reporting something like 20,000 hosts. To be a host on Sia, you have to put up your own money as collateral. You have to go through this long setup process, and there are several features that renters will check for to make sure that you are maintaining your host well and being serious about hosting. Someone who just sets Sia up out of their house and then doesn't maintain it is going to have a very poor score and isn't going to be selected as a host for the most part.

Every time someone puts data on your machine, you have to put up some of your own money as collateral. If you go dark, that money is forfeit. This scares away a lot of hosts, but that's absolutely fine with us. If you aren't that serious about hosting we don't want you on our network.

> but lack of clarity around some important concerns have so far prevented me from taking them seriously

We are in the middle of a re-branding that we hope introduces more clarity around this type of stuff as it relates to our network.

This is the one I've got my eye on - once the marketplace boots up on both sides, it's going to be hard to compete against it. I suspect some day even the big providers like Amazon and Google will sell into these kinds of marketplaces.

I'm calling it, it's not gonna happen.

For data storage, you need error encoding. Sia does that, but you pay for it. So for 1TB of data, you upload 2TB to the network (that's how Sia is configured) and at the current $2.02/TB per month, that's $4.04/TB, which is more expensive than Glacier. Glacier charges funny for downloads but Sia charges for downloads too.

I assume that if you wanted to store ~2.5TB like we're talking about, you'd be paying more than $4/TB, because 2.5TB is 10% of the total of all data currently stored in Sia, currently 24.5 TB. (By comparison the major cloud providers are undoubtedly in the exabyte range of actual data stored. Or for another comparison, you could comfortably hold 24.5 TB of storage media in one hand.)

Sia promises to be cheap because you're using unused bytes in hard drives that people already bought, but that's exactly what Amazon, Google, and Microsoft are already doing, except their data centers are built in places where the electricity costs less than what you're paying. Plus they don't charge you extra for data redundancy.
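The arithmetic behind that estimate can be sketched as below. All figures are the ones quoted in this comment ($2.02/TB per month raw, 2x redundancy, 24.5 TB stored on the network); the function name is just for illustration.

```python
def effective_price_per_tb(raw_price_per_tb: float, redundancy: float) -> float:
    """Monthly price per TB of your own data when each TB is stored `redundancy` times."""
    return raw_price_per_tb * redundancy


raw_price = 2.02   # $/TB/month of raw network storage, as quoted above
redundancy = 2.0   # 1 TB of data means 2 TB uploaded, as quoted above
per_tb = effective_price_per_tb(raw_price, redundancy)

backup_tb = 2.5    # size of the backup under discussion
network_tb = 24.5  # total data stored on the network, as quoted above

print(f"effective price: ${per_tb:.2f}/TB/month")
print(f"{backup_tb} TB backup: ${per_tb * backup_tb:.2f}/month")
print(f"share of all data on the network: {backup_tb / network_tb:.1%}")
```

Changing `redundancy` shows how sensitive the comparison is to the encoding overhead: the effective price scales linearly with it.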

In that case, Sia provides an avenue for a new company with access to cheap electricity to compete with Amazon, Google, and Microsoft without investing a cent in marketing or product. They will just plug in and start receiving payments, and strengthen the network and lower the price in the process.

Another cool thing is Sia lets hosters set their storage and bandwidth prices, so specialized hosts will likely pop up. For example one host might use tape drives, set cheap storage cost and expensive bandwidth cost. Clients can prioritize as desired. SSD servers with good peering can do the opposite.

The real interesting part will be when you can create one-time-use URLs to pass out, which connect directly to the network - effectively turning it into a distributed CDN.

The $2 / TB / Mo we've traditionally advertised as our price included 3x redundancy. The math we've done on reliability suggests that really you only need about 1.5x redundancy once you are using 96 hosts for storage.

The network prices today are less friendly, though that's primarily due to market confusion. The siacoin price has doubled 6 times in 6 months, and there's no mechanism to automatically re-adjust host prices as the coin price moves around. So hosts are all currently advertising storage at these hugely inflated rates, and newcomers to Sia don't realize that these aren't really competitive prices.

Though, I will assert that even at our current prices it's not price that's the primary barrier to adoption. It's some combination of usability, and uncertainty. Sia is pretty hard to set up (it's around 8 steps, with two of those steps taking over an hour to complete), and a lot of people are not certain that Sia is truly stable enough to hold their data.

We're focused on addressing these issues.

You can't compare to Glacier. S3 is a more comparable product. And obviously redundancy is already in the price, or did you think there's no redundancy?

From what I understand, your client does the error encoding and pays for raw data storage on the network, rather than trusting the network to do error encoding. You can configure the encoding to whatever you want, you just end up paying more for more redundant encodings.

Isn't this exactly what Pied Piper gets used for in later seasons?

Currently trying Backblaze: https://www.backblaze.com/b2/cloud-storage.html. Overall fits my needs.

I used Backblaze for several years before closing out my account in 2012.

Initial backup took a long time. There was no easy way to prioritize, for example, my photos over system files. I ended up manually prioritizing by disallowing pretty much my entire filesystem, and gradually allowing folders to sync. First, photos, then documents, then music, etc.

Eventually it all got synced up and it was trouble-free... until I tried to get my data back out.

The short version of the story is that a power surge fried my local system. I bought a new one and had some stress when it appeared the BB client was going to sync my empty filesystem (processing it as a mass delete of my files). I managed to disable the sync in time.

Then I discovered there was no way to set the local BB client to pull my files back down. Instead, I had to use their web-based file manager to browse all my folders and mark what I wanted to download. BB would then zip-archive that stuff, which would then only be available as an HTTP download. There was no rsync, no torrent, no recovery if the download failed halfway, and no way to keep track of what I had recently downloaded. Also, iirc, they were limited to a couple of GB in size per file (which didn't matter because at that time, the download would always fail if the file was larger than __MB; I don't remember the exact number. 100MB? 300? Also hazy on the official zipfile size limit).

So I had to carefully chunk up my filesystem for download because the only other option BB offered was to buy a pre-filled harddrive from them (that they would ship to me).

I felt like Backblaze was going out of their way to make it hard for me in order to sell me that harddrive of my data. I felt angry about that and stubbornly downloaded my data one miserable zipfile at a time until I had everything.

Once I was reasonably sure I had everything I cared about, I closed my account and haven't looked back.

[Edit to add] This was at least 5 years ago. No doubt their service has improved since then.

I would think that for a full restore you might be better off with their restore by mail. Note that if you move your data off the drive they ship then send it back they refund the charge for it.

I use Backblaze but haven't had to do a restore yet. It appears their current limits are 500 GB per zip file. They also have a "BackBlaze Downloader" utility (Mac & Windows) that has the ability to resume interrupted downloads.


It looks like styx31 linked to B2 which is a separate service from their backup service that's closer to S3 or Google Cloud Storage. With that you can use rclone which should avoid the issues you encountered, though at higher cost if you have a lot of data (there are per-GB storage and download fees).

My restore experience was also poor with Backblaze. The download speed was slow. If I had to restore an entire drive it would have taken me many days to download the entire thing.

I switched to Arq with Amazon Drive as the storage backend.

Well you get to choose cheap backup vs expensive restore. Better than impossible restore (that is if you don't do backups)

I'm in the process of uploading 2TB of backups to B2, it's ridiculously cheap, and they don't charge for upload BW, just storage and download

You could try out one of these new storage cryptocurrencies:


I haven't used them myself so I can't vouch for the UX or quality, but they should be able to offer pretty low prices.

I feel like a luddite but I have three backups at home (PC HD, 2 rsync'd USB drives I bought several years ago) and one off-site backup (encrypted HD in locker at work). Far cheaper afaict than any cloud backup.

I think this is a good basic and relatively low-tech strategy.

Do you do versioning? As in what happens if your files are silently corrupted e.g. by accident or by malware? Rsync would overwrite your files, and you might even overwrite your off-site backup when you connect it.

My main reason for going beyond such a set-up though is that it takes time, effort and remembering to sync the off-site backup by taking it home, syncing and putting it back. And during that time all your data is in the same place. If something happens to your home during that time (break-in, flooding, fire...) you're out of luck. Unless your rsync'd drives are also encrypted and you just switch one of them with the off-site one for rotation.

One of my backups is 'add only'

The really key stuff is in git repos.

Most of the data (films, mostly) I could stand to lose.

My offsite backup is likewise an encrypted disk stored at a friend's house, and vice versa. After the initial hardware purchase cost it's free.

And, based on my experience, generally horrifically out of date.

* cheap * capacity * convenient

choose 2

I have a Raspberry Pi at my parents' home (which has r/w access to the disk attached to my father's AirPort Extreme); it rsyncs every night with the server in my basement (which has all my data on 2 disks). It also syncs my parents' data back to me. It works well but I still need to add a feature to email me if syncing somehow halts or errors out. I use "rsync -av" (over SSH), so nothing is ever deleted.

> so nothing is ever deleted.

It could be overwritten though. A good backup protects you from more than just destruction at the primary site. There are various relatively efficient ways to arrange snapshots when using rsync as your backup tool.

Also, remember to explicitly test your backups occasionally, preferably with some sort of automation because you will forget to do it manually, to detect unexpected problems (maybe the drive(s)/filesystem in the backup device are slowly going bad, but in a way that only affects older data and doesn't stop new changes being pushed in).

Versioning backups seems like a must. Encrypting malware is a thing and has been for a while, just like rm -rf type mistakes which are subsequently propagated automatically to "backups".

Another thing that I do with my backups is making it so that the main machine can't access the backups directly and vice-versa. It is slightly more faff to set up, adds points of failure (though automated testing is still possible), and is a little more expensive (you need one extra host), but not significantly so.

My "live" machines push data to an intermediate machine, then the backup locations pull data from there. This means that there is no one machine/account that can authenticate against everything. Sending information back for testing purposes (a recursive directory listing normally, a listing with full hashes once a month, which in each case gets compared to the live data and differences flagged for inspection) is the same in reverse.

This way a successful attack on my live machines can't be used to attack the backups and vice-versa. To take everything you need to hack into all three hosts separately.

Of course as with all security systems, safe+reliable+secure+convenient storage of credentials is the next problem...
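For the "listing with full hashes" comparison, a rough Python sketch of the idea is below. It is not the setup described above, just one way to do it: each side would build its manifest locally and ship it (as JSON, say) via the intermediate host, so neither side needs credentials for the other. The function names and the choice of SHA-256 are assumptions.

```python
import hashlib
import os
from typing import Dict, List


def hash_file(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = hash_file(full)
    return manifest


def compare_manifests(live: Dict[str, str], backup: Dict[str, str]) -> Dict[str, List[str]]:
    """Return the differences that should be flagged for inspection."""
    shared = set(live) & set(backup)
    return {
        "missing_from_backup": sorted(set(live) - set(backup)),
        "extra_in_backup": sorted(set(backup) - set(live)),
        "content_differs": sorted(p for p in shared if live[p] != backup[p]),
    }
```

With versioned backups, "extra_in_backup" and "content_differs" entries are often legitimate (files deleted or changed since the last run), so the output is something to review rather than an automatic failure.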

This is especially true with Ransomware type attacks that encrypt/corrupt data. Having a backup of unusable files isn't doing anyone any good.

Crashplan isn't incompatible with NAS. You can either mount a share and run it from your workstation, or run it directly on the NAS itself. The core of the product is Java so it runs on just about any architecture to boot.

Coming from someone who tried to do this setup, it wasn't worth it. CrashPlan's client isn't something you generally would want to run on your NAS, it takes memory proportionate to the amount of data on your disk (and a fair amount of RAM, at that) and unless you're running a GUI on your NAS it's impossible to configure without a huge headache.

You can run it from your workstation, but if you've got a reasonable amount of data on your NAS then the memory issues will bite you again. Something like Backblaze B2 is more expensive, but I'd rather pay $10/mo to backup the 2TB of data on my NAS (growing every day) and use CrashPlan to backup my computers only.

> CrashPlan's client isn't something you generally would want to run on your NAS, it takes memory proportionate to the amount of data on your disk (and a fair amount of RAM, at that) and unless you're running a GUI on your NAS it's impossible to configure without a huge headache.

CrashPlan's client is able to attach to a headless instance [1], but the RAM requirement does mean that it's only really usable on NASes with expandable RAM.

[1] https://support.code42.com/CrashPlan/4/Configuring/Use_Crash...


I used Crashplan for 3 years on a Synology NAS. It's a disaster. Every time there was a Synology upgrade, the CP headless server would stop working, and you'd need to reinstall, re-set the keys, etc.

After 10 or 15 times doing this, I got rid of Crashplan entirely, migrated my backups to Amazon Drive, and never looked back.

Given the lack of decent options, seems the best choice will really be to pony up the $180 for 3TB that Amazon will start charging next year...

If you were paying $60/year for 2-3T of cloud storage then Amazon was subsidizing you. Even Glacier would cost $120/year for 2.5T, and Glacier is so cheap that everyone's trying to figure out how they could possibly sell Glacier and still be making money.

>Even Glacier would cost $120/year for 2.5G

$120/year for 2.5T.


Yes, thank you for pointing out the typo.

Isn't the catch that you have to pay to get your data out of there?

Why is CrashPlan incompatible with NAS? I am running it on a headless Ubuntu server and it works just fine (you just need about 1GB of RAM for every TB of storage).

And if you happen to be running FreeNAS, there is even a plugin available via the GUI (same RAM rules apply).

Same for Synology.

Have you upgraded your Synology OS lately? For 3 years, every time I did it, the headless CP server would stop working.

I don't actually have Synology. A friend of mine does and he runs CrashPlan on it.

If it's just backup, and it's from a single computer (with potentially multiple external harddrives), then maybe BackBlaze: https://www.backblaze.com/cloud-backup.html

Backblaze (and others) don't support backing up from a NAS, which, for a family, is impractical.

But isn't that with B2 pricing, not the $5/month unlimited pricing?

Sure, but B2's pricing isn't too expensive anyway. If I had all 7TB of usable space filled up on my NAS it'd cost me $35/mo - that's easily doable, even for a digital packrat like myself.

That's $420 a year, which is well over what the grand-grand-*-poster of this sidethread mentioned was too much.

Even for his smaller data size of 3TB it still works out to $180 a year which is the same as what he'd have to pay Amazon.
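Putting the numbers from this thread side by side, as a quick sketch: the per-TB monthly prices below are only the ones implied by figures quoted in the comments (mid-2017), not authoritative price lists, and Amazon Drive is approximated as linear even though it is sold in fixed tiers. Retrieval and bandwidth fees are ignored.

```python
# $/TB/month implied by figures quoted in this thread; treat as assumptions.
PRICE_PER_TB_MONTH = {
    "Backblaze B2": 5.00,        # "7TB ... it'd cost me $35/mo"
    "Glacier": 4.00,             # "$120/year for 2.5T"
    "GCS Nearline": 10.00,       # "$10 a month for 1 TB"
    "Amazon Drive (new)": 5.00,  # "$180 for 3TB" per year
}


def annual_cost(provider: str, terabytes: float) -> float:
    """Storage-only cost per year for the given amount of data."""
    return PRICE_PER_TB_MONTH[provider] * terabytes * 12


for terabytes in (3, 7):
    for provider in PRICE_PER_TB_MONTH:
        print(f"{terabytes} TB on {provider}: ${annual_cost(provider, terabytes):.0f}/year")
```

At 3 TB, B2 and the new Amazon Drive pricing come out the same, which is the point made above.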

For Google storage - you can get GSuite (https://gsuite.google.com/pricing.html) - $10 a user a month, for unlimited storage via Google Drive.

You can then mount the drive using DriveFS:


It's basically a FUSE filesystem built on top of Google Drive.

Alternatively, you can use the Drive Sync Client, if you want to just sync stuff back and forth (without a virtual FS).

The Glacier storage class on S3 would probably be better if you like Amazon and are okay with Glacier's price. Backblaze's B2 is pretty cheap too, and has a nice API.

Isn't Glacier separate from S3?

I'm using Resilio Sync (formerly BitTorrent Sync) which makes like a private cloud dropbox between my machines/phone. No versioning, but pretty solid.

Google Cloud Nearline storage is rather cheap, doesn't have as many limitations as Glacier and is AWS API compatible so NAS backup software works with it.

Re Crashplan & NAS... I've managed to get NAS backup to work. Are you certain on this point? I am going to double check my setup.

I have the MacOS CrashPlan client configured to back up a variety of NAS shares when the NAS is powered on and the share is mounted. Only about 4 shares, and I made a point to mount them and leave them mounted until the sync completed.

The shares are cold storage, so once synced, they stay virtually unchanged.

maybe use https://camlistore.org/ on GCE?

For fun, I just checked out Google Cloud Platform. 1 TB of regional storage costs $20.00 a month not including bandwidth which could be huge.

1 TB of egress bandwidth is $120.00 a month.

Regional storage is inappropriate for backup. On GCS, backup should be nearline or coldline, depending on how long you think it will be there.

Presumably you'd pay the $120 bandwidth fee seldom or never.

Ok, Google Storage Nearline is still $10 a month for 1 TB. That's $120 a year vs $59.99 a year for Amazon Drive not including Google bandwidth which could be significant.

I feel like the comment about bandwidth got ignored—you only pay for egress bandwidth, which basically means you only are paying high bandwidth fees if you lost all of your data and it's an emergency, at which point they seem pretty reasonable because you just lost your house in a fire or something like that. Uploading is free (well, you pay your ISP).

Most of the time, people only need to restore a few files from backup because they were accidentally deleted. The bandwidth costs for a few GB here and there are pretty cheap.
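To put rough numbers on that, here is a small sketch using the egress figure quoted earlier in this subthread ($120 per TB, so about $0.12/GB). It ignores any per-GB retrieval fee the storage class itself adds, and the sizes are just examples.

```python
EGRESS_PER_GB = 0.12  # $/GB, implied by "1 TB of egress bandwidth is $120.00"


def restore_cost(gigabytes: float, egress_per_gb: float = EGRESS_PER_GB) -> float:
    """Bandwidth cost of downloading `gigabytes` from the provider."""
    return gigabytes * egress_per_gb


for label, size_gb in (("a few deleted files", 5), ("full 1 TB restore", 1000)):
    print(f"{label}: ${restore_cost(size_gb):.2f}")
```

The everyday case costs well under a dollar; the expensive case is the one you hope never happens.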

I've been thinking that a p2p backup solution (encrypted storage, storage cryptocurrency, occasional random requests to make sure they're still around) would work. I guess these guys: https://storj.io/. $15/TB of storage, $50/TB of bandwidth. and competitors: https://news.ycombinator.com/item?id=13723722

Where are you getting 240 for google drive? It's only costing me 120/year. (Well technically through gsuite)

$19.99 per month for 2TB plan [1].

[1] https://www.google.com/settings/storage

You can buy a GSuite plan (https://gsuite.google.com/pricing.html) - which is $10 a user a month, for unlimited storage.

Make a tool that makes photo files out of data files. In its simplest form you just need to write the appropriate file headers (added benefit: additional metadata can be easily encoded). Because for Amazon Prime members, photo storage is free and unlimited.

Could be a nice hack.

They could be re-encoding, which would destroy the data.

Use forward error correction.

GCE's Nearline maybe?

Backblaze b2?

I was just reading on reddit how some users were uploading petabytes of data to it. I am an ACD user, but I can't blame them for stopping that.

I was always amused when I warned people on /r/datahoarder against abusing the service because Amazon would inevitably put an end to it. I was always told that I had no idea what I was talking about and was given many rationalizations about why Amazon wouldn't care about users storing dozens or hundreds of TB of files on the service.

Not entirely fair, there are whole communities around storing 100s of TB on amazon

It is fair from the moment you offer an unlimited plan and even more fair when you make a service of it and charge for it.

Customers are customers, not product managers. It is only natural to make use of a service you pay for.

Indeed, and it's within their rights to stop offering that when the period you paid for ends.

It's understandable from their point of view to offer unlimited and be awesome but not expect this kind of usage that is not sustainable. So they made a mistake and are correcting it.

It's hard to see it as a deliberate strategy to pull in users and then charge them more when they are "locked in"

> So they made a mistake and are correcting it.

Do they also refund people for their time wasted assuming this was a sustainable service, or does this "correction" only work in one direction?

I'm not even mad. /r/datahoarders brought this on themselves. Who in their right mind expects to upload 100s of TBs of data, encrypted and pay 59.99?

People who understand the literal meaning of the word "unlimited"?

Yes, Amazon is to blame here as well. They shouldn't have offered unlimited service.

At the same time, I don't get, why would you encrypt your "Linux ISO's"? Let the AWS dedup do its job, don't abuse it, and everyone is happy.

I don't get why if they don't mean unlimited just say up to 20TB/mo.

Possibly because there actually wasn't any limit. Maybe if a handful of people were exceeding $LOTS TB, they don't care, but if 60% of users exceed $LOTS TB, the service becomes unsustainable. In this case, the service really is unlimited (there genuinely is no limit that you're not allowed to go over), and if you wanted that effect, advertising a limit would be net negative: a high limit would encourage the "too many users use a lot" case and lead to the same result we get now where the plan has to be canceled for unsustainability, and a low limit would defeat the purpose.

Because it isn't linux iso's.

were they being serious, I can't tell.


> At the same time, I don't get, why would you encrypt your "Linux ISO's"? Let the AWS dedup do its job, don't abuse it, and everyone is happy.

Because if you are a self-proclaimed data hoarder, do you have the time to sort through and selectively classify your hoard to "encrypt this ISO don't encrypt that tarball" on a file-by-file basis across many terabytes?

How much would be saved by deduping anyway? If they're not deliberately making it easy/redundant, even if you got 300TB down to 100TB or such, a single order-of-magnitude reduction doesn't fundamentally change the economics of "unlimited."

Blame data hoarders, but don't blame encryption.

I store a bit of data at home (only ~20TB). Really easy to sort. There are plenty of apps that do it for you. This extension with those keywords in filename goes to this directory. Others to another dirs.

I only have my pictures and personal data in AWS cloud, encrypted. The way I set it up? Point rclone to relevant directories and skip the rest.

Except Amazon revoked rclone's key a while back.

Any recommendations on the "plenty of apps" that sort your data for easy searching?

As someone completely unfamiliar with this space, this prompted me to do some reading into this rclone issue. I'll record it here for anyone else similarly curious.

It seems that as of a few months ago, two popular (unofficial) command line clients for ACD (Amazon Cloud drive) were acd-cli[1] and rclone[2], both of which are open source. Importantly the ACD API is OAuth based, and these two programs took different approaches to managing their OAuth app credentials. acd-cli's author provided an app on GCE that managed the app credentials and performed the auth. rclone on the other hand embedded the credentials into their source, and did the oauth dance through a local server.

On April 15th someone reported an issue on acd-cli titled "Not my file"[3] in which a user alleged that they had received someone else's file from using the tool. The author referred them to Amazon support. The issue was updated again on May 13th with another user that had the same problem - this time with better documentation. The user reached out to security@amazon.com to report the issue.

Amazon's security team determined that their system was not at fault, but pointed out a race condition in the source for the acd-cli auth server (sharing the auth state in a global variable between requests...) and disabled the acd-cli app access to protect customers.[4]

In response to this banning, one user suggested that a workaround to get acd-cli working again would be to use the developer option for local oauth dance, and use rclone's credentials (from the public rclone source).[5] This got rclone's credentials banned as well,[6] presumably when the amazon team noticed that they were publicly available.

To top this all off, the ACD team also closed down API registration for new apps around this time (which seems to have already been a strenuous process). I suppose the moral of the story is that OAuth is hard.

[1]: https://github.com/yadayada/acd_cli [2]: https://github.com/ncw/rclone [3]: https://github.com/yadayada/acd_cli/issues/549 [4]: https://github.com/yadayada/acd_cli/pull/562#issuecomment-30... [5]: https://github.com/yadayada/acd_cli/pull/562#issuecomment-30... [6]: https://forum.rclone.org/t/rclone-has-been-banned-from-amazo...
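For anyone wondering what "sharing the auth state in a global variable between requests" looks like in practice, here is a minimal sketch of that class of bug. All class and method names are hypothetical; this is not the acd-cli server code, and the interleaving is simulated sequentially rather than with real concurrent requests.

```python
class BuggyAuthServer:
    """Keeps the in-flight OAuth result in ONE slot shared by all requests."""

    def __init__(self):
        self.pending_token = None  # global state: the bug

    def oauth_callback(self, user):
        # The provider redirects back with this user's freshly issued token.
        self.pending_token = f"token-for-{user}"

    def fetch_token(self, user):
        # The client collects "its" token, but the slot is not per-user.
        return self.pending_token


class FixedAuthServer:
    """Keys every in-flight result by a per-request session id."""

    def __init__(self):
        self.pending = {}

    def oauth_callback(self, session_id, user):
        self.pending[session_id] = f"token-for-{user}"

    def fetch_token(self, session_id):
        return self.pending.pop(session_id)


buggy = BuggyAuthServer()
buggy.oauth_callback("alice")
buggy.oauth_callback("bob")             # bob's callback lands before alice collects
alice_got = buggy.fetch_token("alice")  # alice now holds bob's token

fixed = FixedAuthServer()
fixed.oauth_callback("s1", "alice")
fixed.oauth_callback("s2", "bob")
alice_fixed = fixed.fetch_token("s1")   # each session only sees its own token
```

With two logins in flight at the same moment, the buggy variant hands one user a token for the other user's account, which matches the "Not my file" symptom described above.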

I hope this (and the many more examples) put a stop to this "unlimited" bs. You can't say people were abusing a service that throws that keyword for marketing reasons.

That is very selective of them. While their marketing materials said "unlimited", people chose to ignore the ToS which stated that they wouldn't tolerate abuse and that abuse was basically whatever they determined it to be.

One guy in particular admitted to having 1PB stored. People like him fucked the rest of us over.

Yes.. but them not having an upper limit doomed "the rest of you" from the beginning. Is anyone surprised some would do that? Is Amazon? Should they be? Of course not..

Ouch, reading those comments, even by the OP... the writing was on the wall then even

Who is now at 1.5PB, while someone else replies to him who has a 1.4PB flair, and another has a 1.1PB flair...

Looks like it's really the plex people to blame. They were hosting tons of TBs of pirated movies/tv shows.

And why is that a problem? Copyright is theft.

Corporations see "complicity in an illegal act" as a negative utility far larger than the ultimate lifetime value of any single customer. So, when you do something illegal (even if for dumb reasons) and use a corporate service to do so, you've got to expect that said corporation will immediately try to distance themselves from complicity in that act by terminating your account with them. This is one of those "inherent in the structure of the free market" things.

Why? Isn't this like a private storage? Unless people are sharing the files, why should Amazon care what's in the files?

So, first of all I think you're focusing on the wrong thing.

The whole point of an unlimited tier is to attract large numbers of outsiders who don't want the cognitive burden of figuring out $/GB/month and estimating how many GB photos they'll need to store.

What we're talking about here is that they got some customers like that, but they also got a small number of customers taking them for a ride, call them 'power users' the kind of customers who (as we see elsewhere in these comments) won't stick around if the price changes.

There's nothing wrong with these power users storing huge amounts of data at subsidised price, just like there's nothing wrong with Amazon changing the pricing. They just decided to stop subsidising that behaviour and probably take a slight hit on a conversion rate somewhere.

As for your question about 'private' storage, it's a grey area. Privacy isn't absolute, especially in cases where a company is by inaction helping you break the law (whether you agree with the law or not). Companies work very hard to distance themselves from responsibility for their customers' actions and don't want to jeopardise that by letting it get out of hand.

> Privacy isn't absolute, especially in cases where a company is by inaction helping you break the law (whether you agree with the law or not). Companies work very hard to distance themselves from responsibility for their customers' actions and don't want to jeopardise that by letting it get out of hand.

How does this work with Google Play Music (you can upload up to 50k songs for free and listen to it "on the cloud")?

I think you are focusing on the wrong thing. Corporations don't care about the law any more than individuals do. Laws and regulations are just guidelines if you are determined enough to get your way. Look at all the Uber stories. Pretty sure people here still like Travis for his tenacity no matter what you say about his morality.

I think we often forget that humans wrote the laws we have today. They didn't come to us on stone tablets down from the mountain top. At the end of the day, these laws don't matter. They are not written in stone, so to speak. We should always strive to do better. Intellectual property is a sham. I mean think about it. I think there is one legitimate form of intellectual property, the trademark.

I think it is wrong for me to sell "Microsoft Windows" (even if I wasn't charging any money) if I had modified the software and added malware into it. But me watching a movie or reading a book without paying royalties does not hurt anyone.

Please think about it. Just because something is legal does not make it right and just because something is illegal does not make it wrong. We need to calibrate our laws based on our image and not the other way round. We write the laws. The laws don't write us.

> Corporations don't care about the law any more than individuals do.

I'm struggling to find a connection between the points that I made in my comment and the points in your reply. Suspect we have some miscommunication here... my own comment wasn't spectacularly well filtered.

I'll bite on these though;

> Laws and regulations are just guidelines if you are determined enough to get your way. Look at all the Uber stories.

Don't conflate civil or criminal law with the work of regulatory bodies, who in my experience with the FCA and OFT are very open and collaborative without any need for "tenacity".

Uber work very hard on marketing and competition, but they are allowed to succeed by regulators who WANT them to succeed, despite their amoral hustle, not because of it. Regulators in my experience (the FCA and OFT specifically) are very open and collaborative. They understand that markets move on and regulations sometimes lead and sometimes follow.

> Please think about it. Just because something is legal does not make it right...

So, I'm assuming from this comment that you're quite young. Just for your information: I suspect most folks on HN are already aware of the delta between legality and morality.

I'd also recommend thinking about the subjective nature of morality, and the causes and malleable nature of it.

Hi. Out of interest, what is it that you do in your life to generate income / money for food?

I am a programmer (:

I call it the Dropbox Mantra.

Amazon recently disabled acd-cli and rclone from accessing their services "for security reasons." I see that acd-cli is back while rclone remains effectively banned. Acd-cli and rclone truly had poor implementations. Though, the timing is suspicious for them not to allow rclone again if they implemented the service securely again.

My guess is that Amazon had more datahoarders than average-joe users and so the low-volume users didn't outweigh or pay for the heavy users like they originally estimated when they set the price for the service. It was good while it lasted.

> My guess is that Amazon had more datahoarders than average-joe users

I would further speculate that plex users were the largest single group of offenders. seemed like a cat and mouse game for a while -- amazon started comparing hashes of files to known bootlegs and banning accounts, so everyone started using encfs, and later migrated to the unlimited plan from google apps. I guess google's the only game in town, now.

I don't think Plex ever really worked with amazon. Plex advertised it at first but it didn't end up working out, sadly. See https://techcrunch.com/2016/12/02/amazon-isnt-playing-nice-w...

this is incorrect, plex works fine with amazon.[1][2][more on request]

the issues being described in the link you posted may refer to early-release bugs, users who were getting throttled or nuked due to the anti-piracy efforts I referred to earlier, or any number of things (certain kinds of transcoding, maybe?) -- it's a pretty vague article! but plex and amazon can definitely be integrated.

1. https://amc.ovh/2015/08/13/infinite-media-server.html

2. https://www.reddit.com/r/PleX/comments/58uhmo/guide_to_using...

Those links are from a while ago. You linked to a FUSE solution, which isn't first-class and you need a computer for it. You're right, it probably would "work" with that, but I would say that's an outlier solution.

It doesn't work with plex's cloud feature: https://www.plex.tv/features/cloud/ - they removed it from the list there. This would have gotten a lot more users

that is correct, plex cloud is not the preferred solution for data hoarders using plex. plex cloud only supported amazon drive until jan 1st of this year[1] and all existing accounts were grandfathered in. that was a pretty short period of time but probably enough to create a problem. plex cloud is beside the point, though, because it isn't what people use for this. they used amazon cloud drive with fuse.

regarding fuse:

> you need a computer for it

well, not really, only to the extent that a VPS is a computer.

> I would say that's an outlier solution

an outlier solution for an outlier problem (can we agree to call that people storing 100+ TB of files?). except the problem was seemingly large enough that they had to get rid of it, so maybe it's not fair to call it an outlier. I don't think "first class" is a concern for people with such ridiculous amounts of data. plex cloud just makes things simpler, but running plex on a VPS takes two commands and there are some pretty detailed guides out there for people who don't know what ssh or digitalocean is. it's at a point now where there's even a platform for automating this stuff, complete with fancy dashboard etc.[2] needing to use things like fuse and encfs is hardly a barrier.

people talk about this stuff a lot more publicly than I would have thought, in places like /r/datahoarders, /r/plex, as well as the lowendbox, quickbox, and torrent tracker forums.

1. https://support.plex.tv/hc/en-us/articles/203082447-Supporte...

2. https://quickbox.io/

Yeah, a VPS is a computer, and plex cloud doesn't require one. But that doesn't matter, I think we're both correct, although I don't think as many people run VPSes for this as you seem to think. Not saying it's not easy, it's just not the usual way I've seen people use plex. But it doesn't matter, really.

Thanks for the links, quickbox does look neat. I've been looking to get a media server for my plex stuff and it seems to support a lot.

> [...] everyone started using encfs

Interesting. I wonder if this indirectly expedited the price hike? Encryption would make it (practically) impossible for Amazon to deduplicate people's data and store it more cost efficiently.
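A small sketch of why client-side encryption defeats deduplication, assuming the provider dedups by content hash (an assumption; nothing here describes Amazon's actual storage layer). `DedupStore` and `keystream_xor` are hypothetical names, and the cipher is a toy stand-in for something like encfs, not secure encryption.

```python
import hashlib


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy cipher for illustration only (NOT secure): XOR with a SHA-256 keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class DedupStore:
    """Content-addressed store: identical blobs are kept only once."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        return digest

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


movie = b"same popular file " * 1000

plain = DedupStore()
plain.put(movie)  # user A uploads the file as-is
plain.put(movie)  # user B uploads the identical file

encrypted = DedupStore()
encrypted.put(keystream_xor(movie, b"key-of-user-a"))
encrypted.put(keystream_xor(movie, b"key-of-user-b"))
```

Here `plain.stored_bytes()` equals one copy of the file, while `encrypted.stored_bytes()` is two full copies, because different keys produce unrelated ciphertexts with different hashes.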

As anecdotal information: I know a community of people who share movies. In the past few months, abusing ACD to move their personal, encrypted movie collections (which go up to many terabytes) was definitely a fad. There was automation written to allow for streaming movies directly from ACD. A popular resource point was https://www.reddit.com/r/PlexACD/comments/6bmt9s/a_framework...

As long as Amazon's API could allow for accessing the storage space in a drive-like manner, they were open for abuse. For me the writing was on the wall.

There was a thread recently about how acd-cli users were gaining access to other people's data - it was a legit concern, I think.

Amazon was completely right for banning acd_cli.

What they were doing is using a server in the middle for auth. But oh god the server started handing out the wrong keys so you might end up having total control of another acd_cli users account.

And when acd_cli got banned they (not acd_cli themselves but users that modified the source) just used rclone's app ID and keys to trick Amazon into thinking acd_cli was rclone. Then rclone got banned.

They (amazon) did say the problem with rclone was the keys were public since it is open source.

Ooops, missed this thread. I did a little mini-writeup of what happened on a different branch if anyone is curious on the details: https://news.ycombinator.com/item?id=14512598

Yea it's been great for my family. We've been backing up family photos and movies to it with acdcli. Close to 1TB of all that content there, so we're fine for now but I'm probably going to end up doing something else for the longer term.

I'm a customer and they didn't even send out an email about it. And no grace period? Effective today and you'll lose your data if you don't pay up?

This should be illegal, I'm never trusting Amazon with my business again.

You don't lose your data, you just lose the ability to upload more. You can still view, download, and delete your data.

8) What happens to my content if I choose not to renew into one of the new storage plans?

When your paid storage subscription expires, your account will be considered in an over-quota status if your content stored is greater than the free storage quota on your account. If your account is in an over-quota status, you will not be able to upload additional files, and can only view, download, and delete content.

And the very next paragraph

  You have a 180-day grace period
  to either delete content to
  bring your total content
  within the free quota, or to
  sign up for a paid storage plan.
  After 180 days in an over-quota
  status, content will be deleted
  (starting with the most recent
  uploads first) until your
  account is no longer over quota

So if you have more than 30TB stored (the new maximum), you have 180 days after your current subscription expires to get it off before it's auto-deleted.

Out of curiosity, what would you expect them to do with your data if you don't have a subscription? Keep it forever?

being that you put that data there with the understanding that you could keep it forever - yes...

You expect a company to continue to provide a service for you after you stop paying them?

I expect a company to abide by the terms of the deal that I entered into, not to change the deal. Pray that they do not alter it further.

The deal that you entered into had a set duration, and it was only binding for the set period of time for which you paid. Once the money/time runs out, you'll need to renegotiate a deal, which Amazon has done.

From amazon's release: "Current customers will keep their existing unlimited storage plan through its expiration date."

Which means they are honoring the deal and not altering it and are abiding by the terms agreed to.

Six months is a generous time limit; I can think of dozens of services I've used in the past who changed their terms or went out of business, and the customer was given 30 days or less to migrate.

I'm not trying to be an Amazon apologist, just being realistic.

Ah, missed that part, maybe I flew off the handle a little then. Still really scummy not to alert their customers about a big change like this.

Looks like I have 180 days to move before getting deleted, guess I'd better pray they don't decide to cap download speeds...

I don't think you should worry about overreacting. I'm working on an e2e encrypted sync/backup service right now and I would never dream about pulling off such a move. But I'm a pussy and not a "fierce businessperson". Long story short, Amazon can afford to not give a fuck about losing the trust of the users of one of their small satellite services, maybe you should consider a business whose trustworthiness affects their bottom line.

180 days to move after your 1 year plan expired. If you signed up for your plan 364 days ago, then that's really unfortunate timing. Otherwise, you have more than 180 days.

I also didn't get any e-mail. I recently started using it as my backup solution with duplicity. Feels good not to be worried about TBs stored.

I thought that while a random backup company may not understand the full implications of offering unlimited storage, Amazon is the one that should, and that worst case I'd get my upload throttled.

Well I guess it was a bit too good to be true.

I'm an unlimited storage customer and I just found out about this change here on HN, not happy about that..

I will say that even at $60/yr for 1TB and unlimited photos, it's still a nice deal.

I'm glad that this happened in my trial period. I haven't yet spent the time making this a core part of my backup strategy.

I posted this link because I got an OSX notification through their Mac app, oddly enough.

>> Do Prime members still get unlimited photo storage?

> Yes.

Heh. Has anyone tried writing an encoder to make files look like photos, and uploading them to any of these "unlimited photo" services? I am sure they are probably watching for it and will close the account.

Then what if there is a way to gradually mix in data with actual photos, a bit like in steganography, but more aggressive? And create what still looks like photos, just really noisy ones.

Dropbox did a thing a few years back where they gave you permanent additional storage for uploading photos. A jpeg header followed by gigabytes of zeroes supposedly worked just fine.

This is very easy to catch, though. Abusing comment fields might be more interesting. Use a small valid image, and insert 4MB of data into a comment field. It would be a fun project to implement this as a FUSE file system. I'm not sure whether any of the popular formats support arbitrary-length comments, and whether or not Amazon strips such data (it's image storage, after all, not file storage).

Already exists, see StegFS: https://albinoloverats.net/projects/stegfs

I wouldn't use steganography (it's not normally very space-efficient), just dump the payload data into a PNG's comment field. You can do this with exiftool "-Comment<=/path/to/inputfile" dummy.png and re-extract it with exiftool -b -Comment dummy.png > recovered_file. Of course you'd use libpng directly in a FUSE filesystem, but this shows that it's easy enough. Use a 20kB cat picture and 2 MB payload and you achieve 99% space efficiency.

It's probably inconvenient enough for most data hoarders for Amazon not to care. You can still switch to encoding it in the pixel values in case they do. I'm not familiar enough with PNG to comment on the space-efficiency of that.
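The same idea can be sketched without exiftool by writing PNG chunks by hand. Note the swap: instead of the textual comment field, this sketch stores the payload in a private ancillary chunk, which takes arbitrary binary data. The chunk type `stOr` and all function names are made up for illustration, the 1x1 image stands in for the cat picture, and whether any photo service keeps or strips such chunks is unknown.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(kind: bytes, body: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + kind + body + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """A valid 1x1 8-bit grayscale PNG used as the carrier image."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return PNG_SIGNATURE + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")


def embed(png: bytes, payload: bytes, kind: bytes = b"stOr") -> bytes:
    """Insert the payload as a private ancillary chunk just before IEND."""
    iend = chunk(b"IEND", b"")
    if not png.endswith(iend):
        raise ValueError("carrier does not end with an IEND chunk")
    return png[:-len(iend)] + chunk(kind, payload) + iend


def extract(png: bytes, kind: bytes = b"stOr") -> bytes:
    """Walk the chunk list and return the data of every chunk of type `kind`."""
    pos, found = len(PNG_SIGNATURE), []
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        if png[pos + 4:pos + 8] == kind:
            found.append(png[pos + 8:pos + 8 + length])
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return b"".join(found)


def efficiency(carrier_bytes: int, payload_bytes: int) -> float:
    """Fraction of the uploaded file that is actual payload."""
    return payload_bytes / (carrier_bytes + payload_bytes)


payload = bytes(range(256)) * 4
stuffed = embed(tiny_png(), payload)
recovered = extract(stuffed)
ratio = efficiency(20_000, 2_000_000)  # 20 kB carrier, 2 MB payload
```

The ratio for a 20 kB carrier and a 2 MB payload comes out at roughly 0.99, in line with the 99% figure quoted above (ignoring the 12 bytes of chunk overhead).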

Check out OpenPuff (uses the open source libObfuscate) and StegoShare (open source):



I don't think they are watching for it, it is too uncommon and inefficient.

Heh, back when Gmail was new and outrageous, someone wrote a GmailFS script or plugin or something, letting you store your files in a Gmail account. I guess it was one of the first cloud storage systems.

I wonder if it would be possible with machine learning to find out how data can be recovered after re-encoding an image using different encoders. This would ensure no loss of data. Least significant bit steganography must be very fragile.

Some other methods are listed here: https://en.wikipedia.org/wiki/Steganography_tools#Carrier_en...

To link someone else's project: https://github.com/vgel/flickr-fuse "Flickr-fs is a fuse filesystem that allows you to upload your files to flickr as encoded .pngs"

Although it does come with a fair warning that ...

"It's also really slow and liable to explode at any moment, so don't seriously use it."

I find it highly amusing though

not sure if anyone ever formalized it in some sort of library, but yes, it's been done! people were using flickr for storage years ago.[1] I have no idea if it's still possible.


The 'Cloud storage' business model is simply a) get all your data b) make it hard to move it elsewhere c) profit!

Operating a large storage infrastructure is not free of course so it almost cannot be any other way unless everyone decided all at once "hey I'm going to charge what I need to stay in business forever right from the start." Which no one ever does because who signs up for such an expensive cloud storage plan where there are so many cheaper ones to choose from?

I pitched a pseudo peer-to-peer storage plan to some investors once (feel free to pick up the torch and keep running :-) where the company would 'sell' a NAS box to customers that they put on their network and the Internet. 50% of the available storage would be theirs to use, 50% would be used by people off site. The NAS box would encrypt peer-to-peer erasure coded copies of the local storage. If your allocation of storage was 10TB you "got" 10TB of which the most up to date copy was on your 'home' NAS but it kept the 'cloud view' consistent within a few seconds if you had a decent network.

There was a variant where you took less than 50% of the storage and the company would sell the extra to people on a subscription basis and offset the cost of the NAS appliance you had in your house/apt whatever. The 'key value' of the company was this virtual datacenter where all of its gear was distributed amongst all of these individual installations. That needed some interesting capabilities like ship on warn replacement drives to owners etc.

As the available bandwidth to individual houses increases it gets to be a better idea.
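For readers unfamiliar with erasure coding across peers, here is a minimal sketch of the simplest possible variant: k data shards plus one XOR parity shard, so any single lost peer can be rebuilt. The function names are hypothetical, encryption is left out, and a real product would presumably use a stronger code (Reed-Solomon or similar) that tolerates several lost peers.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split into k equal shards (zero-padded) and append one XOR parity shard."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\x00")
    shards = [padded[i * size:(i + 1) * size] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def recover(shards: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one shard (peer) is missing."""
    missing = [i for i, shard in enumerate(shards) if shard is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost shard")
    repaired = list(shards)
    if missing:
        present = [shard for shard in shards if shard is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return b"".join(repaired[:-1])[:length]  # drop parity and padding


data = b"family photos and RAW files"
shards = encode(data, 4)   # 4 data shards + 1 parity shard, one per peer
shards[2] = None           # one peer's NAS goes offline
restored = recover(shards, len(data))
```

The storage overhead here is 1/k, which is much cheaper than keeping full copies on several peers, at the cost of surviving only one failure per group.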

Symform is the company that did this and I loved their model. Sadly, they went out of business. The idea of backing up your data across 38 peers, encrypted, instead of a central cloud storage service was enticing. Especially nice was that you didn't need to pay because you "paid" by also donating local storage. It's a great idea that hasn't taken off yet, not in any way comparable to Google Drive, OneDrive, Dropbox, and others. It's frustrating because "paying" for cloud storage by giving up unused local storage seems like a great alternative.

Well, people might be interested to know that one person has managed to use 1.73 PB+ of the unlimited plan. Looks like that's the highest anyone's managed as far as I know.


Check out the first comment

It's bait and switch. I know in /r/DataHoarder some people have uploaded (allegedly) 1PB of data, which is excessive. However given the growth of 4K video 1TB as a basis does seem low when the whole selling point of ACD was unlimited storage.

Also very unimpressed that they banned rclone's usage of the ACD API and then flat out refused to re-allow the app if the developer changed to not include the secret key in the source code.

Unlimited is a disaster because of media hoarders, but 1TB is slightly too small for me (I'm a serious photographer off hours and work from RAW files). I would love a metered plan, a la S3 but for consumers, that comes out to about $60/y for 2TB for moderate data transfer rates. Then, at the very least, Amazon can do power law scaling for higher users (3TB meters out to $70/y, but 20TB with high transfer comes out to $5000+/year). Then they can slide the metering as storage gets incrementally cheaper. In fact, I would pay $120-$200/y for 2TB if they had a well-supported Linux client.

> Unlimited is a disaster because of media hoarders, but 1TB is slightly too small for me (I'm a serious photographer off hours and work from RAW files).

Amazon Cloud Drive's (still-available) Unlimited Photos plan includes RAW files.

Thanks for the tip!

Dropbox Business does 2TB for $12.50/user/month and does have a Linux client, right?

Dropbox says: "starting at 3 users"
