1. My Time Machine backup (primary backup)
2. BackBlaze (secondary, offsite backup)
3. Amazon Glacier (tertiary, Amazon Ireland region)
I only store stuff that I can't afford to miss on Glacier: photos, family videos and some important documents. Glacier isn't my backup, it's the backup of my backup of my backup: it's my end-of-the-world-scenario backup. When my physical hard drive fails AND my Backblaze account is compromised for some reason, only then will I need to retrieve files from Glacier. I chose the Ireland region so my most important files aren't even on the same physical continent.
When things get so dire that I need to retrieve stuff from Glacier, I'd be happy to pony up 150 dollars. For the rest of it, the 90 cents a month fee is just cheap insurance.
Also the dialog for restoring files is very clear in showing costs vs speed https://www.arqbackup.com/documentation/pages/restoring_from...
Haven't had any problems; I'd recommend it.
1. Synchronization across multiple machines using Bittorrent Sync. 30-day archive on local machines and one remote (encrypted-only). One local machine has the archive set to non-deleting.
2. Time machine backups of my primary machines.
3. Encrypted backups through Arq to OneDrive.
I'll probably soon add encrypted Arq backups to B2 or Nearline. OneDrive is practically unbeatable price-wise. I work at a university, so have the academic discount. That's 65.97 Euro for four years, or ~1.37 per month for 1TB of storage space (though I only use a fraction of that).
For those that don't know, rsync.net now has full, native support for both attic and borg.
Our longstanding "HN Readers" discount makes it very affordable. Just email us.
 Native, as opposed to pointing attic to a local sshfs mount point that terminates at rsync.net, which is what we used to offer to attic/borg users. We don't have a python interpreter in our environment, so it was not easy to provide those tools. We solved the problem by (cx)freezing the attic and borg python tools into binary executables. So, still no python in our environment (reducing attack surface) but the ability to run attic and borg just like they are meant to be run.
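For anyone curious what that kind of freeze looks like in practice, here's a minimal cx_Freeze sketch; the entry-point script, package list and options are my own assumptions for illustration, not rsync.net's actual build:

    # setup.py -- minimal cx_Freeze sketch: freeze borg into a standalone
    # executable so the target host needs no Python interpreter at all.
    # The entry-point script, package list and options are illustrative
    # assumptions, not rsync.net's actual build configuration.
    from cx_Freeze import setup, Executable

    setup(
        name="borg",
        version="1.0",
        description="BorgBackup frozen into a self-contained binary",
        options={
            "build_exe": {
                "packages": ["borg", "msgpack"],   # pulled in explicitly; borg loads pieces dynamically
                "excludes": ["tkinter"],           # keep the build lean
            }
        },
        executables=[Executable("borg_cli.py")],   # hypothetical thin wrapper around borg's main()
    )

Running python setup.py build then produces a self-contained build directory (frozen executable plus the shared libraries it needs) that can be copied to a host with no Python installed.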
It would help if there was a more clear pricing structure.
You can either pre-amortize over the storage period, or you can backload the cost at retrieval time. Furthermore, charging at retrieval time sends a clear message about what type of data this service is to be used for and modifies user behavior.
There is no perfect backup solution, and I'd be surprised if this one failed in my lifetime.
Should go get a coffee.
Google Nearline is a much better option IMO. Seconds of retrieval time and still the same low price, and much easier to calculate your costs when looking into large downloads.
Their pricing is great (0.5c/GB/month) but I'm a little worried about their single DC.
This makes their offer very interesting.
Considering though that before that you couldn't even upload a directory, only files, this is a huge step. :)
Is there any source for the single DC?
EDIT: Nevermind. I did some more googling & found one:
It's almost as cheap as Glacier, but requires no waiting and has no complicated hidden costs, just simply somewhat higher request pricing, a minimum 30 days of storage and an extra $0.01 per GB for data retrievals.
> Standard - IA has a minimum object size of 128KB. Smaller objects will be charged for 128KB of storage.
Rounding up all small files to 128 KB can be a huge deal (rough numbers in the sketch below). I, for example, use Nearline to directly "rsync" my NAS box for offsite backup (yeah I know, I should use something that turns it into a real archive or something, but I'm lazy and Synology has this built in). If those hundreds of thousands of (often) small files were rounded up, S3-IA would easily be more expensive than S3/GCS.
Disclaimer: I work at Google on Compute Engine and am a happy Nearline customer myself.
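To put rough, made-up numbers on that 128 KB floor (purely illustrative, not anyone's real bill):

    # Rough illustration of the S3-IA 128 KB minimum object size.
    # All figures are made-up assumptions, not anyone's actual bill.
    n_files = 300_000        # "hundreds of thousands" of small files
    avg_size_kb = 20         # assume a 20 KB average file
    floor_kb = 128           # S3-IA bills each object as at least 128 KB

    actual_gb = n_files * avg_size_kb / 1024 / 1024
    billed_gb = n_files * max(avg_size_kb, floor_kb) / 1024 / 1024
    print(f"stored: {actual_gb:.1f} GB, billed as: {billed_gb:.1f} GB "
          f"({billed_gb / actual_gb:.1f}x)")
    # -> roughly 5.7 GB of data billed as ~37 GB, a ~6x markup on these numbers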
Glacier, as the article points out, is difficult to grok because there are provisioned bandwidth costs.
Nearline is difficult to grok because it simply caps your bandwidth at 4 MB/s per TB stored. It starts at 1 MB/s.
Basically, both are optimized to make retrieval slow and/or expensive, because the cost optimizations they're doing internally incentivize very cold data, very big data, or both (see the back-of-the-envelope sketch below).
Disclaimer: I work on Compute Engine (GCE) but not Cloud Storage (GCS).
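As a back-of-the-envelope illustration of the cap described above (my reading of the numbers, not an official Nearline formula):

    # Back-of-the-envelope restore time under a "4 MB/s per TB stored,
    # starting at 1 MB/s" read cap, as described in the comment above.
    def restore_hours(stored_tb: float) -> float:
        throughput_mbps = max(1.0, 4.0 * stored_tb)   # capped read rate, MB/s
        total_mb = stored_tb * 1024 * 1024            # TB -> MB (binary, close enough)
        return total_mb / throughput_mbps / 3600      # seconds -> hours

    for tb in (0.06, 1, 10):
        print(f"{tb:>5} TB stored -> ~{restore_hours(tb):.0f} h for a full restore")
    # ~17 h for a ~60 GB store; ~73 h for anything over 0.25 TB, since the
    # cap scales with how much you have stored.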
It does go against their slogan "only pay for what you use", though. You're not using that 15GB/hour for 744 hours, and haven't explicitly selected provisioned IOPs.
Not a problem when all you can get is a crappy 8 Mbps ADSL. Oh well, let me whine... ;)
This write up is a pretty good summary of our differences on the compute side: https://news.ycombinator.com/item?id=10922105
Disclaimer: I work on Compute Engine, and specific to that article launched Preemptible VMs.
Since I can't find a way to set my account status to "personal", I'm somehow personally responsible for paying VAT (I wouldn't even know how to do that) if I don't want to risk problems with our version of the IRS. That doesn't sound like a nice way to do my personal backups.
On AWS that's not a problem, Amazon bills and pays the VAT for me.
That's all any of us on Google Cloud can usefully say about the damn VAT issue. We've got people working on it, but it's a Google wide issue (and I honestly don't understand the last mile enough to know why we don't do exactly what you describe AWS doing).
As a side reply to the poster below, it's not true that we're trying to exclude individuals. There are lots of happy, individual customers all over the world. We unfortunately just didn't get out ahead of this personal service / VAT thing as a company. So again, Sorry! And someone is working on it, but don't expect any immediate fixes when taxes/regulations/laws are involved.
From that same page: "Google Cloud Platform services can be used only for business purposes in the European Union."
That said, Google (like most businesses?) hasn't cancelled any product that is making lots of money. At Google, that means Ads, Apps, and yes Cloud. We're here to stay, and unlike on the consumer side Cloud has an explicit deprecation policy for products that have "gone GA" (graduated from Beta).
Big picture items like Compute Engine, Google Cloud Storage, etc. aren't going anywhere. We might deprecate our v1 API after we're on v2 of course, but we're not cancelling the product (again, it makes us money, and that's a pretty key distinction from the consumer products Google has historically shuttered).
Disclaimer: I work on Compute Engine and haven't personally cancelled anything yet ;).
Your comment doesn't really relate to the parent or to this thread, and it asks a question that is trivially easy to answer. There's no point in hundreds-to-thousands of HN readers seeing a comment that can be answered in under a second and that anyone who's used GCE already knows the answer to.
Please consider querying Google, or a relevant knowledge base, before querying a tangentially related HN comment chain.
Disclaimer: I work on Compute Engine.
First of all, I just woke up (it’s morning here in Helsinki) and found a nice email from Amazon letting me know that they had refunded the retrieval cost to my account. They also acknowledged the need to clarify the charges on their product pages.
This obviously makes me happy, but I would caution against taking this as a signal that Amazon will bail you out in case you mess up like I did. It continues to be up to us to fully understand the products and associated liabilities we sign up for.
I didn't request a refund because I frankly didn't think I had a case. The only angle I considered pursuing was the boto bug. Even though it didn't increase my bill, it stopped me from getting my files quickly. And getting them quickly was what I was paying the huge premium for.
That said, here are some comments on specific issues raised in this thread:
- Using Arq or S3's lifecycle policies would have made a huge difference in my retrieval experience. Unfortunately for me, those options didn't exist when I first uploaded the archives, and switching to them would have involved the same sort of retrieval process I described in the post.
- During my investigation and even my visits to the AWS console, I saw plenty of tools and options for limiting retrieval rates and costs. The problem was that since my mental model had the maximum cost at less than a dollar, I didn't pay attention. I imagined that the tools were there for people with terabytes or petabytes of archives, not for me with just 60GB.
- I continue to believe that “starting at $0.011 per gigabyte” is not an honest way of describing the data retrieval costs of Glacier, especially when the actual cost is detailed, of all things, as an answer to an FAQ question. I hammer on this point because I don't think other AWS products have this problem.
- I obviously don't think it's against the law here in Finland to migrate content off your legally bought CDs and then throw the CDs out. Selling the originals, or even giving them away to a friend, might have been a different story. But as pointed out in the thread, your mileage will vary.
- I am a very happy AWS customer, and my business will continue to spend tens of thousands a year on AWS services. That goes to something boulos said in the thread: "I think the reality is that most cloud customers are approximately consumers". You'd hope my due diligence is better on the business side of things, as a 185X mistake there would easily bankrupt the whole company. But the consumer me and the business owner me are, at the end, the same person.
Amazon has done a great job with this feature. By doing a poor job implementing something for an extremely narrow use case, in a technology that is outdated, and then providing the most complicated pricing structure surrounding every aspect of the product, one can't help but use the feature: any other provider or service.
Like, wtf would be the use case for Amazon Glacier in 2016? I don't think I would put hundreds of petabytes of data into 20-year cold storage, and the author of this post certainly wouldn't use it again. The fact that I need to read 2 pages of pricing docs, and then the 2 pages you linked to control the costs because I can't estimate them myself, is a sure sign this is absurd.
Not all products are for all people. If you foresee a need to recover a large amount of data all at once, then glacier's not for you. If you might occasionally need a filing from 6 years ago, then glacier would be great.
Amazon starts to charge you extra any time you restore more than 5% of your data.
If, for example, you save all your tax-related documents in Glacier and you are then audited, the accounts department or the government will want all the information. Not 5% of it. Not 10% of it. Everything. At that point Amazon will have you over a barrel, because getting the data out in a reasonable time frame will cost far more than dripping out the data over the course of 20 months.
Isn't one way to get past this... increasing your data usage by 20x? If OP was paying less than $1 a month, then by uploading $20 worth of junk data, his original data would fit within the 5% allowance and come back "for free". Sure, it's $20, but it beats $150+.
Though it's hard for me to imagine having so many categories of unrelated, useful and important data.
Legal issues, at least in the US, do not have those requirements.
I have had situations in the past 12 months where recovering past data would be worth >$1k for the right 100k of data.
Are you sure about that? I haven't worked with tax litigation specifically, but I've worked with e-discovery w.r.t. e-mail and I can assure you that no one ever asks for all the e-mails sent by a particular company over all time. It's always a matter of asking for the e-mails sent by or received by a select group of people, over a fairly discrete time period. For something like this, a Glacier store might make sense, if it was coupled with an online metadata cache stored in e.g. S3.
The government basically comes to you and says they think you owe X, and you have to prove that false to their satisfaction. The more data you give your CPA to work with, the better.
Lots of businesses have data retention requirements, and it can be difficult and time consuming to make sure this data is backed up in a way that is secure and can survive a catastrophe.
The author's use case (and most other personal use cases) might not be a good fit for Glacier, but he's not the target market.
In what use case does the price delta make sense, given a 4-hour feedback loop and all of your important data locked in someone else's data center?
The use case where your data is so properly massive that this makes sense && you don't have the storage infrastructure in place is so narrow that it doesn't make sense.
It is basically one research student's crawl data
Edit: also, S3 is pretty cheap. So again, I don't really see the use case here. How much room is in the market between your own physical or digital system and Amazon S3 or an equivalent? You would have to have a massive amount of data you don't care about and be very price sensitive.
You don't have to pay $150 for retrieval of 60GB. And you don't do long-term storage for X TB / 5 TB * $150. You might have to rent space in someone else's datacenter to put your own external backup... or you could pay Amazon for Glacier and not deal with maintenance etc. Might be worth it even if you have infrastructure for all the data that isn't Glacier-cold.
The data I care about is already backed up on two different multi-TB drives at home, and another one at work.
Glacier is the contingency for "something took out the original data and all three backups in two different locations 7 or 8km apart - if I'm still alive after whatever just happened, I'll consider whether or not to pay Amazon a grand or so to retrieve it quickly from Glacier, or wait ~20 months to get it all for single-digit-dollars".
That gives you 2 backup providers that can durably store everything and it's free and quick to access. Why deal with all the harddrives and glacier?
I believe there are other similar services for Linux or you can just use browser to upload files with Amazon.
So you trust Glacier or Google Nearline (or any similar provider) without testing? No testing ever???
I wouldn't feel happy with my critical data floating around in the cloud, without my checking it at least once a year to make sure it really exists.
And once you do start verifying that data, you will incur all sorts of charges to access it.
This allows you to trivially share, copy, move and retrieve that data quickly as well as fully control who has access and when.
I am sure there are use cases for this, but in a situation where you have petabyte-scale data, it is often the case that you also have the infrastructure to store it. How many places would need to store >5TB of data a week that
* don't have this capability in house
* will almost never need to access it again.
* will not need it in a timely manner, if they do need to access it again.
* don't have the money to implement their dedicated server and storage on site for this purpose.
I am not saying that this rules everyone out, but the prices are so low, and tape must be so annoying, I couldn't imagine why they keep offering this. Obviously, some people must be using it, but in 2016, with storage prices being so low already, I don't know how many places have this amount of data and meet the above requirements.
This is purely based on the author's excellent description of the format in his arq_restore tool: https://www.arqbackup.com/s3_data_format.txt
The idea would be that the data would either never be restored or you could compel someone else to foot the bill or using cost sharing as a negotiation lever. (Oh, you want all of our email for the last 10 years? Sure, you pick up the $X retrieval and processing costs)
Few if any individuals have any business using the service. Nerds should use standard object storage or something like rsync.net. Normal people should use Backblaze/etc and be done with it.
We had a legal requirement to be able to produce up to 7 years' worth of bank statements upon receipt of a subpoena.
Not "reproduce the statements from your transactions records" but "give us a copy of the statement that you sent to this person 6.5 years ago"
We had operational data stores that could generate a new statement for that time period, but if we received the subpoena then we needed to be able to produce the original, that included the (printed) address that we sent it to, etc.
We had (online) records of "for account 12345, on 27th October 2011, we sent out a statement with id XYZ", we'd just need a way to pull up statement XYZ.
There's no way(^) we'd ever get subpoenaed for more than 5% of our total statement records in a single month, so something like Glacier would have been a great fit.
We had other imaging+workflow processes where we'd receive a fax/letter from a client requesting certain work be undertaken (e.g a change of address form). 90 days after the task was completed, you could be pretty sure that you wouldn't need to look at the imaged form again, but not 100% sure. We could have used glacier for that.
One use case that would have cost us (rare, but we needed to plan for it) was "We just found that employee ABC was committing fraud. Pull up the original copies of all the work they did for the 3 years they worked here, and have someone check that they performed the actions as requested."
Depending on circumstances & volume that might trigger some retrieval costs, but the net saving would almost certainly still be worth it.
(^) Unless there was some sort of class action against us, but that's not a scenario we optimised for.
I know it'll take either a lot of time or money to restore from Glacier, but if my home and work backups have both gone I'll either not care about my data any more, or I'll be perfectly happy to throw a grand or so at Amazon to get my stuff back (or, more likely, be happy to wait up to 20 months for the final bits of my music and photo collections to come back to my own drives).
It's even less suited to disaster recovery (unless you have insurance).
Think about it. For a primary backup, you need speed and ease of retrieval. Local media is best suited to that, unless you have an internet pipe big enough for your dataset (at a very minimum, 100 meg per terabyte).
A 4/8 hour recovery time is pretty poor for a small company, so you'll need something quicker for primary backup.
Then we get into the realm of disaster recovery. However, getting your data out is neither fast nor cheap. At ~$2000 per terabyte just for retrieval, plus the inherent lack of speed, it's really not compelling.
Previous $work had two tape robots. One was 2.5 PB, the other 7(ish). They cost about $200-400k each. Yes, they were reasonably slow at random access, but once you got the tapes you wanted (about 15 minutes for all 24 drives) you could stream data in or out at 2400 megabytes a second.
Yes, there is the cost of power and cooling, but the data is fairly cold, so that stays minimal unless you are running at full tilt.
We had a reciprocal arrangement where we hosted another company's robot in exchange for them hosting ours. We then had DWDM fibre to get a 40 gig link between the two server rooms.
Yes, the docs are imperfect (and were likely worse back in the day). And it was compounded by the bug, apparently. But it's what everyone on HN has learned in one way or another... RTFM.
Was it mentioned in the article that the retrieval pricing is spread over four hours, and that you can request partial chunks of a file? (There's a ranged-retrieval sketch further down.) Heck, you can always retrieve all your data from Glacier for free if you're willing to wait long enough.
And if it's a LOT of data, you can even pay and they'll ship it on a hardware storage device (Amazon Snowball).
Anyone can screw up, I'm sure we all have done, goodness knows I have. But at the very least, pay attention to the pricing section, especially if it links to an FAQ.
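If you do want to spread a restore out so the billed peak rate stays low, Glacier's InitiateJob accepts a megabyte-aligned RetrievalByteRange, so you can pull an archive down in small pieces spaced hours apart. A rough boto3 sketch, with the vault name, archive ID and chunk size as placeholders (and no promise this actually saves money under whatever pricing model is current):

    # Sketch: pull a Glacier archive down in megabyte-aligned ranged jobs,
    # spaced out over time, so the peak hourly retrieval rate (which the old
    # pricing model billed on) stays low. Vault, archive ID and chunk size
    # are placeholders, not values from the article.
    import time
    import boto3

    glacier = boto3.client("glacier")
    vault = "my-backup-vault"                  # placeholder
    archive_id = "EXAMPLE-ARCHIVE-ID"          # placeholder
    archive_size = 4 * 1024**3                 # assume a 4 GiB archive
    chunk = 256 * 1024**2                      # 256 MiB per job (MiB-aligned)

    for start in range(0, archive_size, chunk):
        end = min(start + chunk, archive_size) - 1
        resp = glacier.initiate_job(
            accountId="-",
            vaultName=vault,
            jobParameters={
                "Type": "archive-retrieval",
                "ArchiveId": archive_id,
                "RetrievalByteRange": f"{start}-{end}",
            },
        )
        print("requested", resp["jobId"], f"bytes {start}-{end}")
        time.sleep(4 * 3600)   # space the requests out; a cron job is more realistic
    # Each job becomes available ~4 hours after it's requested; poll with
    # describe_job() and download the bytes with get_job_output().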
As far as I can see, in all other places they specify their pricing "per GB", and only this small phrase reveals the real meaning, which is not the actual GBs you transferred, but your peak rate multiplied by the number of hours in the month. IMHO this should be one of the first phrases describing the pricing model.
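For concreteness, here's a simplified sketch of that peak-rate model; the per-GB price, the 720-hour month and the free-allowance handling are assumptions, so check AWS's own FAQ for the authoritative version:

    # Simplified sketch of the old peak-rate retrieval pricing described
    # above: the bill scales with your busiest retrieval hour times the
    # hours in the month, not with the bytes you actually downloaded.
    def glacier_retrieval_fee(stored_gb, retrieved_gb, retrieval_hours,
                              price_per_gb=0.011, hours_in_month=720):
        free_gb = 0.05 * stored_gb                        # 5% of storage free per month
        peak_hourly_gb = retrieved_gb / max(retrieval_hours, 1)
        free_hourly_gb = free_gb / hours_in_month         # pro-rated free rate
        billable_rate = max(peak_hourly_gb - free_hourly_gb, 0)
        return billable_rate * hours_in_month * price_per_gb

    print(glacier_retrieval_fee(60, 60, retrieval_hours=4))    # ~$119: everything in one 4-hour job
    print(glacier_retrieval_fee(60, 60, retrieval_hours=720))  # ~$0.63: the same data dripped out over a month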
If you want more than 5% of your data in a month, the minimum it will cost you is $0.011 per gigabyte. Click here to see how to work out how much it will cost.
It's certainly not unreadable, although to be fair, it's nowhere near as clear as it should be.
'Use Amazon Glacier, it costs about a cent!' is the essential pitch.
Paying 87c a month for a couple of years, then 52c a month for a few more years, to back up 60+GB, and then getting a one-off fee of $150 still averages out at around $2/mo. Hardly getting shafted.
It's like getting a parking ticket for parking somewhere you've unknowingly been parking illegally for years. Yeah, if you average it out, it's cheap. But having to stump up the cash still hurts.
I guess he always had Linus's backup strategy open to him - "Only wimps use music backups: real men just upload their important tunes on Bittorrent, and let the rest of the world seed it ;)"
A couple grand? 500GB hdds have been less than $100 for a long time now, certainly since before Amazon Glacier was a product.
Just because you can formulate a large valuation for something doesn't mean that it's a reasonable valuation.
Even $10/cd is a ridiculous estimate of the cost of blank CDs. If you insisted on using blank CDs for backup, you could do 200 CDs for $40.
Edit: And what are CDs... 640mb? that's like 12 CDs, not 200... I'm realizing that I've fallen into a troll trap...
Meta: I guess I'll leave this comment up as a cautionary tale.
I bought my ~3000 disc CD collection here in Australia, many of them still have price stickers of $30 and more on them. While CDs were "a thing" here, I figure I spent about a quarter of a cheap house on them. (I don't regret a cent though. I'm also "that guy" who thinks "You only had 60Gig of music to back up? Wow!", but is mostly socially-aware enough to keep that sort of reaction to himself.)
Wouldn't 2 x 100GB Blu-ray discs be better for the same price? Less physical storage space required and less time to burn the data.
Snowball currently supports importing data to AWS. Exporting data out of AWS will be supported in a future release.
Well, it was that and also the docs being knowingly and deliberately set up to trick incautious readers. The user does have a responsibility to read the fine print, but that doesn't excuse Amazon being openly evil about it. This is no different than ISPs that advertise "up to 50Mb/s" when they know very well that their network won't deliver more than 5.
You pay a lower per-kilowatt-hour rate, but your demand rate for the entire month is based on the highest 15-minute average in the entire month, then applied to the entire month.
You can easily double or triple your electric bill with only 15 minutes of full-power usage.
I once got a demand bill from the power company that indicated a load that was 3 times the capacity of my circuit (1800 amps on a 600 amp service). It took me several days to get through to a representative that understood why that was not possible.
Has anyone tried this or know of a gotcha that would exclude this?
And I realize that for the OP's situation, it wouldn't have mattered since he thought he was going to get charged a fraction of this.
Additionally, I wouldn't be surprised if the 5% is also based on a storage measurement that is pro-rated for the month. So I would let the 1200 GB of data sit in Glacier Storage for another month before extracting anything, just to be (more) safe.
If you've gotten into Glacier for the wrong reason, you may already be in the trap, and you can either quickly rip yourself free and lose a bunch of skin, spend almost 2 years ever so gently prying yourself free, or maybe find a third way. That's my angle here. Also, traps don't have to be laid deliberately for someone to feel like he's in one, so I'm not putting that on AWS.
The cheapest way out seems to be to just grab 5%/month over 20 months, but that's a lot of sustained effort and contact with the service. So I could see a trick like this as a potential middle ground, at three months and ~$30 according to previous comment's details.
And, there is no "transition from Glacier to S3": if you want to do that, you have to:
1. restore it to S3 (and incur the fees and 4-hour wait)
2. copy the restored S3 object to a new S3 object
3. delete the restored object (or wait for it to timeout)
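In boto3 terms, that workflow looks roughly like this; the bucket and key names are placeholders, and this is a sketch of the steps above rather than a drop-in script:

    # Rough boto3 sketch of the "move an object out of the GLACIER storage
    # class" dance described above. Bucket and key names are placeholders.
    import boto3

    s3 = boto3.client("s3")
    bucket, key = "my-bucket", "backups/archive.tar"   # placeholders

    # 1. ask S3 to restore a temporary copy (this is the ~4-hour Glacier wait)
    s3.restore_object(Bucket=bucket, Key=key, RestoreRequest={"Days": 2})

    # ...hours later, once head_object() shows Restore: ongoing-request="false"...

    # 2. copy the restored bytes into a new object with a normal storage class
    s3.copy_object(
        Bucket=bucket,
        Key=key + ".standard",
        CopySource={"Bucket": bucket, "Key": key},
        StorageClass="STANDARD",
    )

    # 3. the temporary restored copy expires on its own after the requested days;
    #    delete the original GLACIER-class object as well if you no longer want it:
    # s3.delete_object(Bucket=bucket, Key=key)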
These days the infrequent access storage method is probably better for most people. It is about 50% more than Glacier (but still 40% of normal S3 cost) but is a lot closer in pricing structure to standard S3.
Only use glacier if you spend a lot of time working out your numbers and are really sure your use case won't change.
 - 5 cents per 1000 requests adds up with a lot of little files.
And then a customer wanted all their files rather than just one or two. Although that was billed back to them.
Pricing plans FOR B2B should model, as effectively as possible, the underlying costs -- this allows the provider to offer the lowest possible pricing for the services that cost them the least to provide, with expensive services priced accordingly. As others have mentioned on this thread, utilities are really, really good at this -- they come up with extremely complex rate plans for their largest customers that help them achieve whatever economies they are aiming for, for example incentivizing customers to provide level-loading (which is effectively what Amazon is doing in this retrieval scheme).
That's a manufactured excuse for a fundamentally bad product API and pricing structure. When Dropbox is more useful than AWS, Amazon has screwed the pooch (which they do pretty often). Segmenting users by arbitrary, circuitous logic into "consumers" (can't find a good use for it) and "enterprise" (can find a good use for it) isn't constructive. Both classes should avoid it, because it's not even an inexpensive choice for what you get.
Adding on your utility company meme, there are different rate schedules for residential and business customers in most places. In Cloud, I think negotiated contracts with advanced customers probably make more sense than complex, unfriendly pricing models for everyone.
Big disclaimer: I work on Compute Engine.
That's something that generally keeps me from using AWS and many other cloud services in many cases: the inability to enforce cost limits. For private/side project use I can live with losing performance/uptime due to a cost breaker kicking in. I can't live with accidentally generating massive bills without knowingly raising a limit.
I have not tried variety of AWS services because I have no idea what it would cost me if something went haywire on my server.
If I could simply deposit a prepaid amount with Amazon and have it draw on that deposit until it's depleted, after which the services I have would grind to a halt, that would be a perfect way for me to try it.
For other services it is easier, but even then, setting up and managing my own cost control mechanism is a level of complexity (and risk of failure) I'd really want to avoid, esp. since I probably use AWS to avoid management overhead.
I would be a lot more worried about a risk of over-charging myself if AWS wasn't incredibly good about refunding accidental overages.
were linked elsewhere in the comments.
Edit: By user?id=re.
My only experience of using boto was not good. Between point versions they would move the API all over the place, and, being Amazon, some requests take ages to complete.
After that I worked with Google APIs, which were better, but still not what I'd describe as fantastic (hopefully things have gotten better over the last 2 years).
Does s/he substantiate this claim in any way? AFAIK glacier's precise functioning is a trade secret and has never been publicly confirmed.
As noted by others here, if you treat glacier as a restore-of-absolute-last-resort, you'll have a happier time of it.
Perhaps I'm being churlish, but I railed at a few things in this article:
If you're concerned about music quality / longevity / (future) portability - why convert your audio collection to AAC?
Assuming ~650MB per CD, and the 150 CD's quoted, and ~50% reduction using FLAC, I get just shy of 50GB total storage requirements -- compared to the 63GB 'apple lossless' quoted. (Again, why the appeal of proprietary formats for long term storage and future re-encoding?)
I know 2012 was an awfully long time ago, but were external mag disks really that onerous back then, in terms of price and management of redundant copies? How was the OP's other critical data being stored (presumably not on Glacier)? For example, my photo collection has been larger than 60GB since way before 2012.
Why not just keep the box of CD's in the garage / under the bed / in the attic? SPOF, understood. But world+dog is ditching their physical CD's, so replacements are now easy and inexpensive to re-acquire.
If you can't tell the difference between high-quality audio and originals now - why would you think your hearing is going to improve over the next decade such that you can discern a difference?
And if you're going to buy a service, why forego exploring and understanding the costs of using same?
I did a comparison between FLAC and ALAC (a.k.a. Apple Lossless) on my CD library a few years ago (plus a few 48kHz tracks taken from DVDs), and the difference in total filesize was less than 10% so I doubt that is a major factor. I personally went for ALAC, as it has equal (EAC, VLC) or better support (OS X Finder, iTunes, Windows Explorer, Windows 10 media player, some tagging scripts, iOS) in stuff I currently use. Providing I keep a decoder with the files, its proprietary nature doesn't really bother me - I can always convert to xLAC if desired.
I wouldn't use a proprietary format because I could never be sure when in the future I'd want to read / re-encode, or what type of systems I'd have available at that time, other than knowing I'd always have access to free software.
I have some FLAC archives, but I don't use them - so support to play that format hasn't been something I've taken much notice of. Do you normally play your ALACs, or keep an mp3 / ogg / aac version around to actually listen to?
Also, someone has wrapped it to build with different tool chains.
They could be running Glacier storage at cost (or even a slight loss).
But they make their profit when you try to get your data out.
I'm not implying anything nefarious - more along the lines that Amazon (could have!) looked at the market and compared demand to what they were offering. Then found something that scratched an itch...
They didn't just set the prices (or the pricing model) to amuse themselves.
I'm really doubting the need for a maintenance regimen on a drive which is almost entirely unused. Could have spent $50 on a magnetic-disk-drive and saved yourself hours worth of trouble.
Does magnetic media like this (especially spinning disk) suffer from bit-rot? What about the possibility of mechanical failure?
I'd never rely on mechanical disks as the one and only backup of any data critical to me - a two tier approach of mechanical for fast retrieval, and cloud/online backup seems to be the safest bet.
I currently have 100gb of photos on Glacier. I am going to be finding another hosting provider now.
I ended up using a couple of cheap VPSes, located in two different countries. And it's still cheaper than, say, Dropbox.
And I don't pay the overhead of an add-on service.
Also I back up stuff that's not on my hard drive (only on external USB drives) and I'm not sure how the services handle that.
If the services give me some of these points, that's not sufficient; they would have to give me all of these points. Only then would I consider them. All things being equal I'd be willing to pay for some convenience but my current solution is all scripted so it's pretty darn convenient.
I can see why detailed control would be one reason, but you could still just have a very controlled backup to your own storage location(s) as a first step and just let a backup service bulk store your already named and encrypted files? It's only the last-resort you need to go to so if it's a huge blob of encrypted data that shouldn't matter too much -- you only need to access that in case of a total disaster where you lost all your own backup endpoints first.
> And I don't pay the overhead of an add-on service.
The reason I'm asking is because I was under the impression that backup services are much cheaper than pure storage, while still offering some conveniences such as versioning/backup apps. Glacier charges $0.007 per GB per month, that's $7/month just for a single 1TB machine, just for a single version (If my math is correct, it's early)! If you have dozens of versions it quickly adds up.
I do 10 machines at around 1TB on average, unlimited storage in unlimited versions, at $1.25 per machine per month (flat rate, regardless of storage volume). I have tried building my own machines, tried looking at storage providers etc., but can't get near.
Even if I did only 1-2 machines, the cost in Glacier would break the backup service cost already at a couple of TB total storage.
I back up data, not machines. I would prefer that third party backup software not have access to the unencrypted files on my machines.
The pricing you mention is compelling. Which service is that? Does your model work if the data is on multiple external drives?
Getting down to a specific example, say I have one laptop and three external drives. Do the backup services you have in mind work with this setup? How would they charge?
As long as you just keep the external drives connected to the same machine during backup, the backup application doesn't care where the file set to back up is mounted. It's just a directory list per machine.
I used to back up "data" too, with a first step of backing up multiple machines to a NAS and then only backing up that data to the cloud. However, I sacrificed that extra security for the added convenience of directly restoring individual files and machines. It also reduced the risk of me having made a mistake in the backup config of the first step (which I estimated to be a way higher risk than hardware failure, fire or a data breach at the cloud provider, since I have basically never configured anything right). Being able to easily fetch an individual file from 1 day or 1 week ago can really save time. Edit: also remember that the backup client on Linux uses RAM proportional to the backup size, so on my cheapo NAS I outgrew its RAM and would have had to get a faster one or a file server; that was also partly why I left that setup.
The good thing about the backup pricing is that it doesn't increase with aggregated backups like normal storage. It's 150/year, so to be competitive you need a few TB, but you can very easily back up your parents' machines too and save some time around Christmas... Note though that people sharing the same plan can see each other's data (or so I presume).
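A quick breakeven sketch using the numbers quoted in this thread (treat both figures as assumptions):

    # Breakeven sketch: flat-rate backup plan vs. Glacier-style per-GB
    # pricing, using the figures quoted in this thread as assumptions.
    flat_rate_per_month = 150 / 12     # ~$12.50/month for the backup plan
    glacier_per_gb_month = 0.007       # $/GB/month

    breakeven_tb = flat_rate_per_month / (glacier_per_gb_month * 1000)
    print(f"flat plan is cheaper above ~{breakeven_tb:.1f} TB backed up")
    # -> ~1.8 TB, which lines up with "you need a few TB to be competitive"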
removed material that might have derailed things further.
I don't even use my collection now that I have Spotify Premium. The only music I've bought lately is some 24bit high bitrate stuff.
This is known as "Format Shifting" — taking one copyrighted medium and converting it to another. In Australia, you are explicitly not allowed to do this with CDs, DVDs and Blu-rays.
You are only allowed to keep a digital copy if you continue to retain the original — a backup. If the original is lost or destroyed, your digital copies must be discarded.
For example, you can rip a CD and put it on your iPod, or computer, as long as you continue to own the CD. The issue here is that in both cases you also control the device you are copying it to. You don't control but rather lease space on Amazon's servers — so it introduces a grey area on whether you are allowed to backup to such places and whether putting data on those servers constitutes distribution of the copyright material.
Realistically, none of this is black-and-white and Amazon could flag it as infringing content and remove it just to cover themselves against DMCA complaints anyway. This is true in both Australia and the US, regardless of the differences in copyright law (Australian copyright law offers far fewer protections than the US, incidentally), because both have similar DMCA-style laws.
Also backed up by:
http://copyright.com.au/about-copyright/exceptions/ - which states we're allowed to "space shift" music.
And then posting this fact to hacker news. Not the brightest bulb in the pack.
1) Buys music CD.
2) Rips CD to own computer. Shares with own devices [via personal cloud] for continued listening.
3) Destroys original CD.
4) Continues to listen to music which has been paid for; not sharing files with anybody else.
I for one fail to see the problem here. But then again, I'm probably not the brightest bulb in the pack either.
It would be easier to download pirated copies off the internet and exactly as legal.
Mine are all in a box shoved against the wall under my bed, where they don't get in the way but I can prove that I still deserve consideration under the First Sale doctrine.
I'm surprised that this aspect has not been mentioned here in the comments yet:
> I was initiating the same 150 retrievals, over and over again, in the same order.
This was the actual problem that resulted in the large cost.
At my old job we would get a lot of complaints about overage charges based on usage to our paid API. It wasn't as complicated of pricing as a lot of AWS services, just x req / month and $0.0x per req after that, but every billing cycle someone would complain that we overcharged them. We would then look through our logs to confirm they had indeed made the requests and provide the client with these logs.
Except that it wasn't. The repeated requests were free, because he already set the maximum rate with the first wave of requests. Surprising?
Also, it really is not a hit piece. It's an honest report of what he did (and what he did wrong) and that he thinks the docs aren't as clear as they could be.
>> This was the actual problem that resulted in the large cost.
That is not true, according to the article itself. The first request for the full 60.8GB already results in a $154.25 bill, regardless of the ones that follow. From that point on he can continue to retrieve 15.2GB/hour for the rest of the month without incurring further costs.