I rely on Restic at the moment, which seems to need fast read access to its data, but its incremental snapshotting is great. It'd be ideal if I could find something like that supporting these "cold storage" solutions.
There are third parties that will do it for you (Iron Mountain is at least one), but that's an extra cost and Google takes no responsibility for it. I assume this is an example of a place where Amazon is able to leverage its holistic business, with a cloud service that can also take advantage of its physical logistics system. Google's service here is significantly cheaper and has some nice features, but even if it's not worth a $4 vs. $1.23 (per TB/month) premium for Amazon, I could definitely see continuing to pay Amazon some premium ($2 vs. $1.23, say) for that alone anywhere with limited high-speed WAN availability.
We also have a Transfer Appliance, which comes in two sizes (100 TB and just under 500 TB). We don't currently support shipping one filled up with your data for recovery/export, though.
You can also request a "B2 Fireball" from them. It's basically a small array that they mail to you for $550 with 70 TB of storage. You fill it up and send it back to them within the month, and they'll load the data into your account.
https://www.backblaze.com/b2/cloud-storage-pricing.html (bottom of the page)
Amazon's equivalent to the B2 Fireball is "AWS Snowball" (amusingly enough; not sure if there's a bit of fun name-riffing between the two here), with a service fee of $200 for the 50 TB device or $250 for the 80 TB device, plus $15/day for any on-site days after the first 10.
It's interesting how the pricing mix shakes out on this feature, though. Amazon offers lower potential ingress pricing depending on your use, but notably, if you kept the Snowball a whole month the pricing would get very close to the Fireball's (+20 days @ $15/day brings the price to $500/$550 respectively, though the former with 20 TB less and the latter with 10 TB more).
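A quick back-of-the-envelope comparison in Python, using only the prices quoted above (the variable names are just for illustration):

    # Cost of keeping each device on site for 30 days, per the quoted prices.
    FIREBALL_FEE = 550                   # B2 Fireball, 70 TB, up to a month
    SNOWBALL_FEES = {50: 200, 80: 250}   # Snowball service fee by size (TB)
    EXTRA_DAY = 15                       # per on-site day after the first 10

    for tb, fee in SNOWBALL_FEES.items():
        total = fee + 20 * EXTRA_DAY     # 30 days on site = 20 billable days
        print(f"Snowball {tb} TB: ${total} (${total / tb:.2f}/TB)")
    print(f"Fireball 70 TB: ${FIREBALL_FEE} (${FIREBALL_FEE / 70:.2f}/TB)")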
Backblaze and Google are both much cheaper to get data out of, though; Amazon's Glacier and its descendant services remain very much deep-freeze focused.
There may also be a minimum storage period, like Amazon has.
Let's wait and see.
The actual product is pretty painful when you need to do a recovery, especially if you don't know where the file lived on disk. I haven't tried the newer Arq Cloud Backup destination to see if it improves the search experience.
That said, my experience is from more than a year ago, and I would try it again if they were able to bring their search on par with current consumer backup offerings.
If you only back up from a single machine, it has a local cache of already-backed-up data. This has the large advantage that it basically only needs to push the delta to the remote, not do any kind of synchronization to check what is already there; a sketch of the idea follows.
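A minimal sketch of that local-cache trick (hypothetical names; not the tool's actual implementation):

    import hashlib

    # Local cache of hashes of chunks already on the remote. With it, only
    # genuinely new chunks are uploaded, and no remote listing is needed.
    seen = set()

    def upload(digest, chunk):
        print(f"uploading {len(chunk)} bytes as {digest[:12]}...")  # stand-in

    def backup(chunks):
        for chunk in chunks:
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in seen:       # decided locally, no round-trip
                upload(digest, chunk)
                seen.add(digest)

    backup([b"hello", b"world", b"hello"])   # "hello" only uploads once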
... with one very big difference - you can only point borg at an SSH host. You can't point borg at S3 or B2 or Glacier, etc.
rsync.net supports both borg and restic, but even the heavily discounted plans are much more expensive than "Cold Storage" or Glacier, because they are live, random access UNIX filesystems ...
Also worth pointing out that my storage is calculated after compression and deduplication, so depending on the data, a Borg backup can be much smaller than the actual data.
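For a feel of the effect, a toy example (zlib here, not Borg's actual chunker or codec):

    import zlib

    # Dedup identical blocks first, then compress what's left.
    blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
    raw = sum(len(b) for b in blocks)
    unique = set(blocks)                                  # dedup
    stored = sum(len(zlib.compress(b)) for b in unique)   # compress
    print(f"raw: {raw} bytes, stored: {stored} bytes")    # 16384 vs ~tens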
Restic seems to be designed more from the ground up to utilize the existing power of a filesystem as a database, so it needs remote storage that offers quick interactivity (especially checking for existing files), i.e., it's impossible to use something like Glacier as a backend.
It's not a problem for me since I just back up to a local drive and (am planning to set up) synchronization to remote dumb storage.
Actually, since it's Google, I likely wouldn't consider them regardless.
> shots fired
I like it when multinational corporations with revenues the size of mid-sized countries engage in some childish puns.
Alright alright alright!
Density: 1.4 petabytes / rack
Power consumption: 3 kW / petabyte
No air conditioning; excess heat is instead used to help heat the building.
Raw numbers as of August 2014:
4 data centers, 550 nodes, 20,000 spinning disks
Wayback Machine: 9.6 petabytes
Books/Music/Video collections: 9.8 petabytes
Unique data: 18.5 petabytes
Total used storage: 50 petabytes
Costs are $2/GB, lifetime, I believe.
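Sanity-checking those figures against each other, with simple arithmetic on the numbers above:

    # Cross-checking the quoted Internet Archive figures.
    total_pb = 50
    racks = total_pb / 1.4      # at 1.4 PB per rack
    power_kw = total_pb * 3     # at 3 kW per PB
    print(f"~{racks:.0f} racks, ~{power_kw:.0f} kW total draw")
    # -> ~36 racks, ~150 kW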
For me this is a "last resort backup": it costs little to keep around, and God forbid we ever need it. BUT that means we need to account for the case where we do need it! And if it's going to cost too much, then there's no point in the backup anyway.
I've been comparing cloud storage prices to hard drive prices for years now. My first thought when seeing the storage prices was "huh, that might actually be worth it", but depending on the retrieval costs, you might still want to roll your own no matter the storage costs. For private use, I am (was?) planning a variant of this as soon as I am finished doing a server migration: https://old.reddit.com/r/DataHoarder/comments/7rjcdn/home_ma...
/usr/bin/md5sum --quiet -c md5sum.chk
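(The -c flag checks each file against the hashes listed in md5sum.chk; --quiet suppresses the "OK" line for files that pass, so you only see failures.)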
It has no meaning whatsoever. Someone on the marketing side of the team decided that was a "competitive" number to present outwards, and someone in engineering was tasked with working backward from that number to come up with some plausible calculation that resulted in it.
In the real world they, like Azure and Amazon, will have single point-in-time outages that will wipe that out for a year or more.
Here is what an honest assessment looks like:
"Historically (updated April, 2019) we have maintained 99.95% or better availability. It is typical for our storage arrays to have 100+ day uptimes and entire years have passed without interruption to particular offsite storage arrays."
"In the event of a conflict between data integrity and uptime, rsync.net will ALWAYS choose data integrity."
An outage affects availability, but as long as it's not permanent it doesn't affect durability. For example, if I add a new backup provider that stores data on-premise I've added a (nearly) independent data store. This substantially decreases my risk of losing my data unrecoverably (increases durability) but if I don't set up any sort of automatic failover I'm still at risk for substantial outages (no practical increase in availability).
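To put rough numbers on that distinction (the probabilities here are made up for illustration, not any provider's real figures):

    # Durability: probability the data survives the year somewhere.
    # Availability: probability you can read it at a given moment.
    p_loss = 0.001   # assumed annual chance each store loses the data
    p_down = 0.02    # assumed chance each store is unreachable right now

    # Two independent stores, but no automatic failover configured:
    print(f"durability:   {1 - p_loss**2:.6f}")  # both must fail -> way up
    print(f"availability: {1 - p_down:.2f}")     # still read from one -> flat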
For example, I don't believe Amazon has ever lost any S3 data (https://www.quora.com/Has-Amazon-S3-ever-lost-data-permanent...), and if they did it would be a big deal. Same with the other major cloud storage providers.
> Someone on the marketing side of the team decided that was a "competitive" number to present, outwards, and someone in engineering was tasked with, working backward from that number, coming up with some plausible calculation that resulted in it.
I would be incredibly surprised if that happened. That's not the way I've seen anyone work here.
(Disclosure: I work at Google, though not in Cloud)
Cloud Storage is designed for 99.999999999% (11 9's) annual durability, which is appropriate for even primary storage and business-critical applications. This high durability level is achieved through erasure coding that stores data pieces redundantly across multiple devices located in multiple availability zones.
Disclaimer: I work at GCP, although not in GCS specifically.
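A rough sketch of how that erasure-coded durability math works (the shard counts and failure rate below are made up, not GCS's actual parameters):

    from math import comb

    # Reed-Solomon-style coding: n shards total, any k reconstruct the data.
    n, k = 9, 6     # assumed 6-of-9 scheme, purely illustrative
    p = 0.01        # assumed annual failure probability per shard's device

    # Data is lost only if more than n - k shards fail in the same period.
    p_loss = sum(comb(n, i) * p**i * (1 - p)**(n - i)
                 for i in range(n - k + 1, n + 1))
    print(f"P(loss) ~ {p_loss:.2e}")   # orders of magnitude below p itself

Spreading the shards across availability zones keeps the per-device failures (approximately) independent, which is what makes that multiplication legitimate.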
You are correct - I misread that as availability even after quoting that very same line.
You might not need to speculate much about how it works, it's probably implemented as described by Google themselves in slides 22ff. here:
(In a nutshell, pack some hot data with a lot of cold data on many large drives, then put a Flash-based cache in front of them to get long tail performance predictability back.)
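Something like this read path, in spirit (entirely schematic, not Google's code):

    # Schematic read-through cache: a small flash tier absorbs the hot tail,
    # so the big, densely packed drives mostly serve genuinely cold reads.
    flash_cache = {}                 # object name -> bytes (the fast tier)

    def read_from_dense_drives(name):
        return b"..."                # stand-in for the slow spinning tier

    def read(name):
        if name in flash_cache:      # hot object: fast, predictable latency
            return flash_cache[name]
        data = read_from_dense_drives(name)   # cold path: slow but cheap
        flash_cache[name] = data     # promote for repeat reads
        return data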
Gory details are in the patents, 9781054, 9262093 and 8612990, which I'm not linking directly, because your lawyers might not approve. There's even a follow up, 10257111. It's so new, from two days ago, that Google Patents can't find it, while Justia can.
So I suspect this is not fully cold storage; that's why they can retrieve the data faster. It seems more like an economics hack (a longer commitment to keep the data lets them buy and operate the storage hardware/software at a cost that can be amortized against those commitments).
Let's say you only need to write to it once and have two secure locations available for free; that still means you need two of them, which would pay for themselves in 28 months.
Sure, it's "cheaper", but it's far from being as good, and the price difference isn't that big.
The important part is just that the keys don't end up in long-term cold storage. Either they're only retained for a short period (e.g., tape backups that get rotated after two weeks), or the key store supports live deletion.
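That's the crypto-shredding pattern: encrypt everything headed for cold storage under a key that stays in hot, deletable storage, and destroying the key effectively deletes the cold data. A minimal sketch, assuming the Python `cryptography` package (names are illustrative):

    from cryptography.fernet import Fernet

    # The key lives in hot storage (a KMS, say); only ciphertext goes cold.
    key = Fernet.generate_key()
    cold_blob = Fernet(key).encrypt(b"backup contents")   # this gets archived

    # Normal restore: fetch the key, decrypt the archived blob.
    assert Fernet(key).decrypt(cold_blob) == b"backup contents"

    # "Deletion": destroy the key. The blob in cold storage is now
    # irrecoverable noise, without ever touching the tapes themselves.
    del key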
Unlike tape and other glacially slow equivalents, we have taken an approach that eliminates the need for a separate retrieval process and provides immediate, low-latency access to your content. Access and management are performed via the same consistent set of APIs used by our other storage classes, with full integration into object lifecycle management so that you can tier cold objects down to optimize your total cost of ownership.
* Go to https://cloud.google.com/compute/
* Scroll down to the pricing information
* Click on the link for the price list
Glacier's restores had a lot of fees in my one experience. We could have bought several RAID units for the price of a fast restore. If you asked for the data back over a longer period of time, the price dropped dramatically.