It is surprising that they didn't make it compatible with the S3 API -- at least for common object/bucket create/delete. This will require more code to be written and it will be harder to adapt client libraries.
The API documentation is here: https://www.backblaze.com/b2/docs/
* The lack of scalable front-end load balancing is shown by the fact that they require users to first make an API call to get an upload URL followed by doing the actual upload.
* They require a SHA1 hash when uploading objects. This is probably overkill over a cheaper CRC. In addition, it means that users have to make 2 passes to upload -- first to compute the hash and then another to upload. This can slow uploads of large objects dramatically. A better method is to allow users to omit the hash and return it in the upload response. Then compare that response with a hash computed while uploading. In the rare case that the object was corrupted in transit, delete/retry. GCS docs here: https://cloud.google.com/storage/docs/gsutil/commands/cp#che...
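A minimal sketch of the single-pass idea from the second bullet, assuming the service returned the hash of what it stored. The hash is updated as each chunk is read for sending, so the file is only read once. The names `stream_with_sha1` and `server_reported` are made up for illustration, and the "server" is simulated locally; this is not B2's or GCS's actual API.

```python
import hashlib
import io


def stream_with_sha1(source, chunk_size=64 * 1024):
    """Return (chunk generator, digest getter) for a file-like object.

    The digest is only meaningful once the generator has been exhausted.
    """
    hasher = hashlib.sha1()

    def chunks():
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)  # hash the same bytes we are about to send
            yield chunk

    return chunks(), hasher.hexdigest


# Stand-in for a large file on disk.
payload = b"example object body " * 1000
body, local_digest = stream_with_sha1(io.BytesIO(payload))

# Stand-in for the network: a real client would write each chunk to a socket.
sent = b"".join(body)

# Hypothetical: the server replies with the SHA-1 of what it stored.
server_reported = hashlib.sha1(sent).hexdigest()

# On mismatch, the client would delete the object and retry.
upload_ok = local_digest() == server_reported
print(upload_ok)
```

The trade-off is that a corrupted object exists on the server for a short time before the client notices and deletes it.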
And... you answered your own question. :-) We reduce our operating costs by not having as many load balancers in the datacenter and pushing off the responsibility to the API. It all comes from our traditional backup product where we wrote all the software on both sides so we could save money this way.
With that said, we are actively considering offering an S3 compatible API for a slightly higher cost (basically what it would cost us to deploy the larger load balancing tech).
Having been on the receiving end of entirely too many corrupted files in my life, I strongly approve of their use of a hash that's been standardized and fast for decades and remains cryptographically strong. "But fast" if you fail to store it isn't very helpful. TCP has a CRC too. We're wallpapering over it with better ones and everyone serious has been for years: it's time to accept that cheap CRCs aren't a good place to get stuck.
Improving the API to avoid the 2-pass problem is spot-on though. Another possible solution is to require either a subsequent API call, or format the first message as a multipart, and use that route to have the caller submit the hash that's used to confirm and commit the file to storage after the body upload. This would solve the 2pass problem while still ensuring the client is actually doing the integrity check -- and since Backblaze is more than likely to take the heat on any corruption issues, it's probably a good policy for them to make sure lazy client implementations aren't going to cause problems that their storage then gets the publicity smear for.
But there is no call for a cryptographic hash here. This isn't being used as any sort of ID or to verify integrity outside of corruption.
The API works on top of TLS, which already includes cryptographic authentication of all data (usually via SHA-1/2 HMAC or AES-GCM).
The hash would be computed at the client right after reading from disk and right before TLS encryption, and since they seem to terminate TLS at the storage server it would be computed right after TLS decryption and right before storage, so it doesn't seem to provide any gain.
I think they should just remove it, or at least make it optional.
This is all rare, but it does happen. This is why the GCS team wants to know if you are seeing corruption on file upload as it might be some bad hardware failing in a non-obvious way.
There's the write path from when B2 receives your bits to when they're stored on disk, for one. You could have unforeseen bugs in the code sitting on the other end of their upload URL (it's probably not all theirs, and even if it were, it was written by human developers).
Or B2's internal network path (if they have any) between that and the disk. Ideally that would provide integrity too, but maybe not. They offer a low price point and call out other compromises they make to achieve it (e.g. limited load balancing) - so while I really doubt it, it's remotely plausible they deem the internal overhead of SSL too high.
But then there's the potential for mismatch between "what the customer thinks they uploaded" and "what the customer actually uploaded" too! Less of an issue for now because their API only appears to support uploading files all at once, but eventually I'm sure they'll support a multipart upload scheme like the other platforms do. At which point uploads become more complicated since clients need to retain state and potentially resume. What if a client screws it up and there's some off-by-one error (or whatever)? If you can provide instant feedback, at upload time, that your clients provided bogus data, that's a good thing.
You can argue it's a painful requirement to force on users since it means they have to track/compute it themselves (might be nontrivial for streaming applications), which is fair. But there are enough points of failure, and the numbers so large, that errors happening is a fact and you really need to insure against it. Especially here, your entire reason for existing is to reliably store bits so it's kinda important to get it provably right.
It seems completely sensible to err on the side of caution, especially as a new and relatively unproven platform (as an object storage platform provider I mean, obviously they have tons of experience storing things).
If you're handling data on behalf of others, it's paramount that you checksum data end-to-end. Amazon S3 allows you to do this by sending the MD5 or SHA along with the data. Google Cloud Storage (GCS) allows you to do this with CRCs (which, despite what others in this thread say, are more appropriate for the task than crypto hashes, as long as you use enough bits).
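A small illustration of the checksum comparison, under stated assumptions: `zlib.crc32` is used as a stand-in because it ships with Python (GCS documents CRC32C, which uses a different polynomial), and the 1 MiB buffer and flipped bit are made up. Both checks catch the corruption; the "enough bits" remark is about the roughly 2^-n chance that random damage happens to leave an n-bit checksum unchanged.

```python
import hashlib
import zlib

# A made-up 1 MiB object and a copy with a single flipped bit.
original = bytes(range(256)) * 4096
damaged = bytearray(original)
damaged[123456] ^= 0x01
damaged = bytes(damaged)

crc_detects = zlib.crc32(original) != zlib.crc32(damaged)
sha_detects = hashlib.sha1(original).digest() != hashlib.sha1(damaged).digest()
print("CRC-32 detects flip:", crc_detects)
print("SHA-1 detects flip:", sha_detects)

# Rough odds that random corruption slips past an n-bit checksum.
for bits in (32, 64, 160):
    print(f"{bits}-bit checksum: about {2.0 ** -bits:.3g}")
```

Neither checksum here defends against deliberate tampering; that is a separate question from detecting accidental corruption.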
You would think, that if it's just being used as a checksum, anything that passes https://code.google.com/p/smhasher/wiki/SMHasher with high marks would be sufficient.
For example, I might have md5-colliding files on my hard drive somewhere, that someone else made as a proof of concept. I honestly don't know. But I would worry about using a storage system that depends on md5, because what if it deduplicates without checking every byte?
For the same reason that UTF-16 has encouraged so many broken implementations, at least in a pre-emoji world, it's a bad idea to almost but not quite support convenient features. Either clearly don't support something, or fully support it.
Let's say I backed up 8 TB of data for a small business and I need to restore within 24 hours. Is it possible to request overnight shipment of hard drives with the data, so I can do the restore locally instead of taking weeks to download it all?
I know Amazon has this feature, not sure about Google.
Another question: what's the maximum number of buckets an account can hold?
Looking forward to trying this service out. Thanks.
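To put the "weeks to download" worry in numbers, here is a back-of-the-envelope calculation. The link speeds are assumptions chosen for illustration, and protocol overhead is ignored.

```python
def transfer_days(size_tb, link_mbps):
    """Days needed to move size_tb terabytes over a link_mbps link."""
    bits = size_tb * 1e12 * 8            # decimal terabytes to bits
    seconds = bits / (link_mbps * 1e6)   # megabits per second to bits per second
    return seconds / 86400


for mbps in (10, 100, 1000):
    print(f"8 TB at {mbps} Mbit/s: {transfer_days(8, mbps):.1f} days")
```

At 100 Mbit/s the 8 TB restore takes about a week, and at 10 Mbit/s more than two months, which is why a shipped drive can be the only way to meet a 24 hour deadline.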
Yes. In our traditional 8 year old product line of online backup, you can order a restore on an external USB hard drive for $189 (you keep the hard drive, and the cost includes world wide shipping). We FedEx the restore to you anywhere in the world. (We ship to Europe all the time, but add an EXTRA 24 hours for that to arrive.) Oh, if you only have 128 GBytes of data you can order a USB thumb drive of that FedEx'ed to you for only $99.
B2 absolutely supports this USB drive restore functionality. We call the feature "Snapshots" (you take a "Snapshot" of some of your data, then you can either download it as one large zip file or you can have it sent to you via USB Hard Drive).
Our online backup product tells you that the restore is up to 4 TBytes, but we have prepared Drobos as "special orders" for customers that were much larger than that. We aren't trying to make profit from that part of the business, just kind of break even on materials and shipping and create customer goodwill. You would be AMAZED how happy some customers are to receive 8 TBytes of Drobo with all their data they thought they might have lost. :-)
We are limiting each account to 200 buckets right now until we gain some experience. Each bucket can hold unlimited numbers of files.
We really want feedback on these decisions, so as you run into issues just let us know. Many of the limits are arbitrary, I think the 200 buckets was so we didn't have to create a paginated list on your logged in webpage for version 1.
I could see this being a practicality for small time video editors, I need to keep copies of old projects, but would be willing to pay $300 to get an overnight hard drive of the files since whatever new project will pay for that cost.
Saved me from losing quite a bit of data when a tornado trashed my apartment.
There's no talk about their backbone or their network capacity. I get that they have terabytes of upload coming in, but as anyone who's used their software can tell you, it's throttled. I don't know how many users they have to tell you how much bandwidth they're actually handling, but can they handle people using B2 as a distribution point for large files for customers? For example, I have a huge S3/CF monthly bill from customers downloading ~400MiB ISO images tens of thousands of times a month. Amazon CloudFront is ~$0.085/GB for the first TB, while BackBlaze B2 is an incredible $0.05/GB - but at what performance? Will my technical support representatives be getting angry phone calls about halting download speeds or do they have the capacity for something like this?
Hosting the world's data is no tiny task, I hope they're ready for it and I do, truly, wish them all the luck. I've been a BackBlaze customer for a few years now (at least 5 or 6, I imagine) as a tertiary or quaternary backup (haven't had to restore... yet), and B2 looks and sounds promising, but as far as technical details go, this post is nothing.
EDIT: In response to the reply below, I believe it's throttled by default in the client, though that can be turned off in the application settings. Also, you've replied to my claims of throttling but have ignored my question regarding backbone capacity and network readiness...
We currently have about 100 Gbps symmetric capacity into our datacenter on a couple of redundant providers, but the key is we have open overhead and we'll purchase more as our customers need it.
But here is the best part (if you want OUTBOUND capacity) - our current product fills the INBOUND internet connection, but currently we only use a tiny, tiny fraction of the OUTBOUND connection. So if you want to serve files out of our datacenter we have a metric ton of unused bandwidth we would LOVE you to use. And if you fill it up, we promise to purchase more.
But also keep in mind, Backblaze is very experienced with STORAGE and I have a lot of confidence we won't lose any of your files. What we don't have a huge amount of experience with yet is serving up viral videos and such. So just bear with us during this beta period while we figure it all out. Personally I'm looking forward to that part (all the CDN/caching layers).
:) Well, yeah, but that's also what B2 charges for... so the business model requires that bandwidth to start getting consumed :-)
If you serve up viral videos etc. and start eating a ton of bandwidth, even a "do it yourself" CDN out of VPS's could quickly save you a fortune...
As somebody else mentioned, since we're in a commercial datacenter with a bunch of network providers already serving us, it's pretty easy to dial up our capacity as we need it.
What's the lead time in your case?
In my historic experience doing this regardless of if I was even in MAE-West... cross connects and provisioning were eons in internet time...
It could go faster, but if we need to buy a new (expensive) network switch that can take a few days to arrive. And as you mention, the datacenter guys are happiest if you give them 3 - 4 days and a work order to do the cross connect.
Building out more vaults (the blocks of 20 storage pods we store data in) is usually about the same if we rush it, but we have a big (multi-petabyte) buffer spinning ready to accept data at anytime. We have a regularly scheduled delivery of pods once per month based on projections, but we have been known to tell our provider to go ahead and build three months worth of pod chassis (everything except for the drives) immediately and ship them to us. We supply the hard drives, so that either comes from our own stashes or we quickly order some more from various sources.
Brian from Backblaze here: no it is not throttled (by us). If you only have a 10 Mbit/sec upload capacity you are throttled by your ISP. Also make sure you visit our "Performance" tab in the online backup client and tweak a few settings, like increase the number of threads.
I moved to Linux a few months back, and was going to basically cancel my Backblaze sub when I got around to it since you have no interest in making a Linux client. Maybe B2 can act as a solution to this at a price penalty.
I can understand your biz reasons for not having one though.
My only interest in B2 is backing up for a lower cost than the ridiculousness of S3: at $0.022/GB, I might as well buy a 3TB hard drive myself, put it at a friend's and push my data there. Every month. At the end of the year, I'd have 36TB in hard drive capacity if I bought drives instead of paying for 3TB of S3 storage.
(All numbers are estimates and "roughly"s. Also I don't have external backups now because I'm too lazy to write the software myself, so there is something to say for paying instead of not having it.)
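The arithmetic behind the comparison, using only the figures quoted in the comment ($0.022/GB per month, 3 TB). Treating 3 TB as 3000 GB is an assumption, and no drive price is assumed here.

```python
PRICE_PER_GB_MONTH = 0.022   # figure quoted in the comment
STORED_GB = 3000             # 3 TB, taken as decimal gigabytes

monthly_cost = PRICE_PER_GB_MONTH * STORED_GB
yearly_cost = monthly_cost * 12

print(f"per month: ${monthly_cost:.2f}")
print(f"per year:  ${yearly_cost:.2f}")
```

The "buy a drive every month" comparison holds if a 3 TB drive costs roughly the monthly figure.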
I use backuplizard for personal data/photos, which works out to roughly the cost of one 2 TB disk per year for me, and to me it's easy to pay that instead of owning disks and worrying about them breaking, etc.
All in all, it's by far the cheapest option to store it at a friend's. Storage providers could also cheapen things a whole lot by offering reduced redundancy and whatever else it is they do to make it so expensive (glacier storage is also more expensive than the price I got). It's a backup after all, I don't need my backup to have five copies on spinning (versus offline, non-powered) disks. If my backup dies, I'll upload it again...
It is faster transferring big files rather than many small files.
I tried from multiple physical locations but I could not increase my downloads past 1-2mbps, and for TB of data, that seemed like it was throttled by BB considering I was easily uploading 20mbps.
I contacted BB support and they ignored me, so I switched to a competing service and have had no issues ever since.
This really made me sad because BB's blog is amazing and their tech is really cool, but when you see people saying "it's throttled", it's because of real experiences out there, and not just ones limited to an ISP issue.
But either way, we added threading to the bzdownloader (our custom application to download large restores) and if you tried it today crank it up to 10 threads and I swear you'll be happy with the download performance.
My issue did occur during that period, and I am impressed you can call that out from memory, it must have been a frustrating time for BB.
If that alone was the problem, you would have just 100% won back a customer, but the thing that irked me the most was the customer support response.
I know you do not work for your helpdesk, but their response was more the reason I left, their apparent lack of concern was what turned one of any service provider malfunctions into a dissatisfied customer looking for a competitor.
> do not work for your helpdesk
It's unfortunate when a customer gets a bad experience. The helpdesk guys are faced with this monumental task of responding to tons and tons of basic questions by Mom & Pop customers that are not computer professionals. Then mixed in are competent programmers and IT guys that know what the heck they are talking about. The helpdesk guys sometimes get it wrong who they are dealing with and it infuriates the competent computer users.
I think we should issue "professional computer user" cards where you can get a different level of support from all these companies. If you were helpful on forums you could earn points for your card, but if you ask helpdesk too many dumb questions your card could be revoked and you would go back to the first tier support. :-)
I'm bothered by the whole idea of putting all my data with any one vendor (with Backblaze or Amazon) and thinking you don't need a backup. I claim "RAID / Reed-Solomon / real time mirrored copies" is NOT "Backup". If your programmer makes a mistake and a line of code deletes some mission critical data from Amazon S3, then all the Reed-Solomon encoding in the world doesn't help you, the data is still gone.
What you need is a copy of all your data from Amazon S3 in another vendor lagging behind for 24 hours that is NOT real time mirrored. Maybe you lose all the customer data generated that day, but your business survives by restoring from backup. (I chose 24 hours arbitrarily, each business needs to choose their upper limit of loss where they can survive.)
A good rule of thumb for a CONSUMER is three copies of your data: 1) primary, 2) onsite backup, and 3) offsite backup. If you are a business that will lose millions of dollars if a programmer makes a mistake or an IT guy is disgruntled, add 4) another offsite backup with a totally different vendor that doesn't share a single line of code with 1-3 and has separate passwords.
I'm surprised at the implication here, that you'd use Glacier on a non-versioned bucket. Making destructive updates impossible doesn't cost much extra in archive fees.
My point stands: if you don't mind losing your data, store it in one vendor. But if you would REALLY lose your business and put 10 people out of work if the data is lost, storing it in Amazon (or Backblaze) without a second copy backed up somewhere else and a third copy backed up in yet a third location (with a totally different vendor with a totally different payment system) is irresponsible.
What you actually need is a provider that will guarantee the durability of your data even if they (temporarily) cut off your access to it for lack of payment. Anything else is just a level of indirection that suffers the same problems.
 I don't actually know if anyone does this, let alone AWS. Here's a quote from Tarsnap's FAQ—where you'd think cperciva is someone who would have considered the "I had no idea my infrastructure was relying on this service until it shut off" use-case:
> You will be sent an email when your account balance falls below 7 days worth of storage costs warning you that you should probably add more money to your account soon. If your account balance falls below zero, you will lose access to Tarsnap, an email will be sent to inform you of this, and a 7 day countdown will start; if your account balance is still below zero after 7 days, it may be deleted (along with any data you have stored) at our discretion. (If you can't add money yet but will be able to later, contact us and explain the situation. We're reasonable people and simply knowing that you're alive and haven't forgotten that you were using Tarsnap is very helpful.)
7 days is probably reasonable in the case where there's an active IT staff who will notice when, say, servers stop backing up. But if nobody's watching for that...
Do you have a solution in mind for the case of a company where email is going to /dev/null and nobody is reading the output of their cron jobs?
I mean, if I can't contact someone, it doesn't really matter if I wait a week or a month...
• Ask people to provide optional contact information for an "executor of their estate"—a person who can make decisions about what happens to their data on their behalf if they cannot be reached.
• Ask people for a secondary credit card that can be charged as a backup: specifically, suggest that this be the personal card of Someone Important in the company, who will be likely to notice the charge and flip out.
• Ask for a flat-fee deposit to enable a secondary "long-term storage, no uploads, no monthly billing" mode of usage. Make it enough money to be motivating when you imagine just schlepping this hunk of data around for the rest of your life. If the user has paid this deposit, and their regular card gets declined, switch them to this mode and consume the deposit. If they close their account, refund the deposit if it hasn't been consumed.
And so forth.
Here are my thoughts on our announcement today: https://www.backblaze.com/blog/b2-cloud-storage-provider/
B2 finally creates an option for Linux users to use BackBlaze for back-up (at minimum) at work and at home. I look forward to that.
CAVEAT (PLEASE READ): this does NOT encrypt data yet!! This is just a quick technology demonstration, it isn't a polished backup client. Give us another month for that...
We expect Linux servers (and desktops) to make up a significant percentage of the things communicating to B2, so you can expect a lot more support than you have been getting from the traditional Backblaze Online Backup product line.
This is a very low friction way to have accessible data and have it encrypted by a fairly popular mechanism.
Back stuff up, but properly encrypted instead of your Windows client's closed source stuff. Without B2 I would use S3 but at their rate I might as well rent a datacenter myself, so I'm going to do the math again with B2 soon.
This announcement may mean I finally get to test your stuff! (I've been frustrated with the quality and feature creep in open source syncing solutions and procrastinating building my own, bare bones alternative).
Is this only for noncritical, reproducible data as S3 reduced redundancy?
Also, we are the only company we know of that releases our drive failure rates. We release them quarterly, here is the most recent failure analysis:
And for the record Backblaze only has one datacenter. This bothers some customers deeply so if it is a show stopper definitely don't use Backblaze B2. We just want to be transparent about what we do and what we don't do. One idea is you could use B2 as a primary copy of the data, and make another copy into Amazon Glacier in case the Backblaze datacenter is hit by a meteor (or a terrorist attack or an airplane crashes into it).
Oh, I said this elsewhere but I have a lot of confidence we won't lose your data, we've been perfecting that for 8 years. What Backblaze DOESN'T have much experience in is serving up viral videos and the CDN (Content Delivery) layer. I'm looking forward to that layer, I think it will be fun to polish, but especially over the next few months of invite only beta anybody using B2 needs to be able to work with us to get the kinks worked out.
Arq seems really good at supporting a broad variety of cloud providers though, so hopefully they'll add this too. I'm hesitant to use cloud backups generally; I've never seen an audit of how secure Arq's backup scheme is, for example (though it seems pretty simple - https://www.arqbackup.com/s3_data_format.txt). I've used CrashPlan a lot and basically take it on faith that it's secure. It's probably good enough for my use, given that I'm not storing state secrets or anything, but it's still a little unsettling to 'lose control' of one's data.
From Backblaze's point of view, I guess this is either smart (diversifying themselves–people can use other backup software if they like, and Backblaze still profits) or less smart (turning themselves into a commodity), but it seems like their software is still first rate, so I guess it'll work for them.
I'll be taking a look at this of course, but there are things which are more important than price -- for example, reliability. Tarsnap users trust me to not lose their data, and I trust S3 to not lose their data. That's a trust I don't have in B2 yet -- first, simply because B2 hasn't been around for long enough to prove itself, and second based on what I've heard from former Backblaze users.
They use Blowfish. Says it all really - their default encryption is a long-obsolete 64-bit block cipher you might have picked in 1999 because it was faster than 3DES.
I can only assume they do this because migrating would cost them money, and being able to advertise "448 bit encryption" actually sounds like a plus to most people and not the glaring red flag it actually is.
> it seems like their software is still first rate
What, like their backup client that can't actually do restores? It's still all "log in to our website and let us decrypt your data for you" :/
Not defending it, because I know it's old and there are weaknesses, but aren't Blowfish and 3DES both still technically secure? This is a genuine question. It was my understanding that if implemented correctly, with a random key etc., that neither has been formally broken. 3DES is 2^112 no? which is still not practically accessible by brute force. Not that this means anyone should use them, of course, AES is a standard for a reason...
As you say, I had just assumed the migration cost was too high to move to something newer, but I don't think it necessarily means data stored there is unsafe?
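One way to make the question concrete: brute-forcing the key is not the practical concern, the 64-bit block size is. In modes such as CBC, block collisions become likely after about 2^32 blocks under one key (the birthday bound). The sketch below only does the arithmetic; which cipher mode and how many bytes per key Backblaze actually uses is not stated in this thread, and the keys-per-second figure is a made-up assumption.

```python
BLOCK_BITS = 64                      # Blowfish and 3DES block size
BLOCK_BYTES = BLOCK_BITS // 8

# Birthday bound: collisions are expected after roughly 2^(n/2) blocks.
birthday_blocks = 2 ** (BLOCK_BITS // 2)
birthday_gib = birthday_blocks * BLOCK_BYTES / 2 ** 30
print(f"birthday bound: about {birthday_gib:.0f} GiB per key")

# Brute force against a 2^112 keyspace at an assumed 1e12 keys per second.
ASSUMED_KEYS_PER_SECOND = 1e12       # hypothetical attacker speed
brute_force_years = 2 ** 112 / ASSUMED_KEYS_PER_SECOND / (365.25 * 86400)
print(f"brute force: about {brute_force_years:.2e} years")
```

So the key sizes are still out of reach, but a backup set can easily exceed tens of gigabytes, which is where a 64-bit block cipher starts to look dated if a single key covers that much data.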
Calls into question their competence, their honesty and their architecture all at once.
Blowfish supports key-lengths up to 448-bits. And I've never heard of a single criticism of the function. It's just kinda... less used than Rijndael because it didn't "officially" win the contest. But otherwise, it is a fine function.
EDIT: Confused Twofish with Blowfish in the AES finalists.
It also calls into question the nature of all the other crypto they're using - is that all >20 years old too? Still tuned for a world of 486's and 68040's?
I also created a PR to add support for Exoscale.
I like this offering, but I'm not getting good signals on its seriousness. It may be something they're going to sunset soon. I would need some reassurance as to what's going on here.
And these are interestingly exactly the same reasons enterprises buy IBM and Oracle.
This is actually a big pet peeve of mine :)
The trust is in quotes because I'm not sure what to think about it. Every month I see a post here about some AWS service outage but it looks like nobody is getting nervous because of this. People just wait until it is fixed. On the other hand, I have experienced that people begin to trust companies because the company advertises on TV. But AWS has earned its trust legitimately I think.
The whole concept of "confidence/trust in companies" is so important but I know so little about it.
The reason you hear about so many AWS outages is because it's a massive service with so many users. If you build appropriately, you can have extremely good uptime built on AWS. They've earned tons of trust from their users.
First, they weren't in North America until recently. Having a server in France means high ping times for me and latency for the vast majority of my visitors. OVH started operations in Québec in 2013. So they've had less than three years to establish themselves. EC2 is 9 years old.
Second, it's hard to figure out what to buy. With EC2, they're all Xen instances and you decide on the right CPU/RAM configuration. DigitalOcean, Linode, Vultr, etc. all are easy. With OVH, what am I supposed to buy? Do I want a dedicated server or an infrastructure dedicated server? And then if I click for dedicated, I need to choose from Hosting, Enterprise, Infrastructure, Storage, Custom, or Game. I know computers - tell me the processor, RAM, and storage without breaking it into categories. So, I go with Hosting and half of the options are for "Delivery from September 30". Ok, that's more than a week out. Maybe I want more flexibility like hourly billing on VPSs. I can go to Cloud -> VPS. And now I can choose SSD or Cloud with different prices. Why is the SSD so much cheaper? $3.50 vs $9 and they're both 1 core, 2GB of RAM, 100Mbps network link KVM boxes. Then I wonder if these are the same things as the RunAbove labs vs regular. The labs ones shared the processor cores, but this seems to indicate that both don't have the noisy neighbor problem. So I check RunAbove. Wow, everything has changed. Looks like they don't offer the SSD of Ceph instances anymore, but they have SATA backed instances. So, they're running all sorts of different combinations. And should I be looking into Kimsufi or SYS brands? Do they still exist? What if I want object storage. Ok, the US site takes me to RunAbove which tells me that it's now part of OVH proper which brings me to their UK site with apparently no way of loading it on the American site. Compare that to DigitalOcean where you just get a very simple, "here are the plans, there's no complex stuff with weird names or categories, buy what you need." Even Vultr manages simple with SSD VPS, SATA VPS, and Dedicated Cloud. Perfect. Most likely I want the SSD VPS, but maybe I need more storage or maybe I want metal servers sold to me like cloud servers. Easy.
And to be fair, OVH used to be a lot more complicated and a lot worse. It looks like they're streamlining a ton. But they should still simplify a lot more.
Third, OVH is terrible at marketing. I want to define what I mean by marketing. DigitalOcean is a king of marketing. You go to their site and you see brief comments from the creator of jQuery, the creator of RailsCasts, the creator of Redis, and a Rails core member. You might not use those technologies or even like them, but you recognise that DigitalOcean can't be total crap given that these are people with options and a reasonable amount of taste. DigitalOcean sponsors hackathons like woah. Giving students a dozen or so dollars in credit makes them well-known and an easy service to try. DigitalOcean's site inspires confidence in its simplicity. You don't feel like there's some hidden thing because it's just simple plans that increase rather linearly. Finally, try searching for VPS + some tech term. "VPS Ansible" has a DigitalOcean blog article as #3. "VPS elasticsearch" has DO with the top two spots. The point is that you see that and it's an indication that they're part of the community (supporting some free content) and kinda get it.
OVH, on the other hand, inspires none of those good feelings. OVH has a generic site that you can't tell apart from other generic sites. It has the kind of "throw everything at the user and see what sticks" design that I don't think users want. We want DigitalOcean to say "this! this is good!". OVH is like, we have a lot of different things and someone has written "enterprise" or "cloud" on some of them without really indicating how some options are more "enterprise" or "cloud". And there are stock images of network switches and RAM and such like a pizza place that has a stock picture of a pizza on their take-away menu that isn't their pizza. Do they get it?
I really wish OVH well. More providers means downward pressure on pricing which is good for me. I mean, 2GB of RAM VPS for $3.50? Awesome! Glad to see that graduate from RunAbove. But OVH still has a ways to go. Lots of the time you have to wait for servers. If I want a dedicated SSD box, they're quoting a 10 day wait for all except one model. The entire "hosting" range has quotes of 3-12+ days. "Enterprise" has one box for 120 second provision, two that are 3 days out, and two that are 10 days out. It seems like OVH is a place to get a good deal if you're willing to deal with a complicated process, waiting for a box, and them switching things up on you. But maybe OVH is stabilizing. I'm hoping their VPS offering will be a lot more stable than it has been. Seems like they're cutting down on using alternative brands like SYS and Kimsufi.
I can see OVH being a good company, but it's no surprise to me that they aren't as well known as AWS.
Runabove, OVH, VPS, all those. The worst part is they keep posting about their streamlining and deep thought into reorganization. And yet nobody understands their "why".
To many it seems more like a reorganization for the sake of reorganization.
Then there is their network, which can range from very good to VERY VERY bad.
And the final thing? Their lack of support or communication. Fire off an email or a ticket and other hosts at least give you a reply. OVH? Nothing.
And when you add their inactive support, the inability to talk to sales or anyone else with an inquiry, and their overall complexity, it is not hard to see why they don't pick up as many customers as they should.
Their current "10 days" delivery time is quite unfortunate, but I believe that's explained by the fact that they just started upgrading all their machines to DDR4 (and unfortunately, as I've been hit with it, a price increase for dedicated servers still on DDR3).
In the past I have used their 120-second delivery extensively, but it has a few problems: 1. You have to verify your account first. 2. That's only guaranteed for one single server; try ordering 20+ of their top-of-the-line servers and it'll take a few days to get all of them.
Their control panel is also very confusing and feels very sluggish, and I'm talking about the one they released quite recently.
They also have tons of country specific domains (perhaps for tax reasons?).
They're pretty great on hardware. I've experienced a bit of downtime with them, but as long as you have enough redundancy, you should be fine. I'm hosting game servers there, with an automatic fallback to some cloud providers if those go down, so I'm not too worried about that. Still, seeing some servers randomly lose connectivity like this: http://i.imgur.com/9uMOHnH.png doesn't inspire confidence.
We are already looking for another datacenter, but mostly because we're running out of space in the current one due to our traditional business (online backup) doing so well.
So! If you can tolerate the loss of a datacenter, store in Backblaze. If you need geo-redundancy until Backblaze can offer it? Store in us-east-1 (which is geo-redundant between Virginia and Oregon).
All AWS AZs are physically separated facilities with redundancy on all their infrastructure, although they're obviously in the same general area.
us-east-1 is not geo-redundant. It is entirely on the east-coast, as the name suggests. Although S3 does have geo-redundancy in all regions.
You may have been thinking of "US Standard", but it is the same as "us-east-1".
Quote from Jeff Barr @ AWS: http://shlomoswidler.com/2009/12/read-after-write-consistenc....
All I could find on the S3 FAQ says "your objects are redundantly stored on multiple devices across multiple facilities." which seems to contradict the "one datacenter" claim.
Also, do you have a source that us-east-1 is geo-redundant between Virginia and Oregon? That was not my understanding of how it worked.
"You specify a region when you create your Amazon S3 bucket. Within that region, your objects are redundantly stored on multiple devices across multiple facilities. Please refer to Regional Products and Services for details of Amazon S3 service availability by region"
Note, "within that region". Separate AZs, same geographic location.
"CRR is an Amazon S3 feature that automatically replicates data across AWS regions. With CRR, every object uploaded to an S3 bucket is automatically replicated to a destination bucket in a different AWS region that you choose. You can use CRR to provide lower-latency data access in different geographic regions. CRR can also help if you have a compliance requirement to store copies of data hundreds of miles apart."
This post http://shlomoswidler.com/2009/12/read-after-write-consistenc... has a quote from Jeff Barr at AWS indicating that us-east-1 is bicoastal, which is also why it's eventually consistent rather than consistent immediately after a write (EDIT: it appears this constraint no longer applies to the US standard region).
I asked for sources about your "one datacenter" claim. Just because several facilities are in the same geographic region does not mean they are the same datacenter.
Just because something is bicoastal does not mean your data is replicated on both coasts. It could also mean that your data is stored on either the west or the east coast.
I would have trouble believing they store twice as much data as in their other regions but charge the same (actually a bit less!).
"To solve latency, Amazon built Availability Zones on groups of tightly coupled data centres. Each data centre in a Zone is less than 25 microseconds away from its sibling and packs 102Tbps of networking."
25 microseconds at the speed of light (best case, through a vacuum; through fiber is significantly slower) is ~4.7 miles, and based on the quote, that is the furthest they are apart. If your buildings are within 1-2 miles of each other, they're essentially the same facility.
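For anyone who wants to check that arithmetic, here is a rough sketch. It assumes the 25 microseconds is a one-way figure, and the fiber refractive index of 1.47 is my own assumption for typical silica fiber, not something from the quote.

```python
# Upper bound on how far apart two sites can be for a given one-way latency.
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBER_REFRACTIVE_INDEX = 1.47      # assumed typical value for silica fiber
KM_PER_MILE = 1.609344


def distance_miles(latency_s, speed_km_s):
    """Distance covered in latency_s seconds at speed_km_s, in miles."""
    return latency_s * speed_km_s / KM_PER_MILE


latency = 25e-6  # 25 microseconds, taken as one-way
vacuum_miles = distance_miles(latency, SPEED_OF_LIGHT_KM_S)
fiber_miles = distance_miles(latency, SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX)
print(f"vacuum: {vacuum_miles:.2f} miles, fiber: {fiber_miles:.2f} miles")
```

That gives roughly 4.7 miles in a vacuum and a bit over 3 miles through fiber. If the 25 microseconds is actually a round trip, halve both numbers.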
That is not geographically redundant.
Or store it in Amazon AND store another copy in Backblaze. This isn't necessarily an "either/or" question. Having two copies with two different vendors in two separate regions is probably more reliable than having two copies inside the same vendor. For example, if Amazon has a large outage that affects both your regions, you can still access the copy in Backblaze.
Buyer beware when it comes to Backblaze.
The technology isn't bad, but their customer service is some of the worst I've ever seen. I was a Backblaze customer for three years and not once did I have what I'd consider a positive experience. If anything goes wrong they leave you hanging. They're not a company I'd ever trust with valuable data again.
We do plan to add file offset access and larger file support very soon, so you would be able to append a 1 MByte chunk to an existing file in Backblaze with a SHA-1 of only the 1 MByte chunk. That should allow you to stream?
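To make the streaming point concrete, here is a minimal client-side sketch of hashing each 1 MByte chunk as it is read, so nothing has to be read twice. The function name and the in-memory stand-in for a file are made up for illustration; this is not the B2 API, only the hashing part.

```python
import hashlib
import io

CHUNK_SIZE = 1024 * 1024  # 1 MByte, matching the chunk size mentioned above


def iter_chunks_with_sha1(stream, chunk_size=CHUNK_SIZE):
    """Yield (chunk, sha1_hex) pairs, reading the stream exactly once."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk, hashlib.sha1(chunk).hexdigest()


# Stand-in for a real file or pipe: 2.5 MBytes of in-memory data.
data = io.BytesIO(b"x" * (2 * CHUNK_SIZE + CHUNK_SIZE // 2))
parts = list(iter_chunks_with_sha1(data))
for index, (chunk, digest) in enumerate(parts):
    print(index, len(chunk), digest)
```

Each chunk only needs to sit in memory long enough to be hashed and sent, which is what makes streaming from a pipe workable.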
All great feedback, by the way. We really want to hear about these shortcomings in our API right away.
Being able to append to a file in 1 MByte chunks (or larger) would be perfect - that is exactly the way Amazon S3 multipart uploads and google drive multipart uploads work.
Yeah, that would be sufficient.
Eventually it could make sense for Backblaze to partner with someone like DigitalOcean or Linode and offer low cost bulk storage and low cost virtualization colocated in the same datacenter: these services seem to be a perfect complement for each other.
What I'd really like is a deal with Amazon where we put a "virtual cross connect" from the Backblaze datacenter into Amazon's EC2 so you could use EC2 instances on B2 data without incurring a download charge (or not exposing that charge to our customers). But I don't know if Amazon is open to that kind of thing.
If the alternative cloud ecosystem wants to compete effectively against AWS, it desperately needs a more sophisticated authorization scheme. Don't forget that IAM/STS is a major enabling factor in applications integration of EC2 and S3.
A real deal breaker is if you need to use an EC2 server to proxy the upload for any reason (content validation). The transfer into EC2 is free, but it's 9 cents for each GB out (18 months of storage cost).
@brianwski - any suggestions here?
To elaborate: I think these two would be able to become a viable competitor to AWS. If you think about it AWS launched with S3 and then EC2.
They could differentiate themselves by staying as a pure IaaS play. Then companies like Dropbox would not be afraid of DigitalBlazeOcean moving up the stack and competing as AWS has done in several instances (e.g. WorkDocs).
I would like to know more about the implementation of this and more information on policies to protect access to my data. And I would like to know where the data is stored. I suppose I've got to read the manual, but maybe some info tidbits could be included in the announcement.
Also, we are the only company we know of that releases our drive failure rates. We release them quarterly, here is the most recent failure analysis: https://www.backblaze.com/blog/hard-drive-reliability-stats-...
:-) We definitely plan to add an API to append to an existing file. The current largest file size is 5 GBytes, and we want to support much larger (imagine a 1 TByte encrypted disk image). That will be by appending chunks to files followed by a "commit" declaring the file as complete.
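A toy model of what "append chunks, then commit" could look like, purely as a sketch. The class and method names are hypothetical and everything lives in memory; it only shows per-chunk SHA-1 verification plus a whole-file checksum handed back at commit time.

```python
import hashlib


class PendingFile:
    """Toy in-memory model of an append-then-commit upload (hypothetical)."""

    def __init__(self):
        self._chunks = []
        self._whole = hashlib.sha1()  # running checksum of the whole file
        self.committed = False

    def append(self, chunk, chunk_sha1):
        """Accept a chunk only if its SHA-1 matches what the client sent."""
        if self.committed:
            raise RuntimeError("file already committed")
        if hashlib.sha1(chunk).hexdigest() != chunk_sha1:
            raise ValueError("chunk corrupted in transit; retry this chunk")
        self._chunks.append(chunk)
        self._whole.update(chunk)

    def commit(self):
        """Declare the file complete and return the whole-file SHA-1."""
        self.committed = True
        return self._whole.hexdigest()


pending = PendingFile()
for chunk in (b"hello ", b"world"):
    pending.append(chunk, hashlib.sha1(chunk).hexdigest())
final_sha1 = pending.commit()
print(final_sha1 == hashlib.sha1(b"hello world").hexdigest())
```

A corrupted chunk is rejected on its own, so the client retries 1 MByte instead of the whole file.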
I think the reason most of us cloud providers don't like replacing parts of files is it helps our caching layer be much simpler, and it would change the SHA-1 checksum on the file which just means "more complexity". But it isn't out of the question, it might just come with a "cost" (like you can replace the span but it might take a while and then we provide you the final checksum of the whole file in the response).
If there's attention to detail with one thing, odds are you'll find it in other places, too.
The only statement I can cite is on https://www.backblaze.com/b2/why-b2.html: "the B2 Cloud Storage service has layers of redundancy to ensure data is durable and available". What exactly that is or what it translates to is nowhere to be found. If you want corporations or developers to use your storage services for their precious data, I'd be a bit more specific.
E.g. on B2 if you wanted to retrieve data to do your own scrub/validation it would cost you the equivalent of 10 months of storage just to do one retrieval: $0.005/GB/month to store, $0.05/GB to download.
Google Cloud Storage Nearline has the same problem: $0.01/GB/month to store, $0.12/GB for egress. But at least in this case you can egress for free to Compute Engine, so you would only need to pay $0.01/GB for retrieval.
So it's not possible (at reasonable cost) to do your own validation of what's stored in B2. In Google's case, as long as you're willing to use their cloud computers, validating your data once a month doubles your cost.
In conclusion, you're trusting the vendors to handle failures, because it's very expensive to check your data yourself.
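The ratios above, worked out as a quick sketch. The prices are the ones quoted in this comment; the helper name is just for illustration.

```python
def months_of_storage(retrieval_per_gb, storage_per_gb_month):
    """How many months of storage one full retrieval costs."""
    return retrieval_per_gb / storage_per_gb_month


# B2: $0.005/GB/month to store, $0.05/GB to download.
b2 = months_of_storage(0.05, 0.005)
# Nearline: $0.01/GB/month to store, $0.12/GB egress to the internet.
nearline_egress = months_of_storage(0.12, 0.01)
# Nearline read into Compute Engine: $0.01/GB retrieval, no egress charge.
nearline_to_gce = months_of_storage(0.01, 0.01)

print(f"B2 download:           {b2:.0f} months of storage")
print(f"Nearline egress:       {nearline_egress:.0f} months of storage")
print(f"Nearline to Compute:   {nearline_to_gce:.0f} month of storage")
```

So one full scrub a month from Compute Engine costs about as much as the storage itself, which is where the "doubles your cost" figure comes from.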
Note TDs are air-filled 7200rpm regular drives.
Are those 8TB and 10TB the helium ones with HAMR? Very slow.
My understanding of HAMR is that it is probably perfectly fine for Backblaze's backup products, which are (more or less) write-once, read-rarely. Shingled magnetic recording should also be OK for that use case.
But clearly not good for cloud storage.
Backblaze should expand their service to a cheaper cloud storage service similar to Amazon S3. They already have the infrastructure and the know-how.
And voilà ... here it is.
Disclaimer: I worked on the predecessor to AltaVault at Riverbed
I hope they add B2 support at some point.
Just curious as to why I would migrate from S3 for FE assets to Backblaze.
What I'd really like in the short term is to do a deal with Amazon where we put a "virtual cross connect" from the Backblaze datacenter into Amazon's EC2 so you could use EC2 instances on B2 data without incurring a download charge (or not exposing that charge to our customers). But I don't know if Amazon is open to that kind of thing.
Can anyone compare that to other similar providers? While the storage is cheap, it seems more useful for cold storage.
So yeah, I'd agree with you. But for anyone prepared to use S3 for anything but cold storage, this is still a lot cheaper.
My suggestion would be to use this for cold storage + big cache boxes at a provider with low bandwidth charges. Especially if your "hot" objects make up a relatively small percentage.
Backblaze is the same cost as or cheaper than the cheapest tier in several other popular services like Amazon S3 and Microsoft Azure. We don't know of anybody with a lower cost of downloads.
You can open yourself up to a large number of customers by making it easy to get started via PowerShell.
I saw the comment about getting drives shipped to you, which is pretty neat, but what about the other way? I have about 50 TB of data we'd like to store, but only 5 Mbps upstream. Can we ship drives to you?
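For a sense of scale, a back-of-the-envelope sketch of that upload. It assumes decimal terabytes, a fully saturated uplink running around the clock, and no protocol overhead, so the real figure would be worse.

```python
def upload_days(size_tb, uplink_mbps):
    """Days needed to push size_tb terabytes through an uplink_mbps link."""
    bits = size_tb * 1e12 * 8             # decimal terabytes to bits
    seconds = bits / (uplink_mbps * 1e6)  # megabits per second to bits per second
    return seconds / 86_400


days = upload_days(50, 5)
print(f"{days:.0f} days (~{days / 365:.1f} years)")
```

That works out to roughly two and a half years of continuous uploading, which is why shipping drives matters here.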
I ask because I tried Backblaze a while back, and uploads from the UK were very slow.
> I ask because I tried Backblaze a while back, and uploads from the UK were very slow.
Curiously from here in Japan, I've managed to clock 80 MBit/s backing up to Backblaze. I presume it all has to do with what kind of international peering your ISP has.
What's the setup for file permissions? Can I have multiple people writing to the same bucket? Can I restrict deletion & write rules?
We're actively looking for feedback in this area, so as developers ask us for something like Amazon's IAM (AWS Identity and Access Management) we'll be filling that functionality out. Hopefully without adding too much complexity to the simple model we have now.
Personally I'd like to use some access management, and there's one case that I've not seen solved particularly well (though would appreciate anyone chiming in with things I've missed):
Distinct write and create permissions.
I'd like to be able to grant someone permission to create files but not allow them to modify or delete them later. I end up generally adding this externally.
I think B2 is really close to this, as you've got the file ids for multiple versions, so I can effectively ignore the filenames and use the file ids instead. It'd need a difference between "upload new version" and "delete version" though.
I don’t know whether or not Arq will be integrated with B2.