The Cloud’s Software: A Look Inside Backblaze (backblaze.com)
82 points by ingve 44 days ago | 35 comments

The fact that Backblaze's entire infrastructure exists in a single building is a little worrying.

Are there any plans to expand to other datacenters? What happens if there is a disaster there? Is everything gone? Does Backblaze back up its own data?

Yev from Backblaze here -> Yes, we do plan on expanding to more datacenters, and we do have emergency plans in place, though we choose our datacenters carefully to avoid natural-disaster-prone areas. As for backing up our own data, we certainly do make backups of our core info/necessary data. The user data we store is backed up across the storage pods in a vault, as discussed in that post. We do not replicate customer data across multiple datacenters; at our price point, that's just not feasible.

Is there thought being given to expose such an option to the end user at additional cost?

Say I wanted to make sure my data was stored in two regions, and was OK with a slight increase in cost to accomplish that.

Sure, we consider it every now and again but for now we like "clean" pricing - meaning that it's the same across the board. Of course that might change in the future, but we try to strive for simplicity, so having different "tiers" goes somewhat counter to what we've done thus far.

Long-time Backblaze user and fan here. Currently backing up three computers. Love your service.

However, for my main dev computer, I would totally pay another US$5 to replicate it to another location. I don't think that needs to complicate the pricing much: you'd essentially just be paying for another backup, except it's the same computer stored in a different location.

Another question, any plans for coming to Linux?

Hey there! Thanks for using us :D Icefo is correct, we don't currently have plans for Linux support - but we do have Backblaze B2. While not unlimited, it gives a lot more flexibility, and may end up being less expensive, depending on how much data you're storing -> backblaze.com/b2

As far as I know, they haven't released an official Linux client, but they did start a new service: B2 Cloud Storage. There is an API and lots of third-party clients. The only downside is that there is no $5 unlimited plan.

I think at that point you could look into rsync.net

I don't see that as a problem. If you take a backup CD home from the office, it's also stored in a single location. The chance of your house burning down at the same time your office computer gets hacked is low enough to not worry about.

Even if Backblaze replicated your backup across different locations, that only protects against their data center being destroyed. You would still lose your data if the app silently stopped working or your account got deleted across all data centers, etc.

If you really want multiple locations, then I think you should use a 2nd backup service entirely to remove that single point of failure (Backblaze itself).

This works for backups, but this makes me more concerned about their B2 offering. I feel like B2 data should at least have the option of being replicated to another datacenter.

> What happens if there is a disaster there?

While that's unlikely, there are other threats. You could have a disgruntled or mentally unbalanced employee. Putting all your eggs in one basket (whatever the basket and however robust it is) is always a potential problem.

To me, the way to deal with this threat would be to spread backups over two vendors, even if Backblaze decides to open other data centers as Yev indicated in his comment.

> You could have a disgruntled or mentally unbalanced employee.

The recent S3 outage was caused by a typo. You can plan around natural disasters and all sorts of things, but the vast majority of the time, data loss is caused by humans. Diversify where that data lives. We would never recommend that you should ONLY keep your backup in Backblaze. We should be part of your solution.

I think that if you do open other data centers, you should offer that redundancy as an optional extra in your pricing. There could be negative marketing implications depending on the exact cost or how it's worded, but I don't think it's an unreasonable approach. Not everyone needs that diversity, and since it would presumably raise your costs, it makes more sense not to make all customers pay for something they don't need because they already have other plan Bs. You might already be the plan C for some of your customers.

> We would never recommend that you should ONLY keep your backup in Backblaze

I am guessing (I haven't checked thoroughly, so I could be wrong) that your marketing materials don't make this obvious. No issue with that, but I think many non-business users might not think of it this way.

We do try to emphasize a 3-2-1 backup strategy through newsletters and blog posts. A few examples: https://www.backblaze.com/blog/the-3-2-1-backup-strategy/ and more recently https://www.backblaze.com/blog/backing-up-for-small-business.... We're not trying to hide anything, and truthfully, plenty of folks do use us as their primary or only backup, and it works just fine. Most businesses already have a multi-part backup strategy in place.

As for charging for geo-redundancy, it's something we've considered, but at least for now we like "clean" pricing where it's the same across the board. That might change in the future, but there's nothing to report now.

That was a service outage which is a far, far cry from a data loss.

True - my point was more that accidents happen and often are caused by people. In this case there may not have been data loss, but monetary impact of sites going down was probably not insignificant.


I think you are confusing S3 and GitLab right now.

Yeah, you're right. I replied to this right after reading other comments about GitLab. Sorry, and thanks.

I must say, I started using Backblaze's S3 competitor, B2, just this week and I'm very impressed. The API is easy to understand yet robust enough to handle hard edge cases, and the interface is well designed. The sample code works without a long chain of dependencies, and overall it feels like it's working with me rather than making me fight it to do what I want.

Too early to speak to the uptime or support, but I'm pretty optimistic given what I've seen thus far.
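For anyone curious what B2's native API involves, here is a minimal sketch of the headers an upload request carries. It assumes the documented flow (call `b2_authorize_account`, then `b2_get_upload_url`, then POST the file body to the returned upload URL using the returned upload token); the helper name `b2_upload_headers` is hypothetical, and it only builds headers rather than making any network calls, so check the current B2 docs before relying on it.

```python
import hashlib
from urllib.parse import quote


def b2_upload_headers(upload_auth_token: str, file_name: str, body: bytes,
                      content_type: str = "b2/x-auto") -> dict:
    """Build headers for a B2 native-API file upload (hypothetical helper).

    upload_auth_token: the token returned by b2_get_upload_url.
    file_name: the object name; B2 expects it percent-encoded ("/" kept as-is).
    content_type: "b2/x-auto" asks B2 to guess the MIME type from the name.
    """
    return {
        "Authorization": upload_auth_token,
        "X-Bz-File-Name": quote(file_name, safe="/"),
        "Content-Type": content_type,
        "Content-Length": str(len(body)),
        # B2 verifies the upload against this hex SHA-1 of the body.
        "X-Bz-Content-Sha1": hashlib.sha1(body).hexdigest(),
    }
```

The SHA-1 header is what lets the service reject a corrupted upload, which is one of the "hard edge cases" a backup tool cares about.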

Yev from Backblaze here -> Great to hear, glad you're enjoying it! Hopefully you won't need to use support too much :D

I love B2 as well. It's about 25% of the cost of S3! I use it for weekly MySQL server backups run from cron.

I love that Backblaze has added the ability to back up a server or NAS for $5 per TB per month. I just convinced my friend at a media company to replicate their 30 TB backup server offsite to Backblaze, and the cost is only $150 per month. That's crazy cheap compared to colocating your own offsite server and drives.
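The arithmetic behind that figure is just capacity times the flat rate quoted above. A minimal sketch, assuming storage is the only charge (any download or transaction fees are ignored, and the function name is hypothetical):

```python
def monthly_storage_cost_usd(terabytes: float, usd_per_tb_month: float = 5.0) -> float:
    """Storage-only cost at a flat per-TB monthly rate (hypothetical helper)."""
    return terabytes * usd_per_tb_month


# 30 TB at $5/TB/month, per month and per year
monthly = monthly_storage_cost_usd(30)
yearly = monthly * 12
```

For the 30 TB server that works out to $150 per month, or $1,800 per year before any other charges.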

Why not Amazon Drive?

I really love Backblaze's yearly Hard Drive Failure Reports.


Each one is a very thorough post with a good set of data to generate statistics from.
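The headline statistic in those reports is an annualized failure rate computed from drive days (one drive running for one day) rather than from a raw drive count, so drives added or retired mid-period are weighted fairly. A minimal sketch of that formula as I understand it from the published reports; the function name and the example numbers in the test are made up for illustration:

```python
def annualized_failure_rate(drive_failures: int, drive_days: int) -> float:
    """AFR in percent: failures per drive-year of operation, times 100."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365  # convert accumulated drive days to drive-years
    return drive_failures / drive_years * 100
```

For example, 1,000 drives running a full year accumulate 365,000 drive days, so 20 failures over that period gives an AFR of 2%.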

Yev from Backblaze here -> Glad you like it! I'll keep giving the writers compliments, they love those things :D

It would be very interesting to use this as a backup for all of my machines. Does Backblaze have an rsync-compatible endpoint?

I use Linux on all of my machines, so I couldn't use the personal backup setup that they have.

Yev from Backblaze here -> We do have some B2 integrators with rsync capabilities. Take a look at B2 Fuse -> https://github.com/sondree/b2_fuse

While I'd obviously listen to Yev first over me, I was also looking into my options for B2 from Linux today and found rclone, which looks nice.


> While I'd obviously listen to Yev first over me

LOL, that is NOT the prevailing wisdom around the office :P We do have an integrations page, I just pulled the first one I could think of. More info here -> https://www.backblaze.com/b2/integrations.html

Excellent read, thank you for sharing!

Just curious: was any consideration given to existing storage solutions like Ceph when you were evaluating writing your own erasure coding system?

It seems like it may have been a good fit at one point, and might have made it easier to expose an S3-like API if you wanted to down the road.
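For readers unfamiliar with erasure coding, here is a toy sketch of the idea using a single XOR parity shard: split the data into k shards, add one parity shard, and rebuild any one lost shard from the survivors. This is deliberately not Backblaze's scheme; as I understand Backblaze's own writing on Vaults, they use Reed-Solomon coding, which can survive several lost shards at once. The function names here are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size shards (zero-padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\x00")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing shard (marked None) by XOR-ing all survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only rebuild one lost shard")
    shards = list(shards)
    if missing:
        survivors = [s for s in shards if s is not None]
        shards[missing[0]] = reduce(xor_bytes, survivors)
    return shards
```

To decode, join the first k shards and truncate to the original length (which a real system would store alongside the shards, since the zero padding is otherwise ambiguous).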

> S3-like API if you wanted to down the road.

We already have one -> https://www.backblaze.com/b2/cloud-storage.html! Vaults were one of the reasons we could build out our B2 Cloud Storage service. We considered a lot of options, but rolling our own fit our culture :)

I think by "S3-like API", OP likely meant an actual S3-compatible API such as that provided by other competitors (off the top of my head, Google Cloud Storage).

Ah - that makes sense. Our devs are always looking at ways to make us more viable for folks - so they are aware :D

How do you handle JVM pauses due to GC and the JIT kicking in? Or are you just able to accept the occasional millisecond-scale delay?

I work on very-low-latency products (sub-sub-msec) and we have only been able to mitigate this, never fully solve it.
