
Backblaze B2 Cloud Storage Now Has S3 Compatible APIs
https://www.backblaze.com/blog/backblaze-b2-s3-compatible-api/
======
ksec
Backblaze is also a founding member of the Bandwidth Alliance, meaning serving
those B2 files via Cloudflare is essentially free.

So you are only paying for storage. (Correct me if I am wrong on this one.)

I wonder why _ALL_ the non-hyperscale cloud vendors, like Linode and DO, don't
provide one-click third-party backup to B2. You should always store an offsite
backup somewhere. And B2 is perfect for that.

~~~
Hamuko
How does that work? Can I just dump two terabytes of video into Backblaze B2,
set up a Cloudflare account and have people watch those videos with it costing
me only $10 a month? Because that doesn't sound right.

~~~
heipei
The Cloudflare ToS explicitly exclude that use-case.

 _2.8 Limitation on Serving Non-HTML Content The Service is offered primarily
as a platform to cache and serve web pages and websites. Unless explicitly
included as a part of a Paid Service purchased by you, you agree to use the
Service solely for the purpose of serving web pages as viewed through a web
browser or other functionally equivalent applications and rendering Hypertext
Markup Language (HTML) or other functional equivalents. Use of the Service for
serving video (unless purchased separately as a Paid Service) or a
disproportionate percentage of pictures, audio files, or other non-HTML
content, is prohibited._

~~~
mikehearn
In a previous HN post [1] where someone wrote up how to use B2 as an image
host, the Cloudflare CEO chimed in and addressed rule 2.8 specifically, saying
if Cloudflare workers are used for URL prettifying and redirecting, a
different ToS is applied and that use-case would be fine.

Does that mean your video use-case would also be fine? I have no idea. An HN
comment from the CEO doesn't seem like it would hold up if Cloudflare suddenly
shut down your free account.

I'd love for Cloudflare to officially clarify the limits of the Cloudflare/B2
alliance in terms of external traffic. The confounding issue here is that B2,
as a storage service, is not really intended for "serving web pages and
websites" (it's for larger files, binaries, etc.) and therefore any traffic
from B2 going through Cloudflare is sort of de facto in violation of 2.8.

[1]:
[https://news.ycombinator.com/item?id=20790857](https://news.ycombinator.com/item?id=20790857)

~~~
brianwski
Disclaimer: I work at Backblaze so I'm biased. :-)

> B2, as a storage service, is not really intended for "serving web pages and
> websites" (it's for larger files, binaries, etc.)

It might be missing a couple of features (which is a pet peeve of mine) but we
SURELY intend for it to be used for serving web pages. That's one of the
largest differences between "Backblaze Personal Backup" (our original product
line) and Backblaze B2. The largest part of the redesign/refit when we
originally did B2 was around the concept of what we call "Friendly URLs" (web
page names, folder names) instead of the ugly 82-character hexadecimal file
names that Backblaze Personal Backup stores all your files under.

For full disclosure, Backblaze B2 isn't a great "hosting" solution for
something like WordPress because we lack two or three things, one of which is
comically easy to fix and I keep trying to convince everyone to do it. The
issue is that URLs ending in a "/" (trailing slash) basically need to "guess"
that what follows is an "index.html" or "index.php" or whatever. So the URL:
[https://f001.backblazeb2.com/file/ski-
epic-c/full/2015_scotl...](https://f001.backblazeb2.com/file/ski-
epic-c/full/2015_scotland_will_macdonald_birthday_in_duns_castle/) does not
work, but the URL: [https://f001.backblazeb2.com/file/ski-
epic-c/full/2015_scotl...](https://f001.backblazeb2.com/file/ski-
epic-c/full/2015_scotland_will_macdonald_birthday_in_duns_castle/index.html)
does work. All modern web servers do this automatic filling-in of the
"index.html", but it is missing from Backblaze B2 currently. And it would take
just a day or two for one of our developers to fix it. And dang it, I'm going
to get it done one of these days.

~~~
john-shaffer
S3's use of separate servers for website hosting is actually very sensible.
Options like usage of index.html and error.html only apply on the website
servers and won't cause any surprises for people using the service as a key-
object store.

That said, I would absolutely not consider using B2 without support for
index.html, error.html, and Website-Redirect-Location.

~~~
erichocean
You can trivially use Cloudflare Workers to implement that functionality on
top of B2.

------
waffle_ss
Just a year ago B2 couldn't do server-side file copying.[1] If you wanted to
rename or move a file you had to re-upload the whole thing (not great for
large multi-gigabyte files)! That ruled them out of consideration for storing
my personal backups.

Glad to see they've since fixed that, and with this update are clearly
continuing to improve ergonomics. I'll have to give B2 a fresh look.

[1]:
[https://github.com/Backblaze/B2_Command_Line_Tool/issues/175](https://github.com/Backblaze/B2_Command_Line_Tool/issues/175)

~~~
atYevP
Yev here -> thanks! We're constantly working on making the platform better,
and copy-file was definitely a widely requested feature! That plus S3
Compatibility, for folks who wanted to integrate with B2 Cloud Storage but
didn't have the resources to write code to our B2 Native API.

~~~
rarrrrrr
I've been using B2 to disrupt a bunch of ugly & entrenched vendors in the
price sensitive K12 market. Thanks for building it. :)

~~~
atYevP
Ah that's awesome! I'd love to know more! How are you using B2 in general, and
does this make it easier for you? Feel free to leave a note here, or you can
send it to: b2feedback@backblaze.com!

~~~
ajinkyapatil
Are there any plans to host public datasets, like AWS PDS?

------
nielsole
Here is the reasoning for why they didn't have "S3 compatibility" before:
[https://www.backblaze.com/blog/design-thinking-b2-apis-
the-h...](https://www.backblaze.com/blog/design-thinking-b2-apis-the-hidden-
costs-of-s3-compatibility/)

~~~
tambre
>It requires Amazon to have a massive and expensive choke point in their
network: load balancers. When a customer tries to upload to S3, she is given a
single upload URL to use. For instance, s3.amazonaws.com/<bucketname>.

Now that Amazon has deprecated the single URL version and replaced it with
region-specific URLs (e.g. s3.dualstack.us-east-1.amazonaws.com) and tooling
has been mostly updated, this huge reason for not supporting the S3 API is
gone.

~~~
jjeaff
Even though they are using region-specific URLs, they would still have to load
balance all of that traffic. Backblaze avoided this by having a two-part
request for a file. You would make a request to a centralized URL, and that
would return a URL that connected you directly to a server that had that file.

~~~
yjftsjthsd-h
> You would make a request to a centralized URL, and that would return a URL
> that connected you directly to a server that had that file.

Heh, kinda like how FTP worked. That's funny to see again.

------
christophilus
Swank. One of the reasons I'm not using Backblaze is that I couldn't find a
way to generate a private URL which allowed secure upload from the browser. It
only allowed (so far as I can tell) a private URL that had access to an entire
bucket. If they've got an S3 compatibility layer now, this problem is solved.
I'm gonna invest some time on this tomorrow.
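Assuming B2's S3 layer behaves like Amazon's here, a presigned URL scoped to a
single key should cover exactly this. A minimal sketch with boto3 (the
endpoint, bucket, and key below are hypothetical placeholders):

    # Sketch: generate a presigned PUT URL for one object, so a browser
    # can upload directly without holding bucket-wide credentials.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.us-west-002.backblazeb2.com",
        aws_access_key_id="<keyID>",
        aws_secret_access_key="<applicationKey>",
    )

    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "my-bucket", "Key": "uploads/photo.jpg"},
        ExpiresIn=3600,  # URL is valid for one hour
    )
    # Hand `url` to the browser; the client PUTs the file body to it directly.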

------
jedberg
This is huge because it means you can use things like S3 Fuse to mount your
storage. Which means you can use it to extend your local disk, or run your own
backups, or whatever.

Amusingly the price to store 1.2TB of data is the same as the cost of their
backup plan, so if your disk is smaller than that, you could save a few bucks
running your own backups. Until you have to restore (from what I can tell
restores are free on their backup plans but would cost money on the S3 plan).

~~~
DavideNL
> Which means you can use it to .... or run your own backups

You could, but if I read correctly (s3fs-fuse limitations): _"random writes
or appends to files require rewriting the entire file"_.

So changing 1 bit of a 10GB file means re-uploading 10GB.

[https://github.com/s3fs-fuse/s3fs-fuse#limitations](https://github.com/s3fs-
fuse/s3fs-fuse#limitations)

~~~
gaul
This changed in 1.86 and I updated the README as follows:

> random writes or appends to files require rewriting the entire object,
> optimized with multi-part upload copy

Now changing one bit means re-uploading 5 MB, the minimum S3 part size.
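For anyone curious what that optimization looks like through the S3 API:
roughly, the unchanged ranges are copied server-side with UploadPartCopy and
only the modified part is re-uploaded. A sketch, assuming the provider
supports ranged part copies (which, as the next comment notes, B2 may not;
endpoint, bucket, and key are placeholders):

    # Sketch of the multipart-copy optimization: server-side copy the
    # unchanged 5 MiB range, upload only the modified 5 MiB slice.
    import boto3

    s3 = boto3.client("s3", endpoint_url="https://s3.us-west-002.backblazeb2.com")
    modified = b"\x00" * (5 * 1024 * 1024)  # stand-in for the rewritten slice

    mpu = s3.create_multipart_upload(Bucket="my-bucket", Key="big.bin")
    uid = mpu["UploadId"]

    copied = s3.upload_part_copy(
        Bucket="my-bucket", Key="big.bin", PartNumber=1, UploadId=uid,
        CopySource={"Bucket": "my-bucket", "Key": "big.bin"},
        CopySourceRange="bytes=0-5242879",  # first 5 MiB, unchanged
    )
    uploaded = s3.upload_part(
        Bucket="my-bucket", Key="big.bin", PartNumber=2, UploadId=uid,
        Body=modified,
    )

    s3.complete_multipart_upload(
        Bucket="my-bucket", Key="big.bin", UploadId=uid,
        MultipartUpload={"Parts": [
            {"PartNumber": 1, "ETag": copied["CopyPartResult"]["ETag"]},
            {"PartNumber": 2, "ETag": uploaded["ETag"]},
        ]},
    )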

~~~
tgtweak
Only if the Backblaze implementation supports put byte range... Not supported
by default.

------
bithavoc
I migrated a client from Cloudinary ($1k+/mo) to B2, a Go+ImageMagick program
running in DigitalOcean, and Cloudflare CDN, for a total of $60/mo. It's been
running for two years now; B2 has been incredibly reliable.

~~~
atYevP
Yev here -> That's awesome to hear! Glad we can make things more affordable
for you and that it's working great!

------
hemancuso
As a developer that supports B2 (I write ExpanDrive) I think it’s great that
they are moving on from an API that doesn’t expose any extra value.

That being said, I wish B2 performance was better. Throughput is dramatically
slower than S3.

~~~
eyegor
What region are you moving from/to? Last I checked, b2 only exists in
datacenters on the US west coast.

~~~
budmang
(backblaze ceo here) We also have a region in Europe:
[https://www.backblaze.com/blog/announcing-our-first-
european...](https://www.backblaze.com/blog/announcing-our-first-european-
data-center/)

~~~
karambir
Please consider an Asia/Pacific data center. I am from India and my company
was not able to use B2 due to high response times, even from the European DC.
Even a DC in Singapore would be helpful for us.

\- Thankful Personal Backup Customer

~~~
tgtweak
Bandwidth in asiapac is very expensive for non-incumbents and India is no
exception.

I find it bizarre how in India you can get 100GB of LTE for a few dollars but
CDN bandwidth can cost content providers more than that - which is absurd.

~~~
karambir
Mobile broadband is witnessing intense competition to grab customers as
millions of rural Indians come online. This started with a petrochemical
billionaire starting the Jio network and giving away free unlimited 4G data
for a year (his company has 300 million subscribers now).

Already 4 networks have exited the market, and the 3rd and 4th largest
networks (Vodafone and Idea) have merged due to a cash crunch. Airtel
(previously the largest) has been raising outside money in hopes that it can
survive the low prices. So there are only 4 networks remaining. Only recently
have they started increasing prices.

That billionaire is also going into fibre (he purchased the infrastructure of
his brother's bankrupt company); maybe we'll see that competition extend to
DCs and interconnects.

------
ing33k
Used B2 heavily until recently as an origin server for a CDN. A few weeks ago
we saw a spike in 502 / 504 responses.

When I contacted their customer support, I was pointed to the following URL
where they explain in detail how they handle these errors:
[https://www.backblaze.com/blog/b2-503-500-server-error/](https://www.backblaze.com/blog/b2-503-500-server-error/)

Essentially these are not considered errors, and the client is expected to
retry loading the file. This approach won't work in our use case.

~~~
TickleSteve
So, you're relying on the API being 100% reliable? no errors?

~~~
ing33k
I'm not expecting 100% reliability.

But when we get a 503 response, it should be considered an error and
acknowledged by the provider as an error.

In my use case, we were using a CDN which was configured to pull files from
B2. When B2 responds with a 503/500 I have no control over the retry
mechanism.

The error rate was around 5-10%.

------
avolcano
Huh, I thought it already had this! Must have mixed it up with a different
object storage service (maybe DigitalOcean?).

I've been using B2 for backup storage for some personal projects. It doesn't
necessarily do anything "better" than S3 from what I've seen, but never having
to log into AWS's dashboard is reward enough on its own.

They do have a command-line client that's a quick PIP install, so you can do
something like:

    
    
      b2 upload-file bucket-name /path/to/file remote-filename
    

Which is, of course, nice for backups.

~~~
neurostimulant
I really wish the B2 client supported uploading files from a Unix pipe. It
would be nice to be able to archive a huge directory into a tar.bz2 archive
and directly pipe the result into the B2 client without having to save the
archive to disk first.

Currently I have to save the tar.bz2 archive to disk before uploading to
Backblaze. That takes several hours (huge spinning-disk array, not as fast as
SSD), while uploading to B2 is blazing fast. Saving the archive to a RAM drive
essentially solved this, but as the data grows I don't have enough memory to
spare anymore for a RAM drive that can fit the whole archive.

~~~
jorams
Can you use process substitution?

    
    
        b2 upload_file bucket <(tar -cj huge-directory) archive.tar.bz2
    

The argument the command sees will be something like "/dev/fd/42", and the
shell will provide the output of tar through that file.

~~~
neurostimulant
Does process substitution actually write the content to disk first or not? The
information I found on the internet seems to be conflicting on this. If it's
actually writing the data to disk first, then it probably won't solve my
problem (limited disk I/O). AFAIK writing to a pipe won't result in saving the
data to disk temporarily. I guess the only way to know is to try it out on my
system and see how it performs.

~~~
Ineentho
If process substitution doesn't work, shouldn't /dev/stdin work? I haven't
tried it, but as long as b2 doesn't try to check the file size before
uploading I don't see why it wouldn't work:

    b2 upload_file bucket /dev/stdin < file

------
jszymborski
Ooo, even more reason to set up a NextCloud instance now! Previously, it
wasn't really practical to set up B2 as external storage because you'd also
need to set up a compat layer.

~~~
christefano
I set this up yesterday, and it was a breeze.

Just had to be sure to omit the B2 external storage folder from the backups on
my Nextcloud server.

Now if only Virtualmin (YC ‘08) supported virtual server backups to
S3-compatible B2 cloud storage… There’s an open ticket for this at
[https://www.virtualmin.com/node/65024](https://www.virtualmin.com/node/65024)

------
mgamache
S3 is now the standard for cloud storage APIs? Not sure if that's good or bad.
I guess competitors have to reduce switching costs.

~~~
atYevP
Yev here -> It's not so much a standard, though S3 is generally the most often
used suite of APIs. 100s of integrations exist with our B2 Native APIs
([https://www.backblaze.com/b2/integrations.html](https://www.backblaze.com/b2/integrations.html)),
but a lot of folks only know how to write to S3 Compatible APIs and don't have
the resources to write to multiple API suites, so this makes integration
easier for them!

~~~
mgamache
That's how something becomes a standard... You are just responding to market
realities.

------
heipei
Now that we're talking about B2: is anyone using (or has anyone used) them for
latency-sensitive small-file object storage? I'm about to take the plunge and
set up benchmarks. My use-case is that I want to store and serve ~500k small
files (30 B to 1 MB) per day to website visitors. So far B2 support has told
me that it shouldn't be a problem, and early benchmarking indicates the same;
just curious if anyone has stories from the trenches.

~~~
willcodeforfoo
We use B2 to store images on Vintage Aerial
([https://vintageaerial.com](https://vintageaerial.com)), both high res scans
and all kinds of thumbnail sizes.

It is... a little slower than I'd like, but with Cloudfront in front it has
been manageable. I'd love tips from Backblaze on how to increase performance
there beyond caching to CF.

~~~
heipei
Can you go into detail what you mean by "slower than I'd like"? Are you
talking about TTFB (Time-To-First-Byte) or sustained read or concurrent read
performance? Are you using the API or the HTTP endpoints from a public bucket?

------
rkrzr
Does Backblaze offer strong consistency for files?

The killer feature of Google Cloud Storage in my eyes is its ability to be
strongly consistent, if you set the right HTTP headers. This is not possible
for Amazon S3, which is always eventually consistent and makes it unusable for
many use cases where you need to be able to guarantee that customers will
always see the newest version of a file.

~~~
nilayp
Nilay from Backblaze here.

Yes - B2 is strongly consistent. When you upload an object using either the B2
Native or S3 API - the object is persisted to the final resting place before
the upload completes. Therefore, you can list/download the file immediately
after your upload completes.
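In other words, a plain read-after-write check should never see a stale miss.
A trivial sketch (endpoint and bucket are hypothetical placeholders):

    # Upload an object and immediately read it back; with strong
    # consistency the GET returns the new bytes, never a stale 404.
    import boto3

    s3 = boto3.client("s3", endpoint_url="https://s3.us-west-002.backblazeb2.com")
    s3.put_object(Bucket="my-bucket", Key="hello.txt", Body=b"v1")
    body = s3.get_object(Bucket="my-bucket", Key="hello.txt")["Body"].read()
    assert body == b"v1"  # holds as soon as the put returns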

------
kstrauser
As a Synology user, _please_ let this mean that Hyper Backup can work with B2
now (or at least soon).

~~~
kevstev
I was trying to see if this is now better than Glacier. Aside from the SLAs
being much better in terms of retrieval, I am not sure they make sense for a
backup use case, where you are only really planning on downloading that data
back down in a worst-case scenario. It may depend on what your incremental
backups look like as well; mine are negligible: a dump of a few GB of photos
after holidays, other records are tiny.

Glacier pricing in us-east is $0.004/GB/month vs $0.005 for B2. There is
always pricing obfuscation with cloud, but AFAICT there is no need to move off
Glacier for a backup use-case.

~~~
cdumler
My two cents is that there is no reason to _use_ Glacier as a backup strategy.
Glacier's costs come from restoration: the more you restore and the faster you
want to restore it, the more quickly costs rise. It's far better suited for a
collection where you're pretty sure most of it will never be restored, but you
don't know which parts you'll need. Think video, art, music assets for
projects. B2's retrieval is far, far lower cost and completely immediate for
restoring an entire backup back to a server. If you're not careful, that extra
.5 cents you save will really cost you on a full restore.

~~~
Dylan16807
If you can wait a few hours, the better comparison is probably Glacier Deep
Archive, which is not $4/TB/month but $1/TB/month.

Amazon wants to charge you $90/TB* to get data to the outside world, compared
to B2's $10, but you can mitigate it in various ways. At the low end that's
using a Lightsail instance as a VPN, depending on whether you think the TOS
allows that. At the high end it's paying flexify.io $40 to move your data to
B2, then paying B2 $10.

There might be other ways to improve S3 egress costs. It's a very hard thing
to search for. I only learned about flexify from this post.

So if you have to restore less than half of your data each year, Glacier Deep
Archive will save you money. It's worth considering, unlike normal Glacier,
which is almost entirely downside.

* There's also a $2.50/TB fee to get things out of Glacier, but that's dwarfed by the other costs.
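To make the break-even explicit, here's a rough sketch using the per-TB
numbers above ($1 vs $5 per TB/month storage, $90 vs $10 per TB egress; the
retrieval fee in the footnote is ignored):

    # Yearly cost per TB stored, as a function of how much you restore.
    def yearly_cost(storage_per_tb_month, egress_per_tb, tb_stored, tb_restored):
        return 12 * storage_per_tb_month * tb_stored + egress_per_tb * tb_restored

    print(yearly_cost(1, 90, 1, 0.5))  # Deep Archive, restore half: 57.0
    print(yearly_cost(5, 10, 1, 0.5))  # B2, restore half: 65.0
    # Break-even: 12*1 + 90*x = 12*5 + 10*x  =>  x = 0.6 TB restored/year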

~~~
kevstev
Thanks for making me aware of this. I missed the Deep Archive announcement
last year, it seems. Glacier is already cost-effective as is, but this will
make it even cheaper!

I use this as an offsite backup that, as long as disaster does not strike, I
will never use; and even if it does, I can be patient about restoring.

------
WalterBAmaQ
Great. How about rsync.net-compatible (i.e. bog-standard, vendor-neutral)
"APIs"?

~~~
atYevP
Yev here -> Anyone can write to the B2 Native API or our S3 Compatible API, we
have tons of integrations that do it, here's a list ->
[https://www.backblaze.com/b2/integrations.html](https://www.backblaze.com/b2/integrations.html)

~~~
solarkraft
I don't think that's what GP meant. Why not use a standard protocol like SFTP?

~~~
amiga-workbench
I don't see how you could cram all of S3's functionality into SFTP. How would
you configure a lifecycle policy for a file, for example? Or generate a signed
URL?

It seems to me you would only get a very narrow subset of the functionality.

~~~
solarkraft
I see, I suppose there wasn't a free standard before and S3's API just
unofficially became one.

------
unilynx
I've looked at B2 from time to time, but doing database blob storage over S3
or to disk, and backing up the database and files over rsync, made us stick
with our existing technology (e.g. TransIP cloud storage, which also charges
10 EUR/month per 2TB). One thing we didn't look forward to was having to
reimplement cataloging and garbage collection for all of disk, S3 and B2, so
we just stuck with an rsync hardlinking solution (which makes incremental
backups painless).

Having access to primary storage and cheap backup storage using the same S3
API will make us reconsider that, and will probably make it worth the effort
to dump our rsync-based solution for B2.

------
numbsafari
I would absolutely love to replace my use of S3 with B2 as a backup for data
stored elsewhere. Personally, I would much rather this storage go to a service
that only does storage, rather than everything else that AWS does, so I don't
have to worry about anything strange happening in a cloud service I don't use
every day.

When they first launched B2, I inquired about the ability to enter into a BAA
(Business Associates Agreement) for HIPAA compliance and was told that it
wasn't "on the roadmap". It sounds like B2 has come a long way on the
compliance side. It would be great if they were open to this.

~~~
atYevP
Yev here -> Double good news for you this morning: we're now signing BAAs for
B2 Cloud Storage ;-) Just contact sales and they can get you sorted out!

~~~
numbsafari
That's great to hear! I'll definitely be reaching out.

------
hartator
Actually excited by this. I was benchmarking S3 vs B2 vs others 2 years ago
and I had to give up on B2 because implementing it performantly was so much
more difficult (88 lines in Ruby vs 36 lines for all the others).

~~~
simplyinfinity
So how many times a month do you have to implement this to be reasonable
compared to the cost of the storage?

~~~
hartator
This is not an implementation cost issue.

It was just super hard to make the code perform well. Like you have to manage
client sessions on your side and chose optimizations on your side. Like you
have to spread things manually. Which is hard to do. Whereas S3 is maximising
your bandwidth with no custom code required. It's not really S3 compatibility
that was needed but B2 API wasn't good.

~~~
prirun
HashBackup (author here) was one of the 1st if not the first B2 integration. I
didn't find the B2 API any more difficult to use than the S3 API. It has the
same functionality with similar kinds of API requests. The only significant
difference is that you have to request an upload URL and download URL, and
requests can sometimes return a code to get a new URL when a vault is full or
overloaded.

There is a price/performance trade-off: B2 has higher request latency than S3,
no matter where you are (my experience), but they also are 5x cheaper on
storage costs, 10x cheaper on download bandwidth, and have no price gimmicks
like minimum object sizes or minimum object lifetimes like many other services
(S3 IA for example).

To make up for B2's request latency it is more important to issue requests
from multiple threads, especially for short-running requests like removing
files.

Another key difference is that B2 always uses SSL whereas S3 can be accessed
without SSL with little security impact because each S3 request is
individually signed with a secret key. Setting up an SSL connection is more
overhead, so another key to performance is to reuse connections.

Both of these suggestions apply to S3 as well, just more to B2 because of the
latency difference.
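To make both suggestions concrete, here's a rough sketch of bulk deletes with
one shared boto3 client, so worker threads reuse pooled TLS connections
(endpoint, bucket, and key names are hypothetical placeholders):

    # Issue short-running requests (deletes) from multiple threads while
    # sharing one client, so pooled TLS connections get reused.
    from concurrent.futures import ThreadPoolExecutor

    import boto3
    from botocore.config import Config

    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.us-west-002.backblazeb2.com",
        config=Config(max_pool_connections=16),  # one pooled connection per worker
    )

    keys = [f"backup/chunk-{i:05d}" for i in range(1000)]

    def delete(key):
        s3.delete_object(Bucket="my-bucket", Key=key)

    with ThreadPoolExecutor(max_workers=16) as pool:
        list(pool.map(delete, keys))  # drain the iterator to surface exceptions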

------
sida
Can Amazon actually copyright their API (per the Google vs. Oracle case)?
Basically, prevent other vendors from providing S3 APIs so that Amazon can
lock in users?

I am not a lawyer. So this is a genuine / dumb question.

~~~
brianwski
Disclaimer: I work at Backblaze. I'm also not a lawyer. :-)

> Can Amazon actually copyright their API (per the Google vs. Oracle case)?
> Basically, prevent other vendors from providing S3 APIs so that Amazon can
> lock in users?

Most likely yes. Backblaze plans going forward are to fully, uncompromisingly
maintain our original native B2 APIs for a few reasons including this concern.

It's probably up to Amazon whether they want to boot all 3rd parties off their
S3 API. Backblaze has a viable fallback if that occurs. I hope for customers'
sake Amazon doesn't declare war in that fashion.

If Amazon decides on this path, internally at Backblaze we have discussed
immediately doing the opposite - declaring for all of time anybody can copy
our B2 APIs. Remember, our APIs are technically superior to the S3 APIs. They
are lower cost to implement, and are shockingly easier to use for developers.
They don't make all the mistakes S3 made. We had the luxury of learning from
all their mistakes over the years. :-)

~~~
sida
It is kind of scary that companies can copyright an interface.

So Google Cloud is presumably also sitting on this potential legal time bomb?

Amazon can sue you and retroactively force you to pay them, right? So all they
need to do is wait for the alternatives to become popular.

~~~
swyx
that's very short-term thinking though. you win the battle but lose the war by
being so partner-hostile. amazon has thousands of partners pay to join it at
re:invent for a reason.

------
IvanK_net
It reminds me of a moment three years ago, when I asked Dropbox to make their
API similar to Google Drive's, as they basically provide the same service.
[https://github.com/dropbox/dropbox-api-
spec/issues/3#issueco...](https://github.com/dropbox/dropbox-api-
spec/issues/3#issuecomment-320685313)

It is just awful to see how everyone tries to reinvent the wheel rather than
be compatible with anyone else.

~~~
tyingq
Fear of lawsuits related to copying APIs may also be a factor.

See
[https://en.wikipedia.org/wiki/Google_v._Oracle_America](https://en.wikipedia.org/wiki/Google_v._Oracle_America)

~~~
throw_away
Backblaze is clearly violating Oracle's copyrighted copy of Amazon's S3 API:
[https://docs.cloud.oracle.com/en-
us/iaas/Content/Object/Task...](https://docs.cloud.oracle.com/en-
us/iaas/Content/Object/Tasks/s3compatibleapi.htm)

------
willcodeforfoo
This is great news... there are lots more good clients for S3 than B2, and
implementing one is not trivial because of some special considerations B2 had
in the beginning (namely: uploading directly to a pod).

I see this isn't available for old buckets; is there a straightforward way to
duplicate a bucket to make it compatible, or do you have to use something like
rclone?

~~~
budmang
(backblaze ceo here) Yes, easy to move the data to a compatible bucket using
our B2 CLI or Transmit: [https://help.backblaze.com/hc/en-
us/articles/360047120614-Ho...](https://help.backblaze.com/hc/en-
us/articles/360047120614-How-to-move-Data-from-an-Existing-Bucket-to-a-
new-S3-Compatible-Bucket)

------
shanemhansen
I'm curious what their load balancing layer looks like. There are a lot of
interesting options. (Disclaimer: I've worked in the CDN and storage spaces in
the past.)

If their load balancer is smart enough it can call the dispatcher, and make
use of something like
[https://zaiste.net/nginx_x_accel_header/](https://zaiste.net/nginx_x_accel_header/)
to figure out where to forward the request. Unfortunately this still requires
uploads be proxied through the dispatcher.

You could get crazy and involve a CDN (akamai or cloudflare or fastly) that
could do some smart logic, especially if you can emit your dispatcher as a
lookup table that's updated frequently. I don't know what bandwidth costs
would be for that though. Probably high.

It's an interesting problem space and I'd love to talk to these folks about
it.

~~~
ADefenestrator
Hi! Backblaze employee who did some of the LB stuff here. It's relatively
standard/straightforward. There's a L4 load balancing layer using IPVS and
ECMP-via-BGP, then a custom application that does the actual
proxying/forwarding to the appropriate vault.

------
whalesalad
This is great. Their current API requires you to identify a unique host to
send data to, so you’re constantly performing a metric ton of DNS queries.
Until I whitelisted the base domain it was the #1 client of my Pi-hole
installation by multiple orders of magnitude.

~~~
guenthert
The DNS resolver library of your client is allowed to cache the IP address for
a given hostname for up to the TTL. If it does so, the cost should be
negligible.

~~~
brianwski
Disclaimer: I work at Backblaze.

> The DNS resolver library of your client is allowed to cache the IP address
> for a given hostname for up to TTL

Not only that, but one mistake a lot of developers made early on was asking
for a location to upload for every upload. That was _NEVER_ the intention. In
fact that annoys our servers also.

Developers are supposed to request a location to upload ONCE, and then upload
to that location for hours, or even DAYS. Unless you have a bug in that
software, it really shouldn't come anywhere close to being a high runner in
DNS. We're talking 9 or 10 requests per day, at most, if you are unlucky. Feel
free to reach out to our support if you aren't seeing that!
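The intended pattern, roughly sketched against the B2 native API
(b2_get_upload_url, per the B2 docs): fetch an upload target once, reuse it
for every upload, and only ask for a new one when an upload fails. The auth
values, bucket ID, and file data below are placeholders:

    # Get an upload URL once, reuse it for many uploads, and only
    # refresh it when an upload fails (vault full/offline).
    import hashlib

    import requests

    def get_upload_target(api_url, account_token, bucket_id):
        r = requests.post(
            f"{api_url}/b2api/v2/b2_get_upload_url",
            headers={"Authorization": account_token},
            json={"bucketId": bucket_id},
        )
        r.raise_for_status()
        return r.json()  # {"uploadUrl": ..., "authorizationToken": ...}

    def upload_many(api_url, account_token, bucket_id, files):
        target = get_upload_target(api_url, account_token, bucket_id)
        for name, data in files:
            while True:
                try:
                    r = requests.post(
                        target["uploadUrl"],
                        headers={
                            "Authorization": target["authorizationToken"],
                            "X-Bz-File-Name": name,  # simple names; URL-encode in general
                            "Content-Type": "b2/x-auto",
                            "X-Bz-Content-Sha1": hashlib.sha1(data).hexdigest(),
                        },
                        data=data,
                    )
                    r.raise_for_status()
                    break  # keep reusing the same target for the next file
                except requests.RequestException:
                    # Ask the central server for a NEW location, then retry.
                    target = get_upload_target(api_url, account_token, bucket_id)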

~~~
whalesalad
If you can use one of these arbitrary domain names for hours or days ... why
wouldn't you just handle it on your end and provide the public with a single
domain?

~~~
brianwski
> why wouldn't you just handle it on your end and provide the public with a
> single domain?

That is what we did for the S3 protocol. It adds cost via a load balancer.

The whole original storage design was based on the fact that in our original
product line (Backblaze Personal Backup) we owned both ends of the protocol -
our servers on the back end, and our client on the customer laptop. We were
able to eliminate all load balancers from our datacenter by being a little
tiny bit more intelligent in the client application (maybe 50 lines of code).
The client asks the central server where there is some free space. The server
tells it. Then the client "hangs up" and calls the storage vault directly, no
load balancer required! Then the client uploads as long as that storage vault
does not fill up or crash. If the storage vault crashes, or is taken offline,
or fills up all the spare space it has, the client is responsible to go back
and ask the central server for a NEW location. This fault tolerance step in
the client ENTIRELY eliminates load balancers! Normally you need an array of
servers and a load balancer to accept uploads, because what if one of the
array of servers crashed, had a bad power supply, or needed to update the OS?
The load balancer "fixes that" for you by load balancing to another server.
Pushing the intelligence down into the client saved us money. Nobody ever
noticed or cared because our programmers could write the extra 50 lines of
code, to save the $1 million worth of F5 load balancers (or whatever solution
Amazon S3 has).

We based our original B2 api protocols on this cost savings and higher
reliability, but it does push the 50 lines of code logic down to the client.
It caused a lot of developers extreme, extreme angst. They just couldn't
imagine a world where their code had to handle upload failures and retries.
They would ask us "how many retries should we try before we just fail to
backup"? Should I try 2 retries, or 3 before the backup entirely fails and the
customer loses data? Our client guys had a whole different approach: since it
was a computer, we just went ahead and retried FOREVER. Never-endingly, until
the
end of time, in an automated fashion. A couple times a year one client gets
unlucky and it requires several round trips before getting a vault to upload
to, but who cares? It's a computer, it can retry forever. It never gets tired,
never gives up.

But S3 never figured this out, and they require that the one upload point have
"high availability". It saves app developers about 50 lines of code and a lot
of angst, but then we (Backblaze) have to purchase a big expensive load
balancer, or build our own. We mostly built our own.

~~~
rossjudson
(I work at Google on Cloud Storage.)

Developers working with cloud storage APIs generally need to get used to the
idea that not everything is going to work all of the time. Retries and proper
status code/error handling are critical to making your application work
properly in real-world conditions, and as "events" occur. Every major cloud
storage provider has circumstances under which developers must retry to create
reliable applications; Backblaze is no different. For GCS, we document
truncated exponential backoff as the preferred strategy [1].

Google has its Global Service Load Balancer (GSLB) [2], which handles...let's
just say an enormous amount of traffic. GSLB is just part of the ecosystem at
Google.

It's hard to design a storage system that's "all things to all people"! There
are a series of tradeoffs that need to be made. Backblaze optimizes for
keeping storage costs as low as possible for large objects. There are other
dimensions that customers are willing to pay for.

[1] [https://cloud.google.com/storage/docs/exponential-
backoff](https://cloud.google.com/storage/docs/exponential-backoff) [2]
[https://landing.google.com/sre/sre-
book/chapters/production-...](https://landing.google.com/sre/sre-
book/chapters/production-environment/)
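For reference, truncated exponential backoff with jitter is only a few lines.
A minimal sketch (the transient-error class and the request callable are
placeholders, not any particular SDK's API):

    # Truncated exponential backoff with full jitter: delays grow
    # exponentially up to a cap, randomized to avoid thundering herds.
    import random
    import time

    class TransientError(Exception):
        pass  # placeholder for 5xx responses / timeouts

    def with_backoff(do_request, max_wait=32.0, max_tries=8):
        for attempt in range(max_tries):
            try:
                return do_request()
            except TransientError:
                time.sleep(random.uniform(0, min(max_wait, 2 ** attempt)))
        raise RuntimeError("giving up after max_tries attempts")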

------
haywirez
That's awesome, but I really want to see lightning-fast response times and
TTFB... A second pain point is the number of retries needed when uploading a
large batch of small files. Those are the main reasons I'm still considering
migrating away. I really wish I didn't have to, as otherwise I love the
pricing and the philosophy.

Edit: I also think DigitalOcean Spaces and B2 might be better off merging, or
Spaces being a whitelabel B2 in disguise (both are part of the BWA).

~~~
brianwski
Disclaimer: I work at Backblaze.

> I really want to see lightning fast response times and TTFB (Time To First
> Byte Served)

If a file is "cold" (nobody has requested it in the last 24 hours) then it
needs to be reconstructed from the Backblaze Vaults and there is a little
delay. After that, it should serve pretty fast for the following requests (off
of a caching layer with SSDs).

In the end, Backblaze B2 is a good solution for some customers, and not ideal
for others. If your application requires blinding speed, like sub 1
millisecond serve times, Backblaze B2 may not be perfect for you. But how
often is that the case? Certainly not when fetching a web page, or storing a
backup for a year, right? In those cases a small delay is FINE. This is an
example web page served by Backblaze B2 here, how does it load for you?
[https://f001.backblazeb2.com/file/ski-
epic-c/full/2015_scotl...](https://f001.backblazeb2.com/file/ski-
epic-c/full/2015_scotland_will_macdonald_birthday_in_duns_castle/index.html)
Fast? Slow? How is it?

For comparison, my regular hosting provider serving the same web page here:
[https://www.ski-
epic.com/2015_scotland_will_macdonald_birthd...](https://www.ski-
epic.com/2015_scotland_will_macdonald_birthday_in_duns_castle/index.html)

Personally I can't tell any difference. I still look silly in a kilt in both
versions. :-)

> Second pain point is the number of retries needed for uploading a large
> batch of small files.

It really shouldn't take any retries, or geez, at VERY MOST something like
less than 1% - why is that an issue? Software should handle the tiny failure
rate. I'm honestly curious, we want to know why people aren't choosing our
solution!!

~~~
heipei
I understand you're probably not in a position to say anything about it, but
I'd love to see the "little delay" when reconstructing a file quantified
somewhat. Are we talking < 5s or < 10s? What do the percentiles for restore
latency look like? How does file size play into it? This, for me, is one of
the biggest unknowns right now, since it's not easy to create a test benchmark
for this case (i.e. upload a bunch of stuff and let it sit idle for at least
24 hours, hoping it will be expired from the caching layer).

~~~
brianwski
> I'd love to see the "little delay" when reconstructing a file quantified
> somewhat. Are we talking < 5s or < 10s?

I asked the engineers that work on that code, and they pulled a random sample
from the logs (we time all of this) and said for files less than 1 MByte, it
averaged around 250 milliseconds to reconstruct the file from the Backblaze
Vault and get it onto the cache servers where it is then served up. 95% of
requests completed within 900 milliseconds, but there were a few over 1
second (1.2 seconds was the highest they found). Those are live production
numbers, so they include all the load on those Vaults.

A couple other notes just to add color. Any one Backblaze account is bound for
life to what we call a "cluster", for example there is one cluster in Europe
so all files are stored in Europe for any account in Europe. There is a load
balanced array of "cache servers" in front of all the vaults specific to that
cluster (the caching servers are physically located close to the vaults for
latency reasons), and our biggest cluster has something like 20 of these SSD
based caching servers. Ok, so the cache layer is not "shared", meaning each
cache server only pulls directly from the Backblaze Vault. So if you were
serving a file, and 20 separate customers got amazingly unlucky, the file
would get the 250 millisecond lag every time for those first 20 fetches. The
cool part of this architecture is that you then have 20 populated caches that
are completely unrelated to each other, so you have 20x the bandwidth
available to serve it up (and a rack of 20 really fast servers to serve it).
Plus they
are all totally independent so they can crash or be brought offline to upgrade
the software without any downtime.

We can add these cache machines as we need them, they are these 1U units and
we have "warm spares" for a variety of things. When we have had spikes in load
in the past we toss some hardware at it pretty fast.

------
Waterluvian
Okay dumb it down for Monday Me. Does this mean I can read from and write to
my B2 storage using AWS S3 libraries (like the CLI, Python, and Node libs)?

~~~
jedberg
Yes.
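For example, with boto3 it's one extra parameter. The endpoint/region below
match the us-west-002 cluster mentioned elsewhere in the thread; yours may
differ, and the bucket name is a placeholder:

    # Point a stock S3 client at B2 by overriding the endpoint.
    # Credentials are your B2 application key pair.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.us-west-002.backblazeb2.com",
        region_name="us-west-002",
        aws_access_key_id="<keyID>",
        aws_secret_access_key="<applicationKey>",
    )

    for obj in s3.list_objects_v2(Bucket="my-bucket").get("Contents", []):
        print(obj["Key"], obj["Size"])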

~~~
mahesh_rm
Would S3cmd cli work as well?

~~~
zimpenfish
I tried s3cmd according to their blog post without success. Just kept
complaining that the access key was invalid. Which is a shame because `s3cmd`
is much easier to use than `b2`.

~~~
atYevP
Yev here -> make sure you ping our b2feedback@backblaze.com address and let us
know about that experience, we're writing it all down and keeping tabs on
what's not working as intended.

~~~
zimpenfish
Thanks for following up! I eventually got it working - I'd followed the
example too closely and had a rogue `us-west-002` left in the config which
broke things because I'm apparently on `us-west-000`. But I'll drop an email
anyway because I can't see an easy way to see what region you're in other than
visually parsing the endpoint URL.

------
dopamean
I have a decent-sized music collection consisting of a lot of lossless vinyl
rips that I've made from my record collection. It totals around 200 gigs at
the moment but is growing weekly. I've been looking for somewhere to back this
all up in the cloud, and Backblaze is looking most promising at the moment.
Anyone here have any thoughts on where I should go with this?

~~~
tams
There are lots of tools you can use to back up to Backblaze, including their
consumer backup service.

If you want to sync to B2 specifically with a lightweight tool, check out
[https://rclone.org/](https://rclone.org/)

~~~
terseus
Rclone is really a fantastic tool; its configurability based on backends
allows for amazing combinations!

You can configure any cloud storage backend (B2, S3, GCS ...) and combine it
with other utility storage backends, like "crypt" [1], "cache" [2] and
"chunker" [3], I highly recommend it to anyone searching for a backup
solution.

The only feature I miss from Rclone is automatic directory monitoring and
mirroring, which I solved using Syncthing (but that forces me to host an
additional server).

[1] [https://rclone.org/crypt/](https://rclone.org/crypt/) [2]
[https://rclone.org/cache/](https://rclone.org/cache/) [3]
[https://rclone.org/chunker/](https://rclone.org/chunker/)

------
artellectual
This is wonderful news for me. I host a video on demand site Codemy.net and
all the original source videos are on Backblaze. Originally I had to write a
library to connect to the Backblaze API. Now I look forward to using the
existing AWS client libraries; one less thing I have to maintain.

~~~
atYevP
Yev here -> That's awesome to hear! Ease of use is one of the things that we
strive for at Backblaze and I'm glad that the S3 Compatible APIs are going to
unlock some use-cases and make things easier for people that don't have the
bandwidth to maintain different codepaths!

------
tgtweak
I remember when they had all their servers in one room and the redundancy
boiled down to erasure encoding in single servers.

They've been doing incredible work in the open (storage server design,
hardware reliability data, etc) and I'm really happy they've grown to where
they are today.

------
jjice
I was actually looking at B2 vs S3 literally 2 days ago and went with S3 for
the universal API. Luckily, it was a personal project and I can probably
migrate everything very quickly. This is a killer feature, and I bet this will
convince a lot of people to move to Backblaze.

------
idrock
Given all the Cloudflare discussion - Cloudflare webinar with Backblaze coming
up next week:
[https://www.brighttalk.com/webcast/14807/405472](https://www.brighttalk.com/webcast/14807/405472)

------
manigandham
Just switched to Wasabi last week for better pricing and S3 interface... but
great to see this.

------
kcdipesh
Great news. Only a few days ago I was trying to figure out ways to use Minio
to make Backblaze work as Mattermost cloud storage, which needs to be S3
compatible. I expect that will work directly now. Has anyone already tried
this integration?

------
polskibus
Is there an open source S3-compatible component that I could roll out on-prem?

~~~
wooptoo
Minio is open source and S3 compatible [https://minio.io](https://minio.io)

~~~
polskibus
Thanks, looks like spinning up a cluster is AGPL, so it's a no go.

------
mdevere
Every day I have to use something like 4gb of data to let Backblaze sync. This
is despite the fact that I might only have created/changed 100mb worth of
files since the previous day's sync.

------
rb808
Is there a cheap S3-compatible service that is less reliable? I don't want to
pay for redundancy. E.g. it's my backups; I can handle a 3% chance that my
data is lost, as long as I find out about it.

~~~
sundbry
If you want super cheap, use DreamObjects by DreamHost.

~~~
rb808
doesn't look cheap at 5x the price of Backblaze.
[https://www.backblaze.com/b2/cloud-storage-
pricing.html](https://www.backblaze.com/b2/cloud-storage-pricing.html)

------
sparrc
Does this mean I can use awscli to interact with b2 by specifying some
backblaze server with --endpoint-url? What is the endpoint I would use?

~~~
cbo100
When you create the bucket it shows a URL along with the keys.

Not sure how unique that URL is; looking at the structure, it could depend on
what data centre your bucket gets created in.

------
hannibalhorn
Just signed up to try it out - would really like to see some form of two
factor auth that isn't SMS based. TOTP and/or FIDO U2F.

~~~
ac29
I've been using TOTP with B2 for ages. I think SMS is just to set up the
account.

~~~
hannibalhorn
Ah, the copy next to "Turn on Two Factor" sure sounded like SMS only, but sure
enough, it gives you the option to use TOTP later on. Thanks!

------
benbro
Is B2 suitable for streaming video files? Can I stream the same file to 1,000
viewers at the same time?

~~~
budmang
(backblaze ceo here) B2 is a great origin store for your video files. If
you're streaming to lots of viewers, using a CDN with Backblaze B2 is optimal.
We partnered with Cloudflare as a founding member of the Bandwidth Alliance so
you can store your videos with B2 and transit them for free to Cloudflare,
which can serve them to your viewers.

~~~
GeneticGenesis
Please correct me if I'm wrong, but my understanding was that Cloudflare
should not be used to deliver video files unless using Cloudflare's "stream"
product, i.e. specifically this [1].

[1]: [https://community.cloudflare.com/t/cloudflare-how-not-to-
vio...](https://community.cloudflare.com/t/cloudflare-how-not-to-violate-the-
terms-of-audio-video-static-html-content/101075)

~~~
fabiandesimone
Would love to confirm this.

------
siscia
Did somebody actually try to use the S3 API? It doesn't seem to be working for
me.

------
gramakri
Fantastic news. B2 Storage is one of the most requested backup storage
backends for us.

------
S3raph
Happy customer of Backblaze. I love how transparent they are with everything
(especially the hard disk statistics), and the fact that the CEO takes time to
respond to a lot of questions only confirms how down-to-earth they are.

------
ghawkescs
Any chance of adding Azure storage compatible APIs in the future?

------
brian_herman__
There is no mention of the durability guarantees that s3 has.

~~~
brianwski
Disclaimer: I work at Backblaze.

> There is no mention of the durability guarantees that s3 has.

I wrote this blog post doing some of our math around this:
[https://www.backblaze.com/blog/cloud-storage-
durability/](https://www.backblaze.com/blog/cloud-storage-durability/)

But here is the thing: if you value your data, like if you will really go out
of business if you lose it, then you should store three copies with AT LEAST
two separate vendors. No matter how reliable any one vendor is, "stuff can
happen" like your credit card is declined and the vendor deletes all of it.

I would recommend you use two separate vendors like Amazon S3 and Backblaze
B2, and use two separate credit cards that expire on different cycles. I
believe the credentials for login should be different on those two accounts,
and the same one employee shouldn't have the credentials to both. Because one
disgruntled employee should _NOT_ have the ability to put you out of business.
If you want some other thoughts, here is a blog post Backblaze wrote called
the "3-2-1 Backup Strategy": [https://www.backblaze.com/blog/the-3-2-1-backup-
strategy/](https://www.backblaze.com/blog/the-3-2-1-backup-strategy/)

------
ckdarby
Another step towards Amazon acquiring them.

~~~
Aeolun
Oh man, I hope not... I enjoy my independent B2.

------
joshuaellinger
Any plans for an Azure Blob API?

