
Design Thinking: B2 APIs and the Hidden Costs of S3 Compatibility - Manozco
https://www.backblaze.com/blog/design-thinking-b2-apis-the-hidden-costs-of-s3-compatibility
======
elFarto
I wonder if you could get the same functionality of AWS, with the same
implementation of B2, by having a single URL to POST files to, that simply
sent a redirect to the correct location (apparently there's the 307 HTTP
status code for exactly this).

E.g.:

      => POST https://upload.backblaze.com/bucket/file
      <= 307 redirect to https://pod-000-1007-13.backblaze.com/b2api/v1/b2_upload_file/...
      => POST https://pod-000-1007-13.backblaze.com/b2api/v1/b2_upload_file/...

~~~
user5994461
Strictly speaking it's possible, but it's not reliable and it shouldn't be
used.

In practice, a redirect on a POST does not result in another POST with the
same content:

- APIs typically don't follow redirects (without special flags). Too much
risk of causing damage by repeating calls.

- Browsers follow the redirect with a GET and no body. It's typical for a
completed form to redirect to another page without resubmitting the form
there.

There are settings and flags to alter the behavior into what you describe,
but it's not a good idea to go there. It will definitely not work out of the
box with much of anything.
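
For what it's worth, the point about client behavior can be checked against
Python's standard library. A toy sketch (the throwaway local server, handler,
and paths here are made up for the demo): urllib silently converts a 303'd
POST into a body-less GET, and refuses to auto-follow a 307 after a POST.

```python
import http.server
import threading
import urllib.error
import urllib.request

class Handler(http.server.BaseHTTPRequestHandler):
    """Throwaway server: redirects POSTs, serves a plain GET target."""
    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        code = 303 if self.path == "/redirect303" else 307
        self.send_response(code)
        self.send_header("Location", "/target")
        self.end_headers()

    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"GET ok")

    def log_message(self, *args):  # keep the demo quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

# 303 after a POST: urllib re-issues the request as a GET, body dropped.
body_303 = urllib.request.urlopen(base + "/redirect303", data=b"payload").read()
print(body_303)  # b'GET ok'

# 307 after a POST: urllib refuses to resend the body and raises instead.
try:
    urllib.request.urlopen(base + "/redirect307", data=b"payload")
    status_307 = None
except urllib.error.HTTPError as e:
    status_307 = e.code
print(status_307)  # 307

server.shutdown()
```

Other clients (curl with the right flags, Python requests) do resend the POST
body on a 307, which is exactly the per-client inconsistency being described.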

~~~
willglynn
There are different kinds of redirects. See the discussion of 303 See Other vs
307 Temporary Redirect in RFC 7231:

[https://tools.ietf.org/html/rfc7231#section-6.4.4](https://tools.ietf.org/html/rfc7231#section-6.4.4)

[https://tools.ietf.org/html/rfc7231#section-6.4.7](https://tools.ietf.org/html/rfc7231#section-6.4.7)

303 means "GET this new URL" while 307 means "resend your request at this URL
without changing the verb or body".

~~~
user5994461
And you can see that the RFC only talks in terms of SHOULD and MAY. This is
beyond basic HTTP functionality; the RFC is basically moot at this stage, and
you're dealing in implementation-specific details.

You will have to work individually on every client you plan to support, both
browsers and libraries. It can be done, but it's not necessarily a good idea.

------
jlmorton
I'm really surprised B2 doesn't seem to charge for upload API requests. I have
a project which uploads several billion small objects to Amazon S3. The vast,
vast majority are written, stored with a 15 month TTL, and never touched
again. Some small number are downloaded each month.

To illustrate this, here's a recent S3 bill:

    $0.005 per 1,000 PUT, COPY, POST, or LIST requests    289,727,754 requests    $1,448.64
    $0.004 per 10,000 GET and all other requests               62,305 requests        $0.02
    $0.023 per GB - first 50 TB / month of storage used    18,990.009 GB-Mo         $436.77

As you can see, most of our spend on S3 is from the PUT requests, not the
storage, or download. Probably there are some things we could do to reduce the
number of PUT requests. We don't really care that much, because the total cost
is not that large, but there is at least some incentive to reduce the number
of PUT calls.
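
Those line items check out arithmetically. A quick sketch recomputing the
bill from the posted unit prices (the function name is mine):

```python
def s3_monthly_cost(puts, gets, gb_months):
    """Recompute the bill above from S3's posted unit prices."""
    put_cost = puts / 1_000 * 0.005        # $ per 1,000 PUT/COPY/POST/LIST
    get_cost = gets / 10_000 * 0.004       # $ per 10,000 GET and other requests
    storage_cost = gb_months * 0.023       # $ per GB-month, first 50 TB tier
    return put_cost, get_cost, storage_cost

p, g, s = s3_monthly_cost(289_727_754, 62_305, 18_990.009)
print(f"PUT ${p:,.2f}  GET ${g:,.2f}  storage ${s:,.2f}")
# PUT $1,448.64  GET $0.02  storage $436.77
```

The PUT line really is more than 3x the storage line for this workload.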

But if it were free? I would never change this system. Does Backblaze really
want this sort of traffic profile?

~~~
brianwski
Disclaimer: I work at Backblaze.

> Does Backblaze really want this sort of traffic profile?

Oh heck yes, we would very much like your business! :-)

Backblaze only charges you $0.005/GByte/Month so your $436.77 bill would go
down to $94.95 and we would be happy to have it. That is profitable for us.
(Since Backblaze doesn't have any deep pockets or VC funding, we have to stay
profitable.)

~~~
donavanm
So your pricing is aligned with the storage cost. But you're explicitly not
pricing in write throughput/IO, deletes, or similar costs caused by lifecycle
events? How does that square with the long-term trend of greatly increasing
density while IO has remained flat for the past decade?

~~~
mmt
> IO remains flat for the past decade

Although I agree that storage density growth for HDDs has greatly outstripped
any I/O growth, I don't agree that the latter has been flat (i.e. no growth).

Are you sure you're not doing something like comparing 7200rpm drives from 10
years ago to 5900rpm (or slower) or variable-speed "green" or even SMR drives
from today?

That said, I think what many people forget is that for "cloud" storage, the
I/O bottleneck is almost certainly going to be the network and not the backend
storage, especially for mainly sequential access.

If each of their "pods" holds 60 drives and each "vault" holds 20 pods (17 of
which are non-parity data), that's over a thousand drives per vault. If each
drive is 7200rpm non-SMR, it can saturate a 1Gb ethernet with sequential I/O,
and random I/O would divide that by 10 or so. That's the equivalent of
100Gb/s, per vault.

That kind of bandwidth is possible and even affordable to provision inside the
datacenter without metering and charging for it and is likely to dwarf the
size of the connection to the Internet.
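
Spelling out the back-of-the-envelope math above (the vault geometry and
per-drive throughput are the assumptions stated in this comment, not
published Backblaze figures):

```python
drives_per_pod = 60
pods_per_vault = 20
drives_per_vault = drives_per_pod * pods_per_vault   # "over a thousand drives"

seq_gbit_per_drive = 1.0    # ~125 MB/s sequential: enough to saturate 1GbE
random_io_penalty = 10      # random I/O is roughly 10x slower than sequential

vault_random_gbit = drives_per_vault * seq_gbit_per_drive / random_io_penalty
print(drives_per_vault, vault_random_gbit)  # 1200 120.0
```

So even the pessimistic random-I/O figure lands around the "equivalent of
100Gb/s, per vault" quoted above.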

------
hemancuso
It’s a bit unclear to me what is so expensive about the load balancing nodes.
Care to explain why it’s substantially more than a few round-robined smart
reverse proxies moving data to the correct storage node? With the S3/Dynamo
design, the backend destination is largely known from the hash ring.

Also: Wasabi has fantastic pricing and full S3 compatibility.

~~~
zzzcpan
> It’s a bit unclear to me what is so expensive about the load balancing
> nodes.

They also use Reed-Solomon and split data into multiple pieces to store on
multiple servers. So they need all those "load balancing"-like nodes anyway
and probably no new hardware or infrastructure is necessary to conform to S3
API.

~~~
brianwski
Disclaimer: I work at Backblaze.

> They also use Reed-Solomon and split data into multiple pieces to store on
> multiple servers. So they need all those "load balancing"-like nodes anyway

Yes. We definitely do "load balancing", or more accurately "disk space load
balancing", but we do it all in software. The net outcome is the same, but the
cost is lower.

> probably no new hardware or infrastructure is necessary to conform to S3 API

No, it would require additional hardware we do not purchase at all right now.
Backblaze's philosophy is to shave off cost at all layers if it doesn't
actually contribute to uptime or durability. Put differently, if there is a
lower cost way to achieve the same uptime or durability with some intelligent
software or possibly an extra network round trip, we do it that way instead of
purchasing extra hardware.

~~~
hemancuso
What special hardware vs a few cores with a reverse proxy? Surely a trivial
cost.

~~~
brianwski
> Surely a trivial cost.

So we both agree it is more than zero cost? Backblaze saves that cost and
passes the savings on to customers. I'm not sure what the exact costs would be
because Backblaze did not implement it that way.

> a few cores

By "a few" do you mean 10, 100, 1,000, or...? How much bandwidth will your
solution support? For example, can your few cores support 10 Gbit/sec? 100
Gbit/sec? 1 Tbit/sec?

Backblaze is COMPLETELY FREE of worrying about these questions, because our
solution does not require this additional step and this additional hardware,
and therefore does not have this choke point.

------
bcheung
Anyone know why Amazon didn't adopt existing standards like SCP / SFTP /
WebDAV? I've always found the S3 APIs to be difficult to work with, especially
for authorization and large uploads.

~~~
yzmtf2008
Because S3 is a K/V Store, not a file system.

~~~
bcheung
File systems can be used as key value as well. The Key is the path, and the
value is the contents of the file. Most modern filesystems also have metadata.

Granted it's not hierarchical, so listing a bucket lists everything in every
folder, but I don't see why that's too big of a concern. Especially since
there are pseudo directories in many of the S3 tools. Many people use S3 and
have a folder-like hierarchical naming convention anyways.

With WebDAV, the URL is the key, and the value is the contents of the upload /
download.

------
willglynn
This article contains some misunderstandings about the S3 API.

> The interface to upload data into Amazon S3 is actually a bit simpler than
> Backblaze B2’s API. But it comes at a literal cost. It requires Amazon to
> have a massive and expensive choke point in their network: load balancers.
> When a customer tries to upload to S3, she is given a single upload URL to
> use. For instance,
> [http://s3.amazonaws.com/<bucketname>](http://s3.amazonaws.com/<bucketname>).
> This is great for the customer as she can just start pushing data to the
> URL. But that requires Amazon to be able to take that data and then, in a
> second step behind the scenes, find available storage space and then push
> that data to that available location. The second step creates a choke point
> as it requires having high bandwidth load balancers. That, in turn, carries
> a significant customer implication; load balancers cost significant money.

In fact, S3's REST API requires callers to follow HTTP redirects, and the PUT
documentation expressly mentions the HTTP "Expect: 100-continue" mechanism
precisely so that the S3 endpoint you reach in your initial PUT request does
not have to handle the HTTP request body.

[https://docs.aws.amazon.com/AmazonS3/latest/dev/Redirects.html](https://docs.aws.amazon.com/AmazonS3/latest/dev/Redirects.html)
[https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html](https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html)

> The Dispatching Server (the API server answering the b2_get_upload_url call)
> tells the Client “there is space over on “Vault-8329.” This next step is our
> magic. Armed with the knowledge of the open vault, the Client ends its
> connection with the Dispatching Server and creates a brand new request
> DIRECTLY to Vault-8329 (calling b2_upload_file or b2_upload_part). No load
> balancers involved!

Again, this could be done directly with HTTP. PUT to the first server, receive
a redirect, PUT to vault-8329, receive "100 Continue", transmit file. There's
no need to have a separate API call to get the "real" upload URL.
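
That two-step handshake is visible with stock Python, since
BaseHTTPRequestHandler answers "Expect: 100-continue" with an interim
response before the handler reads the body (a toy sketch; the path and host
are made up, and a real S3 exchange obviously carries auth headers too):

```python
import http.server
import socket
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # interim 1xx responses require HTTP/1.1

    def do_PUT(self):
        # The base class has already sent "100 Continue" (handle_expect_100)
        # before this method runs; reading the body is the second step.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Speak the wire protocol by hand so the interim response is visible.
s = socket.create_connection(server.server_address)
s.sendall(b"PUT /object HTTP/1.1\r\n"
          b"Host: example\r\n"
          b"Content-Length: 4\r\n"
          b"Expect: 100-continue\r\n\r\n")
interim = s.recv(1024)
print(interim.splitlines()[0])   # b'HTTP/1.1 100 Continue'

s.sendall(b"data")               # the body goes out only after the go-ahead
final = s.recv(1024)
print(final.splitlines()[0])     # b'HTTP/1.1 200 OK'

s.close()
server.shutdown()
```

The server that issues the redirect never has to touch the request body at
all; only the final endpoint sends the "100 Continue" go-ahead.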

> 3) Expensive, time consuming data copy needs (and “eventual consistency”).
> Amazon S3 requires the copying of massive amounts of data from one part of
> their network (the upload server) to wherever the data’s ultimate resting
> place will be. This is at the root of one of the biggest frustrations when
> dealing with S3: Amazon’s “eventual consistency.”

Wait, I thought they were load balancers? Why does the load balancer need to
copy any data once it's done uploading?

As for eventual consistency, there is truth to this complaint -- but much less
truth than in the distant past. Every S3 region except us-standard has had
read-after-write consistency for new objects since launch, and as of August
2015, us-standard does too:

[https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-s3-introduces-new-usability-enhancements/](https://aws.amazon.com/about-aws/whats-new/2015/08/amazon-s3-introduces-new-usability-enhancements/)

If your PUT returns 200 OK, a subsequent GET will return the object, assuming
you're using unique keys. This prevents the 2015-and-earlier problem where
you'd create a new S3 object and enqueue a job to process it, then the job
gets 404 Not Found while retrieving the new object.

There are other cases where S3's eventual consistency can be an issue, but
none of them have been dealbreakers for my applications. Having said that:
S3's consistency model is weaker than the model B2 provides, so this is not
an argument against providing an S3-compatible interface.

~~~
brianwski
Disclaimer: I'm the original author of the blog post. :-)

> In fact, S3's REST API requires callers to follow HTTP redirects...

Yes. Let's take the example of uploading 3 small files.

With S3, _every_ upload call must start by hitting the same URL and is then
redirected. Uploading the same 3 small files to B2 is a little different: one
call is made to the "dispatch server" to ask where there is spare space, and
then there is no redirect. The client must disconnect and contact the final
destination three times. So for this example, S3 would do 3 redirects (6 total
network calls, since each redirected upload is two requests) and Backblaze B2
would do zero redirects but 4 total network calls to upload the three files. I
hope that makes sense.

I'm not saying one is better than the other, and I freely admit S3 can be a
little more intuitive. But it saves Backblaze money to not have such high
loads and high uptime demand on the original URL.

> Again, this could be done directly with HTTP.

Yes I agree it could have been.

> There's no need to have a separate API call to get the "real" upload URL.

The "need" was to save money. See example above. Amazon S3 would require 3
redirects from the original URL, while Backblaze B2 chose a different tradeoff
that is zero redirects and 4 total requests where 3 out of 4 requests were NOT
redirected. The 3 out of 4 requests go to the final location DIRECTLY.

If you are only uploading one file, the Amazon S3 API is about as efficient as
Backblaze's, and I agree with you. But if you upload 1 million files, the
Backblaze B2 architecture is cheaper/simpler. Either way, it is the tradeoff
we chose.
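
That tradeoff reduces to simple counting. A sketch (function names are mine,
and the first function models the redirect-on-every-upload scenario described
here, not necessarily what S3 does for every request in practice):

```python
def redirect_style_requests(n_files):
    # Each upload hits the front URL and is redirected once: 2 requests/file.
    return 2 * n_files

def b2_style_requests(n_files):
    # One b2_get_upload_url call, then each file goes directly to the vault.
    return 1 + n_files

print(redirect_style_requests(3), b2_style_requests(3))
# 6 4
print(redirect_style_requests(1_000_000), b2_style_requests(1_000_000))
# 2000000 1000001
```

For a single file the two schemes cost about the same; at a million files the
dispatch-once scheme does roughly half the requests, and almost none of them
touch the front-end URL.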

> as of August 2015, us-standard... has read-after-write consistency for new
> objects

TIL. We launched the B2 API before August of 2015, sorry if I propagated old
info!

~~~
teraflop
> For B2, one call is made to the "dispatch server" to ask for where there is
> spare space, then there is no redirect.

An API call that retrieves the address of another machine to contact is
functionally the same thing as an HTTP redirect, just with different syntax,
right?

> The client must disconnect, and contact the final destination three times.

This seems like the key idea that wasn't quite explained in the blog post: the
client is expected to cache and reuse the same "vault" server for future
requests, not just for multiple pieces of a single upload. If the client
doesn't conform to that expectation, your approach would end up with exactly
the same performance characteristics as S3.

What I find interesting is that unless I'm misunderstanding the API
documentation [1], file downloads apparently _do_ go through a load balancer.
That is, you make an HTTP request to a single endpoint that's the same for all
files in the account, and it fetches the data from whichever "vault" server is
actually storing it. So does that mean that Backblaze's rate of incoming
uploads is much larger than the rate of downloads? Otherwise it seems like you
could just reuse the same load-balancing infrastructure for both.

[1]: [https://www.backblaze.com/b2/docs/b2_download_file_by_name.html](https://www.backblaze.com/b2/docs/b2_download_file_by_name.html)

~~~
brianwski
> An API call that retrieves the address of another machine to contact is
> functionally the same thing as an HTTP redirect

If you are only uploading exactly one file -> yes.

But if you are uploading 1 million files, a redirect system would result in 1
million redirects. The B2 system would result in less load on the dispatch
server.

> the client is expected to cache and reuse the same "vault" server for future
> requests

Yes, exactly! This is the very core part of the B2 architecture. In fact, it
is a waste of time and performance to keep asking the dispatch server for a
location for every file. Just assume the last vault you got continues to be
"valid" until the vault kicks you off with a 503. In B2 world, the 503 is
_NOT_ a fatal error, it means "go back to the dispatching server and ask for a
new location to upload to".
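
The upload loop described above, as a toy sketch (the class and helper names
are illustrative, not the real B2 SDK; the tiny simulation below stands in
for real dispatch and vault servers):

```python
class B2Uploader:
    """Cache the vault URL and reuse it until the vault answers 503."""
    def __init__(self, get_upload_url):
        self._get_upload_url = get_upload_url   # stands in for b2_get_upload_url
        self._upload_url = None                 # cached vault URL, reused across files

    def upload(self, do_post):
        # do_post(url) performs the actual POST and returns the HTTP status.
        while True:
            if self._upload_url is None:
                self._upload_url = self._get_upload_url()
            status = do_post(self._upload_url)
            if status == 503:
                # Not fatal in B2: "go back to the dispatch server and ask
                # for a new location to upload to".
                self._upload_url = None
            else:
                return status

# Tiny simulation: the first vault accepts two uploads, then returns 503.
urls = iter(["https://pod-a.example/upload", "https://pod-b.example/upload"])
remaining = {"https://pod-a.example/upload": 2, "https://pod-b.example/upload": 99}

def fake_dispatch():
    return next(urls)

def fake_post(url):
    if remaining[url] == 0:
        return 503
    remaining[url] -= 1
    return 200

uploader = B2Uploader(fake_dispatch)
statuses = [uploader.upload(fake_post) for _ in range(3)]
print(statuses)  # [200, 200, 200] -- third upload transparently moved to pod-b
```

The caller never sees the 503; it only costs one extra round trip to the
dispatch server when a vault fills up.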

> file downloads apparently do go through a load balancer

Correct. Because we wanted the ability to serve up static web content such as
this picture (this is served out of B2):
[https://f001.backblazeb2.com/file/bucket9/cute3.jpg](https://f001.backblazeb2.com/file/bucket9/cute3.jpg)
and do it in a highly available and highly scalable way if the content goes
viral. The way we do that is we have load balanced "download servers" that
also act as a caching layer. The FIRST time somebody fetches a file from the
vault, the caching layer has to ask the vault to reassemble the file from
parts, and then the caching layer caches it on fast SSDs inside the download
servers. The second time (in 24 hours) that a customer requests the file, it
comes out of the cache and not out of the vault. Otherwise, if a video went
viral the vault would get crushed trying to reassemble the video for all 10
million views. :-)
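
A toy version of that read-through cache (names are illustrative, and the
lambda stands in for the expensive vault reassembly; the real download
servers obviously do far more):

```python
import time

class ReadThroughCache:
    """Serve from cache when fresh; otherwise fetch from the vault and cache."""
    def __init__(self, fetch_from_vault, ttl=24 * 3600):
        self._fetch = fetch_from_vault
        self._ttl = ttl
        self._cache = {}          # path -> (expires_at, bytes)
        self.vault_hits = 0       # how often the vault had to reassemble

    def get(self, path, now=None):
        now = time.time() if now is None else now
        entry = self._cache.get(path)
        if entry and entry[0] > now:
            return entry[1]                   # served from the SSD cache
        self.vault_hits += 1
        data = self._fetch(path)              # expensive: reassemble from shards
        self._cache[path] = (now + self._ttl, data)
        return data

cache = ReadThroughCache(lambda path: b"cat picture bytes")
cache.get("/file/bucket9/cute3.jpg")
cache.get("/file/bucket9/cute3.jpg")   # ...10 million viral views later
print(cache.vault_hits)  # 1 -- the vault is asked once per TTL window
```

However many times the file goes viral, the vault reassembles it at most once
per TTL window; everything else comes off the download servers' SSDs.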

The load balancers we used are described in a different blog post here:
[https://www.backblaze.com/blog/load-balancing-and-b2-cloud-storage/](https://www.backblaze.com/blog/load-balancing-and-b2-cloud-storage/)

> does that mean that Backblaze's rate of incoming uploads is much larger than
> the rate of downloads?

Yes. VERY MUCH yes. Backblaze has more than 10x the incoming bandwidth as
outgoing bandwidth. Backblaze started as an online backup company with the
"Backblaze Personal Backup" solution. Online Backup is highly inbound heavy.

------
deepsun
That "get_upload_url()" trick they invented has been in AppEngine's BlobStore
since 2008, although Google deprecated it in favor of GCS.

------
deedubaya
That's all fine and good, I don't care if you're S3 compatible or not....

I do care if I have to write my own API client for your storage backend. Or if
you have examples to go off of. Backblaze doesn't seem to offer either for
non-C++/Swift languages. Complete non-starter.

The, perhaps obvious, win of being S3 compatible is that you open the door to
thousands of existing S3 clients already implemented in many different
technologies, for free. And you get the developers who use them as customers.

~~~
brianwski
Disclaimer: I work at Backblaze.

> Backblaze doesn't seem to offer either for non-C++/Swift languages.

On each of the API web pages there are code examples for the following
languages: 1) curl, 2) Java, 3) Python, 4) Swift, 5) Ruby, 6) C#, and 7) PHP.

For example, go to
[https://www.backblaze.com/b2/docs/b2_authorize_account.html](https://www.backblaze.com/b2/docs/b2_authorize_account.html)
and scroll ALL THE WAY TO THE BOTTOM of that web page, and you should see
"Sample Code" section. Click on the blue buttons to see the different code
examples.

If your favorite language is missing, we can add it for you! One of our client
engineers wrote most of the code examples in all 7 languages in less than 1
week. With these working code examples, the expectation is you should be able
to get B2 working in literally less than 2 days in any application in any
language.

~~~
trevyn
FYI, the most popular Node B2 library seems to be in the process of being
abandoned:
[https://github.com/yakovkhalinsky/backblaze-b2](https://github.com/yakovkhalinsky/backblaze-b2)

Would be nice if there was an official Node library.

~~~
brianwski
I really want to do more JavaScript support and examples and SDKs for B2 (both
web page and server side).

While I think of myself as a 'C' programmer, the GIGANTIC amount of work being
done in JavaScript nowadays is just amazing, and you can reach so many
customers via JavaScript that I think JavaScript is a huge part of the future
of computing.

------
metalrain
It's great that cost of elasticity is not hidden. I'm glad that there are
alternatives.

------
misterbowfinger
Honestly surprised that AWS, GCP, or Azure haven't acquired Backblaze by now.
Seems like an obvious move.

~~~
brianwski
Disclaimer: I am the author of the blog post and work at Backblaze.

> surprised that AWS, GCP, or Azure haven't acquired Backblaze by now

Sometimes press/customers/people ask us how many customers Backblaze has
"converted over" from S3 to B2. To be honest, I think the answer is
approximately zero. If a customer already has a petabyte uploaded into S3, it
is simply not practical or cost effective to download that from S3 and import
it into B2. The S3 download costs are extremely expensive (9 cents per GByte
to download out of S3, vs 1 cent per GByte to download out of B2).

Most of Backblaze's B2 customers fall into one of two camps:

1) New customers starting out that have just begun to look at cloud storage
and need to decide where to put their data. They look at S3, compare it with
B2, and make a decision, and Backblaze B2 gets some of those customers.
Realistically we don't even get half of these new customers, either because B2
is missing a feature the potential customer needs or because the customer has
not even heard of Backblaze. So Backblaze is probably not causing Amazon much
lost revenue yet.

2) Multi-cloud customers. Any one "cloud provider" can have outages, including
Backblaze and including Amazon. So if you store an entire copy of your data in
Amazon S3, and also store an entire copy in Backblaze B2, you will (by
definition) have both more durable data and higher availability data than a
single copy in either one alone. By definition this doesn't actually cost
Amazon S3 any sales, because you STILL need a complete copy in S3!! For multi-
cloud, Backblaze B2 doesn't harm Amazon one bit.

> Seems like an obvious move (for Amazon to acquire Backblaze).

Amazon has never offered to buy Backblaze, but also we are not for sale. Or at
the very least, not at a price Amazon would probably want to pay (or could
justify to their board). I don't know if this is common knowledge, but
Backblaze is employee owned. We never took any significant VC funding, so the
only people with voting rights on the board of directors are the original 5
Backblaze founders (including myself). We like what we are doing and we
COMPLETELY control our own destiny and work environment. Literally nobody
(except customers) can tell us what to do, how to price our products, or what
features to build. Backblaze is a fun company to work at, and we all make a
good living. So it would be inordinately expensive to buy us out just to put
us out of business and dissolve our tight-knit group here and ruin all our
fun. Our plan is to stay independent forever.

~~~
nicoburns
> Amazon has never offered to buy Backblaze, but also we are not for sale. Or
> at the very least, not at a price Amazon would probably want to pay (or
> could justify to their board). I don't know if this is common knowledge, but
> Backblaze is employee owned. We never took any significant VC funding, so
> the only people with voting rights on the board of directors are the
> original 5 Backblaze founders (including myself). We like what we are doing
> and we COMPLETELY control our own destiny and work environment. Literally
> nobody (except customers) can tell us what to do, how to price our products,
> or what features to build. Backblaze is a fun company to work at, and we all
> make a good living. So it would be inordinately expensive to buy us out just
> to put us out of business and dissolve our tight-knit group here and ruin
> all our fun. Our plan is to stay independent forever.

This as much as anything encourages me to trust and invest in Backblaze. Long
may you live. Maybe I'll get around to writing that Rust SDK I've been meaning
to write for a while...

------
vpribish
"Design Thinking" >>cringe<<

