
Show HN: B2blaze – A Backblaze B2 library for Python - gsibble
https://github.com/sibblegp/b2blaze
======
ereyes01
I use rclone's B2 driver ([https://rclone.org/b2/](https://rclone.org/b2/)) as
an rsync-style backup solution for about 1TB worth of my pictures and other
media. Also, it's encrypted with my own local key using rclone's crypt module
([https://rclone.org/crypt/](https://rclone.org/crypt/)).

rclone supports multithreaded upload, and even has experimental support for
FUSE mounting. However, the sync command gets you Dropbox-like behavior and
can be cronned:
[https://rclone.org/commands/rclone_sync/](https://rclone.org/commands/rclone_sync/)

I really like the price of B2, I hope it stays low :-)

~~~
kachurovskiy
Why not use the $60 you're paying per year for 1TB of online storage to buy a
2TB hard drive and use it for the backup?

~~~
billyhoffman
Online/offsite backup is a different use case. They are paying $60/year so
that, if their house burns down, gets flooded, or their disk gets fried by
lightning, they still have their family pictures.

Local backup is cheap and fast, and you should do it too. But it doesn't
provide geographic redundancy.

~~~
kachurovskiy
Another $60 buys a water- and fireproof safe for the hard drive. I assume it
also helps against lightning :-) I honestly think remote backup is overkill
for personal needs, and the new risks you take on by placing your valuable
data on someone else's hard drive are not always internalized.

~~~
scarface74
And then you take your non-redundantly stored hard drive out in 5 years and
can't retrieve the data on it.

But realistically, isn’t it worth $60 a year to have a constantly backed up
hard drive? The alternative is to take the hard drive out of the safe every so
often and do a backup and put it back.

~~~
kachurovskiy
I've had a few hard drives fail in the past decades, but I've always been able
to retrieve the data. I'm far less confident in a remote company.

Your point is valid and constant, seamless backup is indeed a good thing to
have. Whether it's worth $60/year is one's own decision.

~~~
eitland
> I've had a few hard drives fail in the past decades, but I've always been
> able to retrieve the data

Care to share what kind of procedures you use?

I've recovered some data for friends and employers who wanted it back but
weren't prepared to pay > USD 1000 for it, but if I cannot connect to the disk
I'm lost.

(My tricks: tilting the disk, freezing the disk, leaving SSDs powered on but
SATA unconnected, and even before that photorec and ddrescue, etc.)

Note: don't do any of the above if data needs to be recovered at any cost, in
that case just contact a data recovery company.

------
weitzj
It seems like nobody has mentioned it yet: another great product is
[https://www.rsync.net/](https://www.rsync.net/), and it just works. There
are no bad surprises. You can overshoot your backup limits, and they will send
you an email asking you to fix it. But you still have your backup.

Your interface is rsync/scp/ssh.

They give you ZFS snapshots, you can use s3cmd from their machines, so you can
delegate uploads to S3 via rsync.net.

Our prior backup setup was duplicity with GPG hitting S3, and it was sometimes
flaky when listing the current keys.

Glad I read HN, where I heard about rsync.net. They even have/had an HN
discount. You should use the search functionality to find other threads.

[https://www.rsync.net/products/platform.html](https://www.rsync.net/products/platform.html)

[https://arstechnica.com/information-technology/2015/12/rsync...](https://arstechnica.com/information-technology/2015/12/rsync-net-zfs-replication-to-the-cloud-is-finally-here-and-its-fast/)

~~~
alexktz
Great product but too $$$$$$$$$$$$$ for my taste.

~~~
coaxial
The Borg/Attic/HN discounted price is a quarter of the regular price IIRC.
Well worth it IMHO. They're reliable, answer emails very fast, and are happy
to provide technical help should you need it to configure your system.

~~~
StavrosK
I use their Borg discount as well, and am extremely happy with it. I do wish
it were cheaper, but I get 150 GB for $50/yr, which is enough for me with
careful rationing.

I wish I had a TB for $50 so I didn't have to be so judicious with my photos,
but the ability to use Borg is so fantastic that I can't complain.

~~~
rsync
Your account is about to get larger ...

We lowered the borg rate to 2c this past month and, as is our policy, existing
accounts get enlarged to match ...

So in the near future, your 150 GB account will become ~208 GB in size ...

~~~
StavrosK
That's great news, thank you!

------
krn
Hetzner Storage Box[1] is an interesting alternative to Backblaze B2. It's not
cloud-based, but provides free automated snapshots, free 1 Gbps bandwidth, and
supports FTP, FTPS, SFTP, SCP, rsync and BorgBackup[2].

[1] [https://www.hetzner.com/storage-box](https://www.hetzner.com/storage-box)

[2]
[https://wiki.hetzner.de/index.php/Storage_Boxes/en](https://wiki.hetzner.de/index.php/Storage_Boxes/en)

~~~
urtrs
I created an account with them recently and they asked for my passport or ID
card for authentication. Is this usual?

~~~
deno
Yes, very usual for EU companies. Make sure to blank out anything sensitive.
If this seems weird, consider how many places require your Social Security
number in the US.

~~~
ar-jan
Not in my experience. Hetzner is the only one out of a dozen that I recall
asking for ID.

------
metakermit
BTW, the Backblaze team themselves officially support a Python implementation
of the API wrapper (the B2Api and Bucket classes):

[https://github.com/Backblaze/B2_Command_Line_Tool/blob/maste...](https://github.com/Backblaze/B2_Command_Line_Tool/blob/master/b2/api.py)

As these are used internally in their CLI, there's probably a higher chance
that they'll continue to work in the future.

~~~
gsibble
Indeed. It's just never been publicized, and personally I think the
implementation could be better. Although with so many stars on this repo now,
I plan on maintaining it for the foreseeable future. Start adding feature
requests, everyone! :)

PS: They also don't even use their own library in their code examples, so I
don't think they meant it to be used in that fashion.

~~~
metakermit
That would be great – it's always good to have some competition ;)

Regarding feature requests I'd love to see a well-maintained B2 Django
Storage. I'm currently using an existing implementation, but it's not that
well maintained:

[https://github.com/royendgel/django-backblazeb2-storage](https://github.com/royendgel/django-backblazeb2-storage)

~~~
gsibble
I saw that one. I'm not a particular fan of Django, but integrating my library
as part of Django's storage framework wouldn't be difficult. Neither would
building a Django library on top of mine. Any takers? :)

------
scarface74
I use Backblaze now and once I get my NAS, I'll probably end up using a
B2-based backup. But let's make an honest comparison. Backblaze does not
replicate your data across data centers. The standard S3 storage class does
($0.023/GB). The comparable S3 storage class is One Zone Infrequent Access
($0.01/GB). B2 still comes out ahead, but I wouldn't use either one for
primary storage. For their suggested "3-2-1" backup strategy, sure.

Then again, just for backup, I could use S3 Glacier for $0.004/GB. That's
cheaper than B2 and I get multiple-AZ storage. The data charges would be
higher - but it's backup. If catastrophe struck and I lost my primary _and_ my
local backups, getting my data back fast is the last thing I would worry about.

~~~
josteink
> Then again, just for backup, I could use S3 glacier for $.004/gb

Having done that in the past, I have to say that's just a million times less
practical than basic S3-like storage. And if you want to _automate_ that
setup, Glacier is even worse.

~~~
scarface74
Why do you say that?

I could see using something like rsync + Cloudberry (which maps S3 and makes
it look like a network drive). Set it up to use One Zone Infrequent Access,
and then after x days use a lifecycle policy to move the data to Glacier.

My use case for backups is solely movies and music. For source code I use
hosted git repos, for pictures Google Photos, and regular office documents are
either on Google Docs or OneDrive.

~~~
josteink
Last time I used Glacier, it was a separate product from S3 and had its own
API.

You had to upload pre-prepared "tapes" for backups. You couldn't mutate an
existing backup, you had to create a new one. And frequently fetching and/or
deleting existing "tapes" (backups) would cost you money (more so than the
original cost of the backup).

That meant you couldn't just ZIP it all up, back up the latest version, and
then delete the previous one to avoid being doubly charged for storage either.

Basically at time of archiving you needed to determine what was already
archived and create a new bundle with only what's new, and archive that only.
In the same spirit, restore meant piecing together multiple such tapes into a
full restore-set.

Absolutely terrible. It was like having traditional backup-software
constraints, but none of the software-support.

If Amazon has improved on that now, good for them, but I figured they probably
had to if they wanted any users at all.

~~~
scarface74
Honestly, I've never used the Glacier API directly. I've only used it as part
of a lifecycle policy where objects were stored in S3, then, using the
console, I had AWS migrate the data after a certain amount of time.

My offsite backup would only be accessed in the case of catastrophic failure -
my primary and local backup data is unavailable. Data transfer does cost more
but if I had that type of catastrophe, worrying about getting my movies back
for my Plex server would be of little concern. Everything that I would care
about - source code, photos, documents etc are stored other places.

That's another strike against Backblaze backups (not B2-based backups). When
we were in between residences last year - we left our apartment when the lease
was up and stayed in an extended stay while waiting for our house to be built -
my main computer was offline for 5 months. One more month and my Backblaze
backup would have been deleted. I forgot about it and restarted my computer
before I reconnected my external drive, so my backup of my external drive was
erased from Backblaze as soon as I came back online. It wasn't catastrophic,
but irritating. Luckily I have gigabit upload.

------
Aissen
According to online.net "cold storage" C14 comparison, they are cheaper than
Backblaze, most of the time:

[https://www.online.net/en/c14](https://www.online.net/en/c14)

~~~
qeternity
C14 is really not at all an object store. Getting data in and out is a huge
pain, even compared to other cold stores like AWS or OVH. We evaluated them
and passed.

~~~
Aissen
Thanks, I was looking for this type of feedback. The pricing comparison is
still interesting.

Could you do a summary of your evaluation for others that didn't test most
services?

~~~
qeternity
Yeah, here's a quick stab:

S3 - not much to say: fast, durable, expensive... the gold standard. Given the
limitations of the options below, we use it for rotating nightly backups
despite the cost.

Glacier - great for cold storage/archive, but has 90 day minimum

OVH hot - OpenStack-based, cheaper than S3 but not absurdly cheap. Charged for
egress even intra-DC, which is absurd and kills many use cases. They have
crippled OpenStack permission management (i.e. no write-only keys with
lifetime management per bucket, which is necessary for doing backups securely)

OVH cold - charges for ingress, but then storage is crazy cheap, and egress is
not as bad as Glacier. This is our preferred archival option.

C14 - not object storage, more like a "cold" ftp dump

B2 - pricing is epic, but the S3 incompatibility is a pain, as is the lack of
Backblaze-sponsored libraries (the library in the Python b2 CLI is not a
proper API)... we've been working on adding B2 to WAL-E. However, their
permission/user management doesn't cut it.

Wasabi - S3 compatible, great pricing if not for the 90-day minimum, which
they hide in the fine print

~~~
brianwski
Disclaimer: I work for Backblaze.

> B2 - their permission/user management doesn't cut it

Have you seen the new "Multiple Application Keys" APIs we have published docs
for (and the release coming in a week or two)? I'm curious if they satisfy
your permission needs. The docs are here:
[https://www.backblaze.com/b2/docs/application_keys.html](https://www.backblaze.com/b2/docs/application_keys.html)

A screenshot of the web GUI to these keys is here:
[https://i.imgur.com/RdlgdAs.jpg](https://i.imgur.com/RdlgdAs.jpg) (NOTE: the
web GUI does not expose the full power of the multiple application keys; it is
meant to be easy to use and hopefully satisfy 95% of customers' needs.)

~~~
AgentME
That looks great, for me at least! I'd been using B2 a bit personally, but had
written off using it for any serious projects because of the inability to make
extra-restricted per-project (bucket) API keys.

------
logeek
Is there a fundamental reason why B2 is (and will remain) cheaper than S3, or
is it just because they need to compete with AWS and once successful the
prices will be the same (or higher)?

~~~
programmarchy
From my understanding, they've put a lot of work into lowering the cost of
storage. I know at one point they were using arrays of consumer-grade drives,
and they've done a bunch of analysis on the cost and reliability of drives on
the market. They also created the "Storage Pod" [1] to maximize storage
density.

[1] [https://www.backblaze.com/blog/open-source-data-storage-serv...](https://www.backblaze.com/blog/open-source-data-storage-server/)

~~~
Aissen
Every cloud company does that. Google, Amazon, MS - all use consumer-grade
drives with software on top to reduce costs and increase reliability at a
fraction of the cost of traditional enterprise storage solutions. Scality even
provides a proprietary solution to do the same on-premise.

------
voltagex_
Are you able to try multithreaded uploads too? I found that single stream
uploads were too slow (< 10 megabytes per second) but I could get ~35
megabytes per second from packet.net to B2 by using 4 threads.

Edit: removed incorrect stuff.

~~~
thisacctforreal
If you use the large_file API (needed for multithreaded uploads), you only
hash the chunks, not the whole file.
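For context, B2's large-file flow carries a separate SHA1 per uploaded part rather than one hash over the whole file. A minimal stdlib sketch of that per-part hashing (the 100 MB part size here is illustrative, not a required B2 value):

```python
import hashlib

# Illustrative part size; B2 lets you choose within its allowed range.
PART_SIZE = 100 * 1024 * 1024

def part_sha1s(path, part_size=PART_SIZE):
    """Yield the SHA1 hex digest of each part of the file, in order.

    Each chunk gets its own digest, matching the per-part hashing
    style of large-file uploads, instead of one whole-file hash.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(part_size)
            if not chunk:
                break
            yield hashlib.sha1(chunk).hexdigest()
```

A nice side effect is that parts can be re-uploaded and re-verified independently if one transfer fails.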

~~~
brianwski
Disclaimer: I work at Backblaze.

> If you use the large_file API (needed for multithreaded uploads)

We recommend for small files that you use multi-threaded where each thread
sends a totally separate file. So if you have to upload both cat.jpg and
dog.jpg, you upload cat.jpg in one thread and dog.jpg in another thread.

Based on the Backblaze architecture, that means cat.jpg will be sent to one
"vault" in the Backblaze datacenter with one thread, and dog.jpg will be sent
to a totally different "vault" in the Backblaze datacenter with another
thread. This scales incredibly well, in that it should be twice as fast for
two files, and 20 times as fast for 20 files if you do it correctly.

Source: I wrote a lot of the Backblaze Personal Backup client, which uses this
philosophy.
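The one-whole-file-per-thread pattern described above can be sketched with the standard library; `upload_file` here is a placeholder for whatever client call actually sends a single file to B2:

```python
from concurrent.futures import ThreadPoolExecutor

def upload_all(upload_file, paths, max_threads=20):
    """Upload each path on its own thread, one whole file per thread,
    so each transfer can land on a different Backblaze vault."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # pool.map preserves input order in the returned results
        return list(pool.map(upload_file, paths))
```

With 20 files and 20 threads this is the "20 times as fast" case, assuming the per-file uploads really are independent of each other.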

------
tobias3
This whole library wouldn't be necessary if Backblaze implemented an
S3-compatible API. They give reasons like being able to load balance on the
client for their API (which I do not think is a good reason), but ultimately
they just push work from their end onto a lot of applications and developers.

Maybe it also has a strategic advantage? Now every product has to announce
they support B2 whereas nobody has to announce they support Wasabi, because
they support any S3 compatible storage such as AWS S3, Google Cloud Storage or
Wasabi.

~~~
ianamartin
Meh, I can see why they didn't. They aren't really in the same business. It
makes sense for followers to implement APIs compatible with market leaders.
Riak CS is API compatible with S3, which is nice. But it's literally intended
to be an open source version of S3 that you can host and scale yourself.

Backblaze is in a different market. They may be finding out that there's
overlap and allowing that use. But they are not the same and probably aren't
prepared for developers to start using b2 en masse.

I think it makes business sense. You want to save some money? Do a little
extra work for the cheaper product. Want to save even more money? Roll your
own with Riak CS. Cloud services all work along the same spectrum where you
pay more for convenience and ease of use, and you pay less up front if you're
willing to pay in developer or devops or infrastructure costs. I think this
fits in nicely on that spectrum.

------
dzek69
Hah, I hadn't heard about Backblaze in a while, and I was even thinking about
creating an Ask HN asking if anyone has used Backblaze for a longer while
and can say something about them (speed, data reliability). Now I'll take my
chance:

Have you used Backblaze B2? How was your experience?

~~~
chrisper
I have used B2 for a couple of months. From Europe it is way too slow. They
only have like 2 DCs, which are both on the west coast I believe.

As I am using this for backups only, I went back to Google Drive where I can
max out my Gbit upload.

~~~
tribaal
If you are connecting from Europe, maybe you could consider exoscale.com 's S3
storage offering?

It's 100% european (with Swiss and German regions), and priced pretty
aggressively. [https://www.exoscale.com/object-storage/](https://www.exoscale.com/object-storage/)

Disclaimer: I now work for exoscale, but was a happy user before that.

~~~
sschueller
Cool, you guys should send out mailings to your customers when new offerings
come online. I asked about S3 at the end of last year and was told it was
currently unavailable for new customers. I hadn't thought to check whether it
was available until now, and I might have gone to a more expensive competitor.

------
piqufoh
This feels like a handy tool! The first thing I read when opening new code is
the test suite - it's worth getting that right at the start. Would you
consider deeper unit testing? S3 (and AWS) have the indispensable `moto` boto
mocks; I think something similar would be dead handy here.

~~~
gsibble
Yes, much deeper unit testing is on my TODO list. I made this in four days and
was just testing that it worked. It already has about 92% code coverage, but I
want to verify that fields are returned properly and such. Some help would be
appreciated if people would like to contribute, including mocking it up.

------
post_break
I just wish Backblaze would fix their snapshots. You still have no way to tag
a snapshot, put in any notes, anything. You can literally make two snapshots 5
minutes apart, and the only thing that differentiates them is the timestamp.
Unforgivable.

~~~
brianwski
Disclaimer: I work at Backblaze.

> You still have no way to tag a snapshot, put in any notes, anything.

We totally agree, and the project is fully spec'ed, just waiting for an
available engineer to implement it! On a side note, we also have open reqs for
engineers. :-)

~~~
post_break
Thank you! Please send it out in an email or something when it's ready. We've
been waiting for this for a while now which is why I'm so grumpy lol.

------
erickj
Hey, that's cool. I made a B2 Ruby gem a few years back that provides a
library and CLI for the API. +1,000,000 for Backblaze!

[https://github.com/erickj/bkblz](https://github.com/erickj/bkblz)

~~~
gsibble
Awesome!

------
Dawny33
The best part is this API looks and feels very similar to AWS's boto S3 API.

Great job!

~~~
gsibble
Thank you!

------
DanielDent
B2's API design has security implications:
[https://www.danieldent.com/blog/restless-vulnerability-non-browser-cross-domain-http-request-attacks/](https://www.danieldent.com/blog/restless-vulnerability-non-browser-cross-domain-http-request-attacks/)

~~~
voltagex_
I don't see how B2 is affected by this unless you assume malicious control of
a B2 API server.

~~~
DanielDent
It's fairly standard for network clients to assume potential malicious control
of the server they are connecting to.

It helps reduce the blast radius of a compromised server.

In the case where the server is operated by a third party (as is the case with
the B2 API server), there can be many compliance implications if that third-
party-operated server has access to an internal network.

We don't accept when SSH clients or web browsers have the ability to do things
they shouldn't based on instructions sent by the server they connect to.

Why would we suddenly have lower expectations of our file storage API clients?
(or any other network/HTTP clients for that matter)

~~~
voltagex_
Ah, I see what you're getting at. It'd be better if the URL for get_upload_url
(I think that's what the API was called) could be calculated client side.

At the moment you're probably still more at risk of downloading a malicious
library from PyPI or npm, but this is sure to turn up in a CTF at some point -
even curl is technically vulnerable.

Have you talked to anyone from Backblaze about this?

~~~
DanielDent
Yes, client-side URL calculation and/or a whitelist of acceptable URLs would
be a significant improvement.

Thankfully command-line curl won't follow redirects unless you pass it a
special flag, though if you do need it to follow redirects, I'm not sure what
the best way is to restrict the range of redirects that it will follow.
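In a Python client, the whitelist idea mentioned earlier in the thread might look like the following sketch (the allowed host name is a placeholder, not an actual B2 endpoint list):

```python
import urllib.request
from urllib.error import HTTPError
from urllib.parse import urlparse

# Hypothetical allow-list; real code would pin the provider's known hosts.
ALLOWED_HOSTS = {"api.backblazeb2.com"}

class WhitelistRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects that leave the allow-listed hosts."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if urlparse(newurl).hostname not in ALLOWED_HOSTS:
            # Reject the redirect instead of silently following it.
            raise HTTPError(newurl, code, "redirect target not allowed",
                            headers, fp)
        return super().redirect_request(req, fp, code, msg, headers, newurl)

# An opener built with this handler only follows whitelisted redirects.
opener = urllib.request.build_opener(WhitelistRedirectHandler)
```

This limits the blast radius of a compromised server: even if it serves a redirect to an internal or attacker-controlled host, the client refuses to follow it.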

This issue was part of a broader coordinated disclosure and was only published
today. I've gotten in touch with B2 support & I'm hoping my support ticket
will make it to the correct people.

------
simula67
> Backblaze B2 provides the cheapest cloud object storage and transfer
> available on the internet

Are you sure? [http://gaul.org/object-store-comparison/](http://gaul.org/object-store-comparison/) says there are cheaper options

EDIT: Edited the link away from
[https://wasabi.com/pricing/](https://wasabi.com/pricing/). Wasabi seems
cheaper than B2 and claims to be a hot storage solution

~~~
gsibble
That does indeed look cheaper. I'll have to make an SDK for them next :D

edit: Woah, you edited your comment. What made you change from Wasabi?

~~~
qeternity
Wasabi has a 90 day minimum storage period, just like AWS Glacier. This means
it's pretty unusable for things like nightly backups if your retention is less
than 90 days.

~~~
koolba
If your backup process involves downloading the backup artifacts in a
different region (say, for true off-site DR + validation), then it's still a
net win, as the 90-day storage costs are less than the insanely high $0.09/GB
AWS charges for outbound bandwidth.
------
Sami_Lehtinen
Is this the best way of dealing with Python 3 compatibility? What's wrong with
sys.version_info? Or is there an even better way?

Ref:
[https://github.com/sibblegp/b2blaze/blob/master/b2blaze/util...](https://github.com/sibblegp/b2blaze/blob/master/b2blaze/utilities.py)
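For what it's worth, a typical sys.version_info pattern looks like this sketch; the urllib import is just an example of a name that moved between Python 2 and 3, not necessarily what b2blaze needs:

```python
import sys

# Branch once at import time on the interpreter's major version.
PY3 = sys.version_info[0] >= 3

if PY3:
    from urllib.parse import quote as url_quote
else:
    from urllib import quote as url_quote  # Python 2 location
```

Another common approach is feature detection (try/except ImportError around the Python 3 import), which avoids version checks entirely.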

------
Neil44
Duplicity has support for B2 back end now, I use it for a lot of backup and it
works well. Versioning, encryption etc.

~~~
gsibble
This library is meant more for web apps and other applications built in
Python, not necessarily for backups.

------
gtt
Can I use B2 as a replacement for S3? Namely, I need a "get upload URL"
feature like pre-signed S3 URLs.

------
artellectual
That's funny, I did the same thing for Elixir; feel free to check it out:
[https://github.com/upmaru/upstream](https://github.com/upmaru/upstream). Not
well documented yet, but it's coming.

------
jonotime
I'm loving B2 for my Linux desktop backup. I used Crashplan for many years
until they pulled the plug recently. Now I'm using B2 via Duplicati and I'm
actually saving money (I have about 500GB of backup).

Borg was a runner up, but Duplicati had built in B2 support, provided
scheduling, and a web interface which makes navigating for specific files in a
tree easy when needed.

Crashplan had a nice Linux client, but it was a black box/closed source, so
problems came up from time to time that were hard to debug. So it's nice to
have more control over my data as well.

------
scottybowl
Any ideas on using Backblaze as a driver in a Laravel project? We're currently
using Digitalocean Spaces but it's very unreliable.

~~~
gsibble
I believe Laravel is PHP so you'd need a PHP library while this one is Python.

------
hinkley
A little off topic: did I miss the Backblaze hard drive reliability report? I
think it's been a while since they've done one.

------
iosDrone
I thought the knock against Backblaze was that, compared to Google or Amazon,
uploading large amounts of data to it is very slow.

~~~
brianwski
Disclaimer: I work for Backblaze so I'm biased.

> uploading large amounts of data to [Backblaze B2] is very slow

All the reports we (Backblaze) hear say that if you only use one thread,
Backblaze B2 is slightly slower than S3 (maybe 90% of the performance). If
somebody has better numbers I would LOVE to see them!

If clients use multiple threads, this issue goes entirely away. Using 500
threads can provably be 500 times as fast with Backblaze. This is because the
Backblaze B2 architecture means there are no "choke points" like Amazon S3
has. Each thread will most likely be talking to a completely different
"Backblaze Vault" maybe even in a completely separate Backblaze datacenter.
Since they don't share any network switches or load balancers in common, there
is no way they will slow down.

But again, I would love any measurements or reproducible tests showing
differences so we can chase them down and improve Backblaze B2!

~~~
iosDrone
Fair enough. This is just what I've heard second-hand, so I definitely don't
have any data to back it up. I may try it out.

------
gsibble
admins: Will you please change the title to a Show HN? Thank you!

~~~
sctb
Sure, done!

~~~
gsibble
Thanks!

------
brian_herman
B2 does not have the durability guarantees that S3 has, so it is not a fair
comparison.

