Hacker News
Using Backblaze B2 and Cloudflare Workers for free image hosting (jross.me)
282 points by CherryJimbo on Aug 25, 2019 | 86 comments

Edit: See replies below. Cloudflare CEO says this use case is fine.

This is a cool project and something I will probably use for some hobby projects.

I would caution against it for anything more than a hobby project as it violates the Cloudflare TOS:

> 2.8 Limitation on Non-HTML Caching

> The Service is offered primarily as a platform to cache and serve web pages and websites. Unless explicitly included as a part of a Paid Service purchased by you, you agree to use the Service solely for the purpose of serving web pages as viewed through a web browser or other application and the Hypertext Markup Language (HTML) protocol or other equivalent technology. Use of the Service for the storage or caching of video (unless purchased separately as a Paid Service) or a disproportionate percentage of pictures, audio files, or other non-HTML content, is prohibited.


For something small, they won't care. If your images make the front page of reddit, you might get shut down.

That’s only for our traditional service. For Workers the ToS is different. Don’t see anything troubling about this project!

Thanks for engaging directly with the community!

So this wouldn't be allowed if he didn't use workers to make a redirect?

Hey Matthew! It's a darn cool project!

Workers are being used to do some URL rewriting.

The main point of this article is to use a Cloudflare cache-everything rule and use that caching to create a free image host. From the article:

> I'd heavily recommend adding a page-rule to set the "cache level" to "everything", and "edge cache TTL" to a higher value like 7 days if your files aren't often changing.
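For anyone wondering what "URL rewriting" means here, this is a rough sketch in Python rather than the Worker's actual JavaScript. It maps a clean public URL onto a B2-style `/file/<bucket>/<path>` path. The bucket name `my-images` and the function name are made up for illustration, and the real script may do this differently.

```python
from urllib.parse import urlsplit, urlunsplit

BUCKET = "my-images"  # hypothetical bucket name, not from the article


def rewrite_to_bucket_path(url: str, bucket: str = BUCKET) -> str:
    """Map a clean public URL onto a B2-style /file/<bucket>/<path> URL.

    Only the path changes; scheme, host, query and fragment are kept.
    """
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    new_path = f"/file/{bucket}/{path}"
    return urlunsplit(
        (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
    )


print(rewrite_to_bucket_path("https://img.example.com/cat.png"))
```

With a rewrite the visitor keeps seeing the short URL. The longer path is only used when fetching from the origin.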

The guy you replied to is the CEO of cloudflare. If he says it's OK then I'm pretty sure it's OK!

^-- what he said.

In that case, thanks for the blessing. :)

I am not saying not to trust the word of the CEO, but this exact use case of using Cloudflare as an image host comes up a lot on HN.

The word on the street is that they will start throttling and contacting you once you hit several hundred TB per month. [1][2][3][4][5][6]

Of course this is still extremely generous, and the upgrade plans are usually still several orders of magnitude cheaper per GB than any cloud provider. But don't build a business or hobby project around Cloudflare providing unlimited free bandwidth forever.

[1] https://news.ycombinator.com/item?id=20139191

[2] https://news.ycombinator.com/item?id=19368684

[3] https://news.ycombinator.com/item?id=13580113

[4] https://news.ycombinator.com/item?id=12826389

[5] https://news.ycombinator.com/item?id=5214480

[6] https://news.ycombinator.com/item?id=19829740

(basically search HN for cloudflare + non-html)

To be fair, I was expecting to be contacted for this unlimited service way way waaay before hundreds of TB per month.

He might be the CEO but things change after an IPO. I hope Cloudflare stays great. I love their service.

Not only do things change, but CF has hundreds of employees who weren't CC'd on that informal permission, so there's still a high chance of being inconvenienced, and there's a decent chance the CEO won't be at your disposal should a problem occur.

Should CloudFlare later ban you for the practice, will the random support person you reach unpack the CEO's comments here and ensure nothing changed internally that prevents allowing your continued use and advocate restoring your account for you?

It’s not so different from when your company provides a perk that you expect not to see again.

If the perk saves you money, you put that money in savings. Once your budget expands to depend on that perk you are trapped, and when it goes away the pain will be noteworthy.

In words that are more applicable to a business case: You have to have a strategy for when the Too Good To Be True situation ends, because it will, and you have less control over when than you think you do.

Backblaze is cheap, but if you're uploading millions of files, beware -- there is no way to just nuke/empty a bucket with a click of a button. If you're not keeping filename references in an external database, you are left to sequentially scan and remove files in batches of 1000 in a single thread.

Support could not help, and it took me months to empty a bucket that way.

That doesn't really make any sense, Backblaze were not limiting you to a single thread - you were...

You do need access to an index/DB of all files in a bucket in order to delete them in parallel. Otherwise you're stuck paginating with the B2 API.

You need a DB of all of the dead entries that need to be deleted, and that’s a fine thing to have.

There are lots of problem spaces where deletion is expensive, so it is time-shifted to avoid peak system load. Some sort of reaper goes around tidying up as it can.

But I think by far my favorite variant is amortizing deletes across creates. Every call to create a new record pays the cost of deleting N records (if N are available). This keeps you from exhausting your resource, but also keeps read operations fast. And the average and minimum create time is more representative of the actual costs incurred.
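A minimal sketch of the "amortize deletes across creates" idea, using an in-memory dict as a stand-in for the expensive storage. The class name, method names and the default of 2 deletes per create are all invented for illustration.

```python
from collections import deque


class AmortizedStore:
    """Toy store where each create() also pays for up to N pending deletes."""

    def __init__(self, deletes_per_create: int = 2):
        self.storage = {}        # physical records, including dead ones
        self.dead = deque()      # keys queued for physical deletion
        self.dead_set = set()    # same keys, for fast "is it dead?" checks
        self.deletes_per_create = deletes_per_create

    def mark_dead(self, key):
        """Cheap logical delete: hide the record and queue the real work."""
        if key in self.storage and key not in self.dead_set:
            self.dead.append(key)
            self.dead_set.add(key)

    def read(self, key):
        """Reads never see dead records, so they stay fast."""
        return None if key in self.dead_set else self.storage.get(key)

    def create(self, key, value):
        """Reap up to N queued records, then store the new one."""
        for _ in range(min(self.deletes_per_create, len(self.dead))):
            old = self.dead.popleft()
            if old in self.dead_set:      # skip stale queue entries
                self.dead_set.discard(old)
                self.storage.pop(old, None)
        self.dead_set.discard(key)        # a re-created key is live again
        self.storage[key] = value


store = AmortizedStore(deletes_per_create=2)
for name in ("a", "b", "c"):
    store.create(name, name.upper())
    store.mark_dead(name)
store.create("d", "D")  # reaps two of the dead records as a side effect
print(sorted(store.storage))
```

As long as N is at least 1, the backlog shrinks whenever creates outpace deletes, so the dead records cannot pile up without bound.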

Variants of this show up in real-time systems.

My case was really simple. I was done with my ML pipeline and nuked the database, but pics in B2 remained with no quick way to get rid of them and/or to stop the recurring credit card charges.

IMO an "Empty" button should have been implemented by Backblaze.

Would this technique have been faster?

A single pass: paginating through all entries in the bucket without deletion, just to build up your index of files. And then using that index to delete objects in parallel.
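Here is roughly what that two-phase approach could look like. `list_page` and `delete_file` are hypothetical callables standing in for whatever B2 client you use. The fake bucket below only imitates a "start name in, next name out" style of pagination, so treat this as a sketch and not as the real API.

```python
import threading
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def build_index(list_page):
    """Phase 1: page through the bucket sequentially, collecting every name."""
    names, cursor = [], None
    while True:
        page, cursor = list_page(cursor)
        names.extend(page)
        if cursor is None:
            return names


def delete_all(names, delete_file, workers=16):
    """Phase 2: delete from the index in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so any exception is re-raised here.
        list(pool.map(delete_file, names))
    return len(names)


class FakeBucket:
    """In-memory stand-in for a bucket, for demonstration only."""

    def __init__(self, names, page_size=1000):
        self.files = set(names)
        self.page_size = page_size
        self._lock = threading.Lock()

    def list_page(self, start):
        ordered = sorted(self.files)
        begin = 0 if start is None else bisect_left(ordered, start)
        end = begin + self.page_size
        next_start = ordered[end] if end < len(ordered) else None
        return ordered[begin:end], next_start

    def delete_file(self, name):
        with self._lock:
            self.files.discard(name)


bucket = FakeBucket(f"img-{i:05d}.png" for i in range(2500))
index = build_index(bucket.list_page)
deleted = delete_all(index, bucket.delete_file)
print(deleted, len(bucket.files))
```

Whether this beats delete-as-you-list depends on how much of the time goes to listing and how much to the delete calls themselves.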

I believe S3 is the same way.

S3 has an "Empty bucket" button, unlike B2.

Disclaimer: I work at Backblaze.

> no way to empty a bucket.

Backblaze currently recommends you do this by writing a “Lifecycle rule” to hide/delete all files in the bucket, then let Backblaze empty the bucket for you on the server side in 24 hours: https://www.backblaze.com/b2/docs/lifecycle_rules.html
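For reference, an "empty the whole bucket" rule might look something like the following. The field names are as I understand them from the lifecycle rules docs linked above. The empty prefix (meant to match every file) and the 1-day values are my assumptions, so check the docs before relying on this.

```python
import json

# Assumed shape of a B2 lifecycle rule; verify against the linked docs.
empty_bucket_rule = {
    "fileNamePrefix": "",            # assumption: empty prefix matches all files
    "daysFromUploadingToHiding": 1,  # hide each file one day after upload
    "daysFromHidingToDeleting": 1,   # delete it one day after it is hidden
}

print(json.dumps([empty_bucket_rule], indent=2))
```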

Very good info. Didn't know B2 is cheaper than S3.

It’s cheap but it’s proving unacceptably slow for me - sometimes I see 2.5s TTFB for accessing tiny audio files in my region (Berlin, EU). Server uploads are also quite unreliable, had to write a lot of custom retry logic to handle 503 errors (~30% probability when uploading in batch).

Great for its intended use (backups), but I’ll be switching to an S3 compatible alternative soon - eyeing Digital Ocean Spaces or Wasabi...
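The "custom retry logic" for 503s mentioned above could be sketched like this: exponential backoff with jitter around a hypothetical `upload` callable. `TransientUploadError` is a made-up exception standing in for "the server answered 503", and a real client may also need to request a fresh upload URL before each retry.

```python
import random
import time


class TransientUploadError(Exception):
    """Hypothetical error raised by `upload` on a retryable failure (e.g. 503)."""


def upload_with_retry(upload, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call upload() until it succeeds, backing off exponentially with jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return upload()
        except TransientUploadError:
            if attempt == max_attempts:
                raise  # out of attempts, let the caller see the failure
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter keeps a batch of uploads from retrying in lockstep.
            sleep(delay + random.uniform(0, delay))
```

Passing `sleep` in as a parameter is only there so the function can be tested without actually waiting.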

Wasabi is great. Same price but much more usable because it follows the S3 API. All cloud storage tools work seamlessly with it.

B2's API is frustrating to use and has limited compatibility, and also throws errors that need to be constantly handled, as you found.

Wasabi also has a free egress plan if you don't download more than your entire storage account per month.

Wasabi is not the same price as B2.

B2 is:

- 0.5 cents/GB/mo
- 1GB/day free egress, 1 cent/GB after
- generous free API call allowances, cheap after that

Wasabi is:

- $0.0059/GB/mo (18% higher)
- all storage billed for at least 90 days
- minimum of $5.99 per month
- this doesn't include delete penalties
- all objects billed for at least 4K
- free egress as long as "reasonable"
- free API requests
- overwriting a file is a delete, ie, delete penalties


With HashBackup (I'm author), an incremental database backup is uploaded after every user backup, and older database incrementals get deleted. Running simulations with S3 IA (30-day delete penalties), the charges were 19 cents/mo vs 7 cents/mo for regular S3, even though S3 is priced much higher per GB. So for backups to S3, HashBackup stores the db incrementals in the regular S3 storage class even if the backup data is in IA.

For Wasabi, there is no storage class that doesn't have delete penalties, and theirs are for 90 days instead of 30.
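To make the delete-penalty effect concrete, here is a small back-of-the-envelope calculator. It uses the per-GB prices quoted in this thread and ignores the monthly minimum, the 4K object minimum, egress and API fees. The 30-day month is a simplifying assumption.

```python
def billed_storage_cost(size_gb, days_stored, price_per_gb_month,
                        min_days=0, days_per_month=30):
    """Storage cost when objects are billed for at least `min_days`."""
    billable_days = max(days_stored, min_days)
    return size_gb * price_per_gb_month * billable_days / days_per_month


# 100 GB that is deleted (or overwritten) after 10 days:
no_minimum = billed_storage_cost(100, 10, 0.005)                # B2-style pricing
with_minimum = billed_storage_cost(100, 10, 0.0059, min_days=90)  # 90-day minimum
print(f"no minimum: ${no_minimum:.2f}, 90-day minimum: ${with_minimum:.2f}")
```

For data that churns every 10 days, the 90-day minimum makes the bill roughly ten times higher (about $1.77 versus $0.17), even though the per-GB prices are close. For data kept longer than 90 days the minimum stops mattering.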

It used to be $0.0049 for the free egress plan so that's changed then. They do have lower storage pricing if you are on a paid-egress plan which is the same as Backblaze.

Either way, Wasabi is about simplicity and doesn't have any concept of storage classes. It's true that there's a 90-day min storage fee involved but that's only an issue if you're deleting constantly.

Those stats sound insane to me, and certainly don't reflect what I see.

I see 50ms or less TTFB, for images in the sub 200KB range, and for videos in the 500MB+ range, from Australia where the internet is still terrible.

I've only ever had a single server upload fail on me - and it occurred when an upload hit a major global outage of infrastructure. In two years of regularly uploading 8GB/200 files a fortnight (at the least), I've never needed custom retry logic.

If you are seeing 50ms TTFB between B2’s only datacenter (in California) and Australia, there is something wrong with your methodology or you have discovered FTL communication.
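The physics behind that remark, as a quick calculation. The 12,000 km figure is my own rough assumption for the great-circle distance between California and eastern Australia, and real fibre routes are longer.

```python
C_VACUUM_KM_S = 299_792.458  # speed of light in vacuum, km/s


def min_round_trip_ms(distance_km, speed_fraction=1.0):
    """Lower bound on round-trip time over `distance_km`, in milliseconds.

    speed_fraction is the signal speed as a fraction of c
    (roughly 2/3 in optical fibre).
    """
    return 2 * distance_km / (C_VACUUM_KM_S * speed_fraction) * 1000


distance_km = 12_000  # assumed distance, see above
print(f"vacuum: {min_round_trip_ms(distance_km):.0f} ms")
print(f"fibre:  {min_round_trip_ms(distance_km, 2 / 3):.0f} ms")
```

That comes out to about 80 ms in a vacuum and about 120 ms in fibre for a single round trip, before any connection setup. A 50 ms TTFB therefore cannot be coming from a California datacenter.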

I've been seeing pretty bad upload failures (probably around 30%) for uploading hundreds of 30-40 MB files per month to B2 from New Zealand since I started using B2 over a year ago.

And I'm not convinced it's connectivity issues, as I can SCP/FTP the same files to servers in the UK...

When I test using an actual software client (Cyberduck) to do the same thing to B2, I see pretty much the same behaviour: retries are needed, and the total upload size (due to the retries) is generally ~20% larger than the size of the files.

Interesting. I have a webm media website where I've migrated hundreds of thousands of videos about that size from s3 to b2 with thousands of additional per month with almost zero issues. I didn't even have/need retry logic until I was on horrible internet from a beach for a month where long connections were regularly dropped locally.

Felt TTFB and download speed were great too considering the massive price difference compared to s3. Though also used Cloudflare workers anyways to redirect my URLs to my b2 bucket with caching.

How well can you cache the worker responses on CF? Can you prevent spinning one up & therefore incurring costs after the first given unique URL request is handled? Looking into now.sh for a similar use case (audio), but pondering how to handle caching in a bulletproof way as I'm afraid of sudden exploding costs with "serverless" lambdas...

Had a similar experience with B2, Wasabi was much faster in my testing.

You're very welcome - I'm glad it was helpful. B2 is significantly cheaper than S3, especially when paired with Cloudflare for free bandwidth. If you're interested, my company Nodecraft talked about a 23TB migration we did from S3 to B2 a little while ago: https://nodecraft.com/blog/development/migrating-23tb-from-s...

How's the outgoing/ingress bandwidth comparison? Outgoing bandwidth is expensive on AWS.

It's entirely free if you only use the Cloudflare-routed URLs thanks to the Bandwidth Alliance: https://www.cloudflare.com/bandwidth-alliance

How does CloudFlare themselves afford to give bandwidth for free? I understand that I can pay $20/mo for pro account but they also have a $0/mo option with fewer bells and whistles. What gives them the advantage to charge nothing for bandwidth?

Because we’re peered with Backblaze (as well as AWS). There’s a fixed cost of setting up those peering arrangements, but, once in place, there’s no incremental cost. That’s why we have similar agreements to Backblaze in place with Google, Microsoft, IBM, Digital Ocean, etc. It’s pretty shameful, actually, that AWS has so far refused. When using Cloudflare, they don’t pay for the bandwidth, and we don’t pay for the bandwidth, so why are customers paying for the bandwidth? Amazon pretends to be customer-focused. This is a clear example where they’re not.

Maybe Amazon is afraid of sabotaging Cloudfront and losing revenue coming from outgoing data transfers?

Thank you for clarifying. If I were to use a Google cloud service from a Cloudflare Worker would there be no bandwidth charges? That would change everything for us.

Bandwidth between GCP and Cloudflare isn't free, unlike with Backblaze, but the cost is reduced.


As AWS is the primary cash cow for Amazon, I doubt they would ever change that. Bandwidth fees are a key profit maker for them. On the other hand, AWS's crazy bandwidth pricing is probably pretty beneficial in driving customers towards you guys.

Their terms indicate it should be used for caching html primarily. So if they find costly abusers, they could use this clause to get them to upgrade to a paid tier.

> 2.8 Limitation on Non-HTML Caching The Service is offered primarily as a platform to cache and serve web pages and websites. Unless explicitly included as a part of a Paid Service purchased by you, you agree to use the Service solely for the purpose of serving web pages as viewed through a web browser or other application and the Hypertext Markup Language (HTML) protocol or other equivalent technology. Use of the Service for the storage or caching of video (unless purchased separately as a Paid Service) or a disproportionate percentage of pictures, audio files, or other non-HTML content, is prohibited.


Wasabi has it free and (afaik) is cheaper than S3 and B2.

According to their documentation, Wasabi will move you on to their $0.005/GB storage $0.01/GB download plan (same price as B2 without api charges) from their $0.0059/GB storage free egress plan if you download more than your total storage (eg. with 100TB stored don't download more than 100TB/month).


Any web service from amazon is usually on the pricey side of normal.

You get more features and higher cost. Basically B2 gives you storage, but not detailed access management, endpoint internal to your vpc, and other extras.

It is but I think they only have US datacenters.

For backups, media and archival use-cases it looks really good for the price if you can live with it being in the US.

If you are doing any large data-processing using S3 you get the advantage of data locality, with VPC endpoints you can also bypass NAT gateway data charges and get much higher bandwidth.

> backups, media and archival use-cases it looks really good for the price if you can live with it being in the US

For these use cases S3 has lower pricing tiers (down to 0.1¢/GB-mo, matched by Azure and promised by GCP).

I think B2 is still only single-datacenter.

Same, that alone is a very useful takeaway for me :).

I didn't realize workers had a free plan. I avoided trying it for a while.

They added it very recently.

Here's the script, edited to cache files forever:


I wrote a simple uploader script that adds a random ID to each upload so they don't clash, but this will work fine regardless.
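Not the script mentioned above, just a minimal sketch of the random-ID idea. The function name and the 8-byte ID length are arbitrary choices for illustration.

```python
import secrets
from pathlib import PurePosixPath


def unique_object_name(filename: str, id_bytes: int = 8) -> str:
    """Return e.g. 'cat-3f9a1c0b7d2e4a61.png' for 'cat.png'.

    A random hex ID goes before the extension, so two uploads of the
    same filename end up under different object names.
    """
    path = PurePosixPath(filename)
    random_id = secrets.token_hex(id_bytes)  # 2 hex characters per byte
    return f"{path.stem}-{random_id}{path.suffix}"


print(unique_object_name("cat.png"))
```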

Using CloudFlare workers to clean up the URLs is neat but it seems like it would be really easy to reach the limits of the free tier.


You get 100,000 requests per month and up to 1,000 requests in a 10 minute timeframe. So if you have a page with 10 images on it and you get 100 people visiting that page within a 10 minute timeframe, you will use up all of your free tier and all new visitors will get a 1015 error.

For paid plans you must pay at least $5 and get 10 million requests included and additional requests are 50 cents per million.

You get 100,000 requests per day, not per month. The burst limits are definitely a concern for heavy traffic, but for just $5 you can remove the burst limits entirely, as you mention.
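The arithmetic from this exchange, using only the figures quoted in the thread (100,000 requests/day and 1,000 per 10 minutes on the free tier; $5 including 10 million requests, then $0.50 per extra million). These are the 2019 numbers as stated here, not necessarily current pricing.

```python
FREE_BURST_LIMIT = 1_000     # requests per 10 minutes, per the thread
FREE_DAILY_LIMIT = 100_000   # requests per day, per the thread


def visitors_before_limit(requests_per_visit: int, limit: int) -> int:
    """How many page views fit under a request limit."""
    return limit // requests_per_visit


def paid_monthly_cost(requests: int) -> float:
    """$5 base including 10M requests, then $0.50 per additional million."""
    extra = max(0, requests - 10_000_000)
    return 5.0 + extra / 1_000_000 * 0.50


print(visitors_before_limit(10, FREE_BURST_LIMIT))  # page with 10 images
print(visitors_before_limit(10, FREE_DAILY_LIMIT))
print(paid_monthly_cost(12_000_000))
```

So a page with 10 images hits the burst limit at 100 visitors in 10 minutes, and the daily limit only at 10,000 visitors per day.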

Wow I was just able to drop Google Drive for basic storage with this and rclone. Thank you!

Were you able to use Google Drive for web image hosting somehow?

images, pirated videos, you name it


Google seem not to care

The one I linked works fine and rides on Google Drive, other pirate streaming sites often use Youtube as a backend, or even Google Docs!

How are we feeling about Cloudflare stability versus AWS these days?

They rarely (or never?) go down at the same time for any reason, other than the standard Internet BGP drama that all providers are at risk of and have no control over.

> Backblaze has a 10GB free file limit, and then charges $0.005/GB/Month thereafter.

Is this true? I can have 110GB of cloud storage for $0.50 per month? It sounds TGTBT.
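The sum checks out, at least for storage alone. A tiny sketch using the quoted figures (first 10GB free, $0.005/GB/month after that), ignoring download and API charges:

```python
def b2_monthly_storage_cost(stored_gb: float, free_gb: float = 10,
                            price_per_gb: float = 0.005) -> float:
    """Monthly storage cost in dollars, using the prices quoted above."""
    return max(0, stored_gb - free_gb) * price_per_gb


print(f"${b2_monthly_storage_cost(110):.2f}")
```

That gives $0.50 for 110GB: 100 billable GB at half a cent each.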


One way they can achieve that pricing is by using consumer drives, instead of enterprise drives. See https://www.backblaze.com/blog/vault-cloud-storage-architect...

That doesn't seem completely fair. Much like blaming google for using normal commodity servers compared to Altavista using high end enterprise hardware. What matters is the reliability of the system, not some random part.

Backblaze is quite transparent about how they do things. They publish their drive reliability numbers (including brand/model numbers), storage pod design, and how their sharding/redundancy works.

Seems like most cloud storage vendors just say "We do object storage right handwave and we have lots of 9s". Backblaze says they shard your data into 20 pieces onto 20 servers and can recover with any 17 of those pieces. More details at https://www.backblaze.com/blog/reed-solomon/

Sure that's not enough redundancy for some, but at least you know what to expect and can plan accordingly. I've not seen any other cloud vendor do that. Please post URLs for similar info from other companies.
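To put rough numbers on "any 17 of 20", here is a small sketch. It treats shard failures as independent with the same probability, which is my simplifying assumption and not a claim about Backblaze's real failure model.

```python
from math import comb


def storage_overhead(total_shards: int, needed_shards: int) -> float:
    """Raw bytes stored per byte of user data."""
    return total_shards / needed_shards


def loss_probability(total_shards: int, needed_shards: int, p_fail: float) -> float:
    """Chance that too many shards are lost to rebuild the file.

    Assumes each shard fails independently with probability p_fail.
    """
    tolerated = total_shards - needed_shards
    return sum(
        comb(total_shards, k) * p_fail ** k * (1 - p_fail) ** (total_shards - k)
        for k in range(tolerated + 1, total_shards + 1)
    )


print(f"overhead: {storage_overhead(20, 17):.3f}x")
print(f"loss probability at 1% shard failure: {loss_probability(20, 17, 0.01):.2e}")
```

The scheme tolerates losing any 3 shards at a storage overhead of about 1.18x, compared with 3x for keeping three full copies.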

1TB OneDrive is $80/year ($6.66/month). So 110GB is $0.73/month. Not that far from 50 cents.

That 1TB you always pay for, while Backblaze is pay-per-use.

Protip - get an o365 home instead.

5x 1TB for like 50 bucks. Also skype minutes and office software

that's 50 bucks per month vs 80 bucks per year

o365 home is $99.99/yr (not $50/mo), and allows up to 5 users, each of whom gets their own 1TB OneDrive allotment, evergreen desktop and mobile office software, skype minutes, etc.

It's a much better deal than paying $80/year for 1TB of OneDrive if you have 2+ users.

Can you host public web assets on OneDrive?

Yes, same as dropbox and the others.

Dropbox and Google Drive both removed HTML hosting over the past few years. With drive you can't even get direct links to images etc anymore. Not sure if public Dropbox files have the same limitation.

Ironically, none of the images on the page are loading right now because of "Error 1101 Worker threw exception". So, you know, caveat emptor.

That was entirely my bad. I was moving a few things around.

From what I'm seeing, workers are used only for URL rewriting. This can be achieved much more simply with page rules.

Workers are also used for basic CORS headers, and stripping some other unnecessary headers. They're definitely not required, but I don't believe you can do URL rewriting with page rules; redirects, sure, but not rewriting.
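As a language-neutral illustration of that header clean-up (Python here, not the Worker's own code): add a CORS header and drop storage-specific ones. Stripping everything that starts with `x-bz-` and using a wildcard origin are my assumptions about what "unnecessary" means, not the article's exact list.

```python
def clean_response_headers(headers: dict, allowed_origin: str = "*") -> dict:
    """Return a copy of `headers` with x-bz-* entries removed and CORS added."""
    cleaned = {
        name: value
        for name, value in headers.items()
        if not name.lower().startswith("x-bz-")  # assumed prefix to strip
    }
    cleaned["Access-Control-Allow-Origin"] = allowed_origin
    return cleaned


origin_headers = {
    "Content-Type": "image/png",
    "x-bz-file-name": "cat.png",
    "X-Bz-Upload-Timestamp": "1566691200000",
}
print(clean_response_headers(origin_headers))
```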

Some possibly simpler alternatives to consider: Google Photos, Netlify, Gitlab/Github Pages

There are actually a couple pretty serious limitations of the Google Photos API:


Do GitHub pages integrate nicely with the large file support for storing binary media?

Or just use Imgur, or Discord

Imgur deletes photos after a while.

And compresses them heavily.
