Author here, and that's a really cool link to Sippy. I love the idea, since you're only migrating data as it's needed, so the cost you incur is a function of the workload. It's basically acting as a caching layer.
And, if you're in the same boat as someone down-thread complaining about Cloudflare's uptime in recent weeks, you can keep S3 + CloudFront (or Lightsail buckets + Lightsail CDN) alongside S3 + Cache Reserve (which is built on R2), which is what we do, and flip between them with DNS.
What are the economics such that Amazon and other providers charge egress fees and R2 doesn't? Is it acting as a loss leader, or does this model still make money for Cloudflare?
Amazon doesn't have a per-unit cost for egress. They charge you for the stuff you put through their pipe, while paying their transit providers only for the size of the pipe (or, more often, not paying them anything, since they just peer directly with them at an exchange point).
Amazon uses $/GB as a price gouging mechanism and also a QoS constraint. Every bit you send through their pipe is basically printing money for them, but they don't want to give you a reserved fraction of the pipe because then other people can't push their bits through that fraction. So they get the most efficient utilization by charging for the stuff you send through it, ripping everybody off equally.
Also, this way it's not cost effective to build a competitor to Amazon (or any bandwidth intensive business like a CDN or VPN) on top of Amazon itself. You fundamentally need to charge more by adding a layer of virtualization, which means "PaaS" companies built on Amazon are never a threat to AWS and actually symbiotically grow the revenue of the ecosystem by passing the price gouging onto their own customers.
AWS egress charges blatantly take advantage of people who have never bought transit or done peering.
To them "that's just what bandwidth costs" but anyone who's worked with this stuff (sounds like you and I both) can do the quick math and see what kind of money printing machine this scheme is.
Some people want to host a lot of warez and pirate movies and stuff but that doesn't monetize very well per GB consumed so pricing bandwidth high means those people never show up, thus saving a lot of trouble for AWS.
I remember when salesforce.com announced a service that would let you serve up web pages out of their database, it was priced crazy high (100-1000x too much) from the viewpoint of "I want to run a blog on this service" but for someone who wanted to put in a form to collect data from customers it was totally affordable. Salesforce knew which customers it wanted and priced accordingly.
You don't get charged for transfer if you are sending stuff IN from the internet or to any other AWS resource in that region. So there is no QoS constraint inside, except perhaps the S3 GET/SELECT/LIST request costs.
It is pretty much exclusively to lock you into their services. It heavily impacts multi-cloud and non-AWS service decisions when your data lives in AWS and is taxed at 5-9 cents per GB to come out. We have settled for inferior AWS solutions at times because the cost of moving things out is prohibitive (e.g., AWS Backup vs. other providers).
Honest question: how is this different from a toll road? An entity creates a road network with a certain size (lanes, capacity/hour, literal traffic) and pays for it by charging the individual cars that use the road.
There are at least a couple of reasons your analogy doesn't really work.
First, a lot of these roads are 'free' and yet you're still being charged for them. If two large networks come to an agreement, they connect the two networks (i.e., build that road), but no money changes hands.
Second, if there is a paid peering agreement in place (i.e., say AWS actually had a cost to push your data out), it still wouldn't be billed to them the way they're charging you. Instead, they'd be paying for the rate of traffic at something like the 95th percentile of their traffic samples. This means you could download a petabyte of data from them when the pipe isn't busy and cost them nothing, or you could download a gigabyte when it's busy and push up their costs.
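For anyone who hasn't seen 95th-percentile billing before, here is a minimal sketch of the difference, assuming five-minute traffic samples in Mbit/s and a made-up commit rate (the numbers are illustrative, not real transit pricing):

    # Sketch: 95th-percentile transit billing vs. per-GB billing.
    # Assumes 5-minute Mbit/s samples; the $2/Mbit/s rate is illustrative only.
    def percentile_95(samples_mbps):
        ordered = sorted(samples_mbps)
        index = int(len(ordered) * 0.95) - 1   # discard the top 5% of samples
        return ordered[max(index, 0)]

    # A month of samples: mostly quiet, plus one big off-peak burst.
    samples = [50.0] * 8000 + [9000.0] * 100   # Mbit/s per 5-minute interval
    rate_per_mbps = 2.00                       # $/Mbit/s/month (illustrative)

    billed = percentile_95(samples)
    print(f"billed at {billed} Mbit/s -> ${billed * rate_per_mbps:.2f}/month")

The burst falls entirely inside the discarded top 5%, so the bill is the same as if it never happened, which is exactly why off-peak bulk transfers cost the network operator essentially nothing.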
The difference is that Amazon doesn’t own the road, they’re just a truck driver. Amazon customers rent space on the truck and pay whatever the driver asks them for.
Somebody more knowledgeable please correct me if I'm mistaken, but I think the Bandwidth Alliance is really the linchpin of the whole thing: basically, get all the non-AWS players in the room and agree on zero-rating traffic between each other, to provide a credible alternative to AWS networks.
Also, for the CDN case that R2 seems to be targeting - regardless of the origin of the data (R2 or S3), chances are pretty good that Cloudflare is already paying for the egress anyway.
In the CDN case Cloudflare has to fetch it from the origin, cache (store) it anyway, and then egress it. By charging for R2 they're turning that cost center into a profit center.
Let's say you want to use cloudflare, or another CDN. The process is pretty simple.
You set up your website and preferably DON'T have it talk to anyone other than the CDN.
You then point your DNS to wherever the CDN tells you to. (Or let them take over DNS. Depends on the provider.)
The CDN then will fetch data from your site and cache it, as needed.
Your site is the "origin", in CDN speak.
If Cloudflare can move the origin within their network, there are huge cost savings and reliability gains there. This is game-changing stuff. Do not underestimate it.
Completely free egress is a loss leader, but the true cost is so low (at least 90x less than what AWS charges) that it pays for itself in the form of more Cloudflare marketshare/mindshare.
I know from personal experience that "big" customers can negotiate incredible discounts on egress bandwidth as well. 90-95% discount is not impossible, only "retail" customers pay the sticker price.
That's still a 3-10x markup though. And it's also very dependent on your relationship with AWS. What happens if they don't offer the discount on renewal?
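Rough numbers behind that claim; the wholesale transit figure below is an assumption for illustration, since real costs vary enormously with scale and location:

    # Back-of-the-envelope markup estimate. The assumed transit cost is
    # illustrative only; actual wholesale pricing varies widely.
    sticker = 0.09           # $/GB, AWS first-tier internet egress
    discount = 0.95          # a negotiated "big customer" discount
    assumed_transit = 0.001  # $/GB, assumed effective wholesale cost

    discounted = sticker * (1 - discount)
    print(f"discounted egress: ${discounted:.4f}/GB")
    print(f"markup vs assumed transit: {discounted / assumed_transit:.1f}x")

Even at a 95% discount that works out to several times the assumed underlying cost, which is roughly where the 3-10x figure comes from.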
I’m inherently suspicious of services that are free (like Cloudflare egress). Maybe I’ve been burned too many times over the years, but I almost expect some kind of hostility or u-turn in the long run (I do really like Cloudflare’s products right now!).
I almost wish they had some kind of sustainable usage-based charge that was much lower than AWS.
Feel free to tell me why I’m wrong! I’d love to jump onboard - it just seems too good to be true in the long-term.
Because they're a CDN. You pay for storage already, so an object that isn't downloaded much is paid for. An object that gets downloaded a lot uses bandwidth, but the more popular it is, the more effective the CDN caching is.
There probably needs to be an abuse prevention rate limit (and probably is), but it's not quite as crazy as it sounds to just rely on their CDN bandwidth sharing policies instead of charging.
What happens if I host an incredibly popular file, and start eating up everyone else's share of the bandwidth? E.g., I become a popular Linux distro package mirror?
I do think there are “soft limits” in place like you say - it’s just my personal preference to have documented limits (or pay fairly for what you use). IMO it helps stop abuse, and prevents billing surprises for legitimate heavy use-cases.
They undoubtedly limit the % of bandwidth you can use when the link is full. The problem with that is that it's very hard to quantify, because whether or not they have spare bandwidth for you depends a lot on location, timing, and what else is happening on the network.
But that's really no different from the guarantee you get from most CDN services. If you're using cloudflare in front of S3, for example, you'll end up with the same behavior.
> But that's really no different from the guarantee you get from most CDN services. If you're using cloudflare in front of S3, for example, you'll end up with the same behavior.
But in my mind it's also comforting that something like CloudFront has a long-term sustainable model (I should also add: fewer strings attached, like the restrictions on hosting video).
I do think the prices at AWS are too high, but it discourages bad actors from filling up the shared pipes. ISPs are sometimes a classic example of what happens when a link is oversubscribed.
Cloudflare’s “soft limits” are also somewhat of a dark pattern if you ask me. I like to know exactly how much something will cost, and it’s really hard to figure out with Cloudflare if you’re a high-traffic source. Do I hit the “soft limits,” or not? It’s really hard to say with their current model.
FWIW, I think Cloudflare is a great product right now - I am just skeptical they can keep it up forever.
Exactly this. Data has gravity, and this increases the gravity around data stored at Amazon...making it more likely for you to buy more compute/services at Amazon.
very true, but data gets stale very quickly. So you start putting new data in a new place. Eventually, you don't care about the old place. And all the people and processes who accessed the data in the old place are gone.
Completely agreed about data gravity, but it's not just that, it's also customer opted-in vendor-lockin.
The customer (because they are lazy, don't know better, aren't capable, or all three) opts in to using various "convenient" CSP "services". These services may look convenient (and are always fairly to extremely expensive), and they quickly become an integral part of the customer's badly architected "system".
The end result is complete vendor lock-in, the inability of the poor (stupid) user to leave, and the continued plundering of their bank account (also via additional, incompetent developer and devops "resources").
Throw in the average modern "devops" engineers who are hired to handle this. They aren't like the sysadmins of yesteryear; they no longer have experience with, or an understanding of, the bits and bytes. They are glorified UI clickers and YAML editors who lack even reasonable system-level debugging skills. For every problem they encounter, they immediately run to Google in search of answers.
In addition, I would argue that CSPs are a huge, huge waste of computing, space, and power resources, because their systems completely encourage people to just do things without understanding what they are doing: screw the consequences and just pay.
The result: the business suffers greatly (on so many levels), while the CSP wins big and keeps winning.
What happens here is that a system which, if designed right from the get-go, could have run on a SINGLE modern, high-end, well-positioned and well-connected server on the Internet is instead replaced with tens to hundreds of "instances" and assorted CSP-provided services -- what a colossal waste.
Books could be written on the negligence, the lack of understanding, the utter tech stupidity, and ultimately the costs, which are absurd.
It's also a huge waste of human effort managing the complexity introduced by the cloud provider's arbitrary bullshit.
At this point multiple generations of engineers have little understanding of underlying layers of technology, having only really learned how to use cloud services. No TCP/IP, no UNIX, just a bit of bash and a ton of AWS.
Cloud providers do hide most of the low level complexity, which could be seen as a benefit (at least that seems to be what's touted as a main benefit, along with instant scalability.) Unfortunately they replace all of that with more arbitrary complexity which is ultimately (in my opinion, at least) a much bigger burden than the fundamental complexity that is abstracted away.
Greed on the cloud providers part, I think. You'd expect egress fees to enable cheaper compute, but there are other cloud providers out there like Hetzner with cheaper compute and egress, so the economics don't really add up.
Indeed, Hetzner is so much cheaper that if you have high S3 egress fees you can rent Hetzner boxes to sit in front of your S3 deployment as caching proxies and get a lot of extra "free" compute on top.
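A minimal sketch of that pattern, assuming boto3 and a local disk cache; the bucket name and cache path are placeholders, and a real deployment would also handle eviction, headers, and concurrency:

    # Read-through cache in front of S3: serve from local disk when possible,
    # otherwise fetch from S3 once (paying egress once) and keep a copy.
    import os
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-origin-bucket"    # placeholder
    CACHE_DIR = "/var/cache/s3-proxy"   # placeholder

    def get_object(key: str) -> bytes:
        local_path = os.path.join(CACHE_DIR, key.replace("/", "_"))
        if os.path.exists(local_path):
            with open(local_path, "rb") as f:
                return f.read()          # cache hit: no S3 egress
        body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(local_path, "wb") as f:
            f.write(body)                # cache miss: pay S3 egress once
        return body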
It's an option that's often been attractive if/when you didn't want the hassle of building out something that could provide S3 level durability yourself. But with more/cheaper S3 competitors it's becoming a significantly less attractive option.
Scaleway also, and they are fully S3 compatible. I use their glacier service for backup. I store 1.5TB for around 3€ per month.
I used the Storage Box from Hetzner before, but they only had 1TB or 5TB (and higher) options, so I had to pay for 5TB (€12 per month) without using most of it. Having rsync support was nice, but rclone works fine with S3.
There has to be more to it than a pure loss leader, since there's also the Bandwidth Alliance Cloudflare is in, which allows R2 competitors like Backblaze B2 to also offer free egress, which benefits those competitors while weakening the incentive for R2 somewhat.
Here’s a tweet from Corey Quinn describing how bonkers R2 pricing is:
> let’s remember that the internet is 1-to-many. If 1 million people download that 1GB this month, my cost with @cloudflare R2 this way rounds up to 13¢. With @awscloud S3 it’s $59,247.52.
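For reference, a rough reconstruction of the S3 side of that math using AWS's published monthly internet egress tiers (request charges ignored); it won't match the tweet to the dollar, since request fees, GiB vs. GB, and region all shift it, but the order of magnitude is the point:

    # 1 million downloads of a 1 GB object = ~1,000,000 GB of internet egress.
    tiers = [                  # (tier size in GB, $/GB), us-east-1 style pricing
        (10 * 1024, 0.090),    # first 10 TB
        (40 * 1024, 0.085),    # next 40 TB
        (100 * 1024, 0.070),   # next 100 TB
        (float("inf"), 0.050), # everything above 150 TB
    ]

    remaining = 1_000_000      # GB egressed this month
    cost = 0.0
    for size, price in tiers:
        chunk = min(remaining, size)
        cost += chunk * price
        remaining -= chunk

    print(f"approx S3 egress bill: ${cost:,.2f}")   # roughly $54k
    # The same egress from R2 is billed at $0; you pay only storage and operations.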
I left AWS in 2019, so my knowledge on the current recommendations & pricing is dated. But even back then we were strongly discouraging usage like this for S3 for both security and cost reasons. Cloudfront should be in front of your bucket serving the objects, and IIRC it’ll be 75% cheaper in most cases. Still doesn’t bring it within even a couple of orders of magnitude of the R2 price, but this comparison does feel like it’s painting a best case versus a worst case. And the worst case being an approach that goes against best practice recommendations that are at least half a decade old at this point (I will concede people absolutely still do it though!).
Don’t those volume discounts kick in once you’re doing near 1PB? You’re still paying “normal” CloudFront prices all the way up and it also varies per region :$
But their egress capacity is limited, no? We're talking about 1PB per month here. If every one of their customers paid only 13 cents a month while pushing out 1PB per month, wouldn't they need to significantly upgrade their hardware and lose money in the process?
Just as an fyi, eastDakota is in Cloudflare’s executive team. Think he’s their CEO.
Not saying not to trust him - he’s probably a very reasonable and standup guy - but you should know this about him before taking his word on a topic like this.
"If every customer of them would be paying only 13 cents a month and pushing out 1PB per month, wouldn't they need to significantly upgrade their hardware and lose money in the process?"
Yes, but not every customer is going to do this, so what is your point?
You have to think more in terms of averages for things like this
I'm abusing the hell out of it right now, offering GB+ downloads that I used to serve from DigitalOcean Spaces. It's been saving me $2000-3000 a month since the switch.
Maybe abuse isn't the right word but definitely making the most.
I am a bit scared about being turned off overnight though.
Just for the sake of enlightening some people: roughly $1000 per month buys you unlimited/unmetered 10GbE (10 Gbit/s) connectivity to your server/rack (do you know what this is?) from a tier-1 network provider.
This translates to roughly 1.25 gigabytes per second (every second of the month), or about 3,240 terabytes of data per month - in or out, the choice is yours.
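A quick sanity check on those numbers:

    # 10 GbE, fully utilized for a 30-day month.
    bits_per_second = 10e9
    bytes_per_second = bits_per_second / 8          # 1.25 GB/s
    seconds_per_month = 30 * 24 * 3600

    print(f"{bytes_per_second / 1e9:.2f} GB/s")
    print(f"{bytes_per_second * seconds_per_month / 1e12:.0f} TB/month")  # ~3240 TB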
Things scale down as you buy more bandwidth, or commit to a longer contract.
Many would say that $1000 per month is literally "nothing" in terms of cost of service for most real businesses out there, and if you're a happy CSP user, you're probably paying a hell of a lot more than that per month for your infra.
Yes, R2 is likely losing money in this case. But network capacity and switches are not nearly as expensive as the way AWS charges for them. For $60k/month, or $720k/year, AWS is basically giving you about 3 Gbit/s of sustained throughput.
I feel R2 should charge something for transfer, though; otherwise people could abuse it. Hetzner charges ~1.5% of AWS egress fees, which feels like the right thing to do and is likely still profitable.
A one-hour 4K Netflix episode is on the order of several GB and likely watched by even more than 1M people. Game downloads are even bigger, often tens of GB, with a similar number of users.
As an indie dev, I recommend R2 highly. No egress is the killer feature. I started using R2 earlier this year for my AI transcription service TurboScribe (https://turboscribe.ai/). Users upload audio/video files directly to R2 buckets (sometimes many large, multi-GB files), which are then transferred to a compute provider for transcription. No vendor lock-in for my compute (ingress is free/cheap pretty much everywhere) and I can easily move workloads across multiple providers. Users can even re-download their (again, potentially large) files with a simple signed R2 URL (again, no egress fees).
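For anyone curious what that flow looks like, here is a minimal sketch using boto3 against R2's S3-compatible endpoint; the account ID, keys, bucket, and object names are placeholders:

    # Presigned URLs let users upload to / download from R2 directly, so the
    # app server never proxies the bytes and there are no egress fees.
    import boto3

    r2 = boto3.client(
        "s3",
        region_name="auto",
        endpoint_url="https://<ACCOUNT_ID>.r2.cloudflarestorage.com",
        aws_access_key_id="<R2_ACCESS_KEY_ID>",
        aws_secret_access_key="<R2_SECRET_ACCESS_KEY>",
    )

    # Hand this to the browser so the user uploads straight to the bucket.
    upload_url = r2.generate_presigned_url(
        "put_object",
        Params={"Bucket": "uploads", "Key": "user-123/audio.mp3"},
        ExpiresIn=3600,
    )

    # Later, let the user (or a compute provider) fetch it the same way.
    download_url = r2.generate_presigned_url(
        "get_object",
        Params={"Bucket": "uploads", "Key": "user-123/audio.mp3"},
        ExpiresIn=3600,
    )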
I'm also a Backblaze B2 customer, which I also highly recommend and which has slightly different trade-offs (R2 is slightly faster in my experience, but B2 is 2-3x cheaper for storage, so I use it mostly for backups and other files that I'm likely to store for a long time).
The premise of Workers AI is really cool and I'm excited to see where it goes. It would need other features (custom code, custom models, etc) to make it worth considering for my needs, but I love that CF is building stuff like this.
Is there any reason to not use R2 over a competing storage service? I already use Cloudflare for lots of other things, and don't personally care all that much about the "Cloudflare's near-monopoly as a web intermediary is dangerous" arguments or anything like that.
1. This is the most obvious one, but S3 access control is done via IAM. For better or for worse, IAM has a lot of functionality. I can configure a specific EC2 instance to have access to a specific file in S3 without the need to deal with API keys and such. I can search CloudTrail for all the times a specific user read a certain file.
2. R2 doesn't support file versioning like S3. As I understand it, Wasabi supports it.
3. R2's storage pricing is designed for frequently accessed files. They charge a flat $0.015 per GB-month stored. This is a lot cheaper than S3 Standard pricing ($0.023 per GB-month), but more expensive than Glacier and marginally more expensive than S3 Standard - Infrequent Access. Wasabi is even cheaper at $0.0068 per GB-month, but with a 1 TB billing minimum.
4. If you want public access to the files in your S3 bucket using your own domain name, you can create a CNAME record with whatever DNS provider you use. With R2 you cannot use a custom domain unless the domain is set up in Cloudflare. I had to register a new domain name for this purpose since I could not switch DNS providers for something like this.
5. If you care about the geographical region your data is stored in, AWS has way more options. At a previous job I needed to control the specific US state my data was in, which is easy to do in AWS if there is an AWS Region there. In contrast R2 and Wasabi both have few options. R2 has a "Jurisdictional Restriction" feature in Beta right now to restrict data to a specific legal jurisdiction, but they only support EU right now. Not helpful if you need your data to be stored in Brazil or something.
I don't know about R2 specifically, but we migrated one of our services from S3 to Cloudflare Images, and we have been hit with over 40 hours of downtime on CF's side over the last 30 days. One of the outages was 22 hours long. Today's outage has been going for almost 12 hours and is still ongoing, and we have had 2 or 3 other >1h outages.
Every cloud provider has outages sometimes but CF has been horrendous.
We were actually planning on migrating some other parts to R2 but we are just ditching CF altogether and just going to pay a bit more on AWS for reliability.
So if R2 has been impacted even a third as much as CF images, that would definitely be an important consideration.
I don't know why this isn't mentioned more. CF's offerings (R2/Workers/Pages) are so unreliable that I'm wondering if anyone is actually using them.
We have been using Workers for ~12 months now with very little actual downtime. There have been some regional issues but no worldwide outages.
That said we don't use any queues, KV, etc. Just pure JS isolates so that probably contributes to the robustness.
We do use the Cache API though and have run into weirdness there. We also needed to implement our own Stale-While-Revalidate (SWR) because CF still refuses to implement this properly.
Overall, CF is a provider that I would say we begrudgingly acknowledge as good. Stuff like the SWR thing can be really frustrating, but overall reliability and performance are much better since moving to CF.
> Overall, CF is a provider that I would say we begrudgingly acknowledge as good.
I don't understand. You say that you used a very small subset of their offering in a very specific and limited way; and with that you conclude that their offering is "good"? Shouldn't you make that conclusion after reviewing at least 50% of their offering?
It's been a while, but last time I checked, write latency on R2 was pretty horrendous. Close to 1s compared to S3's <100ms, tested from my laptop in SF. Wouldn't be surprised if they made progress on this front, but definitely do dig deeper if your workload is sensitive to write latency.
Another (that probably contributes directly to the write latency issues) is region selection and replication. S3 just offers a ton more control here. I have a bunch of S3 buckets replicating async across regions around the world to enable fast writes everywhere (my use case can tolerate eventual consistency here). R2 still seems very light on region selection and replication options. Kinda disappointed since they're supposed to be _the_ edge company.
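If you want to check this for your own location, a rough benchmark sketch (endpoints, credentials, and bucket names are placeholders; results depend heavily on where you run it):

    # Time small PutObject calls against two S3-compatible endpoints and
    # compare the medians. Purely illustrative; run it close to your workload.
    import time
    import boto3

    def median_put_latency(client, bucket, n=20):
        samples = []
        for i in range(n):
            start = time.perf_counter()
            client.put_object(Bucket=bucket, Key=f"latency-test/{i}", Body=b"x" * 1024)
            samples.append(time.perf_counter() - start)
        return sorted(samples)[n // 2]

    s3 = boto3.client("s3")
    r2 = boto3.client(
        "s3",
        region_name="auto",
        endpoint_url="https://<ACCOUNT_ID>.r2.cloudflarestorage.com",
        aws_access_key_id="<R2_ACCESS_KEY_ID>",
        aws_secret_access_key="<R2_SECRET_ACCESS_KEY>",
    )

    print("S3 median put:", median_put_latency(s3, "my-s3-bucket"))
    print("R2 median put:", median_put_latency(r2, "my-r2-bucket"))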
As far as I know, R2 offers no storage tiers. Most of my s3 usage is archival and sits in glacier. From Cloudflare's pricing page, S3 is substantially cheaper for that type of workload.
I know people archive all kinds of data. I use Glacier as off-site backup for my measly 1TB of irreplaceable data. But I know many customers put petabytes in it.
What could you have a petabyte of that you're pretty sure you'll never need again? What kind of datasets are you storing?
There is no data locality. If your workload is in AWS already you might save money by keeping the data in the more expensive S3 vs going out to Cloudflare to fetch your bytes and return your results.
If you don't mind having your bits reside elsewhere, Backblaze B2 and Bunny.net single location storage are both cheaper than Cloudflare.
Is R2 subject to Cloudflare's universal API rate limit? They have an API rate limit of 1200 requests/5 minutes that I've hit many times with their images product.
And they won't increase it unless you become an enterprise customer in which case they'll generously double it.
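Until then, about all you can do client-side is back off when the API rate-limits you (HTTP 429); a minimal sketch, with the URL and token as placeholders:

    # Retry with backoff against Cloudflare's global API rate limit
    # (1,200 requests per 5 minutes).
    import time
    import requests

    def cf_get(url: str, token: str, max_retries: int = 5):
        delay = 1.0
        for _ in range(max_retries):
            resp = requests.get(url, headers={"Authorization": f"Bearer {token}"})
            if resp.status_code != 429:
                return resp
            # Honor Retry-After if present, otherwise back off exponentially.
            time.sleep(float(resp.headers.get("Retry-After", delay)))
            delay *= 2
        raise RuntimeError("still rate limited after retries")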
R2 doesn't support versioning yet. If you need versioning you have to use DigitalOcean Spaces (also cheaper than S3) or S3.
Otherwise, I've been using R2 in production for wakatime.com for almost a month now with Sippy enabled. The latency and error rates are the same as S3, with DigitalOcean having slightly higher latency and error rates.
One major thing that R2 doesn't have is big data distributed table support. E.g. you can use BigQuery to query data on GCS, or you can use Athena on S3.
1. No object history or locking, so there is absolutely no way to recover files when you make any kind of mistake.
2. No object tiering, and storage is not that cheap. Although R2 egress is free, R2 is only ~35% cheaper than S3 in terms of storage, and it is not cheaper than other alternatives. Furthermore, R2 is a lot more expensive than S3's infrequent/cold tiers.
For example, Backblaze B2 is 4 times cheaper than S3, and B2 offers history/locking.
Since B2 egress is free up to 3x your monthly storage, B2 is a much better option than R2 for most cases where considerably higher egress is not needed.
Although it's probably faded from everyone's mind, I think Cloudflare and Backblaze still have the Bandwidth Alliance going which means free egress if you combine them.
OP is missing that a correct implementation of Databricks or Snowflake will have those instances running inside the same AWS region as the data. That's not to say R2 isn't an amazing product, but the egress costs aren't as egregious since intra-region transfer is $0 on both sides.
Author here. It is true that transfers within a region are free, and if you design your system appropriately you can take advantage of that, but I've seen accidental cases where someone tries to access data from another region, and it's nice to not even have to worry about it. Even that can be handled with better tooling/processes, but the bigger point is wanting your data to be available across clouds to take advantage of their different capabilities. I used AI as an example, but imagine you have all your data in S3 and want to use Azure due to the OpenAI partnership. That's the use case enabled by R2.
Yeah, for greenfield work building up on R2 is generally a far better deal than S3, but if you have a massive amount of data already on S3, especially if it's small files, you're going to pay a massive penalty to move the data. Sippy is nice but it just spreads the pain over time.
AWS S3 egress charges are $0.00 when the destination is AWS within the same region. When you set up your Databricks or Snowflake accounts, you need to correctly specify the same region as your S3 bucket(s); otherwise you'll pay egress.
Cloudflare has been building a micro-AWS/Vercel competitor and I love it; i.e., serverless functions, queues, sqlite, kv store, object store (R2), etc.
Cloudflare is just reimplementing every service that Akamai has had for 10 years. The only difference is Cloudflare is going after the <$100/mo customer.
Vercel doesn't offer any of that, without major caveats (e.g. must use Next.js to get a serverless endpoint). And to the degree they do offer any of it, it's mostly built on infrastructure of other companies, including Cloudflare.
I would love to see a good blog post or article on Cloudflare's KV store. I just checked it out, and it reports eventual consistency, so it sounds like it might be based upon CRDTs, but I'm just guessing.
We (Databend Labs) benchmarked TPC-H SF100 on S3, Wasabi, Backblaze B2, and Cloudflare R2:
- S3 leads, helped by its Direct Connect feature.
- Wasabi offers good performance.
- B2 and R2 may not be suitable for big data needs.
The other hidden cost when you are working with data hosted on S3 is the LIST requests. Some of the data tools seem very chatty with S3, and you end up with thousands of them when you have small files buried in folders, at a not insignificant cost. I need to dig into it more, but they are always up there towards the top of my AWS bills.
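A rough way to estimate that cost: each ListObjectsV2 page returns at most 1,000 keys, and LIST requests are billed at the higher PUT/LIST rate. The object count below is illustrative; check your region's pricing:

    objects = 50_000_000            # small files in the bucket (illustrative)
    price_per_1000_lists = 0.005    # $ per 1,000 LIST requests (S3 Standard, typical)

    list_requests = -(-objects // 1000)   # ceil: one request per 1,000 keys
    cost_per_full_listing = list_requests / 1000 * price_per_1000_lists
    print(f"{list_requests:,} LIST requests -> ${cost_per_full_listing:.2f} per full listing")

A single full listing is cheap, but a chatty tool that re-lists deep prefixes many times an hour multiplies it quickly.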
I wish R2's access control were similar to S3's - able to issue keys with specific access to particular prefixes, and the ability to delegate key creation.
It currently feels a little limited and… bolted on to the Cloudflare UI.
I did some like-for-like comparisons across S3 vendors. S3's performance is way better than the challengers', and R2 is the worst performer. It also doesn't support concurrency on list operations, or object versions.
So it's a bit more complex than "R2 is best coz it's cheapest" - it's not super optimized yet.
We moved our entire infrastructure to AWS last year to speed up/simplify/rethink it. We lasted 3 months on S3/CloudFront. We are still heavily invested in AWS, but moved our production storage/distribution to R2/Cloudflare and couldn't be happier.
Next up: moving our cloud edge (NAT gateways, WAF, etc.) to Fortinet appliances, whose licenses we purchased bundled with our on-prem infra.
I know Corey Quinn always harps on AWS's egress pricing, but you really can't emphasize it enough: it's literally extortionate!
>> you’re paying anywhere from $0.05/GB to $0.09/GB for data transfer in us-east-1. At big data scale this adds up.
At small data scale this adds up.
And... it's 11 cents a GB from Australia and 15 cents a GB from Brazil.
If you have S3 facing the Internet, a hacker can bankrupt your company in minutes with a simple load-testing application. Not even a hacker - a bug in a web page could do the same thing.
I think Backblaze B2 is probably the reference (which has free egress up to 3x data stored - https://www.backblaze.com/blog/2023-product-announcement/). I don't know of any public S3-compatible provider that is as cheap as $20/TB/year (roughly ~$0.0016/GB/mo).
That’s what I thought they meant as well, but B2 is more like $72/TB/yr. Maybe relevant to another story on the front page right now, they have a very unusual custom keyboard layout that makes it easy to typo e for b and 2 for 7 ;)?
> In fact, there’s an opportunity to build entire companies that take advantage of this price differential and I expect we’ll see more and more of that happening.
Interesting. What sort of companies can take advantage of this?
Author here - some ideas I was thinking about:
- An open source data pipeline built on top of R2. A way of keeping data on R2/S3 but then having execution handled in Workers/Lambda. Inspired by what https://www.boilingdata.com/ and https://www.bauplanlabs.com/ are doing.
- Related to the above, but taking data that's stored in the various big data formats (Parquet, Iceberg, Hudi, etc.) and generating many more combinations of the datasets, then choosing optimal ones based on the workload. You can do this with existing providers, but I think the cost element just makes this easier to stomach.
- Abstracting some of the AI/ML products out there and choosing best one for the job by keeping the data on R2 and then shipping it to the relevant providers (since data ingress to them is free) for specific tasks.
Basically any company offering special services that work with very large data sets. That could be a consumer backup system like Carbonite or a bulk photo processing service. In either case, legal agreements with customers are key, because you ultimately don't control the storage system on which your business and their data depend.
I work for a non-profit doing digital preservation for a number of universities in the US. We store huge amounts of data in S3, Glacier and Wasabi, and provide services and workflows to help depositors comply with legal requirements, access controls, provable data integrity, archival best practices, etc.
There are some for-profits in this space as well. It's not a huge or highly profitable space, but I do think there are other business opportunities out there where organizations want to store geographically distributed copies of their data (for safety) and run that data through processing pipelines.
The trick, of course, is to identify which organizations have a similar set of needs and then build that. In our case, we've spent a lot of time working around data access costs, and there are some cases where we just can't avoid them. They can really be considerable when you're working with large data sets, and if you can solve the problem of data transfer costs from the get-go, you'll be way ahead of many existing services built on S3 and Glacier.
I'm building a "media hosting site". Based on somewhat reasonable forecasts of egress demand vs total volume stored, using R2 means I'll be able to charge a low take rate that should (in theory) give me a good counterposition to competitors in the space.
Basically, using R2 allows you to undercut competitors' pricing. It also means I don't need to build out a separate CDN to host my files, because Cloudflare will do that for me, too.
Competitors built out and maintain their own equivalent CDNs and storage solutions that are ~10x more expensive to maintain and operate than going through Cloudflare. Basically, Cloudflare is doing to CDNs and storage what AWS and friends did to compute.
That'd be welcome, I'm not really doing it to make money.
But reality is a bit more complicated than that. Migrating data + pointers to that data, en masse, isn't super easy (although things like Sippy make it easier).
In addition, there's all the capex that's gone into building systems around the assumptions of their blend of data centers, homegrown CDNs, and mix of storage systems. There's a sunk cost fallacy at play, as well as the inertia of knowing how to maintain the old system and not having any experience with the new system.
It's not impossible, but it'd require a lot of willpower and energy that these companies (who are 10+ years into their life cycles) don't really possess.
Having seen the inside of orgs like that before, starting from scratch is ~10x-100x easier, depending on the blend of bureaucracy on the menu.
I'm investigating the same thing. But my bet is that they will either change the terms or lower your CDN cache size (therefore lowering performance; you can't serve popular videos without a CDN).
And the difference is that you will fail your customers when that time comes, because you'll just get suspended (we've seen some cases here on the forum) and you'll have to come here to complain so the CEO/CTO restores things for you.
I don’t believe anybody on a paid plan has been suspended for using R2 behind the CDN? (I’ve seen the stories you’re alluding to. IIRC the cached files weren’t on R2)
In their docs they explicitly state it as an attractive feature to leverage, so that’d surprise me.
That being said, I’m not planning to serve particularly large files with any meaningful frequency, so in my particular case I’m not concerned about that possibility. (I’m distributing low bitrate audio, and small images, mostly).
If I were trying to build YouTube or whatever I’d be more concerned.
That being said, with their storage pricing and network set up as they are, I think they make plenty of money off of a hypothetical YouTube clone.
I do think they’ll raise prices eventually. But it’s a highly competitive space, so it feels like there’s a stable ceiling.
Right, but they were serving content that wasn't from R2 as far as I understand from that thread. Not trying to say they that justifies their treatment, only that it doesn't apply to my use case. They were also seeing ~30TB of daily egress on a non-enterprise plan, which would absolutely never happen in my case – 1TB of daily egress would be a p99.9 event.
Re cache-size, maybe I've misunderstood what you mean by cache size limiting, but yeah that's my point – I don't need a massive cache size for my application. My data doesn't lend itself much to large and distributed spikes. Egress is spiky, but centralized to a few files at a time. e.g. if there were to be a single day where 1TB were downloaded at once, 80% of it would be concentrated into ~20 400MB-sized files.
He was ok by the terms though. Workers had/have the same terms as R2 before R2 got the new terms.
> They were also seeing ~30TB of daily egress on a non-enterprise plan, which would absolutely never happen in my case – 1TB of daily egress would be a p99.9 event.
I don't understand what media company you'll be competing against if you'll use just 30TB/month of bandwidth.
I just love MinIO. It is a drop-in replacement for S3. I have never done a price comparison of the TCO against S3 or R2, but I have a good backup story and run it all inside docker/dokku, so it is easy to recover.
We went from S3 to MinIO because of cost issues (at that time B2 didn't have an S3 API).
MinIO to SeaweedFS around 2020 because our MinIO servers had problems serving a very large number of small files.
Then this year we migrated to B2 because it's way cheaper and we don't have to rewrite our apps.
Still, my hat goes off to S3. It is so dominant that every open source project or competitor needs to have a compatible API, which gives us the ability to move to any vendor or self-host just by changing the endpoint.
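That portability really is just an endpoint swap; a sketch with boto3, where every endpoint and credential below is a placeholder:

    # The same client code talks to AWS S3, Backblaze B2, Cloudflare R2, or a
    # self-hosted MinIO/SeaweedFS instance - only the endpoint changes.
    import boto3

    def make_client(endpoint_url=None):
        return boto3.client(
            "s3",
            endpoint_url=endpoint_url,   # None means AWS S3 itself
            aws_access_key_id="<ACCESS_KEY>",
            aws_secret_access_key="<SECRET_KEY>",
        )

    aws   = make_client()
    b2    = make_client("https://s3.us-west-004.backblazeb2.com")
    r2    = make_client("https://<ACCOUNT_ID>.r2.cloudflarestorage.com")
    minio = make_client("http://localhost:9000")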
The simple reason Cloudflare hasn't emerged as a real competitor is that they don't offer traditional compute, so you can't just do what you normally would do on the hyperscalers in Cloudflare's regions. If they really are trying to be a fourth hyperscaler and/or compete on price, it feels like general compute is what they need. What am I missing?
It blows my mind that anyone would consider S3 cheap.
You could always get plenty of space on dedicated servers for way cheaper before the cloud.
You could make an argument about the API being nicer than dealing with a Linux server - but is AWS nice? I think it's pretty awful and requires tons of (different, specific, non-transferable) knowledge.
Hype, and scalability buzzwords thrown around by startups with 1000 users and a $1M contract with AWS.
Sure R2 is cheaper but it's still not a low cost option. You are paying for a nice shiny service.
Interesting side note that while S3 the service continues to get more competition, S3 the protocol has definitively won. It's a good protocol, but man I wish it were more consumer-friendly. Imagine if S3 specified an OAuth2 profile for granting access. Every web app could delegate storage to a bunch of competing storage providers.
This would be very useful in genomics, where pretty much everything is stored on S3 but always a pain to connect to apps.
It seems so inevitable. Once they have sucked enough data up then they can just change the pricing structure to have egress fees and higher storage fees.
Although I do wonder if that would be considered a bait and switch.
But what are you going to do with your data in R2? They don't have all the other cloud services to use the data. Unless your only use of the cloud is literally for raw storage, it's not that practical.
Compare with say Oracle cloud which tries to compete by having 1/10th the egress charge. But nobody uses it anyway and they DO offer all the other services.
Yeah, if you want to completely ignore Cloudflare Workers, Durable Objects, etc., then that comparison makes sense, I guess... but that's only if you also want to ignore that you can serve the files publicly directly, which alone has many use cases, especially with the free bandwidth.
It looks like Backblaze B2 combined with Cloudflare gives the cheapest storage and free egress. Is there any reason to use R2 over B2 + Cloudflare?
My use case is image storage + serving for a service that users will upload a lot of images to. Currently using Cloudflare + storing all files on disk but space will soon become a concern.
The problem here is that as long as cloud services are sticky, moving your data doesn't really solve the vendor lock-in. Egress is just one way to leverage that stickiness; I can easily come up with another ten ways to charge you, as long as you can't easily migrate your stack off a cloud platform.
If I understand correctly, when storing data in vanilla S3 (not their edge offering), the data lives in a single zone/datacenter, right? Whereas on R2 it could potentially be replicated across tens of locations. If that is true, how can Cloudflare afford the storage cost at basically the same pricing?
S3 Standard guarantees that your data is replicated to three availability zones within the region at minimum. (That's different data centers in the same city.)
My assumption is that "at least three" means "exactly three" in practice.
For the Cloudflare fans out there (I am one of them), it seems that the sales/finance guys have entered the company and started to apply the usual upsell tricks. (See the advanced firewall and bots stuff.)
Perhaps I'm too hasty with my judgement; I hope so...
R2 is a nice and cheap service. I just want to caution people that it does have a reduced feature set compared to something more mature like S3 or GCS. For most people who just want to serve an image, etc., it's fantastic though.
What would be great is a tiered storage service or library where oft-accessed data is in R2 and infrequently accessed data has its metadata in R2 but its blobs in the cheaper S3 storage tiers or Glacier.
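A minimal sketch of that idea, assuming boto3 on both sides; bucket names are placeholders, and a real Glacier tier would also need a restore_object call plus a wait before the cold read succeeds:

    # Read-through tiering: hot objects live in R2 (free egress), cold blobs in
    # a cheaper S3 storage class; promote to R2 on first access.
    import boto3

    r2_hot = boto3.client(
        "s3",
        region_name="auto",
        endpoint_url="https://<ACCOUNT_ID>.r2.cloudflarestorage.com",
    )
    s3_cold = boto3.client("s3")

    def read(key: str) -> bytes:
        try:
            return r2_hot.get_object(Bucket="hot-tier", Key=key)["Body"].read()
        except r2_hot.exceptions.NoSuchKey:
            body = s3_cold.get_object(Bucket="cold-tier", Key=key)["Body"].read()
            # Promote so the next read is served from R2 with no egress fee.
            r2_hot.put_object(Bucket="hot-tier", Key=key, Body=body)
            return body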
Did Cloudflare share any information on how R2 is built? Like what kind of open source systems they use as the foundation or they built it from scratch?
Should we simply ignore the tremendous amount of phishing hosted using r2.dev? Or is this also part of "an economic opportunity"?
Cloudflare may well be on their way to becoming a monopoly, but they certainly show they don't care about abuse. Even if it weren't a simple matter of principle, in case they aren't successful in forcing themselves down everyone's throats, I wouldn't want to host anything on any service that hosts phishers and scammers without even a modicum of concern.
Since I know there will be Cloudflare people reading this (hi!), I'm begging you: please wrest control of the blob storage API standard from AWS.
AWS has zero interest in S3's API being a universal standard for blob storage, and you can tell from its design. What happens in practice is that everybody (including R2) implements some subset of the S3 API, so everyone ends up with a jagged API surface where developers can use a standard API library but then have to refer to each S3-compatible vendor's docs to figure out whether the subset of the S3 API they need is actually supported.
This makes it harder than it needs to be to make vendor-agnostic open source projects that are backed by blob storage, which would otherwise be an excellent lowest-common-denominator storage option.
Blob storage is the most underused cloud tech IMHO largely because of the lack of a standard blob storage API. Cloudflare is in the rare position where you have a fantastic S3 alternative that people love, and you would be doing the industry a huge service by standardizing the API.
This is a good point, but just a standard for the basic create/read/update (replace)/delete operations, combined with some baseline guarantees (like approximately last-write-wins eventual consistency), would probably cover a whole lot of applications that currently use S3 (which doesn't support appends anyway).
Heck, HTTP already provides verbs that would cover this, it would just require a vendor to carve out a subset of HTTP that a standard-compliant server would support, plus standardize an auth/signing mechanism.
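Purely as a thought experiment, the surface being described could be as small as this; the endpoint and bearer-token scheme below are made up, since no such standard exists today:

    # Hypothetical minimal blob API: plain HTTP verbs against bucket/key paths,
    # with auth delegated to a standardized token or signing scheme.
    import requests

    BASE = "https://blob.example.com/my-bucket"    # hypothetical endpoint
    AUTH = {"Authorization": "Bearer <TOKEN>"}     # placeholder auth scheme

    requests.put(f"{BASE}/photos/cat.jpg", data=b"...", headers=AUTH)   # create/replace
    requests.get(f"{BASE}/photos/cat.jpg", headers=AUTH)                # read
    requests.head(f"{BASE}/photos/cat.jpg", headers=AUTH)               # metadata
    requests.delete(f"{BASE}/photos/cat.jpg", headers=AUTH)             # delete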
It allows you to incrementally migrate off of providers like S3 and onto the egress-free Cloudflare R2. Very clever idea.
He calls R2 an undiscovered gem and IMO this is the gem's undiscovered gem. (Understandable since Sippy is very new and still in beta)