For certain apps, bandwidth cost is stupid expensive on AWS.

For example, I've been following the development of an online manga reading site (mangadex.org) that is now pushing over 1 PB/month of images. If they had built it on AWS, even at the lowest rate of $0.02/GB with CloudFront/S3, that would be $20,000 per month. But they ended up paying nothing by using Cloudflare, which gave them unmetered bandwidth.

(well, until they got throttled earlier this year, but they got out of that by upgrading to a measly $200/month plan)
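
Back-of-the-envelope on that AWS number (a sketch, assuming decimal units and a flat $0.02/GB):

  # 1 PB/month of egress at CloudFront's cheapest published per-GB tier.
  gb_per_month = 1_000_000          # 1 PB in GB, decimal units
  rate_usd_per_gb = 0.02
  print(f"${gb_per_month * rate_usd_per_gb:,.0f}/month")  # -> $20,000/month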




Just for everyone's clarity, you can use Cloudflare with AWS just fine.

A previous site I built (https://hearthstonejson.com/) is pushing on the order of 100 TB/month in images and JSON data, with 96% cached requests and 99.9% Cloudflare-cached bandwidth. Almost nothing comes out of S3 itself.
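
To put a number on that last point (a sketch, assuming S3's standard ~$0.09/GB internet egress rate):

  # Origin egress left over when Cloudflare absorbs 99.9% of the bandwidth.
  total_gb = 100 * 1000                 # ~100 TB/month served
  origin_gb = total_gb * (1 - 0.999)    # ~100 GB actually leaves S3
  print(f"~${origin_gb * 0.09:,.0f}/month of S3 egress")  # -> ~$9/month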

CloudFront is the ridiculously priced offering IMO. I don't really understand why people use it.


I'm afraid you are in for a nasty surprise: CloudFlare specifically forbids using it mostly as an image/video CDN unless you are on the Enterprise tier (clause 2.8 in the Terms). I found out the hard way a couple of years ago when I was helping maintain a pretty large image hosting website (we got into the 1-2 PB/month range). CF may not pay attention to you while you are small, but when they do, it's going to cost you.


via https://www.cloudflare.com/terms/ -

"2.8 Limitation on Non-HTML Caching The Service is offered primarily as a platform to cache and serve web pages and websites. Unless explicitly included as a part of a Paid Service purchased by you, you agree to use the Service solely for the purpose of serving web pages as viewed through a web browser or other application and the Hypertext Markup Language (HTML) protocol or other equivalent technology. Use of the Service for the storage or caching of video (unless purchased separately as a Paid Service) or a disproportionate percentage of pictures, audio files, or other non-HTML content, is prohibited."

---

"disproportionate percentage" seems wide open


> CloudFlare specifically forbids using it mostly as an image/video CDN unless you are on the Enterprise tier

For those of us not in the know, what else would you use it for?


I remember them billing themselves as free DDoS protection.


I don't follow. They're in the business of serving static content, right?


They are willing to serve as your CDN if you don't make them front too much image or audio traffic and you don't make them front any video. Their free tier is for non-media-focused sites only.


Why don't they just limit it by bandwidth/hits? Is there anything special about media content that makes it worth treating differently?


Video might mean adaptive bitrate, I suppose.


https://www.cloudflare.com/products/cloudflare-stream/

No enterprise contract required. Just for video, obviously.


I wonder if that's specific to image hosting websites like yours. The parent comment and my previous experiences seem to imply that embedded media is okay.

Even then, just a few terabytes of traffic will push a CloudFront bill into the thousands.


What’s the enterprise cost per month?


Pro is $20 and Business is $200; Enterprise starts around $2,000.

This is probably customized per client and could be more for big companies.


$Contact us


That. We ended up at $3,500/month for ~200 TB/month. Reading this thread, I might negotiate harder when renewal time comes.


Subject to negotiation, but it's going to be in the ballpark of many $k/month.


I wonder if image data URIs fly below the radar


>> Cloudfront is the ridiculously-priced offering IMO. I don't really understand why people use it.

Companies that get their highly regulated product (health insurance, hospitals, government contractors) "certified" for use in the AWS ecosystem. Once they do that, they are likely to use it for everything and pass the cost on to the customer.


Yes, I came here to talk about bandwidth too; it's not mentioned in the article at all, but it can be the biggest engineering cost focus depending on app and scale :-). Over 10 TB of transfer, you can (and should) negotiate cheaper pre-committed pricing on bandwidth.

There's also engineering around the network involved in your application. A lot of architectures internally reflect traffic around (more so if you're also accepting some ridiculous amount of traffic rather than just serving it). In those cases you can save a lot of money by figuring out ways to pass traffic through unmetered connections like internal ALBs and Amazon-hosted DBs. Otherwise you can either rack up a lot of cross-AZ traffic charges or be on the hook for engineering your way around them.
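
For a sense of scale (a sketch; AWS bills inter-AZ transfer at $0.01/GB on each side of the boundary):

  # 50 TB/month bouncing between AZs instead of staying zone-local.
  tb_per_month = 50
  usd_per_gb = 0.01 + 0.01   # billed out of one AZ and into the other
  print(f"${tb_per_month * 1000 * usd_per_gb:,.0f}/month in cross-AZ charges")
  # -> $1,000/month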


Right. At a previous company, our CloudFront bill ran up to $2,000/month because someone ran a test script a few times every minute that fetched a few resources from CloudFront. Once we killed that script, our bill dropped to $200/month.

CloudFront's pricing can absolutely bite you if you are not careful, which is one of the reasons I prefer Cloudflare over it.


Bandwidth is insanely expensive on all clouds. If you’re a large consumer, however, you can usually negotiate a much better rate.


> Bandwidth is insanely expensive on all clouds.

OVH Public Cloud[1] offers unmetered bandwidth (250-500 Mbps) in all regions, apart from Asia-Pacific.

Hetzner Cloud[2] offers 20 TB (1 Gbps) for each cloud instance, but has locations only in Europe.

Both of them offer dedicated servers with up to 1-3 Gbps unmetered bandwidth that could be used as exit nodes.

[1] https://www.ovh.com/world/public-cloud/instances/prices/

[2] https://www.hetzner.com/cloud


I've never had luck with the unlimited / unmetered folks.

Let's say you spin up 10 instances with OVH at the cheapest level (i.e., $200/month total). That is supposed to give you a dedicated ~3 Gbps on top of the compute and storage, or around 1 PB of data per month.

This compares favorably to Google standard egress at $60,000/month (math below). But as soon as you build a business model around this - poof - rate limiting -> some ToS violation claim pops up.

"Oh, we meant unlimited or unmetered, but only if..."

Seriously, at $20 for unlimited/unmetered, these would make great bases for things like CDN networks or image/static asset hosts for big properties. But it generally turns out to be total BS.

In contrast, paying for bandwidth with AWS/Google etc., no one has ever complained to me (though my current usage is minuscule; in the distant past I had high-usage experience).
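
The arithmetic behind the 1 PB and $60,000 figures (a sketch; assumes you can actually saturate the links 24/7, at the ~$0.06/GB rate the $60k figure implies):

  # ~3 Gbps saturated for a 30-day month, priced at ~$0.06/GB egress.
  gb = 3 / 8 * 30 * 24 * 3600           # Gbps -> GB/s, times seconds/month
  print(f"~{gb / 1000:,.0f} TB/month")  # -> ~972 TB, call it 1 PB
  print(f"~${gb * 0.06:,.0f}/month at cloud egress rates")  # -> ~$58,320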


> But as soon as you build a business model around this - poof - rate limiting -> some ToS violation claim pops up.

If bandwidth were at the core of my business model, I would certainly want to pay for it separately to avoid possible interruptions. In the case of OVH, I assume that's what the "Bandwidth upgrade"[1] with a "limit of 20 Gbps per customer, and per datacentre" is designed for. It's not unreasonable to consider "spinning up 10 of the cheapest cloud instances to avoid bandwidth limits" service abuse.

[1] https://www.ovh.com/world/dedicated-servers/bandwidth-upgrad...


Hetzner specifically is not unlimited/unmetered: You get a ridiculous amount of traffic included, and then you pay per TB.

The current price per TB is 1 EUR + VAT if applicable, so even if we assume you'll have to pay the German VAT, it's about 1.35 USD/TB.


> Hetzner specifically is not unlimited/unmetered

That's true for Hetzner Cloud, but Hetzner's dedicated servers have been unmetered (with a guarantee of 1 Gbps) since October 2018[1].

[1] https://www.hetzner.com/news/traffic-limit/


Using the AWS figures above, it's over $20/TB, so Hetzner still compares very favourably.
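
Side by side (a sketch; 1 EUR plus 19% German VAT, converted at roughly 1.13 USD/EUR):

  # Hetzner overage vs. the $0.02/GB AWS rate quoted upthread.
  hetzner_usd_per_tb = 1.00 * 1.19 * 1.13   # ~1.34 USD/TB
  aws_usd_per_tb = 0.02 * 1000              # 20 USD/TB
  print(f"AWS is ~{aws_usd_per_tb / hetzner_usd_per_tb:.0f}x the price")  # -> ~15x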


Is this actually unmetered? Or is it unmetered until you hit a secret limit they don't tell you about, at which point they demand you pay, like Cloudflare? (I'm not saying this is a bad thing; it's reasonable for CF to want their highest-usage customers to pay. I'm just curious.)


I believe it's genuinely unmetered as far as Hetzner themselves are concerned.

But there's always a "secret limit": the point where your bandwidth usage looks like a DDoS attack, and the tier-1 exchange feeding the cloud provider's DC decides to blackhole your traffic for the sake of the network.


Their bandwidth limits are based on Mbps/Gbps, so the bandwidth is unmetered but not unlimited. And unlike CDNs, public cloud companies make money from the computational power (CPU/RAM) they provide.


They're going to limit you if you spin up a bunch of cheap instances to use up their bandwidth, but if you pay $300/mo for their 3 Gbps connection you can saturate that and they don't care.


I assume that if you saturate that 3 Gbps connection, they will definitely care. They just may not be in a position to do anything about it.


I guess they'll care about as much as a restaurant cares if you've eaten the meal you paid for.


You can't compare it like that; the calculation is different. Whether someone eats a small part of their meal or the whole portion doesn't change the outcome for the restaurant, since they can't sell the rest to someone else. But if someone doesn't use their bandwidth, someone else can use it.


> I'm not saying this is a bad thing

Why not? It's deliberately misleading, no?


No, it’s not. The audience is meant to understand “if this isn’t specifically the thing you’re building to use, you won’t have to worry about it.”

That’s useful to me so I can just not worry about that thing.


Except that you do still have to worry, because it is metered; they're just being coy about the precise figures, and outright lying by using the word 'unmetered'.

Am I missing something? I'm not seeing another side to this. Secret fair-use rules are exactly as dishonest as when the telcos lie about 'unlimited' data plans, no?


Even Amazon's own Lightsail offers terabytes of traffic for cheap ($5 = 2 TB). Obviously AWS knows Lightsail can be used to cut down the bill, so the number of Lightsail instances is limited, and using Lightsail for traffic 'engineering' is against the ToS.


Yeah, but what does the peering look like for these?



I think it's more that Hetzner and OVH are great until you want to go outside of Europe. Hetzner is only in Europe. OVH has one datacenter in America, one in Canada, one in Australia, and one in Singapore. It may work great for Europe, but most of the world's population is elsewhere. If you're trying to build a global service, the cost of making your service scale properly across different providers is often too high.


> It may work great for Europe, but most of the world's population is elsewhere.

That's true, European providers work best for Europe.

But how many global locations does a service really need, if it's sitting behind a CDN? Three locations (East Coast, Western Europe, and Singapore) alone are enough to be within 100-150 ms of most of the world's population.

OVH probably wouldn't be the best choice for projects focused exclusively on South America or China, but it already covers the rest of the world pretty well, including three locations in North America (East Coast, West Coast, and Canada).


OVH has two data centers in America: one is down the street from me, where I keep my server, and the other is on the East Coast. I actually haven't had many problems with their peering now that the DC is stable.


> Hetzner Cloud offers 20 TB (1 Gbps) for each cloud instance, but has locations only in Europe

I've looked at them to use for my email/small web server and like their prices and features, but am not sure of the GDPR implications.

Currently I use a US cloud provider, and I'm in the US, and so am a controller or processor not established in the Union, and all my data processing takes place outside the Union. All my GDPR obligations, if any, are those that arise under the extraterritorial jurisdiction provision of Article 3(2).

If my server was hosted in the EU by an EU company, would that still be the case? Or would GDPR now apply via the in Union jurisdiction provision of Article 3(1)?


That GDPR article says[1]:

This Regulation applies to the processing of personal data of data subjects who are in the Union by a controller or processor not established in the Union, where the processing activities are related to:

a) the offering of goods or services, irrespective of whether a payment of the data subject is required, to such data subjects in the Union; or

b) the monitoring of their behaviour as far as their behaviour takes place within the Union.

So it basically says that the GDPR applies; I don't think that hosting in Europe would change anything at all.

[1]: https://gdpr-info.eu/art-3-gdpr/


As I understand it, the point of the GDPR is to protect the data and privacy of EU citizens. Therefore, you would have the same obligations to EU citizens even if your server were hosted outside the EU. On the other hand, if you don't serve EU citizens, the GDPR might not apply to you even if your server is hosted in the EU.


It's broader than that. According to Recital 14, "The protection afforded by this Regulation should apply to natural persons, whatever their nationality or place of residence, in relation to the processing of their personal data".

It can't actually accomplish that goal, because the EU doesn't have the jurisdiction for that.

For controllers and processors that are in the Union, the GDPR applies to their processing of people's personal data regardless of where those people are or what country they are citizens of.

So, for example, as a US citizen residing in the US who has never set foot within about 7000 km of Europe, but has bought things from vendors in the EU, those vendors need to obey GDPR when dealing with my data.

For controllers and processors that are not in the Union, the EU lacks the authority to enforce such a broad requirement. Instead, the requirement is that if the person whose data you are processing is "in the Union" and you are offering goods and services to them or monitoring their behaviour as far as it takes place within the Union, the GDPR applies.

(Whether or not they can actually enforce that is still an open question).

Putting this all together, if I'm in the US, with users in the US, but having my server in the EU makes me count as being in the EU for GDPR purposes, then I have to obey GDPR when dealing with US users. If having my server in the EU doesn't do this, so that for GDPR purposes I'm in the US, then GDPR does not apply to my dealings with people in the US.


One way to get Akamai to unilaterally lower their bandwidth prices is to threaten to use BitTorrent instead.

https://news.ycombinator.com/item?id=7601083


BitTorrent between TomTom (i.e. mobile) devices? Sounds like a terrible idea. Am I missing something? Users pay good money for upload bandwidth. Far more per megabyte than TomTom would pay to use an ordinary CDN.


Oh no, not on the device, but the desktop computer you plug it into via USB. So it can download maps while the device isn't connected, over your home internet connection (but probably not your phone), overnight or whenever.

Yes, it was not appropriate for situations where you had to pay for upload bandwidth. We were very careful in designing the GUI to disclose that it would use your upload bandwidth, explain the possible costs and benefits, let users opt in if that was OK with them, monitor the download status, and switch the BitTorrent feature on and off.

TomTom's real-time traffic prediction system depends on users trusting them enough to opt in to uploading anonymized trip measurements, so it was very important not to do something that broke their trust by abusing their network connection.

https://www.tomtom.com/lib/img/REAL_TIME_TRAFFIC_WHITEPAPER....

TomTom had an "iTunes-like" content management and device control desktop app called TomTom Home, implemented in xulrunner (the underlying framework of Firefox and Thunderbird, kind of a predecessor to Electron for writing cross-platform desktop apps in JavaScript with C++ XPCOM plugins).

The first thing I tried was to make an XPCOM plugin out of the libtorrent library. That worked OK, but the idea of adding a whole bunch more complex code that does lots of memory allocation and networking all the time to what was already essentially a web browser didn't seem like a good design. This was long before Firefox supported multiple processes, so having the same app handle BitTorrent would bloat it and degrade user interface responsiveness.

However, RedSwoosh, FoxTorrent, and BitTorrent DNA all ran as separate processes that handled all the BitTorrent work and that you could talk to over HTTP (stream the file from a special URL, retrieve progress telemetry from another URL). It's incredibly easy to integrate xulrunner, or any web browser, with those servers over HTTP, so no C++ XPCOM plugin was required.

Another issue is that you don't want every application to have its own BitTorrent client built in, or they will compete for resources and trash the network and disk. It needs to be a centralized system service, shared by any app that needs it.

BitTorrent DNA worked nicely that way. And it could fall back to the CDN to download at full speed if there weren't enough seeders.
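
The integration surface was basically just HTTP (a sketch from memory; the endpoint names and parameters here are made up for illustration, not BitTorrent DNA's actual API):

  # Talking to a local DNA-style daemon: one URL streams the file,
  # another returns progress telemetry for the GUI to poll.
  import json, urllib.request

  DAEMON = "http://127.0.0.1:8080"  # hypothetical local control port

  def start_stream(content_url):
      # The daemon fetches from peers, falling back to the CDN when
      # there aren't enough seeders.
      return urllib.request.urlopen(f"{DAEMON}/stream?src={content_url}")

  def progress():
      with urllib.request.urlopen(f"{DAEMON}/status") as resp:
          return json.load(resp)  # e.g. bytes completed, peer count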


> not on the device, but the desktop computer you plug it into via USB

Gotcha.

P2P-with-failover-to-CDN is what Spotify famously used to do.


Does anyone have experience with bandwidth charges for colocated servers? If you pay $300 a month (or whatever it is, I really have no idea because most of these places don't post their prices publicly) to put a 1U somewhere with a 500 Mbps link, will they ever stop or throttle you if your use case is legitimate? It seems like colo could be significantly cheaper than cloud for a lot of high-bandwidth things that don't need HA.


I think I remember seeing ads for HE (Hurricane Electric) 100 Gbit at $10K/month. The cloud markup for bandwidth is massive.


To be fair, there's a world of difference between a single-POP HE connection and AWS's globally distributed CDN. Understand your use case and which factors are important, because HE has significant drawbacks in some cases.


That's the marketing side of the equation. In reality, cloud providers have such amazingly complex control planes that you will see a comparable amount of downtime. Notice that the two Google Cloud outages were at the network layer.


Didn't you say some manga re-uploaders used Blogger for free image hosting bandwidth? Did that get shut down?


I don't know how you remember that comment; it was from over a year ago: https://news.ycombinator.com/item?id=16784267

But as far as I know, kissmanga.com is still abusing blogspot.com/tumblr.com by using them as free image CDNs.


Thanks for reminding me that I forgot to include the bandwidth costs! I've added them to the article.


I thought most of the clouds offered free ingress?


Yes, but the topic is the price of egress.

I thought this was obvious to everybody: public clouds want to lock you in, so they make it very cheap and easy to get data in, but charge stupendous amounts for getting your data out.


MangaDex serves very high-resolution scans of comic pages (egress).


Ah. I misunderstood the “push” part of the comment.



