Ask HN: Video streaming is expensive yet YouTube "seems" to do it for free. How?
422 points by pinakinathc 8 months ago | 368 comments
Can anyone help me understand the economics of video streaming platforms?

Streaming, encoding, and storage incur enormous costs, especially at scale (e.g., 4K videos averaging close to 1 million views each). Yet YouTube seems to charge no money for it.

I know advertisements are a thing for YT, but is it enough?

If tomorrow I wanted to start a platform supported by advert revenue, I know I would likely fail. However, maybe at YT scale (or more specifically Google Advert scale) the economics work?

PS: I would like this discussion to focus on the absolutely necessary elements (e.g., storing, encoding, streaming) and not on other factors contributing to latency/cost like running view count algorithms.




Disclaimer: I used to work at a live video streaming company as a financial analyst, so I'm quite familiar with this.

The biggest cost is as you imagine the streaming - getting the video to the viewer. It was a large part of our variable cost and we had a (literal) mad genius dev ops person holed up in his own office cave that managed the whole operation.

I've long forgotten the special optimizations he did, but he would keep finding ways to improve margin / efficiency.

Encoding is a cost, but I don’t recall it being significant.

Storage isn't generally expensive. Think about how cheaply you as a consumer can go get 2 TB of storage, and extrapolate.

The other big expense - people! All those engineers to build back and front end systems. That’s what ruined us - too many people were needed and not enough money coming in so we were burning cash.


I'm guessing live video looks a lot different from a more static video site. I think encoding and storage are both quite expensive. You want to encode videos that are likely to be watched in the most efficient ways possible to reduce network bandwidth usage, and every video needs at least some encoding.

Based on some power laws etc., I would guess most videos have only a handful of views, so storing them forever and the cost to encode them initially is probably significant.


Live has a _huge_ advantage in the storage side. In a purely "live" sense all of the content is temporally synchronised; every viewer is requesting approximately the same segments at the same time. Store the current chunks, and the last few minutes of seek time, in memory and put out on the wire to all of the viewers. Twitch talked about this a bit just before/after the AMZN acquisition.
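As a rough illustration of the "current chunks plus the last few minutes of seek time in memory" idea, here is a minimal Python sketch. The class name `LiveSegmentWindow`, the 2-second segment length, and the 3-minute window are assumptions made up for the example, not anything Twitch or another provider has described.

```python
from collections import OrderedDict
from typing import Optional


class LiveSegmentWindow:
    """Hold only the newest segments of one live stream in memory."""

    def __init__(self, max_segments: int) -> None:
        if max_segments < 1:
            raise ValueError("max_segments must be at least 1")
        self.max_segments = max_segments
        self._segments: "OrderedDict[int, bytes]" = OrderedDict()

    def publish(self, seq: int, data: bytes) -> None:
        """Add the newest segment and evict whatever fell out of the window."""
        self._segments[seq] = data
        while len(self._segments) > self.max_segments:
            self._segments.popitem(last=False)  # drop the oldest segment

    def get(self, seq: int) -> Optional[bytes]:
        """Return a segment if it is still inside the window, else None."""
        return self._segments.get(seq)

    def __len__(self) -> int:
        return len(self._segments)


# Assumed numbers: 2-second segments and a 3-minute seek window.
SEGMENT_SECONDS = 2
window = LiveSegmentWindow(max_segments=180 // SEGMENT_SECONDS)

# Simulate the encoder publishing 1000 segments in order.
for seq in range(1000):
    window.publish(seq, f"segment-{seq}".encode())
```

Because every viewer asks for roughly the same handful of sequence numbers, memory use depends on the window length and bitrate, not on how long the stream has been running or how many people are watching.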

In a prerecorded video CDN, managing that catalog is a PITA and does drive meaningful infrastructure cost. You need the "right" content to be in the correct location for low-cost peering/transit/distribution, on the correct media for the total throughput:size, in the optimal number of encodings for efficient/quality playback, etc. This job is a lot easier when the provider controls the catalog and has a limited catalog size. See some of the OpenConnect talks where they're "preloading" content off-peak to optimize IO allocation on the appliances. It was an absolute nightmare to try to manage with a many-PB catalog of 3P content where the service didn't control the release/popularity.

Edit: source, principal at AWS and was responsible for a lot of the prime video delivery once upon a time.


> Live has a _huge_ advantage in the storage side. In a purely "live" sense all of the content is temporally synchronised; every viewer is requesting approximately the same segments at the same time.

Used to work at a live-streaming company on our stream infra.

I mostly disagree, unless it's pure live with no replay at all and no closely timed events required. Usually live platforms will offer some sort of VOD (VODs, Replays, Rebroadcasts), all of which will require a storage solution. Couple that with the fact that anything requiring more complex timing than "show video live~ish" can get messy fast with sync and latency issues.


Yes, I was referring to “live only” and not VOD/“low latency HLS” cases. This was a decade ago, but my examples offhand are things like video games, game shows, and contests. It was definitely a category, and the infrastructure looked a lot closer to multicast-ish RTMP than today's dynamic-manifest MPEG-segment CDNs.

Edit: the above notwithstanding live sports etc is _still_ better on the storage side as viewers are so heavily synchronized. Lots of nice cache efficiencies when everyone is watching the same content at the same time.


> Live has a _huge_ advantage in the storage side. In a purely "live" sense all of the content is temporally synchronised; every viewer is requesting approximately the same segments at the same time. Store the current chunks, and the last few minutes of seek time, in memory and put out on the wire to all of the viewers. Twitch talked about this a bit just before/after the AMZN acquisition.

With Netflix's Live events, you can seek anywhere up to time zero, not just the last few minutes.


Apple does the same with the MLS soccer matches. I think you’re conflating “live” with “live and available on-demand anytime thereafter”

Interestingly enough the Apple and I assume Netflix live streams come from the colo equipment in your ISP. So each box has their own recording as it happens.


Netflix already solved on-demand streaming at scale, though. For them it is harder to do live events, given that they are new to it.


Does the recommendation algorithm account for this? If I'm in a specific place, am I more likely to see content that's already in the right data center?


I can't speak to content recommendations; I worked on the “backend” infrastructure storing bytes and delivering bits. But in that realm, yes, absolutely. Any CDN's #1 job is to route an end-user request to a nearby (or otherwise optimal) datacenter, usually via DNS response. For streaming content I believe “everyone” is doing (part of) this at the streaming client/API layer these days. When you request to start playing, the returned URL will include/encode hints that help the CDN send your request to the part of the CDN that holds your requested catalog title, i.e. CDNs aren't homogeneous and not all content will be stored in every edge location. The service API servers may even allocate different requests to different CDNs, e.g. the streaming service might use any combination of 1P (OpenConnect, CloudFront) and 3P (Limelight, Akamai, Level 3) CDNs.
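A toy version of that API-layer steering might look like the Python sketch below. Everything in it is hypothetical: the edge names, the region preference table, the `cdn.example` hostnames and the URL layout are invented for illustration and do not describe any real service.

```python
from typing import Dict, List, Set

# Hypothetical catalog placement: which edge sites hold which title.
EDGE_CONTENT: Dict[str, Set[str]] = {
    "edge-fra": {"title-a", "title-b"},
    "edge-lhr": {"title-a"},
    "edge-iad": {"title-c"},
}

# Hypothetical preference order of edges per client region, nearest first.
REGION_PREFERENCE: Dict[str, List[str]] = {
    "eu": ["edge-lhr", "edge-fra", "edge-iad"],
    "us": ["edge-iad", "edge-lhr", "edge-fra"],
}

ORIGIN = "origin"


def pick_edge(region: str, title: str) -> str:
    """Return the nearest edge that actually holds the title, else the origin."""
    for edge in REGION_PREFERENCE.get(region, []):
        if title in EDGE_CONTENT.get(edge, set()):
            return edge
    return ORIGIN


def playback_url(region: str, title: str) -> str:
    """Build the URL handed to the player; the hostname carries the routing hint."""
    host = pick_edge(region, title)
    return f"https://{host}.cdn.example/{title}/manifest.m3u8"


print(playback_url("eu", "title-b"))
```

The point of the sketch is only that the placement decision is made when the play request is answered, so the player never has to probe edges that do not hold the title.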


Encoding and storage aren't significant, relative to the bandwidth costs. Bandwidth is the high order bit.

The primary difference between live and static video is the bursts -- get to a certain scale as a static video provider, and you can roughly estimate your bandwidth 95th percentiles. But one big live event can blow you out of the water, and push you over into very expensive tiers that will kill your economics.
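For anyone unfamiliar with 95th-percentile billing, here is a small Python sketch of the calculation. It assumes the common arrangement of 5-minute samples over a 30-day month and a nearest-rank percentile; the traffic figures and helper names are made up, and it does not model commit tiers, port capacity, or per-GB CDN pricing.

```python
from typing import List, Sequence


def billable_p95(samples_mbps: Sequence[float]) -> float:
    """95th percentile of the samples using the nearest-rank method."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def month_with_burst(baseline: float, burst: float, burst_samples: int,
                     total_samples: int = 8640) -> List[float]:
    """A 30-day month of 5-minute samples containing one burst."""
    return [burst] * burst_samples + [baseline] * (total_samples - burst_samples)


# Assumed traffic: 1,000 Mbps baseline, 20,000 Mbps while a live event runs.
three_hour_event = month_with_burst(1_000, 20_000, burst_samples=36)
three_day_event = month_with_burst(1_000, 20_000, burst_samples=864)

print(billable_p95(three_hour_event))
print(billable_p95(three_day_event))
```

Under these assumptions the top 5% of samples (432 of 8,640, about 36 hours) are ignored, which is why steady traffic is easy to forecast. Once bursts add up to more than that, the billable rate jumps to the burst level.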


> But one big live event can blow you out of the water, and push you over into very expensive tiers that will kill your economics.

But if you're broadcasting something live and what's killing you is that everyone wants to watch it at the same time... wouldn't you serve it P2P so that everyone is downloading it from each other rather than you?


P2P is going to be a big challenge for tons of reasons. Set top boxes aren't going to play. Lots of people are behind NATs that make it hard to connect. Mobile is battery sensitive and sending to peers is going to eat more battery. Some users pay for every byte they send, and they won't want to pay for you to save operating costs. Plus all the other stuff everyone said about latency.


FYI, apparently doing that (making a P2P system to offload your links to the users) is illegal in China.

>Due to the proliferation of P2P CDN (or PCDN for short), which includes a large amount of home broadband uplink bandwidth at the central office, increasing operational pressure and cannibalizing the traditional CDN business revenue share of telecom operators […] access is technically detected. If the user's traffic volume exceeds a certain threshold, the speed will be limited or even the user's Internet access service will be interrupted. If the user files a complaint, the user will be required to ensure that he or she has not used, or has removed, the corresponding PCDN access device in exchange for restoring normal access services, thereby preventing users from overusing home broadband and infringing on the interests of telecom operators.

https://zh.wikipedia.org/wiki/內容傳遞網路#P2P_CDN


I doubt that live(!) P2P video sharing would work. You will have some users who get the video stream directly from you. These primary peers will then need to relay the same data through their tiny consumer DSL line (slow upload!) to secondary peers. These secondary peers will have a noticeable lag. It will get even worse when you have tertiary peers.


One great thing about P2P is you can provide more peers. You can surge inexpensive machines near your market and drastically reduce the load on your main servers.

And home connections, while still largely asymmetric, are much faster than they used to be. Having 10 Mbps up means one client can serve two more. And there's a lot more FTTP with 100-1000 Mbps up too. These really make a difference when you have a large swarm.


A problem with live is that everyone wants the content at the same time. One client can only serve two more after it has the content. Any drop in connection is also very disruptive because you don't have a big buffer and everyone wants the content now.

A place this could work is streaming a conference, live-ish is the goal and the producers aren't rich. Sports would be the worst case.


> A problem with live is that everyone wants the content at the same time.

Isn't the point of the P2P approach that it gets better the more this is true?


No, not really on those timescales. If it's about a popular show that's released the whole season today, yeah absolutely. Pulling ep1 from my neighbour while they watch ep2 makes sense.

It doesn't really work for something you want to watch simultaneously and reliably. I have to wait for my neighbour to get the chunk I want, then I get it. If they got it from someone else, we form a bigger chain, and then you have all the broadcasting etc to figure out who to actually get a chunk of video from.

Hearing the street cheer while I watch my national team captain take a runup for a penalty is really quite bad.
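To put rough numbers on the chain problem, here is a back-of-the-envelope Python sketch. It assumes a strict relay tree where every peer forwards to two others (the "one client can serve two more" figure from upthread), 1,000 peers fed directly by the origin, and 2 seconds added per hop. All three numbers and the function name are assumptions for illustration.

```python
def relay_depth(viewers: int, seeds: int, fanout: int) -> int:
    """Hops below the origin-fed seeds needed before every viewer is covered."""
    if viewers < 0 or seeds < 1 or fanout < 1:
        raise ValueError("viewers >= 0, seeds >= 1 and fanout >= 1 required")
    covered = seeds    # viewers reached so far
    frontier = seeds   # peers at the current deepest level
    depth = 0
    while covered < viewers:
        frontier *= fanout   # each peer at this level feeds `fanout` more
        covered += frontier
        depth += 1
    return depth


# Assumed scenario: 1M viewers, 1,000 origin-fed seeds, fan-out of 2.
HOP_SECONDS = 2
depth = relay_depth(viewers=1_000_000, seeds=1_000, fanout=2)
worst_case_lag = depth * HOP_SECONDS

print(depth, worst_case_lag)
```

With those assumptions the deepest viewers sit 9 hops down, so they are about 18 seconds behind the seeds, before counting any peer that drops out mid-chain.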


But the problem is that you have a gigantic audience. Many of them will make effective primary peers. If that weren't true, you wouldn't have a problem in the first place.


If they're not significant, then why does youtube build ASICs for doing video encoding? See e.g., https://arstechnica.com/gadgets/2021/04/youtube-is-now-build...


If you make a billion, a 1% saving is 10 million. You can hire and fund a lot of activity with 10 million.

If you make 1 million, 10k isn't going to go very far towards paying devs to save you 1%


Because when you are YouTube, even relatively marginal cost improvements can be huge in absolute terms. This also improves the UX of having to wait X minutes for an uploaded video to be ready.


Doing so wouldn’t hurt and would make a sizable impact at the scale of Google?


AFAICT, the answer to "why does Google do X" is basically always "because someone needed a launch to point at when they're up for promotion".


Because significance varies, as does optimisation. At YouTube scale it might matter more, or the benefits might be bigger, even if just to save some energy or carbon footprint (and even that might be just for a compliance or marketing line).


VA-API, NVENC,

nvenc > See also: https://en.wikipedia.org/wiki/Nvidia_NVENC#See_also

NVIDIA Video Codec SDK v12.1 > NVENC Application Note: https://docs.nvidia.com/video-technologies/video-codec-sdk/1... :

> NVENC Capabilities: encoding for H.264, HEVC 8-bit, HEVC 10-bit, AV1 8-bit and AV1 10-bit. This includes motion estimation and mode decision, motion compensation and residual coding, and entropy coding. It can also be used to generate motion vectors between two frames, which are useful for applications such as depth estimation, frame interpolation, encoding using other codecs not supported by NVENC, or hybrid encoding wherein motion estimation is performed by NVENC and the rest of the encoding is handled elsewhere in the system. These operations are hardware accelerated by a dedicated block on GPU silicon die. NVENCODE APIs provide the necessary knobs to utilize the hardware encoding capabilities.

FFMPEG > Platform [hw video encoder] API Availability table: https://trac.ffmpeg.org/wiki/HWAccelIntro#PlatformAPIAvailab... :

> AMF, NVENC/NVDEC/CUVID (CUDA, cuda-nvcc and libnpp) (NVIDIA), VCE (AMD), libmfx (Intel), MediaCodec, Media Foundation, MMAL, OpenMAX, RockChip MPP, V4L2 M2M, VA-API (Intel), Video Toolbox, Vulkan


> I think encoding and storage are both quite expensive. You want to encode videos that are likely to be watched in the most efficient ways possible to reduce network bandwidth usage, and every video needs at least some encoding.

The minimum possible expenditure on encoding is "we require videos to be encoded like so; here's our help page on how you can do that".

It's not even slightly expensive.


There's still a "live" aspect to any YouTube video, though, since you can change the quality that's sent to you at any time.


Interesting background. I worked twice in digital video, once ~2000-2001 (ancient history - early IP, ISDN, the dead-end of H.323, bonded GSM channels, etc.) and once ~2009-2010. The second episode was fascinating, we specialised in mobile video at a time when it was just appearing on the consumer market. Most of the global mobile device manufacturers were clients. It got to the point where they would build the hardware and we would get airdropped in to their R&D to make it work - they had no idea how performant the architecture was going to be, because they'd never tried it. We also built the server side, the billing architecture with revenue share, carrier billing support (only possible with device preloaded apps due to Google Play (then "Google Apps"?) store restrictions on third party payment mechanisms), etc.

Encoding, scaling and transcoding are relatively cheap for stored content, and relatively expensive if you want real or near-real time.

If you want DRM (digital rights management = ~ineffective copy protection) then you need to add a bit more overhead for that, both in terms of processing and latency. If you need multi-DRM (different DRM systems for different devices the consumer owns) and a good cross-device experience (like pause and resume across devices), it gets real hard real fast.

It helps to be targeting a standard platform, for example a modern widescreen TV with H.265 support and solid 4K decoding. Otherwise you need a different version for every resolution, a different version for every CODEC, a different version for every bitrate, etc. We had great experience adjusting bitrates and encoding parameters for different device categories, for example if you had a certain phone and you ran it at max spec it might look great but if you were looking to preserve battery and were running on battery save mode the decode would fail and you'd get choppy performance and stuttering audio. This sort of thing was rife then.

As a series of specialist video providers emerged, ~all the cloud providers went and added these services, basically 95% of which are frontends to ffmpeg and some internal cloud storage scheme or similar.

Finally, billing is hard. Users have an expectation of free content now.

No experience with real time stream economics, but saw the inside of LA's stadium video control center one day. Didn't look inexpensive, I'll tell you that much. Probably for events with multiple cameras you're mostly paying site fees, ie. reliable bandwidth, humans, mixing desk if required. For studio broadcast these costs will be reduced. Both will have a slight real time encoding tax vs. stored content. If you want to figure out how to do it cheaply, look at the porn industry.


> basically 95% of which are frontends to ffmpeg

I wonder what the approximate net global economic benefit of ffmpeg is to this point?


Or the net global economic benefit of discrete cosine transform... "We're not in Kansas anymore, Toto" https://en.wikipedia.org/wiki/Discrete_cosine_transform#Hist... https://en.wikipedia.org/wiki/Discrete_cosine_transform#Appl...


quick someone post the xkcd

(https://xkcd.com/2347/)


> The biggest cost is as you imagine the streaming - getting the video to the viewer.

I seem to remember Google own some network infrastructure? That saves some money. On top of that at their size you are going to get things cheaper.

> The other big expense - people! All those engineers to build back and front end systems. That’s what ruined us - too many people were needed and not enough money coming in so we were burning cash.

There should be economies of scale on that. It's harder to build and maintain bigger systems, but the work required does not scale linearly with size.


Google own vast network infrastructure. The day Google acquired YouTube (I was there) they discovered that YT was on the verge of collapse and would run out of bandwidth entirely within months. There was an emergency programme put in place with hundreds of engineers reallocated to an effort to avoid site collapse, with the clever/amusing name of BandAid.

BandAid was a success. YouTube's history might look very different if it wasn't. It consisted of a massive crash buildout of a global CDN, something that Google historically hadn't had (CDNs aren't useful if everything you serve is dynamically generated).

One reason BandAid could happen was that Google had a large and sophisticated NetOps operation already in place, which was already acquiring long haul unused fibre wavelengths at scale for the purposes of moving the web search index about. So CDN nodes didn't need to be far from a Google backbone POP, and at that point moving bits around on that backbone was to some extent "free" because the bulk of the cost was in the fibre rental and the rackspace+equipment on either end. Also it was being paid for already by the needs of search+ads.

Over time Google steadily moved more stuff over to their infrastructure and off YouTube's own, but it was all driven by whatever would break next.

Then you have all the costs that YouTube has that basic video sites don't. ContentID alone had costs that would break the back of nearly any other company but was made effectively mandatory by the copyright lawsuits against YouTube early on, which Google won but (according to internal legal analysis at least) that was largely due to the enormous good-faith and successful effort demonstrated by ContentID. And then of course you need a global ad sales team, a ton of work on ad targeting, etc.

This story explains why Google can afford to do YouTube and you can't. The reality is that whilst they certainly have some sort of internal number for what YouTube costs, all such figures are inevitably kind of fantastical because so much infrastructure is shared and cross-subsidised by other businesses. You can't just magic up a global team of network engineers and fibre contracts in a few months, which is what YouTube needed in order to survive (and one of the main reasons they were forced to sell). No matter what price you come up with that in internal booking it will always be kinda dubious because such things aren't sold on the market.


Definitely a few factual errors here that ought to be corrected.

On day one of the acquisition, Youtube's egress network was at least 4x the size of Google's, built and run by two guys. This shouldn't be a shock, you need a lot more bits to serve video than search results. For the hottest bits of content, third-party CDNs were serving videos and thumbnails.

There was no collapse imminent, but there were concerns about getting YouTube and Google infrastructure on a common path. BandAid was named as such because the goal was "not to break the internet." It was a small project with maybe a dozen members early on, all solid people.

YouTube had its own contemporaneous project, née VCR - "video cache rack". We did not necessarily believe that BandAid would arrive in a reasonable amount of time. Generally Google has two versions of every system - the deprecated one and the one that doesn't work yet.

VCR was a true YouTube project, 3 or 4 people working with one purpose. It was conceived, written, physically built and deployed in about 3 weeks with its own hardware, network setup and custom software. I believe it was lighttpd with a custom 404 handler that would fetch a video when missing. That first rack maxed out at 80Gbps a few hours into its test.
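The "custom 404 handler that fetches a video when missing" pattern is essentially a pull-through cache. A minimal Python sketch of the idea follows; it is a stand-in, not the lighttpd setup described above, and `PullThroughCache`, `fake_origin`, and the video IDs are hypothetical names. A real cache would also need eviction and handling of concurrent misses, both left out here.

```python
from typing import Callable, Dict, List


class PullThroughCache:
    """Serve from local storage; on a miss, fetch from origin and keep a copy."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, video_id: str) -> bytes:
        if video_id in self._store:
            self.hits += 1
            return self._store[video_id]
        # Miss: this is the point where the "404 handler" would kick in.
        self.misses += 1
        data = self._fetch(video_id)
        self._store[video_id] = data
        return data


# Hypothetical origin that records every request it has to serve.
origin_requests: List[str] = []


def fake_origin(video_id: str) -> bytes:
    origin_requests.append(video_id)
    return f"bytes-of-{video_id}".encode()


cache = PullThroughCache(fake_origin)
first = cache.get("video-1")    # miss, goes to origin
second = cache.get("video-1")   # hit, served locally
```

The appeal is that the rack needs no catalog pushed to it in advance: it fills itself with whatever viewers in that location actually request.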

After several months, Bandaid debuted. It launched at ~800Mbps and grew steadily from then on into what is certainly one of the largest networks in the world.

YouTube mostly moved things to Google based on what made good engineering sense. Yes, a few of them were based on what would break next - thumbnails leap to mind. Search, which we thought was a no-brainer and would be "easy", took more than a year to migrate - mostly due to quality issues. Many migrations were purely whimsical or some kind of nebulous "promo projects." Many more stayed separate for more than a decade. When a company gets large enough, the foolish consistency tends to trump a lot of other engineering arguments.

To the ancestral poster, do not despair. You can transcode, store and serve video, as you've likely surmised it's not all that difficult technically. In fact, it's so much easier now than in 2005.

What makes a great product is hard to describe and not always obvious. The economics will depend on your premise and success. "the cloud" gets pricey as you scale. There will be a long path of cost optimization if you get big enough, but that's the price of doing business at scale.


Even more detail:

YouTube continued building their own POPs AND network for ~18 months AFTER the Google acquisition. Google did not have the network capacity to carry it. (Fun fact: YT had 25 datacenter contracts and opened them at a rate of about 1 a month, starting from March 2006, so 25 contracts were set up in 2 years. At the time of the Google acquisition there were ~8, so yeah, 17 additions over the next ~16 months.)

Also, YT had a far more streamlined (but less optimized) network architecture. Traffic was originally generated in the PoP and egressed out of the PoP. There was not a lot of traffic going across backbones (unless it was going to a settlement-free peer). Initially, it was egressed as fast as possible. This was good for cost, not great for performance, but it did encourage peering, which also helped cost. Popular videos did go via CDN initially.

YouTube had a very scalable POP architecture. I agree with area_man that the collapse was not imminent. (See 17 additional pops) There were growing pains, sure, but there was a fairly good system.

Also, as it relates to bandaid from a datacenter and procurement perspective, the original bandaid racks were in YT cages. YT had space in datacenters and Google didn't (SV1, DC3). Also, the HWOps tech who went on-site to DC3 ended up tripping a breaker. (They were almost escorted out.)

Side-note: the evolution/offshoot of bandaid into the offnet caching system - now called Google Global Cache, is what really helped scale into provider (end-user) networks, and remove a lot of load from their backbone, similar to an Akamai, or a Netflix open connect box. Last I heard GGC pushed significantly more traffic than the main google network.

The Google netops teams that were of help in the first year of the acquisition were the peering team and some of the procurement team. The peering team helped us leverage existing network relationships to pick up peers (e.g. SBC).

The procurement team gave us circuits from providers that had a long negotiation time (eg: sprint)

Google also helped YouTube procure various Juniper equipment, which was then installed by the YT Team.


Thanks for the corrections. I was indeed thinking of thumbnails w.r.t. "what would break next".


This comment is a great answer to commenters who say "Google bought YouTube/Android/..., they haven't invented anything since Search"; they miss the actual hard part entirely.

(same for Meta wrt Instagram)

These products that have been scaled by multiple orders of magnitude since original acquisition are like ships of Theseus; almost everything about how they work, how they scale and how they make money, have completely changed.


Great write up! I worked for a few months at a Google datacenter and a few times got to see the fiber endpoints.

Though the idea of such networks not being sold on the market makes me ponder whether Starlink will come to provide such a service. They’d need to scale out their laser links and ground stations.


There are hard physical limits, to do with spectrum allocations, on how much bandwidth Starlink can provide, so it will always be a somewhat boutique service and indeed prices for end users might end up climbing as word spreads and demand grows. They already practice regular dynamic pricing changes depending on demand within a cell. It doesn't make sense for corporate backbones.


Youtube owns a huge CDN to deliver video quickly.



Moderation seems like another big issue although the solution afaict seems to mostly involve shipping this work off to the Philippines or wherever and making people look at the most horrifying content imaginable for 40 hours a week at very low wages.


It’s not that expensive at YouTube scale. We are talking fractions of a penny per GiB transferred.


"not that expensive" is relative; it's still a lot of money. Sure, it's not trillions of dollars, but it's still billions of dollars. YouTube has historically not returned a net profit (and I haven't heard of that situation changing).


Do you have a public source for that? From what I’ve heard, YouTube has been profitable for years at this point.


> Do you have a public source for that?

YT financials and P&L were not broken out in audited financial statements back in the day.


Still aren’t. Alphabet only publishes YouTube revenue.


... and therefore, saying they're "profitable" is meaningless.

Allocating costs for things from Google that they use (e.g. the Ads system) is difficult. The problem with any subsidiary.


Yt basically got unusable without premium though.

I have premium, my wife accepts all of those ads


Still on uBlock Origin; I don't see any ads and I do see all the videos.


I did that for a while and then even just hearing my wife suffer through it made me upgrade to the family plan. Now I invited some other family members and we are all enjoying that premium bitrate no-ad lifestyle.


I have uBlock Origin.


That doesn't work on my LG tv


PiHole, if the LG TV is an invariant. Otherwise, I believe casting from your ad-blocked device should make the TV itself ad-free (though for YT, "casting" may in fact just be "using the app").


Skill issue.


It's impossible for me to watch without premium too


Infrastructure at scale is not as expensive as cloud vendors would have you think. Reaching scale and being a first-mover gives a significant advantage, because deploying a CDN with caches well connected with(in) every major ISP around the world takes time.


> YouTube has historically not returned a net profit (and I haven't heard of that situation changing).

I'm sure that's what Google's accountants would love us (and the IRS) to continue to believe.


Not sure, one could say they use their dominant search position and revenue to serve video at a loss and distort the market, making it very hard for anyone without an existing money printing machine to bootstrap a profitable video site. See vimeo.


that was my assumption when i read they weren't profitable

and also that streaming and storing video at that scale is almost a natural monopoly, with how much it must cost and how hard it would be to compete without existing resources


The important question is how much an ISP can charge for broadband without YouTube and Netflix service. They do not pay even the fractions of a penny everyone else has to.


Yeah, YouTube is big enough to put their own cache nodes directly in ISP datacenters


Which would help for the crazy popular meme videos, but I bet the long tail on YouTube is insanely big, even if you did have the “watch next” engine getting in on the game steering you toward content already present in your nearby caches.


YouTube is estimated to have 1 exabyte [1] of data. Petabyte level storage is not unheard of [2], and a gateway server with 5PB storage would cover ~0.5% of all YouTube videos, which should be sufficient to serve a very high percentage of the most popular videos.

They can still afford to serve the occasional obscure video from the origin servers.

[1] https://www.qqtube.com/blog/how-much-storage-does-youtube-ha...

[2] https://www.qnap.com/solution/petabyte-storage/en/
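As a sanity check on the "0.5% of the catalog should cover most views" reasoning, here is a rough Python sketch. The 5 PB and 1 EB figures come from the comment above; the Zipf popularity model with exponent 1, the one-million-item catalog, and the assumption that all videos are the same size are illustrative guesses, not measurements of YouTube.

```python
CACHE_PB = 5
TOTAL_PB = 1000  # 1 exabyte expressed in (decimal) petabytes

cache_fraction = CACHE_PB / TOTAL_PB


def harmonic(n: int, s: float = 1.0) -> float:
    """Generalized harmonic number: sum of 1 / rank**s for ranks 1..n."""
    return sum(1.0 / (rank ** s) for rank in range(1, n + 1))


def zipf_hit_ratio(catalog_items: int, cached_fraction: float,
                   s: float = 1.0) -> float:
    """Share of requests served from cache if the most popular items are cached.

    Assumes request popularity follows a Zipf law with exponent `s` and
    that every item has the same size.
    """
    cached_items = round(catalog_items * cached_fraction)
    return harmonic(cached_items, s) / harmonic(catalog_items, s)


# Hypothetical catalog of one million equally sized videos.
hit_ratio = zipf_hit_ratio(catalog_items=1_000_000,
                           cached_fraction=cache_fraction)

print(f"cache holds {cache_fraction:.1%} of bytes, "
      f"serves about {hit_ratio:.0%} of requests")
```

With these particular assumptions the cache serves a bit under two thirds of requests. A steeper popularity curve or regional differences in taste would push that higher, and a flatter long tail would push it lower.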


Yup! This is the reason why it's so cheap for them. Other companies in similar positions have cache nodes in the ISPs, and this dramatically lowers the cost.


Bingo, and also in Internet Exchanges - every ISP at the KCIX exchange has direct handoff to YouTube, Google, Netflix, Cloudflare, ...


Next iteration of this will be video generated on demand with GenAI running closer to the request, ideally at the request.


> The biggest cost is as you imagine the streaming - getting the video to the viewer. It was a large part of our variable cost and we had a (literal) mad genius dev ops person holed up in his own office cave that managed the whole operation.

Sort of makes Cloudflare's R2 look more impressive since they do not charge for egress.


I'm digressing from the topic, but R2 looked good on paper, but it has a long way to go in terms of reliability.


Hmm could you elaborate?


Curious: was your distribution client-server or peer-to-peer?

Or both, similar to Skype's supernode model?


The overwhelming majority of "legitimate" video streaming sites operate on a client-server model, which allows videos to be watched in web browsers, and on mobile devices (which don't generally do well in P2P as they find uploading difficult).

And generally torrent-based streamers don't hire financial analysts :)


Thankfully the FCC definition of "broadband" is getting more symmetrical over time. And doesn't webrtc take care of connecting browsers pretty well?

The current definition requires 20Mbps of upload, and uploading a youtube-quality video to two other people would not take a big fraction of that. Though it would help if ISPs stop trying to set bandwidth caps at <5% utilization levels.


It's not only the amount of upload bandwidth and the usage caps (although those are both big issues).

It's that you're also probably going to get CGNAT - and maybe even a firewall blocking unusual ports.

And you're going to be running the power-hungry data connection at least twice as much, bad for battery life.

And mobile connections are less reliable - transitions between towers, going through tunnels, switching between 4G and WiFi.

And mobile OSes are eager to suspend things - especially things that are using a lot of data and battery.


That's a problem if all your users are mobile, yeah.

I'm thinking of the situation where most of the users are using home connections and have power cables always in or in reach.


I'm actually kind of surprised serving media isn't trivial and solved yet.

Routers have ASIC switching, why can't we have dedicated cache appliances with a bunch of RAM and some kind of modified GPU with network access and crypto acceleration in each core?


Have you seen Netflix OCAs? Current off-the-shelf hardware goes a really long way.

https://openconnect.netflix.com/en/appliances/

https://news.ycombinator.com/item?id=32519881


Humorous


Thanks for sharing. Is it possible to join his team?


Gilfoyle: ":smiling"


I disagree. Storage is expensive. Think of an old video uploaded 15 years ago with a total view count of 1k. You can't just put it in cheap cold storage. Someday, somebody is going to watch it and you have to retrieve it instantly or that somebody will be disappointed.


You can.

YT, for example, deletes your 720p after a while and replaces it with a potato.

And if you watch an old, no-longer-relevant YT video and it starts after 10 seconds instead of right away, no one really cares.

You can put that old, heavily compressed potato on your huge and cheap storage system located somewhere around the globe where it's just cheap (energy).

You can also factor in the time for a tape robot and only store half or the first minute of that potato on your cheap storage and let the robot grab the rest of it.

After all if video is your main thing plenty of weird optimizations start to make sense.


Or run 2-3 ads for 2 minutes each. Gives you plenty of time to fetch the video in the background.


No way, man! They don't want to interfere with the viewing experience of their primary revenue source! (Ads) :-)

They are definitely not fetching video in the background.


Fetch from cold storage to their CDN while they fill your bandwidth with ads.


I WISH it would be prefetching video in background while showing ads.

But no, it always goes to spinny wheel buffering after the ad ends. Oh, and that's after having some spinny wheel to load the ad in the first place ffs.


> Yt for example deletes your 720p after a while and replaces it with a potato.

always bugs me to hell when i encounter a "high definition" video that has worse quality than pal/ntsc


But in a few years we will have 8k for all those videos with super AI upscalers.


ppl not big on 8k jokes anymore it seems


Tape robot time is NOT measured in minutes but hours. Also you wouldn't pull individual files like this out of tape.

Tape is for long term sunk storage, not cold infrequent access like a youtube video.

I know aws glacier has an "expedited retrieval time" of 1-5 minutes, but that is not how typical tape setups work. Frankly I would be very interested in what actually hides behind that product.


Glacier is not actually tape, the fancy tape-robot videos notwithstanding. Most of it is just regular old S3 running on outdated storage hardware.


Interesting. Why does it take hours to retrieve then? Network/disk bandwidth really is that bad? Also do you have a source for this?


It's mostly just bandwidth prioritization, slotting large transfers in when there is excess bandwidth.

You can tell this is the case for at least the flexible retrieval tier, because small objects can be returned in a few minutes, whereas larger requests take hours - if the files were actually on a tape drive somewhere, small requests couldn't be fulfilled dramatically faster than large ones, given that tape has shitty random-access performance.

(I used to work on a downstream team at AWS)


I believe you, I can see how it would make sense that AWS would create a tier to exploit the spare capacity in S3 disk bandwidth, just like they did for EC2 spare VM capacity with spot instances. Still it doesn't make intuitive sense to me how the performance AND the price can be so far. That's why I'd love a longer write-up if you know of any.

It's also weird that the retrieval gives you a regular fast S3 object you can then access. Given that it's already on that hardware, is a copy even happening?


To discourage access, maybe? That pushes you to use more expensive options.


This doesn't hold, not having the cheap option encourages you to use the expensive option even better...


You can use a big disk, but not be able to access all the files on the disk with the same frequency, so you have 20% of the disk dedicated to hot storage, and 80% of it to cold storage. Cold storage access is queued, so the 1-5 minutes can come from there.


Yeah, I looked closer and I think they are basically just packaging up some kind of offering on top of spinning rust for the first two Glacier tiers.


Google gave up on tape a while back. The latest Google search indicates it is only used for air-gapped backups. I don't think OP was suggesting using tape though, especially with technologies such as hybrid SMR.


That's all true, but I don't think anyone mentioned tape storage.


100% that Google puts videos on colder storage. Hot videos are cached in memory at all possible locations, and cold videos are stored compressed in a much cheaper storage container. The difference is maybe 500 to 3000ms.


I regularly run into videos on youtube that obviously came from tape because they stop for minutes to load.


Really? I don't think I've ever had a YouTube video take longer than 30 seconds to load on a good Internet connection


Really, I don't think my gigabit optical internet connection is responsible.


For the streaming factor specifically, I can at least offer something resembling an answer: Google. In the early 2000s they bought up a bunch of dark fiber and peered with all the major US ISPs, and they were able to do this because no ISP wants to be the one that blocks or degrades Google. As a result they were able to host video streaming on their network without immediately being shut down by Comcast and co. Instead they had to go after Netflix.

Google has a lot of custom encoding silicon, too, AFAIK.

Storage is the biggest question of the three. Linus Sebastian specifically called this out when YouTube started really pushing to make the non-Premium experience dreadful. There isn't really some secret special sauce you can buy or make for storage. Literally everything is being stored with the same hard drives, SSDs, discs, or tapes you can just go out and buy. The only specialization you can do is build or buy equipment to handle extreme numbers of them. Google does buy these in bulk, so they probably get a discount on storage, but it's not something that would make storage costs just go away.


> There isn't really some secret special sauce you can buy or make for storage.

Well, there's this:

https://www.microsoft.com/en-us/research/project/project-sil...


The bandwidth costs are the key. Good luck getting rates anywhere near what Google’s effectively are. Spoiler: you can’t. You probably can’t realistically get to 5x their costs, byte-for-byte.

Which makes competing with them effectively impossible except for a very few other megacorps.


Bandwidth costs are actually free, so this isn't exactly accurate.

Most bandwidth is via settlement-free peering with thousands of ISPs around the world. At least that's how we did it at Twitch, and how we did it when I worked at a large CDN before that. There are still costs for backhaul, interconnect, colocation space, dark fiber, network hardware, and transit to fill the gaps. But this talk about how "Google can magically do it 5x cheaper" is nonsense.


I think vundercind was implicitly including Twitch in the phrase "except for a very-few other megacorps".

If you're wondering, if you can't get peering, you wind up like Twitch. In South Korea. South Korean telecom law explicitly shifts the capital expenditure costs of ISPs over to other online services, which is sort of like the fucked-up opposite of Net Neutrality regulation. So Twitch was being bled dry to pay for the chaebols' network expansion. Hell, even after Twitch left, South Korea fined them for leaving!


> Hell, even after Twitch left, South Korea fined them for leaving!

...is there some reason for Twitch to pay such a fine? Were any grounds stated for it?


> In the wake of them ceasing service, Twitch has been fined for 435 million Korean won – but not for the entire service being terminated. This is related only to them making it so users in South Korea can’t access VODs on the platform, something seen as a direct violation of South Korea’s telecom laws by Korea’s Telecommunications Commission (KCC).

> According to Yonhap, the KCC made the decision that Twitch terminating the ability for users in South Korea to access VODs wasn’t necessary to keep the service alive. When asked to justify their claims, Twitch declined due to contractual obligations related to keeping user and site data private.

> Additionally, Twitch would have to present evidence that their decision to gradually take features away from South Korean users & leave the country was necessary. This means that Twitch isn’t likely to return service to South Korea any time soon.

> There’s also a good chance Twitch will be forced to provide refunds for those who have been affected by the service being discontinued, with the KCC warning Twitch that they need to prepare “user protection measures” as they cease service in the country.

https://www.dexerto.com/twitch/south-korea-fines-twitch-over...


Is there even an enforcement mechanism other than arresting executives who go to South Korea or seizing assets that are there?


> There are still costs for backhaul, interconnect, colocation space, dark fiber, network hardware, and transit to fill the gaps.

Genuine question - aren't those gaps essentially what make a video streaming service operate at scale though? It'd be like saying "ya this bus can get everyone from NYC to Philly at $10 but doesn't stop anywhere in between", or am I missing something about all of those gap filling components?


To be fair, if they said "ya" they're probably in Scandinavia...


i


That isn't free. For every terabit of bandwidth, you have to physically build out a terabit of network. Not even remotely free. Having already built the network, and already paying to run it, you can then use it for free, yes.

This is the model for all networks, indeed, most businesses - they pay a big upfront and moderate recurring cost to make a fast network (or restaurant or widget factory) and then sell it in slices with a large freedom to choose a pricing model. Pay per terabyte is a pretty reasonable way to pass on the network's fixed cost to consumers, just like part of the cost of the restaurant meal covers the interior decorations, even though the decorations don't actually cost more the more people eat, until the restaurant gets so busy it needs to expand.


> For every terabit of bandwidth, you have to physically build out a terabit of network.

A lot of the content will be cached at/near the edge. I imagine a lot of time is spent watching popular videos.


Cached with what? The $0 cost hardware you gave to the ISP for free? You expect every ISP is just going to give you free CDN services?


I didn't say it's free of cost. My point is that a terabit of bandwidth delivered to consumers in a particular geographical area doesn't require a terabit of bandwidth from the center to that area, because much of the content can be cached.

So the 'terabit of network' the content provider needs to build need only span a few hundred feet within a single building.
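
To put a number on the caching argument, here is a minimal sketch of how much long-haul capacity is left to build once an edge cache absorbs part of the traffic. The 90% hit ratio is an assumption for illustration only; nobody in the thread quotes a real figure.

```python
def backbone_gbps(edge_gbps, cache_hit_ratio):
    """Traffic that still has to travel from the origin to the edge site."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    # Only cache misses cross the long-haul network.
    return edge_gbps * (1.0 - cache_hit_ratio)


# Hypothetical: 1 Tbit/s delivered to viewers, 90% served from the edge cache.
print(f"{backbone_gbps(1000, 0.9):.0f} Gbit/s from the center")
```

The edge site itself still has to be built for the full delivered rate, which is the point made in the reply that follows.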


I never said it did. You still have to build whatever you want. A terabit of edge network is a terabit of edge network that has to be physically built.

Not all networks are the same. Some have terabit backbones and gigabit edges, some have the reverse. We'd still call both of them, roughly, terabits of network, and you still have to build them. The one with the terabit core might actually be easier because you have less of the expensive really fast equipment.


OK I don't think we're disagreeing on any factual point.

The only point I wanted to make is that the 'terabit of network' doesn't have to be end to end, so it's not as scary as it may sound.


In some ways it's less scary: no long inter-site fiber runs that I assume are an absolute nightmare, and no renting those same runs at exorbitant rates. All your hands-on work remains in the datacenter, which is set up to make it a breeze. In other ways it's more scary: you have to negotiate with a lot more counterparties and visit a lot more datacenters.


Don't Comcast and friends throttle any peering points you use, until you hand over $x per subscriber per month for them to stop doing so?


Typically the problem is that Comcast won't peer with you. They always use the excuse that they only peer with equals, and since they aren't sending you as much traffic as you're sending them, it's not in line with their peering policy. This, of course, is a problem of their own creation; they're cable so customers all have a tiny amount of upload bandwidth and a large amount of download bandwidth. It is unlikely their policy permits peering with anyone. The ultimate effect is that you have to buy transit from a Tier 1 ISP instead of with the consumer ISP directly, costing you money. There are, of course, backroom deals where they sell a subscription tier that doesn't include video and then they throttle all the video. That's different than the peering issues; there is enough capacity to send all of your packets to them, they just throttle them on their end to squeeze money out of their customers.

I've worked at 2 ISPs and we obviously didn't have this peering policy, because it's dumb and it breaks the Internet. We also didn't throttle video, because it's dumb and breaks the Internet.


*laughs in Australian* My local DC charges $333 for 10TB (upload/download inclusive).


Bandwidth costs are really not that bad.

95th-percentile billing, adaptive bitrate, progressive playback, and capping the buffer to the minimum needed to keep playing. That comes to ~$1M/month for 10+ Tbit/s of egress.

Source: Worked at one of the largest bandwidth consumers in the world.
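
For anyone unfamiliar with 95th-percentile ("burstable") billing, a minimal sketch follows. It assumes the common convention of sampling throughput at fixed intervals, discarding the top 5% of samples, and billing the highest remaining one. The sample values and the price are made up for illustration.

```python
def billable_mbps(samples_mbps, percentile=95):
    """Return the rate billed under percentile ("burstable") billing."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Number of samples kept once the top (100 - percentile)% are discarded,
    # rounded up using integer arithmetic.
    keep = (len(ordered) * percentile + 99) // 100
    return ordered[keep - 1]


# Hypothetical month: 95 samples at 400 Mbit/s and 5 short bursts to 9000 Mbit/s.
samples = [400] * 95 + [9000] * 5
assumed_price_per_mbps = 0.10  # illustrative, not a quoted price
rate = billable_mbps(samples)
print(f"billable rate: {rate} Mbit/s, bill: ${rate * assumed_price_per_mbps:.2f}")
```

Short bursts fall into the discarded 5%, which is why keeping player buffers small and traffic smooth matters under this billing model.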


Could anyone help me understand what this means in practice, at scale? I keep doing the math and getting numbers either way too small or way too big. If each user streams 200MB over a 10 minute session, and they arrive on the site uniformly distributed over time (false premises, I know), how much would 1,000,000 sessions a month cost? I get confused by terabits and terabytes and how "per second" bandwidth metering actually works. I keep getting to the conclusion that for $1,000,000 you could handle 16,200,000,000 sessions per month, which is more traffic than phub gets and less money than I assume they spend.
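
A minimal sketch of that arithmetic, assuming a 30-day month, perfectly uniform traffic, and the ~$1M/month for 10 Tbit/s figure quoted above. The function names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month
CAPACITY_MBPS = 10_000_000          # 10 Tbit/s expressed in Mbit/s
MONTHLY_PRICE_USD = 1_000_000       # the ~$1M/month figure quoted in the thread


def sessions_per_month(capacity_mbps, session_megabytes):
    """Sessions a fully used link can carry in a month."""
    megabits_per_month = capacity_mbps * SECONDS_PER_MONTH
    return megabits_per_month / (session_megabytes * 8)  # bytes -> bits


def monthly_cost_usd(sessions, session_megabytes):
    """Cost of a given number of sessions, priced per average Mbit/s."""
    average_mbps = sessions * session_megabytes * 8 / SECONDS_PER_MONTH
    return average_mbps * MONTHLY_PRICE_USD / CAPACITY_MBPS


print(f"{sessions_per_month(CAPACITY_MBPS, 200):,.0f} sessions per $1M")
print(f"${monthly_cost_usd(1_000_000, 200):,.2f} for 1,000,000 sessions")
```

Under these assumptions the 16.2 billion figure checks out, and a million sessions averages well under 1 Gbit/s. Real traffic has daily peaks, and the bill follows the peak rather than the average, so actual costs would be higher.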


> Source: Worked at one of the largest bandwidth consumers in the world.

Mindgeek? :-)


They're no longer Mindgeek - they go by Aylo now. I worked for the company that hosted PornHub.


Just checked his LinkedIn and yes, literally lol


Oh wow, I didn’t even snoop, just a guess.


Assuming 10 Mbit/s per stream, that would serve 1 million concurrent streams 24/7. If we assume people watch 2.4 hours per day on average, it could support 10 million active users. That is a bandwidth-only cost to serve of roughly 1.2 USD per user per year.
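
The same estimate as a small sketch, using the figures from this comment and the ~$1M/month for 10 Tbit/s quoted earlier. It assumes viewing is spread evenly across the day, which real audiences are not.

```python
def bandwidth_cost_per_user_year(monthly_cost_usd, capacity_mbps,
                                 stream_mbps, hours_per_day):
    """Yearly bandwidth cost per active user on a fully used link."""
    concurrent_streams = capacity_mbps / stream_mbps
    # Each stream slot is shared by users who watch only part of the day.
    active_users = concurrent_streams * 24 / hours_per_day
    return monthly_cost_usd * 12 / active_users


cost = bandwidth_cost_per_user_year(1_000_000, 10_000_000, 10, 2.4)
print(f"${cost:.2f} per user per year")
```

With these inputs the result lands a little above one dollar per user per year.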


Yeah I'm coming to similar numbers. I guess this is why the Cloudflare CTO has said several times on hackernews that at scale bandwidth is free.


Pretty sure I didn't say that, but I know who did: Cloudflare's CEO (eastdakota). https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...


If bandwidth costs were very high a whole lot of the services we love would cost more or not exist.


I am so confused here. Either I’ve been doing high-bandwidth bit slinging extremely wrong for quite a while or a lot of HN has never done it at all and is opining on it anyway. It’s real money, IME.


My home connection uploads at 681mbit/s (just did the test over wifi) for 40 euros/month. At that price, I'd get 13tbit/s for 800k euros.

It's a bit surprising that you were not getting significantly better prices than individuals.


Price business pipes where they won’t cut you off for saturating that 24/7.

[edit] and that have good deliverability worldwide, no weird paths to other consumer IPs that intermittently fail to route or inexplicably have dial-up transfer speeds. And have anything like a real SLA.


I don't think they legally can in a number of European jurisdictions. This isn't like Comcast overselling their copper cables - brand new fiber paid for by the EU comes with certain obligations.


Try uploading at that rate for a few months non-stop and see how long your residential ISP will continue doing business with you.


That can be taken out by a particularly unfortunate shovel or truck though, and isn’t sufficient to run a business on.


For 800k I would have 20000 different connections. This would allow me to survive many unfortunate shovels per month.


Anyone who says "bandwidth costs are not really that bad" should spend 2 minutes playing with the AWS cost calculator.

You would think the VMs are the expensive part, but no, egress is easily multiple times the cost of the compute.


Cloud bandwidth pricing has nothing to do with costs and everything to do with lock-in.

You can get 100x cheaper and unmetered at a low-cost provider like OVH or Hetzner or similar bare-metal data centers.

It doesn’t even need significant monthly commits to get that pricing. If you are running video streaming at scale, you are not running on AWS or even tier 2 like OVH, for sure.


Are they actually unmetered or are they unmetered*?

* up to 2TB per month


It's unmetered*

* if our upstream is saturated we're going to look at our biggest users and if the number is really big we'll send them a polite email to please reduce it or pay more.

There are reports of people getting emails from Hetzner after sending multiple Gbps continuously for several months. That's the level you have to reach before the * kicks in. Only 1Gbps servers are unmetered, so you'd have to have several.

If you want to know a better approximation of their true cost just look at their non-unlimited plans: 20TB/month included for free; 1€/TB (excl VAT) after that.

I have one more interesting data point to add: I was quoted 950€/month for a dedicated 10Gbps between Berlin and Amsterdam (about 600km) plus peering at AMS-IX, or 300€ for 1Gbps. (They're not secretive and you can just ask for a quote using their sales contact form). Extrapolating, it seems that 1€ is worth about 2.5 petabyte-kilometers, at least within the dense interconnections of continental Europe. About twice the price of shipping a petabyte of hard drives the same distance.


> If you want to know a better approximation of their true cost just look at their non-unlimited plans: 20TB/month included for free; 1€/TB (excl VAT) after that.

I will point out that this is still about 50x-80x cheaper than Amazon. Not far off the claim of 100x.


Mhm. Save literally 98% of your egress bill by avoiding AWS, GCP and Azure.


AWS is particularly bad at this for S3.

Cloudflare and most other object storage providers either offer fully free egress for all users or at least for inter-cloud transit, so you can then put a free/cheap CDN like Cloudflare in front and not pay all that much for b/w.

AWS refuses to participate. The cost of retrieving all the data plus the associated bandwidth is so high for many people that they stick to S3, including me.


There are enterprise plans that offer significant savings if your scale is high enough.


Yes, it’s unmetered. Just colocation. You pay for the maximum rate basically not the amount sent, if that.

The cloud bandwidth you get on your VPS has nothing to do with costs at the scale where you’ve racks in data centers.


But anyone doing things at great scale isn’t going with OVH. You could use it as an origin, I guess, but you’d still need a CDN for decent content delivery.


Anyone doing things at great scale is buying many-gigabit connections directly. Which is also 100x cheaper than AWS.

The number above was a thousand dollars per month per 10gbps, and AWS would charge more than a hundred thousand dollars at the listed $0.05/GB price.
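
A quick sketch of that comparison, assuming a 30-day month and a link that is saturated the whole time. The $0.05/GB and $1,000 per 10 Gbit/s figures are the ones quoted in this thread, not current price-list values.

```python
def metered_monthly_cost(gbps, price_per_gb, utilization=1.0, days=30):
    """Monthly bill when every transferred gigabyte is charged."""
    # Gbit/s -> GB/s, then multiply by the seconds in the month.
    gigabytes = gbps / 8 * days * 86_400 * utilization
    return gigabytes * price_per_gb


metered = metered_monthly_cost(10, 0.05)
flat_rate = 1_000  # per 10 Gbit/s per month, as quoted above
print(f"metered: ${metered:,.0f}, flat: ${flat_rate:,}, ratio: {metered / flat_rate:.0f}x")
```

A saturated 10 Gbit/s link moves a bit over 3 PB a month, which is where the six-figure metered bill comes from.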


Note: that was for a dedicated 10-gigabit link from specific location A to specific location B, plus a peering at one large IX, without any access to the rest of the internet.

Nonetheless it does give a ballpark for the cost of bandwidth being a lot lower than people think. A 10G internet connection would be cheaper to provide in some parts of that equation and more expensive in others - should end up in the same ballpark.


By "above" I meant ckdarby's number.

And honestly 10 cents per Mbps sounds kind of high for raw transit, I interpreted it as a price for actual utilization.


It was more than 5 years ago and was not a direct commitment. Another company was being used for their data centers, and this was a lease/rental agreement of equipment; Think colocation model, but where you're like 50-80% of being the main client of the data centers.

Add 10-25% profit for that company to get closer to true "raw transit" pricing from the carriers directly.


Also, can you show me any public pricing that is 100x cheaper than AWS? With same QOS! Or are you just throwing numbers around?


What specifically is the QOS for AWS?

But sure, if you want citations for $0.10 per Mbps in bulk for transit, that's easy to find/beat.

https://he.net/ "Get BGP+IPv6+IPv4 for $0.06/Mbps!"

https://www.fdcservers.net/ip-transit Europe, North America: 10Gbps $499/month

And telegeography just sells information, but they had a blog post that's now three years out of date reporting that "In Q2 2021, the lowest 10 GigE prices on offer were at the brink of $0.09 per Mbps per month. The lowest for 100 GigE were $0.06 per Mbps per month."

You need to factor in that your utilization won't be 100%, but if you're comparing 6 cents for a Mbps and 5 cents for a gigabyte, then the exact point where AWS is 100x more expensive is when your line is 36% utilized.
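
The break-even utilization can be worked out as below. This is a sketch assuming a 30-day month, which gives about 37%; a slightly longer average month brings it closer to the 36% quoted.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def gb_per_mbps_month(utilization=1.0):
    """Gigabytes moved in a month by 1 Mbit/s of capacity."""
    return 1e6 / 8 / 1e9 * SECONDS_PER_MONTH * utilization


def utilization_for_ratio(ratio, price_per_mbps, price_per_gb):
    """Utilization at which metered pricing costs `ratio` times the flat rate."""
    metered_at_full_use = gb_per_mbps_month() * price_per_gb
    return ratio * price_per_mbps / metered_at_full_use


print(f"{gb_per_mbps_month():.0f} GB per Mbit/s per month at full use")
print(f"{utilization_for_ratio(100, 0.06, 0.05):.0%} utilization for a 100x gap")
```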


Yes, at certain scale you likely have deals with many CDNs.

At even higher scale (YouTube, Facebook or even Netflix) you are going to be putting content caching servers at the local ISP PoPs: it is mutually beneficial to do so.


So you can piggy back on the CDNs that do the same until you can afford to do so, with the same performance. This isn’t a privilege only given to those cloud providers, anyone with a checkbook and a pulse can do it.


The key point is that video streaming products are not held back by a high cost of bandwidth that only YouTube can afford because Google subsidizes it from other revenue. YouTube is profitable by itself; the combination of premium and ads is more than enough to pay for it.

It is hard to directly compete on long-form video because of the audience and content depth advantage Google has with YouTube.

There are successful niche players who are fairly large (like Vimeo or Twitch or even OnlyFans) who focus on specific markets that don't require social network advantage like corporate or smaller segments etc.

For general purpose media, creators are going to focus on the platform with most audience and vice-versa, very hard to break that.


Agreed, I guess my point in my original comment is; yes, bandwidth is cheap but you aren’t going to compete at any scale hosting at OVH as was proposed. And you’ll need a lot more than cheap VMs too.


It was not that OVH is competitive for streaming; it is that even the likes of OVH are orders of magnitude cheaper than cloud, let alone actual setups for streaming companies whose cost data no one in this thread has access to.


To be fair, that is the penalty of being in a shared cloud. They are incentivized to keep their customers from using everything, everywhere, all at once.

Jump into a 'bare metal' datacenter and things can get much different.


Not even AWS. Any CDN. It won’t be AWS-bad (way, way, way under) but it ain’t good.


You’re getting scammed with the AWS cost calculator or anything more than $20/mbps


> Google. In the early 2000s they bought up a bunch of dark fiber and peered with all the major US ISPs, and they were able to do this because no ISP wants to be the one that blocks or degrades Google. As a result they were able to host video streaming on their network without immediately being shut down by Comcast and co.

This sounds like a good answer, but falls away drastically when you realise the vast majority of YouTube's consumers are outside of the USA, which in turn means so are those bandwidth costs.

Are you guessing or have I missed something here? I can't see how this could be a significant enough factor to make the global model work.


When does YouTube start deleting content?

They'll have to do that eventually, right?


I had an entirely private channel with a couple hundred 1-3 hour long HD recordings of video conferences / screen shares. Entirely private as in every video was private, no one but me ever had access to them. Last year the channel was nuked and the account banned because I apparently broke their rule on “impersonation”. Seriously. I could file a dispute, but in typical Google fashion that dispute disappeared into the void. Thankfully the YouTube ban didn’t affect other aspects of that Google account, and didn’t spread to my other Google accounts. I was worried for a minute since I have another account with >10k subscribers, and while I don’t do monetization, it would suck to have that nuked as well.

I suspect that’s Google beginning to trim unprofitable channels using a lot of storage: delete them for bullshit reasons.


So long as storage costs decrease exponentially I think they'll keep everything.

Especially if the amount of content uploaded keeps going up, so the relative benefit of deleting old stuff is small.


> So long as storage costs decrease exponentially I think they'll keep everything.

I agree. That's why I've recently made a practice of backing up things to which I'd regret permanently losing access.

https://ourworldindata.org/grapher/historical-cost-of-comput...


Relatively recently they updated their policy. If you don't log in to your account for two years they will delete everything. Source: https://support.google.com/accounts/answer/12418290?hl=en

I have two accounts from my youth with mostly fansubs and "funny" vids. (An "Everybody Draw Mohammed Day" video that, 11 years after its upload, got banned in Pakistan.) They include two very successful (over 1M views combined) uploads of political TV shows. I will not log in to these accounts. So this stuff will get deleted. So in the long run, stuff will disappear from YT.


I believe YouTube content was explicitly exempted from that deletion.


You seem to be right. This Google Blog post[1] says "we do not have plans to delete accounts with YouTube videos at this time."

[1] https://blog.google/technology/safety-security/updating-our-...

Would be nice if they would write that into the actual policy rather than just adding a random sentence to a blog post.


If I remember right the original policy was to delete the YouTube videos and after pushback the YouTube exception was added.


You don’t need to keep everything “hot” all the time. Storage tiers exist for a reason.


Exactly! This was my go-to approach for reducing storage costs. Customers don't get spooked when they get an extra 1s delay for something they search for once a month. However, an extra 30ms delay in "everyday content" is a sure way to lose your users.

However, implementing this in practice is non-trivial. Knowing what is "everyday content" versus what is "once a month content".

To add more complexity -- you have these semi-predictable hype-waves especially two peaks in case of most YT videos where a "once-a-month" content becomes an "everyday" content before again becoming a "once-a-month" content. It feels you could specifically optimise for this -- reduce storage costs without sacrificing UX.


> To add more complexity -- you have these semi-predictable hype-waves especially two peaks in case of most YT videos where a "once-a-month" content becomes an "everyday" content before again becoming a "once-a-month" content.

Caching is hard but this sounds like an ARC would likely catch this, if it occurs on a small number of videos concurrently.


My 15 year old videos with 30 views still load nearly instantly. It’s as close to hot as hot is


Think about what you actually need to start a video. Maybe a dozen MB?

After that, you can plunge into colder storages and warm things up as you stream. Additionally, if you need longer to 'defrost' things, just cache a few more MB at the front. Cheat a bit by assuming 480p to start with if you need to; even less to store.
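
A rough sketch of how large that hot prefix would need to be: enough video to keep playing while the rest is fetched from the cold tier. The bitrates, the 10-second retrieval latency, and the 1.5x safety factor are all assumptions for illustration.

```python
def hot_prefix_megabytes(bitrate_mbps, cold_latency_s, safety_factor=1.5):
    """Megabytes to keep on fast storage so playback covers the cold fetch."""
    if bitrate_mbps <= 0 or cold_latency_s < 0:
        raise ValueError("bitrate must be positive and latency non-negative")
    # Megabits played during the wait, padded by the safety factor, in megabytes.
    return bitrate_mbps * cold_latency_s * safety_factor / 8


# Hypothetical streams: 5 Mbit/s HD and 1 Mbit/s 480p, 10 s cold-tier latency.
print(f"HD:   {hot_prefix_megabytes(5, 10):.1f} MB")
print(f"480p: {hot_prefix_megabytes(1, 10):.1f} MB")
```

With these inputs the HD prefix comes out in the same range as the "dozen MB" guess above, and starting at 480p shrinks it further.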


There is also location location location.

Maybe Google holds your content in 7 data centers round the world (~1 per continent for planned maintenance + latency + reduced oceanic fiber usage).

But with old rarely streamed content they might cut that down to just 3.


Speed/latency doesn't tell you much, because it's all on a hard drive somewhere.

The question is whether YT is serving up the one (redundantly-backed storage) copy they have of your almost-never-watched video, or whether it's serving it up from one of 1,000+ copies it's made across the globe for currently popular videos.


That doesn't match my experience. I have some unlisted videos that I or a small handful of friends might go back and watch once a year, and it takes several seconds of loading before they start playing. It's very noticeably different from the near-instant loading of most videos I watch.


But they are tiny 480p files yeah?


Old content has value for training AI models even if there are no human viewers anymore.


Yes, it's inevitable.

My guess is that the first step will be to re-encode all the non-popular videos with severe lossy compression.


They still keep the original uploaded files.

You know that because when they release a new format (eg. HDR or a different resolution), they re-encode from the original. Various people have tested that with moire patterns and various other ways to demonstrate if something was encoded more than once.


Once AI generated video gets particularly good, they could probably save a lot of storage by 'compressing' videos into textual descriptions and a few key screenshots that are reimagined on demand by the AI


getting closer to mythical storytelling of old every day


Is there a way to run these things deterministically with a seed? Totally AI-generated videos could just be compressed to text without any reference screenshots


isn't 90% of the videos compress-able to "cute kitty playing with something in a funny way"


What you are describing reminds me of Nvidia DLSS3 AI frame generation. Isn't the storage cheaper than compute, especially graphics?


YouTube generated $31.5 billion in advertising revenue in 2023.

https://www.businessofapps.com/data/youtube-statistics/#:~:t....

That's... a lot! Plenty of historical precedent for fully advertising-supported media with high expenses, from OTA network television and radio to free weekly newspapers... or inexpensive subscriptions to daily newspapers subsidized by advertising. Advertising has been paying the bills for electronic media for a century now.


The mind-blowing thing about this I found out recently, is that analysts think that Youtube is still losing money, even after making $31.5 billion in revenue!

From 2015 "Some unnamed person at Google reportedly said that the site is "roughly break-even." https://www.cbsnews.com/news/4-reasons-youtube-still-doesnt-... which in turn quotes https://www.wsj.com/articles/viewers-dont-add-up-to-profit-f...

From March this year (2024): "Analyst Michael Nathanson of MoffettNathanson estimates that YouTube TV will become profitable this year" https://www.newscaststudio.com/2024/03/29/youtube-tv-profit-...

Part of this could be because they pay out roughly 40-60% of the revenue to the content creators, that leaves Google / Youtube with half the revenue that they use to pay salaries, maintain infrastructure, including storage, hosting and serving content.


YouTube TV !== YouTube

YTTV is their "cable" product


Netflix spends over 50% of its revenue on content production and licensing ($17B out of $30B), and they made $6B net profit in the past year.


Netflix needs less storage than YT, because not everyone can upload to Netflix. This also makes edge caching easier.


Netflix doesn't need to subsidize anything. YouTube subsidizes a lot of videos which don't have ads.


Netflix serves a slightly larger portion of internet traffic than YouTube on the same amount of revenue. So whatever subsidization youtube is providing for those videos, is clearly outweighed by the monetization of the remaining videos. YouTube has higher revenue per GB of bandwidth served than Netflix.


Netflix technically doesn't serve anything that isn't monetized. YouTube does, and to some degree non-monetizable content skews the numbers. Also, due to the nature of YouTube, a large amount of content is repeated but still takes up storage.

Netflix needs one copy of, say, Top Gun Maverick (and maybe variations for different resolutions). YouTube needs a bazillion copies of it with small variations: clips with different music, people "commenting" on it, people actually making commentary, highlights, YTPs, etc., all of it in a wider range of resolutions and served to way more people, plus the overhead of checking against their master copy for copyright infringement.


Do you have a source for that? I am surprised that Netflix would be bigger than YouTube.


> Do you have a source for that? I am surprised that Netflix would be bigger than YouTube.

Sandvine is the usual source in the industry and they say: "Sandvine's 2023 Global Internet Phenomena Report Shows 24% Jump in Video Traffic, with Netflix Volume Overtaking YouTube"

https://www.sandvine.com/press-releases/sandvines-2023-globa...


I wonder whether Netflix has a lot less content to cache, so they can push much more to edges.


Of course it's break-even if you don't want to pay taxes but prefer to pay employees more instead. Sure maybe Youtube itself is break-even but how many people inside it become rich?


Youtube TV is their cable replacement, not related to the website at all really.


Yes. For comparison, Netflix has about $30B in revenue... paid up front for all of its content (something YouTube doesn't do) and accounts for a larger percentage of internet traffic (likely because of higher quality streams)... and they still made $6B net profit.


Netflix content is highly coachable (tiny library Vs YouTube), which dramatically reduces the cost of serving the data to users.


I think londons_explore means "cacheable" - able to be cached. But I don't think either would need to cache the whole catalogue ahead of time.


I thought it was more directed towards "Coachable" as getting to select and train /trend towards more profitable content


Yeah I thought that way too at first, it's an interesting take. These days maybe ML could help auto-coach promising content producers. For the other take: maybe pre-caching content that's more cost saving to cache than it is to not cache can be aided by ML too. But replicating smaller libraries of content at all regional replica nodes is probably more straightforward at first.


Which also means that diversity is a bad deal for Netflix.

They have a direct financial incentive to have more people watch the same thing.


It's not a "bad deal" - there will be lots of factors to consider.


I think this is the answer. Also, storage and compute purchase prices are probably falling roughly as fast as (or maybe even faster than) their storage and compute requirements go up. So even with their massive storage requirements, I wouldn't be surprised if they are more profitable each year.


Revenue is not net profit.


No, but it demonstrates they have at least $31 billion to make it all work.


Disclaimer: I worked at Google and occasionally did things for YT, but it was 2015 and before. I did look at their P&L, somewhat.

egress costs were enormous and YT was not profitable. I don't know if it is now, but I wouldn't be surprised to find it is. They sure have enough ads.

As several people say below, caching content around the world is key, so that not all requests are serviced in NoCal.


Bingo, YT has always been a loss leader for Google dominance. Only recently have they squeezed the ads knob to maybe generate a profit but I’d bet it’s nothing like the high margin AdWords cash cow.


It's rather interesting that there is the possibility that you can't actually run a good, for-profit streaming service based on ads. The current iteration of the YouTube recommendation seems to suggest that you have to at least remove the "good" part of the equation. You're also correct in that they squeezed the ad knob, but I fear that they squeezed it too much. YouTube is unusable without ad blocking or YouTube Premium.

The cost of YouTube Music is $11 and YouTube Premium, which includes Music, is $14. To me that indicates that you can run YouTube for a given user for around $3 - $5 per month. Trying to watch YouTube with ads, the sheer amount of ads and their length could be a sign that ads on YouTube are almost worthless; at least they seem to struggle to get $5 per user per month.

YouTube isn't going to die at the hands of competitors anytime soon though, because the cost will deter anyone interested.


YouTube Music is just a reskinned YouTube client using the same catalog, playlists, history, likes, etc as YouTube itself. I wouldn't put much into costs based on that price.


There's nothing in the world like the AdWords cash cow. They're a one-trick pony, but it's a really good trick.

Content ads: not nearly as much.


Visa and Verisign are comparable cash cows:

https://www.macrotrends.net/stocks/charts/V/visa/profit-marg...

https://www.macrotrends.net/stocks/charts/VRSN/verisign/prof...

Epic electronic health records software is not a publicly listed company, but I would guess they have massive profit margins too.


As long as we’re talking about epic: is it an exclusively self hosted product? I don’t understand how it’s so susceptible to ransomware compared to nearly every other cloud platform. You never hear about someone’s salesforce being ransomwared or their servicenow being held hostage, but it feels like we can’t go a month without another hospital going offline.


As I understand, Epic can be hosted on-prem or in the cloud. Being in the cloud doesn't eliminate downtime risk from ransomware, though. Even if you treat each PC like a dumb terminal, you're still going to have downtime to replace or repair them when they no longer work.

> You never hear about someone’s salesforce being ransomwared or their servicenow being held hostage

You may hear about business organizations being unable to do work due to ransomware. Likely nobody mentions their inability to access salesforce specifically, because 1. the data in there usually isn't controlled and 2. likely nobody is going to die if they can't log into salesforce.


It's so wild to see people so confidently being wrong in a comment. YouTube has seen profit since 2013, with margins growing each year.


Do you have any source on that? Google has never released separate revenue/profit details for youtube.


it's so wild to see people so confidently wrong. I was there. Were you?


I was there as well, and Youtube was wildly profitable.

See how easy it is to make random statements on the internet.


[flagged]


The most frustrating thing about this wall of text is that the time it would take to verify all the parts is exponentially more than the time it took to generate.


Unrelated to this thread and completely OT - I feel like you've just touched a live wire that I haven't seen a lot of conversation about yet wrt. LLMs.

We've seen plenty about media literacy / verifying news over the years of course, but the time and surface area considerations will change exponentially if people can start generating fake news with an AI (especially audio and video).

And I think it's too naive to go straight from there to the Dead Internet Theory, because that's like switching from "believing everything you read on the Internet" to "disbelieving everything you read on the Internet" (which is necessarily wrong with the opposite percentage margin).

For this thread, I'm inclined to believe the original comment(er). I'm not going to look it up, look up the username, ask an AI, or really care all that much about the debate, because I'm not directly interested in the outcome.

I _am_ interested that, in my own mind, the original comment lost an enormous amount of credibility when the author then reached for ChatGPT to defend the claim. So the only thing I can do without sinking a large amount of time into looking for sources is to leave this thread thinking "I don't know, maybe I'll actually look it up one day, and evaluate it properly when it becomes relevant to me (probably never)". And maybe that's a better outcome than my initial "yeah seems like a legit HN comment" anyway.

EDIT: There's a second level issue too where the people using AI to generate "facts" won't themselves know whether they're generating true or false information - presumably(?) not the case in this thread.


so frustrating when you don't have any facts of your own, too.

Here's an answer to you & all the downvoters: just download an annual report or a 10-K for Google, for the years in question.


Since you are so certain that the data is available there, please cite the net income for YouTube that is supposedly in it.


since someone else has made a statement easily verifiable, please ask that person to support it.


Telling me to go sift through several 50,000 word documents kind of proves my point.

Realize, I’m not anyone up the chain claiming any knowledge about YouTube’s specific profitability in any given year. I’m just commenting about how frustrating the internet is becoming. HN can be cool because you’ll be in threads like this and someone who worked on some original piece of YouTube compression algo might pop in to offer insight.

On the other hand, multiple paragraphs of ChatGPT regurgitation are next to useless. If you really want to share that kind of thing, maybe link to the publicly available chat instead of quoting, so people can read it if they want to or ignore it at their leisure.


It proves nothing. AI is sometimes wrong. So is everyone. Most of the time it's regurgitating publicly-available data.

So rather than asserting it's all wrong, why don't you find one thing in it that is?


I don't know why you're getting downvoted. Efficiency projects like the transcoding ASIC are a big part of pushing YouTube to profitability, as well as the alternate revenue streams and heavy increases in monetization. Video serving is extremely expensive and difficult compared to everything else Google does.

Ruth Porat has been on record many times indicating that YouTube wasn't profitable in the 2010's. I think her public statements have only indicated that YouTube was free cash flow positive as of the 2020's, but I haven't found exactly where that happened - Google has experimented with a lot of different kinds of breakdowns of its finances. I assume that hiding the economics of YouTube is part of this (as well as protection against a zealous DOJ saying that Google's businesses are separable).


> I don't know why you're getting downvoted.

Because I'm not here to read a wall of text generated by ChatGPT.


I normally hate the ChatGPT spam on this forum, but this was not a bad use for it (assuming it wasn't lying).


> assuming it wasn't lying

that's the key issue with this kind of ChatGPT writing. Code you can relatively easily check for correctness - just run it. Analysis on this level really has to be based on facts and reality, not generated by a bullshit generator, to be of actual use.


I actually went back to the source it pointed out - SEC filings and the call transcripts are all public. It isn't citing the most recent statements on YouTube by far, but the citation of 2017 was at least correct.

In this case, digging through all the material to find the factual basis is the hard part, and corroborating it is not (for those who care).


Why are you here, then? Download the financial statements. It's not like they're behind a paywall.


> Why are you here, then?

Different user, but I'm here to read what people are writing. IMO, pasting GPT content is about on par with replying to something with a LMGTFY link.


LMGTFY is equivalent to "you could look it up, so go do it." I'm not your research service.


No. 2013? See GP, see Google financial statements, etc. I do think it started turning a profit over the last 3 years, but can't confirm that.


Where is this shown in the 10-K or 10-Q filings?

http://abc.xyz/investor


So this is the reason YouTube started going downhill around 2015, Albert stopped working on it!

(It’s a joke, I know YouTube started going downhill the moment they decided to squeeze every penny out of it)

E: do you happen to know by any chance why the recommendation algorithm became so shit?


I have an ad blocker, so they decided to "punish" me by not showing me recommended videos anymore. It was a blessing!


also with adblocker but i do get recommendations, though the quality/relatedness has gone downhill.

i suspect privacy legislation and this cohort thing to be at the bottom of this


Never said I "worked on it."


It's fairly straightforward -- use your fountain of money from search ads plus a huge infrastructure to support it to pay the costs rather than pass them on to users.

Operate at a loss to drive out all competition and prevent new competition from arising. Increase ad obtrusiveness to drive up revenue, and every once in a while increase creator payouts to keep creators on your platform, until hopefully the lines cross and you start to make money. Maybe. Every once in a while a competitor might make a go for it, and you'll have to reduce ads or offer more incentives to creators to drive them out. Sometimes you may have to lobby the government to help you out on this.

One bonus is you can use the goodwill from your video site to drive traffic to your search engine.

If you don't have a fountain of infinite money from your search engine (or if you don't currently operate a gigantic search engine), then you might not be able to pull this off.


OK but let's not forget that YouTube was originally an independent offering and Google's own competitor never got off the ground, leading to them just acquiring it.


Yes, but they were never profitable, and if Google hadn't bought them they probably would have eventually failed or completely changed their model to survive. What the original YouTube team did was create a great brand and experience. That's why Google's version never took off.


They probably didn't intend to be profitable because their goal was user acquisition at the time. Intending to be profitable from the start would have implied a totally different approach and priorities.


That's true and easy to forget -- at the time, YouTube offered far and above the best experience for content creators and uploaders. And their end-user experience (the flash-based player) was miles ahead of the competition.


> Google's own competitor never got off the ground, leading to them just acquiring it

They only put about 1.5 years of effort into "getting it off the ground" before acquiring YouTube in 2006. I think they just didn't want to try anymore when they knew they could buy the YouTube audience.

Of course, GV continued to exist for a few more years because Google, but still. GV always seemed half-assed.


It had a lot of fans because the video itself was of a higher quality than YouTube offered at the time.


I really liked the local player that would download videos continuously on my crappy dial up connection. You downloaded a file (a few kb) and it just perpetually downloaded the data until you came back hrs later and it was ready. Man I can't believe I was that patient.


Oh so a trust. We need a new Teddy Roosevelt or Taft.


It's purely a scale problem.

On the "Revenue" side you will quite probably need to have enough eyeballs that advertisers come to you directly to display ads and do so in volume.

On the "Costs" side you'd want to be big enough that you can just store your content in ~3 of your own datacentres, cache the "hot content" in a site or two per country then give away caches to ISPs (who will gladly host them in their own network for free).

The biggest cost will be bandwidth/streaming servers. Encoding/storage is comparatively cheap. If you were small you would likely start to do this from a few 100Gbps dedicated servers per continent. https://www.fdcservers.net/configurator?fixedFilter=15&fixed... If we set an average of 3-4Mbps per stream, each server handles around 20,000 concurrent streams, and at a server cost of around say $4/hour that works out to around $0.20 per 1000 video hours in theory. In practice it will be higher: worst case closer to $0.50 per 1000 video hours due to utilisation rates.
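
For anyone who wants to plug in their own numbers, the estimate above can be sketched in a few lines of Python. The function names are made up for illustration, and the 40% utilisation used for the worst case is a back-calculation from the $0.50 figure, not something stated in the comment.

```python
def concurrent_streams(server_gbps: float, stream_mbps: float) -> float:
    """Upper bound on simultaneous streams a server's network link can carry."""
    return server_gbps * 1000 / stream_mbps


def cost_per_1000_video_hours(server_cost_per_hour: float,
                              streams: float,
                              utilisation: float = 1.0) -> float:
    """Server cost spread over the stream-hours actually delivered.

    utilisation is the average fraction of capacity in use (assumed value).
    """
    return server_cost_per_hour / (streams * utilisation) * 1000


# A 100 Gbps link at 3-4 Mbps per stream gives roughly 25,000-33,000 streams,
# so 20,000 per server leaves some headroom.
print(concurrent_streams(100, 4))
print(concurrent_streams(100, 3))

# $4/hour over 20,000 streams at full utilisation: the "in theory" figure.
print(cost_per_1000_video_hours(4.0, 20_000))

# Same server at an assumed 40% average utilisation: the "worst case" figure.
print(cost_per_1000_video_hours(4.0, 20_000, utilisation=0.4))
```

The point of splitting out utilisation is that peak traffic sizes the fleet, while average traffic determines how many video hours the fixed hourly cost is spread over.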


Aren't big players colocating their CDNs at internet exchanges? Bandwidth should be essentially free for content delivery.


Yes, exactly. I'm saying that even if you're too small for that it's still pretty cheap.


It may be different at this scale, but in my experience fast-ish redundant file storage has always been extremely expensive even at lower storage sizes, while getting a 100 Gbps line is relatively cheap.


it's different at scale, and you mostly don't need SSDs since you have cache boxes; most of your storage is videos that are very rarely used. You're looking at probably a rack of SSDs for every 5-10 racks of spinning disks X 3 datacentres. Even that would give you approx 50-90PB of usable storage (replicated so it's in all three sites) for a few tens of thousands a month.

Even if you just put it all on S3 infrequent, which would be one of the most expensive ways of doing it, it's still not really expensive compared to serving the content.


For streaming, Google has caches in like every ISP network. Also, the majority of people watch the latest and same type of content that is mainly served from the homepage, which is easier to cache and serve.

If you have the ipvfoo extension, you can see it in action. (it's easier to see with IPv6)

https://github.com/pmarks-net/ipvfoo


I wonder if that's part of the reason for their algorithms to push the same videos to everyone, they're already cached at the edge so it costs them nothing?


More popular works are more likely to be enjoyable to more people. There is no really objective measure of quality for any creative work, and taste doesn't scale, so publishers bias for popularity as it's one of the few things they can understand.


> taste doesn't scale

The motto of so many social networks. Even if they don't know it.


The majority will enjoy and like whatever is pushed on them. Decades of radio and TV should be enough evidence for this. Music or video is not popular because people like it, but because somebody decided to make it popular by pushing it on people.


> taste doesn't scale

Wow, I’m going to put this on a wall.


They’re cached because they’re popular, not popular because they’re cached.


I would imagine that's too insignificant to factor into that particular calculation.


I would imagine it's an unintended side effect of the "people recently watched this so it's relevant" part of the algorithm.


This is the correct answer. Caches in local PoPs https://cloud.google.com/cdn/docs/locations


They even have a map.

https://peering.google.com/#/infrastructure#edge-nodes

It also says "Static content that's very popular with the local host's user base, including YouTube and Google Play, is temporarily cached on edge nodes. Google's traffic management systems direct user requests to an edge node that provides the best experience."


Although note the map is only city addresses. For example the "London" pin is on the notional point where "London" is, ie Charing Cross. There is no giant network interchange at Charing Cross, it's just a convenient place co-ordinate for "London".


There is a big difference between 'caches in ISP networks' and 'caches in public datacenters adjacent to local exchange points'. The first implies preferential treatment from ISPs, the second is commodity available to everybody. Your link implies these local PoPs are more likely the second case.


Yes. This is the secret to Netflix's success too. They don't need a CDN, the ISPs do that for them. (Net neutrality what?)


they run their own cdn.

it's arguably cheaper than buying, but then again it's also the core business, and outsourcing puts the whole operation in danger


What exactly do you think a CDN is?


Okay I'll answer your low-effort question in case a lot of people, like you, think they know what a CDN is but don't.

A CDN is when you set up servers in datacenters around the country and distribute content to them so that your content is closer to your users.

What Netflix did is create guidelines for ISPs that want to cache Netflix content at the edge. So Netflix isn't running a CDN themselves, the ISP acts as a CDN.


Seems like you're talking about OpenConnect. In that, Netflix ships an OpenConnect Appliance (a server for sending out Netflix video) to an ISP that meets the minimum requirements and agrees to set up networking to the OCA in a certain way.

> A CDN is when you set up servers in datacenters around the country and distribute content to them so that your content is closer to your users.

Yes, exactly - the server in the Netflix case is called an OCA, and is set up in datacenters all around the country - in this case they are owned by the ISP rather than being a 3rd party data center where the ISP has a fiber and the CDN rents space. Here's the thing though.... Akamai, Cloudflare, and Fastly all do the same as Netflix too. The ISP wins because it saves transit, the CDN wins because it doesn't need to rent dc space, and customers of both win because the content is delivered faster to the people using that ISP.

Or maybe you're talking about the peering guidelines? Hate to break it to you, every CDN is happy to peer with just about anyone with a mutual POP - keeping the transit bill lower is always a win.


Yes, that's what I was talking about. As a small competitor of Netflix, ISPs will laugh me out of the room if I suggest they install one (let alone many) servers in their buildings.


What? Netflix has their own cdn, they talk about it often.


My car runs on gas. I don't own my own network of gas stations, but I still have a gas tank that keeps me going.


>What is Netflix Open Connect? Open Connect is the name of the global network that is responsible for delivering Netflix TV shows and movies to our members world-wide. This type of network is typically referred to as a “Content Delivery Network” or “CDN” because its job is to deliver internet-based content (via HTTP/HTTPS) efficiently by bringing the content that people watch close to where they’re watching it. The Open Connect network shares some characteristics with other CDNs, but also has some important differences.


Encoding is largely super-linear for a single stream, so you just need enough cores for the intake * formats. Streaming is mostly chunking and a smart player that loads the right chunks at the right time. Storage is bottom dollar, use whatever the cheapest disks you've got that you can attach to fiber, then cache the hell out of everything.

So in short, the only "on-demand" component is encoding, and if you don't have an 'available in an instant' promise, you can do it on spot instances on the cheapest cloud you can find; The rest is just storage and distribution - if you own a world-wide network of datacenters for your successful advertising service, that's kinda an already solved problem for you - just allocate a few racks to a new service.

I of course downplay everything and simplify massively - but at a high level, it's just a lot of ffmpeg -> S3 -> html5 player. The harder problems are in the long tail - high latency, content licensing & geo fencing, etc.

Source: used to SRE for a video streaming provider (not YT), also former GG
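
To make the "ffmpeg -> S3 -> html5 player" part concrete, here is a rough Python sketch that only builds the ffmpeg command lines for a small rendition ladder, chunked as HLS so a player can pick the right chunks. The ladder values, segment length and file names are illustrative assumptions, not anything a real provider is known to use, and the upload-to-object-storage step is left out.

```python
from typing import List, Tuple

# Hypothetical rendition ladder: (output height in pixels, target video bitrate).
LADDER: List[Tuple[int, str]] = [(360, "800k"), (720, "2500k"), (1080, "5000k")]


def ffmpeg_hls_command(src: str, height: int, bitrate: str, out_dir: str) -> List[str]:
    """Build (not run) an ffmpeg command producing one HLS rendition."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",       # resize, keeping aspect ratio and an even width
        "-c:v", "libx264", "-b:v", bitrate,
        "-c:a", "aac",
        "-f", "hls",
        "-hls_time", "6",                  # assumed ~6 second chunks
        "-hls_playlist_type", "vod",
        f"{out_dir}/{height}p.m3u8",
    ]


def ladder_commands(src: str, out_dir: str) -> List[List[str]]:
    """One command per rendition: the 'intake * formats' fan-out."""
    return [ffmpeg_hls_command(src, h, b, out_dir) for h, b in LADDER]


if __name__ == "__main__":
    for cmd in ladder_commands("upload.mp4", "out"):
        print(" ".join(cmd))
```

Each command is independent, which is why this kind of job fits a queue of cheap or spot workers: passing a list to `subprocess.run` on any free machine would execute one rendition.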


So basically YouTube is profitable because it strongly synergizes with adwords infra. That sounds both reasonable and not a little unfair.


The value of YouTube isn't purely monetary. Controlling YouTube is extremely advantageous for Google, even if it isn't particularly profitable.

Besides the general advantage of having control over such a massive platform, which definitely plays an important part in the lives of hundreds of millions of people, Google likely views YouTube as important to control. If YouTube were a separate entity, it could e.g. freely choose its ad providers or even provide ads itself, essentially creating competition for Google. Google also has trivial access to the data there and therefore the easiest access if they want to train AI on that data. Last but not least I think Google sees YouTube as vital for their corporate image and their social mission presented in Google Jigsaw.


The AI training aspect for sure, but I think it is also about not giving a competitor the tools to profile users and target ads the way they do.


The economics are simple:

> I know advertisements are a thing for YT, but is it enough?

Yes, it is -- virtually certainly. We can assume YouTube is profitable. It's not broken out directly in quarterly reports, but it doesn't make any sense that Google would still be running it after all these years (almost 20) if it weren't.

But obviously YouTube didn't start out as profitable. You need scale, which provides two things:

1) Marginal storage and streaming costs go down (Google is big enough to save huge amounts of money by running its own data centers, peering agreements, caching near customers, etc.)

2) More advertisers running more ads that can be targeted to more users whose preferences you know more about

So no, you can't run it profitably.

This is a classic example of a business that is only profitable at scale, that needs to lose a lot of money at first as it grows until it achieves scale. And it's not just scale on the traditional tech/users side, it's scale on the advertising side as well -- advertisers aren't going to bother running ads on your platform until you have enough users for them to care.

It's also pretty strongly a "winner-takes-all" network effects situation, where video publishers want to put up their videos where the viewers are, and viewers want to visit the site where all the content is. So if you wanted to create a YT competitor, I don't know how you'd convince content creators to post their videos to your site in addition to YT, or how to convince consumers to watch said videos on your site instead of YT.


Also scale on the supply side. It took time to get to a place where obviously the video should be on Youtube, and that's hard to replicate.

This applies to the content which would exist anyway, and then doubly to content created for Youtube. Grand Pooh Bear would be on Twitch anyway, but it doesn't really make sense for a Tom Scott, let alone "Corrections" which is a Youtube-only addition to Seth Meyers' "Late Night" show.

Likewise until it gets fairly "big" it doesn't make sense to officially put your music videos on Youtube. Today that's basically the main way they're getting seen.


It sure doesn’t feel free. YouTube has cranked the ads up so frickin’ high that I swear they’re quietly in panic mode about profitability.

Every month I notice the temperature of the pot is up a few degrees. This month it’s unskippable 15 second ads before most videos. Last month it was the first search result now being an ad. Before that it was how 5 second ads are now 7 seconds.

If I thought to write them all down I’d have a dozen more steps to share.

My kids now call it “the Bad YouTube” vs. YouTube Kids because the former is flooded with ads.


For most players, streaming is not profitable. Youtube is likely profitable, but since Google doesn’t report separate income, we don’t know how much.

Netflix is.

Disney, peacock, Paramount, Max , etc are not profitable with the hope they can capture future monopoly standing.

Prime Video is likely also not profitable or break even given their studio investments (e.g. MGM, first party content).


> since Google doesn’t report separate income

In 2023, YouTube brought in 10.25% of Google's total ad revenue, totaling $31.5B. That's up from $29.2B in 2022. Alphabet does not, however, report profit of YouTube. https://www.statista.com/statistics/289659/youtube-share-of-...

> Disney, peacock, Paramount, Max , etc are not profitable

Disney is profitable, you can check out last year's financials: https://thewaltdisneycompany.com/the-walt-disney-company-rep...


That’s the revenue


Actually Disney Streaming business was profitable for the first time last quarter - https://variety.com/2024/tv/news/disney-q2-2024-earnings-str...


Let’s see how it goes. CFOs can move a lot of spending around to hit a good quarter.


As someone who works in a large cloud company, there is a lot going on to create a data center. By data center I mean compute and storage. These large companies have perfected the economics and engineering needed to create a data center. They spend billions on R&D. They basically own everything in their supply chain. Smaller companies can't compete.

Additionally, they don't pay the same electricity and water bill that others pay for their data center. They get a discount because they are creating jobs.

Getting streaming to be cost effective starts from decades of R&D investment + getting low cost electricity and water + owning supply chain.


DIY (instead of the cloud) is the answer. If you're pushing terabytes+ from day 1 on a shoestring, you're going to want your own CDN. If you can manage a queueing system and the occasional wait, run your own (or rented physical) hardware for transcoding at as high a utilisation as you can.

Build vs buy pushes you to "build" early on when your margins are slim and your volume is huge.


The economics fundamentally changed a couple of years ago when cloudflare released R2. I don't know if anyone has built a streaming client for it yet, but R2 takes the largest expense (outgoing bandwidth) and zeroes it out.

Your business would be wholly dependent on cloudflare. But if you don't have to pay for bandwidth, the economics aren't that bad.

(I ran a large porn site a lifetime ago, long before cdns were ubiquitous. If I was in the business today, I would absolutely put everything on R2 and make it work no matter how much client development it took)


You are missing the compute aspect. R2 only solves the egress problem. YouTube accepts many formats and tries to store them efficiently such that it makes seeking to different timestamps in the video possible (and efficient). They perfected the art of encoding. Read about it.


I'm sure that infrastructure is valuable and useful, but might not be necessary. We had a simple encoder farm back in my kink.com days and it was a rounding error in our budget. I can only imagine it's cheaper and easier almost 20 years later.


Don't they simply use ffmpeg for that?


It probably costs them a ton of money, but they probably make a ton more so that’s OK.

Also: bandwidth gets a lot cheaper when you own the pipes.

And one that is less obvious: despite having hundreds of millions of videos available, a large contingent of people are watching the same ultra-popular ones. There are some economies of scale to be had there.
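
A toy model of that last point, as a sketch only: if video popularity followed a Zipf distribution (an assumption for illustration; the real distribution is not public as far as I know), the share of views served from a cache holding only the most popular videos would be:

```python
def zipf_hit_ratio(catalogue_size: int, cached: int, exponent: float = 1.0) -> float:
    """Fraction of views that hit a cache holding the `cached` most popular videos.

    Assumes the video at popularity rank r gets views proportional to 1 / r**exponent.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, catalogue_size + 1)]
    return sum(weights[:cached]) / sum(weights)


# Caching the top 1% of a 100,000-video catalogue under this toy model.
print(round(zipf_hit_ratio(100_000, 1_000), 2))
```

Under these assumptions, caching 1% of the catalogue serves a bit over 60% of views, which is the kind of skew that makes edge caches worthwhile. The catalogue size and exponent are made-up inputs.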


The thing that fascinates me isn't the bandwidth part, it's the fact that every video needs to be (to some extent) transcoded. If nothing else, most videos get encoded at several different bitrates and resolutions.

While I highly doubt Youtube is just plopping the videos into a queue and firing up `ffmpeg -i myvid.mp4 -c:v libx264 -y outvid.mp4`, and presumably they have some dedicated silicon (or at least FPGAs) handling it, it's still incredible how they can pull that off. Video conversion is fairly expensive!
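
For what "several different bitrates and resolutions" could look like with plain ffmpeg, here is a minimal sketch. The ladder values and the `ffmpeg_cmd` helper are made up for illustration and are not YouTube's actual settings; the script only builds the command lines, it does not run them.

```python
# Hypothetical rendition ladder: (output height in pixels, target video bitrate).
# These numbers are illustrative assumptions, not anything YouTube publishes.
LADDER = [(1080, "5000k"), (720, "2800k"), (480, "1400k"), (360, "800k")]


def ffmpeg_cmd(src: str, height: int, bitrate: str, preset: str = "veryfast") -> list[str]:
    """Build one ffmpeg invocation that produces a single H.264 rendition."""
    out = f"{src.rsplit('.', 1)[0]}_{height}p.mp4"
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{height}",   # keep aspect ratio, force an even width
        "-c:v", "libx264", "-preset", preset, "-b:v", bitrate,
        "-c:a", "aac", "-b:a", "128k",
        out,
    ]


# One upload turns into one encode job per rung of the ladder.
for height, bitrate in LADDER:
    print(" ".join(ffmpeg_cmd("myvid.mp4", height, bitrate)))
```

Every rung is a full encode of the source, which is why the per-upload compute multiplies with the number of renditions.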


Transcoding isn't that expensive.

Especially when you're already running ginormous data centers and can transcode on processors that are otherwise idle.

Also, video quality on YT is not great. When you're using ffmpeg to transcode a movie at home, you're probably doing it with fairly high quality settings because it's a movie.

But you can also configure ffmpeg to transcode very quickly if low quality output is fine.


Even still, doesn’t YouTube get 300+ hours of video uploaded every minute? Even if you do the ffmpeg veryfast setting I would think it would still take a lot of resources.
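
Back-of-the-envelope version of that, as a sketch: only the 300 hours per minute comes from the figure above. The number of renditions and the encoder speed are assumptions picked purely for illustration.

```python
import math

UPLOAD_HOURS_PER_MINUTE = 300  # figure quoted above


def ingest_realtime_factor(upload_hours_per_minute: float) -> float:
    """Minutes of new video arriving per minute of wall-clock time."""
    return upload_hours_per_minute * 60


def encoders_needed(upload_hours_per_minute: float, renditions: int, speed: float) -> int:
    """Concurrent encode jobs needed to keep up with uploads.

    renditions: output versions per upload (assumed)
    speed: encode speed as a multiple of real time, e.g. 2.0 = twice real time (assumed)
    """
    return math.ceil(ingest_realtime_factor(upload_hours_per_minute) * renditions / speed)


print(ingest_realtime_factor(UPLOAD_HOURS_PER_MINUTE))
print(encoders_needed(UPLOAD_HOURS_PER_MINUTE, renditions=5, speed=2.0))
```

So it is a lot in absolute terms, but the count scales linearly with uploads and drops directly with faster (or hardware) encoders.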


Well sure, but they have a lot of computers... the point is that it's not disproportionately resource-intensive compared to storage and streaming.


Fair, and dedicated ASIC chips are almost certainly used; hell if nothing else I use VAAPI with a regular AMD graphics card on a mini-gaming-pc and get better-than-software performance, and I'm sure a multi-billion dollar corporation can get better hardware than me.


> especially at scale

The marginal costs go down a bit if your scale is truly immense. Google can afford to design/manufacture/deploy hyper-efficient custom silicon ASICs for encoding. Also because their critical mass of users provide valuable network effects, they can get away with particularly poor quality encoding (IMHO) and the vast majority of users still won't switch to other platforms with higher visual quality - but other (non-pornographic) video platforms generally don't have that luxury.


Video streaming is getting cheaper all the time. Bandwidth costs are dropping every year (substantially in most cases) and bitrates aren't keeping up (the few exceptions probably being AppleTV+ and BBC iPlayer which do 30-50mbit/sec 4K HDR streams).

You can do this for so much cheaper than AWS etc price for bandwidth. You can get 100gigE transit from he.net for list price $4500/month. Add probably the same again for colo + hardware (don't need much hardware these days to saturate 100gigE) and you can probably stream videos to 20,000 concurrent users at 5mbit/sec for ~$10k/month.
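
A rough sanity check of those numbers, as a sketch: the 100 Gbit/s link, 5 Mbit/s streams and ~$10k/month are the figures above. The 30-day month and a fully saturated link are simplifying assumptions, and the function names are made up for illustration.

```python
LINK_GBPS = 100            # transit capacity from the comment
STREAM_MBPS = 5            # per-viewer bitrate from the comment
MONTHLY_COST_USD = 10_000  # transit + colo + hardware, rough figure from the comment


def max_concurrent_viewers(link_gbps: int, stream_mbps: int) -> int:
    """Viewers the link can carry at once, ignoring protocol overhead."""
    return (link_gbps * 1000) // stream_mbps


def cost_per_tb(monthly_cost_usd: float, link_gbps: float, days: int = 30) -> float:
    """USD per TB delivered if the link ran flat out all month (assumption)."""
    gigabytes_per_second = link_gbps / 8
    terabytes_per_month = gigabytes_per_second * 86_400 * days / 1000
    return monthly_cost_usd / terabytes_per_month


viewers = max_concurrent_viewers(LINK_GBPS, STREAM_MBPS)
print(viewers)                     # concurrent viewers at line rate
print(MONTHLY_COST_USD / viewers)  # USD per concurrent viewer slot per month
print(round(cost_per_tb(MONTHLY_COST_USD, LINK_GBPS), 2))  # USD per TB at 100% utilisation
```

Real traffic is peaky, so the effective per-TB cost would be higher than the flat-out figure.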

Another way would be to use someone like OVH who offers dedicated servers with 10gigE (supposedly 'guaranteed') for about $800/month each list price, without having to bother with colo and ip transit setup.

Obviously this is highly simplified as you will require encoding resource and storage, but again with someone like OVH you'd be able to spin up a lot of cheap boxes to do this. How much this will cost will depend on how many videos you get and how many views per video etc.

So IMO the actual bandwidth is a bit of a non issue. The far bigger issue is getting users to use your platform (marketing is MUCH more expensive than IP transit) and then having advertisers on your site. This is a much harder problem to solve and where the real barrier to entry is.


I worked on YouTube transcoding about 12 years ago. First, the scale is mostly reused - what’s doing transcoding now is doing a different compute job later. Transcoding was also done for most videos only on idle compute. Second, Google had 300k+ caches around the world, in many surprising places (buses, cruise ships) as well as many thousands of other larger but not full data center locations; get the content as close as possible to the user. (I imagine now all transcoding from the mezzanine format is done in real time on an edge GPU for all but the most popular platforms and content). Tl;dr: build out a huge amount of infrastructure to serve ads very quickly and you can piggyback video serving on that at little marginal cost.


YouTube is vertically integrated.

They're not paying a margin to advertising companies because they are the advertising company, they're not paying a margin to datacenters because they are the datacenter.

The data gleaned from YT views helps them to run search and vice versa.


Google has hosts in just about every ISP data center. They sent out their hardware to spam IT departments for decades.

Other CDNs like Akamai are available, but the cloud business has eaten most colo providers' lunches (a "gray" fiber network can be expensive to hold).

Since Google has other services likely reusing the same internal CDN resources, their stream-traffic routing costs could be an order of magnitude lower than traditional providers.

Just a guess, but live feeds tend to be high-latency for a reason... =)


youtube is the modern nanny and that is the most valuable ad space on the planet


I realized this as I saw many of my favorite gaming Youtubers chase the algorithm and change their content to stay relevant. It all eventually converges on content that children would find entertaining. They're the biggest demographic spending the most time looking at ads. That's where following the algorithm leads.


Gaming is perhaps more child-adjacent than other topics?


Google has huge interconnect and exchange deals and dark fiber all over the place. It’s hard to explain just how much free transit and connect Google has to ISPs, and how preferential their deals are for delivery. They are a MASSIVE pipe of extremely low cost Internet.

Since the main cost of video is streaming and the pipe required to deliver that streaming, they have an insane moat built because of it.

That is why. There are a lot of other explanations, but that is really why.


What’s dark fiber?


It's fiber interconnect between sites that is not actively visible to other people who are routing data around the world. Think of it as being pre-laid but not yet turned on/made available.

See: https://en.wikipedia.org/wiki/Dark_fibre


Thank you! I missed the question and appreciate you providing the info.


> Streaming, encoding, and storage demands enormous costs -- especially at scale

When you look at costs per unit, then it gets cheaper at scale, not more expensive.

For streaming, at scale you can afford to do peering yourself, instead of buying bandwidth.

For encoding, at scale you can afford special purpose encoding hardware, instead of using general purpose hardware.

For storage, at scale you can get cheap bulk deals with drive manufacturers.


>I know advertisements are a thing for YT, but is it enough?

Google uses search revenues to subsidize the cost of YT. It's anti-trust, but the government hasn't had the will to prosecute as of yet.

I think YT is finally self sufficient, but probably still relies on google's existing infrastructure to a large degree to keep costs down.


Video streaming is expensive for you. Google is incredible at making things cheap that we all thought were expensive.


>Google is incredible at making things cheap

As Curly would say, "Eh, a big cheapskate, nyuk nyuk !" ;)

It's true they do have a "cheapening" effect, especially over time, but Curly's a knucklehead, I wouldn't want to compete with them on their own terms. That's a big gorilla.

>If tomorrow I want to start a platform

Seems like one approach would be to start out with what you can easily afford to begin with.

Which brings me exactly to the bare bones of storing, encoding, streaming and nothing else.

If nothing else to minimize complexity and cost of getting started.

And to possibly obviate the need for monetization up to a point.

To launch, just pick one fairly popular & accessible format/bit-rate and encode all your raw content (or a test portion of content) the exact same way in advance. Afterward, you're done with that phase and free of any need for real-time encoding. You still need to store and subsequently "outstream" your ready-to-deliver content.

It may actually cost you nothing to store a working copy of your encoded content "library" on your own private server on your own designated premises, especially if you already own the storage devices and there is plenty of unused storage space. There are also alternatives that are not without cost, only you could decide if it was worth money or not.

Naturally you will be limited by the bandwidth and infrastructure at each storage location, as to how many viewers at what resolution you can directly serve at one time, and whether or not the ISP/router can be configured to allow outside access to your server.

If you're going to use 100Mbps of surplus upload bandwidth from a business internet account for instance, and your content was encoded at 1.5Mbps (don't even think about 4K), it may be no additional cost to start serving viewers directly from that server, but you would not be able to serve more than about 50 viewers at one time.
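
To make that arithmetic explicit, a small sketch: the 100 Mbps uplink and 1.5 Mbps encode are the figures above. The 25% headroom is my assumption to show how the raw number could shrink to "about 50", and `max_viewers` is a made-up helper name.

```python
def max_viewers(uplink_mbps: float, stream_mbps: float, headroom: float = 0.0) -> int:
    """Simultaneous viewers an uplink can serve.

    headroom: fraction of the uplink held back for overhead and other
    traffic (an assumption, not a measured figure).
    """
    usable_mbps = uplink_mbps * (1 - headroom)
    return int(usable_mbps / stream_mbps)


print(max_viewers(100, 1.5))                 # theoretical ceiling
print(max_viewers(100, 1.5, headroom=0.25))  # with a quarter of the link held back
```

The ceiling scales inversely with bitrate, which is why 4K is out of the question on a link like this.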

That might get you started (at an appropriate scale) with no cash outlay whatsoever, and if the demand was there beyond a few dozen viewers then you could decide to pass your stream along to a more capable content delivery network of some kind, at various incremental cost.

Alternatively, the whole thing could be outsourced and hosted for world-wide access in a turnkey operation where all you do is supply the content. Cost may be a prohibiting factor, it does seem like there are hosting plans with a free tier but not with enough bandwidth to serve a meaningful number of viewers compared to YT.

Fortunately for YT, when they got started they didn't have competition already showing 4K stuff to compare to.

But if the action you take has a cost within the range of what you can easily bear, you could then afford to deliver a completely superior, ad-free experience for your fortunate few viewers. If you wanted to. Something a multi-billion-dollar company seems to be less and less able to afford. What a position to be in. If the whole thing actually was costing you no cash at all you'd be free to make it seem as free and frictionless as YT, probably more so because it was free from the ground up.

>a platform that is supported with Advert revenues

If you did decide to go this route and were sustainable without ads to begin with, you could very judiciously choose your sponsors to be ones that did not conflict with any feature that is more meaningful to the visitors. You would also be financially ahead beginning with the first ad you decided to run. And you could decide to stop at any time.


Storage, transcoding/encoding, and any other compute operations (rendering, etc) are small compared to data transfer costs.

At the scale of the largest streaming apps (Disney, Netflix, YouTube, etc) you are moving petabytes of data PER DAY. At that size, you have access to significant savings on CDNs, backbone providers, etc. In many cases the discounts will be 90% - I have seen as high as 99% - or higher off the “list” price (which is usually never paid by anyone anyway).

You also tend to own your own backbone and can link in whichever ISP wherever you want for the “final mile.”

Final note, when you have been doing this long enough, you can start shaping the traffic based off previous patterns. I remember an eBay listing years ago for a Netflix local storage device that was meant to store shows at an ISP’s data center.


I suppose you mean these appliances? https://openconnect.netflix.com/en/


I remember a lot of chatter around Google buying up fiber after many companies had aggressively built out networks in the "broadband" scramble, which then lay unused. So-called "dark fiber". 2005 article:

https://www.cnet.com/tech/tech-industry/google-wants-dark-fi...

Google bought Youtube the next year, 2006. This must have been massively useful for moving video around at lower cost rather than the public networks, which were probably built more for normal web traffic. Then peer with local companies who have the clients that are watching videos.


In December 2017, competing video sharing site Vidme was convinced that YouTube was generously subsidized by the rest of Google https://medium.com/vidme/goodbye-for-now-120b40becafa Google certainly errs on the side of not passing along ad revenues to creators, which could be for brand protection, or could be because YouTube (like most streaming TV) is losing money. They tried to fight YT adblockers and that is also consistent with YT being money-losing or barely profitable.


They have what's called Google cache in every country. In some of these countries, and in many places, ISPs are allowed to connect to the cache for free, including at IXs, which are basically internet exchange hubs around the world. Google has its own fiber, so they do not pay anyone for transit in many places. Additionally, because so many people watch YouTube, ISPs are motivated to connect or peer with Google's caches on their own. This arrangement means they also don't need to pay for traffic. So streaming is extremely cost-effective for YouTube, making it hard for others to compete with them on streaming costs.


Bandwidth/Transport costs for live streaming, especially in group conference call scenarios (where N streams need to be broadcast to N-1 participants) become prohibitively expensive after about 8 or so participants unless you can offload those bandwidth requirements to other places (e.g., like in a peer-to-peer architecture).

How Zoom manages to do this in a client-server fashion and is still financially solvent is also a question I've had for a while, but like others say, discounts on the transport and peering arrangements will be a key part in making those economics work, as compression and storage are relatively solved problems here.


> Bandwidth/Transport costs for live streaming, especially in group conference call scenarios (where N streams need to be broadcast to N-1 participants) become prohibitively expensive after about 8 or so participants unless you can offload those bandwidth requirements to other places (e.g., like in a peer-to-peer architecture).

I doubt they send N-1 full resolution streams to each participant. They probably send only the currently focused stream in full resolution, the unfocused streams in low res, and don't stream any of the non-visible participants.

As you change focus between the streams you can sometimes see as it renders the low res stream briefly until the high res stream is received.
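
A sketch of why that matters for the server's outbound bandwidth. The bitrates (2.5 Mbps focused, 0.25 Mbps thumbnail) and the cap on visible tiles are illustrative assumptions, not Zoom's real numbers, and the function names are made up.

```python
def full_mesh_streams(n: int) -> int:
    """Stream deliveries when each of N participants is sent to the other N-1."""
    return n * (n - 1)


def naive_egress_mbps(n: int, hi_mbps: float) -> float:
    """Server egress if every delivery is a full-resolution stream."""
    return full_mesh_streams(n) * hi_mbps


def selective_egress_mbps(n: int, hi_mbps: float, lo_mbps: float, max_visible: int) -> float:
    """Server egress if each viewer gets one focused stream in high res,
    low-res streams for the other visible tiles, and nothing for hidden ones."""
    shown = min(n - 1, max_visible)
    if shown <= 0:
        return 0.0
    per_viewer = hi_mbps + (shown - 1) * lo_mbps
    return n * per_viewer


print(full_mesh_streams(8))
print(naive_egress_mbps(8, 2.5))
print(selective_egress_mbps(8, 2.5, 0.25, max_visible=7))
```

With these assumed numbers the naive case grows roughly with N squared, while the selective case grows much more slowly once the number of visible tiles is capped.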


YouTube made $30B in revenue last year. 10% of Alphabet’s revenue.

Google services margin is around 30%. Even if YT is burning money they are likely making $5B in profit. They don’t report profit by income streams.

Google was one of the first internet scale companies with likely 10s of millions of servers and fiber that they own around the world.

They are also doing quite well in Cloud. Not as well as Azure and AWS but that division is growing.

At this point, something like Tiktok with a better sticky algorithm is the way to beat Google.

One of my mistakes was thinking Facebook will be crushed by Google.

Plenty of blind spots when you’re a big ship like Google.


It is expensive, and YouTube also makes a lot of money via ads, subscriptions, partnerships. Whether it is ultimately profitable or not is anyone's guess, since they don't report the numbers publicly.


Is there any way that I can pay YouTube so that the videos I post do not have ads when people view them for free? I think there is a clear answer to this question for video, and it’s ~$60/month.


unless I am very mistaken, that is the default behavior. youtube won't show any ads on the videos you post unless you put them there. so you need to be eligible for monetization (have videos that people watch, proven by some metric), and you need to enable it, and you need to explicitly post ads on your videos to generate revenue for yourself (and youtube)


I wish! Unfortunately, YouTube shows ads to my paying and prospective customers who just want to watch instructional content I create about things. I’ve seen them see the ads when doing user testing and watching over their shoulder. I’ve never set up or received any sort of monetization related to my videos. It makes sense they would show ads by default though, since the bandwidth and hosting costs them something…. It’s just that I would like an option to pay in order to improve the experience for my users.


or you can use AdBlockers -- I bet the infra people will love it :p


The problem isn’t for me. The problem is that customers who want to learn about my product via my videos are forced to watch ads if they don’t have an adblocker.


Can't you host the videos yourself or use another managed service (such as Cloudflare Stream—no relation)?


Yes of course that is possible. However there is significant value add from YouTube, comments, subscribe, SEO (YouTube videos often come up in google searches), works on mobile devices offline, etc


Streaming is expensive, but less than you think. If you run your own servers and don't use the cloud, you can serve a million users a month for a few thousand dollars.


And if you use the cloud?


You can make an MVP:

It doesn't support up or down scaling or recoding. Force the user to only upload supported file formats and codecs.

Minimal hosting by forcing viewers to also upload: the WebTorrent protocol is ideal for this.

Why ads at all? Those already exist. This is now an ideal platform to try out micropayments. Look at http://value4value.io for example.

If this starts to roll, only then you scale.


Economies of scale. The variable price for a new startup to stream a TB of video is orders of magnitude higher than for Google at this point and the ad revenue is massive now too. They're 11% of all video watched on a TV, that's ONLY counting TVs, and advertisers know this now and they are treated seriously.


They improve their stack on multiple fronts

1 - They use their own custom hardware encoders, now optimised for their scale (previously they’d use a ton of Intel CPUs for encoding). It is often not possible for smaller startups to have their own custom hardware, but one could set up Xilinx FPGA-based hardware-accelerated encoding servers. Xilinx Alveo boards are popular for media acceleration, and they support both ffmpeg and gstreamer. Nvidia NVENC encoding is decent too, but mostly for the most popular encoding types, and honestly an FPGA-based accelerator is probably the better step up from CPU encoding unless you have GPUs lying around.

2 - Much lower bandwidth costs. A lot of startups build their streaming applications on top of AWS, but that’s insanely expensive due to the pricing of AWS itself.

As a streaming service you could immediately cut down costs by shifting to OVHcloud, Hetzner, etc. They often have dedicated game-server-like offerings which are beneficial for streaming use cases too. That itself should allow you to bring down costs by 10x or even more. (You have to choose the plan carefully though: OVH, Hetzner, etc. say you have unlimited bandwidth on some packages, but there is always a catch or they throttle you later. Be careful about the offering you choose, but otherwise it is a lot better.)

3 - Having colocation hosts with your own servers set up, and each video encoded at different sizes for different devices, also helps: you only send roughly the right video size and quality for the device's screen size and resolution. This gives a lot of additional cost advantage.

4 - Running your own CDN relay service. I’ve run my own CDN service before to cut down on costs. AWS's CDN is quite expensive, and even cheaper CDN services like bunnycdn come at a decent markup. Cloudflare's CDN can be nearly free, yes, but they’ll kick you out the moment you start consuming too much bandwidth. So I hosted my own anycast network with GeoDNS and controlled multiple servers across the globe acting as a CDN, with Ansible playbooks for maintenance and control, and monitored them with my own monitoring stack to keep the CDN service alive and kicking. It worked decently well, honestly. Yes, it required me to learn additional stuff, but that made me even happier. It essentially allowed me to get very cheap rates for a sufficient level of bandwidth, along with additional cost optimisations at the edge: pre-allocating more CDN servers at peak usage, relaying the stream once to each edge server and broadcasting it from there to all local devices snappily, and then bringing down the number of servers once a major stream is over.

That takes care of cost reduction in encoding, bandwidth and services like CDN.

There are 110 other ways to reduce costs, but those are the primary ones you want to invest in. I excluded storage because you can already get cheap enough storage from a lot of managed providers like Wasabi, Backblaze, etc. Moreover, streaming is typically less focused on long-term storage, so short-term storage cost optimisation should be a last resort; using reliable storage is a better idea.

Here’s an additional talk by mike solomon (youtube engineering) on scalability from 12 yrs ago [5] also the written blog version of it is here [6], a lot of it is still relevant, especially for young startups because bigtech youtube scale decisions are often not rational for young startups who have 1-2 years runway and need to make the most out of vc money or heck even bootstrapped haha (I was bootstrapped so price was a big deal for me, hence had to do a lot of stuff custom).

While making decisions, always make sure to ask yourself at each step, “is this move relevant for my business?” At YouTube scale, where their bandwidth costs are in the billions, hiring a $30 million hardware acceleration team makes sense to them; it does not to your startup. Always take tips and hints from large behemoths like YouTube, Instagram, etc., whose devs love showing how they do a lot of things in talks available across the internet, but never forget to ask yourself the critical question of whether it's relevant to you.

A lot of the optimisations I did, like the custom CDN, managing my own storage, and algorithms to allocate edge servers to reduce central load while boosting reliability, lowering latency and reducing costs, do require very enthusiastic developers who love to learn things and are decently competent too. Each optimisation might give you a 5-7% boost, and some big ones will give you 200-500% boosts, but even those small changes add up to a big cost reduction over time. If you get a big bill initially for your streaming service, don't get disheartened: allocate a team that specifically focuses on this and has deep tech knowledge, doing 7%-10% optimisation each month across parts of the stack, and compound interest will help you move mountains.
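
To put a number on the compounding, a tiny sketch: it assumes each month's optimisation cuts the remaining bill by the same percentage, which is a simplification of what happens in practice.

```python
def remaining_cost_fraction(monthly_reduction: float, months: int) -> float:
    """Fraction of the original bill left after compounding monthly cuts."""
    return (1 - monthly_reduction) ** months


for rate in (0.07, 0.10):
    left = remaining_cost_fraction(rate, 12)
    print(f"{rate:.0%} per month for 12 months leaves {left:.1%} of the original bill")
```

Under that assumption, a year of 7% monthly cuts leaves a bit over 40% of the original bill, and 10% monthly cuts leave under 30%.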

Good luck to everyone here who are working on their streaming startups !

[1](https://www.semianalysis.com/p/google-new-custom-silicon-rep...) [5](https://www.youtube.com/watch?v=G-lGCC4KKok) [6](https://highscalability.com/7-years-of-youtube-scalability-l...)


super useful! I wanted some pointers like this. Thank you.


I just realised you’re prolly from Kolkata too, pretty cool !

Good luck with your startup !


Yup, from Kolkata !! This is a pleasant surprise :D


I'm actually always amazed by the scale of YouTube and how performant the entire site seems. Even unpopular videos as old as 10 years ago load in an instant - and I imagine this is across all petabytes of videos.


Have you ever played a video that seems totally unrelated to where you're from or your interests, and it has fewer than 1k views? I bet it might not load right away.


I heard a big reason Google invested so much into chrome was to reduce streaming costs. VP8 I heard saved billions in bandwidth costs…


So bandwidth incurs the high costs. Are we in need of another year 2000 event, where the dot com bubble burst, sending bandwidth costs into the cellar?


Youtube makes more than $30B in advertising alone, and that ignores Youtube premium and on-demand content.

Quite sure they are making lots of money.



Mullvad gives you practically unlimited bandwidth around the world for €5/month.

And Hetzner will sell you 1TB of hot storage on a 1Gbps connection for €3.49/month.

Taking these consumer-level price points, you just need to make more than €4 per user per month to cover your marginal costs. And that's for quite a demanding, intensive user too.


Can't someone use CDNs to deliver the bulk of the video files?


Power is more important than money to their financial backers. So they are willing to lose money to be the premier online video site, which gives them tremendous power over what is censored and what is recommended.


Fingers crossed you are starting a competitor to youtube.


There is PeerTube already.


This seems like a valid and legitimate question, but I can't help but think that this question was posted, in a veiled manner, by Google/Youtube/Alphabet as propaganda.


I know a popular website rolled out their own nginx proxy servers to various low cost providers so that they could run their own cdn. It saved them tons of money.


"If something is free, you are the product."


Great question! YouTube's scale definitely plays a big role. With the massive user base, ad revenue can cover the costs. Plus, Google’s infrastructure helps reduce expenses. For a new platform, the economies of scale and established ad networks make a huge difference.


It's not free. They didn't have a business model for most of their existence.

Google, famously shitty at branding (and that's being kind), paid an obscene $1.6B for YouTube because "Google Video" was a monumental failure. Of course it was: Everyone thought it was just a search engine for video, not something that you would contribute to... any more than an individual "contributed" to Google's search results.

So they rewarded an enterprise that had no business plan.


they lose on the unit item, but make it up on the volume!



