Maybe some are, but I can say from personal experience that most of your traffic, if you're smart, comes out of a CDN. The sites themselves are definitely not that interactive, which makes them simpler to publish. The pages are almost all cached, and that doesn't take much horsepower to serve up. The big video sites have ratings and comments, but they are not that big of a deal. People go to porn sites to watch porn, not to interact. Customer analytics have shown that over and over.
I know of virtually no porn company that handles their own transactions, either. They all go through billing companies that handle things like PCI compliance for them.
Most sites also use a system like NATS to do their affiliate management. You need one that the affiliates trust isn't shaving sign-ups from their account. They tend to trust NATS.
For the data on the backend, you either use a SAN or manage it on a few servers with lots of disks, but if you are really at the 100TB mark, I would think you get a SAN. That's what we did. Sure, it's a lot of space, but they're big files, so managing them isn't that hard.
I'd say the largest issue that a company like YouPorn will have is the amount of data in their working set for a CDN. CDNs generally charge you for the size of your working set that they keep at each POP in their network so you want to keep it as small as possible.
At the end of the day running a large porn network is more about integrating the myriad of partners you need to run the network. The infrastructure is interesting for a while but once you have it working the business of doing deals and handling promotions and figuring out why integration point A isn't working like it should is what keeps you busy.
It's really quite easy to serve large volumes of porn. The dataset doesn't change often and 90% of your working set is the first couple pages of content. Back in 2007 when I was in the business, there was only one CDN that would touch porn (LimeLight) and they were absurdly expensive. Today there are hundreds of porn-friendly CDNs and they charge 1/10th the price (no exaggeration).
Storing a couple hundred terabytes of porn is not expensive or complicated.
Sites like YouPorn don't authenticate their content. Most of the high-volume web page content is static. Even then, you're looking at just a few page views before the user spends 5 minutes watching a video that streams from the CDN.
Payment transactions are handled by third parties, and usually abstracted through third-party affiliate software like NATS. Which, BTW, is a piece of junk and the one part of our system which we had trouble scaling.
Big bandwidth numbers sound impressive but the truth is running even a mildly successful social network with heavily personalized pages is ten times harder than running even the largest porn sites.
Serious misconception: it's just a couple of boxes and two dudes, nothing more. It runs itself. CDN FTW!
And the only thing a CDN will help you with in this case is offloading CSS, images and JS. You can't put that much streaming content up unless you host it yourself or want to spend every penny you have.
This is utter nonsense.
It is the nature of YouPorn's UX that the vast majority of requests are for the first couple pages of data. You don't have to put all the content on the CDN, only the part that represents 80-90% of your traffic. If you have a pull-based CDN you don't even need to plan it; the CDN automatically populates itself with what it considers a reasonable working set.
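To make the pull-based point concrete, here is a toy simulation, not anything a real CDN runs. It assumes LRU eviction (real CDNs use their own policies), a hypothetical 1,000-item catalog, and a made-up request stream where 9 of every 10 requests go to the 20 "front page" items. All names (`PullThroughCache`, `skewed_requests`, `hit_ratio`) are invented for illustration.

```python
from collections import OrderedDict


class PullThroughCache:
    """Toy LRU edge cache that fetches from the origin only on a miss."""

    def __init__(self, capacity, fetch_from_origin):
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = self.fetch_from_origin(key)  # the "pull" from the origin
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used item
        return value


def skewed_requests(total, hot_items, catalog_size):
    """Deterministic stream: 9 of every 10 requests go to the hot items."""
    requests = []
    hot_i = tail_i = 0
    for n in range(total):
        if n % 10 == 9:
            # every tenth request walks through the long tail of the catalog
            requests.append(hot_items + tail_i % (catalog_size - hot_items))
            tail_i += 1
        else:
            requests.append(hot_i % hot_items)
            hot_i += 1
    return requests


def hit_ratio(requests, capacity):
    """Replay the requests through a fresh cache and return hits / total."""
    cache = PullThroughCache(capacity, lambda key: f"bytes-for-{key}")
    for key in requests:
        cache.get(key)
    return cache.hits / (cache.hits + cache.misses)


if __name__ == "__main__":
    stream = skewed_requests(total=10_000, hot_items=20, catalog_size=1_000)
    # The cache holds only 5% of the catalog, yet most requests are hits.
    print(f"hit ratio: {hit_ratio(stream, capacity=50):.3f}")
```

Nobody has to decide which videos go on the edge here: the hot items get pulled once and stay resident because they keep getting requested, while the long tail churns through the remaining slots.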
Updated: I should add, I designed Kink.com's modern porn-serving architecture back in 2007. Prior, it ran off of 20 apache httpd boxes at 365 Main. Now it runs off of a handful of appservers, a couple MySQL boxes, and a lot of CDN capacity... on vastly more traffic. Believe me when I say there's no reason that the bulk of YouPorn's traffic couldn't be served off of one or more CDNs.
This domain resolves out to:
Which is hosted by a CDN company called SwiftWill.
Besides, the article you referenced says that they are using nginx to act as an external engine for static content such as css, js, etc.
I'd imagine that paying extra for shaving a few tens of milliseconds off latency might not really be much of a benefit in this type of business; they are not doing VoIP phone calls. I'd imagine having fat pipes on a decent tier is #1 here.
Distributed CDNs are like the RAID of content serving. Each node can be simpler, cheaper.
Another bonus of using CDNs is that you're in a great negotiating position. If you're serving 80% of traffic through one and 20% through another, you can flip it around the moment one offers to shave a percent or two off the price. I've had people in the sales department of the formerly-80% side notice the traffic drop and suddenly call up with counteroffers. In contrast, getting someone to draw fiber cables across the datacenter usually requires a lot of onetime expense and long-term contracts.
I'd be really curious what kind of CDN deal they're getting.
At regular CDN rates you're looking at ballpark $150k/month for that kind of traffic (rather optimistic extrapolation from my own rates...).
Also the figures remain mind-boggling regardless of how you slice them. 900T/day breaks down to a healthy ~80 Gbit/s average. That's more than most mid-sized datacenter uplinks (plus conveniently ignoring any bell curves they may have).
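For anyone who wants to check that conversion, here is a quick sketch. It assumes "T" means decimal terabytes (10^12 bytes) and a flat 24-hour average with no daily peak; the function name is just for illustration.

```python
def average_gbit_per_s(terabytes_per_day: float) -> float:
    """Convert a daily transfer volume in decimal terabytes to average Gbit/s."""
    bits_per_day = terabytes_per_day * 1e12 * 8  # bytes -> bits
    seconds_per_day = 24 * 60 * 60
    return bits_per_day / seconds_per_day / 1e9  # bits/s -> Gbit/s


if __name__ == "__main__":
    print(f"{average_gbit_per_s(900):.1f} Gbit/s")  # 83.3 Gbit/s
```

So "~80" is, if anything, slightly rounded down, and the peak-hour rate would be higher than this average.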
>Software-wise, most large porn sites will use a very-high-throughput database such as Redis to store and serve videos
No video comes out of a database. Mostly because it can't, but also because it makes no sense to make it come out of a DB.
Why can't you store video in a database? YouPorn says that Redis is its primary data store.
Interested? I could probably make one when I get some time, though my knowledge is currently limited to Chrome.
I for one much prefer the uncensored HN so I can keep up with the latest salacious silicon valley e-gossip.
- Started with single processor Sun SPARCs, which were later replaced by dual and quad core ones (went from 32 to 64 bit early due to file size limitations), along with a collection of Linux boxes from Penguin Computing (remember them?). Most were in the mid-hundreds MHz range, topping out at a blazing 1GHz by the end.
- Apache, mod_perl, MySQL (postgres for one system), later replaced some of the front end code with PHP.
- No CDNs! Akamai was more or less the only game in town and was still unproven/considered too expensive at the time so we did traditional multiple-host setups (things like image1, image2, along with RRDNS for some other bits)
- No really good, well-integrated turnkey billing systems. The ones at the time often took too large a chunk of the revenue or were designed for low volume/were very inflexible. Custom billing code to directly talk to charge processors (we spoke a custom protocol right over UDP to ours). We had a dedicated line to the processor, too, IIRC. Every time a transaction was processed, you got to hear a classic modem-like noise. The hardware on our side was connected to a text terminal (monochrome, orange text).
- In-browser video started out using NPH tricks(!), later used a custom Java applet. Most, however, was served directly to separate client applications. In the days before the YouTubes and Vimeos came along, you had to, yes, have your customers download 3rd party software and then provide support for it.
- RAID 1 under Linux at the time had some ugly bugs which would partially corrupt one of the mirrors, requiring weekly manual rebuilds. I had a script monitoring for corruption which would send an email to this crazy old device called a "pager." The corruption always seemed to occur 15 minutes after I fell asleep, too.
Anyhow, interesting to see just how far things have come. Impressive numbers.
So basically you're trying to sell a $5 cup of coffee to a guy while someone next to you is giving away that same coffee for free + a bagel.
Not to mention most porn website owners are cheap... Doing any type of work in that realm is a pain.
One of the big things is, normally you see a hot girl and you're wondering how she's in bed... After working in porn for a year plus, you're wondering how much you could make off selling her on your site.
It runs on a JBoss/Hibernate stack.
* Not including NATS, which is an abomination.
Hibernate's clustered 2nd-level cache is still pretty magical. It means that the vast majority of web page requests are serviced out of RAM with zero database hits - without writing any special caching code. And it's transactional. For a certain set of scaling problems, this feature is golden.
Hibernate caching can surely be helpful, but you make it sound like a silver bullet and like there are no other approaches. There are plenty without tying yourself to J2EE hell. A little bloom filtering with memcached or redis can work wonders and might be more predictable than an opaque caching layer that can make you very unhappy once your working set exceeds a "magical" threshold (been there, with Hibernate).
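As a rough idea of what "a little bloom filtering" in front of a cache can look like, here is a minimal sketch under stated assumptions. A plain dict stands in for memcached or Redis, `load_from_db` is a hypothetical callable supplied by the caller, and the filter sizes are arbitrary. It is one possible arrangement, not what any particular site runs.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


class FilteredCache:
    """Check the Bloom filter first, then the cache, then the database."""

    def __init__(self, load_from_db):
        self.known_keys = BloomFilter()
        self.cache = {}  # stand-in for memcached or Redis (assumption)
        self.load_from_db = load_from_db  # hypothetical database loader
        self.db_loads = 0

    def register(self, key):
        """Record that a key exists, e.g. when the row is created."""
        self.known_keys.add(key)

    def get(self, key):
        if not self.known_keys.might_contain(key):
            return None  # definitely absent: skip both cache and database
        if key in self.cache:
            return self.cache[key]
        self.db_loads += 1
        value = self.load_from_db(key)
        if value is not None:
            self.cache[key] = value
        return value


if __name__ == "__main__":
    database = {"video:1": "metadata for video 1"}
    store = FilteredCache(database.get)
    store.register("video:1")
    print(store.get("video:1"))    # loaded from the database, then cached
    print(store.get("video:1"))    # served from the cache
    print(store.get("video:999"))  # None
```

The behavior is easy to reason about: a key the filter has never seen costs a few hash computations and nothing else, and everything past that point is ordinary cache-aside logic you can read in one screen.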
My point is that once you end up with a certain level of sophistication and scaling, you create your own hell. Hibernate is fairly refined technology for dealing with this exact situation. Homebrewing your solution is like wandering around in the desert - if you're smart, you'll make it out, if not, you'll be the next Friendster.
And FWIW, Java EE is not hell if you avoid the overengineered pieces like JSF.
Also, you try to stop saying the word hard after working in the adult business... everyone snickers when they know where you've worked. ;-)
Netflix, Hulu, Apple, Flickr, Dropbox, Steam...
I find it disappointing that this list (and the one about bandwidth saying only YouTube or Hulu comes close to Xvideos) are incomplete but they aren't really presented as such.
One of the difficult parts of being in the porn business, aside from the difficulty in getting a bank that will actually process your transactions, is that the way traffic is driven to your business changes annually. The affiliates bear the brunt of that, but you pay them about 50% of the revenue they bring in for it, so it is costly. You're always trying to find ways to bring in traffic yourself, since you don't like giving up 50% of your revenue to an affiliate.
If you use PHP the way it's meant to be used, you are not gonna have any surprises, and it'll run faster than the alternatives (or close to it), with lower development time and an easier time finding developers.
Also, the article is a bit off on some points. A website like Pornhub (100 million+ pageviews/day) is on the most standard stack you could imagine: PHP, Apache, MySQL, Memcached/Redis. Varnish gets mentioned a lot, but when I was working there (not so long ago), it was not in use, and as far as I know YouPorn might be the only one relying on it right now.
If you know what you are doing with PHP, you will have no surprises, no performance issues, and maintenance will be trivial. But sadly I have to admit few PHP developers actually use PHP the way it should be used.
I wouldn't be surprised if they're using PHP for what it was originally meant to do (add a thin layer of dynamic-ness to straight html) and precomputing all of the data it uses in something else.
I can think of some special cases where PHP would be better, especially in a porn site's case -- the most common clicks are front-page links and there are probably a bunch of common keywords and clicks to links off the first page of those searches, which means that caching whole pages is probably economical. As far as I know, both Perl and PHP are identically suited to talking with upstream caching proxies, but PHP might have felt more natural for day-to-day feature development.
I presume there's a lot of legacy code there.
For anyone with a handful of servers PHP is acceptable as the additional cost of 2-3 servers is not really that high.
What's it used for in the porn industry, do you know? I didn't know there'd be much need for IMing.
That should be more like 1.6% if those numbers are correct...
Still an absurd amount of traffic.
>It’s probably not unrealistic to say that porn makes up 30% of the total data transferred across the internet.< If this is the case, is the online porn industry held up as a model of high tech and innovation? I thought I heard somewhere that investors, and VCs in particular, shy away from porn...
"It is a truth universally acknowledged, that a single man in possession of a good fortune must be in want of a wife."
It seems like there could be several intended meanings behind that.
Now I'm really looking forward to the creative things to be done to Google's new "Glass" product.
[...] when you factor in what those porn surfers are actually doing [...]
Ahem, I'd rather not. ;-)
I'm deeply sorry, but couldn't resist...