
Just how big are porn sites? - mrsebastian
http://www.extremetech.com/computing/123929-just-how-big-are-porn-sites
======
SystemOut
_sigh_ The article makes it sound like these sites are doing everything
themselves including pushing the bits.

Maybe some are but I can say from personal experience that most of your
traffic, if you're smart, comes out of a CDN. The sites themselves are
definitely not that interactive which makes them simpler to publish. The pages
are almost all cached and that doesn't take much horsepower to serve up. The
big video sites have ratings and comments but they are not that big of a deal.
People go to porn sites to watch porn, not to interact. Customer analytics
have shown that over and over.

I know of virtually no porn company that handles their own transactions,
either. They all go through billing companies that handle things like PCI
compliance for them.

Most sites also use a system like NATs to do their affiliate management. You
need one that the affiliates trust isn't shaving sign-ups from their account.
They tend to trust NATs.

For the data on the backend you either use a SAN or manage it on a few servers
with lots of disks, but if you are really at the 100TB mark then you get a SAN,
I would think. That's what we did. Sure, it's a lot of space but they're big
files so managing them isn't that hard.

I'd say the largest issue that a company like YouPorn will have is the amount
of data in their working set for a CDN. CDNs generally charge you for the size
of your working set that they keep at each POP in their network so you want to
keep it as small as possible.

At the end of the day running a large porn network is more about integrating
the myriad of partners you need to run the network. The infrastructure is
interesting for a while but once you have it working the business of doing
deals and handling promotions and figuring out why integration point A isn't
working like it should is what keeps you busy.

~~~
tpsreport
For me, this was the dead giveaway that the person writing the article had no
idea what he was writing about:

>Software-wise, most large porn sites will use a very-high-throughput database
such as Redis to store and serve videos

No video comes out of a database. Mostly because it can't, but also because it
makes no sense to make it come out of a DB.

~~~
mrsebastian
Redis can store binary data -- and YouPorn's Redis cluster apparently handles
300K queries per second. Those queries obviously aren't all page views (the
site only peaks at 4000 PVs per second).

Why can't you store video in a database? YouPorn says that Redis is its
primary data store.

------
benologist
I wish HN would cap how many times ExtremeTech, ITWorld, MacObserver etc can
be submitted.... by employees of those sites.

~~~
wyclif
Add TechCrunch, Mashable, TheNextWeb, and PandoDaily to that list. I'm burned
out and fatigued by the middlemen. Seriously, _just link to the fucking
content._

~~~
Groxx
Sounds like a job for an extension, honestly. Something to remove entries
which link to URLs which match a regex.

Interested? I could probably make one when I get some time, though my
knowledge is currently limited to Chrome.
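
A rough sketch of the filtering logic such an extension would need, written in
Python rather than as Chrome extension code. The pattern list and the entry
format (a dict with "title" and "url") are assumptions made up for
illustration, not a vetted blocklist or a real HN data format.

```python
import re

# Hypothetical blocklist: illustrative patterns only.
BLOCKED_URL_PATTERNS = [
    r"^https?://(www\.)?extremetech\.com/",
    r"^https?://(www\.)?techcrunch\.com/",
    r"^https?://(www\.)?mashable\.com/",
]

# Combine the patterns into one regex so each URL is scanned once.
BLOCKED = re.compile(
    "|".join(f"(?:{p})" for p in BLOCKED_URL_PATTERNS),
    re.IGNORECASE,
)


def visible_entries(entries):
    """Return only the entries whose URL matches none of the blocked patterns."""
    return [e for e in entries if not BLOCKED.search(e["url"])]


# Example front-page entries (assumed shape).
entries = [
    {
        "title": "Just how big are porn sites?",
        "url": "http://www.extremetech.com/computing/123929-just-how-big-are-porn-sites",
    },
    {"title": "Symfony", "url": "http://symfony.com"},
]

for entry in visible_entries(entries):
    print(entry["title"], entry["url"])
```

A real extension would apply the same test to each story row in the page and
hide the rows that match.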

~~~
wmf
Spammers (er, social media experts) don't use anti-spam browser extensions.
Let's fix this on the server side.

~~~
corin_
Clearly the plugin he's talking about is for people like you to hide these
submissions from you - not in the hope that spammers will install it
themselves and find themselves unable to submit.

~~~
wmf
Ah, I didn't think about it that way. Then you're back to Usenet killfiles;
just hide the bad stuff (driving away newcomers who haven't learned to
killfile yet) instead of getting rid of it.

------
digisth
What's really interesting to me personally is how porn continues to stay ahead
of, or at least at/near the front of, the pack technology/performance-wise.
Back in the late 1990s and early 2000s I co-ran the technology department at a
very large network of high-traffic adult web sites (I'm not sure exactly where
we would have been in the rankings, but I'd take a wild guess to say it was
top 20, if not top 10.) We were doing streaming video (in Real, QT, and WM) at
a time when still images were the default. Reading comments from
SystemOut and stickfigure reminded me of just how (obviously) primitive
everything seems compared to today, but we still made it work. Some broad
notes from the period:

\- Started with single processor Sun SPARCs, which were later replaced by
dual and quad core ones (went from 32 to 64 bit early due to file size
limitations), along with a collection of Linux boxes from Penguin Computing
(remember them?) Most were in the mid-hundreds MHz range, topping out at a
blazing 1GHz by the end.

\- Apache, mod_perl, MySQL (postgres for one system), later replaced some of
the front end code with PHP.

\- No CDNs! Akamai was more or less the only game in town and was still
unproven/considered too expensive at the time so we did traditional multiple-
host setups (things like image1, image2, along with RRDNS for some other bits)

\- No really good, well-integrated turnkey billing systems. The ones at the
time often took too large a chunk of the revenue or were designed for low
volume/were very inflexible. We wrote custom billing code to talk directly to
charge processors (we spoke a custom protocol right over UDP to ours). We had a
dedicated line to the processor too, IIRC. Every time a transaction was
processed, you got to hear a classic modem-like noise. The hardware on our
side was connected to a text terminal (monochrome, orange text).

\- In-browser video started out using NPH tricks(!), later used a custom Java
applet. Most, however, was served directly to separate client applications. In
the days before the YouTubes and Vimeos came along, you had to, yes, have your
customers download 3rd party software and then provide support for it.

\- RAID 1 under Linux at the time had some ugly bugs which would partially
corrupt one of the mirrors, requiring weekly manual rebuilds. I had a script
monitoring for corruption which would send an email to this crazy old device
called a "pager." The corruption always seemed to occur 15 minutes after I
fell asleep, too.
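
The multiple-host setup mentioned in the notes above (image1, image2) might be
sketched like this. It is only a guess at the general technique: the host names
are hypothetical, and hashing the path to pick a host is an assumption, not a
description of what was actually run at the time. Round-robin DNS happens at
the DNS layer and is not shown.

```python
import zlib

# Hypothetical host names, modeled on the "image1, image2" naming scheme.
IMAGE_HOSTS = ["image1.example.com", "image2.example.com"]


def host_for(path: str) -> str:
    """Pick a host from the path's CRC32 so a given file always maps to one host."""
    index = zlib.crc32(path.encode("utf-8")) % len(IMAGE_HOSTS)
    return IMAGE_HOSTS[index]


def asset_url(path: str) -> str:
    """Build the full URL for an asset, normalizing any leading slash first."""
    path = path.lstrip("/")
    return f"http://{host_for(path)}/{path}"


print(asset_url("/thumbs/0001.jpg"))
```

Keeping the mapping stable matters because a file that moves between host
names gets downloaded again instead of being served from the browser cache.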

Anyhow, interesting to see just how far things have come. Impressive numbers.

------
jetti
This and the article about YouPorn's stack make me really want to go work for
these places. I'm sure that the day to day challenges would be fascinating and
it would be a thrilling technical experience.

~~~
dude_abides
Yep! I wish it was more socially acceptable :)

~~~
jetti
Seriously! I think it wouldn't be hard getting a job afterwards, but the real
challenge would be explaining it to the missus! :)

~~~
SystemOut
Pron doesn't have that much of a negative stigma attached to it when it comes
to the tech industry. At least none that I've experienced. I do admit that I
use the holding company's name on my resume/linked in profile but I always
disclose pretty early on what it was just to make sure. If they have a problem
with it I don't want to work for them anyway.

Also, you try to stop saying the word hard after working in the adult
business....everyone snickers when they know where you've worked. ;-)

~~~
jetti
It isn't the tech industry's view I'd be worried about. It would just get
tiresome having to explain, when telling others what I do, that working at a
porn site doesn't mean I watch porn all day.

~~~
thret
I'd be worried about being the next Saeed Malekpour.

------
furyofantares
> While it obviously varies from site to site, most adult sites will probably
> store in the region of 50 to 200 terabytes of porn. This is quite a lot for
> a website (only something like Google, Facebook, Blogger, or YouTube would
> store more data),

Netflix, Hulu, Apple, Flickr, Dropbox, Steam...

I find it disappointing that this list (and the one about bandwidth saying
only YouTube or Hulu comes close to Xvideos) are incomplete but they aren't
really presented as such.

~~~
TazeTSchnitzel
Steam? Are you sure their catalogue and save files take up that much space?

------
pgrote
Does anyone have real world experience monetizing porn sites on such a large
scale? I am not familiar with the business aspects of it. Is it driven through
affiliate? Direct advertising? Something else?

~~~
mcphilip
The guy who made videobox did an AMA on reddit not too long ago. You may find
some of it relevant to your questions.

[http://www.reddit.com/r/IAmA/comments/gt0l1/i_have_a_cs_degr...](http://www.reddit.com/r/IAmA/comments/gt0l1/i_have_a_cs_degree_from_harvard_and_i_run_a_porn/)

~~~
bdreadz
Guy that runs 4tube as well as a bunch of other high traffic sites did an ama
as well.
[http://www.reddit.com/r/IAmA/comments/htn7x/i_run_porntubeco...](http://www.reddit.com/r/IAmA/comments/htn7x/i_run_porntubecom_4tubecom_among_others_ama/)

------
medinism
Is anyone at all surprised their tech stack is PHP? Is it because of legacy, or
is it what any sensible person moving petabytes of data would use? Or does
it even matter?

~~~
rmc
Lots of massively popular sites use PHP, e.g. Facebook, Wikipedia.

I presume there's a lot of legacy code there.

~~~
ihsw
It was recently (2011) rewritten on top of Symfony2 -- <http://symfony.com>
-- which is, more importantly, a modern, documented, and stable framework. It
was likely done because finding quality PHP programmers is easier than finding
quality Perl programmers.

~~~
culturestate
When you say "it was rewritten...," which are you referring to?

------
commiebob
>To put that 800Gbps figure into perspective, the internet only handles around
half an exabyte of traffic every day, which equates to around 50Tbps — in
other words, a single porn site accounts for almost 25% of the internet’s
total traffic.

That should be more like 1.6% if those numbers are correct...

Still an absurd amount of traffic.
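
A quick check of that arithmetic, using only the figures quoted from the
article. The one assumption is that "exabyte" means the decimal unit (10^18
bytes).

```python
SECONDS_PER_DAY = 24 * 60 * 60

site_bps = 800e9      # 800 Gbps, the article's figure for a single site
daily_bytes = 0.5e18  # half an exabyte per day (decimal exabyte assumed)

# Average internet throughput implied by half an exabyte per day.
internet_bps = daily_bytes * 8 / SECONDS_PER_DAY

share_exact = site_bps / internet_bps  # against the computed average
share_rounded = site_bps / 50e12       # against the article's rounded 50 Tbps

print(f"internet average: {internet_bps / 1e12:.1f} Tbps")  # 46.3 Tbps
print(f"share (computed): {share_exact:.1%}")               # 1.7%
print(f"share (50 Tbps):  {share_rounded:.1%}")             # 1.6%
```

Either way the share comes out under 2%, nowhere near 25%.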

~~~
mrsebastian
Sorry, it should've read 2%. Shift-5 = % -- damn my sticky shift key (I did a
lot of hands-on research in writing this story...)

------
caycep
I heard offhand from an acquaintance who did a Google internship that
Google has to downrank porn sites by several orders of magnitude, otherwise
all that would ever come up in Google searches would be porn...

~~~
luser001
I call bs. Everything I've seen about the link structure of the Internet
indicates that this is false.

~~~
patrickaljord
You are correct, Matt Cutts explains it here:

[http://www.youtube.com/watch?feature=player_embedded&v=D...](http://www.youtube.com/watch?feature=player_embedded&v=Dn87YqI73mY)

------
yaix
But how do they make money? They give all the naked girls and guys away for
free (and the users probably can achieve the purpose of their visit by means
of that free content alone).

------
fourmii
I love how the article ends with >The Internet really is for porn<

>It’s probably not unrealistic to say that porn makes up 30% of the total data
transferred across the internet.< If this is the case, is the online porn
industry held up as a model of high tech and innovation? I thought I heard
somewhere that investors, and VCs in particular, shy away from porn...

~~~
feral
It's interesting that the article starts with a riff on the opening line of
Pride and Prejudice.

"It is a truth universally acknowledged, that a single man in possession of a
good fortune must be in want of a wife."

It seems like there could be several intended meanings behind that.

------
kanchax
On a non-tech note, I read on the internet (yes, aldaily, do not have the link
right now) that 1/3 of Casanova's autobiography was about his affairs with
women. It came to me when the article said that a third of the internet was
dedicated to porn. Conclusion: A third of our lives are dedicated to sex.

------
silentscope
"While it’s difficult domain to penetrate..."

Very punny.

------
NameNickHN
From the article:

 _[...] when you factor in what those porn surfers are actually doing [...]_

Ahem, I'd rather not. ;-)

------
zerostar07
What would the traffic be if these sites were not blocked in a large
part of the world?

------
jsherry
"While it’s difficult domain to penetrate"

I'm deeply sorry, but couldn't resist...

~~~
omgsean
I think that was intentional.

