
Why did OKCupid write their own web server? - mhp
http://answers.onstartups.com/questions/21323/why-would-okcupid-write-their-own-web-server
======
tseabrooks
My only real complaint about the nature of HN is the (as perceived by me)
strong anti-C++ bent, like the ones in these comments, from what seems like
people who don't have lots of experience using it in the work place. Am I
imagining this anti-C++ bias? Am I wrong and all the people bashing C++ have
tons of C++ experience?

I'll be the first to point out the numerous flaws C++ has but it just feels
like folly to make fun of the ugly chick at the party without realizing
everyone at the party is covered in warts... (This is a metaphor for all
programming languages having problems)

~~~
tptacek
(I'm a recovering C++ dev).

C++ has problems unique to C++: a deceptive illusion of abstraction (what
Spolsky would call "leaky" abstractions) makes a bunch of idioms in the
language dangerous, including virtually all "smart" pointers, iterators,
exceptions, allocators, and arrays. It is uniquely difficult to write reliable
code in C++.

Add to that the superficial but more practical and common complaints against
C++: the ghastly compile times, the header file dependency hell that forces
every class into a kabuki dance of "pImpls" and nested classes, the error
messages rendered in ancient Sumerian... you _know_ I could go on, but you get
the idea.

C++ is just not a very good language. C is a fine language. If you are
building a system for which C's abstractions are inadequate --- and I'll
stipulate that such systems exist --- you're better off with C and a very good
glue language (Lisp and Lua are two good, popular choices) than you are with a
uniform C++ implementation.

~~~
TillE
> C++ is just not a very good language. C is a fine language.

This is the attitude that I really, truly do not get.

If I want a linked list or any other common data structure in C++, I include
the appropriate STL header and declare an object with the template syntax.
It's two easy lines of code. If I want a data structure in C, I have to write
it myself. From scratch. Every time.
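
As a minimal sketch of the "two easy lines" claim, assuming the standard `std::list` container is what is meant by the STL linked list (the function name `make_frobs` and the element values are made up for illustration):

```cpp
#include <list>
#include <string>

// Build a doubly linked list of strings using only the standard library.
// The name make_frobs and the values are hypothetical, for illustration.
std::list<std::string> make_frobs() {
    std::list<std::string> frobs;  // the include plus this declaration are the "two lines"
    frobs.push_back("bar");        // append at the tail
    frobs.push_back("baz");
    frobs.push_front("foo");       // constant-time insertion at the head
    return frobs;
}
```

Calling `make_frobs()` should give a three-element list with "foo" at the front and "baz" at the back, with no hand-written node or pointer code.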

C++ is a powerful high-level language. C is a low-level language where
everything is explicit; I know this is "elegant" to many programmers, but
really, I'm just trying to _get things done_ , and C++ has served me far
better in that practical role than C ever could.

I don't love C++; it has tons of flaws (many of which disappear if you use
shared_ptr for everything). But I _loathe_ the tedium of working in C.

~~~
nostrademons
The comparison isn't between straight-C vs. straight C++. It's between C with
a good glue language (say, Python or Lua) vs. straight C++.

If I want a linked list in Python, I do:

    
    
      frobs = ['foo', 'bar', 'baz']
    

The equivalent C++ is:

    
    
      vector<string> frobs;
      frobs.push_back("foo");
      frobs.push_back("bar");
      frobs.push_back("baz");
    

The reason people like these multi-language solutions is that Python or Lua is
_much_ , much more productive than C++ (as in, many times more productive),
and yet most programs have only a very small core that needs to run fast.
You can write that small core in heavily-optimized C, making it quite a bit
faster than kludged-together C++, and then write everything else in Python.
And the total complexity of the system is _still_ much less than if you'd just
used C++ to start.

~~~
starpilot
Noob here. Where does ObjC fit into all this? I know it's primarily used in
Mac development, but is there any language-specific reason it's excluded from
these debates?

~~~
wisty
C is a good statically-typed language, but it's a bit basic.

ObjC is a good statically-typed language.

C++ is half a dozen good statically-typed languages, all fighting for
supremacy like a ham actor's "multiple-personality-disorder" shtick.

~~~
program
ObjC isn't statically-typed. Static typing information only allows the
compiler to warn the programmer about type mismatches. From the official docs:

"Statically typed objects have the same internal data structures as objects
declared to be of type id. The type doesn't affect the object; it affects only
the amount of information given to the compiler about the object and the
amount of information available to those reading the source code."

------
ErrantX
The question is phrased very much in the now, e.g. _Isn't the technology
stack basically a commodity at this point?_ and

And the answer highlights why you should never retrospectively "judge" design
choices several years after the fact.

------
cheez
I've had an opportunity to chat with some people tangentially related to OKC
and I think that they have definitely done some very cool stuff. There are
some things that the OKWS architecture does very well. As I understand it,
there is a bit of "Rails envy" but they seem to copy good ideas very quickly.
That being said, I think that if they were to start again, they would try and
use commodity technology.

But, these guys are really fucking brilliant and productive. Immensely... I
feel like a chump in comparison.

~~~
rubashov
Well these days writing an http server in C++ is about 30 lines of code using
POCO or boost.asio. If your business model was "serve a shit-ton of dynamic
requests super cheap" it might make just as much sense as ever to build the
whole thing in C++ embedded directly in a C++ webserver.

~~~
cheez
If only writing a web app was that simple. ASIO is simply a socket library.

~~~
rubashov
It comes with an http server implementation. POCO has a more full featured
one.

~~~
cheez
It does? Wow, I haven't looked at it in a while then... Can you point me to a
link?

------
ibejoeb
We've actually had this conversation before. It sounds ridiculous, but the
story is actually a little different: OKWS was built prior to OKCupid, and
they inherited it. See <http://pdos.csail.mit.edu/~max/docs/okws.pdf>.

~~~
mhp
That doesn't sound like the answer the founder of OKCupid posted here:
[http://answers.onstartups.com/questions/21323/why-would-
okcu...](http://answers.onstartups.com/questions/21323/why-would-okcupid-
write-their-own-web-server/21377#21377)

~~~
nostrademons
I could swear I remember, back when they were TheSpark, them claiming that
TheSpark ran on a custom webserver written in C++. That would be consistent
with the idea that OKWS was done as Max's Harvard thesis project and then
inherited by TheSpark and OKCupid later.

There are a variety of reasons why startup founders may want to bend the truth
with their public statements.

------
gdulli
OKCupid is the only web site I can think of that has a regular, not
occasional, pattern of being very slow and simply not responding to an unusual
number of requests. Hitting F5 to reload a page just to get it to show up
instead of the Firefox server unavailable message is a regular part of my
usage of the site.

Even though all sites have bugs, broken links, what have you, I don't know any
other site that's given me such an expectation that it will be unresponsive
for a significant number of page views for any given session over a long term
period. Even the sites that started development circa 2003.

~~~
maxtaco
We rarely get complaints such as this one. Would you mind helping me debug it?

~~~
LaGrange
I get similar issues (long page loads, extremely large lag in chat and
questions), while via 3G in Poland. It's not solely 3G's fault, because many
other sites (including FB chat) work ok, but it's clearly triggered by a
specific environment.

With landline connections OKC works very well :-)

~~~
rdl
Every time I had problems with okc it was due to broken transparent caching on
wireless, cellular, or satellite networks.

------
dustingetz
_"general rpc servers for solving specific problems using in memory data
structures (e.g., who qualifies for a match search given dozens of constraints
and millions of users; what your match score is with 10,000 qualifying people,
given you've all answered hundreds of different questions each on average) ...
Great tech is available now, serving is cheaper, and you probably don't have
the computational workload OkCupid does."_

I know people who have used OKC before. OKC users in my social class (male,
white, educated) ignore the match percentages, because the SNR is really low.
They just plow through all the search results of people to find good pictures
and interesting profiles.

So, I'd speculate that match-percentages are a marketing thing, and that they
know they made a weak business decision which required lots of computation and
now they're stuck with it.

I'm probably wrong. Maybe the long-tail users pay attention to
match-percentage.

~~~
neild
I signed up for OKC, lurked for a while, and then sent a message to the person
at the top of my match list.

We're still together, five years later.

I'm probably an outlier, but hey--match percentage works some of the time!

------
tobias3
If you have a small team and everybody has a lot of C++ experience, you can
pull this off. Otherwise one person who doesn't have the discipline to do the
manual memory management right can crash the whole server. Don't try it at
home ;) use a VM language instead, which can recover from such errors.

(I wrote some C++ webapps myself)

~~~
maxtaco
Manual memory management is for the birds. Use something like Boost's shared
pointers and you never need to worry about it.

~~~
Peaker
That's not entirely honest.

A) You still have to worry about it when interfacing with libraries that use
plain pointers

B) Shared pointers incur runtime penalties (larger data pushes things off the
cache, spurious inc/dec-refs messing with said cache). If you don't care about
that, why do you use a language like C++ in the first place?

C) Reference counting is a poor form of automatic memory management, you still
have to worry about cycles, and use weak references or such to break the
cycles.
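
Point C can be sketched roughly as follows. This assumes `std::shared_ptr` and `std::weak_ptr` from the C++11 standard library rather than the Boost versions mentioned above; the `Node` type, the `live_nodes` counter, and both function names are hypothetical and exist only to make the leak observable.

```cpp
#include <memory>

// Counts Node objects that have been constructed but not yet destroyed.
static int live_nodes = 0;

struct Node {
    std::shared_ptr<Node> next;  // owning link: keeps the target alive
    std::weak_ptr<Node> prev;    // non-owning link: does not keep the target alive
    Node() { ++live_nodes; }
    ~Node() { --live_nodes; }
};

// Two nodes own each other, so neither reference count can reach zero.
// Returns how many nodes are still alive after the local handles are gone.
int leak_with_cycle() {
    int before = live_nodes;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->next = a;  // owning cycle: a -> b -> a
    }
    return live_nodes - before;
}

// Same shape, but the back link is weak, so the cycle is broken.
int no_leak_with_weak() {
    int before = live_nodes;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->prev = a;  // weak back reference adds no ownership
    }
    return live_nodes - before;
}
```

In this sketch `leak_with_cycle()` leaves both nodes allocated after the local handles go out of scope, while `no_leak_with_weak()` frees both, which is the manual cycle-breaking the comment describes.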

------
Johngibb
I'd think they'd begin transitioning to a higher level language now that there
are many options available. It's gotta be a burden at this point to be (1)
maintaining their own web server and (2) developing new features in C++. I'd
way rather use ruby/python (or even .Net) and fall to C++ for the really
performance intensive stuff.

(Disclaimer: I interviewed @ OKCupid in 2007)

------
jacques_chester
I remember this coming up at reddit a few months ago[1]. At the time I
downloaded and read the paper on the design[2]. It all made sense to me
because my own thinking had been heading in the same direction.

OKWS is less _a_ web server than it is an _architecture_ of servers. It's the
difference between sendmail and qmail/postfix.

It has nice security and performance properties because each service is run as
a separate user, with a separate process. Logging is handled by an independent
daemon. Request demultiplexing is handled by a simple daemon that binds to
port 80. Actual HTTP parsing is handled by a shared library that services link
to.
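
To make the demultiplexing idea concrete, here is a toy sketch. It is not OKWS code: the routing-by-first-path-segment rule, the `pick_service` function, and the service names are all assumptions for illustration, and a plain string stands in for what would be a separate process running as its own user.

```cpp
#include <map>
#include <string>

// Hypothetical routing table lookup: first path segment -> service name.
// In the design described above each service is a separate process;
// here the returned name merely stands in for that process.
std::string pick_service(const std::map<std::string, std::string>& routes,
                         const std::string& path) {
    // Skip a leading '/', then take everything up to the next '/' (or the end).
    std::string::size_type start = (!path.empty() && path[0] == '/') ? 1 : 0;
    std::string::size_type end = path.find('/', start);
    std::string key = (end == std::string::npos)
                          ? path.substr(start)
                          : path.substr(start, end - start);

    auto it = routes.find(key);
    return it == routes.end() ? "not-found" : it->second;
}
```

The point of keeping this step so small is that the only component bound to port 80 does almost nothing besides choosing a destination, so a bug in one service stays inside that service's process and user account.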

[1]
[http://www.reddit.com/r/programming/comments/exkk3/ok_webser...](http://www.reddit.com/r/programming/comments/exkk3/ok_webserver_from_ok_cupid_the_devs_at_a_dating/)
[2] <http://pdos.csail.mit.edu/~max/docs/okws.pdf>

------
itsnotvalid
Any language could possibly be made to work. They did make it work with
things like SFSLite[1], which looks like what coroutines or fibers (actually
Stratified JavaScript, but that is not something common) would solve for async
callbacks. However, those act as extensions to the core language.

One of the biggest problems C++ has is that the core language has too much
stuff but still lacks things that people really want to use. It's certainly
workable, and the results are fast since it compiles very well. However,
'workable' does not mean 'a pleasure to work with'.

[1]: <http://www.okws.org/doku.php?id=sfslite:tame2:tutorial> ,
<http://www.okws.org/doku.php?id=sfslite>

------
guelo
I imagine one big downside of this custom stack is that they will have a hell
of a time doing any sort of integration with Match.com.

Hiring and training is also probably more difficult, though that has got to be
a huge boon to OKCupid engineers since Match cannot afford to lose them.

------
mjs
Ha, it gets better: the web server "was partially funded by the DARPA
Composable High Assurance Trusted Systems program."

<http://www.okws.org/doku.php?id=okws:sponsorship>

~~~
mkjones
Wait, the OKWS web site runs on apache?

~~~
starpilot
That was explained in a FAQ previously on the site. OKWS is meant for serving
highly dynamic content, while they feel Apache is better at static and more
modest dynamic content. Their original philosophy was that segregating dynamic
and static serving improved security, stability, and speed, though that's
mostly been nullified by cheaper hosting.

