

OKWS, the Ok Web Server (behind OK Cupid) - whalesalad
http://www.okws.org/doku.php?id=okws

======
aston
I worked at OkCupid for a bit and got to see OKWS in its full glory. Though it
sounds kinda daunting from the documentation, it's actually reasonably
workable. I was able to get a rough app up that talked to a bunch of different
internal services via async RPCs within my first couple days (the first
compile took forever, though...).

One of the coolest parts of the stack was that every function call you made
was non-blocking unless you wrapped it with BLOCK{...}, and within a BLOCK
code that was written sequentially got turned into callback-based async calls
behind the scenes.
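
To make the "sequential code turned into callbacks" idea concrete, here is a minimal sketch in plain C++. It is not tame's syntax or its generated output; `EventLoop`, `async_lookup`, and `load_page` are hypothetical names invented for illustration, and the "RPC" is a stub that replies on a later loop iteration.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical single-threaded event loop: callbacks queue up and run later.
class EventLoop {
public:
    void post(std::function<void()> fn) {
        assert(fn);  // an empty callback would be a programming error
        queue_.push_back(std::move(fn));
    }
    void run() {
        while (!queue_.empty()) {
            std::function<void()> fn = std::move(queue_.front());
            queue_.pop_front();
            fn();  // may post further callbacks
        }
    }
private:
    std::deque<std::function<void()>> queue_;
};

// Hypothetical non-blocking "RPC": returns immediately and delivers the
// reply through the callback on a later loop iteration.
void async_lookup(EventLoop& loop, const std::string& key,
                  std::function<void(std::string)> done) {
    loop.post([key, done] { done("<" + key + ">"); });
}

// What one would like to write sequentially:
//   user  = lookup("user:42");
//   prefs = lookup("prefs:" + user);
//   out   = user + " " + prefs;
// and the nested callback chain a translator could turn it into:
void load_page(EventLoop& loop, std::string& out) {
    async_lookup(loop, "user:42", [&loop, &out](std::string user) {
        async_lookup(loop, "prefs:" + user, [&out, user](std::string prefs) {
            out = user + " " + prefs;
        });
    });
}
```

Calling `load_page(loop, out)` returns before `out` is filled in; the value only appears once `loop.run()` has drained both callbacks.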

The other cool part of the stack was getting core dumps on segfaults. Heh.

------
jrockway
This is a good design. For a project I've been working on recently, I've
organized it as a set of loosely-coupled async applications. That way, I can
run them all in one process (with ZMQ inproc messaging), or run them all in
separate processes (with TCP or Unix sockets), depending on my needs for
reliability (more uptime because I can deploy-and-restart one tiny component
at a time) or memory usage (all in one address space = more sharing). It's
worked quite well; I love being able to stop one component, deploy a new
version, and restart it without affecting the rest of the app or losing data.
It makes "oh, this process seems to be behaving oddly right now" much easier
to work with -- SIGTERM the faulty component, watch it return to normal, debug
without stress, commit fix, run tests, deploy new version, SIGTERM again,
problem fixed. Relaxation driven development, I'll call it.

But what I don't see is how this applies to OkCupid and web servers. To me,
their website looks 99% static, with a web form that sends messages. The
"secret sauce" is all offline background processes that build
recommendations, sends email, etc.; certainly nothing to do with HTTP. And
once your website is nearly static, you stick a cache like Varnish in front of
it, and you can serve pages as fast as your system bus can stream from memory
to the network card. All without writing your own web server, and without
using C++.

That said, it is the nicest C++ app I have ever used.

~~~
there
_To me, their website looks 99% static, with a web form that sends messages._

have you used okcupid?

~~~
jrockway
Yes. It's pretty much the same every time I log in.

------
_ques
Personally, I feel this is the Achilles' heel of an otherwise splendid
website. Their NIH syndrome forces them to implement everything in C++ (
<http://www.okcupid.com/faq> : 200K lines of C++ ). I don't know how to be
agile with a system like that, and it probably explains the rate of feature
growth compared to other social networks (FB, Friendfeed, Quora).

~~~
slig
> I don't know how to be agile with a system like that

I think that they're implementing a lot of stuff in client side javascript
because of that. A few days ago a redditor discovered that there're a lot of
parameters of each profile right there on the page source, like the number of
messages received today, last week and some attractiveness scale.

~~~
jemfinch
Where else would those parameters be than in the page source?

~~~
slig
I wasn't clear: I meant that if you visit another person's profile, you can see
that info there; it is used to display whether the person replies frequently
or not.

------
zokier
"OKWS allows developers to program their Web applications in C++"

An application server for C++. Well, this makes it kinda interesting. Although
lines like

    
    
      okclnt_t *make_newclnt (ptr<ahttpcon> x); 
      okclnt_simple_t (ptr<ahttpcon> x, oksrvc_simple_t *o) : okclnt2_t (x, o), ok_simple (o) {} 

do not look especially appealing.

~~~
megrimlock
Is your objection just to C++ syntax in general (in which case: total
agreement) or something particular about OKWS? It seems pretty self-
explanatory, as far as C++ goes:

    
    
      okclnt_t *make_newclnt (ptr<ahttpcon> x);
    

This declares a function that makes a new client from an http connection. It
returns a raw pointer, so I presume the caller is responsible for the client
lifetime. Whereas the connection argument is some ptr template, which I expect
is a ref-counted or otherwise managed pointer, which suggests the client will
keep the connection object alive. This is copying the ptr value, which if this
is a ref-counting ptr would do an unnecessary extra ref-count during the
function call, so they might want to pass it as a const-ref instead.
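
The extra ref-count is easy to observe. The sketch below uses `std::shared_ptr` as a stand-in for the OKWS `ptr` template (an assumption; the real `ptr` may behave differently), and `Connection`, `count_by_value`, and `count_by_const_ref` are hypothetical names.

```cpp
#include <cassert>
#include <memory>

struct Connection {};  // hypothetical stand-in for ahttpcon

// By value: the parameter is a new owner, so the count is one higher
// for the duration of the call.
long count_by_value(std::shared_ptr<Connection> x) {
    assert(x != nullptr);
    return x.use_count();
}

// By const reference: no new owner is created, so the count is unchanged.
long count_by_const_ref(const std::shared_ptr<Connection>& x) {
    assert(x != nullptr);
    return x.use_count();
}
```

With a `std::shared_ptr` that has a single owner, `count_by_value` reports 2 and `count_by_const_ref` reports 1. If the callee is going to store the pointer anyway, as the client presumably does here, passing by value and moving it into the member costs the same single increment.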

    
    
      okclnt_simple_t (ptr<ahttpcon> x, oksrvc_simple_t *o) : okclnt2_t (x, o), ok_simple (o) {}
    

This is a constructor to make a simple client from an http connection and a
simple service. The only thing unusual is that a _t suffix usually suggests a
typedef, and I don't think you can use a type alias to define a constructor.

I haven't looked at the API or done any web programming; I thought it was
interesting what I could tell from this snippet regardless.

~~~
zokier
I think my biggest gripe is that naming scheme. Is it really worth
abbreviating "client" to "clnt" or "service" to "srvc"? And single-letter
variable names, seriously?

~~~
sedachv
It reads a lot like those medieval manuscripts back when the European
languages didn't have so many vowels in them. Maybe the OKCupid devs should
take another page from the monks and get rid of whitespace as well. After all
<strike>parchment</strike> horizontal screen space is a precious commodity.

------
ibejoeb
I'm not here to stifle innovation, but does a company like this really need to
invent infrastructure technology? I've served some pretty high traffic stuff--
secure web services included--and the current web server offerings have always
gotten me there. Is it really cheaper to invent this stuff rather than buy an
extra server?

~~~
starpilot
OKWS came before OkCupid. It was developed by Max Krohn
(<http://www.okws.org/doku.php?id=okws:publications>) of MIT/Harvard with
DARPA funding (<http://www.okws.org/doku.php?id=okws:sponsorship>).

~~~
hackinthebochs
This comment should really be at the top. It makes 99% of the discussion on
this moot.

------
max_okcupid
I thought I'd answer a few questions raised below:

(1) I've recently made a few updates to the wiki, adding a pointer to the new
subversion repository: svn://svn3.okws.org/okws2/devel/3.1. I'm still actively
checking in fixes and smallish new features, but there won't be any big
changes over the next few months.

(2) OKWS and all services written for OKWS are single-threaded, non-blocking
asynchronous processes. All database calls still go through RPC-to-SQL
translators, as mentioned in the paper. File system I/O goes through
libasync's aiod system: a small blocking helper process does the file I/O, and
the main process communicates with the helper over asynchronous RPC.

(3) The documentation is horrible, I realize. I never quite find the time to
do a good job of updating the wiki, or fully documenting what's there. If
anyone wants to help me on that, please contact me! Variable names often
truncate vowels, true. I'm stuck on 80-column mode and hate line wraps. If you
write OKWS subclasses (as you do when you make new OKWS services), you can add
vowels to taste.

(4) We actually think that given the size of our team (~10 engineers), we get
features out pretty quickly. When things take a while, it's not that we use
C++, it's that either the feature is a challenging technical problem that's
deeper than language choice, or there's a ton of front-end work required
(i.e., compatibility with two mobile apps and two HTML versions of the site).
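
The helper-process arrangement described in (2) can be sketched roughly as follows. This is not libasync's aiod API: it is a POSIX-only illustration in which a plain pipe stands in for the RPC channel, a `usleep` stands in for slow disk I/O, and `Helper`, `spawn_helper`, and `run_until_reply` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <poll.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical handle for one outstanding request to the helper process.
struct Helper {
    pid_t pid;  // child that performs the blocking work
    int fd;     // read end of the pipe carrying the reply
};

// Fork a child that does the slow, blocking part and writes the result to a
// pipe. The parent returns immediately and never blocks on the work itself.
Helper spawn_helper(const std::string& request) {
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);
    (void)rc;
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {                 // child: the blocking helper
        close(fds[0]);
        usleep(50 * 1000);          // stands in for a slow disk read
        const std::string reply = "contents of " + request;
        ssize_t n = write(fds[1], reply.data(), reply.size());
        _exit(n == static_cast<ssize_t>(reply.size()) ? 0 : 1);
    }
    close(fds[1]);                  // parent keeps only the read end
    return Helper{pid, fds[0]};
}

// Main-process side: poll with a short timeout so the loop stays free to
// serve other events; idle_ticks counts the turns where no reply was ready.
std::string run_until_reply(const Helper& h, int& idle_ticks) {
    std::string reply;
    pollfd p;
    p.fd = h.fd;
    p.events = POLLIN;
    p.revents = 0;
    for (;;) {
        int ready = poll(&p, 1, 10);
        if (ready < 0) break;                        // poll failed; give up
        if (ready == 0) { ++idle_ticks; continue; }  // other work would run here
        char buf[256];
        ssize_t n = read(h.fd, buf, sizeof buf);
        if (n <= 0) break;                           // EOF: helper is done
        reply.append(buf, static_cast<std::size_t>(n));
    }
    close(h.fd);
    int status = 0;
    waitpid(h.pid, &status, 0);                      // reap the helper
    return reply;
}
```

The point of the split is that only the child ever sits in a blocking call; the single-threaded main process just sees one more file descriptor become readable.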

As for why we wrote OKWS, and was that a good idea, one important thing to
realize is that the landscape was quite different back in 2003 when we started
OkCupid. Since then, threading on Linux has improved, and multicore is where
the performance gains are. Also, we've seen RoR and Django get big.

So a good question is: if we were starting OkCupid again now, and if OKWS
existed as is, would we choose it over RoR, Django, PHP, etc? Maybe. OKWS has
some really nice features now that make it worth considering, such as: (a) the
tame source-to-source C++ translation system mentioned by aston below. It's a
great way to manage server-side concurrency, and I prefer it over threads.
It's most similar to twisted in Python, but I prefer tame's syntax (perhaps I
am biased). (b) The "pub" templating system. There are of course many HTML
templating systems out there today, but the "pub" system built into OKWS gives
a natural split between front-end and back-end programming tasks; (c)
Performance --- we're still serving tens of thousands of pages a second from a
few dinky web servers. These pages are 99% dynamic! We draw every page from
scratch, more or less. If we wrote OkCupid in Python, we'd need about 10x the
number of machines, and our serving bills would increase. (d) Caching -- the
OKWS architecture allows simple single-process caches, which are really fast
compared to going to memcache or shared memory systems. (e) Security and
robustness --- we're still able to separate code so that one service can
crash, while everything else runs without a problem.
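
As an illustration of point (d), a single-process cache in a single-threaded service can be little more than a hash map, with no locks and no network hop. The sketch below is not OKWS code; `LocalCache` and its loader callback are hypothetical, and a real cache would also need eviction and invalidation.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical in-process cache for a single-threaded service. The loader
// stands in for the expensive path (for example a database RPC).
class LocalCache {
public:
    using Loader = std::function<std::string(const std::string&)>;

    explicit LocalCache(Loader loader) : loader_(std::move(loader)) {
        assert(loader_);  // a cache without a loader cannot fill misses
    }

    // Returns the cached value, calling the loader only on a miss.
    const std::string& get(const std::string& key) {
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            ++misses_;
            it = entries_.emplace(key, loader_(key)).first;
        }
        return it->second;
    }

    int misses() const { return misses_; }

private:
    Loader loader_;
    std::unordered_map<std::string, std::string> entries_;
    int misses_ = 0;
};
```

A second `get` for the same key is a plain in-memory lookup, which is the speed advantage over a round trip to a separate cache server. The trade-off is that each process holds its own copy of the data.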

If you're considering using OKWS, I offer these suggestions: (i) build a good
build system, because it's true, C++ is a slow dog when it comes to big
recompiles; (ii) never hand-manage memory (i.e., don't use new/delete), but
rather, always use reference-counted auto pointers and safe C++ string/buffer
classes; (iii) make sure you can find good C++ developers, they are hard to
find!

------
endian
_"As of 30 March 2010, OKWS is still being maintained and worked on. See our
Release Plan for more details."_

<http://www.okws.org/doku.php?id=okws:releaseplan> isn't very reassuring

Neither is `svn co svn://svn2.okws.org/ok/okws2/devel/3.0 okws-3.0; cd
okws-3.0; svn log | less`...

------
patrickgzill
I don't know enough about their traffic to know how much machine efficiency
would help them... however given that there are always C++ programmers to be
hired, and that each program just does one thing, perhaps it gives them a
layer of isolation and programmer replaceability that makes management happy.

------
nwmcsween
So how does it handle blocking IO? I doubt it's using async IO. If it's
single-process and single-threaded, does the whole program block on IO?

------
dirtyhand
why....

~~~
wccrawford
"Despite its emphasis on security, OKWS shows performance advantages relative
to popular competitors: when servicing fully dynamic, non-disk-bound database
workloads, OKWS's throughput and responsiveness exceed that of Apache, Flash
(the reigning king of Web server performance) and Haboob (an academic system
reputed to be the fastest Java Web server on the block). Commercial experience
with OKWS suggests that the system can reduce hardware and system management
costs, while providing security guarantees absent in current systems."

That's why.

~~~
bradleyland
I'm not sure "hardware and system management costs" are the biggest challenge
a startup like OK Cupid faces. When you look at the cost of development for
something like a web server, you really have to ask if that money wouldn't
have been better spent acquiring customers while building your app on one of
the myriad of excellent server/framework/language stacks available today.

~~~
qaexl
With their concept of "free-standing services" for each endpoint, it sounds
like they took the idea of CGI scripts and compiled the logic against an HTTP
wrapper. It -could- work if the build process spits out separate binaries from
a single source base. I think I'll stick with using elastic computing to scale
horizontally and spend resources on customer acquisition instead.

------
kennywinker
Just ok? why not great?

