OKWS, the Ok Web Server (behind OK Cupid) (okws.org)
76 points by whalesalad on Jan 6, 2011 | 33 comments

I worked at OkCupid for a bit and got to see OKWS in its full glory. Though it sounds kinda daunting from the documentation, it's actually reasonably workable. I was able to get a rough app up that talked to a bunch of different internal services via async RPCs within my first couple days (the first compile took forever, though...).

One of the coolest parts of the stack was that every function call you made was non-blocking unless you wrapped it with BLOCK{...}, and within a BLOCK, code that was written sequentially got turned into callback-based async calls behind the scenes.
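
To make the contrast concrete, here is a rough sketch of the hand-written callback style that a construct like BLOCK saves you from writing. It is plain C++ with a toy event loop, not OKWS or tame code: `EventLoop`, `fetch_user`, `fetch_greeting`, `greet`, and `demo_greet` are all made-up names for illustration, and the "RPCs" just post a canned answer to the loop.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Toy single-threaded event loop: callbacks are queued and run later.
class EventLoop {
public:
    void post(std::function<void()> fn) { queue_.push_back(std::move(fn)); }
    void run() {
        while (!queue_.empty()) {
            auto fn = std::move(queue_.front());
            queue_.pop_front();
            fn();  // may post more work
        }
    }
private:
    std::deque<std::function<void()>> queue_;
};

// Hypothetical async "RPCs": each returns at once and delivers its
// result through a callback on a later turn of the loop.
void fetch_user(EventLoop& loop, int id,
                std::function<void(std::string)> cb) {
    loop.post([id, cb] { cb("user" + std::to_string(id)); });
}

void fetch_greeting(EventLoop& loop, const std::string& name,
                    std::function<void(std::string)> cb) {
    loop.post([name, cb] { cb("hello, " + name); });
}

// Sequential intent:
//   name = fetch_user(id); greeting = fetch_greeting(name); done(greeting);
// Written by hand as nested callbacks:
void greet(EventLoop& loop, int id,
           std::function<void(std::string)> done) {
    fetch_user(loop, id, [&loop, done](std::string name) {
        fetch_greeting(loop, name, [done](std::string greeting) {
            done(greeting);
        });
    });
}

// Small driver: nothing happens until the loop runs.
std::string demo_greet() {
    EventLoop loop;
    std::string result;
    greet(loop, 7, [&result](std::string g) { result = g; });
    assert(result.empty());  // greet() returned without blocking
    loop.run();
    return result;
}
```

The point is only the shape: `greet` returns immediately and the result shows up after two turns of the loop. A source-to-source translator lets you write the three-step version in the comment and generates something like the nested form for you.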

The other cool part of the stack was getting core dumps on segfaults. Heh.

This is a good design. For a project I've been working on recently, I've organized it as a set of loosely-coupled async applications. That way, I can run them all in one process (with ZMQ inproc messaging), or run them all in separate processes (with TCP or Unix sockets), depending on my needs for reliability (more uptime because I can deploy-and-restart one tiny component at a time) or memory usage (all in one address space = more sharing). It's worked quite well; I love being able to stop one component, deploy a new version, and restart it without affecting the rest of the app or losing data. It makes "oh, this process seems to be behaving oddly right now" much easier to work with -- SIGTERM the faulty component, watch it return to normal, debug without stress, commit fix, run tests, deploy new version, SIGTERM again, problem fixed. Relaxation driven development, I'll call it.

But what I don't see is how this applies to OkCupid and web servers. To me, their website looks 99% static, with a web form that sends messages. The "secret sauce" is all offline background processes that build recommendations, send email, etc.; certainly nothing to do with HTTP. And once your website is nearly static, you stick a cache like Varnish in front of it, and you can serve pages as fast as your system bus can stream from memory to the network card. All without writing your own web server, and without using C++.

That said, it is the nicest C++ app I have ever used.

> To me, their website looks 99% static, with a web form that sends messages.

Have you used OkCupid?

Yes. It's pretty much the same every time I log in.

Personally, I feel this is the Achilles' heel of an otherwise splendid website. Their NIH syndrome forces them to implement everything in C++ (http://www.okcupid.com/faq: 200K lines of C++). I don't know how to be agile with a system like that, and it probably explains the rate of feature growth compared to other social networks (FB, Friendfeed, Quora).

Either they've completely documented the stack and API, so that new employees can get up to speed quickly and the business stays viable if their lead developers get hit by a bus, in which case they've spent lots of valuable resources working on documentation.

Or their technology is poorly documented and basically an oral tradition. God help them if their lead developers get hit by a bus.

> I don't know how to be agile with a system like that

I think that they're implementing a lot of stuff in client-side JavaScript because of that. A few days ago a redditor discovered that there are a lot of parameters for each profile right there in the page source, like the number of messages received today and last week, and some attractiveness scale.

Where else would those parameters be than in the page source?

I wasn't clear: I meant that if you visit other people's profiles, you can see that info there; it is used to display whether the person replies frequently or not.

Ajax call? It's supposed to be private data.

Agility is just getting your process right. C++ can be extremely agile with the right team and the right process. We're living proof.

(My company sells high performance C++ post-modern databases and we've been able to pivot really quickly)

I guess it only makes sense that "post-modern databases" are written in C++. Meta me this!

We're not big fans of the "NoSQL" term. ;)

"OKWS allows developers to program their Web applications in C++"

An application server for C++. Well, this makes it kinda interesting. Although lines like

  okclnt_t *make_newclnt (ptr<ahttpcon> x);
  okclnt_simple_t (ptr<ahttpcon> x, oksrvc_simple_t *o) : okclnt2_t (x, o), ok_simple (o) {}

do not look especially appealing.

Is your objection just to C++ syntax in general (in which case: total agreement) or something particular about OKWS? It seems pretty self-explanatory, as far as C++ goes:

  okclnt_t *make_newclnt (ptr<ahttpcon> x);

This declares a function that makes a new client from an http connection. It returns a raw pointer, so I presume the caller is responsible for the client lifetime. The connection argument, by contrast, is some ptr template, which I expect is a ref-counted or otherwise managed pointer, which suggests the client will keep the connection object alive. This copies the ptr value, which, if this is a ref-counting ptr, would cause an unnecessary extra ref-count increment and decrement around the function call, so they might want to pass it as a const-ref instead.
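
A quick way to see the by-value versus const-ref difference, sketched with `std::shared_ptr` as a stand-in. Whether OKWS's `ptr` behaves the same way is an assumption here, and `Connection`, `count_by_value`, `count_by_const_ref`, and `demo_counts` are made-up names, not part of OKWS.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an HTTP connection object.
struct Connection {};

// By value: the shared_ptr is copied into the parameter, so the
// reference count is one higher for the duration of the call.
long count_by_value(std::shared_ptr<Connection> x) {
    return x.use_count();
}

// By const reference: no copy is made, so the count is unchanged.
long count_by_const_ref(const std::shared_ptr<Connection>& x) {
    return x.use_count();
}

// Small driver that checks the counts seen inside each call.
void demo_counts() {
    auto conn = std::make_shared<Connection>();
    assert(count_by_value(conn) == 2);      // caller's copy + parameter
    assert(count_by_const_ref(conn) == 1);  // caller's copy only
    assert(conn.use_count() == 1);          // back to one after the calls
}
```

If the callee is going to store the pointer anyway, passing by value and moving it into place is also reasonable; the const-ref form only wins when the callee merely looks at the object.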

  okclnt_simple_t (ptr<ahttpcon> x, oksrvc_simple_t *o) : okclnt2_t (x, o), ok_simple (o) {}

This is a constructor to make a simple client from an http connection and a simple service. The only thing unusual is that a _t suffix usually suggests a typedef, and I don't think you can use a type alias to define a constructor.

I haven't looked at the API or done any web programming; I thought it was interesting what I could tell from this snippet regardless.

I think my biggest gripe is that naming scheme. Is it really worth abbreviating "client" to "clnt" or "service" to "srvc"? And single-letter variable names, seriously?

It reads a lot like those medieval manuscripts back when the European languages didn't have so many vowels in them. Maybe the OKCupid devs should take another page from the monks and get rid of whitespace as well. After all <strike>parchment</strike> horizontal screen space is a precious commodity.

I'm not here to stifle innovation, but does a company like this really need to invent infrastructure technology? I've served some pretty high traffic stuff--secure web services included--and the current web server offerings have always gotten me there. Is it really cheaper to invent this stuff rather than buy an extra server?

OKWS came before OkCupid. It was developed by Max Krohn (http://www.okws.org/doku.php?id=okws:publications) of MIT/Harvard with DARPA funding (http://www.okws.org/doku.php?id=okws:sponsorship).

This comment should really be at the top. It makes 99% of the discussion on this moot.

"As of 30 March 2010, OKWS is still being maintained and worked on. See our Release Plan for more details."

http://www.okws.org/doku.php?id=okws:releaseplan isn't very reassuring

Neither is `svn co svn://svn2.okws.org/ok/okws2/devel/3.0 okws-3.0; cd okws-3.0; svn log | less`...

I don't know enough about their traffic to know how much machine efficiency would help them... however given that there are always C++ programmers to be hired, and that each program just does one thing, perhaps it gives them a layer of isolation and programmer replaceability that makes management happy.

So how does it handle blocking I/O? I doubt it's using async I/O. If it's single-process and single-threaded, does the whole program block on I/O?

I thought I'd answer a few questions raised below:

(1) I've recently made a few updates to the wiki, adding a pointer to the new subversion repository: svn://svn3.okws.org/okws2/devel/3.1. I'm still actively checking in fixes and smallish new features, but there won't be any big changes over the next few months.

(2) OKWS and all services written for OKWS are single-threaded, non-blocking asynchronous processes. All database calls still go through RPC-to-SQL translators, as mentioned in the paper. File system I/O goes through libasync's aiod system: a small blocking helper process does the file I/O, and the main process communicates with the helper over asynchronous RPC.

(3) The documentation is horrible, I realize. I never quite find the time to do a good job of updating the wiki, or fully documenting what's there. If anyone wants to help me on that, please contact me! Variable names often truncate vowels, true. I'm stuck on 80-column mode and hate line wraps. If you write OKWS subclasses (as you do when you make new OKWS services), you can add vowels to taste.

(4) We actually think that given the size of our team (~10 engineers), we get features out pretty quickly. When things take a while, it's not that we use C++, it's that either the feature is a challenging technical problem that's deeper than language choice, or there's a ton of front-end work required (i.e., compatibility with two mobile apps and two HTML versions of the site).

As for why we wrote OKWS, and whether that was a good idea, one important thing to realize is that the landscape was quite different back in 2003 when we started OkCupid. Since then, threading on Linux has improved, and multicore is where the performance gains are. Also, we've seen RoR and Django get big.

So a good question is: if we were starting OkCupid again now, and if OKWS existed as is, would we choose it over RoR, Django, PHP, etc.? Maybe. OKWS has some really nice features now that make it worth considering, such as: (a) the tame source-to-source C++ translation system mentioned by aston below. It's a great way to manage server-side concurrency, and I prefer it over threads. It's most similar to Twisted in Python, but I prefer tame's syntax (perhaps I am biased). (b) The "pub" templating system. There are of course many HTML templating systems out there today, but the "pub" system built into OKWS gives a natural split between front-end and back-end programming tasks; (c) Performance --- we're still serving tens of thousands of pages a second from a few dinky web servers. These pages are 99% dynamic! We draw every page from scratch, more or less. If we wrote OkCupid in Python, we'd need about 10x the number of machines, and our serving bills would increase. (d) Caching -- the OKWS architecture allows simple single-process caches, which are really fast compared to going to memcache or shared memory systems. (e) Security and robustness --- we're still able to separate code so that one service can crash, while everything else runs without a problem.
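
For point (d), a minimal sketch of what a single-process cache can look like when the service is single-threaded: a plain in-memory map with no locks and no network hop. This is not OKWS code; `LocalCache`, its `get` and `misses` members, and `demo_cache_loads` are hypothetical names, and the loader stands in for whatever back-end call a real service would make.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>

// In-process cache for a single-threaded service: just a hash map.
// No locking is needed because only one thread ever touches it.
class LocalCache {
public:
    using Loader = std::function<std::string(const std::string&)>;

    // Returns the cached value, calling `loader` only on a miss.
    const std::string& get(const std::string& key, const Loader& loader) {
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            ++misses_;
            it = entries_.emplace(key, loader(key)).first;
        }
        return it->second;
    }

    int misses() const { return misses_; }

private:
    std::unordered_map<std::string, std::string> entries_;
    int misses_ = 0;
};

// Small driver: two lookups of the same key hit the loader once.
int demo_cache_loads() {
    LocalCache cache;
    int loads = 0;
    auto loader = [&loads](const std::string& key) {
        ++loads;  // pretend this is an expensive back-end call
        return "profile:" + key;
    };
    assert(cache.get("alice", loader) == "profile:alice");
    assert(cache.get("alice", loader) == "profile:alice");
    return loads;
}
```

A real cache would also need eviction and invalidation; the sketch leaves both out to keep the lookup path visible.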

If you're considering using OKWS, I offer these suggestions: (i) build a good build system, because it's true, C++ is a slow dog when it comes to big recompiles; (ii) never hand-manage memory (i.e., don't use new/delete), but rather, always use reference-counted auto pointers and safe C++ string/buffer classes; (iii) make sure you can find good C++ developers; they are hard to find!
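
Suggestion (ii) in miniature, using the standard library's `std::shared_ptr` and `std::string` as stand-ins for whatever reference-counted pointer and string classes a given codebase provides. `Session` and `demo_lifetime` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical object that tracks how many instances are alive.
struct Session {
    Session(std::string user, int* live)
        : user(std::move(user)), live(live) { ++*live; }
    ~Session() { --*live; }

    std::string user;  // owns its buffer; no manual char* handling
    int* live;         // external counter, for the demo only
};

// No new/delete anywhere: the object is freed when the last
// reference-counted owner goes away.
int demo_lifetime() {
    int live = 0;
    {
        auto a = std::make_shared<Session>("alice", &live);
        std::shared_ptr<Session> b = a;  // second owner
        a.reset();                       // first owner lets go
        assert(live == 1);               // still alive through b
        assert(b->user == "alice");
    }                                    // b leaves scope, object freed
    return live;
}
```

The counter returns to zero without any explicit cleanup code, which is the property the suggestion is after.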


"Despite its emphasis on security, OKWS shows performance advantages relative to popular competitors: when servicing fully dynamic, non-disk-bound database workloads, OKWS's throughput and responsiveness exceed that of Apache, Flash (the reigning king of Web server performance) and Haboob (an academic system reputed to be the fastest Java Web server on the block). Commercial experience with OKWS suggests that the system can reduce hardware and system management costs, while providing security guarantees absent in current systems."

That's why.

I'm not sure "hardware and system management costs" are the biggest challenge a startup like OK Cupid faces. When you look at the cost of development for something like a web server, you really have to ask if that money wouldn't have been better spent acquiring customers while building your app on one of the myriad of excellent server/framework/language stacks available today.

Wikipedia says OKCupid launched in 2004 - are they still really a startup?

With their concept of "free-standing services" for each endpoint, it sounds like they took the idea of CGI scripts and compiled the logic against an HTTP wrapper. It -could- work if the build process spits out separate binaries from a single source base. I think I'll stick with using elastic computing to scale horizontally and spend resources on customer acquisition instead.

I probably wouldn't use the fact that my software has higher throughput and responsiveness than Apache as a selling point. That also describes almost every other webserver out there (especially nginx).

It's hard to buy the competitive advantage argument. Why open source it if that's the case? I think they must realize it's not a core part of their business.

It predates their business.

Just OK? Why not great?
