

Optimizing your CouchDB Calls by 99% (no strings attached) - timanglade
http://blog.cloudant.com/optimizing-couchdb-calls-by-99-percent/

======
wccrawford
I'll save everyone the time:

Don't use CouchRest::Database! (the bang variant) because it tries to create
the DB each time. (Use the plain CouchRest::Database instead.)

Don't let multi_json pick the json library! Pick jsonx yourself before
couchrest runs.

Don't use HTTPS for internal connections.

For the last 3%... He hacked a binary format into couchrest and couchdb in
10 lines of code.
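The first point is easy to see with a stdlib-only Ruby sketch (the `FakeCouch` client and method names here are hypothetical stand-ins, not CouchRest's actual API): the bang variant issues an ensure-create request on every handle, the plain one doesn't.

```ruby
# Hypothetical stand-in for a CouchDB HTTP client; it only records requests.
class FakeCouch
  attr_reader :requests

  def initialize
    @requests = []
  end

  def put(path)
    @requests << [:put, path] # PUT /db creates the database (idempotent)
  end
end

# Mimics the bang variant: ensure-create the DB on every call.
def database_bang(client, name)
  client.put("/#{name}")
  name
end

# Mimics the plain variant: assume the DB exists, no extra round trip.
def database(client, name)
  name
end

couch = FakeCouch.new
3.times { database_bang(couch, "mydb") } # 3 needless PUTs
3.times { database(couch, "mydb") }      # 0 extra requests
puts couch.requests.length # => 3
```

Each of those extra PUTs is a full HTTP round trip paid before any real work happens.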

~~~
pavel_lishin
Thank you. I'm too ADD for videos, and besides, don't know of an easy way to
import them into iTunes as a podcast-style video (one that won't restart from
the beginning if I stop it halfway through.)

~~~
RickHull
Is that a bug or a feature? ;)

------
rkalla
I like the idea that Tim is working on a binary-based protocol interaction
with Couch. If the binary protocol integration is generic enough I'd like to
discuss with them the possibility of supporting the Universal Binary JSON
format (<http://ubjson.org>)

While the HTTP-only protocol work up until now has been one of the nice "ease
of use" standout features for Couch, it also leads to a lot of overhead when
you know you absolutely don't benefit from it.

All the work on CouchApps (apps built with only CouchDB + JavaScript) makes
the REST-only interaction sensible, but if you are trying to scale out a
high read/write Couch install and are already using tricks like TCP no_delay
and server-side batching along with batched-doc operations on the client,
there is only so much you can do before you slam up against what _looks_ to be
a 6-10K docs/sec change rate, due in no small part to Couch (even with its new
JSON parser) processing data structures into and out of textual JSON.
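The JSON tax described above is easy to get a feel for with a stdlib-only micro-benchmark (the document shape and iteration count are made up for illustration; numbers are machine-dependent and this is not the Couch server's actual code path):

```ruby
require 'json'
require 'benchmark'

# A small CouchDB-style document.
doc = { "_id" => "doc1", "type" => "reading", "values" => (1..50).to_a }

n = 10_000
elapsed = Benchmark.realtime do
  n.times { JSON.parse(JSON.generate(doc)) } # one encode + decode per doc
end

# Upper bound on docs/sec if JSON text processing were the only cost.
puts "#{(n / elapsed).round} docs/sec ceiling from JSON alone"
```

Even with a fast parser, every document crossing the wire pays this serialize/deserialize cost twice (server and client).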

~~~
timanglade
To be clear, I'm only working on integrating a binary _data format_, which is
different from a binary protocol.

HTTP is baked pretty deep into CouchDB, which makes it hard to remove (but who
knows, Couchbase seems to be having some success ripping it out). BigCouch
(Cloudant’s distributed version of CouchDB) has the HTTP layer neatly
separated through the Fabric library (<https://github.com/cloudant/fabric>),
so it'd probably be easier to add a binary protocol there.

~~~
rkalla
Tim, is the Couch server still going from binary disk data to JSON and then
you convert that to the binary format or is your work actually allowing Couch
to skip any textual JSON generation such that all the REST methods are still
there, but their response bodies are just MessagePack binary data instead of
text?

~~~
timanglade
The Couch server goes from binary Erlang terms (our default internal
representation) to MessagePack right away, skipping all textual JSON
generation. All the REST methods are still there, the body is just sent in
MessagePack format with an application/x-msgpack header
[https://github.com/timanglade/couchdb/commit/b601286dae04bdc...](https://github.com/timanglade/couchdb/commit/b601286dae04bdc2488a0d9bf028c58e6feb3449)
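For a concrete sense of the size difference, here is the MessagePack wire encoding of a trivial body, hand-rolled in Ruby straight from the MessagePack spec (no gem needed for this sketch; a real client would use the msgpack gem):

```ruby
require 'json'

# {"ok" => true} encoded per the MessagePack spec:
#   0x81        fixmap holding 1 key/value pair
#   0xA2 "ok"   fixstr of length 2
#   0xC3        true
packed = [0x81, 0xA2].pack("C*") + "ok" + [0xC3].pack("C*")

json = JSON.generate("ok" => true) # '{"ok":true}'

puts packed.bytesize # => 5
puts json.bytesize   # => 11
```

Five bytes instead of eleven for this toy case, and no text parsing on either end.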

~~~
rkalla
Ah! That is exactly what I was hoping for (skipping the JSON in/out cycle
server-side).

Keeping the HTTP protocol in this case is fine for me and doesn't require any
unexpected changes to Couch. I hope the binary format work can be made
configurable in the future, with other formats supported.

Thanks for the clarifications Tim.

------
dochtman
Are there no slides or transcript, or blog posts? It would be nice to have
something that can be skimmed through.

I don't use Ruby, so I'm not sure how much of the talk would apply to me, but
we use CouchDB extensively at work.

~~~
rkalla
Are you guys using any of the replication? If so, have you seen any of the
replication failures that are tracked in JIRA right now?

There are 2 or 3 nebulous "replication breaks" bugs that are filed from older
versions, but one that says it is still happening on 1.1.1 with no followups
from the team.

Filipe, I believe, rewrote the replication code from scratch for 1.2, so again
it is hard to know whether the replication hiccups are non-issues or not, so
I'm curious if you have real-world data points here.

~~~
timanglade
The replication is borked, we know. The replicator database introduced in 1.1
only solves part of the issues at hand. Everybody at Cloudant & Couchbase is
pretty confident that the 1.2 replicator will be much more stable & efficient,
which is why there isn't a lot of activity right now on plugging holes in the
old replicator.

~~~
rkalla
I had only read ancillary data about the new replicator work on the lists and
in JIRA, really glad to hear that it got the rewrite treatment.

Is this work going to go into Couchbase 2.0 builds? (not necessarily the
current preview, but the final GA release?)

~~~
damienkatz
Yes, it's going into Couchbase.

------
andrewvc
I have to wonder why so many presentations have 'X sucks, I say use Y which
rocks' as an attitude. Negativity doesn't really sell me on much.

I realize he has just grievances against multi-json and couch-rest, but really
that kind of attitude drags the whole community down.

OSS should be about collaboration, not a dick swinging contest.

~~~
timanglade
I have to wonder why so many comments have ‘X sucks, please say Y which
rocks’ as an attitude. Negativity doesn't really sell me on much.

Seriously though. The poor design (& egregious misuse) of those libraries
became a major concern for us at Cloudant. I wanted people to become aware of
what they were doing wrong & why. I thought it would be best done with a bang.
Fixing the situation beyond the patches I submitted would have required that I
go on a campaign to have those libraries removed from the face of the earth
(which seems more extreme than very strongly recommending people stay away
from them).

~~~
jchrisa
When I originally wrote CouchRest, it was just the "non-magic" parts, and was
a clean little library. Then one day I had about 3 hours fun, adding a query
generator as an experiment. Next thing you know, that aspect had gained
contributors and grown into a full blown ODM. If you just use the basic
features, CouchRest should treat you just fine. The only reason you'd use the
extended stuff is if you are more comfortable with Active Record style APIs
than with HTTP. Turns out that's not a small set of people...

------
cloudhead
Mostly a waste of time... there's nothing about optimization, just common
sense, like "don't parse JSON with Ruby".

------
baudehlo
So in summary the performance issue was entirely with JSON encoding/decoding.

------
Will_Price
I know this is going to sound pedantic, but wouldn't optimizing by 99% make
them 1% less efficient? Right? Or am I muddled?

~~~
cloudhead
Optimization is inherently positive, so it could mean either +99% performance
(i.e. 199%) or -99% time.
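In plain numbers, the "-99% time" reading (which is the one the blog title intends) works out to a 100x speedup:

```ruby
old_time = 100.0                 # say, 100 ms per call
new_time = old_time * (1 - 0.99) # "99% less time" => 1.0 ms per call
speedup  = old_time / new_time

puts speedup # => 100.0
```

So "optimizing by 99%" here means cutting latency to 1/100th, not shaving 99 percentage points off efficiency.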

~~~
Will_Price
Bingo, thought what I had said sounded wrong, thanks for the clarification.

