

10x speedup in write performance in Riak Innostore based on keyname - LiveTheDream
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-April/003819.html

======
smanek
Be careful with this trick. Although it appears to work well for Riak, it can
cause lots of problems for BigTable-style stores like HBase, since it
bottlenecks all writes through one node.

See: [App Engine datastore tip: monotonically increasing values are bad](http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/)

HBase (and I've heard BigTable) work best with purely random row keys - like
the author of this post was using initially.
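
If you do need roughly sequential keys on HBase, the usual workaround is to
salt the row key with a short hash prefix so writes fan out across regions. A
minimal sketch in Python (the key format and bucket count here are made up for
illustration):

    import hashlib

    def salted_key(key, buckets=16):
        # A short, deterministic hash prefix scatters monotonically
        # increasing keys (timestamps, sequence numbers) across
        # 'buckets' distinct key ranges instead of hammering the
        # tail region.
        salt = int(hashlib.md5(key.encode()).hexdigest(), 16) % buckets
        return "%02d-%s" % (salt, key)

The trade-off is that a range scan now has to issue one scan per bucket and
merge the results.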

~~~
strlen
It should be noted that Riak uses consistent hashing (where a key is routed
based on the Murmur or FNV hash of its md5 checksum) and virtual nodes. That
means that even if keys are adjacent, they will get routed to different
virtual nodes. And even if a single virtual node is "hot", virtual nodes do
not map 1-to-1 onto physical nodes.
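
A toy version of that routing, just to show the mechanics (not Riak's actual
implementation; the hash choice and vnode count are arbitrary):

    import bisect, hashlib

    class Ring(object):
        # Toy consistent-hash ring: each physical node owns several
        # virtual nodes, so even adjacent keys land on different machines.
        def __init__(self, nodes, vnodes=8):
            self.ring = sorted(
                (self._hash("%s-%d" % (node, i)), node)
                for node in nodes for i in range(vnodes))

        def _hash(self, s):
            return int(hashlib.md5(s.encode()).hexdigest(), 16)

        def node_for(self, key):
            # Walk clockwise to the first vnode at or after the key's hash.
            i = bisect.bisect(self.ring, (self._hash(key),))
            return self.ring[i % len(self.ring)][1]

    # Ring(["node-a", "node-b", "node-c"]).node_for("user:1")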

BigTable uses token ranges, which allows for range queries but makes it
vulnerable to exactly this kind of situation. The trick shouldn't be needed
with BigTable or HBase anyway, since they use LSM-trees instead of a
conventional B-Tree (which is what InnoDB -- the engine underneath Innostore
-- uses): all writes and updates hit disk strictly sequentially, regardless of
key order.
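
To see why key order doesn't matter there, a toy LSM write path (illustration
only, no relation to HBase's actual code; the in-memory list stands in for an
on-disk run):

    class TinyLSM(object):
        # Toy LSM write path: writes land in an in-memory table that is
        # periodically flushed to disk as a sorted, immutable run -- one
        # big sequential write, whatever order the client used.
        def __init__(self, flush_at=4):
            self.memtable, self.runs, self.flush_at = {}, [], flush_at

        def put(self, key, value):
            self.memtable[key] = value
            if len(self.memtable) >= self.flush_at:
                self.runs.append(sorted(self.memtable.items()))  # "flush"
                self.memtable = {}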

[Disclaimer: Voldemort developer here; we use consistent hashing, virtual
nodes, and -- by default -- a log-structured B+Tree from BerkeleyDB Java
Edition.]

------
jchrisa
This is also a best practice in CouchDB: as long as you use the UUIDs that
Couch generates for you (via the /_uuids API endpoint), you'll get keys that
are designed to minimize the work the b-tree has to do to insert them.
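
The idea is that mostly-sequential keys keep inserts clustered at the right
edge of the tree. A hypothetical sketch of such a key generator (the format is
invented for illustration, not Couch's actual algorithm):

    import os, time

    def sequential_id():
        # A monotonically increasing prefix keeps inserts at the right
        # edge of the b-tree, so pages stay dense and splits are cheap.
        # (Hypothetical format, not what CouchDB actually emits.)
        prefix = "%013x" % int(time.time() * 1000)
        return prefix + os.urandom(6).hex()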

~~~
rdtsc
There are basically two common types of UUID: version 1 and version 4.

UUID1 is generated from the MAC address of the machine plus a timestamp plus
random bits, and UUID4 is completely random. Sometimes you want one, sometimes
the other (e.g. purely random keys for the hash-partitioned stores discussed
above).

You can try both in Python:

    >>> import uuid
    >>> uuid.uuid1()  # MAC address + timestamp + random bits
    >>> uuid.uuid4()  # fully random

------
antirez
IMHO, for this whole class of problems -- needing to log a big amount of data
over a period of years -- the way to go is not Riak, nor Redis, nor <put your
preferred DB name here>, but simply writing to files in append-only mode (and,
when you can, using fixed-size records for fast access later).
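
A minimal sketch of that approach in Python (the record layout and helper
names are invented for illustration):

    import struct

    RECORD = struct.Struct("<d i 16s")  # timestamp, value, 16-byte tag

    def append(f, ts, value, tag):
        # Append-only: every write is a sequential disk write.
        f.write(RECORD.pack(ts, value, tag))

    def read_at(f, n):
        # Fixed-size records make random access O(1): record n lives
        # at byte offset n * RECORD.size, no index needed.
        f.seek(n * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))

Open the file with mode "ab" for writers and "rb" for readers.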

There are good reasons IMHO for writing a small networked C server doing this
work.

~~~
strlen
_> There are good reasons IMHO for writing a small networked C server doing
this_

Right, because:

<https://github.com/cloudera/flume>

<https://github.com/facebook/scribe/wiki>

<http://sna-projects.com/kafka/>

<http://www.freebsd.org/cgi/man.cgi?query=syslogd&sektion=8>

... don't exist?

(Formulation courtesy of abhay, plug for Kafka mine)

~~~
antirez
If there is something already great at doing this, sure, no need. I have not
checked, however, so I can't speak about these specific projects.

~~~
benblack
Well, yes. Exactly.

Big ups to the Homo Sapiens posse.

- Lil' B

