

LightCloud: Distributed key-value database built on Tokyo Tyrant - amix
http://opensource.plurk.com/LightCloud/

======
dh2k
Hey,

my test results are the following:

[root@server test]# tcrmttest write -port 1978 localhost 100000 <Writing Test>
host=localhost port=1978 tnum=1 rnum=100000 nr=0 ext= rnd=0

......................... (00010000) ......................... (00020000)
......................... (00030000) ......................... (00040000)
......................... (00050000) ......................... (00060000)
......................... (00070000) ......................... (00080000)
......................... (00090000) ......................... (00100000)
record number: 200001 size: 6928736 time: 22.460 ok

[root@server test]# tcrmttest read -port 1978 localhost <Reading Test>
host=localhost port=1978 tnum=1 mul=0 rnd=0

......................... (00020000) ......................... (00040000)
......................... (00060000) ......................... (00080000)
......................... (00100000)
tcrmttest: tcrdbget: error: 7: no record found
record number: 200001 size: 6928736 time: 21.996 error

[root@server test]# tcrmttest remove -port 1978 localhost <Removing Test>
host=localhost port=1978 tnum=1 rnd=0

......................... (00020000) ......................... (00040000)
......................... (00060000) ......................... (00080000)
......................... (00100000)
tcrmttest: tcrdbout: error: 7: no record found
record number: 100001 size: 6928736 time: 22.692 error

How can I reach the 1M put/get?

Looks like TT is around 2-3K records/sec in read/write. I've tested with all
kinds of table structures (on-memory hash, B+ tree, disk-based hash, B+ tree,
table, etc.) and it was the same speed every time.

------
amix
Comments, questions and reviews are very welcome.

~~~
bdr
What does LightCloud add that Tokyo Tyrant does not already provide? I've read
the websites of both products, but it's just kind of confusing.

From some other comments, it seems I'm not the only one confused. Tokyo
Cabinet+Tyrant are pretty new on the scene and there isn't a lot about them in
English yet. So, if you add a level of explanation to the site that seems
excessive to you, people would probably find it more useful than you expect.

~~~
amix
LightCloud adds horizontal scaling. If you just use Tokyo Tyrant then you can
only scale by buying bigger servers. If you use LightCloud you can scale by
buying extra servers.

When scaling upwards you would generally _really_ want to scale horizontally,
since the vertical scale has a limit and you can quickly reach it (plus,
buying bigger machines is generally much more expensive than buying extra
machines).

~~~
chadr
How horizontally have you scaled your systems in production? The homepage
mentions 2 servers. I'm wondering if you are using it in production with more
than 2.

~~~
amix
These two servers each run 3 lookup nodes and 6 storage nodes (i.e. 6 lookup
nodes and 12 storage nodes in total). These servers are quite powerful (32GB
of RAM and RAID10), and they also run MySQL.

------
sebastian
Isn't memcachedb faster?

Comparing the posted benchmark results with
<http://memcachedb.org/benchmark.html>

You get around 2800 r/s using LightCloud vs. around 64000 r/s using Memcachedb

and around 1080 w/s using LightCloud vs. around 23500 w/s using Memcachedb.

I would be interested in seeing some benchmarks that compare both.

I really like LightCloud's idea of automatic scaling, failover and load
balancing.

~~~
amix
Please do see <http://news.ycombinator.com/item?id=498699> (memcachedb !=
LightCloud). memcachedb should be compared with Tokyo Tyrant and not
LightCloud.

And if you like, you could extend LightCloud with memcachedb support (which
we also had at one point and ran in production [see my posts on the memcachedb
mailing list for proof]), but really, when it comes to key-value databases,
it's hard to beat Tokyo Tyrant, which is the fastest and most
feature-complete key-value database out there (IMO, and I have looked at most
of the popular solutions).

~~~
jwinter
That comment says: "memcachedb is not distributed, meaning that you can only
scale vertically (i.e. by buying bigger machines)." Is that true? The docs on
memcachedb seem to imply the opposite.

~~~
amix
memcachedb is not distributed - it only supports replication. I.e. with
memcachedb you can only scale reads, but not writes (or at least not without a
system like LightCloud on top of it).
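
To illustrate the difference, here is a toy model (not memcachedb or
LightCloud code; the node names and the simple modulo placement are
assumptions made for the example). With replication every node applies every
write, so adding replicas does not reduce the write load per node. With
partitioning each write goes to one node only.

```python
import hashlib
from collections import Counter


def replicated_writes(keys, nodes):
    """Replication: every replica must apply every write."""
    load = Counter()
    for _ in keys:
        for node in nodes:
            load[node] += 1
    return load


def partitioned_writes(keys, nodes):
    """Partitioning: each write goes to the single node that owns the key."""
    load = Counter()
    for key in keys:
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        load[nodes[int(digest, 16) % len(nodes)]] += 1
    return load


keys = [f"key{i}" for i in range(9000)]
nodes = ["node1", "node2", "node3"]
print(replicated_writes(keys, nodes))   # every node handles all 9000 writes
print(partitioned_writes(keys, nodes))  # the 9000 writes are split across nodes
```

Modulo placement is used here only to keep the example short. It reshuffles
most keys when a node is added, which is why a hash ring is the usual choice.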

------
leej
Tokyo Tyrant has different kinds of databases. Does LightCloud support all of
them? I think so, but just to be sure.

Do you have any plans for developing a PHP API for LightCloud?

Thanks for your excellent work. I hope documentation will be improved.

------
siong1987
I am wondering: from the benchmark, it is obviously slower than memcached. Why
would someone want to use this instead of memcached, which has better support?

~~~
amix
memcached is a memory-based key-value database. LightCloud is persistent, i.e.
data is saved to disk. I'll make this clearer on the website.

~~~
siong1987
I would be more excited to see how this key-value database actually helps to
scale plurk. I am interested in integrating this into Rails if it really works
well.

~~~
catch23
I use an in-memory tokyo cabinet as a memcached replacement on my rails
system. (well actually it runs merb, but close enough)

------
binarray2000
amix, thanks for your effort (regarding both the development and the
explanations here). Upvoted and will be considered for the next project. Keep
up the good work!

------
catch23
it's down!

btw, i'm also using tyrant myself, a very cool thing indeed!

------
moonpolysoft
How does it deal with events like disk failure, network partitions, and
concurrent updates? The design documentation is rather light, so it's really
hard to make out how this actually distributes data.

You say it doesn't have any concept of eventual consistency. Yet how does it
coordinate updates to nodes? Does it do two phase commit? Paxos?

~~~
amix
Every node in both hash rings is replicated using master-master replication -
i.e. node A and node A' can both receive updates and reads. Node A and node A'
sync their updates via an update log and can fail at any time and come back at
any time without taking down the system.

Additionally, if high availability is really a big issue, then a node A''' can
be introduced that can be in another data center.

If you add nodes to the storage ring, then some of the existing keys will be
invalidated. To solve this issue, and the issue of routing, a lookup ring is
created. The lookup ring holds pairs of (key, storage_ring_location). The
system will automatically update (key, storage_ring_location) if it is
invalidated at some point (for example, when the key no longer points to node
A but to node D).
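
A minimal sketch of the two-ring idea, assuming a plain consistent-hash ring
and in-memory dicts standing in for the Tokyo Tyrant servers. The class names
(`HashRing`, `Cluster`), the node names and the MD5-based placement are
illustrative assumptions, not LightCloud's actual code, and the sketch only
records key locations rather than rewriting them later.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: each node owns several points on a circle."""

    def __init__(self, nodes, points_per_node=64):
        self._ring = []  # sorted list of (hash, node) points
        self._points = points_per_node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self._points):
            bisect.insort(self._ring, (self._hash(f"{node}:{i}"), node))

    def get_node(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


class Cluster:
    def __init__(self, lookup_nodes, storage_nodes):
        self.lookup_ring = HashRing(lookup_nodes)
        self.storage_ring = HashRing(storage_nodes)
        # Dicts stand in for the database server behind each node.
        self.lookup = {n: {} for n in lookup_nodes}
        self.storage = {n: {} for n in storage_nodes}

    def add_storage_node(self, node):
        self.storage[node] = {}
        self.storage_ring.add_node(node)

    def set(self, key, value):
        lookup_node = self.lookup_ring.get_node(key)
        # Reuse the recorded location so an existing key stays where it is,
        # even if the storage ring would now hash it to a different node.
        location = self.lookup[lookup_node].get(key)
        if location is None:
            location = self.storage_ring.get_node(key)
            self.lookup[lookup_node][key] = location
        self.storage[location][key] = value

    def get(self, key):
        lookup_node = self.lookup_ring.get_node(key)
        location = self.lookup[lookup_node].get(key)
        if location is None:
            return None
        return self.storage[location].get(key)


cluster = Cluster(["lookup1", "lookup2"], ["storageA", "storageB"])
cluster.set("user:1", "alice")
cluster.add_storage_node("storageC")  # storage ring changes here
print(cluster.get("user:1"))          # alice
```

The read still succeeds after the storage ring grows because the location is
taken from the lookup ring, not recomputed from the storage ring.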

I have tried to find an easy solution for a rather complex problem. Keeping
membership state, doing Paxos and keeping routing tables would have been much
more time consuming to make - so I have tried to solve the problem from
another angle (by using master-master replication for high availability).

------
trezor
Like I said on reddit
(<http://www.reddit.com/r/programming/comments/814no/lightcloud_a_distributed_persistent_keyvalue/c07yr44>),
the performance seems somewhat lackluster. Especially considering the
extremely small test-load.

I'd be more interested, and might have a somewhat less negative attitude, if
you were to do some real testing on a proper dataset (several hundred
megabytes or gigabytes, not 10 bytes) and could show that adding servers
actually improves performance.

The current test data and test script are simply insufficient to the point of
being useless.

~~~
amix
Like I have already stated, I am interested in how the system will run in
production. Generally, you will do lots of small updates and lots of small
fetches with key-value databases. You won't do batch operations - which makes
your benchmark pretty irrelevant.

Try to benchmark your relational database by doing this:

- create a new connection
- fetch one row
- close connection

And try to compare this to selecting multiple rows at once. The result will be
MUCH different. And this basically outlines the difference between your
benchmark and mine.

That said, you will only hit limitations with a relational database if you
have lots of data. If you run a blog or a low-traffic site, or can keep all
your data in memory, then you won't have any problems. And I do have
experience in the world of relational databases, and using MSSQL won't solve
this problem for you (otherwise you would see Facebook, Friendfeed, Twitter,
Google, etc. use MSSQL or Oracle).

