
CrimsonDB: A Self-Designing Key-Value Store - ngaut
http://daslab.seas.harvard.edu/projects/crimsondb/
======
graycat
I keep reading about key-value stores with this and that property.

But for the Web site for my startup, early on it was clear that I needed a
good key-value store as a Web user session state store (server). So, I
designed one, coded it up, and have been using it. It seems to work fine.

But on key-value stores, I don't _get it_ because what I did was easy, too
easy, really, to say much about it except how easy it was to do.

So, a Web server has a user and their session state. For the user, I also have
a GUID (globally unique identifier) that is the _key_ for the user and their
session. The session state is in an instance of a class I've defined.

So, the Web server _serializes_ the instance of the class to a byte array and
passes that and the GUID key to a subroutine that uses TCP/IP sockets to start
a TCP/IP session with my session state store. To Windows, my session state
store is just a simple, console application.

The session state store is listening for a TCP/IP connection request, accepts
the request, and gets the GUID key and serialized object instance with the
session state.
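
The post does not say what the bytes on the wire look like, so the following is only a minimal Python sketch of one possible framing, not the author's actual .NET code. The layout (a 16-byte GUID, then a 4-byte big-endian length, then the serialized state) and the names `send_put`, `recv_put`, and `recv_exact` are assumptions made for illustration.

```python
import socket
import struct
import uuid

# Hypothetical header: 16-byte GUID followed by a 4-byte big-endian payload length.
HEADER = struct.Struct("!16sI")


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def send_put(sock, guid, state):
    """Web-server side: send the GUID key and the serialized session state."""
    sock.sendall(HEADER.pack(guid.bytes, len(state)) + state)


def recv_put(sock):
    """Store side: receive one (GUID, serialized state) pair."""
    raw_guid, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return uuid.UUID(bytes=raw_guid), recv_exact(sock, length)
```

The length prefix is there because TCP delivers a byte stream, not messages, so the receiver needs some way to know where the serialized object ends.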

The session state store has two instances of a standard .NET collection class,
say, A and B. Instance A uses the GUID as the key and the serialized session
state as the value. Instance B uses the then current system time-date (GMT) as
the key and the GUID as the value.

So, whenever the session state store runs, before it returns, it looks at the
earliest time-date stamp in instance B and deletes from A and B all the key-
value pairs that are too old, that is, timed out, that is, have encountered
session time out.
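
As a rough illustration of the two-collection scheme described above, here is a minimal Python sketch. It is not the author's .NET code. The class name `SessionStore`, the use of a heap for B, and keeping the time stamp alongside the value in A (so that a refreshed session is not deleted by its older entry in B) are all assumptions.

```python
import heapq
import time


class SessionStore:
    """In-memory session store with two collections, A and B (illustrative only)."""

    def __init__(self, timeout_seconds=3600.0, clock=time.time):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.a = {}   # A: GUID -> (time stamp, serialized session state)
        self.b = []   # B: min-heap of (time stamp, GUID), oldest entry first

    def put(self, guid, state):
        now = self.clock()
        self.a[guid] = (now, state)
        heapq.heappush(self.b, (now, guid))
        self._purge(now)

    def get(self, guid):
        self._purge(self.clock())
        entry = self.a.get(guid)
        return None if entry is None else entry[1]

    def _purge(self, now):
        # Look at the earliest time stamp in B and drop everything that timed out.
        while self.b and now - self.b[0][0] > self.timeout_seconds:
            stamp, guid = heapq.heappop(self.b)
            entry = self.a.get(guid)
            # Delete from A only if this B entry is the session's latest write.
            if entry is not None and entry[0] == stamp:
                del self.a[guid]
```

Because B is ordered by time, each call only touches the entries that have actually expired, so the purge cost stays small per request.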

The code of the session state store is simple -- single threaded. So the
code needs a first-in, first-out (FIFO) queue for requests that arrive while it
is busy, and for that I am using just what TCP/IP offers, with a maximum queue
depth of 100 which I hope is enough -- I will be able to tell by looking at the
site log file.
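
One reading of "what TCP/IP offers" is the listen backlog of the listening socket, where the operating system queues connections that have not been accepted yet. Under that assumption, a single-threaded accept loop could be sketched in Python as below. The names `make_listener` and `serve` are hypothetical, and the operating system treats the backlog value as a hint that it may cap or adjust.

```python
import socket


def make_listener(host="127.0.0.1", port=0, backlog=100):
    """Create a listening socket; pending connections wait in the OS backlog."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))   # port 0 asks the OS for any free port
    listener.listen(backlog)      # requested queue depth (a hint to the OS)
    return listener


def serve(listener, handle, max_requests=None):
    """Single-threaded loop: accept one connection, handle it, then the next."""
    served = 0
    while max_requests is None or served < max_requests:
        conn, _addr = listener.accept()   # takes the next queued connection
        with conn:
            handle(conn)
        served += 1
    return served
```

While `handle` is running, new clients can still connect; they simply wait in the backlog until the loop gets back to `accept()`.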

That's about it.

It's simple. It's darned fast and nicely scalable via virtual memory and,
then, _sharding_ , etc.

I designed the whole thing in a few minutes and implemented it in about a day,
but I'm no expert in session state stores, key-value stores, Redis, etc.

So, what is wrong with what I designed?

Thanks.

~~~
jdoliner
Nothing is wrong with what you designed. It's simple, it does exactly what you
want it to do and best of all: you understand it end to end.

However, if you're asking why people spend so much time writing Key-Value
stores, the answer is that what you've designed is, I'm assuming, using the
filesystem to store session state for each user, which is pretty much a Key-
Value store unto itself. So people who write databases feel the need to
reimplement the algorithms in that filesystem layer, and expose them over a
TCP API rather than system calls. There can be performance and scale reasons
for this. But the filesystem is the oldest, trustiest Key-Value store and if
it's good enough for your purpose it's so much easier.

~~~
graycat
> I'm assuming, using the filesystem to store session state for each user,
> which is pretty much a Key-Value store unto itself.

No, in the key-value store, the storage is all in the two instances of the
standard .NET collection class and, thus, in main memory. If the classes get
really big, say, GBs, then maybe Windows will use some virtual memory. Or
maybe someday, if I have a big company, I'll program some big, 14 TB or so
solid state disks as direct access files, say, B+ trees, to do the storage.

But for a long time, the storage for the key-value pairs will be just in main
memory.

So, for a user a second, 20 minutes of active time per session, and after that
40 minutes until session time out, I can calculate how many key-value pairs I
need and, if I just look up the size of the values, get a decent estimate of
the main memory needed.

I did that: For the 16 GB of main memory for my first server, the storage
seemed reasonably small.

E.g., assume users arrive at 1 a second. Keep each user's session state for 20
minutes for their active time plus another 40 until time out for 60 minutes
total. So, after running for 60 minutes, the server has reached a steady state
of

    
    
         60 * 60 = 3,600
    

sessions. If each session takes, say, 2 KB, we're talking total space of
7,200,000 bytes, about 7 MB, plus some for collection class overhead, maybe
10 MB total, and, on a 16 GB main memory server, that's trivial.
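
The estimate above is an instance of Little's law (average number in the system = arrival rate times time in the system). A small Python check of the arithmetic, taking "2 KB" as 2,000 bytes as the post's own total implies; the function name is just for illustration:

```python
def steady_state_sessions(arrivals_per_second, lifetime_seconds):
    """Little's law: sessions held at steady state = arrival rate * lifetime."""
    return arrivals_per_second * lifetime_seconds


lifetime_seconds = (20 + 40) * 60     # 20 active minutes + 40 minutes to time out
sessions = steady_state_sessions(1, lifetime_seconds)
payload_bytes = sessions * 2_000      # "2 KB" taken as 2,000 bytes per session

print(sessions)        # 3600
print(payload_bytes)   # 7200000
```

Collection overhead is not modeled here; the post's "maybe 10 MB total" is a rough allowance on top of the 7.2 MB of payload.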

Besides, if I get a user a second and they interact with my Web site for 20
minutes each, I'll be getting a new Corvette in a few weeks and have lots of
money for new, larger servers, racks, UPS boxes, emergency motor-generators,
etc.!!! :-)!!!

Thanks.

~~~
CyberDildonics
How is that not just a normal hash map?

~~~
graycat
I'm guessing that the .NET collection class is likely AVL trees (in Knuth's
_Sorting and Searching_ ) or red-black trees, in something by Sedgewick, and
not hashing, at least not normal hashing. E.g., hashing can have collisions.

Sure, there is

Ronald Fagin, Jurg Nievergelt, Nicholas Pippenger, H. Raymond Strong,
'Extendible hashing-a fast access method for dynamic files', "ACM Transactions
on Database Systems", ISSN 0362-5915, Volume 4, Issue 3, September 1979,
Pages: 315 - 344.

with a graceful way around collisions.

Our group used that paper in our work in a high end, _cross memory_ (one
address space gets data to/from another address space, calls code, etc. in
another address space), _active_ (generalization of, say, _triggers_ common in
relational database), _dynamic_ (the object definitions change during
execution) object store.

There is also a good article with good details and some Python code at:

[http://en.wikipedia.org/wiki/Extendible_hashing](http://en.wikipedia.org/wiki/Extendible_hashing)

Actually the code we used was nicely simple -- I have some notes someplace,
but the paper itself is good enough documentation.
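
For readers who have not seen the technique, here is a rough Python sketch of the idea behind extendible hashing: a directory of 2^d slots points at fixed-capacity buckets, and a full bucket is split (doubling the directory only when needed) instead of chaining collisions. This is not the code from the paper, from the Wikipedia article, or from the object store mentioned above; the class names, the use of the low-order hash bits, and the `MAX_DEPTH` cutoff are assumptions.

```python
MAX_DEPTH = 20  # assumed cutoff: stop splitting when hash bits no longer help


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of hash bits this bucket distinguishes
        self.items = {}


class ExtendibleHash:
    def __init__(self, bucket_capacity=4):
        self.capacity = bucket_capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]    # always 2 ** global_depth slots

    def _index(self, key):
        # Use the low-order global_depth bits of the hash as the directory slot.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key, default=None):
        return self.directory[self._index(key)].items.get(key, default)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._index(key)]
            has_room = len(bucket.items) < self.capacity
            if key in bucket.items or has_room or bucket.local_depth >= MAX_DEPTH:
                bucket.items[key] = value
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory: slot i and slot i + old_size share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        # Slots that pointed at the old bucket and have the new bit set move over.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = sibling
        # Redistribute the old bucket's items by the newly significant bit.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            (sibling if hash(k) & new_bit else bucket).items[k] = v
```

Only the overflowing bucket is rehashed on a split, which is what makes growth gradual compared with rebuilding a whole hash table at once.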

But I also have to have some _persistence_ of the session state across
different HTTP POST operations from the user. I'm looking forward to having
several executing instances of my Web server software, and I want no user
_affinity_ with any particular instance. So I need the state held somewhere
separate from the Web server, in a different Windows
_appdomain_ , e.g.,

[http://msdn.microsoft.com/en-
us/library/system.appdomain.asp...](http://msdn.microsoft.com/en-
us/library/system.appdomain.aspx)

process, address space, etc. like an external server for the session state
data.

Sure, that server could use the extendible hashing paper. But, since the .NET
collection class is there, and no doubt by now used successfully more often
than McDonald's has served a hamburger, I just used what Microsoft provided.
I'm willing enough to beg, borrow, buy, or (hush!) steal a wheel instead of
reinventing it!

I did my DIY session state store partly because I guessed that I could write
my own code in less time than it would take me just to understand how to use
something highly regarded and off the shelf.

I guessed that I wrote my code in less time than I would need just to
understand Redis. That IBM work was intended as a high end object store -- I
knew how hard it was to use something like that. And from some of my notes I
see

"Microsoft Unveils FASTER – a key-value store for large state management"

as at

[https://www.microsoft.com/en-us/research/blog/microsoft-
unve...](https://www.microsoft.com/en-us/research/blog/microsoft-unveils-
faster-key-value-store-large-state-management/)

and

[https://news.ycombinator.com/item?id=17267403](https://news.ycombinator.com/item?id=17267403)

But I discovered this option long after I wrote my own. Again, I suspect that
I wrote my own in less time than it would take me just to understand the
Microsoft documentation for their FASTER. If my DIY work gets weak in the
knees from too much load, etc., then, sure, I'll consider options off the
shelf.

------
jensenbox
Can we see some source code for this project?

------
pacuna
Sounds kind of similar to CMU's Peloton.

