
Sophia – An embeddable key-value database - isaacb
http://sphia.org/index.html
======
rgbrgb
I know it seems superficial but beautiful docs are one of my most trusted
heuristics when I'm considering using a library. If the author cares about the
aesthetics of the docs, it often means they care about the aesthetics of the
code, which really does matter a lot. We can make ugly things or we can make
beautiful things. I really respect people who take the time to make beautiful
tools.

~~~
VikingCoder
Was that intended to be a compliment of this project? I had the opposite
reaction. I accept that English is often a second language, but this page was
off-putting.

Sophia is a modern [should add comma] embeddable key-value database designed
for a high [should hyphenate] load environment.

It has a unique architecture that was born as a result of research and
rethinking primary alghorithmical [sic, should say "algorithm"] constraints
associated with a [sic] getting popular Log-file based data structures, such
as LSM-tree [should say "trees"], it's [sic] variations based on Fractional
Cascading ideas and a B-Tree. (see architecture) [run-on, meaning unclear]

It is very fast. (see benchmarks)

it [sic] is easy to use. (see documentation)

Implemented as a small C [should probably hyphenate, or just rewrite] written,
BSD [should probably hyphenate] licensed library.

~~~
jasonwatkinspdx
It's quite clear English is not the primary language of the author, and I
think it's in poor taste to criticize grammatical errors that are clearly
sourced in this.

English is not an easy second language to learn as an adult, and technical
English doubly so.

~~~
falsedan
These are basic grammar mistakes. Mastery of basic grammar is achievable with
a small amount of dedication; if the author does not have the time, they could
ask an English-speaking acquaintance to proof-read their docs.

The author's primary (and perhaps only) contact with users is through their
documentation. Incorrect capitalization and apostrophe use is distracting and
will put off some potential users. The author looks sloppy and uncaring
because these types of mistakes are preventable.

Being a non-native speaker is not an excuse for basic errors (excepting
novices). Proof-read your docs! If you're not confident in your language
skills, ask someone else to!

~~~
jasonwatkinspdx
To be blunt: do you speak a second language fluently? If not, I don't think
you have any idea how much dedication you're demanding.

Instead of bashing someone on hnews comments, you could send them an errata
patch.

~~~
falsedan
The onus is not on me to make this project's documentation presentable: it is
on the author. My language skills are immaterial! The author's language skills
do not excuse lazy presentation, they only help explain it.

~~~
jasonwatkinspdx
They are material when you say that the author is being lazy for not writing
English better or finding someone to contribute. It implies you don't know the
difficulty of what you're asking, so why are you calling him lazy?

But of course you want to make it clear that you feel no obligation to
contribute despite your criticism. So it's lazy for him, but not for you?

~~~
falsedan
The grammatical errors exist regardless of whether I speak one, three, or a
hundred languages fluently. Better to ask the number of projects I have
documented...

Note I comment exclusively on the author's presentation, not their personal
behavior: obviously they are not lazy. I have no obligation to contribute, no
income from or personal interest in this project. I contribute to the projects
which benefit me and people who I care about.

------
saurik
Anyone know anything about how this compares in practice to Lightning MDB
(which uses a memory mapped B-tree, I think, and is apparently insanely faster
than most of the other embedded key-value stores people normally examine)?

~~~
hosay123
This one copies, and it has no concept of transactions from the looks of it
(not even LevelDB-style snapshots)

------
apendleton
Having just gone through the exercise of picking an embedded key-value store
for a project, some things that would be nice: how does it compare to other
things besides leveldb (which, to be frank, isn't a stellar performer)? In
particular, how does it compare to Tokyo/Kyoto Cabinet, Lightning MDB, or
Sqlite4's LSM? Does it support data compression (either with a single pre-
selected algorithm like LevelDB and Snappy, or in pluggable fashion like LSM)?
How does it deal with concurrent access by multiple processes?

~~~
shepik
I've gone through picking an embedded key-value store, too.

What really concerns me is why benchmarks are never run on an already
filled database (like, 14G, 28G, 60G)? Because "add 100k random keys into an
empty database" is very different from "add 100k random keys into a large
database". And that is where more novel algorithms start to shine.

Yes, read speed of leveldb (and, i assume, sophia) with its fancy sst's is
lower than of plain old b-trees or hashtables (kctree/kchash), but it is still
high enough for most tasks. Write performance of kc* (and btree-based
libraries in general) is, however, unacceptable, at least on hard drives, and
even with a reasonable-sized database (~90% of RAM) it degrades to a couple of
random write per IOPs (so, 200-300 writes per second on a consumer-grade HDD,
or up to 1000 on a 2x10k sas hdd in raid-0, if i remember correctly)

It may be reasonable to use kc* on SSD, but i did not test that.
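
The empty-versus-filled point above can be sketched as a small harness. This is only an illustration: `make_store`, `DictStore`, and the `put(key, value)` interface are hypothetical names, not part of Sophia, LevelDB, or Kyoto Cabinet, and the in-memory stand-in says nothing about real disk behavior. The real library under test would be wrapped behind `make_store`.

```python
import random
import time


def random_key(rng, size=16):
    """Return a pseudo-random key of `size` bytes."""
    return bytes(rng.getrandbits(8) for _ in range(size))


def timed_inserts(put, count, rng, value=b"x" * 100):
    """Insert `count` random keys through `put`; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(count):
        put(random_key(rng), value)
    return time.perf_counter() - start


def compare_empty_vs_prefilled(make_store, prefill, batch, seed=0):
    """Time the same size batch against an empty and a pre-filled store.

    `make_store` is a hypothetical factory returning a fresh object with a
    `put(key, value)` method.
    """
    rng = random.Random(seed)

    empty = make_store()
    t_empty = timed_inserts(empty.put, batch, rng)

    filled = make_store()
    timed_inserts(filled.put, prefill, rng)  # load phase, not reported
    t_filled = timed_inserts(filled.put, batch, rng)
    return t_empty, t_filled


class DictStore:
    """In-memory stand-in so the harness runs without any database."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value
```

For a meaningful comparison the pre-fill would have to exceed available RAM, as the comment suggests, so that the timed batch actually hits the disk.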

------
clumsysmurf
For Java / Android, I've been using H2's MVStore, which is log structured and
uses counted B+-trees. It's nice not having to go through JNI for good
performance in Java.

[http://www.h2database.com/html/mvstore.html](http://www.h2database.com/html/mvstore.html)

~~~
eropple
Funny - H2's slowness (either with a standard storage system or with MVStore
and a standard key format) is the main reason we're moving back to a hand-
rolled data storage system that's specific for our data on Android.

~~~
clumsysmurf
Thomas Mueller, the author of H2 / MVStore, gives some thoughts on H2's
performance issues under Android here if you are interested:

[https://groups.google.com/forum/#!topic/h2-database/Q8K-nbCh...](https://groups.google.com/forum/#!topic/h2-database/Q8K-nbChf1w)

~~~
eropple
Yeah, I saw that. The main problem for us is that we need on-disk encryption
due to regulatory issues and the encrypted SQLCipher build was causing us no
end of grief. And we don't really _need_ a SQL database, it's just what the
developers of our iOS app were doing and--it being our first Android mobile
app--we thought it was a good idea to do the same. V2 is ripping that out both
for perf and for code-sanity reasons.

------
dfischer
Typography is hard to read.

~~~
XorNot
Seconded: that font at that size strains the eyes a fair bit.

~~~
Amadou
It looks great with javascript disabled, maybe the font in the examples was a
mite small.

I turned on javascript and it looked a lot like a man-page.

------
i_have_to_speak
Cute website. Some random thoughts:

Concurrency:

\- No mention of it. There appear to be spin locks in the source. No multi-
threaded tests.

Stability and data safety:

\- Github has 2 days of history, and 4kLoC of test code. Why should I trust my
data to you?

"high load environment":

\- So what exactly does it do in a "high load environment"? How do you define
"high load" in the first place? CPU load? I/O load from other processes? What
shortcomings of the competition under a "high load environment" are you trying
to overcome?

Backup:

\- How do I do hot backup?

Benchmark:

\- LevelDB is not a fair comparison as it offers additional non-trivial
functionality (snapshots) that cannot be built up on top of Sophia. LevelDB
APIs are also safe for concurrent use, which adds overhead. Kyoto Cabinet
would have been more suitable as a peer to benchmark with.

\- 3 million records with 16-byte keys and 100-byte values is not really an
interesting benchmark dataset.

\- Iteration over a static database is not interesting, either. Is there any
alternative other than locking an entire mutating database for the duration of
iteration?
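
As a rough check on the dataset-size remark above, the raw payload of that benchmark can be computed directly. The helper name is made up for illustration, and the figure ignores any per-record or on-disk overhead the library adds.

```python
def raw_dataset_bytes(records, key_size, value_size):
    """Raw payload size in bytes, ignoring any storage overhead."""
    return records * (key_size + value_size)


# 3 million records, 16-byte keys, 100-byte values.
size = raw_dataset_bytes(3_000_000, 16, 100)
print(size, "bytes")
```

That is 348,000,000 bytes of payload, which fits comfortably in the page cache of most machines, so such a run mostly measures in-memory behavior.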

------
pwpwp
I, for one, wouldn't trust my data to a library by somebody who uses the same
text decoration for hyperlinks as for plain text.

~~~
msvan
Computer scientists aren't known for their design chops. I'd take that as a
sign of authenticity.

~~~
FraaJad
REAL computer scientists do not use CSS. Bonus points if they use FONT tags
(in caps of course!).

------
oscargrouch
Really guys, can you give more constructive comments, or at least ones not
based on bullshit assumptions? If not, just shut up.

This is a non-trivial effort, and all people do is complain about the font
face or whether the punctuation was right?

First, in the benchmarks it just crushes leveldb; this is already by itself a
great achievement. Can you refute the benchmarks? Did you run one yourself
with a different configuration? No?

Second, if you are not a database expert and can't create proper critiques
(constructive or not), just keep it to yourself. I wonder how so many people
come up with all of these conclusions so fast, without a proper look at the
source code and a reasonable amount of time to know what they are
talking about.

It's very hard to create things like this, but very easy to criticize without
any background. Don't forget about it.

If you have something to say about a small thing that does not have a direct
relation to the product or thing itself, and there's already one comment about
it, that's enough! Do not spam by answering it or creating new comments about
it; this is just so rude and disrespectful.

Really, things are getting creepy on HN, and it's not only in this thread.

~~~
VikingCoder
You: Making the documentation readable and easy to parse adds no value to
projects! Everyone who disagrees should shut up.

If I'm being kind to you, HN commenters (myself included) should do a better
job of commenting politely, and spend more effort making sure their criticism
comes off as constructive rather than just whining and aggressive... ...but I
think you make it sound like criticism of anything outside of the source code
itself is creepy, rude, and disrespectful.

------
Negitivefrags
It says that the benchmark source is on github, but I can't find it.

It doesn't appear to be in their primary repo.

I would like to try and do my own test against another embedded data store
like Berkeley DB but I want to know more about the conditions on the test. How
many threads were used, that kind of thing.

~~~
the1
[https://github.com/pmwkaa/sophia_benchmark](https://github.com/pmwkaa/sophia_benchmark)

------
Goopplesoft
Very cool. As a suggestion, increase the link size under the main title; it
wasn't clear to me what the next step was at first after reading the
introduction text.

------
laichzeit0
In case any of the devs read this:

1\. Can multiple processes use the same database concurrently? (Separate
address space processes, not fork()'d)

2\. Have you tested this with uClibc/cross compiler? (I would like to use it on
a MIPS embedded router)

The reason I ask this is because I recently had the displeasure of having to
hack a non-volatile RAM library to work with shared memory / thread safe and
something small like this would be a perfect replacement with a lot less pain.

------
dkhenry
Was this man's computer use being charged by the key stroke? I mean I
understand using a few abbreviations here and there, but i.c ? At least name
your files descriptively.

I would avoid using this for realzies if only for the fact that if something
broke, trying to fix it in that code base would be prohibitive.

------
hosay123
Looks nice, but note this doesn't appear to support consistent reads (unlike
LevelDB snapshots)

------
conductor
I love the simplicity of the site and the C code. I will definitely use it,
thank you.

------
MichaelGG
I've got a need for something like this, but would like to have the keys and
values delta encoded to achieve simple, yet effective, compression.
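
One simple form of this idea is prefix (delta) encoding of sorted keys: each key is stored as the number of leading bytes it shares with the previous key plus the remaining suffix. The sketch below covers keys only, not values, and the function names are illustrative; it is not how Sophia or any particular library stores its data.

```python
def delta_encode(sorted_keys):
    """Encode sorted byte keys as (shared_prefix_len, suffix) pairs."""
    encoded = []
    prev = b""
    for key in sorted_keys:
        limit = min(len(prev), len(key))
        shared = 0
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def delta_decode(encoded):
    """Rebuild the original keys from (shared_prefix_len, suffix) pairs."""
    keys = []
    prev = b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


keys = [b"user:1000", b"user:1001", b"user:1002"]
packed = delta_encode(keys)
```

The gain depends on how much adjacent keys overlap, and decoding a key requires walking from the start of the run, so real formats usually restart the chain at regular intervals.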

~~~
leif
How about [http://github.com/Tokutek/ft-index](http://github.com/Tokutek/ft-index)?
It's embeddable (BDB-like API), has compression built in, and is a
similar data structure to this but with more mature features like
transactions.

------
acron0
Had a quick, 30 min bash at a win32 port using msinttypes and pthread-win32
but no luck yet :( Would love to see one though...

------
ksec
Something for Mozilla to consider using inside Firefox in place of LevelDB (if
that ever landed)

------
buster
I'd be far more interested in benchmarks versus BDB (and maybe even sqlite).

------
maaku
Snapshots? I couldn't find it in the documentation..

------
jgalt212
any word on support for unicode keys?

~~~
dlundqvist
Keys are arbitrary data (you pass in pointer to data and length in bytes), so
you can use anything that makes sense for you as keys.
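
If keys are just bytes plus a length, Unicode support comes down to how the caller encodes strings before handing them over. A minimal sketch, assuming UTF-8 and NFC normalization are the chosen conventions (`to_key` is a hypothetical helper, not a Sophia function):

```python
import unicodedata


def to_key(text, form="NFC"):
    """Normalize `text` and encode it as UTF-8 bytes for use as a key."""
    return unicodedata.normalize(form, text).encode("utf-8")


# The same visible string can have two different byte representations.
composed = "caf\u00e9"      # 'e with acute' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent
raw_differs = composed.encode("utf-8") != decomposed.encode("utf-8")
normalized_matches = to_key(composed) == to_key(decomposed)

# Byte-wise ordering of UTF-8 keys follows code point ordering.
names = ["zebra", "Ångström", "日本語", "apple"]
byte_order = sorted(to_key(n) for n in names)
```

Without a normalization step, a byte-oriented store would treat the two spellings of "café" as different keys. Byte order also is not a locale-aware collation, so iteration order will not match dictionary order in most languages.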

------
luisbebop
Awesome work, congratulations!

