

What Happened When We Entered Every Music CD in Existence into RavenDB - stagga_lee
http://www.dzone.com/articles/what-happened-when-we-loaded
A total of 3.1 million disks and 43 million tracks. And we had some performance problems. But we got over them, and I am proud to give you the results . . .
======
hhrvoje
Started new project with RavenDb, and some experience:

\- no need for ORM tools. Finally. I was a little tired of NHibernate, EF, and
their complex mappings. They are great tools, but not for every project and
scenario.

\- fast dev: simple or almost no configuration. Just point your code (raven
session) to the server URL

\- safe by default: most LINQ queries work just fine, with dynamically created
indexes

\- hosting options: start the raven .exe, host it in a web app from a
subfolder/subdomain, or use the embedded db (it took me some time to configure
web app hosting, but there should be some good docs by now, I hope, because
it's simple)

\- nice management tool for testing queries, inspecting docs, etc.

\- had a problem reading docs in Silverlight that are stored in a web app,
because of some problem with the JSON serializer. Had to give up on raven for
that project

\- still struggling with creating Map/Reduce indexes in practice; there aren't
many samples

\- no sample apps

\- hard to make the mental shift away from relational thinking. It should be
closer to the object/domain model, but we are corrupted by years and years of
relational data and joins. Most of our domain object models are really not
object-oriented

So, I'll definitely try it on some future non-critical project. There are
scenarios for which raven, or any other doc db, is great, and that doesn't
have to be a "100-db/web-server-installation web startup"; I think it's just
fine even for small projects.
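The Map/Reduce indexes mentioned above are easier to reason about outside RavenDB's C# LINQ syntax. As a rough conceptual sketch only (plain Python with hypothetical document shapes, not the RavenDB API): the map step emits a key/value pair per document, and the reduce step aggregates by key - here, counting tracks per artist.

```python
from collections import defaultdict

# Hypothetical track documents, loosely the shape freedb metadata might take.
docs = [
    {"artist": "Adele", "title": "Hello"},
    {"artist": "Adele", "title": "Skyfall"},
    {"artist": "Daft Punk", "title": "One More Time"},
]

def map_step(doc):
    # Emit one (key, value) pair per document.
    yield (doc["artist"], 1)

def reduce_step(pairs):
    # Aggregate all emitted values by key.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

pairs = [pair for doc in docs for pair in map_step(doc)]
print(reduce_step(pairs))  # {'Adele': 2, 'Daft Punk': 1}
```

In RavenDB the same two steps are declared as a static index and kept up to date incrementally; this sketch only shows the shape of the computation.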

------
jmount
Isn't "3.1 million disks and 43 million tracks" only a small amount of data,
and "0.1 seconds" slow (assuming this is only metadata, not the music itself),
at current machine scales? The article mentions use of a 300 GB 7200 RPM disk
drive, but I would assume (I couldn't find it in the article) they would easily
have more than 8 GB of RAM - more than enough to hold everything plus indexing
structures.
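The claim that everything fits in RAM can be sanity-checked with back-of-envelope arithmetic. The per-record sizes below are my guesses, not figures from the article; with roughly 0.5 KB of metadata per disc and 100 bytes per track, the whole dataset lands comfortably under 8 GB:

```python
discs = 3_100_000
tracks = 43_000_000

# Assumed average record sizes -- guesses, not figures from the article.
bytes_per_disc = 512
bytes_per_track = 100

total_bytes = discs * bytes_per_disc + tracks * bytes_per_track
total_gb = total_bytes / 1024**3
print(f"~{total_gb:.1f} GB of raw metadata")  # ~5.5 GB
```

Even with generous indexing overhead on top, that supports the commenter's point that this is an in-memory-sized workload on current hardware.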

~~~
endersshadow
As somebody who works in data warehousing, this strikes me as a very small
data set. Having a full-text index and returning a simple search on "Adele" in
0.8 seconds over only 3.1 million rows (or even 43 million) is not
particularly impressive.

Was this supposed to be something that surprised me? I regularly deal with
billions of rows of very wide data, so I admit that my sense of scale is a
little...skewed.

~~~
jbigelow76
Maybe it's because RavenDB is a document database as opposed to an OLAP
database (I'm assuming that's what you refer to when you say billions of rows
with wide data and not another NoSQL like Mongo). Indexes and data arrangement
between the two types are not an apples to apples comparison.

~~~
endersshadow
I was mostly referring to a relational database, but I do a lot of work with
multidimensional databases (mostly SQL Server Analysis Services). You don't
typically do full-text searching on a cube. In warehousing, you often get wide
tables because they're optimized for reads, not writes, so they're typically
in second normal form (2NF) rather than the third normal form (3NF) that most
people are familiar with.

------
justin_vanw
Wait, is this supposed to be good performance? This is terrible performance.
Those query times are just awful, unless it's a completely cold index.

I've seen a variety of tools (mysql, postgresql, solr, sqlite) load and query
data far, far faster than this, even on my laptop: 100x faster than this on
harder data sets for query time, using basically out-of-the-box solr or
postgresql full-text search. The load times are also not really impressive at
all. (I just searched for "nintendo wii", sans quotes, on my development solr
instance with a 35 GB index; I didn't prime this query: 44ms. Follow that up
with a query for nintendo: 1ms; nintendo wii: 1ms. On a laptop.)

~~~
jondot
Relevant to your comment as well as others': RavenDB uses ESENT as its backend:
[https://github.com/ravendb/ravendb/tree/master/Raven.Storage...](https://github.com/ravendb/ravendb/tree/master/Raven.Storage.Esent)
<http://en.wikipedia.org/wiki/Extensible_Storage_Engine>

[http://trycatchfail.com/blog/post/Alternatives-to-Relational-DBs-ESENT.aspx](http://trycatchfail.com/blog/post/Alternatives-to-Relational-DBs-ESENT.aspx)

Notice "DivanDB" in the blog post :).

so, there it is.

~~~
jbigelow76
Sweet baby Jesus, RavenDB is Microsoft Access with JSON on top! I'm all kinds
of conflicted now :)

~~~
jeffesp
There are two MS "JET" database implementations: one that runs Access, and
then this one. This one is commonly referred to as JET Blue; the Access one is
called JET Red. This is outlined in the Wikipedia article linked in the
parent.

------
robomartin
OK, I have to ask. Where does one get a database containing every CD in
existence?

~~~
mvanga
The earlier post he linked to has this information:

<http://www.freedb.org/en/download__database.10.html>

~~~
robomartin
Nice, thanks.

------
tiernano
been playing with RavenDB and throwing twitter data at it... seems very
stable, taking about 30-50 tweets a second on a VM... think the bottleneck is
CPU (hovering around 80%)... the VM only has one core dedicated to it
currently...

~~~
pestaa
My guess is that we're not talking about 7 KB of data (50×140) per second, but
30-50 signed, stamped, delivered transactions, right?

~~~
tiernano
yea, it's more than 140 bytes! the average "tweet" object is about 3k, and
includes a lot of metadata about the user, the tweet, location, etc... It's
direct from the Twitter stream. think there's a bit of overhead in actually
parsing the JSON into a CLR object... since RavenDB actually knows about JSON,
it may be possible to just hand it the data and see what happens...
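The write rate under discussion is tiny in bandwidth terms, which supports the CPU-bound (parsing) theory over an I/O bottleneck. A quick check using the commenter's own figures (~3 KB per tweet object, 30-50 tweets per second):

```python
avg_tweet_bytes = 3 * 1024    # ~3 KB per tweet object, per the comment
low_rate, high_rate = 30, 50  # tweets per second

low_kb_s = low_rate * avg_tweet_bytes / 1024
high_kb_s = high_rate * avg_tweet_bytes / 1024
print(f"{low_kb_s:.0f}-{high_kb_s:.0f} KB/s")  # 90-150 KB/s
```

Even a 7200 RPM disk can sustain orders of magnitude more sequential write throughput than that, so the 80% CPU figure points at JSON-to-CLR deserialization rather than storage.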

------
chris_wot
What were the performance issues, and how did you fix them?

~~~
sp332
The "previous post" link mentions some of them:
[http://ayende.com/blog/154401/ravendb-amp-freedb-an-optimization-opportunity?key=707da286f3004472918a87aed724ee2c](http://ayende.com/blog/154401/ravendb-amp-freedb-an-optimization-opportunity?key=707da286f3004472918a87aed724ee2c)

------
fleitz
Tests were run on the same machine, and the database HD was a single 300 GB
7200 RPM drive. Was the drive partitioned such that the

What about running on 24-48 SSDs with two BBWC controllers w/ 128 GB of RAM?

I've never really been impressed by database testing on these small,
IO-constrained workloads. Also, what database is it being tested against, and
is the pKey an int or a UUID?

~~~
smiler
You clearly didn't even bother to read the article if you are asking what
database was being used.

