

ArangoDB - majidazimi
http://arangodb.org/
An open-source database with a flexible data model for documents, graphs, and key-values. Build high performance applications using a convenient sql-like query language or JavaScript extensions.
======
saurik
> As a relational database user we are used to treat the database as stupid
> and mainly use it to save and retrieve data. ArangoDB lets you extend the
> database using Javascript (production ready) and Mruby (experimental).

?!? A common complaint against relational database people is that they put "too
much" logic in the database. (I clearly don't agree, using stored procedures
and custom extensions ;P.)

~~~
Pxtl
Personally I loathe stored procedures because often they include a lot of
logic that _shouldn't_ be on the database and they also generally involve SQL
extensions that are pretty terrible for the general-purpose computing you see
them used for.

But if that layer is primarily used for providing, controlling, and optimizing
access to the data I can see the appeal. And in that case? Being able to write
the procedural parts in a language that's less-terrible than typical SQL
extensions would be really nice.

~~~
jeffdavis
"Being able to write the procedural parts in a language that's less-terrible
than typical SQL extensions would be really nice."

Postgres allows writing functions in many languages, including C, python, and
javascript.

------
shin_lao
I have a feeling, tell me if you share it.

"Wouldn't it be cool to have a multipurpose database which we would be able to
query with a language, but not SQL, because SQL sucks, for some reason.".

Put it differently, what does ArangoDB, MongoDB, whateverDB bring that
relational databases didn't bring 30 years ago?

~~~
jsteemann
To name just a few:

- being relaxed about schemas: no more long-running ALTER TABLE commands, no
  more up-front schema definitions that waste time when doing
  proof-of-concepts etc.
- being friendly to variable and hierarchical data: no more
  entity-attribute-value patterns and necessity to store JSON etc. as BLOBs
- integration of scripting languages such as JavaScript, so you can have one
  language for the full stack if you want
- embracing web standards (HTTP, JSON)
- no object-relational mismatch (there are no relations), as you can more
  easily map a single programming language object to a document

Relational databases partly offer solutions for this, too. But in a relational
database, these things are (often clumsy) extensions and not well supported.
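
To make the entity-attribute-value point concrete, here is a minimal sketch in Python. The `user` record and the `to_eav` helper are made up for illustration and belong to no particular database. It shows the same nested object stored once as flattened EAV rows and once as a single JSON document.

```python
import json

# Hypothetical entity with variable, nested attributes (illustration only).
user = {
    "name": "Ada",
    "emails": ["ada@example.org"],
    "address": {"city": "Cologne", "zip": "50667"},
}


def to_eav(entity_id, obj, prefix=""):
    """Flatten a nested object into (entity, attribute, value) rows."""
    rows = []
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects become dotted attribute paths.
            rows.extend(to_eav(entity_id, value, prefix=path + "."))
        elif isinstance(value, list):
            # List items become indexed attribute paths.
            for i, item in enumerate(value):
                rows.append((entity_id, f"{path}[{i}]", item))
        else:
            rows.append((entity_id, path, value))
    return rows


# Relational-style storage: one row per attribute, structure encoded in names.
eav_rows = to_eav(1, user)

# Document-style storage: the object maps to one document and back.
document = json.dumps(user)
restored = json.loads(document)
```

The EAV form needs one row per leaf value and has to be reassembled on read, while the document round-trips as a single unit.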

~~~
Spearchucker
I do all of what you mention using SQL Express (I don't use JSON though,
because binary serialisation is faster). Abstraction means I save my data as
documents/blobs, can still do joins, don't have to alter tables (when de-
serialised entities are self-describing), fully indexed entity content.

It means thinking about what and how you're going to do something before you
start coding (design up front). It means creating a throw-away proof of
concept before you start coding. But it is flexible, extensible, and changes
to the schema, as it were, do not impact up-stream dependencies.

~~~
nawitus
SQL Express guarantees consistency, but most NoSQL databases guarantee
availability. Although many SQL databases can be used to implement the same
functionality as a NoSQL database, that doesn't mean it's as easy. And
easiness is what matters, because if something is easier, you may spend less
time working on it and that saves money.

The term "NoSQL" database is a bit problematic, because the definition only
says that the database is not relational, but the average NoSQL database has
other differences with the average SQL database: using JSON and JavaScript,
and perhaps queries with HTTP and so on.

The point is not "what is possible", but choosing the best tool for the job.

~~~
Spearchucker
Indeed. Although I'd say most no-SQL databases guarantee partition tolerance
(I can guarantee availability using a SQL database).

SQL Express runs as a single instance on a client so you get all three -
consistency, availability _and_ partition tolerance[1]. When running something
else on more than one node you can choose two of those three. Availability and
consistency are usually chosen because of business drivers. If partition
tolerance is required (I've yet to encounter a scenario that makes a
compelling case for it [2]) then eventual consistency is the price.

There are databases out there that will suit any combination of the three.
Unfortunately the equation is somewhat more complex because databases that
offer consistency and partition tolerance (for example) don't typically offer
JOIN -like functionality.

Everything's a trade-off. Other considerations are tried and tested v.
bleeding edge; painful v. painless; ideal v. affordable; and so forth.

[1]
[http://en.wikipedia.org/wiki/CAP_theorem](http://en.wikipedia.org/wiki/CAP_theorem)

[2] I'm not saying there aren't scenarios where it makes sense (Google, for
example) - just that I've not encountered one. I do run into many people who
want Mongo and who, not understanding what they're asking for, would be better
off with a relational database.

------
vicaya
"Transactions in ArangoDB are atomic, consistent, isolated, and durable
(ACID)." "Collections consist of memory-mapped datafiles...". "by default,
ArangoDB uses the eventual way of synchronization...synchronizes data to disk
in a background thread."

So it's not ACID by default and practically not usable with immediate sync
turned on (huge amount of seeks due to use of mmap), just like mongo.

~~~
jsteemann
As in many databases, ArangoDB allows some choices regarding durability.
Immediate disk synchronisation is turned off by default in ArangoDB.
Synchronisation is then performed by a background thread, which is frequently
executing syncs. By the way, several other NoSQL databases have immediate
synching turned off by default, e.g. CouchDB, MongoDB.

In ArangoDB you turn on immediate synchronisation on a per collection level,
or use it for specific operations only. So it's up to you how you want to use
it. This gives the database user a fine-grained choice.

I remember using some relational databases in the past where we turned
immediate synchronisation off as well to get more throughput. So it's probably
not fully uncommon to do it, but I understand the expectation of relational
users that everything is fully durable by default.

Memory-mapped files don't have anything to do with ACID. It's just a detail of
the internal organisation of buffers. You can have full durability with
memory-mapped files. You just have to use msync instead of fsync/sync.
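
A minimal sketch of that last point, in Python rather than in ArangoDB's own code: `mmap.flush()` is the standard-library call that issues `msync` on Unix. The `durable_write` helper and the 4096-byte scratch file are assumptions made for the example.

```python
import mmap
import os
import tempfile


def durable_write(path, offset, payload):
    """Write payload through a memory mapping and force it to disk."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            m[offset:offset + len(payload)] = payload
            # flush() calls msync on Unix; it returns once the dirty pages
            # have been handed to the storage device.
            m.flush()


# Usage: create a pre-sized scratch file, write into it, read it back.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"\0" * 4096)

durable_write(path, 0, b"hello")

with open(path, "rb") as f:
    data = f.read(5)
os.remove(path)
```

Whether the bytes really reach the platter still depends on the disk and controller honouring the flush, which is outside what the database or this sketch can control.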

~~~
cwmma
I'm pretty sure you can yank the power cord from couchdb as soon as you get a
(positive) response and the data will be saved.

Unlike mongo, couch has a sophisticated append only btree format for storing
data, that is almost impossible to corrupt.

~~~
Xylakant
Saved (durable) and hard to corrupt are different properties of a database.
For example elasticsearch uses the lucene index format in the background. It's
write once per segment. Once a segment is written, data is safe and it's
(apart from disk corruption) impossible to corrupt, since the file is never
opened for writing again. However, segments are not written immediately after
a document is received - so when yanking the power cord right after a write to
the cluster, you'll lose data - however without any danger of corruption
since the last, partially written segment is discarded. Couchdb behaves in a
similar fashion: If the last bit of the storage file contains corrupt data, it
is discarded. I'm not absolutely certain atm about the default durability
settings in couch, so I can't say if the write happens before or after the
"ack" from the server. However, since disk controllers cheat and sometimes a
"flush" to disk doesn't actually flush, you can get data loss regardless of
the promises your database makes.

~~~
bjerun
Durability is indeed hard to achieve - as you pointed out, disk controllers
sometimes simply tell you a lie.

With respect to corruption ArangoDB behaves similarly: It uses an append-only
log file with CRC checksums. So, if the last bit of storage contains corrupt
data, it is discarded.
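
The general technique can be sketched in a few lines of Python. This is an illustration of an append-only log with per-record CRC checks, not ArangoDB's actual datafile format. The record layout (length plus CRC32 header) and the function names are assumptions.

```python
import struct
import zlib

# Assumed record layout: 4-byte payload length, 4-byte CRC32, then payload.
HEADER = struct.Struct("<II")


def append_record(log: bytearray, payload: bytes) -> None:
    """Append one record; existing bytes are never modified."""
    log.extend(HEADER.pack(len(payload), zlib.crc32(payload)))
    log.extend(payload)


def read_valid_records(log: bytes) -> list:
    """Return all records up to the first torn or corrupt one."""
    records = []
    pos = 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = bytes(log[start:start + length])
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # corrupt tail: discard it and stop reading
        records.append(payload)
        pos = start + length
    return records


log = bytearray()
for doc in (b"doc-1", b"doc-2", b"doc-3"):
    append_record(log, doc)

# Simulate a power cut in the middle of writing the last record.
torn = bytes(log[:-2])
recovered = read_valid_records(torn)
```

The last, partially written record is lost, but everything before it is read back intact, which is the "lose data without corruption" behaviour described above.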

------
nateberkopec
> "In typical applications with "complex" database operations there is often
> no clean API to the persistence layer when to or more database operations
> are executed one after each other which belong together from an
> architectural perspective."

Is that even a real sentence?

------
sedlich
> Put it differently, what does ArangoDB, MongoDB, whateverDB bring that
> relational databases didn't bring 30 years ago?

(Let's leave MongoDB out here ;-) What I really love and what the relationals
do not have are:

* Graphs as first class citizens! (try to view them in the web gui :-)
* The tight V8 & JavaScript integration (FOXX is more than cool. Hope I will be able to use it from Clojure Script)

What you might find in earlier databases but not completely in others today is
(my personal hitlist :-) :

* The incredible amount of indices with even skip and n-gram!
* MultiCore ready
* Durability tuning (already mentioned by Jan)
* AQL covering KV, JSON and Graphs! (Martin Fowler was quite sceptical that this model integration could work...)
* And a MVCC that makes it SSD ready.
* Capped Collections
* Availability on tons of OS versions such as Windows, iOS, all UNIXes and even Travis-CI (how cool is that?!)

Try it. Might be fun in production compared to other famed NoSQL DBs.... (at
least to me)

------
mikro2nd
> There are driver for all major language like Ruby, Python, PHP, JavaScript,
> and Perl.

I chuckled at the absence of the most widely deployed language on the planet.

I dare say that there _is_ a driver for Java - didn't look, because after
browsing through a reasonable portion of their site, I still couldn't get a
simple explanation of what this DB allegedly does and doesn't do.

~~~
RyanZAG
There is a Java driver and even an object mapper based off jackson -
[https://www.arangodb.org/drivers](https://www.arangodb.org/drivers)

This seems to be a mongodb clone with some extra features added on to make it
a bit closer to a relational db, I guess. Looks interesting but likely suffers
from the same problems MongoDB suffers from (data safety, scaling
difficulties, etc)

EDIT: Have to say, the idea of a mongodb database with graph operations built
in is pretty attractive for small network oriented problems...

~~~
nutate
I jumped on arango, looking for a small graphdb I could run locally on my
machine. They've since moved it to a much wider scope. Turns out I abandoned
that project idea (like most... :D), but I think Arango is sufficiently unique
to merit more investment.

------
mk3
I have tried to run ArangoDB under Node.js and had some small successes, but in
general their claim that they have drivers for many platforms is far from
reality. Also, why not release node.js binary drivers instead of pushing
people to use foxx? Simple browsable docs would also be nice instead of
chunked documents covering hell knows what. Finding the link to their query
language was a big hassle :) Their graph traversal is also still in its
infancy, as I understand it.

~~~
neunhoef
I would call
[https://www.arangodb.org/manuals/current/UserManual.html](https://www.arangodb.org/manuals/current/UserManual.html)
a browsable doc (3 clicks from the main page). And a further click fires up
the chapter about AQL...

~~~
mk3
I would prefer more or less API-style docs with short general examples
rather than a lot of unusable text.

------
coolsunglasses
I like that they're sufficiently ignorant of MongoDB's implementation to
misattribute the primary cause of excessive space usage.

Gives me confidence to trust them with my data.

~~~
neunhoef
The big advantage of ArangoDB with respect to memory/disk usage is that
despite the Schema-less-ness, the database automatically recognises common
"shapes" of the documents in a collection and thus usually does not have to
store all attribute names many times. In addition, the possibility of
transactions makes it less necessary to keep many old revisions of documents,
in comparison to for example MongoDB.
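
The idea of storing attribute names once per "shape" can be sketched like this in Python. It is a toy model written for this explanation, not ArangoDB's implementation. The `ShapedStore` class and its methods are hypothetical.

```python
class ShapedStore:
    """Toy store that keeps each distinct set of attribute names only once."""

    def __init__(self):
        self.shape_ids = {}  # tuple of attribute names -> shape id
        self.shapes = []     # shape id -> tuple of attribute names
        self.rows = []       # (shape id, tuple of values) per document

    def insert(self, doc):
        names = tuple(sorted(doc))
        shape_id = self.shape_ids.get(names)
        if shape_id is None:
            # First document with this set of attributes: register the shape.
            shape_id = len(self.shapes)
            self.shape_ids[names] = shape_id
            self.shapes.append(names)
        # Only the shape id and the values are stored per document.
        self.rows.append((shape_id, tuple(doc[n] for n in names)))
        return len(self.rows) - 1

    def get(self, row_id):
        shape_id, values = self.rows[row_id]
        return dict(zip(self.shapes[shape_id], values))


store = ShapedStore()
a = store.insert({"name": "Ada", "city": "Cologne"})
b = store.insert({"city": "Berlin", "name": "Bob"})
c = store.insert({"name": "Cy", "age": 30})
```

Documents `a` and `b` share one shape, so their attribute names are stored a single time, while `c` introduces a second shape without any schema declaration.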

~~~
coolsunglasses
I'd be more interested in a document store that was historical and didn't
mutate.

Failing that, I'll just keep using RethinkDB for this sort of thing.

------
eonil
> In ArangoDB, a transaction is always a server-side operation, and is
> executed on the server in one go, without any client interaction.

It doesn't seem to support interactive transactions. That means only simple
batch reads & writes, no complex transactions. It seems to sit in between CAS
and _real generic_ transactions. Doesn't seem very useful.

~~~
don71
No - it goes way beyond simple batches of operations. Basically you have to
write your transaction as a JavaScript program. So, you can do anything you
could do on the client-side - with the exception of waiting for another source
(i.e. user interaction). You could read a document from one collection, choose
different actions based on an attribute, and change documents in multiple
collections. I think the PHP driver uses some kind of abstraction to hide the
JavaScript from the developer.
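
The shape of such a transaction can be sketched as follows. This is written in Python with an in-memory stand-in, so it is not ArangoDB's API. `TinyStore`, `execute_transaction`, and the account data are all hypothetical. The point is only that the whole function runs in one go and can branch on what it reads.

```python
import copy


class TinyStore:
    """Hypothetical in-memory stand-in for a database server."""

    def __init__(self, collections):
        self.collections = collections

    def execute_transaction(self, action):
        # Run the whole action against a private copy; commit only if it
        # finishes without raising, otherwise keep the old state.
        working = copy.deepcopy(self.collections)
        result = action(working)
        self.collections = working
        return result


def transfer(cols):
    """Read one collection, branch on an attribute, write to two collections."""
    src = cols["accounts"]["alice"]
    if src["balance"] < 50:
        raise ValueError("insufficient funds")
    src["balance"] -= 50
    cols["accounts"]["bob"]["balance"] += 50
    cols["audit"].append({"from": "alice", "to": "bob", "amount": 50})
    return src["balance"]


store = TinyStore({
    "accounts": {"alice": {"balance": 70}, "bob": {"balance": 0}},
    "audit": [],
})

first = store.execute_transaction(transfer)

try:
    store.execute_transaction(transfer)
    second_failed = False
except ValueError:
    second_failed = True
```

The second call aborts partway through its logic and leaves both collections exactly as the first call committed them.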

------
sgarg26
This will be even cooler if they add 'turn-key' scaling. Their scaling
approach is still a work in progress.

[http://www.arangodb.org/2013/05/22/replication-and-
sharding-...](http://www.arangodb.org/2013/05/22/replication-and-sharding-in-
arangodb)

Anyhow, good job so far to ArangoDB team.

~~~
neunhoef
Thanks, and: you are right, scaling by sharding is important, and that is why
we have made this our top priority for the coming three months.

~~~
sgarg26
Good to hear. I'd love to consider ArangoDB for an analytics project at that
time.

------
poseid
By the way, if someone gives this a try on a VM, I wrote a blog post about it
here: [http://thinkingonthinking.com/A-Data-Platform-
in-15-minutes/](http://thinkingonthinking.com/A-Data-Platform-in-15-minutes/)

------
basicallydan
I'm really excited about the look of this. Being a big fan of Mongo et al. for
the structurelessness I also sometimes miss the graph-like structure that you
can easily create with SQL. Arango looks cool. I shall try it :)

~~~
bjerun
There is a screencast by McHacki about the graph explorer he wrote:
[https://www.arangodb.org/2013/11/29/visualize-graphs-
screenc...](https://www.arangodb.org/2013/11/29/visualize-graphs-screencast)

~~~
neunhoef
This is awesome. I like the way the vertices are automatically moved in a way
such that they do not overlap very much and that the edges are easily visible.
It is also good that "similar" vertices are automatically collapsed into a
"multi-vertex". I think this is very useful functionality to inspect a big
graph locally.

------
obruehl
What I particularly like is the functionality to process graphs and explore
them interactively in the browser. This has been added in some recent version,
and it makes working with graphs a lot easier than before.

------
tjbiddle
Quickstart link doesn't exist:
[http://www.arangodb.org/quickstart](http://www.arangodb.org/quickstart)

Looking forward to learning more when it's online!

~~~
bjerun
[https://www.arangodb.org/quickstart](https://www.arangodb.org/quickstart)
works. There is also an online tutorial
[https://www.arangodb.org/tryitout](https://www.arangodb.org/tryitout)

------
poseid
What I like about ArangoDB is the speed of development, as well as its native
support for building RESTful interfaces.

Last but not least, it is open-source!

~~~
obruehl
Building a REST interface with access to your data is easy thanks to
ArangoDB's Foxx framework. You can implement all your backend code in
JavaScript and upload it to the server. Thus you can do any sort of preprocessing
on the server and make that available to frontends. And it's easy to integrate
with a front-end because it's all about passing JSON around via HTTP.

------
etanazir
And the tree v. table debate continues...

------
saintfiends
I wonder how it compares with RethinkDB?

~~~
poseid
Also, the query language of RethinkDB looks interesting:
[http://www.rethinkdb.com/docs/rethinkdb-vs-
mongodb/](http://www.rethinkdb.com/docs/rethinkdb-vs-mongodb/) \- compare with
[https://www.arangodb.org/manuals/current/Aql.html](https://www.arangodb.org/manuals/current/Aql.html)

~~~
saintfiends
Wrong link for ReQL, for anyone checking. ReQL reference is here:
[http://rethinkdb.com/api/javascript/](http://rethinkdb.com/api/javascript/)

------
aerolite
Is this named after Juan Arango?

~~~
whereismypw
Arango is a special sort of avocado - but Moenchengladbach (where Juan Arango
currently is under contract) is next to Cologne, ArangoDB's headquarters :-)

