A common complaint against relational databases is that people put "too much" logic in the database. (I clearly don't agree, being someone who uses stored procedures and custom extensions ;P.)
But if that layer is primarily used for providing, controlling, and optimizing access to the data, I can see the appeal. And in that case? Being able to write the procedural parts in a language that's less terrible than typical SQL extensions would be really nice.
Or, what use case is your comment about?
- some access control (e.g. sessions)
- compute some fields (e.g. an "age" field derived from "birthday"; see the sketch after this list)
- combine and filter data before transferring it to the client (e.g. for graph traversals)
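For the derived-field case, a view keeps the computation inside the database with no procedural code at all. A minimal SQL sketch (Postgres syntax; the people table and its columns are invented for illustration):

    CREATE TABLE people (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        birthday DATE NOT NULL
    );

    -- Expose "age" as a derived field so clients never compute it themselves.
    CREATE VIEW people_with_age AS
    SELECT id,
           name,
           birthday,
           date_part('year', age(current_date, birthday)) AS age
    FROM people;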
"Wouldn't it be cool to have a multipurpose database which we would be able to query with a language, but not SQL, because SQL sucks, for some reason.".
To put it differently, what does ArangoDB, MongoDB, or whateverDB bring that relational databases didn't bring 30 years ago?
For instance, graphs are totally doable with a traditional RDBMS, but it is painful. You end up joining a table against itself (or via an edges table) multiple times, or alternatively bouncing many queries off the table as you iterate the graph. One common type of NoSQL db is the graph database, which is designed with graphs in mind, so you don't even have to think about this access pattern. It is nice.
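To make the pain concrete, here is roughly what a reachability query looks like in plain SQL with a recursive CTE (the edges table and start node are made up):

    -- Hypothetical edges table: one row per directed edge.
    CREATE TABLE edges (
        src INTEGER NOT NULL,
        dst INTEGER NOT NULL
    );

    -- Every node reachable from node 1. UNION (not UNION ALL)
    -- deduplicates rows, which also keeps cycles from looping forever.
    WITH RECURSIVE reachable(node) AS (
        SELECT 1
        UNION
        SELECT e.dst
        FROM edges e
        JOIN reachable r ON e.src = r.node
    )
    SELECT node FROM reachable;

It works, but every hop is another join against the same table, which is exactly the awkwardness a graph database hides from you.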
Another case that you can handle with a traditional RDBMS, but which is annoying, is loose user-defined fields, such as "tags", where you have to create a tags table and a join table to make it work, with a lot of potential inefficiency (even with indexes). Or even worse, user-defined attributes: lots of custom table creation per user, or big joins against a star topology to do it properly. (Or, if not, you end up with something that merely looks like an SQL database to a bunch of frontend code.)
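The tags setup being described, sketched in SQL (all names invented):

    CREATE TABLE items (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );

    CREATE TABLE tags (
        id   INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL
    );

    -- The join table needed just to say "item X has tag Y".
    CREATE TABLE item_tags (
        item_id INTEGER NOT NULL REFERENCES items(id),
        tag_id  INTEGER NOT NULL REFERENCES tags(id),
        PRIMARY KEY (item_id, tag_id)
    );

    -- "All items tagged 'urgent'" already costs two joins.
    SELECT i.*
    FROM items i
    JOIN item_tags it ON it.item_id = i.id
    JOIN tags t ON t.id = it.tag_id
    WHERE t.name = 'urgent';

Three tables and two joins for what a document store expresses as a single array field on the item.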
Of course, other times you'll find yourself doing something in a NoSQL database that is effectively a bunch of lookups against a table followed by combining the results (at which point switching to an RDBMS is the solution)...
I guess what I'm saying is that lately the data store is being looked at as a component more like a library than a subsystem. I'm not sure if this is good or bad, but it certainly has helped a few cases I've dealt with nicely - ripping out a horribly complex data layer and replacing it with a NoSQL solution where the data model is shaped like my data.
Basically it's a matter of the right tool for the job.
Relational databases partly offer solutions for this, too. But in a relational database, these things are (often clumsy) extensions and not well supported.
It means thinking about what you're going to do, and how, before you start coding (design up front). It means creating a throw-away proof of concept before you start coding. But it is flexible, extensible, and changes to the schema, as it were, do not impact upstream dependencies.
The point is not "what is possible", but choosing the best tool for the job.
SQL Express runs as a single instance on a client, so you get all three - consistency, availability and partition tolerance. When running something else on more than one node you can choose two of those three. Availability and consistency are usually chosen because of business drivers. If partition tolerance is required (I've yet to encounter a scenario that makes a compelling case for it) then eventual consistency is the price.
There are databases out there that will suit any combination of the three. Unfortunately the equation is somewhat more complex, because databases that offer consistency and partition tolerance (for example) don't typically offer JOIN-like functionality.
Everything's a trade-off. Other considerations are tried and tested v. bleeding edge; painful v. painless; ideal v. affordable; and so forth.
I'm not saying there aren't scenarios where it makes sense (Google, for example) - just that I've not encountered one. I do run into many people who want Mongo and, not understanding what they're asking for, would be better off with a relational database.
Indexes can be created on fields within JSON documents for faster access.
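For comparison, the relational world has grown the same trick: Postgres can index an expression over a JSON column (a sketch; the docs table and the email field are invented):

    CREATE TABLE docs (
        id   SERIAL PRIMARY KEY,
        body JSONB NOT NULL
    );

    -- Index one field inside the JSON document.
    CREATE INDEX docs_email_idx ON docs ((body->>'email'));

    -- This lookup can now use the index instead of scanning every document.
    SELECT * FROM docs WHERE body->>'email' = 'a@example.com';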
Online ALTER TABLE doesn't take any long-running lock, so it can run in the background without affecting your queries.
That doesn't mean one is better than the other; they're two alternative ways of achieving things.
Sometimes a document collection is just a whole lot easier to reason about.
So it's not ACID by default, and it's practically not usable with immediate sync turned on (huge number of seeks due to the use of mmap), just like Mongo.
In ArangoDB you turn on immediate synchronisation at the per-collection level, or use it for specific operations only. So it's up to you how you want to use it.
This gives the database user a fine-grained choice.
I remember using some relational databases in the past where we turned immediate synchronisation off as well to get more throughput. So it's probably not entirely uncommon to do this, but I understand the expectation of relational users that everything should be fully durable by default.
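For what it's worth, the knob I remember on the relational side looks like this in Postgres (a sketch of Postgres settings, not ArangoDB's API):

    -- Commits return before the WAL is flushed to disk. A crash may lose
    -- the last few transactions, but the database itself stays consistent.
    SET synchronous_commit = off;

fsync itself stays on, so this trades "might lose a moment of work" for throughput, not "might corrupt the database".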
Memory-mapped files don't have anything to do with ACID. It's just a detail of the internal organisation of buffers. You can have full durability with memory-mapped files. You just have to use msync instead of fsync/sync.
The problem with just mmapping files is that, to sync, you have to do a bunch of random writes. To commit two transactions, you have to jump to two places on the disk and do two writes. You can defer them, but then your transaction commit latency goes up dramatically. So the user is caught between a rock and a hard place: uncertain durability for extended periods of time, or long commit latency.
Compare that to a system based on a Write-Ahead Log (WAL). The log is 100% sequential (and often preallocated in large chunks), and a transaction is durable if the log is flushed up to some certain point. All transactions go into the same log, so under high concurrency, one flush to disk might commit several transactions. And even if you flush for each transaction, at least you don't have to jump around on disk (and, if using a controller with battery-backed cache to reduce latency, you can make do with a fairly small cache).
The writes to the main data area can be deferred for a long time (30 minutes might be normal), and syncing those is called a checkpoint. You can spread the checkpoint out over time (it's a continuous process, really) so that they don't cause transaction latency spikes. Deferring the writes for so long allows the writes to be scheduled more efficiently without sacrificing durability at all.
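In Postgres terms, that deferral is what the checkpoint settings control; a hedged sketch of the knobs (names as in current Postgres):

    -- Checkpoint at most every 30 minutes...
    ALTER SYSTEM SET checkpoint_timeout = '30min';
    -- ...and spread each checkpoint's writes over ~90% of that interval,
    -- so dirty pages trickle out instead of arriving as one I/O spike.
    ALTER SYSTEM SET checkpoint_completion_target = 0.9;
    SELECT pg_reload_conf();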
On top of that, if you are OK with small windows of time before commits become durable, Postgres allows you to choose on a per-transaction basis not to wait for the WAL flush before returning to the client. If you crash, you are still guaranteed to be consistent, and if a normal transaction comes along, it will of course force a WAL flush. You can control the window of time before Postgres will force a WAL flush, where 200 milliseconds might be normal. It can be a small number and still gain you a lot, because it's just writing a sequential log, so there's no need to defer it for multiple seconds.
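That per-transaction choice looks like this in Postgres (the events table is hypothetical):

    BEGIN;
    -- Only this transaction skips waiting for the WAL flush.
    SET LOCAL synchronous_commit = off;
    INSERT INTO events (payload) VALUES ('low-value telemetry');
    COMMIT;  -- returns immediately; the background WAL writer flushes soon after

The window before that background flush is governed by wal_writer_delay, whose default is the 200 milliseconds mentioned above.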
In other words: Mmapping files gives you a choice between very short commit latency and long periods of uncertain durability; or long commit latency. WAL gives you a choice between very short commit latency and short periods of uncertain durability; or short commit latency.
I understand ArangoDB is new. The description sounds interesting, and I like some things about the goals. But I think it's way off the mark to tout the durability as offering a nice trade-off, when a much better method (at least for OLTP) has been known for two decades.
The basic idea was introduced by the ARIES paper in 1992. Couldn't find a link to a PDF, but it is a well-known paper.
Unlike Mongo, Couch has a sophisticated append-only B-tree format for storing data that is almost impossible to corrupt.
With respect to corruption, ArangoDB behaves similarly: it uses an append-only log file with CRC checksums. So, if the last bit of storage contains corrupt data, it is discarded.
Is that even a real sentence?
What you might find in earlier databases, but not completely in others today, is (my personal hitlist :-) :
* Graphs as first-class citizens! (try viewing them in the web GUI :-)
* The incredible number of index types, even skiplist and n-gram!
* Multi-core ready
* Durability tuning (already mentioned by Jan)
* AQL covering KV, JSON and Graphs! (Martin Fowler was quite sceptical that this model integration could work...)
* And an MVCC implementation that makes it SSD-ready.
* Capped Collections
* Availability on tons of OS versions, such as Windows, iOS, all the UNIXes, and even Travis-CI (how cool is that?!)
Try it. Might be fun in production compared to other famed NoSQL DBs.... (at least to me)
I chuckled at the absence of the most widely deployed language on the planet.
I dare say that there is a driver for Java - didn't look, because after browsing through a reasonable portion of their site, I still couldn't get a simple explanation of what this DB allegedly does and doesn't do.
This seems to be a MongoDB clone with some extra features added on to make it a bit closer to a relational DB, I guess. Looks interesting, but it likely suffers from the same problems MongoDB suffers from (data safety, scaling difficulties, etc.)
EDIT: Have to say, the idea of a MongoDB-style database with graph operations built in is pretty attractive for small network-oriented problems...
Gives me confidence to trust them with my data.
Failing that, I'll just keep using RethinkDB for this sort of thing.
Well known and understood. Even well documented by 10gen themselves; nothing to do with the bling you added to Arango.
It doesn't seem to support interactive transactions. That means only simple batch reads & writes, no complex transactions. It seems to sit in between CAS and real generic transactions. Doesn't seem to be very useful.
Anyhow, good job so far to ArangoDB team.
Looking forward to learning more when it's online!
Last but not least, it is open-source!