I would really recommend that the founders of the company invest in marketing; it's really important for developers to have something that speaks to them.
My point is that maybe the issue here isn't the performance or features of the database, but rather the marketing that prevents it from finding product-market fit.
"A native multi-model database from the ground up, supporting key/value, document and graph models. You can model your data in a very flexible way."
Which seems like it could actually justify having its own query language. The built-in search is also a nice touch.
You get 1000+ results.
So you filter by avg.star-rating > 4.0
Still 500+ results.
Products with just one 5-star rating end up in front of the one with 300 reviews and a 4.8 average. Annoying.
What I really want:
I would like to filter for products that have at least 5 (relatively long) reviews, an average rating of at least 4.0, and at least 2 of these reviews mentioning the use case for which I would like to use this product. Maybe I only want verified purchases to be counted, or only the reviews of friends and friends of friends...
Using a native multi-model approach you can do both: simply retrieve all category X products ranked by product rating (limit 50/page), or perform advanced lookups - without having to synchronize data from a document or relational model with an additional graph or search engine.
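As a sketch, the simple listing could look like this in AQL (the collection and attribute names `products`, `category` and `avgRating` are illustrative assumptions):

```aql
// Page 1 of category X, best-rated first.
FOR p IN products
  FILTER p.category == "X"
  SORT p.avgRating DESC
  LIMIT 0, 50   // offset, count
  RETURN p
```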
Combining full-text search with scorers, graph traversals and/or join operations, you could write an ad-hoc AQL query to get the most relevant products & reviews in a single request.
Multi-model provides choice. In data modeling and querying.
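For example, a combined query might look like the following sketch. It assumes a `reviews` collection, an ArangoSearch view `reviewSearch` indexing the review text, and a `product` attribute on each review linking back to the product; all names and the search phrase are made up for illustration:

```aql
// Products of category X with at least 2 reviews mentioning the use case.
FOR p IN products
  FILTER p.category == "X" AND p.avgRating >= 4.0
  LET hits = (
    FOR r IN reviewSearch
      SEARCH ANALYZER(PHRASE(r.comment, "noise cancelling"), "text_en")
      FILTER r.product == p._key
      RETURN r
  )
  FILTER LENGTH(hits) >= 2
  SORT p.avgRating DESC
  LIMIT 50
  RETURN { product: p, reviews: hits }
```

The same database answers both the paged listing and this ad-hoc lookup, which is the point about not having to synchronize a separate search engine.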
That's easy enough to solve with a Bayesian average. The problem is that many developers and product managers know little or nothing about stats.
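For the record, a Bayesian average can be computed directly in the sort key. This sketch again assumes illustrative collection/attribute names (`products`, `avgRating`, `ratingCount`), and the prior constants are placeholders you would tune:

```aql
// Bayesian average: pull each product's raw average toward an assumed
// site-wide mean m until it has accumulated enough ratings (weight C).
FOR p IN products
  LET m = 3.5   // assumed site-wide mean rating
  LET C = 25    // assumed prior weight (roughly a typical review count)
  LET score = (C * m + p.ratingCount * p.avgRating) / (C + p.ratingCount)
  SORT score DESC
  LIMIT 50
  RETURN { product: p.name, score: score }
```

A product with a single 5-star rating then scores close to m, while one with 300 ratings and a 4.8 average stays near 4.8.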
It should be a bar to pass for every distributed database.
I had too many struggles in the past with MySQL and RethinkDB, so I will go with what cloud providers are offering.
I don't understand under what specific use case ArangoDB works best; the comparisons section lists Cassandra and Neo4j, and my understanding was Cassandra was for something like chat apps and Neo4j was for something like GIS analytics. Enlighten me?
Also, how convertible is a proprietary query language like AQL/CQL to SQL? Is it fully declarative, and does it version completely independently of the database core?
AQL is versioned along with the database core, as sometimes features are added to AQL which the database core must also support, and vice versa. However, during further development of AQL and the database core, one of the major goals is to keep it backwards-compatible, meaning that existing AQL queries are expected to work and behave identically in newer versions of the database (but ideally run faster or be better optimized there).
We were absolute idiots, but I still think a data warehouse should be idiot-proof, which is why I like SQL.
I read through the documentation for ArangoDB and I would be concerned about the lack of native strict type definitions and referencing in AQL, as well as the dearth of type availability in ArangoDB in general. Is this a design decision related to not supporting data/database administration, or something to be added later to the roadmap?
It sounds like, since you support write-intensive paths through the database, it would be considered an OLTP database for some workloads; do you publish TPC-C benchmarks anywhere? What about resource utilization?
To keep things simple and manageable, we originally started with AQL just being a language for querying the database. It was extended years ago to support data manipulation operations. I don't exclude the possibility that at some point it will support database administration or DDL commands, however, I am just one of the developers and not the product manager.
And you are right about the main use case being OLTP workloads. For OLAP use cases, dedicated analytical databases (with fixed data schemas) are probably superior, because they can run much more specialized and streamlined operations on the data.
To the best of my knowledge we never published any TPC benchmark results anywhere. I think it's possible to implement TPC-C even without SQL; however, implementing the full benchmark is a huge amount of work, so we never did...
If you want to learn more about CQL:
If you want to learn how it's different, here's a quick StackOverflow answer:
Many search engines access data stored in JSON format. Hence, integrating a search engine like ArangoSearch as an additional layer on top of the existing data models is no magic, but makes a lot of sense. Allowing users to combine models with search was then a rather obvious step for us.
For example, a specialized OLAP database which knows about the schema of the data can employ much more streamlined storage and query operators, so it should have a "natural" performance advantage.
However, a very specialized database may later lock you in, and in case you need something different, you will end up running multiple special-purpose databases.
Not saying this is necessarily bad (or good), but it is at least one aspect to consider: how many different databases do you want to operate & manage in your stack?
(Apologies for coming in sideways to this thread. Hats off to ArangoDB, and to all in the NoSQL arena who are pushing the envelope in terms of new Big Data solutions.)
Check out the SQL / AQL comparison page to get a quick overview:
There was a Show HN yesterday about "Graph Processing with Postgres and GraphBLAS"
Was your use case similar, and could ArangoDB be subbed for psql in the above linked project?
Postgres is a multi-model db, with document/key-value/graph -- isn't it just pretty easy for an established player to add a data model onto their platform?
We were playing with data to make it easy to go from a specific analyte that was generated all the way up through its protein, DNA, chromosome, disease, and phenotype via the graph. I'm sad the project never went anywhere, but even back then Arango was great.
Congrats to the team!
Congratulations on the funding btw! I'm a happy and grateful user.
AWS DocumentDB seems to be pretty much tied to the MongoDB API right now... so at the moment this will somewhat limit its functionality. However, they will not stand still and will probably also extend into the multi-model space at some point. Apart from that, not everyone will be willing to pay for DocumentDB or to have their data located in Amazon datacenters.
I could imagine that they didn't build DocumentDB from the ground up.
DocumentDB is probably just a MongoDB compatible API for one of their base services (S3 or DynamoDB).
As far as I know, they built Serverless Aurora on top of S3, with the help of S3 Select. So they will probably just create another custom-DB-compatible API if they get the impression that this custom DB becomes the next big thing.
It's an implementation of its own, leveraging many of the base building blocks and infrastructure Amazon has created.
DocumentDB is currently tied to the MongoDB 3.6 API, which means all the transactional extensions MongoDB has added recently are not present in DocumentDB (yet).
We've used ArangoDB for a while where I work, and have only had positive experiences so far. The query language, speed, and flexibility are all nice to work with.
Coming from a JS background, AQL is actually pretty easy to learn. Personally, the only thing that keeps tripping me up is that AQL doesn't have a triple-equals, and JS has trained me to avoid double-equals in comparisons.
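For what it's worth, that strictness works in AQL's favor: unlike JS's ==, AQL's == never coerces types, so there is nothing a === would need to guard against. A quick illustration (note I'm going from memory of AQL's comparison semantics):

```aql
// Values of different types never compare equal in AQL -- no coercion:
RETURN [ 1 == "1", 1 == 1.0 ]   // => [ false, true ]
```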
This is how you fetch every user in a collection:
FOR user IN users RETURN user
Only the admins:
FOR user IN users FILTER user.role == "admin" RETURN user
Just their email addresses:
FOR user IN users FILTER user.role == "admin" RETURN user.email
The SQL equivalent of the last one:
SELECT email FROM users WHERE role = 'admin'
I'd say the mental overhead of switching into and out of AQL doesn't quite compare to that of e.g. SQL or even MongoDB queries, but you are of course correct that there is some overhead nevertheless. That said, there are community-maintained ODMs for ArangoDB if you don't want to touch another language to write the queries by hand.
I would strongly recommend giving AQL a try though. When I started using ArangoDB (before becoming a contributor) I was hesitant as well but what quickly won me over was that I was able to read most AQL queries without having to learn an entirely new language.
await db.users.filter( user => user.role == "admin" )
await db.users.filter( user => user.role == "admin" ).map( user => user.email )
Here's an update query:
FOR user IN users FILTER user.role == "admin" UPDATE user WITH { email: "firstname.lastname@example.org" } IN users