Hacker News
ArangoDB Receives $10M Series A Funding (arangodb.com)
190 points by kylesellas 43 days ago | 68 comments

Seeing all the comments here it seems like Arango is a good fit for many use cases.

I would really recommend that the founders of the company invest in marketing; it's really important for developers to have something that speaks to them.

My point is, maybe the issue here isn't performance or features of the database, but rather the marketing that prevents it from finding its market fit.

We don't really care about performance (speed), but mostly about features (text search + time series) and scalability with clusters.

What? We who?

"We" is a prospective customer of this DB. Maybe other devs/companies do need speed over multi-functionality.

Congrats to the Arango team! I used it at my earlier workplace for computing suggested friends/followers, which replaced an older service (Postgres + Redis and application server-side caches). The resulting solution was faster, ran on a single modest machine (8 GB RAM, 4 cores), and allowed us to spin down 3 higher-end machines and remove a layer of caches on application servers. Writing the microservice using Foxx (which is built into Arango) was a pleasure, and the easy deployment + Swagger API made for a great developer experience. The community Slack was friendly and helped me out with some AQL.

I'd never heard of Arango before seeing this post, but the main selling point seems to be:

"A native multi-model database from the ground up, supporting key/value, document and graph models. You can model your data in a very flexible way."

Which seems like it could actually justify having its own query language. The built-in search is also a nice touch.

Imagine you go to your preferred online marketplace and search for a generic product.

You get 1000+ results.

So you filter by avg.star-rating > 4.0

Still 500+ results.

Those with just one 5 star rating in front of the one with 300 reviews and a 4.8 avg. Annoying.

What I really want:

I would like to filter for products that have at least 5 (relatively long) reviews, an average rating of 4.0 and at least 2 of these review comments mentioning the use case for which I would like to use this product. Maybe I just want the verified purchases to be counted or the reviews of friends and friends of friends...

Using a native multi-model approach you can do both. Simply retrieve all category X products ranked by product rating, limit 50/page or perform advanced lookups - without having to synchronize data from a document or relational model with an additional graph or search engine.

Combining full-text search with scorers, graph traversals, and/or join operations, you could write a single ad-hoc AQL query that returns the most relevant products and reviews.

Multi-model provides choice. In data modeling and querying.
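To make that concrete, here is a hedged sketch of what such a combined AQL query might look like. The `productsView` ArangoSearch view, the `reviewOf` edge collection, and all attribute names are illustrative assumptions, not a schema from this thread:

```aql
FOR product IN productsView
  /* full-text search on the review/product text, with a relevance scorer */
  SEARCH ANALYZER(PHRASE(product.description, "trail running"), "text_en")
  FILTER product.avgRating >= 4.0
  /* graph traversal: collect sufficiently long, verified reviews */
  LET goodReviews = (
    FOR review IN 1..1 INBOUND product reviewOf
      FILTER review.verified AND LENGTH(review.text) > 200
      RETURN review
  )
  FILTER LENGTH(goodReviews) >= 5
  SORT BM25(product) DESC
  LIMIT 50
  RETURN { product: product, reviews: goodReviews }
```

The point is that the search, the document filters, and the traversal all live in one query over one copy of the data, rather than being synchronized across a search engine and a graph store.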

> Those with just one 5 star rating in front of the one with 300 reviews and a 4.8 avg. Annoying.

That's easy enough to solve with a bayesian average. The problem is many developers and product managers don't know much or anything about stats.

How exactly? By adding a number (C) of average (2.5 stars) "virtual" reviews to all products?


You can also use a shrunken average: if the group has a lot of data, it keeps its own average; otherwise it is just a slight deviation from the global mean.
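As a sketch of the "C virtual reviews" idea from the question above (the prior mean of 2.5 stars and the function name are my own assumptions):

```javascript
// Bayesian ("damped") average: pretend every product starts with C
// virtual reviews at the prior mean m, then mix in the real ratings.
// A lone 5-star review barely moves the prior, while many consistent
// ratings dominate it -- which also gives the shrinkage behavior
// described in the sibling comment.
function bayesianAverage(ratings, C = 5, m = 2.5) {
  const sum = ratings.reduce((a, b) => a + b, 0);
  return (C * m + sum) / (C + ratings.length);
}

console.log(bayesianAverage([5]));                  // ≈ 2.92, not 5.0
console.log(bayesianAverage(Array(300).fill(4.8))); // ≈ 4.76
```

With this ranking, the product with 300 reviews and a 4.8 average sorts above the product with a single 5-star review, which fixes the annoyance from the parent comment.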

Never heard of this before either, and it looks like a very good fit for our ML production data. Currently we use Cassandra; I would have to see how easy it is to upgrade our stack and whether it's worth it.

Amazing. I am happy they got funded. I have sent this along to our team to explore, as we need time series + graph DB + text search.

I'm waiting for an official round of testing by Jepsen (i.e. not in-house testing, but paid testing by Kyle Kingsbury).

It should be a bar to pass for every distributed database.

Well deserved! At work I have been using ArangoDB for a few years now as a graph database. So far it's been working great with up to 100K graph nodes across two dozen collections.

I liked the idea very much, but I guess I won't use any unmanaged infrastructure software any time soon.

I had too many struggles in the past with MySQL and RethinkDB so I will go with what cloud providers are offering.

Congrats, the ArangoDB folks are also very nice people, besides being skilled developers.

Claudius from Arango. Thanks Salvatore! Long time no see, I hope you are well.

All fine, greetings!

Congratulations on the funding!

I don't understand under what specific use case ArangoDB works best; the comparisons section lists Cassandra and Neo4j, and my understanding was Cassandra was for something like chat apps and Neo4j was for something like GIS analytics. Enlighten me?

Also, how convertible is a proprietary query language like AQL/CQL to SQL? Is it fully declarative, and does it version completely independently of the database core?

Regarding the question on the query language: AQL is fully declarative. In this respect it is like SQL. However, there are a few differences between AQL and SQL:

* SQL is an all-purpose database management and querying language. It is very complex and heavyweight, as it has to solve a lot of different problems, e.g. data definition, data retrieval and manipulation, stored procedures, etc. AQL is much more lightweight, as its purpose is querying and manipulating database data. Data definition and database administration commands are not part of AQL, but can be achieved using other, dedicated commands/APIs.

* For data retrieval and manipulation, the functionality of SQL and AQL overlaps a lot, but they use different keywords for similar things. Still, simple SQL queries can be converted to AQL easily and vice versa. There are some specialized parts of AQL, such as graph traversals and shortest-path queries, for which there may be no direct equivalent in SQL.

AQL is versioned along with the database core, as sometimes features are added to AQL which the database core must also support, and vice versa. However, during further development of AQL and the database core, one of the major goals is to keep it always backwards-compatible, meaning that existing AQL queries are expected to work and behave identically in newer versions of the database (but ideally run faster or be better optimized there).

Okay, I like how backwards compatibility is preserved. I worked with MongoDB at my previous company and we ended up not being able to migrate to MongoDB 3.x. I think it was because we forked 'eve-mongoengine' and couldn't merge upstream changes, which ended up forcing us to version the entire stack through the database at the same time, which pushed the migration past the threshold of feasibility.

We were absolute idiots, but I still think a data warehouse should be idiot-proof, which is why I like SQL.

I read through the documentation for ArangoDB and I would be concerned about the lack of native strict type definitions and referencing in AQL, as well as the dearth of type availability in ArangoDB in general. Is this a design decision related to not supporting data/database administration, or something to be added later to the roadmap?

It sounds like, since you support write-intensive paths through the database, it would be considered an OLTP database for some OLTP workloads; do you publish TPC-C benchmarks anywhere? What about resource utilization?

Is there a particular reason to support JavaScript first? Is it because Swagger has JavaScript-first support, or a different reason?

ArangoDB is a schema-less database. There is currently no support for schemas or schema validation on the database core level, but it may be added later, because IMHO it is a very sensible feature. When that is in place, AQL may also be extended to get more strict about the types used in queries. However, IMHO that should only be enforced if there is a schema present.

To keep things simple and manageable, we originally started with AQL just being a language for querying the database. It was extended years ago to support data manipulation operations. I don't exclude the possibility that at some point it will support database administration or DDL commands; however, I am just one of the developers and not the product manager. And you are right about the main use case being OLTP workloads. For OLAP use cases, dedicated analytical databases (with fixed data schemas) are probably superior, because they can run much more specialized and streamlined operations on the data. To the best of my knowledge we never published any TPC benchmark results anywhere. I think it's possible to implement TPC-C even without SQL; however, implementing the full benchmark is a huge amount of work, so we never did...

Forgot to answer the JavaScript question... JavaScript can be used in ArangoDB to run something like stored procedures. ArangoDB comes with a JavaScript-based framework (named Foxx) for building data-centric micro services. Its usage is completely optional however. When using the framework, it will allow you to easily write custom REST APIs for certain database operations. The API description is consumable via Swagger too, so API documentation and discoverability are no-brainers.

Apart from that, ArangoDB comes with a JavaScript-enabled shell (arangosh) that can be used for scripting and automating database operations.

AQL is similar to N1QL from Couchbase. Imagine if SQL were designed around JSON instead of rows; that is what AQL is.

Somewhat, at least... N1QL tries to stay closer to SQL in terms of keywords and such, whereas the AQL approach was to pick different keywords than SQL. Apart from the difference in keywords, I tend to agree.

As an aside, Apache Cassandra CQL is now used by a growing number of wire-compliant databases:

• Cassandra
• Scylla (full disclosure: I work for ScyllaDB)
• DataStax Enterprise (DSE)
• Cosmos DB
• Yugabyte

If you want to learn more about CQL: https://docs.scylladb.com/using-scylla/cql/

If you want to learn how it's different, here's a quick StackOverflow answer: https://stackoverflow.com/a/19140553/6995180

ArangoDB is a multi-model database so it tries to target several use cases. It provides functionality working with key-values, documents, graphs and fulltext indexing/searching. It provides some flexibility in the sense that it does not force you into a specific way of working with the data. For example, it does not force you to treat each use case as a graph use case. This is in contrast to some other specialized databases, which excel at their specific area, but also force you to completely adopt the type of data-modeling they support.

So what’s the downside of using this DB which does it all vs using a specialized DB? Scaling, performance, etc?

Think we have to be a bit more precise here. ArangoDB supports documents, key/value, and graph. It is not really optimized for large timeseries use cases which might need windowing or other features. Influx or Timescale might provide better characteristics here. However, for the supported data models we found a way to combine them quite efficiently.

Many search engines access data stored in JSON format. Hence, integrating a search engine like ArangoSearch as an additional layer on top of the existing data models is no magic, but it makes a lot of sense. Allowing models to be combined with search is then a rather obvious step for us.

Specialized databases have the advantage of being, well, specialized...

For example, a specialized OLAP database which knows about the schema of the data can employ much more streamlined storage and query operators, so it should have a "natural" performance advantage.

However, a very specialized database may later lock you in to something, and in case you need something different, you will end up with running multiple different special-purpose databases.

Not saying this is necessarily bad (or good), but it is at least one aspect to consider: how many different databases do you want to operate and manage in your stack?

Interesting. Pretty much every startup I've worked for has run 2-3 databases. Usually Redis plus some search (typically Elastic now). I could see this making that easier.

Also, Cassandra, and Cassandra-like databases (like Scylla) are capable of far more than 'chat apps.' There are a lot of IoT, adtech, and other use cases. I just published this blog today: https://www.scylladb.com/2019/03/14/from-sap-to-scylla-track...

(Apologies for coming in sideways to this thread. Hats off to ArangoDB, and all in the NoSQL arena who are pushing the envelope in terms of new Big Data solutions.)

You should certainly be able to use ArangoDB as a replacement for Cassandra or Neo4j.

Check out the SQL / AQL comparison page to get a quick overview: https://www.arangodb.com/why-arangodb/sql-aql-comparison/

Used the free edition for a while, mainly for graphs, and it was amazing. Will most likely use a paid version once it exists as a managed service.

Would you mind sharing how you used this for graphs? What usecase(s) you had?

There was a Show HN yesterday about "Graph Processing with Postgres and GraphBLAS"

> https://news.ycombinator.com/item?id=19379800

Was your usecase similar and could ArangoDB be subbed for psql in the above linked project?

I also feel a managed service is a critical offering they're missing right now. Considering the recent AWS and Elastic debacle, the market is going to be tough for open-source products like ArangoDB.

Jan from ArangoDB here. Can't disclose anything yet but feel free to join the webinar of our co-founder Claudius. He will share details about our future plans in only 2 weeks https://www.arangodb.com/arangodb-events/why-native-multi-mo...

I'm a bit confused on this one, not having seen the tech before. Isn't the DB space absurdly saturated with open-source tools like this that don't really have much life in them?

Postgres is a multi-model DB, with document/key-value/graph support -- isn't it pretty easy for an established player to add a data model onto their platform?

The art is to combine all data models using one query language without duplicating data.

ArangoDB is a perfect tool for prototyping or early stages of companies that need data that might have multiple looks to it. I've used it on tons of small projects and have nothing but praise. It's a solid not-so-little beast.

Can you ELI5 the use cases?

I played with Arango a few years ago to prototype some graph stuff. Super fun to play with and it was awesome being able to traverse the graph so easily.

We were playing with data to make it easy to go from a specific analyte that was generated all the way up through its protein, DNA, chromosome, disease, and phenotype via the graph. I'm sad the project never went anywhere, but even back then Arango was great.

Congrats to the team!

ArangoDB is definitely my database of choice. There is a lot to like. Ease of setup and clustering, free REST API, solid graph features with AQL, great docs. I have been promoting it in my projects. I would love to be their partner or tech evangelist for Southeast Asia. If you guys are looking, I am game for it.

Congrats, Arango! We recently ported a large rethinkdb app to arango and it has been a joy to use. AQL is awesome.

I wonder how one would douse investors' concerns about an open-source product like ArangoDB having its lunch effectively eaten by AWS if/when wide adoption comes?

Congratulations on the funding btw! I'm a happy and grateful user.

The database market with all its competition is definitely challenging. I have no doubt AWS will increase their database market share over time. The good thing about this competition is that it forces all vendors to be innovative and to find (more) USPs.

AWS DocumentDB seems to be pretty much tied to the MongoDB API right now, so at the moment this will somewhat limit its functionality. However, they will not stand still and will probably also extend into the multi-model space at some point. Apart from that, not everyone will be willing to pay for DocumentDB or have their data located in Amazon datacenters.

"AWS DocumentDB seems to be pretty much tied to the MongoDB API"

I could imagine that they didn't build DocumentDB from the ground up.

DocumentDB is probably just a MongoDB compatible API for one of their base services (S3 or DynamoDB).

As far as I know, they built Serverless Aurora on top of S3, with the help of S3 Select. So they will probably just create another custom-DB-compatible API if they get the impression that this custom DB is becoming the next big thing.

Exactly, AWS DocumentDB is only MongoDB API-compatible, but it's not using any MongoDB components.

It's an implementation of its own, leveraging many of the base building blocks and infrastructure Amazon has created.

DocumentDB is currently tied to the MongoDB 3.6 API; that means all the transactional extensions MongoDB has added recently are not present in DocumentDB (yet).

AFAIK MongoDB Atlas still earns quite a bit of money, despite the 3.6 API compatible AWS DocumentDB.


We've used ArangoDB for a while where I work, and have only had positive experiences so far. The query language, speed, and flexibility are all nice to work with.

Arango feels wonderful already. I'm thrilled to see how new funding improves the user experience. :) Cheers and congrats!

Hmm, I wonder how hard it would be to make a JavaScript driver that lets you manipulate data just like you do in JavaScript, e.g. using map/reduce/filter, push, etc. For me there's a lot of overhead when switching back and forth between different languages, e.g. between JS and SQL, even though SQL is a powerful language and I'm really good at it.

(full disclosure: I work for ArangoDB but this is my own personal opinion)

Coming from a JS background AQL is actually pretty easy to learn. Personally the only thing that keeps tripping me up is that AQL doesn't have a triple-equals and JS has trained me to avoid double-equals in comparisons.

This is how you fetch every user in a collection:

    FOR user IN users RETURN user
This is how you fetch every admin:

    FOR user IN users FILTER user.role == "admin" RETURN user
This is how you fetch their email addresses:

    FOR user IN users FILTER user.role == "admin" RETURN user.email
Compare this to the equivalent in SQL:

    SELECT email FROM users WHERE role == "admin"
The AQL example is IMO easy to read if you know JS or any similar language. AQL even has object and array literals. There are a few idiosyncrasies but you can get very far without needing to invest time to "properly" learn the language. The naive approach usually results in pretty good performance out of the box.
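For illustration, object and array literals in a query (a hedged sketch; the `user.phone` attribute is an assumed field, not from the examples above):

```aql
FOR user IN users
  FILTER user.role == "admin"
  RETURN { name: user.name, contact: [user.email, user.phone] }
```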

I'd say the mental overhead of switching into AQL and out doesn't quite compare to that of e.g. SQL or even MongoDB queries but you are of course correct that there is some overhead nevertheless. That said, there are community-maintained ODMs for ArangoDB if you don't want to touch another language to write the queries by hand.

I would strongly recommend giving AQL a try though. When I started using ArangoDB (before becoming a contributor) I was hesitant as well but what quickly won me over was that I was able to read most AQL queries without having to learn an entirely new language.

> This is how you fetch every user in a collection:

    await db.users
This is how you fetch every admin:

    await db.users.filter( user => user.role == "admin" )
This is how you fetch their email addresses:

    await db.users.filter( user => user.role == "admin" ).map( user => user.email )

I think it can be done using JavaScript Proxy.

Here's an update query:

   user.email = "updated@email.ltd"
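A minimal sketch of how far the Proxy idea could go, with the simplification that conditions are passed as AQL expression strings over a `doc` variable (translating real JS arrow functions into AQL would require parsing their source). Everything here -- `makeCollection`, the generated query shape -- is hypothetical, not part of any real driver:

```javascript
// Build chainable query objects; each property access on `db` is
// intercepted by the Proxy and turned into an AQL query builder.
function makeCollection(name) {
  const chain = (filters, projection) => ({
    // filter() takes an AQL boolean expression over `doc`
    filter: (expr) => chain([...filters, expr], projection),
    // map() replaces the RETURN projection
    map: (expr) => chain(filters, expr),
    // toAQL() renders the accumulated pieces into one query string
    toAQL: () =>
      [`FOR doc IN ${name}`]
        .concat(filters.map((f) => `FILTER ${f}`))
        .concat(`RETURN ${projection}`)
        .join(' '),
  });
  return chain([], 'doc');
}

const db = new Proxy({}, {
  get: (_target, collectionName) => makeCollection(String(collectionName)),
});

console.log(db.users.filter('doc.role == "admin"').map('doc.email').toAQL());
// → FOR doc IN users FILTER doc.role == "admin" RETURN doc.email
```

The update-query assignment above (`user.email = ...`) would additionally need a `set` trap on the document objects; this only sketches the read path.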

Arango has this, kinda, with caveats. It's called Foxx.

Single best tool out there; I'm a big fan and use it in our products.

What are the closest equivalents / competitors to ArangoDB?

OrientDB is a multi-model DB with graph support, so it seems pretty similar.

Maybe Neo4j?

Outstanding! Well deserved for an outstanding product!

Congratulations to the whole team. Well deserved!

Nice going, congratulations!

World domination from here

Great work boys.

Congrats, guys!
