Hacker News
RavenDB 4.0 Indexing Benchmark (ayende.com)
44 points by redknight666 on Dec 26, 2016 | 8 comments



I don't see any articles on the motivation for, or benefits of, RavenDB. For example:

- What benefits does it have over something like Mongo?

- I'm not sure how .NET is a benefit, since others have C#/LINQ drivers.

- Isn't this concerning? https://jeremydmiller.com/2013/05/13/would-i-use-ravendb-aga...

- $6000 is not cheap. Is Marten free? It seems hard to compete with a foundation as powerful as Postgres.


I haven't worked with the latest RavenDB versions, but we did some work with older versions.

RavenDB is a document database that (imo) is well suited for more complex web apps, such as e-commerce and publishing websites. It comes from the .NET world and is very developer friendly (in that respect comparable to Mongo, but for .NET instead of JavaScript). It uses LINQ natively for queries and indexes, without the sort of issues that Entity Framework creates when translating LINQ to SQL. Once you are proficient in RavenDB, it's possible to write very clean code that does very powerful things.

RavenDB can replace both a relational database and a search database like ElasticSearch. It has real transactions, and it wraps Lucene, so it can do full-text searches. Its indexes are extremely powerful: facets, geospatial, full text, etc. We built pages that efficiently queried results like "show me all products in category X with properties Y that are in stock at a location less than Z km away from me", based on stock level/location data inside the product document. Indexes are eventually consistent.
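To make that kind of query concrete, here is a minimal in-memory Python sketch of what "category X, properties Y, in stock within Z km" means. It is not RavenDB's API. The `Product`/`StockLevel` shapes and `find_products` are hypothetical, and a real deployment would push this work into a spatial/faceted index rather than scanning documents.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass
class StockLevel:  # hypothetical: stock data embedded in the product document
    lat: float
    lon: float
    quantity: int


@dataclass
class Product:  # hypothetical document shape
    name: str
    category: str
    properties: dict
    stock: list = field(default_factory=list)


def find_products(products, category, required, lat, lon, max_km):
    """Names of products in `category` whose properties match `required`
    and that have stock > 0 at some location within `max_km` of (lat, lon)."""
    results = []
    for p in products:
        if p.category != category:
            continue
        if any(p.properties.get(k) != v for k, v in required.items()):
            continue
        if any(s.quantity > 0 and haversine_km(lat, lon, s.lat, s.lon) <= max_km
               for s in p.stock):
            results.append(p.name)
    return results
```

An index would precompute the category/property terms and the spatial points so that a query never has to visit every document. The linear scan here only spells out the semantics.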

With .NET Core, RavenDB 4.0 has now been ported to Linux and OS X, and an alpha version is out. The database itself has been improved and tuned over several years, and a lot of work by smart people went into it. I find it really interesting to see how it will fare against the best databases outside the Microsoft silo, and I hope to see some good testing of its distributed transactional and indexing features by Aphyr or other reputable people.


Are all indexes eventually consistent, or just the Lucene ones? I remember they created a rough clone of LMDB, which has transactions (and can build indexes on top of it).


All. It's possible to wait for indexes to be complete using the WaitForNonStaleResultsXxx query options.
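Conceptually, waiting for non-stale results means re-running the query until the index reports that it has caught up, bounded by a timeout. The Python sketch below models only that idea. The real client exposes it through the WaitForNonStaleResultsXxx options named above. `query_non_stale`, `StaleResultsTimeout`, and a `run_query` callable returning a `(results, is_stale)` pair are hypothetical names chosen for illustration.

```python
import time


class StaleResultsTimeout(Exception):
    """Raised when the index is still stale after the timeout (hypothetical)."""


def query_non_stale(run_query, timeout_s=15.0, poll_interval_s=0.1,
                    clock=time.monotonic, sleep=time.sleep):
    """Re-run `run_query` until it reports non-stale results or the timeout expires.

    `run_query` is a hypothetical callable returning (results, is_stale).
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        results, is_stale = run_query()
        if not is_stale:
            return results
        if clock() >= deadline:
            raise StaleResultsTimeout(f"index still stale after {timeout_s}s")
        sleep(poll_interval_s)
```

The timeout matters. If indexing is stuck, for example on a document it cannot process, waiting ends in a timeout error rather than fresh results.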


Marten is a great product (in real life the marten is a major predator of the raven :-) and is built on a solid foundation in Postgres. RavenDB has a lot of gotchas at scale, which were one of the big reasons the Octopus Deploy team moved away from it.

https://octopus.com/blog/3.0-switching-to-sql


I'm using it in production as the backing data-store for a web app. There are a few very nice things about it:

  - ACID transactions on multiple documents + queries by ID
  - BASE (eventually consistent) map-reduce indexes
  - Great client libraries: Unit of Work pattern, optional client-side caching
  - Extremely productive: no ORM required thanks to the document model, unit testing support via an in-memory DB, JSON.NET serialization, support for polymorphic object graphs, etc.
  - Very powerful indexing (we use map-reduce, spatial + multi-facets) + fast querying
  - Integrated RavenDB 3.x Management Studio, based on HTML5 and generally solid to work with
  - Safe defaults: I have never lost data with it (I've heard some bad things about Mongo in that regard, though)
However, there are clearly some gotchas where, in trying to save you from shooting yourself in the foot, RavenDB will occasionally shoot off your leg instead. The upside is that these are well documented and usually caught if you test with realistic or production workloads:

  - By default, RavenDB throws an exception after more than 30 queries per session, to warn you about SELECT N+1 issues
  - No unlimited result sets: results are silently capped at 128 unless a limit is explicitly specified. You can use streaming to fetch _all_ documents if you really need to, but this has to be explicit
  - BASE indexes require some thought in your UX/UI to handle properly
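The first two guards can be modelled in a few lines. The Python sketch below is a hypothetical in-memory session (`FakeSession`, `fetch_all_paged` and `TooManyRequests` are made up for illustration and are not the RavenDB client API). It applies the per-session limit of 30 and the default cap of 128 results described above, and it pages explicitly instead of relying on the cap.

```python
MAX_REQUESTS_PER_SESSION = 30  # default per-session limit described above
DEFAULT_PAGE_SIZE = 128        # silent result cap described above


class TooManyRequests(Exception):
    pass


class FakeSession:
    """Hypothetical in-memory stand-in for a document-store session; NOT the RavenDB client API."""

    def __init__(self, docs, max_requests=MAX_REQUESTS_PER_SESSION):
        self._docs = list(docs)
        self._max_requests = max_requests
        self.requests = 0

    def _count_request(self):
        # Every round trip counts against the session budget; this is what catches SELECT N+1 loops.
        self.requests += 1
        if self.requests > self._max_requests:
            raise TooManyRequests(f"more than {self._max_requests} requests in one session")

    def query(self, skip=0, take=None):
        self._count_request()
        if take is None:
            take = DEFAULT_PAGE_SIZE  # no explicit limit: silently capped
        return self._docs[skip:skip + take]


def fetch_all_paged(session, page_size):
    """Fetch every document with explicit paging instead of relying on the default cap."""
    results, skip = [], 0
    while True:
        page = session.query(skip=skip, take=page_size)
        results.extend(page)
        if len(page) < page_size:
            return results
        skip += page_size
```

With 500 documents, pages of 100 need six round trips and stay within budget. Pages of 10 would need 51 and trip the guard on the 31st request, which is where streaming is the better tool.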
Those are the most common issues I've heard about. Once you know them, they are easily caught in code review. I've run into a few other issues, though. Defining indexes using LINQ works great in 95% of cases, but the 5% of cases where it doesn't are a PITA to debug. Some of the ones I ran into were caused by bugs; other times I had misunderstood how certain things worked. For every issue I've had, the team has been super responsive and quick to ship a fix or offer advice on how to do it correctly, even though I wasn't paying for a support contract. All they need to get going is a minimal reproduction (e.g. a unit test) they can work from and an email to the RavenDB mailing list.

Would I use it again? Yes, definitely. I've heard good things about 4.0's performance, though I expect it will be quite a while before it's stable. They've also been working on a new client library that should do a better job of preventing the common "beginner mistakes" above.


We used RavenDB 3.0 as an experiment for an internal management system. Spoiler alert: we ended up switching to SQL.

For the upsides, RavenDB is by far the easiest DB I've ever written against. Getting up with CRUD operations is stupidly simple and just works. The in-memory instance you can create on-the-fly for unit testing is amazing.

However... (please bear in mind that these are my opinions, largely from recollection; it was more than a year ago now)

LINQ for secondary indexes is awesome... when it works. When it fails, you're left with useless stack traces and really, really bad error messages. Often the failure doesn't happen immediately. In one instance, someone had accidentally inserted some enum values as strings instead of integers (or vice versa, whichever representation we weren't using). The index operation would appear to work fine, then eventually get stuck on these incorrect records (after minutes or even hours), retry a bunch of times, then mark the index as failed, at which point it was suddenly useless. It took a long time to figure out the cause.
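One defensive pattern for this kind of failure is to normalize such fields while mapping documents, so that a document with the "wrong" representation is reported rather than stalling the whole index. The sketch below is plain Python with a hypothetical `OrderStatus` enum and document shape. It illustrates the idea only; RavenDB index definitions themselves are written in LINQ.

```python
from enum import IntEnum


class OrderStatus(IntEnum):
    """Hypothetical enum; documents may store it as an int or as its name."""
    PENDING = 0
    SHIPPED = 1


def normalize_status(raw):
    """Accept 1, "1" or "Shipped" and return OrderStatus.SHIPPED; raise ValueError otherwise."""
    if isinstance(raw, bool):  # bool is a subclass of int; reject it explicitly
        raise ValueError(f"unexpected boolean status {raw!r}")
    if isinstance(raw, int):
        return OrderStatus(raw)  # raises ValueError for unknown numbers
    if isinstance(raw, str):
        if raw.strip().isdigit():
            return OrderStatus(int(raw))
        try:
            return OrderStatus[raw.strip().upper()]
        except KeyError:
            raise ValueError(f"unknown status name {raw!r}") from None
    raise ValueError(f"unsupported status type {type(raw).__name__}")


def map_orders(docs):
    """Split documents into index entries and rejects instead of failing the whole batch."""
    entries, rejects = [], []
    for doc in docs:
        try:
            entries.append({"id": doc["id"], "status": normalize_status(doc.get("status"))})
        except (KeyError, ValueError) as exc:
            rejects.append((doc.get("id"), str(exc)))
    return entries, rejects
```

Collecting rejects with their IDs turns "the index failed after hours" into a concrete list of documents to fix.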

I also found the secondary indexes got quite difficult. It would either be really easy to get something working, or mind-bogglingly confusing. Sometimes you have to project/cast results, other times you don't, and it was never obvious (to me, though others on my team had the same complaint). You'd just notice you were missing a bunch of fields, then realize "oh, this one needs to be projected or needs a cast".

I found joins to be even worse. They either worked right away or required an hour of messing around to figure out which magic options needed to be passed. I'm sure this is down to a lack of experience/knowledge, but we found the learning curve to be VERY steep. For example, when we changed a join from using one index to another (one that included some other data), the join would have to be changed dramatically, and it never seemed obvious why. In several cases it got to the point where we'd just run two queries and do the join in (trivial) .NET code. Not coincidentally, this is also when we started having serious conversations about moving away.

At one point our ops team accidentally put the main database on the (slow) system drive instead of the SSD-backed storage. This wasn't immediately obvious, until it started timing out and doing other strange things. Essentially, when you hit load limits, it just acts badly and doesn't tell you what's wrong. We only figured this out from looking at I/O graphs and seeing the system drive maxed out.

Stale indexing really is a big problem. We may have been misusing this feature, but we had an index that produced a list of objects, including counts and some other data from inside each document, and this was used as the main UI list. Because the index would sometimes go stale, you'd add an object, go back to the list, and it wouldn't be there. Telling it to wait for non-stale results is okay when indexing is working, but when you have some of the other problems we had, you end up with a useless system.

So I'm sure a lot of this can be chalked up to using it wrong or not being educated enough... but that is also a documentation issue. I don't often feel stupid when I use products like this, but I often got that feeling with RavenDB.

The combination of bad situations we got into (e.g. "ANOTHER problem with RavenDB? Why are we still using this thing?") and the fact that it was not a core business system or product for us is basically what led us to move away: we felt we were spending more time fighting RavenDB problems than we were supposedly saving by not using SQL.


Who uses RavenDB in production? How does it scale over a cluster? I have seen this project pop up multiple times, but I feel that even lesser-known databases like ArangoDB or OrientDB are more widely used/mature than RavenDB. Is that so? I won't even consider using it until it makes its way to .NET Core on Linux or to Mono.



