This is NOT second system syndrome. This is business, pure and simple. I had a long, LONG series of emails and calls with CouchBase about commercial support for Couch. We have some big production apps on it.
When it got down to time to pay for support, they told me (this is 2 months ago) in a rare and unusual bit of candor, that they were going to drop Couch in less than six months, so did I want to buy commercial support for just six months?
I told them not only do I not want commercial support, but I just got so freaked out I would not recommend couch for future projects to clients, because it was obvious that internally the team had moved on.
They asked me not to tell anyone so I didn't, but now that this is out there I can say what I deduced from our discussions: Couch doesn't make any money; MemBase does. Period.
Kevin Smith at Opscode said they're moving away from Couch (and to MySQL) as Couch just doesn't scale. No finely grained reader/writer locks, one reader/writer thread/db, huge and random delays due to checkpointing that can make the server inoperable, difficulty ever finishing a view checkpoint under load, etc. I think he's right - it's been abandoned as a platform.
It's absolutely their prerogative to move on and it seems like the right decision, but it's not a technical decision. The reality is that CouchDB is largely a product running small, toy apps, written by people who won't PAY anything for support. MemBase is being used by big companies with a lot of money to spend on commercial support and enterprise features.
Actually there is still one case where I recommend Couch and it's when you need the mobile sync features. I doubt that they'll make those a priority anytime soon on the MemBase product (at least I haven't seen or heard anything).
This is not news, and should not reflect poorly on Apache CouchDB. AFAICT, Damien's last commit on the project was 18 months ago, and he is by no means the most active committer, even on a historical basis.
Apache CouchDB's data model is sublime for the domain I often work in, and I think the same goes for a lot of people. It's not just mobile; personally, I don't do anything mobile- or sync-related.
I have no plans on moving on. Apache CouchDB is as active and healthy a project as ever AFAICT, and having hosting providers like Cloudant makes the entire model all that more attractive.
If it's all about business, then slagging on the project that gave your company its name is perhaps not the greatest approach. Moving on is fine, a fact of life. Making things more difficult for those you leave behind isn't cool, especially when your moving on happened a long time ago.
So, JChris is at Couchbase and developing on a mobile project that synchs with CouchDB. Yet the flagship product at Couchbase cannot do so. I wonder whether Couchbase is planned to synch with CouchDB in subsequent versions?
re: "...Couch just doesn't scale...": I am curious, did you try using the BigCouch project, released as open source by Cloudant? I have never used BigCouch on a customer project, but I have experimented with it; it is easy to set up, and it is the scaling architecture that Cloudant uses for its datastore-as-a-service business.
Not yet... really, it scales enough for us now. But I've noticed people pick Couch because they think it scales better than, say, MySQL out of the box, when the reality is the opposite - you have to pull out the book of scale tricks much earlier w/Couch than w/MySQL.
Also, I really like Couch... so don't take what I am saying here or above as an attack.
Probably it is very hard to create a company around an open source project without disrupting part of the open source ecosystem around it... on the other hand it is important to create some kind of structure to pay the developers to continue working on the project. I'm not sure what I would do without VMware helping Redis... but I guess that some decent compromise should exist, like creating a support company for the software but leaving the project itself as an open source effort with its own separate site, mailing list, and so forth.
There is also an ethical problem IMHO. For instance is Redis mine? I guess it is not: VMware is funding the development so I can write code thanks to VMware. Pieter is contributing a lot of code. The community is helping the project a lot: with talks spreading the word, helping on the mailing list, helping to fix bugs, and so forth.
My guess is that the developers really conceptually "own" a very small piece of the pie, and when they create a business around open source software, they should keep this in mind; otherwise it is easy to involuntarily take the work and effort that other people put in earlier and turn it into your business.
It's not easy however... as: the end users should be happy and well served, and the software should not just be "open source" but an open process, with an open community, and so forth. At the same time the developers should pay their bills without issues and earn enough to avoid being tempted to join company XYZ instead of working on their project. It's not trivial to have all this together I guess, and I feel very lucky that there is VMware making this simple for me, but not all the open source developers are equally fortunate so I guess it is crucial that the open source community keeps working on different ideas to find viable solutions.
I'm not so sure the business model of LuaJIT can easily be applied to other open source projects. I mean, I've been running a business for more than 15 years. I basically downscaled consulting as I've acquired paid open source work. It pays less, but I'm happier now.
I think you'd have a hard time doing this from zero, without some backup plan or enough funds of your own. Well, have a look at the price point of scripting languages, toolchains and such. Getting a vague business plan for an open source project in the toolchain/middleware business through a VC round? I don't think so.
Business knowledge was essential, too. Negotiating open source sponsorship contracts with corporate lawyers for half a year is not that easy. Some companies were extremely easy to work with, but I guess sponsoring open source projects is simply not an established business procedure at most companies. I don't think you could possibly afford to hire an international IP lawyer to do this for you.
Actually it was the economic downturn of 2008 which triggered all of this. The (rarely followed) advice is to invest in a downturn, because you'll be the first one to come out ahead in the following upturn. Ok, so I invested my time and knowledge into a new project (#). It worked out for me, but in retrospect I realize it was a high risk investment.
Final words: you think we're seeing some kind of downturn again? Carefully assess the risks and the potential for yourself. But don't be shy if you're onto something good. Innovate, don't replicate.
(#) LuaJIT 2.x is a complete rewrite, using the experience gained working on LuaJIT 1.x, but very little of its code. It's been on my backburner since 2007 and the first public release of LuaJIT 2.0 was at the end of 2009. One might say it's about a two man-year investment before it started to pay back. Most of the time was spent on doing research and prototyping rather than writing code. I coded and then threw away almost three complete virtual machines in the process, because these approaches turned out to be dead ends. I bet you'd need a lot of courage to explain this to a VC. ;-)
Salvatore, it is great that VMware supports you and Redis: representative of the best open source model (open source developers paid to concentrate on writing code). Especially great for Redis users since the source code is still open source, can be read, modified, etc.
It is interesting: when they dropped Couchbase Single Server I jumped on the #couchdb IRC channel and said that there would be a developer drain from CouchDB, since quite a few of them work for Couchbase and will be working on Couchbase Server only (instead of also supporting and sending patches back to CouchDB).
I got a lot of disagreements and nasty responses back, suggesting that it would not happen and CouchDB is doing awesome, and how this basically doesn't affect CouchDB at all. However here is its creator, urging everyone in so many words to drop CouchDB and switch to Couchbase.
Won't turning more towards a fast key-value store mean competing with Riak & Redis? Both are already established products that do that. I use Redis for example for temporary fast, in-memory key values.
So I guess my question is what would make Couchbase stand out and make someone want to move to it compared to the existing key-value dbs? I think it is imperative for the Couchbase Server team to make that clear. A short bullet point list will do.
Another issue I see is, ok, let's say I see Couchbase Server as an evolution of CouchDB: is there a way to replicate from one to the other, is there a way to smooth the transition (install both products, have replication set up, and slowly move code to use Couchbase Server)?
In the end I understand that developers have to eat too and that projects and people have to move on. It is a Darwinian competition and sometimes projects just lose, sometimes it is luck or timing, sometimes they are misunderstood, and it is just marketing.
memcached (binary) protocol for both doc manipulation and "changes" helps performance considerably. We looked at inventing a new protocol to stream stuff in and out faster, but we decided to go with the one we already had. Though the internal APIs have been shifting a bit, it was built entirely without touching the couchdb core and gave us a huge performance benefit over http: https://github.com/couchbase/mccouch
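For readers unfamiliar with the wire format being referred to, here is a minimal sketch of how a memcached binary-protocol request is framed. The 24-byte header layout and the GET/SET opcodes follow the public memcached binary protocol; the helper names (build_request, build_set, build_get) are made up for illustration, and nothing here is taken from the mccouch code or is specific to Couchbase.

```python
import struct

# Request header of the memcached binary protocol: 24 bytes, big-endian.
# Fields: magic, opcode, key length, extras length, data type,
# vbucket id (reserved in plain memcached), total body length, opaque, CAS.
HEADER = struct.Struct(">BBHBBHIIQ")
MAGIC_REQUEST = 0x80
OP_GET = 0x00
OP_SET = 0x01


def build_request(opcode, key, value=b"", extras=b"", opaque=0, cas=0):
    """Return one request packet (header + extras + key + value) as bytes."""
    body = extras + key + value
    header = HEADER.pack(MAGIC_REQUEST, opcode, len(key), len(extras),
                         0, 0, len(body), opaque, cas)
    return header + body


def build_set(key, value, flags=0, expiry=0):
    # SET carries 8 bytes of extras: 4-byte flags followed by 4-byte expiry.
    return build_request(OP_SET, key, value, struct.pack(">II", flags, expiry))


def build_get(key):
    # GET has no extras and no value, only the key.
    return build_request(OP_GET, key)
```

A document write is then a single fixed-size header plus the raw key and JSON bytes, e.g. build_set(b"doc1", b'{"a":1}'), with no HTTP headers to parse on either side.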
rewriting (not core, but slow) components in C is always the right way to do this sort of thing. We've been in the process of building a NIF for document updates for a while now (TBH, I don't know where the source is off the top of my head, but we don't keep stuff closed). We've got massive performance gains in pure erlang, but we expect to cut CPU consumption down more with this new code. It's intended to be available to Apache CouchDB if they want it.
I like CouchDB a lot myself, and use it for lots of projects (including critical parts of couchbase, inc.). If we can't solve the same problems with the new product, we'll know it because we use it ourselves.
One of the claimed reasons to move to Couchbase Server is supposedly because CouchDB has performance issues.
So parts of Couchbase will be written in C/C++ for performance. So my point was: why not try to improve the performance of CouchDB by writing the performance-critical parts in C, keep the existing API, and possibly add a new telnet-like interface with protobufs, instead of completely moving to something else?
It is like they threw away the best parts of CouchDB and started to compete with Riak and Redis. That is why I really want to know the list of features that will make Couchbase Server better than those two products. Since those 2 are stable and Couchbase Server is still in beta, I know which one I am not choosing for a key-value store.
The new Couchbase Server is based on Membase (it adds JSON documents and incremental Map Reduce). Most of the critical code is already in production today on lots of very big sites: http://www.couchbase.com/customers
So when Couchbase Server 2.0 comes out of beta, we'll mean it. The developer previews we've been releasing are as solid as some open source projects ever get, and yet we are still putting a full QA team on torturing it before we'll call it ready.
> The new Couchbase Server is based on Membase (it adds JSON documents and incremental Map Reduce).
That is good news. I will look into it some more.
What would really be helpful is to have a bullet point comparison of features between Apache CouchDB and Couchbase as well as between Couchbase and Riak, Redis, MongoDB, Membase. Basically an updated and extended :
As I understand your post, CouchDB and CouchBase are both NoSQL document stores, but CouchBase is designed to be more scalable, with better-defined support. I think this is fantastic and I look forward to taking a ride with CouchBase.
A few questions:
If you are using the best parts of CouchDB, then how is this not a Fork? Will you use any of the code?
Similar query about Memcached / Membase: What is going on with that code base? How have you merged the two with regards to functionality?
IMHO In order to succeed, you have to provide a bridge from CouchDB to CouchBase, but you said there is no upgrade path. Can you elaborate?
The ideas in CouchDB were really great, but I was very disappointed with what was termed a 1.0 release of CouchDB, as it did not feel ready for prime time, and it ended up torpedoing a project I was working on. I'm also disappointed that it did not quickly improve, and that it appears to be abandoned by its creator. I am therefore going to avoid Couchbase or anything by its creator, as I do not trust that it's a foundation that I can build upon.
That sucks and I'm sorry. I made lots of mistakes, the biggest was trying to run a business instead of the technology. I'm again back where I should be. Whether you choose our technology or not, I wish you success.
I'm being too petulant. If couchbase turns out very solid with a strong and lively community, or if CouchDB remains solid for a lengthy period of time, of course I'd return, but I'll just wait on the sidelines for others to work out the issues. I hope it turns out great.
I don't think that's what Damien is saying. I think he's saying that, in the process of building couch.io, he should have worn the "technology and product leader" hat instead of the "business leader" hat.
In your defense Damien, sometimes there's simply nobody else to do the job and as a founder it's our responsibility to pick up the slack until the right person is hired.
As a guy who was lurking around for basically the whole duration of CouchDB development, I would argue that Damien did not fall prey to the second system syndrome. He's just too much of a pragmatist for something like that. The soundness of Damien's engineering views and approaches is hugely similar to that of Linus Torvalds.
The truth in my eyes is that CouchDB is misunderstood, the same way Lotus Domino is misunderstood. And if the majority of users misunderstand what it is good at, then it is not going to be used in an optimal fashion.
What I guess that Damien is going to do is build a database that does shit people expected CouchDB to do. And I believe that Damien is a hedge that ensures that goods are going to be delivered.
Would you mind explaining why Lotus Domino is misunderstood? This is a genuine question by the way, not the kind that asks for debate/arguments. I have friends who do Lotus Domino apps and since I don't know much about the platform, perhaps you can share the story behind your statement.
There are a few key points I could analyze, but I will just briefly touch on the CAP theorem. The CAP theorem states that a database engine can satisfy at most two properties from a pool of three (Consistency, Availability, Partition tolerance). CA type databases are the vast majority of datastores in use out there (everything SQL), while Domino and CouchDB (CouchDB is sort of a Free Software Domino) are AP type databases.
What does that mean? Well, first it means that a lot of the design patterns commonly used in CA CRUD apps go out the door and a different approach is required.
Let me list some:
1. No JOINs. You can either store the referenced data directly in the referring object's data OR you can save a key reference and do another query to get the related data. It may not seem like a lot, but once datasets start growing, this can be quite a pain.
2. No Ad-Hoc querying. There is no concept of "let me open psql and prod the data a bit", not in a production scale database at least. This becomes a point of contention when customers want a way or a tool that enables them to create arbitrary reports. Usually this can be worked out with a bit of patience and foresight (let me build you another view), but humans suck at that kind of behavior.
3. They are not (really) scalable. Replication in Couch and Domino is not really intended as a performance measure. It is more of a failover and data portability measure. Also, in this kind of distributed database you need to pay attention to a sort of "write jurisdiction". Couch touts "advanced conflict resolution mechanisms", which is true until the same field is modified in two different replicas. It is impossible to merge this kind of conflict without losing data, and a human must decide what gets to stay. The issue with this is that people seeing "replication conflict" don't think "Oh we had a race condition, let me resolve this" - they think "Oh it's the stupid, crappy, goddamn database again, we really should move to SQL".
4. No schema. A document is a bucket; you throw in whatever you please. This may be a good thing or a bad thing. Depends. But if you are too enthusiastic about it, you might wake up to quite a headache one morning in a couple of years.
5. (Couch specific) Map/Reduce: Perhaps originally it was intended to become a cluster level querying mechanism, but the truth is that, although it is a very smart indexing mechanism, people associated it with Google's MapReduce (which was a big, big buzzword back in 2006), which led to a lot of disappointment on the users' side.
6. (Domino specific) IBM doesn't really know what to do with the Domino platform. They tried to kill it, but failed at it. Then forgot to market it. Then remembered to market it, but forgot to develop it. Then remembered to develop it, but failed at it. Then bolted some Java Abomination on top of it. It's the kind of MBA stuff nobody really understands. Why is this important? Because Domino is a pretty good application platform for developing "access" type applications for businesses and IBM hates that part, because this kind of stuff is IBM GCS turf and why would they let you develop an internal app for $1000, when they can rob you of $50K?
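To make point 3 concrete, here is a toy three-way merge of two replicas of one document. It is only an illustration of the "same field modified in two replicas" problem as described above; the function name and the document fields are invented, and it is not a description of how CouchDB or Domino actually implement replication or conflict handling.

```python
def merge_replicas(base, left, right):
    """Three-way merge of two edited copies of one document.

    Returns (merged, conflicts). A field edited on only one side is taken
    from that side. A field edited differently on both sides is reported
    as a conflict and left at its base value for a human to resolve.
    A missing field is treated as None and dropped from the result.
    """
    merged, conflicts = {}, []
    for field in sorted(set(base) | set(left) | set(right)):
        b, l, r = base.get(field), left.get(field), right.get(field)
        if l == r:        # both sides agree (or neither touched it)
            value = l
        elif l == b:      # only the right replica changed it
            value = r
        elif r == b:      # only the left replica changed it
            value = l
        else:             # both changed it, to different values
            conflicts.append(field)
            value = b
        if value is not None:
            merged[field] = value
    return merged, conflicts
```

With base = {"title": "Draft", "owner": "ann"}, one replica changing the title and the other changing the owner merges cleanly. Two replicas setting the title to different values cannot be merged without discarding one edit, which is exactly the case where a person (or application-specific code) has to pick.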
The main point is that with Couch/Domino you can do pretty much everything you can do with relational data stores, but it will look different. It will feel different and it means some compromises you might not expect (or are at least not used to).
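As a small example of "it will look different", this is roughly what the two alternatives to a JOIN from point 1 look like in application code. The in-memory dict stands in for the database, and every id and field name is made up for illustration.

```python
# Toy in-memory "database": document id -> document.
db = {
    "author:1": {"type": "author", "name": "Ann"},
    # Reference style: the post stores only the author's key.
    "post:1": {"type": "post", "title": "Hello", "author_id": "author:1"},
    # Embedded style: the author's data is copied into the post.
    "post:2": {"type": "post", "title": "World", "author": {"name": "Ann"}},
}


def post_with_author(db, post_id):
    """Return a post with its author attached, fetching the author if needed."""
    post = dict(db[post_id])
    if "author" not in post:
        # Reference style costs one extra lookup per referenced document.
        author = db[post["author_id"]]
        post["author"] = {"name": author["name"]}
    return post
```

The embedded form avoids the second lookup, but renaming the author means rewriting every post that copied the name. The reference form keeps one copy, at the price of an extra fetch per row, which is the part that starts to hurt as datasets grow.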
Honestly all things considered, after years of experience there are indeed very few problems that call exactly for a Couch/Domino type database. However I certainly see how a DB of this kind should find its place in each and every major information system.
This post is pretty vague. Having heard of Couch a few times over the years but not used it, I think it would have been more helpful if he had said something like: "I started a company and forked the open source CouchDB project that I founded. The company and new commercial product is CouchBase and it will be better for these reasons..." And I would be curious about some examples of where the open governance limited the CouchDB project, as he's implying.
Those seem to be orthogonal things -- he could have just created a non-Apache open source project rather than making it commercial.
He is beating around the bush so much and using such vague wording that I wonder if he is hiding something, or just ashamed that he's cashing in on his creation. There's no shame in making money and no shame in making a commercial fork of your own project. But the CouchBase website looks awfully "enterprisey" now and I think there is some shame in that...
I don't use couchdb either but I do follow the nosql space a little and what I read was that he's creating an all-new project.
Couchdb is an Apache foundation-led project and will continue on its own path. Couchbase is an all-new project that will solve some of the same use-cases but be better at scaling. It's not a fork at all.
This is really bad news. Not exactly out of the blue, but there's no ambiguity now.
There's such a wide functional gap between CouchDB and CouchBase that it feels like the heart has been ripped out and placed into an entirely different beast. Of course the Apache project is still there, but I have grave doubts over whether it will continue to be actively developed.
To ease anxieties, it would be great to see a roadmap or some statement of commitment from those remaining in the CouchDB community. Including Iris and Cloudant.
I've spent the past year building an app on CouchDB and I've really enjoyed using it. The couch.io to CouchBase transition was poorly communicated, so I'm glad Damien's made such a clear statement of intent. Clearly I'm going to have to stick with Apache CouchDB for now. I was interested in moving to CouchBase, but that's surprisingly difficult:
1. There's no easy transition for my data. I thought I could just install CouchBase and replicate from my existing CouchDB -- nope, can't be done. Huh? But why...
2. It's partly because CouchBase drops the CouchDB REST API. Which also means, none of my existing code works with it. So I guess it's no big deal that my data won't move over, because my app won't be able to retrieve it anyway.
Because there's no easy transition, they've created a situation where anyone considering a move to CouchBase is just as likely to re-evaluate all of the other document (or k/v) stores.
"We are moving more and more of the core database in C/C++, while still using many of the concurrency and reliability design principles we've proven with the Erlang codebase. And Erlang is still going to be part of the product as well, particularly with cluster management, but most of the performance sensitive portions will be moving to over C code. Erlang is still a great language, but when you need top performance and low level control, C is hard to beat."
This is interesting. If I remember correctly, CouchDB was first written in C++ and then moved to Erlang. Now the project has come full circle (which is fine of course).
I agree with much of what you've said there. You clearly understand enough that I don't need to go into detail describing why I agree. :) But this isn't quite right:
> There is a reason antirez choose Lua and not JS for a scripting language.
To be fair, I embedded lua in ep-engine well over a year ago, but haven't ever released it. As it turns out, v8 is performing very well lately (certainly faster than plain lua) and people get it more easily. I've done some fun things with lua and it's really fun and easy to embed, but whatever gets shipped has to be supported and lua has some pretty dark areas.
"it's fine (not necessarily bad) to come full circle in general, not that it will work out in this particular case."
exactly. I was not saying this step is specifically a bad (or good!) one. I didn't want the thread to devolve into a "which is better Erlang or C++?" flame war. I am sure the devs on the project have good reasons for the change. I just thought it was interesting that the project came full circle.
Yes. Erlang is great for proving ideas and making something work reliably. I have mad respect for Erlang, it's the kind of language where one guy can code a whole distributed database. But when you need high performance and low level control, and you have the resources and team to make it happen, you really can't beat C. I'll write more about that soon.
Oh, I didn't realise much of Riak was in C, and the website now mentions Erlang a whole lot less than I remember. Still, in the technology stack http://basho.com/technology/technology-stack/ Riak Core seems to be all Erlang. Bitcask is possibly C--is that what you're referring to?
Your language and line of questioning leads me to assume that you're perhaps not that familiar with Erlang.
Erlang has facilities for running native code within the VM (NIFs and linked-in drivers) and for interacting with non-Erlang processes in an Erlang like way (ports and c-nodes). Extending the VM via these mechanisms is not the default path but it's also not unusual - much of the standard lib is implemented via these mechanisms after all.
To answer your question, bitcask, ebloom, eleveldb, erlang_js and skerl all have c_src directories which suggests they're likely either all or in part implemented via NIFs or linked-in drivers.
It would have been a bold move, if Damien had left the "Couch" name with Apache CouchDB, and released his CouchBase product under another name. Also, this would have liberated him from having to distance himself from CouchDB, Erlang and Apache, when promoting his new product.
I agree with what I think that you and other people here are saying: the Apache CouchDB project will continue to be supported by a good community so there is very little technical risk for using CouchDB. BTW, I think that datastore as a service companies like Cloudant, MongoHQ (and many other good companies) are a great convenience, but for self hosting, I wonder why anyone really needs support for CouchDB, MongoDB, etc. unless they have very large deployments.
I think it's a little unfair to lump Cloudant in with MongoHQ and label them a "datastore as a service" company. You should check out what they've done with BigCouch and their announcement today about how they will be committing the BigCouch changes back into CouchDB proper.
I looked at couchbase and it seems somewhat user-friendly. The way to preview your created views by using only a subset of data is nice. Another nice thing is the split into ?16? different databases that allows compaction to occur on a smaller level rather than having to compact the whole 60 GB file at once.
The downside for me so far:
- I couldn't find any way to import my current couchbase single server data over to couchbase.
- The old couchdb web interface (futon) made browsing through data easy; the couchbase interface seems to make this a bit more complicated. (Maybe I didn't look in the right places?)
- I couldn't figure out if I can still hook up the _changes feed to elasticsearch
We have something higher-performance than _changes and have had elasticsearch support as a third-party plugin for a while. We don't have firm plans to bring it in first-class, but we do think it would be beneficial to many of our users.
Shame. I really like CouchApp. It felt like CouchDB was one rails moment away from being a whole new way to develop web apps, where the DB, the client, and the application were all part of the same glorious union rather than being a bunch of ugly parts bolted together.
The sources all seem to be sitting in https://github.com/membase/ -- they are componentized, so may be... interesting to work out how to get a running server, but it seems to be there :-) ep_engine looks like the heart of it.
"Almost all of the HTTP REST API that makes up the interface for communicating with CouchDB does not exist within Couchbase Server. The basic document operations for creating, retrieving, updating and deleting information are entirely supported by the memcached protocol."
Querying views and map/reduce functions in Couchbase server will be similar to CouchDB except the output will be slightly different.
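For anyone who has not used views, the following is a toy model of the idea: a map function emits key/value rows per document, the rows are kept sorted by key, and a query is a lookup in that sorted index. All function names and document fields are hypothetical, and this is a simplification for illustration, not the implementation or output format of CouchDB or Couchbase Server.

```python
import bisect


def map_doc(doc):
    """Hypothetical map function: emit (key, value) rows for one document."""
    if doc.get("type") == "post":
        yield doc["author"], 1


def build_view(docs, map_fn):
    """Run map_fn over every document and keep the emitted rows sorted by key."""
    rows = []
    for doc_id, doc in docs.items():
        for key, value in map_fn(doc):
            rows.append((key, doc_id, value))
    rows.sort()
    return rows


def query(rows, key):
    """Return all rows for one key, located by binary search in the index."""
    start = bisect.bisect_left(rows, (key,))
    result = []
    for row in rows[start:]:
        if row[0] != key:
            break
        result.append(row)
    return result


def reduce_count(rows):
    """Hypothetical reduce step: sum the emitted values."""
    return sum(value for _, _, value in rows)
```

Given three posts, two of them by "ann", reduce_count(query(build_view(docs, map_doc), "ann")) counts her posts. The point is that the view is a precomputed index you define ahead of time, which is why it behaves nothing like an ad-hoc query.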
I found it surprising that Damien comes off so completely unapologetic about his decision to abandon Apache and the CouchDB community. As a long time CouchDB user and open source believer the whole tone of his post leaves a dirty taste in my mouth.
"And I'm dead serious about making it the easiest, fastest and most reliable NoSQL database. Easy for developers to use, easy to deploy, reliable on single machines or large clusters, and fast as hell. We are building something you can put your mission critical, customer facing business data on, and not feel like you're running a dirty hack."
That sounds like something I very much want, I hope Damien and the team can deliver.
A big part of this will be getting solid, up to date documentation in place. Even though CouchDB is fantastic to work with, when the official docs are out of line with how it actually works, you immediately feel like you're walking on eggshells.
That said, I really do like the idea of Everything Restful, and wish them the best!
A shame.. Is this a case of what's better for the dev team outweighing what makes it fun to use as a user? Sorry if that seems harsh, but (personally) this plus the simplicity of setting up map/reduce functions and seeing the results directly in futon are what made CouchDB stand out...
Yeah I agree. I think they are chasing VC money. Riak is successful so they want to follow suit. But the problem is they are trying to do what Riak does and Riak does what it does best. A second project with largely the same features is going to have a very hard time displacing a stable product.
I think this is an excellent move that fits with the current times. If I'm going to use a DB for production use, I want it to be well supported, well documented, easy to use, etcetera. Very few consensus-based projects I know reach all those goals. CouchDB sure hasn't entirely. If this move is going to improve on that, I may very well end up a Couchbase customer.