There is also an ethical problem IMHO. For instance, is Redis mine? I guess it is not: VMware is funding the development, so I can write code thanks to VMware. Pieter is contributing a lot of code. The community is helping the project a lot: with talks spreading the word, helping on the mailing list, helping to fix bugs, and so forth.
My guess is that the developers really conceptually "own" a very small piece of the pie, and when they create a business around open source software, they should keep this in mind; otherwise it is easy to involuntarily take work that other people did in the past and turn it into your business.
It's not easy however... as: the end users should be happy and well served, and the software should not just be "open source" but an open process, with an open community, and so forth. At the same time the developers should be able to pay their bills without issues and earn enough to avoid being tempted to join company XYZ instead of working on their project. It's not trivial to have all this together I guess, and I feel very lucky that there is VMware making this simple for me, but not all open source developers are equally fortunate, so I guess it is crucial that the open source community keeps working on different ideas to find viable solutions.
On a side note: who owns the Redis brand (and logo)?
I'm not so sure the business model of LuaJIT can easily be applied to other open source projects. I mean, I've been running a business for more than 15 years. I basically downscaled consulting as I've acquired paid open source work. It pays less, but I'm happier now.
I think you'd have a hard time doing this from zero, without some backup plan or enough funds of your own. Well, have a look at the price point of scripting languages, toolchains and such. Getting a vague business plan for an open source project in the toolchain/middleware business through a VC round? I don't think so.
Business knowledge was essential, too. Negotiating open source sponsorship contracts with corporate lawyers for half a year is not that easy. Some companies were extremely easy to work with, but I guess sponsoring open source projects is simply not an established business procedure at most companies. I don't think you could possibly afford to hire an international IP lawyer to do this for you.
Actually it was the economic downturn of 2008 which triggered all of this. The (rarely followed) advice is to invest in a downturn, because you'll be the first one to come out ahead in the following upturn. Ok, so I invested my time and knowledge into a new project (#). It worked out for me, but in retrospect I realize it was a high risk investment.
Final words: you think we're seeing some kind of downturn again? Carefully assess the risks and the potential for yourself. But don't be shy if you're onto something good. Innovate, don't replicate.
(#) LuaJIT 2.x is a complete rewrite, using the experience gained working on LuaJIT 1.x, but very little of its code. It's been on my backburner since 2007 and the first public release of LuaJIT 2.0 was at the end of 2009. One might say it's about a two man-year investment before it started to pay back. Most of the time was spent on doing research and prototyping rather than writing code. I coded and then threw away almost three complete virtual machines in the process, because these approaches turned out to be dead ends. I bet you'd need a lot of courage to explain this to a VC. ;-)
When it got down to time to pay for support, they told me (this is 2 months ago) in a rare and unusual bit of candor, that they were going to drop Couch in less than six months, so did I want to buy commercial support for just six months?
I told them not only do I not want commercial support, but I just got so freaked out I would not recommend couch for future projects to clients, because it was obvious that internally the team had moved on.
They asked me not to tell anyone so I didn't, but now that this is out there I can say what I deduced from our discussions: Couch doesn't make any money; MemBase does. Period.
Kevin Smith at Opscode said they're moving away from Couch (and to MySQL) as Couch just doesn't scale. No finely grained reader/writer locks, one reader/writer thread/db, huge and random delays due to checkpointing that can make the server inoperable, difficulty ever finishing a view checkpoint under load, etc. I think he's right - it's been abandoned as a platform.
It's absolutely their prerogative to move on and it seems like the right decision, but it's not a technical decision. The reality is that CouchDB is largely a product running small, toy apps, written by people who won't PAY anything for support. MemBase is being used by big companies with a lot of money to spend on commercial support and enterprise features.
Actually there is still one case where I recommend Couch and it's when you need the mobile sync features. I doubt that they'll make those a priority anytime soon on the MemBase product (at least I haven't seen or heard anything).
Apache CouchDB's data model is sublime for the domain I often work in, and I think the same goes for a lot of people. It's not just mobile; personally, I don't do anything mobile- or sync-related.
I have no plans on moving on. Apache CouchDB is as active and healthy a project as ever AFAICT, and having hosting providers like Cloudant makes the entire model all that more attractive.
If it's all about business, then slagging on the project that gave your company its name is perhaps not the greatest approach. Moving on is fine, a fact of life. Making things more difficult for those you leave behind isn't cool, especially when your moving on happened a long time ago.
 8/24/2010, according to https://github.com/apache/couchdb
If they want to win the mobile market though, they will have to re-implement the DB in C. Erlang was a good choice on the server but in the current mobile ecosystem it's clearly a drawback.
iOS TouchDB: https://github.com/couchbaselabs/TouchDB-iOS
We are also exploring TouchDB on Android.
If you want to join the community and help us build these and other wonders, we do it in this group: https://groups.google.com/forum/#!forum/mobile-couchbase
Also, I really like Couch... so don't take what I am saying here or above as an attack.
I got a lot of disagreements and nasty responses back, suggesting that it would not happen and CouchDB is doing awesome, and how this basically doesn't affect CouchDB at all. However here is its creator, urging everyone in so many words to drop CouchDB and switch to Couchbase.
It would have been really exciting for example to add:
* Support for msgpack or protobuf protocol to insert docs
* Rewrite core components in C, but keep the external API the same.
* Websocket changes feed
* An option for an in-memory only db version
* master-to-master replication
* _changes feed
* clean RESTful API
So I guess my question is: what would make Couchbase stand out and make someone want to move to it, compared to the existing key-value dbs? I think it is imperative for the Couchbase Server team to make that clear. A short bullet point list will do.
Another issue I see is, ok, let's say I see Couchbase Server as an evolution of CouchDB: is there a way to replicate from one to the other, is there a way to smooth the transition (install both products, have a replication set up and slowly move code to use Couchbase Server)?
In the end I understand that developers have to eat too and that projects and people have to move on. It is a Darwinian competition and sometimes projects just lose; sometimes it is luck or timing, sometimes they are misunderstood, and sometimes it is just marketing.
Rewriting (not core, but slow) components in C is always the right way to do this sort of thing. We've been in the process of building a NIF for document updates for a while now (TBH, I don't know where the source is off the top of my head, but we don't keep stuff closed). We've got massive performance gains in pure Erlang, but we expect to cut CPU consumption down more with this new code. It's intended to be available to Apache CouchDB if they want it.
I like CouchDB a lot myself, and use it for lots of projects (including critical parts of couchbase, inc.). If we can't solve the same problems with the new product, we'll know it because we use it ourselves.
Can I ask why?
So parts of Couchbase will be written in C/C++ for performance. So my point was: why not try to improve the performance of CouchDB by writing performance-critical parts in C, keep the existing API, and possibly add a new telnet-like interface with protobufs, instead of completely moving to something else?
It is like they threw away the best parts of CouchDB and started to compete with Riak and Redis. That is why I really want to know the list of features that will make Couchbase Server better than those two products. Since those 2 are stable and Couchbase Server is still in beta, I know which one I am not choosing for a key-value store.
So when Couchbase Server 2.0 comes out of beta, we'll mean it. The developer previews we've been releasing are as solid as some open source projects ever get, and yet we are still putting a full QA team on torturing it before we'll call it ready.
That is good news. I will look into it some more.
What would really be helpful is to have a bullet point comparison of features between Apache CouchDB and Couchbase as well as between Couchbase and Riak, Redis, MongoDB, Membase. Basically an updated and extended :
So someone can basically figure out, "Oh look, JSON documents, map-reduce and the speed of key-value storage, I could use this".
A few questions:
If you are using the best parts of CouchDB, then how is this not a Fork? Will you use any of the code?
Similar query about Memcached / Membase: What is going on with that code base? How have you merged the two with regards to functionality?
IMHO, in order to succeed, you have to provide a bridge from CouchDB to CouchBase, but you said there is no upgrade path. Can you elaborate?
In your defense Damien, sometimes there's simply nobody else to do the job and as a founder it's our responsibility to pick up the slack until the right person is hired.
> now, as it turns out, I have a chance to do it all again
> throwing out what didn't work, and strengthening what does
> not feel like you're running a dirty hack
Second-system syndrome ahoy!
 See http://www.the-wabe.com/notebook/second-system.html, http://c2.com/cgi/wiki?SecondSystemEffect, http://en.wikipedia.org/wiki/Second-system_effect, http://www.joelonsoftware.com/articles/fog0000000069.html, et al
The truth in my eyes is that CouchDB is misunderstood, the same way Lotus Domino is misunderstood. And if the majority of users misunderstand what it is good at, then it is not going to be used in an optimal fashion.
What I guess that Damien is going to do is build a database that does shit people expected CouchDB to do. And I believe that Damien is a hedge that ensures that goods are going to be delivered.
There are a few key points I could analyze, but I will just briefly touch on the CAP theorem. The CAP theorem states that a database engine can satisfy at most two properties from a pool of three (Consistency, Availability, Partition tolerance). CA-type databases are the vast majority of datastores in use out there (everything SQL), while Domino and CouchDB (CouchDB is sort of a Free Software Domino) are AP-type databases.
What does that mean? Well, first it means that a lot of the design patterns commonly used in CA CRUD apps go out the door and a different approach is required.
Let me list some:
1. No JOINs. You can either store the referenced data directly in the referring object's data OR you can save a key reference and do another query to get the related data. It may not seem like a lot, but once datasets start growing, this can be quite a pain.
2. No ad-hoc querying. There is no concept of "let me open psql and prod the data a bit", not in a production-scale database at least. This becomes a point of contention when customers want a way or a tool that enables them to create arbitrary reports. Usually this can be worked out with a bit of patience and foresight (let me build you another view), but humans suck at that kind of behavior.
3. They are not (really) scalable. Replication in Couch and Domino is not really intended as a performance measure. It is more of a failover and data portability measure. Also, in this kind of distributed database you need to pay attention to a sort of "write jurisdiction". Couch touts "advanced conflict resolution mechanisms", which is true until the same field is modified in two different replicas. It is impossible to merge this kind of conflict without losing data, and a human has to decide what gets to stay. The issue with this is that people seeing "replication conflict" don't think "Oh, we had a race condition, let me resolve this" - they think "Oh, it's the stupid, crappy, goddamn database again, we really should move to SQL".
4. No schema. A document is a bucket, you throw in whatever you please. This may be a good thing or a bad thing. Depends. But if you are too enthusiastic about it, you might wake up to quite a headache one morning in a couple of years.
5. (Couch specific) Map/Reduce: Perhaps originally it was intended to become a cluster-level querying mechanism, but the truth is that although it is a very smart indexing mechanism, people associated it with Google's MapReduce (which was a big, big buzzword back in 2006), which led to a lot of disappointment on the users' side.
6. (Domino specific) IBM doesn't really know what to do with the Domino platform. They tried to kill it, but failed at it. Then forgot to market it. Then remembered to market it, but forgot to develop it. Then remembered to develop it, but failed at it. Then bolted some Java Abomination on top of it. It's the kind of MBA stuff nobody really understands. Why is this important? Because Domino is a pretty good application platform for developing "access" type applications for businesses, and IBM hates that part, because this kind of stuff is IBM GCS turf, and why would they let you develop an internal app for $1000 when they can rob you of $50K?
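To make point 3 concrete, here is a minimal sketch of a field-level three-way merge between two replica revisions and their common ancestor. This is not CouchDB's or Domino's own algorithm, just an illustration of the kind of merge an application might attempt; the function name and the sample documents are made up for the example.

```python
def merge_revisions(base, ours, theirs):
    """Three-way merge of two replica revisions against their common ancestor.

    Returns (merged, conflicts). A field lands in `conflicts` when both
    replicas changed it to different values, so no automatic choice is safe.
    """
    missing = object()  # sentinel: distinguishes "field absent" from None
    merged, conflicts = {}, {}
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, missing)
        o = ours.get(key, missing)
        t = theirs.get(key, missing)
        if o == t:        # both replicas agree (or neither touched the field)
            value = o
        elif o == b:      # only `theirs` changed the field
            value = t
        elif t == b:      # only `ours` changed the field
            value = o
        else:             # both changed it differently: a human has to decide
            conflicts[key] = (o, t)
            continue
        if value is not missing:  # skip fields that both sides deleted
            merged[key] = value
    return merged, conflicts


# Hypothetical documents: replica A changed the status, replica B changed
# the status AND the title.
base = {"title": "Invoice 17", "status": "draft", "total": 100}
replica_a = {"title": "Invoice 17", "status": "sent", "total": 100}
replica_b = {"title": "Invoice #17", "status": "paid", "total": 100}

merged, conflicts = merge_revisions(base, replica_a, replica_b)
print(merged)     # {'title': 'Invoice #17', 'total': 100}
print(conflicts)  # {'status': ('sent', 'paid')}
```

The title change merges cleanly because only one replica touched it. The status field is the "same field modified in two different replicas" case: whichever value you pick, the other edit is lost.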
The main point is that with Couch/Domino you can do pretty much everything you can do with relational data stores, but it will look different. It will feel different and it means some compromises you might not expect (or are at least not used to).
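As an illustration of "it will look different", here is a rough sketch of the two alternatives to a JOIN from point 1: embedding the related data versus storing a key and doing a second lookup. The `db` dict is a hypothetical in-memory stand-in for a document store, and the document shapes are assumptions made up for the example.

```python
# Hypothetical in-memory stand-in for a document store: id -> document.
db = {
    "author:1": {"type": "author", "name": "Ada"},
    # Referenced variant: the post only stores the author's key.
    "post:1": {"type": "post", "title": "Hello", "author_id": "author:1"},
    # Embedded variant: the author data is copied into the post itself.
    "post:2": {"type": "post", "title": "World", "author": {"name": "Ada"}},
}


def author_name(db, post_id):
    """Return the author's name for a post, following a reference if needed."""
    post = db[post_id]
    if "author" in post:            # embedded: one read, but duplicated data
        return post["author"]["name"]
    author = db[post["author_id"]]  # referenced: a second lookup per post
    return author["name"]


print(author_name(db, "post:1"))  # Ada
print(author_name(db, "post:2"))  # Ada
```

With embedding, renaming the author means rewriting every post that carries a copy. With references, listing N posts costs N extra lookups unless you batch them, which is the pain that shows up once datasets grow.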
Honestly all things considered, after years of experience there are indeed very few problems that call exactly for a Couch/Domino type database. However I certainly see how a DB of this kind should find its place in each and every major information system.
Those seem to be orthogonal things -- he could have just created a non-Apache open source project rather than making it commercial.
He is beating around the bush so much and using such vague wording that I wonder if he is hiding something, or just ashamed that he's cashing in on his creation. There's no shame in making money and no shame in making a commercial fork of your own project. But the CouchBase website looks awfully "enterprisey" now and I think there is some shame in that...
I don't use couchdb either but I do follow the nosql space a little and what I read was that he's creating an all-new project.
Couchdb is an Apache foundation-led project and will continue on its own path. Couchbase is an all-new project that will solve some of the same use-cases but be better at scaling. It's not a fork at all.
(*couch* puppetlabs.com *couch* ;)
There's such a wide functional gap between CouchDB and CouchBase that it feels like the heart has been ripped out and placed into an entirely different beast. Of course the Apache project is still there, but I have grave doubts over whether it will continue to be actively developed.
To ease anxieties, it would be great to see a roadmap or some statement of commitment from those remaining in the CouchDB community. Including Iris and Cloudant.
1. There's no easy transition for my data. I thought I could just install CouchBase and replicate from my existing CouchDB -- nope, can't be done. Huh? But why...
2. It's partly because CouchBase drops the CouchDB REST API. Which also means, none of my existing code works with it. So I guess it's no big deal that my data won't move over, because my app won't be able to retrieve it anyway.
Because there's no easy transition, they've created a situation where anyone considering a move to CouchBase is just as likely to re-evaluate all of the other document (or k/v) stores.
This is interesting. If I remember correctly, CouchDB was first written in C++ and then moved to Erlang. Now the project has come full circle (which is fine of course).
Moving to UnQL and Memcached protocol may solve some of the performance problems.
There is a reason antirez chose Lua and not JS for a scripting language.
> There is a reason antirez chose Lua and not JS for a scripting language.
To be fair, I embedded lua in ep-engine well over a year ago, but haven't ever released it. As it turns out, v8 is performing very well lately (certainly faster than plain lua) and people get it more easily. I've done some fun things with lua and it's really fun and easy to embed, but whatever gets shipped has to be supported and lua has some pretty dark areas.
Why would it be fine?
exactly. I was not saying this step is specifically a bad (or good!) one. I didn't want the thread to devolve into a "which is better Erlang or C++?" flame war. I am sure the devs on the project have good reasons for the change. I just thought it was interesting that the project came full circle.
Erlang has facilities for running native code within the VM (NIFs and linked-in drivers) and for interacting with non-Erlang processes in an Erlang like way (ports and c-nodes). Extending the VM via these mechanisms is not the default path but it's also not unusual - much of the standard lib is implemented via these mechanisms after all.
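For the ports case, here is a rough sketch of what the non-Erlang side can look like, assuming the port is opened with the `{packet, 4}` option, so every message is prefixed with a 4-byte big-endian length. The function names are mine, and the demo uses in-memory streams instead of a real port.

```python
import io
import struct


def read_packet(stream):
    """Read one length-prefixed message; return None at end of stream."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack(">I", header)  # 4-byte big-endian length
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("stream ended in the middle of a packet")
    return payload


def write_packet(stream, payload):
    """Write one message with its 4-byte big-endian length prefix."""
    stream.write(struct.pack(">I", len(payload)) + payload)


def serve(inp, out, handler):
    """Answer every incoming packet with handler(packet) until input ends."""
    while (request := read_packet(inp)) is not None:
        write_packet(out, handler(request))


# Demo with in-memory streams standing in for the port's stdin/stdout.
inp = io.BytesIO()
write_packet(inp, b"hello")
write_packet(inp, b"port")
inp.seek(0)
out = io.BytesIO()
serve(inp, out, bytes.upper)
out.seek(0)
replies = [read_packet(out), read_packet(out)]
print(replies)  # [b'HELLO', b'PORT']
```

In a real external program you would pass `sys.stdin.buffer` and `sys.stdout.buffer` and flush after each reply. NIFs and linked-in drivers are a different mechanism (native code loaded into the VM itself) and are not covered by this sketch.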
To answer your question, bitcask, ebloom, eleveldb, erlang_js and skerl all have c_src directories which suggests they're likely either all or in part implemented via NIFs or linked-in drivers.
EDIT: You might also find http://vimeo.com/17078993 interesting viewing.
It's better to promote Couchbase Server on its own merits rather than promoting it as the future of CouchDB.
CouchDB is dead, long live Apache CouchDB.
The downside for me so far:
- I couldn't find any way to import my current couchbase single server data over to couchbase.
- The old couchdb webinterface (futon) made browsing through data easy, the couchbase interface seems to make this a bit more complicated. (Maybe I didn't look in the right places?)
- I couldn't figure out if I can still hook up the _changes feed to elasticsearch
I don't see any external commits though
Packaging isn't quite as awesome as I'd like, but all the parts are definitely there.
A tangential question:
Can I still replicate a couchdb database with a couchbase server? Is this documented somewhere?
It's written here
"Almost all of the HTTP REST API that makes up the interface for communicating with CouchDB does not exist within Couchbase Server. The basic document operations for creating, retrieving, updating and deleting information are entirely supported by the memcached protocol."
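To give an idea of what document operations over the memcached protocol look like on the wire, here is a minimal sketch that builds the raw commands using the memcached text protocol. The quote doesn't say whether clients use the text or the binary flavour, so the text form is an assumption chosen for readability; storing the document as compact JSON and the key naming are also assumptions.

```python
import json


def set_command(key, doc, flags=0, exptime=0):
    """Build a memcached text-protocol `set` storing `doc` as compact JSON."""
    body = json.dumps(doc, separators=(",", ":")).encode("utf-8")
    # Header: set <key> <flags> <exptime> <bytes>, then the data block.
    header = f"set {key} {flags} {exptime} {len(body)}\r\n".encode("ascii")
    return header + body + b"\r\n"


def get_command(key):
    """Build a memcached text-protocol `get` for a single key."""
    return f"get {key}\r\n".encode("ascii")


def delete_command(key):
    """Build a memcached text-protocol `delete` for a single key."""
    return f"delete {key}\r\n".encode("ascii")


print(set_command("user:42", {"name": "Ada"}))
# b'set user:42 0 0 14\r\n{"name":"Ada"}\r\n'
print(get_command("user:42"))
# b'get user:42\r\n'
```

Compare this with CouchDB, where the same create/read/delete would be HTTP PUT/GET/DELETE requests on a document URL. That difference is why existing CouchDB client code does not carry over.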
Querying views and map/reduce functions in Couchbase server will be similar to CouchDB except the output will be slightly different.
Personally, I've never used fetch-manifest.rb (even on my mac). Can you tell me what you did and how it didn't work?
cd couchbase; ruby fetch-manifest.rb branch-1.8.xml; brew install libevent; make
That sounds like something I very much want, I hope Damien and the team can deliver.
(Which is arguably worse, IMNSHO) :)
That said, I really do like the idea of Everything Restful, and wish them the best!
Clue: Dealmaking is not codemaking.
From my understanding, these are not part of the CouchBase server, right?
What are the future plans for these features?
Telling from the 2 CouchConfs I've been to, Couchbase is focused on huge one-db setups, distributed and fast ... but what about the "database per user" setup? Is this going to die?