So I feel Slava's pain (and feel somewhat culpable as a user of RethinkDB), but I don't entirely agree with the analysis. Yes, infrastructure is a brutal space because it has been so thoroughly disrupted by open source -- but the dominance of AWS shows that open source can absolutely be monetized (just that it must be monetized as a service). Indeed, right now, I would love to deploy a RethinkDB-based service -- and if the company still existed, we (and by "we" I mean "we Samsung", not "we Joyent") would potentially be in a serious business conversation about what a supported, large-scale deployment would look like. Actually, that's not entirely true, and it leads me to one thing that Slava didn't mention. While it kills me to bring it up, it does represent the single greatest impediment I have personally found to deep, mainstream, enterprise adoption of RethinkDB: its use of the AGPL. Anyone who has been part of a serious due diligence knows that the GPL alone is toxic to much of the enterprise world -- and the GPL is a buttoned-down corporate stooge compared to the virulent, ill-defined, never-tested AGPL. The AGPL is a complete non-starter for any purpose, and while I appreciate why the AGPL was selected by RethinkDB (and very much appreciate the explicit riders that RethinkDB put on it to make clear what it did and didn't apply to), the reality is that no one -- absolutely no one -- has built a business on AGPL software and it seems vanishingly unlikely that anyone ever will.
I still use RethinkDB and still intend to -- and I still harbor hope (crazily?) that whatever entity owns the IP will see fit to relicense it to something that is palatable to the kind of people who care about their data. I actually believe that a business can be built on the technology -- which, to Slava's points, I have found to be delightfully sound in both design and implementation. And that RethinkDB is open source means -- for us, anyway -- it will continue to live on, despite the unfortunate demise of its corporate vessel (AGPL or no). That said, I would love to have a lot more company -- here's hoping that we see RethinkDB relicensed and a thriving business behind it!
I strongly disagree with that. Take RavenDB, which is very much comparable to RethinkDB as a business: it is dual-licensed under the AGPL, with a commercial license available. This forces any business doing work that isn't OSI-approved open source to purchase a commercial license. I don't know too much about RethinkDB, but I think they must have used a very similar licensing strategy?
From an outsider's perspective, RavenDB appears to be going strong and doing well commercially. They are hiring and expanding and have invested a lot into v4.0, which I'm really excited about.
Interestingly, on the spectrum between MongoDB and RethinkDB in terms of the "Wrong metrics of goodness" (Correctness, Simplicity of the Interface), they appear to me to be much closer to RethinkDB than to MongoDB. In terms of stability, they may have been somewhere in the middle (i.e. not as bad as MongoDB, but probably not as good as RethinkDB). And they have moved very fast to add features and fix bugs in the product. It's a really nice DB to work with as a dev; I suppose the #1 reason they are not as successful as Mongo is that they're coming from the .NET niche...
MongoDB's server is AGPL; the client drivers are Apache-licensed. To be fair, the business side is based around larger-scale deployment and management, with proprietary add-ons.
I'm kind of surprised that approach wasn't mentioned for RethinkDB: a free open-core product, with paid extras dealing with pain points in larger deployments.
Nothing new, just a strange place to be in.
source : https://docs.google.com/document/d/1f8qODp7voKIqwioQ3si69q3-...
Now you get to innovate on top of these modules using reactive APIs, user-space TCP, fancy serialization, standards-compliant SQL, etc.
Some of the current incumbents were written in the 90s when C++ was a different programming language, which hurts innovation.
Currently the pre-packaged databases on Joyent don't support both clustering and consistency. Is there anything in the works for this segment?!
This world is always willing to wait, it was the RethinkDB team and/or their investors who weren't. The world will wait for you in eternity; it's just not willing to provide you with capital during the wait.
Your statement would make sense if there were a dozen competitors in the database space. With hundreds of competitors, actually evaluating them all becomes more expensive than just choosing, e.g., a managed SQL solution and then switching providers if it doesn't live up to its promises.
That's so much more flexible from a business perspective than evaluating a hundred different open source database systems, settling on one, and then having to rewrite your database code when the database you've chosen ceases development, and you're left as the maintainer of it.
I can only imagine what the team at Realm is thinking after going through the comments and the post-mortem. Realm has been an open-source, mobile-first database for the past 3 years. It's free. It's the most correct, consistent, and simple mobile database, and it's running on millions of devices (yup, fine, kind of my opinion, but trust me, the mobile community will back this up). They ship FAST, and they ship real USE CASES. They just recently entered the cloud space with their Mobile Database Platform, and they are even experimenting with new things like a version built for Node.js.
I'm honestly not trying to make a specific point, but asking more questions... Is mobile different? Is it easier? Are mobile developers more willing to try new technology? Are they more willing to pay for technology? (Realm is free, haven't paid them a penny, so not sure about this one...)
My main point I guess is that it's not a terrible market, Realm made it work with VERY similar ideals and goals. So what else is going on here?
Yes, mobile in this context is very different. Or rather, local is very different. Building a database that resides on a local device and is primarily accessed by a single user is an entirely different class of problem than one that is on a server. Most of the hard problems in building a database system involve dealing with multiple people accessing or modifying data at the same time. When you have a single user, that problem either disappears or is massively reduced.
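To make that contrast concrete, here's a toy Python sketch (purely illustrative, not any real database's code) of the classic "lost update" problem that appears the moment two clients share a server-side value:

```python
# Deterministic simulation of a "lost update": two clients each read
# the current balance, then write back their snapshot plus a deposit.
# With a single local user, these two steps can never interleave.
db = {"balance": 0}

def read_balance(db):
    return db["balance"]                 # step 1: read a snapshot

def write_balance(db, snapshot, deposit):
    db["balance"] = snapshot + deposit   # step 2: blind write

a = read_balance(db)      # client A reads 0
b = read_balance(db)      # client B reads 0, before A writes
write_balance(db, a, 10)
write_balance(db, b, 10)
print(db["balance"])      # 10, not 20 -- client A's deposit was lost
```

A server-side database has to prevent or detect this interleaving (locks, MVCC, compare-and-set); a single-user on-device store largely sidesteps the whole problem.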
As you said, Realm does appear to have a server offering now, but from the sounds of it, it hasn't really been proven yet, so I don't think it's as easy as saying they've succeeded where Rethink failed.
He details why they put marketing and shipping first, ahead of constant improvement of correctness or security. On top of this, I advise just embedding the benefits into the product and designing it for easy change, so components can be replaced once big revenue comes in. As the development pace slows, the QA pace can increase on components as they stabilize.
They've raised a roughly similar amount of VC money as RethinkDB did. Their last raise was a few years ago, before Parse's shutdown, so I suspect that investors are more antsy about this space today.
The fact that Realm recently entered the cloud space sounds similar to RethinkDB's trajectory too... Of course I hope Realm makes it, it's a great product. But I'm worried that they're trapped in the VC-funded model like RethinkDB was.
It's easier to get adoption for mobile, I'd think.
Database component designed for use by mobile applications: something you base an app around
Database server: something you base your business around
Furthermore, mobile is becoming in-demand or even core in both consumer and enterprise. For enterprise especially, though, you have to think of it as mobile + sync, not on-device only.
Turns out sync is the really hard part when you want to build something that's robust in the face of poor network connectivity.
Developers have a big advantage when making devtools in that they are their own target market. They are, but only at the start of the project, before they become embroiled in it. Ironically, this feeling of making it for oneself leads to a mistaken urge to thrust complexity onto the target. As an infra engineer myself, this is something I always try to keep in mind -- I worked on a modelling language at a large company and I credit its success to its absolutely straightforward and accessible syntax, lack of dependencies, and straightforward integration into workflows. The temptation is always to make sure users are aware of the choices, force them out of bad habits, and make them consider the tradeoffs (this Strange Loop talk from Peter Alvaro addresses the social treatment of users: https://www.youtube.com/watch?v=R2Aa4PivG0g ). For RethinkDB, it was wrong to try to force people to learn a new language to use a DB when there are already tons of interfaces, and the new language has only incremental benefits over, say, SQL.
However elegant the design of software, the deployment that is the largest or most demanding of it will always find problems. There's no point seriously trying to claim great reliability or quality for a nascent project: nobody will believe it. Instead you need to focus on getting it used in some friendly project that is willing to pay the cost of these teething problems in exchange for some particular feature. Once it's battle-tested, reasonable tech leads can choose to use it, comfortable that they won't be the biggest deployment (e.g. git, bootstrapped by its use in the Linux kernel). Alternatively, and less scrupulously, you can try to bootstrap by providing a very convenient experience at the start of a simple project for junior engineers -- there are many NoSQL DBs that promise the world and are incredibly easy to set up (see the Jepsen blog at https://aphyr.com/tags/jepsen for many sad examples; RethinkDB proudly did well in these tests).
RethinkDB refused to make the market fit compromises necessary. Market fit is about timing. Don't read the Economist. Ship early and keep your users on board through whatever engineering compromise necessary.
With RQL you could very clearly determine how rethinkdb was fetching your data, and you knew it was going to get your data in the same way every time the query ran.
FYI, the post is still in its unpublished state. It's basically right, but I've been meaning to edit it for tone and rewrite the market failure section for clarity. Didn't get the chance to do that before it made it on HN, so keep in mind that you're reading a (late) draft.
One thing resonated strongly with me. When we started out many years ago someone said to me that the worst market you could go after was to target developers. Case in point - every developer who's ever put up a web site has built or thought about building a CMS. There must be thousands upon thousands of them, virtually all with a price of $0. Who would want to sell a CMS product?
We heeded that advice and I often reflect on it during the sales process when we have the "lets just run your offering past the IT team" meeting. So often, the line of questioning from the smartypants dudes in the room is all about tripping us up, finding our weak spot (I'm a techy so I get it - I think like this myself).
The best you can normally come out of that meeting with is grudging acceptance from IT that a SaaS solution should be OK, given that the business is in a rush and all, and given that IT has other priorities. (To be fair, it's much less like this these days, but a few years ago this was the norm.)
By contrast, showing your product to a business person is a breath of fresh air. They don't want to build it themselves, they don't really care what's under the bonnet, they just want good, quick solutions to their actual problems, so they can go home at night and sit down to Netflix. HR people (our target market) are even nicer.
What's more, those business people want to pay for your solution. A quick way to kill the deal is to joyously tell them "it's free!". They need to understand how you get your buck at the end of the day. They'll never buy until they understand that. And the best (because it's the simplest) answer is "we've got the best product, so we charge good money for it".
HR people are some of the worst because everything is super secret and they don't want to get IT involved, then they go and buy a system that lets you insert a carriage return into the phone number field and produces CSV files that don't have primary keys for data extracts.
It really depends on what you need... Every non-SQL database tends to sacrifice something for some performance gain... RethinkDB is as close as I've seen to one without sacrifice.
CouchDB was super slow. Cassandra was amazing but a bit confusing. MongoDB made sense and worked.
If you're open to edits, the $200k revenue/employee rule of thumb really doesn't make a lot of sense when you're comparing your three alternate business models (Hosted Rethink, DBaaS Rethink and "Hey, maybe PaaS on top of Rethink?").
Like you, I thought it would be way too hard to run a service (my own company was encouraged to do so, and we shied away from it for the same reason you stated). However, having seen it now from the inside at GCE for nearly 4 years, I can say it's much easier to "service-ize" yourself than it is to have built a product people want. And moreover, the shell scripts and operations stuff really scale (effectively, the hosted DB offering is the binary plus a bit of bash, while the other offerings require yet more "not done before" software).
Even more surprisingly, this statement:
> Managed hosting is essentially running the database for people on AWS so they don't have to. The alternative to using these services is setting up the database on AWS yourself. That's a pain, but it isn't actually that hard. So there is a very hard cap on how much managed database hosting services can charge.
turns out to just not be true in practice. People pay us (Cloud SQL) and AWS (RDS) a surprising amount of money to run MySQL in a VM for them. It turns out that people really like outsourcing even a small amount of pain, but that as engineers we're easily blinded by "Um, why would I pay you XX% over the raw VM price? You're just running some shell scripts for me". Little things like automatically updating to the latest version, configuration replication, etc. are things that nobody will bat an eye at paying you for (I believe in the DB space in particular, it's because the overall cost of your DB is dwarfed by "all the rest", so it's a good trade of people-time versus $$s).
I hope this doesn't come across as Monday-morning quarterbacking, but if I had mine to do over again, I'd take the "build a great piece of software, and even if it's open, service-ize it for folks" route. It's true that you're competing with the cloud providers, and that it seems like you wouldn't really be able to get people to pay you for it (and it's not your core competency), but I think it's the best option for this flavor of infrastructure software going forward.
Sure, but for many venture-funded companies and even large enterprises, top-line is the top metric you're focused on. If you're in a growth industry, it's about top-line revenue and capturing market share, and the cost of the underlying bits will eventually come down.
> turns out to just not be true in practice. People pay us (Cloud SQL) and AWS (RDS) a surprising amount of money to run MySQL in a VM for them.
For infrastructure providers who "service-ize", you can set the price as a much smaller markup on the cost of the raw infrastructure. Most large organizations (AWS/GCE) have a model where they give some credit for the underlying infrastructure back to the team that service-ized things. This is quite a bit different from a smaller startup that still needs to make some reasonable margin but has significant cost of goods in the infrastructure itself.
At the same time, I fully agree with the notion that you should build and run a service, but if the marketing or even product perspective on such a thing is "we installed a VM for you", you're entirely missing the point. Having built an as-a-service product a few times over, there is a lot of engineering and product work that goes into it. Turnkey backups, point-in-time recovery, high availability, and performance monitoring are all not just installing a VM. And it's very different from building a core database engine. While there can be many benefits from collaborating on them, they are very different skill sets, and the notion of focus shouldn't be underweighted.
So outsourcing the database is a very logical thing to do: if you can trust a third party to have a reliable backup/recovery story, that's one thing that can kill your business that you don't have to worry about.
As a happy RDS user, I totally agree. Sure, I could run MySQL myself and the result would be mostly fine, after some mistakes and lessons learned. But all the things such as upgrading still take time, add a bit of stress, and are one more thing to care about. Could that time and energy be spent on more valuable (from a business point of view) activities? Yes.
I have regretted this decision.
First with MySQL and then with MongoDB, a pattern is emerging. There's a popular open-source database. Its feature set is awesome and the ecosystem of software on top of it is good because lots of people are using it, but it's buggy and unreliable. So it's controversial. Since it's popular, it keeps being developed, and over time, it slowly gets more reliable.
To me this post is a reminder that, just because I can see clear flaws with a competing product, it doesn't mean those flaws are exploitable. Better to focus on what your own customers are demanding than on criticisms of your competitor.
Like MongoDB, it was popular mostly for being popular, which is a nice gig if you can get it.
I can't help but wonder if the story had been different if the RethinkDB devs organized a bunch of codecamps and positioned their db as the 'default choice'. This is how Mongo gained a ton of mindshare, helped by other trends at the time, like the NoSQL wave, the Node.js boom, and the renaissance of client-side JS development.
Or people simply don't care/aren't willing to pay for alternatives. MongoDB looks as unreliable as it was on day one, but somehow people don't seem to care. I hope it's because their use cases can live with the unreliability, but I fear it's because of "won't happen to me, just to others".
> We set out to build a good database system, but users wanted a good way to do X.
If you replace "database system" with "[any product]," it encapsulates the reason why many startups fail. I was guilty of it too. I think it's a common pitfall for technical founders who tend to think in terms of solutions rather than problems.
If you have some free time, I would recommend the book What Customers Want by Anthony Ulwick, which offers a systematic approach to formulating a product that satisfies a market's underserved needs.
There's a very closely related form that is common for non-technical founders which is "We're going to use technology to fix ____ industry"
A founder with experience in a particular industry may see how inefficient it is, and how much scope there is for improvement based on technological solutions, but businesses live and die by the question: "Are there enough people who will pay us to solve this problem for them?"
Superficially, it seems like "I can make your business run more efficiently" is a reasonable approximation for "I have something that you will pay money for", but it has a very similar value-gap to "I can provide you with a technically better database".
"More efficient" and "technically better" are both things that ought to be valuable to customers, but often aren't valuable enough to get them to hand over their money.
As an aside, lately it seems it's "OO is a deadend, Functional to the rescue!" ... so maybe you can leverage that wave instead... or deep learning... OverThinkDB maybe :)
Just don't create front end tools for front end developers, for example. Firebug had a donate button and we funneled all of it to volunteers in places where the currency was weak to make it at least seem material.
I have a rather naive question: why do you recommend The Economist as opposed to something like the HBR?
The Economist offers insight and thoughtful analysis, and writes about the world: not just business, but also politics, culture, and science. The obituary, for instance, is surprisingly fascinating. You may not always agree with its position, but you know that its position will be worth engaging with and thinking about.
No, I don't work for the Economist, I just appreciate thoughtful articles. Same goes for Private Eye, Foreign Affairs, New Yorker.
By the way the tone is fine given your experience.
What do you think of Spolsky's "Good software takes 10 years"?
Figuring out the right GTM is tough, and the timing was weird, but GOOG and MSFT must have been pretty desperate for a NoSQL solution 2013-2014 to compete with DynamoDB. Firebase got scooped up in October 2014, but it wasn't until April 2015 when Azure announced DocumentDB.
I understand that you're waiting for a legal framework to accept donations as an organization, but there was an article recently about an individual GIMP developer being funded on patreon. Is this something you would consider? I'd be happy to contribute to someone who's actively working on RethinkDB (of course without any backer rewards or similar expectations), and perhaps others would as well.
> we fought the losing battle of educating the market.
I've been at ten startups. At least half of them chose that battle, and that would be the less successful half. Sales cycles are fragile things. Anything that slows them down can be fatal. Clearly you need to explain how your offering is unique and special, but if you have to pause to explain basic concepts before they get it then you've probably lost their attention and their business.
The VP of marketing at one of my previous companies was generally an idiot and one of the chief drivers of that company's failure, but he introduced me to the concept of market vs. technical risk. A product can be easy to market but a challenge to build, or it can be the other way around. You can build a strategy around either. If it's hard in both ways, you're in trouble. Frankly, it sounds a bit like that's the trap RethinkDB fell into. They were doing things that were technically hard and hard to explain to users. That's how you get three years behind.
It's a sorry situation, but kudos to the RethinkDB folks for trying to do the right thing, mostly succeeding from an engineering perspective, and writing honestly about how that played out in a market of people who aren't equipped to appreciate it.
And this point of failure for many startups was beautifully represented/mocked in HBO's series Silicon Valley: https://www.youtube.com/watch?v=Lrv8i2X3gnI
This bit alone provided the right keywords to help me express some of the ideas in the essays I have been writing. Thanks for posting.
RethinkDB as a product is not dead. The community and one of the ex-RethinkDB senior developers are still maintaining it. Since it is fully open-sourced, we are welcoming contributors. Let's make RethinkDB great, together!
Here are the current RethinkDB plans:
Most of the ex-RethinkDB developers are going to keep contributing to RethinkDB after things settle down a bit.
Please try out RethinkDB, and if there are any questions, we are actively answering them at:
Please join https://rethinkdb.slack.com/messages/open-rethinkdb for OpenRethinkDB updates.
Thank you very much; RethinkDB is an excellent database.
I see a lot of new users on the RethinkDB Slack.
Before, the RethinkDB Slack was quite empty, but now it has a lot more active users after the RethinkDB shutdown.
@coffeemug, those who've tried RethinkDB are staying with RethinkDB. The product is not a failure, and the actual users value the standards you have set!
Many of us are going to stay with RethinkDB.
Secondly, and with the risk of hijacking the OP:
I think it would be a good thing if you could put a status update either on rethinkdb.com or ~the first paragraph of the readme on GitHub about what you are trying to accomplish related to the name/rights/ownership/license/... I check those about once a month to see if something has changed.
As a current user and someone evaluating future usage, this is causing the most uncertainty. I don't know how these things happen (it is probably complex). But I want to commit. Even if progress is slow. Bugfixes are important, but secondary if the project might somehow be significantly damaged trying to "escape" its current ownership.
As an example: you are trying to transfer ownership of the build tools? I understand that might take a while, but what if that somehow can't happen? Is recreating them realistic?
Basically a list of threats/risks would be nice to have! :)
EDIT: this is also relevant for contributors/donations/... any type of investment.
I don't think rethinkdb.com is under the control of the current leadership team. But it would be a very good idea to put status updates in the README. I will submit a PR for it.
Development is still continuing, but slowly. It is close to releasing 2.4; @atnnn and @srh are working on it.
> Bugfixes are important, but secondary if the project might somehow be significantly damaged trying to "escape" its current ownership.
> As an example: you are trying to transfer ownership of the build tools? I understand that might take a while, but what if that somehow can't happen? Is recreating them realistic?
@atnnn is building and running his own build tool server, and it's almost perfect; he will need help there:
# Add the RethinkDB apt repository for your Ubuntu release
source /etc/lsb-release && echo "deb http://download.rethinkdb.com/apt $DISTRIB_CODENAME main" | sudo tee /etc/apt/sources.list.d/rethinkdb.list
# Import the repository's signing key
wget -qO- https://download.rethinkdb.com/apt/pubkey.gpg | sudo apt-key add -
# Refresh package lists and install
sudo apt-get update
sudo apt-get install rethinkdb
So that's another "take-home" from the post-mortem: be careful about your "first impression" on the market, because it can be really hard to change.
I believe the storage backend remained largely the same (i.e. current RethinkDB, IIRC, is still optimised for SSD). Maybe I'm mistaken.
Postgres is the default datastore (because schemas are awesome) for me, and I haven't had a use case yet where I needed something that Postgres wouldn't do. Maybe if you have a very specific need, you'd reach for another datastore (come to think of it, I have successfully used Cassandra for multi-DC deployments of a distributed read-only store), but that's not the norm for me.
The truth is that NoSQL has great promise, but the folks with actual money to spare today are the ones who want a better SQL database, because they can afford the tradeoff of development time for reliability that SQL administration and migration provide. And people who solve the SQL pain points, such as the excellent folks at CockroachDB, and the various folks building replication layers on top of Postgres, can hopefully keep the market for database startups alive long enough for someone else to fix the NoSQL landscape as well.
https://aphyr.com/posts/284-call-me-maybe-mongodb , where one finds that Mongo actually isn't guaranteed consistent during partitions. This did hit us in production, and we had to dig through a rollback file manually (shudder); fun times. On the other hand, see: https://aphyr.com/posts/330-jepsen-rethinkdb-2-2-3-reconfigu...
Rethink definitely had its use cases, I just never saw it as my primary data store.
When you hit those edge cases or specific needs, you implement the specialty solution. Most people never come close to those edge cases, though. When they do, PG's foreign data wrappers make the adjustment as smooth as possible.
Because of single threaded query execution, lack of MPP, or something else?
If it doesn't meet your needs please file an issue, our goal is to remove the no-good-GUI limitation that was imposed on postgres sixteen years ago.
Hell, Python Tools for Visual Studio almost makes me want to do Python dev work on a PC.
End result is that I jumped on the MySQL bandwagon, and even all the negative posts against it now and the lauding of Postgres in here all the time cannot make me change tracks. I just have too much data and IP invested in it now.
v9.6 finally brought some parallel scans, but they are still limited in scope and far behind the commercial databases. The same goes for replication and failover, although there are decent third-party extensions to get them working.
I wasn't able to set it up properly some years ago and settled for a warm standby replica (streaming + WAL shipping) with manual failover. Automating it (and even automating the recovery when the old master is back online and healthy) was certainly possible, and I got the overall idea of how it should operate. But the effort required to set it up was just too big for me, so I decided it wasn't worth the hassle and settled for the "uh, if it dies, we'll get alerted and switch to the backup server manually" scenario.
Having something along the lines of "you start a fresh server, tell it some other server's address to peer with, and it does the magic (with exact guarantees and drawbacks noted in the documentation)" would be really awesome. RethinkDB is just like that. PostgreSQL -- at least 8.4 -- wasn't, unless I've really missed something. I haven't yet checked newer versions' features in any detail, so I'm not sure about 9.5/9.6.
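For a sense of the friction being described, the warm-standby setup above amounts to configuration fragments spread across several files and machines (hostnames, paths, and the replication user below are made up; the syntax is for the pre-12, recovery.conf-era PostgreSQL the comment discusses):

```
# postgresql.conf on the primary
wal_level = hot_standby                  # "replica" on 9.6+
max_wal_senders = 3
archive_mode = on
archive_command = 'cp %p /archive/%f'    # WAL shipping

# pg_hba.conf on the primary: let the standby connect for replication
# host  replication  replicator  10.0.0.2/32  md5

# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=10.0.0.1 user=replicator'
restore_command = 'cp /archive/%f %p'    # fall back to shipped WAL segments
```

Compare that to a clustered store where a new node joins by being pointed at a peer address: the PostgreSQL version works well, but the pieces must be assembled by hand.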
This is the one thing I ask of every database software, yet we still don't really have it. 90% of problems could be solved if there was a focus on the basics like easy startup, config and clustering.
SQL Server availability groups requires Windows Server Failover Clustering, which is not quick or easy.
It offers easy commands to perform failover and even has an option to configure automatic one.
After reading about GitHub's issues, I am a bit cautious about having automatic failover, though.
As for automation... things can always go wrong, sure. But I wonder how many times HA and automatic failover have saved the day at GitHub, so that no outside observers had the faintest idea something was failing in there.
Horizontal scalability is important for HA, which every production environment wants or needs.
Here's how it compares:
The only values where it scores higher than the rest are on page 13, except those are latencies, and lower is better.
This paper shows clearly that Mongo doesn't scale.
And even for single instance Mongo is slower than Postgres with JSON data: https://www.enterprisedb.com/postgres-plus-edb-blog/marc-lin...
There are actually add-ons to Postgres that add MongoDB protocol compatibility, and even with that overhead Postgres is still faster.
And even such benchmarks don't tell the full story. For example, I worked at one company that used Mongo for regional mapping (mapping latitude/longitude to a ZIP code and mapping IP addresses to ZIPs). The database on Mongo was using around 30GB of disk space (and RAM, because Mongo performed badly if it couldn't fit all the data in RAM), mainly because Mongo was storing data without a schema and also had a limited set of types and indices. For example, to do the IP-to-ZIP mapping they generated every possible IPv4 address as an integer; think how feasible that would be with IPv6.
With Postgres + ip4r (an extension that adds a type for IP ranges) + PostGIS (an extension that adds geo capabilities: the lookup by latitude/longitude that Mongo has, but way more powerful), things looked dramatically different.
After loading the data using correct types and applying proper indices, all of it took only ~600MB, which could fit in RAM on the smallest AWS instance (Mongo required three beefy machines with large amounts of RAM). Basically, Postgres showed how trivial the problem really was when you store the data properly.
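The schema point generalizes: store ranges, not enumerated addresses. A hypothetical Python sketch (made-up data; ip4r does the equivalent inside Postgres, with an index) shows why one row per range is enough:

```python
import bisect
import ipaddress

# Hypothetical range table: (start, end, zip), one row per IP range
# instead of one row per address. Sorted by range start.
ranges = sorted([
    (int(ipaddress.ip_address("10.0.0.0")),
     int(ipaddress.ip_address("10.0.255.255")), "94107"),
    (int(ipaddress.ip_address("192.168.0.0")),
     int(ipaddress.ip_address("192.168.0.255")), "10001"),
])
starts = [r[0] for r in ranges]

def ip_to_zip(ip):
    """Find the range containing ip via binary search over range starts."""
    n = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(starts, n) - 1
    if i >= 0 and ranges[i][0] <= n <= ranges[i][1]:
        return ranges[i][2]
    return None

print(ip_to_zip("10.0.3.7"))      # 94107
print(ip_to_zip("192.168.0.42"))  # 10001
print(ip_to_zip("8.8.8.8"))       # None (not in any range)
```

Two rows cover ~65,000 and ~256 addresses respectively; enumerating every address as its own key is what blows a dataset up from megabytes to tens of gigabytes.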
> Correctness. We made very strict guarantees, and fulfilled them religiously.
It is true that the user base did not consider this as important as the RethinkDB team did. Part of the development community is very biased towards this aspect of correctness, to the point that you may literally never experience a problem with a system for years, but then a blog post is released showing the system is not correct under certain assumptions, and you run away. So part of the tech community became very biased about the importance of correctness. But 99% of the developers actually running things were not.
Correctness is indeed important, and most developers do have less interest than they should in formal matters with practical effects. However, I don't think this point can simply be reduced to developer ignorance that makes them just want the DB to be fast in micro-benchmarks. Another problem is use cases: despite what the top 1% say, many folks have use cases where being absolutely correct all the time is not as important as raw speed. So the reason they often don't care is that they often can avoid caring about correctness at all, as long as things kinda work, while they absolutely need to care about raw speed in order to stand things up with a small amount of hardware.
So I suspect that the people who cared about the same things that the team did are also the people who 1) either aren't the ones making the final business decision or 2) aren't in a position where they could or would pay much for it.
But even in those cases, the operation can be retried, or allowed to take longer.
If you're "Social startup of the year" it doesn't matter if one post appear to some followers 100ms later than they should be.
And for most of those developers, atomicity guarantees (the A in ACID) are very important: without transactions, if that one operation fails, then before it can be retried as you say, the system also has to rewind the previous operations that succeeded in order to leave the data in a consistent state. Implementing this rewind yourself is extremely challenging (take a look at the Command pattern sometime for one possible strategy).
Or in other words, most developers need atomicity guarantees / transactions as well, and if the DB doesn't support transactions at the granularity your app needs, it's easier to change the DB than to implement them yourself. This is why RDBMSs are still more popular than NoSQL systems: they are more general-purpose and made the right compromises for the majority of use cases.
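To sketch what that rewind looks like (a hypothetical Command-pattern example, not any particular DB's API): each step carries an undo, and a failure compensates the steps that already succeeded, in reverse order.

```python
# Hypothetical compensating-transaction sketch: each step is a
# (do, undo) pair; on failure, completed steps are undone in reverse.
class TransferFailed(Exception):
    pass

def run_steps(steps):
    done = []
    try:
        for do, undo in steps:
            do()
            done.append(undo)
    except Exception:
        for undo in reversed(done):
            undo()  # compensate, leaving the data consistent
        raise TransferFailed("rolled back")

accounts = {"a": 100, "b": 0}

def debit(acct, amt):
    if accounts[acct] < amt:
        raise ValueError("insufficient funds")
    accounts[acct] -= amt

steps = [
    (lambda: debit("a", 30),
     lambda: accounts.__setitem__("a", accounts["a"] + 30)),
    (lambda: debit("b", 50),  # fails: "b" holds only 0
     lambda: accounts.__setitem__("b", accounts["b"] + 50)),
]
try:
    run_steps(steps)
except TransferFailed:
    pass
# accounts is back to {"a": 100, "b": 0}
```

Even this toy shows the pain: every write now needs a hand-written inverse, which is exactly what a transactional RDBMS gives you for free.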
I don't disagree that consistency and correctness are important, but it's the same as with uptime: each order of magnitude of additional confidence costs more time and resources (because even if your DB did everything right, your hard drive could have had a glitch, etc.).
There are ways of working around the lack of atomicity for the cases where you really need that guarantee. (I really wouldn't try writing my own generic transaction manager, but you can rely on the atomic operations that do exist and limit changes to small steps; and if you wrote information that is now irrelevant because the operation didn't finish, that's fine, you can GC it later.)
Now, if the operations you need are very complex, then it is probably better to keep using an RDBMS.
That's latency, not correctness. I think even "social startup of the year" cares that the post shows up at all, is attributed to the correct author, isn't corrupted somewhere and so on. Data correctness/consistency is about making sure that the data isn't in an inconsistent state.
That's "eventual consistency" in replication
> People wanted RethinkDB to be fast on workloads they actually tried, rather than "real world" workloads we suggested.
> For example, they'd write quick scripts to measure how long it takes to insert ten thousand documents without ever reading them back.
> MongoDB mastered these workloads brilliantly, while we fought the losing battle of educating the market.
Don't tell people what "real world" workloads are. They know better than you do. When I presented my performance-testing scripts to you guys, it was a replication of our real-world workload. Telling someone that their workload isn't "real world" hurts confidence in the system. It's better to say either that it isn't a workload you are optimized for, or that you are working on it.
That being said, I'm now building a start-up and using RethinkDB because it is the best system out there! My performance requirements for a start-up are significantly different from those of the billion-dollar survey company I used to work for ;) But I'd also like to commend you on the performance you've achieved with RethinkDB in general; it has come a long way, and any company would be happy to hit the scale at which RethinkDB has issues!
He says that the market was terrible because open source developer tools suffer from oversupply. I don't think that's what happened. Rather they chose to make a product for which the next best alternative (e.g. Mongo or Cassandra or MySQL or Postgresql) had almost zero cost. It doesn't matter how oversupplied the market is if someone is willing to give away a viable product for free. Perhaps "oversupply" is another way to say "someone was willing to sell at zero cost", I'm not sure.
Unless you can find a set of customers who see significant value in your product vs. the alternative zero-cost product, you're never going to make money.
I agree with most of your comment, but I take issue with that last part. Moore's law has no relation to the price of computing. It only makes predictions on the number of transistors in a CPU, not the performance of the CPU nor the price/performance ratio of the CPU. Even if it did, compute costs much more than the CPUs; memory, disk, power, cooling, networking, enclosures, and more all cost a significant amount relative to the silicon.
I know Google Cloud declared they were going to decrease prices at pace with Moore's law, but even if they can deliver on that it will be through innovations outside of the CPU.
Right now I don't know what other database I could use, because I found no other solution that supports this kind of functionality, especially in a distributed setting.
I agree with most of your analysis and I think that the most important part is about "worse is better": many of your potential customers won't understand or won't care about correctness or consistency.
That said, I really like RethinkDB (oh, it does have its warts, but overall it's great), I'm very thankful for it, and I hope it will continue as an open-source project.
The only other option is to put something in front of a DB, like FeathersJS or deepstream.io.
RethinkDB is a breath of fresh air and I really hope the community will be able to keep it alive.
It is more of an object model than a straight-up JSON store, but it has some of the best notification features I have tried. You can observe both objects and query results in realtime, and since the dataset is replicated to you, it has zero-latency local access.
They used to support only mobile, but now they also have a Node.js version for server-side use.
In my opinion, Realm's observability is light-years ahead of all the other offerings out there.
Have you looked at Datomic? It's not free but I think it provides the functionality you are looking for.
But "$5000 per Year per System" is outside of my price range for the foreseeable future.
Think about some examples.
MongoDB is nothing but the object databases of the '80s, rebranded. Its main selling point is "you don't need to learn SQL". Object databases failed, and people at the time were willing to learn something new (SQL). Nowadays, who will do that?
Go's entire selling point is "faster than Python, no need to learn anything new". In terms of language features it's again living in the '80s: Java got generics in 2004, and Java was way behind the times. In terms of speed it's worse than Java and Rust, maybe comparable to Haskell.
And Node.js is nothing but "write server code without needing to learn a new language". Isomorphism simply didn't work. People think Node is fast because it can do async, but Java has been able to do threaded async for 15 years.
I wonder how much the industry could advance if we could simply persuade every developer to learn Java Postgres.
I am not terribly fond of Go, but it has a decent answer in auto-implemented interfaces and duck typing. It's not about dumbing down, it's just different, and easier on the compiler, which is a significant advantage.
And as for async in java... ugh, how awful. We finally got something like promises and lambdas, but most APIs are still trapped in synchronous calls and using the JPA is a disaster of hidden thread-local state. What kind of stack are you using to make java async work? Maybe I'm just missing something.
As for Java + async + threads, I use Akka and it's basically seamless. But Netty + core Java Executors is fine too, just more verbose.
SQL per se is not hard.
But the whole data modeling, creating tables, and picking types is just needless bureaucracy.
Yes, I want to save a JSON snippet and be done with it. I don't want to create another table because I have a 1-N relationship. And while ORMs make this easier, it is still not painless.
An internally-schema'd database just recognizes that this is absurd and lets you put the bottom three layers of validation logic inside the database, where you only have to do it once.
Do you think that doesn't happen in regular DBs or that schemas don't evolve?
Love for static typing is basically Stockholm syndrome.
Your system will eventually end up with a fixed schema, but until you get to that place you don't need to "stop the world" to change it. "But PostgreSQL can add columns without downtime." Yes it can, but what about your system?
Funny, that's sort of how I feel about my (long) time "loving" dynamic typing. Now I find it is such a relief to validate data once on the way in and completely trust it to be what it says it is thereafter. I found dynamic types required more bureaucracy (if I wanted to sleep at night) in the form of unit tests and precondition checks everywhere.
YMMV I suppose.
This merits clarification.
Most programming in dynamically typed languages should be implicitly statically typed. Know what you're passing. Obey the implicit (duck-typed) interfaces.
Basically, don't just check "if it's a list do that, if it's one element do something else, if it's a number do another thing" - this is a beginner's mistake (and I did those)
Besides that, unit tests are a good idea, and if you want "compile type checking" in Python use Pylint (it's a good idea to use it regardless of your situation).
Though it's not so much static typing that sucks, it's its implementation in the Java/C++ way. Go and Rust make it better.
Yeah, I thought that too. After years of persistently seeing stupid (sometimes critical) bugs due to assuming the correct thing was being passed, I concluded that 1. the only sane thing to do is add explicit precondition checks that raise meaningful exceptions when assumptions are broken, and make sure those paths are exercised by unit tests, and that 2. I would much rather have that done statically and automatically by a compiler.
> Basically, don't just check "if it's a list do that, if it's one element do something else, if it's a number do another thing" - this is a beginner's mistake (and I did those)
I think this may be a misunderstanding of what I meant by "precondition checks" - I meant explicitly checking the assumptions being made by a method, most of which are type checks (foo.kind_of?) or (more commonly) interface checks (foo.respond_to?).
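Translated to Python (a hypothetical example; kind_of?/respond_to? being the Ruby equivalents), the idea is to validate assumptions at the boundary and fail with a meaningful error instead of a mystery three frames deeper:

```python
def total_scores(scores):
    """Sum a list of numeric scores, failing loudly on bad input."""
    # Interface check: we need an iterable of numbers, not "whatever".
    if isinstance(scores, (str, bytes)) or not hasattr(scores, "__iter__"):
        raise TypeError(
            f"scores must be an iterable of numbers, got {type(scores).__name__}")
    out = 0
    for i, s in enumerate(scores):
        # Type check on each element, with a message naming the offender.
        if not isinstance(s, (int, float)):
            raise TypeError(f"scores[{i}] must be a number, got {type(s).__name__}")
        out += s
    return out
```

A static type system does this checking for you at compile time; in a dynamic language you end up writing (and unit-testing) it by hand.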
> Though it's not so much static typing that sucks, it's its implementation in the Java/C++ way. Go and Rust make it better.
I don't have much problem with Java's implementation of static typing at all, and I don't think C++'s problem is its static typing implementation. I find Go's implementation a bit inflexible and hard to work with though.
And sorry, but "I wonder how much the industry could advance if we could simply persuade every developer to learn Java Postgres."?
Plenty of other people would say "I wonder how far along the industry would be if you people would actually learn how to manage your memory and write C".
Plenty of other people would say "I wonder how far the industry would have progressed if we let go of the harmful and outdated concept of imperative programming and embraced functional paradigms."
... and on it goes.
Additionally, you say "I do find it truly sad how anti-intellectualism is pervading our industry" only to end your post with "why can't we abandon all our experimenting and learning and just stick to good old Java which obviously would solve all our problems", which is itself MASSIVELY anti-intellectual.
I don't see the industry gravitating towards bad re-implementations of C with less functionality and a flatter learning curve. The only "replacement" for C that I'm aware of is Rust, and Rust is most definitely not anti-intellectual in the same way I criticize Go.
Rust actually requires the developer to learn new concepts (borrow checker) which are not found in C, Python, Java, etc.
"why can't we abandon all our experimenting and learning and just stick to good old Java which obviously would solve all our problems", which is itself MASSIVELY anti-intellectual.
I didn't say to end experiments. I said everyone should actually learn some fundamentals before jumping on low-learning curve bandwagons which offer nothing new.
Well, they do not want GC for their language, hence the borrow checker, while Go already has an excellent GC.
As opposed to new things to learn, we need useful new things that couldn't be done before the fancy new language.
That's a mis-characterization. There is a difference between experimenting and learning and pushing all sorts of questionable fashion trends into production.
> I wonder how much the industry could advance if we could simply persuade every developer to learn Java Postgres.
And that is preceded by what I can only describe as a bunch of undue mischaracterization of several languages and technologies, all of which definitely seems to imply "yeah, all that other stuff is just a bunch of fads, why don't we stick to the real deal and all write Java."
Now, I don't want to be unkind to Java here. It's true that Java is an incredibly powerful language and there are a wide variety of technologies and tools that are absolutely incredible that you can only get via the Java ecosystem. But that can be said of many languages, e.g. C++/C.
I think what the GP may have meant, and which I do agree with, is that our industry would progress if we as developers spent a bit more time learning about the context and history of current and past solutions. We'd probably re-invent the wheel a bit less often.
But I don't want to limit that recommendation to "learn Java Postgres". I think Java developers might benefit from examining the benefits of a language like Haskell, Python programmers may benefit from a language with strong typing, such as C++/Rust/Go, and Unix lovers might learn a lot from examining the Windows NT kernel APIs, and so on. If we examine the things that aren't close to us, the things with which we're the least familiar, that's when I believe the most fruitful learning can occur.
Just for kicks I wrote a desktop todo list in GTK and C, tools I have never known extensively and hadn't touched recently. It was a breath of fresh air how simple it was compared to doing the same in modern frameworks: a single ~30-line function to create the GUI, and the rest held together with a couple of function pointers. It was faster (to write and to run) and simpler than anything I'd written in years. More responsive than the most responsive web framework, too.
It's okay to bemoan Electron for being bloated, but multiple balls were dropped by OS and platform makers, GUI toolkit makers, language runtime makers, and by the giant amount of missing glue in between, to get us to where we are.
It's rather sad that Electron, out of all things, is the first thing to fulfill the holy grail of write-once, run anywhere, in a way that's good enough and palatable to the plurality of users' and devs' satisfaction.
Java beat Electron to write-once-run-anywhere by a couple of decades, and still does in the sense that Electron requires you to make platform-specific builds of your app even if the core code is the same (Java gives you that option but doesn't require it, you can still distribute a jar or web start file).
Meanwhile, Electron has managed the feat of being even more bloated than Java. The DOM was never designed for GUIs, and the terrible performance of web and Electron-based apps is a testament to that. At least Java apps tend to have meaningful menu bars and context menus.
Electron satisfies web devs who have either never written desktop apps in other frameworks, or did so decades ago and think nothing has improved there. It "satisfies" users in the sense that users are rarely offered any alternative, so they have to suck it up and tend to judge web apps against each other rather than against a well-coded, tightly written desktop app (office suites being a notable exception).
The biggest takeaway, though, is how simple it was to build. If all platforms were that simple, then having separate UI code for each platform really wouldn't be that big a deal.
At the moment I'm somewhat hopeful that the libui project (https://github.com/andlabs/libui) will be successful. It has learned the lessons from previous attempts.
To me, Go's selling points have been numerous:
- Faster than python
- Statically typed/no runtime bundling requirements
- Easy access to concurrency/multicore scaling
- Excellent tooling and integration with programming text editors
- Great documentation
- Excellent standard library which you can treat as a continuation of your own program
- Great design with pragmatic choices (this one is subjective, I agree)
In short, there's nothing else like that out there.
Without Go, I'd have been stuck with JVM, Haskell or Erlang for most of my projects. And I dislike all of them, so Go has been a godsend.
Incidentally, I also happen to enjoy working with Postgres and prefer it to any other database I've worked with (SQL or not).
Can you come up with any intellectual rather than emotional reason for preferring Go?
I don't think you are really disagreeing with me: Go does nothing new, it just has a lower learning curve.
I was disagreeing with your original statement; Go is not just a faster Python.
For you, maybe. For plenty of others, that's clearly not true.
> While you do get more proficient with time, you never get to the state where the language gets out of the way and becomes your friend instead of something you have to wrestle most of the time to enjoy the benefits it offers (which are also offered by Go, BTW)
Go offers weaker type-safety benefits than Java (with the benefit of less type-safety ceremony), and much weaker than Scala or Haskell (against which it has less of a ceremony advantage, because Haskell and Scala have reduced ceremony alongside stronger type systems).
Go doesn't offer the benefits Haskell or Scala (or Rust) do. OTOH, some people find it more intuitive, and it does offer different benefits which may be more relevant to some use cases. The sweet spot for Go seems to be the place where, before Go, you might use Python but be upset about performance and might use C but be upset about boilerplate; Go improves over the weaknesses of either on that border, while preserving most (but not necessarily all) of their strengths.
> Go isn't just a faster Python.
Faster (and more parallel) Python, less obtrusive C, lower-ceremony and more native Java: it's not just any one of these, but together they capture its primary strengths.
How did you come to this conclusion? Based on your own experiences? If so, why do you find it reasonable to assert your anecdotal truth generally?
> enjoy the benefits it offers (which are also offered by Go, BTW).
Are you seriously saying Go offers the entire set of features that Haskell offers? So what, do you think people are just using Haskell for no real reason?
The Java ecosystem is just toxic waste: full of abandoned projects, over-engineering of everything, builds that crash because everyone uses arbitrary versioning schemes, code ring-fenced with licenses/patents, and developers with a C++ mindset.
Node.js is maybe built as hack upon hack, with some ugly code, a lack of any abstraction, no sensible standard library, poor debugging capabilities, minimal editor support, and finally a package manager that cannot be trusted.
But I can build a CRUD API working with any database in under an hour in Node.js, whereas in Java I'd need to start by learning all of Maven's quirks, writing pointless JSON mappers, and learning the stupid annotations that every library introduces.
> Go's entire selling point is "faster than Python, no need to learn anything new". In terms of language features it's again living in the '80s: Java got generics in 2004, and Java was way behind the times. In terms of speed it's worse than Java and Rust, maybe comparable to Haskell.
This, piled on with subtle ageism, ensures that any useful lessons learned will only be learned by the time it's too late, and there's a new group of can-do people who feel stifled by the combined wisdom of the ages, so they break off and do their own thing.
I wrote about this here  with nicer language.
A lot of people don't care much about the tech, and just want to build something. If easier (for them) tech lets them, that's a win, no?
Of course... that's easy to say but harder to swallow when they end up building something, making some money, and then hiring a developer to clean up their hideous mess. I've been there too.
Major Oracle shops aren't dumping them for Mongo/Couchbase/... because they're trying to avoid SQL. They do it for scalability, availability, performance, flexibility and more.
I don't see that the facts in the real world back you up here.
Not claiming NoSQL is a panacea. You're definitely making some trade-offs, and in many cases it's just the wrong tech. But it's absurd to say it's happening to avoid a learning curve.
Care to elaborate? I'd really like to know your opinion as this is an important aspect of my developer-life.
The article talks a lot about big companies that make lots of money, but as a user of databases I always saw the competition to RethinkDB being Postgres, MySQL, and to a lesser extent ArangoDB and MongoDB.
The core problem for Rethink being that, well, I can get as much database as I want for free. I explored many/most of the new/NoSQL databases and in the end always found that Postgres was the best solution; you just shouldn't bet against Postgres, it seems.
I really liked RethinkDB and thought it was well implemented but it just wasn't needed.
I do wonder if ArangoDB is going to go the same way.
Replication and a nice UI, sure, those are nice. But Postgres has fantastic single-host performance, and Rethink is not yet in a position to compete there, except in some horizontally scaled use cases. Rethink doesn't even have transactions, making it potentially useless for some things that require strict atomicity over a complicated data model. In general, Postgres has such a richness of features that it's hard to compete with it; PostGIS alone is a "killer app" if you do any kind of GIS stuff.
For me, there weren't any such "killer app" features. The change streaming is nice, but it's not difficult to do that with Postgres either. That's it really: Every solution offered by Rethink could be countered with "that's nice, but trivial with Postgres".
The one exception being horizontal replication. I have yet to launch a project where Postgres couldn't scale out well enough, and I suspect that if I got to that point, I would reach for bigger/different guns such as Cassandra. For our apps we typically use Elasticsearch in front as a kind of read-only mirror of Postgres; the latter being the slow, serializable path, and the former the fast, slightly-out-of-sync path. Users don't know the difference.
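That split can be sketched with an in-memory toy (stand-ins for Postgres and Elasticsearch, not our actual setup): writes hit the primary synchronously and reach the mirror through a queue, so mirror reads can briefly lag but eventually catch up.

```python
from collections import deque

# Toy write-through primary with a lagging read mirror
# (hypothetical stand-ins for Postgres and Elasticsearch).
class MirroredStore:
    def __init__(self):
        self.primary = {}       # slow, serializable source of truth
        self.mirror = {}        # fast, slightly-out-of-sync read path
        self.pending = deque()  # replication queue

    def write(self, key, value):
        self.primary[key] = value          # synchronous, authoritative
        self.pending.append((key, value))  # mirrored asynchronously

    def replicate_one(self):
        if self.pending:
            key, value = self.pending.popleft()
            self.mirror[key] = value

store = MirroredStore()
store.write("post:1", "hello")
stale = "post:1" in store.mirror   # False: the mirror hasn't caught up yet
store.replicate_one()
fresh = store.mirror["post:1"]     # "hello": eventually consistent
```

Users reading from the mirror see slightly stale data for a moment, which for most read-heavy paths is indistinguishable from fresh data.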
I'd like to add some colour, although I've never used RethinkDB so take it all with a large pile of salt.
> Rethink doesn't even have transactions
The post-mortem dwells heavily on how RethinkDB focused on correctness and was beaten by MongoDB which is, to put it mildly, not concerned about correctness.
But Rethink is a JSON DB that apparently doesn't have transactions and which on its web page says "Query JSON documents with Python, Ruby, Node.js or dozens of other languages".
Here are a few technologies I do not strongly associate with correctness:
* Schemaless databases
* Scripting languages
Rethink's own FAQ says "RethinkDB is not a good choice if you need full ACID support or strong schema enforcement".
So in the database market there are of course lots of users who strongly value correctness, but they tend to use statically typed languages (Java/C#) and database engines that are ACID and provide schemas.
Very few of them want a database that doesn't provide ACID, doesn't provide schema enforcement and which appears to either not support standard industrial languages at all, or is at least distinctly uninterested in them.
In other words Rethink targeted the hipster startup market of companies building social networks, ride-hailing apps and other things where correctness is not important and low-correctness tech dominates, but tried to sell a product that was oriented around correctness. This seems like a market/product mismatch.
That said, I don't think lack of correctness itself is a design goal. I think it's accidental, or at least an intermediate state before we have something better.
Case in point: I'm currently working on an open-source (it will be released soon) data store that uses JSON to express "documents" of data, but also has schemas, joins and transactions. It uses PostgreSQL internally, but completely hides the underlying storage. The developers who are currently using it for actual work are pleasantly surprised about how nice it is to work with: You get to work with rich, structured data in its natural shape without needing some kind of ORM to mediate between an app's data model and the database's, and at the same time it's very strictly validated without being pushy (it has gradual typing: you can start out schemaless and then tighten the schema). When we designed it, we wanted to build something that reflected how people actually use a database, and I think we're achieving that. (I'm looking forward to make it public and share it on HN.)
So it's certainly possible to have the best of both worlds. It's just that the direction that the world moved in with "NoSQL" temporarily moved us off course for a little while. It started out as a natural reaction to scaling issues, and in that sense I think it might have been a necessary step while people figured out what they were doing. We're seeing the same evolution in the ways of people's thinking with regard to dynamically/statically typed languages, too.
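As a toy illustration of the gradual-typing idea (not the actual product described above, just a sketch): fields without a declared type pass through unchecked, declared fields are validated, and the schema can be tightened field by field.

```python
# Toy gradual schema for JSON-ish documents: None means "still
# schemaless for this field"; a type means "enforce it".
def validate(doc, schema):
    for field, expected in schema.items():
        if expected is None:
            continue  # not yet tightened
        if field not in doc or not isinstance(doc[field], expected):
            raise ValueError(f"{field!r} must be {expected.__name__}")
    return doc

# Start loose: only 'id' is enforced, 'title' is anything.
schema_v1 = {"id": int, "title": None}
validate({"id": 1, "title": 42}, schema_v1)  # passes: title unchecked

# Tighten later, without rewriting the whole app at once.
schema_v2 = {"id": int, "title": str}
rejected = False
try:
    validate({"id": 1, "title": 42}, schema_v2)
except ValueError:
    rejected = True
```

The migration story is the point: each field moves from schemaless to schema'd independently, so you never need a stop-the-world rewrite.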
1) Don't build a company around your FLOSS project, accept it'll be free in all senses forever
2) Start with a customer that will finance you and build slowly (focus on profit not revenue)
That's obviously not very helpful, as finding that one customer is the really hard part, so it's usually a hybrid: expect what you build to be free forever, and then find that one customer.
Quote/Charge a lot. Nope more than that. Now double the price and you're roughly there.
On the plus side, you'll get very enthusiastic developers who tend to be intrinsically motivated, which is a huge plus once you can build that company. I'd argue that this is one of the key competitive advantages of a "FLOSS company".
When it comes to new customers...Always be aware that you're not selling your cool tech but rather selling the solution to a specific problem (basically you're selling pain-easing medicine). I feel like it is beneficial to visualize a somewhat mean guy at the other end of the table who is only thinking this one line over and over "enough with the nerd speak, how does it help me and does it roughly cost what I expect it to cost". Yes we like to imagine that we can impress people with nifty tech details and shudder that the competition might be "that horrible Oracle database every developer hates".
Hand in hand with this...charge a lot. The price serves as a signal. Just assume that evil guy you're talking to only cares about getting back to that golf course and won't even look at the number that closely as long as it's fitting his expectation. Yes that means charging too little can kill your deal.
Running a company where a large part of the users are developers is very hard. The secret sauce is providing a good product and creating a business where some of the users will pay to have something more, like support and/or an Enterprise edition.
The truth is, AFAIK, no VC-backed NoSQL company is profitable today. Not even MongoDB, which has raised more than $300M and collects just $60M/year while spending much more to stay up and running.
Disclaimer: I'm the author of OrientDB.
I was wondering about this, as it's not explicit in the article: What is the business model that makes money for Docker and MongoDB? From MongoDB's website I gather they have some "Enterprise" things, but they want me to give them my personal data just to access a "datasheet" describing this. Docker's "Enterprise" offering seems to be a mix of support and hosting.
So is that it? Support, hosting, and donations from Big Business?
The article also says: "Thousands of people used RethinkDB, often in business contexts, but most were willing to pay less for the lifetime of usage than the price of a single Starbucks coffee", but I don't understand what those users would have paid for. What was the product being sold? All I can gather from the article is some cloudy hosty database-as-a-service thing that might have made money but never shipped.
This is a variation of the "no one ever got fired for buying IBM" trope, and it is in fact a big moneymaker for the likes of IBM, Oracle, Microsoft, and the like, even in situations where the standard notions of vendor lock-in may not apply.
The Economist covers a lot of macro-economic news and frames the world in terms of market forces.
I think Slava is saying the type of biz-savvy "personal development" he experienced (belatedly) could have been accelerated by more of this perspective.
Admittedly, that was a few years back, so that could have been true then, but the feeling stuck with me and I just stopped taking it seriously.
The problem is that they got a lot of attention early on and expectations were huge and instead of building some hype around it, they were actually too honest. In hindsight I, as an engineer, would have probably done the same, so I think there are some valuable lessons there about marketing and human psychology. At least that part the people at MongoDB did brilliantly.
First of all, huge respect to Slava for this writeup. Having your startup fail is hard and it is not the time you want to blog about it. RethinkDB going broke was a sad thing to see for me, I can't imagine how it felt for him.
I think the analysis of the two root causes (hard market, focus on the wrong metrics) is accurate. It is very sad to see that in the end correctness doesn't win the day, not even for databases.
Since I run a startup too I can't help but apply his criteria to GitLab.
Good metrics to focus on:
1. Timely arrival => we try to ship great features every month on the 22nd, something that both our users https://twitter.com/PragTob/statuses/767777202045915136 and the parody account likely run by our competitors' employees agree on https://twitter.com/gitlabceohere/status/768440048802947073
2. Palpable speed => we're doing OK on self-hosted, really badly on GitLab.com (fixes are in https://gitlab.com/gitlab-com/infrastructure/issues/947 ). If you look beyond latency to workflow, we're doing great at integrating the various parts of the process https://about.gitlab.com/2016/11/14/idea-to-production/
3. A use case => we're making GitLab an integrated tool with a broad scope https://about.gitlab.com/direction/#scope : a good way to develop software from idea to production.
I hope the above list doesn't come across as pretentious. I welcome pushback and the opportunity to talk more about the OP.
I think the OP's premise that it is hard to make money in the cloud is accurate. We're spending hundreds of thousands of dollars to make GitLab.com run, and revenue takes a long time to grow. Selling on-premises software products has higher margins than a service.
BTW I appreciate the shoutout to GitLab as one of the five exceptions that are doing well in the open source developer market.
1 thing we are always transparent with users about is how our business model works. 1 of my favorite videos on the internet about this is . We are very much based on closed source for making revenue. This has actually helped with our community a lot.
We are basically focused on closed-source solutions for specific verticals. The only way to get clients to pay is to make something more efficient, or to not open-source/open-core it. For open source companies there is always a trade-off between community and customers. You have to find a hard line and stick to it.
Community is great, but most of those people won't want to pay you. You can try to convert them, but that takes a lot of time and isn't the best path to market.
We strive to provide a great baseline experience to our users, but a few of the things we have found in practice that help:
1. Features have customer names on them
2. There's a mass of community interest in a specific feature
You have to be careful about where time is spent.
It's very easy to mistake "users" for "customers".
I wish Slava and co the best of luck and really appreciate their postmortem. One thing we've learned as an infra company is "be as close to the app as possible". In our case the bulk of what we focus on is banking/telco anomaly detection.
It's a straightforward revenue-making product that can benefit from deep learning. The great thing is we can grow into other use cases later on if we find something else interesting to work on.
I've even seen this backfire at many commercial companies. Adding whatever features a company is willing to pay for will often pull a product in multiple, incompatible directions.
That actually goes hand in hand with "users look like customers". Also: "Not every customer is a good customer."
I remember when the iPhone first launched: the first 30 minutes were spent just trashing the then-current smartphones, the BlackBerrys.
Lol. "It's not hard to win an Olympic medal in the 100m sprint. All you have to do is move your legs faster than those other guys!"
A) I have no idea what makes RethinkDB better than MySQL off the top of my head (maybe such advantages exist, but I don't know what they are; I'm also risk-averse and don't really have any major complaints with MySQL).
B) I understand there are a lot of database technologies out there (MySQL is always improving, people speak highly of Postgres, people keep mentioning this MariaDB thing, etc.), which leads to decision fatigue. It's much the same as when I have to choose between learning 3 frameworks: I'll just use none and see which one dies first, because I'm optimizing for not learning dying technologies.
So unfortunately it comes down to marketing, which is a disappointment to anybody who wants to believe the best products will market themselves and win out. Sorry, Betamax.
I would have paid a licensing fee to continue using it, but we've taken the time to switch over to Postgres now.
EDIT: Also, re: metrics of success, the examples listed as wrong there are _exactly_ why I enjoyed the product and was willing to invest time learning and implementing in a relatively new technology.
They didn't have a poor market; they didn't have any market at all. If I'm the CTO of XYZ, explain to me what I'm buying. I hate to keep coming back to it, but that's what Cygnus did so brilliantly: "You need an embedded development system. The existing solutions do nearly what you want. Pay us slightly less, and we will make GCC do exactly what you want." I know what I'm paying for.
"Build it and they will come" does not work in the open source space -- because if you have already built it, they don't need to pay you. And if you haven't, then they don't want it.
Now, if they had built a DB similar to Mongo and then sold a service that migrated people off of failing Mongo installations and onto RethinkDB, then they would be on to something. Tired of having your DB chew your data for breakfast? Want to have your reports finished the same day you ran them? For the price of a single developer, we will fix all your problems by migrating you to RethinkDB. That's something that will make money.
But who wants to charge for services? How will we get our multiples if we only charge once for our work? That's just folly!
Those who take them into production are startups who are themselves trying to succeed, or large companies with tons of engineering talent who could build the product themselves if need be. Since it's open source and available, they use the open source version, but they don't value it. Think of the recent article where GitHub, forget supporting Redis, hadn't even reached out to the developer once to thank him.
The demands of the community are high-touch: even one or two weeks without commits and people start getting restive and raise 'alarms' on GitHub and here. If there is potential value that you are still figuring out how to capture, plus a lot of attention, you could end up expending time dealing with the politics of a fork. On the other end, the purchase decisions of most companies bring a whole new set of requirements, from internal processes, approvals, and policies to decision makers, and most open source companies initially lack the resources and experience to manage the expensive, high-touch sales and marketing process that most companies buying technology require.
If you have funding, the pressure is even higher. There is a missing piece: how to translate commercial support and usage of open source software into some sort of revenue, even minimal, without friction. I think there should be a spotlight on companies using open source and their support and contributions back, not just on acquiring projects or hiring their developers, which is an 'influence play'.
I don't think The Economist is what to read for this situation though. A book on the history of Oracle or something might have been more appropriate.
If I look at Google Trends, MongoDB starts trending in July 2009 (it existed earlier, but that is when it starts to rise); RethinkDB's equivalent is March 2015, and it never rises as steeply. That is six years.
Of all the people I've talked to, nobody knew RethinkDB, so it was not a problem of re-educating them on its merits. What was needed was marketing, success stories, and competitors' failure stories.
People did not know it existed. Maybe in the cognitively saturated NoSQL market even more was needed. Gone are the days when we could predict developers' needs from our hunches or our interpretations of the feedback we get from early adopters, because they are a biased sample.
Worse: we might need people in marketing and, god forbid, some MBAs.
This to me explains the massive success and quick rise of things like Redis, MongoDB, etc. Sure, the defaults are insecure, but they work, and they work immediately.
That was it. Every time MongoDB has come up in conversation since, I've said "Mongo eats your data" and recommended another solution, even though I'm pretty sure that particular bug was fixed long ago. You had one job, MongoDB.
I see this as equivalent to a company marketing a smartphone with an impressive feature set and a high rate of bursting into flames. Most owners would return their recalled phones and think twice about buying the next model from that manufacturer.
One can even imagine a future where each court case is followed by a postmortem. Possible bullet points:
1. Was this an actual problem? Did anybody get hurt? If not, what's wrong with the law? How should we fix it?
2. What were the ultimate causes of the problem? Ok, A killed B, but why? Could it have been prevented?
3. What are the action items to prevent this exact scenario from happening again? Who's assigned to each action item? What's the deadline for the item?
That being said, it's an interesting database, and I'm still looking for a solid distributed OLTP document database with SQL/joins and multi-cluster replication... maybe someone else will try again.
On this very busy Quadrant (which ought to be a clue too) from October 2015, I don't see RethinkDB at all, while many other niche players are present. This tells me that somehow, somewhere, not enough press was generated about RethinkDB, and it was largely unknown to the sorts of decision-makers who decide which products to evaluate down the road.
5-6 years ago I thought, wow, this is a really complicated product (a better database?), and I hoped they would pull it off as a startup.
IMO the biggest issues are (a) a database is a complicated product and (b) there are many options that are good enough and stable. You have to do 10x better than the status quo and that is very hard to do in a space where products are complicated.
Having a popular OSS project helps me a bit when getting job opportunities but that's all... Even if you considered me as a single-person team and counted 'better job opportunities' as a return on my investment, I don't think the increased salary even comes close to getting me a ROI on time spent.
OSS is a terrible investment. It's like buying a lottery ticket.
While you're building you assume you'll have to gain your own level of proficiency. So you don't know at what point you'll need support.
Having to go back and ask for an increased budget after it has shipped is a harder internal sale.
As much as I appreciate Rethink's approach, I can see it being a challenging business to be in.
Thanks to the RethinkDB team and Slava for everything they've done over the years, and for making RethinkDB open source. I continue to use it in my projects.
While it might ultimately have led to your demise, thank you for valuing correctness, simplicity, and consistency. I've never seen another document store company get things so right.
As a systems-guy (more or less) I find that frustrating, but I have seen many times that this is so.
It's still worth having the courage to try, and these guys did that. It could have worked. Hats off.
RDS, Heroku, and Compose. Google and Azure still don't do hosted Postgres.
So many people pay for DynamoDB. You could be an alternative.
The point is: RethinkDB still exists. You could still do it, via preorders on Kickstarter.
They have the credibility. You don't question whether these companies really have the people to keep the stuff running. In many cases the customers also already have a billing relationship with them; there's much less friction in adding one new service to an already quite big invoice than in buying from a different company.
Competing with AWS or Google is hard; they would still be making money on the infra even if they gave the managed database away for free.
RethinkDB is an incredible architecture. If I were in the market for something like RethinkDB/DynamoDB, I would absolutely go with something built by the original inventors.
Just like Docker, for example.
RethinkDB's core issue, IMO, was that they took VC money, which did two things:
- mandated that success had to be a home run
- gave them a fixed runway
Sure, RethinkDB made some mistakes, but I think those were eventually corrected. The problem is that the corrections came too late and they ran out of runway, and the success looked more like a triple than a home run, so they couldn't go back to VCs for more.
Of course, not taking VC would have slowed things down even more, but that's fine. The primary attribute many look for when choosing a database is trust, and trust takes years to earn.
MongoDB vs. RethinkDB is very comparable to MySQL vs. PostgreSQL, and I think the final results will turn out similar. MySQL did very well and was the leader for a long time, but today PostgreSQL is the clear favorite.