How We’re Building a Business to Last (cockroachlabs.com)
154 points by orangechairs 62 days ago | 50 comments

RethinkDB [ex-]founder here.

The problem wasn't that we (and presumably others) didn't plan for open-core/cloud. We did, but there are structural problems in the market that prevent this from working.

Open-core didn't work because the space is so crowded with high-quality options that you have to give away an enormous amount of functionality for free to get adoption. Given how complex distributed database products are, by the time you get to building a commercial edition you're many years in and very short on cash.

Cloud didn't work because AWS/GCloud have enormous moats of pricing and brand recognition. They drive margins down to epsilon, and if your product sees meaningful adoption in the industry they launch their own service and take all your customers.

Do you think there's something different that Cockroach could do? Or do you think that the database market as it exists today leaves them without a path to building a lasting business?

Personally I believe the latter. I don't see how they're going to get people to pay for Cockroach with all the options that are already out there. I think they might have even more difficulty than RethinkDB did, because their interface is SQL, which means that migrating to things like Postgres or RDS is a lot easier.

A majority of software companies can operate on RDS (and DynamoDB, for that matter) and use their built-in scaling functions. CockroachDB adds no value for those customers, who would presumably make up the bulk of its customer base if it hypothetically became a profitable company.

Contrast CockroachDB with some of the other nascent open-source-turned-successful businesses:

Influx - Solves time-series application challenges. (biz problem)

Citus - Solves a scale out technology problem with an already established DB platform. (tech problem)

Confluent - Solves scaling issues with an already established data streaming database. (tech problem)

Cloudera - Solves scaling issues with an already established big data platform. (tech problem)

Elastic - Solves search application challenges. (biz problem)

Cockroach - Solves scaling issues with SQL databases by offering an alternative DB. (??)

Unless Cockroach can position itself as either solving a technology challenge with an already adopted DB solution or solving a business problem, it's going to be very difficult to achieve profitability.

I should add: one of the reasons I thought that RethinkDB might actually survive was that (as far as I could tell) their database not only solved scaling issues, but also helped solve real-time push to mobile apps. I'm surprised they didn't position themselves more toward mobile app developers for adoption.

When I tested RethinkDB, it was an order of magnitude slower than Postgres. That didn't match the website marketing, so I just gave up.

Dunno what that anecdote adds, but... that was my experience.

Personally, I think the database market is closed and the opportunity has already been captured. I'd love to be proven wrong, though.

I've never started a database company, so I don't pretend to know the market the way you do. But I track the database market (open source, commercial, and anything in between) more meticulously than probably anybody on the planet.

I've been working on a series of blog posts since October on the subject of databases and their future, and one post was intended to be my thoughts on RethinkDB. The leak of your postmortem on Tuesday has made me reconsider releasing it, but your comment above makes me feel obligated to share a few thoughts.

1. The database market is NOT closed. In fact, we are in a database boom. Since 2009 (the year RethinkDB was founded), over 100 production-grade databases have been released to the market. These span document stores, key/value, time series, MPP, relational, in-memory, and the ever-increasing "multi-model databases."

2. Since 2009, over $600 MILLION (publicly announced) has been invested in these database companies (RethinkDB represents $12.2M, or about 2%). That's aside from money invested in the bigger established databases.

3. Almost all of the companies that have raised funding in this period generate revenue from one or more of the following areas:

a) exclusive hosting (meaning AWS et al. do not offer this product)
b) multi-node/cluster support
c) product enhancements
d) enterprise support

Looking at each of the above revenue paths as executed by RethinkDB:

a) RethinkDB never offered a hosted solution. Compose offered a hosted solution in October of 2014.
b) RethinkDB didn't support true high availability until the 2.1 release in August 2015. It was released as open source and, to my knowledge, was not monetized.
c/d) I've heard that an enterprise version of RethinkDB was offered near the end. Enterprise support is, empirically, a bad approach for a venture-backed company. I don't know that RethinkDB ever took this avenue seriously. Correct me if I am wrong.

A model that is not popular among RECENT databases but is popular among traditional databases is a standard licensing model (e.g. Oracle, Microsoft SQL Server). Even these are becoming rarer with the advent of hosting (a), but never underestimate the licensing market.

Again, this is complete conjecture, but I believe RethinkDB failed for a few reasons:

1) not pursuing one of the above revenue models early enough. This has serious effects on the order of the feature enhancements (for instance, the HA released in 2015 could have been released earlier at a premium or to help facilitate a hosted solution).

2) incorrect priority of enhancements:

2a) general database performance never reached the point it needed to. RethinkDB struggled with both write and read performance well into 2015. There was no clear value add in this area compared to many write or read focused databases released around this time.

2b) lack of (proper) High Availability for too long.

2c) ReQL was not necessary - most developers use ORMs when interacting with SQL. When you venture into analytical queries, we actually seem to go to great lengths to provide SQL: look at the number of projects or companies that exist to bring SQL to databases and filesystems that don't support it (Hive, Pig, SlamData, etc).

2d) push notifications. This has not been demonstrated to be a clear market need yet. There are a small handful of companies that are promoting development stacks around this, but no database company is doing the same.

2e) lack of focus. What was RethinkDB REALLY good at? It pushed ReQL and joins at first, but it lacked HA until 2015 and struggled with high write or read loads into 2015. It then started to focus on real-time notifications. Again, there just aren't many databases focusing on these areas.

My final thought is that RethinkDB didn't raise enough capital. Perhaps this is because of previous points, but without capital, the above can't be corrected. RethinkDB actually raised far less money than basically any other venture backed company in this space during this time.

Again, I've never run a database company, so my thoughts are just those of an outsider. However, I am the founder of a company that provides database integration products, so I monitor this industry like a hawk. I simply don't agree that the database market has been "captured."

I expect to see even bigger growth in databases in the future. I'm happy to share my thoughts about what types of databases are working and where the market needs solutions. Additionally, companies are increasingly relying on third-party cloud services for data they previously captured themselves. Anything from payment processing, order fulfillment, traffic analytics, etc. is now being handled by someone else.

Very thoughtful notes, thanks. Waiting for your full blog posts.

Have you examined emerging databases like Tarantool https://tarantool.org/, GunDB http://gundb.io, TiDB https://github.com/pingcap/tidb, ClickHouse https://clickhouse.yandex/ ?

It would be great to read some deep and independent analysis of them too.

GunDB isn't an emerging database, it's snake oil.

Tarantool has no sharding yet. GunDB is written in JS. TiDB isn't out yet (I hope TiKV turns out to be good).

We provide multiple solutions for sharding, ranging from a twemproxy port (https://github.com/tarantool/twemproxy-docker) to https://github.com/tarantool/shard. Tarantool is close to a data grid in its architecture, so features from the database world do not apply 1:1.

We have been working on a general-purpose resharding for over 3 years, but have yet to release it to the open source community: it's very hard to do it well.

But our customers get a sharding scheme that best suits their business needs, including fully automatic shard management and data re-balancing. I submitted a talk about the technology and know-how behind this to Percona Live 2017: https://www.percona.com/live/17/sessions/best-practices-appl...

[Post author here] I second the sentiment that you should publish. This would be invaluable.

On your last point at least being a proper SQL database makes the switching costs _into_ Cockroach lower, too.

True, but I think that'll have an asymmetric effect. Generally when people adopt a product it's because they have some sort of burning desire: it's faster, it lets me do stuff I couldn't before, it allows me to avoid spending lots of money, etc. Then, once they want to switch, they consider the switching costs to see how feasible it is. Switching off of Oracle would presumably save many companies lots of money; the fact that they don't can only mean that switching is too tough.

This means that even with a low switching cost they still haven't created that burning desire for people to adopt the product, and that's generally the harder part of the equation. I do think Cockroach is creating that in other ways, with their geo-replication and sharding capabilities, to name a few. But no one is switching to Cockroach just because "it's SQL, so why not." However, we know they're creating a burning desire for people to switch off their product by charging them money, since people would always rather not spend money. The switching cost has to act as the counterbalance to that desire. The lower the switching cost, the less you'll be able to charge people. This can be a very hard thing to solve after the fact, and companies resort to all sorts of contrived things to try to get people locked into products that don't inherently have strong lock-in.

Post author here.

First off, I really appreciated your frank blog post on the RethinkDB post mortem. The distillation of years of experience is incredibly valuable for us, and I'm sure for many others.

I agree that the database market is crowded with solid offerings. However, I believe that differentiating features do still matter and there will be tremendous growth in the database market for the foreseeable future. In your blog post you listed the metrics of goodness which you optimized for, perhaps incorrectly. You indeed had amazing execution on those original metrics. We have paid RethinkDB the compliment of doing our best to emulate the standards it set for simplicity and consistency, in particular. You are also correct that the alternate metrics, including timely arrival, palpable speed, and a use case, are probably better ones to optimize for in an entrepreneurial setting.

We have been optimizing from the start for a still-small use case, but one which is likely to become a top-of-mind concern for every major enterprise over the next five years: building global, "multihomed" services. This is something Google has pioneered over the past decade, but which remains an elusive challenge for almost everyone else. For an interesting read, check out https://static.googleusercontent.com/media/research.google.c... (tl;dr here: http://highscalability.com/blog/2016/2/23/googles-transition...)

You mention AWS/GCloud as existential risks for a cloud DBaaS offering. I would take that a step further and cite them as the biggest risk to all database companies. We must compete with them both by pushing the boundaries of what the database can accomplish, and by aggressively driving an anti-vendor-lockin message: embracing a proprietary cloud DBaaS offering is an unacceptable risk if there are non-proprietary alternatives.

> if your product sees meaningful adoption in the industry they launch their own service and take all your customers

Yes, very similar to Windows, Android, etc., where the owners of the platform learn which products do well and then make them themselves.

In almost this exact space: https://aws.amazon.com/elasticsearch-service/

Yeah, I didn't appreciate Cockroach's blog as a response to RethinkDB's shut down here. Admittedly it is timely, but they don't provide any insight (although with the long blog, it "appears" as though they do) that you and Mike haven't already covered.

I have appreciated the snippets of mentorship that I have gotten from you and Mike (I've interacted with Mike more) - I'm Mark from the GUN team. Here are my thoughts:

RethinkDB's shutdown spells doom for Cockroach. However, I do disagree with you Slava, that the DB market is impossible.

Rethink and Cockroach are both Master-Slave, and I think you hit the nail on the head that that is an impossible market to try and compete in. However, it does not represent the entire DB market (albeit, it is the overwhelming majority).

The market takeover is going to happen with P2P/decentralized databases (Cassandra, mine http://gunDB.io/ , even things like IPFS, etc.) because Master-Slave databases have a limit on how large they can scale and shard. Up to another 5B people are coming online by 2020, so the demand alone is going to reshape the industry toward the growing Master-Master databases. Cockroach is in the wrong place.

My company is going through an inflection point, so I'll presumably be one of the guinea pigs. If I'm right, we'll be able to keep all of our technology completely MIT/ZLIB/Apache2 Open Source, yet still grow healthily and fast enough for our VC backing. Why? The inevitable shift to P2P systems is going to require the existing monoliths and governments to get on board and they need experts who have designed those systems. (I'm already seeing this happen with some of our customers and potential clients).

Startups will be able to reap all the benefits for free, even if they get Pokemon Go level hyper growth. Why? Services with hundred million plus users will be the norm, not the enterprise. And their services will become more robust with more users on it. However the dinosaurs, governments, centralized services, etc. will still pay handily in order to keep spying on their users. Unfortunately, users will still use these services because those companies actually make a profit which they re-invest in conveniences that keep users around.

That will be the divide (and has always been) between free and paying DB customers. None of this CCL stuff.

From what I've read about CockroachDB, I thought it was master-less. If so, I certainly hope there's a market for a horizontally-scalable, ACID-compliant database. I know it would come in handy for a number of applications I've worked on that required consistency but had workloads that were difficult to fit on a single server or a master-slave setup.

I agree with this comment, so why was it deleted?

Great direction, I think they hit it on the head. If I had to give suggestions, they would be:

1. CockroachDB Community License (CCL) might sound like a Community Edition, and that normally refers to the open source version instead of the proprietary one.

2. It is hard to quantify the difference between a startup and an established company. We put the difference for GitLab at 100 people that can potentially use our software.

3. I see the CTO commenting here https://news.ycombinator.com/item?id=13438863 that they will never move features from open source to the CCL. Maybe they can consider publishing a set of promises to the community. We did that at about.gitlab.com/about/#stewardship

From the chapter by Michael Tiemann in Open Sources:

"At first I tried to make my argument the way that Stallman made his: on the merits. I would explain how freedom to share would lead to greater innovation at lower cost, greater economies of scale through more open standards, etc., and people would universally respond "It's a great idea, but it will never work, because nobody is going to pay money for free software." After two years of polishing my rhetoric, refining my arguments, and delivering my messages to people who paid for me to fly all over the world, I never got farther than "It's a great idea, but . . .," when I had my second insight: if everybody thinks it's a great idea, it probably is, and if nobody thinks it will work, I'll have no competition!"

I still find it interesting how many people dismiss Cygnus's business model out of hand when entering the open source market. (Cygnus was acquired by Red Hat for $600 million and Michael Tiemann is still VP of Open Source development IIRC) What is interesting to me is that I've never heard of anyone else even trying it. No successes. No failures. As Michael Tiemann said, no competition. And Red Hat enjoys that competitive advantage even today.

I highly recommend reading that chapter for an alternative view on how to approach open source development.

> I still find it interesting how many people dismiss Cygnus's business model out of hand when entering the open source market. […] What is interesting to me is that I've never heard of anyone else even trying it.

Well, as I understand it, part of the reason is that contrary to their origin story, Cygnus didn't really follow the "Cygnus Business Model" either, and anyone trying similar tactics since then has had to deal with much greater visibility.

It looks like you are trying to replicate the MongoDB Inc. business model (regardless of major differences in the actual product being offered).

MongoDB offers a commercial version of their product with enterprise features (encryption at rest, LDAP auth, etc) and support - MongoDB Enterprise.

Additionally they also offer managed, cloud hosted MongoDB deployments - MongoDB Atlas.

Over the last few years the valuation of MongoDB, Inc. has been slashed by institutional investors such as Fidelity and BlackRock. While they haven't had mass layoffs or some other negative corporate event, they have clearly had some difficulty making their (and apparently your) business model work.

Do you agree that this is a fair comparison? And what do you think makes Cockroach Labs more likely to succeed with this business model than MongoDB?

(Post author here) I'm not incredibly familiar with the ins and outs of MongoDB Inc's business model, but I agree with your assessment. They certainly seem to have embraced both of the OSS business models I described in the post as viable alternatives. We are going to start with just one, and there's still a huge amount on our plate.

MongoDB Inc did have its valuation reduced by some institutional investors. Hard to say whether that was premature or what the impetus was behind their decision. MongoDB is an incredibly well-adopted product that has gotten considerably more capable over the years. I would argue they've had good success with this business model, as building a $1.6B business is a huge accomplishment whether you've got an OSS business model or not.

It would be fair to ask whether they've done the balancing act as well as they might have. They've certainly knocked OSS adoption out of the park. On the other hand, I've heard anecdotally that they waited a long time before introducing enterprise features.

Regardless, I view MongoDB Inc. as a big - and still growing - success, and consider much of what they've accomplished to be worthy of emulation.

> (Post author here) I'm not incredibly familiar with the ins and outs of MongoDB Inc's business model, but I agree with your assessment.

"I don't know much about one of our significant competitors' business model." doesn't sound like something you would want to hear from someone just announcing their business model.

I don't think that's a fair interpretation of 'not incredibly familiar.'

Point still stands. I'm not even in the market but have looked at the community vs enterprise editions of successful companies to learn what splits are already market-proven. Just curiosity for me in case I got into the business. If I was in the market, I'd know everything about market leaders, main competitors, and failures that seem like they should've worked. It's necessary to compete with them to best effect.

I'll add to the other commenter that LDAP is a great, enterprise feature since it's mostly used by them and most enterprise editions seem to have it.

Of course it was slashed. It's simply not a multi-billion-dollar company. They had a lot of hype from web devs, they hardly had any sales or a sellable product, and the product wasn't executing to quality expectations.

In the same vein, the next to be slashed will probably be Docker.

the mongodb model can make plenty of money for the company management and employees.

what it probably won't do is make plenty of money for investors.

This can be seen as a kind of response to concerns about the survival of other open source databases raised after RethinkDB closed and published its recent postmortem https://news.ycombinator.com/item?id=13421608

> In 2017, any product whose core capabilities cannot scale without requiring a commercial license is probably setting the bar too low.

Is this a dig at InfluxDB for removing clustering from their open source version?

InfluxDB never had clustering in any version.

They announced that when they'd have clustering, if they ever do, it will only be in a paid edition.

That's open for another debate entirely: Advertising and promoting features that don't exist and won't in any near future.

Do you envision features moving from CCL to APL? For instance, should the database ecosystem change such that everyone and their mother are offering row level geo partitioning in OSS databases, would it be likely that that feature would become APL licensed?

(Cockroach Labs CTO here) We don't really anticipate making a lot of changes like this, but yes, it's possible that as the product and market evolve we may change our minds and relicense some CCL features as APL. Of course, we wouldn't move in the other direction - once something has been released under the Apache license it will stay that way.

He covers that in the post. TL;DR yes.

> So what doesn’t a startup need to succeed, but an established company would consider an important requirement.

> The first is a fully-distributed, incremental capability for quickly and consistently backing up and restoring large databases using configurable storage sinks (e.g. S3 or GCS). The same functionality, but non-distributed, will be available for free to all users.

I appreciate that you're trying to write a good database and build a business, but what do you mean by "startup"?

If a database can't guarantee it can make backups, why would a startup attempt to use it in the first place?

(Cockroach Labs CTO here) This could have been worded more clearly in the post. There will be two versions of backup functionality: a basic implementation for free (Apache license) and a faster distributed and incremental implementation as a paid feature (CCL). It's like the difference between mysqldump and an InnoDB-aware backup tool.
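To make the full-versus-incremental distinction concrete, here is a minimal sketch, assuming a toy in-memory key-value store where every write records a hypothetical logical timestamp and deletions are kept as tombstones. It is only an illustration of the general technique, not CockroachDB's actual backup implementation: a full backup copies everything, an incremental backup copies only what changed since the previous backup, and restore replays the chain in order.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    # Hypothetical toy store: key -> (value, last-modified logical timestamp).
    # A value of None is a tombstone marking a deleted key.
    data: dict = field(default_factory=dict)
    clock: int = 0

    def put(self, key, value):
        self.clock += 1
        self.data[key] = (value, self.clock)

    def delete(self, key):
        self.clock += 1
        self.data[key] = (None, self.clock)


def full_backup(store):
    # Copies every entry (tombstones included), like dumping the whole database.
    return {"since": 0, "until": store.clock,
            "entries": {k: v for k, (v, ts) in store.data.items()}}


def incremental_backup(store, previous):
    # Copies only entries modified after the previous backup's high-water mark.
    since = previous["until"]
    return {"since": since, "until": store.clock,
            "entries": {k: v for k, (v, ts) in store.data.items() if ts > since}}


def restore(backups):
    # Replays the full backup, then each incremental backup in order.
    result = {}
    for backup in backups:
        for key, value in backup["entries"].items():
            if value is None:
                result.pop(key, None)  # tombstone: drop the key
            else:
                result[key] = value
    return result
```

For example, after `put("a", 1)` and `put("b", 2)`, a full backup holds two entries; after `put("b", 3)`, `delete("a")`, and `put("c", 4)`, the incremental backup holds only those three changes, and restoring both yields `{"b": 3, "c": 4}`. On a large dataset the incremental step is what keeps routine backups cheap, which is the point the next reply is making.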

Sounds like crippleware for a distributed database.

If you can't incrementally back it up, you can't really afford to run it in production in a cluster that has a large dataset. If you don't have a large dataset, you don't need cockroach db (first law of distributed objects, etc).

Maybe you'd be better off designing features for clients with specific requirements and very deep pockets.

Not necessarily; even small datasets can benefit from a distributed database. Configuring an HA database setup for any of the open source DBs requires a lot of work. For a startup, a small cluster can provide HA, redundancy, and ease of scalability should the need arise.

No one cares about the difference between 99.9% and 99.999% reliability at the DB layer and then adopts a new open source DB to solve that problem. Especially when that exciting, new, experimental database cripples your ability to back it up. Hilarious.

Is there a reason you're calling your non-free product something with "Community" in the name? CDB is intriguing but this feels like intentional doublethink to me.

If they really want to actually make money with a database, they would implement binary compatibility with Oracle's SQL dialect, Oracle stored procedure language, and its OCI binary protocol for programming from C.

There are plenty of big customers locked into Oracle. If you gave them a backwards compatible but scalable database for half the price, they would happily cut their costs in half, and make cockroachlabs very rich.

And then Oracle would offer to buy your company.

What's wrong with actually charging money for what you make? I think over-open-sourcing things leads to a tragedy of the commons whereby no one can profit enough to afford a marketing budget. Or if they can afford one, it's 100x smaller than that of anyone else in the space that "charges" and is closed, not open.

I'm really looking forward to the 1.0 release of CockroachDB. S3 and GCS were mentioned as configurable storage sinks, are there plans to release with Azure Blob as well?

Yes, Azure blobs will be supported. (I'm a dev here and just finished that feature.)

These are famous last words. You can't predict any of this.

Disclaimer: I am outside the database market (we view it all as just "storage")

We are a similar play (open core, etc) I commented in the rethink thread as well.

A few lessons on the business model we ran into:

1. Cloud is commodity/experimental. It could be the future, though, and there's no reason not to have an offering at least. We are launching a partnership with Microsoft on Azure to experiment with cloud for their Hadoop offering. We found it was bring-your-own-license and very similar to on-prem. It seems like a no-brainer to at least try this.

2. We are on prem first. Enterprise customers need "pay to blame" kind of like insurance. We have closed source features we license. Support is secondary and comes only with a license.

3. Bundling and minimum purchasing is key. You need to validate a customer has a budget.

One thing I notice database companies don't cover, which kind of surprises me (Cloudera, Red Hat, etc. do), is training and the "services swamp." One way to qualify customers and increase runway is to sell a "small" amount of support at a high rate, which in turn drives licensing revenue on top. A lot of the companies that have actually stood the test of time started like this.

Where we deviate:

1. We build a horizontal platform but focus on one product at a time. In our case, we have a generic platform we can license for teams that just need "something better." We focus on a use case such as fraud/network intrusion/money laundering and sell that. This is very similar to the Oracle model: "Bundle a database with the app."

2. Any services we end up doing we put aside to turn in to a product down the road. In our case, we observe and learn the patterns and reapply those as "templates". In our case we accumulate expertise, make a bit of money, and monetize later. Many SAAS companies in our space keep the customer data because they need to build better models. I'm not quite sure what the analog in the database market here is. We just keep the "lessons learned". The key here will be scaling that. In our case, we do that by focusing on 1 vertical use case and owning that. We also minimize engineering time on the platform.

In short: we have a dual licensing model where we have an "app" that's easy to sell in the market and get paid to explore the market (while focusing on/prioritizing one use case for more scalable revenue), and for teams that just need a better platform, we can just offer that and make some fairly passive licensing revenue.

One other neat anecdote: for all the people raving on HN about what Google's doing with AI, the problems people had 50 years ago are still the same ones people pay for today. The database market may be similar; it might not be bad to learn from your predecessors and just update the business model a bit (e.g. cloud offerings). At the end of the day, people pay for use cases over everything else. Think about the reasons they care about HA, not "HA is cool, they obviously want to pay for that."

What about AGPL?
