I was curious what "hybrid logical clocks" meant and found the linked paper a bit over my head. I found this more layman description:
Apparently Google used GPS/atomic clocks to keep time synced:
>> To alleviate the problems of large ε, Google's TrueTime (TT) employs GPS/atomic clocks to achieve tight-synchronization (ε=6ms), however the cost of adding the required support infrastructure can be prohibitive and ε=6ms is still a non-negligible time.
And CockroachDB created more of a hybrid version that works on commodity hardware.
Distributed systems programming sounds endlessly challenging as you are always balancing trade-offs.
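For anyone else trying to build intuition here: the core HLC update rules are small enough to sketch. Below is a toy Go version — my own simplification of the rules in the linked paper, not CockroachDB's actual implementation. Each timestamp is a (physical, logical) pair, and the logical counter only ticks when wall clocks can't order two events on their own:

```go
package main

import (
	"fmt"
	"time"
)

// HLC is a hybrid logical clock: l tracks the max physical time
// observed (nanoseconds), and c is a logical counter that breaks
// ties among events sharing the same physical component.
type HLC struct {
	l int64
	c int64
}

// Now advances the clock for a local or send event.
func (h *HLC) Now(wall int64) (int64, int64) {
	if wall > h.l {
		h.l, h.c = wall, 0
	} else {
		h.c++ // wall clock hasn't moved past l; tick logically
	}
	return h.l, h.c
}

// Update advances the clock on receipt of a remote timestamp (ml, mc),
// taking the max of local wall time, local l, and the remote l.
func (h *HLC) Update(wall, ml, mc int64) (int64, int64) {
	switch {
	case wall > h.l && wall > ml:
		h.l, h.c = wall, 0 // physical time dominates: reset counter
	case ml > h.l:
		h.l, h.c = ml, mc+1 // remote clock is ahead of us
	case h.l > ml:
		h.c++ // our clock is ahead of the remote
	default: // h.l == ml: merge the counters
		if mc > h.c {
			h.c = mc
		}
		h.c++
	}
	return h.l, h.c
}

func main() {
	var h HLC
	fmt.Println(h.Now(time.Now().UnixNano()))
}
```

The nice property: timestamps respect causality the way Lamport clocks do, while staying within the clock-offset bound of physical time whenever NTP is doing its job.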
First of all I think what you are doing is great.
My question is: what's the point of clocks at all? The current time is a very subjective matter, as I'm sure you know; the only real time is the point at which the cluster receives the request to commit. Anything else should be considered hearsay.
Specifically, the time source of any client is totally meaningless since, as you say further down in the discussion, client machine times can be off by huge margins.
If you accept that, then one has to accept that individual machines within the cluster itself are prone to drift too, although I appreciate that one can attempt to correct for it.
Wouldn't you think, though, that what matters more is ordering based on the bucketed time of arrival (with respect to the cluster)?
I don't see how given network delays anyone can be totally sure A is prior to B, atomic clocks or not.
What is important is first to commit.
Yes, I would love to talk privately about this topic @irfansharif
You throw three more observers in and how do you make sure that all of them observe the requests arriving in the same order? Not even the hardware can guarantee that packets arrive at 4 places in the same order, even if the hardware is arranged in a symmetrical fashion (which takes half the fun out of a clustered solution).
I would highly recommend reading the link posted by irfansharif. It's probably the best primer ever written on the subject.
distributed systems like cockroach shouldn't use the client's conception of current time for anything at all, except possibly to store it (_verbatim_, don't interpret it) and relay it back to the client or to other clients (and let the client interpret it however they want).
Building a distributed database that can optionally benefit from the same optimization actually makes a great deal of sense. Your average hobbyist won't care, but spending an extra few thousand bucks on hardware in a datacenter to get big throughput improvements out of your database system is a steal.
The CAP theorem still holds, so we pick which 2 out of 3 to be strengths and where to compromise as little as possible. It's a guaranteed 87.3% effective hair loss formula. I find Quiet Riot helps.
If you're planning to run on VMware, be prepared to handle rather dramatic system clock shifts. I've seen shifts of up to 5 minutes during heavy backup windows. Not all customers might be willing to have their nodes go down due to system clock / NTP issues.
Yep, we've also had our share of troubles with noisy clocks in cloud environments, so that's something we're very aware of. Further down the road, we're considering a "clockless" mode, which of course isn't clockless, but depends less on the offset threshold: https://github.com/cockroachdb/cockroach/issues/14093
That said, even today, configuring a cluster with a fairly high maximum clock offset is feasible for many workloads.
Or are you saying that you see heavy clock skew despite having NTP in place?
This gets exacerbated in cloud settings where VMs get moved between physical machines or racks, since now it's not just the pause: the clock is suddenly pointing to a new hardware time source.
("hardware" in quotes since it's viewed as a single piece of hardware by the software inside the VM.)
Vanilla ntpd makes assumptions about the hardware clock (namely, that drift is stable) that don't apply to virtualised clocks. Using the tsc clocksource may help as well.
VMware has this but it does not appear to have been updated in a while. https://kb.vmware.com/selfservice/microsites/search.do?langu...
* Set the ESXi hosts to use five external NTP sources
* Search fwenable-ntpd (https://www.v-front.de/2012/01/howto-use-esxi-5-as-ntp-serve...) and download the .vib (do a security audit on it - it's a zip file I think - to ensure it is what you think it is). Install the .vib, which simply adds an NTP daemon option to the firewall ports. This works on v6.5
* Run ntpd on Linux VMs, pointed at the hosts with the local clock fudge as a fallback
* For Windows VMs in a domain, set the AD DC with PDC emulator role to sync its clock to the host via the VM guest tools, leave the rest alone
* On your monitoring system make sure that it has an independent list of five sources and use plugins like ntp-peer for ntpds and ntp-time for Windows (Nagios/Icinga etc)
With the above recipe, ntpq -p <host> shows offsets less than 1 ms across the board for ntpds after stabilising.
I don't suppose anyone knows how to make a Windows NTP server permit queries? Googling does not seem to reveal anything insightful. I know how to do this for ntpd but am stuck with dealing with a Windows NTP server right now.
Does CockroachDB have a health status page or REST API? (like Nginx/Apache/Redis/Memcached, or a special table like MySQL)
It would be helpful to monitor the CockroachDB database in production.
I see there is some built-in feature, but it only sends that data home to your server for analytics (it can be turned off): https://www.cockroachlabs.com/docs/diagnostics-reporting.htm...
There are also a lot of RPC endpoints used for the admin UI that can be queried to get more fine-grained info. However, they're primarily for internal use and might change in the future.
Additionally, you can get some of the same status info on the dashboard using the `node status` command (https://www.cockroachlabs.com/docs/view-node-details.html).
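For basic liveness monitoring, polling the admin HTTP port also works. A minimal Go sketch — note that the `/health` path and port 8080 here are assumptions on my part; check the docs for the exact endpoint your version exposes:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// checkHealth issues a GET against a node's admin HTTP endpoint and
// reports whether it answered with HTTP 200 within the timeout.
func checkHealth(baseURL string) (bool, error) {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(baseURL + "/health")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

func main() {
	// Assumed default admin port; adjust for your deployment.
	ok, err := checkHealth("http://localhost:8080")
	fmt.Println(ok, err)
}
```

Wiring that into a Nagios/Icinga-style check is then just a matter of exiting non-zero when it returns false.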
I'm quite frankly amazed that Go's runtime is able to support a database with such demanding capabilities as CockroachDB!
More technically, here's a somewhat random set of thoughts on the subject:
The Go GC is performant and predictable, unlike the JVM GC. We do have some very memory-allocation-conscious code patterns to minimize the performance impact of working in a garbage-collected language runtime, but in the end it's not as bad as you might expect if your expectations are coming from the JVM world.
Library support is good. To quote our CEO, "Most of us on the team have done extensive work with C++ and Java in the past. At Google, C++ was the standard for building infrastructure and there are a lot of good reasons for that. It's fast and predictable. It would be a good choice for Cockroach, except that in the world outside of Google, in open source land, the supporting libraries for C++ are either terrible, incredibly heavyweight, or non-existent. We didn't want to rebuild everything which you take for granted at Google from scratch. It turns out that Go has many of the necessary libraries, and they're straightforward and very well written."
Basically, if Google's internal C++ libraries, tooling, style guides (and the tooling to enforce them) were available externally, we might have gone with C++.
Some of us are fans of Rust, but Rust sadly did not exist in a stable state when CockroachDB started. I'm not sure we would pick Rust were we to start today (tooling is still a concern there), but it would certainly be part of the discussion.
The native support for concurrency in Go is a huge plus. We use thousands of goroutines in CockroachDB, and that's been a huge blessing.
I can answer any more specific questions if you have them.
Much of Java's GC focus has been on correctly partitioning the heap so that long-lived objects can be less aggressively collected than short-lived ones. (An example of a challenging long-lived object is the entire set of classes used by a program, all of which need to be available to the runtime for reflection. For many bigger apps, the class hierarchy alone takes up many megabytes of RAM!)
Go can make use of the stack to a much larger degree (structs and arrays can be passed by value), and so it can get by with a much less advanced GC. As a result, Go team's main focus has been on reducing pause times more than anything else.
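To make the value-semantics point concrete, here's a small illustration: data passed by value lives in stack frames, so the GC never has to trace or collect it. (You can confirm what escapes to the heap with `go build -gcflags=-m`; the array below shouldn't.)

```go
package main

import "fmt"

type Point struct{ X, Y float64 }

// sumByValue receives the array by value: all four Points are copied
// into this frame's stack space and never become heap garbage for
// the collector to trace.
func sumByValue(pts [4]Point) float64 {
	var s float64
	for _, p := range pts {
		s += p.X + p.Y
	}
	return s
}

func main() {
	pts := [4]Point{{1, 2}, {3, 4}, {5, 6}, {7, 8}}
	fmt.Println(sumByValue(pts))
}
```

In Java, every one of those points would be a heap object behind a reference; in Go they're plain values, which is exactly why the collector's job is so much smaller.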
We write our own GPU algorithms, Java native interface transpiler (eg: we generate JNI bindings) as well as our own memory management.
We've found the JVM to be more than suitable. Granted - we wrote our own tooling and had reasons we can't move (those customers are a neat thing most people don't think about :D)
I understand why you guys went with Go, though. Congrats on pushing the limits of the runtime.
(Also, congrats on the 1.0!)
1. gofmt and goimports really helps enforce a single uniform style. We don't really care what the style is, as long as it's consistent across our 30 engineers and 200k lines of code. We have hand-rolled more Cockroach-specific linters on top of this as well, but we could do that for Rust too.
2. go tool pprof is a great profiler. Being able to quickly dig into allocations, cpu usage, etc. is great, and we do so regularly. As a result, the overhead of the GC is minimized, since we can rapidly identify and mitigate the allocation overhead with the application of a few known patterns.
Now I don't know what the state of the art of rust profiling is, but if we were to litigate Rust vs Go starting CockroachDB from scratch today, we'd probably pay close attention to what the answer is here. The Xooglers on this team have a tonne of C++ experience, and were very happy with C++ profiling tools, and thought the Go profiler matched up to the best tools they had used previously. If there is a Rust equivalent, this isn't a problem.
3. Consistency of code (in both style, but also patterns used) across third party libraries is a concern. The existence of a single toolchain that enforces a single style in Go really helps keep the whole ecosystem healthy here. Even if tools exist for Rust, if they aren't universally used, that is not as powerful.
I honestly think that Rust would probably be a close contender if we litigated this question today. The TiDB folks use Rust for their KV side, but Go for their query engine, which is an interesting mix. If faced with this decision today, I personally would push for Rust; I'm not a fan of the Go type system's various limitations, which we are running into particularly as we write a more sophisticated query optimizer that has to do more classical programming languages reasoning. But I am one of the most junior engineers on the CockroachDB team, so I'm not sure I would prevail in this fight! :)
Overall we've been happy with the choice. The GC is sometimes a performance issue, but it's manageable (and Go gives you better tools to limit the cost of GC than many other garbage-collected languages)
This builds up my confidence in their tech, so much so that even though I had no real reason to try this new DB, I'm gonna find one! :D
I'm trying to determine whether there's a place for Cockroach within what I think are the constraints in the database space.
* Traditional SQL databases
- Go-to solution for every project until proven otherwise.
- Battle tested, with unmatched features.
- Hugely optimized, with incredible single-node performance.
- Good replication and failover solutions.
* Distributed NoSQL databases (Cassandra and the like)
- Solved massive data insert and retention.
- Battle tested linear scalability to thousands of nodes.
- Good per-node performance.
- Limited features.
And if you genuinely need huge insert volumes, then because of the per-node performance you'd need an enormous cluster, whereas Cassandra would deal with it quite comfortably.
We have load generators for YCSB (just raw key-value ops in a firehose) and TPC-H (very complicated read-only queries) running right now, and we're about to start running TPC-C queries (moderately complex queries in large volume) as well. You can follow along on our progress here: https://github.com/cockroachdb/loadgen
In the context of your dichotomy, we want to bridge that gap. We want the linear scalability of your second group along with the full feature-set of the first group.
We will be publishing our performance numbers. We haven't so far because the product has improved rapidly and our numbers have been quickly obsoleted, but rest assured, a series of blog posts is coming very soon. Anecdotally, our beta customers are not finding that they need many more CockroachDB nodes than their existing database solutions, even compared to something as performant (but eventually consistent) as Cassandra.
I always see companies making the claim of linear speedup with more nodes but surely that can't be the case if the nodes are geographically disjointed over anything less than gigabit links? Perhaps linear speedup with more nodes is only possible over high speed connections? How high is that exactly?
Congratulations to the team on the release! Introducing this kind of database is no easy task - thank you and great job, keep up the good work!
A query that inherently requires shuffling because the data is geographically distributed can't get past the bandwidth needs of performing the shuffle. At the very least, with the literal simplest query plan, you're going to need all the raw data to be transported to a single node/datacenter, and I doubt there's a query and network setup where that's more efficient than doing networked shuffles themselves.
I don't think you need gigabit networks, but you're certainly going to want at least 10 megabit links. We have not tried to benchmark scenarios where we are bandwidth constrained, so I can't tell you precisely what the minimums are. All the cloud scenarios we've tested (on GCE, Azure, AWS, DigitalOcean) are constrained on other dimensions (i.e. CPU cores, memory, disk IO).
And thank you :)
Thank you very much for your detailed answer and good luck with the continued rollout!
I'd imagine CockroachDB is doing something similar for distributed join.
Another interesting idea I read about (I can't find it anywhere online) was called "join zippering". Basically you first request the cluster to solve a join by querying and streaming the key columns from a join predicate back into the cluster itself to identify which nodes have matches and then streaming the results from each node in parallel, and doing the join in the stream.
This is hard stuff but so cool too :)
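To make the "join in the stream" idea concrete, here's a toy single-process Go sketch — my own reconstruction of the zippering idea as described above, not any real system's implementation. Two ascending key streams are walked in lockstep and matches are emitted as they appear, without materializing either side:

```go
package main

import "fmt"

// streamJoin consumes two ascending key streams and emits the keys
// present on both sides, joining "in the stream": neither input is
// ever buffered beyond the single element at the zipper's teeth.
func streamJoin(left, right <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		l, lok := <-left
		r, rok := <-right
		for lok && rok {
			switch {
			case l < r:
				l, lok = <-left // left side is behind; advance it
			case l > r:
				r, rok = <-right // right side is behind; advance it
			default:
				out <- l // match: emit and advance both
				l, lok = <-left
				r, rok = <-right
			}
		}
	}()
	return out
}

// feed turns a fixed key list into a stream, standing in for a
// per-node scan in the real distributed setting.
func feed(keys ...int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for _, k := range keys {
			ch <- k
		}
	}()
	return ch
}

func main() {
	for k := range streamJoin(feed(1, 3, 5, 7), feed(3, 4, 5, 8)) {
		fmt.Println(k)
	}
}
```

In the distributed version each `feed` would be a remote node streaming its matching key column, which is where the bandwidth savings over shipping whole tables comes from.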
I agree! We have some semblance of pushdown filtering across aggregations, and some other interesting techniques, as documented in the RFC that first proposed the distributed execution model.
Thanks for doing this. You're very much appreciated.
(BTW I love the name and the logo!!)
What kind of other recent SQL features introduced in Postgres 9.4 do you use? Postgres has a ton of features, as I'm sure you're aware, and while we strive for wire compatibility with Postgres it's not a goal of ours to implement support for every Postgres feature out there.
However, Odoo leans heavily on Postgres; migration would be a lot of work, I imagine. The first snag I've hit with CockroachDB is the lack of 'CREATE SEQUENCE'.
Plus, Odoo uses REPEATABLE READ + a hand-rolled system of locks for consistency, I'm not sure how that would play out with CockroachDB. In my experience some of the performance issues come more from long lived locks in the app than from sheer DB performance.
Also, we use JSONB fairly extensively -- I see the tracking issue here https://github.com/cockroachdb/cockroach/issues/2969 but no movement.
It's a really solid CDC framework which has connectors for PostgreSQL, MySQL and MongoDB.
Does anybody know if this feature is planned in the short or medium term?
I will say that this is the single feature that I personally am most invested in at the company, so it will happen.
Of course a small database probably won't need a lot of the unique features, but is this aiming to replace PG/MySQL in the small/mid-size projects?
Also, CockroachDB is super easy to install and get started with!
My question is, in your opinion, what does it take to become proficient in CockroachDB sufficiently enough to be comfortable using it in a high volume, high-uptime-required environment?
Note that I haven't actually run CockroachDB yet, so I can't confirm if it really delivers on that promise, but I'm hopeful.
This is a minimal requirement for any modern database.
Here are all our issues that track performance: https://github.com/cockroachdb/cockroach/issues?utf8=%E2%9C%...
Here’s our open source repository where we keep our load generators: https://github.com/cockroachdb/loadgen
A blog post (well, many) is in the works outlining our performance benchmarking. The situation on the ground is changing fast - our performance has improved rapidly over the past months, and each time we sit down to write a blog post, it gets quickly obsoleted. So, trust that we will have a blog post talking about performance very soon.
Anecdotally, our customers are not finding performance to be a bottleneck. I encourage you to set up a Cockroach cluster, and try the various load generators (we've got the standards and a couple other homegrown ones in the repository).
Yes, if very low-latency (i.e., P99 latency sub-5ms) reads and writes are critical to your application, CockroachDB should not be your first choice. That said, one of the primary motivations for CockroachDB is that most existing systems don't handle eventual consistency well. In our experience, most developers will eventually write code that assumes a consistent database, either accidentally or intentionally, because it works most of the time. Dealing with eventual consistency is hard.
Rather than "if you can work with eventual consistency, you should look elsewhere," the sentiment we're trying to cultivate is "if and only if your performance requirements can't work with strong consistency, then you should look elsewhere."
I don't think anyone goes back from eventual consistency. It's more appropriate for this asynchronous world, easier and more reliable.
So, I'm not seeing it as so clear-cut in favor of eventual consistency.
After re-reading the F1 paper, my mistake seems to be thinking they relied on eventually-consistent stuff internally. It appears that was just an option for 3rd party developers in their cloud products. Thanks for the peer review as I found some more stuff double checking. :)
Has the team cooked up any latency benchmarks for different configurations? E.g. same-rack, same-zone, multi-zone, multi-region?
I've had a question for quite some time though (and I think there is an RFC for it on GitHub): do we still need to have a "seed node" that is run without the --join parameter, or can we run all the nodes with the same command line, with the cluster waiting for quorum to reconcile on its own?
That's okay, for now, I run a simple StatefulSet where each pod checks whether the Service is reachable on port 26257 to determine if it should join or init the cluster.
It's not as nice as if it was handled by Cockroach itself, but it does the job.
Short answer: no.
Long answer: at their closest, Earth and Mars are about 54m km apart; at their furthest, over 400m km, with an average of around 225m km. So the one-way speed-of-light latency varies between roughly 3 and 22 minutes.
CockroachDB uses synchronous replication via Raft, and that latency would cause problems, as would some other settings, like our window sizes and their interaction with timeouts.
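For the curious, the speed-of-light arithmetic takes only a few lines to check:

```go
package main

import "fmt"

// lightMinutes returns the one-way light travel time, in minutes,
// for a distance given in kilometres.
func lightMinutes(km float64) float64 {
	const c = 299792.458 // speed of light, km/s
	return km / c / 60
}

func main() {
	// Earth-Mars separation at closest, average, and farthest.
	for _, d := range []float64{54.6e6, 225e6, 401e6} {
		fmt.Printf("%.0fm km -> %.1f min one-way\n", d/1e6, lightMinutes(d))
	}
}
```

Double that for a round trip, and any synchronous quorum write is looking at tens of minutes per Raft round.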
Deep space aside, I wish the announcement just said that! I came back to HN for insight into the paragraph about "multi-active availability... an evolution in high availability from active-active replication". Marketing... sometimes... I tell you what.
"When replicating across datacenters, it’s recommended to use datacenters on a single continent to ensure performance (inter-continent scenarios will improve in performance soon). Also, to ensure even replication across datacenters, it’s recommended to specify which datacenter each node is in using the --locality flag. If some of your datacenters are much farther apart than others, specifying multiple levels of locality (such as country and region) is recommended."
In short, IIUC, even _planetary_ deployment doesn't come for free (yet). Perhaps I'm just not well-enough versed yet in how people deal with globally-distributed databases, but I'd love to see the docs dig into this a bit more: practical limits of cluster deployment, recommended strategies and tools (if any) to replicate data between clusters, etc.
Some of the big details relate to not requiring atomic clocks:
Here's their comparison chart, though naturally it's biased for things-cockroach-does:
(I guess you can't write to Spanner with SQL? That seems like a big difference. No INSERT/UPDATE?)
I'd be interested in hearing:
- the backup story
- the replication/failover story
- horizontal scaling story (is it plug and play)
Your other questions are better answered on the blog post, but quickly:
* CockroachDB core comes with a `dump` command to backup your databases. CockroachDB Enterprise has blazingly fast _incremental_ cloud backup and restore, the kind that you might want for a very large deployment.
* Replication is managed under the hood by sharding the data into many ranges, each 64MB in size. Each range is replicated using Raft, and if a node goes down, the other replicas scattered across the cluster seamlessly take over and upreplicate a new replica to "heal" the cluster.
* The horizontal scaling is indeed plug and play - just add more nodes to the cluster and they'll automatically rebalance replicas across the cluster with no downtime and no additional configuration.
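To illustrate the range mechanics (a deliberately toy model, not CockroachDB's actual code): once a range outgrows the size threshold it splits in two at a middle key, and the resulting smaller ranges are the units that get replicated and rebalanced across nodes:

```go
package main

import "fmt"

const maxRangeBytes = 64 << 20 // 64MB split threshold

// Range is a toy model of a contiguous span of sorted keys.
type Range struct {
	keys  []string
	bytes int64
}

// maybeSplit divides a range that has grown past the threshold into
// two halves at the middle key; an under-threshold range is returned
// unchanged.
func maybeSplit(r Range) []Range {
	if r.bytes <= maxRangeBytes || len(r.keys) < 2 {
		return []Range{r}
	}
	mid := len(r.keys) / 2
	return []Range{
		{keys: r.keys[:mid], bytes: r.bytes / 2},
		{keys: r.keys[mid:], bytes: r.bytes - r.bytes/2},
	}
}

func main() {
	r := Range{keys: []string{"a", "b", "c", "d"}, bytes: 80 << 20}
	fmt.Println(len(maybeSplit(r)))
}
```

The real system splits on actual byte boundaries and load, but the shape of the idea is the same: keep the unit of replication small enough that rebalancing one range is cheap.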
As for our backup story, our doc page on the subject should shed more light.
From the high availability page in the docs:
> Cross-continent and other high-latency scenarios will be better supported in the future.
Do you have a specific timeline in mind? I've been working on an application that needs to be highly-available, and which uses Oracle right now. It seems like you can add all sorts of tools to the mix (RAC, DataGuard, etc), but there are always significant caveats around the capabilities of the resultant system. We're talking 1 to 2 TB of data total, tables of up to 100 million rows with 1 million rows added per day, distributed across three data centers (US, EU, Asia).
And regarding high availability in the context of application deployments, is there any documentation on the locking characteristics of DDL statements? I'm interested in the ability to modify the schema during an application deployment without having to bring down the system or implicitly locking users out. Apologies if I missed it somewhere on the website!
Regarding DDL statements, this blog post has details. In a nutshell, online schema changes are possible; the changes become visible to transactions atomically (a concurrent transaction either sees the old schema, or the fully functional new schema).
Everything under "The Future" really excites me, especially the geo-partitioning features. That is something that I'm really looking forward to be using!
Is it more for web apps, analytics, or what? When would I consider switching from e.g. Postgres to CockroachDB?
For just a couple billion rows and a dozen joins, a single node will suffice (with the caveat that you really want at least 3 nodes because CockroachDB is built for replication and fault-tolerance and you're not getting that with a single node cluster), but you'll get linear speedup as you add more machines.
Your performance on a single node should be on the same order of magnitude as doing this in Postgres right now. We are rapidly closing that gap, and intend to close it completely for TPC-H style queries, while retaining the linear performance speedup with more nodes.
The reason this gap isn't already closed is that, for 1.0, we've been focused on transactional performance in distributed, fault-tolerant situations rather than analytics performance. There is a lot of low-hanging optimization fruit in analytics scenarios that we haven't focused on and are just getting started on.
On the feature FAQ, joins are described as 'functional', which doesn't inspire a lot of confidence, but maybe it's just a perception thing. What exactly does functional mean?
A SQL db without joins sounds a lot like just a NOSQL db with a familiar query dialect.
"Functional" is our caveat that if you run Joins across your data in an OLAP setting, it will work, but it may not be the most performant Join possible. For example, our query planner does not currently plan Merge-joins even if the appropriate secondary indices exist. So after a point (joining ~billions of rows of data) it no longer is as performant as it could be. Now we expect to roll out this particular fix within 6 months. However, optimizing 4 or 5-way nested Joins in OLAP-cube style settings isn't something we're going to be performant at for years. We need a lot more infrastructure built up before we start solving the kinds of problems revealed by, say, the Join Order Benchmark paper (http://www.vldb.org/pvldb/vol9/p204-leis.pdf).
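For anyone wondering why planning merge joins matters: when both inputs arrive sorted on the join key - exactly what a suitable secondary index provides - the join is a single linear pass with no hash table and no re-sort. A toy Go sketch (my illustration, not CockroachDB's planner code):

```go
package main

import "fmt"

// Row is a toy (key, value) pair produced by an index scan.
type Row struct {
	Key int
	Val string
}

// mergeJoin joins two inputs already sorted on Key in one O(n+m)
// pass: each side's cursor only ever moves forward.
func mergeJoin(l, r []Row) [][2]string {
	var out [][2]string
	i, j := 0, 0
	for i < len(l) && j < len(r) {
		switch {
		case l[i].Key < r[j].Key:
			i++
		case l[i].Key > r[j].Key:
			j++
		default:
			// Emit every right-side row sharing this key, then
			// advance the left cursor (j stays put so duplicate
			// left keys re-scan the same right-side run).
			for jj := j; jj < len(r) && r[jj].Key == l[i].Key; jj++ {
				out = append(out, [2]string{l[i].Val, r[jj].Val})
			}
			i++
		}
	}
	return out
}

func main() {
	l := []Row{{1, "a"}, {2, "b"}, {2, "c"}}
	r := []Row{{2, "x"}, {3, "y"}}
	fmt.Println(mergeJoin(l, r))
}
```

Without this plan, the same query falls back to hashing or nested loops, which is where the "functional but not maximally performant" caveat bites at billions of rows.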
They are gonna earn back $50 million by selling... a backup tool?
RethinkDB and FoundationDB are great, but require a paradigm shift I think.
I'm excited to track this project!
In these cases you can help the cluster out by following some of the advice on the "Recommended Production Settings" page (https://www.cockroachlabs.com/docs/recommended-production-se...) around specifying which `--locality` each node is in.
If it's about maintaining an open connection in order to notify the client, that part makes sense, but at the very least the changefeed itself should be toggleable and easy to query in any DB.
By just looking at your max load over the last 24h or perhaps week, it would be pretty easy to see when to down scale.
That being said, as long as you remove the cockroach nodes one at a time, it's pretty easy to down scale a cockroach cluster.
Whereas CockroachDB aims to be strongly consistent. This makes life for the application developer much easier.
This just comes down to the fact that Windows is a special snowflake that does everything differently. Sometimes for good reasons, but usually not for good reasons.
Unfortunately this is a version of the thing it's trying to stop, as is plain from the below. These balls of mud are immune to negation; they laugh at it and grow stronger.
Also, they know what their sales cycles look like. They hear feedback from actual customers. They have people whose job it is to notice any advantage they could have along the way. And yet! They're still selling stuff, they're at 1.0, and they're still alive — with the name they have.
I think the dismissal that business people won't look at it because of the name is purely opinion-based. But what do I know?
HN users are giving vital advice, for free. Those who ignore it will have only themselves to blame.
As I say every time this comes up, would you be so dismissive about critics of naming a product PubesDB? Or GonorrheaDB? Or [n-word]DB? Then you agree that disgust-invoking connotations of the name matter, and we're just haggling over the details.
Ubuntu, Mongo, Swagger (edit: Hadoop also) ... they're weird, sure, but they don't evoke the visceral feeling of disgust that cockroaches do.
It so far appears not to be hurting them. In the slightest.
This "warning" comes from the HN crowd every time something is posted about CockroachDB. I think it's time to LET IT GO.
I, for one, completely disagree with you, but that's because I have a different understanding of the relationship between the business side and engineering. We are already looked at as eccentric and strange people; rarely, if ever, has an absurd technology name caused issues.
Someone talking about "cockroach" is equivalent to talking about "unicorns" or "git." It's considerably less offensive than talk of "masters" and "slaves." If you think this is such a problem for you, then work on your salesmanship, as I wouldn't hesitate to talk to other departments or investors about this product.
I was a CTO up until I took medical leave this past October and I cannot stress how important salesmanship is to the role. I think your examples of other databases are hyperbole and not the point. You want them to be equivalent but they aren't. This comes down to what you can sell in your organization and if there is merit to it, then selling it should not be a problem.
One last point is other departments don't give a shit what the database technology is called unless it's something to put on their CV. Just call it the "database" as they most certainly will.
I feel like that is tough to judge because the public has only known them by one name, as far as I know. If they had switched to this name from another name and seen no difference, then we could surmise that the name has had no effect.
Your statement that it "absolutely will hurt adoption" is unqualified and nothing but opinion. And what exactly is "more successful?"
The handful of people who won't try this because of the name won't matter to their bottom line. If it's good enough, then even a large majority of those will end up using it anyway.
Pretty much any reasonable definition will do. For example, higher adoption is one metric that can be used to define success.
> Your statement that it "absolutely will hurt adoption" is unqualified and nothing but opinion.
It's an opinion that a lot of people share, judging from the HN threads I've seen about CockroachDB. And really, I shouldn't need to defend the idea that having a name that disgusts people will hurt adoption. It's just common sense. The only real question is how much damage will the name do? The better the product is, the more people will forgive things like bad names, but there will definitely be at least some level of damage.
In addition, if there are multiple products in the same category that are fairly close in quality, then subjective things like names will matter more. Maybe CockroachDB is significantly better than the alternatives right now (I really have no idea; this product category isn't something I know anything about), but if so, surely it won't remain "significantly better" forever. Other products will catch up, or other products will be created to compete, and we'll end up with several products that are similar, and once again, naming will become more important.
And finally, you're completely ignoring the fact that a lot of decisions about tech stack aren't actually made by technical people. They're frequently made by managers rather than engineers. And when the decision is made by non-technical people, marketing (e.g. name) is very important. Heck, even when the product is made by engineers, marketing is important, because that's how you convince the engineers to spend the time investigating the product to see if it lives up to its claims or does what they need.
Speaking as an engineer, if tomorrow I suddenly have the need for a cloud-native NewSQL database, I'm probably not even going to look at CockroachDB, simply based on the name, unless someone else convinces me that it's clearly superior. I find the name very off-putting and I'd rather not be confronted with the mental imagery of cockroaches any time I use the product.
You can't know how many VCs didn't fund due to the name or how many tech decision-makers at companies will pass on this product due to the name. That being said, I doubt it will be/was significant in any case.
It will never be let go, because each new person is a new interaction with the system that prompts the same point again.
It's like those '*porn' subreddits. You can explain and explain till you're blue in the face why the subs are so named, but there will always be some sniggering discussion when they're introduced to new users, no matter how much you try to silence or control for it, because it's based on a natural response.
Capitalize all you like, but that's just how people work. :)
Still seems like bikeshedding.
It's not your company. You're (probably) not an equity holder. Have you personally been harmed by the name because your company wouldn't let you adopt it in spite of its technical merits? Are you worried it won't succeed because of the name and thus are fighting on the company's behalf for its survival?
> Most people ... and others you need to appeal to ... don't want anything to do with cockroaches
You're making so much of this up out of thin air.
> giving vital advice, for free
> As I say every time this comes up
As the parent said, the staff have already seen these messages. They have decided to keep the name. Advice is helpful, but once the decision is made, it's not. Let it go.
It is both a negative reaction and is memorable. It is not clear which wins, and it isn't your job to decide. Yes, you have an opinion but you may not be right.
I remember in the mid-2000s thinking that a particular politician couldn't possibly succeed with a Muslim sounding name. Turns out that a lot of people thought that. Yet Barack Hussein Obama managed to become President.
Your opinion has definitely been registered. Continuing to state it has no value.
So here's my question to you: could you be wrong about this having "concrete effects on adoption"? And if you are wrong, is this just bike shedding?
And to continue the bike shed metaphor: the failure is people ignoring the nuclear power plant design, whose worst-case scenario is a meltdown. For CockroachDB 1.0, what's the equivalent, data loss? So are you discussing something technically trivial (colour is easy to understand) over the design (technically complex) that would prevent data loss? If the answer is yes, aren't you bike shedding like a champion?
tl;dr Bike shedders don't know they're bike shedding and think the discussion is very important.
I would appreciate your thoughts.
With respect to your specific point: if we could resolve how much it matters, then yes, that would obviate the debate. But the bikeshedding metaphor doesn't add much there, because how much it matters is precisely what's in dispute.
I agree that it resolves to how much it matters, and I guess I disagree with you on how much it matters. How it relates to the bike shedding metaphor is starting to feel like a semantic argument, which is not something I want to continue.
In response to your escalated names like PubesDB... my opinion is that I agree I wouldn't work with them, not because of any internal disgust reaction, but because the name signals a level of maturity that I don't want in my stack. Some people might have the same reaction to Cockroaches.
I didn't feel it was abrasive at all.
For my part, I'm just upset that I went to such great lengths (in the comment I linked) to unpack where the bikeshed metaphor does or doesn't apply, disentangling the various issues and merging them into a general understanding, right where that comment was needed, and yet that's the one that no one is responding to... (what's worse, it was downvoted less than a minute after I posted it).
>In response to your escalated names like PubesDB... my opinion is that I agree I wouldn't work with them, not because of any internal disgust reaction, but because the name signals a level of maturity that I don't want in my stack. Some people might have the same reaction to Cockroaches.
Right, like I said, "we're haggling over the details"; it should be regarded as a question of which names are so disgusting to be out of the question, yet people are dismissing the entire naming issue as "lol emotional primates".
Not taking a stance either way on the name, but that is the definition of bike-shedding (aka law of triviality). A committee won't vote for my nuclear plant because the bike shed is red. The bike shed's color has concrete effects on adoption.
EDIT: I would just like to acknowledge the irony of bike-shedding bike-shedding.
> ...but that is the definition of bike-shedding (aka law of triviality)
> A committee won't vote for my nuclear plant because the bike shed is red.
> The bike shed's color has concrete effects on adoption.
> Parkinson observed that a committee whose job is to approve plans for a
> nuclear power plant may spend the majority of its time on relatively
> unimportant but easy-to-grasp issues, such as what materials to use for
> the staff bikeshed, while neglecting the design of the power plant itself,
> which is far more important but also far more difficult to criticize constructively.
> -- https://en.wiktionary.org/wiki/bikeshedding
> A reactor is so vastly expensive and complicated that an average person cannot
> understand it, so one assumes that those who work on it understand it. On the
> other hand, everyone can visualize a cheap, simple bicycle shed, so planning
> one can result in endless discussions because *everyone involved wants to add a
> touch and show personal contribution*.
> -- https://en.wikipedia.org/wiki/Law_of_triviality
> -- https://books.google.com/books?id=RsMNiobZojIC&pg=PA317
If I were to rephrase those two excerpts:
> Parkinson observed that a committee whose job is to approve plans for a
> [globally distributed relational database] may spend the majority of its time on relatively
> unimportant but easy-to-grasp issues, such as what [the name is],
> while neglecting the design of the [globally distributed relational database] itself,
> which is far more important but also far more difficult to criticize constructively.
> A [globally distributed relational database] is so vastly expensive and complicated that an average person cannot
> understand it, so one assumes that those who work on it understand it. On the
> other hand, everyone can [read a name], so planning
> one can result in endless discussions because *everyone involved wants to add a
> touch and show personal contribution*.
It's so meta it hurts.
The bikeshed story is to illustrate overemphasis on something that is trivial. It uses the example of a bikeshed color and a committee wanting to spend a lot of time on it because a) they care a little about it, and b) they understand it well enough for hard-headed members to wade into the dispute rather than trust experts.
It's a failure mode -- by stipulation -- because the bikeshed color doesn't matter beyond minor (but real) aesthetic feelings among the committee, which are far outweighed by the cost of high-level personnel devoting time to it. Had they been aware of the general dynamic of these things, they could entirely prevent the loss by moving on; it's purely an internal matter.
The bikeshed model ceases to demonstrate a failure mode if and when the bikeshed color has impacts far beyond things under the control of the committee. For example, if the majority of the world's people had a near-religious devotion to destroying facilities that house a blue bikeshed, and that fanaticism was hard to defend against, this would be a valid reason not to make the bikeshed blue, and would warrant the committee's attention.
I summarize such situations as "that's not bikeshedding", though of course, to be more technically correct, I should say "that situation does not illustrate the avoidable failure mode in the parable of the bikeshed".
Similarly, if adoption matters for more than just that committee -- if they need to convince numerous other committees to adopt the design -- it's likewise "not bikeshedding" because the first committee doesn't have control over all the other ones; with respect to the first, it's an external matter, and they can't stem the loss just by saying "hey, this is trivial".
Now, you are correct that, at a high enough level, this could work as a bikeshedding example, if you could simultaneously get the entire world to collectively agree on the non-importance of aesthetics in technical matters, and on what counts as technical vs aesthetic. Then the world could play the role of that first committee and say "wow, this is trivial" and it's done.
But if that were actually feasible, then that should be your product (producing universal agreement on matters where you have a logical proof-of-correctness), not a database!
They don't need to appeal to any of these suits. Just the technical decision-makers, whose express job it is to choose solutions on their technical merits, not their spurious emotional reactions.
So these Suits you speak of won't be able to get past the product name long enough to hear any of the technical merits of why this technology should ever be considered, due to dysfunctional leadership not even having a Chief Technology Officer or Chief Information Officer at the senior leadership level. A lot outsource because they don't want to hire/pay for this in house. It also shifts responsibility away, giving the CEO, COO, CFO, etc. the ability to point fingers at an outside entity.
That is a double whammy! Internal staff can't sell/justify it to management, and outside IT providers/contractors can't sell it either.
So while they may be surviving with the current name they have, that does not mean they wouldn't be crushing the market share with a different name. If they are getting negative comments about the product name, then that's a warning that they should do market research to find out how many people would avoid the product because of the name.
But what the hell do I know, I'm making yet another HN comment post.
There was similar criticism about their name in the early days, but it has waned as mongo has grown. This will too.
Oh man, that's too much. lolol
The Spanish Wikipedia suggests many usages of the term "mongo", which probably wouldn't persist if the term was so repulsive: https://es.wikipedia.org/wiki/Mongo
Edit: Now that my memory kicked in, it's racist as well.
Is "mongo" the equivalent of English "retard", in terms of being a low-class insult that invokes a visceral reaction among the majority of the population?
I didn't believe that at first; if so, why didn't anyone ever put it in Wikipedia? English has "retard" (in the pejorative sense):
And why doesn't it show up in a top-result Spanish dictionary?
If it's merely an insult with numerous other meanings, I don't think it's comparable.
But let's assume it is equivalent to "retard". In that case, I would agree that it shouldn't be used as a name. But you have to pick your battles: all words will have that trait in some language. For my part, I would consider the Spanish-speaking market big enough not to expect them to buy [the equivalent of] RetardDB. So I agree there.
Edit: I agree with the sibling commenter networked's points.
Now, even if I had associated MongoDB with that explanation, and now that I do remember its inherent meaning in a certain context, I take no offense, since the people behind MongoDB didn't have that intent. Obviously this is an assumption on my part.
Let us not get derailed from the main point, which is the 'visceral' effect that CockroachDB has on so many people, as you mentioned in several comments. It is true, it happens to me as well. But it's not the word itself; it's when I'm around one. Those feelings of fear, whatever, when around one are irrational. I don't remember the explanation for why it's irrational; I've never worked in the field of psychology.
Maybe you have to be culturally immersed to know those things. Mongo, mongol and mongólico are the terms you should research.
I specifically said I would be sensitive to the offense it would cause in other languages, at least the major ones.
- SQLite: SQL database with no sugar. Less calories!
- MySQL: A selfish database.
- IBM DB2: Released in 1983, but never got promoted to DB3. Probably abandoned software?
- Postgresql: Gesundheit!
- CouchDB: A database for lazy people. Part of the NoSQL family, the Zen database family, that achieves SQL by not achieving SQL... like I said, lazy.
- Microsoft Access: It's very accessible. Ironically, most people that use Office don't know what it is, or that it exists, and thus, don't use it.
- dBase: De-bases your data.
- Sybase: Pronounced sigh base, which is the sound people make when you suggest it.
MySQL is named after the founder's daughter "My". The fork is named after his other daughter "Maria": https://en.wikipedia.org/wiki/Michael_Widenius#Personal_life
MySQL: a proprietary product if ever I saw one.
CouchDB: wow, does that hide bits of data until you search next week?
So now the top thread is about how terrible HN is for bikeshedding instead of talking about the actual topic... except this top thread is also not talking about the actual topic. Worth considering, imo.
But then I realized that as someone who doesn't care about the name, even positively enjoys it, I have a competitive advantage over those people.
Now I feel good again.
On the flip side though if I were in charge of CockroachDB I would look at doing something about the name. Maybe rename it something like "Resilient" as part of the "exit from beta" milestone. It's going to be a serious liability for them selling to the kinds of customers I described above, and unfortunately that's where most of the money is in these devops/infrastructure markets. The key to success is to make a superior product and then figure out how to sell it to pointy haired bosses. The latter often means making it look more boring than it actually is.
Fun factoid: scientists sometimes do this with grant proposals. I've had two scientists independently tell me that they often take cool, fascinating research proposals and "make them boring" to sell them to bureaucrats. "You have to hide all the interesting stuff and make it sound like you are doing boring incremental research. If you talk about anything 'revolutionary' you will never get funded."
It's to be expected, with the massive infestation of HN by suits and khakis in the last few years.
(Within reason. Someone on here actually said this argument is reasonable to have "because what would you do if they named it 'n-word'DB." Seriously.)
It's the classic case of everyone saying "I think it's great, but <somebody> will complain". Which ends in mindless mediocrity.
Same thing if your database was called BedBug.
That puts everyone competing with you at a HUGE competitive advantage. Making technical decisions based on the name of a product is the worst type of decision making.
"Well first we collect all of the data in the Epidemic schema, run it through the Apocalypse pipeline to transform it into something that our Extinction servers can handle, and finally store it in CockroachDB."
A common problem for open source projects is that the name is either not recognizable enough (e.g. too technical) or too generic (e.g. a simple English word, which makes it hard to search on Google).
In this case the name evokes negative emotions of fear and disgust which are not what you want to associate with a database.
I tried googling for "Amazon echo gift certificates" but I couldn't quite find what I was looking for.
I miss AltaVista.
The name is, indeed, evocative. Good names don't have to universally convey "positive" emotions.