
Creator & CTO of InfluxDB here. I won't respond too much to this write-up specifically, but I felt I had to respond to the requests for me to comment. For the benchmarks, I haven't looked at their fork of our original code and we haven't had engineers attempt to reproduce their results. In truth, we probably won't put much effort into that. We have every intention of putting more effort into benchmarking, but I'll talk a bit more about that at the end of this post.

Much of this comparison is the technology equivalent of an argument through appeal to authority. The old "nobody ever got fired for buying Big Blue" argument. It's true that Postgres has been around for much longer than InfluxDB. Mike is actually overly generous when he pinpoints InfluxDB's first release in September of 2013. First commit was September 26th, 2013 where I added the MIT license and the empty README where I refer to the project as ChronosDB. The first "release" wouldn't follow for another six weeks and I would hardly qualify that as an official release (0.0.1). If you followed the commit log, you'd see that InfluxDB is actually younger than that. We rewrote the entire thing from November of 2014 onward. Ben Johnson gets the award for InfluxDB committer to have the largest delete set in a single commit when he ripped everything out when we started the 0.9 release line. The storage engine didn't even start getting written until December of 2015 (although I wrote the prototype of the concept over Labor Day Weekend in the beginning of September 2015). So in some sense you could say that we've been at this storage game for less than three years.

However, I wouldn't discount a technology simply because it's new. We take data loss very seriously and strive to create a storage engine that is safe for data. The issues linked to in that post are either closed, apply to a previous storage engine, or were recovered through tooling (in the case where a corrupt TSM file was written). Yes, these things take time to get right and there is always room for improvement. Proper infrastructure and planning mitigate these risks. For example, in our cloud environment, we take hourly snapshots of the EBS volumes that store data. We make sure that we're able to recover from a catastrophic failure, even if it is one that was induced by some software bug, although we haven't seen corrupt TSM files or corrupt WAL files in our cloud environment.

The argument on community size is in a similar vein. Yes, the Postgres community is larger than InfluxDB's. But InfluxDB has a large, vibrant and growing community. PHP has a larger community than Go, but I'm not going to write code in PHP because of that (no offense to the PHP devs). When I was a Ruby programmer I didn't pick it because of maturity or community size. In 2005 barely anyone even knew about it. I picked Ruby (and Rails) at the time because of what I could build with them. More importantly, I picked those tools because of how quickly I could build with them. It also didn't hurt that I connected with the Ruby community and felt like I had found my tribe. So it's possible to have a community that you like and connect with regardless of size.

Ultimately, we've chosen to create from scratch. We've also chosen to create a new language rather than piggybacking on SQL. We've made these choices because we want to optimize for this use case and optimize for developer productivity. It's true that there are benefits to incremental improvement, but there are also benefits to rethinking the status quo. I've heard many times from our users that they liked the project because of how easy it was to get started and have something up and running. We'll continue to optimize for that while also optimizing performance, stability, the overall ecosystem and our community. It means that we invest in tools outside the database to make solving problems with time series data easier. It also means that we've created a storage engine from scratch and we're creating a new language, Flux. We've MIT licensed the language and the engine. This is because we're building Flux to make it work with other databases and systems. Our goal is to build an all-new community and ecosystem around Flux, for programmers that are working with data (time series or otherwise).

Finally, some thoughts on benchmarks. I hate benchmarks. There are lies, damn lies, and benchmarks. Particularly in comparisons. You always have to look for what was in, what was out, and if everything was done to favor one solution over another. And yes, we're guilty of putting out the original performance benchmark comparisons. So here's what I want to do for InfluxDB as an ongoing effort. We should be benchmarking, but doing it with workloads that are as close to what we see in real production systems as possible. No bulk loading a bunch of data and then doing a bunch of basic queries while the DB isn't under any other load. Further, I don't want to do comparisons to other solutions. I don't want to do another vendor's work for them. I'd rather focus the benchmarks on continuous improvements against our own builds. Benchmarks are great when they lead to ongoing product improvement. They're also useful if we make them public for the community and our customers so they can see over time how things are shaping up.

We see time series as a massive market with many different offerings, which often have different philosophies. And much of this is about APIs and aesthetics, so for many questions, there isn't really a "correct" answer. Our goal is to focus our product efforts on delivering the best experience for the community and customers who are working with time series data and building applications and solutions on top of our platform. At the same time, we want to contribute as much of our from-scratch code back to the open source Go community so that implementors ahead of us can build on our shoulders.




> Our goal is to focus our product efforts on delivering the best experience for the community and customers

After making clustering closed source after saying it would be part of the open source version, I think it would be more accurate and far more honest to just say "our customers", not "community and customers".


The vast majority of InfluxDB users are using open source exclusively. There are millions of servers all over the world running open source software built by InfluxData. When I say we're building for our community, I mean exactly that. We continue to put software into the open source ecosystem because it's a core value for our company and as developers we like to share what we're building with the world.

Yes, we build some features (like clustering) exclusively for paying customers, but that is what subsidizes the open source that we continue to build and make freely available. Last year I gave a talk and wrote a related blog post about the dynamics of building a business on open source software: https://www.influxdata.com/blog/the-open-source-database-bus...


> The old "nobody ever got fired for buying Big Blue" argument. It's true that Postgres has been around for much longer than InfluxDB.

Yes. Postgres appeals to managers the same way IBM / Oracle did / do. Right-y-o.



