
InfluxDB v0.10 GA with hundreds of thousands of writes/sec, 98% better compression - pauldix
https://influxdata.com/blog/announcing-influxdb-v0-10-100000s-writes-per-second-better-compression/
======
SkyRocknRoll
We are using 0.10 GA in production. We have seen a big disk space reduction,
from 22GB to 700MB, and an order-of-magnitude performance improvement.
InfluxDB is the modern time series database.

Keep Rocking !!

~~~
Shish2k
I wonder what sort of data you're storing to get that reduction? I'm using it
as a drop-in replacement for Graphite (with Grafana as a front-end, loving
both BTW) and just had disk space reduced from 140GB to 250MB :D

------
kev009
Be _very_ skeptical of this. I wasted weeks trying to get git master versions
of this to not fall over, including the tsm1 storage engine. They are very
good at marketing and made a nice query language and API, but data storage has
been a total shit show.

Looking at less than a month of issues:

* [https://github.com/influxdata/influxdb/issues/5440](https://github.com/influxdata/influxdb/issues/5440)

* [https://github.com/influxdata/influxdb/issues/5482](https://github.com/influxdata/influxdb/issues/5482)

* [https://github.com/influxdata/influxdb/issues/5534](https://github.com/influxdata/influxdb/issues/5534)

* [https://github.com/influxdata/influxdb/issues/5540](https://github.com/influxdata/influxdb/issues/5540)

If you need something that won't fall over, it's not glamorous and it's a bear
to set up, but OpenTSDB will sail under massive load once you've gotten it
running.

~~~
pauldix
Not sure what issues you're seeing, but we have people running this in
production at significant scale.

One of the issues you linked to was for 0.9.6. Others were there because they
had super high tag cardinality and not enough memory to actually run.

When people post comments like "I have 50 million series and it crashes on my
box with 2GB of memory!!!", they're not relevant. You shouldn't expect
miracles...

~~~
kev009
No, and you're countering rhetoric with rhetoric, so here are the specs: bare
metal, 96G RAM, 16 cores, 36 disks (also tried 36 SSDs). Nothing exotic, and
something similar to what you should be using to test before releasing another
ill-fated storage engine based on "It works on my Macbook Pro with an
artificial load test, ship it!". On the same box carbon is usable, and TSDB
can scale linearly across a cluster of these machines. Influx burns down as
soon as more than a few GB of metrics are persisted, even after backing off
collection.

~~~
pauldix
I'm not quite sure what you're doing. If you have that level of hardware I
don't know why you're running out of memory. Our tests with 100B points split
across 1M series ran on much more modest hardware than that and didn't have a
problem.

There's something different with what you're doing than with what we're
testing. If you can give more detail about what your actual data looks like, I
may be able to help.

Are you doing this on v0.10 (beta1 or greater) and still seeing this problem?

~~~
jsmthrowaway
> If you have that level of hardware I don't know why you're running out of
> memory.

Did you folks fix the "oops, I accidentally selected too much and OOMed the
machine" crash that was supposed to be resolved by the new query planner in
.10? That was my immediate showstopper for InfluxDB for a very high-volume
infrastructure, because I didn't feel like proxying InfluxDB just to enforce
chunking, and the mere existence of the bug gave me pause. I hit that in 24
hours and shelved the system in 48.

Even pre-1.0, I'm not encouraged when I have to "work around" pretty obvious
oversights in reliability, so I put you back on the back burner to give you
time to mature and transitioned back to a Kappa-style architecture for my
needs. I might still use InfluxDB for low-volume aggregations out of my stream
processing, but it's for sure out of the hot path for the foreseeable future.

~~~
pauldix
The current version still has the problem that if you do a massive SELECT then
it will fill up memory until the process gets killed. In the future we'll give
controls to limit how much memory a query can take.

I think it's common with databases that if you throw a massive query at one
that the server doesn't have the resources to handle, things will go wrong:
it'll thrash, or crash, or generally have poor performance. We'll be working
on improving the failure conditions, but if you send a database a query that
is too big for the server to handle, you'll hit some sort of failure scenario.
It's just a question of how it's handled.

Also, there's no new query planner in 0.10. We have a bunch of work getting
merged in for the query engine for 0.11. But if you throw a huge query at the
DB that the server can't handle, you'll still have great sadness.

Which DB ended up solving your big query problems? Maybe we're just a poor
fit? Or what kinds of queries should we be working on optimizing?

~~~
thatsad
_The current version still has the problem that if you do a massive SELECT
then it will fill up memory until the process gets killed._

That means any ad-hoc query can trivially crash your server. That's pretty
serious. It's even worse if the ingest is using statsd-influxdb-backend over
udp, since you'll lose data during the crash.

Do you really think it's ok for a novice user to be able to crash the server
from your web gui if they make a minor mistake (e.g. something like

    SELECT value FROM /series.*/

not realizing how many series their regex actually applies to until it's too
late and the gui has stopped responding)?

 _I think it's common with databases that if you throw a massive query at one
that the server doesn't have the resources to handle, things will go wrong._

This thinking is worse than wrong. Not only is it wrong - a simple unqualified
select won't crash any "common" database - it makes anyone with any "common"
database experience doubt your other claims.

------
zenlikethat
I have been forwarding Riemann metrics to it and visualizing with Grafana,
pretty fun so far. Love that it's written in Go and fairly lightweight on
resources compared to JVM-based technologies.

------
CSDude
I just love the GROUP BY queries; they work so fast and save me from writing a
stupid map-reduce style job for that basic thing. I am building a nice time
series analysis product with it, and this change will work very well for it.

------
chaotic-good
_It depends greatly on the shape of your data, but with regularly spaced
timestamps at second level precision and float64s, we’ve seen compression
reduce each point down to around 2.2 bytes per point._

AFAIK you're using the float64 compression scheme from the "Gorilla" paper.
2.2 bytes per data point is possible with it, but only on data that doesn't
utilize full double precision (example: small integers converted to float64).
You should compare the compression algorithm used by the TSM storage engine
with zlib or another general-purpose compression algorithm; otherwise this
number is meaningless.
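
To make this concrete, here is a rough Go sketch of the XOR step from the
Gorilla scheme (a simplification for illustration, not Influx's actual
encoder; the real one also spends bits encoding where the meaningful window
sits):

    package main

    import (
        "fmt"
        "math"
        "math/bits"
    )

    // xorCost approximates how many bits a Gorilla-style encoder would
    // store per value: the first value verbatim, then for each XOR delta
    // either one control bit (identical value) or a couple of control
    // bits plus the "meaningful" window between the leading and trailing
    // zero runs.
    func xorCost(values []float64) int {
        total := 64 // first value stored verbatim
        prev := math.Float64bits(values[0])
        for _, v := range values[1:] {
            cur := math.Float64bits(v)
            if x := cur ^ prev; x == 0 {
                total++
            } else {
                total += 2 + 64 - bits.LeadingZeros64(x) - bits.TrailingZeros64(x)
            }
            prev = cur
        }
        return total
    }

    func main() {
        // Small integers stored as float64: most mantissa bits are zero,
        // so XOR deltas are tiny and ~2 bytes/point is plausible.
        ints := []float64{100, 101, 102, 101, 100, 103, 102, 101}

        // Values using the full 52-bit mantissa: XOR deltas are nearly
        // 64 bits wide, so no scheme can promise 2.2 bytes/point here.
        full := make([]float64, 8)
        for i := range full {
            full[i] = math.Sin(float64(i) + 0.5)
        }

        fmt.Printf("small ints:     %.1f bits/point\n", float64(xorCost(ints))/float64(len(ints)))
        fmt.Printf("full precision: %.1f bits/point\n", float64(xorCost(full))/float64(len(full)))
    }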

~~~
Dylan16807
Honestly, if you have a series of increasing unix timestamps, you should be
able to do much better than 2.2 bytes each.

~~~
chaotic-good
Even simple delta+RLE can squeeze increasing unix timestamps down to a few
bytes. I'm talking about floating point data compression: even specialized
state-of-the-art algorithms can't guarantee you 2.2 bytes per float in
general.
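
A toy delta+RLE encoder in Go, just to show why the timestamps are the easy
part (illustrative only, not Influx's actual encoding):

    package main

    import "fmt"

    // run is one RLE entry: `count` consecutive deltas of size `delta`.
    type run struct {
        delta int64
        count int
    }

    // deltaRLE encodes sorted unix timestamps as run-length-encoded
    // deltas. Regularly spaced timestamps collapse into a single run,
    // i.e. a handful of bytes for an arbitrarily long series.
    func deltaRLE(ts []int64) []run {
        var runs []run
        for i := 1; i < len(ts); i++ {
            d := ts[i] - ts[i-1]
            if n := len(runs); n > 0 && runs[n-1].delta == d {
                runs[n-1].count++
            } else {
                runs = append(runs, run{delta: d, count: 1})
            }
        }
        return runs
    }

    func main() {
        // One sample per second with a single gap: three runs total.
        ts := []int64{1454457600, 1454457601, 1454457602, 1454457603, 1454457610, 1454457611}
        fmt.Println(deltaRLE(ts)) // [{1 3} {7 1} {1 1}]
    }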

------
kyloon
I had another go at this new release after failing to get any of the 0.9.x
releases to suit my use case of massively high-volume writes. Now that I've
managed to get some of my data into it, aggregation queries seem to be slow. I
would like to know what is actually being done right now to improve query
speed, rather than waiting for the v0.11 release, as I am deciding whether
InfluxDB is the way forward for my use case.

~~~
pauldix
The new query engine work should get merged into master early next week.
You'll be able to test against a nightly build then to see what kind of
performance improvement will be part of 0.11.

What kinds of queries are you seeing poor performance with? That would help us
troubleshoot and improve, thanks.

~~~
kyloon
Thanks for the update. Most of the queries I am testing now are 'mean' and
'group by' aggregations for downsampling and visualization in Grafana. It
looks like the problem is CPU bound, as I see all 8 cores get maxed out when
processing the queries. Currently there are about 50 series and 5B data points
in total.
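
For reference, the queries look roughly like this (database and measurement
names here are made up), issued against InfluxDB's standard HTTP /query
endpoint:

    package main

    import (
        "fmt"
        "io"
        "net/http"
        "net/url"
    )

    func main() {
        // A downsampling query of the kind Grafana issues: mean per
        // 5-minute bucket over the last day. "metrics" and "requests"
        // are placeholder names.
        q := "SELECT mean(value) FROM requests WHERE time > now() - 1d GROUP BY time(5m)"

        u := "http://localhost:8086/query?" + url.Values{
            "db": {"metrics"},
            "q":  {q},
        }.Encode()

        resp, err := http.Get(u)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        body, _ := io.ReadAll(resp.Body)
        fmt.Println(string(body)) // JSON, one row per 5m bucket
    }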

------
bfrog
Been using 0.9 for a few months now and have been pretty happy with it. Though
I'm not really looking forward to figuring out how to move from telegraf 0.2
to 0.10, since that had some breaking changes.

I do love the simplicity of influx+telegraf, and kapacitor also looks cool.
Chronograf still seems like a bad version of grafana... maybe in the future,
if it's FOSS and somehow manages to be better than grafana, I'd use it.

------
AYBABTME
I don't get all the hate. Reading the comments here yesterday and then again
today made me really question what's going on in this community.

They (InfluxDB) made huge progress, they work hard on an open source DB (and
ecosystem), and people insult them and question Paul Dix's >30 minute response
time to your support questions in an HN thread.

------
stock_toaster
I would love to see some good (and up to date) documentation on replacing
graphite with influxdb (retention, rollups, etc).

Last time I tried it out, I vaguely recall that configuring rollups was kind
of painful -- lots of nearly duplicate continuous queries, even for a
relatively small number of series.

------
XorNot
Is the text ingest engine usable with TCP yet? UDP is a bit weird for this
when I may care greatly about whether my server is successfully feeding a
stream of lines to influxdb.
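
To illustrate the concern, here's a minimal Go sketch of line protocol over
UDP (the port and measurement names are placeholders; check your config):

    package main

    import (
        "fmt"
        "net"
        "time"
    )

    func main() {
        // Fire-and-forget: if influxdb is down or the packet is
        // dropped, nothing here will ever report it.
        conn, err := net.Dial("udp", "localhost:8089")
        if err != nil {
            panic(err) // only catches local errors like a bad address
        }
        defer conn.Close()

        // InfluxDB line protocol: measurement,tags fields timestamp
        line := fmt.Sprintf("cpu,host=web01 value=0.64 %d\n", time.Now().UnixNano())
        if _, err := conn.Write([]byte(line)); err != nil {
            panic(err) // a "successful" write still proves nothing
        }
    }

With TCP, a dead server would at least eventually surface as a write error.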

~~~
pauldix
We don't have the TCP listener wired up yet, but it's on the roadmap. Thanks
for bringing it up, I'll try to get it prioritized for sometime in the next
few months.

------
ddorian43
They did the mongodb approach: build a shitty db, then build a normal one and
claim it's 100X faster.

And then add synchronized disk commits (like postgresql has always done) and
performance goes poof (meaning lower than pg) (examples: elasticsearch,
mongodb).

~~~
fapjacks
This is not the first time I have seen an anti-InfluxDB post that also subtly
pushes PostgreSQL in its place. Now, I totally agree, and I am not happy with
InfluxDB's performance either, and maybe I'm just being paranoid, but it's
interesting that I've seen this a handful of times.

~~~
pauldix
Have a look at this release. The write performance is massively better than it
was. We did tests writing up to 100B points against a single server.

The next release will focus on improving query performance, but this one still
works for many cases.

~~~
officialchicken
"Massively better" doesn't mean deployable to production either. Especially
not if you've only tested with a single point of failure with a single sized
payload. But I'm willing to give it a third (and final) trial:

1) Does "DELETE * FROM foo" still cause the system to lockup, freeze, and
require a restart to free memory? Or are there other conditions/queries that
cause the system to become unstable?

2) There's no README/CHANGELOG/dependencies info on the download page. Which
is the preferred version of Go to install on my servers for Influxdb - 1.4 or
1.5?

~~~
jtblin
I'd guess the Go version doesn't matter; you should install the binary, which
is statically linked, no?

------
thecourier
can you efficiently store logs in InfluxDB? would love to read if somebody has
tried that before... (just wondering)

~~~
piran
We store some logs at a small scale (5k/s) and queries are fine on 0.9.5. You
still have to be smart about how you do tags. Hopefully 0.10 is the same.

------
eklavya
A lot of comments have mentioned Time Series use. How does it compare to
Cassandra?

------
illumen
But does it work? It wins my award for the buggiest database server I've used
in 20 years.

Is writing less than 20MB/sec of data something to brag about?

~~~
sytse
We're using it with GitLab.com and we like it. 0.9 didn't work at all due to
the volume, but with 0.10 everything is functioning OK.

~~~
fweespeech
A full cluster or a single node?

~~~
YorickPeterse
We currently run InfluxDB 0.10.0-nightly-614a37c (I have yet to upgrade it to
the stable release) on a single DigitalOcean instance with 8GB of RAM and
30-something GB of storage. The previous stable release (0.9 something) didn't
fare very well, even after we significantly reduced the amount of data we were
sending (we were sending a lot of data we didn't really need).

Switching to 0.10.0-nightly-614a37c, in combination with switching to the TSM
engine, resulted in a very stable InfluxDB instance. So far my only gripe has
been that some queries can get pretty slow (e.g. counting a value in a large
measurement can take ages), but work is being done on improving the query
engine
([https://github.com/influxdb/influxdb/pull/5196](https://github.com/influxdb/influxdb/pull/5196)).

To give you an idea of the data:

* Our default retention policy is currently 30 days

* 24 measurements, 11975 series. Our largest measurement (which tracks the number of Rails/Rack requests) has a total of 28,539,279 points

* Roughly 2.3 out of the 8 GB of RAM is being used

* Roughly 4 GB of data is stored on disk

This whole setup is used to monitor GitLab.com as well as aid in making things
faster (see [https://gitlab.com/gitlab-com/operations/issues/42](https://gitlab.com/gitlab-com/operations/issues/42)
for more info on the ongoing work).

~~~
fweespeech
Thanks for the information. :)

Unfortunately, I need 2+ instances with Active/Active or failover to seriously
consider anything for production, which is why I've not touched InfluxDB
beyond some light testing.

------
la6470
I don't like the fact that old data will continue to use the old engine after
the upgrade.

~~~
jmcgough
There's a migration script that can be used for moving old data to the tsm
engine.

------
effenponderousa
I have nothing to do with influx and I probably won't in the future.

There are way too many haters on HN. You venomous minority who shit on every
bit of good news that isn't yours -- keep your negativity to yourself.

You fucking monkeys infected with rage.

~~~
donatj
On the one hand I agree with your sentiments, on the other hand I disagree
with your approach.

~~~
la6470
Looks like all the hordes of oldies from slashdot are now coming to HN

