
InfluxDB 2.0 Alpha and the Road Ahead - pauldix
https://www.influxdata.com/blog/influxdb-2-0-alpha-release-and-the-road-ahead/
======
petetnt
Been using InfluxDB for a rather long time now and while I've had my share of
problems (including
[https://news.ycombinator.com/item?id=17768860](https://news.ycombinator.com/item?id=17768860))
generally I do like the ecosystem a lot. When it works it just works and it's
awesome.

That said, while TICKscripts are not exactly a pleasure to write, I wouldn't say
a new language is very high on my wishlist. For example, as the blog post
states:

> We also need to add features for backup & restore, bulk data import and
> export, and data deletes.

I'd take any of those over a new query language any day. The restore part in
particular is close to non-existent at the moment after the backup format
change in InfluxDB, so good luck if you happen to run into data loss or just
want to move data around in general. The same goes for deleting data at any
granularity finer than whole measurements.

I am hopeful that those features arrive before a 2.0 launch, ideally in some
sort of backwards-compatible way (or at least ported to Influx 1.x).

~~~
e-dard
Hi, I'm one of the engineers working on the storage side of InfluxDB.
Improving the performance of ad hoc deletes, as well as import/export
(backup/restore), are things my team will be actively working on in the
coming weeks and months.

~~~
petetnt
Hi @e-dard! That's great to hear, looking forward to the next releases! Keep up
the good work!

------
pauldix
InfluxDB creator here, happy to answer questions and add more commentary here!

~~~
ezrast
I tried out InfluxDB a while ago in my spare time and was intrigued by the
feature set, but ultimately couldn't get past the abstruse query language,
especially coming from the simpler and more flexible PromQL (not being able to
do ad-hoc math across time series was a big deal for my use case). I'm eagerly
looking forward to giving it another shot with Flux and have super-high hopes.

What does the data model for time series look like in 2.0? Mostly the same as
1.x, or has that gotten more flexible as well?

~~~
pauldix
For now we take writes in the 1.x line protocol so it’s still measurement,
tags, and field. However, Flux doesn’t really make that a requirement. So in
the future we plan on having a way to write series in without requiring a
field or even a measurement.

Once the planner gets the data to the Flux processing engine, it views
everything as a table of data with columns and records. So it’s much
more flexible in how we can represent data.
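
To make the measurement/tags/field shape and the "table of columns and records" view concrete, here is a rough Python sketch that turns one simplified line-protocol line into flat records. It is only an illustration: `parse_line` is a hypothetical helper, it ignores escaping and string fields, and the underscore column names are borrowed for readability rather than taken from how the engine actually stores anything.

```python
def parse_line(line):
    """Parse one simplified line-protocol line into flat records (dicts).

    Assumption: no escaped commas or spaces and no quoted string fields,
    so the line splits cleanly into "<measurement,tags> <fields> <timestamp>".
    """
    head, fields_part, timestamp = line.split(" ")
    measurement, *tag_pairs = head.split(",")

    # Columns shared by every record produced from this line.
    base = {"_measurement": measurement, "_time": int(timestamp)}
    for pair in tag_pairs:
        key, value = pair.split("=", 1)
        base[key] = value

    # One record per field, so each row carries exactly one value.
    records = []
    for pair in fields_part.split(","):
        key, raw = pair.split("=", 1)
        value = int(raw[:-1]) if raw.endswith("i") else float(raw)
        records.append({**base, "_field": key, "_value": value})
    return records


rows = parse_line("cpu,host=a,region=west usage=0.5,count=3i 1546300800000000000")
for row in rows:
    print(row)
```

Each field becomes its own row with the tags repeated as ordinary columns, which is why a series without a measurement or without a field name is easy to picture in this model: it is just a row with fewer columns.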

~~~
ezrast
Sounds good; thanks for the reply!

------
dekhn
Backup and easy restore (not having to run SQL statements to switch up tables)
is the thing I care about the most.

I agree with the folks below suggesting you use JavaScript as the language
instead of reinventing your own.

------
nerdbaggy
I really like what Influx is doing. I find getting information in and out much
easier in Influx than most others.

~~~
exabrial
There's a canned dashboard for that actually. Set up the telegraf input for
your influx instance and it can tell you all kinds of neat stuff about what
it's doing internally.

------
geekybiz
To other users of InfluxDB: How do you read data if you want to group,
filter, sort by something other than time?

We used to use InfluxDB to store our perf data where every data point
contained a timestamp (thus we thought influxdb was ideal for our use case).
But soon we wanted to group, filter, and sort by various dimensions, and it led
to performance issues.

Bringing it all in-mem and using pandas to do that was very slow for us. Also,
creating indexes for so many columns didn't seem like a good idea.

We switched to postgres and the decision has served us well so far. I just
want to understand whether influx isn't suitable for our kind of use case or
whether we used it incorrectly.

~~~
pauldix
InfluxDB 1.x was definitely not designed for that. With Flux in InfluxDB 2.0
you will be able to do things like store reference data in other places and
join that with time series data in InfluxDB at query time. You can also sort,
group and filter by any measurement, tag, field, or value. However, there are
no user defined secondary indexes so the scope of this will be a bit more
limited based on how things are stored. I'd have to know more about the
specific kinds of queries to figure out if it's something that would make sense
within InfluxDB 2.
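
As a rough illustration of what "join reference data with time series at query time, then group and sort by something other than time" means, here is a plain-Python sketch. It is not Flux and not how InfluxDB executes anything; `join_and_group`, the `value` key, and the sample data are all made up for the example.

```python
from collections import defaultdict


def join_and_group(points, reference, key, group_by):
    """Attach reference attributes to each point, then average per group.

    points:    list of dicts, each with `key` and a numeric "value"
    reference: dict mapping a key value to extra attributes (a dict)
    Returns (group, mean) pairs sorted by mean, highest first.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for point in points:
        extra = reference.get(point[key])
        if extra is None:
            continue  # inner join: drop points with no reference row
        merged = {**point, **extra}
        acc = sums[merged[group_by]]
        acc[0] += merged["value"]
        acc[1] += 1

    means = {group: total / count for group, (total, count) in sums.items()}
    # Sort by the aggregated value rather than by time.
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


points = [
    {"host": "a", "value": 10.0},
    {"host": "b", "value": 30.0},
    {"host": "c", "value": 20.0},
    {"host": "z", "value": 99.0},  # no reference row, dropped by the join
]
reference = {"a": {"team": "web"}, "b": {"team": "db"}, "c": {"team": "web"}}
print(join_and_group(points, reference, key="host", group_by="team"))
```

Without a secondary index, every point in the time range has to be scanned before the join and grouping can happen, which is the limitation described above.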

------
exabrial
> Kapacitor is the killer app

Um. Yes. Wait, there are people that use influxdb _without_ kapacitor
alerting??? Someone name any commercial or open source equivalent, b/c I'd
like to know in case they kill it.

~~~
endymi0n
I haven't had any other needs since switching to Prometheus / Prometheus
AlertManager / Grafana and a Slack binding.

AM looks ugly as hell ("designed by engineers"), but it's damn rock solid and
versatile.

Tried Influx in the first days, but it was slow and buggy and they changed
their paradigms every other version. Well, looks like they're doing it
again...

~~~
h1d
How do you store long term data?

~~~
ptman
Prometheus can write data to another system for long term storage:
[https://prometheus.io/docs/operating/integrations/#remote-en...](https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage)

~~~
h1d
I know. As the OP didn't like Influx, I was wondering which other solution is
used.

------
viraptor
I haven't seen flux and a few other things before, but it sounds really
exciting. It looks like influx people release a lot of this info (and related
tech like grafana) on their YouTube channel:
[https://www.youtube.com/channel/UCnrgOD6G0y0_rcubQuICpTQ](https://www.youtube.com/channel/UCnrgOD6G0y0_rcubQuICpTQ)

Good content, it's a shame it's not more popular.

~~~
s17tnet
They quote something like 8 grand a node. That is a bit outside the 'popular'
zone.

~~~
viraptor
You can still do a lot with the open version. Especially if you need it for a
small/side project, it's great.

If you really need the HA, you can start spending on the enterprise edition.
Or move to another storage.

------
jrockway
What's the story regarding high availability for InfluxDB 2.0? I know it is a
commercial offering in the current release, and while I'm not super thrilled
about that, I do think it's reasonable. It would be nice if you could buy the
self-hosted version without a "contact sales" step, which I honestly will never
do. (I would just put a proxy in front of the multiple instances that tries to
write to as many of them as possible and does reads from one at random. What
could go wrong!)

The other thing I never figured out in the current version is how to write the
following query. I store samples like (device, network, direction) -> packet
count. I then want to know how many packets were sent across the network in
total.

With monitoring systems I've used in the past (internal to my former
employer), this was easy. You would do the delta calculation at the lowest
level to convert device packet counters to number of packets sent in the last
time interval, which varies randomly because samples are not necessarily
arriving at discrete intervals. Then you would do an align, to bring the
randomly-added sampling times into alignment across all the "streams" (each
being a unique (device, network, direction)). Once the data is aligned, you can
then do a group by to get rid of a certain tag, like device (and just have
(network, direction) -> total packets sent in the time interval).
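
The delta, align, group-by sequence described above can be sketched in plain Python roughly as follows. This is not the internal system's API or anything from InfluxDB; `delta_align_group` is a hypothetical helper, and it makes the simplifying assumption that each delta is credited entirely to the window containing the later sample, where a real aligner would probably interpolate.

```python
from collections import defaultdict


def delta_align_group(samples, window, drop_tag_index):
    """Turn raw packet counters into per-window totals with one tag dropped.

    samples: iterable of (stream, timestamp, counter), where stream is a
             tuple of tags such as (device, network, direction)
    window:  alignment interval, in the same unit as the timestamps
    Returns {(reduced_stream, window_start): packets_sent}.
    """
    by_stream = defaultdict(list)
    for stream, ts, counter in samples:
        by_stream[stream].append((ts, counter))

    totals = defaultdict(int)
    for stream, series in by_stream.items():
        series.sort()
        # Group: drop one tag (e.g. device) from the stream key.
        reduced = stream[:drop_tag_index] + stream[drop_tag_index + 1:]
        for (_, prev), (ts, cur) in zip(series, series[1:]):
            # Delta: packets sent since the previous sample of this stream.
            delta = cur - prev
            if delta < 0:
                continue  # counter reset; skip rather than guess
            # Align: snap the irregular timestamp to its window start.
            window_start = ts - ts % window
            totals[(reduced, window_start)] += delta
    return dict(totals)


samples = [
    (("d1", "net", "tx"), 1, 100),
    (("d1", "net", "tx"), 12, 150),
    (("d2", "net", "tx"), 3, 10),
    (("d2", "net", "tx"), 14, 40),
    (("d1", "net", "tx"), 23, 170),
]
print(delta_align_group(samples, window=10, drop_tag_index=0))
```

The order matters: deltas have to be taken per device before the device tag is dropped, because summing raw counters across devices and then differencing gives wrong answers whenever samples from different devices land at different times.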

With InfluxDB, I can't figure out how to do this. It has the group by time
concept, but not alignment, so I can't write a query that will work. I ended
up just computing the deltas before inserting the item, at which point
everything works fine.

I have not tried the same query with Prometheus, but I suspect it would work
like I expect, as it seems very heavily inspired by a certain internal
monitoring system I am most familiar with.

~~~
pauldix
This release of 2.0 is not a commercial offering. It's completely free and
open source under an MIT license without restrictions of any kind. For
example, you could use it as the basis of your own commercial offering or use
code from it and that is all fair game.

For the query you mention, you can do that in 1.x using a combination of group
by time, fill and subqueries. In 2.0 you can do this in Flux using similar
operators. More complex possibilities for interpolating missing data are also
in the works.

Solving more complex query processing like what you mention is a specific
reason why we started developing Flux. We couldn't figure out how to address some
of the more advanced query feature requests in the old language so we started
with something new.

In addition to giving Flux more functions, flow control and more power, we
also will be adding more out of the box functions and syntactic sugar to make
some of these more advanced queries possible without being overly verbose and
complex. Our order of operations for developing the language:

1. Make it powerful
2. Make it easy
3. Make it fast

So it'll take us time to get to all of our goals, but I think we've laid down
a pretty good foundation.

