
I love influx but damn do they like moving (too?) fast and quickly changing stuff. In a way, it's pretty cool since it means that they don't get stuck with bad decisions for backwards compatibility reasons, but it's a bit of a roller coaster for users.

Not sure what's the best solution though. Having a "stable" but fundamentally limited product (I guess influxdb v1) or breaking stuff in hopes of ending up with a way better technical foundation.



We're migrating off of InfluxDB due to that rollercoaster, honestly. It's hard enough to find time to maintain the monitoring stack at work. Casually dropping "Oh, and now you get to rebuild all the Grafana dashboards to change the query language" on top of that doesn't help. And apparently, version 3 does the same thing, except backwards.

Sorry, but at that point, we've decided to rebuild the entire metric visualization once on TimescaleDB, since we're running postgres a lot anyhow.


Fair warning, I had serious scaling issues with Timescale.

Solutions like Grafana Mimir, Victoria Metrics, Clickhouse, or yes, the new Influx implementation, are much more scalable and will give you far fewer headaches.

ClickHouse is really brilliant, btw, it's a powerhouse. Especially with the fairly recent additions that enable a hybrid local + S3 setup, pushing older metrics to S3 for cheap long-term storage.


Agreed. We initially used Timescale for our GraphQL Metrics product[0] but very quickly ran into scaling & performance issues. We switched to Clickhouse and have scaled 10,000x+ since with almost no issues.

[0]: https://stellate.co/graphql-metrics


Also, Timescale similarly introduced S3 for bottomless data tiering:

https://www.timescale.com/blog/expanding-the-boundaries-of-p...


Only for their managed solution at the moment.


If you are open, would love to hear more about some of the challenges you had with Timescale, esp. with your workload.

mike (at) timescale or DM on twitter?


The cloud storage option for CH looks like a game changer for time-based data. Any concerns there about accidentally causing thrashing when cold data is needed? I believe the MergeTree system works by splitting the table into parts during insertion that later get merged together, so you have to be careful that during merging, only the "hot" data is touched; otherwise you'll start pulling in cloud storage data that's supposed to be effectively read-only.


The merging happens primarily on hot data, I haven't run into any issues there.

But there are lots of approaches, depending on your needs.

You can (should) define a "cache disk" for S3, which will cache up to X GB locally to avoid thrashing.

Another option is to move data into separate (purely S3-backed) tables after a certain time to avoid accidentally fetching large amounts of data from S3. You can still easily join the data together if needed.
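For the curious, the move-by-age setup described above can be sketched in ClickHouse DDL. This is only a sketch: it assumes a storage policy (here called `s3_tiered`, with a local hot volume and an S3-backed volume named `cold`) has already been defined in the server's storage configuration, and all table and column names are made up:

```sql
-- Hypothetical metrics table: recent parts live on local disk, and the
-- TTL clause moves parts older than 30 days to the S3-backed volume.
CREATE TABLE metrics
(
    ts    DateTime,
    name  LowCardinality(String),
    value Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (name, ts)
TTL ts + INTERVAL 30 DAY TO VOLUME 'cold'
SETTINGS storage_policy = 's3_tiered';
```

Since merges mostly touch recently inserted parts, old parts already sitting on the S3 volume are generally left alone, which is what keeps the cold tier effectively read-only.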


Can you elaborate on what scaling issues you had with Timescale?


I ran into issues with TS too. The main issue I recall was that maintaining a grand total count of events, which were already rolled up into daily counts, was slow because it always scanned all the data. There was no way in the TS patterns to express it efficiently without hand-rolling something. The root cause was that a grand total count can't be expressed in terms of a hypertable, because there's no time column.

It’s fantastic for workloads that neatly fit in the hypertable pattern though.


You tried to use a timeseries database without a time column?


> Don't be snarky.

https://news.ycombinator.com/newsguidelines.html

Are you expecting a real answer of how I was hoping timescale's internal watermark system would help me roll up a total count or are you just implying I'm an idiot?


What's the difference between the watermark and a time column? Not the person you were replying to, but I'm curious, since I also thought that TimescaleDB had a similar "timestamp" to Influx.


TS requires you to specify a time column in each hypertable or view (continuous aggregate) where you want it to work its magic. It then stores an internal watermark that it compares to the time column in the table to figure out from where to read when refreshing.

My issue was that for a grand total I didn’t have a time column, so I couldn’t define my query as a continuous aggregate and the query had to start counting from the start of my underlying series each time.
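To illustrate the limitation (table and column names hypothetical): a daily rollup fits the continuous aggregate pattern because there is a time bucket for the watermark to compare against, while a grand total has nothing to bucket on:

```sql
-- Works: incremental refresh, since the view is keyed on a time bucket.
CREATE MATERIALIZED VIEW events_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', ts) AS day, count(*) AS n
FROM events
GROUP BY day;

-- Doesn't fit: no time column in the output, so this can't be a
-- continuous aggregate and has to rescan the series every time.
SELECT count(*) FROM events;
```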


Perhaps: add a time column with an artificially huge time range? This defines the grand total as an interval sum that happens to include all possible intervals.


Same here. I joined my current company 3 years ago when Influx v2 was coming out, and I was supposed to build some analytics on top of it. It was very painful. The Flux compiler often gave internal errors, the docs were unclear, and it was hard to write anything even slightly complicated. The built-in dashboard is subpar to Grafana, but Grafana only had raw Flux support; there was no query builder for Flux, so I tried building dashboards in Influx v2 itself, and the whole experience was excruciating.

I still have an issue open where one of their internal functions is incorrectly written in their own Flux code. I provided the fix and explained the problem, but it was never addressed. Often I had the feeling I was finding bugs in situations so basic that it felt like I was the only person on the planet writing Flux code.


I'm running InfluxDB 1 and 2 in parallel for a personal project, waiting for v2 to get mature and stable enough to replace v1. It's never happening I guess. v1 still works great for me.


We are influxdb enterprise customers and looking to do the same thing. They've kept their enterprise offering on 1.x, which has kept us mostly happy, but seeing what's going on in their OSS stuff is horrifying and we're looking to avoid the crash and burn at the end of the tunnel.


We announced the availability of the v3 successor to Enterprise v1. It supports the v1 API. We're still building data migration tooling, but if you're interested in testing it out just email support or your sales rep.


How does one upgrade from v2 beta to the latest v2? The docs for doing that seem to no longer exist https://github.com/influxdata/influxdb/issues/24393


To be honest I'm not sure. Upgrading individual releases on the way should take you there, but the v2 beta was quite a while ago.


Are you running the "OLAP" TimescaleDB on the same instance as your regular OLTP Postgres? This is the only reason I would entertain TimescaleDB, if I had a strict "1 server" requirement. I briefly deployed and looked into it and there were a lot of footguns like with compression.

If not, I would suggest looking at a proper OLAP DB. VictoriaMetrics has been great and was easy to set up.


We'd much rather reduce the number of technologies we have, and exchanging one specialized single-use database for another doesn't sound great. And sure, TimescaleDB is a hefty extension and will require some work to understand, but things like HA, backups and overall management of Postgres are pretty much solved for us.

And beyond that, TimescaleDB works with a few things we have already. We could migrate Zabbix to use TimescaleDB for a large performance boost. Also 1-2 teams are building reporting solutions for the product platform, and they are generating some significant timeseries data in a Postgres database as well.


That's a fair point, but it's worth appreciating the fundamental differences between OLTP RDBMS and OLAP timeseries. I'm not saying deploy N different DBs, I'm saying pick a good OLTP solution (Postgres) and a good OLAP solution.

My bad experience with TimescaleDB 3 years ago was that enabling compression required disabling the "dynamic labels" feature, which was a total nonstarter for us. A proper timeseries DB is designed to achieve great compression while also allowing flexibility of series. Hopefully Timescale will/has fixed that without adding another drastic perf tradeoff, but given how Postgres is architected for OLTP I would be surprised.


Can you say more about "dynamic labels"? Do you just mean that as you evolve, you want to add a new type of "key-value" pair?

The most common approach here is just to store the set of "dynamic" labels in JSON, which can be evolved arbitrarily.

And we've found that this type of data actually compresses quite well in practice.
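A minimal sketch of that approach (all names hypothetical): fixed columns for the stable dimensions, JSONB for whatever labels show up later:

```sql
CREATE TABLE metrics (
    ts     TIMESTAMPTZ NOT NULL,
    name   TEXT NOT NULL,
    value  DOUBLE PRECISION,
    labels JSONB
);

-- A new label key ("canary") needs no ALTER TABLE or migration.
INSERT INTO metrics VALUES
    (now(), 'http_requests', 1.0, '{"region": "eu", "canary": true}');

-- Filtering on a dynamic label.
SELECT * FROM metrics WHERE labels ->> 'region' = 'eu';
```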

Also regarding compression, Timescale supports transparent mutability on compressed data, so you can directly INSERT/UPDATE/UPSERT/DELETE into compressed data. Under the covers, it's doing smart optimizations to manage how it asynchronously maps individual mutations into segment level operations to decompress/recompress.

(Timescale cofounder)


>My bad experience with TimescaleDB 3 years ago was that enabling compression required disabling the "dynamic labels" feature, which was a total nonstarter for us.

What is the "dynamic labels" feature? Is it a part of Postgres or Timescale?


It's a Timescale feature that allows you to "just insert" metrics without first doing a schema migration to make sure a possible new label schema is supported. I.e. a feature that is taken for granted in native timeseries databases, which don't have to work around RDBMS schemas.

I assume it's doing an automatic ALTER TABLE when necessary, which modifies each row and somehow breaks compression across the sharded tables. Or at least an automatic re-compression would cause massive latency on insert that they wanted to avoid.


You threw out a database because it didn’t offer compression in your specific use case? That’s it?

Just solve compression on the block level, why are you so specific about it happening in the database? It’s probably one of the least interesting feature comparisons when betting on which database to trust.


Not all compression is created equally. Timeseries data has well-defined characteristics that generic block level compression doesn't understand. It's a great example where application-level compression is worthwhile. The proof is in the pudding.


A reason I would still bias towards postgres is the maturity of managed solutions (including decoupled compute/storage solutions like Aurora and AlloyDB).

Are managed "proper OLAP DB" solutions competitive with managed RDBMS from a price and ease of use standpoint?


Using TimescaleDB from a managed provider is limited, unless of course that provider is Timescale. Other managed providers are only permitted to use the TimescaleDB Apache 2 Edition.

This link has a comparison of features[1].

[1] https://docs.timescale.com/about/latest/timescaledb-editions...


Another alternative to Timescale could be Hydra. Haven't tried it myself, but the promise of columnar tables seems wildly useful.


What are you migrating to?


It’s funny how for the longest time, I was upset with how slowly the web moved. At times I wished they wouldn’t care as much about backwards compatibility.

But now with these VC-funded tech products that have spawned over the last 5-7 years, who have a move-fast-and-break-things attitude, I’m seeing the benefits of the old approach.

I suppose it’s all a matter of trade offs, as with all things, and there’s no silver bullet.


We just left it. Too many changes, the new query language is incomprehensible for drive-by graphing, and the rest of the industry seems to be building around PromQL/Prometheus.

Victoriametrics so far works very well.


I'm honestly surprised the CTO is still employed.


Serious question on behalf of the uninformed: why? I feel that society encourages people to double down and be consistent even when they're armed with better information. We could be better if we didn't have to stick to the one true path, no?


He is a founder.


Been using both 1.x and 2.x for telemetry (OSS & paid both). I am pretty excited about 3.x's interoperability. Archiving to standard data formats makes the data science team's job easier, and with a more standard ANSI SQL query engine with JDBC support, plus high-cardinality tags, it will greatly speed up front-end development and analysis use cases.

As well, I am one of those folks who happens to find the Flux query language powerful, but it's not easy enough for folks to just make the jump from SQL. Flux is much closer to Splunk's search language. It is good at what it does. InfluxQL doesn't even have date parsing (which is really odd for a time series query language), but FlightSQL in 3.x seems to be more complete.


Yes, I think v3 is pretty solid, and it's nice that they are still supporting v1 and v2. But the "migration" from v1 to v2 was the painful part. Not because it was too hard to migrate (I guess you don't even have to, since it's still supported), but because it introduced a very different approach that was supposed to be the future of Influx, and that was then basically dropped in the next release. Some commitment towards v3 might help in that regard. As you said, Flux is powerful and took some time to get used to, but that investment is now basically useless even if you took the time to get into it.

I like that they are converging towards SQL, but at the same time it's a bit like going back to square one. They seem more convinced about going full SQL this time, though.

Just searching for this, I stumbled on this documentation page that illustrates the point very well:

https://docs.influxdata.com/influxdb/v1/query_language/

On the same page (about the original InfluxQL in v1), there is a deprecation notice stating that v2 is the stable version, implying that InfluxQL is not recommended. And there is a pop-up notice stating that v2 (Flux) is basically deprecated and just in maintenance mode, and that you should use InfluxQL. But as I said in my earlier comment, I guess in some ways that's better than being too rigid and sticking with bad or less-than-ideal technical decisions.


For v3 we're supporting InfluxQL natively in addition to SQL. The InfluxQL implementation is actually just a front end on top of DataFusion, the SQL engine we use.

We really wanted to bring Flux along too, but found that it was too difficult in the near term to have it work well with v3. We spent a bunch of time building a gRPC API that Flux uses to talk to v3 (the same thing we have in our Cloud v2 product), but that API was designed with the previous storage engine in mind. It ended up being brittle and performed very poorly.

So at this point the long-term supported languages are InfluxQL and SQL, but we're continuing to support Flux for our customers.


I've always had a soft spot for influxdb after using it for a self hosted datadog/newrelic etc solution many (6+) years ago with great success. Still use it in conjunction with telegraf and grafana for personal project monitoring, but I've not brought myself to upgrade from the 1.x series.

Hopefully it's improved, but last time I tried upgrading I found the UX in Grafana to be subpar on the newer versions; as I recall, you lost the autocomplete/UI to build your queries. Obviously Grafana is its own project, but it feels like they (Influx) should invest more resources in areas like this to encourage people to upgrade. If you're going to do major upgrades, make sure they have feature parity.


Like you, I've stuck with Influx v1, Telegraf and Grafana. My policy is to upgrade only when there are significant reasons to. When I evaluated InfluxDB 2, there were no major reasons for me to switch. Of course, the data ingested in my case is relatively small. YMMV.

I looked at TimescaleDB but at the time there was no easy way to get data from Telegraf to TimescaleDB. Telegraf finally merged code that allows writes to Postgres databases, but it took like 3 years to do that.

Ultimately, I still stuck with InfluxDB v1 because sending data to it via the InfluxDB line protocol is so simple. I have a couple of bash scripts that use awk to transform command output to Influx line protocol and send it to InfluxDB. It's just so simple. I love it.
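The kind of transform described above can be sketched in a couple of lines; the measurement, tag, and field names here are made up:

```shell
# Turn "name value" pairs from some command's output into Influx line
# protocol: measurement,tag=value field=value
printf 'load1 0.42\nload5 0.37\n' |
    awk '{ printf "system,host=myhost %s=%s\n", $1, $2 }'
# → system,host=myhost load1=0.42
#   system,host=myhost load5=0.37
```

In a real script you'd pipe the result into something like `curl -XPOST 'http://localhost:8086/write?db=mydb' --data-binary @-` against the v1 write API.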

I love learning about new things, but InfluxDB v1 keeps working fine, so I may not switch from it until something forces me to.


Was very similar for me. No good reason to go V2, the new query language, on top of sucking for anyone that only uses it once every few weeks, also wasn't really supported well in Grafana.

I ended up trying VictoriaMetrics by near accident, as InfluxDB didn't like something on my Raspberry Pi, and honestly it has been pretty painless. It is a Prometheus-like stack, which means you can use any PromQL-compatible tools with it. There is an all-in-one binary, as well as a version split by function.

VM has tools to migrate from InfluxDB v1. I ended up just sticking the old InfluxDB data in one database, as I wanted to change the format of what I write to it along with the migration.

> Ultimately, I still stuck with InfluxDB v1 because sending data to it via the InfluxDB line protocol is so simple. I have a couple of bash scripts that use awk to transform command output to Influx line protocol and send it to InfluxDB. It's just so simple. I love it.

It also has an agent whose job is to convert from various protocols and do the scraping; that includes the InfluxDB protocol and a few other popular ones.


I've heard good things about VictoriaMetrics. I'll have to carve out some time to check it out.

Thanks for sharing your experience.


Have you given QuestDB a try? It includes its own implementation of the InfluxDB Line Protocol, adds SQL for queries, and can sustain a higher ingestion rate, without high-cardinality limitations.


I delayed upgrading to flux and finally bit the bullet this summer, and a month later read the announcement deprecating it.

Next time around I'm going to give TimescaleDB a look.


>they like moving fast

They are always… in flux *sunglasses on*


We moved from InfluxDB to Prometheus for this reason. InfluxDB is far more powerful, but ain't nobody got time to fix all the graphs in Grafana or learn the very math-like query language.

If we had dedicated personnel to manage our monitoring, we might have stuck with it.


Sounds like tech I wouldn't wanna depend on.


We're trying to make the transition from v1 to v3 easier by bringing the write and query APIs from that version forward. We wanted to do the same for Flux, but found it was too difficult in the near term. We might be able to do something in the future, but for now we're focused on making core improvements to the v3 engine.

We'll have data migration tools for v1 and v2 into v3 later this year/early next.


Thanks for your reply! Dumb question (I couldn't find a definitive answer) but will v3 InfluxQL be compatible with v1? Is there an article about the changes between v1 and v3?


Yes, the goal was that for anyone with Grafana dashboards or queries elsewhere, they wouldn't have to rewrite them. Just point at v3 and pretend that it's a v1 database (use the v1 API).

But there are a few things that aren't there. Continuous queries, SELECT INTO, and anything that modifies data aren't supported.
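Concretely, that means an existing v1-style InfluxQL query (measurement and field names here are made up) keeps working unchanged when pointed at v3:

```sql
-- Typical Grafana-style InfluxQL, sent to the v1 /query API:
SELECT mean("usage_idle") FROM "cpu"
WHERE time > now() - 1h
GROUP BY time(1m)
```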


Why would you find that cool? It's anything but unless it is a personal project. If people depend on your work, that is irresponsible.


I guess I'd rather have that than ossifying on a completely flawed architecture. Apparently Flux was kind of a dead end, and while it's super risky and illustrates issues in decision making, it's still better than just doubling down on something that their own team considers futureless or too flawed.



