
ListenBrainz moves to TimescaleDB - kingkool68
https://blog.metabrainz.org/2020/07/22/listenbrainz-moves-to-timescaledb/
======
mfreed
Fun fact: TimescaleDB exists because we were using InfluxDB + Postgres for a
previous IoT project and also found it unworkable (developer experience, query
language, reliability, scalability and performance, operations, etc).

We first built TimescaleDB as "Postgres for time-series" for our own needs and
then decided to open-source it for others. :-)

~~~
zitterbewegung
Have you thought of making a TimescaleDB app like the Postgres.app for macOS?
Or could I use Postgres.app to make a TimescaleDB app?

~~~
avthar
It seems you can use the Postgres.app and install TimescaleDB on it

Here's some instructions on how to do so:
[https://github.com/slashdotdash/til/blob/master/postgres/ins...](https://github.com/slashdotdash/til/blob/master/postgres/install-
timescaledb.md)

------
120bits
Interesting read and thanks for sharing.

Not too long ago, I was asked to work on some analytics project and it
required time-series data. I'm not a rockstar programmer and don't really know
much about trends. So, I ended up googling and stumble upon InfluxDB. It felt
like that right choice and I started playing with it. As the time passed, I
realized that it might be a good software and I'm sure people love InfluxDB,
but it wasn't the right choice for me. I didn't really like the docs, maybe
its good now. And I had the same feeling about query syntax, it felt weird.

I moved to TimescaleDB and never looked back. I have it production for almost
2 months now. 20 tables and over 100Million writes/week. One of things I
really liked was staging, I don't use docker and or anything fancy. I have
bash script that and it runs on centos box and all timescale extension and
postgres database are packaged together.

I was impressed by the timescale compression feature. I wasn't using it
earlier because I had to be careful about what columns I need to segmentby. I
would love to see some more features but I'm sure timescaledb team is already
on it.

~~~
mfreed
Hey 120bits - thanks for the nice words!

What new/other features would you like to see? (Also feel free to join
slack.timescale.com or reach out at mike (at) timescale.com)

------
pgt
+1 on escaping measurement names. Quoting from their source code:

    
    
        def get_escaped_measurement_name(user_name): # ... comment omitted
            return '"\\"{}\\""'.format(user_name.replace('\\', '\\\\\\\\').replace('"', '\\"').replace('\n', '\\\\\\\\n'))

~~~
hoseja
That looks somehow worse than regex in C literal.

~~~
contravariant
Although it seems about par for the course for regexp in elisp.

------
iliekcomputers
Hey! I've been working on ListenBrainz [0] for the past 3-ish years. Happy to
answer questions if anyone has any.

[0]: [https://listenbrainz.org](https://listenbrainz.org)

~~~
jeffbee
I read that whole page and I have no idea what the project does.

~~~
iliekcomputers
Yeah, we need to fix that landing page. It's basically an open repository of
your music listening history.

------
RedShift1
I'm in the process of moving from InfluxDB to TimescaleDB myself and can't
wait to get rid of the hoops I have to jump through to get InfluxDB to answer
some basic questions, mostly stemming from the fact that InfluxQL doesn't
support boolean expressions. Something like 'SELECT MAX(temperature) > 10
FROM...' doesn't work.

~~~
samhld
(InfluxData solution architect here) Boolean is supported. You query it in the
WHERE clause. Try `SELECT MAX(temperature) FROM ... WHERE temperature > 10`.
That said, I'm not sure why you'd run a query like that in InfluxQL as it's
the same as `SELECT max(temperature)`. :).

~~~
RedShift1
It's not the same thing. 'SELECT MAX(temperature) > 10 FROM ...' gets you all
the datapoints, with a value of true or false. Moving it to the WHERE clause
only gets you the datapoints where temperature is > 10\. Yes, you can fill
with 0 after the GROUP BY, but if _all_ datapoints are less than 10, you get
nothing back from the database. That's confusing: were all datapoints less
than 10, or was there nothing registered in the time window I'm querying?
Impossible to tell the difference. Not to mention some user interfaces just
bailing: no data. I need to show a chart with a timeline of this condition
being true or false.

Plus my actual use case is even more complex, not only do I need something
like MAX(temperature) > 10, I need (MAX(temperature)) > 10 &&
(MAX(temperature) - MAX(dewpoint)) > 4.5).

~~~
samhld
Yep you're right -- oversight on my part. To be clear, the "not sure why you'd
run a query like that" was referring to doing it the InfluxQL way (which is
not the same thing)...where your results would end up being the same
controlling for time range.

------
akulkarni
(TimescaleDB co-founder). Thanks for the kind words! I feel especially proud
about the first point "openness" \- this is something we strive for both
technically and culturally.

For example, we have a pretty active Slack channel[0] where you can ask us
anything. We've probably given away $$$$ of free support over the years ;-)

[0] [https://slack.timescale.com/](https://slack.timescale.com/)

------
gregors
We too started off with influx but it wasn't a good fit mainly due to use
having issues with high cardinality. I don't know if this is still the case
with current implementations, but what it boils down to is if your data is
searchable by a "user_id" really look elsewhere. That might be an
oversimplification but that's the gist of it.

I was fully ready to just roll my own partitioned table and gave TimescaleDB a
shot. It worked well. There was a bug we ran into, but it was an existing one
documented on github and was addressed pretty quickly.

I still like influx, and would use it again but beware of the cardinality
issues.

~~~
valyala
If you have cardinality issues in InfluxDB, then just substitute InfluxDB with
VictoriaMetrics :) [1]

[1] [https://medium.com/@valyala/insert-benchmarks-with-inch-
infl...](https://medium.com/@valyala/insert-benchmarks-with-inch-influxdb-vs-
victoriametrics-e31a41ae2893)

------
awinter-py
timescale is a postgres extension. 'postgres as a platform' is an interesting
world to live in.

postgres built-in RBAC is clunky or people would be relying on it, but an
ecosystem of postgres plugins could include cleaner or smaller versions of
this feature.

Even things like complex migrations (github's gh-ost, for example) could exist
as DB plugins.

------
decafninja
As someone who wants to pick up a time series DB to learn, what would be the
best in terms of being the "industry standard"? InfluxDB? TimescaleDB?

I'm familiar with some basics of kdb and use it often in my day job, but from
what I understand that isn't widely used outside of finance?

~~~
valyala
The following time series databases are popular right now:

* ClickHouse (this is a general-purpose OLAP database, but it is easy to adapt it to time series workloads)

* InfluxDB

* TimescaleDB

* M3DB

* Cortex

* VictoriaMetrics

The last three of these TSDBs support PromQL query language - the most
practical query language for typical time series queries [1]. So I'd recommend
starting from learning PromQL and then evaluating time series databases from
the list above.

[1] [https://medium.com/@valyala/promql-tutorial-for-
beginners-9a...](https://medium.com/@valyala/promql-tutorial-for-
beginners-9ab455142085)

------
osigurdson
In some cases it is difficult to define the table columns up front. Instead, a
few tables: Object, Property, Time and Value (example below) are defined which
make it possible to create new items on the fly. This works reasonably well up
to a few billion records in the value table. However it does end up taking a
lot of space (covering indexes/requisite memory are required for performance).
It would be great to see a Postgres compatible solution that solves this
problem in a more optimal way than a stock RDMS.

Object objectId objectName other...

Property propertyId objectId FK propertyName other...

Time timeId time other..

Value timeId FK propertyId FK value

------
silvester23
> If you ever write bad data to a measurement in InfluxDB, there is no way to
> change it

Correct me if I'm wrong, but I'm fairly certain you can just write data with
the same timestamp again and it gets updated. Deleting is also easily
possible.

~~~
iliekcomputers
ListenBrainz dev here. We wanted stuff like the ability to do stuff like
"DELETE FROM measurement where field = blah" and that support didn't exist
last time we looked. [0]

Another thing that's not mentioned in the post but was a pain point for us was
that it's not easy to query for fields with "null" values [1].

I figure a lot of our pain might be because we're not as good at Influx as we
are at PostgreSQL. We've been running MusicBrainz[2] for ~18 years on
PostgreSQL, that knowledge will hopefully transfer over a little with
Timescale.

[0]:
[https://github.com/influxdata/influxdb/issues/3210](https://github.com/influxdata/influxdb/issues/3210)

[1]:
[https://github.com/influxdata/docs.influxdata.com/issues/717](https://github.com/influxdata/docs.influxdata.com/issues/717)

[2]: [https://musicbrainz.org](https://musicbrainz.org)

~~~
silvester23
True, deleting by value does not seem to be possible. I can see how that would
be painful if it's necessary for your use case.

------
rweichler
Figured I'd use this as an opportunity to plug my own service:
[https://eqe.fm](https://eqe.fm)

Only works on jailbroken devices but it works well, has a local backup, and
has been maintained (by me) for 2 years now.

Server costs are $2.50/mo, so this will stay up as long as I am alive.

------
jbmsf
I'd love to hear more about how your data ingestion works. I'm thinking of
implementing TimescaleDB myself, but in my initial read of the docs, the focus
seemed to be managing the database, not getting data into the database...

~~~
dominotw
same way you'd insert data into postgres.

~~~
jbmsf
That's not really helpful. Let's assume you have a distributed system; you
probably don't want all of your system components connecting directly to
TimescaleDB. You also probably want to have some layer that implements queuing
and handles back pressure if it can't insert into the database at the rate
that events are coming in. You may want to batch insert data.

I'd assume that most anyone building a system like this at scale has to solve
these problems; does everyone roll their own?

~~~
k-rus
There are number of tools, which can be used in front of TimescaleDB or
PostgreSQL. There are plugins for Telegraf, Kafka, Prometheus as described in
[https://docs.timescale.com/latest/using-
timescaledb/ingestin...](https://docs.timescale.com/latest/using-
timescaledb/ingesting-data)

Timescale builds connector and entire workflow to run Prometheus on top of
TimescaleDB and support Grafana in flexible way:
[https://github.com/timescale/timescale-
prometheus](https://github.com/timescale/timescale-prometheus)

~~~
jbmsf
Perfect. Thank you for the reference.

------
iEchoic
Has anyone used Prometheus as well as TimescaleDB in production and have
thoughts to share on those, comparatively?

~~~
akulkarni
We have quite a few in our Slack channel: slack.timescale.com

Feel free to ask over there :)

(Btw - TimescaleDB is designed to work with Prometheus. You can see more here:
[https://github.com/timescale/timescale-
prometheus](https://github.com/timescale/timescale-prometheus))

------
thejosh
I really want to love timescaledb, it's great.. except for the minor issue of
not being able to back up.

[https://github.com/timescale/timescaledb/issues/1835](https://github.com/timescale/timescaledb/issues/1835)

~~~
akulkarni
TimescaleDB definitely supports backups :-)

Here is a page from our docs on how to perform Backup & Restore:
[https://docs.timescale.com/latest/using-
timescaledb/backup](https://docs.timescale.com/latest/using-
timescaledb/backup)

Not sure what's going on in that one Github issue, but we are looking into it.

~~~
justinclift
It seems to be affecting multiple people too. :(

~~~
akulkarni
That issue is now closed by the original author:

"Data is successfully dumped. also i can see the constraints, indexes are also
copied successfully."

[https://github.com/timescale/timescaledb/issues/1835](https://github.com/timescale/timescaledb/issues/1835)

~~~
justinclift
Cool. :)

