
Timescale raises $16M from Benchmark, NEA, Two Sigma Ventures - mfreed
https://blog.timescale.com/raised-16m-from-benchmark-nea-to-understand-machines-time-series-data-iot-9eff8213c15c
======
jandrewrogers
The value proposition appears to be a much better time-series solution for the
PostgreSQL environment, and in that regard it looks like it is well-executed.
But I don't follow the assertion that it is targeted at machine-generated
sensor data per se, for a couple of reasons. I'm not sure it is possible to hit
the operational metrics/characteristics typically required for these workloads
from within the PostgreSQL ecosystem. It's an interesting database
implementation, but I am not clear on the core technical differentiation and
value proposition compared to other databases. Maybe someone from TimescaleDB
here can fill in?

For example, what is the online insertion rate story? As a rough rule of
thumb, I expect an inexpensive Linux server to drive a million records/second
through parsing, processing, indexing, and disk storage because you need that
kind of throughput for efficiency when dealing with sensor data. But I would
not expect that from PostgreSQL -- it is optimized for a different workload.
Does this bypass the normal PostgreSQL I/O path somehow?

What is the scale-out story? One of the most difficult parts of dealing with
sensor data models is that you need to continuously shift data and load
between machines to get consistent performance. TimescaleDB's sharding model
looks like it might actually be able to support this kind of load shifting but
PostgreSQL has no such architectural concept and I don't see anything written
about it in a cursory read.

What is the geospatial analytics story? This is a huge part of analyzing
sensor data for many applications. PostGIS uses GEOS, which is the only high-
quality geospatial geometry engine in open source, but is that integrated into
TimescaleDB? Is there proper support for high-performance geospatial sharding?

Looks like a solid architecture and implementation though.

~~~
mfreed
Thanks for the detailed comments. Let me try to answer a couple of them.

- We don't bypass the Postgres I/O path in general, but we have rewritten a
lot of the insert path as it relates to managing inserts across many chunks
(child tables in Postgres). You can see a detailed explanation and performance
comparison of PG10 declarative partitioning vs. TimescaleDB here:
https://blog.timescale.com/time-series-data-postgresql-10-vs-timescaledb-816ee808bac5
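
For a concrete sense of the difference, here's a rough sketch (hypothetical
`conditions` schema, not from the post). With PG10 declarative partitioning,
each time range needs its own child table created by hand (or by an external
tool like pg_partman); with TimescaleDB, one call sets up a hypertable whose
chunks are created automatically as data arrives:

```sql
-- PG10 declarative partitioning: child tables created manually per range
CREATE TABLE conditions (
    time        TIMESTAMPTZ NOT NULL,
    device_id   TEXT,
    temperature DOUBLE PRECISION
) PARTITION BY RANGE (time);

CREATE TABLE conditions_2018_01 PARTITION OF conditions
    FOR VALUES FROM ('2018-01-01') TO ('2018-02-01');

-- TimescaleDB: one call; chunks (child tables) appear automatically
CREATE TABLE conditions_ts (
    time        TIMESTAMPTZ NOT NULL,
    device_id   TEXT,
    temperature DOUBLE PRECISION
);
SELECT create_hypertable('conditions_ts', 'time');
```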

- What we find is that in many sensor settings, at least when deployed at
centralized collection points, you'll actually place your DB/storage
infrastructure behind a front end anyway (web interface, MQTT, SSL termination
point, etc.), and then batch data through a data pipeline like Kafka. This
means that by the time the data hits TimescaleDB, you can perform larger batch
inserts (even 10K rows / batch), which is how a single-node TimescaleDB can
support 100,000s of rows/sec (millions of metrics/sec). More benchmarks:
https://blog.timescale.com/timescaledb-vs-6a696248104e
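
As a sketch of what that batching looks like at the SQL level (hypothetical
`readings` table and values), many rows go into a single INSERT statement,
amortizing per-statement overhead:

```sql
-- One statement carrying many rows; real pipelines batch
-- up to ~10K rows per INSERT
INSERT INTO readings (time, device_id, temperature, humidity) VALUES
    ('2018-01-24 10:00:00+00', 'dev1', 68.5, 0.45),
    ('2018-01-24 10:00:01+00', 'dev2', 70.2, 0.41),
    ('2018-01-24 10:00:02+00', 'dev3', 69.9, 0.43);
```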

- So while this isn't the native I/O speed of Linux, it's not just about raw
write rate; you also get the other things that are critical for a robust
data product: reliability, flexible and optimized query performance, etc.

- Scaling _out_ OLTP workloads has in general been challenging, but there are
some nice characteristics of time-series workloads that make this more
tractable. We're working on scale-out clustering right now, and hope to be
able to announce some things later this year.

- We fully integrate with PostGIS and other Postgres extensions. Some of our
users and customers are very much employing TimescaleDB + PostGIS for
temporal+spatial analysis: doing things like asset tracking or logistics over
time.
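
A hedged sketch of what such a temporal+spatial query could look like,
assuming a hypothetical `locations` hypertable with a PostGIS `geom` geometry
column (WGS84): count readings per hour within 5 km of a point.

```sql
-- Combine time_bucket (TimescaleDB) with ST_DWithin (PostGIS);
-- casting to geography makes the 5000 a distance in meters
SELECT time_bucket('1 hour', time) AS hour,
       COUNT(*) AS readings_nearby
FROM locations
WHERE ST_DWithin(
          geom::geography,
          ST_SetSRID(ST_MakePoint(-74.0060, 40.7128), 4326)::geography,
          5000)
GROUP BY hour
ORDER BY hour;
```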

------
jurgenwerk
We recently integrated the TimescaleDB extension into our PostgreSQL database
at [https://nightwatch.io](https://nightwatch.io). For performance reasons, we
decided to implement time-based table partitioning. We evaluated native
PG 10 partitioning vs. TimescaleDB, and concluded the latter would spare us a
lot of work, maintenance, and headaches. We were right: we got table
partitioning basically "for free", query performance improved, and there is no
partitioning maintenance involved; everything is automatic behind the scenes
with their concept of hypertables. Support is really good too; they are very
helpful on the Timescale Slack channel.

~~~
tda
I had the same experience. I manually partitioned a pg10 db and it was quite a
lot of work to make sure I had all the partitions I needed. Then I switched to
timescale and it was just a breeze. I could partition a dataset both in
time and in foreign_key with a single command, and performance (especially
huge inserts) jumped up. Or to be more accurate, the performance degradation
of my table, which made each subsequent insert a bit slower than the previous
one, was completely gone. All in all, very happy with timescaledb.
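
The single command mentioned above might look something like this sketch
(hypothetical `measurements` table; the device column here stands in for the
foreign-key "space" dimension):

```sql
-- Partition by time and additionally by device_id into 4 space
-- partitions, all in one call
SELECT create_hypertable('measurements', 'time', 'device_id', 4);
```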

Congrats with the funding!

~~~
akulkarni
Thank you!

------
doh
I'm wondering if there is a way to use Timescale in combination with Citus.

We are well adjusted to Citus, as we've been using it for almost 2 years now.
For most data it's great, but we do have some time-series data where we have
to do a lot manually to achieve desirable results. I'm wondering if there is a
way to take advantage of both, but I'm not sure how the query distribution
would work when combining the two.

~~~
craigkerstiens
Craig from Citus here. It's a question we get quite often. Unfortunately, I'm
actually not sure. There is a chance they would play just fine together, and
from a quick glance they likely would, but we've not explored this ourselves
yet. Since Citus would be running on the coordinator, you'd place timescale on
each distributed node, and at that point what Citus passes down is standard
SQL, which timescale can act on. All that said, we've not heavily tested it
ourselves, and there would be some open questions on designing the right
experience.

We do have a number of users that are using Citus alongside the native time
partitioning that was released in Postgres 10 with good results.

Edit: We also had this blog post in our pipeline that speaks to time
partitioning with Citus which just went live today -
https://www.citusdata.com/blog/2018/01/24/citus-and-pg-partman-creating-a-scalable-time-series-database-on-PostgreSQL/

------
pjungwir
I set up a time-series Postgres 9.6 database last year with traditional table
partitioning (based on entity id, not time). I looked at TimescaleDB but it
was about a month old at the time so I wasn't comfortable recommending it.
Also I got the impression that it was little more than a wrapper around
traditional partitioning, just to make it easier to use. Looking at their
Github now it seems to be more than that though. I will have to give it a
closer look! But also you might want to clarify your literature about what's
going on. Is it something like cstore_fdw with its own files (but not built on
FDW)?

I have a Postgres extension of my own for fast timeseries inserts that gives
you back the data as an array
([https://github.com/pjungwir/floatfile](https://github.com/pjungwir/floatfile)),
but it is much less developed and comes with a lot of caveats. My own approach
is very simple: just make a file of 8-byte floats and keep appending to it. Is
timescaledb something like that (but better)? One of the drawbacks with my
approach is that you have to re-implement a lot of things (like histograms).
Even though you say you support "all of SQL", it looks like you are similarly
re-implementing things that would otherwise be slow (like histograms).

Also, I see a lot about insert performance, but I keep getting projects where
inserts aren't challenging but getting fast analytic queries is. I'd love to
see some examples/numbers there.

I really like the idea of keeping timeseries data quasi-connected to your main
RDBMS, so I'm thrilled to see successful projects in that space. It can
definitely be challenging. I wonder if something like influx_fdw is possible
and would make sense? (I've never used InfluxDB.)

~~~
mfreed
Thanks for the comments; it sounds like you want a deeper dive into some of
the design decisions around TimescaleDB.

A useful place to start is two of our technical blog posts: on how we designed
internal partitioning [1], and on how this compares against declarative
partitioning in PG10 [2].

We aren't rewriting the lowest storage layers like cstore_fdw does, so we can
continue to use Postgres' heavily-tested storage layers, as well as work
with its existing streaming replication and recovery mechanisms. We also
aren't packing individual writes into files (like it sounds you are doing with
floatfile), arrays (like in tgres), or other TOASTed data structures, although
we might be compatible. We certainly do support the full range of types
(arrays, JSON/B, etc.) in our data rows.

Fully agree that projects differ in their needs for insert vs. query
performance, but in addition to much higher insert rates, our general data
model and aggressive constraint exclusion across chunks provide meaningful
time-based query improvements. We also have all sorts of planner
optimizations; e.g., we taught the planner that querying for data by time
rounded to some time bucket preserves the sort on time, which is useful for
all kinds of ORDER BY/GROUP BY queries. We've published various benchmarks
around this [3].
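
As an illustration of the kind of query that optimization helps (hypothetical
`readings` hypertable), grouping and ordering by a time bucket can reuse the
underlying sort on time rather than re-sorting:

```sql
-- Latest twelve 5-minute averages per device; ORDER BY bucket can
-- piggyback on the existing time ordering
SELECT time_bucket('5 minutes', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM readings
GROUP BY bucket, device_id
ORDER BY bucket DESC
LIMIT 12;
```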

Finally, there are a lot of changes needed to "just make things work", which
also includes support for constraints, foreign keys, upserts, triggers, etc.,
which enable capabilities like data retention and continuous aggregation
policies.

[1] https://blog.timescale.com/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c

[2] https://blog.timescale.com/time-series-data-postgresql-10-vs-timescaledb-816ee808bac5#88f3

[3] https://blog.timescale.com/timescaledb-vs-6a696248104e

~~~
bsg75
To confirm, Timescale does not currently support upserts? INSERT ON CONFLICT
... ?

~~~
RobAtticus
We do support UPSERTs: http://docs.timescale.com/v0.8/using-timescaledb/writing-data#upsert

There is one limitation mentioned there ("TimescaleDB does not yet support
using ON CONFLICT ON CONSTRAINT with a named key"), and a tip that mentions
requirements for the UNIQUE constraint, but these are not usually
dealbreakers.
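
A minimal sketch of an upsert against a hypothetical `readings` hypertable;
note the unique index backing ON CONFLICT must include the time column:

```sql
-- Unique index including the time (partitioning) column
CREATE UNIQUE INDEX ON readings (time, device_id);

-- Insert, or update the temperature if that (time, device) row exists
INSERT INTO readings (time, device_id, temperature)
VALUES ('2018-01-24 10:00:00+00', 'dev1', 70.1)
ON CONFLICT (time, device_id)
DO UPDATE SET temperature = EXCLUDED.temperature;
```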

------
kalugen
How does it compare to other time-series DBs? I've used InfluxDB, Prometheus,
Graphite, Elasticsearch, and even Splunk.

I mean in terms of complexity, manageability, performance, scalability...

~~~
jmngomes
Also curious about this, and would add OpenTSDB to that list.

~~~
akulkarni
At a high level, we've taken a unique approach to the time-series data
problem. While every other time-series DB has taken a non-relational approach
(which also means no or poor SQL support), we've built ours on Postgres. As a
result, not only do we support full SQL (JOINs, complex WHEREs/GROUP BYs,
window functions, etc.), but we also inherit the reliability and vast
ecosystem of Postgres.

At a more detailed level, comparisons between TimescaleDB and others vary
depending on the database. Typically, though, if you are most concerned with
data compression or simple column scans, some of the other options might be
more suitable than ours. But otherwise you should see similar (if not better)
insert and query performance, while maintaining all the other benefits of a
relational DB / Postgres.

And with your relational data living alongside time-series data, many people
have been able to simplify their stack (from multiple dbs into one db), which
makes things easier to manage operationally and at the application level.

Re: OpenTSDB, we hear that managing an HDFS cluster / Zookeeper / etc can be a
headache. You obviously don't have that problem with TimescaleDB.

Re: Prometheus, I should add that we've open-sourced `pg_prometheus` [1],
which allows you to write Prometheus data into Postgres / TimescaleDB. So with
regards to Prometheus, we are more of a complement than a replacement.

[1] https://github.com/timescale/pg_prometheus

------
cstanley
"We realized that this problem needed a new kind of database, and so we built
TimescaleDB: the first open-source time-series database designed for
scalability and complex analysis."

I'm not a DB expert, but what have financial firms been using for decades?
Likely dealing with much larger securities datasets...

Or, is it just that TimescaleDB is 'open-source'...

~~~
stalller
Yes, they have. kdb+ has been the standard-bearer within finance for more
real-time trading applications. Still, q as a query language tends to keep
much of the organization from having access to the data.

TimescaleDB offers full-SQL at scale, opening the database up to more people.
It's used to power longer-term trading platforms (for e.g., commodities
trading) and interactive analytical dashboards (for e.g., research analysts).
We also see applications in compliance and risk management.

And yes, TimescaleDB is open-source. So we see TimescaleDB as complementary to
kdb+, which has been around for much longer as a proprietary system, is
certainly full-featured and powerful, but is used more in real-time trading
applications.

------
petetnt
Oh how I wish AWS RDS would add TimescaleDB as an extension. Not sure if we
could migrate at this point (we are using InfluxDB for time series and metrics
and Postgres for related relational data), but if TimescaleDB were available
on RDS I'd seriously consider it, as it would simplify our approach tenfold.
Congrats on the funding!

~~~
RobAtticus
Thanks! We're continuing to try to get that accomplished ourselves. It always
helps for them to hear from potential users that there is demand for it.

We have an issue with instructions tracking this:
[https://github.com/timescale/timescaledb/issues/65](https://github.com/timescale/timescaledb/issues/65)

~~~
petetnt
I already sent them a message back in May, I guess it wouldn't hurt to send
another one too :)

------
akulkarni
Hi all - Very excited to announce the news, and we're here in case there are
any questions. Thanks!

~~~
chuhnk
Curious to know who is currently using timescaleDB? I can't find anything on
the website.

Congrats on the funding.

~~~
stalller
Thanks for asking. For some of the larger companies, it takes a little bit of
time to work with their marketing / legal teams :)

We do list some of them within the blog post:

- STE Energy, to back operational dashboards used to monitor 47 power plants
in Italy, Albania, Colombia, Perú, Honduras, Costa Rica, and Chile, replacing
Redis

- InTraffic, which monitors all of the roughly 5,000 remote controllable
railroad switches of the Dutch railways, for use with Tableau as a Business
Intelligence solution

- One of the largest American telecom companies, for storing server health
data

- A top financial services firm, to replace Oracle for powering interactive
analytics of market data

- A public lighting and semiconductor company, for deployment in a
small-footprint IoT edge device collecting and analyzing sensor readings

------
emourujarvi
Thanks! I didn't know time-series databases existed. I had to rummage through
examples to understand what it even means because it wasn't clearly explained
in the FAQ. This makes things so much easier.

~~~
stalller
Hi, thanks for the feedback on our FAQ. We did write a blog post that goes
into what time-series data is and why purpose-built time-series databases are
necessary.

Please read it here: https://blog.timescale.com/what-the-heck-is-time-series-data-and-why-do-i-need-a-time-series-database-dcf3b1b18563

------
Serow225
Have you guys had any interest from the existing SCADA process historian
vendors (Wonderware Historian, OSISoft PI, etc) or historian users about using
Timescale?

~~~
snowwindwaves
I think the SCADA vendors have already solved the same set of problems that
timescale solves.

Edit: pretty sure the SCADA vendors partition the data into tables so each
table holds a day or week of data. They do rollups on the data, so the detail
is available if you do a query with a 1-minute timespan, but the data is
already aggregated into representative values for a 1-minute interval if the
query timespan is 1 day.
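
That rollup scheme could be sketched in plain SQL roughly like this
(hypothetical `readings` and `readings_1min` tables, not any vendor's actual
implementation):

```sql
-- Materialize 1-minute aggregates for yesterday's data;
-- the raw detail rows stay in 'readings' untouched
INSERT INTO readings_1min (minute, device_id, avg_temp, max_temp)
SELECT time_bucket('1 minute', time) AS minute,
       device_id,
       avg(temperature),
       max(temperature)
FROM readings
WHERE time >= now() - interval '1 day'
GROUP BY minute, device_id;
```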

Sounds like timescale's clustering story might be better, but with OSI or
Wonderware your clustering story is MS SQL or Oracle.

~~~
wenc
Except that most process historians store data in a non-relational way which
makes it difficult to integrate with other types of data.

Process historians, like most NoSQL time-series offerings, are traditionally
optimized for fast data retrieval (for HMIs, real-time monitoring dashboards,
etc.) but are really weak when it comes to analytics. There are new products
like Seeq (a kind of a Tableau for process historians) that attempt to address
this, but somehow miss the fact that for analytics, you really need your data
to exist in a dataframe-like shape.

Today, if you want to do any kind of non-trivial analytics, you typically have
to first ETL data from the historian to a relational database via REST or OLE
DB API. TimescaleDB on the other hand already stores data in a native
relational format and lets you run full SQL on it.

------
cevian
One more thing: if this is the kind of thing that excites you, please come
join us: http://www.timescale.com/careers.
We have open positions in engineering (core database, R&D, customer success,
and frontend) as well as content marketing, evangelism, recruitment, and
partnerships.

------
thebiglebrewski
Cool. So where's the value here? How are you making money? I didn't understand
that from the first 2 or 3 paragraphs. Seems like you have a lot of GitHub
stars, but how about revenue?

~~~
akulkarni
We're following the classic "open core" model (similar to MongoDB, Elastic,
Confluent), and we already have customers paying for features related to
enterprise-grade reliability, performance, and support. We'll have more to
share in the upcoming months.

~~~
thebiglebrewski
Nice thanks for the response!

------
georgewfraser
What is the difference between TimescaleDB and a conventional column-store
data warehouse with a time partition key?

~~~
RobAtticus
We touched on this a bit elsewhere when someone asked about InfluxDB. One of
the biggest advantages is that by being built on Postgres, we are able to
support the full range of SQL/RDBMS features, including secondary indices,
complex WHERE predicates, CTEs, and more. Time-series data can also be easily
stored alongside relational data, which helps simplify your stack.

~~~
georgewfraser
Column store data warehouses have all those features, except indices which
don’t make sense in the context of a columnar architecture. It seems to me
that if you create a table in a conventional data warehouse with a partition
key on a time column, you have all the advantages of TimescaleDB, plus all the
advantages of an MPP columnar data warehouse. Am I missing something? Why
should anyone use TimescaleDB instead of just a specific configuration of a
data warehouse?

~~~
cevian
A lot of time-series data has complex relations and cross-correlations that
make it unsuitable for column stores. Take for example a device that reports
temperature and humidity. You may want to enforce foreign-key relations
between devices and readings, or you may want to do analysis for devices in a
particular location (which may be part of the readings or devices table);
secondary indexes help here. Or you may want easy access to all data where
temperature is above 90 degrees (partial indexes help here). You often want
this in a real-time dashboard, where the kind of latencies you see with MPP
are problematic. Pretty much, the columnar/MPP stuff is good for simple
aggregates for ad-hoc, non-real-time queries, but TimescaleDB is geared toward
more complex analysis to power dashboards.
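
A minimal sketch of the schema features described above (hypothetical
`devices` and `readings` tables):

```sql
-- Metadata table plus a readings hypertable with a foreign key to it
CREATE TABLE devices (id TEXT PRIMARY KEY, location TEXT);
CREATE TABLE readings (
    time        TIMESTAMPTZ NOT NULL,
    device_id   TEXT REFERENCES devices (id),  -- FK to device metadata
    temperature DOUBLE PRECISION
);
SELECT create_hypertable('readings', 'time');

-- Secondary index for per-device analysis, and a partial index for the
-- "above 90 degrees" case
CREATE INDEX ON readings (device_id, time DESC);
CREATE INDEX ON readings (time DESC) WHERE temperature > 90;
```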

Plus, when you know which particular aggregates you want to regularly query,
TimescaleDB makes it easy to perform continuous roll-ups that you can store in
a separate (hyper)table.

~~~
ddorian43
See Clickhouse for an `oltp` columnar store (built for powering dashboards).

~~~
cevian
Clickhouse is a very cool project but is more analytical than OLTP. Percona
had a nice writeup [1] (the comparison to MySQL is especially apt; note the
lack of real-time updates and deletes in Clickhouse).

[1] https://www.percona.com/blog/2017/02/13/clickhouse-new-opensource-columnar-database/

------
kalugen
How does it compare to other time-series databases? I mean, things like
InfluxDB, Prometheus, Graphite, etc.

