
Multi-node TimescaleDB is now free - manigandham
https://blog.timescale.com/blog/multi-node-petabyte-scale-time-series-database-postgresql-free-tsdb/
======
EvanAnderson
I really appreciate that the linked article uses the phrasing "source-
available", the lower case "free", and doesn't use the phrase "open source".
Terminology matters a lot.

For me, a lot of the value in Free software comes from being able to make
modifications to the software (either yourself, or by hiring others), and
generally being in control of your own "software destiny".

With that in mind, I think it's important to call attention to this license's
prohibition of running modified versions in production. This prohibition
applies regardless of your modifications being distributed (and in fact, later
in the license, distribution of modifications is expressly prohibited as
well):

 _Clause 2.1 (d): "A license to prepare, compile, and test Derivative Works of
the TSL Licensed Software Source Code solely in a Non-Production Environment
..._

I've often pined for visibility into the source code of proprietary software
that I use. I suppose this is a "win" for TimescaleDB in my mind over source-
unavailable proprietary software. In the end, however, this license means it's
still just proprietary software.

~~~
akulkarni
Thanks for drawing attention to Clause 2.1 (d).

The original intent of that clause was to avoid us needing to support modified
versions that were deployed to production. (Note: We provide _a lot_ of free
support in our 4000+ member Slack channel [0].)

But that clause was written 1.5 years ago, and a lot has changed since then.
There’s actually an internal debate right now on whether we need to keep it.
So thank you and HN for spurring this discussion!

[0] [https://slack.timescale.com/](https://slack.timescale.com/)

~~~
detaro
_If_ you intend to change it to allow running open-sourced changes, you might
consider allowing changes submitted to you privately too, for vulnerability
reports.

~~~
akulkarni
Thanks for the input, will bring it back to the team for discussion.

(Note: I really appreciate getting this kind of feedback openly from the
community, so thank you :-)

~~~
jkaplowitz
One possible approach compatible with true software freedom and the usual
definition of open source is not to restrict use of modified versions of the
code, but instead to use naming to distinguish between the two, and only
support the unmodified version.

For example, the code build system could have variables for the name and maybe
the logo and other trademark/brand-ish things, and the public codebase could
be configured by default to call itself Timescale Community DB or Timescale
Custom DB or some other name instead of TimescaleDB. Your private build would
simply substitute the json file with those data values and maybe point to
logos that aren't in the repo instead of generic ones that are, or something
similar to that.
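
A minimal sketch of that build-time substitution, in Python. The file name `product.json`, the keys `name` and `logo`, and the default values are all hypothetical, chosen only to illustrate the idea; nothing here reflects how Timescale or VS Code actually build their releases.

```python
import json
from pathlib import Path

# Hypothetical defaults that would live in the public repo: a generic name and logo.
DEFAULT_BRANDING = {"name": "Timescale Community DB", "logo": "assets/generic-logo.svg"}


def load_branding(path=None):
    """Return branding values, letting an optional JSON file override the defaults."""
    branding = dict(DEFAULT_BRANDING)
    if path is not None and Path(path).is_file():
        overrides = json.loads(Path(path).read_text(encoding="utf-8"))
        # Only accept keys the build knows about; ignore anything else.
        branding.update({k: v for k, v in overrides.items() if k in DEFAULT_BRANDING})
    return branding


def version_banner(branding, version):
    """Format the string a binary might print for --version."""
    return f"{branding['name']} {version}"
```

With no override file, `version_banner(load_branding(), "2.0")` yields the community name; the vendor's private build would pass its own JSON file to get the trademarked name.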

You'd also have the option to use any mixture of trademark law or copyright
conditions to restrict the commercial version's name and branding assets.

All of the options I described above are used in reality by various projects
out there. For example, the git repository for VS Code OSS has a product.json
file with most of the customization points (not all) that MS changes in
building their supported VS Code release, TeX and Red Hat apply naming
restrictions, and Red Hat also has rules in their support contract.

~~~
akulkarni
It's an interesting idea. We'll consider it. Thanks!

------
eloff
This is an interesting window into their business model. This could be a
purely altruistic decision, which businesses sometimes make, contrary to
popular belief. More likely it's a bet that wider adoption from making the
clustered version free will drive more revenue through their managed database
as a service offering. Which shows that their non-OSI open-source license is
actually leading to more code and features being available free and (mostly)
open-source. As opposed to gating features for paying customers.

I think we're too hung up on OSI open source licenses. The additional
restriction in the timescaledb license that you can't run a paid database as a
service offering affects hardly anyone negatively (AWS). It affects us all
positively by providing a sustainable business model to support additional
development and support of an open-source product we use. Win-win if ever
there was one. I'd like to see more open-source and closed-source companies
consider this model.

~~~
akulkarni
(Timescale CEO and post author)

You are spot on. Before the Timescale License, we were left with a tough
decision: do we open-source a feature so that everyone can have it for free OR
do we close a feature so that the mega-clouds don't have access to it?

We didn't like either of those options, which is why we created the Timescale
License, which allows us to offer capabilities for free (and make the source
code available) to everyone except the cloud providers (ie free for 99.9999%
of all users).

We find that this has resulted in a mutually beneficial outcome for ourselves
and our users.

    
    
      "I think we're too hung up on OSI open source licenses. The additional restriction in the timescaledb license
      that you can't run a paid database as a service offering affects hardly anyone negatively (AWS).
      It affects us all positively by providing a sustainable business model to support additional development
      and support of an open-source product we use. Win-win if ever there was one. I'd like to see more open-source
      and closed-source companies consider this model."
    

^ Really well put.

~~~
jcims
I'm a huge fan of Splunk but always want to keep my eye open for alternatives.
My use case is mostly security analytics against event content and patterns,
and for that the Splunk Processing Language is very well suited.

That said, I find it's fairly tedious to do a lot of time-series analysis and
pattern discovery/anomaly detection across rich event models (think AWS
CloudTrail events).

Anything TimescaleDB can help with here? Are there case studies you can point
us to? It feels like there is probably a home for both just in my domain and
quite obviously in the broader context of large enterprise ops/security.

~~~
GordonS
I use TimescaleDB for mass storage and query of security events (up to 100s of
millions) - the speed of queries and aggregate queries even on a single node
is very impressive.

I haven't done anything with regards to anomaly/trend detection yet, but it's
planned. Not really sure where you see a database (TimescaleDB) fitting into
that though?

~~~
jcims
We're in that scale domain where everything is a pain in the ass but not
obviously outside the scope of commercial solutions. I just checked and we're
averaging ~500k events per second in the five areas I'm interested in.

I feel that we could probably use a time-series database to reflect our
streams as 'last observed state' type collections as well as do the
aggregations that we need to feed back into anomaly detection.

I'd like to also use something like that to create a 'heat map service' where
you can feed a property/window/range and get back a scalar for color coding and
possibly a slice of values for a sparkline-type UI.

Without getting hands on, though, it's hard to say for sure.

~~~
akulkarni
@jcims I'm really interested to see if we can help. If you're open to
discussing, please feel free to email me: ajay (at) timescale.com

~~~
jcims
It wouldn't be me reaching out but I'll put a bug in the right person's ear.
This has been something I've been thinking about for a bit, the HN post is
just a bit serendipitous.

~~~
akulkarni
Sounds good, thanks!

------
akulkarni
Hi, I authored this post, but the credit really goes to the Timescale database
team.

Multi-node TimescaleDB is the result of a massive amount of engineering effort
over two years, as can be seen in this +67,000 line PR:
[https://github.com/timescale/timescaledb/pull/1923](https://github.com/timescale/timescaledb/pull/1923)

We're thrilled to make this free so that more developers can use it.

~~~
tpetry
Am I the only one who thinks it is really cool that this will be free but is
hesitant to use it?

I have seen so many distributed data stores fail in a multitude of ways that
I just don't trust anyone anymore. After 2-3 years they may have ironed out
most bugs and I can evaluate again whether I trust their implementation to
store my data safely.

~~~
akulkarni
This is why we built this on top of Postgres. It allows us to inherit Postgres
reliability.

While I can't guarantee there won't be bugs ;-), we have found that building
on Postgres has enabled a much higher level of reliability than other time-
series databases.

------
gen220
I work with a lot of time series tables in Postgres, albeit not at the scale
that this targets. (some millions of rows, distributed sparsely over time, on
which the median insert/update size is <10, but with some tail-end
inserts/updates touching >200k rows).

I like the concepts behind TimescaleDB, and understand the value it's adding to
vanilla Postgres. We have our own implementation at my company and it's quite
good for our purposes, but it would certainly struggle at TDB's targeted
scale.

As I understand it (correct me if I'm wrong, this is my impression from the
marketing page), TimescaleDB is "more than an Extension" to Postgres, because
it rewrites some of the Postgres internals (query parser, etc)?

If this is true, I'm curious, was it not possible to package the same results
into an extension? What was the decision process like? Could the concept not
be upstreamed into Postgres? I'm relatively ignorant of this side of the
community, so please forgive me if this question is naive.

Finally, if it is "more than an extension", does this imply that TimescaleDB
is a fork of Postgres, with all the risks to adoption that entails?

~~~
akulkarni
TimescaleDB is packaged as a Postgres extension. The "more than an extension"
is meant to highlight that TimescaleDB makes changes and adds capabilities far
beyond what the typical extension does.

------
speedgoose
>All of these capabilities are being released under the Timescale License, our
source-available license that permits broad usage, except for where
organizations are providing TimescaleDB-as-a-service.

So it's not open-source because AWS hasn't been nice with ElasticSearch and
they don't want to be in the same situation?

~~~
mrkurt
That sounds like open source to me, I bet they're just being really
conservative about saying "open source" because there's been so much backlash
at MongoDB/Cockroach/etc for similar restrictions.

It's restricted OSS because AWS takes things, runs them, and eats up all the
potential revenue.

~~~
eitland
> That sounds like open source to me, I bet they're just being really
> conservative about saying "open source" because there's been so much
> backlash at MongoDB/Cockroach/etc for similar restrictions.

Open Source has a very defined meaning. Please read up on the history of open
source and source available licenses before saying it is all the same.

We've been defending it against a number of attacks and we will probably do it
again, so please don't get on the wrong side of history ;-)

Note: this is not a criticism of Timescale. I can see what they did and
respectfully did not pretend it was Open Source. Compared to a proprietary
license their license opens a lot of possibilities.

~~~
xyzzy_plugh
Frankly the pedantry around the definition of Open Source, which I understand,
is incredibly nauseating. Sure, this isn't Open Source by "the definition",
but it's close enough if you squint. The difference doesn't impact almost
anyone. Are you or someone you love impacted by this licensing decision?

Throw in an expiry date, dual licensing (pay to play seems more than fair) and
I'm content. History be damned.

I'm so sick of it. Just because one group defines it a certain way doesn't
make it gospel. The OSI has no power over me.

~~~
__s
Yep. I've been putting code out for free (MIT) for over a decade. I'm an open
source developer. But I'll decide what gatekeeping I participate in for myself
thank you

Maybe we just need to have "little O" open source. Unless someone's saying
"Open Source" don't get in on splitting these hairs

~~~
eeZah7Ux
I'd rather have "big F" Free Software: something that ensures that the many
efforts I put into development end up benefiting the end users.

Right now middlemen are making billions and end users get vendor lock-in.

------
ttsda
A few days ago I went to your site to check if the distributed stuff had
arrived, good to see it's here! You're making incredible software.

------
petr25102018
Why should someone use TimescaleDB over ClickHouse for time-series/analytics
workloads?

~~~
valyala
If you use PostgreSQL, then it feels natural to add TimescaleDB extension and
start storing time series or analytical data there alongside other relational
data.

If you need to store trillions of rows effectively and perform real-time
OLAP queries over billions of rows, then it is better to use ClickHouse [1],
since it requires 10x-100x less compute resources (mostly CPU, disk IO and
storage space) than PostgreSQL for such workloads.

If you need to store and query large amounts of time series data effectively,
then take a look at VictoriaMetrics [2]. It is built on ideas from ClickHouse,
but it is optimized solely for time series workloads. It has comparable
performance to ClickHouse, while being easier to set up and manage than
ClickHouse. It also supports MetricsQL [3], a query language that is much
easier to use than SQL when dealing with time series data. MetricsQL
is based on PromQL [4] from Prometheus.

[1] [https://clickhouse.tech/](https://clickhouse.tech/)

[2]
[https://github.com/VictoriaMetrics/VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics)

[3]
[https://github.com/VictoriaMetrics/VictoriaMetrics/wiki/Metr...](https://github.com/VictoriaMetrics/VictoriaMetrics/wiki/MetricsQL)

[4] [https://medium.com/@valyala/promql-tutorial-for-
beginners-9a...](https://medium.com/@valyala/promql-tutorial-for-
beginners-9ab455142085)

~~~
quade1664
We spent about 6 months looking at pretty much every database tech on the
market; Cockroach, ClickHouse, Influx, VoltDB, MemSQL etc. were top contenders.
There was an outdated article on medium.com (by VictoriaMetrics) which
slammed TimescaleDB for its disk usage. We did not realise it was biased, so
we dropped TSDB from the list. But then we saw an email about their
compression, segmented by device_id, and gave it a shot. We implemented it,
and 5 months after our production release we now have outstanding performance
and compression (95x). We are planning to move the rest of our databases to
TSDB now, as it ticks our boxes: our use case is HTAP, not solely OLAP or OLTP.

I'm super excited about this news, but TSDB, please work on allowing us to put
data over 1 year old on separate slow-disk servers, so we can keep the hot
stuff on the NVMe servers. Once you get this sorted it will be the perfect fit
for us.

~~~
akulkarni
Glad to hear it is working out for you! I'll relay the request re: old data.
But please also feel free to email me directly at ajay (at) timescale.com (or
email support (at) timescale.com) if you have any follow up questions /
requests.

~~~
akulkarni
Good news: TimescaleDB already offers this feature. Feel free to ping us at
support (at) timescale.com and we can walk you through it. Thanks!

------
PaulWaldman
I understand that the Timescale license can't be utilized by cloud providers,
but what about others who need a timeseries database for their SaaS offering?
Is this permitted as long as you aren't marketing a hosted TimescaleDB
solution?

Edit: wording

~~~
akulkarni
That would be permitted, as long as the service isn't just a "TimescaleDB-as-
a-service." [0]

For example, if the service allowed users to only make DML changes (access /
modify data) then it is ok, but DDL changes (creating / modifying database
schemas) are not permitted.
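
One way a service might enforce that split technically is to screen statements before they reach the database. The following is a deliberately naive Python sketch: the keyword lists and the function names are assumptions made for illustration, it is not a SQL parser, and it is no substitute for reading the license or for using database roles and privileges.

```python
# Leading keywords treated as DML (data access / modification) in this sketch.
DML_KEYWORDS = {"SELECT", "INSERT", "UPDATE", "DELETE"}
# Leading keywords treated as DDL (schema definition / modification) in this sketch.
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP"}


def classify(statement):
    """Classify one SQL statement by its first keyword: 'dml', 'ddl', or 'unknown'."""
    words = statement.strip().split(None, 1)
    if not words:
        return "unknown"
    keyword = words[0].upper()
    if keyword in DML_KEYWORDS:
        return "dml"
    if keyword in DDL_KEYWORDS:
        return "ddl"
    return "unknown"


def is_allowed_for_end_user(statement):
    """Allow only a single statement that is clearly DML; reject everything else."""
    body = statement.strip().rstrip(";")
    if ";" in body:
        # Conservatively refuse anything that looks like several statements.
        return False
    return classify(body) == "dml"
```

The check errs on the side of rejection: anything it does not recognize (including valid queries that start with `WITH`) is refused rather than passed through.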

In fact, we already have 100s of SaaS companies using TimescaleDB as part of
their offering.

[0]
[https://www.timescale.com/legal/licenses](https://www.timescale.com/legal/licenses)

~~~
teraflop
More specifically, the text of the license says you can't offer any service
that is "primarily a database storage or operations product", even one that
doesn't allow schema modifications.

If that wasn't what you intended to prohibit, you should probably fix the
wording of section 3.21(i).

------
ksec
Correct me if I am wrong.

Timescale DB Core ( if there is such a thing ) is still available under Apache
2.0. So nothing has changed. You can use it just like any other open source
project with no restriction.

Timescale DB multi-node, originally not free and only available in Timescale
Cloud, is now freely available under the Timescale License, a source-available
license.

Timescale DB multi-node and its license only forbid you to provide TimescaleDB
" _multi-node_ " itself as a service, and do not allow running it with _any_
changes that are not upstreamed. You can still resell any software or services
built _on top_ of Timescale DB multi-node.

Again, correct me if I am wrong.

~~~
akulkarni
Almost!

Yes, TimescaleDB "core" \- still Apache 2.0

TimescaleDB multi-node - was never before released, is now released for free
under the Timescale License, a source-available license

There are other capabilities (e.g., gap-filling) that are also under the
Timescale License, in addition to multi-node.

The Timescale License prevents "TimescaleDB-as-a-service" usage.

You can still run software / services on top of Timescale Licensed software,
as long as you are not offering "TimescaleDB-as-a-service".

The Timescale License currently prevents running any modifications in
production, but we are actively debating removing that restriction (as I
mention elsewhere).

Hope this helps.

------
heipei
Does TimescaleDB support automated downsampling using various functions
(min/max/mean/avg) and then during querying automatically picking the correct
downsampled data? This is the biggest issue that I and others have with
InfluxDB, that it doesn't do that, so the only convenient way to use it is
just to expire all data outside the retention policy. Ticket here:
[https://github.com/influxdata/influxdb/issues/7198](https://github.com/influxdata/influxdb/issues/7198)

~~~
atanasovskib
I think what you are referring to is the TimescaleDB real-time aggregates
[https://docs.timescale.com/latest/using-
timescaledb/continuo...](https://docs.timescale.com/latest/using-
timescaledb/continuous-aggregates#real-time-aggregates)

It allows you to define aggregations that are automatically used when querying
the raw table if the query matches, and it also allows you to drop the raw
data with a retention policy but keep the aggregated form
([https://docs.timescale.com/latest/using-
timescaledb/continuo...](https://docs.timescale.com/latest/using-
timescaledb/continuous-aggregates#dropping-data))

~~~
heipei
OK, but it looks like I still have to define these aggregates manually. I was
really more talking about the standard use-case that folks used to use
Graphite / rrdtool for: Keep track of real-time high-fidelity metrics while
still being able to query aggressively-downsampled historical data for
comparison, and doing so without having to configure anything.

~~~
mfreed
Hi @heipei -- one thing to observe is that Graphite & rrdtool are designed for
a specific monitoring use case, while TimescaleDB is a more general-purpose
time-series database.

So what that means is that TimescaleDB has mechanisms to make it really easy
to define downsampling (continuous aggregates, data retention policies), and
even have queries that transparently query _across_ the historical aggregates
and new raw data (real-time aggregates, which parent pointed to, which isn't
supported by InfluxDB).
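
To make the idea concrete, here is a small conceptual model in Python of querying across stored aggregates and newer raw data. It is only an illustration of the concept, not TimescaleDB's implementation: the function names, the numeric timestamps, and the bucket-aligned `watermark` are all assumptions of this sketch.

```python
from collections import defaultdict


def bucket_start(ts, width):
    """Floor a numeric timestamp to the start of its bucket."""
    return ts - (ts % width)


def aggregate(rows, width):
    """Average (timestamp, value) rows per bucket; returns {bucket_start: mean}."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in rows:
        acc = sums[bucket_start(ts, width)]
        acc[0] += value
        acc[1] += 1
    return {b: total / count for b, (total, count) in sums.items()}


def realtime_view(materialized, raw_rows, watermark, width):
    """Combine stored buckets before the watermark with buckets computed on the fly.

    materialized: {bucket_start: mean} already computed for buckets < watermark
    raw_rows:     (timestamp, value) rows still present in the raw table
    watermark:    bucket-aligned timestamp up to which aggregates are stored
    """
    recent = [(ts, v) for ts, v in raw_rows if ts >= watermark]
    combined = {b: m for b, m in materialized.items() if b < watermark}
    combined.update(aggregate(recent, width))
    return dict(sorted(combined.items()))
```

Because raw rows older than the watermark are never consulted, they can be dropped by a retention policy without changing the result, which is the property described above.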

What the database _by itself_ doesn't do is automatically create certain
continuous aggregates on metrics immediately, because frankly, users' needs
vary so much.

That said, we have built stacks/solutions that leverage TimescaleDB and do
precisely that. For example, we just released a design doc and beta around our
refreshed native integration with Prometheus, that addresses an extremely
similar use case to Graphite / rrdtool. Because now this is automated, it
defines many of these things out-of-the-box, so you don't need to configure
anything. Check it out and input welcome!

[https://tsdb.co/prom-design-doc](https://tsdb.co/prom-design-doc)

~~~
heipei
Thanks for the pointer. I truly understand that TimescaleDB is a general-
purpose time-series DB and I understand that most use-cases are unique in that
it makes sense to make these decisions about what and how to downsample
consciously. However, I feel that there is a large audience of people who
"just" want a database that they can point their system-metrics collector at
(Telegraf), point their dashboard at (Grafana) and just hit "go", much like
they would with something like Datadog, and have the confidence that they can still
scale the database if it's ever necessary. Much like ElasticSearch provides
default mappings (text/keyword/date/number), this would be a great 80-20 solution
for the default use-case of "I want to collect system metrics from my hundreds
of servers and have a few sensible defaults about granularity, downsampling
and data-retention, and only then will I start to worry about whether that
data will eventually exceed my one-server deployment."

~~~
mfreed
Yep, that's exactly what the "Timescale Observability" stack is about. Type
"helm install", and a full stack is spun-up and auto-configures to scrape
information. You have graphs up in Grafana within 2 minutes, zero
configuration.

\- See [https://github.com/timescale/timescale-
observability](https://github.com/timescale/timescale-observability)

\- Or join the #prometheus channel at
[https://slack.timescale.com](https://slack.timescale.com)

------
X6S1x6Okd1st
>67k line PR

Man I'm glad I don't have to review that

~~~
RobAtticus
To be clear, it was developed in a branch so all the individual commits have
been reviewed beforehand when landing on that branch. And the branch was
rebased throughout the development cycle. This is just the final PR to merge
that branch back into master :)

------
jakaroo
How does the multi-node version work with data compression compared to the
single-node version?

I like how on a single-node I can utilize data compression and get a 95%
storage saving.

~~~
mfreed
In the current version, you can execute `compress_chunks` on each of the data
nodes and enjoy those same savings (and will work transparently with queries,
as before).

In subsequent releases, we'll add full support of compression, e.g., just
create a compression policy on the access node and you are off and running.

~~~
jakaroo
Sounds great. So I just manually execute this `compress_chunks` command once
on each data node and then I have compression enabled forever on those nodes?

~~~
mfreed
Not yet, I should have been clearer:

`compress_chunk` operates on a single chunk, so the way to define "compress all
chunks older than 1 week" is:

    
    
       SELECT compress_chunk(i) FROM show_chunks('conditions', older_than => INTERVAL '1 week') i;
    

[https://docs.timescale.com/latest/using-
timescaledb/compress...](https://docs.timescale.com/latest/using-
timescaledb/compression#manual-compression)

So you'd need to set up a cron job that runs that script every night or
something...at least until we release compression policy support.
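
A rough sketch of what such a nightly script could look like in Python. The function names are hypothetical, the connection is assumed to be a DB-API connection whose cursors support the `with` statement (as psycopg2's do), and the script would have to be run against each data node separately, as described above.

```python
import re

# Accept only simple identifiers so the table name can be inlined safely.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Accept only intervals such as "1 week" or "30 days".
_INTERVAL = re.compile(r"\d+ (hour|day|week|month)s?")


def build_compress_sql(hypertable, older_than):
    """Build the manual-compression statement for chunks older than the interval."""
    if not _IDENTIFIER.fullmatch(hypertable):
        raise ValueError(f"unexpected hypertable name: {hypertable!r}")
    if not _INTERVAL.fullmatch(older_than):
        raise ValueError(f"unexpected interval: {older_than!r}")
    return (
        "SELECT compress_chunk(i) "
        f"FROM show_chunks('{hypertable}', older_than => INTERVAL '{older_than}') i;"
    )


def compress_old_chunks(conn, hypertable="conditions", older_than="1 week"):
    """Run the statement on an already-open connection and commit."""
    with conn.cursor() as cur:
        cur.execute(build_compress_sql(hypertable, older_than))
    conn.commit()
```

Whether re-running this over chunks that are already compressed raises an error depends on the TimescaleDB version, so check the `compress_chunk` documentation for your release before scheduling it.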

------
malisper
> All of these capabilities are being released under the Timescale License,
> our source-available license that permits broad usage, except for where
> organizations are providing TimescaleDB-as-a-service.

Maybe someone can give clarification on this, but the line between using
TimescaleDB to build a product and providing TimescaleDB-as-a-service seems
incredibly blurry. If I have a product that in some way lets you query time
series data, and that product is powered by a TimescaleDB, would that count as
providing TimescaleDB-as-a-service?

I used to work for Heap which is an analytics tool. In a way you can view Heap
as just a wrapper around Postgres. We stored event data in Postgres and
provided a UI that allowed you to express queries (e.g. count the number of
logins over the past month). We would take the query in the UI, compile it
into a SQL query, and run the SQL against Postgres. If Heap was powered by
TimescaleDB, would that violate the Timescale License? In fact, you could
technically view any dashboarding product that queries TimescaleDB as
providing "TimescaleDB-as-a-service".
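
As a rough illustration of that "UI request compiled to SQL" pattern, here is a Python sketch. The `events` table, its `name` and `time` columns, and the function name are hypothetical and say nothing about Heap's actual schema or code; the `%s` placeholders assume a driver with psycopg2-style parameters.

```python
def compile_count_query(event_name, days):
    """Compile a 'count events over the past N days' UI request to SQL plus parameters."""
    if days <= 0:
        raise ValueError("days must be positive")
    sql = (
        "SELECT count(*) FROM events "
        "WHERE name = %s AND time > now() - %s * INTERVAL '1 day'"
    )
    # Values travel as parameters; the end user never supplies SQL text.
    return sql, (event_name, days)
```

For example, `compile_count_query("login", 30)` returns the query text together with `("login", 30)` as parameters. The end user picks an event and a time range, but the query shape and the schema stay under the product's control.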

I looked at the actual license[0] to see what it says, and it seems really
unclear. The license gives you permission to use TimescaleDB to develop "Value
Added Products or Services" which it defines as a product that uses
TimescaleDB as part of a larger offering. One of the requirements for a
product or service to be considered "Value Added" is:

> (ii) such value-added products or services add substantial value of a
> different nature to the time-series database storage and operations afforded
> by the Timescale Software and are the key functions upon which such products
> or services are offered and marketed

This seems incredibly vague. What exactly does "substantial value of a
different nature" mean? In the end, tons of products are just wrappers around
DBs. If products like Heap or Datadog were to be backed by TimescaleDB, would
they add "substantial value of a different nature" on top of it? In the end,
Heap and Datadog are products designed for querying time series data. I could
definitely make a case that they don't provide value of a different nature
from TimescaleDB. This vagueness seems like a huge risk and without further
clarification, makes me want to stay far away from TimescaleDB.

[0]
[https://github.com/timescale/timescaledb/blob/master/tsl/LIC...](https://github.com/timescale/timescaledb/blob/master/tsl/LICENSE-
TIMESCALE)

~~~
mfreed
Hi @malisper, we totally appreciate concerns around potential uncertainty about
what a "Value Added Service" means.

In fact, when we were looking at Timescale licensing, we took a careful look at
what a lot of similar companies' licenses did here (Confluent, Redis, etc),
and what later became the Polyform License. Most of them left this definition
pretty vague -- because frankly, legal language is never as precise (and
perhaps shouldn't be) as what an engineer may like.

We went a step further, and tried to define this more precisely about what it
means to "offer" TimescaleDB:

    
    
          (iii) users of such Value Added Products or Services are prohibited,
          either contractually or technically, from defining, redefining, or
          modifying the database schema or other structural aspects of database
          objects, such as through use of the Timescale Data Definition Interfaces,
          in a Timescale Database utilized by such Value Added Products or
          Services.
    

[def]
[https://github.com/timescale/timescaledb/blob/master/tsl/LIC...](https://github.com/timescale/timescaledb/blob/master/tsl/LICENSE-
TIMESCALE#L287)

What that means is that if you've defined the Heap schema, you have built the
indexes and tables, and then are offering a SaaS product on this, you're fine:

\- You are offering a product/marketing SaaS service around usage/product
analytics, not a time-series-database-as-a-service

\- You are not approaching the market and saying, "Here's how to get
TimescaleDB-as-a-service" (unlike, say, Managed TimescaleDB running on
Rackspace or Digital Ocean), you are saying "Here's a full Product/Marketing
Analytics Solution".

\- You are not giving your users direct/psql access to the raw database to
define their tables/schemas/indexes and otherwise just treat that service as a
hosted TimescaleDB instance.

I hope that helps!

~~~
malisper
> We went a step further, and tried to define this more precisely about what
> it means to "offer" TimescaleDB

I don't understand how the bit you posted helped make things more concrete?
Section 3.21, the section you referenced lists three conditions, all of which
have to be true for your product to be considered "Value Added". I agree the
third condition, the one you quoted, is pretty clear. But the second
condition, the one I quoted seems really vague so the definition of "Value
Added" as a whole becomes really vague.

> What that means is that if you've defined the Heap schema, you have built
> the indexes and tables, and then are offering a SaaS product on this, you're
> fine.

FWIW, Heap would automatically create new tables for customers as they sign up
and would also automatically create new indexes for customers as needed. For
that reason alone, I'm pretty sure Heap would violate the Timescale license.

I agree that it's pretty difficult to be specific about what "value added"
means. I'm not sure what the right solution is. I would still want to go over
the Timescale License with an IP lawyer pretty thoroughly before I were
to use TimescaleDB.

~~~
mfreed
> FWIW, Heap would automatically create new tables for customers as they sign
> up and would also automatically create new indexes for customers as needed.
> For that reason alone, I'm pretty sure Heap would violate the Timescale
> license.

Nope! The user doesn't define or control that those tables and indexes are
created. I.e., the user, through the Heap UI, doesn't say: I want a table with
this schema and I want to create an index on (event_id, timestamp).

------
hagen1778
A free multi-node TSDB solution sounds cool! I wonder if anyone has tried using
TimescaleDB as remote storage for some heavily loaded Prometheus [1] setups.

[1] [https://prometheus.io/](https://prometheus.io/)

~~~
sevagh
I've had luck with [https://thanos.io/](https://thanos.io/) for a big (~1
billion timeseries across all our DCs) Prometheus scale out project.
Horizontally sharded Prometheus that can be queried and alerted on in a
unified view with object store backend.

~~~
deepsun
I remember being very impressed with numbers from the following tweet
[https://twitter.com/this_is_tckb/status/1256649880434606080](https://twitter.com/this_is_tckb/status/1256649880434606080).

I'm wondering what is the cost of your setup to handle billions of timeseries?

------
thesausageking
What's the difference between Timescale's license ("free to everyone except
cloud providers") and GPLv3?

~~~
teraflop
The differences are pretty substantial.

The GPL puts no restrictions whatsoever on how you can _use_ software that
falls under it. Timescale's license, on the other hand, gives you very limited
usage rights. You can use _unmodified_ versions of the software, but you can't
allow clients to make schema changes, nor can you use it to provide any
service that is "primarily [a] database storage or operations product or
service".

In addition, Timescale's license is much more restrictive about allowing
derivative works. The GPL lets you create modified versions and/or reuse code
in other products, no matter how extensive your changes, as long as the
results are also GPL-licensed. Timescale's license lets you create modified
versions, but you're not allowed to:

* make any changes that bypass "usage restrictions"

* use your changes in production

* distribute your changes in any way, except for assigning all the rights back to Timescale

~~~
ensignavenger
Changes you make to GPL software only have to be provided under the GPL if you
redistribute the work to others- if you keep it to yourself, run it yourself,
etc, you are not required to release it as GPL.

~~~
StavrosK
Aren't you required to release it even if you run it yourself with GPLv3? Or
am I thinking of another version?

~~~
mfreed
You are thinking of the AGPLv3. Not to be confused with the GPLv3 =)

[https://www.gnu.org/licenses/agpl-3.0.en.html](https://www.gnu.org/licenses/agpl-3.0.en.html)

"It has one added requirement: if you run a modified program on a server and
let other users communicate with it there, your server must also allow them to
download the source code corresponding to the modified version running there."

~~~
ensignavenger
As the term you cited says, the AGPL only requires distribution if you are
providing it to someone else, including as a service over a network. If you use
it yourself, and don't distribute it (including over a network service) to
anyone else, you still don't have to share your source.

------
anonymousDan
I think this is a good move. But from an enforcement perspective, how
realistic is it to prevent someone like Amazon from offering a clone service
(at least for backend components) and claiming they wrote it from scratch? Is
there any way to force them to reveal the source for a particular service?

~~~
mrlala
I mean... do you think any major provider like Amazon is going to just
blatantly rip it off and sell it, saying "We created this!"?

That's just not going to happen.

------
psankar
Are there any tools to migrate from elasticsearch to timescale? We are
considering a switch from our es and are evaluating options. Timescale is
also one of the contenders. We are not looking for text search, just some
nested queries on time-series data.

~~~
tylerfontaine
Disclosure: I work for Timescale, previously worked for Elastic

Pretty much any ETL tool you like could do this, as long as it speaks to
elasticsearch and postgres.

Logstash (if you're using the ELK stack) can write to CSV or other formats as
well as do any processing, but it doesn't have a JDBC output plugin, so you'd
have to ingest with something else. Conversely, fluentd for example can output
to Postgres, but doesn't have an elasticsearch input (at least that I could
find), so you'd have to export from es with something else.

So it might be a couple of steps, though there are rich clients for most major
programming languages for both elasticsearch and postgres. If your schema is
fairly simple, this might not be too bad to roll your own.

That said, the hardest part is likely massaging your data, if your
elasticsearch schema is complex. Because you have to totally denormalize things
for es (generally), you might have to unravel some of that going back into a
relational database.
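
As a rough illustration of the "unraveling" step, here is a minimal Python sketch that splits one denormalized document into a parent row and a list of child rows, ready to be inserted into two relational tables. The document shape and the field names (`order_id`, `customer`, `items`) are made up for the example, and nothing here talks to elasticsearch or postgres; it only shows the reshaping.

```python
from typing import Any, Dict, List, Tuple


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Collapse nested objects into one level with underscore-joined keys."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def split_document(
    doc: Dict[str, Any], child_field: str, key_field: str
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Split a denormalized document into a parent row and child rows.

    Each child row carries the parent's key so the two tables can be joined.
    """
    parent = flatten({k: v for k, v in doc.items() if k != child_field})
    children = [
        {key_field: doc[key_field], **flatten(child)}
        for child in doc.get(child_field, [])
    ]
    return parent, children


# Hypothetical document, shaped the way a nested es mapping might return it.
doc = {
    "order_id": 7,
    "ts": "2020-10-29T12:00:00Z",
    "customer": {"id": 3, "name": "Ada"},
    "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}],
}
parent_row, item_rows = split_document(doc, child_field="items", key_field="order_id")
```

The parent row would go into one table and the item rows into another, joined on `order_id`.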

------
rb808
Can someone please summarize what it does, because I couldn't figure it out
from the website? It says it's "on Postgres". Is it a flavor of PG, or does it
sit on top of multiple PG instances?

~~~
akulkarni
How's this:

TimescaleDB is a distributed time-series database that is packaged as a
Postgres extension (a "mega-extension" to quote someone else on this thread).

TimescaleDB:

* Scales to over 10 million metrics per second [0]

* Supports native compression, using delta-delta, Gorilla, Simple-8B RLE, and other best-in-class compression algorithms (achieving a median 94% compression based on user data) [1]

* Offers native time-series capabilities, such as data retention policies, continuous aggregate views, real-time aggregates, downsampling, data gap-filling, and interpolation

* Handles high cardinality [2]

* Outperforms non-relational databases including InfluxDB [3], Mongo [4], and Cassandra [5] for time-series data
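
To give a feel for why the delta-delta encoding named in the list above compresses time-series so well, here is a minimal Python sketch of the general idea. It is not TimescaleDB's implementation: a real encoder would also pack the small residuals into a few bits each, which this sketch leaves out.

```python
from typing import List


def delta_of_delta_encode(values: List[int]) -> List[int]:
    """Store the first value, then the change in the difference between values."""
    if not values:
        return []
    encoded = [values[0]]
    prev_value = values[0]
    prev_delta = 0
    for value in values[1:]:
        delta = value - prev_value
        encoded.append(delta - prev_delta)
        prev_value, prev_delta = value, delta
    return encoded


def delta_of_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_of_delta_encode."""
    if not encoded:
        return []
    values = [encoded[0]]
    prev_delta = 0
    for dod in encoded[1:]:
        prev_delta += dod
        values.append(values[-1] + prev_delta)
    return values


# Timestamps arriving roughly every 10 seconds, with one second of jitter.
timestamps = [1000, 1010, 1020, 1030, 1041]
encoded = delta_of_delta_encode(timestamps)  # [1000, 10, 0, 0, 1]
```

For regularly spaced timestamps almost every encoded entry is zero or close to it, and that is what makes the later bit-packing step effective.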

With TimescaleDB you also get all of the goodness that is built into Postgres:
full SQL, a variety of data types (numerics, text, arrays, JSON, booleans),
ACID semantics, and operationally mature capabilities including high-
availability, streaming backups, upgrades over time, roles and permissions,
and security.

[0] [https://blog.timescale.com/blog/building-a-distributed-
time-...](https://blog.timescale.com/blog/building-a-distributed-time-series-
database-on-postgresql/)

[1] [https://blog.timescale.com/blog/building-columnar-
compressio...](https://blog.timescale.com/blog/building-columnar-compression-
in-a-row-oriented-database/)

[2] [https://blog.timescale.com/blog/what-is-high-cardinality-
how...](https://blog.timescale.com/blog/what-is-high-cardinality-how-do-time-
series-databases-influxdb-timescaledb-compare/)

[3] [https://blog.timescale.com/blog/timescaledb-vs-influxdb-
for-...](https://blog.timescale.com/blog/timescaledb-vs-influxdb-for-time-
series-data-timescale-influx-sql-nosql-36489299877/)

[4] [https://blog.timescale.com/blog/how-to-store-time-series-
dat...](https://blog.timescale.com/blog/how-to-store-time-series-data-mongodb-
vs-timescaledb-postgresql-a73939734016/)

[5] [https://blog.timescale.com/blog/time-series-data-
cassandra-v...](https://blog.timescale.com/blog/time-series-data-cassandra-vs-
timescaledb-postgresql-7c2cc50a89ce/)

------
gigatexal
How does TimescaleDB work as a traditional OLTP db? Can I run general
analytical queries on it and leverage its distributed nature? Or is it better
for single table append only workloads?

~~~
atanasovskib
Hypertables and Distributed Hypertables can be used to store any kind of
data, but they work best when the data has a monotonically increasing
partitioning key (e.g. time), a high ingest load, and few data modifications
(preferably in bulk).

The beauty of TimescaleDB being built on Postgres is you can have your regular
Postgres tables (OLTP schema) and time-series data (Hypertables) live side by
side. Use 1 language (1 mindset) to query them, join them, work with them as
you see fit. With Distributed Hypertables (what the post is about) you can now
partition your data to live across multiple servers, and still use your 1
mindset to query all that data.
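
For intuition only, here is a toy Python sketch of what partitioning by time and then spreading across servers can look like. The 7-day interval, the node names, and the hash-based placement are assumptions made up for the example; this is not how TimescaleDB itself places chunks.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

CHUNK_INTERVAL = timedelta(days=7)  # hypothetical chunk width
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_bucket_start(ts: datetime) -> datetime:
    """Return the start of the fixed-width time interval that contains ts."""
    intervals = (ts - EPOCH) // CHUNK_INTERVAL
    return EPOCH + intervals * CHUNK_INTERVAL


def pick_node(series_key: str, nodes: List[str]) -> str:
    """Pick a node from a stable hash of the series key."""
    digest = hashlib.sha256(series_key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def route(ts: datetime, series_key: str, nodes: List[str]) -> Tuple[datetime, str]:
    """Map a row to (chunk start, node): time decides the chunk, the key decides the node."""
    return time_bucket_start(ts), pick_node(series_key, nodes)


nodes = ["node-a", "node-b", "node-c"]  # hypothetical data nodes
ts = datetime(2020, 10, 29, 12, 0, tzinfo=timezone.utc)
chunk_start, node = route(ts, "device-42", nodes)
```

Because time only ever increases, new rows land in the newest chunk, which is why append-heavy workloads suit this layout.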

edit: With the preferred workload you get the most out of TimescaleDB's
advanced features like compression, continuous aggregates and data retention
policies. You can use the aggregates to build complex auto-updating
materialized views that are automatically used even when you query the raw
tables ([https://docs.timescale.com/latest/using-
timescaledb/continuo...](https://docs.timescale.com/latest/using-
timescaledb/continuous-aggregates#real-time-aggregates))

~~~
gigatexal
This sounds like a perfect fit for a write-only event log table we stored in
postgres at a previous employer. I pushed to move it to BigQuery but this
sounds like it would have been fine.

~~~
valyala
There is a more cost-effective alternative to BigQuery for storing and
analyzing large volumes of logs: LogHouse [1], which is built on ClickHouse.

[1] [https://github.com/flant/loghouse](https://github.com/flant/loghouse)

------
shay_ker
I saw that TimescaleDB is mostly C, like other PG extensions. Have you all put
any thought into using Rust? Just curious about why or why not.

~~~
k-rus
TimescaleDB makes heavy use of the PostgreSQL API and hooks, which expose many
data structures, macros and functions. My understanding is that using Rust or
even C++ would require writing a large FFI layer and maintaining it across PG
major versions, which are released every year. Also, just having an FFI is
unlikely to be enough: it would take wrappers on top of it to get the best of
Rust, rather than just another syntax on top of C.

------
jakaroo
How does the multi-node version handle high availability and automatic
failover?

Are those included or are they paid add-ons?

~~~
mfreed
Short answer is: The 2.0 release won't natively support automated failover,
although you can build around that using PG tools like physical replication +
Patroni. But these capabilities are certainly things we are working on.

Per the PR notes:

    
    
      The current implementation has many more limitations 
      that will be addressed over time:
    
      - HA and replication has to be managed node-by-node. 
        This will be improved with native replication.

------
tornato7
Awesome. I just hope that one day Amazon supports it on RDS. I do know that
Digital Ocean does!

~~~
mfreed
An important clarification is that Azure, Digital Ocean, Rackspace (Object
Rocket), Alibaba Cloud -- which all support managed TimescaleDB today -- only
offer the Apache-2 version of TimescaleDB.

Many of the more advanced features of TimescaleDB, including this distributed
option, are released under the _Timescale License_.

All code under the Timescale License is also source available, and people are
free to use it, incorporate it into their commercial SaaS services, distribute
it, etc., with the _primary_ limitation applying if you are offering
TimescaleDB as a hosted DBaaS (like RDS, Azure Postgres, etc.)

Instead, Timescale Cloud is the place to get TimescaleDB advanced features as
a fully managed DBaaS.

[https://www.timescale.com/products/features](https://www.timescale.com/products/features)

------
dhkl
Very happy to see Timescale making more features available in the community
edition.

When we first started evaluating time series databases a month or two ago,
some features like continuous aggregation (rollups) were enterprise only.
Perhaps their strategy is to drive adoption by letting people try their
features out, hoping that some of these adopters will end up using their
managed solution. I checked their pricing, and the delta between their pricing
and the underlying AWS instance seems quite reasonable.

We ended up testing Influx first, because it seems to be a safe choice with
wider adoption and extensive documentation.

With Influx, it was very easy to put together a prototype quickly. But once we
started throwing some real workload at it, it would lose writes under load.
But it makes sense that it failed, because according to Influx's documentation
([https://docs.influxdata.com/influxdb/v1.8/guides/hardware_si...](https://docs.influxdata.com/influxdb/v1.8/guides/hardware_sizing/)),
we would need a cluster to make it work. Influx is very transparent in their
documentation that writes and queries will fail immediately when a server is
unavailable without a cluster.

This isn't to say that Influx wouldn't work for other use cases. But at least
in our use case, their open source offering isn't suitable for us, and it's
unclear how much better the cluster version is.

Timescale, on the other hand, was able to handle the same workload under
stress. As we are unable to backfill some of the ingressing data, it's quite
vital that the system can degrade more gracefully.

For my use case, one feature that still needs some work in Timescale is their
real time aggregation. It is currently impossible to define a rollup on top of
another rollup, which means that if you are ingesting a lot of data into the
raw table, and you downsample into a wide time bucket (e.g. a day or a week),
queries against these wider buckets will potentially end up having to query
a lot of data points, slowing the system down considerably. Granted, it is a
new feature that was released only about a month ago. Hopefully, with multi-
node nearing completion, continuous aggregation will get a bit more love.

I spoke with their engineers about this over Slack, and their suggestion was
to manually modify the rollup materialized view to aggregate over a
combination of the materialized buckets (currently handled by the continuous
aggregation) + real time aggregation from a higher resolution bucket.
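
To make that workaround concrete, here is a small Python sketch of the idea: build the coarse (daily) buckets from already-materialized fine (hourly) buckets, plus an on-the-fly aggregate of whatever raw points have not been materialized yet. In practice this would be written as SQL views; the function names and the bucket widths are assumptions for the example, not anything from Timescale's API. Storing (sum, count) instead of an average is what lets the buckets be re-aggregated correctly.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Agg = Tuple[float, int]  # (sum, count), so averages can be recomputed later


def rollup(points: Iterable[Tuple[int, float]], width: int) -> Dict[int, Agg]:
    """Aggregate raw (timestamp, value) points into buckets of the given width."""
    buckets: Dict[int, Agg] = defaultdict(lambda: (0.0, 0))
    for ts, value in points:
        start = ts - ts % width
        total, count = buckets[start]
        buckets[start] = (total + value, count + 1)
    return dict(buckets)


def rollup_of_rollup(fine: Dict[int, Agg], width: int) -> Dict[int, Agg]:
    """Aggregate fine buckets into coarser ones without touching raw points."""
    coarse: Dict[int, Agg] = defaultdict(lambda: (0.0, 0))
    for start, (total, count) in fine.items():
        coarse_start = start - start % width
        t, c = coarse[coarse_start]
        coarse[coarse_start] = (t + total, c + count)
    return dict(coarse)


def combined_view(
    materialized_hourly: Dict[int, Agg],
    raw_tail: Iterable[Tuple[int, float]],
    hour: int = 3600,
    day: int = 86400,
) -> Dict[int, Agg]:
    """Daily buckets from materialized hourly buckets plus not-yet-materialized raw points."""
    merged = dict(materialized_hourly)
    for start, (total, count) in rollup(raw_tail, hour).items():
        t, c = merged.get(start, (0.0, 0))
        merged[start] = (t + total, c + count)
    return rollup_of_rollup(merged, day)


hourly = rollup([(0, 1.0), (10, 3.0), (3600, 5.0)], 3600)
daily = combined_view(hourly, raw_tail=[(7200, 2.0), (86400, 10.0)])
```

The daily query then reads a handful of hourly buckets plus a short raw tail, rather than every raw data point in the day.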

We are still testing out Timescale, of course. But so far, it's been holding
up its end of the bargain. The fact that Timescale is "just an extension"
built for Postgres also makes it a less risky choice and offers a lot of
flexibility; if Timescale doesn't work out, we could still work with Postgres,
and that IMHO is a very nice thing.

~~~
akulkarni
Thanks for the feedback. Really glad to hear that your experience is going
well with TimescaleDB! Feel free to ping me directly ajay (at) timescale.com
if there's anything I can do to help.

------
EGreg
So does this mean anyone can join this giant snowball?

Like IPFS or MaidSAFE or Dat or Bittorrent?

------
xyst
If this is just postgresql, why would I use this implementation over
postgresql?

It also seems like it scales linearly with some decrease in ROI after 12
nodes.

~~~
edoceo
It's not "just" PG, it's like a mega-extension with data-types and tuple-
layout and a tonne of magic for the data-domain.

You can, of course, make similar models with plain PG - in the same way one
does GIS without PostGIS.

~~~
akulkarni
"mega-extension" <-- I like that!

------
valyala
Multi-node TimescaleDB is a great contribution to the open source world!

BTW, it would be great to compare multi-node TimescaleDB to the VictoriaMetrics
cluster [1], which is licensed under the vanilla Apache2 open source license [2].

[1]
[https://github.com/VictoriaMetrics/VictoriaMetrics/blob/clus...](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/cluster/README.md)

[2]
[https://github.com/VictoriaMetrics/VictoriaMetrics/blob/clus...](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/cluster/LICENSE)

