

Ask HN: Is there any demand for time-series data mining? - chaotic-good

Hi HN!

I'm building a time-series database. There seems to be some demand for solutions that store metrics and draw graphs (something like graphite or influxdb).

My db is different. I'm trying to create a database that can be used to mine time-series data efficiently (at low cost and with low effort). It will be able to perform similarity search (given one time series, find the most similar one), motif search, clustering and so on. The functionality is very similar to jMotif, only implemented as a distributed database. No graphing and no interaction with graphite so far! :)

Is there any demand for such a thing in the software industry?
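
For concreteness, similarity search in its simplest form could be sketched as below. This is only an illustration under stated assumptions: z-normalized Euclidean distance as the similarity measure, a brute-force scan over equal-length series, and hypothetical names (`znorm`, `distance`, `most_similar`, `stored`). It says nothing about how the database described above, or jMotif, actually implements it.

```python
import math

def znorm(series):
    """Rescale a series to zero mean and unit variance (constant series map to zeros)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in series]

def distance(a, b):
    """Euclidean distance between two series after z-normalization.

    Assumes both series have the same length.
    """
    za, zb = znorm(a), znorm(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))

def most_similar(query, candidates):
    """Return the name of the stored series closest to the query (brute-force scan)."""
    return min(candidates, key=lambda name: distance(query, candidates[name]))

# Hypothetical stored series, keyed by name.
stored = {
    "rising": [1, 2, 3, 4, 5],
    "falling": [5, 4, 3, 2, 1],
    "flat": [3, 3, 3, 3, 3],
}

# The query has a different scale but the same shape as "rising".
print(most_similar([10, 20, 30, 40, 50], stored))
```

Z-normalizing first makes the comparison about shape rather than absolute level. A real system would replace the linear scan with an index, which is where most of the engineering effort would go.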
======
liori
Lots. If you could make one, it would sell quite well. This leads to the
question… why has nobody done so earlier?

The problem is that everyone's problem is different. "Time Series" is a very
broad topic; you apply widely different algorithms to time series related to
social activities (e.g. energy usage over the year; lots of seasonality,
anomalies), financial activities (e.g. stock prices; modeling various market
limitations), machine-generated events (e.g. server logs; text parsing,
usually simple statistics), scientific experiments (e.g. intensity of an
observation in a short time span; very specialized algorithms for basically
any experiment). Moreover, even in a single class of data, most actually good
ML algorithms require heavy tuning in terms of setting parameters, evaluating
performance, etc. There are algorithms that don't have many knobs to tune, but
their performance is often subpar too.

In the end, you'd end up with either a product that is so generic that it
doesn't do anything well, a product that basically only stores data and does
simple statistics (influxdb, druid…), or a specialized product for a specific
market (there are already lots of them, e.g. for server logs there's Splunk,
which is basically `grep` on steroids).

I'm a pessimist here, but only because I'm actually working with time series
data in two of those settings (server logs, scientific experiments), and had
to evaluate what's on the market (I use a commercial package for the first
one, and I ended up writing my own scripting in `R` for the second because,
seriously, there's no software package that would be powerful enough to have
all the knobs I need, yet be simple enough to use). I'd love to see actually
featureful time series software, but I fear that starting from a generic
"Time Series" database will bring nothing new to the market.

