
Show HN: Unplugg: An automated Forecasting API for timeseries data - mgontav
http://unplu.gg/test_api.html
======
syntaxing
I see from the earlier comments that this runs some sort of "optimized" ARIMA
model. Is there any way to output the statistical information of the fitted
model through the API?

~~~
mgontav
Not really. We're directing this more towards a completely automated
forecasting use-case with no human interaction, so it's not in our plans to
release the internal parameterization of the forecast.

~~~
syntaxing
Hmm, I think at the minimum, the variance of the forecasted results should be
obtainable. I'm not sure how many people would use a black box model without
knowing some sort of performance/statistical confidence metric.

~~~
mgontav
You're right, we could return a confidence interval for the forecasted
values; it could help in cases where the API is used for simple anomaly
detection (and it would give a greater sense of security/control over using
the forecast).

I do believe we have a similar feature in the development pipeline; I'll make
sure to push it forward. Thanks for the feedback.
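For illustration only (this is not the unplugg API, and the numbers are made up): the usual normal-approximation interval around a point forecast would look like this.

```python
# Illustrative sketch: a 95% interval around a point forecast, assuming
# roughly normal forecast errors with a known standard error.
def interval(point, stderr, z=1.96):
    """Return the (lower, upper) bounds of a z-based confidence interval."""
    return point - z * stderr, point + z * stderr

# Hypothetical forecasted value and standard error.
lo, hi = interval(63.42, 2.0)
print(lo, hi)
```

The API would only need to return the `stderr` (or the bounds directly) alongside each forecasted value.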

~~~
blparker
I think a confidence interval would be crucial for the forecasted values.

------
y7
Why is there no mention of the model you use?

~~~
mgontav
Our goal is to keep it as simple as we can, keeping the worries about model
selection and tuning on our side as much as possible; therefore, we don't go
into much detail about those.

I can share that our platform is built on top of ARIMA models, with a lot of
pre-processing work done up front to automatically figure out the best
parameters to use, as well as a lot of prior hand-tweaking done by ourselves
in-house on different datasets (we started out tuning it for forecasting
energy consumption, but found that the resulting models performed well enough
to warrant testing in other domains).
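To give a flavor of what automated parameter selection means in general (a toy sketch, not our production code, using a plain AR model instead of full ARIMA):

```python
# Toy sketch of automated order selection: fit AR(p) models by ordinary
# least squares and keep the order with the lowest AIC.
import math
import numpy as np

def fit_ar(series, p):
    """OLS fit of an AR(p) model with intercept; returns (params, rss)."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    y = series[p:]
    # Column k holds the series lagged by k+1 steps; last column is the intercept.
    X = np.column_stack(
        [series[p - k - 1 : n - k - 1] for k in range(p)] + [np.ones(n - p)]
    )
    params, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ params) ** 2))
    return params, rss

def pick_order(series, max_p=4):
    """Pick the AR order with the lowest AIC over 1..max_p."""
    best_p, best_aic = None, math.inf
    for p in range(1, max_p + 1):
        _, rss = fit_ar(series, p)
        n_eff = len(series) - p
        score = n_eff * math.log(rss / n_eff) + 2 * (p + 1)
        if score < best_aic:
            best_p, best_aic = p, score
    return best_p
```

The real search also has to pick differencing and moving-average terms, which is where the in-house tweaking comes in.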

Right now we're opening it up for testing to get more feedback on its
performance, so feel free to shoot any more questions or feedback.

------
ashnyc
@mgontav: we are building an internal ERP for our manufacturing business. We
use our sales data to try to predict what our future sales will look like,
and to produce what we think we are going to be selling in the next few
months. Right now, if we sell 3 items a day, we just do straight math and
assume we will sell 3×10 = 30 items in 10 days. I would like to talk to you
and see how your service can help us.

~~~
mgontav
Sure thing, shoot me an email at mgontav@unplu.gg and we'll see how we can
help you out.

~~~
ashnyc
Sent you an email. Thanks

------
keredson
    { "timestamp": 1458000000, "value": 63.422235 },

dear lord, why? this reminds me of the old "xml binary format" joke:

    <byte> <bit>0</bit> <bit>0</bit> <bit>1</bit> <bit>0</bit> <bit>0</bit>
    <bit>1</bit> <bit>0</bit> <bit>0</bit> </byte>

~~~
watty
I don't get the snark - what's the glaring problem with the format? I work
with sensor data at my job and very rarely is it uniformly distributed so we
use a similar format.

~~~
keredson
Because it's horribly inefficient. It's using 49 bytes to encode 8 bytes worth
of data. If your data set is a few hundred observations this likely doesn't
matter. But most users of timeseries data have millions or billions. (I come
from a computational finance background.)

Even if they were wedded to JSON for some reason, they could have just used a
list of observations, like:

    [1458000000, 63.422235],

That would have cut their data costs in half.

Or just use one of the many existing formats for transmitting time series
data. It's not a new topic.
[https://github.com/mobileink/data.frame/wiki/What-is-a-Data-Frame%3F](https://github.com/mobileink/data.frame/wiki/What-is-a-Data-Frame%3F)
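The overhead is easy to quantify (a quick sketch with made-up observations):

```python
# Serialize the same 1,000 observations both ways and compare sizes.
import json

obs = [(1458000000 + 3600 * i, 63.422235 + i) for i in range(1000)]

# Per-observation objects with repeated keys, as in the API.
verbose = json.dumps([{"timestamp": t, "value": v} for t, v in obs])
# List-of-lists, with no repeated keys.
compact = json.dumps([[t, v] for t, v in obs])

print(len(verbose), len(compact))  # compact is roughly half the size
```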

~~~
watty
This is an API for very small datasets (daily time series data). The goal
should be accessibility and readability over saving a few bytes.

I'm not saying it's ideal, I just think the snark is unwarranted considering
how common it is. I just checked InfluxDB and they follow a similar model
(even more verbose).
[https://docs.influxdata.com/influxdb/v1.2/guides/querying_da...](https://docs.influxdata.com/influxdb/v1.2/guides/querying_data/)

Checked a few more and I believe they're the same - Microsoft IoT, Predix
(GE), etc.

~~~
keredson
that example you give does not follow a similar model. it defines the columns
once (not repeated w/ every observation):

    
    
    "columns": [
        "time",
        "value"
    ],
    

and then the observations as a list of lists:

    
    
    "values": [
        [
            "2015-01-29T21:55:43.702900257Z",
            2
        ],
        [
            "2015-01-29T21:55:43.702900257Z",
            0.55
        ],
    

exactly as i suggested in the "even if they were wedded to JSON for some
reason" section of my original explanation.

------
dardien
This is interesting, but if I give you, for example, 12 months of evenly
distributed data, how far into the future will it be able to forecast?

~~~
mgontav
We let you specify the limit of the forecasting period, so you can experiment
with that.

However, due to how we model the forecast, it isn't realistic to expect ultra-
long term predictions, as eventually the forecast will revert to the mean of
the series.

On a more practical note, we have seen good results with forecast windows
between 1/8 and 1/4 the size of the historic data given. So, in your case,
you could expect between 1.5 and 3 months of forecast.
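As a quick sanity check of that rule of thumb (illustrative numbers only):

```python
# Rule of thumb: forecast window between 1/8 and 1/4 of the history length.
def forecast_window(history_len):
    """Return the (shortest, longest) reasonable forecast window."""
    return history_len / 8, history_len / 4

print(forecast_window(12))  # 12 months of history → (1.5, 3.0)
```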

------
tommynicholas
Very cool, something I would definitely use in a project I'm working on if it
were a package I could install, and something I would probably use "as is" in
many other instances.

~~~
mgontav
Hey, you can still try it out as is: just request an API key and start using
it. If you're worried about sending data out to unknown servers, I can assure
you we keep absolutely no information about the data we receive for
processing. Our initial use-case of energy consumption forecasting demanded
this kind of data policy, and we're sticking to it.

~~~
tommynicholas
I will definitely try it out, thanks!

~~~
mgontav
Cool! Get back to us with feedback on how it went, we're looking for as much
input as possible at this point.

------
GistNoesis
Is this somehow connected to facebook Prophet? How do you compare?

~~~
mgontav
We've been expecting this question all day long.

As fate would have it, we have no connection to FB's Prophet - we at
Whitesmith have been working on unplugg for some time now, and decided a few
weeks ago that this week we'd share it with some communities to get more
people testing it and more feedback. It seems the folks over at Facebook
decided something similar. You know what they say, great minds :p

Joking aside, as intimidating as it might have been to see FB releasing a
related tool, we feel that we still fill a different segment. From what I've
been reading today, Prophet is a tool tailored for timeseries forecasting
with human interaction and input in mind - it can work as a black-box
forecaster, but it seems most useful when paired with an analyst who keeps
looking at the output and tweaks the model accordingly. It is _really
friendly_ as far as forecasting packages go, and trust me, we looked at a
fair amount of them. That, and the use of probabilistic programming to infer
their parameters is just awesome (I'm a fervent Bayesian at heart).

Unplugg, on the other hand, fills the need for a "generic" forecasting tool
for cases where you don't want/need much specific tailoring and want a really
plug-and-play solution - it's an API that you can call from pretty much
anywhere, with no dependencies or specific environments needed (so no need to
deploy your own R/Python/Matlab - yikes - environment where your models live
and run). One possible use case would be an energy monitoring portal that
lives completely client-side and requests forecasts from our API on-the-fly,
directly from the client.

We are still actively developing and testing different forecasting models -
the one running is just the one we feel most confident about - and we will be
looking at Prophet as a possible alternative (although I haven't looked at
their license carefully, so I can't be sure).

~~~
NumberCruncher
The last time a startup tried to sell us a plug-and-play generic forecasting
SaaS, they made the mistake of wanting to impress us by showing us their
backend code. It was the first time in my life I had seen Spark code, but it
took only 10 minutes to find the spot responsible for uncontrolled
overfitting, which made their product useless. The same thing happens every
time I open up a black-box analytics tool.

------
nodesocket
I'm getting an unexpected error occurred. Here is the sample financial data I
am using: [http://pastebin.com/W6PJfG3f](http://pastebin.com/W6PJfG3f)

~~~
mgontav
Thanks for reporting it; we believe we've fixed it in the meantime, so feel
free to keep testing.

