
Google Prediction API - Google Code - ColinWright
http://code.google.com/apis/predict
======
drjoem
The service is pretty cool. Training works by uploading your CSV-formatted
training data (with labels) to Google Storage. Then you make a call to train
the Google service. Google has not said much about what kind of algorithms
they are using behind the scenes, beyond the fact that they use a
combination of proprietary and open-source ML algorithms. The service trains
a variety of different models and then uses a voting scheme to decide which
ones are optimal.
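
For anyone wondering what "CSV formatted training data (with labels)" might look like, here is a minimal sketch. The label-first column order and the example rows are assumptions for illustration, not taken from the docs; the upload to Google Storage and the training call are not shown.

```python
import csv
import io

# Hypothetical labeled examples: (label, text). Putting the label in the
# first column is an assumption made for this sketch.
rows = [
    ("spam", "win money now"),
    ("ham", "meeting moved to 3pm"),
]

def to_training_csv(examples):
    """Serialize (label, feature...) tuples as CSV text, one example per row."""
    buf = io.StringIO()
    # QUOTE_NONNUMERIC wraps every non-numeric field in double quotes.
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for example in examples:
        writer.writerow(example)
    return buf.getvalue()

training_csv = to_training_csv(rows)
print(training_csv)
```

The resulting text is what you would save to a file and upload before asking the service to train.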

A few problems I see (or saw; I haven't used it in a few months) with the
service are the following.

1\. Currently, there is no way to pick your cross-validation folds. This can
lead to severe overfitting if your data is not i.i.d.

2\. They provide a numerical (double) accuracy number which corresponds to the
accuracy estimated from training. How is this number calculated (area under
the ROC curve, etc.)? They do not say.

3\. Security issues - read the fine print of what happens when your data gets
uploaded to Google Storage. It could be a cause for concern.

4\. You are competing for resources. When I was testing the API, I would
train two successive models with the same amount of data, and I would notice
one call would complete (asynchronously) after 10 seconds, while the next
would take 10 minutes. This is because you are competing for resources with
other users.

5\. There is currently no way to inject prior knowledge into your models. What
if you know your data is Gaussian? You could use an RBF kernel, but with this
API you cannot, because it might pick the Naive Bayes classifier and not the
SVM, etc.
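
To make point 1 concrete: when samples come in correlated groups (say, many rows per user), you want every group to stay inside a single fold so the validation score is not inflated. Below is a minimal sketch of group-aware fold assignment in plain Python. The function name `grouped_folds` and the toy group ids are hypothetical, and this is not something the API exposes.

```python
from collections import defaultdict

def grouped_folds(groups, k):
    """Assign sample indices to k folds so that no group spans two folds.

    groups[i] is the group id of sample i (for example a user or session id).
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)

    folds = [[] for _ in range(k)]
    # Place the largest groups first, always into the currently smallest fold.
    for g in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[g])
    return folds

# Toy data: eight samples belonging to four groups.
groups = ["a", "a", "b", "b", "b", "c", "d", "d"]
folds = grouped_folds(groups, k=2)
print(folds)
```

With randomly chosen folds, rows from the same group can land on both sides of a split, which is exactly the leakage that makes the reported accuracy look better than it is.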

In general, this service will probably work for the average spam detection
problem, but if you really want a great system, you probably need to keep
everything in house.

------
equark
Are there any public benchmarks of the prediction API against standard
datasets?

------
mrspandex
It's not quite clear to me what this actually does. How does training work? Do
I give it input and actual decisions so that it can decide based on trends or
is it more of a fuzzy match to known data?

~~~
jvandenbroeck
It has been a while since I checked it out, but you give it training data
(examples + values) and then it will be able to predict values for unseen
examples, e.g. predict whether a comment on a blog post is spam.
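
As a rough illustration of the "examples + values in, predictions out" idea, here is a toy word-count classifier. It is only a sketch of supervised learning in general; the `train` and `predict` functions and the sample comments are made up, and it says nothing about what Google actually runs.

```python
from collections import Counter, defaultdict

def train(examples):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Return the label whose training words overlap most with the text."""
    words = text.lower().split()
    return max(counts, key=lambda label: sum(counts[label][w] for w in words))

# Hypothetical training data: (comment text, label).
model = train([
    ("buy cheap pills now", "spam"),
    ("cheap watches buy now", "spam"),
    ("great post thanks for sharing", "ham"),
    ("thanks this helped me", "ham"),
])

print(predict(model, "buy cheap watches"))
print(predict(model, "thanks for the post"))
```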

Actually, the Google Prediction API is already really old; I don't know why it
is showing up on HN now.

~~~
zoudini
I think it has something to do with this submission's mentioning it
(<http://news.ycombinator.com/item?id=2776254>)

------
rorrr
If it actually works, why wouldn't Google just feed the stock data to it, and
make billions by predicting future prices?

~~~
gjm11
Because "prediction" here doesn't mean "magical prediction of the future", it
means "spotting and extrapolating patterns", and (in so far as the Efficient
Market Hypothesis is true) there are no exploitable patterns to spot in stock-
market data.

(Presumably the EMH is only approximately true, but it's probably close enough
that Google can't make billions that way without a considerable risk of losing
billions instead, and without a lot of effort that they might do better to put
into making billions by more conventional means.)

~~~
rorrr
> there are no exploitable patterns to spot in stock-market data.

[citation needed]

------
bauchidgw
not yet deprecated?

------
drjoem
test

------
DotNetPete1
I thought this was an april fools joke, then I looked at the date.

