

Google Launches BigQuery and (AI/ML) Prediction API - timr
http://googlecode.blogspot.com/2010/05/bigquery-and-prediction-api-get-more.html

======
hooande
If I understand the Prediction API correctly, it's a text classifier. I would
imagine it's a Bayesian classifier, like the kind found in most spam filters.
I don't know how accurate it is... it would be nice to see how it performs on a
standard corpus (like 20 Newsgroups).

If you're handy with Linux and you don't want to deal with their API, you
could probably just download and install rainbow
<http://www.cs.umass.edu/~mccallum/bow/rainbow/> (though it is a very
difficult installation).

~~~
sketerpot
It can be used as a text classifier. It takes as input a collection of
(output, input1, input2, ..., input_n) tuples, stored in the newly-announced
Google Storage, and then uses a variety of machine learning algorithms (which
I would bet include some Bayesian stuff) to build a model which it can use to
take (input1, input2, ..., input_n) tuples and predict the corresponding
output.
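
To make the tuple idea concrete, here is a minimal sketch of the train-then-predict
flow using a plain naive Bayes text classifier. This is only an illustration: it is
not Google's algorithm (which isn't public) and it doesn't call the Prediction API.
The function names `train` and `predict` and the toy rows are made up for the
example, and each row has a single text input rather than n inputs.

```python
import math
from collections import Counter, defaultdict

def train(rows):
    """Build a naive Bayes model from (output, input_text) tuples."""
    label_counts = Counter()            # how many rows carry each output label
    word_counts = defaultdict(Counter)  # per-label word frequencies
    vocab = set()                       # every word seen during training
    for label, text in rows:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the most likely output label for an unseen input."""
    label_counts, word_counts, vocab = model
    total_rows = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        # Log prior, plus log likelihood of each known word
        # with add-one (Laplace) smoothing.
        score = math.log(count / total_rows)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: output first, then the input, as in the tuples above.
rows = [
    ("spam", "win cash now"),
    ("spam", "cheap pills win big"),
    ("ham", "meeting notes attached"),
    ("ham", "lunch meeting tomorrow"),
]
model = train(rows)
print(predict(model, "win cheap cash"))
print(predict(model, "meeting tomorrow"))
```

On this toy data the first query comes out as "spam" and the second as "ham". A
real service would presumably try several algorithms and pick whichever one
validates best, but the shape of the interaction is the same: labeled tuples in,
a model out, then unlabeled tuples in and predicted outputs out.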

So, that's the API. You can do a lot of the same things offline, with almost
the same file format, using Weka:

<http://en.wikipedia.org/wiki/Weka_(machine_learning)>

So if you're interested in playing around with Google's Prediction API, you
should probably download Weka and fiddle with it some. It's pretty easy to get
started with, and it will definitely give you an idea of the sort of thing
you can do here.

~~~
smokinn
Another interesting project is crm114: <http://crm114.sourceforge.net/>

It's ridiculously easy to use. I used it for identification of spam/scam
messages and setting it up was just 5 lines of code.

I wrote a blog post about using it here: <http://smokinn.com/blog/post/253>

------
fjabre
The prediction API looks _really_ interesting.

Can anyone think of some interesting use cases and the implications of making
something like this available to the masses?

------
dschobel
The Prediction API has the potential to be really cool.

Wonder how concerned the Directed Edge (YC) guys are about Google entering
their space.

~~~
sketerpot
Directed Edge is a lot higher-level than this, and more focused on making it
easy for their users to make good recommendation engines. Prediction API is
focused on more general tasks, and for that reason I predict that it'll have
trouble competing with Directed Edge. Unless some startup comes along and
competes with Directed Edge using Prediction API as a back-end -- but we can
talk about that later, if it ever happens.

Prediction API is a nice broad tool, but Directed Edge still has a big
advantage in their niche.

------
Jun8
Too bad the data has to be on the Google Storage Service. It will be some time
before people like me will get their hands on an API key for that.

------
holdenk
I would love to play with this [for a side project]. Any ideas on what the
timelines are for access for small fish like me?

~~~
sketerpot
Download Weka. The file format is almost identical, and it has a nice GUI to
get you started fast. Google's algorithms and infrastructure may be more
sophisticated and scalable, but if you want something a lot like this for a
side project, Weka is easy to start with and it's available _right now._

<http://www.cs.waikato.ac.nz/ml/weka/>

