From the article, he admits that this is a hypothesis about how the service might work. It's really just an introductory overview of Naive Bayes, and it doesn't address an actual use case of the Google prediction engine (at least, not one that they have confirmed). The actual examples from Google all seem to have discrete outcomes so far.
Naive Bayes is almost definitely going to be something that they offer; it seems like it's just a question of 'when'.
Right. At this point you kind of have to just imagine the workflow (which is actually what I do quite a bit before tackling an analytics problem). You have to envision what you ultimately want your output to be, understand what's being done to your data, and then make sure you correctly accumulate and format your data to insert into the system. When they flesh out their API docs (and let me use their API) I will write another post!
Right. I don't mean this negatively, but your post is not really about Google's prediction tool at all. It's a general introduction to Naive Bayes. I understand that what got you enthusiastic to write this up was Google's announcement, but at the end of the day I could remove any reference to Google and the post would be just fine. I suppose I got excited, despite knowing about the general procedure already, because I thought this post was directed at how to actually use the Google offering. It's just a title issue, I guess.
Sorry about that. A lot of the meta-discussion I had with others (mostly non-techs or semi-techs) this past week about the Prediction API was simply "But what would I use this for?" kinds of questions. That's what this post was meant to address.
Incidentally, this is only my second post, and I'll continue to write both general insights into existing public data and more technical (with-code) posts geared towards those who want to get their hands dirty.
Same for me. Anybody know why this breaks in Chrome?
[edit: looks like the source is truncated. Several closing tags -- including a few divs, body, and html -- are missing. Not sure why Chrome can't handle that though.]
I wanted some tech like Midomi's or SoundHound's music fingerprinting mixed in with this. Show me new artists that sound similar to artists that I like. Better yet, similar to a mix of artists that I like. Now that would be nice.
Good idea! But I also want people to understand what I'm talking about to some degree :)
This is actually quite difficult to do. First you need to identify which features of a song are representative of its genre (a song might have 3 million of them). Then you need to build a model that can classify songs accurately based on those features. This has to be fast, because, you know, you don't have time to wait for a few million songs to process...
A good book for this sort of thing is Toby Segaran's excellent Programming Collective Intelligence. It walks you through this sort of fascinating thing with easy examples and clear explanations. It's sprinkled with simple Python code.
If you want a good introduction to Naive Bayesian classifiers, there was a pretty readable explanation in Artificial Intelligence: a Modern Approach. It's an expensive book, but I'm sure you can find a copy in any well-stocked university library.
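For anyone who wants to see the idea before tracking down a book, here is a minimal sketch of a Naive Bayes classifier with discrete outcomes, in plain Python. To be clear, this is not how Google's service works (nobody in this thread knows that) and it isn't taken from either book; the function names, the add-one smoothing, and the toy "song tag" data are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count labels and per-label tokens from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        for token in tokens:
            word_counts[label][token] += 1
            vocab.add(token)
    return label_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log-posterior for the tokens."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        # log prior: how common this label is in the training data
        score = math.log(n / total)
        # add-one (Laplace) smoothing so unseen tokens don't zero things out
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokens:
            if token in vocab:  # ignore tokens never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: tags describing a song, and its genre.
examples = [
    (["loud", "guitar", "drums"], "rock"),
    (["guitar", "distortion"], "rock"),
    (["piano", "strings"], "classical"),
    (["violin", "strings", "piano"], "classical"),
]
model = train(examples)
print(predict(model, ["guitar", "drums"]))
print(predict(model, ["piano", "violin"]))
```

The "naive" part is the inner loop: each token's probability is multiplied in (added, in log space) as if the tokens were independent given the label. Real feature sets for something like music would be far bigger and messier than a handful of tags.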