
New Book from O'Reilly: Programming Collective Intelligence - joshwa
http://radar.oreilly.com/archives/2007/08/programming_col.html
======
neilc
I could do without the Web 2.0 hype, but the table of contents does look very
interesting: it's basically an introduction to a particular subset of machine
learning/data mining. I'd love to see more focus on solving these kinds of
_hard_ problems, and less on rounded corners and AJAX, among the "Web 2.0"
crowd.

~~~
trekker7
Exactly! "Web 2.0" has somewhat of a bad image because people think of the
flash instead of the substance. "Web 2.0" actually has a lot of deep
technology to it; I think "harnessing human intelligence" is a technical trend
that will continue for decades. The hard part is figuring out unique, useful
ways of algorithmically working with user generated data, which in turn would
lead to new categories of applications.

------
henning
"the defining moment of the Web 2.0 revolution was Google's invention of
PageRank"

Huh? PageRank dates from the late 90s. And it's implemented in a compiled
language (C++, I think -- not even GC'd).

And it involves elementary linear algebra. Heresy!

Now, as for the book itself. It looks like it tries to cover waaay too much --
support vector machines and other forms of supervised learning, unsupervised
learning/clustering, optimization and evolutionary computation, applications
to collaborative filtering, along with using particular libraries. There's no
way you could cover all that in 358 pages!

I gave a talk on evolutionary computation to a Ruby users group and their jaws
literally dropped. Web people don't give a shit about something if it doesn't
involve HTTP, a programming language that's currently en vogue, JavaScript, or
relational databases (really only MySQL or maybe Postgres).

If you're interested in an introductory data mining book with a practical
focus, may I instead suggest Witten and Frank's "Data Mining: Practical
Machine Learning Tools and Techniques"? It uses Weka throughout, which is
mature and nice.

~~~
snifty
Just because people are interested in "Web2.0" doesn't mean they're not going
to be interested in the kind of stuff in this book.

In my opinion, interest in this sort of material is going to be propelled
forward by Web2.0. It's not at all true that web programmers "don't give a
shit" about mathematical modelling, etc. It's just that there hasn't been much
accessible code for people to look at (much of it tangled in obscure, obtuse,
proprietary academic and military labs).

It's true that this book seems to be attempting to cover a lot of topics, but
that's probably the point: a decade ago, there was no popular concept of, let
alone interest in, relational databases. That changed because the street found
a use for it, as the saying goes. Anyway, I'll second your Witten and Frank
recommendation, cool book.

------
seiji
It looks like a stripped down version of AIMA (<http://aima.cs.berkeley.edu/>)
with two differences: It doesn't cram everything into an "agent based
approach" and it gives usable examples right away, skipping theory (no 100
pages devoted to first order logic).

In the sample text they mention Hot or Not, Google, Amazon, Netflix, and a few
other companies to quickly give a "real world" view of what is useful. The
book certainly will get more people interested in doing more sophisticated
computations on their data. (Not that I think AIMA's use of Romanian cities or
endless "sue is pat's mother. sarah is pat's daughter => sue is sarah's
grandmother" examples makes the material seem less useful.)

~~~
neilc
It doesn't seem like a "stripped down version of AIMA" at all, IMHO. AIMA is
an AI book; this is an introductory data mining book. As such, this book talks
about algorithms for clustering, classification, feature identification,
collaborative filtering, etc., none of which are really addressed in depth in
AIMA. There is a relatively brief section on learning techniques in AIMA, but
it doesn't go into much depth, and is more focused on reinforcement learning
than on typical data mining techniques (automatic classification, clustering,
etc.)

~~~
plinkplonk
Does the book go into any depth on these issues or is the coverage
superficial? Is there code? Thanks in advance.

------
codeslinger
It's not out yet. I bought it on Amazon and then they told me a couple days
later that it would take until November to actually receive it. I cancelled
the order.

~~~
joshwa
It's out-- see Tim O'Reilly's comment (currently the last one):

For those of you wondering whether to buy from Amazon or directly from
O'Reilly, I heard from our Amazon sales rep that Amazon is temporarily out of
stock, and is in fact showing as "not yet published." She wrote in email:

"Since I've heard from several of you regarding Programming Collective
Intelligence and the status on Amazon, I thought I better send out a quick
little note to explain the "glitch." As most of you have seen, Amazon's detail
page for Programming Collective Intelligence is now showing as a pre-order but
just last week it was "available." ... Here's what happened as it's been
explained to me. Apparently, Programming Collective Intelligence ran out of
stock as quickly as it was received in. Because it ran out of stock so close
to the expected pub date, the system threw it back into a pre-order status."

~~~
codeslinger
I tried to get it from Amazon again. Let's see if they can handle it this time
;-) I'm not going to buy it from O'Reilly b/c I get free (amortized) shipping
from Amazon _and_ it's $13 cheaper.

~~~
neilc
FYI, my order from Amazon shipped today (I ordered it 5 days ago).

------
herdrick
Holy cow - this could be really good. The content preview is giving me a
feeling of wordiness though. Next time I'm in a bookstore I'll thumb through
it to see if it's succinct enough.

~~~
Goladus
I thought the same; however, it may still be useful as a survey book. It's
something I might read as a starting point. Sometimes it's discouraging to
slog through a technical paper only to discover that it's not going to help
you much.

------
vikram
Try it on safari.informit.com free for 14 days.

