

Machine Learning Fairy Dust - stdbrouw
http://stdout.be/2011/07/18/machine-learning-fairy-dust/

======
law
What makes me really nervous is that we're nearing the point when Google's
Prediction API and its knock-offs will increasingly pervade web sites much in
the way that AJAX and other technologies have. While overuse of AJAX and the
Facebook "Like" button is extremely annoying, it's still pretty harmless.

Machine learning, on the other hand, isn't innocuous. In order to use the
Prediction API, you need a large corpus of data, which will just further
incentivize web sites to ignore the privacy implications of their actions.
Machine learning is far too abstract and too much of an "umbrella" term for it
to be anything but careless to refer to it as some sort of panacea.

If you thought that Facebook's "Beacon" was a slap in the face to online
privacy, just wait until you see what the future holds. Once machine learning
libraries with extremely robust, completely unsupervised classifiers become
more abundant, we're going to see an exponential increase in the market for
data. Banner advertisements will be replaced with much more terrifying
'targeted' ads, and we will enter into an age where we are judged not by the
empirical evidence of our actions, but the inferences made from people who
behave like us.

~~~
bluekeybox
> Banner advertisements will be replaced with much more terrifying 'targeted'
> ads

Can someone please explain to me why ads for stuff I might actually be willing
to buy (as opposed to hyper-annoying junk thrown at me every day) terrify so
many people?

Not that I am indifferent to privacy issues; just playing devil's advocate
here.

~~~
makmanalp
First, everyone assumed that no one knew about anything they did on the
net. They thought that when they looked at an oven mitt in an online store,
only they knew about it. The truth was that information was just unused by the
store, and so the user never saw it.

Gradually, this notion was dispelled when stores actually started
leveraging this information to provide, for example, suggestions. It's all
fine and good here.

Finally came targeted ads. The part that people find terrifying is when the
suggestions are "following them around!". This is creepy in multiple aspects:

The first is that people don't quite understand _how_ this happens (of course,
_we_ do). How is this information showing up in different websites? Did the
store just let these other sites handle my information? In their eyes, the
boundaries for who is allowed access to my personal information seem to blur.
This also breaks the paradigm of location that the user has in their mind: "If
I don't go to site X, I won't see anything about site X." It looks like
everyone knows everything.

The second is that it's creepy-through-analogy. The fact that it's going
wherever you're going and nagging you constantly is weird. When I walk into a
retail store, I'm usually asked if I want any help, and I decline politely.
If, however, the salesperson keeps approaching me and trying to sell me things
that I don't want, I get the fuck out. With targeted ads, the average user
can't do that! The creepy sales guy is following you out to the street and
into your home. This usually ends in extreme frustration.

Finally, there are also some nuances that targeted ads miss. Targeted ads are
actually not that targeted; they're just there to grab at the "low hanging
fruit" customers that are on the verge of making a decision. Just because I
looked at a dildo once because I thought it was funny doesn't mean I want to
be bombarded with the world's finest penis emulators for the next 3 days.

~~~
lotu
So use ad block and/or incognito mode. Problem solved. And don't say that
most people don't know how to use these features. While true, anyone who is
concerned about this is totally free to ask other people or even pay them to
explain and solve the problem for them. The fact that they don't suggests to
me that they _don't care_. Most of the whining about privacy is more the
poster being upset that not enough other people are concerned in the same way
the poster is.

~~~
makmanalp
I'd like to point out that I came by the above insights after a conversation
with my aunt (who is in her 40s), accompanied by a few other older relatives.

I do actually care, but those statements were not a reflection of _my_ cares;
they were a reflection of theirs. While this is not equivalent to a
comprehensive study of
average computer using people across the world, they sure as heck cared but
didn't even begin to know where to start, in contrast to what you are hoping
will happen.

------
hammock
There are NO shortcuts. This is a fantastic article, and he lists a lot of
good examples: machine learning, "social", crowdsourcing, AJAX, real-time.

I would add to that list "create a forum." Maybe that's part of "social." In
marketing I hear it all the freaking time: you get a half-assed, mediocre idea
and it always includes some type of "forum" your customers will recruit
themselves into somehow, and start to form a community. Most of these people
have never been on a forum so I can't blame them for not knowing how it works,
but it is a challenge.

------
dholowiski
I was nodding my head until I got to the end - it seems like the Google
Prediction API _is_ the magic fairy dust we've been waiting for?

~~~
athst
I think his point is that it's just a tool that makes it a little easier for
startups to incorporate machine learning into their products - like he said,
it may be appropriate for some types of problems, but not all. But I'm sure
we'll start to see more tools like that become more widely used.

When AJAX first came out, not everyone knew how to do it - but now, everyone
can drop in jQuery and do all sorts of complex things relatively easily.

~~~
dholowiski
I guess that's a good point. I wonder what kinds of bad implementations of the
API we'll see. What a great revenue stream for Google - what startup won't use
the API in some way? I know I'm setting it up tonight and using it on at least
one project.

~~~
taliesinb
Judging from their forum activity, they hardly have any usage.

------
wccrawford
I think 'machine learning' is so complex that people just don't feel like
trying to explain it. That, or their business secrets are tied up in it, and
they don't want to give away the golden goose.

~~~
_delirium
That's an explanation for some of the examples, but I think a lot of the time
it's actually really simple, along the lines of, "we sift through some data
and correlate it". The odd thing is, that often works, especially for user-
facing perceptual stuff where there's a strong placebo effect, even more
especially if you salt liberally with some hand-tuned biasing. Sort of how The
Sims is able to use some super-simple algorithms to give the impression of
interesting characters.
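
For what it's worth, "sift through some data and correlate it" can be about as small as the sketch below. This is only an illustration: the item names, the view data, and the helper `most_correlated` are all made up, and real products will differ. It uses plain Pearson correlation between per-user view patterns to pick a "suggestion".

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one entry per user, 1 = that user viewed the item.
views = {
    "oven_mitt": [1, 1, 0, 1, 0],
    "apron": [1, 1, 0, 1, 0],
    "drill": [0, 0, 1, 0, 1],
}


def most_correlated(item, data):
    """Return the other item whose view pattern correlates best with `item`."""
    others = [name for name in data if name != item]
    return max(others, key=lambda name: pearson(data[item], data[name]))


suggestion = most_correlated("oven_mitt", views)
print(suggestion)
```

Nothing here checks whether the correlation is meaningful with so few users, which is roughly where the trouble with shaky statistics starts.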

However, if you _do_ need some real magic to be done, and your product really
won't work without it, then things get trickier; bad statistics, or at least
statistics not really used correctly, is really common in the innards of these
kinds of products.

------
j_baker
I think this is usually a case of marketing having a bit too much say in
product discussions. In the publishing industry, it seems like "My widget does
X" doesn't get as strong a reaction from publishers as "My widget does X _and_
it adapts to your readers".

The problem being (of course) that people forget how hard a problem machine
learning can be.

------
the_cat_kittles
It's fine if people want to say that ML will take care of the "details"... let
them try to use ML right and they will see you need to spend a long time
understanding how to do things right. Most of the time, you can't use linear
regressions right out of the box, let alone SVMs.
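
A toy illustration of the "out of the box" problem, with made-up data and hypothetical helper names (`fit_line`, `r_squared`): an ordinary least-squares line fitted to a quadratic relationship explains none of the variance, while the same fit on a hand-transformed feature explains all of it. Knowing which transformation to apply is the part no library does for you.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data where y depends on x squared, not on x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

# Straight out of the box: fit on the raw feature.
raw_fit = r_squared(xs, ys, *fit_line(xs, ys))

# After a hand-picked feature transformation: fit on x squared.
xs_squared = [x * x for x in xs]
transformed_fit = r_squared(xs_squared, ys, *fit_line(xs_squared, ys))

print(raw_fit, transformed_fit)
```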

~~~
arasraj
Agreed. The use of ML is highly dependent on the data. Having something like
the Prediction API is fine, but it seems like the use cases would be rigid.

~~~
taliesinb
Yup, and if you commit to it and suddenly realize you need a little more
flexibility than the API provides, you're probably in a worse position than if
you rolled it yourself.

Real-world ML is so full of black magic and hackery that it's the LAST thing
I'd try to sell as a web service.

------
danw
The cloud will solve it

