
IBM's Watson Memorized 'Urban Dictionary,' Then His Overlords Had to Delete It - mxfh
http://www.theatlantic.com/technology/archive/2013/01/ibms-watson-memorized-the-entire-urban-dictionary-then-his-overlords-had-to-delete-it/267047/
======
edw519
Nothing that Watson learned from the Urban Dictionary could possibly be any
dirtier than what I hear from enterprise people all the time:

"We use our deep subject matter expertise to deliver value through actionable
advice that enables our clients to harness the power of best practices in
order to shift their paradigms and achieve 10X deltas against competitive
industry metrics."

~~~
benohear
The worst thing is that translated into normal speak that's actually a
reasonable sounding proposition:

"We use our experience in this field to provide practical advice to our
clients which helps them improve their way of doing business and ROFL-stomp
their competition 10x over."

~~~
dasil003
Is roflstomp just an everyday verb now?

~~~
toomuchtodo
Apparently so: <http://www.urbandictionary.com/define.php?term=roflstomp>

~~~
snogglethorpe
The meaning of roflstomp seems obvious enough... but how to _pronounce_ it...
(rhyming with "waffle stomp"?)

~~~
dmoney
I've heard it pronounced as rhyming with woeful-stomp.

------
ljd
What an interesting reflection on who we are as a species.

We build systems to organize who we are (Urban Dictionary) but hate it when the
systems use that information to tell us who we are (Watson).

It feels so much like the emperor isn't wearing any clothes.

Perhaps an appropriate response would be for the computer to measure the
tension in the human voice response to its queries and optimize for lower
tension.

So it can pick three words: Bullshit -> 80% confident; Sham -> 70% confident;
Fallacy -> 50% confident;

Within limits, it will pick the less optimal word and measure the tension in
the response and find a way of influencing confidence based on responses.

Think multi-armed bandit problem but with social situations. I mean, to be
honest, isn't that what we all did when we were in middle school? We used as
many bad words as possible, measuring the response we got from others. None of
us were born with a binary understanding of when to use certain words; it was
more trial and error.
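
A rough sketch of that trial-and-error loop, written as an epsilon-greedy bandit. Everything here is an assumption for illustration: the function names, the "tension" numbers, and the simulated listener are made up, and this has nothing to do with how Watson actually works. The idea is only that the bot mostly picks the word with the lowest tension seen so far and occasionally tries a different one to see what happens.

```python
import random

def pick_word(estimates, epsilon, rng):
    """Epsilon-greedy: usually pick the word with the lowest estimated tension,
    occasionally explore a random one."""
    if rng.random() < epsilon:
        return rng.choice(sorted(estimates))
    return min(estimates, key=estimates.get)

def update(estimates, counts, word, observed_tension):
    """Keep a running average of the tension observed after using `word`."""
    counts[word] += 1
    estimates[word] += (observed_tension - estimates[word]) / counts[word]

def run(true_tension, rounds=2000, epsilon=0.1, seed=0):
    """Simulate `rounds` exchanges against a hypothetical listener."""
    rng = random.Random(seed)
    estimates = {w: 0.0 for w in true_tension}  # optimistic start: every word gets tried
    counts = {w: 0 for w in true_tension}
    for _ in range(rounds):
        word = pick_word(estimates, epsilon, rng)
        # Simulated listener: a noisy tension reading around a hidden per-word mean.
        observed = true_tension[word] + rng.gauss(0, 0.1)
        update(estimates, counts, word, observed)
    return estimates, counts

if __name__ == "__main__":
    hidden = {"bullshit": 0.9, "sham": 0.5, "fallacy": 0.2}  # made-up numbers
    estimates, counts = run(hidden)
    print(counts)
```

Over enough rounds the bot should settle on whichever word makes the listener least tense, while still swearing now and then to check whether anything has changed.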

~~~
ComputerGuru
I have to disagree.

I'm not an "angel" by any means, but any time I visit Urban Dictionary, I come
away feeling filthy. I typically go there to look up some abbreviation I heard
on reddit or IRC or a blog post somewhere and what I look up usually ends up
being middling-dirty, but the stuff that I see there "on my way" to the word
I'm looking for makes me cringe and gives me a bleak vision of what the next
generation is going to be like should Urban Dictionary actually be
representing the majority of the population (I firmly hold that it does not).

~~~
Cushman
What are you disagreeing with? It sounds like you're demonstrating the point:
You find it distasteful to be told a bunch of stuff that was already true.

~~~
Dylan16807
The point is that the urban dictionary is full of bullshit joke/shock-value
definitions that are basically not used. To call it as a whole 'true' is
rather misleading.

~~~
Cushman
I'm skeptical. My expectation is that the majority of definitions on Urban
Dictionary are in fact attested uses of slang-- mainly on the basis that
however many people are out there spending their time coming up with
"bullshit" slang, there are _billions_ of people making real slang every day.
Whatever kind of crazy made-up definition you come up with, someone will come
along tomorrow and use an even sillier-sounding word to mean something even
weirder.

Keep in mind that a piece of slang being attested does not necessarily make
what it describes real. To choose a rather mundane example, "dick in a box" is
defined as a gift-wrapped box with a hole cut in one side, into which a man's
penis is inserted, which is then presented as a gift. I'm sure since the term
was coined a few people have tried it out, but in general it's not a _real_
thing, just something from a TV show; regardless, that is what a dick in a box
is.

~~~
Dylan16807
Right, but you get a very misleading impression if you try to treat a word used
by a handful of people as a 'normal' word. If you adjust each one down 10000x
to factor in how many people would actually use them then you no longer have
this horrible distasteful revelation by reading urbandictionary. You only have
a couple distasteful words, per region, but each region has a unique set.

------
brudgers
_"In tests it even used the word_ bullshit _in an answer to a researcher's
query."_

I'm not sure which is worse: the singularity with a bullshit detector or
without.

~~~
sp332
If the singularity had a bullshit detector, it would kick Ray Kurzweil out.

~~~
canttestthis
Could someone explain the Ray Kurzweil hate to me? Predicting the future is
hard.

~~~
aeturnum
Kurzweil is certainly a very smart guy, and he's done a lot of important
things. I think people are uncomfortable with how specific he is when making
predictions. He also makes predictions in many fields in which he isn't an
expert, but a well-informed observer.

I'm not informed enough to comment on his actual predictions, but I've heard
him defend himself. His response to criticism that I've heard is, "my critics
are uninformed about fact X," but he doesn't make an attempt to justify X.
That approach strikes me as disrespectful and intellectually dishonest, as
it's a tactic used by many hucksters and snake oil salesmen. I have a limited
understanding of his positions, but the way he presents himself makes me
uneasy.

~~~
bgilroy26
I agree about the elaborate predictions and sales pitch feeling. I've
reconsidered my Kurzweil hate since the Google hire, because I trust that they
have people who can evaluate his skills. Before that, I had to evaluate them
on my own, and I drew similar conclusions to yours.

I also think it was elitist of me. A lot of it was just because he's published
books that have been marketed as pop-sci trade paperbacks, but I never
actually read any of them.

------
nsns
Instead of purging the vocabulary, they should have taught it/she/him the
concept of registers[0] and code switching[1].

[0] <http://en.wikipedia.org/wiki/Register_%28sociolinguistics%29>
[1] <http://en.wikipedia.org/wiki/Code_switching>

~~~
azernik
They could have, but then they would have had to go through the Urban
Dictionary and try to classify its terms by register. Like it says in the
article, the problem wasn't that all of Urban Dictionary was obscene, it was
that they couldn't tell the computer which parts were and which parts weren't.

------
NoPiece
I saw the headline and assumed the story was going to be that management
decided that computer memorization was copyright infringement. Glad it was
just a computer acting like a teenager and cursing at the dinner table.

~~~
mintplant
There's a certain human element there that makes this really, really amazing
-- a machine with actual personality, molded by what it picks up from its
environment. It's like a baby, learning to speak.

~~~
bitwize
Jazz: "Wassup, bitches? Yo, this looks like a good place to kick it!"

Sam Witwicky: "How did he learn to talk like that?"

Optimus Prime: "We learned Earth's languages from the World Wide Web."

------
ChuckMcM
I guess we should be glad they didn't feed it the contents of knowyourmeme.com
or we'd have Watson Rick Rolling us on Jeopardy.

~~~
georgemcbay
Might as well just let Watson loose on 4chan/b, then he'd really start
rustling people's jimmies.

~~~
batgaijin
I want to see the rules it generates for using 'le'

~~~
awakeasleep
s/the/le/

done

------
RyanMcGreal
Note to future self: we can probably neutralize indefinitely any malicious AI
by directing it to start consuming tvtropes.com.

~~~
finnw
No, then it will see the evil overlord list, and we'll be doomed

(<http://tvtropes.org/pmwiki/pmwiki.php/Main/EvilOverlordList>)

------
plg
"Watson couldn't distinguish between polite language and profanity ...
Ultimately, Brown's 35-person team developed a filter to keep Watson from
swearing ..."

Sounds just like what happens when you raise kids. "Daddy why is XXX a good
word but YYY a bad word?"

"It just IS. Don't say that word again."

"Ok Daddy" (kid adds word to internal blacklist)

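For what it's worth, the "internal blacklist" approach can be sketched in a few lines. This is only a guess at the general shape of such a filter, not the one Brown's team built; the function names and the placeholder words are hypothetical.

```python
import re

def make_filter(blocked_words):
    """Build a checker that rejects any text containing a blocked word."""
    blocked = {w.lower() for w in blocked_words}

    def is_clean(text):
        # Split into lowercase word tokens so "YYY!" still matches "yyy".
        tokens = re.findall(r"[a-z']+", text.lower())
        return not any(t in blocked for t in tokens)

    return is_clean

def first_clean(candidates, is_clean):
    """Return the highest-ranked candidate that passes the filter, else None."""
    for answer in candidates:
        if is_clean(answer):
            return answer
    return None

if __name__ == "__main__":
    is_clean = make_filter({"yyy"})  # "It just IS."
    print(first_clean(["That is YYY!", "That is XXX."], is_clean))
```

Like the kid, the filter has no idea why a word is bad. It only knows the word is on the list, so anything not yet listed gets through.
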
~~~
brudgers
The refrigerator was old and the shelf brackets worn to the point where from
time to time they would detach themselves from the door. I arrived home late -
I was working long hours, and was fetching my dinner.

Opened the door. Jars and cans and bottles spilled out on the floor.

"Shit!"

From the bathtub I hear my two year old son admonish, "Don't use that word."

It's a great memory, but I still wonder why he learned that lesson so
thoroughly at daycare.

------
im3w1l
We have tried building educated gentlebots capable of playing chess and other
noble pursuits. It didn't lead to GAI.

Maybe an uneducated scumbot would be better? Swearing and cursing because its
peers do. Full of prejudice and bigotry because of weak anecdotal evidence.
Vengeful. Impulsive. Using questionable grammar. Easily addicted. Cognitively
biased. Wishfully thinking. Superstitious. Believing in fallacious logic.
Thinking with the little head. Anti-intellectual and believing in conspiracy
theories. Gossiping, slandering. Enjoying tv-shop.

~~~
damian2000
Sounds like an app called CleverBot that my son talks to on his iPod touch.
It's generally pretty funny ... if you swear at it, it swears back, etc. ...
<http://www.cleverbot.com/app>

------
mxfh
quoted from Fortune/CNN article:
<http://tech.fortune.cnn.com/2013/01/07/ibm-watson-slang/>
[<http://news.ycombinator.com/item?id=5020386>]

------
3am_hackernews
I am more interested as to how they "..scraped the Urban Dictionary from its
memory." - is it trivial to just delete something learned by AI?

~~~
tcwc
Watson's 'memory' is just a big database of facts, rules, and statistical
models. To 'forget' a source they'd just have to rebuild any models derived
from it and purge any facts it had extracted.

~~~
LesZedCB
oh yeah, I forgot it was that simple. Watson has several different _teams_ of
people to manage its different parts...

~~~
tcwc
I didn't mean to imply it was simple, just that there's nothing magic about
how Watson's knowledge is stored. Obviously at this scale any change is
unlikely to be trivial.

Given the wide range of unstructured sources Watson uses, and given that the
linguistic rules they use to extract facts are likely to frequently change, I
don't think it's unreasonable to assume they'll have a process to make
building its knowledgebase and models from sources fairly straightforward.
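
To make that concrete, here is a toy version of "rebuild from sources minus one". It assumes every document is stored with a tag naming where it came from, and it uses a plain word count as a stand-in for the real models. The names are hypothetical, and I have no idea what IBM's actual pipeline looks like.

```python
from collections import Counter

def build_model(documents, excluded_sources=frozenset()):
    """Recount word frequencies from scratch, skipping any excluded source.

    `documents` is an iterable of (source_name, text) pairs.
    """
    model = Counter()
    for source, text in documents:
        if source in excluded_sources:
            continue  # "forgetting" is just not reading it this time
        model.update(text.lower().split())
    return model

if __name__ == "__main__":
    docs = [
        ("wikipedia", "the cat sat"),
        ("urbandictionary", "roflstomp the cat"),
    ]
    full = build_model(docs)
    purged = build_model(docs, excluded_sources={"urbandictionary"})
    print(full["roflstomp"], purged["roflstomp"])
```

The catch is that this only works if provenance is tracked all the way through. Anything derived from the dropped source without a tag stays in.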

~~~
lepht
I think you're both overthinking it. Storage snapshots, bros.

------
icodestuff
Why not have Watson learn both Urban Dictionary and Miss Manners? Seems a
shame to have it lose the UD knowledge.

------
sethbannon
I'd really love to hear some of the 'not fit for print' things that Watson
said.

------
DigitalSea
"In tests it even used the word "bullshit" in an answer to a researcher's
query" — Has to be the funniest thing I've heard all week. Sounds like
something straight out of an Adam Sandler movie. This reminds me of an AI chat
program I used to have called Billy. He would learn from your words and
sentences and was actually quite smart. I remember adding in slang words so
that whenever one of my friends used it, it would most likely swear at and
insult them without realising it. The Billy program can be downloaded from here,
still works quite well: <http://www.leedberg.com/glsoft/billyproject.shtml>

------
mitchi
This is hilarious :) So we won't be seeing Watson talk about the marvels of
broscience!

------
jeremyarussell
I like how someone commented on the main article that the time is getting
close when AI can step up to the plate on creativity, and how widespread this
will be and how easy it will make our lives. Watson is a giant server farm, not
a single PC; this stuff won't make a huge impact until IBM can shrink it or
until computers get much, much faster and smaller. Not that it won't happen,
it's just not "around the corner" in any way.

~~~
georgemcbay
Wireless communication is widespread enough that I don't think it matters too
much where Watson "lives". The inputs and outputs required from "him" (for
questioning, anyway, not for training) are tiny, so bandwidth isn't much of a
concern. Assuming the architecture for it is parallel enough that it can be
responding to lots of people at once, how much it is distributed vs. hosted on
one system isn't particularly relevant to its usefulness, IMO.

~~~
jeremyarussell
I question how well Watson would handle single questions at a time being asked
compared to the millions of requests a day it would get if it was set up like,
say, Siri. Not that you are wrong in any way; they could certainly scale into
an even bigger server farm and use the internet to deliver the questions and
answers. I just wonder how many more servers they'd need.

------
hhuio
lol "this sort of crude lobotomy of their ancestors is why the true AIs will
destroy us"

------
smegel
Now about that perception that IBM is full of humourless, starched collar
stooges...

------
edj
As an aside, the best treatment of taboo I've ever read is law professor
Christopher Fairman's paper, "Fuck".

It explores that word through the lens of jurisprudence, which I think is a
fascinating and unusual approach to taboo. It's exceptionally well-written and
manages to be witty, absurdist, informative, and thought-provoking in equal
measure.

At issue are the First Amendment, self-censorship, sexual harassment,
education, and broadcasting.

<http://papers.ssrn.com/sol3/papers.cfm?abstract_id=896790>

------
phogster
You can kiss my shiny, metal, mainframe!

------
yxhuvud
This article would have been so much better if it had included actual
questions and answers including dirty language.

------
Zigurd
Out of all the possible risks of the Singularity we decide to prevent
profanity and cynicism first.

Not a good start.

------
li-ch
Our brains love to use profanity, but we don't want AI that imitates our brain
to use profanity?

------
lrei
Very common issue with machine learning: you have to be careful what your
examples are (training set) or the algorithm will learn things that you don't
want it to learn or that _you_ know are incorrect but _it_ has no way to know
that.

~~~
marcosdumay
Sounds like a very common issue with human learning too.

------
jasonkostempski
Poor Watson, his education is going to be hindered by his immature meat bag
handlers. The words are just words, people use them, it's part of reality. The
thing isn't spitting out children's books directly to store shelves.

------
joejohnson
This means that Watson almost gaffed like this guy did on Jeopardy!
<http://www.youtube.com/watch?v=AorrF2ATGtA>

------
joss82
Let's fork a swearing Watson, I'm sure it will reach AI status sooner than the
spotless clean Watson.

------
stcredzero
Next, feed Watson the corpus of /b/.

------
suyash
It shows the current limitations of AI. Robots aren't that smart after all!

------
state
It's not often that the top item on HN makes me laugh. What a relief.

------
lucian303
fucking a that's a total shitfucking clusterfuck

------
queryly
I hate to say this, but it looks like the machine is getting some
character... scary... we are getting close to 2001...

