Google killed me (amywilentz.tumblr.com)
257 points by danso 1287 days ago | 95 comments

I believe this has been fixed for a while now. It was fixed by the time I reported this over a week ago, for example.

If you see an error in a Knowledge Panel (the box on the right-hand side), look for the "Feedback / More info" link at the bottom of the box and then you can click next to a fact to report a problem. People do review those reports, and that's the fastest way to report an issue.

Apologies. I posted this because it came up recently in a media blog [insert joke about media being slow].

So it was a manual fix that corrected the issue? I guess that makes sense given that her Wikipedia entry didn't have the standardized birthdate listing (at least in the first line) until today (http://en.wikipedia.org/w/index.php?title=Amy_Wilentz&ol...). Would that fix alone have been enough to fix her snippet?

I'm not close enough to know what fixed it, but I agree that the non-standardized birthdate probably didn't help.

As Amy noted in her piece: "My Wikipedia entry, oddly, was put up by Cousin Joel, who has a genealogy obsession .... Joel began the entry with my connection to my father, and immediately mentioned my father’s birthdate and the date of his death."

I wonder how many of these items are manual fixes? It would be interesting to see the number of them adjusting the index.

This seems to be Google SOP for big data sets like this, and for the most part it works okay. The other obvious similarity is Maps. Pull what data you can from easily available sources, and then crowdsource the details or inaccuracies.


it's mentioned in the post that the author reported the facts a number of times.

It has been corrected since then - the post is from February 28th, not sure why it's just now hitting HN.

This isn't a real-time complaint.

You can always check the data in Freebase: http://www.freebase.com/m/09v5l5c?links&lang=en&hist...

She probably already knows, so it is somewhat disrespectful that she has not updated the entry to say that Google already fixed her public profile.

Given that she comes across as non-technical and not knowledgeable about how these things work, I think that's quite harsh. Maybe she simply hasn't checked her listing since the article was posted.

Great article, really well written!

This is what happens when the world is ruled by a robot

I take it OP hasn't had the delightful pleasure of dealing with old-fashioned human bureaucracy. My uncle tells some great stories about his time in East Berlin, before the Wall fell.

A great example is the post office that did some renovations, and so delivered the old copper and other materials for reutilization. From then on, the System cleverly decided the yearly productivity goals should include delivering the same amount of copper, each year.

The post office master had to figure out how to deliver the copper for a couple of years, in order to avoid failure in meeting the obviously correct goals, until he finally managed to change them.

So, yeah, at least Google's algorithms have the excuse of not being sentient.

I can't seem to find a good canonical reference at the moment, but there's some interesting writing in organizational theory about the extent to which some of those kinds of bureaucracies are themselves basically mechanical systems, controlled by code that just happens to be manually executed by human operators. In general, code executed by computers tends to be executed more "rigidly", but in some systems humans can also be quite automaton-like in just executing instructions. In those cases it's not clear the human operators, rather than the algorithms they're executing, are in control of the decisions (it may not even be true that any one human knows what algorithm is being executed). Of course, you could argue someone can always change the bureaucratic protocol, but that's even true with computer code: someone at Google can always edit the code, at some times more easily than others.

I had this exact thought while walking to work this morning.

It came to me that the vast majority of modern society is completely independent of the actual people executing it. There are very few jobs that could not be done just as well by another person with the same training.

Most jobs are like coded modules. They have well-defined inputs and outputs - even creative jobs can be defined in terms of "create something people will pay for".

The system is what runs the world, not the individual people who carry it out.

I had an interesting hallway discussion with a SAP integrator who mentioned that SAP sort of starts from this view, but then weirdly feeds back in on itself. One of the ideas of SAP is that business processes do have the kinds of pluggable, formalizable patterns you mention, which can be modeled within SAP. And the official idea is that you go from documenting these processes to formalizing them in code. But then over the years SAP's code starts to become a sort of de-facto framework for "this is how you run a company", and people actually start patterning business processes on how SAP models business processes.

The source code for those kinds of systems is commonly referred to as, "policy."

Policy is the documentation, not the source code.

You can read policy. It may not be accurate.

Some analogies can only be taken so far

Do you have a reading list you could share?

To look at popular culture, the entire article reminded me of the plot of Brazil, and the protagonist struggling with the rigid human bureaucracy that refused to accept that it had made an error.

Natural language parsing is not 100% accurate and can make mistakes - breaking news!

All kinds of automated data collection such as this will have errors that need to be fixed manually by a human. This will be taken care of if/when we eventually have real artificial intelligence. The data parsing scripts Google and others use are fairly advanced, but they still break down to a couple rules and finding words in close proximity. There is no real intelligence involved here.

EDIT: The important takeaway here is to make sure we continue to educate people (especially government) as to the issues and unreliability of these systems. Death date showing up on Google isn't a train smash, but if some government agency decides that Google is smarter than them? Then we are in trouble.

> This will be taken care of if/when we eventually have real artificial intelligence

Considering that "non-artificial intelligence" makes the same kind of mistakes, I find that an odd claim.

Mistakes happen. Quite frankly, far fewer of them happen in the modern world, due precisely to technology like this. What has changed is actually the converse: automation has become so good that we expect it to be flawless. No one would have found it "surprising" (annoying, sure, unless it was in the context of a zany madcap comedy film) in 1962 to be mixed up with someone else's biography based on a name confusion.

You're completely correct, great points. It's very easy to fall into the trap of 'human-like intelligence will fix it' when thinking of better machines, but it's fairly clear that human-like intelligence has a lot of failings too.

even if the hole knowledge graph would be filled out by humans, there would still be errors in it, thats just how it works with this sort of big data. No way you could design a computer algorithm which does this job without a single error

I find perverse pleasure in your ironic error, i.e. a hole in your knowledge (or potentially typing skills) related to the distinction between "hole" and "whole".

I actually tried to imagine a hole-based knowledge graph... it sounds really interesting.

there would always be some holes in the graph

AI make more mistakes than humans at some things, but only because, like humans, they are just beginning to understand. I might make similar mistakes reading articles written in German, because I am not yet fully fluent in the language. Similarly, NLP AI is currently not "fluent" in English.

It is interesting that google almost seems to take Wikipedia bio boxes as gospel, which is a questionable decision given that Google "rich snippets" in the search results now have such high placement. I wonder how much Wikipedia specific logic is used in the googlebot...I'm guessing that longevity of article and number of edits is a factor in taking an entry seriously.

> It is interesting that google almost seems to take Wikipedia bio boxes as gospel

FTA, I don't believe the data was taken from a bio box:

> So it’s not too surprising that my original Wikipedia entry, as conceived by Joel, was — let’s be honest — more about my father (a famous New Jersey judge) than about me. Joel began the entry with my connection to my father, and immediately mentioned my father’s birthdate and the date of his death.

That's the scary part, as she alludes to in the next paragraph:

> If your name on Wikipedia is followed by a birth and death date, apparently those belong to you from that day forward, no matter whose dates they may be.

Don't know what is so scary about that. I am willing to bet less than 0.01% of "name[s] on Wikipedia..followed by a birth and death date" aren't the actual birth and death date.

Seems like just an odd match caused by an overzealous relative that happened due to the incredibly large amount of data Google processes.

You need to be much more careful in wording your bets.

I think you meant to bet that the simple solution is almost always the correct way to parse the Wikipedia article, but that's not actually what you bet. What you really bet was P(correct_parse) * P(correct_data) + (1 - P(correct_parse)) * P(wrong_parse_accidentally_giving_correct_data) > 0.9999

A 0.01% error rate is incredibly incredibly accurate. I imagine the error rate of birthdays in Wikipedia bio boxes is higher than 0.0001. Note that, for instance, Wikipedia had a wrong birthday for Jimmy Wales. http://blog.oregonlive.com/siliconforest/2007/07/on_wikipedi...

I _might_ believe that reading a name followed by a birth date and death date as that person's own birth and death dates is the correct parsing of the Wikipedia data over 99.99% of the time. However, I doubt your proposed bet would pay off.
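
To make the wording of that bet concrete, here is a minimal Python sketch of the expression above. The function name and all three input probabilities are illustrative assumptions, not measured values; the point is only that the product drops below the bet's threshold as soon as the underlying data is less than near-perfect.

```python
def p_displayed_dates_correct(p_parse: float, p_data: float, p_lucky: float) -> float:
    """Probability that the dates shown after a name really belong to that person.

    p_parse: chance that "dates after the name belong to the name" is the right parse
    p_data:  chance that the dates in the article are themselves accurate
    p_lucky: chance that a wrong parse still lands on the correct dates
    """
    return p_parse * p_data + (1.0 - p_parse) * p_lucky


# Hypothetical inputs: a 99.99% parse rate, 99.9% accurate data, no lucky misparses.
BET_THRESHOLD = 0.9999
p = p_displayed_dates_correct(p_parse=0.9999, p_data=0.999, p_lucky=0.0)
print(f"P(correct) = {p:.6f}, bet pays off: {p > BET_THRESHOLD}")
```

With these assumed inputs the combined probability is roughly 0.9989, so the bet as worded would lose even though the parse itself is right 99.99% of the time.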

I would still stick by my bet as worded. For every Jimmy Wales, there are tens of thousands of people who have no odd circumstances. Combined with the much higher volume of data about recent and more verifiable people than those from a time where records were fuzzier and the fact that Wikipedia is a moving target regarding accuracy, I don't think 1 in 10,000 is unreasonable. Just the pages of every athlete competing in a major league currently would offset most anomalies.

Scary because it takes a cry to the internet to get it corrected. While the issue of being declared dead is somewhat innocuous, it could've been much worse -- what if her dad was a convicted sexual offender, and Google chose to associate that with her bio? What do you think that'd do for any job prospects, especially if she wasn't aware that it was present? It could be downright devastating.

Since it is not opt-in (nor is it opt-out), YOU have no control of it. As the author stated, she requested many times, over several months, to have it reviewed and it wasn't until she clamored about it publicly that it was addressed. What if Google never did correct it?

I'm reading Jaron Lanier's _You Are Not a Gadget_ now (came out two years ago), and the phrase "but if some government agency decides that Google is smarter than them?" resonates. A major insight of this book is that, in fact, the dominant technological ideology does advance the idea that Google and Wikipedia are smarter than actual people. And we are, in fact, in deep trouble because of this.

Not sure if it is still the case, but didn't Google use to list the extraction sources? I guess the work that was done with Google Squared also fed into these boxes? I get the impression then that they do use more than one source if possible.

It is a very hard one as the vast majority of the time the text features observed would indicate a date of birth/ death. I guess if the date is not in the metadata/ sidebar on Wikipedia they could flag it and hopefully find another source to correlate with.

It's likely that Google is querying DBPedia.org instead of trying to manually parse plain text on Wikipedia.

Curiously, a couple of weeks since that post was made, Google have obviously updated their results, I no longer get a "Died:" record when I search for "amy wilentz".

Perhaps another case of Google ignoring you until you get a bit of publicity, or just them finally catching up.

It reminds me of an anecdote from Steven Levy's "In the Plex" on a sneaker wearing gnome that was inexplicably making a mockery of Froogle results:

> But one problem was so glaring that the team wasn’t comfortable releasing Froogle: when the query “running shoes” was typed in, the top result was a garden gnome sculpture that happened to be wearing sneakers. Every day engineers would try to tweak the algorithm so that it would be able to distinguish between lawn art and footwear, but the gnome kept its top position.

> One day, seemingly miraculously, the gnome disappeared from the results. At a meeting, no one on the team claimed credit. Then an engineer arrived late, holding an elf with running shoes. He had bought the one-of-a-kind product from the vendor, and since it was no longer for sale, it was no longer in the index. “The algorithm was now returning the right results,” says a Google engineer. “We didn’t cheat, we didn’t change anything, and we launched.”


Taking your analogy to its logical conclusion, Google should have hired a looper to kill Wilentz in the past.

Hah, what a brilliant anecdote! Thanks

Just wait till she finds her insurance company or local government has wrong information and see how long it'll take to get a fix. Years ago, the bank which had my car loan got sold to another bank which got sold to another bank. When I finished paying off the loan, they sent documents I needed to the wrong address. I had moved to a different state and the bank had as well, and it took weeks to correct this. I actually had to contact the DMV at my former state and get them to send a letter to the new bank, which then released records, which then allowed me to update my DMV in CA. I won't even get into the mistakes that Sallie Mae made on my student loan records, or how many times my health insurance provider screwed up.

Machines screw up, but humans screw up too, and sometimes getting the problem fixed in a bureaucracy takes even more effort. I don't think bureaucracies "refresh" their policies/procedures (algorithms) or data as fast as software companies do.

It is only a matter of time before they start to use Google for this information, they already use it for mapping, and have made some mistakes. The fictitious island is one example.

I've been curious to learn how these knowledge boxes work, ever since I saw my dog Lucy show up as the star photo in a search for Rat Terrier:


In fact, on a mobile device she's the only photo. Yes!

Are these generated algorithmically? Some amount of manual curation? Or what?

Of course it is humbling to discover that my dog is much more famous than I am...

Looks like Lucy is the primary exemplar picture for a rat terrier on Wikipedia? http://commons.wikimedia.org/wiki/File:Lucy,_a_Rat_Terrier.j...

P.S. She's a cute dog!

Thanks, Matt, I certainly agree that Lucy deserves every bit of her fame. :-)

Yes, it must be the Wikipedia article. (Of course I'm the one who put her there.)

It was interesting to note that the aspect ratio of her photo fits the tall photo space in the knowledge box better than most dog photos, which tend to be more horizontal.

I'd add this to Wikipedia's List of premature obituaries: https://en.wikipedia.org/wiki/List_of_premature_obituaries#W

... except that you can't cite the subject's own blog as a reliable source, particularly when they apparently wrote it post-mortem.

Amy Wilentz also published her essay on the Huffington Post: http://www.huffingtonpost.com/amy-wilentz/how-google-killed-... Not sure whether that helps or not though.

Here's the edit history for the page, where you can see all the edits user Amywilentz made over the last few years. https://en.wikipedia.org/w/index.php?title=Amy_Wilentz&a...

It's not too surprising that google dropped the ball. This entry that she describes doesn't even have her birth year shown, but instead lists a dozen other years. That's rather odd for a short bio. Maybe Joel was trying to be polite... never ask a lady her age? https://en.wikipedia.org/w/index.php?title=Amy_Wilentz&o...

>An error in a Google search “factbox” can only be corrected when Google re-indexes (whatever that means) the information that will update the search. Depending on the size of your website, re-indexing takes either a couple of days, or several months. Like good guys, small websites finish last. Note: my website is small.

I think she misunderstands here. If the data is scraped from wikipedia, it really has nothing to do with her site, right?

Wikipedia has to have sources.

Those are the edits for Amy Wilentz's page actually.

Her contributions are here:


> It all comes down to Google’s algorithm (a word I use carelessly, and frequently, but whose meaning is obscure to me, though I feel it is something mathematical)

Loved this. Amusing reminder of the fact that most people do not speak our language. :)

the one that startled me was "can only be corrected when Google re-indexes (whatever that means) the information that will update the search". it would never have struck me that "reindex" was an overly jargonny term, but in retrospect i guess it is.

I was struck by the same thing. The surprising thing to me was that a journalist/writer didn't figure out the nature of the process given that presumably they would understand the idea of recreating the index of a physical book.

AfD? Is she really notable? (daughter of a state judge seems to be her primary accomplishment; also is an English professor).

“Professor of English at the University of California, Irvine, where she teaches in the Literary Journalism program. Her works appeared in The New York Times, The Los Angeles Times, Time magazine, The New Republic, Mother Jones, Harper’s, Vogue, Condé Nast Traveler, Travel & Leisure, The San Francisco Chronicle, More, The Village Voice, The London Review of Books, Huffington Post. She was Jerusalem correspondent of The New Yorker, and is currently a contributing editor at The Nation.”


I don't think "author" counts. I've written a book but I didn't get a Wikipedia page.

This post was really well written. Just a pleasure on the eyes.

I loved it too, up until this part:

"I’m reading Kafka’s Castle right now — which itself may kill me"

Kafka is one of my favorite authors and The Castle is one of my favorite novels :( But a well-written essay nonetheless ;)

I was just watching this earlier. Had me in tears. http://www.theonion.com/video/pragues-franz-kafka-internatio...

omg! That was hilarious! Thanks for sharing that! :)

>It all comes down to Google’s algorithm (a word I use carelessly, and frequently, but whose meaning is obscure to me, though I feel it is something mathematical)...

It seems to me that people are comfortable not understanding computer technology and indignant when they are forced to be aware of the details. I doubt, for instance, you would ever see this sentence:

"It all comes down to the car's engine (a word I use carelessly, and frequently, but whose meaning is obscure to me, though I feel it is probably part of the car)..."

It would be silly, on its face, to display ignorance of such a common part of the vehicle(s) that you use to get around. However, it's seen as normal to be totally disinterested in the inner workings of computers (despite their ubiquity). I'm sure this was true for cars and other pieces of technology at some point, but I don't know how long ago that was. I also don't know what caused it to change.

I feel like, if people were a little more aware of what happens "under the hood" of computing products, they would be more interested in the areas of concern for the tech community (what does company X do with my information? How well is my data protected? What limitations are built into this device?).

She is complaining about how her "true" life doesn't reflect the facts that Google says about her. This was a problem before the internet, before computers, before TV even.

We have all been subjected to our public persona in high school or in our home town. It is not accurate. Sometimes, it is the complete opposite of the truth.

So, I don't know why this author thinks it should be different with Google. Google gets a lot of things right, it gets a lot of things wrong as well. I don't worry about what a search engine says about me.

I know what I am. I am not what Google says and I am not what my high school yearbook says either.

It's interesting to think about the consequences of dissemination of wrong information about yourself online (as a "non-important" person) by a largely well-respected outlet such as Google. As a far-fetched example, it's not uncommon for identity thieves to use a dead person's identity - and if the author applied for a job and the employer Googled her name... you can see where this is heading.

I suspect there might be some liability issues about incorrect information online, perhaps as a result of other cases with more dire consequences than simply not getting a job.

I'll point out that her Freebase entry (sourced by Wikipedia) contained exactly the incorrect data until it was fixed.

See http://www.freebase.com/m/09v5l5c?links&lang=en&hist...

Freebase, owned by Google, is part of its structured data efforts but few people remember this :)

Contributing structured data is awesome!

(former Metaweb/Freebase employee, current Googler)

Is she not aware that anyone can edit Wikipedia?

I believe she's referring to the fact that if you are the owner of the domain with the erroneous information, you can make a request to Google to re-crawl the information using Google's Webmaster Tools. Regular users can't make that request, so under normal circumstances, she would've had to wait until Google gets around to it on their own schedule.

The problem here is that Google apparently isn't aware, not her.

It's a social convention that you don't edit your own Wikipedia page, too.

Check the history: She edited her own page many times.

I wonder if this would suit the criterion for defamation. That would be a really interesting case to see.

To get damages she would need to demonstrate harm. But if she could, this becomes interesting, as Google cannot hide behind the usual search-engine defense: they are not merely linking to sites here, but making up their own facts. (IANAL.)

I think Google would be fine in this case, if we're assuming American law. Wilentz arguably qualifies as a "public figure", so it would have to be proven that Google showed gross negligence and/or malicious intent in getting the dates wrong. Assuming the googlebot did as it normally does, Google has an even stronger case that they aren't intentionally being bad actors, as the googlebot gets so many other things correct with its algorithm.

No, I didn't read blogs from dead people before, but I have read articles with a larger font size.

Well, identity is still a problem. Hilary Mason had a similar problem on Bing a couple of months ago. http://www.hilarymason.com/blog/im-a-dead-celebrity/

Old, and I don't particularly agree with it, but: http://www.theregister.co.uk/2007/12/14/googlepedia_announce...

Handling death in databases of living people is a big problem. You know it's going to happen to everyone you're tracking, sooner or later. But they never seem to update their info when it does.

I wonder if natural language parsing will pick up this article and soon her profile will read that she was killed by Google. RIP Amy Wilentz, you will be missed.

With all the real world relation databases, they still parse wikipedia, badly.

That's our real world object relation databases status, unfortunately.

It is refreshing to read the perspective of how a non-techie views Google and tech. It's a viewpoint that's important to remember.

Funny, slightly off topic, but still relevant fact: When I google my name, the first page is about a sex offender living in Texas.

please increase the font size on your site. a lovely read for its content, but a painful process of consuming it.

:) hahaha every one makes mistakes .. even google

What a bad article...

The author talks about something (google indexing) she admits she doesn't understand.

She doesn't understand that by editing her Wikipedia article, the google crawler would update her infobox data pretty quickly.

Her birth data was added to the wikipedia entry today and the google infobox is fixed. It took less than 2 hours. Not something worth whining about.

Yes, the fact that a reasonably well-respected writer and English professor doesn't understand the inner workings of Google's infobox means that her blog post is terrible.

This arrogant and dismissive response highlights the problem even better than her mild and humorous complaint (which, for the record, I didn't see as whining).

Perhaps she did understand that editing her Wikipedia page would correct the problem but also understood that Wikipedia's policies frown on editing one's own page, even to correct factual errors such as birth and death dates. Or perhaps she was entirely ignorant that she could even edit Wikipedia. Or perhaps she knew but didn't care and only wanted to write a humorous and potentially thought-provoking blog post.

Why are you so quick to defend an algorithm which produced a wrong answer and detract a reasonable and intelligent human being?

Well, the Wikipedia article's history shows that she edited her own article quite a few times, so that doesn't seem to be the problem.

I don't see how the error detracted her in any way, and, as I highlighted it, it was a quick 2-minute fix.

I also don't think the Google algorithm is to blame. If her Wikipedia entry had followed the style guidelines ( http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Biogr... ), her birth date would have been parsed correctly.

For the record, I didn't find her post funny or even particularly well-written.

I guess if there's something that can be taken from this article it's "Program or be Programmed". The author didn't understand the inner workings of the Google Factbox data, so she assumed computers control her identity and her online information. However, with a little more computer knowledge, you can figure out how to control this data yourself.

Humans control the computers; it's not the other way around.

The Google algorithm is to blame. This was probably the version where the false data came from: https://en.wikipedia.org/w/index.php?title=Amy_Wilentz&d.... There is no big mistake in the style, it's just that the birthday is missing.

The Google algorithm took the results from the middle of the text, even though the birthdates are always right behind the name. I guess they did something like take the first dates instead of just focusing on the part behind the name. This way the algorithm was more flexible but on the other hand as we see more likely to make mistakes.
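
The trade-off described above can be sketched in a few lines of Python. This is only an illustration of the two heuristics, not Google's actual parser: the function names, the regular expression, and the sample sentence about a fictional "Jane Doe" are all assumptions made up for the example.

```python
import re

# Matches a parenthesised year range such as "(1919-1988)"; an en dash is also accepted.
YEAR_RANGE = re.compile(r"\((\d{4})\s*[\u2013-]\s*(\d{4})\)")


def first_dates_anywhere(text):
    """Flexible heuristic: take the first year range found anywhere in the text."""
    m = YEAR_RANGE.search(text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def dates_right_after_name(text, name):
    """Strict heuristic: accept a year range only if it directly follows the name."""
    m = re.match(re.escape(name) + r"\s*" + YEAR_RANGE.pattern, text)
    return (int(m.group(1)), int(m.group(2))) if m else None


# Made-up lead sentence: the only dates in it belong to the subject's father.
lead = "Jane Doe is a writer and the daughter of John Doe (1919-1988), a judge."

print(first_dates_anywhere(lead))                # the father's dates get attached to Jane
print(dates_right_after_name(lead, "Jane Doe"))  # no dates are claimed for Jane
```

The flexible version copes with entries that don't follow the style guide, but it is exactly the one that hands a living person someone else's death date; the strict version simply returns nothing when the dates aren't where it expects them.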

"Wikipedia's policies frown on editing one's own page, even to correct factual errors"

Indeed, Wikipedia editors have informed notable people that they are not authoritative sources for information about themselves and should not correct mistakes on the site about their own lives.

I don't understand how the engine in my car works.

I might complain to my mechanic that it won't start. To him, diagnosing a bad spark plug and swapping in a new one is a simple matter of five minutes, but to me it might as well be heart surgery. That doesn't mean I was wrong to complain.

She knew something was wrong, but admittedly didn't understand all the details. She complained to bring it to the internet's attention, after which it was swiftly fixed. This doesn't mean she's done anything foolish or wrong - she was simply pointing out a problem.

I don't think she is really whining; she's just using dry humor.

And she is raising an important point, and one that keeps recurring with Google: what do you do when their automated processes are not enough?

She should not have to edit Wikipedia to resolve this situation. That is not a reasonable expectation of anyone.

In fact, she chose the only solution that usually gets through to Google: blog/tweet about something and get enough exposure.

Actually it's a great article for exactly the reason you give for it being a bad article.

She has no idea how to tell Google to stop saying to the world she is dead. There are "this is wrong" buttons that she clicks that don't do anything, apparently.

This is actually the point of her article. She doesn’t know how to fix the problem and Google (the robot) isn't providing tools to fix it.

Yes, you and I know that if she goes to a high traffic site such as wikipedia, google will scan that very quickly. But that's a hack, albeit a very simple one, but still a hack requiring a lot of knowledge of how Google, and the web, works.

No, this is what a normal person would do because they don't understand google indexing.

Normal people look at the piece of the world that is wrong. If given the opportunity, they click on that same piece of world (the "infobox" in this case) to fix it. Google doesn't provide an indication in the infobox where they are getting their "facts". Other than talking to people (something not available to everyone), how would a person know what to do to change this?

Second, it was my understanding that editing your own wikipedia entry is not acceptable.

> What a bad article...

But a good reminder of just how little understanding most people have of The Internet.
