I believe this has been fixed for a while now. It was fixed by the time I reported this over a week ago, for example.
If you see an error in a Knowledge Panel (the box on the right-hand side), look for the "Feedback / More info" link at the bottom of the box and then you can click next to a fact to report a problem. People do review those reports, and that's the fastest way to report an issue.
Apologies. I posted this because it came up recently in a media blog [insert joke about media being slow].
So it was a manual fix that corrected the issue? I guess that makes sense given that her Wikipedia entry didn't have the standardized birthdate listing (at least in the first line) until today (http://en.wikipedia.org/w/index.php?title=Amy_Wilentz&ol...). Would that fix alone have been enough to fix her snippet?
I'm not close enough to know what fixed it, but I agree that the non-standardized birthdate probably didn't help.
As Amy noted in her piece: "My Wikipedia entry, oddly, was put up by Cousin Joel, who has a genealogy obsession .... Joel began the entry with my connection to my father, and immediately mentioned my father’s birthdate and the date of his death."
This seems to be Google SOP for big data sets like this, and for the most part it works okay. The other obvious similarity is Maps. Pull what data you can from easily available sources, and then crowdsource the details or inaccuracies.
This is what happens when the world is ruled by a robot.
I take it OP hasn't had the delightful pleasure of dealing with old-fashioned human bureaucracy. My uncle tells some great stories about his time in East Berlin, before the Wall fell.
A great example is the post office that did some renovations and so turned in the old copper and other materials for reuse. From then on, the System cleverly decided that the yearly productivity goals should include delivering the same amount of copper every year.
The postmaster had to figure out how to deliver the copper for a couple of years, in order to avoid failing to meet the obviously correct goals, until he finally managed to get them changed.
So, yeah, at least Google's algorithms have the excuse of not being sentient.
I can't seem to find a good canonical reference at the moment, but there's some interesting writing in organizational theory about the extent to which some of those kinds of bureaucracies are themselves basically mechanical systems, controlled by code that just happens to be manually executed by human operators. In general, code executed by computers tends to be executed more "rigidly", but in some systems humans can also be quite automaton-like in just executing instructions. In those cases it's not clear the human operators, rather than the algorithms they're executing, are in control of the decisions (it may not even be true that any one human knows what algorithm is being executed). Of course, you could argue someone can always change the bureaucratic protocol, but that's even true with computer code: someone at Google can always edit the code, at some times more easily than others.
I had this exact thought while walking to work this morning.
It came to me that the vast majority of modern society is completely independent of the actual people executing it. There are very few jobs that could not be done just as well by another person with the same training.
Most jobs are like coded modules. They have well-defined inputs and outputs - even creative jobs can be defined in terms of "create something people will pay for".
The system is what runs the world, not the individual people who carry it out.
I had an interesting hallway discussion with a SAP integrator who mentioned that SAP sort of starts from this view, but then weirdly feeds back in on itself. One of the ideas of SAP is that business processes do have the kinds of pluggable, formalizable patterns you mention, which can be modeled within SAP. And the official idea is that you go from documenting these processes to formalizing them in code. But then over the years SAP's code starts to become a sort of de-facto framework for "this is how you run a company", and people actually start patterning business processes on how SAP models business processes.
Natural language parsing is not 100% accurate and can make mistakes - breaking news!
All kinds of automated data collection like this will have errors that need to be fixed manually by a human. That will be taken care of if/when we eventually have real artificial intelligence. The data-parsing scripts Google and others use are fairly advanced, but they still boil down to a couple of rules and finding words in close proximity. There is no real intelligence involved here.
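To make "a couple of rules and words in close proximity" concrete, here is a minimal sketch of that style of extraction. The function, regex, and sample text are all illustrative assumptions, not Google's actual pipeline; the point is only that a proximity rule happily grabs whichever dates appear near the name.

```python
import re

# Rule: find the subject's name, then take the first "(YYYY-YYYY)" pair
# that appears within a window of characters after it.
DATE_PAIR = re.compile(r"\((\d{4})\s*-\s*(\d{4})\)")

def naive_life_dates(text: str, name: str, window: int = 200):
    """Return the first (birth, death) year pair found near `name`,
    whether or not the dates actually belong to that person."""
    idx = text.find(name)
    if idx == -1:
        return None
    nearby = text[idx : idx + window]
    m = DATE_PAIR.search(nearby)
    return (int(m.group(1)), int(m.group(2))) if m else None

# The failure mode from the article: the first dates near the subject's
# name belong to her father (the sentence below is made up for illustration).
article = ("Amy Wilentz is a writer. Her father, Robert Wilentz "
           "(1927-1996), was a famous New Jersey judge.")
print(naive_life_dates(article, "Amy Wilentz"))  # (1927, 1996) -- wrong person
```

A rule like this works on the vast majority of well-formatted bios, which is exactly why its rare failures look so baffling from the outside.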
EDIT: The important takeaway here is to make sure we continue to educate people (especially government) as to the issues and unreliability of these systems. Death date showing up on Google isn't a train smash, but if some government agency decides that Google is smarter than them? Then we are in trouble.
> This will be taken care of if/when we eventually have real artificial intelligence
Considering that "non-artificial intelligence" makes the same kind of mistakes, I find that an odd claim.
Mistakes happen. Quite frankly, far fewer of them happen in the modern world, precisely because of technology like this. What has changed is actually the converse: automation has become so good that we expect it to be flawless. No one in 1962 would have found it "surprising" (annoying, sure, unless it was in the context of a zany madcap comedy film) to be mixed up with someone else's biography because of a name confusion.
You're completely correct, great points. It's very easy to fall into the trap of 'human-like intelligence will fix it' when thinking of better machines, but it's fairly clear that human-like intelligence has a lot of failings too.
Even if the whole knowledge graph were filled out by humans, there would still be errors in it; that's just how it works with this sort of big data. There's no way you could design a computer algorithm that does this job without a single error.
AIs make more mistakes than humans at some things, but only because, like humans, they are just beginning to understand. I might make similar mistakes reading articles written in German, because I am not yet fully fluent in the language. Similarly, NLP AI is currently not "fluent" in English.
It is interesting that Google almost seems to take Wikipedia bio boxes as gospel, which is a questionable decision given that Google "rich snippets" in the search results now have such high placement. I wonder how much Wikipedia-specific logic is used in the googlebot... I'm guessing that the longevity of an article and its number of edits are factors in taking an entry seriously.
> It is interesting that google almost seems to take Wikipedia bio boxes as gospel
FTA, I don't believe the data was taken from a bio box:
> So it’s not too surprising that my original Wikipedia entry, as conceived by Joel, was — let’s be honest — more about my father (a famous New Jersey judge) than about me. Joel began the entry with my connection to my father, and immediately mentioned my father’s birthdate and the date of his death.
That's the scary part, as she alludes to in the next paragraph:
> If your name on Wikipedia is followed by a birth and death date, apparently those belong to you from that day forward, no matter whose dates they may be.
You need to be much more careful in wording your bets.
I think you meant to bet that the simple solution is almost always the correct way to parse the Wikipedia article, but that's not actually what you bet. What you really bet was P(correct_parse) * P(correct_data) + (1 - P(correct_parse)) * P(wrong_parse_accidentally_giving_correct_data) > 0.9999
I _might_ believe that "a name followed by a birth date and a death date means those are that person's dates" is the correct parsing of the Wikipedia data over 99.99% of the time. However, I doubt your proposed bet would pay off.
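Plugging made-up numbers into the formula above shows why the two formulations of the bet differ: even a near-perfect parser is capped by the accuracy of the underlying data. All three probabilities below are assumptions chosen purely for illustration.

```python
# Rough arithmetic on the bet as decomposed above (all values assumed).
p_correct_parse = 0.9999     # the parse rule picks the right dates
p_correct_data = 0.999       # Wikipedia's own dates are correct
p_lucky_wrong_parse = 0.01   # a wrong parse happens to yield right dates

p_bet = (p_correct_parse * p_correct_data
         + (1 - p_correct_parse) * p_lucky_wrong_parse)
print(p_bet)  # 0.9989011 -- below 0.9999, because data errors dominate
```

Under these assumptions the bet loses: the 0.1% data-error rate, not the parser, is what keeps the total under 0.9999.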
I would still stick by my bet as worded. For every Jimmy Wales, there are tens of thousands of people with no odd circumstances. Combined with the much higher volume of data about recent, more verifiable people than those from a time when records were fuzzier, and the fact that Wikipedia is a moving target regarding accuracy, I don't think 1 in 10,000 is unreasonable. Just the pages of every athlete currently competing in a major league would offset most anomalies.
Scary because it takes a cry to the internet to get it corrected. While the issue of being declared dead is somewhat innocuous, it could've been much worse -- what if her dad was a convicted sexual offender, and Google chose to associate that with her bio? What do you think that'd do for any job prospects, especially if she wasn't aware that it was present? It could be downright devastating.
Since it is not opt-in (nor is it opt-out), YOU have no control of it. As the author stated, she requested many times, over several months, to have it reviewed and it wasn't until she clamored about it publicly that it was addressed. What if Google never did correct it?
I'm reading Jaron Lanier's _You Are Not a Gadget_ now (came out two years ago), and the phrase "but if some government agency decides that Google is smarter than them?" resonates. A major insight of this book is that, in fact, the dominant technological ideology does advance the idea that Google and Wikipedia are smarter than actual people. And we are, in fact, in deep trouble because of this.
Not sure if it's still the case, but didn't Google use to list the extraction sources? I guess the work that was done on Google Squared also fed into these boxes? I get the impression, then, that they use more than one source where possible.
It's a very hard problem, as the vast majority of the time the text features observed really would indicate a date of birth/death. I guess if the date is not in the metadata/sidebar on Wikipedia, they could flag it and hopefully find another source to corroborate it.
It reminds me of an anecdote from Steven Levy's "In the Plex" on a sneaker wearing gnome that was inexplicably making a mockery of Froogle results:
> But one problem was so glaring that the team wasn’t comfortable releasing Froogle: when the query “running shoes” was typed in, the top result was a garden gnome sculpture that happened to be wearing sneakers. Every day engineers would try to tweak the algorithm so that it would be able to distinguish between lawn art and footwear, but the gnome kept its top position.
> One day, seemingly miraculously, the gnome disappeared from the results. At a meeting, no one on the team claimed credit. Then an engineer arrived late, holding an elf with running shoes. He had bought the one-of-a kind product from the vendor, and since it was no longer for sale, it was no longer in the index. “The algorithm was now returning the right results,” says a Google engineer. “We didn’t cheat, we didn’t change anything, and we launched.”
Just wait till she finds her insurance company or local government has wrong information and sees how long it takes to get a fix. Years ago, the bank that had my car loan got sold to another bank, which got sold to another bank. When I finished paying off the loan, they sent documents I needed to the wrong address. I had moved to a different state and the bank had as well, and it took weeks to correct this. I actually had to contact the DMV in my former state and get them to send a letter to the new bank, which then released records, which then allowed me to update my DMV records in CA. I won't even get into the mistakes that Sallie Mae made on my student loan records, or how many times my health insurance provider screwed up.
Machines screw up, but humans screw up too, and sometimes getting the problem fixed in a bureaucracy takes even more effort. I don't think bureaucracies "refresh" their policies/procedures (algorithms) or data as fast as software companies do.
It's not too surprising that Google dropped the ball. The entry that she describes doesn't even have her birth year shown, but instead lists a dozen other years. That's rather odd for a short bio. Maybe Joel was trying to be polite... never ask a lady her age? https://en.wikipedia.org/w/index.php?title=Amy_Wilentz&o...
>An error in a Google search “factbox” can only be corrected when Google re-indexes (whatever that means) the information that will update the search. Depending on the size of your website, re-indexing takes either a couple of days, or several months. Like good guys, small websites finish last. Note: my website is small.
I think she misunderstands here. If the data is scraped from wikipedia, it really has nothing to do with her site, right?
The one that startled me was "can only be corrected when Google re-indexes (whatever that means) the information that will update the search". It would never have struck me that "reindex" was an overly jargony term, but in retrospect I guess it is.
I was struck by the same thing. The surprising thing to me was that a journalist/writer didn't figure out the nature of the process given that presumably they would understand the idea of recreating the index of a physical book.
“Professor of English at the University of California, Irvine, where she teaches in the Literary Journalism program. Her works appeared in The New York Times, The Los Angeles Times, Time magazine, The New Republic, Mother Jones, Harper’s, Vogue, Condé Nast Traveler, Travel & Leisure, The San Francisco Chronicle, More, The Village Voice, The London Review of Books, Huffington Post. She was Jerusalem correspondent of The New Yorker, and is currently a contributing editor at The Nation.”
>It all comes down to Google’s algorithm (a word I use carelessly, and frequently, but whose meaning is obscure to me, though I feel it is something mathematical)...
It seems to me that people are comfortable not understanding computer technology and indignant when they are forced to be aware of the details. I doubt, for instance, you would ever see this sentence:
"It all comes down to the car's engine (a word I use carelessly, and frequently, but whose meaning is obscure to me, though I feel it is probably part of the car)..."
It would be silly, on its face, to display ignorance of such a common part of the vehicle(s) that you use to get around. However, it's seen as normal to be totally uninterested in the inner workings of computers (despite their ubiquity). I'm sure this was true for cars and other pieces of technology at some point, but I don't know how long ago that was. I also don't know what caused it to change.
I feel like, if people were a little more aware of what happens "under the hood" of computing products, they would be more interested in the areas of concern for the tech community (what does company X do with my information? How well is my data protected? What limitations are built into this device?).
She is complaining about how her "true" life doesn't reflect the facts that Google says about her. This was a problem before the internet, before computers, before TV even.
We have all been subjected to our public persona in high school or in our home town. It is not accurate. Sometimes, it is the complete opposite of the truth.
So, I don't know why this author thinks it should be different with Google. Google gets a lot of things right, it gets a lot of things wrong as well. I don't worry about what a search engine says about me.
I know what I am. I am not what Google says and I am not what my high school yearbook says either.
It's interesting to think about the consequences of dissemination of wrong information about yourself online (as a "non-important" person) by a largely well-respected outlet such as Google. As a far-fetched example, it's not uncommon for identity thieves to use a dead person's identity - and if the author applied for a job and the employer Googled her name... you can see where this is heading.
I suspect there might be some liability issues about incorrect information online, perhaps as a result of other cases with more dire consequences than simply not getting a job.
I believe she's referring to the fact that if you are the owner of the domain with the erroneous information, you can make a request to Google to re-crawl the information using Google's Webmaster Tools. Regular users can't make that request, so under normal circumstances, she would've had to wait until Google gets around to it on their own schedule.
To get damages she would need to demonstrate harm. But if she could, this becomes interesting, as Google can not hide behind the usual search-engine defense: they are not merely linking to sites here, but making up their own facts. (IANAL.)
I think Google would be fine in this case, if we're assuming American law. Wilentz arguably qualifies as a "public figure," so it would have to be shown that Google acted with gross negligence and/or malicious intent in getting the dates wrong. Assuming the googlebot did as it normally does, Google has an even stronger case that they aren't intentionally being bad actors, as the googlebot gets so many other things correct with its algorithm.
Yes, the fact that a reasonably well-respected writer and English professor doesn't understand the inner workings of Google's infobox means that her blog post is terrible.
This arrogant and dismissive response highlights the problem even better than her mild and humorous complaint (which, for the record, I didn't see as whining).
Perhaps she did understand that editing her Wikipedia page would correct the problem but also understood that Wikipedia's policies frown on editing one's own page, even to correct factual errors such as birth and death dates. Or perhaps she was entirely ignorant that she could even edit Wikipedia. Or perhaps she knew but didn't care and only wanted to write a humorous and potentially thought-provoking blog post.
Why are you so quick to defend an algorithm that produced a wrong answer and to disparage a reasonable and intelligent human being?
For the record, I didn't find her post funny or even particularly well-written.
I guess if there's something that can be taken from this article it's "Program or be Programmed". The author didn't understand the inner workings of the Google Factbox data, so she assumed computers control her identity and her online information. However, with a little more computer knowledge, you can figure out how to control this data yourself.
Humans control the computers; it's not the other way around.
The Google algorithm took the dates from the middle of the text, even though birthdates normally appear right after the name. I guess they did something like take the first dates in the article instead of focusing on the part right after the name. That made the algorithm more flexible, but, as we see, also more likely to make mistakes.
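The trade-off between the two strategies can be sketched in a few lines. This is a toy illustration with made-up bio text and regexes, not anything resembling Google's real extractor: strategy (a) takes the first date pair anywhere, strategy (b) only accepts a pair directly following the subject's name.

```python
import re

DATE_PAIR = re.compile(r"\((\d{4})\s*-\s*(\d{4})\)")

def first_dates_anywhere(text):
    # (a) Flexible: grab the first "(YYYY-YYYY)" pair in the whole text.
    m = DATE_PAIR.search(text)
    return m and (int(m.group(1)), int(m.group(2)))

def dates_right_after_name(text, name):
    # (b) Strict: the pair must come immediately after the subject's name.
    m = re.search(re.escape(name) + r"\s*" + DATE_PAIR.pattern, text)
    return m and (int(m.group(1)), int(m.group(2)))

# Invented bio mimicking the article's failure mode: the father's dates
# come first, and the subject herself has no death date at all.
bio = ("Amy Wilentz is the daughter of Robert Wilentz (1927-1996). "
       "Amy Wilentz (born 1954) is an American journalist.")
print(first_dates_anywhere(bio))                   # (1927, 1996): the father's
print(dates_right_after_name(bio, "Amy Wilentz"))  # None: nothing to mis-attribute
```

The strict version misses data on oddly formatted pages, which is presumably why a real system would prefer the flexible rule and accept occasional errors like this one.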
I don't understand how the engine in my car works.
I might complain to my mechanic that it won't start. To him, diagnosing a bad spark plug and swapping in a new one is a simple matter of five minutes, but to me it might as well be heart surgery. That doesn't mean I was wrong to complain.
She knew something was wrong, but admittedly didn't understand all the details. She complained to bring it to the internet's attention, after which it was swiftly fixed. This doesn't mean she's done anything foolish or wrong - she was simply pointing out a problem.
Actually it's a great article for exactly the reason you give for it being a bad article.
She has no idea how to tell Google to stop saying to the world she is dead. There are "this is wrong" buttons that she clicks that don't do anything, apparently.
This is actually the point of her article. She doesn’t know how to fix the problem and Google (the robot) isn't providing tools to fix it.
Yes, you and I know that if she edits a high-traffic site such as Wikipedia, Google will scan it very quickly. But that's a hack, albeit a very simple one, and it still requires a lot of knowledge of how Google, and the web, work.
No, this is what a normal person would do because they don't understand google indexing.
Normal people look at the piece of the world that is wrong. If given the opportunity, they click on that same piece of world (the "infobox" in this case) to fix it. Google doesn't provide an indication in the infobox where they are getting their "facts". Other than talking to people (something not available to everyone), how would a person know what to do to change this?
Second, it was my understanding that editing your own wikipedia entry is not acceptable.