This knowledge graph is probably the largest Bayesian network out there
Spammers will be populating the web with "facts" that suit themselves.
One of the ideas that came up in that book was that the Reticulum (Internet) was populated by "botnet ecologies" that subtly manipulated facts, streams and the like, to the point that filtering this out became another industry (of course).
I've seen the idea that this lies in our future raised here and it seems to get mocked. I think the idea has a lot of merit.
I really hope Google does not use Gmail data for projects other than ads. They really need to ask users to opt in to this kind of data sharing. I'm ok with Gmail being read for ads, but almost anything else is unethical, especially some experimental knowledge base.
It's already used by the Google Now cards on Android, and it's a fantastic feature. If I book a flight, I automatically get a card that reminds me to leave for the airport at the correct time (taking traffic into account), without any interaction on my part.
Luckily the guy who said that is from Télécom ParisTech, i.e. he was completely speculating.
Public posts from Google+ and YouTube are fine, though.
One of the best discussions bar none of this issue I've seen.
There are very good reasons why we, as a society, have agreed to disallow many activities that are physically possible. There's a good case to be made that such a rule should be explicitly added where organizations are entrusted with private data.
But that's not the world we currently live in.
The only thing that sets a limit on what Google can do with your data is the amount of data you give them. They also have terms of service and privacy policies, but these can change over time and/or be re-interpreted in creative new ways to enable whatever it is they want to do next.
However, there is a normative side to the debate as well. This is what I (and you in your first line) explicitly referred to. This side is about asking what state of the world is desirable. It is perfectly legitimate and good to ask this question, so that we might hopefully act upon the answer once it has been found. That is how progress is made in the world.
The perfect example to illustrate this is actually what waterlesscloud wrote downthread:
> If I leave some loose hairs on an airline seat, does the airline now own my dna?
Do you own your DNA? What the hell would that even mean?
Yes. Intellectual property, clean and simple. If someone can make a buck off my DNA, then I get my cut. Prevents exploitation such as this:
"Neither Lacks nor her family gave her physician permission to harvest the cells. At that time, permission was neither required nor customarily sought. The cells were later commercialized. In the 1980s, family medical records were published without family consent. This issue and Mrs. Lacks' situation was brought up in the Supreme Court of California case of Moore v. Regents of the University of California. On July 9, 1990, the court ruled that a person's discarded tissue and cells are not their property and can be commercialized."
Also you call developing a vaccine to cure Polio an exploitation? As far as I can tell from cursory reading of that article, this "exploitation" was hugely beneficial to society.
Which explains why so many people were reluctant to acknowledge the source.
If you don't like that reality then don't use their service. It really is that simple.
I don't like the reality of the US War Machine killing innocents simply to enrich crony war profiteers. By your reasoning, I should stop paying taxes too.
You can, depending on how principled you are about things like this. You'd still need to give up your American citizenship, otherwise it doesn't matter where you live on the planet.
You're basically saying I shouldn't expect any sort of fair treatment or rights from any service provider on the Internet. I don't want to play on your Internet.
It's modestly better about this than many other SaaS / PaaS providers, but not by much.
I'm having a conversation at this moment with the chief architect of G+ over the G+/YouTube Anschluss in which the two services were integrated. I had separate accounts on each prior to this, repeatedly refused to combine accounts, and yet found them combined as of last November.
Worse: individual users have little or no recourse against such actions.
As for Gmail, as has been pointed out, parties not using Google directly have their private correspondence entered into Google's systems. And not just when emailing Gmail addresses, but many domains for which email is handled via Gmail.
Similar arguments could be made for many other online service providers as well. I don't consider Google to be significantly different from many of these, either for better or worse. But they're certainly a massive and major part of the problem, particularly for their size and scope.
Bruce Schneier and Eben Moglen have made this point quite well, particularly in their December, 2013 Columbia Law School talk, and Schneier's April, 2014, Stanford Law School lecture.
Maciej Cegłowski, "The Internet with a Human Face", makes the case far better yet.
The funny thing is Doctorow makes references to "just metadata" years before it became a public issue. However, this goes beyond metadata, and will eventually contain facts about people, not just tangential stuff.
"This isn't P.I.I."—Personally Identifying Information, the toxic smog of the information age—"It's just metadata. So it's only slightly evil."
"Joe goes to the gym three times a week" is a fact.
"Joe's network activity originates from a gym on the following schedule" is not only at least an equivalent fact, in practice it's far superior to the simple case. It can give you subtleties, it's less susceptible to subterfuge, it gives you actionable evidence of specific occurrences, etc.
Consider the CIA doesn't use meta-data to target hellfire missiles because it's less identifying than actual data. They use it because it's far better.
Joe never goes to the gym on Saturday. Joe goes to the gym more during the spring than the winter. Joe almost never misses a day when Sally is at the gym. Joe and Sally nearly always leave at the same time.
It's trivial for someone to say they go to the gym on a schedule they don't. It's not even too difficult to get a second or third party to fudge, embellish or outright lie on their behalf. It's much more difficult to get a second or third party to help you make your device convincingly take the claimed routine, without you creating any conflicting meta-data that gives up the ruse.
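To make that concrete, here is a minimal sketch of how a routine and a co-presence pattern fall out of nothing but (who, where, when) records. The records, names and helper functions are all made up for illustration; this is not how any real system is known to work.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical connection metadata: (person, place, ISO timestamp).
# Note there is no message content anywhere in here.
RECORDS = [
    ("joe", "gym", "2014-03-03T18:02"),
    ("sally", "gym", "2014-03-03T18:05"),
    ("joe", "gym", "2014-03-05T18:01"),
    ("sally", "gym", "2014-03-05T18:10"),
    ("joe", "gym", "2014-03-07T18:03"),
    ("joe", "gym", "2014-03-10T18:00"),
    ("sally", "gym", "2014-03-10T18:04"),
]


def weekday_profile(records, person, place):
    """Count how often a person was seen at a place, per weekday."""
    days = Counter()
    for who, where, ts in records:
        if who == person and where == place:
            days[DAY_NAMES[datetime.fromisoformat(ts).weekday()]] += 1
    return days


def shared_days(records, a, b, place):
    """Fraction of a's visit dates on which b was also seen at the place."""
    def dates(p):
        return {datetime.fromisoformat(ts).date()
                for who, where, ts in records
                if who == p and where == place}

    a_dates, b_dates = dates(a), dates(b)
    return len(a_dates & b_dates) / len(a_dates) if a_dates else 0.0


if __name__ == "__main__":
    print(weekday_profile(RECORDS, "joe", "gym"))
    print(shared_days(RECORDS, "joe", "sally", "gym"))
```

Nobody had to say "Joe goes to the gym" for either result to come out, and a false claim about the schedule would have to be backed by matching records to hold up.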
Thus the only way we can keep privacy would be to roll back the last 50 years of technological progress, and that's why I'm starting to entertain a thought that we (as a society) should drop the concept entirely and tackle the change head-on, instead of being dragged there by force by the ongoing progress of technology.
Unfortunately, I cannot do a more detailed Q&A here, but
if you want more details, please read the original paper here:
http://www.cs.cmu.edu/~nlao/publication/2014.kdd.pdf. (Note that an earlier version of the work was presented at a CIKM workshop in Oct 2013 (see http://www.akbc.ws/2013/ and http://cikm2013.org/industry.php#kevin).) We have also published tons of great related research at http://research.google.com/pubs/papers.html
Hello Hari Seldon, psychohistory and mathematical sociology!
There are bots making Wikipedia contributions; Google could also make automated contributions to Wikipedia/Wikidata.
I don't believe this is what the article is talking about (knowledge vault) though. This is just the human and lightly machine curated graph (knowledge graph).
That Google is engaging in this behavior is indeed speculation, as far as I know. However, Google employees/allies have to realize that attempts to suppress debate on this issue can only backfire on them. Indeed, the fact that they don't have explicit policy on this (correct me if I'm wrong) is one of the reasons researchers are speculating.
It may well be that most people would agree with and/or permit Google to use their data in this way, but people should be given the opportunity to debate it in a reasonable fashion, else it looks like it was forced down their throats. And that's no good for anyone.
Ugh... that's a bit much... because now any employee at Google could potentially get access to random facts about me gleaned from my personal and business emails? Good luck keeping different levels of confidential information segregated correctly. That's awesome.
Collecting anonymous statistics about its users does not include automatically generating a database indexed by individual based on their private data. One is par for the course when selling bundles of users according to demographic to advertisers, while the other is fucking crazy.
Mining public web data for building a database like that is one thing, but mining individual private data like this is crossing a line.
Most of the ideas produced by Socrates / Plato / Aristotle were in fact wrong. They are not a good primer on epistemology, concepts, percepts, metaphysics or anything else. They're a good primer on the history of philosophy.
They inspired incredible progress on thinking and understanding, but they were wrong more often than they were right, and are a poor reference to understanding what knowledge is.
"Knowledge is the grasp of the facts of reality." is what Socrates in essence said about knowledge.
Then you say Socrates was wrong.
Now try scraping Google and see what they do to you.
If you provide value to Google, they will make an API to make accessing that data easier.
By scraping do you mean scraping their search results? They offer this, which is nice: https://developers.google.com/custom-search/
Many large sites don't allow scraping because of unnecessary server load (denial of service sometimes) so they'll offer an API where you can download content in a controlled (and monitorable) manner.
If one sends an email to a GMail/Outlook.com/Yahoo email address, one should be able to opt out of their email crawler, advertisement analysis, artificial intelligence analysis, etc.
I don't think they do too much storing of email details; they know that it's a sensitive area and that an employee will eventually blow the whistle and it will hurt user trust, which is a big part of their business.
1) Alice sends an email to Bob (GMail user).
2) Bob receives the email, meanwhile Google scrapes the content and extracts its meaning to show Bob some ads and use the facts to improve Google's A.I.
Alice wants a way to mark her email content as "no index".
That way email service providers wouldn't crawl through the content. Exactly like robots.txt for domains or the "noindex" meta tag in the HTML head element!
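A rough sketch of what that could look like, using Python's standard `email` package. The `X-No-Index` header name is purely hypothetical: as far as I know no such standard exists and no provider honors anything like it, and `compose` and `may_analyze` are illustrative helpers, not anyone's real API.

```python
from email.message import EmailMessage

# Hypothetical header name, made up for this sketch.
NO_INDEX_HEADER = "X-No-Index"


def compose(sender, recipient, subject, body, no_index=False):
    """Build a message, optionally marked as "do not analyze"."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    if no_index:
        msg[NO_INDEX_HEADER] = "true"
    msg.set_content(body)
    return msg


def may_analyze(msg):
    """The check a cooperating provider would run before any ad or AI analysis."""
    return msg.get(NO_INDEX_HEADER, "").strip().lower() != "true"


if __name__ == "__main__":
    marked = compose("alice@example.org", "bob@example.com",
                     "Lunch?", "See you at noon.", no_index=True)
    print(may_analyze(marked))
```

Like robots.txt, this would be advisory only: it does nothing unless the receiving provider chooses to respect it.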
It's about the email service provider, which should stop analyzing the email text to extract its meaning. Gmail uses it to display ads to Bob, builds a shadow profile for Alice (like Facebook) and trains an artificial intelligence (see headline link).
If an unknown person tries to scrape, he/she will promptly get banned by those very same people (Google wouldn't like someone scraping their stuff either).
Different players different rules, I guess.
For example, if you went one by one through Stack Overflow and sucked out every question and answer, your scraper bot would get banned (unless you're doing one request per minute, in which case you'll never finish).
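Back-of-the-envelope for the "you'll never finish" part. The page count below is a made-up round number for illustration, not an actual figure for any site.

```python
def days_to_scrape(pages, requests_per_minute=1.0):
    """Days needed to fetch `pages` pages at a fixed request rate."""
    minutes = pages / requests_per_minute
    return minutes / (60 * 24)


if __name__ == "__main__":
    # Hypothetical site size, chosen only to show the scale of the problem.
    print(days_to_scrape(1_000_000))
```

At one request per minute you get 1,440 pages a day, so even a hypothetical million pages takes close to two years, by which time the content has changed under you.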
Or if you tried to scrape Twitter.