This is just weird Twitter rage. The user seems to think the private information in their tax return will leak just because Gemini tried to summarize it.
To back this up, they reference the story about Gemini chats "leaking" on the public internet. What actually happened there was that people were creating publicly viewable chats via gemini.google.com/share/ links, and some of those got posted on social media and were indexed.
Gemini is "reading" your documents in the same sense that google servers are "reading" your documents in order to show them in your browser. Gemini is not a person and won't remember what it has "read".
If you don't want Google to handle your documents, then don't put your documents in Google Docs.
I get what you’re saying. But at a fundamental level there’s a difference between “using” content to do the thing I’m asking you to do (display it in a browser) and running it through an AI to do processing on that content.
I personally think the two situations are quite different.
I agree that if you don’t want Google to sniff on your content you shouldn’t put it on their servers to begin with.
That said, stating that Gemini won't remember is dubious: given the track record of these companies, I have my doubts that they don't log everything they can get their hands on.
Google Docs runs a lot of algorithms over the data you put in. For instance, it paginates them and shows a page count. That is an algorithm processing your data exactly like Gemini does. There is no option in Google Docs to prevent the pagination algorithm from reading my data and processing it.
Another example: Google Docs indexes the contents of your document. That is, it stores all the words in a big database that you don't see and don't have access to, so that you can search for "tax" in the Google Docs search bar and bring up all documents that contain the word "tax". There is no option in Google Docs to avoid indexing the contents of a document for the purpose of searching for it.
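For what it's worth, that "big database" is the textbook shape of an inverted index: a map from each word to the documents containing it. A minimal sketch (the data and structure here are purely illustrative, not Google's actual implementation):

```python
from collections import defaultdict

def build_index(docs):
    """Toy inverted index: maps each word to the set of document
    IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

docs = {
    "doc1": "2023 tax return summary",
    "doc2": "grocery list",
    "doc3": "tax notes for accountant",
}
index = build_index(docs)
print(sorted(index["tax"]))  # → ['doc1', 'doc3']
```

Searching for "tax" is then just a lookup, without the search engine re-reading any document, which is exactly why the indexing pass has to "read" everything up front.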
When you decide to put your data into Google Docs, you are OK with Google processing your data in several ways (which should hopefully be documented). The fact that you seem so upset that a specific algorithm is processing your data, just because it has the "AI" buzzword attached to it, seems like an overreaction prompted by the general panic we're living in.
I agree Google should be clear (and it is clear) about whether Gemini is being trained on your data, because that is something that can have side effects you have the right to be informed about. But Gemini merely processing your data to provide feature N+1 among the other 2 billion available is really not noteworthy.
> For instance, it paginates them and shows a page count.
Do you think this information Google is gathering can then be used in the future to paginate some other document? Do you think paginating my doc will help their algorithm better paginate documents in the future? I see what you're trying to say, but putting everything in the "algorithm" bucket doesn't help move the whole conversation around AI forward.
> The fact that you seem so upset
Your upset detector is clearly wrong. I don't use Google Docs. I don't care about Google Docs. I'm just adding my 2c to a conversation about the kind of practices Google and co. are using.
Isn't this why we're here on HN? To exchange ideas?
Google is pretty good at separating inference from training. If they wish to train on your data, they just train on your data; running the model on that data to give you information is a totally separate process.
“Google collects your Gemini Apps conversations, related product usage information, info about your location, and your feedback. Google uses this data, consistent with our Privacy Policy, to provide, improve, and develop Google products and services and machine-learning technologies, including Google’s enterprise products such as Google Cloud.”
“To help with quality and improve our products (such as generative machine-learning models that power Gemini Apps), human reviewers read, annotate, and process your Gemini Apps conversations. We take steps to protect your privacy as part of this process. This includes disconnecting your conversations with Gemini Apps from your Google Account before reviewers see or annotate them. Please don’t enter confidential information in your conversations or any data you wouldn’t want a reviewer to see or Google to use to improve our products, services, and machine-learning technologies.” [italics was bold in the original]
You can opt out of that. It's explained right after what you have quoted.
> To stop future conversations from being reviewed or used to improve Google machine-learning technologies, turn off Gemini Apps activity. You can review your prompts or delete your conversations from your Gemini Apps activity at myactivity.google.com/product/gemini.
Right now it’s an AI, but Google has been “reading” your docs since they invented Google Search; Gmail and Google Docs are just extensions. It’s literally their business model: collect all your info to show you relevant ads.
Oh, I know. I'm aware of that. Which is why I don't use a free Gmail account, I don't use Google Docs, and I run as much ad blocking as possible both at the browser and at the network level. And that's also why I'm sure Google is collecting as much data as possible when people use Gemini. Because why wouldn't they? It's their entire business model: collect data and then sell ads based on that.
> But at a fundamental level there’s a difference between “using” content to do the thing I’m asking you to do (display it in a browser) and running it through an AI to do processing on that content.
The thing you're asking it to do is to maintain the state of the document in a service. This way it's able to display the state of the document to one or more clients.
Arguably, running it through AI is an even more privacy-preserving feature, because Google Docs almost certainly writes the data to persistent storage (that is its main feature, and it cannot work without it), whereas the AI just needs to hold the data in RAM.
Just so we're clear, I'm personally not using G Docs, Gemini, or any other AI tool at the moment. My issue, if you can call it that, is more fundamental and relates to intentions.
Using data for something that's fundamentally different from its original use case is, in my view, problematic.
You gave them your consent when you opened the document in Google Docs.
And it is not like Gemini is the first NLP model to be reading your Google Docs. Google has had one of the most advanced spell/grammar checkers reading through your docs for years now.
Title is sensationalized. According to the thread, Gemini ingests a document only after you open the Gemini panel, and it keeps ingesting until the panel is closed.
> OK, more testing and I think I've figured it out (and it's still bad!). It seems that if you've ever clicked the Gemini button for a type of document then it remains open whenever you open another of that type--and therefore automatically ingests and summarizes it. So, e.g...
What is the problem here? Google already has your document, all they are doing with Gemini is running it through a bunch of transformers and spitting back some information. The act of sending input into an LLM does not itself train the LLM on that data (these systems don't "learn" via execution of normal input/output).
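To make the inference/training distinction concrete, here is a toy sketch with a one-parameter "model" (purely illustrative, nothing like a real LLM): the forward pass only reads the weights; the weights change only when an explicit, separate training step is run.

```python
class ToyModel:
    def __init__(self):
        self.weight = 2.0  # learned parameter, fixed once training is done

    def infer(self, x):
        # Inference: read the weights, produce an output.
        # Nothing about the input is written back into the model.
        return self.weight * x

    def train_step(self, x, target, lr=0.1):
        # Training: an explicit, separate operation that updates the weights
        # via a gradient-style correction toward the target.
        error = self.infer(x) - target
        self.weight -= lr * error * x

model = ToyModel()
before = model.weight
model.infer(3.0)               # running your data through the model...
assert model.weight == before  # ...leaves the weights untouched

model.train_step(3.0, 9.0)     # only an explicit training step changes them
assert model.weight != before
```

So whether your document affects the model at all comes down to whether the operator chooses to feed it into a training pipeline, not to the act of inference itself, which is exactly why the consent question below matters.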
Is the concern that Google would consider your use of Gemini on this document as consent to use it for future training?
1. Some people (e.g. some artists) don't like generative AI as a matter of principle, seeing it as soulless, corporate, and entirely trained on stolen data.
2. Many people resent Clippy-style popup features. They appear at the most inconvenient times, and everyone knows they're mostly for the benefit of the product manager with a user count KPI. And the harder they get pushed, the more people resent them.
3. The distinction between things-known-from-training-data, things-known-from-context, and things-known-from-RAG is pretty opaque to most users - and not clearly guaranteed by the documentation. If it's an assistant that can schedule reminders and find things in your e-mails and Google Drive, and it promises "Personal Results" where "your communication requests will be used to improve your experience with Gemini", the distinctions are pretty ambiguous.
4. The LLM industry norm is to play fast and loose with training data.
Unless explicitly promised otherwise, the assumption is always that anything being passed to a LLM API may be retained for various reasons. That’s the expectation that’s been set by the industry.
Agreed, but as I understand it they do make that promise:
"We do not use your Workspace data to train or improve the underlying generative AI and large language models that power Gemini, Search, and other systems outside of Workspace without permission."
Edit: I believe that this "without permission" caveat refers to experimental things like Workspace Labs where you have to give them that permission if you decide to join.
Where can I see the value of the "user has granted permission" boolean relevant to my account, clearly and unambiguously? I want to be sure that I haven't accidentally given permission via an action that I may have not understood.
I'm not aware of any such on/off switch or indicator. You could check whether you can turn off Workspace Labs. If you can't turn it off, then you haven't joined.
If this is the case, then the phrasing "without permission" is meaningless and so is the clause attached to it. We have to assume that Google thinks we've "given permission".
Sure. That's also the assumption when you upload your private information to someone else's computer. Except for the "unless explicitly promised otherwise" part.
Why would that be limited to passing it to an LLM and not just passing it to any server? If Google wanted to use your data for training, they would just... do that? No Gemini needed
You should have control over your information per use case / purpose. The ad networks are also Google's property - would you be fine with AdSense looking at all your private documents? GDPR got that right: you consent to a specific usage, not to a free-for-all.
I don’t use Google to create personal documents precisely because I can’t trust or verify that their apps aren’t doing more than provide storage, versioning, conflict resolution, and an editing UI without my permission.
That said, I think it’s way past time to give up on hoping to trust a cloud / “big tech” provider.
There are now only three models for applications I’m happy to use, and I try to do everything in one of those three where possible:
1) Verifiably end-to-end encrypted. This is the only way I can get comfortable with cloud software.
2) Self hosted applications.
3) Local first applications with “dumb filestore” data syncing via a provider of my choice (likely self hosted).
In all cases I have a strong preference for truly open source (avoiding VC funded open core weirdness etc.) and try to donate more than the equivalent subscription or purchase price for proprietary alternatives to the projects I find that meet my needs.
Possession of a file doesn’t imply that Google was reading that file, and absent a very limited set of criminal actions for which a search warrant might be issued, there has always been a reasonable expectation that Google was not reading your files. Now Google is reading that file, apparently by default.
Not sure what you mean by "reading". Google has always been indexing all personal content to power the search feature (not public Google search). It's a major reason why I use their services.
Let’s take “reading” to mean “learning the semantic meaning of”, then. An index is really just a data structure populated with the words and a link back to the document, with no developed notion of the semantic meaning of those words (or of something like “odds this is the user’s SSN”) associated with the content, and it could easily be held entirely within your user account, accessible only to you. This is much closer to a Google employee opening your document, reading it, and emailing you back a summary as a default action, even if I’ll admit that employee is not generally intelligent. If the technical details make it so the “semantic index” is held only within your user account, is only ever accessible by you, and cannot under any circumstances leak out to Google at large, then okay, I might be comfortable with that, but I’d want to know those details first rather than just having this “feature” enabled for a file type.
"Gemini is your always-on AI assistant across Google Workspace"
So the user buys a product without even reading the product page, then complains about the product? m'kay
NO! Don't put your private documents on someone else's computer. That's easy.
But if you do, at least don't put them on the hard drive of a company that makes money by using your personal data for advertising. It sounds almost childish to have to explain.
It's not clear who you're replying to, but this isn't a conversation with a person. What's being discussed - away from the person who published this - are the truth claims in the post.