"Unless you turn it off, Google uses your Gmail to train an AI to finish other people's sentences. It does that by analyzing how you respond to its suggestions. And when you opt in to using a new Gmail function called Help Me Write, Google uses what you type into it to improve its AI writing, too. You can't say no."
The mass media and blogs love to present Big Tech's tactics as a fait accompli. Instead, they should be making the point that "defaults" are used to deceptively "gain consent". We need legislation to stop this practice.
The paragraph begins "Unless you turn it off, ...", then it states "And when you opt in ...", and then it ends with "You can't say no."
Well, which is it? Can they say no, by turning it off or by not opting in? Or is it impossible to say no?
Of course they can say no. And when they do, it complicates matters for Google. If saying no were useless, then privacy-eroding "defaults" and so-called "dark patterns" would not exist. Why bother tricking people into saying yes, or into not saying no, if saying no were meaningless and consent an afterthought? Before you cynically conclude "there's nothing anyone can do" (watch the replies), ask yourself why dark patterns exist and why Google pays billions to multiple companies to be the "default search engine". Big Tech and smaller so-called "tech" companies put a lot of effort into these tactics. Why? Because they are bored and enjoy manipulating people? No. (Well, maybe. But that's another topic.) It's because people can say no, and when they do, there are real repercussions.
Anyway, I have no problem with the rest of the article, although it's more about what the companies are doing than about what computer users could be doing, namely, objecting.
Ethical Defaults is something I've been casually supporting for a long time.
I think the statistics around organ donation are what got me started on it.
We really should have legislation mandating minimum-consent default options, and then additional legislation that creates allowances for non-minimum consent on a case-by-case basis. That way we could allow, if we wanted, organ donation to be consented to by default, if the general public feels it is more ethical to default to consent on that option than not. And if harvesting everyone's data for AI is going to be the default, then we would need a similar public consensus.
The more work a company needs to do to get consent from its users, the less bullshit it will create and try to get away with, because it will need to actually convince people to care enough to change their defaults.
It would better align incentives between consumers and companies, imo.
You've touched on some good points. It makes me wonder: what kind of objection can one raise that would actually make a difference? Send a mean tweet? Call my senator? And tell them to do what?
IMO simple objections will not sway tech platforms from the status quo of free-for-all data hoarding. We must vote with our feet and move away from the data-hoarding incumbents towards user-centric alternatives. Where suitable alternatives don't exist, we'll need to create them.
It’s interesting to contrast the Zoom story (which got covered by pretty much every outlet) with that of companies that “only” use text, as opposed to conversations from video and audio.
As the article points out, because there is no regulation and no clear definition of where the “privacy” line is crossed, companies will do everything they can to get a competitive edge.
I am also a little baffled by how many outlets have blocked GPTBot but probably couldn’t explain why they did it, because once you hit that publish button, the very next day it’s going to be in a dozen different datasets, not to mention passed around by data brokers that would rather stay secretive.
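For what it’s worth, blocking GPTBot is just a standard robots.txt rule (OpenAI documents the GPTBot user agent). A minimal example, though it does nothing about crawlers that ignore robots.txt or about copies already sitting in datasets:

    # robots.txt - ask OpenAI's crawler to skip the entire site
    User-agent: GPTBot
    Disallow: /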
All this is setting such an insane precedent for the future of the web and how content will be created. I guess AGI is just that close, and it’s going to be so great that it will solve all of our problems.
I've been doing this with that bane of the internet, the captcha, forever: I pass it with incorrect but plausible answers. I'm pretty good at it now, and although it's probably a drop in the ocean, it gives me a warm fuzzy feeling knowing that I will have made their weights ever so slightly more shitty if they try to use my input as training data.
I do the same with the visual captchas. I always click fewer tiles than there should be, or, if something looks like it could be confused with the requested object, I click it to have it added to the algorithm. With only two of us doing this, I'm sure the same image is served to others and our attempts to muddy the data are stripped out.
Doomsday scenario: imagine your bad training results in a self-driving car crashing and someone dying because you deliberately misidentified a fire hydrant as a crosswalk.
Definitely the fault of the person clicking tiles on a webpage and not the company that built, trained, "tested" and deployed the system with that data...
It probably takes way more than a single individual badly training the AI to cause such a problem. That's like saying you yourself are responsible for the same crash because you pay taxes that helped subsidize the company.
People should stop using the standard versions of their languages and speak in the low-prestige variants in order to protect their privacy, at least until the next LLM version.
GPT-4chan (or whatever it was called) was removed from HuggingFace for being too toxic.
This has some interesting implications. The more conspicuously racist and DIE-noncompliant you appear to be, the more resistant these companies are to including you in their training data.
Hate to break it to you, but you are but a minuscule statistical blip in the vast ocean of captcha users. It will not matter one iota what you do.
A more meaningful stance would be to refuse to use these tools and to advocate for more ethical, privacy-friendly alternatives such as hCaptcha or Cloudflare's Turnstile.
The article is rather confusing. A better way to understand what Gmail does is to look at its settings (under General). Here they are:
> [x] Turn on smart features and personalization - Gmail, Chat, and Meet may use my email, chat, and video content to personalize my experience and provide smart features. If I opt out, such features will be turned off.
> [x] Turn on smart features and personalization in other Google products - Google may use my email, chat, and video content to personalize my experience and provide smart features. If I opt out, such features will be turned off.
There are two 'learn more' links that go to the same place:
> The control covers smart features in Gmail, Chat, and Meet that may use your data to improve the models that power smart features, including [list omitted, emphasis added].
> Smart features in other Google products that may use your Gmail, Chat, and Meet data include: [another long list omitted]
This could be a bit clearer about what happens once you have 'said no.' If the reporter had actually gotten someone to clarify that, it would have been helpful. As it is, they've added no value over just quoting what the settings say.
I don't see it as deceptive. More like dumbed down by removing all technical jargon.
How would you describe the issue briefly to someone who doesn't know what machine learning is? Sure, a lot of people know about it now, but I think much of the general public still has only the vaguest idea, and that was much more true a couple years ago.
> It even happened to a tech company. Samsung employees were reportedly using ChatGPT and discovered on three different occasions that the chatbot spit back out company secrets.
This doesn't appear accurate. The article linked in the above paragraph states that there were three occurrences of Samsung employees giving ChatGPT sensitive data, but it does not mention the chatbot returning sensitive data.
The quoted paragraph seems to imply some level of fine-tuning or persistent memory retaining this information, which I don't believe OpenAI's products do.
One can easily get around this by encrypting all their emails on their local devices before sending. I encourage everyone to use up as much of Google's infrastructure as possible with data that is useless to them. Granted, I run my own email server that doesn't train LLMs on the text contents of the email.
It’s a cute idea, but in practice they can absorb way more people doing this than will actually do it. It protects you, but it does not hurt Google. Storage for emails is cheap.
I mean, you're totally right. However, if one is extra bothered by the AI training, it might be easier to adopt GPG than to change email providers.
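For the curious, a rough sketch of that workflow, assuming both parties already have GPG keys set up (alice@example.com is a hypothetical recipient):

    # Encrypt message.txt with the recipient's public key; --armor
    # produces printable ASCII that can be pasted into an email body.
    gpg --encrypt --armor --recipient alice@example.com message.txt
    # Writes message.txt.asc; Google only ever sees the ciphertext.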
Skimming the article, it sounds like the real issue isn't that Google and Meta are training an AI, but that it's possible for them to accidentally leak sensitive data.
After you do your taxes on paper, do you put them in the postal box for the USPS to pick up? Do you also send other paper mail?
USPS has this great service called Informed Delivery that gives customers scanned images of all their mail. In fact, USPS has been scanning, OCRing, and electronically processing mail for a long, long time. I would say that their elaborate surveillance capabilities are on a par with Google's in some respects. They absolutely have social graphs available for anyone who corresponds with anyone else through the mail.
Not to mention the rampant mail theft being reported these days; I'd say email has gained an edge and is safer in most respects than putting it all on paper, unless you're going to walk it into the IRS on foot.
Yes, I send my taxes by US mail, certified, at the post office.
The USPS scans envelopes. They don't open them and scan the contents.
Intercepting, opening and reading postal mail that isn't addressed to you is a rather serious federal crime. Email? Google does it every day, all day long.
Even Gmail would be a safe service to use if people just copied/pasted locally encrypted blocks of data into/out of email messages.
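A sketch of the receiving side of that copy/paste flow (assuming the armored block from the email body has been saved locally):

    # Save the pasted armored block as received.asc, then
    # decrypt it locally with your private key.
    gpg --decrypt received.asc > message.txt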
I suspect that even if everyone could be convinced that encrypting everything was a good idea, the moment Gmail couldn't collect and profit from the contents of people's private messages they'd shut the service down. It exists only to exploit us.
Gmail would have to decrypt the message to show it to the recipient. Once it's in plain text they can do whatever with it.
If the user has to copy/paste a blob from Gmail into a file and then run gpg on it, well, Microsoft Windows is scanning everything on your hard drive and maybe even in RAM, so they'll get the decrypted file anyway.
A handwritten letter is probably the most private way to communicate with another person who you can't talk to directly.
It's easy to do, in theory. Gmail launched on the basis of exclusive invites.
Hype a short name like "hnmail" with a fancy UI, set up an invite system, and you could be the next email provider.
Just as Canonical did with its hype around free Ubuntu CDs.
I've stopped using Gmail ever since they disabled my account, wolfcub@gmail, for no reason. They won't tell me why, and never will. Apparently it's "inclusive", whatever that means.
Way to disgruntle a 17-year-old me; I've been hosting my own email ever since.
Gmail is not just email. It is all the space they give you, so you don't have to worry about emptying inboxes. It is the internal search that lets you find an email from 5 years ago. It is labels. It is a mobile app that works smoothly, without any surprises. It is being able to give my parents, who are not tech-savvy, something that works for them without needing me to help. It is even being able to set up the Google Inactive Account Manager to give access to my brother in case something happens to me, so my family can respectfully access my memories when I'm gone.
Cool story, bro. Do you also avoid emailing people with Gmail addresses, posting on mailing lists with Gmail subscribers, or emailing businesses that use Google Workspace?
That's not true. You can't publish a photo with me in it and make money without a release form. Training on a photo I took of someone else is definitely a gray area.
That’s not totally true either. First, it varies quite a bit by jurisdiction (but let’s say the U.S.). Second, the lack of a model release when one would be required does not mean the model suddenly owns the photograph; it just restricts how the owner may use it. Third, in many U.S. states a model release may only be needed when the photograph is used in promotion of a product, or if it was not taken in public.
LexisNexis makes money taking pictures of me in my car, tagging location data to them, and selling that data to other businesses, like lawyers and insurance companies. I never signed a release.
If they are training on inbound emails from senders who have no Google account and who reside in (or were present at the time of sending in) two-party-consent states, they are likely in violation of those states' call-recording laws.
In fact, I doubt there's fully relevant case law. I think the argument would be that the trained model is the recording device, and it could be demonstrated that verbatim strings from presumably confidential communications are regurgitated by the model when appropriately prompted.
People have a hard enough time not posting this repetitive thread invariant, so you can imagine how realistic 'stop using the email service you are using' is.
The most frustrating thing for me is social media apps constantly asking you to share your contacts with them. They get a curated database of every name, number, email address, street address, and photo from (I would guess) most users, and that data isn't even yours to deny access to.
Sure, it's all publicly available info, but I don't want services I haven't signed up for having my info without my consent. I don't like that my friends and family can just give them access to all of that data without me being involved in any way.
I share your frustration. In my case, however, that kind of information is not in any publicly accessible database. Most friends, family and colleagues wouldn’t even know my address; some might only know how to find me by description (the house with the yellow door in the middle of the second street to the left, after the church…).
Still, it annoys me that most people probably have my personal email address, phone number and real name tied together as a contact and provide this information to at least one online platform. Back when I used to use Google and Android, I would try to preserve my contacts’ privacy by storing their names using some mixture of first name, nicknames, initial for surname and context, e.g., “Alan F”, “Fid”¹, “Alan (football)“, “John (work)“. I’d also keep their number and email address as separate contacts — though that might only have worked in the early to mid 2000s. At some point, Google started getting too clever at determining which contacts could be “merged”.
On the other hand, apps like WhatsApp won’t work at all without access to the phone user’s contacts list, so asking for permission is a mere formality and Meta gets your information regardless.
> On the other hand, apps like WhatsApp won’t work at all without access to the phone user’s contacts list,
I don't think that's true. Users could be allowed to enter addresses individually, or, ideally, when apps ask for permission to access a person's contacts, phones could allow users to select what the app can and cannot see (only certain contacts, or phone numbers but not email addresses, etc.).
There are ways phones and apps could handle contact data while preserving privacy, but nobody is interested in helping people keep their data private. Phones are designed to leak your data like a sieve and apps are designed to collect every scrap of data they can get their hands on.
I was speaking about how WhatsApp currently works. Not how WhatsApp could potentially work (and the functionality you suggest would almost certainly never be implemented unless Meta were compelled by law to do so).
Completely unrelated to what’s being discussed. If I don’t want my fairly anonymous HN posts to be scraped, I can avoid posting on HN.
I cannot make every one of my contacts avoid these services, unless I have no contacts. If the point you’re making is “who cares, you can’t avoid it anyway,” that’s not only intellectually very lazy, it’s untrue: lots of countries have regulated their way around these issues. The fact that one of the biggest producers of tech in the world (the US) has left this space fairly unregulated is no excuse to capitulate to practices that are fairly easy to regulate sensibly, given political will and knowledge. With uninformed takes like the parent comment still floating around out there, I guess it really is inevitable and unavoidable.
Even if you stop using Gmail, chances are that the other party is using it. (Today, even email addresses on non-Gmail domains are often served by Gmail behind a custom domain.) So your emails train Google's AI even if you deliberately stopped using their service.
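You can check where a custom domain's mail actually lives by looking at its MX records (example.com below is just a stand-in domain):

    # Look up the domain's mail servers:
    dig +short MX example.com
    # Output like "1 aspmx.l.google.com." means Google handles the
    # domain's mail, even though the addresses aren't @gmail.com.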
You're not wrong, but it certainly doesn't hurt to use a different email service. I use Protonmail as my main email account, and I'm sure the HN crowd knows that there are many other good email services available these days. If you think the general population should change their behavior, then it has to start somewhere, y'know?
This actually differs from one jurisdiction to another; that is, some jurisdictions do not permit publication of correspondence without the consent of the sender. Use of your email for AI training may therefore be open to legal challenge. You know how legal challenges start? With someone feeling that there is a problem.
In any event, as the other poster mentioned, your original post claimed “You can…” and now you are moving to “You can’t” out of an apparent relish for being contrarian. This is not good-faith discussion on your part.
The article says "Your Gmail and Instagram are training AI.", emphasis on the "your". Of course I can't do anything about someone else distributing data I gave them.
> It’s your data.
> As soon as you decide to upload it somewhere else, it's not.
That may be how everyone is treating it, but that isn’t the only, or even the obvious, way for it to be. Mailing something doesn’t give the mailing service a right to open and scan the contents of your letter, even if it could do so without damaging anything. Parking your car with a valet service does not grant the service the right to drive your car to make deliveries while you’re not using it. Sending photo film to be developed doesn’t give the developing service a right to make their own copies of it. And so on.
It’s not unreasonable for a user to think of their emailing something as just granting the mail service the minimal privileges necessary to transmit and deliver the message to the explicitly intended recipient.
Why would I want to do something about it? It’s a good use of data that they have been very upfront about collecting. Aren’t machines doing things for me a good thing?
Sort of related: the antitrust trial against Google has just started. And yes, this is the way to stop this BS, because if Google is (correctly) ruled a monopoly and (hopefully) broken up, then AdSense, Gmail, YouTube, Search, and all the rest become separate entities and cannot easily share data under one umbrella. That probably also breaks the creepy stalker advertising model.