Gmail and Instagram are training AI, and there’s little you can do about it (washingtonpost.com)
81 points by bookofjoe on Sept 12, 2023 | 81 comments




"Unless you turn it off, Google uses your Gmail to train an AI to finish other people's sentences. It does that by analyzing how you respond to its suggestions. And when you opt in to using a new Gmail function called Help Me Write, Google uses what you type into it to improve its AI writing, too. You can't say no."

The mass media and blogs love to present Big Tech's tactics as a fait accompli. Instead they should be making the point that "defaults" are used to deceptively "gain consent". We need legislation to stop this practice.

The paragraph begins "Unless you turn it off,...", then it states "And when you opt in ..." and then it ends with "You can't say no."

Well, which is it? Can they say no by turning it off or not opting in? Or is it impossible to say no?

Of course they can say no. And when they do, it complicates matters for Google. If saying no were useless, then privacy-eroding "defaults" and so-called "dark patterns" would not exist. Why bother tricking people into saying yes, or into not saying no, if saying no were meaningless and consent an afterthought? Before you cynically conclude "there's nothing anyone can do" (watch the replies), ask yourself why dark patterns exist and why Google pays billions to multiple companies to be the "default search engine". Big Tech and smaller so-called "tech" companies put a lot of effort into these tactics. Why? Because they are bored and enjoy manipulating people? No. (Well, maybe. But that's another topic.) It's because people can say no, and when they do, it has repercussions.

Anyway, I have no problem with the rest of the article, although it's more about what the companies are doing than about what computer users could be doing, namely objecting.


Ethical Defaults is something I've been casually supporting for a long time.

I think the statistics around organ donation are what got me started on it.

We really should have legislation mandating minimum-consent default options, and then additional legislation that creates allowances for non-minimum consent on a case-by-case basis. That way we can allow, if we want, organ donation to be consented to by default, if the general public feels it is more ethical to default to consent on that option than not. And if harvesting everyone's data for AI is going to be the default, it would need a similar public consensus.

The more work a company needs to do to get consent from its users, the less bullshit they are going to create and try to get away with because they'll need to actually convince people to care enough to change their defaults.

It would better align incentives between consumers and companies, IMO.


You've touched on some good points. It makes me wonder: what kind of objection can one raise that would actually make a difference? Send a mean tweet? Call my senator? And tell them to do what?

IMO simple objections will not sway tech platforms from the status quo of free-for-all data hoarding. We must vote with our feet and move away from the data-hoarding incumbents towards user-centric alternatives. Where suitable alternatives don't exist, we'll need to create them.


The contrast is interesting between the Zoom story (which was covered in pretty much every editorial) and the stories of companies that "only" use text, as opposed to conversations from video and audio.

As the article points out, because there is no regulation and no clear definition of where the "privacy" line is crossed, companies will do everything they can to get a competitive edge.

I am also a little baffled by how many editorials have blocked GPTBot but probably couldn't explain why they did it, because once you hit that publish button, the very next day it's going to be in a dozen different datasets, not to mention being passed around by data brokers that would rather stay secretive.

All this is setting such an insane precedent for the future of the web and how content will be created. I guess AGI is just that close, and it's going to be so great that it will solve all of our problems.


You can do one thing about it: badly train it >:)

I've been doing this forever with that bane of the internet, the "captcha", where I pass it with incorrect but plausible answers. I'm pretty good at doing this now, and although it's probably a drop in the ocean, it gives me a warm fuzzy feeling knowing that I will have made their weights ever so slightly more shitty if they try to use my input as training data.


I do the same with the visual captchas. I always click fewer tiles than there should be, or, if something looks like it could be confused with the requested object, I click it so it gets added to the algorithm. With only two of us doing this, I'm sure the same image is served to others and our attempts to muddy the data are stripped out.

Maybe tomorrow we'll be counted as three.


That tomorrow is today, and I’m joining your party. Let’s screw their models!


Doomsday scenario: imagine your bad training results in a self-driving car crashing and someone dying because you deliberately misidentified a fire hydrant as a crosswalk.


Definitely the fault of the person clicking tiles on a webpage and not the company that built, trained, "tested" and deployed the system with that data...


It probably takes way more than a single individual badly training the AI to cause such a problem. That's like saying you yourself are responsible for the same crash because you pay taxes that helped subsidize the company.


People should stop using the standard versions of their languages and speak using the low-class variants in order to protect their privacy, at least until the next LLM version.


GPT-4chan (or whatever it was called) was removed from HuggingFace for being too toxic.

This has some interesting implications. The more conspicuously-racist and DIE-noncompliant you appear to be, the more resistant these companies are to including you in their training data.


Hate to break it to you, but you are a minuscule statistical blip in the vast ocean of captcha users. It will not matter one iota what you do.

A more meaningful stance would be to refuse to use these tools and to advocate for more ethical and privacy-friendly alternatives such as hCaptcha or Cloudflare's Turnstile.


Or you have helped them refine their detection of malicious training input. ;) Who's to say?


The article is rather confusing. A better way to understand what Gmail does is to look at Gmail settings (under General). Here they are:

> [x] Turn on smart features and personalization - Gmail, Chat, and Meet may use my email, chat, and video content to personalize my experience and provide smart features. If I opt out, such features will be turned off.

> [x] Turn on smart features and personalization in other Google products - Google may use my email, chat, and video content to personalize my experience and provide smart features. If I opt out, such features will be turned off.

There are two 'learn more' links that go to the same place:

> The control covers smart features in Gmail, Chat, and Meet that may use your data to improve the models that power smart features, including [list omitted, emphasis added].

> Smart features in other Google products that may use your Gmail, Chat, and Meet data include: [another long list omitted]

This could be a bit clearer about what happens if you have 'said no.' If the reporter had actually gotten someone to clarify that, it would be helpful. As it is, they've added no value over just quoting what it says.


On Android, the Learn More links aren't present, so the privacy invasion is not disclosed at all, beyond the deceptive first-page "summary".


I don't see it as deceptive. More like dumbed down by removing all technical jargon.

How would you describe the issue briefly to someone who doesn't know what machine learning is? Sure, a lot of people know about it now, but I think much of the general public still has only the vaguest idea, and that was much more true a couple years ago.


> It even happened to a tech company. Samsung employees were reportedly using ChatGPT and discovered on three different occasions that the chatbot spit back out company secrets.

This doesn't appear accurate. The article linked in the above paragraph states that there were three occurrences of Samsung employees giving ChatGPT sensitive data, but it does not mention the chatbot returning sensitive data.

The quoted paragraph seems to imply some level of fine-tuning or persistent memory retaining this information, which I don't believe OpenAI products do?


One can easily get around this by encrypting all their emails on their local devices before sending. I encourage everyone to use up as much of Google's infrastructure as possible with data that is useless to them. Granted, I run my own email server that doesn't train LLMs on the text contents of the email.
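
For the curious, here's a minimal sketch of what that local encryption step could look like, shelling out to the standard gpg CLI from Python. The recipient address and message are hypothetical, and it assumes gpg is installed and the recipient's public key is already in your keyring:

    import subprocess

    def encrypt_for(recipient: str, plaintext: str) -> str:
        # Shell out to gpg; --armor yields an ASCII block you can
        # paste into any webmail compose window, Gmail included.
        result = subprocess.run(
            ["gpg", "--encrypt", "--armor", "--recipient", recipient],
            input=plaintext.encode(),
            capture_output=True,
            check=True,
        )
        return result.stdout.decode()

    # Hypothetical usage: only the armored ciphertext ever reaches Google.
    print(encrypt_for("alice@example.com", "Meet at noon."))

The recipient decrypts locally with gpg --decrypt, so Gmail only ever stores and "reads" the ciphertext.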


It's a cute idea, but practically they can absorb way more people doing this than will actually do it. It protects you, but it does not hurt Google. Storage for emails is cheap.


People like you are probably in the <0.01% of all users: Google will probably manage.


I mean, you're totally right. However, if one is extra bothered by the AI training, it might be easier to adopt the use of GPG than to change email providers.


How do your recipients read those emails?


I assume they're using a standard protocol like S/MIME

https://en.wikipedia.org/wiki/Email_encryption


Skimming the article, it sounds like the real issue isn't that Google and Meta are training an AI, but that it's possible for them to accidentally leak sensitive data.


And it always has been. Unless your data is stored encrypted on an Internet-connected server, consider it as good as leaked.


Anything you put online is no longer yours. That has been my view since forever. That is why I do my taxes on paper, and don't use social media.

Email is basically public as far as I treat it. I'm fairly careful about what I say in an email.


After you do your taxes on paper, do you put them in the postal box for the USPS to pick up? Do you also send other paper mail?

USPS has this great service called Informed Delivery that gives customers scanned images of all their mail. In fact, USPS has been scanning, OCRing, and electronically processing mail for a long, long time. I would say that their elaborate surveillance capabilities are on a par with Google's in some respects. They absolutely have social graphs available for anyone who corresponds with anyone else through the mail.

Not to mention the rampant mail theft that's being reported these days; I'd say that email has gained an edge and is safer in most aspects than putting it all on paper, unless you're going to walk it into the IRS on foot.


Yes, I send my taxes by US mail, certified, at the post office.

The USPS scans envelopes. They don't open them and scan the contents.

Intercepting, opening and reading postal mail that isn't addressed to you is a rather serious federal crime. Email? Google does it every day, all day long.


Just because you do your taxes on paper and don't use social media, doesn't mean that information isn't online.

You have to control not just your behavior, but everyone else's. It's exhausting and frankly, impossible.


What more private communication method do you use?


End-to-end encrypted communications are the only private communications. So, GPG, E2E encrypted messengers. Don't store anything in plain text, ever.


Even Gmail would be a safe service to use if people just copied/pasted locally encrypted blocks of data into/out of email messages.

I suspect that even if everyone could be convinced that encrypting everything was a good idea, the moment gmail couldn't collect and profit from the contents of people's private messages they'd shut the service down. It exists only to exploit us.


Gmail would have to decrypt the message to show it to the recipient. Once it's in plain text they can do whatever with it.

If the user has to copy/paste a blob from Gmail into a file and then run gpg on it, well, Microsoft Windows is scanning everything on your hard drive and maybe even in RAM, so they'll get the decrypted file.

A handwritten letter is probably the most private way to communicate with another person who you can't talk to directly.


Is this really news? I'm pretty sure Google and Meta have been doing this for years. You agreed to it when you signed the ToS.


That contract was signed under duress in the form of survival. There was never a choice.


> and there's little you can do about it

I mean, if people as a whole stopped using Gmail, then this would stop being an issue.


It's easy to do, in theory. Gmail worked on a basis of exclusive invites.

Hype a short name like "hnmail" with a fancy UI, set up an invite system, and you could be the next email provider.

Just as Canonical did by hyping free Ubuntu CDs.

I've stopped using Gmail ever since they disabled my account, wolfcub@gmail, for no reason. They won't tell me why and never will. Apparently it's "inclusive", whatever that means.

Way to disgruntle a 17-year-old me, so I've been hosting my own ever since.


Gmail is not just email. It is all the space they give you, so you don't have to worry about emptying inboxes. It is the internal search, so you can find an email from five years ago. It is labels. It is a mobile app that works smoothly, without any surprises. It is being able to give my parents, who are not tech-savvy, something that works for them without needing me to help them. It is even being able to set up the Google inactive account manager to give access to my brother in case something happens to me, so my family can respectfully access my memories when I'm gone.

This is not easy.


I disagree. It is easy.

The only thing stopping anyone is that Google has resources to splash on engineers to build such a product. That's all it is: money.

Everything that Google can do exists in the open-source space.


It is easy. Just use another similar service without a history of AI research or interest in such.


Cool story, bro. Do you also avoid emailing people with Gmail addresses, or posting on mailing lists with Gmail subscribers, or emailing businesses with a Google Workspace service?


If I can, yeah.


You can stop using them


No you can't. If anyone emails you from Gmail, or you mail anyone at Gmail, then they are still using your data.

If someone takes a picture of you and posts it on Instagram, they are still using your data.

The only solution here is new laws about retaining ownership of your data even if it's been uploaded to a third party.


You don't own pictures of yourself taken by other people.


That's not true. You can't publish a photo with me in it and make money without a release form. Training on a photo I took of someone else is definitely a gray area.

But also, laws can be changed, which is my point.


That's not totally true either. First, it varies quite a bit by jurisdiction (but let's say the U.S.). Second, the lack of a model release when one would be required does not mean the model suddenly owns the photograph; it just restricts how the owner may use it. Third, in many U.S. states, a model release may only be needed when the photograph is used in promotion of a product, or if it was not taken in public.

IANAL


LexisNexis makes money taking pictures of me in my car, tagging location data to them, and selling that data back to other businesses like lawyers and insurance companies. I never signed a release.


I absolutely can do that. Laws vary greatly around the world, and your (I'd wager North American) laws don't apply where I am.


You do in some countries


If they are training on inbound e-mails from senders who don't have Google accounts and who reside in, or were present in, two-party-consent states at the time of sending, they are likely in violation of those states' telephone call recording laws.


Pretty sure not. I think there is an implied ownership of the email once I send it. Just like if I send you a letter you now own the letter.


Got any case law you can cite or are you just making this up?


I'm not familiar with the case law, but e.g.

https://codes.findlaw.com/ca/penal-code/pen-sect-632.html

In fact I doubt there's fully relevant case law, as I think the case would be that the trained model is the recording device, and it could be demonstrated that verbatim strings from presumed confidential communications are regurgitated by the model when appropriately prompted.


Google also reads your email for spam-filtering purposes, including training their filters, which has a financial benefit to them as well.

Wouldn't this also be a violation of the same Two Party Consent law that you're trying to apply here?


People have a hard enough time not posting this repetitive thread invariant, so you can imagine how realistic 'stop using the email service you are using' is.


>Your Gmail and Instagram are training AI. There’s little you can do about it.

You can stop using those services.

>It’s your data.

As soon as you decide to upload it somewhere else, it's not.


People trot this tired "point" out way too much here with too little of the obvious rebuttal: you do not, under any circumstances, actually need to use these services for them to use and collect data points on you.


The most frustrating for me is social media apps constantly asking you to share your contacts with them. They get a curated database of every name, number, email address, street address, and photo from (I would guess) most users, and that data isn't even yours to deny access to.

Sure, it's all publicly available info, but I don't want services I haven't signed up for having my info without my consent. I don't like that my friends and family can just give them access to all of that data without me being involved in any way.


I share your frustration. However, in my case, that kind of information is not in any publicly accessible database. Most friends, family, and colleagues wouldn't even know my address; some might only know how to get to me as "the house with a yellow door in the middle of the second street to the left, after the church …".

Still, it annoys me that most people probably have my personal email address, phone number and real name tied together as a contact and provide this information to at least one online platform. Back when I used to use Google and Android, I would try to preserve my contacts’ privacy by storing their names using some mixture of first name, nicknames, initial for surname and context, e.g., “Alan F”, “Fid”¹, “Alan (football)“, “John (work)“. I’d also keep their number and email address as separate contacts — though that might only have worked in the early to mid 2000s. At some point, Google started getting too clever at determining which contacts could be “merged”.

¹ short for Fidelma


At least they have to ask these days; before phones added more security, they didn't bother with getting permission.


On the other hand, apps like WhatsApp won’t work at all without access to the phone user’s contacts list, so asking for permission is a mere formality and Meta gets your information regardless.


> On the other hand, apps like WhatsApp won’t work at all without access to the phone user’s contacts list,

I don't think that's true. Users could be allowed to enter addresses individually, or, ideally, when apps ask for permission to access a person's contacts, phones could allow users to select what the app can and cannot see (only certain contacts, or phone numbers but not email addresses, etc.).

There are ways phones and apps could handle contact data while preserving privacy, but nobody is interested in helping people keep their data private. Phones are designed to leak your data like a sieve and apps are designed to collect every scrap of data they can get their hands on.


I was speaking about how WhatsApp currently works. Not how WhatsApp could potentially work (and the functionality you suggest would almost certainly never be implemented unless Meta were compelled by law to do so).


Do you think people don't copy every post off HN and feed it to AI?

At some point things turn from avoidable to ever-present.

Kind of like avoiding cameras and license plate readers: are you going to lock yourself in a hole and avoid people?


Completely unrelated to what's being discussed. If I don't want my fairly anonymous HN posts to be scraped, I can avoid posting on HN.

I cannot avoid every contact of mine using these services, unless I have no contacts. If whatever point you're making is "who cares, you can't avoid it anyway", that's not only intellectually very lazy, it's untrue: lots of countries have regulated their way around these issues. The fact that one of the biggest producers of tech in the world (the US) has this space fairly unregulated is no excuse to capitulate to things that are fairly easy to regulate sensibly, given political will and knowledge. With uninformed takes like the parent I'm replying to still floating around out there, I guess it really is inevitable and unavoidable.


> You can stop using those services.

Even if you stop using Gmail, chances are that the other party is using Gmail. (Today even e-mail addresses with non-Gmail domains are often using Gmail behind their custom domain.) So, your emails go to train AI for Google even if you deliberately stopped using their service.


You're not wrong, but it certainly doesn't hurt to use a different email service. I use Protonmail as my main email account, and I'm sure the HN crowd knows that there are many other good email services available these days. If you think the general population should change their behavior, then it has to start somewhere, y'know?


When you send mail to someone else, they can do whatever they want with it, including giving it to google.


This actually differs from one jurisdiction to another; that is, some jurisdictions do not permit publication of correspondence without the consent of the sender. Use of your email for AI training may therefore be open to legal challenge. You know how legal challenges start? By someone feeling that there is a problem.

In any event, as the other poster mentioned, your original post claimed “You can…” and now you are moving to “You can’t” out of an apparent relish for being contrarian. This is not good-faith discussion on your part.


Then you really haven't done anything about them using your data for training, have you?


The article says "Your Gmail and Instagram are training AI.", emphasis on the "your". Of course I can't do anything about someone else distributing data I gave them.


Not even refusing to use these services is a silver bullet against your privacy being violated[0].

[0]: https://en.wikipedia.org/wiki/Shadow_profile


>It’s your data.

>As soon as you decide to upload it somewhere else, it's not.

That may be how everyone is treating it but that isn’t the only or even the obvious way for it to be. Mailing something doesn’t give the mailing service a right to open and scan the contents of your letter, even if it could do so without damaging anything. Parking your car with a valet service does not grant the service the right to drive your car to make deliveries while you’re not using it. Sending photo film to be developed doesn’t give the developing service a right to make their own copies of it. And so on.

It’s not unreasonable for a user to think of their emailing something as just granting the mail service the minimal privileges necessary to transmit and deliver the message to the explicitly intended recipient.


> You can stop using those services.

But they can't stop using you.


headline sounds like I’m about to be shaken down for protection money lol


Why would I want to do something about it? It’s a good use of data that they have been very upfront about collecting. Aren’t machines doing things for me a good thing?


Sort of related: the antitrust trial against Google has just started. And yes, this is the way to stop this BS, because if Google is (correctly) ruled a monopoly and (hopefully) broken up, then AdSense, Gmail, YouTube, Search, and all the rest become separate entities and cannot easily share data under one umbrella. It would probably also break the creepy stalker advertising model.


I see that this is downvoted, so take my upvote. Every business that is "too big to fail" needs to be broken up ASAP, and that includes Google.



