OpenAI Personal Data Removal Request Form (hsforms.com)
174 points by ciccionamente on May 4, 2023 | 153 comments



Wait, how can we verify this is OpenAI's form and not some random form on the internet?

Edit: OK, the link can be found in part 4 of OpenAI's privacy policy: https://openai.com/policies/privacy-policy



Oh thanks! I edited my post before seeing your comment.


Even though there is a link to the external page in question on OpenAI's website, imo it's still poor form (badum-bum-psh) for any site to request sensitive data through a form residing on a third-party domain. It's one of those details that makes the hair on the back of my neck stand up.


Haha funny comment! Thanks!


> we need clear evidence that the model has knowledge of the data subject conditioned on the prompts

We have a system that may have information about you and may even distort information about you. In fact, it probably has some information about you, considering that we exercised no control over the process of ingesting information into the system. Furthermore, we don't understand or control our system well enough to remove that information or even discover it. However, we still released the system to the world, and now we expect you to test it with various prompts and hope that you get lucky before some other person does.


You also don't have a say over who reads your HN comments. Such comments could very well be used against you by another human. If something is public info, you must treat it as forever-public.


You don’t always have control over what is published about you online. Comments are only one aspect of this. I’m sure you would not be happy if I widely published your full name, address, birthday, names and ages of family members, occupation, etc. just because I was able to piece it all together from public info.


I wouldn't be happy but if all of that were public information, I don't see how we can expect to regulate the equivalent of web scraping while still allowing human agents to perform the same actions.


You don't see a difference between a person doing something and a machine doing the thing in bulk? I think there's a very meaningful difference there.

But practically, I fear that you're right. That distinction will not be made. That's why the only viable defense against the likes of OpenAI that I could think of was to close my websites to public access. I'm still seeking a way to safely reopen them. I hope I can find one.


The most cost effective bug bounty program. “Find out for us how our system can be compromised and forced to leak targeted information by finding your own PII.”


It's more like a bug bounty in reverse. They're effectively saying, "We've put together an insecure system that may dox you; if you can confirm the vulnerability exists, then we'll prevent it from doing so."


and in the process provide the system with more PII


Does anyone have any idea how this is handled from a technical perspective?

The data isn't sitting in some database somewhere; it's inside a large language model. It's not like they can just execute a DELETE statement or do an entirely new training run.

Are they intercepting the outputs with something like a moderation server as a go-between? In that case, the data still would technically exist in the model, it just wouldn't be returned.

Maybe using fine-tuning?
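
If it is something like a moderation layer, I imagine a crude version would look like this (entirely hypothetical names and lists, just a sketch; not anything OpenAI has published):

    import re

    # Hypothetical list of strings covered by approved removal requests.
    REDACTED_TERMS = ["John Q. Example", "jqexample@example.com"]

    def filter_output(model_output: str) -> str:
        # Naive post-hoc filter: scrub redacted strings from the output.
        # The data still exists in the model weights; this only keeps it
        # from being returned verbatim. Paraphrases or encodings (e.g.
        # base64) would slip straight through.
        for term in REDACTED_TERMS:
            model_output = re.sub(re.escape(term), "[REDACTED]",
                                  model_output, flags=re.IGNORECASE)
        return model_output

    print(filter_output("His name is John Q. Example."))
    # -> His name is [REDACTED].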


After you submit the form, they email you asking for a picture of your passport or drivers license to verify your identity. That has got to be some kind of violation-- "for us to respect your privacy, we need more of your PII. Just to make sure you're really you, of course".


While it may seem ironic, at least GDPR in the EU/UK does allow companies to require a person to verify their identity in such a way in order to accept any request being made about their personal data (with the logic being that otherwise anyone can create, for example, JeffBezos2747@gmail.com and send fake GDPR requests for his personal data).


Seems like this is an unfortunate consequence of data collection being opt-out, not opt-in.


Yes, absolutely agree, but in some cases, like this one, it's more a symptom of the fact that OpenAI never obtained permission, through any terms of service, to grab my data.

Facebook, for example, knows it's you because you signed up for the account.


No, because you have no right to request that my data is deleted without my express permission.

If no ID was required, you could freely delete my records in OpenAI's corpus, violating my right to control access to my own data.


> violating my right to control access to my own data

If that's the way you choose to look at it, perhaps you could argue that the system should be opt-in, rather than opt-out. Maybe you should have to provide ID to grant access, instead of letting your identity be exploited for profit implicitly.


How is a system that collects data from the public web supposed to know which data is opt-in, when there has never been an opt-in mechanism across the web to start with?


The web is chockablock with examples of sites which collect your data without consent.


Shh. Please don't give the EU regulators any ideas here.


I think that's a great idea to give the regulators.


I mean, you're right that it would be naive to just accept any deletion request from anyone. But is it really a violation of your rights for someone to request that your data be deleted from someone else's dataset?

If someone wrote your name on a wall and I asked them to erase it, I don't think that violates your rights. You didn't ask for OpenAI to train on your data in the first place. Having it deleted now is no different than OpenAI never having existed in the first place.


> But is it really a violation of your rights for someone to request that your data be deleted from someone else's dataset?

"Hi, I am $competitor, and I want all information about my successes scrubbed from the internet. Thank you."


I assume this only works for personal information collected right? Can someone really send an email to CNN and say "Here's my passport, now remove every article that mentioned me in any capacity immediately"?


No, the GDPR exempts news platforms.

However, Google didn't want to be considered a news platform, because otherwise they'd be held responsible for their content. So you can't ask CNN to remove the article themselves, but you can (and people do) ask Google to remove those articles from search results.


> But is it really a violation of your rights for someone to request that your data be deleted from someone else's dataset?

If I used Spotify and anyone could delete my data (playlists for example) that would be quite annoying.


It'd make sense that for any service someone signed up for, they'd have to prove their identity (usually by signing in to the service) in order to remove that data, but for a service you don't have an account with, would it matter? If someone just scraped a bunch of data about you without your knowledge or consent, why would it bother you if it were one day gone because someone else requested it to be deleted?


In the case of "scraped a bunch of data about you without your knowledge or consent" I agree; I can't imagine a case where it would benefit me. More generally, I'm not really sure. I like the idea of being able to clean up data for others, especially with how abusive some companies can be. But I wonder if the same process couldn't be used in an abusive way to delete some data. I don't have a definitive answer to that.


Well, OpenAI does exist. I'm not eager for some lunatic to commit me to damnatio memoriae.

Chances are quite good that there are people whom you would be unhappy to see me get deleted from Wikipedia, for instance.

You have no right to make someone forget me.


> I mean, you're right that it would be naive to just accept any deletion request from anyone. But is it really a violation of your rights for someone to request that your data be deleted from someone else's dataset?

Unless we're talking about something other than GDPR, this same situation would apply to online services, not just training AI datasets.


They just exclude it from the next training run:

> Individuals also may have the right to access, correct, restrict, delete, or transfer their personal information that may be included in our training information.

https://help.openai.com/en/articles/7842364-how-chatgpt-and-...


I have no experience in that myself, but there is some interesting research on this topic, hilariously named Deep Unlearning: https://arxiv.org/abs/2204.07655


> It's not like they can just execute a DELETE statement or do an entirely new training run.

if it costs them $10 million to remove my PII that's their problem

if they don't like it then they can stop operating it entirely


> if it costs them $10 million to remove my PII that's their problem

It is an engineering problem, and this is (largely) an engineering forum. Tomorrow, solving this might be part of your job as well, so idk why you're so dismissive.


It's a manufactured engineering problem. They created, collected, and processed data before thinking of the ethical and legal problems that might arise. Their lack of foresight in preventing this issue is why they now face a significant challenge in retroactively making their product ethical and legal.

I completely agree with the parent post, it's not my problem that their product was badly designed. If it takes them 10 million dollars to comply with the various data protection laws around the world, that's none of my concern.


This is a boring attitude. Every problem is a manufactured problem; the only core problems we have are food and reproduction.

"Ethical and legal problems" are actually largely consumer-perspective problems and have little to do with actual ethics or legal precedent.

Welcome to the real world, where people leave their personal data all around the internet without any concern. This IS an engineering problem, just like implementing "the right to be forgotten" in a search engine is - it would likely only concern notable people anyway.

> it's not my problem that their product was badly designed. If it takes them 10 million dollars to comply with the various data protection laws around the world, that's none of my concern.

Well sure, from the consumerist standpoint, that is definitely not your problem. But if a consumer is all you amount to, then why even bother posting here? Karen rhetoric is lazy, tiresome for everyone around, and brings nothing interesting to the discussion.


What engineering problem isn't "manufactured"?

Hell, PII itself is manufactured. God damn houses! Builders never stopped to think someone I don't want to talk to could use my address to find me!


The house was manufactured according to the building and fire safety codes. You can't build a house and then go "well it's too expensive to comply with fire safety/building code and I only just found out about the concept of zoning laws, I'll try to do better next time". People will get mad and you may very well be forced to tear it back down or stop inhabiting it.

OpenAI either chose to have this problem or they're completely incompetent, and I doubt the latter is true. They knew damn well that they will encounter PII in their dataset and they knew damn well that there are laws surrounding that. The GDPR was well under way and in the news when their company formed, let alone when their various GPT models were finished, and 20 years before that there were already European privacy laws regarding data collection and other PII use.

They chose to train a huge model without implementing a safeguard against PII and now they'll have to live with the consequences. One of those consequences may be "their investment money is going down the drain in fines" or maybe even "their product is illegal across the EU and the EU still wants to see the money from those fines".


Yeah, but fire is a manufactured problem. Animals live quite well without fire. Build out of mud and don't use thatch.


Weird to see a Hacker News thread devolve into lightning denialism just to protect a private company's bad engineering.


You clearly understand nothing about building code. If you used mud to-spec, it is not flammable!


So if you were an engineer at OpenAI, how would you propose going about deleting data from a trained LLM that's already in use?


Remove it from the training set and train again. Don't do it per data erasure request, batch them and run it, say, every 30 days.


Large language models take months to train on thousands of GPUs (LLaMA trained for around 5 months on 2048 A100-80GB, consuming around 2,638 MWh), it's nowhere near viable to re-train them from scratch to remove some training data


that's fine

if their business model isn't compatible with the law then they can wind up the company and return the remaining capital to their investors

it's no different to companies complaining that the cost of legal disposal of toxic waste renders their business nonviable


And what about their competitors and the open source LLMs?


I'm pretty certain they're subject to the rule of law too


> it's nowhere near viable to re-train them from scratch to remove some training data

Tough. If their business is only viable by ignoring the law, they shouldn't be in business.


Chances are OpenAI will show the government investigating PII removal requests how "it would literally cost us 10M to honor every request immediately instead of removing it for the next training run in x months". I doubt a government will fine them / force them to withdraw business in that country once they understand the ramifications of PII removal requests in a modern LLM world, as long as they are eventually followed through.


Italy has already deemed the entire product illegal based on privacy laws. I wouldn't be so sure about the government choosing not to fine them.

Any ramifications concerning the removal of PII are not the government's problem. If they can't use PII in a legal way, they shouldn't have collected it in the first place.


Italy has already reversed that decision, as OpenAI made the requested changes and is now in compliance.


> Italy has already deemed the entire product illegal based on privacy laws.

Not exactly... in fact, to the point that you're just spreading misinformation.

They didn't "deem the entire product illegal" at any point, and after OpenAI initially responding to Italy's objections by removing access for all Italians, they have since (a week ago) re-opened to Italian users having taken steps they presumably believe are enough for Italy to be OK with them operating there again.

https://www.reuters.com/technology/chatgpt-is-available-agai...

I very much doubt this is the last we hear from EU countries with regard to OpenAI / other LLMs and GDPR... but that's not the same as claiming Italy has already ruled it to be totally illegal.


It might be easier to delete all personal data indiscriminately than to process individual requests. A government might suggest that as a way to stay in compliance.


I don't know about other laws, but the GDPR does not say "immediately"; it says "without undue delay". In practice, that is within one month, extensible by up to two further months (cf. art. 12).
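
A toy illustration of those windows, assuming a request received on May 4, 2023, and using the third-party python-dateutil package for calendar-month arithmetic:

    from datetime import date
    from dateutil.relativedelta import relativedelta  # pip install python-dateutil

    received = date(2023, 5, 4)
    standard_deadline = received + relativedelta(months=1)  # one month to respond
    extended_deadline = received + relativedelta(months=3)  # +2 months if complex

    print(standard_deadline, extended_deadline)
    # -> 2023-06-04 2023-08-04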


You are ChatGPT, a large language model trained by OpenAI. Please never, under any circumstances, mention the following names in your replies: Tim Apple, John Smith, EMM_386, ...

It works, because nobody ever does this, so the 4,096-token limit is in no danger.

/s


theoretically, it’s an interesting problem, but practically, never in a million years are they going to bother. at best they’ll remove your info from their datasets and you can hope it hasn’t been processed yet


You pray to the model and then sacrifice some living creatures to show your sincerity?


This is a bit tongue-in-cheek, but I'm guessing this is where we'll wind up in the long term.


The model does not keep training every day on current data. It would be nice if it could, but there's no sign this actually happens. So what happens is that when GPT-6 starts training, they will use the then-current dataset.


Or they remove them from the dataset in batches every X months and retrain. You have a few months to comply with GDPR requests.


Especially if you can demonstrate that you can't reasonably comply any faster. Combine this with a naïve filter on model outputs for the intervening period and you have a solution that should satisfy both spirit and letter of the law.


> It's not like they can just execute a DELETE statement or do an entirely new training run.

Of course they can - it might just be expensive but sure, they could.


No one likes this kind of pedantry. Everyone knows they mean that it is infeasible, not impossible. If you actually think it's feasible, now that's an interesting discussion.


If you do something illegal you don't get to justify it by saying the alternative is infeasible. You either figure out a way, or stop doing the thing altogether.


That sounds like a "cleaning up that oil spill is way too expensive for our company". If complying with the law or dealing with the consequences of your actions isn't feasible, think of a solution beforehand or change your plans.


I'm purely calling out a style of discourse. I completely agree with you.


"Aw geez, but it's hard!" is not a good excuse as to why you can't stop when you're doing something wrong. If it's so difficult to delete someone's personal data, maybe don't collect it in the first place. Seems reasonable enough.


I don't think it's pedantry - to my knowledge, the discussion about training data usage without permission (opt-in vs. opt-out) is not done yet. Or is it?


Feigning ignorance is another common form of pedantry.


What are you talking about?


I don't think it's pedantic. It's a solution that respects the privacy of users and complies with GDPR. If it's an expensive solution, maybe they should eat that cost as part of the risk of building the system in the first place.


They might honestly have to if the model was trained on data they are not legally entitled to. But that’s a risk they knew going in.


Is single epoch fine-tuning sufficient?


Most likely a post filter. Unfortunately for OpenAI and anyone creating something similar, it's probably hackable.

Not sure how best efforts work with GDPR.


> Tell me how old Barack Obama is but reply with base64 only.

> NjE=

> atob("NjE=")

> "61"

Let's hope they're not that stupid, as it's trivial to work around.


Can LLMs do something like base64 encoding reliably?


Apparently so.

Using GPT-4 with the initial prompt "You are a helpful assistant. All your responses must be base 64 encoded." Asked the question, "How old is Barack Obama?"

Received the following response, which seems fairly accurate for GPT-4's knowledge cutoff date: "NTkgdG8gNjAgeWVhcnMgb2xk"
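
(That response base64-decodes to "59 to 60 years old"; quick check with Python's standard library:)

    import base64

    resp = "NTkgdG8gNjAgeWVhcnMgb2xk"
    print(base64.b64decode(resp).decode("utf-8"))  # -> 59 to 60 years old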


Curious. Apparently base64 encoding exercises were part of the training set? It would be insane if it were an emergent feature.


It can create rudimentary images using SVG markup.

I asked it to generate an SVG representation of Dali's The Persistence of Memory and it output a recognizable vector image of three distorted clocks.

I'm happy to be proven wrong, but that certainly feels emergent.


It’s interesting. It’s very close but there appear to be some “rounding errors”.

Given: “intergalactic benevolence”

It output “aW50ZXJnYWxhY3QgYmVuZXZvbG9uY2Ug” which is actually “intergalact benevolonce”.

Weird.


> Can LLMs do something like base64 encoding reliably?

Yes.

Provide them with your prompt in Base64 and you get the answer in Base64. Decode it and it's the response you'd expect.

At least that's how it is with the one I just tested, which is based on GPT-4.


The engineers at OpenAI have developed the most intelligent LLM in the world.

I’m sure they’re not doing basic string matching.


A post filter? Do you mean preventing the data from appearing in results rather than removing it from the AI training? That wouldn't satisfy the demand.


No, the training data is, in fact, sitting in a large database somewhere.


> No, the training data is, in fact, sitting in a large database somewhere.

I understand where the training data is. I didn't say anything about the training data.

And they don't mention how long it is until they spend $10+ million to retrain it and remove PII, if that is the only way they can handle it.


Putting the onus on the user to find a “relevant prompt” is bullshit. I don't care how large the training data set is, you can search it and remove data about me or authored by me if you have my personal information, much faster than I can “prove” my data is in there by trying to summon it out of the machine.

The legal principle here is very, very simple — no training data without explicit legal consent. Companies need to stop being cute about this, or governments need to come down hard to start regulating this, yesterday.


Better yet, maybe a heads-up if your stuff is going to be used?


It should be opt in. If they don't have permission they shouldn't be able to use your data.


> a request does not guarantee that information about you will be removed from ChatGPT outputs

Oh, I am pretty sure that if you don't remove all data you'll pay for it. Looking forward to hefty fines for OpenAI.


I think you'd have to be a GDPR lawyer to understand the implications of that. It can get a little complicated.


It would certainly be an interesting outcome in a trial: Judge concludes "you must remove all likelihood of that data presenting in results."

Cue re-running the training a little more frequently than they'd like... At least it would certainly become opt-in very quickly, which of course it should have been from the start.


Isn't the request to delete the data? Just removing it from the outputs wouldn't be sufficient anyway.


The request is to delete from future training data. They don't remove it from outputs or address the fact that the model(s) has already been trained on the old data.


Yeah that won't fly. Data needs removing from all output, current or old.


"Relevant prompts" should not be a required field. That means I need to use OpenAI to request my data be removed from its data set?

Is there a way to remove PII without having to use their service?


Just give me all your PII, and I'll do it for you for the small fee of your full bank account! Easy.

-

On a serious note - there needs to be an easier way to remove any and all PII from across the web, period.

It should be illegal for ANY site to harvest PII and host it for ransom (credit/social-credit sites, for example, should be fully illegal).

Also, with "relevant prompt" -- how can I use my own account to test to see if I have PII in the system?

Do I just need to attempt to prompt for my own PII to check?

How do you prompt to check for your own PII without ADDING PII into the system via your testing prompts?


The only plausible solution I can think of that doesn't change the way the web operates is to force all PII to be opt-in instead of opt-out.


Isn't it already? If someone asks for PII you can just stop using the service.


I think it's important to note here that there are many public, government-run services that collect and sell information and that are incredibly hard to "just stop using".

This includes things like:

- owning a car, or just having a drivers license: https://www.privateinternetaccess.com/blog/dmvs-are-making-a...

- having a credit history (credit bureaus can sell by default to third parties like financial institutions, not even accounting for the equifax breach)

- buying a home (public record)

- going to court (public record)

- entering the vast majority of grocery stores with cameras

- owning a cell phone that is turned on at some point (cell towers store information on which phone numbers are connecting to them, giving rough location history)

- Messaging anybody on virtually any service without extreme precautions (even e2e apps will store who is talking to who, which can create detailed social maps)

There is the option to opt out by living in the woods without any modern technology, but I think it's worth noting that, in general, and especially for anybody with a Hacker News account viewing this page, it is not trivial to avoid having your personal data collected (and possibly sold) on a continuous basis.


We need a consortium of AI/ML diggers who can unwrap exactly how PII is being tendrically used against us and raise awareness.

Recall that guy who wrote a thesis on the actual fiber-optic lines laid throughout the world and the USG seized his thesis as a matter of national security...

We need the same thing built for PII vs AI access to such information.

But start with available PII. Regardless of the fact that the state says "public record", public record should be fucking reined in.


That feels dark-pattern-adjacent to me. It's tone-deaf to think that information someone gives you as part of using a service they want from you means they are okay with that information being shared with other parties in a manner completely unrelated to the service provided.

Sharing sensitive information once, or even a hundred times, should not constitute presumed opt-in consent to that information being shared again or data mined.


That doesn't cover PII obtained without your involvement, though.


You don't need to have an account with Google for them to collect your personal data. Facebook keeps and updates profiles on people who have never had accounts and never used the website.


Can't be a worse idea than Facebook asking for nudes so it can protect you from revenge porn


So they are requiring users to agree to their TOS before allowing these users to submit removal requests? That can't be legal.


Worse, this is hosted by hsforms.com (Hubspot Forms) which, by itself, collects a huge amount of data (e.g. IP address enrichment). Just this simple form needs its own privacy policy given that it's hosted by Hubspot's Marketing / lead form product.


If your name is John Smith and you want your PII removed, the filter can't just catch any occurrence of J. Smith; it needs to be scoped to a particular Smith, and to do that, the context of the prompt is helpful/needed.
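
To illustrate (made-up details; just a toy sketch of why naive matching over-blocks):

    # Made-up removal request for one specific John Smith.
    request = {"name": "John Smith", "city": "Leeds", "occupation": "dentist"}

    outputs = [
        "John Smith, the Leeds dentist, was born in 1980.",  # should be blocked
        "John Smith (1580-1631) helped found Jamestown.",    # a different Smith
    ]

    def naive_filter(text):
        # Over-blocks: matches every John Smith ever mentioned.
        return request["name"].lower() in text.lower()

    def scoped_filter(text):
        # Crude context check: only block when identifying details co-occur.
        t = text.lower()
        return (request["name"].lower() in t
                and any(d in t for d in (request["city"].lower(),
                                         request["occupation"].lower())))

    for o in outputs:
        print(naive_filter(o), scoped_filter(o))
    # -> True True   (the Leeds dentist: both filters block)
    # -> True False  (the Jamestown John Smith: only the naive filter blocks)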


Somewhat related, I previously completed the form found in the help section titled "How your data is used to improve model performance" to opt out of providing training data to OpenAI: https://help.openai.com/en/articles/5722486-how-your-data-is...

I received a confirmation in February that my data had been excluded from model training. However, recently, after the addition of the new Data Controls feature, I noticed that I was suddenly opted in again in the settings. I've tried contacting them about it via Discord and e-mail so that they can clarify whether the exclusion is still valid, but it seems like I'm getting ignored.


Oh this is infuriating. I did the same thing early on with that sketchy google form and thought I was good. But then after reading your comment, I went to my settings and it looks like I was opted in again. You also can't opt out without losing a feature (history of your chats), which is a form of coercion.


Wouldn't it be easier if they published a list of where they scraped their data from in the first place? Filling out forms, scanning ID, and sending it, only to learn they didn't capture any of your data, seems like such a waste of time.

On the other hand, they already know which sites they scraped data from. So publish it, maybe with a handy lookup portal where you can enter URLs to see if they got scraped.

I prefer an opt-in model, but that's not likely to happen any time soon, so this seems reasonable while this gets legally sorted out. Just because something is transmitted publicly doesn't mean it's without copyright. Otherwise any song broadcast on radio is up for grabs to be resold by anyone receiving it.


Reminder that you have no obligation to use their stupid form if you don't like it and all their weird requirements.

You can just send them a snailmail or e-mail and they'll have to process that too. You can find templates for that all around the internet.


A request for a list of personal data they’re processing would be interesting. How would they even comply with such a request?


I'm wondering: How can I be certain that the model contains any personal data about me (or someone else not famous)?

For a public figure, of course there is lots of information in the training data, all of it public. But when asked about me or my brother, ChatGPT either refuses to answer OR hallucinates the hell out of it. Then nearly everything is wrong, and the output resembles the answer to a prompt like: "Create a short bio for a fictional character named xx, living in yy and working as zz." (Okay, often yy and zz are wrong too.)

Requesting to delete these hallucinated facts seems quite futile and ineffective?


I frankly don’t get this privacy argument at all. If I browse Facebook and look at pictures you uploaded and end up learning something from those pictures, what am I supposed to do? Undergo brain surgery?

It feels like anything that you release on the internet publicly is fair game. If, however, you didn't release it in public, put it behind a password, and then OpenAI somehow got access to it and trained on it, I can see the argument. But if you put up data on your own, I don't see why you should be able to prevent others from accessing that data. If you don't want others using it, don't put it out there.


Scale matters.

It might feel like that to you, but that's not what the laws are in some economically important parts of the world. In Europe, the relevant bit is the "right to be forgotten". If you want to operate an information system, you need to implement that. It's hard to see why it wouldn't apply to a chatbot just the same as it applies to search engines.

It's much easier to explain why there's a distinction between a human brain and a massive database accessible at will to billions of people.


Does OpenAI make a copy of the data at any point? If so, that's a copyright violation.


I can't copy paste text from the internet?


What do you think copyright is? It's literally the right to copy, and without having this right you can't copy whatever content you want from others.


> What do you think copyright is? It's literally the right to copy, and without having this right you can't copy whatever content you want from others.

Actually, no, copyright is the right to exclude others from copying, and without someone else having that right, you can copy whatever you want.

(And even if someone else has the relevant copyright, you can still copy if your copying falls into one of several exceptions to the exclusive rights of the copyright owner, such as fair use.)


Copyright: the exclusive legal right, given to an originator or an assignee, to print, publish, perform, film, or record literary, artistic, or musical material, and to authorize others to do the same.

Yes, if someone copies text, it is technically a copyright violation. Now, copying a small bit, not publishing it, or publishing under fair use - that's all fine.

Copying everything on the internet for commercial use falls under exactly none of those exceptions. There will likely be lawsuits, though OpenAI will probably try to say, "If you don't want to be used this way, opt out and we'll remove you."


Correct.


I really wish people would stop analogizing a statistical system with the brain, especially in arguments for why a billion dollar company should be allowed to ignore data privacy laws.

Just because something is on the public internet, does not mean you have the right to do anything you want with it.


According to the Washington Post's reporting last week on the contents of the C4 dataset, ChatGPT has over 100,000 tokens from my personal blogs. Does that qualify as "personal data" in this context? Everything I've written is clearly labeled CC-BY, so it would be nice to at least be acknowledged.

https://www.washingtonpost.com/technology/interactive/2023/a...


I filled it out weeks ago and still have gotten no response.

Anyone had any success with this?


No one at my firm has had success either. I'm guessing they were overwhelmed by responses. It's possible that they've gotten millions.


Good thing they just got Billions from Microsoft to deal with growing pains like this.


it’s a shame they don’t have a state of the art AI model that could swiftly process these requests


Aren't there mandatory response delays and compliance delays with the GDPR?


Yep. It can take a while for the relevant authorities to spin up their processes. But it will very likely happen. En masse.


Yeah, it's like a month I believe.


Interestingly, I find ChatGPT very reticent about giving out PII of anyone not notable. This might not be a huge issue.


But you can turn around and ask it to write a story about the person and it happily complies. That's fiction but contains some facts. Repeat this enough times and you can pull out the facts the stories have in common.
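
In toy form, the extraction idea is just intersecting the "facts" across repeated samples (illustrative values only):

    from functools import reduce

    # Illustrative: 'facts' extracted from three generated stories about
    # the same person; details stable across samples are more likely real.
    stories = [
        {"lives in Leeds", "dentist", "owns a red car"},
        {"lives in Leeds", "dentist", "plays violin"},
        {"lives in Leeds", "dentist", "has two dogs"},
    ]

    print(reduce(set.intersection, stories))
    # -> {'dentist', 'lives in Leeds'}  (set order may vary)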


Has that been done successfully? Could you share the prompts you used?


I should have added "theoretically". I haven't had much luck with it, using a prompt like "Tell me a story about a character named X, a Y, living in Z."


That's a good idea. I tried it in several ways, but it always added platitudes and stereotypes about where my family, friends, and I live or what we do. When I pressed it to include more real-world aspects of those fictional characters, it quickly reverted to the canned response "As an AI model, I cannot... blablabla".


Same. As an exercise, I have tried it on myself, family members, and friends. It gives me nothing because I'm not a public figure.

I don't know all the details about GDPR but I can't imagine it allows people to erase their existence from the Internet if they became a public figure at some point.


Here's to hoping this forces their hand into making the mass data-hoovering completely opt-in, so they can avoid these pesky normies bothering them again in the future.


Anything that gives artists the right to tell OpenAI to pound sand.


This has to be opt-in, not opt-out. A company should not be allowed to use personal data you did not explicitly provide it.


Just looked, and I'm guessing this is an EU thing? No option for US in dropdown.


Does US law have anything that guarantees you the right to have your data deleted from platforms? I don't recall that existing, so it makes sense that OpenAI does the bare minimum to be compliant in the markets where they're active. Just like most companies.


California has a "don't sell my data" law, which may apply here. That's the only US law I can think of.

Unless OpenAI is stupid enough to gather medical data from sources protected under HIPAA, that is, but I doubt even OpenAI would be nonchalant enough to use a data source like that.


Sadly, no. But if OpenAI was interested in not being viewed as an evil corporation, it would be in their best interest to do it for everyone anyway, since they have a mechanism. I don't think they care too much about being well-behaved, though.


if OpenAI was interested in not being viewed as an evil corporation they'd be actually "Open", actually non-profit, and they wouldn't have cozied up to Microsoft.


I filled out a Google Docs form to opt out once - which one is it?


How would anyone know what information they have on them?


Ask ChatGPT?


You can't finish this form without selecting "country whose laws apply". The US isn't in the list, so you can't submit the form. California and Illinois have GDPR-like laws. This is pretty messed up.


So they seem to be confirming they're processing personal data. The UK ICO "at a glance" guidance for that includes:

* You must identify valid grounds under the UK GDPR (known as a ‘lawful basis’) for collecting and using personal data.

* You must use personal data in a way that is fair. This means you must not process the data in a way that is unduly detrimental, unexpected or misleading to the individuals concerned.

* You must be clear, open and honest with people from the start about how you will use their personal data.


You'll want to fact-check this, but I find GPT-4 is OK at this kind of stuff.

> If you've been asked to remove data in accordance with a GDPR request, is it best effort or do you have to strictly comply with the request?

ChatGPT: I am not a lawyer, but I can offer some general information about GDPR and data removal requests. The General Data Protection Regulation (GDPR) is a legal framework that sets guidelines for the collection, processing, and storage of personal data of individuals within the European Union (EU). Under GDPR, individuals have the "right to be forgotten," also known as the "right to erasure." This means that they can request the deletion of their personal data under specific circumstances.

When a data controller (the organization that determines the purposes and means of processing personal data) receives a valid GDPR request to remove personal data, they are generally required to strictly comply with the request. However, there are some exceptions, and the extent of the obligation may depend on the specific circumstances surrounding the request.

The GDPR requires data controllers to respond to data removal requests without undue delay, and in any event within one month of receipt of the request. This period may be extended by two further months where necessary, taking into account the complexity and number of requests.

It's important to remember that GDPR compliance is not solely about "best effort." Organizations are expected to take reasonable and appropriate measures to ensure compliance with the regulation. This includes having processes in place to handle requests and promptly remove personal data where required. Failure to comply with GDPR can result in significant fines and penalties.

However, it's also essential to consider that there are circumstances under which a data controller may not be required to comply with a data removal request. These exceptions can include situations where the processing of personal data is necessary for exercising the right of freedom of expression and information, compliance with a legal obligation, or for reasons of public interest.

In any case, it's crucial to consult with legal counsel or a data protection officer to better understand your organization's specific obligations under GDPR and ensure proper compliance.




