> Furthering the S3 health data thought exercise:
If OpenAI got their hands on an S3 bucket from Aetna (or any major insurer) with full and complete health records on every American, due to Aetna lacking security or leaking an S3 bucket, should OpenAI or any other LLM provider be allowed to use the data in its training even if they strip out patient names before feeding it into training?
To me, this says that OpenAI would have access to ill-gotten raw patient data and would do the PII stripping themselves.