Hacker News

Handling the data in an ethical way doesn't require handling it in a completely anonymous fashion. That would be one solution, but you can also create a trust-based system for how the data being labeled is handled, similar to HIPAA. In addition, there are simple operational methods that could help ensure the data is processed as close to anonymously as possible. For example, with voice data you could filter the voices, work with the data in segments, and ensure that metadata for the samples is only accessible to trusted individuals certified under the above framework.
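As a rough sketch of the segmenting-and-metadata-separation idea (names and record shape are my own assumptions, not an actual system): labelers get fixed-length audio segments, while the identifying fields go to a restricted store.

```python
# Hypothetical sketch: separate identifying metadata from voice samples
# and hand labelers only fixed-length segments, so no single worker
# sees a full recording. Assumes records are dicts with an "audio"
# field (list of samples) plus identifying fields.

def segment_audio(audio, segment_len):
    """Split raw audio into fixed-length segments."""
    return [audio[i:i + segment_len] for i in range(0, len(audio), segment_len)]

def split_record(record, segment_len=4):
    """Return (metadata, segments): metadata goes to the restricted
    store, segments go out for labeling."""
    metadata = {k: v for k, v in record.items() if k != "audio"}
    return metadata, segment_audio(record["audio"], segment_len)
```

Voice filtering (e.g., pitch shifting) would sit in front of this, but the split alone already keeps labelers from pairing a voice with a name.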



In trust-based systems like HIPAA or security clearances, there is a fundamental requirement of two conditions to access data: privilege, and the need to know. Taking data and mining it for valuable insights isn't a "need to know"; it's a "need to discover something unknown." This is where the security breaks down. In a conventional HIPAA system, only your doctor needs to access your info. You don't have to worry about some other doctors accessing your information in bulk to try and conduct a study on cancer rates. They don't NEED to know your info, they just WANT to know. And when you WANT to know how to accurately fingerprint people by their voice, obfuscating it is counterproductive.
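The two-condition model reduces to a conjunction, which is easy to sketch (the function and data shapes here are hypothetical, not any real HIPAA implementation):

```python
# Hypothetical sketch of the two-condition access model: a request is
# granted only if the user holds the privilege (clearance) AND has a
# need to know (is assigned to this specific patient's care).

def may_access(user, record, clearances, assignments):
    """Both conditions must hold; either alone is insufficient."""
    has_privilege = user in clearances
    need_to_know = record["patient"] in assignments.get(user, set())
    return has_privilege and need_to_know
```

Bulk research access fails the second check by construction: a researcher is cleared but not assigned to any particular patient, which is exactly the gap the comment above describes.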


>You don't have to worry about some other doctors accessing your information in bulk to try and conduct a study on cancer rates.

This not only happens, it's my job (though I'm not a doctor). Of course, it's tightly controlled on my end. I work for the government, but health systems have their own analysts. As part of my job, I have access to sensitive and identifying information.

This isn't to be contrarian. There are existing systems using very personal data in bulk for analysis. The wheel doesn't need to be reinvented.
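One common safeguard in those existing systems is small-cell suppression: aggregate counts are published only when they exceed a threshold, so rare combinations can't single anyone out. A minimal sketch, assuming records are simple (region, diagnosis) pairs:

```python
# Hypothetical sketch of small-cell suppression, a standard disclosure
# control in government health statistics: counts below a threshold
# are withheld (here replaced with None) before release.

from collections import Counter

def suppressed_counts(records, threshold=5):
    """Count cases per group, suppressing groups below the threshold."""
    counts = Counter(records)
    return {group: (n if n >= threshold else None)
            for group, n in counts.items()}
```

The analysts still work with the identified records internally, under access controls; suppression only governs what leaves the building.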



