
Publicly available data under the GDPR - r_singh
https://iapp.org/news/a/publicly-available-data-under-gdpr-main-considerations/
======
bilekas
The GDPR, however, is partly about stored information being specific to what
is required, but more about consent for information to be stored and the
option to not have information stored.

It also covers how long that information can be stored. If it comes from
public sources, there needs to be a proven reason why it's stored there, and if
it is not required, it must be removed.

I wasn't sure of any good example of 'Public Information' from the article..

~~~
r_singh
> I wasn't sure of any good example of 'Public Information' from the article..

A decent example would be online reviews. When someone leaves a review on Yelp
or Tripadvisor, they're making their name, avatar, city, interests, and
opinion on a subject all public.

Now there are websites that scrape these reviews and display them on their own
property with their source attributed to the original review (Google and Agoda
are examples). And there are web applications like Podium, Yext, etc. which
help companies manage their online reviews by scraping them from all of the
business's listings.

While a user can delete their information from the original data source,
assuming that the original data source complies with GDPR, they may not be
aware of which other parties have scraped and stored this information in their
databases without explicit consent.

Another relevant example could be publishing info on LinkedIn and having it
scraped by companies like HiQ.

~~~
bilekas
Okay, but that scraped information is not in line with GDPR. The user has not
given consent for that information to be stored.

Yelp may be in line, storing identifiable information for particular reasons
in order to utilize their services.

But scraping is never in line with GDPR: even if the scraping itself is
allowed, for example, the information on users and people cannot be stored.

~~~
mstolpm
Could someone downvoting bilekas's comment please ELI15 why his comment is
wrong?

I'm under the impression that this is exactly the GDPR position: You are not
allowed to store and process PII if you don't have the consent of the person
identified. I'd be happy to be proven wrong.

~~~
IAmEveryone
You'll be happy to hear that _consent_ is but one of six possible legal bases
for storing data.

The others are legal obligation, necessity to perform a contract (with the
subject), legitimate interest, vital interest, and public interest.

For scraped data, public interest and legitimate interest are likely to be
most salient.

~~~
mstolpm
Thanks for the explanation. I understand that there are other legal bases
allowing storage and processing of (some) PII in (some) cases without the
user's consent. But do they apply to "scraped PII"?

I'd be very sceptical about giving "public interest" or "legitimate interest"
as a reason for storing and processing PII in scraped data: "Public interest"
(if not about some VIP or other public figure) mostly seems to be based on
summarized data, so you would at least need to anonymize the data immediately.
The (identifiable) name or picture attached to a random scraped comment is
hardly of "legitimate" or "public" interest. And vital interest or legal
requirement is hard to argue for scraped data as well in most cases (I'd even
be sceptical if it were used for law enforcement). Moreover, I have a hard time
seeing many cases where scraping PII by a company, organization or single
person is required to perform a contract with the subject of the PII.

Let's take a concrete example: Why would someone have the right to scrape all
my Amazon reviews from the Amazon website under public or legitimate interest
and store/process them with my name and my identifiable PII? I can see a
public interest in scraping reviews and processing them without the PII, but I
don't see a reason to do so without anonymization.

So, I still see the comment holding: Scraped data falls under the GDPR, which
doesn't allow storing and processing PII in most cases.

~~~
ratherbefuddled
GDPR doesn't work like that: the means of obtaining the data does not relate
to the legal basis you have for processing it.

If you process personal data you need at least one legal basis; you decide
what that is and you take on certain obligations.

If you rely on "legitimate interest", GDPR requires you to consider the balance
of your interest versus the subject's right to privacy. As long as you do so,
record your determination, take reasonable steps to mitigate the privacy
impact (such as allowing opt-out, aging the data out over time, etc.) and
make good on your obligations, it is unlikely an enforcement agency will find
against you. It is a subjective decision, however, and there haven't been many
GDPR cases handled by member states yet, so interpretations might change over
time and all you can go on at present is guidance from regulators.

I think a good example of where scraping and legitimate interest would be ok
is if you are trying to sell a product appropriate to people with <job title>.
You go to LinkedIn, pay for a facility that allows you to search for that job
title, scrape that data and attempt to contact those people. There is some
privacy impact but it is likely that people who make themselves available on a
business networking site might reasonably expect to be contacted about things
relevant to their job title. As long as they can opt out from your processing
of their data, you don't keep it for longer than necessary, and you meet the
other obligations, you'll likely be fine (notwithstanding the ePrivacy
directive, which is a separate thing).
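
To make the "opt out" and "don't keep it for longer than necessary" parts a bit
more concrete, here is a minimal sketch in Python. Everything in it is an
assumption for illustration: the record fields (`subject_id`, `collected_at`),
the `purge` helper and the 180-day retention period are made up, and GDPR
itself does not prescribe any particular number of days.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention period; the right value depends on your purpose.
RETENTION = timedelta(days=180)

def purge(records, opted_out_ids, now=None):
    """Return only the records that may still be kept.

    A record is dropped if its subject has opted out, or if it was
    collected longer ago than the retention period.
    """
    now = now or datetime.now(timezone.utc)
    return [
        r for r in records
        if r["subject_id"] not in opted_out_ids
        and now - r["collected_at"] <= RETENTION
    ]

# Example data with a fixed "now" so the result is reproducible.
now = datetime(2020, 1, 1, tzinfo=timezone.utc)
records = [
    {"subject_id": "a", "collected_at": now - timedelta(days=10)},   # kept
    {"subject_id": "b", "collected_at": now - timedelta(days=10)},   # opted out
    {"subject_id": "c", "collected_at": now - timedelta(days=400)},  # too old
]
kept = purge(records, opted_out_ids={"b"}, now=now)
```

Running something like this on a schedule, and before every contact attempt, is
one way to make good on those two obligations; it does nothing for the others
(informing the subject, recording your balancing test, and so on).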

An example where your grounds might be a lot more tenuous would be a
recruitment agent scraping your name from a question on StackOverflow and
guessing your email address to contact you about a job.

In your Amazon review example, the processor would need to justify capturing
your name. For most processing purposes (e.g. grouping the reviews by author) a
hash would suffice.
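
A rough sketch of the hash idea in Python, assuming reviews arrive as dicts
with `author` and `text` keys (hypothetical field names) and using a keyed hash
(HMAC-SHA256) rather than a bare hash so that names can't simply be guessed and
re-hashed by someone without the key. `SECRET_KEY`, `pseudonymize` and
`group_reviews_by_author` are made-up names for illustration. Note that this is
pseudonymization, not anonymization: whoever holds the key can still link a
known name to its reviews.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret; in practice it would be random and stored separately.
SECRET_KEY = b"replace-with-a-random-secret"

def pseudonymize(author_name: str) -> str:
    """Return a stable keyed hash (hex HMAC-SHA256) of a normalized name."""
    normalized = author_name.strip().lower()
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

def group_reviews_by_author(reviews):
    """Group review texts by pseudonymized author; the name is not kept."""
    grouped = defaultdict(list)
    for review in reviews:
        grouped[pseudonymize(review["author"])].append(review["text"])
    return dict(grouped)

sample_reviews = [
    {"author": "Jane Doe", "text": "Great kettle."},
    {"author": "jane doe ", "text": "Stopped working after a week."},
    {"author": "John Roe", "text": "Fine."},
]
grouped = group_reviews_by_author(sample_reviews)
```

Here both "Jane Doe" reviews end up under the same key, so grouping by author
still works without the name ever being stored.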

~~~
bkor
Could you add references to the sections of the GDPR which explain this? The
various times I've read the GDPR (as well as the local law), I didn't see your
explanation in it. References would allow me to determine what I missed.

