Ever since Aaron Swartz, I wouldn't bet on scraping and then publicly dumping data being legally risk free, even if said data was public/badly protected to begin with.
Leonard French had a video in the past year where he stressed that the "ease" of doing something does not make it legal. From what I understand, if you have any amount of protection on your information (and one might count visibility: hidden as a protection), then it's possible you are violating the Computer Fraud and Abuse Act. The law isn't specific and only requires that you do something the system owners did not reasonably expect you to do.
In HiQ vs LinkedIn, the Ninth Circuit upheld HiQ's preliminary injunction on the grounds that HiQ had raised serious questions over whether the CFAA can apply at all to systems that do not have an access control mechanism.
> Public LinkedIn profiles, available to anyone with an Internet connection, fall into the first category. With regard to such information, the "breaking and entering" analogue invoked so frequently during congressional consideration has no application, and the concept of "without authorization" is inapt.
"visibility: hidden" is unlikely to meet that burden. It's not any kind of access control mechanism; the people who are able to see it are defined by their knowledge of the tool, not by being granted access.
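To illustrate the point: `visibility: hidden` is purely a rendering instruction, so the "hidden" content is still present in the markup every client receives. A minimal sketch (the markup and the hidden field here are hypothetical, not Parler's actual pages):

```python
import re

# Hypothetical post markup: the span is invisible in a browser,
# but CSS only affects rendering, not what the server sends.
html = """
<div class="post">
  <span class="author">alice</span>
  <span class="meta" style="visibility: hidden">40.7128,-74.0060</span>
</div>
"""

# Any client reading the raw HTML sees the "hidden" value anyway.
hidden = re.search(r'visibility: hidden">([^<]+)<', html).group(1)
print(hidden)  # prints 40.7128,-74.0060
```

No credential, token, or permission check stands between the client and that value, which is why it's hard to call it an access control mechanism.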
These are the criteria they used:
> Put differently, the CFAA contemplates the existence of three kinds of computer information: (1) information for which access is open to the general public and permission is not required, (2) information for which authorization is required and has been given, and (3) information for which authorization is required but has not been given (or, in the case of the prohibition on exceeding authorized access, has not been given for the part of the system accessed).
Obfuscation does not impose any kind of authorization requirement. It makes information more difficult to extract, but I think you would have a hard time arguing that other users are not authorized simply because they don't know the URL. There isn't even a list of who is allowed, so "granting" access in this case is nonsensical.
The only way I can see this being illegal is if Parler can successfully argue that they intended for this to require authorization, the scraper knew that Parler intended this to require authorization, and that it's a violation of the CFAA to do something that the provider didn't intend for you to be allowed to do. I don't see that as a remotely likely outcome.
From their description, it sounds like Parler was fusked, not scraped.
At an extreme you could fusk shared Google documents. Assuming there's a bug that brings the search space down from end-of-the-universe timescales, you are basically brute forcing a shared password.
Or is this 'scraping' - ?login=myusername&password=mypassword
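The distinction the fusking framing turns on is entropy. A rough back-of-the-envelope sketch, with illustrative numbers (128-bit random token, one million guesses per second) that are assumptions, not anyone's measured figures:

```python
# Why guessing a random share token is "end of the universe" work,
# while walking a sequential ID space is trivial. All rates and
# sizes below are illustrative assumptions.
SECONDS_PER_YEAR = 365 * 24 * 3600
guesses_per_sec = 1_000_000

token_space = 2 ** 128            # e.g. a 128-bit random share token
years_to_exhaust = token_space / guesses_per_sec / SECONDS_PER_YEAR
print(f"{years_to_exhaust:.2e} years")   # on the order of 1e25 years

sequential_ids = 10_000_000       # an autoincrementing post ID space
seconds = sequential_ids / guesses_per_sec
print(f"{seconds:.0f} seconds")   # 10 seconds
```

A random token effectively is a shared password; a sequential integer is closer to a page number.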
It's hard to see how guessing thousands of URLs that you have no public link to, in order to retrieve material end users have deleted, is straightforwardly legal.
The Wired article this post links to says that "the posts on Parler were simply listed in chronological order: Increase a value in a Parler post url by one, and you'd get the next post that appeared on the site"
This sounds like the resources were available by simply enumerating over an autoincrementing primary key, so it didn't require much "guessing".
Not sure if I agree. Intentionally and systematically guessing the URLs of postings and comments that were deleted or intended to be private sounds like a computer crime to me.
Weak security does not bestow the right to steal data.
Not that such analogies are without their own flaws, but I like to toy with real-world analogies when thinking about computer crimes. One that seems applicable here is a library: there are stacks of books, and if you know where to go you can read any book you'd like. If a librarian wishes for a book to be inaccessible, the easiest thing to do is to remove it from the indexes and stop telling people about it, but patrons are still fully within their rights to browse every shelf until they find what they're looking for. Absent some sort of additional access control, disposal, etc., it would be hard to blame a patron for reading that banned content even if they started their search with the express purpose of uncovering such banned material.
> Weak security does not bestow the right to steal data.
It does not, but entirely absent security does give the public the right to view the data. HiQ vs LinkedIn [1] found that scraping of publicly available assets did not constitute a violation of the CFAA.
That court held that in order to trigger a CFAA violation, there needs to be some form of access control in place; otherwise the data is considered public. Not only are you allowed to scrape it; Parler is not allowed to try to stop you from scraping it, unless they are willing to put the data behind a login wall.