
LinkedIn, HiQ spat presents big questions for freedom, innovation - carlps
http://www.sfchronicle.com/business/article/LinkedIn-HiQ-spat-presents-big-questions-for-11274133.php
======
ChuckMcM
As someone who once oversaw the operation of a web crawler, I can tell you it's
pretty simple: if it is "Okay", then the robots.txt file will tell you it's
allowed. If you look at the LinkedIn robots.txt
([https://linkedin.com/robots.txt](https://linkedin.com/robots.txt)) you will
see it is carefully groomed to allow various search engines to look through
specific sections of their web site; the rest are disallowed.
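
For anyone who hasn't written a crawler, here is a rough sketch of that check using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URLs, the `MyCrawler` user agent, and the `is_allowed` helper are all made up for illustration; this is not LinkedIn's actual file, and it only shows the mechanics, not what any court would consider authorization.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (NOT LinkedIn's real file): one named crawler is
# allowed into a single section, and every other crawler is disallowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /pub/
Disallow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "https://example.com/pub/jane-doe"),
        ("Googlebot", "https://example.com/in/jane-doe"),
        ("MyCrawler", "https://example.com/pub/jane-doe"),
    ]:
        print(agent, url, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

A real crawler would fetch the live file first (for example with `RobotFileParser.set_url` and `read`) and run this check before every request.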

Pretty much all of the case law comes down to this: there is a perfectly valid
copyright on the 'collection' of a web site regardless of ownership of
particular pieces, and the robots.txt is a well known and well understood
mechanism for communicating 'authorization'.

There is a "value" to LinkedIn in letting Google and other search engines
crawl them: you get to see pages in your search results pointed at LinkedIn,
so LinkedIn lets them crawl their pages.

At the end of the day this is _exactly_ a question of value. Microsoft knows
that the collection of information in LinkedIn is valuable for a number of
uses. If you want to pay them some of that value to get access to it, fine; if
not, then don't use it.

Here is one possible outcome: Microsoft will tell them what it will cost to
use their info, HiQ will probably not be able to meet it because they've built
their existing pricing structure around "free" access, and then as they are
going down the drain Microsoft will buy their assets and technology and
LinkedIn will get this new service you can buy from them to help you find and
retain people.

~~~
renlo
From what I've been told, if the data is factual, such as current employment
information, then it doesn't fall under copyright.

Interpretation of that factual data would fall under copyright though.

~~~
ChuckMcM
Kind of, kind of not. It is true that the language of Copyright law calls out
'facts' as something that is not being protected by copyright, but _how_ you get
the facts has a large bearing on whether or not you can reproduce them.

There is a _lot_ of case law around this stuff as you might imagine. I
certainly haven't followed all of it but my interest in information economics
has led me to read fairly extensively about it. And I'm not a lawyer, and
especially not a Copyright lawyer, so it is entirely possible that everything I
have come to know is pure bollocks; consider yourself so warned :-).

Generally in reading about these things there are 'facts' and 'how you got
access to them' that come out. There are lots of cases where the "collection"
of facts has been upheld to be protected. So for example the "Machinists
Handbook" is a collection of facts about machining and the handbook is
protected by copyright, even though the specific dimensions of various thread
pitches are just 'facts'. Perhaps more interesting has been cases involving
national sports leagues against companies and fans who do things like "live
tweet" a sports event. They have argued successfully that by buying a ticket
to the event you have agreed to the terms of that admission which expressly
prohibits you from reproducing those facts in any form. So while it may be a
"fact" that Buster Posey just struck out, if you learned of that fact by
sitting in AT&T Park at a game you can't legally "tweet" it without violating
your agreement with Major League Baseball that you agreed to when you bought
the ticket.

It has similarly been held (look at a lot of Craigslist vs a bunch of people)
that automating access to a web site through scraping is an access that you
have to be explicitly allowed. That allowance comes in the terms of service of
the web site and is expressed by the robots.txt file (and the available terms
of service contracts on the site).

What it boils down to is that the collection of facts in a web site IS
protected by Copyright. Further, in exchange for granting you access to the
information, the Copyright owner CAN put restrictions on how you may further
use the facts you discover there. If you wish to use the information in a way
the Copyright owner objects to, you must get the 'facts' through some other
source and not the Copyright owner's collection.

And yes, getting it out of Google's cache of the pages does not count. See the
Craigslist vs 3Taps
([https://en.wikipedia.org/wiki/Craigslist_Inc._v._3Taps_Inc.](https://en.wikipedia.org/wiki/Craigslist_Inc._v._3Taps_Inc.))
dispute to get a feel for how the court views things. The simplest
interpretation I can make from those events was that Google's caching pages
counts as fair use (it makes results faster) but people taking the page from
Google's cache is either a CFAA or Copyright violation and thus disallowed.

------
putlake
It's interesting to think about how this differs from Craigslist going after
scrapers. LinkedIn is objecting on the basis of DMCA (copyrights) and the
Computer Fraud and Abuse Act (alleging unlawful access of their public
website).

> Nate Cardozo, a senior staff attorney for the Electronic Frontier Foundation
> in San Francisco, said copyright law doesn’t apply to this case because
> information from LinkedIn profiles, like when someone worked at a particular
> company, are facts, not creative works like music or films.

I wonder if copyright applied to Craigslist posts or if the fact that a house
is for rent is just a fact.

------
home_boi
There's a fine line between limiting the free speech of these personal
information aggregators and violating the privacy of people. Right now, there
seems to be a lawless and limitless environment for the aggregators. They
freely present people's physical address history, phone numbers, relatives,
date of birth, etc. to the open web.

I hope some reasonable restraints will be put in place. Something like
aggregated address history can only be displayed on official government
websites with rate limits.

------
shawn-butler
Seems identical to PadMapper/3taps v Craigslist, which ended up in a settlement
of something like a million USD to Craigslist.

The CFAA is such a horribly written statute. It needs to be completely
rethought.

------
sb8244
On phone and not sure of how to format a quote. Can anyone explain how this
from their terms of service affects copyright claims? It appears that they do
not own the content on user profiles and shouldn't be able to sue based on it:

> You own all of the content, feedback, and personal information you provide
> to us, but you also grant us a non-exclusive license to it.

~~~
sillysaurus3
[https://news.ycombinator.com/formatdoc](https://news.ycombinator.com/formatdoc)

------
akanet
I had the very unfortunate pleasure of being neighbors with HiQ when we shared
a coworking space in downtown SF. They were aggressively selling and marketing
HR software that alerted employers when their employees updated their LinkedIn
profiles, indicating an intent to quit.

I've got no problem with the business, and LinkedIn's suit is probably an
overreach, but by God were the HiQ people annoying. I remember the CEO pretend
boxing with their salesmen to get them amped up (in our rather open, shared
office). Quintessential white good ol boys club mentality.

