Company B scraped listings off of Company A's site, and published their own site based off of that data. Then A sued on three grounds:
1. Company B's notices falsely claimed copyright to the listings.
2. Company B ignored Company A's copyright notice.
3. Violation of the Lanham Act, which prevents people claiming something is from somewhere other than where it is when they sell it. (This is the interesting one.)
The court ruled as follows.
General copyright notices at the bottom of a web page claim copyright over the site as a whole and not over all of the data that may appear in the site. Therefore 1 fails because Company B's notice is not claiming copyright. And 2 fails because A's notice was not specific enough to claim copyright.
As for 3, the fact that there is no physical product means that the precedent the Supreme Court set in the Dastar case applies: the Lanham Act only applies to physical products.
I'm sure that I didn't get it quite right, but that version may be more readable than the original article.
What does that mean?
I thought you possess copyright on your creation without needing specific copyright notices. And if some part of your website is not copyrightable, how would notice help?
Additionally, if someone copies your work that you have copyright on, they are infringing even if there isn't any copyright notice. Are they not?
I'm sorry if I'm not understanding something simple. I tried a quick Google search and came up with this, but that seems to apply to cases where you remove a copyright notice from a copyrighted work.
 - http://www.photolaw.net/did-someone-remove-the-copyright-not...
So the case in question was making separate copyright and trademark claims, and the court rejected _both_ of them, is that right?
Which does suggest the copyright angle really wasn't the best one to take.
There may be a good reason why that didn't apply here, but - not being a lawyer - I don't know what it is.
The article you linked includes various restrictions on the doctrine:
"highly time-sensitive", "in direct competition", "rend[ering] [the] publication profitless"
The Wikipedia page on the misappropriation doctrine has various links and commentary about the narrowing of its application: https://en.wikipedia.org/wiki/Misappropriation_doctrine
Particularly, from the Second Circuit NBA vs Motorola (1997) "Such concepts are virtually synonymous for wrongful copying and are in no meaningful fashion distinguishable from infringement of a copyright." https://www.law.cornell.edu/copyright/cases/105_F3d_841.htm Other circuits have followed similar principles.
This includes a lot of things like prices which seem to be creative endeavors making it somewhat confusing.
The litmus test for this in the US is literally the phone book, based on Supreme Court rulings about copyright on phone books. The white pages are an exhaustive list of phone numbers sorted alphabetically by last name, and are not copyrightable because that's considered obvious. The yellow pages sort businesses into categories according to someone's judgement, and emphasize certain ones, which is considered just barely 'original' enough that it can be copyrighted. The same phone numbers in the same categories would violate copyright of the yellow pages; the same phone numbers with a different organizing principle would not violate copyright.
Those terms are not even presented on the screen when visiting Google Maps. And even if they were, I didn't sign anything or agree to anything. It is ridiculous if those terms are legally binding. Because then I will make a site and make you pay $1 for every page viewed.
Contracts don't need to be signed to be valid (when was the last time you signed a contract with an online shop?). And if you don't agree to the terms, then you're using the site without permission and can be sued for damages. But the exact amount to pay can't just be any made-up number.
Similarly, if you walk into a shop, take an apple and eat it; then when the shopkeeper demands a million dollars, you can refuse to agree. Then the shopkeeper is free to sue you for destroying his property and you will be ordered to compensate him, although probably not for the full amount.
My company, Datastreamer, has been in business for ten years indexing public web content (news + blogs). We focus primarily on "live" content. Content that publishes often.
The main challenge we've always had is that just because the content CAN be indexed doesn't necessarily mean you MAY index it.
A recent situation was around Craigslist vs 3taps:
Basically the issue doesn't revolve around WHO has copyright but who has copyaccess to the content.
So if you create an account on Acme.com... You still own the copyright to the content you post but Acme controls access. Not only that but the ToS that you sign gives them rights to your content including bulk sales.
This means that Acme can monetize the content that YOU create while actively preventing people from indexing it, even though that may be your intention.
This means that in 2018 a company like Google COULD NOT get started because websites would just not allow them to access your content.
I believe that when most people post public content on the Internet they intend it to be public, including being accessed by other search engines, crawlers, etc.
Now we're in a horrible situation where just a few companies essentially own the Internet.
This is why Google can't index Facebook content or Twitter content even though it's public - they can't access it.
They index plenty of their content. What they don't index is content not explicitly marked as "Public", e.g. Facebook posts with visibility settings or protected tweets. FB, Twitter, and now LinkedIn have plenty of content that's not publicly accessible (that is, content which requires a logged-in account that explicitly agreed to a TOS/EULA), but they have tons of publicly accessible content, too. The latter is fair game.
That said, the Craigslist lawsuit is still bewildering to me. That content is explicitly public, does not require a login, and the only agreements are the automatic, unenforceable browsewrap ones. The LinkedIn v. HiQ case is very similar to Craigslist v. 3Taps; however, the decisions are in opposite directions.
Published does not mean uncopyrightable. A radio station broadcasts a song; that doesn't remove the song's copyright protection. Receiving a copy is not making a copy.
3Taps settled out of court; there was no decision.
Padmapper is just a much better interface for craigslist data. Even though craigslist pulled their data, craigslist didn't create a new interface to match or even bother to improve their experience, even though they have the user mindshare for posting this type of data.
Essentially it's as you said - innovation is being killed to preserve walled gardens like Craigslist, and users are the ones being hurt here.
That's not completely true. After lots of bashing online, craigslist started adding some features such as searches on maps. Even though it's not as nice and easy to use as padmapper was, it would still be considered an improvement.
 - https://www.inc.com/abigail-tracy/craigslist-quietly-makes-c...
 - https://sfbay.craigslist.org/d/apts-housing-for-rent/search/...
Example: Does it matter whether the data is ever stored? Say I make a listing notifier that pings people when a house is listed in their desired area. I’d need to crawl real estate listings, but I’d never store or republish any content from the site.
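To make the "never store the content" idea concrete, here is a minimal sketch of such a notifier. It assumes the crawler has already produced a batch of listings as dicts with hypothetical "id" and "area" keys (the field names and the area names are made up for illustration), and it remembers only a one-way hash of each listing ID. Whether this changes the legal picture is exactly the open question; the sketch only shows that the technical part is possible.

```python
import hashlib


def listing_key(listing_id: str) -> str:
    """Return a one-way SHA-256 digest so the raw listing is never kept."""
    return hashlib.sha256(listing_id.encode("utf-8")).hexdigest()


def new_listings(fetched, seen_keys, wanted_area):
    """Return IDs of unseen listings in wanted_area and record their digests.

    `fetched` is an iterable of dicts with hypothetical "id" and "area" keys.
    Only the digests are added to `seen_keys`; no listing content is retained.
    """
    alerts = []
    for listing in fetched:
        key = listing_key(listing["id"])
        if listing["area"] == wanted_area and key not in seen_keys:
            alerts.append(listing["id"])
        seen_keys.add(key)
    return alerts


# Made-up sample batch standing in for one crawl of a listings page.
seen = set()
batch = [
    {"id": "a1", "area": "Mission"},
    {"id": "b2", "area": "Sunset"},
]
first = new_listings(batch, seen, "Mission")   # first crawl: one alert
second = new_listings(batch, seen, "Mission")  # same batch again: no alerts
```

Note that the listing still has to be fetched and held in memory while it is being examined, even though nothing but the digest survives the call.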
It's actually that I "read" the data, regardless of my intention and regardless of what I intend to do with it. But taken to the extreme there has to be something that makes it legal for me as a consumer to scrape ONE item from their data, while at the same time making it illegal for a company to scrape ALL their data in this way.
The worst situation to be in is one where it's "legal until it isn't", i.e., when you actually cause a problem (bandwidth, revenue) you will be sued, and until then you have no idea.
Technically (and it's complicated these days because copyright law was mostly written well before computers and the internet existed) any time you create a copy of someone else's creative work, you need an explicitly granted right to do so. You can _read_ a book, but you cannot write down all or part of what you just read. You can listen to a song, but you cannot perform all or part of that song. (An Australian band called Men At Work lost a copyright case for a flute solo that contained 2 bars/11 notes of a 1930s folk tune, "Kookaburra", whose copyright had been sold/bought in 2000.)
There are some explicit legal exceptions to requiring a license, most commonly people talk about "fair use", but that's again a "pre internet" body of law, and is extremely open to interpretation as to how it applies to digital copies in computers.
It's foolish to assume that the lawyers and the justice system will help you out. This is simply not a case of somebody pirating a movie and slapping on their own copyright.
CFAA was how craigslist won but that's been thrown out thanks to the EFF lawyers.
Going forward, there will be little to no recourse for "web scraping public data".
Suppose there was an intermediary platform so that the user uploads his content/video once to the intermediary platform, and can then select which platforms may use the content under what conditions (i.e. perhaps for free, perhaps for a user-set minimal amount of remuneration, ...), or perhaps instead of selecting which platforms, selecting the rules to which a platform must adhere in order to distribute the content (so that new players can instantly start hosting a diverse set of content as long as they adhere to these rules). The user thus only needs to upload once.
Obviously it would require YouTube etc. to provide API access for the intermediary so it can post the content without being able to hijack a user's account on YouTube etc...
This would not be in their interest of course, so anyone attempting to create this intermediary upload service would face a never-ending cat and mouse game, somewhat reminiscent of youtube-dl...
Perhaps instead of silly link tax rules, it would be better for governments to force platforms above a certain size/usage to provide API access such that these intermediary upload services can flourish.
They might effectively become some kind of consumer/producer protection organizations, by proposing multiple usage standards (like my video MUST NOT have ads, or perhaps the opposite for another user, my video MUST be remunerated at x cents per y views, otherwise the platform may no longer distribute my content, ...) from which their users can select...
apart from "pure format" content like music, videos, ... as soon as there is more compound structured content, like a blog post with images, it would be hard to standardize the format.
then there is the problem of authentication, especially if a user uploads a video through the intermediary, and 1 year later a new platform wishes to distribute the content on its network too, and after another year the user wants to be active on this platform: there is already an account, so how does he get his login credentials?
In theory the intermediary could have a messagebox for each user, and the platform can send the credentials to the user in this messagebox, so that if interested a producer can decide to make use of his account on a new platform...
Edit: so while currently all the platforms act as both publisher and printer, separate the market into publishers (intermediary rights managers) and printers/distributors (big platforms like YT etc). This is similar to how humans decided it was better to split up doctors and pharmacists to prevent de facto quackery of selling whatever you happen to have on hand, or cheaper deals for...
My guess is by acting as a central authority for all used equipment deals (bootstrapped by scraping listings from many sources), they were able to use the existence of the listings to better convince future dealers to just list directly with them instead of your business? e.g. increased exposure for dealers, Machinio doesn't take a commission, so dealers just pay a flat rate to gain exposure?
At the end of the day, I have to assume the litigation was over them stealing dealers by disintermediating you? Because otherwise I'm guessing they'd just argue they were providing you free exposure.
Seems like you're trying to have your cake and eat it too. You're presumably happy for Google to scrape your website (and collect your customers' data for their purposes like ad tracking), but when a smaller site does it you're not happy? Make up your mind.
What benefit does scraping your site give someone else? If someone posts your listings on their site, how do they connect would-be buyers with the sellers, anyway?
Yes, but notices matter for purposes beyond whether copyright exists, and therefore falsifying, removing, or modifying them also matters.
This wasn't about infringement (oddly enough) but about falsified or removed “copyright management information”, specifically the copyright notices.
Which seems odd, because if the original notices were valid, then I can't see why actual infringement wasn't alleged. OTOH, if there was no infringement, how could the thing presented have been the thing covered by the notices, such that either the removal or falsification claims had substance?
They are literally checking with their lawyers about what they can and cannot answer. I think they're OK.
(The case number is 17-16783.)
There was oral argument on March 15, and the 9th Circuit posts video recordings of all hearings on YouTube:
(I haven't watched it yet, so I don't know how it went.)
Since then, the only filings have been a few citations of supplemental authorities, the last one in June. If I'm not mistaken, the case is just waiting for the judge to write an opinion. According to the 9th Circuit's FAQ:
> 18. How long does it take from the time of argument to the time of decision?
> The Court has no time limit, but most cases are decided within 3 months to a year.
...It's currently been 8 months since the date of the argument, so hopefully that won't take too much longer.
IANAL but you are not misunderstanding. From all the rulings I've read, the courts tend to address "this case" particularly when they cite specifics. In this case the defendant used the argument that the copyright notice was not on the page the data was taken from and the court ruled in their favor. Nothing appears to be said about how it would have gone if such a notice were present because no other arguments were made (based on this article). I suspect there are other arguments to be made if such a notice were present (data/facts cannot be copyrighted etc.) but they didn't rule on any of that.
Here are a couple of cases that I'm especially interested in:
1. Does this mean that it is legal to scrape "database"-type websites for statistics and provide them on your own website? Could one use this ruling and copy all of IMDB's film data? Repackage that data into a Creative Commons website? Or a better set of tools for casting agents?
2. What about social or community-curated websites? Could you mirror all Reddit comments (which used to be Creative Commons anyway) to a more dev-friendly site? Don't force a mobile app down people's throats? Make it ad-free, donation-supported like Wikipedia?
3. What about big media? Could you bootstrap a new video site by scraping all existing (or popular) YouTube videos? Provide a means for owners to "claim ownership" of their account on the new site? Then market it as YouTube "but grown up" (18+)?
4. Kind of getting off-topic, but could you build a new music service by temporarily ignoring copyright, copying pirated music, then pivot to something that does collect money (via ads or subscriptions) for the music rights holders? I'm thinking Spotify, but built for music connoisseurs. Rich APIs with tagging, smart playlists, etc.
How likely would any of these be to avoid lawsuits until they're big enough to hire a legal team? Are certain behaviors less legal than others?
I'd really appreciate feedback on this. (Thanks in advance!)
IMDb forbids scrapers in their conditions but you can freely download their datasets.
Similar to what you described: omdbapi is a third party API for the free IMDb data.
suggests there is little to no recourse for IMDB and the like. Craigslist was able to win their case against 3taps by arguing that the scraping was putting a load on their servers (typical Craig Newmark bullshit) and that 3taps continued scraping even after the IP ban, which violated the Computer Fraud and Abuse Act or something like that. That is a draconian response, the likes of which was used against that guy who killed himself because he got caught scraping academic journals.
IIRC Pushshift is what powers sites like https://removeddit.com/
(to anyone reading this, please consider donating--this project is so cool)
1. I did something very similar and got a legal threat from a Fortune 100. My attorneys had successfully litigated similar matters but told me in my case, the law was clear-cut in the F100's favor. My attorneys attempted to negotiate a license agreement that would allow me to continue, but it fell on deaf ears and I had to shut down my business.
2. Like most things on the internet, sure, you could. But someone could stop you with legal force if they were inclined to do so.
Even though cases like Feist supposedly say that facts pulled from fact-collections aren't subject to copyright, that doesn't really work out on the internet for a couple of reasons.
The biggest reason is the CFAA. Anyone has the right to tell you to stop talking to their servers at any time. Such notice doesn't even have to be explicit; in Craigslist v. 3Taps, the judge indicated that IP blocks should've been sufficient notice that 3Taps's presence was unwelcome, but companies frequently go even farther and say that the fine print buried in the Terms of Service banning "automated access", "spidering", "crawling", or similar is sufficient notice. Judges frequently accept that argument.
Failure to heed these notices very likely constitutes "exceeding authorized access" under the CFAA, which is both a tort and a crime (meaning, you can both be sued and be put in jail for it). Companies use this to bludgeon upstarts all the time.
The other thing here is:
a) very few facts on the internet are really alone; interesting data is almost always encased in HTML or some other copyrightable superstructure.
b) a legal philosophy called "the RAM Copy Doctrine", which is very well established at this point, means that storing any copyrightable work in system memory for any amount of time, even if it's just long enough to extract a non-copyrightable fact and then throw the surrounding copyrightable content away, meets the legal definition of a copy and is thus potentially infringing. This effectively allows rightsholders to instantly double any infringement claim, and attorneys actually have tried to posit that each instance of a copyrighted work getting loaded into memory for e.g. network transmission to another source should count as an independent act of infringement.
The 9th Circuit ruled that memory copies meet the Copyright Act's definition for fixed, non-transitory copies in MAI v. PEAK (1993) and that has been (mis)applied to most digital copyright controversies since. This is the effective equivalent of stating that the reflection of an image from the human retina onto the optic nerve is a separate potentially-infringing copy.
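To see why the RAM copy argument bites scrapers in practice, here is a minimal sketch using Python's standard `html.parser`. The markup, the `price` class name, and the listing are made up for illustration. The point is only that the whole page has to sit in memory as a string before the single fact can be pulled out of it, which is the transient copy the doctrine is concerned with.

```python
from html.parser import HTMLParser


class PriceExtractor(HTMLParser):
    """Collect text inside elements whose class attribute is exactly "price".

    Simplification: any closing tag ends the capture, so nested markup
    inside a price element is not handled.
    """

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


# Hypothetical page: the surrounding markup is the "superstructure",
# the price is the bare fact the scraper actually wants.
page = '<div class="ad"><h2>Lathe</h2><span class="price">$4,200</span></div>'
parser = PriceExtractor()
parser.feed(page)        # the full page is held in memory here
facts = parser.prices    # only the extracted fact is kept afterwards
```

Even if `page` is discarded immediately afterwards, it existed in memory in full while `feed` ran.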
Scraping is most certainly still a legally dubious business. You have to take the Google route and get too big to fail before you'll start to see inconsistent outcomes like Perfect 10 v. Amazon, where the judges effectively gave Google a pass because they were scared of the widespread ramifications of shutting down Image Search, despite the clear legal reasoning that would've mandated this.
In this case the copyright was 'This is our website. We don't have anything specific about our data.' So no specified claims to sales or specific claims to sales data puts everything in the public domain as long as the scraper uses the data for 'informational purposes' and does not make a copyright claim to the data itself.
FYI: a more specific definition for crayon is 'a writing device where the writing material wears off to leave a meaningful mark.' Examples are: crayons (duh), pencils, charcoal sticks, etc. Since pens and markers have ink or pigment reserves and do not wear down they are not considered crayons.
However, you could copyright an article describing the merits of a specific type of writing instrument, regardless of who owns the trademark.
Perhaps as you mention it's necessary to publish more explicit notices alongside content. But perhaps not. This ruling seems to relate to data, possibly distinct from creative works.
It’s probably a lot like the piracy problem — Don’t spend a lot of time fighting it because those people don’t pay, but secure the legal rights to your works well enough to fight anyone with more sinister plans than making just a copy or two.
Is the number 1 copyrightable? No, because it exists and was "discovered".
Is the number 2 copyrightable? No, because it exists and was "discovered".
Is it reasonable to assume therefore that this very large number X, that represents this disputed file, cannot in fact be copyrighted because it already existed and was in fact just "discovered"?
If it's argued that there was some effort to "discover" this number, then I'll write a generator that produces each number between -1 million and +1 million and claim "copyright" over each of them. After all, regardless of the process for arriving at a particular mp4 of a movie (people just dancing round on a set, running the output through various software, etc.), the final output is simply a number. Just because someone went to a lot of effort shouldn't prevent me from going through some effort to write files containing each number between -1 million and +1 million and charging royalties for anyone to use them.
In fact, this raises the interesting point that anything that can be represented as a number already fundamentally exists in the range 0 to +infinity. It's just down to us to discover them. Think about that for a moment: Somewhere out there in the range 0-infinity is a good Star Wars Episode 7 just waiting to be discovered.
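The "a file is just a number" premise is easy to demonstrate. A minimal sketch, using a three-byte stand-in for the disputed file:

```python
# Any byte string can be read as one (possibly very large) integer.
data = b"Hi!"  # stand-in for the contents of a file
number = int.from_bytes(data, byteorder="big")

# Going back needs the original length, because leading zero bytes
# do not change the integer's value.
restored = number.to_bytes(len(data), byteorder="big")
```

Here `number` is 4745505 and `restored` is the original bytes again. A real movie file works the same way, just with a number millions of digits long.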
We could therefore write algorithms that search all the numbers in some search space (e.g. whatever 0-3mb is in decimal), scan them to see if they're valid files, e.g. mp3s and then run them through AI to see how they compare to known music, etc.
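A toy sketch of that search, under heavy assumptions: the "is it a valid file" test here is only a check for the 11-bit MPEG audio frame sync at the start of the candidate, which is nowhere near enough to make something a playable mp3, and the search space is limited to two bytes so that it finishes. The function names are made up for illustration.

```python
def looks_like_mp3_frame(candidate: bytes) -> bool:
    """Check only for the MPEG audio frame sync: the first 11 bits all set."""
    return (
        len(candidate) >= 2
        and candidate[0] == 0xFF
        and (candidate[1] & 0xE0) == 0xE0
    )


def search(width_bytes: int):
    """Yield every integer of the given byte width that passes the check."""
    for n in range(256 ** width_bytes):
        if looks_like_mp3_frame(n.to_bytes(width_bytes, "big")):
            yield n


# Exhaustively search all 65,536 two-byte "files".
hits = list(search(2))
```

The catch is the size of the space: it grows as 256 to the power of the file length, so a 3 MB file means on the order of 256**3,000,000 candidates, which no amount of hardware will enumerate.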
Thus, a new tech "Big Random" is born... I thank you :-)
If you can randomly generate works that have market value (copyright is intended to encourage creation of works with value, not merely a sequence of random words or bits), then we can talk about whether they're individually copyrightable (maybe the generator itself is the only copyrightable work in the picture, I dunno).
But, no one is going to take an argument seriously that because the number 1 (or the letter "a") cannot be copyrighted then no sequence of numbers or letters can be copyrighted.
write_to_file("%s.bin" % i, i)
Of course the effort is in discarding the dross.
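For completeness, a runnable sketch of the generator that the one-liner above implies. `write_to_file` is not a standard function, so it is defined here as an assumption (it writes the number as decimal text), and the range is cut down to -3..3 in a temporary directory rather than the two million files of the thought experiment.

```python
import os
import tempfile


def write_to_file(path: str, value: int) -> None:
    """Hypothetical helper: write the decimal text of value to path."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(str(value))


def generate_numbers(directory: str, low: int, high: int) -> int:
    """Write one file per integer in [low, high] and return the count."""
    count = 0
    for i in range(low, high + 1):
        write_to_file(os.path.join(directory, "%s.bin" % i), i)
        count += 1
    return count


# Small demonstration range; the files vanish when the block exits.
with tempfile.TemporaryDirectory() as tmp:
    written = generate_numbers(tmp, -3, 3)
    names = sorted(os.listdir(tmp))
```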
How are we to determine "market value"? And let's not forget, the copyright crowd don't want to claim rights over just 1 specific number, but any number that happens to render (when run through a movie player) anything that resembles e.g. Star Wars. I don't think this argument is as absurd as it first appears...
It makes it more absurd, as it indicates clearly that it's not a random sequence of numbers that is being protected by copyright, but a specific, recognizable, creative work. If I watch an mpeg of Star Wars and then watch a laser disc version of Star Wars, I will recognize them as the same work (well, ignoring Lucas' retconning nonsense and CGI shitshow). Most humans would, including most judges and juries in a copyright case.
"The copyright crowd" includes the leadership (and presumably the populace) of nearly every developed nation on earth, so you're arguing against a pretty solid majority (which is fine, I have some unpopular ideas that I hold pretty strongly...the majority seems to love and support war and killing more than I ever will).
Anyway, I'm all for reasonable copyright terms (and the US has absurd and abusive copyright law that punishes and inhibits new creators at the behest of old corporations and billionaires), but it's just not a sound basis to argue that because one can randomly generate any creative work given infinite monkeys and infinite time that no creative work should be able to be copyrighted. It says that there is no difference between a creative work, like Star Wars, and a random series of bits, because Star Wars can be recorded as a (non-random) series of bits.