This is a bizarre take. The most substantial point is buried near the end of the article: Genius does not own the copyright to the lyrics. Yes, they may have taken billions of painstaking person-years to hand-transcribe them onto artisanal silks, but no matter how much effort you put into copying someone else's work, it doesn't make it your work, and you can't sue people over it. At best, you could claim your copy is a derivative work, but that only grants you protection for your additional creative contributions on top of the original work, which for a straight transcription is... well, nothing.
Genius knows this, which is why they didn't file a copyright suit. Instead, they claimed other things like unfair competition and breach of contract. However, Title 17 Section 301 of the US Code says that "all legal or equitable rights that are equivalent to any of the exclusive rights within the general scope of copyright [...] are governed exclusively by this title". To avoid this, Genius needed to prove that their claims weren't "equivalent" – ie weren't just copyright claims dressed up as something else. They failed to do this, and so their case was thrown out.
You seem focused on whether this case was the legally correct decision, which it sure seems to be. This article, like many readers, is more focused on whether this was a fair result. Nothing bizarre about that.
The judge may have done the correct thing, but readers may feel that Congress didn't. This case will doubtless be used in the future to argue for sui generis database rights like the EU has.
(My view is that in principle, some form of sui generis database rights makes sense, but for the things that US copyright law already covers it is currently far, FAR too restrictive and lasts too long, so I would vehemently oppose expansion of existing US copyright law to cover sui generis database rights.
However, if US copyright law were reformed such that it mandated blanket licensing (see [the EFF proposal]), strengthened fair use protections, and shortened copyright duration, then I would totally support similar rights for sui generis databases.)
Laws are very much subjective. They're results of their time and of the people in charge of voting on them. That's why the US still has people in jail for life for nonviolent weed-related crimes while you can now legally buy and consume weed in half of the country.
Laws are constantly changing for a reason: "fairness" is neither set in stone nor objective.
Are you American by any chance? I've seen a lot of Americans talking like that about "law", as if it were God-given, immutable, objective and fair.
But, in that sense, fairness is also subjective. There are viewpoints that call what Google did unfair and viewpoints that call it fair. What overarching philosophy do we use to decide which is the right viewpoint?
Maybe I'm being pedantic, but laws themselves are objective. At least, they are crafted to be an objective description of what is "right" and "wrong". Said another way, laws exist to objectify the morals of the law's author.
I think what you are trying to say is that laws are not self-justifying. Laws draw an objective view of the world, but we can subjectively agree that we don't like that view and then change it.
In a common-law system the written (legislated) law is not actually the law, it's the court's interpretation of how it should be applied, taking into account similarities to other laws and the precedents set in applying them, that determines what it actually means. That makes it quite subjective and vulnerable to initial precedents in applying it.
Why is this downvoted? Without getting into a debate about what true objectivity is and whether or not it exists: in the US, the majority of laws are objective, and it's the application of those laws that is subjective.
This could be an issue of articles: Laws are objective; the law is subjective.
Laws consist of literal syntax, but that does not make a Law "objective". Wittgenstein made an issue of the fallibility of language. English is not a special case (just the applicable one).
To wit, a Law does not always describe discrete actions and consequences.
To say a Law is Objective is not saying anything, because the Law is necessarily interpreted in any frame of reference. A single Law is necessarily subjective and following this, The Law in aggregate is subjective.
There is no end to this debate. The buck stops at the law. As of today this is fair and objective. Without this we have no framework to judge right or wrong. You can handwave around this but this is the best jurisprudence has come up with over the years. Do you have a better system than pitchforks on HN?
The law is not "fair and objective". It is a representation of the morals of the particular group holding power at the time of its writing.
It will never be "fair" because it only very rarely takes all of the circumstances into account - as Anatole France put it: "In its majestic equality, the law forbids rich and poor alike to sleep under bridges, beg in the streets and steal loaves of bread."
Neither is it particularly objective - its application, especially in a common-law system, is extremely subjective. (You're bound by whatever the judges in previous cases thought the law was - e.g. in the US "stare decisis" is enshrined pretty much as an immutable rule.)
It has never been a framework of "right or wrong", either. It is a framework on how disputes get settled. There is no interest in "right" beyond "the authors of the law at the time thought it was a good idea"
To take a trivial example: Slavery was recognized as morally wrong long before the law actually made it something that was not allowed. The law still allows carceral slavery, even though there is growing consensus that that's morally wrong.
As for "better" systems - there's certainly a large faction of countries making the case a civil law system is better than a common law system. But you don't need a "better system" - the law can only be meaningful if we accept that at all times, it will be flawed, it is not "fair and objective", it needs to be tempered with compassion, and it is our job to improve upon what we have.
You can't just say "there is no end" and follow it with "my end is the right end".
At least talk about why you think deontology is better than other forms of norm setting, why the law must be right even though there are many different and contradictory laws in different places, and how you propose to make new rules if not based on a concept of "right" that is independent from the existing rules.
Why do you think that law is necessarily deontological? There are many areas of the law in the US that tilt towards consequentialism, such as antitrust law (does it harm the consumer?)
That certainly IS a consistent position one could take. Most people feel that there are certain moral imperatives that exist independent of government-defined laws and that government laws can be fair or unfair. However, the position you take here -- that "fair" means whatever the government says -- is a possible basis for ethics.
But I, and most people, find it to be deficient. It requires that you admit that (for one example) slavery in the USA before 1863 was fair which I (and most people) disagree with. I'm sure you can find other examples if you wish.
Are you sure this is what you want to base your definition of "fair" on?
In a democracy the government is inherently an extension of the people, so when you frame "fair" as meaning whatever the government defines it as, you are at the same time acknowledging that the majority have already decided the matter to be fair. Whether or not those people are willing to admit to the world or themselves that they believe the law to be moral, it is from their consent as a mass that the law was able to manifest. So at the very least the law is what most people think is morally fair.
The US is not a democracy by definition, and even if you want to substitute in “republic” here, the description of how the law-making sausage factory works here is idealistic at best.
Laws decide nothing. They are just a bunch of rules made up by those in power. Way too many of them are the direct result of lobbying by billion dollar industries. Copyright is an example. People vote for and elect representatives and they end up serving these elite groups instead. How is that fair?
If you believe any given law is unfair, it is your duty to disobey. This is called civil disobedience.
It seems I respect working code more than you do. I wouldn't say laws are broken.
Improving legacy code always means understanding the business logic and, if possible, the motivation behind it. Then you build a replacement and you can justify why it is better.
Plenty of laws are broken. Drug control laws, white collar crime laws, whistleblower protections, copyright, IP, patent, child support & alimony, parent-child access / visitation, execution or the lack of it, gun control or the lack of it, fucking mandatory sentencing.
Shall I go on?
"laws aren't broken" is the kind of thing someone says when they've never seen "justice" in action, or are walking around with their eyes shut.
I think your parent was just trying to say "laws are important".
"Legacy" code that only "occasionally works" doesn't sound like something worth keeping around. It sounds like something to dump in the trash and never look back.
If you have working legacy code though, even if it occasionally fails because it's full of bugs and fundamental flaws, it should be replaced very carefully, and only with in-depth understanding of what is working and what is broken about the existing system.
Based on the broken laws you list, I assume you're talking about America. American laws are indeed very broken, but dumping them in the trash and never looking back would be a one-way ticket to a bloodbath. It is both very important to fix our very broken laws, and important to do so very carefully and only with an in-depth understanding of what is working and what is broken about the existing system.
If I were to give you as much benefit of the doubt as you give your parent, I'd say that "occasionally laws work" is the kind of thing someone says when they've never seen how much worse it can get.
Law also says that erratic criminals can be restrained by kneeling on them, one of the reasons which hastened George Floyd's death. I don't think law is always right.
Emotional appeals shouldn’t hold any weight when considering the law. You’ll end up creating highly biased or illogical laws that cater to the culture of the times and need to be changed again shortly. Kneeling may cause the death of some but save the lives of countless others. “The law isn’t always right” isn’t an objective statement because “right” is relative to the observer.
The last sentence without the context of the preceding sentences would seem that way, yes. However in the context of “laws created as a response to emotional appeals” it becomes a warning to remember that everyone has feelings and everyone’s feelings are different when creating or considering the law.
> Yes, they may have taken billions of painstaking person-years to hand-transcribe them onto artisanal silks, but no matter how much effort you put into copying someone else's work, it doesn't make it your work, and you can't sue people over it.
Not really all true though. Genius started out by stealing lyrics from other sites. In the early days many of the lyrics had the exact same errors as other, more established sites. That may have changed since.
Aren't collections of facts copyrightable? So Google has copyright over Google Maps and I cannot copy that, but I can go out and record exactly the same data, since I collected it myself.
No, in the US you cannot copyright facts, only expression. So you have control over word-for-word copies of your article about a bird; but you have no control over dissemination of the facts you discovered about the bird. SCOTUS decided this in 1991, Feist v. Rural: https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._R....
It could be a system, oracle, or network, which computes weights or votes, just as soon as it could be an individual or group casting ballots or deliberating. The deciding is key, perhaps not who decides. Some aspects of law are slippery like that though, where one set of rules applies to human agents, and a different, perhaps mutually exclusive set, apply to everything else.
For example, horse drawn carriages have the right of way even over pedestrians, because you’re pulling weight, or more accurately, a beast of burden is pulling the weight. It’s a living thing too, and it can’t stop on a dime when it’s got a load. It makes sense when you know the context and framing for why the law is so written today.
I’m sure there are similar examples in other contexts. Court cases and judges look at the law like we do whitepapers. Some docs are better than others, and there are some devs, and other judges, whose toes you’d be hesitant to tread on, especially if you have a habit of doing that kind of thing.
Right. "Expression" isn't restricted to specific wording, it can potentially mean specific choices of what is included and what is excluded in a compilation; I could hold a copyright to such a compilation even if I don't hold the copyright to any of the items in the compilation.
For example, many artists publish playlists on Spotify to build their brand; it could plausibly be copyright infringement for one of them to verbatim copy a playlist I made and publish it as their own, even if I don't hold the copyright to any of the songs on my playlist, and even if one of the songs on the playlist was actually that artist's own song. The act of assembling a playlist is potentially copyrightable expression.
Compilations of facts are copyrightable in the US, but they can't be just raw collections - there has to be a choice made what to include.
> The Act also provides copyright protection to compilations, but only to the extent that there has been a contribution of originality in assembling that compilation.
Map copyright is based on the idea there are decisions made around what to include and how to display it.
You can't photocopy a map and claim copyright. However, a human can trace the same map and claim copyright.
Regarding compilations of facts, the general doctrine is that copyright protects the semi-arbitrary choices of what to include in that compilation (e.g. judgement of relevance: which words to put in a dictionary, what detail to include or exclude in a map) and disallows copying that compilation. It explicitly does not protect the "work and sweat" required to gather that data, and it allows people to copy particular facts out of that compilation, for example if they are making their own selection with different criteria, as the underlying facts are not protected no matter how much effort it took to obtain them.
In this regard, copying lyrics of some particular song does not violate the rights of Genius - they don't have copyright to that particular song and the compilation-of-facts rights don't apply for that particular single item.
I would guess maps is a different case because those are their own works. There are decisions on design being made, how to show overlays. But that’s just my assumption.
Why does it matter whether you're transcribing lyrics or transcribing geography? In both cases you're effectively just writing down something that exists already.
Exactly. This should definitely be grounds for allowing copying Google Maps data as long as you render it in your own style. The usual trick here is trap data, where map makers insert fake data to catch copies, but that's exactly what happened here with the ' in the lyrics.
If the actual data isn't copyrightable, yet the 'fake/trap data' is, then presumably when one copies the map, a court will decide that the magnitude of the copying is very small - only a single apostrophe was copied - and therefore the damages negligible.
If fake data is hidden amongst real data, wouldn't there also be the argument that the copier was unaware that they were copying a creative work rather than pure facts?
Fred Saberhagen made this a plot point in one of his Berserker stories. Going by my highly suspect memory...
A damaged Berserker captures an atlas showing an occupied system nearby, and heads there with its last reserves of power to destroy the system.
The human who didn't stop the Berserker is charged with a crime against sentience, but is acquitted when he reveals the secret: The occupied system was a fake, in the tradition of cartographers going back to the Middle Ages on Sol.
Because you are allowed to (I presume) copy the raw data (gps coordinates of roads, buildings, etc.) But you don’t have access to it, instead you only see their map design.
The land is there. Depicting it is the result of Google's work. You went over, or used a satellite photo with the appropriate licensing and YOU painted (created) the map.
Lyrics are not just found in the wild (like a mountain or a street is). Someone thought of them and wrote them down, so they were that person's creation. It is like listening to me reciting a poem, writing it down, and selling the book.
I don't know about copyrights for maps, but your argument is not good: You can't copyright a flower, but take a picture of that flower and the rights to the picture belong to you. Google is not preventing others from collecting mapping data, but they prevent them from copying the data Google has collected.
> Genius does not own the copyright to the lyrics. Yes, they may have taken billions of painstaking person-years to hand-transcribe them onto artisanal silks, but no matter how much effort you put into copying someone else's work, it doesn't make it your work, and you can't sue people over it.
Apparently they license the lyrics now:
> Genius isn’t the copyright holder for these lyrics, it just licenses them itself.
It's not a case of someone copying without permission and then suing another person who copied them. It's a valid licensee suing someone who is copying them.
Imagine if a McDonald's franchisee sued someone running a rogue/unlicensed McDonald's around the corner. Would we have no sympathy for them also?
Legally speaking, it appears the right to sue requires at least some exclusive copyright rights, [1] which Genius surely didn't have (and a McDonald's franchisee also would not have). This is presumably why they didn't bring a copyright suit.
Google licenses the lyrics through a third party, LyricFind, which in turn hired people to transcribe them. In a small percentage of cases, those transcribers did not do their own work and instead copied lyrics from Genius.
I'm not sure what your background is, but your analysis sounds quite strongly opinionated.
There are examples in law where 'work' can be protected; just because you don't have the copyright doesn't mean that someone else is simply allowed to use the results of your work.
Apparently in this specific case it's not protected.
If I was going to scrape this data and re-purpose it, I would've absolutely cleaned up those apostrophes. The pivoting between straight and curly would certainly be a pet peeve. Unless there's a semantic difference between the two I'm unaware of.
The song was released almost a year before UTF-8 was first presented at USENIX, so I think it is reasonable for the lyrics to be expressed using the technology commonly available at the time.
I queried for this song's lyrics, and I found only a single quoting symbol (') used throughout.
Taking a layperson's reading of the lyrics, I didn't find anything off due to quoting issues. If there was any disruption, it was minimal and unnoticed.
Yeah, makes sense, but this is still a pretty good approach. Inserting invisible or unusual Unicode symbols would prompt the scraper to carefully clean up the files it reads (maybe even fixing these apostrophes as a result). Unusual whitespace is also likely to be removed and cleaned up.
On the other hand, these alternating apostrophes have a chance to stay unnoticed (or neglected), falling through the cracks.
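To make the scheme concrete, here is a rough Python sketch of how a bit pattern could be hidden in the choice of apostrophe and read back out later. The mapping (straight = 0, curly = 1) and the names `embed` and `extract` are my own assumptions for illustration, not anything Genius has published.

```python
STRAIGHT = "'"       # U+0027 APOSTROPHE
CURLY = "\u2019"     # U+2019 RIGHT SINGLE QUOTATION MARK


def embed(text: str, bits: str) -> str:
    """Rewrite the first len(bits) apostrophes: '0' -> straight, '1' -> curly.

    Apostrophes after the pattern is exhausted are left untouched.
    """
    remaining = iter(bits)
    out = []
    for ch in text:
        if ch in (STRAIGHT, CURLY):
            bit = next(remaining, None)
            if bit is not None:
                ch = CURLY if bit == "1" else STRAIGHT
        out.append(ch)
    return "".join(out)


def extract(text: str) -> str:
    """Read the apostrophe styles back as a string of '0' and '1'."""
    return "".join(
        "1" if ch == CURLY else "0"
        for ch in text
        if ch in (STRAIGHT, CURLY)
    )


lyrics = "I'm sure you're fine, don't worry, it's ok"
marked = embed(lyrics, "1011")
print(extract(marked))
```

The marked text renders almost identically to the original, which is why a scraper that only spot-checks the output by eye would be unlikely to notice it.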
There is a semantic difference between the two. The straight quote is a superset of the curly one.
So "rock 'n' roll" is correct. And "rock ’n’ roll" is correct. But "rock ‘n’ roll" is not correct, since the wrong apostrophe is used. We're not quoting the letter n, we're showing that the letters a and d were removed.
On this quantity of data, you wouldn't be able to do this manually.
If you hope to avoid being caught this way, I'm going to assume you noticed this without the benefit of hindsight and plan to correct all out-of-place Unicode characters automatically. How will you avoid over-correcting?
There's also no reason to believe this is the only fingerprinting Genius has done (they only need to publish the most obvious fail). For example, I can use the same fingerprinting technique but switch between American and British spellings.
I think if you had multiple corpora of lyrics, you could cross-check for anomalies of any variety (odd quoting, switching between American and British English, etc.).
The fingerprinting isn't likely applied to every song, to prevent obvious detection. If you went through multiple databases, you might see N prevailing copies of a song's lyrics, and 1 that seemed different. The one that's different has the anomaly.
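Roughly what I have in mind, as a minimal Python sketch. The site names, the `normalize` rules and the `find_outliers` helper are all hypothetical; it only catches differences that vanish after normalization (quote style, whitespace), so a spelling-based fingerprint would need a smarter comparison.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Fold curly single quotes to straight ones and collapse whitespace."""
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    return " ".join(text.split())


def find_outliers(copies: dict) -> list:
    """Return the sources whose raw text differs from the most common raw
    text even though it is identical once normalized.

    Those are the copies most likely to carry a character-level fingerprint.
    """
    majority_raw, _ = Counter(copies.values()).most_common(1)[0]
    target = normalize(majority_raw)
    return sorted(
        name
        for name, raw in copies.items()
        if raw != majority_raw and normalize(raw) == target
    )


copies = {
    "site_a": "Don't stop believing",
    "site_b": "Don't stop believing",
    "site_c": "Don\u2019t stop believing",  # curly apostrophe
}
print(find_outliers(copies))
```

With N prevailing copies and one odd one, the odd one is what you would inspect by hand before deciding which form to keep.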
I'm not disputing that they proved their point, but this is triggering one of my pet peeves about common misunderstandings of Morse code.
Timing is critical in Morse code. You can't just write out a bunch of dashes and dots to transcribe it without clearly transcribing the rests between dots and dashes as well. They haven't given us the rests at all, so all the info they end up having is:
And that can be interpreted in any number of different possible ways besides "REDHANDED". E.g. it could also be "AU5EWRFE", or any of thousands of different interpretations (actually probably a lot more than that; this would be a fun programming problem). They should have used a binary encoding; 22 bits (all they have given us) is not enough information to uniquely encode the string "REDHANDED". Once you include the short rests that are needed, we're talking 44 binary bits or 22 ternary bits. And if you want the long rests to distinguish properly the spaces between words, then 22 ternary bits won't do it; you need the full 44 binary bits.
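Since I called it a fun programming problem, here is a minimal Python sketch of it, assuming the standard International Morse table for A-Z and 0-9 only (the names `encode` and `count_decodings` are just mine). It checks that the two strings above collapse to the same 22 symbols once the rests are dropped, and counts how many letter/digit strings share a given gapless sequence using a small dynamic program.

```python
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}
CODES = set(MORSE.values())
MAX_LEN = max(len(code) for code in CODES)


def encode(text: str) -> str:
    """Concatenate the Morse symbols for text with no rests between letters."""
    return "".join(MORSE[ch] for ch in text)


def count_decodings(signal: str) -> int:
    """Count the letter/digit strings whose gapless encoding equals signal.

    ways[i] is the number of ways to decode the first i symbols.
    """
    ways = [0] * (len(signal) + 1)
    ways[0] = 1
    for end in range(1, len(signal) + 1):
        for length in range(1, MAX_LEN + 1):
            start = end - length
            if start >= 0 and signal[start:end] in CODES:
                ways[end] += ways[start]
    return ways[-1]


signal = encode("REDHANDED")
print(signal, len(signal))
print(signal == encode("AU5EWRFE"))
print(count_decodings(signal))
```

The count only covers the 36 characters in the table; adding punctuation codes would make it larger still.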
>They should have used a binary encoding; 22 bits (all they have given us) is not enough information to uniquely encode the string "REDHANDED".
The fact that the sequence can be interpreted as REDHANDED with a particular way of grouping the input is just being cute. Regardless of the grouping, it is a binary encoding of a 22-bit number, and so would have a one-in-2^22 chance of being reproduced at random.
Edit: To clarify: You're saying they should've mentioned 22-bits in the context of binary digits without mentioning Morse code, and if they did want to bring up Morse code they should've used trits or more bits to encode the stops. I'm saying that the fact that their 22-bit sequence can be interpreted in Morse code as a relevant word is just dressing, and does not detract from the point that the sequence was likely copied. Put another way, if someone tried to counter by saying their sequence could've been generated independently because "AU5EWRFE" and many other strings also encode to the same sequence, it would not affect the facts at all.
There is a misunderstanding because anyone who properly understood Morse code would never represent it in this manner where it's indistinguishable from a large number of other possible strings.
There is no misunderstanding because it was never the intention to use the bitstring to represent a unique word in Morse code in the first place. Instead, Morse code was merely used as a convenient method to devise a bitstring from a relevant word.
And they definitely do it with maps. There is a tiny little village I visit in rural Roscommon each year. Each year a new major retailer appears to have opened in this 500 population village, well according to Google Maps that is. At the moment there is a branch of New Look situated on a farm down a single track country lane.
I recall coming across this in my travels. There was a named "town" at the intersection of two streets - upon passing through there, nothing. Later wondering where the town went I found that it was not ever there and was just present to identify people copying that map.
(Software patents should be abolished. I just like to point out their absurdity and how it's easy to independently develop a technique (steganography in a search engine result) that someone has already grubbed a "patent" on.)
> while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. _Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims._
If I'm reading this correctly, this patent is claiming things that are "apparent" (obvious?) to those "in the know". Computer or not, how did this get granted?
That is just so bizarre to me too. That one would be able to copy the exact core of the idea, but write it in a different language, and suddenly it's not infringing, even though it causes the computer to do the same thing.
How these patents even pass the sniff test, I'll never understand.
I posted about this the other day and someone who used to work at Microsoft made a really interesting/helpful response - worth checking out: https://news.ycombinator.com/item?id=24112418
I am not following your line of reasoning correctly.
The watermarking is done using characters that look different from each other. Printing the book will not change those characters, and therefore it is still possible to extract the original information. Harder, but not impossible.
For this information to be lost, Evince would have to print the book in a crappy font that uses a single symbol for both. Which is possible, but I think it's not what you were going for.
The hash won't be the same, which is what matters the most.
They won't have people manually reviewing the punctuation to catch infringement, and it's highly probable that Evince printing to PDF will mess up the whole internal structure, so it's hard to automate it.
If you just print it then that defeats the point. They don't care about individual copies, they want to find who is stripping DRM and posting books on torrent trackers.
How likely are you to notice if your resulting PDF had a specific unique pattern of zero-width spaces embedded in it? Or if a few curly quotes aren't curly? Let alone scenarios like the cover JPEG having steganographic info in it...
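You wouldn't spot them by eye, but they are easy to find if you go looking. A minimal Python sketch, assuming you already have the extracted text as a string; the particular set of code points in `ZERO_WIDTH` is my own pick and certainly not exhaustive.

```python
# Invisible code points that are sometimes used as text watermarks.
ZERO_WIDTH = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (BOM)
}


def find_zero_width(text: str) -> list:
    """Return (index, code point) pairs for every invisible character found."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in ZERO_WIDTH
    ]


def strip_zero_width(text: str) -> str:
    """Remove the invisible characters listed in ZERO_WIDTH."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


sample = "chap\u200bter one\u200d"
print(find_zero_width(sample))
print(strip_zero_width(sample))
```

This does nothing about the other channels mentioned above, such as quote style or data hidden in the cover image.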
Absolutely. It usually tends to be either more visible, or less visible though - some pdf files have a literal watermark on the pages, while other formats like epub contain a guid or other watermark content in the source (epub is zipped xhtml)
Amusingly, I believe if you use calibre's ebook conversion to replace stylesheets and add toc, it may also actually remove those markers that have no actual content and only exist to provide a unique ID.
There's no "benefit", they were just looking for a nice unique way of watermarking textual content, to prove that what shows up on a Google search is indeed sourced from them and not from some other transcription of the lyrics.
> LyricFind. LyricFind is a Google licensing partner, and may be the source of the Genius content appearing in Google’s search results. LyricFind published an explanation on its web site Monday, saying, “Some time ago, Ben Gross from Genius notified LyricFind that they believed they were seeing Genius lyrics in LyricFind’s database. As a courtesy to Genius, our content team was instructed not to consult Genius as a source. Recently, Genius raised the issue again and provided a few examples. All of those examples were also available on many other lyric sites and services, raising the possibility that our team unknowingly sourced Genius lyrics from another location. As a result, LyricFind offered to remove any lyrics Genius felt had originated from them, even though we did not source them from Genius’ site. Genius declined to respond to that offer. Despite that, our team is currently investigating the content in our database and removing any lyrics that seem to have originated from Genius.”
Sounds like everyone and their mother is scraping stuff off Genius, not just Google; Genius went after Google specifically because they knew Google couldn't just disappear and had the financial means to pay compensation, unlike the thousands of crappy lyrics websites.
That said, it would have been just for Google to pay for access to Genius' particular, well-curated, "source" database of lyrics, especially given that they're basically stealing traffic.
But it sounds like the issue is that Google really wasn't using Genius's data directly. The problem is that Google is sourcing from "The Internet," and everybody and their grandmother is 'stealing' from Genius.
Here's an interesting question: if Genius closed up shop tomorrow, how long would it take Google to become the primary source of song lyrics online (by rebuilding Genius's dataset from general Internet harvesting)?
Last I checked, ignorance wasn't usually a defense but I'm not a lawyer. I just know not to pretend the Keurig I bought off the back of a truck was a good deal for everyone involved.
But physical analogies to IP fall apart quickly so I'm not going to encourage people to read into that too deeply
So there's two different concepts. One is original creative output, which is copyrightable. The other is information, which is not copyrightable.
If you find something verbatim identical in a bunch of different places, you've got a strong case that it's just information, because if it were original creative output it wouldn't show up identically in multiple places.
If it turns out everyone was plagiarizing a single source, but you were unaware and took down the offending content when asked, you won't have much in the way of legal liability.
If that flies, it's a great tactic. You can't just use the data from site A, so you build anonymous sites B, C and D who use the data from site A, and use the data from those sites instead. "We didn't source from A".
It's not a great tactic if the owner of site A decides to sue you and subpoenas the hosting providers for B, C and D. You can only get away with it as long as you aren't successful enough to draw the attention of anyone with money to burn on legal fees.
Google has a history of scraping content that they want; their business is built on the back of scraping other people's content. The story I read just recently of what happened to Celebrity Net Worth was an interesting one: Google asked for an API, CNW refused, and Google just scraped the content anyway. There was no lawsuit, but CNW put up fake content and sure enough, it made its way to Google.
It is all ironic given how aggressive Google is in blocking any attempts to scrape its content.
I’d say most of Genius’ visitors come from “song x lyrics” searches, so hiding those pages with robots.txt would ultimately make them lose almost all of their traffic.
To be fair to TC, if you disable JavaScript you get a pretty good experience - just the full article, legible. Not like those sites that require JS to load the text and/or images.
robots.txt is designed to keep garbage out of search results. It has absolutely no power to prevent a bot from doing anything. Also, if the site added a robots.txt they might as well shut down, because their entire userbase comes from people searching for lyrics on Google.
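To illustrate the "no power" part: the check happens entirely on the crawler's side, and only if the crawler bothers to run it. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt contents, the `ExampleBot` user agent, the example.com URLs and the `polite_can_fetch` helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that asks every crawler to stay out of /lyrics/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /lyrics/
"""


def polite_can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Answer the question a well-behaved crawler asks itself before fetching.

    Nothing on the server enforces the answer; a scraper that never calls
    this simply fetches the page anyway.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(polite_can_fetch("ExampleBot", "https://example.com/lyrics/some-song"))
print(polite_can_fetch("ExampleBot", "https://example.com/about"))
```

A site that actually wants to keep bots out has to do it server-side (rate limits, blocking, auth), which is a separate mechanism from robots.txt.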
The problem is that Google is stealing content and placing it on search so the user never goes to the source, By blocking it with robots they block themselves from google results AND Google may already keep scraping the content.
> The Unicorn tier is for large companies or companies that would like to have a reciprocal relationship with our foundation. If you need special guarantees, indemnities or require us to sign your contract for a data license, please select this tier. If you have another creative idea you would like to propose, please also select the unicorn tier.
> For any of these cases, please detail your request in the company information field and we will work with you to fit your company's mythical situation. We will also find an appropriate monthly support amount to our non-profit foundation of $1500 or more per month. Please always consider enabling the growth of our non-profit foundation and the continuous growth of our metadata!
We live in a society of laws. Even soldiers. Google have shown they have no respect for the law nor for equality before it, and will cheat while using the law as a cudgel. Recall that law exists so that the strongest might not always get their way. "Ironic" is the polite way of pointing this out.
Without law, Google cease to exist immediately. They are incapable of enforcing property rights without it.
Pardons aside, soldiers go to jail for taking an attitude like Google's.
Just like Genius, Google licensed the lyrics. If they didn't, the publishers definitely would have sued.
Ironically, it is Genius that seems to have no respect for copyright law. Genius ended up having to settle a case years ago because they were using lyrics without the appropriate licensing [1].
Which law did Google break? Scraping in and of itself isn't illegal last time I checked, and the USA doesn't have database copyrights, unlike some jurisdictions.
> somewhat, per things like the Americans with Disabilities Act
This is just not right at all. There is nothing in the Americans with Disabilities Act that makes blocking scrapers illegal.
I think you mean you don't like the power imbalance of the large company taking away from smaller companies while using technological means to stop the same thing happening to them.
I don't like it either, but that doesn't magically make it illegal. I'm not even sure it should be.
> There is nothing in the Americans with Disabilities Act that makes blocking scrapers illegal.
Retrieving, processing, and displaying information in a manner contrary to the wishes of the provider of that information is necessary for accessibility to disabled users. As a specific example, any attempt to block use of wget for scraping also blocks use of wget as part of a `wget | filter | text-to-speech` pipeline[0], and is thus a discrimination against blind or otherwise visually impaired users. The ADA is, as mentioned, only somewhat effective in prohibiting such things, though.
> it's not illegal
> that doesn't magically make it illegal.
I don't think anyone is claiming that scraping itself actually is legally protected - I interpreted DigitalSea and harry8 as implying that it should be.
0: in either the shell sense or the workflow sense
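As a rough illustration of the `filter` stage in that `wget | filter | text-to-speech` pipeline, here is a minimal sketch using only Python's standard `html.parser`. The class and function names (`TextExtractor`, `html_to_text`) and the sample page are hypothetical; fetching the page and the actual speech synthesis are left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    """Reduce an HTML document to plain text that a speech engine could read."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.text()


# Hypothetical page, standing in for whatever wget would have downloaded.
sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Song title</h1><script>track()</script>"
    "<p>First line of the <b>verse</b></p></body></html>"
)
print(html_to_text(sample))
```

The point being made above is that a site which blocks the automated fetch at the front of such a pipeline breaks every stage after it, whatever the user's reason for running it.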
> Retrieving, processing, and displaying information in a manner contrary to the wishes of the provider of that information is necessary for accessibility to disabled users. As a specific example, any attempt to block use of wget for scraping also blocks use of wget as part of a `wget | filter | text-to-speech` pipeline[0], and is thus a discrimination against blind or otherwise visually impaired users. The ADA is, as mentioned, only somewhat effective in prohibiting such things, though.
This is not the case. Unfortunately (?) the ADA doesn't allow the disabled person to specify their own technology. If Google can reasonably say that text-to-speech works via a standard screen reader (which it does) then they are ok.
> The ADA is, as mentioned, only somewhat effective in prohibiting such things, though
Well that's not the intent of the ADA, so not really surprising.
I am not a lawyer, but my understanding is that only the copyright holder can sue for copyright infringement. I am pretty certain Genius does not hold the copyright to those lyrics. It's odd Genius brought this case at all. This is briefly noted at the end of the original article, but it seems like the whole point. Did I miss something?
From genius.com: "Genius Media Group, Inc. (GMG) is fully licensed to display lyrics across all of its properties. In 2013, GMG entered into licenses with every major music publisher: Sony/ATV Music Publishing, EMI Music Publishing, Universal Music Publishing Group, and Warner/Chappell Music. In addition, GMG developed a form license with the National Music Publishers' Association (NMPA) which today covers more than 96% of the independent publisher market."
Original copyright holder could give someone else authorisation to sue on their behalf, e.g., through an assignment. Doubtful Genius got an assignment in the agreements they have with publishers.
Also, Google claimed it is sub-licensed to re-publish through a third party, LyricFind, which has licenses with "over 4000" music publishers.
>Copyright holder could give someone else authorisation to sue on their behalf, e.g., through a license.
They can't assign the bare right to sue. To have standing the plaintiff will need to hold at least one of the exclusive rights in 17 U.S. Code § 106 aiui. Cf Righthaven cases, Silvers v Sony Pictures
I am almost certain you are correct, I would be shocked if genius was granted an exclusive right to lyrics. If nothing else, how could you sell the songs without rights to the lyrics?
So I don't think genius ever had standing to pursue this case.
Again, not a lawyer. But you'd think the actual lawyers would have checked this more carefully.
They are going to have a problem with standing for the exact reason you suggest. This case was one company who was scraping other people’s copyrighted works suing another company for doing the same.
Isn’t their main argument unfair competition? Google, the starting point of the internet, decided to undermine their business by taking their collated content and publishing it at the top of results?
Google appears to do this for other things, asking questions often shows answers without needing to visit the website. Perhaps these are all licensed and there is a kick back for these sites...
Google appear to be serving ads on content other people have collated while eliminating the source of traffic to the original site. If that isn’t unfair business practice and taking advantage of their monopoly on search, I don’t know what is.
But companies are absolutely allowed to do things that cause other companies to go out of business. It happens all the time. Think of the buggy whip manufacturers.
I agree that this Google practice looks dodgy to me. But the question is, what law specifically is being broken? This looks like a copyright case, and if that's the issue, then the copyright holder is generally the one who has to bring the case. (I believe there are exceptions, such as when exclusive rights are granted, but I haven't seen a justification that they apply here.) That's what the law requires; otherwise the courts would be even more swamped.
You very well can, but that doesn't mean Google can't block you (CFAA protection). Genius here was reliant on Google for a large portion of their regular traffic so they couldn't just block Google without suffering revenue losses.
> Which means they are only not OK with scraping when Google uses the scrapped for purpose they deemed reduced their traffic
Obviously. The way the profit model for the internet works right now, for sites to coexist with Google, they must actually receive some of the traffic that is generated from searches matching their content. Who would be okay with having all of their content scraped with the result being that they get none of the traffic and thus the monetary benefit from the work they do?
Because they'll block you. You can prevent Google from indexing your content using robots.txt (Google has a robots.txt on its site as well).
You don't have a right to access their service as many times as you want to, eg by automated means, although you can attempt it. Flip a coin on whether they sue to stop you if you become too annoying.
The Genius complaint is essentially that they want to be represented in Google search without having Google take lyrics from their service and use them in their own served-up content snippets (making a sizable part of the value of genius.com void). Genius knows Google can get lyrics elsewhere if they have to, the lawsuit is probably out of spite due to past conflict with Google and their annoyance at Google competing with them in a shady way (Google was de facto using Genius's service to reduce the value of Genius).
That's correct; however, the publishing company that administers an artist's royalties is generally the one to bring the suit. This is the same type of royalty as sheet music.
I feel like this is a forgotten bit of history but for years Genius didn't pay royalties for reproducing lyrics instead choosing to claim that their own reprinting of lyrics fell under "fair use" guidelines:
>"David Lowery, frontman and songwriter for Cracker and Camper van Beethoven, is waging war on the sites he believes make money off song lyrics but don't pay the songwriter. Once he took a closer look at where his music was making money on the Internet, he realized: There were more people searching to find lyrics to his songs than searching to illegally download mp3s of his music. And he wasn't making money off those searches. Last November, after months of exhaustive and systematic Googling, he released something called The Undesirable Lyric Website List.
>"The National Music Publishers Association seized upon this list, and announced that it would be sending take-down notices to every single name. At the top of that list was the very popular Rap Genius."
>"Rap Genius has been around for a few years, and it's extremely popular. No ads, lots of traffic and, just recently, a major investment from one of the hottest venture capital firms in Silicon Valley. The founder of Rap Genius, Ilan Zechory, says the site doesn't belong on Lowery's list. Because it's way more than just transcribed lyrics. He says the site is more like a social network: a discussion board for music geeks and even some of the musicians themselves — prominent rappers like Nas and Rick Ross — to comment on their own lyrics. Artists, the founders say, love the site."
>"Just this week, Rap Genius announced that, despite its opinion that the site falls under the criteria for fair use, it's going to pay songwriters for posting their lyrics. It's just easier than fighting with music publishers, who've been very successful at going after other lyric sites in the past few years. ..."[1]
Genius claims that Google’s actions caused a decline in traffic to its site. The lawsuit was probably a way to assuage nervous investors (who have poured >70M into the company)
I think it's important to point out that when you license lyrics, you don't actually get the lyrics. I know, sounds ridiculous. You'll get the license to display them, and when you ask the rightsholders of these lyrics (the publishers) for the actual lyrics they'll tell you "oh, we don't have the actual text, just the rights. You need to find the text somewhere else."
As a result, creating an accurate lyrics database like Genius has done is an enormous amount of work, and my non-lawyer gut-feeling says that in this case, Google is screwing over Genius big time. Too bad the legal system doesn't support that.
If Google can scrape my site, am I allowed to scrape Google results? Could I create a Google clone by scraping?
If I scraped the most common search results from Google, front page only, and removed all the ads what would Google's argument against that be?
On one hand, so many sites make finding information difficult, on the other it feels pretty scuzzy that Google prevents searchers from clicking through to the site that put the work into generating content.
"Scraping" was the wrong word. I was remembering incorrectly. Microsoft did not have to scrape. They were apparently using search data captured through Internet Explorer. There was a time before Google had control of the browser. This illustrates how companies with browsers can gather data about users' web activity and how far they can go. Today, even Firefox is gathering data about users' activity with "telemetry", and users typing things into the browser's search box are by default sending this data to Google.
The number of Google captchas that you needed to solve when searching on Google from a Microsoft office made me think that it was some kind of psychological warfare.
> Genius isn’t the copyright holder for these lyrics, it just licenses them itself.
Both Google and Genius are licensing the lyrics. Ironically, Genius ended up having to settle a case years ago because they were using lyrics without the appropriate licensing [1].
I think the legal argument services like serpapi make is that as long as you don't create a google account and/or accept google's terms then you are free to scrape and clone what is publicly accessible (at least in the US). I have no idea though.
The problem with the robots.txt "standard", e.g., ones like Google's with no "crawl-delay" directives, is that it does not define what a "robot" is. The query above is obviously not a "robot", but Google, with all its resources, still treats it as such.
Google probably does more (abusive) scraping than any other entity. Web scraping is in their DNA. It is in their web pages, too.
Yeah, the tricky thing for those scraped by google is that given google's search monopoly, the sites can't block their scraping entirely, since they need to be shown in search results.
I wonder if it's even possible to fix Google search in the framework of a for-profit company. It seems like the trajectory of any ad-supported service eventually lands it in a "don't let the user out no matter the cost" phase. Perhaps such a service really does need to operate as a non-profit foundation of some sorts.
There was a post about regulating Google like a public utility recently, but perhaps we should also consider looking at other less conventional internet "public utilities" - things like the Internet Archive, Wikipedia or essential open source projects like Debian. I think a search engine that's transparent both in terms of its logic and how it's maintained and managed might be the only way.
It's not Genius' content. It's the musicians', who actually wrote the lyrics. If it was their content, they could sue for copyright infringement. Google has been sued on those grounds, and forced to change how they do their business (e.g. provide ContentID to handle copyright infringement on YouTube).
But I suppose we could say "don't scrape and present content outside of a regular search result". But then again, Google claims they haven't done so - that they got those lyrics from LyricFind, a lyric licensing platform - and Genius didn't present any evidence that this wasn't the case. So I'm not clear on how any laws could help here.
Finally, the question was not about Genius, it was about allowing competitors to Google to emerge. I don't see how this would help.
What does a GPT-3 future look like? If I ask it to fill in lyrics for a song or facts about companies and this comes from the knowledge it gained by “reading” a vast corpus, how is this different from a person reading the Genius site, memorizing the song, and then transcribing the lyrics?
To me this is the endgame for all of these complaints about search not being ten blue links anymore. Future knowledge engines will be vast AIs that have assimilated information into internal self-organized structures, and will answer requests for that knowledge “in their own voice”.
Unless AIs are lifting content by overfitting and making exact replicas instead of expressing the same facts in an entirely new way, I don’t think people will be able to sue, especially when the process by which the answer is arrived at is a massive Rube Goldberg contraption with 100 billion parameters.
GPT-3 for example can already extract information from SEC EDGAR reports, a service other companies often charge money for.
Techcrunch article left too many questions for me so I looked up and found https://www.rollingstone.com/music/music-features/genius-law... which seems to be a much more detailed and nuanced explanation of what is going on here, with backgrounds on who, what, etc.
This was confusing to me as well. I saw EDNY and was confused about why it was booted due to federal preemption. Considering the importance of this procedural aspect, it seems like a pretty big deal to get this wrong.
The real story here is left completely untold: why didn't they bring a copyright claim? Could they bring a copyright claim in the future? Could one of the owners of the copyright bring a claim? These are the questions that matter.
Nothing, because then you are on google sites all the time and google gets all the ad dollars.
For some reasons, google wants to become AOoL, introducing the A with their AMP service (or with Applied Semantics).
This raises the question: Will content owners create their own content network? If Google steals your content on the internet, why put your content on the internet? Why not have an app that delivers content to paying customers? Right now each content provider tries this on his own with his own app. Why not combine the efforts and just offer a browser for their closed network, or embrace the Brave browser? If all content producers pull this off together, the audience will be there.
Facebook could offer a Facebook content network on their own because they already have the audience, and Genius and all those Recipe sites could publish their content in a secure way. Maybe Instagram with its text pictures is already the predecessor.
It seems like Google was taken over by Applied Semantics in the same way that Boeing was taken over by McDonnell Douglas because in the long run, nobody offers up his content for search if it is ripped off.
The crux of the legal decision here is that Genius doesn't own the copyright on the lyrics either. Hard to steal from Genius what they don't own in the first place.
Genius does give the tools to crowdsource lyrics, meanings, and comments. Google bought the rights to display lyrics but not the lyrics themselves. They just steal those from other companies and rehost them with their own ads. Copyright law was not designed to handle this, which is something that is obviously morally wrong and will eventually harm consumers. Google has stated: 'if you don't want to be crawled, use robots.txt', which, because they have 90% market share, is clearly impossible for Genius. It's downright evil.
Downright evil seems like a stretch, especially given that Google doesn’t seem to have gotten the lyrics from scraping Genius and rather bought them from a third party who themselves scraped them from Genius. Whether or not Genius blocks Googlebot in their robots.txt doesn’t actually seem to be relevant. But this does seem like a good case for the US to introduce “database rights” into copyright law to reward entities that assemble collections of otherwise-disparate information that they don’t own the copyright for, either because someone else does or because it’s something that cannot be copyrighted. Then Genius would have legitimate grounds to sue the third party that scraped their lyrics. On the other hand, had it existed, this would also have allowed Google to sue Bing back when it was first starting for piggybacking on Google’s search results. It’s not obvious to me that database rights are a good idea.
Didn't Google complain that Bing was copying their results a couple of years ago ?
Is it possible to have a middle ground ?
I can see both points of the argument :
- Genius does not own the lyrics, in most cases these are entered by users afaik. A similar example would be somebody adding an address/info on Google Maps.
- On the opposite end, associating a query like "that song written by blue haired 80s singer" to an actual result sounds more like a transformative work (although google owns user entered information as well here with the database of all the queries entered by users).
Would it be possible to have a framework where you can purchase such data at a fair price ?
I can't really understand why Genius cares. If Google didn't scrape from them they could scrape from one of dozens of other lyrics sites with pretty much the same results.
The value add of Genius is not the lyrics anyway. I never go to Genius to just look up lyrics because the site is fairly heavy. I use another site that is lighter and has a nicer lyrics format.
The only reason I go to Genius is for the real value add - the song annotations; and these are added by volunteers.
I rarely use the Google version of the lyrics either to be honest.
> Defendants made unauthorized reproductions of Plaintiff’s lyric transcriptions and profited off of those unauthorized reproductions, which is behavior that falls under federal copyright law.
Does this mean Genius still has grounds to sue, as the copyright protections are on their “work” which is the transcription?
It also seems to mean a ToS is not useful for protecting content, only determining legal users interaction with it (Excluding loading the page). A webpage is a work rendered through a browser, kinda makes sense I guess.
I think that’s their case to make, but from what I could find the answer seems to be probably not.
Derivative works are copyrightable (translations, adapting a book to a movie), but it might be hard to argue that an exact copy of the song lyrics is “derivative”, as it isn’t an original creation but simply a subset of the old work.
This all seems odd to me. Publishing song lyrics on the Internet seems like a small feature of a search engine and not an entire company with millions of dollars invested.
They license lyrics, however, and, critically, oftentimes lyrics and their copyright are not a package deal. Many songs don't have lyrics provided by the publisher. Genius allows users and artists to upload the lyrics. Google has the copyright but blatantly steals the content. This obviously is not protected in US law but it is elsewhere (see news sites and such in some countries requiring Google to pay). Google then uses their search results to leverage their position.
Just a thought experiment: If Google released a paid version of search where it won't show ads, won't personalize based on your click-through history, won't copy content from the source for presentation, and won't share your data with anyone, would you opt for it?
The root cause of most of the issues seems to be Google's monetization model, where they are optimising for people to stay within their ecosystem.
It's an interesting question. I don't trust Google, and a fast pivot to another product/offering wouldn't change that overnight, so I would lean towards no.
I paid for Youtube Red, even though I never signed into the service on Youtube and even though I already got all of the Youtube-specific benefit in the form of adblocking and NewPipe. I did that purely to try and signal to Google, "I will pay for content, I want you to have revenue sources outside advertising."
But ultimately Youtube Red seems to have been a failure, my signal hasn't changed the course of the company, and the parts of Youtube Red I did get value from (say, Music) have gotten noticeably worse over the years. I'm planning to drop my subscription in September. I think it would be tough for me to buy into another product like that from Google, my experience trying to buy products from Google to get around advertising/tracking has been both a practical and moral failure. Certainly I won't sign up for a paid GSuite account now.
But I do already pay for Email (Fastmail), bookmarking (Wallabag), and a few other "free" internet services and apps today. So I might pay for search if a company like DuckDuckGo offered it instead of Google. But I would need to see what their offer actually was.
I mean, heck, I'd be giving recurring donations to DuckDuckGo today if they accepted them, they're on my list of companies I want to exist. So paying 5$ a month for a "premium" search experience wouldn't really be that different, even if I never signed into it.
> I mean, heck, I'd be giving recurring donations to DuckDuckGo today if they accepted them, they're on my list of companies I want to exist. So paying 5$ a month for a "premium" search experience wouldn't really be that different, even if I never signed into it.
Different people have different reasons for disabling ads. Some people only care about egregious ads and privacy violations, that's fine.
I am against ads in general, I noticed a sharp quality of life increase when I started blocking ads universally everywhere I could, regardless of whether or not I was on the web. That's a longer conversation, I'm not going to get into it now. I respect that other people have different opinions, but I also feel reasonably strongly about my own.
I do spread DuckDuckGo to other people, but I also spread uBlock Origin, so I'm not sure I'm a net positive there either. :)
It's again, an interesting question. I've been pushing pretty hard in my personal life to financially support Open Source software that I use, projects that I really care about. DuckDuckGo is one of the few companies that I really like that I haven't ever really supported in any tangible way.
I do feel guilty about that, but not guilty enough to allow myself to be turned into a product. There is likely no company where I would ever feel guilty enough to turn on ads. But given that DuckDuckGo isn't going to allow donations any time soon, I could see a lightweight 'premium' DuckDuckGo product effectively being a way for me to just give them money without it feeling to them like it's a donation.
Actually, what I should do is to look into some of their merch and see what they offer and if they actually make a profit on it. I thought at one point DuckDuckGo sold shirts or something, but I don't know if they still do. Again though, this all kind of ends up being a messy proxy for donations. I'm not super-jazzed about walking around as a living billboard either, even for a company I like.
The difference in amounts between what Google makes per 1000 searches and what a profitable/sustainable search engine needs to make are quite vast.
Can't remember the source but G makes around $70 per 1000 searches. Bing makes less than half of this, DDG less again.
G commands higher rates because of higher competition for ads and also because of its retargeting and all the other methods that raise that per 1000 figure higher.
To answer your question (sort of), Google's non-ad version would need to be expensive to break even with its current model.
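A back-of-the-envelope version of that break-even claim, as a sketch only: the $70 per 1000 searches is the unsourced figure quoted above, and the 300 searches per month is a purely hypothetical usage level picked for illustration.

```python
# Unsourced figure from the comment above: revenue per 1000 searches, in USD.
REVENUE_PER_1000_SEARCHES_USD = 70.0

# Hypothetical usage level, assumed only for this illustration.
ASSUMED_SEARCHES_PER_MONTH = 300


def break_even_monthly_price(revenue_per_1000: float, searches_per_month: int) -> float:
    """Monthly fee needed to replace the ad revenue one user's searches generate."""
    return revenue_per_1000 * searches_per_month / 1000


price = break_even_monthly_price(
    REVENUE_PER_1000_SEARCHES_USD, ASSUMED_SEARCHES_PER_MONTH
)
print(f"Break-even price: ${price:.2f} per month")
```

Under those assumptions the subscription would have to cost about $21 a month just to match ad revenue, which is well above the $5 a month floated earlier in the thread. Halve the per-1000 figure, as suggested for Bing, and the break-even price halves with it.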
I'd like to be able to remove the ads, but I don't care about the rest of it. Besides removing ads, the other things you mention would make the service less valuable for me.