Court dismisses Genius lawsuit over lyrics-scraping by Google (techcrunch.com)
370 points by fortran77 on Aug 12, 2020 | 286 comments

This is a bizarre take. The most substantial point is buried near the end of the article: Genius does not own the copyright to the lyrics. Yes, they may have taken billions of painstaking person-years to hand-transcribe them onto artisanal silks, but no matter how much effort you put into copying someone else's work, it doesn't make it your work, and you can't sue people over it. At best, you could claim your copy is a derivative work, but that only grants you protection for your additional creative contributions on top of the original work, which for a straight transcription is... well, nothing.

Genius knows this, which is why they didn't file a copyright suit. Instead, they claimed other things like unfair competition and breach of contract. However, Title 17 Section 301 of the US Code says that "all legal or equitable rights that are equivalent to any of the exclusive rights within the general scope of copyright [...] are governed exclusively by this title". To avoid this, Genius needed to prove that their claims weren't "equivalent" – ie weren't just copyright claims dressed up as something else. They failed to do this, and so their case was thrown out.

You seem focused on whether this case was the legally correct decision, which it sure seems to be. This article, like many readers, is more focused on whether this was a fair result. Nothing bizarre about that.

The judge may have done the correct thing, but readers may feel that Congress didn't. This case will doubtless be used in the future to argue for sui generis database rights like the EU has.

(My view is that in principle, some form of sui generis database rights makes sense, but for the things that US copyright law already covers it is currently far, FAR too restrictive and lasts too long, so I would vehemently oppose expansion of existing US copyright law to cover sui generis database rights.

However, if US copyright law were reformed such that it mandated blanket licensing (see [the EFF proposal]), strengthened fair use protections, and shortened copyright duration, then I would totally support similar rights for sui generis databases.)

[EFF proposal for blanket licensing]: https://www.eff.org/deeplinks/2020/05/plan-pay-artists-encou...

The law decides fair. It's not subjective.

Laws are very much subjective. They're results of their time and the people in charge of voting them. That's why the US still has people in jail for life for non violent weed related crimes while you can now legally buy and consume weed in half of the country.

Laws are constantly changing for a reason, "fairness" isn't set in stone nor objective.

Are you American by any chance? I've seen a lot of Americans talk like that about "law", as if laws were god-given, immutable, objective and fair.

But, in that sense, fairness is also subjective. There are viewpoints that call what Google did unfair and viewpoints that call it fair. What overarching philosophy do we use to decide which is the right viewpoint?

Genius is a YC company and HN is a YC-owned forum, so this is probably not the best venue for an unbiased discussion on fairness

> Laws are very much subjective.

Maybe I'm being pedantic, but laws themselves are objective. At least, they are crafted to be an objective description of what is "right" and "wrong". Said another way, laws exist to objectify the morals of the law's author.

I think what you are trying to say is that laws are not self-justifying. Laws draw an objective view of the world, but we can subjectively agree that we don't like that view and then change it.

In a common-law system the written (legislated) law is not actually the law, it's the court's interpretation of how it should be applied, taking into account similarities to other laws and the precedents set in applying them, that determines what it actually means. That makes it quite subjective and vulnerable to initial precedents in applying it.

Why is this downvoted? Without getting into a debate about what true objectivity is and whether or not it exists: in the US, the majority of laws are objective, and it's the application of those laws that is subjective.

This could be an issue of articles: Laws are objective; the law is subjective.

Laws consist of literal syntax, but that does not make a Law "objective". Wittgenstein made an issue of the fallibility of the English language. English is not a special case (just the applicable one).

To wit, a Law does not always describe discrete actions and consequences.

To say a Law is Objective is not saying anything, because the Law is necessarily interpreted in some frame of reference. A single Law is necessarily subjective and, following from this, the Law in aggregate is subjective.

There is no end to this debate. The buck stops at the law. As of today this is fair and objective. Without this we have no framework to judge right or wrong. You can handwave around this but this is the best jurisprudence has come up with over the years. Do you have a better system than pitchforks on HN?

The law is not "fair and objective". It is a representation of the morals of the particular group holding power at the time of its writing.

It will never be "fair" because it only very rarely takes all of the circumstances into account - as Anatole France put it: "In its majestic equality, the law forbids rich and poor alike to sleep under bridges, beg in the streets and steal loaves of bread."

Neither is it particularly objective - its application, especially in a common-law system, is extremely subjective. (You're bound by whatever the judge in previous cases thought the law was - E.g. in the US "stare decisis" is enshrined pretty much as an immutable rule)

It has never been a framework of "right or wrong", either. It is a framework on how disputes get settled. There is no interest in "right" beyond "the authors of the law at the time thought it was a good idea"

To take a trivial example: Slavery was recognized as morally wrong long before the law actually made it something that was not allowed. The law still allows carceral slavery, even though there is growing consensus that that's morally wrong.

As for "better" systems - there's certainly a large faction of countries making the case a civil law system is better than a common law system. But you don't need a "better system" - the law can only be meaningful if we accept that at all times, it will be flawed, it is not "fair and objective", it needs to be tempered with compassion, and it is our job to improve upon what we have.

You can't just say "there is no end" and follow it with "my end is the right end".

At least talk about why you think deontology is better than other forms of norm setting, why the law must be right even though there are many different and contradictory laws in different places, and how you propose to make new rules if not based on a concept of "right" that is independent from the existing rules.


Why do you think that law is necessarily deontological? There are many areas of the law in the US that tilt towards consequentialism, such as antitrust law (does it harm the consumer?)

> The law decides fair. It's not subjective.

That certainly IS a consistent position one could take. Most people feel that there are certain moral imperatives that exist independent of government-defined laws and that government laws can be fair or unfair. However, the position you take here -- that "fair" means whatever the government says -- is a possible basis for ethics.

But I, and most people, find it to be deficient. It requires that you admit that (for one example) slavery in the USA before 1863 was fair which I (and most people) disagree with. I'm sure you can find other examples if you wish.

Are you sure this is what you want to base your definition of "fair" on?

In a democracy the government is inherently an extension of the people, so when you frame "fair" as meaning whatever the government defines it as, you are at the same time acknowledging that the majority have already decided the matter to be fair. Whether or not those people are willing to admit to the world or themselves that they believe the law to be moral, it is from their consent as a mass that the law was able to manifest. So at the very least the law is what most people think is morally fair.

In aggregate maybe. But surely there are specific laws which a majority think are unfair.

It’s only the laws which are unfair AND which affect a majority of voters, where you could maybe claim democracy has weighed in on the fairness.

History is littered with laws which are only unfair to minorities, which I think proves my point.

The US is not a democracy by definition, and even if you want to substitute in “republic” here, the description of how the law-making sausage factory works here is idealistic at best.

If only history had any examples of unfair laws...

But the law can be unfair, and can even be made more fair, and for that, you need discussion.

> One has not only a legal, but a moral responsibility to obey just laws. Conversely, one has a moral responsibility to disobey unjust laws.


How do we get new laws, or changes to old laws? It seems to me that process involves people deciding what is fair.

Enforcement of laws should be as objective as possible. The laws themselves are very much subjectively created.

Laws decide nothing. They are just a bunch of rules made up by those in power. Way too many of them are the direct result of lobbying by billion dollar industries. Copyright is an example. People vote for and elect representatives and they end up serving these elite groups instead. How is that fair?

If you believe any given law is unfair, it is your duty to disobey. This is called civil disobedience.

Laws are exactly like released legacy code. They work and occasionally they fail.

They can be better, but you have to understand them to know how to improve them.

I don't understand how your second sentence follows from the first.

Perhaps you mean: they are full of bugs, fragile, broken, but occasionally they work?

It seems I respect working code more than you do. I wouldn't say laws are broken.

Improving legacy code always means to understand business logic and if possible the motivation behind it. Then you have a replacement built and you can justify why it is better.

Plenty of laws are broken. Drug control laws, white collar crime laws, whistleblower protections, copyright, IP, patent, child support & alimony, parent-child access / visitation, execution or the lack of it, gun control or the lack of it, fucking mandatory sentencing.

Shall I go on?

"laws aren't broken" is the kind of thing someone says when they've never seen "justice" in action, or are walking around with their eyes shut.

I think your parent was just trying to say "laws are important".

"Legacy" code that only "occasionally works" doesn't sound like something worth keeping around. It sounds like something to dump in the trash and never look back.

If you have working legacy code though, even if it occasionally fails because it's full of bugs and fundamental flaws, it should be replaced very carefully, and only with in-depth understanding of what is working and what is broken about the existing system.

Based on the broken laws you list, I assume you're talking about America. American laws are indeed very broken, but dumping them in the trash and never looking back would be a one-way ticket to a bloodbath. It is both very important to fix our very broken laws, and important to do so very carefully and only with an in-depth understanding of what is working and what is broken about the existing system.

If I were to give you as much benefit of the doubt as you give your parent, I'd say that "occasionally laws work" is the kind of thing someone says when they've never seen how much worse it can get.

The law decides what is allowed. Society decides what is fair.

Law also says that erratic criminals can be restrained by kneeling on them, one of the reasons which hastened George Floyd's death. I don't think law is always right.

Emotional appeals shouldn’t* hold any weight when considering the law. You’ll end up creating highly biased or illogical laws that cater to the culture of the times and need to be changed again shortly. Kneeling may cause the death of some but save the lives of countless others. “The law isn’t always right” isn’t an objective statement because “right” is relative to the observer.

The entirety of law is an appeal to emotion.

Law endeavours, or let me say: should endeavour...

To be fair.

And fairness is entirely a human emotional pursuit.

This seems like a fully general case against making or changing any law ever.

The last sentence without the context of the preceding sentences would seem that way, yes. However in the context of “laws created as a response to emotional appeals” it becomes a warning to remember that everyone has feelings and everyone’s feelings are different when creating or considering the law.

> You’ll end up creating highly biased or illogical laws that cater to the culture of the times

yes this has often happened

> and need to be changed again

the thing is, they often don't get changed.

That isn’t an apt comparison: kneeling on someone’s neck is never going to save countless lives.

I guess that's why progressive taxation is fair! /s

> Yes, they may have taken billions of painstaking person-years to hand-transcribe them onto artisanal silks, but no matter how much effort you put into copying someone else's work, it doesn't make it your work, and you can't sue people over it.

Not really all true though. Genius started out by stealing lyrics from other sites. In the early days many of the lyrics had the exact same errors as other, more established sites. That may have changed since.

Yes, they did, because they got a takedown from the actual copyright holders: https://www.billboard.com/articles//5785701/nmpa-targets-unl...

You are correct about Genius also scraping in the beginning.

It's much better now, but that's only because of unpaid volunteer editors who do most of the corrections and annotations in their site.

Aren't collections of facts copyrightable? So Google has copyright over Google Maps and I cannot copy that, but I can go out and record exactly the same data, since I collected it myself.

No, in the US you cannot copyright facts, only expression. So you have control over word-for-word copies of your article about a bird; but you have no control over dissemination of the facts you discovered about the bird. SCOTUS decided this in 1991, Feist v. Rural: https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._R....

You might be thinking of sui generis database rights, which DO cover collections of facts. The EU, Russia, and Brazil recognize this right, but the US doesn't: https://en.wikipedia.org/wiki/Database_right#United_States

The US does recognise compilations as copyrightable though (but there must be decisions made about what to include).

See https://wiki.openstreetmap.org/w/images/6/6f/Protection_of_C...

It could be a system, oracle, or network, which computes weights or votes, just as soon as it could be an individual or group casting ballots or deliberating. The deciding is key, perhaps not who decides. Some aspects of law are slippery like that though, where one set of rules applies to human agents, and a different, perhaps mutually exclusive set, apply to everything else.

For example, horse drawn carriages have the right of way even over pedestrians, because you’re pulling weight, or more accurately, a beast of burden is pulling the weight. It’s a living thing too, and it can’t stop on a dime when it’s got a load. It makes sense when you know the context and framing for why the law is so written today.

I’m sure there are similar examples in other contexts. Court cases and judges look at the law like we do whitepapers. Some docs are better than others, and there are some devs, and other judges, whose toes you’d be hesitant to tread on, especially if you have a habit of doing that kind of thing.

Right. "Expression" isn't restricted to specific wording, it can potentially mean specific choices of what is included and what is excluded in a compilation; I could hold a copyright to such a compilation even if I don't hold the copyright to any of the items in the compilation.

For example, many artists publish playlists on Spotify to build their brand; it could plausibly be copyright infringement for one of them to verbatim copy a playlist I made and publish it as their own, even if I don't hold the copyright to any of the songs on my playlist, and even if one of the songs on the playlist was actually that artist's own song. The act of assembling a playlist is potentially copyrightable expression.

Compilations of facts are copyrightable in the US, but they can't be just raw collections - there has to be a choice made what to include.

> The Act also provides copyright protection to compilations, but only to the extent that there has been a contribution of originality in assembling that compilation.

Map copyright is based on the idea there are decisions made around what to include and how to display it.

You can't photocopy a map and claim copyright. However, a human can trace the same map and claim copyright.

See City of New York v. GeoData Plus, and the discussion in https://wiki.openstreetmap.org/w/images/6/6f/Protection_of_C...

Regarding compilations of facts, the general doctrine is that copyright protects the semi-arbitrary choices of what to include in that compilation (e.g. judgement of relevance - which words to put in a dictionary, what detail to include/exclude in a map) and disallows copying that compilation; but it explicitly does not protect the "work and sweat" required to gather that data, and it allows people to copy particular facts out of that compilation, for example if they are making their own selection with different criteria, as the underlying facts are not protected no matter how much effort it took to obtain them.

In this regard, copying lyrics of some particular song does not violate the rights of Genius - they don't have copyright to that particular song and the compilation-of-facts rights don't apply for that particular single item.

I would guess maps is a different case because those are their own works. There are decisions on design being made, how to show overlays. But that’s just my assumption.

Why does it matter whether you're transcribing lyrics or transcribing geography? In both cases you're effectively just writing down something that exists already.

Exactly. This should definitely be grounds for allowing copying Google Maps data as long as you render it in your own style. The usual trick here is trap data, where map makers insert fake data to catch copies, but that's exactly what happened here with the ' in the lyrics.

If the actual data isn't copyrightable, yet the 'fake/trap data' is, then presumably when one copies the map, a court will decide that the magnitude of the copying is very small - only a single apostrophe was copied - and therefore the damages negligible.

If fake data is hidden amongst real data, wouldn't there also be the argument that the copier was unaware that they were copying a creative work rather than pure facts?

Fred Saberhagen made this a plot point in one of his Berserker stories. Going by my highly suspect memory...

A damaged Berserker captures an atlas showing an occupied system nearby, and heads there with its last reserves of power to destroy the system.

The human who didn't stop the Berserker is charged with a crime against sentience, but is acquitted when he reveals the secret: The occupied system was a fake, in the tradition of cartographers going back to the Middle Ages on Sol.

Just add some small, inconsequential error to all positional data.

Does this mean Genius can add a few non-breaking spaces to the lyrics, along with a couple of odd Unicode characters that look like periods and commas?

I guess at some point it’d end up looking like a ReCaptcha and there’s enough of those already ...

That's exactly what they did with commas. Didn't help them.

Because you are allowed to (I presume) copy the raw data (gps coordinates of roads, buildings, etc.) But you don’t have access to it, instead you only see their map design.

The equivalent would be to copy an existing map, not to create a map based on your observations of the world.

But what Google did here was not listening to every song and writing down the lyrics. They just copied the text off another website.

The land is there. Depicting it is the result of Google's work. You went over, or used a satellite photo with the appropriate licensing and YOU painted (created) the map.

Lyrics are not just found in the wild (like a mountain or a street is). Someone thought of them and wrote them down, so they are that person's creation. It is like listening to me reciting a poem, writing it down, and selling the book.

This is a classic example of the map not being the territory that I think gets the point across: "Ceci n'est pas une pipe"

I don't know about copyrights for maps, but your argument is not good: You can't copyright a flower, but take a picture of that flower and the rights to the picture belong to you. Google is not preventing others from collecting mapping data, but they prevent them from copying the data Google has collected.

> Genius does not own the copyright to the lyrics. Yes, they may have taken billions of painstaking person-years to hand-transcribe them onto artisanal silks, but no matter how much effort you put into copying someone else's work, it doesn't make it your work, and you can't sue people over it.

Apparently they license the lyrics now:

> Genius isn’t the copyright holder for these lyrics, it just licenses them itself.

It's not a case of someone copying without permission and then suing another person who copied them. It's a valid licensee suing someone who is copying them.

Imagine if a McDonald's franchisee sued someone running a rogue/unlicensed McDonald's around the corner. Would we have no sympathy for them also?

Legally speaking, it appears the right to sue requires at least some exclusive copyright rights, [1] which Genius surely didn't have (and a McDonald's franchisee also would not have). This is presumably why they didn't bring a copyright suit.


True, though that blog post also says "We do not crawl or scrape websites to source these lyrics.", which was definitely not true (though may be now).

Google licenses the lyrics through a third party, LyricFind, which in turn hired people to transcribe them. In a small percentage of cases, those transcribers did not do their own work and instead copied lyrics from Genius.

I think it was true then. I believe their claim was usage via third party?

Where I live, there is a database right.

The fact that Genius collated those works is meaningful work in its own right.

I'm not sure what your background is, but your analysis sounds quite strongly opinionated.

There are examples in law where 'work' can be protected; just because you don't have the copyright doesn't mean that someone else is simply allowed to use the results of your work.

Apparently in this specific case it's not protected.

Regardless of what you think about the lawsuit, you have to give them credit for their watermarking method:


If I was going to scrape this data and re-purpose it, I would've absolutely cleaned up those apostrophes. The pivoting between straight and curly would certainly be a pet peeve. Unless there's a semantic difference between the two I'm unaware of.

The semantic difference would be important in a song like Baby Got Back by Sir Mix-a-lot, which includes both speech quotes and imperial measurements.

Imperial measurements should use the prime symbol and not straight quotes: https://en.wikipedia.org/wiki/Prime_%28symbol%29#Designation...

The song was released almost a year before UTF-8 was first presented at USENIX, so I think it is reasonable for the lyrics to be expressed using the technology commonly available at the time.

For lyrics, you should just transcribe the pronunciation, eg "six foot two".

I for one would impose metric... it may be harder to rap, but true talent steps up to a challenge...

"But I would walk 804 kilometres, And I would walk 804 more, Just to be the man who walks a 1,608 kilometres, To fall down at your door..."

It's so catchy! Perhaps: "16093.44 meters high" by "0.2286 meter nails"...

1.608 megametres :^)

The units are not spoken, but for clarity they should be notated in the lyrics.

TIL about prime symbols. Thank you!

My anaconda don't want none of your poor semantics.

I queried for this song's lyrics, and I found only a single quoting symbol (') used entirely throughout.

Taking a layperson's reading of the lyrics, I didn't find anything off due to quoting issues. If there was any disruption, it was minimal and unnoticed.

Sir Mix-a-lot is an American rapper so the measurements in the song are presumably in US customary system units and not imperial units.

Ahh, but is it US customary or US survey?

Could you try not to be so pedantic regarding my pedantry.

Yeah, makes sense, but this is still a pretty good approach. Inserting invisible or unusual Unicode symbols would prompt the scraper to carefully cleanup the read files (maybe even fixing these apostrophes as a result). Unusual whitespace is also likely to be removed and cleaned up. On the other hand, these alternating apostrophes have a chance to stay unnoticed (or neglected), falling through the cracks.
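To make that concrete, here is a rough Python sketch of the kind of cleanup pass a scraper might plausibly run. The function name `clean_scraped_text` and the list of zero-width characters are assumptions made up for illustration, not anything Genius, LyricFind or Google is known to use. It folds no-break spaces and strips invisible characters, yet the curly apostrophe survives, because Unicode NFKC normalization does not map U+2019 to the ASCII apostrophe.

```python
import re
import unicodedata

# Invisible code points a cleanup pass might strip (illustrative list only).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def clean_scraped_text(text: str) -> str:
    """Hypothetical scraper cleanup: normalize, drop invisibles, tidy spaces."""
    # NFKC folds compatibility characters, e.g. U+00A0 (no-break space) -> U+0020.
    text = unicodedata.normalize("NFKC", text)
    # Remove zero-width characters, which NFKC leaves in place.
    text = text.translate(ZERO_WIDTH)
    # Collapse runs of spaces and tabs into a single space.
    return re.sub(r"[ \t]+", " ", text).strip()


# One curly apostrophe (U+2019) and one straight apostrophe, plus hidden noise.
marked = "I\u00a0don\u2019t\u200b know,  I'm"
cleaned = clean_scraped_text(marked)
print(cleaned)
# Both apostrophe styles are still present after the cleanup.
print("\u2019" in cleaned and "'" in cleaned)
```

A scraper would have to map curly quotes to straight ones explicitly to erase the mark, and that only happens if someone thinks to do it.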

There is a semantic difference between the two. The straight quote is a superset of the curly one.

So "rock 'n' roll" is correct. And "rock ’n’ roll" is correct. But "rock ‘n’ roll" is not correct, since the wrong apostrophe is used. We're not quoting the letter n, we're showing that the letter a was removed.

The vast majority of content consumed by this scraper is never closely inspected by a human, though.

On this quantity of data, you wouldn't be able to do this manually.

If you hope to avoid being caught this way, I'm going to assume you noticed this without the benefit of hindsight and plan to correct all out-of-place Unicode characters automatically. How will you avoid over-correcting?

There's also no reason to believe this is the only fingerprinting Genius has done (they only need to publish the most obvious fail). For example, I can use the same fingerprinting technique but switch between American and British spellings.

This is not a straightforward problem.

I think if you had multiple corpuses of lyrics, you could cross-check for anomalies of any variety (odd quoting, switching between american/british english), etc.

The fingerprinting isn't likely applied to every song, to prevent obvious detection. If you went through multiple databases, you might see N prevailing copies of a song's lyrics, and 1 that seemed different. The one that's different has the anomaly.
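A minimal sketch of that cross-check in Python, assuming you already hold one copy of the same song from each source. The source names and the helpers `find_outliers` and `differing_positions` are hypothetical, and the majority vote is the simplest possible rule, not a claim about how anyone actually does this.

```python
from collections import Counter
from typing import Dict, List, Tuple


def find_outliers(copies: Dict[str, str]) -> List[str]:
    """Return the sources whose text differs from the most common version."""
    counts = Counter(copies.values())
    # On a tie, most_common() keeps the version seen first, so treat the
    # result as a hint for manual review rather than a verdict.
    majority_text, _ = counts.most_common(1)[0]
    return sorted(name for name, text in copies.items() if text != majority_text)


def differing_positions(a: str, b: str) -> List[Tuple[int, str, str]]:
    """List (index, char_in_a, char_in_b) where two equal-length texts differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


copies = {
    "site_a": "I'm walking",
    "site_b": "I'm walking",
    "site_c": "I\u2019m walking",  # curly apostrophe: the anomaly
}
print(find_outliers(copies))
print(differing_positions(copies["site_a"], copies["site_c"]))
```

This only works if the sources are genuinely independent; if most of them copied from the fingerprinted one, the anomaly becomes the majority.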

I'm not disputing that they proved their point, but this is triggering one of my pet peeves about common misunderstandings of Morse code.

Timing is critical in Morse code. You can't just write out a bunch of dashes and dots to transcribe it without clearly transcribing the rests between dots and dashes as well. They haven't given us the rests at all, so all the info they end up having is:

dot dash dot dot dash dot dot dot dot dot dot dot dash dash dot dash dot dot dot dash dot dot

And that can be interpreted in any number of different possible ways besides "REDHANDED". E.g. it could also be "AU5EWRFE", or any of thousands of different interpretations (actually probably a lot more than that; this would be a fun programming problem). They should have used a binary encoding; 22 bits (all they have given us) is not enough information to uniquely encode the string "REDHANDED". Once you include the short rests that are needed, we're talking 44 binary bits or 22 ternary bits. And if you want the long rests to distinguish properly the spaces between words, then 22 ternary bits won't do it; you need the full 44 binary bits.
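Since this looked like a fun programming problem, here is a small Python sketch that counts the readings. It assumes the alphabet is limited to the 26 letters and 10 digits of International Morse code (no punctuation or prosigns), and the names `encode` and `count_decodings` are just for illustration. The count comes from a dynamic program over suffixes of the dot/dash string.

```python
# International Morse code for letters and digits only (an assumed alphabet).
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}
CODES = set(MORSE.values())
MAX_CODE_LEN = max(len(code) for code in CODES)


def encode(word: str) -> str:
    """Concatenate the Morse codes of a word, dropping the rests between letters."""
    return "".join(MORSE[ch] for ch in word.upper())


def count_decodings(signal: str) -> int:
    """Count the ways to split a rest-free dot/dash string into valid codes."""
    n = len(signal)
    ways = [0] * (n + 1)
    ways[n] = 1  # the empty suffix has exactly one decoding
    for i in range(n - 1, -1, -1):
        for length in range(1, MAX_CODE_LEN + 1):
            if i + length > n:
                break
            if signal[i:i + length] in CODES:
                ways[i] += ways[i + length]
    return ways[0]


signal = encode("REDHANDED")
print(len(signal))                   # number of dots and dashes
print(signal == encode("AU5EWRFE"))  # the two words collide without rests
print(count_decodings(signal))       # number of possible readings
```

Every run of one, two or three dots and dashes is a valid letter, so the count for a 22-symbol string ends up far beyond "thousands".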

>They should have used a binary encoding; 22 bits (all they have given us) is not enough information to uniquely encode the string "REDHANDED".

The fact that the sequence can be interpreted as REDHANDED with a particular way of grouping the input is just being cute. Regardless of the grouping, it is a binary encoding of a 22-bit number, and so would have a one-in-2^22 chance of being reproduced at random.

Edit: To clarify: You're saying they should've mentioned 22-bits in the context of binary digits without mentioning Morse code, and if they did want to bring up Morse code they should've used trits or more bits to encode the stops. I'm saying that the fact that their 22-bit sequence can be interpreted in Morse code as a relevant word is just dressing, and does not detract from the point that the sequence was likely copied. Put another way, if someone tried to counter by saying their sequence could've been generated independently because "AU5EWRFE" and many other strings also encode to the same sequence, it would not affect the facts at all.
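For what it's worth, the "one-in-2^22" figure can be sketched in a few lines of Python. The mapping of straight apostrophes to dots and curly ones to dashes is an arbitrary assumption here (the thread doesn't say which way round Genius did it), and the odds assume each apostrophe is independently equally likely to be either style, which real text won't exactly satisfy.

```python
from fractions import Fraction

STRAIGHT = "'"    # U+0027 APOSTROPHE
CURLY = "\u2019"  # U+2019 RIGHT SINGLE QUOTATION MARK


def apostrophe_pattern(text: str, straight_as: str = ".", curly_as: str = "-") -> str:
    """Read the sequence of apostrophe styles in a text as dots and dashes."""
    return "".join(
        straight_as if ch == STRAIGHT else curly_as
        for ch in text
        if ch in (STRAIGHT, CURLY)
    )


def coincidence_odds(pattern: str) -> Fraction:
    """Chance of matching the pattern if every apostrophe were a fair coin flip."""
    return Fraction(1, 2 ** len(pattern))


lyric = "don't stop, can\u2019t stop, won't stop"
print(apostrophe_pattern(lyric))   # straight, curly, straight
print(coincidence_odds("." * 22))  # odds for a 22-apostrophe pattern
```

Note that the odds depend only on the length of the pattern, not on whether it happens to spell anything in Morse code.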

I think you missed the first sentence of my post.

Nope, I didn't. Your first sentence says:

>but this is triggering one of my pet peeves about common misunderstandings of Morse code.

and my post is about how there is no misunderstanding.

There is a misunderstanding because anyone who properly understood Morse code would never represent it in this manner where it's indistinguishable from a large number of other possible strings.

There is no misunderstanding because it was never the intention to use the bitstring to represent a unique word in Morse code in the first place. Instead, Morse code was merely used as a convenient method to devise a bitstring from a relevant word.

You'd think Google would be wise to that since they do that themselves.

E.g. when they caught Bing copying them... https://www.wired.com/2011/02/bing-copies-google/

And they definitely do it with maps. There is a tiny little village I visit in rural Roscommon each year. Each year a new major retailer appears to have opened in this 500 population village, well according to Google Maps that is. At the moment there is a branch of New Look situated on a farm down a single track country lane.

This is a variation on the trap street https://en.wikipedia.org/wiki/Trap_street

I recall coming across this in my travels. There was a named "town" at the intersection of two streets - upon passing through there, nothing. Later wondering where the town went I found that it was not ever there and was just present to identify people copying that map.

Called a Paper town.

This is very clever! Also, take a look at Claim 2 of this patent[1]. Do you think these are similar enough to constitute infringement?

[1]: https://patents.google.com/patent/US9881516B1/en

(Software patents should be abolished. I just like to point out their absurdity and how it's easy to independently develop a technique (steganography in a search engine result) that someone has already grubbed a "patent" on.)

This bit at the end is the best part, I think:

> while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. _Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims._

If I'm reading this correctly, this patent is claiming things that are "apparent" (obvious?) to those "in the know". Computer or not, how did this get granted?

If it's not implemented in PHP, it's not infringing!

The patent claim requires the program to "load available PHP server header information"

That is just so bizarre to me too. That one would be able to copy the exact core of the idea, but write it in a different language, and suddenly it's not infringing, even though it causes the computer to do the same thing.

How these patents even pass the sniff test, I'll never understand.

Reminds me of a much older practice for the same reason: https://en.wikipedia.org/wiki/Agloe,_New_York

That's pretty clever! How the tables have turned since hiybbprqag

Wasn’t sure if you were having a seizure there, so for anyone else who was wondering:



I posted about this the other day and someone who used to work at Microsoft made a really interesting/helpful response - worth checking out: https://news.ycombinator.com/item?id=24112418

Warning: The Word of the Day for Aug 11 at Urban Dictionary is pretty NSFW.

Thanks, sorry, I should have noted that anyway just for linking UD.


Does anyone know if ebooks are watermarked in a similar way?

They almost certainly are for some sources. And probably with your unique user id to work out who leaked the copy on to torrent sites.

The way to check this is pretty easy though, get 2 users to bit for bit compare their books to work out if they are identical.
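
A rough sketch of that bit-for-bit check, assuming each user has their copy as a local file (the function names here are made up for the example). Comparing hashes is equivalent to comparing the bytes for this purpose. A mismatch only shows that the files differ; it does not by itself prove a per-user watermark, since metadata such as timestamps could differ too.

```python
import hashlib


def sha256_of(path):
    """Hash a file in chunks so large ebooks need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a, path_b):
    """True if the two files are byte-for-byte identical."""
    return sha256_of(path_a) == sha256_of(path_b)
```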

Open the book in Evince, Print to PDF. Bam, your book is now anonymous.

I am not following your line of reasoning correctly.

The watermarking is done using characters that look different from each other. Printing the book will not change those characters, and therefore it is still possible to extract the original information. Harder, but not impossible.

For this information to be lost, Evince would have to print the book in a crappy font that uses a single symbol for both. Which is possible, but I think it's not what you were going for.

The hash won't be the same, which is what matters the most.

They won't have people manually reviewing the punctuation to catch infringement, and it's highly probable that Evince printing to PDF will mess up the whole internal structure, so it's hard to automate it.

If you just print it then that defeats the point. They don't care about individual copies, they want to find who is stripping DRM and posting books on torrent trackers.

How likely are you to notice if your resulting PDF had a specific unique pattern of zero-width spaces embedded with it? Or if a few curly quotes aren't curly? Let alone scenarios like the cover JPEG having steganographic info in it...
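
For the zero-width case at least, a script notices what a reader won't. This is a small sketch that scans already-extracted text for a handful of invisible code points; the list is an assumption and far from exhaustive, and it does nothing about quote-style or image-based marks.

```python
# A few invisible code points that could carry a hidden pattern (not exhaustive).
INVISIBLE = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u2060": "WORD JOINER",
    "\ufeff": "ZERO WIDTH NO-BREAK SPACE",
}


def find_invisible(text):
    """Return (index, name) for every invisible character found in text."""
    return [(i, INVISIBLE[ch]) for i, ch in enumerate(text) if ch in INVISIBLE]


def strip_invisible(text):
    """Return text with the listed invisible characters removed."""
    return "".join(ch for ch in text if ch not in INVISIBLE)
```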

Absolutely. It usually tends to be either more visible, or less visible though - some pdf files have a literal watermark on the pages, while other formats like epub contain a guid or other watermark content in the source (epub is zipped xhtml)



Amusingly, I believe if you use calibre's ebook conversion to replace stylesheets and add toc, it may also actually remove those markers that have no actual content and only exist to provide a unique ID.

Not ebooks, but interesting technique used to find who is leaking your sensitive documents: https://en.wikipedia.org/wiki/Canary_trap

Not sure about ebooks, but I assume every workplace email sent to all does this to catch leaks.

Can someone explain to me the benefit here? Is it making it less likely for the google scrape to get a search hit?

There's no "benefit", they were just looking for a nice unique way of watermarking textual content, to prove that what shows up on a Google search is indeed sourced from them and not some other transcription of the lyrics.

Watermarking, while similar, is not the correct word, though the purpose of watermarking and this steganographic embedding is the same.

You have to admit, it's genius!

> LyricFind. LyricFind is a Google licensing partner, and may be the source of the Genius content appearing in Google’s search results. LyricFind published an explanation on its web site Monday, saying, “Some time ago, Ben Gross from Genius notified LyricFind that they believed they were seeing Genius lyrics in LyricFind’s database. As a courtesy to Genius, our content team was instructed not to consult Genius as a source. Recently, Genius raised the issue again and provided a few examples. All of those examples were also available on many other lyric sites and services, raising the possibility that our team unknowingly sourced Genius lyrics from another location. As a result, LyricFind offered to remove any lyrics Genius felt had originated from them, even though we did not source them from Genius’ site. Genius declined to respond to that offer. Despite that, our team is currently investigating the content in our database and removing any lyrics that seem to have originated from Genius.”


The dismissal seems logical to me

Sounds like everyone and their mother is scraping stuff off Genius, not just Google; they went after Google specifically because they knew they couldn't just disappear and they had the financial means to pay for compensation, unlike the thousands of crappy lyrics websites.

That said, it would've been just if Google would pay for access to Genius' particular, well-curated, "source" database of lyrics, especially given that they're basically stealing traffic.

But it sounds like the issue is that Google really wasn't using Genius's data directly. The problem is that Google is sourcing from "The Internet," and everybody and their grandmother is 'stealing' from Genius.

Here's an interesting question: if Genius closed up shop tomorrow, how long would it take Google to become the primary source of song lyrics online (by rebuilding Genius's dataset from general Internet harvesting)?

The same thing came up in the LinkedIn scraping case. The courts have defended website scraping.

Even if Google scraped it on purpose in order to steal traffic, it would likely be legal.

Last I checked, ignorance wasn't usually a defense but I'm not a lawyer. I just know not to pretend the Keurig I bought off the back of a truck was a good deal for everyone involved.

But physical analogies to IP fall apart quickly so I'm not going to encourage people to read into that too deeply

So there's two different concepts. One is original creative output, which is copyrightable. The other is information, which is not copyrightable.

If you find something verbatim identical in a bunch of different places, you've got a strong case that it's just information, because if it were original creative output it wouldn't show up identically in multiple places.

If it turns out everyone was plagiarizing a single source, but you were unaware and took down the offending content when asked, you won't have much in the way of legal liability.

If that flies, it's a great tactic. You can't just use the data from site A, so you build anonymous sites B, C and D who use the data from site A, and use the data from those sites instead. "We didn't source from A".

It's not a great tactic if the owner of site A decides to sue you and subpoenas the hosting providers for B, C and D. You can only get away with it as long as you aren't successful enough to draw the attention of anyone with money to burn on legal fees.

pretty much data laundering

Google has a history of scraping content that they want; their business is built on the back of scraping other people's content. The story I read just recently of what happened to Celebrity Net Worth was an interesting one: Google asked for an API, they refused, and Google just scraped the content anyway. There was no lawsuit, but CNW put up fake content and sure enough, it made its way to Google.

It is all ironic given how aggressive Google are in blocking any attempts to scrape its content.

Probably a silly question, but why not just use robots.txt? That was designed for preventing exactly this.

I’d say most of Genius’ visitors come from “song x lyrics” searches, so hiding those with robots.txt would ultimately make them lose almost all of their traffic.

Not due to robots.txt, but you can see what happens to Genius (formerly Rap Genius) when they get removed from the index:


Wear a condom before clicking techcrunch.com links:


Outline works well for TC and provides a nicer reading experience. https://outline.com/https://techcrunch.com/2013/12/25/google...

To be fair to TC, if you disable JavaScript you get a pretty good experience - just the full article, legible. Not like those sites that require JS to load the text and/or images.

But you need JavaScript to get past the "we value your privacy, so give us permission to sell it" banner.

robots.txt is designed to keep garbage off search results. It has absolutely no power to prevent a bot from doing anything. Also, if the site added robots.txt they might as well shut down, because their entire userbase comes from people searching lyrics on Google.

Other way around. It was invented to stop crawling. Indexing is still technically allowed even when blocked by robots.txt from crawling.

The problem is that Google is stealing content and placing it on search so the user never goes to the source. By blocking it with robots.txt they block themselves from Google results AND Google may keep scraping the content anyway.

robots.txt isn’t enforced by anything

They also scrape MusicBrainz, but even if they don't index MusicBrainz at least they donate to it

They have a contract with MusicBrainz. They are listed on https://metabrainz.org/supporters/tiers/4.

> The Unicorn tier is for large companies or companies that would like to have a reciprocal relationship with our foundation. If you need special guarantees, indemnities or require us to sign your contract for a data license, please select this tier. If you have another creative idea you would like to propose, please also select the unicorn tier.

> For any of these cases, please detail your request in the company information field and we will work with you to fit your company's mythical situation. We will also find an appropriate monthly support amount to our non-profit foundation of $1500 or more per month. Please always consider enabling the growth of our non-profit foundation and the continuous growth of our metadata!

That's like saying it's ironic that a soldier fights for his life when he tries to kill other people.

It's just the war that is being fought, not some sort of hypocrisy or irony.


We live in a society of laws. Even soldiers. Google have shown they have no respect for the law nor equality before it and will cheat while using the law as a cudgel. Recall law exists that the strongest might not always get their way. "Ironic" is the polite way of pointing this out.

Without law, Google cease to exist immediately. They are incapable of enforcing property rights without it.

Pardons aside, soldiers go to jail for taking an attitude like Google's.

Just like Genius, Google licensed the lyrics. If they didn't, the publishers definitely would have sued.

Ironically, it is Genius that seems to have no respect for copyright law. Genius ended up having to settle a case years ago because they were using lyrics without the appropriate licensing [1].


Which law did Google break? Scraping in and of itself isn't illegal last time I checked, and the USA doesn't have database copyrights, unlike some jurisdictions.

It's blocking scrapers that is (somewhat, per things like the Americans with Disabilities Act) and/or should be (in general) illegal.

(And to head off the obvious: rate-limiting is orthogonal to whether the high-request-rate querient is scraping.)

> should be (in general) illegal

But.. it's not illegal?

> somewhat, per things like the Americans with Disabilities Act

This is just not right at all. There is nothing in the Americans with Disabilities Act that makes blocking scrapers illegal.

I think you mean you don't like the power imbalance of the large company taking away from smaller companies while using technological means to stop the same thing happening to them.

I don't like it either, but that doesn't magically make it illegal. I'm not even sure it should be.

> There is nothing in the Americans with Disabilities Act that make blocking scrapers illegal.

Retrieving, processing, and displaying information in a manner contrary to the wishes of the provider of that information is necessary for accessibility to disabled users. As a specific example, any attempt to block use of wget for scraping also blocks use of wget as part of a `wget | filter | text-to-speech` pipeline[0], and is thus a discrimination against blind or otherwise visually impaired users. The ADA is, as mentioned, only somewhat effective in prohibiting such things, though.

> it's not illegal

> that doesn't magically make it is illegal.

I don't think anyone is claiming that scraping itself actually is legally protected - I interpreted DigitalSea and harry8 as implying that it should be.

0: in either the shell sense or the workflow sense

> Retrieving, processing, and displaying information in a manner contrary to the wishes of the provider of that information is necessary for accessibility to disabled users. As a specific example, any attempt to block use of wget for scraping also blocks use of wget as part of a `wget | filter | text-to-speech` pipeline[0], and is thus a discrimination against blind or otherwise visually impaired users. The ADA is, as mentioned, only somewhat effective in prohibiting such things, though.

This is not the case. Unfortunately (?) the ADA doesn't allow the disabled person to specify their own technology. If Google can reasonably say that text-to-speech works via a standard screen reader (which it does) then they are OK.

> The ADA is, as mentioned, only somewhat effective in prohibiting such things, though

Well that's not the intent of the ADA, so not really surprising.

I am not a lawyer, but my understanding is that only the copyright holder can sue for copyright infringement. I am pretty certain Genius does not hold the copyright to those lyrics. It's odd Genius brought this case at all. This is briefly noted at the end of the original article, but it seems like the whole point. Did I miss something?

From genius.com: "Genius Media Group, Inc. (GMG) is fully licensed to display lyrics across all of its properties. In 2013, GMG entered into licenses with every major music publisher: Sony/ATV Music Publishing, EMI Music Publishing, Universal Music Publishing Group, and Warner/Chappell Music. In addition, GMG developed a form license with the National Music Publishers' Association (NMPA) which today covers more than 96% of the independent publisher market."

Original copyright holder could give someone else authorisation to sue on their behalf, e.g., through an assignment. Doubtful Genius got an assignment in the agreements they have with publishers.

Also, Google claimed it is sub-licensed to re-publish through a third party, LyricFind, which has licenses with "over 4000" music publishers.

>Copyright holder could give someone else authorisation to sue on their behalf, e.g., through a license.

They can't assign the bare right to sue. To have standing the plaintiff will need to hold at least one of the exclusive rights in 17 U.S. Code § 106 aiui. Cf Righthaven cases, Silvers v Sony Pictures

Not a "bare right to sue", but an exclusive license, or an assignment, could grant some of those rights that give rise to standing.

This is clearly not an exclusive license though, right?

I am almost certain you are correct, I would be shocked if genius was granted an exclusive right to lyrics. If nothing else, how could you sell the songs without rights to the lyrics?

So I don't think genius ever had standing to pursue this case.

Again, not a lawyer. But you'd think the actual lawyers would have checked this more carefully.

They are going to have a problem with standing for the exact reason you suggest. This case was one company who was scraping other people’s copyrighted works suing another company for doing the same.

Isn’t their main argument unfair competition? Google, the starting point of the internet, decided to undermine their business by taking their collated content and publishing it at the top of results?

Google appears to do this for other things, asking questions often shows answers without needing to visit the website. Perhaps these are all licensed and there is a kick back for these sites...

Google appear to be serving ads on content other people have collated while eliminating the source of traffic to the original site.. If that isn’t unfair business practice and taking advantage of their monopoly on search I don’t know what is.

But companies are absolutely allowed to do things that cause other companies to go out of business. It happens all the time. Think of the buggy whip manufacturers.

"Unfair competition" has a specific legal meaning. As I understand it, in the US it's primarily a matter of state law. Here's a summary: https://www.law.cornell.edu/wex/unfair_competition

I agree that this Google practice looks dodgy to me. But the question is, what law specifically is being broken? This looks like a copyright case, and if that's the issue, then the copyright holder is generally the one who has to bring the case. (I believe there are exceptions such as when exclusive rights are granted, but I haven't seen a justification that they apply here.) That's what the law requires; otherwise the courts would be even more swamped.

Again, I'm not a lawyer.

The problem is that, because the alleged form of unfair competition is copying, the Copyright Act takes over and the claim must be made under copyright law.

It should be legal then for someone to run a meta engine on top of Google?

You very well can, but that doesn't mean Google can't block you (CFAA protection). Genius here was reliant on Google for a large portion of their regular traffic so they couldn't just block Google without suffering revenue losses.

Wasn't it ruled not long ago that CFAA can't be applied like that?


Does Google hold the copyright on its search results? Why can't I scrape Google?

Here is the question: Does Google have a right to block you? I believe they do; it is their API after all.

In Genius's case, does it disallow Google from scraping?

From their robots.txt, I can't tell:


I wonder who that mecha guy is that he has his own rule in robots.txt


Maybe mecha is intended to be a robot user? Like the deleted user on Stack Overflow? In that case it might have lots of garbage contributions.

You shouldn’t need a law degree to interpret a robots.txt though. If that’s where we’re at then we as a society need to re-evaluate.


Genius is given a way out to prevent itself from being scraped.

But it doesn't use it, which means they probably value traffic from Google.

Which means they are only not OK with scraping when Google uses the scraped content for a purpose they deem to reduce their traffic.

> Which means they are only not OK with scraping when Google uses the scraped content for a purpose they deem to reduce their traffic.

Obviously. The way the profit model for the internet works right now, for sites to coexist with Google, they must actually receive some of the traffic that is generated from searches matching their content. Who would be okay with having all of their content scraped with the result being that they get none of the traffic and thus the monetary benefit from the work they do?

It's not Genius' content, though. It's the artists'. If it was their content, they could simply sue for copyright infringement.

Because they'll block you. You can prevent Google from indexing your content using robots.txt (Google has a robots.txt on its site as well).

You don't have a right to access their service as many times as you want to, eg by automated means, although you can attempt it. Flip a coin on whether they sue to stop you if you become too annoying.

The Genius complaint is essentially that they want to be represented in Google search without having Google take lyrics from their service and use them in their own served-up content snippets (making a sizable part of the value of genius.com void). Genius knows Google can get lyrics elsewhere if they have to, the lawsuit is probably out of spite due to past conflict with Google and their annoyance at Google competing with them in a shady way (Google was de facto using Genius's service to reduce the value of Genius).

That's correct; however, the publishing company that administers an artist's royalties is generally the one to bring the suit. This is the same type of royalty as sheet music.

I feel like this is a forgotten bit of history, but for years Genius didn't pay royalties for reproducing lyrics, instead choosing to claim that their own reprinting of lyrics fell under "fair use" guidelines:

>"David Lowery, frontman and songwriter for Cracker and Camper van Beethoven, is waging war on the sites he believes make money off song lyrics but don't pay the songwriter. Once he took a closer look at where his music was making money on the Internet, he realized: There were more people searching to find lyrics to his songs than searching to illegally download mp3s of his music. And he wasn't making money off those searches. Last November, after months of exhaustive and systematic Googling, he released something called The Undesirable Lyric Website List.

>"The National Music Publishers Association seized upon this list, and announced that it would be sending take-down notices to every single name. At the top of that list was the very popular Rap Genius."

>"Rap Genius has been around for a few years, and it's extremely popular. No ads, lots of traffic and, just recently, a major investment from one of the hottest venture capital firms in Silicon Valley. The founder of Rap Genius, Ilan Zechory, says the site doesn't belong on Lowery's list. Because it's way more than just transcribed lyrics. He says the site is more like a social network: a discussion board for music geeks and even some of the musicians themselves — prominent rappers like Nas and Rick Ross — to comment on their own lyrics. Artists, the founders say, love the site."

>"Just this week, Rap Genius announced that, despite its opinion that the site falls under the criteria for fair use, it's going to pay songwriters for posting their lyrics. It's just easier than fighting with music publishers, who've been very successful at going after other lyric sites in the past few years. ..."[1]

[1] https://www.npr.org/sections/money/2014/05/09/310462951/when...

Genius claims that Google’s actions caused a decline in traffic to its site. The lawsuit was probably a way to assuage nervous investors (who have poured >70M into the company)

They're gonna annotate the web though! Any day now...

$70M for lyrics not even owned seems absurd.

Still, Google is being very fucking evil here. It's as if they stole that $70M for themselves.

> very fucking evil

Your abuse of words is a heinous crime against humanity.

Surely the investors are nervous now?

yeah but they can't blame Genius founders, whoop whoop!

I think it's important to point out that when you license lyrics, you don't actually get the lyrics. I know, sounds ridiculous. You'll get the license to display them, and when you ask the rightsholders of these lyrics (the publishers) for the actual lyrics they'll tell you "oh, we don't have the actual text, just the rights. You need to find the text somewhere else."

As a result, creating an accurate lyrics database like Genius has done is an enormous amount of work, and my non-lawyer gut-feeling says that in this case, Google is screwing over Genius big time. Too bad the legal system doesn't support that.

It's for this sort of thing that Google had to get rid of their "do no evil" spiel.

If Google can scrape my site, am I allowed to scrape Google results? Could I create a Google clone by scraping?

If I scraped the most common search results from Google, front page only, and removed all the ads what would Google's argument against that be?

On one hand, so many sites make finding information difficult, on the other it feels pretty scuzzy that Google prevents searchers from clicking through to the site that put the work into generating content.

"If Google can scrape my site, am I allowed to scrape Google results."

You are allowed. Google would not likely try to sue you. They will try to block you, however.

Bing was created by copying Google results. Google did not sue Microsoft, but they did try to expose the copying.

> Bing was created by scraping Google results.

Do you have a source? Just curious about the back story.

Apparently, the copying allegation is false. There's more to it:


Not excusing MS, but it seems they were not wholesale copying Google results.

"Scraping" was the wrong word. I was remembering incorrectly. Microsoft did not have to scrape. They were apparently using search data captured through Internet Explorer. There was a time before Google had control of the browser. This illustrates how companies with browsers can gather data about users' web activity and how far they can go. Today, even Firefox is gathering data about users' activity with "telemetry", and users typing things into the browser's search box are by default sending this data to Google.

> users typing things into the browser's search box are by default sending this data to Google.

This is incorrect. Firefox will prompt you whether to enable search suggestions on first use, and will not fetch suggestions until you say yes.

Interesting that you said "until" instead of "unless".

Aha. Malware scraping!

Not GP but from memory this was the same incident: https://www.wired.com/2011/02/bing-copies-google/

Google was pretty unhappy with Bing for doing just that.


The number of Google captchas that you needed to solve when searching on Google from Microsoft office made me think that it was some kind of psychological warfare.

tbh sounds petty enough to be true

Being unhappy and being illegal are two very different things

From the article, Genius lost this case because:

> Genius isn’t the copyright holder for these lyrics, it just licenses them itself.

Both Google and Genius are licensing the lyrics. Ironically, Genius ended up having to settle a case years ago because they were using lyrics without the appropriate licensing [1].

[1] https://www.nytimes.com/2014/05/07/business/media/rap-genius...

It is scuzzy that Google steps on other sites' air hoses.

Your idea to scrape Google's search results is pithy, ironic counter-innovation at its dastardly best.

All you need to pull this off is funding for a top legal team, and deep reserves of emotional energy.

Go for it!

I think the legal argument services like serpapi make is that as long as you don't create a google account and/or accept google's terms then you are free to scrape and clone what is publicly accessible (at least in the US). I have no idea though.

I would assume Google respects robots.txt, so you should be fine non-abusively scraping their site insofar as you respect their robots.txt

Google's robots.txt does not tell the full story.

For example, if you include a User-Agent header and put certain strings in it, e.g., "curl/7.47", you will be blocked.

   echo -e 'GET /search?q=robots.txt HTTP/1.1\r\nHost: www.google.com\r\nUser-Agent: curl/7.47\r\nConnection: close\r\n\r\n' |socat -,ignoreeof ssl:www.google.com,verify=0

The problem with the robots.txt "standard", e.g., ones like Google's with no "crawl-delay" directives, is that it does not define what is a "robot". The query above is obviously not a "robot", but Google, with all its resources, still treats it as such.

Google probably does more (abusive) scraping than any other entity. Web scraping is in their DNA. It is in their web pages, too.

   curl https://www.google.com/search/static/gs/animal/m05py0.html|grep scrape

Yeah, the tricky thing for those scraped by google is that given google's search monopoly, the sites can't block their scraping entirely, since they need to be shown in search results.

From https://www.google.com/robots.txt:

  User-agent: *
  Disallow: /search
  Allow: /search/about
  Allow: /search/static
  Allow: /search/howsearchworks
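
As a sketch of what "respecting" those rules looks like from the client side, Python's standard `urllib.robotparser` can be fed the excerpt quoted above (pasted in as a string, not fetched live). `ExampleBot` is a made-up user agent. The check is purely advisory: the crawler has to run it on itself, and nothing stops a client that skips it. The `/search/about` paths are left out because parsers can differ in how they resolve an Allow that overlaps a Disallow.

```python
from urllib import robotparser

# Excerpt of the rules quoted above, pasted in rather than downloaded.
RULES = """\
User-agent: *
Disallow: /search
Allow: /search/about
Allow: /search/static
Allow: /search/howsearchworks
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

# A polite crawler asks before fetching; an impolite one simply doesn't.
search_ok = parser.can_fetch("ExampleBot", "https://www.google.com/search?q=lyrics")
other_ok = parser.can_fetch("ExampleBot", "https://www.google.com/maps")
print("search allowed:", search_ok)
print("other path allowed:", other_ok)
```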

If you succeed then make something that is a proxy for gmail, and then for sheets and docs and chat! google-nextgen.com is not taken... yet.

I wonder if it's even possible to fix Google search in the framework of a for-profit company. It seems like the trajectory of any ad-supported service eventually lands it in a "don't let the user out no matter the cost" phase. Perhaps such a service really does need to operate as a non-profit foundation of some sorts.

There was a post about regulating Google like a public utility recently, but perhaps we should also consider looking at other less conventional internet "public utilities" - things like the Internet Archive, Wikipedia or essential open source projects like Debian. I think a search engine that's transparent both in terms of its logic and how it's maintained and managed might be the only way.

The other option would be strong antitrust enforcement; allowing competitors to emerge and compete with incumbents.

How would that work? What should an antitrust order demand Google to do in order to allow competitors to emerge?

Well, in this case, not steal their content. It's wrong for Google to siphon other businesses' work for their own profit.

Google shouldn't steal Genius or Yelp content. Facebook shouldn't steal YouTubers' content, etc. etc.

It's not hard to see how if you were in the little guy's shoes you'd feel screwed over. We need to innovate our laws to reflect that.

It's not Genius' content. It's the musicians', who actually wrote the lyrics. If it was their content, they could sue for copyright infringement. Google has been sued on those grounds, and forced to change how they do their business (e.g. provide ContentID to handle copyright infringement on YouTube).

But I suppose we could say "don't scrape and present content outside of a regular search result". But then again, Google claims they haven't done so - that they got those lyrics from LyricFind, a lyric licensing platform - and Genius didn't present any evidence that this wasn't the case. So I'm not clear on how any laws could help here.

Finally, the question was not about Genius, it was about allowing competitors to Google to emerge. I don't see how this would help.

What does a GPT-3 future look like? If I ask it to fill in lyrics for a song or facts about companies and this comes from the knowledge it gained by “reading” a vast corpus, how is this different from a person reading the Genius site, memorizing the song, and then transcribing the lyrics?

To me this is the endgame for all of these complaints about search not being ten blue links anymore. Future knowledge engines will be vast AIs that have assimilated information into internal self-organized structures, and will synthesize requests for that knowledge “in its own voice”.

Unless AIs are lifting content by overfitting and making exact replicas instead of expressing the same facts in an entirely new way, I don’t think people will be able to sue, especially when the process by which the answer arrived is a massive Rube Goldberg contraption with 100 billion parameters.

GPT-3 for example can already extract information from SEC EDGAR reports, a service other companies often charge money for.

Techcrunch article left too many questions for me so I looked up and found https://www.rollingstone.com/music/music-features/genius-law... which seems to be a much more detailed and nuanced explanation of what is going on here, with backgrounds on who, what, etc.

>A state court

>Eastern District of New York

Ugh. Basic legal literacy can no longer be expected in the media?

EDNY is a federal court, not a state one.

This was confusing to me as well. I saw EDNY and was confused about why it was booted due to federal preemption. Considering the importance of this procedural aspect, it seems like a pretty big deal to get this wrong.

The real story here is left completely untold: why didn't they bring a copyright claim? Could they bring a copyright claim in the future? Could one of the owners of the copyright bring a claim? These are the questions that matter.

They couldn’t bring a copyright claim because they don’t own an exclusive license.

That's not literacy but domain knowledge

What will google steal when all the other websites are dead?

Nothing, because then you are on google sites all the time and google gets all the ad dollars.

For some reason, Google wants to become AOoL, introducing the A with their AMP service (or with Applied Semantics).

This raises the question: Will content owners create their own content network? If Google steals your content on the internet, why put your content on the internet? Why not have an app that delivers content to paying customers? Right now each content provider tries this on their own with their own app. Why not combine the efforts and just offer a browser for their closed network, or embrace the Brave browser? If all content producers pull this off together, the audience will be there.

Facebook could offer a Facebook content network on their own because they already have the audience, and Genius and all those Recipe sites could publish their content in a secure way. Maybe Instagram with its text pictures is already the predecessor.

It seems like Google was taken over by Applied Semantics in the same way that Boeing was taken over by McDonnell Douglas because in the long run, nobody offers up his content for search if it is ripped off.

Thank goodness. I'm happy that Google gives me lyrics and I don't have to go to an ad-covered lyrics site.

It perfectly fits their mission "to organize the world's information and make it universally accessible and useful".

I enjoy using Genius. People, including the artists, leave commentary on lyrics that give me a deeper understanding of the lyrics.

I enjoy Genius too.

Genius with its annotations is much more interesting than just the lyrics in plain text.

Also Google makes you click the big down arrow before showing the complete lyrics, which makes me irrationally annoyed.

> to organize


Google is blatantly ripping off companies and putting them out of business.

The crux of the legal decision here is that Genius doesn't own the copyright on the lyrics either. Hard to steal from Genius what they don't own in the first place.

Genius does give the tools to crowdsource lyrics, meanings, and comments. Google bought the rights to display lyrics but not the lyrics themselves; they just steal those from other companies and rehost them with their own ads. Copyright law was not designed to handle this, which is something that is obviously morally wrong and will eventually harm consumers. Google has stated: 'if you don't want to be crawled, use robots.txt', which, because they have 90% market share, is clearly impossible for Genius. It's downright evil.

Downright evil seems like a stretch, especially given that Google doesn’t seem to have gotten the lyrics from scraping Genius and rather bought them from a third party who themselves scraped them from Genius. Whether or not Genius blocks Googlebot in their robots.txt doesn’t actually seem to be relevant. But this does seem like a good case for the US to introduce “database rights” into copyright law to reward entities that assemble collections of otherwise-disparate information that they don’t own the copyright for, either because someone else does or because it’s something that cannot be copyrighted. Then Genius would have legitimate grounds to sue the third party that scraped their lyrics. On the other hand, had this existed, it would also have allowed Google to sue Bing back when it was first starting for piggybacking on Google’s search results. It’s not obvious to me that database rights are a good idea.
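For anyone unfamiliar with the robots.txt opt-out being argued about here, this is a minimal sketch of what blocking only Googlebot looks like, checked with Python's standard `urllib.robotparser`. The robots.txt content and the `lyrics.example` URL are made up for illustration and are not Genius's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Googlebot everywhere, allow every other crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text):
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://lyrics.example/some-song"  # placeholder URL, an assumption

googlebot_allowed = parser.can_fetch("Googlebot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)
print("Googlebot allowed:", googlebot_allowed)
print("Other crawler allowed:", other_allowed)
```

Note that this only governs a compliant crawler visiting the site directly. It says nothing about a third party that obtains the text some other way, which is the scenario described in the comment above.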

Interesting to see the roles reversed.

Didn't Google complain that Bing was copying their results a couple of years ago ?

Is it possible to have a middle ground ?

I can see both points of the argument :

- Genius does not own the lyrics, in most cases these are entered by users afaik. A similar example would be somebody adding an address/info on Google Maps.

- On the opposite end, associating a query like "that song written by blue haired 80s singer" to an actual result sounds more like a transformative work (although google owns user entered information as well here with the database of all the queries entered by users).

Would it be possible to have a framework where you can purchase such data at a fair price ?

I can't really understand why Genius cares. If Google didn't scrape from them they could scrape from one of dozens of other lyrics sites with pretty much the same results.

The value add of Genius is not the lyrics anyway. I never go to Genius to just look up lyrics because the site is fairly heavy. I use another site that is lighter and has a nicer lyrics format.

The only reason I go to Genius is for the real value add - the song annotations; and these are added by volunteers.

I rarely use the Google version of the lyrics either to be honest.

> Defendants made unauthorized reproductions of Plaintiff’s lyric transcriptions and profited off of those unauthorized reproductions, which is behavior that falls under federal copyright law.

Does this mean Genius still has grounds to sue, as the copyright protections are on their “work” which is the transcription?

It also seems to mean a ToS is not useful for protecting content, only for determining legal users' interaction with it (excluding loading the page). A webpage is a work rendered through a browser, so that kinda makes sense I guess.

Is the transcription actually copyrightable? Just because genius wants it to be doesn't make it true.

Compare to the whole word perfect clip art lawsuit

I think that’s their case to make, but from what I could find the answer seems to be probably not.

Derivative works are copyrightable (translations, adapting a book to a movie), but it might be hard to argue that an exact copy of the song lyrics is “derivative”, as it isn’t an original creation but simply a subset of the old work.

Remember when Google punished Genius (then known as Rap Genius) by removing them from search results entirely?



Pretty funny that Google depends on them for content now.

This all seems odd to me. Publishing song lyrics on the Internet seems like a small feature of a search engine and not an entire company with millions of dollars invested.

Genius does not hold the lyrics copyright, but the site's business model depends on traffic generated by lyrics, which in this case is hijacked by Google.

They license lyrics, however, and, critically, oftentimes lyrics and their copyright are not a package deal. Many songs don't have lyrics provided by the publisher. Genius allows users and artists to upload the lyrics. Google has the copyright but blatantly steals the content. This obviously is not protected in US law but it is elsewhere (see news sites and such in some countries requiring Google to pay). Google then uses their search results to leverage their position.

Just a thought experiment: If Google releases a paid version of search where it won't show ads, personalize based on your click-through history and do not copy content from source for presentation, won't share your data with anyone, would you opt for it?

The root cause of most of the issues seems to be Google's monetization model, where they are optimising for people to stay within their ecosystem.

It's an interesting question. I don't trust Google, and a fast pivot to another product/offering wouldn't change that overnight, so I would lean towards no.

I paid for Youtube Red, even though I never signed into the service on Youtube and even though I already got all of the Youtube-specific benefit in the form of adblocking and NewPipe. I did that purely to try and signal to Google, "I will pay for content, I want you to have revenue sources outside advertising."

But ultimately Youtube Red seems to have been a failure, my signal hasn't changed the course of the company, and the parts of Youtube Red I did get value from (say, Music) have gotten noticeably worse over the years. I'm planning to drop my subscription in September. I think it would be tough for me to buy into another product like that from Google, my experience trying to buy products from Google to get around advertising/tracking has been both a practical and moral failure. Certainly I won't sign up for a paid GSuite account now.

But I do already pay for Email (Fastmail), bookmarking (Wallabag), and a few other "free" internet services and apps today. So I might pay for search if a company like DuckDuckGo offered it instead of Google. But I would need to see what their offer actually was.

I mean, heck, I'd be giving recurring donations to DuckDuckGo today if they accepted them, they're on my list of companies I want to exist. So paying 5$ a month for a "premium" search experience wouldn't really be that different, even if I never signed into it.

> I mean, heck, I'd be giving recurring donations to DuckDuckGo today if they accepted them, they're on my list of companies I want to exist. So paying 5$ a month for a "premium" search experience wouldn't really be that different, even if I never signed into it.


They seem to want you to enable their ads. Is that something you would ever do?

Unfortunately no.

Different people have different reasons for disabling ads. Some people only care about egregious ads and privacy violations, that's fine.

I am against ads in general, I noticed a sharp quality of life increase when I started blocking ads universally everywhere I could, regardless of whether or not I was on the web. That's a longer conversation, I'm not going to get into it now. I respect that other people have different opinions, but I also feel reasonably strongly about my own.

I do spread DuckDuckGo to other people, but I also spread uBlock Origin, so I'm not sure I'm a net positive there either. :)

It's again, an interesting question. I've been pushing pretty hard in my personal life to financially support Open Source software that I use, projects that I really care about. DuckDuckGo is one of the few companies that I really like that I haven't ever really supported in any tangible way.

I do feel guilty about that, but not guilty enough to allow myself to be turned into a product. There is likely no company where I would ever feel guilty enough to turn on ads. But given that DuckDuckGo isn't going to allow donations any time soon, I could see a lightweight 'premium' DuckDuckGo product effectively being a way for me to just give them money without it feeling to them like it's a donation.

Actually, what I should do is to look into some of their merch and see what they offer and if they actually make a profit on it. I thought at one point DuckDuckGo sold shirts or something, but I don't know if they still do. Again though, this all kind of ends up being a messy proxy for donations. I'm not super-jazzed about walking around as a living billboard either, even for a company I like.

The difference between what Google makes per 1000 searches and what a profitable/sustainable search engine needs to make is quite vast.

Can't remember the source, but G makes around $70 per 1000 searches. Bing makes less than half of this, DDG less again.

G commands higher rates because of higher competition for ads and also because of its retargeting and all the other methods that raise that per 1000 figure higher.

To answer your question (sort of), Google's non-ad version would need to be expensive to break even with its current model.
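To put rough numbers on that, here is a back-of-the-envelope sketch. It takes the unsourced ~$70 per 1000 searches figure from above at face value, and the monthly search counts are purely hypothetical usage levels, not measured data.

```python
# Unsourced figure quoted in the comment above; treat it as an assumption.
REVENUE_PER_1000_SEARCHES_USD = 70.0


def break_even_monthly_price(searches_per_month,
                             revenue_per_1000=REVENUE_PER_1000_SEARCHES_USD):
    """Monthly fee that would replace the ad revenue from one user's searches."""
    return searches_per_month * revenue_per_1000 / 1000.0


# Hypothetical usage levels, chosen only for illustration.
for searches in (100, 300, 1000):
    price = break_even_monthly_price(searches)
    print(f"{searches} searches/month -> ${price:.2f}/month")
```

Under those assumptions, each search is worth about 7 cents, so even a moderate user would need to pay well above the $5 or 10€ a month mentioned elsewhere in the thread for the subscription to match ad revenue.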

I'd like to be able to remove the ads, but I don't care about the rest of it. Besides removing ads, the other things you mention would make the service less valuable for me.

I don't care about the ads (I block them) or personalization (I don't go through their click-counting-script).

I'd happily pay 10€/m for better results though.

> do not copy content from source for presentation

That alone makes too different an experience.
