How companies use fake sites, backdated articles to censor Google results (2017) (lumendatabase.org)
573 points by rinze on Sept 30, 2018 | 121 comments

Interestingly enough, this Torrence Boone guy working now at Google seems to have had lots of success keeping his #1 SERP clean. His LinkedIn tells us he previously worked as "Global CEO" at an unnamed agency and got zero recommendations, but he is now a Vice President at Google.

Now comes the interesting part: Google won't even autocomplete "torrence boone enfatico" — Shame upon him who thinks evil upon it...

If you've been wondering about the moral decline at Google, this is the kind of people they hired as top management.

Is hiring privacy-conscious people a sign of moral decline?

I'd argue that it's a quite good idea to keep your LinkedIn profile looking exactly like that, unless you've decided that you need a good-looking profile right now for "advertising" because you're looking for a job, and you're going to use LinkedIn for that, which many job-seekers won't do.

If you'd be working in defence or infosec, I'd be with you about being a little more diligent about too many details in your employment history.

Then again, it's not like he's shy about clearly lining up all of his jobs before the Enfatico disaster or his Harvard/Stanford alma maters.

It doesn't auto complete on DDG either, so I don't know if that would even be like popular enough to auto complete? I just read the AdWeek article that came up first in it and it's from like 2010.

Somebody needs to take a stand against fraudulent DMCA notices. The escape hatch built into the DMCA to absolve the slimeballs of culpability doesn't nullify other applicable laws.

It clearly constitutes a form of libel against the victims falsely accused of a civil violation. The lawyers involved are violating their ethical duty as court officers and should be charged with barratry. The ACLU/EFF ought to be pursuing this tack to curb the abuse.

The EFF have been pursuing this for decades now. An appeals court upheld the law and the supreme court wouldn't hear the case. Not a ton more they can do, suing individual offenders doesn't solve much.

Indeed. Currently it looks like many of the mechanisms originally made for defense are being turned around.

Another example would be reporting content or people for violations they did not make.

See: Sony music vs some guy playing piano on youtube.com.

I mean, SONY did have a reasonably valid claim to that copyright, even if the work SONY did to modernize the piece was obvious and trivial.

Copyright terms ought to be much shorter. This is the root of the problem.

I've had something like this happen to a customer of mine. Some of his content posts were being copied by random sketchy sites, then backdated by a couple hours. Google would show the copied content as the main link in Google news, and his actual article would be buried somewhere under "more like this" link (or whatever it was called in news).

Really interesting, this behavior implies that Google hasn't figured out a way to verify what a page states as its creation date.

This could be solved with cryptographic timestamps [1]. The current generation of tooling is a bit raw - cli utility and Python library - but fully usable on the publisher's end. If the practice picked up any traction, surely more front-end tooling would be created.

Example workflow:

- write a blogpost

- build a canonical representation based on a simple HTML or XML template; including canonical URL, title, content, authorship (emails, social media URLs, etc.), optionally related multimedia in a Zip archive, perhaps inlined via data URL

- alternatively, re-purpose a common Web scraping format, like the one used by Firefox, for canonical representation

- submit the canonical representation for timestamping [2]

- provide a <meta name="publication-timestamp" content="canonical representation url"> in the document as published, to facilitate automatic and manual verification

- upon dispute, provide the canonical representation and verify in the relevant timestamping services

[1] for example http://opentimestamps.org/

[2] from experience, the processing takes under 30 minutes with OpenTimeStamps
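To make the "canonical representation" step a bit more concrete, here is a minimal sketch in Python. It assumes a JSON serialization rather than the HTML/XML template suggested above, and the field names and example URL are made up for illustration. It only computes the digest you would hand to a timestamping service; it does not talk to OpenTimestamps or any other service.

```python
import hashlib
import json


def canonical_bytes(url: str, title: str, author: str, body: str) -> bytes:
    """Serialize a post deterministically so every verifier gets the same bytes."""
    record = {"url": url, "title": title, "author": author, "body": body}
    # sort_keys and fixed separators remove formatting differences between runs.
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def digest(data: bytes) -> str:
    """SHA-256 hex digest: the value that would be submitted for timestamping."""
    return hashlib.sha256(data).hexdigest()


post = canonical_bytes(
    url="https://example.com/posts/1",  # hypothetical canonical URL
    title="My post",
    author="author@example.com",  # hypothetical authorship field
    body="Hello, world.",
)
print(digest(post))
```

The point of fixing the serialization is that any later edit, even whitespace, yields a different digest, so the author has to keep the exact canonical bytes around for a dispute.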

I agree that we need cryptographic timestamps. Seems like this would be a good use for blockchain, but any implementation would be better than nothing, as long as it's not fully centralized.

The proposed workflow seems overly complex, though. For authorship verification all you need is ability to submit a block of text and receive a signed timestamp.

  Input: text
  Output: timestamp + signature
  Verification input: text + timestamp + signature.

The simpler it is, the more universally it will be used. It should not take 20 minutes either. The crypto involved should be pretty trivial.

Any metadata could be built into the text itself as comments or hidden fields. However, this will not verify the URL the data is coming from, which might be desirable.
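For what it's worth, the text-in, timestamp-plus-signature-out service could be sketched roughly like this. This is a toy under loud assumptions: it uses an HMAC with a shared secret from the Python standard library as a stand-in for a real signature, so only the service itself can verify. A real deployment would sign with a private key so that anyone could check the result against the public key. `SERVICE_KEY`, `issue`, and `verify` are names invented for this sketch.

```python
import hashlib
import hmac
import time
from typing import Optional, Tuple

SERVICE_KEY = b"demo-only-secret"  # hypothetical; stands in for a real signing key


def _message(text: str, ts: int) -> bytes:
    # Bind the timestamp to a hash of the text, so neither can be swapped out.
    return hashlib.sha256(text.encode("utf-8")).digest() + str(ts).encode("ascii")


def issue(text: str, now: Optional[int] = None) -> Tuple[int, str]:
    """Input: text. Output: (timestamp, signature). Nothing is stored."""
    ts = int(time.time()) if now is None else now
    sig = hmac.new(SERVICE_KEY, _message(text, ts), hashlib.sha256).hexdigest()
    return ts, sig


def verify(text: str, ts: int, sig: str) -> bool:
    """Verification input: text + timestamp + signature."""
    expected = hmac.new(SERVICE_KEY, _message(text, ts), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)


ts, sig = issue("my article text")
print(ts, sig, verify("my article text", ts, sig))
```

Note that the service is stateless: it keeps no record of what it signed, and verification just recomputes the tag from the three inputs.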

The second kind of service could be verifiable archival service.

  Input: URL.
  Output: page content after rendering + timestamp + signature.
  Verification input: text + timestamp + signature.
  The service stores: timestamp + signature.

This could be used to verify that at time X URL Y contained text Z. It doesn't matter whether the text is dynamic and may change later, since you could keep the original page content.

> It should not [take?] 20 minutes either.

Fair enough; the tech I know and mentioned (OpenTimestamps) is tied indirectly to Bitcoin's blockchain confirmation speed, thus both slow-ish, and also probabilistic rather than guaranteed. On the other hand, it gives good assurance that the tool will be around for a long time, and able to survive any single service provider's failure.

> The crypto involved should be pretty trivial.

Depends on the requirements. The tech I know and use (OTS) is privacy-conscious and handles timestamping of documents without revealing their actual content. Instead you submit hash(hash(document) + random, private IV) [1], which provides for both anonymous use, and ability to keep the signed content private, as the signed hash cannot be directly tied to any plaintext. This isn't strictly necessary for publicly-posted web pages, but certainly is useful for private documents, security-sensitive documents, etc.
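A rough illustration of that blinded submission, in Python. This is only a sketch of the hash(hash(document) + random value) idea as described here, not the actual OpenTimestamps file format or API; the function names and the 16-byte nonce length are assumptions.

```python
import hashlib
import secrets
from typing import Tuple


def commit(document: bytes) -> Tuple[str, bytes]:
    """Return (commitment, nonce). Only the commitment is sent for timestamping."""
    nonce = secrets.token_bytes(16)  # random, private value kept by the author
    inner = hashlib.sha256(document).digest()
    commitment = hashlib.sha256(inner + nonce).hexdigest()
    return commitment, nonce


def reveal_matches(document: bytes, nonce: bytes, commitment: str) -> bool:
    """Later, the author reveals document + nonce and anyone can recompute."""
    inner = hashlib.sha256(document).digest()
    return hashlib.sha256(inner + nonce).hexdigest() == commitment


commitment, nonce = commit(b"draft of my article")
print(commitment)
print(reveal_matches(b"draft of my article", nonce, commitment))
```

Because of the random nonce, two commitments to the same document look unrelated, so the timestamping service cannot tell what was submitted or link it to a published text until the author chooses to reveal.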

All in all, it's a starting point. Another possible approach would be to standardize submission to, and verification through a service like https://archive.is/, but that puts all trust into one 3rd party.

> Input: [plain]text

Directly timestamping the plaintext fails badly if the timestamping service provider is dishonest and creates an earlier timestamp for the content in their name, only to turn around and accuse the author of stealing the content. Or - a slightly paranoid scenario, but probably warranted in the post-Snowden era - if a whistleblower tries to timestamp some materials before distribution, but the timestamping service provider's filters catch the document and instead alert the government. Ergo, I'd prefer a blind timestamp, with the ability to reveal at the discretion of the author or whistleblower, even at the cost of more complex workflow.

[1] sadly this means the author needs to keep a file containing the IV and a bit of other metadata.

All good points, although there are some arguments that can be made in favor of more straightforward protocol even if it's less secure. I agree that submitting plain text could be problematic for some use cases. At the same time, there are some cool things you could do with plain text, like using the same mechanics for content-addressable content.

Another option could use TLSNotary to verify the content at time of publication, built into the CMS/publishing platform


Google crawls the open web non stop. They definitely have this data. It's just not exposed to the (legal) team dealing with DMCA takedown requests.

EDIT: Or as to the example above: whatever application is responsible for picking the main article to show somewhere.

They crawl the web non-stop yes, but how often an individual site is crawled varies.

> Our crawl process is algorithmic: computer programs determine how often we should crawl each site. If you post new articles on your site throughout the day, you should be able to see them appear on Google News fairly quickly.

> [...]

> From the moment we discover a new article, we'll keep re-crawling it looking for changes. Since we noticed that most changes to articles occur just after they're published, we revisit articles most frequently in the first day after we've found them. After that, we visit them less often.


Fair point, if your content gets copied to unknown website X which Google hasn't crawled for days they might not be able to tell where the article appeared first.

Have you figured out a way to verify what a page states as its creation date?

Inside Webmaster Tools is a “Fetch” tool. It allows you to submit any page to Google. You can then also have Google index the page. When you fetch, a timestamp is recorded along with a full HTML copy of the page.

I am not sure if Google uses this to prove which is the original copy. Most website owners don’t know this exists. So it’s hard to use it as proof of the original, because the original creator may not even use the tool.

However, if everyone started to submit key articles, then it theoretically could compare the two fetches and decide which is earlier. It could be automated too with sitemap uploads as you post content.

Moreover, you could then use this as evidence in a DMCA take down that your copy is the earliest. Anyways, there are ways to prove earliest content.

That's my point. There is no way to verify the date assertion of an arbitrary page.

The problem is akin to standing in a crowd where some people hold a sign with their age on it and others do not hold a sign, then trying to verify the age of an arbitrary person.

You can prove in many ways that your content existed at a given time. But it seems quite impossible to prove that a copy of it didn't exist earlier somewhere else.

edit: unless the content itself contains a proof like the latest block hash or something like that, but that could be edited

Yes, this appears to be the date google shows next to the search results.

Every time I write a blog post, I submit manually via webmaster tools "Fetch as Google" so that it gets indexed immediately. That way, the date google shows matches the date I published the article.

Ignore the claimed creation date, and use the earliest date that you know for certain, i.e. the time you crawled it.

That only works if you happen to crawl the original first. Given that copies could be plentiful that's a losing game.
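A tiny sketch of the "first crawl time wins" idea and of why it is fragile. Everything here (the `FirstSeenIndex` class, the URLs, the integer timestamps) is hypothetical and says nothing about how Google's crawler actually works.

```python
import hashlib
from typing import Dict, Optional, Tuple


class FirstSeenIndex:
    """Remember, per content hash, the earliest crawl time and the URL seen then."""

    def __init__(self) -> None:
        self._first: Dict[str, Tuple[int, str]] = {}

    def record(self, url: str, content: str, crawled_at: int) -> None:
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        # Ignore whatever date the page claims; trust only our own crawl time.
        if key not in self._first or crawled_at < self._first[key][0]:
            self._first[key] = (crawled_at, url)

    def earliest(self, content: str) -> Optional[Tuple[int, str]]:
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        return self._first.get(key)


index = FirstSeenIndex()
# The copy happens to be crawled before the original is revisited.
index.record("https://copy.example/a", "article text", crawled_at=100)
index.record("https://original.example/a", "article text", crawled_at=200)
print(index.earliest("article text"))
```

In this run the index credits the copy, because the crawler reached it first. That is exactly the losing game described above.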

Content creators signing their content with a private key would be a better idea.

> Content creators signing their content with a private key would be a better idea.

How would that help? Wouldn't the copier simply sign with their own key?

Maybe Google should run a timestamping service. It could be stateless: send a hash of the content, get it back timestamped and signed, then include it in the page before publishing it.

They could, but it would not be registered before the other one would be.

- create signature

- post signature, possibly publicly, (wow, maybe an actual use for a blockchain)

- post content

Anybody copying it could be proven to be a copyist.

Right, you'd need something like the blockchain - ie, a timestamping server - to validate the posting date. But then you don't need a signature, just a hash. The advantage of using Google's timestamping service is that it wouldn't need to store the hash/signature, unlike the blockchain.

No, you don't need blockchain for that, just a good old RFC 3161 timestamping server, most big CAs run one and timestamp stuff for free.

Actually secure timestamping is already used by Windows to verify software signatures, even when the signing key expires: https://security.stackexchange.com/questions/47289/how-is-an...

Heck, you can even use Roughtime [0] to achieve secure timestamping as it uses digital signatures and allows you to provide a nonce. Even the design doc [1] mentions this use case:

> This means that Roughtime replies can be used as a time-stamping service, and thus that Roughtime servers can be used to check each other.

[0]: https://blog.cloudflare.com/roughtime/

[1]: https://roughtime.googlesource.com/roughtime

- Change "a" to "а", hash changed - ... - Profit?

The hash is just to identify the individual article, so it can be timestamped. Google would still use its existing algorithms to decide if the two articles are copies or not.

The publication of the hash proves who came first (to the timestamping service). Content comparison can be done by other means than the hash.
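To illustrate both halves of this exchange: swapping one Latin "a" for a Cyrillic "а" gives a completely different hash, but an ordinary similarity measure still sees near-identical text. The sketch below uses `difflib.SequenceMatcher` from the Python standard library purely as a stand-in for "other means"; it is an assumption for illustration, not what any search engine uses for duplicate detection.

```python
import difflib
import hashlib

original = "a quick brown fox jumps over a lazy dog"
# Replace the first Latin "a" with the look-alike Cyrillic letter U+0430.
tampered = original.replace("a", "\u0430", 1)


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def similarity(x: str, y: str) -> float:
    """Ratio in [0, 1]; 1.0 means the strings are identical."""
    return difflib.SequenceMatcher(None, x, y).ratio()


print(sha256_hex(original) == sha256_hex(tampered))  # hashes no longer match
print(similarity(original, tampered))  # but the texts are still almost the same
```

So the hash only identifies which exact bytes were timestamped; deciding that two articles are copies of each other stays a separate fuzzy-matching problem.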

How could that possibly solve the problem of determining exact creation dates?

Not sure why you are being downvoted. If we can't do it manually we can't automate it.

No, I haven't, and in a sense I'm relieved. I've certainly thought about it a lot and am glad that people smarter than me haven't figured it out.

What did you and the client do then?

Google is crawling all over the web but cannot detect these as fraudulent requests? That seems like an easy check. Do we have an index of this domain? Was this backdated?

But they could be blocking it with robots.txt or some other excuse. It seems like an incredibly rare exception, what news site would block being crawled and indexed? They also are archiving domain info I'm sure. If they wanted to stop this, they have the data.

I don't know, is "we've never seen this page on your site before" a valid defense against a DMCA claim? If I understand DMCA correctly, ignoring the claim means risking a lawsuit if you're wrong about the content not being owned by the claimant.

I lost a domain a while ago. It was quickly snapped up by a squatter, who promptly repopulated it with the exact content it previously contained (harvested from the Wayback Machine). They made a single change to the content, adding a link on the home page to some supplement blog.

So it's my content, on a domain I no longer own or control.

How could I prove to Google that the content is mine?

DMCA doesn't require proof. The fact you are attesting under penalty of perjury is enough. If they contest it then your only remaining remedy is a court order and that would require proof. The wayback machine may be sufficient.

I recently learned that this is due to a somewhat shady SEO trick referred to as Private Blog Networks.

You take over expiring legitimate domains, and ideally immediately (before Google notices) put up almost the same content, but with a link to whatever site you're actually interested in giving better search ranking to.

Repeat this 1000x and you can apparently make enough headway that people put time and money into the approach.

> That seems like an easy check. Do we have an index of this domain? Was this backdated?

How do you know it's backdated if, by the time you crawl, the articles have appeared on both sites? How do you know which was published first? This attack can be performed faster than the re-crawling frequency.

It could be, but from the article that came out, it seems like it's a lot slower. This example was over a year later.

They surely have both the capability and data to thwart this type of fraud. There are only two possibilities that I can think of.

One is that it's a business decision, where they know these cases may exist but they took the decision that it's not worth the extra overhead. Second possibility is that it's an oversight, essentially a "bug" in the system.

Either way, it appears that Google had not yet identified this as a serious problem.

I give credit to them for having a Transparency Report so that others can pick up fraud and abuse which they themselves may not have already found to have been a problem; an essential safeguard. I also view projects such as Lumen as being of increasing importance.

> Google is crawling all over the web but cannot detect these as fraudulent requests?

Google crawls surprisingly little of the web, even on web sites it knows about. That’s why web developers have to use site indices and Google Webmaster Tools to tell G that stuff exists on their sites, hoping that it will be discovered and indexed.

The Torrence Boone thing is quite remarkable.

Considering a fake DMCA carries the same penalty as perjury (correct me if I'm wrong) then I would have expected Google to have taken some action. A suspension pending an internal investigation or something similar.

Have they made any public statements on the matter?

>Considering a fake DMCA carries the same penalty as perjury (correct me if I'm wrong)

My understanding is that regrettably you're wrong, definitely at least in practice if not in theory. First on the theory side of things, Warner Bros at least argued in the countersuit Hotfile filed against them for abuse of takedowns that when it came to the DMCA:

>"The DMCA’s language applies only to impersonating a copyright owner or sending notices on their behalf without authorization; mere misidentification of the files being taken down or the works represented therein are at most incorrect ‘statements’."

Unfortunately the settlement Hotfile made in the original suit against it by the MPAA also ended its countersuit, so Warner Bros' arguments were not put to the test. If anyone else knows of a subsequent case that'd be good to read, but I can't remember any. If there's nothing at all then hard to say either way, but there is at least a strong colorable argument that it either doesn't mean what a lot of people think it should period or it's too ambiguous to have any real teeth.

And speaking of real teeth anyway, there do not seem to be any fixed consequences for perjury, and I do not think there has ever been a single prosecution for it in the now 20-year history of the law. In fact the sole and only victory of any kind I can remember against even a flagrantly abusive act of political censorship was years ago, when WordPress won a default judgement against "Straight Pride UK" [1] (which had already gone dark of course). Lenz won vs UMC, but I don't think Lenz won any actual damages? And further the standard set there was that actual bad faith by the rights holder would need to be shown, which is yet another hurdle. I think only something like Diebold rose to that level. Edit: ubernostrum beat me to it and mentioned Diebold and that perjury is about who you're representing, not what you're asserting, too.

So at the end of the day there just has never seemed to be any significant consequences to abuse (or "mistakes"), and the result has been as would be expected given the incentives and interests involved.


1: https://www.theverge.com/2015/3/9/8175491/wordpress-automatt...

Wouldn't this qualify as impersonating a copyright owner, though?

I'm not a lawyer and I honestly don't know exactly how to parse WB's argument there, it really needs to go to trial. If interpreted liberally does it mean that anyone representing any copyright owner at all, so long as they're not lying about who they're representing even if they're completely lying about what they own, passes? I hope not! But I can't tell, it seems like it could if the perjury language is interpreted solely and exclusively to apply to the statement of who is requesting action, not at all about the substance of said action? I would hope it'd go to trial and result in precedent, but from the article it sounds like they settled, which to be clear is totally understandable from a client-first perspective. Not everyone wants or can afford to fight the good fight at any expense. But I don't know where that leaves the "perjury" side of things specifically, beyond the same take home we've had for a while: the DMCA's process is too prone to abuse.

On the other hand it definitely would appear to be a slam dunk on the bad faith part, but again there don't seem to be any really clear, really significant penalties on that side. Someone who goes through all the massive expense and trouble of challenging it could win 100% of the time, which would mean the content could be reinstated. But can they even get all their expenses back, let alone enough multiplier penalties to deal with missed enforcement or get lawyers interested in doing it on contingency? At some level of abuse courts might have other tools, maybe it can start to get into other higher level laws covering fraud or something, but that seems like it'd be a harder case by definition vs specific language of the direct law in question. Even if it's clearly defeatable, if the cost is high enough that may not matter in practice.

One thing though that is always possible is to at least generate political outrage, and in that case the more egregious and the more clearly they get away with it the better. There have in fact been multiple attempts to reform the DMCA in Congress including some really good language, but none of them have gotten through. That doesn't mean it's not worth continuing to bring up when the right examples present themselves that are extra photogenic. I've printed out some excerpts of this and I'm going to send it to my rep and Senators tomorrow along with some other collected concerns and a list of previous legislative efforts. I don't expect any action in an election season but sometimes pebbles add up. All the protection laws we have didn't come out of nowhere and didn't come up with zero opposition from entrenched interests after all.

> Considering a fake DMCA carries the same penalty as perjury (correct me if I'm wrong)

The only part of a DMCA takedown that's under penalty of perjury is the assertion that you are the copyright holder, or are authorized to act on behalf of the copyright holder.

The infamous Diebold case -- where they did get penalized for a false DMCA notice -- involved Diebold more or less admitting openly in court "yeah, we knew it was wrong but we sent a notice anyway". Which means it's extremely difficult to get someone on a false DMCA notice.

> The only part of a DMCA takedown that's under penalty of perjury is the assertion that you are the copyright holder, or are authorized to act on behalf of the copyright holder.

But that's exactly the fraudulent claim, and unlike other cases that are understandable genuine mistakes (even though the mistakes would be avoidable if the claimants put a reasonable amount of care into checking their claims), in this case, it's clear that the claimant knew the claim was false, and they falsified documents to support that claim.

I highly doubt courts would appreciate this sort of behavior, it's just that the victims of the censorship likely don't know, don't care, don't have the resources to fight, or don't catch it before the statute of limitations expires.

Unfortunately, it seems like the maximum penalty for perjury is five years, and I assume it's rare that anything near that is imposed, so I guess someone decided it's worth it.

Isn’t this a case where they’re falsely claiming to be a copyright holder?

I think it's only perjury if you're making the claim in court. If you're making the claim to Google, you're just lying.

I saw Scottsdale, Arizona mentioned and straightaway remembered that GoDaddy is based there. Coincidence?

Edit: actually, answering my own question -- when a domain is reg'd at GoDaddy and privacy is enabled, it's then "owned" (in the whois sense) by DomainsByProxy at GoDaddy.

It seems almost impossible to reliably backdate articles, especially if supposedly original domain name wasn't even registered at the time the supposed copy was posted.

The only scenario I could think of is claiming Google didn't index the original because it was blocked by robots.txt. Apart from that, how could Google not know the original date of publication?

I guess Google doesn't have systems in place to do this verification and they just remove from the SERPS anything and everything that has a DMCA violation, just or otherwise.

Another scenario I've witnessed myself is someone using a copyright free image (Unsplash) and then someone else claims a violation, saying they are the actual copyright holder and it was fraudulently posted on Unsplash by a third party. Google removes your homepage from serps, traffic tanks, business ends and there's no way you can even fight the decision. In the end who knows who's the actual copyright holder?

> The only scenario I could think of is claiming Google didn't index the original because it was blocked by robots.txt. Apart from that, how could Google not know the original date of publication?

What if the article wasn't even online? I.e., some journalist "wrote" the article for some local newspaper without internet presence.

Would it be possible for content providers to send a URL to Google, basically saying "I made this, here's a timestamp to prove that I actually released it first", so that whoever copies it and backdates it won't have any reliable proof they actually published it first?

Whenever I publish something I made, I submit the URL to the Wayback Machine so if it gets hugged to death at least there's a public copy somewhere else.

There are many secure RFC 3161 timestamping services. You send them a hash, they timestamp and sign it and send it back. Some services are free, some cost money. Unfortunately, there's no way to express timestamps of web pages in a form Google understands. And it would only be meaningful for static pages, anyway.

> I made this, here's a timestamp to prove...

A timestamp can't prove anything. What stops somebody else from copying your content, but timestamping it earlier and submitting it as proof? How can google verify it?

In fact, google may have an incentive to allow this - since those scammy sites may actually be using google ads, while legit sites may have their own ad network which google may not be part of...

By having the timestamp be signed by a trusted party. Either a single entity or e.g. the blockchain

This is basically the purpose of copyright registration and filing your work with the Library of Congress

I agree that would be a potential solution but it's too far removed from reality to even be a consideration imo.

You'd need some sort of negative affirmation as well. "when I published this article, searching for it yielded nothing" signed by a trusted party would be a stronger statement. But no clue how you actually do that.

Even if the third party reliably proves your article is published october 1, what prevents the scammer from publishing an article with a fake date of june 21 on plain www-internet? It's not like the scammer is going to publish the article hash on any block chain.

Something like this can only work if all news publishers always publish the article hash and those that don't will simply not get crawled and not be valid as proof in DMCA takedowns.

Timestamp before you publish.

Yes, but not with Google. This can be done with the Bitcoin blockchain. Before you groan, hear me out. You can submit a transaction which contains a hash of your content. Once it's on the blockchain you can prove that the content was created no later than the date of the transaction.

.. Which does nothing against a backdated article. What stops me from copying a NYT article and posting it to the blockchain a few hours after, claiming it's mine?

What makes the bitcoin blockchain good for this? The bitcoin blockchain is good for posting transactions within the bitcoin network and kind of a shitty solution for "a generic blockchain"

Why a blockchain at all? Why not just a merkle tree? Surely there is somewhere better than the bitcoin blockchain for this. CT logging isn't done on the bitcoin blockchain for a reason.

So sure, you can prove that something wasn't created after the date of the transaction (assuming you're using a cryptographically secure hash function), but for this to work in this specific context, you'd need every publisher to be posting transactions to this log.

If I post an article and post a hash to $genericBlockchainWhichIsReallyJustAMerkleTree, what prevents someone from copying, backdating, and saying that I copied them? That they didn't post to the log as well?
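For readers wondering what "just a merkle tree" buys you, here is a minimal sketch of a Merkle root plus an inclusion proof in Python. It is a simplified scheme (SHA-256, last node duplicated on odd levels, no separate prefixes for leaves and inner nodes), not the exact construction used by Certificate Transparency or Bitcoin, and the function names are made up. It assumes a non-empty list of leaves.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, True if sibling is on the left)


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves: List[bytes], index: int) -> Tuple[bytes, Proof]:
    """Return the root and the inclusion proof for leaves[index]."""
    level = [h(leaf) for leaf in leaves]
    proof: Proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        sibling = index ^ 1  # the neighbour sharing the same parent
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_inclusion(leaf: bytes, proof: Proof, root: bytes) -> bool:
    """Recompute the path from the leaf up to the root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


articles = [b"article one", b"article two", b"article three"]
root, proof = merkle_root_and_proof(articles, 2)
print(verify_inclusion(b"article three", proof, root))
```

A log operator only needs to publish (and have witnessed) the root; each publisher keeps a short proof that their article hash was included. As the comment above says, this shows an article existed by the time the root was published, and nothing about a copy that never entered the log.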

Agree that bitcoin blockchain is unnecessary, but isn't there some merit in the idea that all major publishers in a jurisdiction would opt-in to posting to a single chain? Given the potential legal cost savings it would seem to be at least somewhat feasible.

Oh absolutely. But I don't know how much of a "blockchain" that really is depending on how you define it. When I think blockchain, I think miners/proof of work/etc.

Is a CT log a blockchain? Because really what we're talking about here is a signed ledger which may be decentralized but very well may be centralized as well.

> What makes the bitcoin blockchain good for this?

I can use it for decentralized time stamping right now.

You should really educate yourself on the bitcoin blockchain and how it records time before making a fool of yourself by trying so desperately to dismiss it.

Problem is, the evil SEOs backdate, therefore he needs to prove that the content was created no earlier - thus IIUC, blockchain/Merkle tree can't help him...

Depends on who has the burden of proof? The evil SEO cannot prove his version was created earlier than the author's.

Any third-party tool like PasteBin would work in a similar way and incur no transaction fee.

There is no way to know that someone at PasteBin wasn't paid off.

Are you saying that you'd be willing to host such a blockchain on your machine with timestamps and hashes for every article in the Internet?

The space before punctuation thing might simply indicate a European writer. Several countries in Europe do that deliberately and I notice it with a lot of my colleagues.

It's interesting to note also how they appear to use the Fox News trademark while having no apparent affiliation. Also, although I don't know suite numbers, it's interesting that the address on the domain registration looks to be very close to a UPS store.

Are you sure about that space thing?

I live in Europe and I have been browsing the Internet for 20 years now, and this is the first time I hear about it.

Also, I visit plenty of European news sites, and none does that.

It seems to be pretty common among the French at least. I haven't seen it in print media, only from individuals.

In French, no space before a comma or a period (unless you want to explicitly communicate your disdain for computery things or you’re above 60) but we do put a space before : ? !

Example: « Pas de palais : pas de palais ! »

(^Quote from the Asterix movie that will be way more useful to you for making French friends in a bar than saying « I can’t speak French » in French)

This is actually a really interesting example of the "space before punctuation" thing rather than " space before punctuation ".

Also: it looks like we're using colons differently, is that right?

We use colons for the same reason as in English, but there is a space before colons in French (Microsoft Word inserts it automatically for you in French), and no uppercase after the colon.

It's a somewhat common error, but I don't think it's ever used deliberately, especially in any professional setting.

French typographic rules prescribe a space before ‘!’, ‘;’, ‘:’, and ‘?’.

Interesting. I've never realized it. I learned French in school but couldn't remember learning about that.

Example: https://www.gouvernement.fr/argumentaire/parcoursup-la-plate...

To be super-pedantic, the correct character is a non-breakable half-space (Unicode U+202F, NARROW NO-BREAK SPACE). It actually makes sense: it shouldn't be breakable, or the text layout would be ugly (the colon starting on a new line?). And it should be reduced in size, because indeed you don't want a full space before a colon...

This article is using the international quotation marks, though. Proper typography would be to use « and », which also require non-breaking spaces before and after them.

The rule for punctuation is that the nbsp is before the double punctuations, those with two elements (?!;:).
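
For anyone who wants to try the rule out, here is a minimal Python sketch, not a complete typographic engine. The function name `french_spacing` is made up for illustration. It assumes U+202F before ; : ! ? as described above, and it assumes a regular no-break space (U+00A0) inside « », which is one common convention and not something settled in this thread. It only touches punctuation that follows a word character and is followed by whitespace or end of text, so URLs like `https://` are left alone, and combinations like `?!` are not handled.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, used before ; : ! ?
NBSP = "\u00a0"   # NO-BREAK SPACE, assumed here for the inside of « »

SPACES = "[ \u00a0\u202f]*"  # any run of ordinary or no-break spaces


def french_spacing(text: str) -> str:
    """Apply French spacing before double punctuation and inside guillemets."""
    # Word character, optional spaces, then ; : ! ? followed by whitespace or end.
    # The lookahead keeps "https://..." untouched because ':' is followed by '/'.
    text = re.sub(r"(?<=\w)" + SPACES + r"([;:!?])(?=\s|$)", NNBSP + r"\1", text)
    # Normalize the space right after an opening guillemet.
    text = re.sub("«" + SPACES, "«" + NBSP, text)
    # Normalize the space right before a closing guillemet.
    text = re.sub(SPACES + "»", NBSP + "»", text)
    return text


if __name__ == "__main__":
    # repr() makes the otherwise invisible spaces show up as escapes.
    print(repr(french_spacing("« Pas de palais : pas de palais ! »")))
```

Periods and commas are deliberately left untouched, which matches the "no space before a comma or a period" point made earlier in the thread.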

With all due respect, the idea that "several countries in Europe add a space before a full-stop" is bizarre. Which countries? Have you had a look at news media websites from those countries to check? At the very most it's a mistake more common among European writers that you've worked with, but I can assure you it's not deliberate. Also note that in the examples given the space before the full-stop also means no space after it, which indicates a misplacement rather than an extra space.

I have a large number of French colleagues, so it's possible I've mistaken it for being more widespread than it really is.

I often see this space before punctuation with Chinese writers of English, though not very advanced ones.

Additionally, some individuals may intentionally include such errors to render themselves more difficult to identify.

Space punctuation is something I see from kbase article writers from India often.

>Businesses have become increasingly creative in their attempts to misuse the DMCA to remove negative reviews from the Internet. They have gone to great lengths to falsely claim copyright infringement with the intent of taking down content from Google’s search results and review sites.

I dream of a world where companies doing any consciously shady shit like that are automatically slapped with a huge (1/2 their annual revenue would be a good start) fine, even if it's nominally legal, as long as a jury of experts decides their actions were harmful to society/transparency/the environment/etc.

Why have laws at all? Why not just let a "jury of experts" punish anyone in any way they please? /s

Laws already codify experts' opinion of what should be punishable and how (both legal and domain experts).

The "jury of experts" role wouldn't be to punish "any way they please" but to do what is implied by their choice and description (as experts): to rule in accordance with their domain knowledge. And of course their job would be judged in turn.

If a society can't trust a jury of experts on a domain to clearly and effectively judge issues related to their expertise, then it has bigger problems, and laws wouldn't save it from them.

Any settlement without first going to court will be abused and exploited. DMCA is just a bad law.

Practically all the links (e.g. the ones that were allegedly restored like the Adweek article on Torrence Boone) were still removed.

A similar thing seems to be going on on YouTube.

A lot of the direct DMCA problems come from improper use needing to be demonstrably wilful in application. When algorithms send out these notices, you get some plausible deniability.

But setting up a news site to inject a backdated and infringing article would seem to be several moderately serious criminal offenses.

If someone does this to you, push back hard. They're facing prison and fines.

The DMCA has been abused quite a lot. How many people in history have been sent to prison? Can you find one?

The laws in the US are laws to control the behavior of the weak not the strong.

My point was this is very far beyond the typical takedown abuse.

Creating (itself infringing) content on a for-purpose site ticks the "wilful" requirement of the DMCA's defined requirements for perjury.

Links in the article are broken. Also DMCA takedown related?

It happens because making fakes is free but fighting them costs money. This problem can be solved if Google search results were driven by rules/smart contracts rather than by the income of Google's shareholders.

> This problem can be solved if Google search results were driven by rules/smart contracts

How would that fix the issue? It wouldn't change legal obligations, AFAIK. Confused.

Yes, introducing SERP rules implies obligations. The obligations could be enforced by legal means or by a union of “angry” users who stop fighting each other over the SERP and start fighting Google instead, which treats them as useful idiots.

To be honest I believe DMCA is an entirely negative law. Even the legitimate strike claims were used in ways that help no one, sometimes hurting even the person making the claim in the first place. For example, Nintendo has been so aggressive on YouTube, taking down things like videos of conventions that happened to have a Nintendo trailer or game playing somewhere on camera, that journalists stopped covering Nintendo because doing so causes strikes against you.

They really need a reverse three strikes. Once you make three blatantly fraudulent claims, you are banned from making any more claims. Automated claims should not be allowed ever, they should all be verified by a human.

Alternatively, if you want to go full accelerationism, people should start abusing the system more and spamming fake takedowns on anything they don't like, completely flooding it and getting legitimate content of important people taken down. There is probably no other way anything will change.

Isn't a false DMCA claim a criminal offence? The question is whether it's realistic that someone will be prosecuted over this.

Only if you can't claim it to be a mistake, unfortunately.

It would be worse without it. Google would be liable for copyrighted material, and likely would preemptively ban suspected material to avoid liability.

What exactly are you basing that on? The previous law said nothing about hosters being liable for any copyright.

There was no previous law with respect to the internet. The specific purpose of the safe harbor provisions of the DMCA was to exempt ISPs and content providers from liability for hosting copyright infringing material.


The law previous to DMCA was "common carrier" regulations.


I don't think that applied to most services, though, including Google.

DMCA notices are required to be sworn statements under penalty of perjury. It sounds like that isn’t being enforced at all by the DoJ.

Just another example of how a corrupt government can selectively enforce laws to give its crony friends a leg up.

Normally, DMCA notices for non-infringing works can't trigger the perjury clause, because the lawyers really do represent the owner of the copyrighted work that's allegedly infringed, it's just the infringement claim itself that's bogus.

Here, if the article is correct, they're actually claiming to own works they do not own at all. So if they get called on it, they actually could end up having to answer perjury charges. The real question is who gets to enforce that? I'm not sure if it's the DoJ, you might have to bring a civil suit for slander to title first to get a criminal referral or something. I really don't know how that's supposed to play out because I think I've only seen one case where someone complained under the perjury clause and they settled, so it was never a criminal matter.

What if the company that sent the notice is outside the US?

In my experience, what happens is the company hires a representative inside the United States to file on their behalf.

There is an organization inside Belgium that hired a law firm in New York to file claims against a web site I once owned in Texas to remove people’s vacation photographs taken in Belgium, under threat of legal action.

(This was pre-DMCA. The web site was sold long ago and is now defunct.)

> To be honest I believe DMCA is an entirely negative law

I’m not a huge fan of DMCA, but it’s not entirely negative.

Without DMCA, there would be no YouTube, Imgur, Twitter, or pretty much any other site that allows the public to upload content.

It’s an ugly compromise, but one that allowed a significant portion of the web to exist and grow.

Hopefully it will eventually be replaced with something more nuanced.

> there would be no YouTube, Imgur, Twitter, or pretty much any other site that allows the public to upload content.


Because they themselves would be liable for copyright infringement instead. Servicing DMCA lets them run safely, as long as they respond to takedown notices quickly, and the complaining party usually won't bother going after the individual that posted the infringing work.
