Now comes the interesting part: Google won't even autocomplete "torrence boone enfatico" — Shame upon him who thinks evil upon it...
If you've been wondering about the moral decline at Google, this is the kind of person they hired as top management.
I'd argue it's actually a good idea to keep your LinkedIn profile looking exactly like that, unless you've decided you need a polished profile right now for "advertising" because you're job-hunting and plan to use LinkedIn for it, which many job-seekers won't do.
Then again, it's not like he's shy about clearly lining up all of his jobs before the Enfatico disaster or his Harvard/Stanford alma maters.
It clearly constitutes a form of libel against the victims falsely accused of a civil violation. The lawyers involved are violating their ethical duty as court officers and should be charged with barratry. The ACLU/EFF ought to be pursuing this tack to curb the abuse.
Another example would be reporting content or people for violations they did not commit.
Copyright terms ought to be much shorter. This is the root of the problem.
- write a blogpost
- build a canonical representation from a simple HTML or XML template, including the canonical URL, title, content, and authorship (emails, social media URLs, etc.), optionally with related multimedia in a Zip archive, perhaps inlined via data URLs
- alternatively, repurpose a common Web scraping format, like the one used by Firefox, as the canonical representation
- submit the canonical representation for timestamping
- include a <meta name="publication-timestamp" content="canonical representation URL"> tag in the document as published, to facilitate automatic and manual verification
- upon dispute, provide the canonical representation and verify in the relevant timestamping services
 for example http://opentimestamps.org/
 from experience, the processing takes under 30 minutes with OpenTimestamps
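The canonical-representation and hashing steps above can be sketched in a few lines of Python. This is a minimal illustration: the template shape, field names, and example values are placeholders I've invented, not any standard format.

```python
import hashlib

def canonical_representation(url, title, content, author_email):
    """Build a minimal canonical HTML document for timestamping.
    The exact template is a convention you define once and reuse."""
    return (
        "<html><head>"
        f'<link rel="canonical" href="{url}">'
        f"<title>{title}</title>"
        f'<meta name="author" content="{author_email}">'
        "</head><body>"
        f"{content}"
        "</body></html>"
    )

def digest_for_timestamping(representation):
    """SHA-256 digest of the canonical bytes; this hash (not the
    content itself) is what you submit to a service like OpenTimestamps."""
    return hashlib.sha256(representation.encode("utf-8")).hexdigest()

doc = canonical_representation(
    "https://example.com/post", "My Post", "<p>Hello</p>", "me@example.com")
digest = digest_for_timestamping(doc)
```

Because only the digest is submitted, re-running the same template over the same inputs at dispute time reproduces the exact bytes that were timestamped.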
The proposed workflow seems overly complex, though. For authorship verification all you need is the ability to submit a block of text and receive a signed timestamp.
Output: timestamp + signature
Verification input: text + timestamp + signature.
Any metadata could be built into the text itself as comments or hidden fields. However, this will not verify the URL the data is coming from, which might be desirable.
The second kind of service could be a verifiable archival service.
Output: page content after rendering + timestamp + signature.
Verification input: text + timestamp + signature.
The service stores: timestamp + signature.
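The issue/verify cycle above can be sketched with stdlib Python. Note the hedge: I use an HMAC as a stand-in for a real signature so the sketch is self-contained; a production service would use an asymmetric scheme (e.g. Ed25519) so anyone holding only the public key can verify. The key and message layout here are invented for illustration.

```python
import hashlib
import hmac
import time

# Hypothetical secret held only by the timestamping service.
SERVICE_KEY = b"demo-key-held-only-by-the-service"

def issue_timestamp(text):
    """Service side: sign hash(text) together with the current time."""
    ts = str(int(time.time()))
    digest = hashlib.sha256(text.encode()).hexdigest()
    sig = hmac.new(SERVICE_KEY, f"{digest}|{ts}".encode(),
                   hashlib.sha256).hexdigest()
    return ts, sig

def verify(text, ts, sig):
    """Recompute the MAC over hash(text)|ts and compare in constant time."""
    digest = hashlib.sha256(text.encode()).hexdigest()
    expected = hmac.new(SERVICE_KEY, f"{digest}|{ts}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

ts, sig = issue_timestamp("my article text")
```

The service only needs to store (or even just publish) the timestamp and signature; the author keeps the text.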
Fair enough; the tech I know and mentioned (OpenTimestamps) is tied indirectly to Bitcoin's blockchain confirmation speed, and is thus both slow-ish and probabilistic rather than guaranteed. On the other hand, it gives good assurance that the tool will be around for a long time, and able to survive any single service provider's failure.
> The crypto involved should be pretty trivial.
Depends on the requirements. The tech I know and use (OTS) is privacy-conscious and handles timestamping of documents without revealing their actual content. Instead you submit hash(hash(document) + random private IV), which provides both anonymous use and the ability to keep the signed content private, since the signed hash cannot be directly tied to any plaintext. This isn't strictly necessary for publicly-posted web pages, but it certainly is useful for private documents, security-sensitive documents, etc.
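The blinded-hash idea can be illustrated with stdlib Python. This is a sketch of the general commit-and-reveal construction, not OpenTimestamps' exact on-disk format:

```python
import hashlib
import os

def blind_commitment(document: bytes):
    """Commitment = H(H(document) || nonce). The service only ever sees
    the commitment, so the signed hash can't be tied to any plaintext."""
    nonce = os.urandom(32)  # the author must keep this nonce safe
    inner = hashlib.sha256(document).digest()
    commitment = hashlib.sha256(inner + nonce).hexdigest()
    return commitment, nonce

def reveal_and_check(document: bytes, nonce: bytes, commitment: str):
    """At dispute time the author reveals document + nonce; anyone can
    recompute the commitment and match it against the timestamped hash."""
    inner = hashlib.sha256(document).digest()
    return hashlib.sha256(inner + nonce).hexdigest() == commitment

commitment, nonce = blind_commitment(b"secret draft")
```

This is also why the author ends up keeping a small file of metadata: losing the nonce breaks the link between the document and the timestamped commitment.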
All in all, it's a starting point. Another possible approach would be to standardize submission to, and verification through a service like https://archive.is/, but that puts all trust into one 3rd party.
> Input: [plain]text
Directly timestamping the plaintext fails badly if the timestamping service provider is dishonest and creates an earlier timestamp for the content in their own name, only to turn around and accuse the author of stealing the content. Or - a slightly paranoid scenario, but probably warranted in the post-Snowden era - if a whistleblower tries to timestamp some materials before distribution, but the timestamping service provider's filters catch the document and instead alert the government. Ergo, I'd prefer a blind timestamp, with the ability to reveal at the discretion of the author or whistleblower, even at the cost of a more complex workflow.
 Sadly, this means the author needs to keep a file containing the IV and a bit of other metadata.
EDIT: Or as to the example above: whatever application is responsible for picking the main article to show somewhere.
> Our crawl process is algorithmic: computer programs determine how often we should crawl each site. If you post new articles on your site throughout the day, you should be able to see them appear on Google News fairly quickly.
> From the moment we discover a new article, we'll keep re-crawling it looking for changes. Since we noticed that most changes to articles occur just after they're published, we revisit articles most frequently in the first day after we've found them. After that, we visit them less often.
I am not sure whether Google uses this to prove which copy is the original. Most website owners don't know this tool exists, so it's hard to use it as proof of originality, because the original creator may not even use it.
However, if everyone started submitting key articles, then in theory Google could compare the two fetch times and decide which is earlier. It could be automated, too, with sitemap uploads as you post content.
Moreover, you could then use this as evidence in a DMCA takedown that your copy is the earliest. Anyway, there are ways to prove who published content first.
The problem is akin to standing in a crowd where some people hold a sign with their age on it and others hold no sign, then trying to verify the age of an arbitrary person.
edit: unless the content itself contains a proof like the latest block hash or something like that, but that could be edited
Every time I write a blog post, I submit manually via webmaster tools "Fetch as Google" so that it gets indexed immediately. That way, the date google shows matches the date I published the article.
Content creators signing their content with a private key would be a better idea.
How would that help? Wouldn't the copier simply sign with their own key?
Maybe Google should run a timestamping service. It could be stateless: send a hash of the content, get it back timestamped and signed, then include it in the page before publishing.
- create signature
- post signature, possibly publicly (wow, maybe an actual use for a blockchain)
- post content
Anybody copying it could be proven to be a copyist.
Actually secure timestamping is already used by Windows to verify software signatures, even when the signing key expires: https://security.stackexchange.com/questions/47289/how-is-an...
Heck, you can even use Roughtime to achieve secure timestamping, since it uses digital signatures and lets you provide a nonce. Even the design doc mentions this use case:
> This means that Roughtime replies can be used as a time-stamping service, and thus that Roughtime servers can be used to check each other.
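A rough illustration of the trick, assuming the client derives its 64-byte request nonce from the document hash plus a random blinder. The exact nonce derivation in the Roughtime spec may differ; this only shows the commitment idea, by which the server's signed reply over the nonce binds the document to a time no later than the signed one:

```python
import hashlib
import os

def roughtime_style_nonce(document: bytes, blinder: bytes) -> bytes:
    """Derive a 64-byte nonce that commits to the document. The server
    never learns the document; it just signs (time, nonce). Revealing
    document + blinder later proves the document predates the reply."""
    inner = hashlib.sha512(document).digest()
    return hashlib.sha512(inner + blinder).digest()

blinder = os.urandom(64)   # kept secret by the author
nonce = roughtime_style_nonce(b"my article", blinder)
```

The blinder serves the same privacy purpose as the IV in the OpenTimestamps discussion above: the nonce alone can't be tied to any plaintext.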
But they could be blocking it with robots.txt or some other excuse. That seems like an incredibly rare exception, though: what news site would block being crawled and indexed? They're also archiving domain info, I'm sure. If they wanted to stop this, they have the data.
So it's my content, on a domain I no longer own or control.
How could I prove to Google that the content is mine?
You take over expiring legitimate domains, and ideally immediately (before Google notices) put up almost the same content, but with a link to whatever site you're actually interested in giving better search ranking to.
Repeat this 1000x and you can apparently make enough headway that people put time and money into the approach.
How do you know it's backdated if, by the time you crawl, the article has appeared on both sites? How do you know which was published first? This attack can be performed faster than the re-crawling frequency.
One possibility is that it's a business decision: they know these cases may exist, but decided it's not worth the extra overhead. The second possibility is that it's an oversight, essentially a "bug" in the system.
Either way, it appears that Google had not yet identified this as a serious problem.
I give them credit for having a Transparency Report, so that others can pick up on fraud and abuse which Google itself may not yet have found to be a problem; an essential safeguard. I also view projects such as Lumen as being of increasing importance.
Google crawls surprisingly little of the web, even on web sites it knows about. That’s why web developers have to use site indices and Google Webmaster Tools to tell G that stuff exists on their sites, hoping that it will be discovered and indexed.
Considering that a fake DMCA notice carries the same penalty as perjury (correct me if I'm wrong), I would have expected Google to have taken some action: a suspension pending an internal investigation, or something similar.
Have they made any public statements on the matter?
My understanding is that, regrettably, you're wrong, definitely in practice if not in theory. First, on the theory side: Warner Bros at least argued, in the countersuit Hotfile filed against them for abuse of takedowns, that when it came to the DMCA:
>"The DMCA’s language applies only to impersonating a copyright owner or sending notices on their behalf without authorization; mere misidentification of the files being taken down or the works represented therein are at most incorrect ‘statements’."
Unfortunately, the settlement Hotfile made in the original suit against it by the MPAA also ended its countersuit, so Warner Bros' arguments were never put to the test. If anyone knows of a subsequent case, that'd be good to read, but I can't remember any. If there's nothing at all, then it's hard to say either way, but there is at least a strong colorable argument that the clause either doesn't mean what a lot of people think it should, period, or is too ambiguous to have any real teeth.
And speaking of real teeth, there do not seem to be any fixed consequences for perjury, and I do not think there has ever been a single prosecution for it in the now 20-year history of the law. In fact, the sole victory of any kind I can remember against even a flagrantly abusive act of political censorship was years ago, when WordPress won a default judgement against "Straight Pride UK" (which had already gone dark, of course). Lenz won vs. UMC, but I don't think she won any actual damages? And further, the standard set there was that actual bad faith by the rights holder would need to be shown, which is yet another hurdle. I think only something like Diebold rose to that level. Edit: ubernostrum beat me to it and mentioned Diebold, and that the perjury clause is about who you're representing, not what you're asserting.
So at the end of the day there have just never seemed to be any significant consequences for abuse (or "mistakes"), and the result has been exactly what you'd expect given the incentives and interests involved.
On the other hand it definitely would appear to be a slam dunk on the bad faith part, but again there don't seem to be any really clear, really significant penalties on that side. Someone who goes through all the massive expense and trouble of challenging it could win 100% of the time, which would mean the content could be reinstated. But can they even get all their expenses back, let alone enough multiplier penalties to deal with missed enforcement or get lawyers interested in doing it on contingency? At some level of abuse courts might have other tools, maybe it can start to get into other higher level laws covering fraud or something, but that seems like it'd be a harder case by definition vs specific language of the direct law in question. Even if it's clearly defeatable, if the cost is high enough that may not matter in practice.
One thing that is always possible, though, is to at least generate political outrage, and in that case the more egregious the abuse, and the more clearly they get away with it, the better. There have in fact been multiple attempts to reform the DMCA in Congress, including some with really good language, but none of them have gotten through. That doesn't mean it's not worth continuing to bring it up when the right examples present themselves, especially ones that are extra photogenic. I've printed out some excerpts of this and I'm going to send them to my rep and Senators tomorrow, along with some other collected concerns and a list of previous legislative efforts. I don't expect any action in an election season, but sometimes pebbles add up. All the protection laws we have didn't come out of nowhere, and didn't pass without opposition from entrenched interests, after all.
The only part of a DMCA takedown that's under penalty of perjury is the assertion that you are the copyright holder, or are authorized to act on behalf of the copyright holder.
The infamous Diebold case -- where they did get penalized for a false DMCA notice -- involved Diebold more or less admitting openly in court "yeah, we knew it was wrong but we sent a notice anyway". Which means it's extremely difficult to get someone on a false DMCA notice.
But that's exactly the fraudulent claim. Unlike other cases that are understandable, genuine mistakes (even though those mistakes would be avoidable if the claimants put a reasonable amount of care into checking their claims), in this case it's clear that the claimant knew the claim was false, and falsified documents to support it.
I highly doubt courts would appreciate this sort of behavior; it's just that the victims of the censorship likely don't know, don't care, don't have the resources to fight, or don't catch it before the statute of limitations expires.
Unfortunately, it seems like the maximum penalty for perjury is five years, and I assume it's rare that anything near that is imposed, so I guess someone decided it's worth it.
Edit: actually, answering my own question -- when a domain is reg'd at GoDaddy and privacy is enabled, it's then "owned" (in the whois sense) by DomainsByProxy at GoDaddy.
The only scenario I could think of is claiming Google didn't index the original because it was blocked by robots.txt. Apart from that, how could Google not know the original date of publication?
I guess Google doesn't have systems in place to do this verification; they just remove from the SERPs anything and everything that has a DMCA notice against it, just or otherwise.
Another scenario I've witnessed myself is someone using a copyright-free image (Unsplash), and then someone else claiming a violation, saying they are the actual copyright holder and it was fraudulently posted on Unsplash by a third party. Google removes your homepage from the SERPs, traffic tanks, the business ends, and there's no way you can even fight the decision. In the end, who knows who the actual copyright holder is?
What if the article wasn't even online? I.e., some journalist "wrote" the article for some local newspaper without internet presence.
Whenever I publish something I made, I submit the URL to the Wayback Machine so if it gets hugged to death at least there's a public copy somewhere else.
A timestamp can't prove anything on its own. What stops somebody else from copying your content, timestamping it earlier, and submitting that as proof? How can Google verify it?
In fact, Google may have an incentive to allow this, since those scammy sites may actually be using Google ads, while legit sites may have their own ad network which Google may not be part of...
You'd need some sort of negative affirmation as well: "when I published this article, searching for it yielded nothing," signed by a trusted party, would be a stronger statement. But I have no clue how you'd actually do that.
Something like this can only work if all news publishers always publish the article hash; those that don't will simply not get crawled and will not be valid as proof in DMCA takedowns.
What makes the Bitcoin blockchain good for this? The Bitcoin blockchain is good for posting transactions within the Bitcoin network, and kind of a shitty solution for "a generic blockchain".
Why a blockchain at all? Why not just a merkle tree? Surely there is somewhere better than the bitcoin blockchain for this. CT logging isn't done on the bitcoin blockchain for a reason.
So sure, you can prove that something wasn't created after the date of the transaction (assuming you're using a cryptographically secure hash function), but for this to work in this specific context, you'd need every publisher to be posting transactions to this log.
If I post an article and post a hash to $genericBlockchainWhichIsReallyJustAMerkleTree, what prevents someone from copying, backdating, and saying that I copied them? That they didn't post to the log as well?
Is a CT log a blockchain? Because really, what we're talking about here is a signed ledger, which may be decentralized but may very well be centralized as well.
I can use it for decentralized time stamping right now.
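For concreteness, here is a toy Merkle-root construction in Python, in the spirit of a CT-style append-only log. This is illustrative only: real Certificate Transparency uses a specific leaf-prefixing and node-hashing scheme (RFC 6962) that is omitted here.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash each leaf, then pair hashes level by level until one root
    remains (an odd node is promoted as-is). Publishing only the root
    commits to every leaf; an inclusion proof is just the sibling
    hashes along one leaf's path to the root."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(h(level[i] + level[i + 1]))
            else:
                nxt.append(level[i])
        level = nxt
    return level[0]

articles = [b"article-1", b"article-2", b"article-3"]
root = merkle_root(articles)
```

A trusted party (or several) periodically signing the current root gives you the "signed ledger" above without any blockchain: changing any past article changes the root.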
It's interesting to note also how they appear to use the Fox News trademark while having no apparent affiliation. Also, although I don't know suite numbers, it's interesting that the address on the domain registration looks to be very close to a UPS store.
I live in Europe and have been browsing the Internet for 20 years now, and this is the first time I've heard about it.
Also, I visit plenty of European news sites, and none of them does that.
Example: « Pas de palais : pas de palais ! » ("No palace: no palace!")
(^A quote from the Asterix movie that will be way more useful to you for making French friends in a bar than saying « I can't speak French » in French)
Also: it looks like we're using colons differently, is that right?
The rule is that the non-breaking space goes before "double" punctuation marks, those made of two elements (? ! ; :).
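As a toy illustration, that spacing rule can be mechanized with a regex. This is a sketch only: real French typesetting has more exceptions (colons inside URLs or times, guillemet spacing, etc.), and the narrow no-break space U+202F used here is one common choice among several.

```python
import re

def add_french_spacing(text: str) -> str:
    """Replace any spacing before ? ! ; : with a narrow no-break
    space (U+202F), per French typographic convention. Idempotent,
    since \\s also matches the no-break space we insert."""
    return re.sub(r"\s*([?!;:])", "\u202f\\1", text)

fixed = add_french_spacing("Pas de palais: pas de palais!")
```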
Additionally, some individuals may intentionally include such errors to render themselves more difficult to identify.
I dream of a world where companies doing any consciously shady shit like that are automatically slapped with a huge (1/2 their annual revenue would be a good start) fine, even if it's nominally legal, as long as a jury of experts decides their actions were harmful to society/transparency/the environment/etc.
The "jury of experts" role wouldn't be to punish "any way they please" but to do what is implied by their title and description (as experts): to rule in accordance with their domain knowledge. And of course their work would be judged in turn.
If a society can't trust a jury of experts on a domain to clearly and effectively judge issues related to their expertise, then it has bigger problems, and laws wouldn't save it from them.
But setting up a news site to inject a backdated and infringing article would seem to be several, moderately serious criminal offenses.
If someone does this to you, push back hard. They're facing prison and fines.
The laws in the US are laws to control the behavior of the weak not the strong.
Creating (itself infringing) content on a purpose-built site ticks the "wilful" requirement of the DMCA's defined requirements for perjury.
How would that fix the issue? It wouldn't change legal obligations, AFAIK. Confused.
Alternatively, if you want to go full accelerationist: people should start abusing the system more, spamming fake takedowns on anything they don't like, completely flooding it and getting legitimate content belonging to important people taken down. There is probably no other way anything will change.
Just another example of how a corrupt government can selectively enforce laws to give its crony friends a leg up.
Here, if the article is correct, they're actually claiming to own works they do not own at all. So if they get called on it, they actually could end up having to answer perjury charges. The real question is who gets to enforce that? I'm not sure if it's the DoJ, you might have to bring a civil suit for slander to title first to get a criminal referral or something. I really don't know how that's supposed to play out because I think I've only seen one case where someone complained under the perjury clause and they settled, so it was never a criminal matter.
There is an organization in Belgium that hired a law firm in New York to file claims against a web site I once owned in Texas, to remove people's vacation photographs taken in Belgium, under threat of legal action.
(This was pre-DMCA. The web site was sold long ago and is now defunct.)
I’m not a huge fan of DMCA, but it’s not entirely negative.
Without DMCA, there would be no YouTube, Imgur, Twitter, or pretty much any other site that allows the public to upload content.
It’s an ugly compromise, but one that allowed a significant portion of the web to exist and grow.
Hopefully it will eventually be replaced with something more nuanced.