Hacker News
Internet Archive Seeks to Defend Against Wrongful Copyright Takedowns (torrentfreak.com)
255 points by jcr on Mar 25, 2016 | 45 comments

The article about that on the Internet Archive written by Brewster Kahle himself:


"We filed comments this week, explaining that the DMCA is generally working as Congress intended it to. These provisions allow platforms like the Internet Archive to provide services such as hosting and making available user-generated content without the risk of getting embroiled in lawsuit after lawsuit. We also offered some thoughts on ways the DMCA could work better for nonprofits and libraries, for example, by deterring copyright holders from using the notice and takedown process to silence legitimate commentary or criticism."

And the Internet Archive's DMCA Section 512 comments are here:


And if you want the plain text version of the above:


Is the DMCA working the way Congress intended? All we ever talk about here is how it's constantly abused, and used as a club by large organizations over smaller parties.

Or, is that pretty much "what Congress intended?"

There are two parts.

1. Safe harbor for websites: you don't have personal liability as long as you comply

2. Ability for copyright holders to get stuff taken down quickly

1 is working fine, 2 is subject to abuse. The post says that 1 works.

If 1 were working fine, Google wouldn't have had to implement Content ID, which in turn led to even more of 2.

My understanding is that Google did that to cut down on the amount of time they spent responding to DMCA requests, not because they were worried about legal issues if they hadn't.

No, they were forced to implement it, because copyright holders did not want to negotiate with Google every time. They wanted the right to shoot first and ask questions later.

The original post introducing it explicitly says it wasn't for legal reasons.


Do you have a source that says they were forced to do it by outsiders? Or any plausible way in which the DMCA required that?

Edit: also, Content ID is automated checking of uploads. The ability for copyright owners to take down content directly is under a different program, I think.

That's correct. Also, note that Google's takedown system is NOT a DMCA takedown request. It does not satisfy the DMCA's requirements in several places, and it is actually stricter than the DMCA.

Google is basically volunteering to allow people to issue takedown requests without consequences; if it were an actual DMCA takedown, they would have to

And yes, Content ID is another system entirely, where Google pattern-matches uploads (and old stuff when they feel like it); when hits are found, Google tends to reassign the monetization to the "content owner" that registered with Content ID. Getting that money back has generally been impossible, even in cases where Content ID clearly screwed up (false positives).

If you're interested in the history of how Google has been using Content ID, there is an older discussion[1] from a few years ago when Google really started to abuse Content ID hard.

[1] https://www.youtube.com/watch?v=bt1ubSVMwaw

Viacom sued YouTube when it first got big. Viacom argued that YouTube encouraged copyright infringement and knew about it. The safe harbor is voided if the hosting company knew the content was infringing or made itself willfully blind to that fact.

So YouTube introduced Content ID to show that it really was taking copyright seriously, and not just relying on plausible deniability.

Viacom mostly lost the case because there never was much better evidence that YouTube employees knew about individual copyrighted videos on the site.

Google probably keeps the system to avoid new suits and to keep business partners happy. A lot of Google content isn't crowdsourced anymore; it's uploaded by big business suppliers who are essentially business partners. I imagine those contracts require the use of Content ID.

Huh, this would be a great website: check laws, compare what they are actually used for with what we were told they are for, maybe assign some sort of score and highlight the best and worst (also trends, to see whether scope is being widened or narrowed for certain laws).

> We are deeply concerned that automated filtering could lead to taking down many materials that are being used in reasonable, legitimate and legally protected ways—especially when the underlying purpose of the complaint is not copyright related but rather an attempt to silence critical speech.

How would "notice and staydown" (vs. takedown) work with systems like IPFS, which use content hashes? Would there be centrally maintained blacklists against which all hosting companies would need to screen inbound content?
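A minimal sketch of the centrally maintained blacklist the question imagines. IPFS addresses content by a CID that wraps a multihash (typically SHA-256); this sketch assumes a bare SHA-256 digest as a stand-in, and `BLACKLIST`, `register_takedown`, and `screen_upload` are hypothetical names, not any real service's API.

```python
import hashlib

# Hypothetical shared blacklist of hex SHA-256 digests that every host
# would have to consult before accepting an upload ("notice and staydown").
BLACKLIST: set[str] = set()


def digest(content: bytes) -> str:
    """Return the hex SHA-256 digest used here as the content's identity."""
    return hashlib.sha256(content).hexdigest()


def register_takedown(content: bytes) -> None:
    """A rights holder's notice adds the work's hash to the shared list."""
    BLACKLIST.add(digest(content))


def screen_upload(content: bytes) -> bool:
    """Return True if the upload may be hosted, False if it is blocked."""
    return digest(content) not in BLACKLIST


work = b"full text of some copyrighted article"
register_takedown(work)
print(screen_upload(work))         # False: the exact copy is blocked
print(screen_upload(work + b" "))  # True: one extra byte changes the hash
```

Exact hashing only catches byte-identical copies, which is why systems like Content ID match fingerprints rather than exact hashes; fuzzier matching is also where the false positives quoted above come from.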

Even if we assume no user ever has legitimate access to a copyrighted work (which is untrue; take, as a trivial example, the author attempting to show off their own work while travelling by downloading a copy of it from their home computer via IPFS), it would be necessary to do this differently based on the copyright laws of the user's country, along the same lines as the Linux "wireless-regdb", which says which wireless frequencies may be used in each country.
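A rough sketch of what a wireless-regdb-style table for content might look like: for each hash, the set of countries where hosting is restricted. The table, the function names, and the "US"/"CA" values in the usage are all hypothetical and say nothing about any real work's copyright status.

```python
import hashlib

# Hypothetical per-jurisdiction table, in the spirit of wireless-regdb:
# content hash -> ISO country codes where hosting that content is blocked.
REGDB: dict[str, frozenset[str]] = {}


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def block_in(content: bytes, countries: set[str]) -> None:
    """Record that this exact content is restricted in the given countries."""
    REGDB[digest(content)] = frozenset(c.upper() for c in countries)


def allowed(content: bytes, country: str) -> bool:
    """Check the requester's country; hashes not in the table are allowed."""
    return country.upper() not in REGDB.get(digest(content), frozenset())


book = b"text of a book protected in one country but not another"
block_in(book, {"US"})   # hypothetical: restricted in the US only
print(allowed(book, "US"))  # False
print(allowed(book, "CA"))  # True
```

Even this only encodes where a work is protected; it still cannot tell that the requester is the author fetching their own copy.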

In general, the reason copyright enforcement is pushed for is not accurate enforcement of laws, but rather maintenance of business models while paving over "minor details" such as legitimate contracts and licenses between copyright holders and others.

The digital copyright regime has effectively paved over centuries of intricate law to create a binary of "free" and "nonfree", with no internal distinctions or intermediates. That this is not widely recognized is a sign of how effectively media conglomerates control perception of the issue.

You simply can't justify unilateral global takedowns on copyright grounds. Many legitimate countries disagree about copyright rules, and they just aren't so simple as a global "copyrighted?" flag. If you want to be a global moral police, you can justify blocking hashes for those reasons as long as there are no collisions and everyone agrees, but that doesn't sound terribly likely either.

The fundamental problem with computer enforcement of copyright is that the computer never has the context necessary to determine if the copy should be allowed.

So for example, I'm sitting next to a teacher waiting for a train and reading the newspaper. The teacher sees a story and says hey, can I have that when you're done so I can make some copies for classroom use?

That kind of copying is clearly fair use, right? But the same thing happens on the internet with some kind of hash-based copying prohibition and the teacher can't copy the story from me. Because the computer has no way to know that the law allows the copy. So it can only allow everything or prohibit everything.

That sort of system can't work. It doesn't have the information or context or logic necessary to make a fair use determination. But "prohibit everything" is exactly what Disney et al want, so they're always pushing for it anyway.
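To make the point concrete, here is a minimal sketch assuming the filter keys on an exact content hash. `CopyRequest`, `hash_filter`, and the field names are hypothetical. The purpose of the use is one of the four factors in 17 USC Sec. 107, but a hash filter never gets to look at it.

```python
import hashlib
from dataclasses import dataclass

BLOCKED = {hashlib.sha256(b"today's newspaper story").hexdigest()}


@dataclass
class CopyRequest:
    content: bytes
    purpose: str    # context a fair use analysis would weigh
    requester: str


def hash_filter(req: CopyRequest) -> bool:
    """Allow or deny based on the content bytes alone.

    `purpose` and `requester` are never read: the filter has nothing to key
    on except the hash, so every request for the same bytes gets the same
    answer, whatever the context.
    """
    return hashlib.sha256(req.content).hexdigest() not in BLOCKED


story = b"today's newspaper story"
teacher = CopyRequest(story, "multiple copies for classroom use", "teacher")
bootlegger = CopyRequest(story, "resale", "bootlegger")
print(hash_filter(teacher), hash_filter(bootlegger))  # False False
```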

I'm not convinced that's a legitimate fair use case to be honest...

One of the fair use examples right out of 17 USC Sec. 107 is "teaching (including multiple copies for classroom use)". But instead of arguing about a specific arbitrary example, let's chalk that up to "making a fair use determination is hard" which was kind of my point.

Unless you're convinced that there is never a legitimate fair use case, feel free to substitute whichever you like.

I don't think it would. Since material is supposed to enter the public domain after a time (it's been a while, as copyright terms keep getting longer, but that's another issue), if it were on an automated blacklist, it likely would never be removed from the list, making copyright effectively forever.
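One way a blacklist could avoid becoming permanent is for every entry to carry the date the work enters the public domain, so the block lifts itself. A minimal sketch; the entry format, the placeholder hash, and the 2030 date are hypothetical, and it assumes someone records an accurate date (which, per the comment above, can differ by country).

```python
from datetime import date

# Hypothetical blacklist: content hash -> date the work enters the public domain.
BLACKLIST: dict[str, date] = {}


def add_entry(content_hash: str, public_domain_on: date) -> None:
    BLACKLIST[content_hash] = public_domain_on


def is_blocked(content_hash: str, today: date) -> bool:
    """Blocked only if listed and the public domain date has not arrived."""
    expiry = BLACKLIST.get(content_hash)
    return expiry is not None and today < expiry


def purge_expired(today: date) -> int:
    """Drop entries for works now in the public domain; return the count."""
    expired = [h for h, d in BLACKLIST.items() if today >= d]
    for h in expired:
        del BLACKLIST[h]
    return len(expired)


add_entry("placeholder-hash", date(2030, 1, 1))
print(is_blocked("placeholder-hash", date(2029, 12, 31)))  # True
print(is_blocked("placeholder-hash", date(2030, 1, 1)))    # False
print(purge_expired(date(2030, 1, 1)))                     # 1
```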

There already is such a (voluntary) system for child porn.

Solution: monthly full core releases and daily update packs of the complete archive in torrent form.

Offload the copyright risk to those more willing to take it and you get to keep doing your friendly neighborhood scraping.
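A sketch of how the "full core release plus daily update packs" idea could decide what goes into each pack, assuming the archive is a set of files keyed by path. `manifest` and `update_pack` are hypothetical helpers; actually building .torrent files is left out, and deleted files are not handled.

```python
import hashlib


def manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each archived path to the SHA-256 of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def update_pack(previous: dict[str, str], current: dict[str, bytes]) -> dict[str, bytes]:
    """Return only the files added or changed since the previous manifest.

    The monthly core release ships every file; each daily pack ships this
    delta, and seeders apply the packs in order on top of the core.
    """
    return {
        path: data
        for path, data in current.items()
        if previous.get(path) != hashlib.sha256(data).hexdigest()
    }


core = {"a.html": b"v1", "b.html": b"v1"}
base = manifest(core)
today = {"a.html": b"v1", "b.html": b"v2", "c.html": b"new page"}
print(sorted(update_pack(base, today)))  # ['b.html', 'c.html']
```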

You might be underestimating the size of their archive.


They could start with a text only archive. Compresses beautifully and would make a great first step.
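A quick way to check the "compresses beautifully" claim on real data is Python's standard `zlib` module. The sample page below is a made-up placeholder repeated 200 times, which exaggerates the ratio; measure on actual archived pages before drawing conclusions.

```python
import zlib


def compress_text(text: str, level: int = 9) -> bytes:
    """Deflate-compress UTF-8 text at the given zlib level (0-9)."""
    return zlib.compress(text.encode("utf-8"), level)


def ratio(text: str) -> float:
    """Original size divided by compressed size; higher is better."""
    return len(text.encode("utf-8")) / len(compress_text(text))


page = "<p>Internet Archive Seeks to Defend Against Wrongful Copyright Takedowns</p>\n" * 200
packed = compress_text(page)
assert zlib.decompress(packed).decode("utf-8") == page  # lossless round trip
print(f"{len(page.encode('utf-8'))} bytes -> {len(packed)} bytes ({ratio(page):.0f}x)")
```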

It seems obvious one should pay damages for making false copyright claims.

But.... why should we tolerate copyrights? What is in it for me?

I kinda like the idea of people doing paid work. If someone wants a something-for-nothing kind of formula they should pay for it themselves rather than creating an impossible burden for others (if not the whole world)

One can get enough funding before creating a work or before releasing it. After release, the audience can make donations and/or choose to fund future works. This should be good enough for what we need.

Before we choose global mass prosecution until the end of time over the crowdfunding formula, we should first have a good reason for it. Failing to preserve history for that extra bit of entertainment exploitation is not worth it. Not just because it fails to be entertaining.

I can't think of any, but there might be a few works important enough for copyright to be the ideal formula; still, we can't expect it to work on the scale we have it on right now.

At the very least a lack of license should default to something like creative commons. (If I rub a bit of snot on some paper I don't want to own the rights to it.)

If enough people want a copyrighted work they are going to get it anyway. The "dream" of artificial scarcity with infinite exploitation has ended. We have to write realistic laws now. Something that doesn't violate basic logic.

We the audience would gladly pay for a new season of Star Trek. I suppose the fear here is that the audience would have influence on the programming?

It's fair use for historical and educational purposes. My technical explanation for copyright holders is tough noogies, or talk to the hand. I'm just not at all sympathetic to their claims. The Internet Archive is a public good.

It's a nice thought. But this talk of "legitimate", "legally protected" and "wrongful" strikes me as just so much wishful thinking. To be reliably available, stuff must be posted in ways that can't be taken down. It's rather the same distinction as with encryption, isn't it?

The Internet Archive is one of the finest institutions of the Internet. The Archive is a not-for-profit and deserves your financial contributions and support. The proposed changes to the DMCA copyright takedown rules would help resolve existing problems and ambiguities.

The Internet Archive archives many thousands of user profile pages and other types of user-generated content that the OP may eventually realize is not really wise to have online. Things like the individual message board posting histories of private figures. The Internet Archive has no right to display these things, as they don't have a license from the content's author or the outlet that the content author posted on. There is no public interest served by continuing to display these things, as the poster is a non-noteworthy private person and was in all likelihood just posting nonsense. The OP can contact the message board and probably get them to delete the profiles if they don't have a deletion option already baked into their platform, but what about archives like the Internet Archive? Will they comply with takedown requests for such individual profiles, or fight them as well, pretending that there is some public interest served by keeping it online?

Privacy is hard in the internet age.

Ummm, if you post something in a public place, that is the opposite of private.

You can't say something on live television and then ask them to delete the tape for privacy reasons, well you could but it would be dumb.

You can't expect privacy when you have a meeting in a glass room I guess is what I'm saying.

If you're saying something on live television, you're probably a public figure. Nonetheless, you would own the copyright to any original statement you made on TV, and people couldn't just copy and replay it without getting legal permission. Most people who appear on TV probably sign documents granting the station licensing authority for any statement they make whilst appearing.

When you are meeting in a glass room, you can't expect privacy in the moment, but if someone takes the intellectual property you shared in that glass room and puts it somewhere else, you certainly can expect them to respect the laws that allow you to require them to stop. If they don't, you should expect law enforcement and/or the courts to assist.

>Nonetheless, you would own the copyright to any original statement you made on TV, and people couldn't just copy and replay it without getting legal permission. Most people who appear on TV probably sign documents granting the station licensing authority for any statement they make whilst appearing.

The broadcaster would typically own the copyright to the video. I am sometimes given specific waivers to sign when I'm recorded at conferences and the like--mostly because rights to use material for marketing/commercial purposes are more restricted than the same material used for editorial. Frankly, most events etc. don't bother because the (correct) assumption is that people doing things in public aren't going to suddenly want to get rid of the content.

Yes, they would own the copyright on the elements of the video that they produce. They wouldn't automatically own the copyright on any statement you made. The papers you sign would probably contain language much like the language on a random forum's ToS, discussing an irrevocable, non-exclusive, limited, global license to use any statement you make and/or to license the clips containing your statements out to partners. I've never been on TV, so I don't know. While it is possible that they try to get you to transfer the copyright to any statement you make, the courts are generally pretty dubious of such attempts.

Also, giving a presentation to thousands of people, whether at a conference or over a broadcast, is at least perceived a lot differently than leaving a comment on a message board, especially if it's a small one or a niche community, so different things are shared.

Suppose one is foolish as an 18-year-old and posts, on a forum mostly browsed by his friends, "Ha! I just got Lifelock and since I know you can't do anything to me when I have that, my SSN is 999-99-9999. Just try to steal my identity!" Suppose the IA saves this statement. The OP would have a copyright interest in it and would be within his rights to point out that the IA has no privileges that entitle it to rehost that content, so please take it down. That is totally fair.

There is no reason that copyright law should only be usable by media conglomerates that mostly use it to stop the spread of free culture and not by private individuals trying to clean up some of their past mistakes.

I'm not sure we really disagree. Anything my 18 yo self wrote that's still accessible online was filtered through editors etc. and I'm not unhappy for that :-) While I'm no fan of much of the EU right to be forgotten thinking, I'm also sympathetic to the idea that not everything we write in a young, foolish moment should be discoverable forever with no recourse. I also think that there are a lot of practical issues to getting rid of the foolishness without also creating the opportunity to eliminate things that are legitimately in the public interest/part of a historical record although, in practice, I expect that some combination of time and the sheer volume of data deals with a lot of it.

We are talking about abuse of DMCA related to takedowns and copyright, and you are muddling the issue by stretching into privacy. Is privacy covered by DMCA? Is the DMCA something that an OP could use to request takedown of data from a site? If it is, then that would be what you should be speaking about, not grand visions of "how dare they copy and then host things that were on the public internet because some of those things might be privacy-sensitive!"

To play devil's advocate against myself, I have made mistakes in the earlier days of the internet that I am glad the archives failed to keep. I do understand that there is a need for privacy-friendly user sites, but I am sceptical about which tools should be allowed to enforce that. Right now, the internet is a threat to the powers that be, which is why we will see ever-increasing attempts to legislate it into the ground. If we allow government corruption to seep into the internet any more than it already has, the real concern will be censorship and propaganda. User privacy has less to do with publicly posting things you shouldn't, and more to do with the corporate/government merger and data sharing that is going on around us. Loopholes everywhere for suppressing dissidents.

I don't have a problem with IA's operation in general, but individuals do own the exclusive copyright on their works. The forums they post on generally have a ToS that states the user grants them a non-exclusive license to display the content. This doesn't automatically extend to the IA. Thus, if individuals don't want their work to appear on the IA anymore, they can issue a DMCA takedown request, as can be done for any other copyrighted material on the internet. The IA should respect these instead of trying to claim that there is a public interest in keeping them accessible.

It's a valid question. If, for whatever reason, someone wanted to purge their identity from the Internet Archive as much as possible (including from properties they did not control), I suppose they could try, but I doubt they'd have much luck. For example, suppose you asked them to expunge every HN page you had commented on. One issue is that there's no easy way to delete only your content if it's embedded in other discussions.

I expect it's a legal gray area that mostly works in part because most random forum posts are pseudonymous.

Not all pages contain content from other people. Most message boards have a page that shows just that user's posts. At least these should be pretty easy to get taken down by informing the IA that it doesn't have a license to display the content (which is true).

>Is the DMCA something that an OP could use to request takedown of data from a site?

Why not? Forum posts are copyrighted at the moment of creation just like any other work, and while one certainly gives a license (implied or not) to the forum, there's no reason why that license would extend to the IA.

Interesting. So under this working theory, the entirety of the IA is fundamentally against copyright unless specifically allowed/released per site?

So perhaps a new IA that only indexes creative commons licensed sites might be in order?

>Interesting. So under this working theory, the entirety of the IA is fundamentally against copyright unless specifically allowed/released per site?

IANAL, but basically yes. It's one reason why the IA respects robots.txt even retroactively. The reality is that, if something was posted publicly by a copyright holder and intended to be shared, the overwhelming majority of people/entities don't care that it's being archived somewhere but there's no particular exemption for something like the IA.

The IA is like a library. Libraries don't have to get special permission from book publishers to be libraries.

Because of first sale doctrine. Anyone can set up a library by buying physical books, DVDs, or anything else they want to lend out to one person at a time without any sort of special permission. Digital library content, on the other hand, is based on specific contracts with the rights holders.

Except it isn't. By this standard, book publishers would have to go pound sand if libraries started giving away infinitely many copies of their book.

To be more precise, it doesn't *matter* if the Internet Archive is a "library" or not, because libraries don't have any special status with respect to copyright law in the US. (Beyond whatever special status publishers may choose to give certain classes of libraries with respect to digital rights.) You or I could choose to set up a lending library tomorrow and we would have the same rights to loan out physical books as the New York Public Library does.

Yes, as you point out, due to the first sale doctrine. Such a thing doesn't exist in the digital realm.

I haven't done my research yet, but the website I made when I was 15-16 is archived and needs to come down. I'm hoping being made by a minor helps? Anyone had experience taking stuff down or do you have to issue a DMCA?

It's hard to put genies back in bottles. If you still own the domain, my understanding is that the Internet Archive does respect robots.txt even retroactively. Otherwise, you can try doing a DMCA takedown but you may not have much luck as you no longer have any directly-established ownership over the content.
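If you do still control the domain, Python's standard `urllib.robotparser` can confirm that a robots.txt actually says what you intend. This sketch assumes `ia_archiver` is the user agent the Archive's exclusion instructions name; treat that, and whether the Archive still honors such rules retroactively, as assumptions to check against its current policy. The example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt for a domain you still control. "ia_archiver" is
# assumed to be the agent named in the Archive's exclusion instructions.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def archive_excluded(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if this robots.txt disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


print(archive_excluded(ROBOTS_TXT, "https://example.com/old-site/index.html"))  # True
print(archive_excluded(ROBOTS_TXT, "https://example.com/", agent="SomeOtherBot"))  # False
```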

