
Ask HN: Legality/ethics of archive.org to get pre-published versions of books? - wsec
There are some computer and math books produced for university courses placed on the internet pages of those courses. These books or lecture notes are then formally published and removed from the course webpage, but a copy can be found either on archive.org or some other site.

Is it legal or ethical to obtain the previously published pdf from archive.org or some other site? One example is "Algorithms" by S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani, which was previously available on Dasgupta's webpage but later taken offline when the book was published in paper form.

The other situation would be course notes that were publicly accessible but later removed from the webpage but still available on archive.org.
======
wahern
There are inadequate facts to adjudge legality. However, if the authors
intentionally linked to the "pre-published" PDF from their publicly accessible
website then arguably it was in fact already published. Nonetheless, in the
U.S. at least, the act of publication has little relevance for new works as a
general matter, so publication is something of a red herring except in so far
as it hints at implied licensing. Ultimately, the important question is how
the work was licensed before the final/derivative work was published again
through more commercial channels.

In terms of ethics, if you felt compelled to ask then perhaps you should dig
deeper, and in particular inquire with the authors about 1) how they intended
to license the work, and 2) what their licensing arrangement was with the
publishing house. It also couldn't hurt to inquire with whomever uploaded it
to archive.org; perhaps they had permission from the authors. Neither answer
would necessarily resolve the legal question, but might satisfactorily resolve
your ethical qualms.

~~~
des234
OP here: thanks for your answer.

My concern is whether obtaining a work through archive.org constitutes theft
when that work is no longer directly available from the author. In the case I
have in mind, professors write a book for their class which they post online
as a pdf. Later they publish this course book material through Addison or some
other textbook publisher. Old pdf copies are still available although no
longer on the course websites.

Another example of such a case is Concepts, Techniques, and Models of Computer
Programming, for which pre-prints are also available.

My assumption is that, from a legal standpoint, since these books were first put
online with free access, then obtaining them through archive.org shouldn't be
illegal. However, this seems to infringe upon the intention of the authors who
have de-linked their original work now that the work has formally been
published. Archive.org is exempt from certain aspects of copyright; authors
who wish their work to no longer be available through archive.org must request
it.

I don't see any licensing text in the original pdf used as the course notes.

~~~
wahern

      > Archive.org is exempt from certain aspects of copyright
    

Archive.org isn't exempt from anything, at least not in the U.S. Their actual
_archiving_ of web content might be considered Fair Use, but that's distinct
from a right to redistribute such archived content to third parties, which in
turn is distinct from your right to make copies (e.g. to create and maintain a
copy on your hard drive by downloading the file.)

Ultimately, as regards the law of copyright, it doesn't much matter _how_ you
acquired the content. What matters is if _you_ have a valid license.

The reality of copyright is that the scope of authors' rights in works is
_huge_. We all regularly violate copyright. (I can't find the original
reference, but I remember once reading a draft SCOTUS opinion where a justice
wanted to claim that transcribing a poem onto a sheet of paper from a radio
broadcast was obviously not a violation of copyright--not fair use, but
outside the scope of copyright entirely--but all the other justices quickly
corrected him.) However, authors' ability to detect infringement, and their
available remedies, are knowingly and intentionally circumscribed. The law
often works this way (i.e. rights not being co-extensive with remedies), but
it's taken to the extreme in copyright. For one thing, it means that more than
most areas of the law, strict legality is quite divorced from your intuition.
Also, this can lead to complex conflicts between your personal ethical system
and the law.

