Link to how to get a library card at the LoC:
Also, pick your reading room carefully: the general collection can be delivered anywhere, but there are some items that you can only get in a specific one.
The talk I so enjoyed is on YouTube with many others: https://youtu.be/AvqtY_7Q7hI?list=PLEA69BE43AA9F7E68 (video 30 is MacKaye)
"Each working day the Library receives some 15,000 items and adds more than 10,000 items to its collections. Materials are acquired as Copyright deposits and through gift, purchase, other government agencies (state, local and federal), Cataloging in Publication (a pre-publication arrangement with publishers) and exchange with libraries in the United States and abroad. Items not selected for the collections or other internal purposes are used in the Library’s national and international exchange programs. Through these exchanges the Library acquires material that would not be available otherwise. The remaining items are made available to other federal agencies and are then available for donation to educational institutions, public bodies and nonprofit tax-exempt organizations in the United States."
And that is completely ridiculous; a technical solution to an invented problem.
I have a friend, Ivan, who lives somewhere in the world - he's notoriously reticent about his location, and only communicates over OTR at strange times. Whenever I want to get some research done, I ask him if he happens to know this or that fact about this or that copyrighted database. I then cite him as a source if anyone has any questions.
Congratulations, you’ve just described most software projects by libraries and archives.
It’s funny, because back in the 60s-80s, libraries were leaders in building shared data systems and networked infrastructure. The history of OCLC describes this well.
But once the web came around, they had an identity crisis, were unable to react to technology trends, and largely got conned into predatory, restrictive arrangements by service providers (Elsevier, ProQuest, etc.). The same thing happened in the past with microfiche, which led libraries to destroy huge, valuable portions of their collections that could have been better preserved with the advent of scanning technologies.
re: microfiche, it lasts much longer than digital scans -- if kept in good conditions it has a usable half-life of more than 100 years, while digital files need much more active maintenance, both to prevent bit-rot and to ensure the file format is still readable (e.g., countless file formats have been abandoned and are now accessible only via emulators of older machines).
And if you want to bring the cloud into this, most libraries don't have the funding to bring in the technical know-how to manage a private S3 instance only accessible from that building.
What I do is regularly copy my files forward onto newer media. I started this back in the 1970s, and it is the only reason I still have a copy of the FORTRAN-10 source code of Empire:
All the other stuff I wrote at the time is lost because I stored it on a magtape, and the Caltech magtape drive had drifted so far out of spec that the tapes could only be read on that machine, which was itself lost.
I managed to preserve that by copying it over a serial line to a PDP-11 and storing it on a PDP-11 floppy. I was later able to save my PDP-11 code by copying it over a much later serial line to an IBM PC, onto 5.25 disks. As time went by, the files migrated to Zip drives, then CD-ROMs, then a long sequence of hard drives (my older hard drives can't be read with modern IDE interfaces even when the connector fits; I have no idea why).
I remember reading boxes of 5.25 floppies and burning them onto CD-ROMs, a long and tedious process. Now nothing will read 5.25 floppies any more, but copying a year-old hard drive to a new one is simple, especially since the new drive is usually much larger than the old one.
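The "copy forward" routine described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the paths and function names are my own, not anything from the comment): mirror a source tree onto the new drive and verify each copy against a SHA-256 checksum before trusting it, so silent corruption on the old medium is caught while the original still exists.

```python
import hashlib
import shutil
from pathlib import Path

def sha256(path: Path) -> str:
    # Hash in 1 MiB chunks so large files don't need to fit in memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_forward(src: Path, dst: Path) -> list[Path]:
    """Mirror every file under src into dst; return files that failed verification."""
    failures = []
    for f in src.rglob("*"):
        if not f.is_file():
            continue
        target = dst / f.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)            # copy2 also preserves timestamps
        if sha256(f) != sha256(target):    # verify before retiring the old medium
            failures.append(f)
    return failures
```

Re-running the checksum pass periodically against a stored manifest would also catch the bit-rot mentioned elsewhere in this thread, not just copy errors.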
Hence I have most of the stuff I worked on since the early 1980s. The old Zortech bulletin board files are gone, though, even though I still have the hard drive they lived on; nothing can read that old drive. Not that there's anything particularly interesting on it, but I enjoyed running the BBS for many years.
I know! I’m a former librarian who has worked with a lot of “big players” on the institutional and software side of things.
> They are highly constrained by copyright law.
Here’s where I disagree with you: they’re largely constrained by the kind of administrative bloat that has permeated all of academia, which has no technical expertise and prefers corporate solutions or managing large-scale, dead-end projects for resume padding. I was on the receiving end of this so many times I left the field.
> re: microfiche, it lasts much longer than digital scans -- if kept in good conditions it has a usable half-life on the scale of >100 years, while digital files need much more active maintenance to both prevent bit-rot and to ensure the file-type is still readable (eg countless file formats have been abandoned and are only currently accessible via emulators of older machines).
True on the digital files part, but not so much on microfiche/film, which in many cases proved to be of poor durability and prone to data loss. But my comment was more about how its adoption caused libraries to destroy huge parts of their collections, with little recourse once microfiche/film failed to live up to its marketing claims or was poorly implemented. I recommend Nicholson Baker’s “Double Fold” for a good account of all of this.
It's as if we passed laws to require that all email must be delayed for at least two days in order to preserve the business model of the post office.
There are levels of dirtiness in data. Hathi's metadata reached a level of dirtiness where the filthier fields could actually be mined for other data, given enough labor. While I was able to extract some benefits from this at some cost, it was no substitute for having things done well.
Many of the scans are described as "unusable" by the faculty and having looked at many of their complaints I would agree.
From what I recall, there were attempts at regularization by the big universities involved with the project, but they’re not very good at tech projects, and especially not good at finding and keeping the high-level talent that something of this scale would need.
But if you think bikeshedding is bad in tech, take a look at some of the disputes between library cataloguers over their arcana. Much of what they produce is valuable at the level of a trained cataloguer who can navigate its inconsistencies and vagaries, but rarely so at the machine level.
I think it's actually a clever technical solution to a hideous legal problem, and it makes a lot of research possible that would be totally illegal otherwise, but it 100% gets in the way of totally legitimate research as well.
There are far better sources of wood, but land is more scarce.
I've used Google Books extensively for research on 17th-century mathematics (not just me; so did the whole department where I was at the time). I couldn't have afforded to visit all the libraries to look up the originals, but thanks to this repository I could download several copies of obscure works, which gives you very interesting insights: sometimes you hit the personal copy of a previous researcher, or there's an ex-libris from an old institution that you know your author was connected to, or you just discover something no one had noticed before simply with a clever string search. You can even backtrack to the forgotten true primary source of a mistakenly repeated fact, for instance. It's been a blessing, honestly.
If you are into some rarely traveled alley of history, you soon realise that your (former?) company has produced a treasure trove for future generations that just shouldn't ever be shut down: there are many books there not digitized anywhere else. So thank you very, very much for your work there. It definitely belongs among the important things.
That is the key result: the greatest library in human history is locked shut, primarily because it was perceived as a threat to the legacy business model of paper book publishing (and royalties based on sales of paper books.)
Restricting libraries to "non-consumptive use" betrays the fundamental purpose of a library: to enable people to read the books in the collection; it makes little sense with physical books and even less with digital texts which don't wear out.
Google was nice to want to share. But I think the original and ongoing concern was improved image recognition and NLP, i.e., machine learning over the whole of the world's knowledge.
Niceness had nothing to do with it. From the article, "As part of the deal, Google’s partner libraries made sure they got to keep digital copies of their scanned works for research and preservation use."