The end of that article is a not-so-subtle plea for someone within Google to perhaps "accidentally" and anonymously place this material in public.
> Page wanted to know how long it would take to scan more than a hundred-million books, so he started with one that was lying around. Using the metronome to keep a steady pace, he and Mayer paged through the book cover-to-cover. It took them 40 minutes.
> Michigan told Page that at the current pace, digitizing their entire collection—7 million volumes—was going to take about a thousand years. Page, who’d by now given the problem some thought, replied that he thought Google could do it in six.
> In just over a decade, after making deals with Michigan, Harvard, Stanford, Oxford, the New York Public Library, and dozens of other library systems, the company, outpacing Page’s prediction, had scanned about 25 million books. It cost them an estimated $400 million. It was a feat not just of technology but of logistics.
> At its peak, the project involved about 50 full-time software engineers. They developed optical character-recognition software for turning raw images into text; they wrote de-warping and color-correction and contrast-adjustment routines to make the images easier to process; they developed algorithms to detect illustrations and diagrams in books, to extract page numbers, to turn footnotes into real citations, and, per Brin and Page’s early research, to rank books by relevance.
Doesn't that take you back to an optimistic time when Google was exciting, and we thought that it could do amazing amounts of good for the world? I miss that era.
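As a rough sanity check on the figures quoted above, here is a minimal back-of-the-envelope sketch. The 40 minutes per book, 7 million volumes, 25 million books, and $400 million come from the quoted article; the work schedule (8 hours a day, 250 days a year) and the idea that every book scans at the metronome pace are my own assumptions, and the function names are purely illustrative.

```python
import math

# Figures taken from the quoted article.
MINUTES_PER_BOOK = 40
MICHIGAN_VOLUMES = 7_000_000
BOOKS_SCANNED = 25_000_000
ESTIMATED_COST_USD = 400_000_000


def nonstop_scanner_years(volumes, minutes_per_book=MINUTES_PER_BOOK):
    """Years one operator would need, scanning 24 hours a day with no breaks."""
    minutes_per_year = 60 * 24 * 365
    return volumes * minutes_per_book / minutes_per_year


def operators_needed(volumes, target_years, hours_per_day=8, days_per_year=250,
                     minutes_per_book=MINUTES_PER_BOOK):
    """Operators needed to finish in target_years (schedule is an assumption)."""
    working_minutes = target_years * hours_per_day * 60 * days_per_year
    return math.ceil(volumes * minutes_per_book / working_minutes)


def cost_per_book(total_cost=ESTIMATED_COST_USD, books=BOOKS_SCANNED):
    """Average cost per scanned book, using the article's estimates."""
    return total_cost / books


if __name__ == "__main__":
    print(f"One nonstop operator: {nonstop_scanner_years(MICHIGAN_VOLUMES):.0f} years")
    print(f"Operators for a six-year run: {operators_needed(MICHIGAN_VOLUMES, 6)}")
    print(f"Average cost per book: ${cost_per_book():.2f}")
```

Under those assumptions, Page's six-year figure looks like a staffing and logistics problem rather than a technology miracle, which fits the article's own framing.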
I’m frankly not very impressed by Google’s scanning process, OCR, image detection, de-warping, contrast adjustment, general QA/QC, etc.
Quality is variable, peak quality is mediocre, and the results are largely useless for anything but text.
Like a lot of other parts of Google, it’s a case where they tried to cheap out on trained human labor, and make it up with algorithms.
On the upside, even a mediocre book scan is better than nothing.
The "deal" resulted in anyone being able to download the full books if they go to a university library which is a partner of Hathi Trust. Of course, after that download, you have the full PDF and you can do whatever you want!
This isn't protection for copyright holders. (Hathi Trust got its books from Google Books scans and doesn't pay copyright holders.) This "deal" isn't helping anyone and it hurts researchers.
Anyone can go to a partner library (there's a list on the login page of the Hathi site) and download those books as PDFs. Try it! I've done it many times.
The Library of Congress is a Hathi Trust partner! So if you go get that card, you can download all of the out-of-print books that Google scanned on your own computer. No copyright holders are getting paid (and no one is being harmed), so why all these barriers in-between?
If I ever end up visiting DC again, I'm definitely going to do this. I have a new bucket list item!
Note that a Library of Congress card expires every 2 years as I recall.
But it seems to be the reason to require the rigamarole with library cards.
Do you have to actually go there in person too, or do you just get some sort of credentials (which no doubt some people would have already shared...)?
No one is trying to download Harry Potter from Google Books, lol!
No, it isn't. It is copyright infringement. Theft is something completely different. Different laws apply.
Usually people who call adblocking theft start squirming when these pre-digital examples of ad avoidance are put to them.
Every dollar a musician earns is, therefore, a dollar they take out of the hands of the musicians whose work they based theirs on.
This is why the public domain exists. You can make derivative works without paying anyone... But your work will fall into the public domain, so that you pass this benefit on to the next generation.
Anyways, you can't physically remove and deprive an owner of an idea. It doesn't fit the definition of theft at all.
But we're not going to have a nice discussion about it after an accusation of theft and piracy.
Something Aaron Swartz envisioned?
With friends like these, who needs enemies?
Google Books has failed to live up to its promise as the company has moved away from its original mission of organizing information for people.
Google was only about organizing all the world's information while search ads were an unlimited fountain of money. As Google's ability to generate money with search ads has dwindled, their grander (and not monetizable) projects have been either starved for resources or outright killed.
Sure, the lawsuit was a pain. And book publishers are turds for arguing that they still have rights to books that they won't publish ever again. But the courts found that there was nothing wrong with Google having the information. That trove of text could be the world's greatest source of knowledge, but as we all know, people using internet search for work never click on ads, and not enough of them are willing to pay a subscription price to cover the cost of infrastructure. Google hoped at one time that they would make money by printing on demand those books that were out of print but that people wanted, but that was shot down by short-sighted publishers and agents. Perhaps it will be taken up by Amazon, which has the resources to do it.
There was a time when all of Google's properties catered to the users. Their search engine was the best. Google News was the best aggregator site. YouTube recommendations used to be amazing, to the point you could spend hours following them.
Now Google Search, Google News, YouTube, etc. are all garbage. They don't serve the people. They serve corporate interests. You can thank media companies and the elites who pressured them for that.
Your previous comment attributed, without evidence, acts of malice to vaguely described but undefined third parties. That is the definition of "casting aspersions."
"Stating facts" would start with something like, "See this evidence that Google's policies were changed by <corporate entity> or <person or persons>."
Since you are doing the former, and not the latter, I conclude that the answer to your question is that yes, you are casting aspersions.
This reporter claims that she got YouTube to change its search list.
Should we believe her or is she lying?
"Google follows Facebook's lead and removes 39 YouTube channels linked to Iran"
These channels had been up for many years. Why do you think all of a sudden google decided to remove them?
Certainly it wasn't corporate, media, or elite pressure. So then who? Aliens? When Chinese or Russian social media companies remove content and change their policies, why do you think that is? Aliens as well?
After 10 years of spectacular success of youtube being "you"tube, why did it suddenly become "corporate"tube? Why did they change their recommends, trending, etc? Must be aliens. It can't possibly be the elites and the media constantly attacking it?
"Facebook and Google are doomed, George Soros says"
I often use Google to locate a book, then check Internet Archive and HathiTrust if it's old enough that it should be public domain under US law. I really appreciate HathiTrust putting in the effort to check copyright renewals and make more of their materials fully visible. I don't appreciate the technical barriers to downloads that they erect, but that's out of the hands of the developers working there. As long as their web viewer shows individual pages you can be sure there will be a way to reassemble full books.
It's a lot of rigamarole for information that researchers need.
Trying to search by date is very hard. Limiting search to "Glenn Miller October 1942" might return one or two relevant results, or it might not return any. Trying to search by issue date doesn't work at all.
They have an index of Billboard issues which allows you to go to individual issues and read them, but the index stops at 50 pages, and for a weekly magazine, that limits the index to only a handful of years. Using the index, you can't go directly to issues before the 1980s, and with search by issue date useless, that means you're just out of luck if you want to see a particular issue in the 1970s.
The requested URL /books/serial/ISSN:08888507?rview=1 was not found on this server
I have some of those sources still on my hard drive in their scanned PDF format. They've now effectively vanished from the open internet. So much, available for such a short window. Our children will never believe us when we tell them what was once right there at our fingertips, and those that do should never forgive us.
It would be nice if anyone could build such tools, but all of that data is locked up inside of places like Google Books and Hathi Trust. Google isn't even interested in making their metadata available, other than by running searches.
This is because Google Books is acting like they own the copyright (or, at least, they feel the need to police it.)
There are many cases where you can download the entire book from Hathi Trust when you are sitting at a university library, giving you a PDF you can use anywhere. But you cannot even see the entire book or download it from Google Books (which has given its scan to Hathi). This is just stupid.
Hathi Trust isn't paying the copyright holders, either, so who cares?
I get 14,515 results, with 3,115 of them full view.
There is also
which seems interesting!
After the institution logs in, you can download anything you want. Bring a flash drive or portable hard disk and take home your PDFs.
I'd recommend calling the partner library first to make sure someone there knows the Hathi login. It's not that popular a resource, sadly, and many people have no idea what it is. You may also find a friendly librarian who is willing to do the download for you and email you the PDF, saving an in-person visit.
One thing to note: You don't have to be a student to use the resources of university libraries. They're open to everyone.
Is it possible to dump the metadata of a book and check if they have the right date? There should probably be multiple dates for a book -- date written, date copyrighted, date published, date of latest edition, etc.
My guess is that Google does not have a publicly-available issue tracker for Google Books so you can't easily report this problem. Hacker News is a good way to get their attention, though...
The TL;DR is that Google Books started out with the goal of digitizing every book ever written. Publishers sued, so they crippled their search and display functions and handed over the full texts they already had to a group called Hathi Trust.
Hathi Trust is seriously crippled on purpose. It only allows access to full texts when you are sitting in a physical library of one of its partner universities. That's right... I can drive to a big university, sit in their library, and download a full PDF of any book I like. But if I'm at my house, I can only read one page at a time in a browser. This is ridiculous. Hathi Trust is helping the oil business more than they're helping researchers.
The marriage of Google Books content and Hathi Trust as a distribution platform is a joke. In some cases, you will even have to order an old book from interlibrary loan (see worldcat.org) if you can even get it -- when all the while Google has a scanned copy!
My grandfather wrote a book in the 1940s that’s been out of print since the mid-50s. Every entity associated with the book is dead, including the publisher, which merged into another in the late 50s and is probably an inactive imprint of some successor company. Grandpa died in 1985, and either a cousin or I am likely the heir to his rights.
I have a copy of the book, which I bought via Alibris from a bookstore in Wales 15 years ago. If you needed the book for research, you’d probably get it via inter-library loan from a university or a big city library. Whoever the publisher is, they don’t have it and aren’t selling it. In no scenario does anyone get paid for transacting, other than a reseller or the post office.
Your grandfather had a book, the rights of which should have been presumably passed down to you. My grandfather had a patented mining claim that has been passed down to me. Where the ownership of your grandfather's book is questionable, for me the physical corners of the property are questionable. A number of them are defined by things like a "4 foot spruce post" or a "12 inch diameter tree trunk" that haven't survived the ravages of time.
But it is important that I patrol my property at least every couple of years because of Adverse Possession. If someone else were to use my property continuously and I don't say anything against it, one day their trespassing suddenly and magically would become ownership. For a land-owner, it is a scary idea that someone can steal my property from me, as has actually happened. 
But I can acknowledge that it makes some sense. It comes from the idea that land is meant to be used, and if you aren't using it, maybe the person who is using it should get the rights.
If nobody can stand up for an intellectual property claim, perhaps some kind of adverse possession is in order.
- Real property (i.e. land) is a scarce and limited resource. If a party is making productive use of the land, they should hold title. (There is only so much arable land. If someone raises crops, let them.)
- Intellectual property (particularly copyright) is not a scarce or limited resource. (Create your own copyrightable work if you wish to own the rights.)
Much of real property theory arose from the assumption that the government should recognize and encourage the "highest and best use" of real property. Traditionally, the highest and best use of land is the use that can most profit from the land's resources; often mining, grazing, farming, logging.
This is problematic.
- This view justifies colonization, and taking land from original inhabitants who don't use the land to extract resource value.
- This view does not recognize preservation of an ecosystem as a valuable use.
- This view does not account for externalities from use of the land's resources.
I think it's worth noting that you can estimate the externalities and subtract them from the profit to achieve a more balanced justification. Unfortunately, unless you counted the loss of culture as an externality, you could still trivially justify taking land from those less productive or less technologically advanced than you.
Furthermore, even though I'm not personally supportive of taking land at the individual's loss, I do have to ask whether the removal could produce a net gain overall by improving many people's lives. Perhaps profit isn't the best measure of improvement to the collective, but it is at least indicative.
It isn't scarce for those seeking rent from it; it absolutely is for those seeking to use the works under copyright. In case of books, music, movies, games, etc., the works are not substitute goods. If I need a particular book for my research, there's a good chance I can't just take a different book instead. So there is acute scarcity involved for a subset of parties interested in a copyrighted work.
I think a simpler* solution would be to return to limited terms on copyright (say, 14 years), and require periodic renewal by the rights-holder to extend that term. As part of the renewal process, you'd either need to demonstrate active use of the copyright, or pay a fee (or both?).
* Simpler from a process point of view. I understand it's probably not simple politically.
This problem is impeding real research without helping anyone.
I'm speaking here of old, out-of-print books, where this is very much a problem with Google Books and Hathi Trust.
Copyright should automatically expire 10 or 15 years after the last printing IMHO. If nobody cares enough to put it up for sale or even make a tiny print run just to renew the copyright, why should the government continue to enforce it?
Of course this scheme falls apart a bit in the digital age, except that even ebooks get pulled from the shelves for no apparent reason. Maybe we should just go back to having to explicitly renew copyright after 15 years or so, with a fee just large enough to convince people to drop dormant works. Maybe a couple hundred bucks every 5 years.
The best part would be having some easily accessed online system where you could check the copyright status of any work, including current contact information for the rightsholders if you want to arrange payment.
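To make the renewal idea concrete, here is a minimal sketch of what a status lookup in such a system might compute. Everything in it is hypothetical: the `Work` record, its field names, and the exact numbers (a 15-year initial term, 5-year renewals, a $200 fee standing in for "a couple hundred bucks") are assumptions drawn loosely from the comment above, not from any real registry or law.

```python
from dataclasses import dataclass

# Parameters loosely based on the proposal above; the exact values are assumptions.
INITIAL_TERM_YEARS = 15
RENEWAL_PERIOD_YEARS = 5
RENEWAL_FEE_USD = 200


@dataclass
class Work:
    """Hypothetical registry record for one copyrighted work."""
    title: str
    published: int            # year of first publication
    renewals_paid: int = 0    # number of 5-year renewals paid so far
    contact: str = ""         # current rights-holder contact, if registered

    def protected_until(self) -> int:
        """First year in which the work is in the public domain."""
        return (self.published + INITIAL_TERM_YEARS
                + self.renewals_paid * RENEWAL_PERIOD_YEARS)

    def status(self, year: int) -> str:
        """Copyright status of the work in the given year."""
        return "in copyright" if year < self.protected_until() else "public domain"

    def fees_paid(self) -> int:
        """Total renewal fees the rights-holder has paid, in USD."""
        return self.renewals_paid * RENEWAL_FEE_USD


if __name__ == "__main__":
    dormant = Work("Out-of-print monograph", published=1945)
    renewed = Work("Still-selling novel", published=1945, renewals_paid=2,
                   contact="rights@example.com")
    for work in (dormant, renewed):
        print(work.title, work.protected_until(), work.status(1962), work.fees_paid())
```

The point of the sketch is that a dormant work lapses on its own, while a work someone still cares about stays protected for as long as the fee keeps getting paid, and the contact field answers the "who do I pay?" question.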
15 years is plenty to make a mountain of cash.
LibGen has no such problems.
Not sure there's an answer for a single search for a single individual though.
HN has a lot of users. There's often a pretty low overlap of commenters between different submissions.