
What Ever Happened to Google Books? - jeo1234
http://www.newyorker.com/business/currency/what-ever-happened-to-google-books?intcid=mod-latest
======
chippy
This reminds me of people's attitudes towards reCAPTCHA - started by
researchers at Carnegie Mellon University. "Stop spam, read books."

Everyone was supportive of it when it was used towards the non profit
digitization of out of copyright books. So, started externally, Google ran
with it and continued and expanded the range of books. It remained good.
Apparently loads of books were digitized.

Then, possibly along with the change in this article - or along with the
perceptible change within Google where everything had to be business
accountable a few years back, reCAPTCHA started being used to digitize address
numbers from StreetView to improve Google's online mapping and geocoding offerings.
Nothing to do with books, nothing to do with improving the world.

Now reCAPTCHA is being used for image recognition and training (identify the
images with salads). Nothing to do with books, information or improving the
world - everything to do with Google's own offerings.

What's even sadder is that [http://captcha.net/](http://captcha.net/) still
states that it is being used to "help digitize books" but all the links go to
[https://www.google.com/recaptcha/intro/index.html](https://www.google.com/recaptcha/intro/index.html)
which have all but removed any public benefit wording. "Stop spam, read
books." was removed from Google's site in 2014.

~~~
blfr
How is reliable Google Maps and Street View not improving the world? I'm using
them daily and they're improving my life a lot.

~~~
blub
They are improving your life in exchange for information about you and the
places you visit. Their product is closed, the proprietary data is not
accessible outside their apps, or if they wish to license it.

OpenStreetMap is improving the world.

~~~
BogusIKnow
I try to use OpenStreetMap over and over again, b/c it has the better data.

But then their site and UX are so bad, I always cry when I use it for the lost
effort of so many people.

~~~
dublinben
The main OpenStreetMap.org site is really intended mainly for those
_contributing_ to the map. They don't make it clear enough that end users
should probably be using one of the many other sites based on their data.

~~~
BogusIKnow
Which is part of the problem. Suppose maps.google.com were only for reporting
errors and you needed to find another site that works.

~~~
dublinben
It's only a problem if you think the OSM project is trying to build a
replacement for maps.google.com. Since they're not, it's not a problem. Most
users of their free location database are not using it directly on the OSM.org
site, but through other apps and services. It is rightly the job of these 3rd
parties to deliver excellent user interfaces for the data.

------
jrochkind1
This article leaves out some important things.

Google making the project non-profit would not have saved them from the
lawsuit. The Authors Guild, separately, sued a non-profit partner in the
Google Books scanning project -- HathiTrust. [1]

That lawsuit was not resolved until 2012 -- when, without a settlement,
HathiTrust won on fair use.

The court decided that scanning books for searching was fair use. While the
court did not say the same for displaying full text -- what the OP wants -- it
is notable that the court's opinion was not primarily based on non-profit
status of the organization (as is common in U.S. fair use case law; the non-
profit factor has generally dwindled in court decision-making), but on
transformativeness.

The OP mentions "Others argued that the settlement could create a monopoly in
online, out-of-print books," but gives that opinion rather short shrift. This
was a very real concern -- what if Google's use really would be fair use? If
the court decided that, the opinion would apply as precedent to everyone. But
a settlement really does apply only to Google -- no one else even had access
to the terms of the settlement. Anyone else trying to do the same would risk
being subject to a decade-long lawsuit of their own.

The OP should ask, why didn't Google go to trial?

"If Google was, in truth, motivated by the highest ideals of service to the
public...." they should have gone to trial to establish the right for all. As
HathiTrust did.

The Google Books project still exists, they did not take it down because of
legal worries, even in the absence of the settlement. But it has indeed been
allowed to languish. While I'm sure the multi-year lawsuit contributed to this
-- Google starting ambitious projects and then allowing them to languish,
without improvement, without fulfilling their original promise, slowly
degrading and withering away -- is a pretty common Google practice even
without multi-year lawsuits.

[1]
[https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._HathiTr...](https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._HathiTrust)

~~~
magicalist
> _Google making the project non-profit would not have saved them from the
> lawsuit. The Authors Guild, separately, sued a non-profit partner in the
> Google Books scanning project -- HathiTrust._

> _That lawsuit was not resolved until 2012 -- when, without a settlement,
> HathiTrust won on fair use._

I think you're mixed up on some things. HathiTrust wasn't sued until _after_
the settlement was rejected, meaning Google was getting sued simultaneously.

It is good the original settlement was rejected, though I agree with the
author that a modification to the settlement, similar to what the Internet
Archive was asking for, was a much better solution than another decade of
court battles and no orphan works for another century.

> _The OP should ask, why didn't Google go to trial?_

Google did go and the district court ruled that scanning, searching and
snippets were fair use in 2013[1]. The case is still ongoing, the Second
Circuit heard arguments last December[2]

[1]
[https://www.publicknowledge.org/files/google%20summary%20jud...](https://www.publicknowledge.org/files/google%20summary%20judgment%20final.pdf)

[2] [https://www.eff.org/cases/authors-guild-v-google-part-ii-
fai...](https://www.eff.org/cases/authors-guild-v-google-part-ii-fair-use-
proceedings)

~~~
jrochkind1
ah, thanks!

------
espes
Shout out to the Internet Archive, a non-profit close to what the author
describes. Unfortunately it turns out book scanning is expensive and rights-
holders still don't like it.

[https://archive.org/details/texts&tab=about](https://archive.org/details/texts&tab=about)

~~~
yonran
Note that the Internet Archive’s founder Brewster Kahle was one of the
strongest critics of the creative class-action settlement which would have
allowed Google as a one-off to publish the out-of-print orphan works that it
scanned. If the lawsuit had gone to trial and Google prevailed on fair-use, it
would have set a precedent that would benefit the Internet Archive. But the
Internet Archive was vehemently opposed to any settlement that didn’t help
them.

Reference: Brewster’s op-ed [http://www.washingtonpost.com/wp-
dyn/content/article/2009/05...](http://www.washingtonpost.com/wp-
dyn/content/article/2009/05/18/AR2009051802637.html) and filings by Internet
Archive and Open Book Alliance [https://dockets.justia.com/docket/new-
york/nysdce/1:2005cv08...](https://dockets.justia.com/docket/new-
york/nysdce/1:2005cv08136/273913)

------
wyclif
I was an early and enthusiastic fan of Google Books. I often do research that
relies heavily on 17th-19th century English academic works, which is right in
the Google Books public domain sweet spot.

But something went awry—I'm not sure what—and the project was allowed to
languish by Google. The interface has been in maintenance mode for ages, with
no development going on. This leads to a lot of frustration: for instance, you
cannot share all of your saved and tagged books with another user, and sharing
a shelf or series of shelves is awkward and clunky. In terms of UX, it appears
to be abandonware.

On top of that, there's no way to know what's going on or who to talk to,
because users can't actually contact anyone at Google Books.

~~~
a3_nm
A related complaint is that, even for public domain books, downloading them is
painful: sometimes you need to fill a CAPTCHA, recently you often have to
create a Google Account just to download a free book
[http://uzy.me/qc](http://uzy.me/qc), and hunt for download options (the
system tries to make you read the book within Google Play instead). At some
point, also, the books' first page included boilerplate with requests for
attribution and non-commercial use (which I think is bogus for public domain
material, even if Google digitized it).

If Google Books really were about improving public access to books,
downloading public domain books would be frictionless, and bulk downloading
and mirroring would be encouraged.

~~~
walterbell
Google-scanned books are also available as a direct URL on the Internet
Archive, there's no need to use Google's multi-step access interface.

------
walterbell
In June 2015, the US Copyright Office issued their report with draft
legislation, "_Orphan Works and Mass Digitization_", after consultation with
creators, libraries and tech companies. This guidance can be used by the US
Congress to create laws permitting Google Books and other efforts to digitize
orphaned works.

[http://copyright.gov/orphan/](http://copyright.gov/orphan/)

~~~
walterbell
Here is a legal analysis of the Copyright Office's proposal to limit damages
for orphan works, in the context of TPP requirements for damages,
[https://www.washingtonpost.com/news/volokh-
conspiracy/wp/201...](https://www.washingtonpost.com/news/volokh-
conspiracy/wp/2015/09/03/in-a-dark-corner-of-the-trans-pacific-partnership-
lurks-some-pretty-nasty-copyright-law/)

_"How is this scheme, which provides for no damages (in some circumstances
at least), compliant with the TPP requirement that the US must provide for
either “pre-established damages,” or “additional damages”? No damages = no
damages, no? A court would (as I read the new statute) NOT be permitted to
award punitive, or exemplary, damages in orphan works cases – but the TPP
seems to require that._"

------
oneJob
Attempting to force Google to change course on this would at best result in an
empty gesture on Google's part with likely no follow up regarding the bigger
project. They may open up what is already scanned, but they're not likely to
resume the project under benevolent terms.

Employment used to last a lifetime. One could retire from Sears with decent
benefits if they were loyal. Today, many of us are contract employees or Uber
style "partners". This is the same phenomenon we see happening here, just on
the product side. We've been going down this road for a while now. You often
don't buy products that last a lifetime anymore. It's often necessary to buy a
whole new one rather than fix the one you already have because the replacement
parts aren't made available. So, it's happened to our workforce and our
products, now it is also happening to our companies. Always in the name of
profit. Fair weather friends.

Sometimes this is good. Sometimes it is bad. Almost always it is a false
choice.

The logic of capitalism insists on competition and specialization which are
often at odds with cooperation and leisure. Why should Google need to choose
between the bottom line and benefiting its community by sharing the work
they've already accomplished? In a word, competition.

To me, this is the greatest strength of open-source. It allows for cooperation
and facilitates transparency, the very foundations of community. Open-source
has done nothing to stymie specialization, one of the main thrusts of "The
Wealth of Nations". The other, self interest serving the whole, I think, has
been shown by history and OSS to be one possible, but not the sole, idea
regarding what motivations might facilitate a productive community.

So, back to Google Books. What happened to Google Books is exactly what one
should expect to have happened in a political-economy such as ours. A
different outcome would not necessarily have resulted from different decisions
by Google, but by different incentives and structures of a different economic
framework. Let's not be distracted by the red herring of this anecdote,
framing it as a one off, but instead look to this anecdote as a case study in
a much larger domain.

------
quink
Somewhat related: [http://www.theatlantic.com/technology/archive/2012/03/the-
mi...](http://www.theatlantic.com/technology/archive/2012/03/the-missing-20th-
century-how-copyright-protection-makes-books-vanish/255282/)

> Because of the strange distortions of copyright protection, there are twice
> as many newly published books available on Amazon from 1850 as there are
> from 1950.

Additionally, The Walt Disney Company is sure to get legislation passed before
2024 to extend copyright once more.

------
yannis
>The thrilling thing about Google Books, it seemed to me, was not just the
opportunity to read a line here or there; it was the possibility of exploring
the full text of millions of out-of-print books and periodicals that had no
real commercial value but nonetheless represented a treasure trove for the
public.

I had the same excitement as the author when Google Books came out. The
service has stagnated over the years. Reading snippets is such a frustrating
experience (you cannot even cut and paste the text). Even books that can be
bought as ebooks are not available in some countries. Many times it is quicker
to search on archive.org to find related books digitized by Microsoft.

We still have a long way to go where knowledge can be distributed at low cost
and in abundance ...

~~~
kuschku
Project Gutenberg has a lot of ebooks available for free, but only for
public-domain books.

~~~
fernly
Indeed Project Gutenberg[1] has for a number of years been putting out-of-
copyright books online. They take great pains[2] to make sure the books are no
longer in copyright.

Another very important difference is that PG books are proofread by human
beings and have a much, much lower rate of OCR errors than the bulk-scanned
Google books which are, often, abysmally, disgustingly full of blatant scan
errors, sometimes to the point of being unreadable.

[1]
[http://www.gutenberg.org/wiki/Main_Page](http://www.gutenberg.org/wiki/Main_Page)
[2] [http://www.gutenberg.org/wiki/Gutenberg:Copyright_How-
To](http://www.gutenberg.org/wiki/Gutenberg:Copyright_How-To)

------
njharman
> split fifty-fifty between authors and publishers

Why? If anything it should be split between copyright holders. Whoever they
are. Legally, author / publisher, are meaningless.

But really if we can't get it together as a culture to eliminate perpetual
copyright we should at least make a rule that if a work is not available
(print or online) for 5 years then it is deemed abandoned and no longer under
copyright. Available doesn't mean free.

Our cultural heritage is a shared resource. It is not right for it to be
locked away.

------
joesmo
So these books that the fight has been about are out of print and essentially
do not exist anymore. They do not make money for anyone. They do not
contribute to anyone or anything. For all intents and purposes, they might as
well not have existed at all. Google tries to make a library of these
nonexistent works so that they can once again benefit humanity and the
copyright holders (which is pretty much never the authors when it comes to
books) are upset because they're losing out on their $0 of profit. Yeah,
copyright law really works well in this country.

------
giancarlostoro
If they somehow could turn those books accessible by a "Google All Book
Access" type of service, that would in turn enhance your searching to include
all their scanned books it would be amazing. They would however have to
somehow figure out how to even make such a service affordable while still
keeping publishers 'happy'.

~~~
Synaesthesia
I think what's missing from this article is the intransigence of
publishers to give up anything, and the politicians will usually side with
them too with copyright law. There's a world of information out there which
could benefit mankind but it's all locked up for profit.

~~~
KayEss
Which might not be so bad if the copyright owners were actually profiting from
it, but for the most part we're talking about books that are out of print and
not earning their copyright owners anything -- it's just sheer intransigence.

~~~
WalterBright
And in many cases the copyright holder no longer even exists or cannot be
found.

------
ching_wow_ka
I can say pretty certainly that all the text they've gathered through the
Google Books project is in use in their language models and other AI models
for their search engine, speech recognition, etc.

They got what they wanted. I can't see what incentive they have as a business
to grant access to the books that justifies paying employees for it.

~~~
kylebgorman
Just my personal opinion, but when you have an indexed copy _of the whole web_,
a few million OCRed-but-not-corrected books from previous centuries added to
your LM are not going to improve 2015 speech recognition quality.

~~~
ching_wow_ka
How many words do you think the entire web, as crawled by Google, has?

~~~
CydeWeys
Way way more than a corpus of a few million published books, that's for sure.
Hell, there are individual message boards that have higher word count than
millions of books. Wikipedia arbitration cases (these aren't articles, but
rather, an esoteric back channel for handling disputes between users)
frequently reach novel-length.

The average quality is going to be lower, of course.

~~~
DanBC
There are hundreds of thousands of words on Wikipedia about en dash, em dash,
hyphen, and minus.

Here's one discussion of over ten thousand words:
[https://en.m.wikipedia.org/wiki/Wikipedia:Village_pump_(poli...](https://en.m.wikipedia.org/wiki/Wikipedia:Village_pump_\(policy\)/Archive_101#Hyphens_and_endashs)

The least interesting thing about Mexican American War is what type of dash
you use between Mexican and American. There are over twenty thousand words
about that dash on wiki meta.

15,000 words would be okay if at the end of it there was some kind of
consensus, or something that could be transferred to different articles.

The future people are going to have a skewed image of us if they think meta
wiki is representative.

------
marincounty
"If Google was, in truth, motivated by the highest ideals of service to the
public, then it should have declared the project a non-profit from the
beginning, thereby extinguishing any fears that the company wanted to somehow
make a profit from other people’s work."

I think Google might win over some critics if they resumed the project; set it
up as a non-profit, but not some slick non-profit that really doesn't help
anyone other than Google? The bylaws would be lawyer proof, and BOD proof. The
out of print (out of copyright) books would be available to anyone for free.

I was very excited about this project, and it did seem to just die?

I used to like and defend Google. Not so much in the last few years, with the
tracking, the plethora of ads, and the way they ruined YouTube, at least for
me. (Yeah, I didn't like the way they took it over. I don't like all the
advertisements. Plus, I still have embarrassing videos up there that I
literally can't get off, because of some kind of password screwup that is
beyond the helpful customers at the "Help Boards". See, Google employees can't
be bothered with trivial stuff like my videos. I asked, and was told to figure
it out.)

So Google, if you are listening, go back to your roots. Some people, including
myself, hold no loyalty to your company anymore. My sister uses Bing. I used
to tell her, you might like Google better. Those days are long gone. I'd tell
her about Duckduckgo, but it's just not quite there yet.

~~~
benbristow
> I don't like all the advertisements.

Google's an advertising company. Of course they were going to add adverts.
Even if the original founders still ran the site it'd probably eventually have
gotten adverts. Bandwidth isn't free, especially at that scale. You expect
Google to run it out of the love of their hearts after paying a huge sum to
buy it?

If you don't like the adverts, just use Adblock Plus/uBlock Origin and call it
a day.

~~~
LeoNatan25
The tone you have taken to answer, together with the last comment, make no
sense together. So you support the ad model, including the silly "out of the
love of their hearts" rhetoric, but conclude with installing an ad blocker.

~~~
roblabla
Well, it does. OP said it makes sense from YouTube's point of view to put in
ads, and was to be expected of them. He didn't support the ad model himself,
just claimed it made sense.

------
hanlec
> I have a simpler suggestion, nicknamed the Big Bang license. Congress should
> allow anyone with a scanned library to pay some price—say, a hundred and
> twenty-five million dollars—to gain a license, subject to any opt-outs,
> allowing them to make those scanned prints available to institutional or
> individual subscribers.

Wouldn't this be great? Many of these materials are not indexed and chances to
discover them are decreasing every day. Second, getting access to these
materials is for many almost impossible (out of print, not available in the
libraries, etc.)

------
tuxt
>But, of course, leaving things to Congress has become a synonym for doing
nothing, and, predictably, a full seven years after the court decision was
first announced, we’re still waiting.

Ha, never thought about that 7 years ago. :)

------
pervycreeper
My biggest pet peeve with Google Books is that too many books which are
presumably in the public domain have access to them restricted. Not sure if
this is oversight, or on purpose.

------
jay_kyburz
The author's Big Bang licence is a bit crazy. Why not just monetise Books in
exactly the same way they monetise Video in YouTube?

Surely the copyright law is quite similar.

~~~
o_nate
The law is probably similar, but the technological ability to restrict access
is not. YouTube allows you to stream only, not download. That makes it easier
to get license holders on board. (Yes, there are workarounds for that, but
most users won't bother to go to the trouble, and the quality of the download
would be degraded from a purchased copy.) With books, there's no easy way to
limit access like that while providing a usable service.

------
wedesoft
Reducing the copyright term to something more reasonable would help a lot,
too.

------
analognoise
Somebody could always start a non-profit and continue the work - we don't have
to leave it to Google and stay disappointed.

~~~
chrisbennet
The rights holders that sued Google would probably sue a non-profit as well.

~~~
jrochkind1
They did sue the non-profit HathiTrust (a coalition started by the U of
Michigan) -- HathiTrust won, after many years.

Now, HathiTrust was not generally making _full text_ of in-copyright works
available, but only search results. HathiTrust doesn't even provide snippets
of your search results, just page numbers, unfortunately -- mostly out of fear
of lawsuits.

But HathiTrust's court victory helped establish at least some fair use rights
for scanning of books -- more than Google itself did, or was interested in --
if the settlement had been accepted, no legal rights would have been
established for anyone. Only Google would have had permission from the
Authors Guild, endorsed by a court, giving them -- but nobody else -- freedom
from lawsuits from anyone else too.

~~~
magicalist
> _But HathiTrust's court victory helped establish at least some fair use
> rights for scanning of books -- more than Google itself did_

er... [https://www.eff.org/cases/authors-guild-v-google-part-ii-
fai...](https://www.eff.org/cases/authors-guild-v-google-part-ii-fair-use-
proceedings)

