Hacker News
Better World Books and the Internet Archive Unite to Preserve Millions of Books (against-the-grain.com)
389 points by l1n 11 days ago | 44 comments

Hmm... so, at first blush at least, this sounds like a pretty Big Deal, and a Good Thing. From the press release:

Better World Books, the world’s leading socially conscious online bookseller, is now owned by Better World Libraries, a mission-aligned, not-for-profit organization that is affiliated with longtime partner the Internet Archive. This groundbreaking partnership will allow both organizations to pursue their collective mission of making knowledge universally accessible to readers everywhere. This new relationship will provide additional resources and newfound synergies backed by a shared enthusiasm for advancing global literacy. Together, the two organizations are expanding the digital frontier of book preservation to ensure books are accessible to all for generations to come.

This new relationship will allow Better World Books to provide a steady stream of books to be digitized by the Internet Archive, thereby growing its digital holdings to millions of books. Libraries that work alongside Better World Books will now make a bigger impact than ever. Any book that does not yet exist in digital form will go into a pipeline for future digitization, preservation and access.

Sounds good to me. Of course, there's a big difference between issuing a press release and actually doing something. It'll be interesting to see how this plays out in reality.

Better World Books has in my experience been purposefully misleading about the structure and purpose of their business. They put book donation boxes everywhere that are covered in "save the children" style graphics and messages.

But the reality is, they are a for-profit company that very much enriched the owners while donating a very small percentage to charity. It's one thing to have a corporation that does social good, but their entire business is based on receiving donated books from people who think they are donating to charity. They also make deals with public libraries to sell their books online and take a big cut.

Which makes me suspicious of this new arrangement. So it is now a for-profit corporation owned by a non-profit?

Did the owners decide to reorganize or donate the whole business or something?

According to their website, they have donated $28,000,000 and 26,000,000 books. These seem to be separate activities, i.e. the money is straight-up cash given independently of the donation of books.

I'm not sure if that fits with your characterisation as "a very small percentage". They don't seem to publish profit numbers, making this somewhat unanswerable.

But having read the press release and website, I did not get the impression that this was necessarily a non-profit. There are dozens of "free shipping", "sale", "save", "bargain" claims, and barely any do-goodery (the name, plus those donation numbers at the very bottom). "Non-profit" is mentioned nowhere on their site, and they characterise themselves as a "social venture", a term that describes exactly what they are.

They may have cleaned up their act some. But this is what their "donation" boxes used to look like: https://www.alamy.com/book-and-clothing-donation-bins-in-a-s...

Notice they say, "Donate books here". But you aren't donating. That's a total lie. It's no more a donation than giving free merchandise to Walmart is "donating it" because Walmart gives to charity every year.

No one goes directly to their website. They sell primarily on third-party sites, so they probably decided to clean up the site a bit. I remember it being much more misleading in the past. I'm sure you could look on the Wayback Machine.

As for the large number of books given away, that is simply a cheap way for them to get rid of the books that are worthless. Otherwise, they would have to pay to have them hauled off.

And you'll also notice that they don't say they have "donated" $26m to charity. They say they have raised $26m for charity. Not sure what that means, but it sounds more like they are soliciting donations from others. Maybe they are providing matching funds in some cases or maybe they are just organizing donation programs.

Also, that number is the amount raised over the past 15+ years. Their sales are estimated at around $75m a year. So $26m lifetime is rather small.
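A quick back-of-the-envelope check of that ratio, using the commenter's figures plus two assumptions of ours: exactly 15 years, and a flat $75m in sales every year. Early-year revenue was probably lower and the period is "15+" years, so treat the result only as a rough order of magnitude:

```python
# Figures from the comment; the flat $75m/year over exactly 15 years is an assumption.
raised_lifetime = 26_000_000
annual_sales = 75_000_000
years = 15

lifetime_sales_estimate = annual_sales * years  # 1.125 billion under the assumption
share = raised_lifetime / lifetime_sales_estimate
print(f"{share:.1%}")  # prints 2.3%
```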

In the past, they prominently offered ‘free worldwide shipping’ but silently changed their prices according to the incoming IP. (This has now been fixed, possibly because they did ship orders placed in the US at the US-disclosed prices.)

As a prior BWB employee (with no equity or outcome from this sale) I'm happy to respond to some of these concerns.

"Very much enriched the owners" is quite a stretch. Especially considering pretty much every equity holder was wiped out ~5 years ago when they had to raise money to stay in business.

The CEO lives in a house in central Indiana that costs less than the average 1BR condo in San Francisco. Before that he lived in a townhome in suburban Atlanta. This business does not throw off cash and never has. At best they've made as much money in profit as they've donated, in cash, to literacy causes over the life of the business.

A "very small percentage" donated to charity is up for debate based on everyone's belief of what that means, but all charity payments were made as a percentage of net sales revenue, which was essentially the money that came in from any sale. The only cost that was subtracted out was marketplace fees when selling on Amazon, eBay, etc. The sales percentages paid back to the book sources (academic and public libraries) and to non-profit partners (on all sales from all sources) were paid out before shipping costs were even accounted for.
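A minimal sketch of the payout order described above, assuming hypothetical rates (10% to the sourcing library, 5% to non-profit partners; the comment gives no actual figures) and working in integer cents. What it illustrates is that the base is net sales revenue, i.e. the sale minus marketplace fees only, and that shipping is not subtracted before the payouts are computed:

```python
from dataclasses import dataclass

# Hypothetical percentages for illustration; the real rates are not given in the thread.
SOURCE_PERCENT = 10   # paid back to the library that supplied the book
PARTNER_PERCENT = 5   # paid to non-profit literacy partners

@dataclass
class Sale:
    gross_cents: int             # what the customer paid
    marketplace_fee_cents: int   # Amazon/eBay fee: the only cost subtracted first
    shipping_cost_cents: int     # not subtracted before the payouts

def payouts(sale: Sale) -> dict:
    # Net sales revenue: money in from the sale, minus marketplace fees only.
    net = sale.gross_cents - sale.marketplace_fee_cents
    to_source = net * SOURCE_PERCENT // 100
    to_partner = net * PARTNER_PERCENT // 100
    # Shipping is paid out of what is left after the payouts are fixed.
    remaining = net - to_source - to_partner - sale.shipping_cost_cents
    return {"source": to_source, "partner": to_partner, "remaining": remaining}
```

For a $20.00 sale with a $3.00 marketplace fee and $4.00 shipping, `payouts(Sale(2000, 300, 400))` gives 170 cents to the library and 85 cents to partners and leaves 1045 cents, because the percentages are taken from the 1700-cent net before shipping comes out.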

"Their entire business is based on receiving donated books from people who think they are donating to charity" is not true. Drop box books accounted for < 10% of all books sourced and even less than that of revenue since those books are typically the lowest quality stream (slightly better title mix than thrift purchases, but much higher logistics cost and risk of spoilage). The business is almost entirely dependent on public and academic library partnerships.

There were many, MANY monthly all hands meetings we sat in where the business lost money but wrote 6 or 7 figure checks to literacy partners. Donations were not a function of making money.

BWB was a great company full of people who truly cared about the mission. Some people did well, but nobody got rich off this business. If they hadn't moved most of the corporate activity out of Atlanta and up to Indiana, I would still have been happy to keep working there.

>The CEO lives in a house in central Indiana that costs less than the average 1BR condo in San Francisco.

So your evidence that one of the founders didn't make a lot of money is that he owns a million-dollar home in central Indiana? A $1m home in the Midwest is going to be very, very nice and probably an order of magnitude more expensive than the average home in the area ($148k in Indiana).

But that is good to hear that they really are donating money. Because their huge "Donate books here" boxes are very misleading.

Do you know anyone at Jenson Books, or who the owner might be? :) That's the biggest mystery from my entire time selling books on Amazon. They're far and away the most frustrating seller in the marketplace.

I truly despise them. They can't POSSIBLY make money on so much of their inventory without lower negotiated FBA rates from Amazon. It's not just a few like with BWB, but over 75% of their inventory seems entirely unreasonable to sell at their market price.

BWB is also the only online bookseller that seems incapable of delivering their orders consistently. I have given up trying to order from them.

You should try Thrift Books they also offer inconsistency as a service.

eBay is where I get my out-of-circulation library books for really cheap. I just picked up a never-loaned "Toward Artificial Sapience: Principles and Methods for Wise Systems" for $3.73. On Thriftbooks it's $156. Looking back, the seller was "BetterWorldBooks"!

I have ordered from Thriftbooks dozens of times. The only two things that happened was one package theft (obviously not their fault, but they re-sent the books for free, no questions asked), and one extra shipment of books I had not ordered (this was probably someone else's order that they sent me).

Interesting... I've bought probably two dozen books from them over the past couple of years (via their Amazon storefront) and I've never had a problem with them.

I've ordered quite a few from them (novels, maths textbooks, ...) and never had an order cancelled. :-/

Do they cancel your order or take forever to deliver?

In my experience they cancel the order after about 4-5 days. It's happened on several books!

Order cancellations here as well, after books have been sent out and returned as undeliverable. Have not ever had that happen with any other shipment to my address, so I suspect they have a recurring issue with label printing. Their customer support just ignored me when I brought that up (several times) and issued a refund without any explanation.

Yes, it's suspicious and I have questions. But it's also plausible that it's all legit.

People who donate books often overestimate what their books are worth. Selling the books and using the money to fund the library (which can buy books it actually wants) might be more efficient than paid staff handling a bunch of books they mostly don't want. What's the expense on doing the work themselves, and can outsourcing beat that?

Also, it seems like people who donate stuff have lots of other illusions about how much good they're really doing, and the people actually doing the work will entertain those illusions, but they have to deal with reality. Personally, I'm fine with anyone who takes stuff I don't want off my hands for free with no hassle.

Hmm. Especially since being a nonprofit is one of the things that supports a claim to being a "library" under the parts of the copyright code that give libraries special rights to copy things others can't (which is part of what the IA relies on)... now I'm suspicious too.

Maybe it's just due to the current sociopolitical environment we are living in but "socially conscious" makes me wonder if they'll decide which books are worth preserving and which are not.

Mek here from Internet Archive's OpenLibrary.org. We've thought carefully about this as well.

We feel it's critical that Internet Archive and Better World Books aren't the only organizations involved in determining which books should be preserved.

In late October, we launched a new program which allows anyone in the world to take control of our Library's shelves and contribute/sponsor the books which reflect their values: https://openlibrary.org/sponsorship

So far we've seen an influx of high-quality reference books come in (everything from African antiquity to microbial biology) that we almost certainly wouldn't have procured ourselves based solely on the data available from e.g. library holdings, Wikipedia citations, etc.

Having Better World Books as a partner has been incredibly helpful because we're able to calculate sponsorship prices quite accurately (because shipping prices are included), which was a big challenge with other APIs we previously used. Also importantly, Better World Books helps our patrons sponsor older books, which are often more at risk of disappearing; that is one (of many) useful heuristics when considering preservation.

Is the number of digital copies you can lend out limited by only how many physical books you have "dematerialized"? For example, if I shipped 10 used copies of a book to Open Library and you recorded you destroyed them, could you lend out 10 digital copies at a time?

This question appears in https://controlleddigitallending.org/faq. Specifically for our sponsorship program at this stage, we decided to limit eligible books to those we had no copies of (in the interest of maximizing preservation -- we're as much an archive as we are a library).

In general we're not thrilled about books being destroyed. There's always someone who can benefit from a book. Also, one never knows when a book may have to be scanned again because a digital copy could theoretically become lost, corrupt, or benefit from new tech (e.g. a palimpsest). These are all big reasons our digitization process is explicitly non-destructive.

http://openlibraries.online describes how we are working with other Library partners and including them in our model to help make works more broadly available.

None of this directly answers your question which is probably better answered by a lawyer. I'm just a fake librarian.

Thanks for taking the time to reply.

I love this. Individual self interest for the collective benefit.

Now, I wish there was something like this for translation and summarisation. I'm a fiend for renaissance/enlightenment lit that is in Latin or just hard to read -- but amazing and world changing.

I suggest updating that FAQ to make it clear that you also accept direct book donations and linking to:


Original source on the Internet Archive blog: https://blog.archive.org/2019/11/06/for-the-love-of-literacy...

I've had really good experiences with Better World Books. However, I do wonder how they source some of their stock. I've bought several books from them that have historical value and never should have left the Smithsonian.

Were they literally from the Smithsonian?

Either way, libraries/librarians deaccession books all the time that outsiders might think "that's so valuable, why'd you get rid of it?"

There is only so much budget/space to store stuff. You get rid of stuff that is no longer relevant to your mission, or lots of other libraries have so you don't need one too, or just isn't _as_ important to your users/mission as other stuff. No library has infinite space.

You also reclaim space if you have multiple copies of a work and the demand for the title in your collection is limited. So even if you've received a copy that was formerly from some library they may have kept a copy or two if they used to have a dozen in circulation.

Those are probably books from Academic Library partners, since they tend to have the rarest and most valuable stock.

Every book that comes in gets scanned and assigned to one of a few possible streams:
- List for sale across all markets
- Donation (too many of that title in inventory, desired by one of the specific literacy partners)
- Recycling (too many of that title in inventory, not desired by a literacy partner, or condition too poor)
- ARC Books

That last business line is the Antiquarian, Rare, and Collectible group. These books are diverted to a team of people whose sole job is to manually price these books and work with rare book dealers as well as some of the more high-end marketplaces to move them. This went for more than books as well, as sometimes there were interesting related pieces that came in the door.
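The routing above could be sketched roughly as follows. The order of the checks, the `MAX_COPIES_IN_INVENTORY` cap, and the boolean inputs are all hypothetical; the comment only lists the streams and the reasons a book lands in each:

```python
from enum import Enum

class Stream(Enum):
    LIST_FOR_SALE = "list for sale across all markets"
    DONATION = "donation"
    RECYCLING = "recycling"
    ARC = "antiquarian, rare, and collectible"

# Hypothetical cap; the real inventory rules are not described in the thread.
MAX_COPIES_IN_INVENTORY = 5

def route(copies_in_inventory: int, wanted_by_literacy_partner: bool,
          condition_too_poor: bool, flagged_rare: bool) -> Stream:
    if flagged_rare:
        return Stream.ARC  # handed to the team that prices rare books manually
    if condition_too_poor:
        return Stream.RECYCLING
    if copies_in_inventory >= MAX_COPIES_IN_INVENTORY:
        # Overstocked title: donate it if a literacy partner wants it, else recycle.
        return Stream.DONATION if wanted_by_literacy_partner else Stream.RECYCLING
    return Stream.LIST_FOR_SALE
```

Checking the rare-book flag first reflects the idea that a valuable item should reach the ARC team even when an ordinary copy in the same condition would be recycled; that ordering is our assumption.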

Also, fwiw, when I was there any book like this that sold for > $500 had a whole separate commission structure where at least half of the sale price would go to the group that sourced the book. So if an academic library sent a book from the early 1800s, and it sold for $15,000, the library would get $7,500 back.
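The high-value rule in that example works out as below. The $500 threshold and the $15,000 sale come from the comment; using exactly 50% (the comment says "at least half") and a single `standard_rate` for cheaper books are assumptions for illustration:

```python
# From the comment: sales over $500 use a separate structure returning at least half.
# Using exactly 50%, and one flat rate below the threshold, are assumptions.
HIGH_VALUE_THRESHOLD = 500
HIGH_VALUE_SHARE = 0.5

def source_payout(sale_price: float, standard_rate: float) -> float:
    """Amount returned to the library that sourced the book (hypothetical model)."""
    if sale_price > HIGH_VALUE_THRESHOLD:
        return sale_price * HIGH_VALUE_SHARE
    return sale_price * standard_rate  # ordinary rate; not stated in the thread

print(source_payout(15_000, standard_rate=0.10))  # 7500.0
```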

please do tell us more!

> Now libraries who deaccession to BWB can have even greater social impact, because the Internet Archive will acquire, digitize, lend, store and digitally preserve millions of books from BWB’s inventory over the next few years

Perhaps this is an obvious question with an obvious answer but how will this work in relation to copyrights? I know that many copyrights expire and sometimes the copyrights are forfeited and as a result many creative works from long ago end up in the public domain.

But what is the plan here to achieve this mission while respecting copyrights?

They appear to be treating this like a digital library. From the Wired article[1]:

> you can click the name of the book and see a two-page preview of the cited work, so long as the citation specifies a page number. You can also borrow a digital copy of the book, so long as no one else has checked it out, for two weeks—much the same way you'd borrow a book from your local library.
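The borrowing rule quoted above (one digital loan per owned copy, two weeks at a time) is the core of controlled digital lending. A toy model, which is our own sketch and not Internet Archive's actual implementation, also bears on the earlier question of whether ten physical copies would allow ten simultaneous loans: under an owned-to-loaned ratio, concurrent loans are capped at `copies_owned`. `CdlTitle` and its methods are hypothetical names:

```python
from datetime import date, timedelta

LOAN_PERIOD = timedelta(days=14)  # "for two weeks", per the quoted article

class CdlTitle:
    """Toy owned-to-loaned ledger for one title (hypothetical, not IA's system)."""

    def __init__(self, copies_owned: int):
        self.copies_owned = copies_owned
        self.due_dates = {}  # patron -> date the loan expires

    def _expire_loans(self, today: date) -> None:
        # Loans end automatically once their due date arrives.
        self.due_dates = {p: d for p, d in self.due_dates.items() if d > today}

    def borrow(self, patron: str, today: date) -> bool:
        self._expire_loans(today)
        if patron in self.due_dates:
            return False  # this patron already has it checked out
        if len(self.due_dates) >= self.copies_owned:
            return False  # every owned copy is already on loan
        self.due_dates[patron] = today + LOAN_PERIOD
        return True
```

With `CdlTitle(copies_owned=1)`, a loan made on November 1 blocks a second borrower until the first loan lapses on November 15; with `copies_owned=10`, the eleventh simultaneous request is refused.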

The two page preview should be defensible as fair use; this is less than what Google's archived books allow you to view, for example.

The legal details of the borrowing process are further documented here[2]:

> We have recently made available a small number of books (currently 61 books) published between 1923 and 1941 under a provision of US Copyright law that was written to permit libraries to copy and lend titles that are no longer subject to commercial exploitation, and selection is currently overseen by lawyers expert in US copyright law.[3]

It sounds like as long as a book is in the last twenty years of its copyright term and is out of print, it might well be fair game for archival and lending. I'm not sure about books more recent than that.

1. https://www.wired.com/story/internet-archive-wikipedia-more-...

2. https://blog.archive.org/2018/01/24/digital-books-on-archive...

3. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3049158

I think you have found a lot of the relevant law.

The relevant laws are definitely not _entirely_ clear-cut, especially in how they apply to making a digital copy like this. Which means they aren't entirely clear-cut against them either. But someone could certainly try suing them. Apparently IA is willing to be a bit risk-forward here.

But if they're only doing out-of-print books, there's less likely to be someone who _wants_ to sue them (a lot of old out-of-print books are basically "orphaned": there is no identifiable copyright owner, which is what a lot of the relevant laws are targeted at), and if someone did want to sue them, they'd have less chance of winning for an out-of-print book. (It's not entirely clear to me they're only doing out-of-print books?)

Don't forget that Google's ability to scan and offer previews of books as 'fair use' was the subject of a multi-year lawsuit too! Which ended in a settlement, not a court ruling...

The post on archive.org states:

> Any book that does not yet exist in digital form will go into a pipeline for future digitization, preservation and access.

My reading of this is that they will put it on a list of things to scan and make available once the lawyers decide it is ok to do it (or more likely scan it and wait to put it online until they think it is ok to do so).

Don't libraries have some kind of exemption?

These days, my mind immediately jumps to "what if you plugged all those books into GPT-2!?"

mek here from Internet Archive's openlibrary.org project. We've been in broad talks w/ folks like OpenAI about how the contents of texts may be used to power better discovery and to increase usefulness of books. Open Library is pretty far from GPT-2, but we do have fulltext search across ~3.5M books: http://openlibrary.org/search/inside

We're also an open source project [https://github.com/internetarchive/openlibrary] and happy to collaborate w/ folks on such projects. I'm personally very inspired by the Booklamp project (https://techcrunch.com/2014/07/25/apple-booklamp/): building a genome for every book and surfacing as much content as we can to help patrons discover citations, quotes, and other useful content which would inform their reading choices and otherwise be completely inaccessible behind a borrow.

If anyone is interested in helping us move the needle on such an effort, please do get in touch and we'd be glad to invite you to Open Library's slack channel.

The OCR text (and ABBYY output) for about 15 million books is already available today; see e.g. https://archive.org/advancedsearch.php?q=format:abbyy&fl=ide...
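For anyone wanting to pull that list programmatically, a small helper can build an `advancedsearch.php` query like the (truncated) one linked above. The parameter names `q`, `fl[]`, `rows`, `page`, and `output=json` are the ones commonly used with that endpoint; the field list and page size here are our own choices, not a reconstruction of the truncated URL:

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"

def advanced_search_url(query: str, fields: list[str], rows: int = 50, page: int = 1) -> str:
    """Build an advancedsearch.php URL that asks for JSON output."""
    params = [("q", query)]
    params += [("fl[]", f) for f in fields]  # one fl[] entry per field to return
    params += [("rows", str(rows)), ("page", str(page)), ("output", "json")]
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"  # urlencode percent-escapes ':' and '[]'

url = advanced_search_url("format:abbyy", ["identifier", "title"], rows=10)
```

Fetching the resulting URL (for example with `urllib.request.urlopen` and `json.load`) should return Solr-style JSON whose `response.docs` list holds one dict per item with the requested fields.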

The ABBYY files are raw output of OCR, but most "items" on the Archive also include the extracted plain text.

It's noisy but generally quite readable. A few heuristics would pass pretty clean copy to GPT-2.
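What "a few heuristics" might look like, as a minimal sketch: rejoin words hyphenated across line breaks, drop bare page numbers and lines that are mostly symbols, and collapse whitespace. These particular rules and the 50% letter threshold are our assumptions, not anything the Archive ships:

```python
import re

def clean_ocr_text(raw: str) -> str:
    """Illustrative heuristics for tidying OCR'd plain text before feeding it to a model."""
    # Rejoin words hyphenated across line breaks ("jum-\nped" -> "jumped").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        letters = sum(ch.isalpha() for ch in stripped)
        # Skip blank lines, bare page numbers, and lines that are mostly symbols.
        if not stripped or stripped.isdigit() or letters < 0.5 * len(stripped):
            continue
        kept.append(stripped)
    # Join the surviving lines into running prose and collapse repeated whitespace.
    return re.sub(r"\s+", " ", " ".join(kept)).strip()
```

The hyphen rule is deliberately blunt: a genuine compound such as "well-known" split at a line end becomes "wellknown", which is usually an acceptable trade-off for language-model training text.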

How is this different from this project? https://www.gutenberg.org/

It seems books that are so old they enter the public domain end up at Gutenberg in eBook format.

A friend of mine, Mike Crawford, died. He refused to put his works in a book and instead put them up on websites. Now nobody can maintain his website, but if he had made a book and submitted it to Gutenberg under Creative Commons, his work would live on past him. Many books are on websites for free, kept running by advertising or donations.

(1) Project Gutenberg offers fully transcribed books (not just OCR) that are also reformatted to be readable in the browser (or as EPUBs). The Internet Archive just stops at the scanning stage and provides page images.

(2) The Internet Archive apparently has some arrangement to "lend" books that are still under copyright, and most (if not all) of the books that BWB supplies to them are going to fall under that. But they do have a lot of stuff that's totally out of copyright too, with no such restriction. Even PG (as well as its sister site, Distributed Proofreaders) takes a lot of their "source" material from the IA.

Any alternative to Google's land grab of intellectual rights to the books they digitize (in a horrible quality to boot) can only be a good thing.

Doesn't Google Books push a lot of their PD content to the Internet Archive themselves? That doesn't seem like an undue "land grab", it seems like they're being good net citizens for once.
