Librarians in Florida went rogue to save 2,300 books from an algorithm (qz.com)
72 points by webmaven on Jan 11, 2017 | 59 comments

> He says his aim was actually to save the library money in the long run, by not having to repurchase books which often go in and out of fashion with readers. One of Finley’s choices, for instance, was John Steinbeck’s “Cannery Row.”

Any librarian worth their salt will tell you that every library should have a copy of Cannery Row. Any algorithm that says it should be discarded is simply wrong.

Circulation numbers can be very useful. But they cannot tell you what people will be looking for in the future. Common sense and experience are necessary to put the list in the proper context.

A good librarian can talk knowledgeably about thousands of books. They are community treasures that are not easily replaced by algorithms. They do a lot more than just put books on the shelf. Just as there will always be bartenders, there will always be librarians.

There is a culture war going on in libraries. The old guard -- book reading, book loving librarians -- are being replaced, especially at the top, by hip "Library Scientists" who want to push ebooks, internet access terminals and even rock concerts in libraries.

Ultimately there is a balance to be struck. Budgets are tight and libraries are evolving. But if libraries are going to remain useful and relevant, they will need to provide both internet access AND copies of "Cannery Row".

That said, the guy in the article did a bad thing and he'll likely lose his job at minimum.

This effectively boils down to whether libraries should serve what people actually read/do - or what they "should" read. Pretty much all the comments from this article fall on one of the two sides with each person talking past each other.

I don't think the issue is the difference between what people "should" read and what they do read, but between what people read now and what they're likely to read in the future (on the scale of years or decades). Collective tastes of the public ebb and flow over time.

The trick would be telling the difference between actually needing the book later and just hoarding something that will never be needed again... or at least, something that costs more to keep than it would cost to dispose of it and buy it again later if it does come back into fashion.

Optimizing for future use would generally lead you to purging most older books altogether, as newly released books generally have higher readership. With few exceptions, it's unlikely for popular taste to switch to a classic book en masse at the same time.

The only exception is if you knew a school were to assign the book as required reading to an entire grade - but that's really a rare and special case.

FWIW I think there is room for curated "should read" sections in a library. Optimizing purely for use is more the purview of a bookstore. But to support that the bulk of the shelf space should be serving the broadest use possible.

I had in mind cases where a book might surge from being checked out once a year to being checked out once a month, maybe something like a historical book about a specific Native American tribe when they're in the news for a while, or something.

It seems like one of those "big data things", looking for odd relationships between things, like vampires being more popular during Democratic presidencies and zombies being more popular during Republican ones (as a tongue-in-cheek example: http://www.mrscienceshow.com/2009/05/correlation-of-week-zom...)

The librarian in the article cited a future need to re-buy a book that was previously culled. I guess what I'm trying to work toward would be an idea of finding non-obvious signs of when certain categories of books might become more and less popular, what patterns (if any) govern that schedule, and from that, make predictions of expected cost in keeping certain books around, as opposed to selling them off immediately. I'm playing around with the idea of whether there's a better way than "current readership + age of book" to predict, to a useful degree, the probability of it being more or less popular in the future.
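To make the keep-versus-rebuy comparison concrete, here is a minimal sketch. Every name and number in it is a hypothetical assumption (a per-year shelf cost, a guessed probability that demand returns, a repurchase price); nothing here comes from the article or from any real library system.

```python
def expected_cost_of_keeping(shelf_cost_per_year: float, years: int) -> float:
    """Cost of holding one copy on the shelf for the whole horizon."""
    return shelf_cost_per_year * years


def expected_cost_of_culling(p_demand_returns: float,
                             repurchase_price: float,
                             resale_value: float = 0.0) -> float:
    """Expected cost of discarding now and rebuying only if demand returns.

    p_demand_returns is an assumed probability (0..1) that the library
    will want the title again within the horizon.
    """
    return p_demand_returns * repurchase_price - resale_value


def cheaper_to_keep(shelf_cost_per_year: float, years: int,
                    p_demand_returns: float, repurchase_price: float,
                    resale_value: float = 0.0) -> bool:
    """True when keeping the copy costs no more than culling and rebuying."""
    keep = expected_cost_of_keeping(shelf_cost_per_year, years)
    cull = expected_cost_of_culling(p_demand_returns, repurchase_price,
                                    resale_value)
    return keep <= cull


# Hypothetical title: cheap to shelve, likely to be wanted again.
likely_comeback = cheaper_to_keep(0.50, 10, p_demand_returns=0.9,
                                  repurchase_price=15.0)
# Hypothetical title: same shelf cost, unlikely to be wanted again.
unlikely_comeback = cheaper_to_keep(0.50, 10, p_demand_returns=0.1,
                                    repurchase_price=15.0)
```

The hard part is the one this sketch takes as an input: estimating the probability that demand returns is exactly the prediction problem described above.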

There's more than just the merit of the title to consider, however. Condition is an ignored part of many library collection policies. Books that look old and dirty are a turn off for users. I've said over and over again, "If a book is worth buying, it's worth buying again." If in doubt, throw it out. Buy a new one.

Curious why you think Cannery Row is such an important novel?

Well, it's no Of Mice and Men, but:

- Steinbeck is a major American writer (Pulitzer Prize, Nobel Prize). I feel like any library should have a complete collection of his works, if possible.

- He wrote a sequel called Sweet Thursday. If you're going to keep a copy of that, you'd best keep Cannery Row too.

- It's really thin. Like maybe 1/6 of an inch. Won't take up much space.

- It's often assigned in schools.

To be honest, I didn't like it very much. Now that I think about it, maybe the algorithm was on to something. :)

There's a lot of important works that aren't very likable, but that doesn't diminish their importance.

Maybe you hate A Tale of Two Cities or loathe The Handmaid's Tale, but it's important that books that make up the cultural foundation or provide relevant commentary on it are available.

If it was all about popularity, the library would be jammed full of nothing but garbage by John Grisham, Dan Brown and Stephenie Meyer. It would be high-fructose all the way.

Eat your vegetables. Digest difficult, disagreeable things.

Honestly, if you're running an algorithm based solely on circulation numbers without taking into account at least one "Greatest 100 books of all time" list then you're overweighting immediate observation and underweighting historical consensus.

... Besides, what's the point of having books on shelves if you can't browse and find something great?

So, I'm a librarian. Our library uses what's called the CREW method [1]. It does a good job explaining the reasons for weeding and the parameters.

It's important to mention that a well maintained collection gets used more than one that is not reviewed. The "younger" the average age of your collection, the more it gets used. This isn't just because people don't like old books, but they don't like books that are yellowing, grey, worn, irrelevant, etc, etc. I recognize that libraries have many roles to play, but it is not the place of every library to play every role. A community public library should check out books, not store them. (There's obviously exceptions here. For example, my library maintains a local history collection).

Ultimately, for a public library there is much more going on than books. Wireless Internet, meeting rooms and simply its characteristic as a physical place (chairs, tables, etc.) are all part of why people support a library and having a new, clean and well maintained collection is part of increasing the library's appeal across all those areas.

1: https://www.tsl.texas.gov/ld/pubs/crew/index.html

> A community public library should check out books, not store them. There's obviously exceptions here. For example, my library maintains a local history collection.

I could certainly see the value of each community library having some space allocated to different "specialty" sections. That would make for a pretty rich loan inventory.

Absolutely. I've been to libraries with good specialized collections. Most of these focus on local interests or history.

I've also been to many libraries that just feel like dumps and are not using space well. They have masses of cookbooks or old magazines that they call a special collection. There's a fetish surrounding "the book".

In the end this guy is arbitrarily deciding which books have value and which don't, which should ring an alarm bell for us. We might agree with what he saved, but we also don't know what he didn't feel was worth saving. That's why a systemic approach is so important.

> His creation, the fictional Charles Finley,

FYI, that's the alias of choice for Sam Axe, a freelance vigilante in the TV series "Burn Notice" (https://en.wikipedia.org/wiki/Sam_Axe#Sam_Axe_as_Chuck_Finle...)

Chuck Finley also pitched in the MLB for 16 years[1]. The weirdest part of the whole story to me was that he gave his fake ballplayer the name of an actual ballplayer.

[1] https://en.wikipedia.org/wiki/Chuck_Finley

I do not understand why he loaned them with a fake name. He could just have made the loans in his own name. If they questioned his high loan numbers, he could have told them that he was compiling a bibliography, or researching typography, or something, in his own free time.

He went through a lot of trouble to make what he did illegal.

I doubt he knew this was illegal. Charging someone with faking a library card is about the same level as shutting down a lemonade stand for failure to have a business license.

And yet no discussion of what books were not carried that otherwise could have been without this distortion in data. It's fine to talk about the merits of what was saved, but it is a false discussion if there isn't some inclusion of what opportunities may have been lost as well.

Sounds like if shelf-space is at a premium, free ebooks for everything over say, 50 years old should be the norm. (Broken copyright law is the likely impediment).

Sure, keep a few physical copies for popular classics but ebooks are clearly the solution for the long tail, and so much easier to carry.

I don't agree. You can't browse an ebook the same way you can a physical book, and while I very much appreciate Amazon's 'you might also like...' suggestions, I only rate them around a C+/B- grade. The great value of libraries and bookstores is discoverability as opposed to perfect organization (although that's also very valuable). I can think of quite a few books that I might not have encountered if some librarian or buyer hadn't bucked the prevailing taste; conversely, on the rare occasions when I go into a big bookstore now, I feel really alienated by the shallowness of the selection and the general lack of awareness among the bookstore staff about, well, anything.

I never encountered this attitude of 'books as product' until I came to the US. I remember going into a Borders and looking for some Dashiell Hammett books - I had been in San Francisco a few months, and Dashiell Hammett was a very famous mystery writer who came from and wrote about SF in the 1930s, so I wanted to learn more about the literary culture of the city I had moved to. The people in the store had never heard of him, didn't understand why interest in local authors would be a thing, and couldn't recommend any other bookstores because they were not really Book People; they just happened to work in a store that sold books but whose corporate culture required staff to refer to them as 'product'. This would be like going to a store that specialized in Computer Science books and having to deal with people who had never heard the names Turing, Von Neumann, or Knuth. Capitalism can be very corrosive of culture that way. Absent a reliable metric for quality, it can only rate popularity and novelty, and so discounts long-term relations in favor of short-term returns.

Amazon recommendations are often semantically on point but it will often recommend books that are merely derivative. 'Frequently bought together' is a great reminder to get some batteries or accessories for some electronic gizmo you bought, but that doesn't work so well for books, where reading one might alter your subsequent buying preferences onto a new path because you read the book and it changed your perspective in some fashion.

Admittedly a lot of my reading tends towards the obscure, so I can't blame Amazon for doing what they do if it works for the vast majority of their consumers. It's great when I want something very specific and I'm a happy regular customer, but I could happily spend hours in a second hand bookstore.

Sorry, we're talking about libraries (right?), many of which carry ebooks, and have non-infinite shelf space. I was also trying to make the point that anything over X years should be public domain.

Sure physical books have their benefits (as do ebooks), however I'd much rather have an ebook than nothing, because the book was thrown in the garbage due to lack of space.

And, not one comment here or in the story about associating book title viewing activity with a library account with personally identifiable information.

They could have used book activity records, I think.

I guess stripping personal information was too much work.

Book culling reminds me of:


Canticle for Leibowitz

Fahrenheit 451

The Name of the Rose

At the same time, new books keep coming out: a library either stops buying new books, continually expands so it always has more room, or disposes of some old books. I think an algorithm is a good guide, but I would trust the librarians to be able to veto its recommendations and keep some books that are not read often.

I know how it's played out at my local library. The shelves are significantly less full and there are fewer of them. What did they cull? Among the books was Richard Rhodes' Making of the Atomic Bomb, which won a Pulitzer. Elsewhere though, there's eight shelf feet of Orson Scott Card and twelve of Danielle Steel.

So they got rid of niche books that nobody borrows and replaced them with one of the best selling authors alive and one of the top SciFi authors? Sorry but that seems completely working as intended.

Libraries are for everyone. Not just people with niche academic tastes living in their ivory towers. Entertaining books and regular readers are literally what keeps libraries open.

> So they got rid of niche books that nobody borrows and replaced them with one of the best selling authors alive and one of the top SciFi authors? Sorry but that seems completely working as intended.

That's myopic.

It's far more valuable to have a place where you can find less popular, harder to find items than to have yet another place full of the popular stuff. That's not to say libraries shouldn't have popular stuff but that they should have a bias towards other things. If you want to read Orson Scott Card, you can find his stuff easily, not so much so for the stuff they're culling.

There are different types of libraries. At one extreme, we have archives, with an explicit goal of having a copy of every book and never throwing anything out; in the middle we have research libraries; and at the other extreme we have community public libraries, with a goal of bringing written material to the masses in whatever form the masses are going to consume it.

It's absolutely valuable to have a place to find obscure books... but a community public library isn't that place, and isn't intended to be that place.

> It's absolutely valuable to have a place to find obscure books... but a community public library isn't that place, and isn't intended to be that place.

The Making of the Atomic Bomb is a classic, award-winning and relevant book. It's exactly the sort of thing a good community library should have, even if it circulates less than Danielle Steel. There is a big difference between what's important and what's popular.

Those Danielle Steel novels might provide some entertainment and comfort to people (nothing wrong with that). But The Making of the Atomic Bomb could launch a career or a movement, and possibly even -- we can only hope -- save the planet from destruction.

Truly obscure and unimportant books should of course be culled from circulation in a small community library. Their place is in research libraries, etc. But community libraries serve many purposes, the most important of which, in my mind, is to educate and facilitate discovery and personal growth.

Sometimes you have to give people what they don't know they need yet, not just what they think they want right now.

“A library outranks any other one thing a community can do to benefit its people. It is a never failing spring in the desert.” -- Andrew Carnegie

I don't think he was talking about romance novels.

But there's an arbitrary decision being made here. We don't know what this guy didn't decide to save. I don't necessarily agree that an algorithm is good, but there does need to be a systemic approach; otherwise you have a reflection of what one individual thinks is important, and that's no better.

In your model, how do you expect anyone without a current academic affiliation to get hold of obscure books?

(Interlibrary loan is a significant partial answer here, but as soon as you take ILL into your model you realize that you want to curate the set of all books owned by all libraries collectively. If you locally optimize each library individually then you get too many copies of the common books and too few of the others.)

Optimization of all libraries is a losing battle. Better to have lots of well used libraries, which then have budgets for larger collections and more collaboration. Part of a well used library is community support and a collection that serves the community's needs. This leads to bigger budgets, and so on.

A public library is there to serve its community. Participating in ILL is part of that mission, so reciprocating is also necessary. But the library has to optimize for its customers first and others second.

Interlibrary Loan aside, if you live anywhere near a university with an academic library, you can usually buy a membership even if you aren't otherwise affiliated with the university. Locally, I know both UNC and Duke offer programs of that nature. I have previously kept up a UNC library card so I can use their math/CS library. Which reminds me, I should probably go renew that... I haven't been over there in a while.

Also, used bookstores, and Internet retailers like Amazon, Ebay, etc. Not all books that are "obscure" are necessarily expensive in the "rare, hard to find, valuable" sense (although to be fair, some obviously are).

Also, don't forget that community colleges also have libraries, and they're probably a lot more lax than large universities. For instance, one local community college even partnered with the city to create an integrated community-college library.

Yeah, the algorithm probably needs adjustment and flexibility.

As long as a book is available within reasonable time via inter-library loan, the library system hasn't lost the book. The algorithm (or more generally, "the practice") probably needs a "consolidate" outcome in addition to "drop."
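As a rough illustration of what a three-way outcome could look like, here is a minimal sketch. The thresholds, the `copies_elsewhere_in_system` input, and the meaning of "consolidate" (move the copy to a shared location so the system keeps at least one reachable copy) are all assumptions made for the example, not anything described in the article.

```python
from enum import Enum


class Outcome(Enum):
    KEEP = "keep"
    CONSOLIDATE = "consolidate"  # move to shared storage, reachable by ILL
    DROP = "drop"


def weeding_outcome(local_checkouts_per_year: int,
                    copies_elsewhere_in_system: int,
                    min_checkouts: int = 1,
                    min_system_copies: int = 2) -> Outcome:
    """Decide what to do with one local copy (hypothetical rule)."""
    # Still circulating locally: leave it on the shelf.
    if local_checkouts_per_year >= min_checkouts:
        return Outcome.KEEP
    # Not circulating, but plenty of other copies exist in the system.
    if copies_elsewhere_in_system >= min_system_copies:
        return Outcome.DROP
    # Not circulating and scarce: keep one copy in the system, off the shelf.
    return Outcome.CONSOLIDATE
```

Under this rule a title with no local checkouts and only one other copy in the system would be consolidated rather than dropped, which is the case a pure "drop" rule gets wrong.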

> As long as a book is available within reasonable time via inter-library loan, the library system hasn't lost the book.

That's not entirely true, as there is a significant difference between having a book browsable on the shelf and having it borrowable with substantial latency via ILL, so there is a loss even if there is still some access available.

Inter-library loan is a distributed peer to peer system. There's no reason to believe that other libraries won't cull the same book...and culling from one library could be considered empirical evidence that it might be culled from other libraries.

Have you considered cold storage?

Every library weeds books.

All those books are fairly recent. What if libraries took your advice in the late 50s, quit culling, and said "Oh that new sci-fi thing is a crap fad, we only stock the classics." You probably wouldn't even know about these books existing. Maybe a token Asimov book or two, space permitting. Not the hippie stuff you posted (from the perspective of some old librarian in the 50s).

While I can certainly sympathize with the librarian in this story and agree that human judgement should trump an algorithm in what books a library stocks, the method used here seems inappropriate and counterproductive in the long term.

I agree. Library stocks really do need to be managed (nobody claimed any one library has to be "complete"), and circulation numbers are the strongest signal that a book is valuable. Working around that to "save" books when budgets are limited is actually zero-sum and will have consequences elsewhere in the system.

Circulation numbers look at recent circulation, with no consideration of the way certain books may change in popularity. Here we have a domain expert without financial motive who is saying that the algorithm is inefficient.

Algorithms should inform book culling decisions made by the Librarian, but they shouldn't replace the Librarian's knowledge and expertise.

I don't use our local library system much because they stock few or no copies of the things I'm interested in getting from them, like non-fic outside the poppiest of pop-non-fic—not even hardcore academic stuff necessarily, just not-garbage non-fic—math/computing books, and semi-obscure classics or translations. It's usually a waste of time to even check for those sorts of things there, and they don't even seem to be part of an inter-library loan program that can provide them.

How can circulation numbers take into account people who aren't using the library because it already doesn't have anything they want? I can't be alone in this, but simply looking at circulation numbers won't capture that, and piloting with a book or two from those categories won't get me back in (I probably won't notice because, after many disappointments, I've mostly stopped looking). Let an algorithm go nuts on that data without human judgement and I suspect you'll gradually put more and more people in my position, unless it's a damned smart one.

Any chance they have an interlibrary loan system? I can get all sorts of books from a variety of library systems (including the main state university libraries) delivered to my local town branch.

Interlibrary loan is great, not least because it's given us WorldCat, but it's no excuse for having a crappy collection. Libraries should of course add new things constantly, but the library is not just a lending club for poor people; it's meant to function as a repository of renewable intellectual content, and provide people with the opportunity to encounter ideas they might not otherwise.

We do a few things:

1. Regular strategic planning which includes a community survey. This is conducted by an outside firm and non-users are targeted as part of the survey. We also do focus groups.

2. Interlibrary loan. Your local library can likely get titles from other libraries. This should inform the collection.

3. Patron driven acquisition. Some way of suggesting books should be available. We buy most suggestions and find that books that are suggested circ better than ones selected by staff (generally, anyway).

> and circulation numbers are the strongest signal that a book is valuable

I disagree. A book might go out once and change someone's life. Another might go out a hundred times and sit unread, or have a negligible impact. An algorithm can't determine this. A good librarian can.

> circulation numbers are the strongest signal that a book is valuable

LOL no, by that logic The Da Vinci Code or Harry Potter books would be literary landmarks. Now, I enjoy a good yarn as much as anyone (and Dan Brown and JK Rowling write great yarns), but it's not like books and the book selling industry are all a giant intellectual meritocracy; marketing and commercial leverage have a lot to do with it. Certainly popularity is a useful signal, but things can be popular because they appeal to a lowest common denominator rather than because they're of great quality.

Let's take JK Rowling's Harry Potter works as an example. They're excellently crafted in terms of narrative pacing, clear characterization, and engaging storytelling. I enjoyed them hugely when I read them a few years ago, and happily spent a week or 10 days immersed in the wizarding world. Anyone who likes reading can enjoy them, and aspiring writers can learn plenty from studying them. At the same time, they're very formulaic and recycle lots of common fictional tropes; indeed, that's partly why they're worthy of study: their very shallowness makes it easier to spot the structural features. But we would be making a mistake if we took those excellent qualities as the standard of value, because the popularity of Harry Potter and its many imitators in the marketplace would have a tendency to crowd out books that excel in other areas where works of popular fiction are deficient.

It's fine for publishers and bookstores to pursue the public taste wherever it leads, but the function of the library is to provide more rounded and balanced offerings whose value will span generations rather than only focusing on what's most popular. There's a lovely little library a few blocks from me that I thought I would spend one day a week in when I first saw it, but I hardly ever go there because the selection is abysmal. It's full of multiple copies of things that used to be popular, and woefully lacking in good things that were never very popular. A library's primary purpose is to be full of good books so that even if you can't find a particular work you hoped to read today (perish the thought that one might have to wait for anything...), you will have many high quality alternatives.

This is only possible if you allow librarians to do their job as curators. They should not be required to distil years of complex and highly individual literary and organizational knowledge into the sort of simplistic metrics that appeal to professional administrators. Of course, part of the problem is that some people see the intellectual and cultural diversity of libraries as a cultural threat that has the potential to undermine the status quo, and thus attack the function and independence of publicly-funded libraries in an effort to curtail rather than curate the spread of knowledge.

Well if you take people's autonomy away they're going to find other routes to the same objective. Librarians aren't shelf-stocking robots, but the law as written devalues their professional contribution in favor of a foolish consistency.

> and agree that human judgement should trump an algorithm in what books a library stocks

Why? People are much more biased than machines. If this algorithm is well written and unbiased then we may have a case of someone saving books that speak to their biases. What if it was revealed it was books about young earth, anti-vaccination, promotion of Islamic extremism, and white supremacy? Would you be so welcoming of these rebel librarians?

The nice part of the algorithm is that, in theory, it should just be looking at circulation numbers and culling appropriately. Regardless, this is all a stop-gap measure until we can get every library digital so we don't have to worry about shelf space.

There's no such thing as an unbiased algorithm. Machines just amplify the biases of their users. Every choice reflects implicit values and assumptions. More so when you're trying to develop an objective function around the unknown preferences of future library users. What is popular today may not be tomorrow.

And you should really be asking yourself whether circulation numbers are a good proxy for the real value that people derive from the collection. My mother used to borrow ten books at a time and barely read any of them. Building a rigorous utility-maximizing model here is impossible. Human judgement is required.

A library is expected to have a social value beyond just supplying books; this gives it an intrinsic bias by design. And if a library is intended to be comprehensive, some books will always be less popular. Librarians are trained to make those kinds of distinctions.

I agree that this is certainly a concern; everyone has a bias and we should be aware of it. On the other hand, this bias isn't always bad. A librarian's bias presumably comes from reading hundreds of books and years of education. If they think an unpopular book is worth saving, presumably it is because the book has a high value for the few people who do read it.

It also comes down to trust in the librarian; presumably in the interview process you try to make sure you are not hiring an outspoken Nazi. If you do hire a Nazi librarian, they have many other ways of injecting their bias by inviting speakers, recommending books etc., so it is still going to be an issue even if they aren't picking what books you have.

An algorithm that only takes into account popularity and not reviews nor importance of the work to literature or to its technical field contains its own biases.

Says who? It could cross-reference a list of important cultural works and make them exempt, for example. That list can be compiled by a committee instead of a lone guy who may have an agenda.
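For what it's worth, the cross-referencing step itself is simple. Below is a minimal sketch, assuming circulation is available as a title-to-checkouts mapping and the committee's list is a plain collection of titles; the function name, the threshold, and the sample data are all hypothetical.

```python
def weeding_candidates(circulation: dict[str, int],
                       exempt_titles: list[str],
                       min_checkouts: int = 1) -> list[str]:
    """Return low-circulation titles that are not on the exemption list.

    circulation maps title -> checkouts in the review period.
    exempt_titles is the (assumed) committee-compiled list of protected works.
    """
    # Compare case-insensitively so "cannery row" protects "Cannery Row".
    exempt = {title.casefold() for title in exempt_titles}
    return sorted(
        title
        for title, checkouts in circulation.items()
        if checkouts < min_checkouts and title.casefold() not in exempt
    )


# Made-up sample data for illustration only.
sample_circulation = {
    "Cannery Row": 0,
    "Old Travel Guide": 0,
    "Popular Thriller": 12,
}
candidates = weeding_candidates(sample_circulation, ["cannery row"])
```

Matching on bare titles is a simplification; a real catalog would more plausibly match on an identifier such as ISBN, and who compiles the list remains the actual point of disagreement.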

But the whole point of librarians is to allow that to happen, rather than thinking there's one perfect list of books that everyone should just copy from. For sure you would see lots of bias - which you could then measure and challenge, but which could also turn out to be valuable.

It's odd to me that you can't see the inherent bias in saying that popularity is a proxy for literary value and designing your algorithm around that. Sure, it'll give you an unbiased insight into what's popular, but that assumes that libraries are in the business of catering to the popular taste, when we have a bustling private sector to do that for us already, which takes account of the fact that a great deal of the publishing industry's output is disposable and of only short-term interest. I mean, McDonald's is arguably the world's most popular restaurant chain, but that doesn't mean their food has significant gastronomic value, does it?

What is in the public interest, and what the public is currently interested in, are two wholly different things.

Explicitly favoring recent checkout counts over the other criteria is a bias. This is almost a tautology.

Choosing based on criteria you favor rather than equally valid or possibly more valid criteria others favor is a bias. It does not matter if you encode that bias verbally, in print, or in software. A bias is a bias is a bias.
