
Ask HN: How can I automatically scan and catalog a mountain of books? - cconcepts
This really kind, eccentric guy in my neighbourhood is stockpiling books and has been doing so for years. He has an enormous barn that he is obsessively filling with whatever reasonable quality books he can get his hands on, but he is completely overwhelmed in terms of cataloging/indexing them, so customers have to go through his barn sifting through cartons full of books. He charges $1 or $2 for whatever book you dig out.

He buys bulk lots from deceased estates and bookstores that are closing down. Entire shipping containers are being gifted to him and showing up at his barn. The barn is full and he is now storing books in shipping containers outside.

There are great quality books among this quagmire but it takes hours of searching to find them. I figured HN might be able to point me to a solution where I could quickly photograph the front cover and have a script/Google Images compare the image to online info to index the title and author, and then perhaps list them online...

I dunno, it just seems like such a treasure trove of books that he will sell for practically nothing because he loves books and hopes that they will find their way to people who want them - the barrier is allowing customers to find what they are looking for.

Thoughts?
======
walterbell
Please, please tag each batch of books with a unique lot number, so they can
be associated with a specific estate or deceased bookstore. One or more humans
spent a lot of time curating those collections. If one of the lots was well
curated, then anyone who finds a book that they like will want to see the
other books in that lot.

Source: people who have spent years trying to find the names of the 8,000
books in R.A. Lafferty's personal library, lost after he died. About 300 title
names have been recovered, [https://www.ralafferty.org/tulsa-
books/](https://www.ralafferty.org/tulsa-books/)

~~~
kej
This would also be good when you inevitably find letters or photographs tucked
between pages.

------
pjc50
One of these days I need to write my essay titled "Rubbish has no SKU".

I've seen a few of these, and the basic minimum difference between "pulp
waiting to happen" and "bookshop" is basic shelving. Different shelves by
category: fiction vs non-fiction and their subdivisions. Within the shelves,
alphabetise. Now it's possible for browsers to actually find things. When you
put them on Amazon this will also help find them for shipping.

This process will also help you find the stacks of duplicates. You'll have a
crate of 50 Shades and Twilight and Stephen King. The Stephen King will
eventually resell; the others won't.

This page from the excellent Barter Books on their acceptance policy may be of
some help:
[https://www.barterbooks.co.uk/html/About%20Us/Incoming%20Boo...](https://www.barterbooks.co.uk/html/About%20Us/Incoming%20Books.php)

~~~
EnderViaAnsible
While I largely agree with your sentiment, I'd like to note that alphabetizing
only assists those who know what they are looking for.

In a bookstore, this is a virtue. But we appear to be dealing with a book
barn. Perhaps the patrons of book barn have not wandered in by accident while
searching for a bookstore :)

For OP: I think you might be better off photographing the ISBN and then using
a service or script to do a lookup and associate that with a cover and title.
Various editions might not have their covers recorded in a database, and
titles will give many dupes, but an ISBN will uniquely identify a book.

Additionally, OCR'ing stylized text is problematic. I wouldn't expect easy
reading of covers, particularly of used books.
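
As a rough sketch of the ISBN route: before a scanned or typed number is sent to any lookup service, it can be sanity-checked locally with the standard ISBN-10 and ISBN-13 check-digit rules. The function names below are illustrative only and do not come from any particular library.

```python
def clean(raw: str) -> str:
    """Strip hyphens and spaces and upper-case a trailing 'x'."""
    return raw.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(raw: str) -> bool:
    s = clean(raw)
    if len(s) != 10 or not s.isascii():
        return False
    if not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    # Weights run 10, 9, ..., 1; the weighted sum must be divisible by 11.
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0


def is_valid_isbn13(raw: str) -> bool:
    s = clean(raw)
    if len(s) != 13 or not s.isascii() or not s.isdigit():
        return False
    # Weights alternate 1, 3, 1, 3, ...; the weighted sum must be divisible by 10.
    total = sum((1 if i % 2 == 0 else 3) * int(c) for i, c in enumerate(s))
    return total % 10 == 0
```

A number that fails the check was probably misread or mistyped, so it can be rescanned before any network lookup is attempted.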

~~~
mongol
The challenge is that if a catalogued book is not immediately associated with
the place where it can be found later, all is in vain...

~~~
jimnotgym
You need to stick a barcode on the location (shelf, box, etc) like they do in
a warehouse. Scan the book, check it is correct, scan the location to book it
in. Doesn't have a barcode, put it in another area for later. I wrote a sketch
of exactly this once...
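
A minimal sketch of that warehouse-style flow, assuming the scanner behaves like a keyboard and emits one code per line, and assuming location labels are printed with a made-up `LOC-` prefix so they can be told apart from book barcodes. This variant allows several books to be scanned before one location scan books them all in.

```python
from collections import defaultdict

LOCATION_PREFIX = "LOC-"  # assumed labelling convention for shelves and boxes


def book_in(scans):
    """Pair each run of book scans with the location scan that follows it.

    Returns (placed, unplaced): a dict mapping location -> list of book codes,
    and the list of books scanned after the last location label.
    """
    placed = defaultdict(list)
    pending = []
    for code in scans:
        code = code.strip()
        if not code:
            continue
        if code.startswith(LOCATION_PREFIX):
            # A location scan books in everything scanned since the last one.
            placed[code].extend(pending)
            pending = []
        else:
            pending.append(code)
    return dict(placed), pending
```

Anything left in the second return value has been scanned but not yet given a location, which is exactly the case that makes a catalogue useless for retrieval.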

------
vessenes
Oh, I am very pleased to see this request, and I may have some actual help for
you.

A number of years ago a west coast startup spent quite a lot of time on a
product that could identify books by their spine _image_ which I think is what
you want here; finding isbns and barcode scanning them is totally impractical
at this scale.

A few months before they closed up shop, I introduced them to Brewster Kahle
at the Internet Archive and convinced them to leave a copy of their database
with the Internet Archive.

I have no idea what happened after that, but I believe they did send the data
over. Machine learning is vastly different today than when they launched, and
even back then they had enough data that they could get 10/12 of my books in a
single photo; I really encourage you to get in touch with the archive.

The company was called BitLit, then Shelfie.

As a side note I got interested because I thought it would be great to get a
spine image as an api for rendering my ebooks as a library in vr/ar - I still
think this would be cool.

~~~
libguy
likely the dataset never became publicly available

[https://www.theverge.com/2017/4/9/15235686/kobo-acquired-
she...](https://www.theverge.com/2017/4/9/15235686/kobo-acquired-shelfie-app-
readers-discounted-ebooks)

~~~
walterbell
After installing the kobo app, there's no sign of the Shelfie technology.

------
crispyambulance

        >  the barrier is allowing customers to find what they are looking for.
    

I am really glad that there are people like the old man who are willing to do
stuff like this and people like yourself who are willing to help.

The real barrier, I think, is a bit more complicated than just being able to
find stuff. It is also the fact he will be running out of space and that as
more and more people find what they want the undesirable stuff (that no one
wants) will just keep growing. There does need to be regular culling, I think,
to keep weeding out the duplicates or books that no one wants. Also, there
needs to be some effort to discover and sell the really valuable books which
could produce occasional windfall funds to keep the endeavor going.

"The Book Thing" in Baltimore
([https://bookthing.org/](https://bookthing.org/)) which I have visited many
times seems to be tackling this problem. It's basically a "free book"
exchange. Massive. In a warehouse. It is a fairly popular place and is run by
an interesting eccentric fellow with very particular ideals. I would recommend
seeing how they do this stuff.

As for thoughts, I think that regardless of what he does, he will need one or
more employees (or dedicated volunteers) to actually perform the indexing and
physically organize the books.

------
profsnuggles
I would check out
[https://www.librarything.com/](https://www.librarything.com/) also. They have
a decent app for scanning barcodes, and it retrieves data from multiple
sources: their own database (which I believe consists of lots of imported MARC
records from university libraries), the Library of Congress, Amazon, etc. They
also have another project, librarycat, where you can set up your books as a
lending library.

Cataloging a large number of books is not going to be an easy process unless
they are all relatively new popular books. According to LibraryThing my
library is 439 books; every few years I delete my catalog and re-import them,
and it takes about a full weekend. Older books don't have barcodes, and old
paperbacks have the ISBN barcode on the inside cover. Some books don't have ISBN numbers
or Library of Congress numbers. So you will still end up doing a fair amount
of manual entry and searching.

~~~
DaveWalk
Came here to also recommend LibraryThing. It's been around for 14 years, and
the community has lots of similar situations to OP's. Probably a good place to
look up anecdotal evidence for this kind of project, too.

LibraryThing also hooks into
[TinyCat]([https://www.librarycat.org/](https://www.librarycat.org/)) which
can make your database more like a library's.

------
flurdy
It may be very expensive in time and resources to scan it all even if it is just
the covers. You need to work out how long it takes to fetch a book from the
barn or container, flatten/unbind if necessary, scan the cover, rebind, and
put back. Then multiply by how many books...

I worked at a small startup in the early 2000s that somehow got a massive
contract to digitise a Middle Eastern oil and gas company's very, very
extensive documentation library. We had an e-learning product where you could
use a scanner to digitise a printed book into online documentation.

Demos of scanning a book or two were really impressive. So surely scanning more
than a million books/manuals/charts would be just as easy? Not quite.

I think we calculated it would take years, as the bottleneck was the manual
unbinding and re-binding before and after scanning. Scaled to a million items,
it was not the 2-month project initially forecast. Buying more scanners
and hiring more local staff scaled that part horizontally and improved the
speed, but it was still a long project.

However, the client "forgot" to pay us for a few months, the bank and our
accountants forgot to check, and we went bankrupt shortly before we really got
started. Though at least I got a trip to the Middle East for a few weeks.

~~~
heraclius
I think “scan” here was in the sense of “scan the ISBN to catalogue the book”,
not scan the insides. Since many of these are old books, many of them will not
be perfect bindings. Scanning their contents therefore either requires opening
the books and moving the pages (as Google Books did) or cutting the spine,
which would be highly unhelpful since that would preclude any rebinding of the
books into anything other than a perfect binding, reducing strength,
repairability, and the ability to fold flat.

------
Freak_NL
Zotero (the free software reference manager) hooks into a bunch of online
catalogues. You can use Zotero to manage books (I manage my own collection
with it, but that's just a small personal home library of around 1000 books).

If a book has an ISBN, often Zotero will manage to find it using the magic
lookup button. Just enter the ISBN (DOIs work too!) and it will usually find
the book you meant. That covers about 90% of books with an ISBN.

The rest would have to be entered manually.

Zotero is not a full-blown inventory manager, but it may suit your needs.

------
1k
Get a barcode scanner, scan the ISBN and use that to do an API query on Amazon
to retrieve title, author, category, price, etc. Store this info in a DB.

Some scenarios:

1\. Generally the lookup should return something. Store these books by category,
e.g. business, children, fiction, etc. in their shelves/containers for
physical browsing by your customers. The more subcategories you can do the
better.

2\. If price is bigger than some threshold then store these books privately
and list for sale directly in an online marketplace. There’s an industry
around book scalping (forgot the actual term) where traders buy books from
fairs and sell online based solely on margin.

3\. The lookup returns nothing - these books are probably very valuable or
worthless. Some manual action required.
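
The three scenarios above can be sketched as a small triage function plus a SQLite table. This is only an illustration: the lookup itself is left out (the record is assumed to arrive as a plain dict, or `None` when nothing was found), the price threshold is a made-up number, and the `lot` column is there to hold the kind of per-estate lot number suggested elsewhere in this thread.

```python
import sqlite3

PRICE_THRESHOLD = 20.0  # hypothetical cut-off, in whatever currency the lookup returns


def triage(record):
    """Return 'manual', 'online' or 'shelf' for a lookup result (or None)."""
    if record is None:
        return "manual"   # scenario 3: lookup returned nothing
    price = record.get("price")
    if price is not None and price > PRICE_THRESHOLD:
        return "online"   # scenario 2: worth listing individually
    return "shelf"        # scenario 1: shelve by category


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS books (
               isbn TEXT, title TEXT, author TEXT, category TEXT,
               price REAL, lot TEXT, decision TEXT)"""
    )
    return db


def store(db, isbn, record, lot=None):
    """Insert one scanned book and return the triage decision."""
    decision = triage(record)
    r = record or {}
    db.execute(
        "INSERT INTO books VALUES (?, ?, ?, ?, ?, ?, ?)",
        (isbn, r.get("title"), r.get("author"), r.get("category"),
         r.get("price"), lot, decision),
    )
    db.commit()
    return decision
```

Storing the decision next to the record means the "manual" pile can be listed later with a single query instead of being rediscovered by hand.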

I was actually considering doing something like this for remainders before,
but never got it going. I’d love to know more about your eventual solution.

~~~
rdsubhas
I'd highly suggest Calibre. [https://dearauthor.com/ebooks/dear-jane-
ebooks/dear-jane-can...](https://dearauthor.com/ebooks/dear-jane-ebooks/dear-
jane-can-i-use-calibre-to-manage-my-paper-books/)

------
nsomaru
Goodreads has a scanner in their app (on iOS/Android) that can scan covers
although for some reason it automagically adds those books into a "to-read"
shelf but I guess this isn't a problem for you if you create an account for
the purpose.

The API is severely rate-limited (1 request per second), uses non-standard
OAuth and is badly documented, but you should be able to get some XML out of it
and parse that however you'd like.
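
For a limit of roughly one request per second, a small client-side throttle is usually enough. The sketch below is generic and does not touch the Goodreads API itself; the `Throttle` class is an illustrative helper, and the clock and sleep functions are injectable only so that it can be tested without waiting.

```python
import time


class Throttle:
    """Space calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each request keeps a bulk import under the limit without any extra bookkeeping.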

~~~
rossdavidh
I'm wondering about the option of having a (cheap, not worth stealing)
smartphone, with the Goodreads app on it, logged in to the Goodreads account
for this place. People who come in are asked to use it to scan 10 books, and
move them over to the "scanned and catalogued" shelves, in addition to paying
the $1 fee to get a book. Most people would be fine with that, and over time
you get it all scanned.

Goodreads cannot recognize everything, but since it works on either book cover
or barcode it will work on lots.

------
emmanueloga_
A barn full of old books sounds like the perfect breeding ground for all sorts
of bugs... I had an acquaintance that had a problem with bed bugs that
apparently started when she got books off of those "Take one book" boxes
people put on the front of their houses.

It may be worth your neighbor checking for that sort of thing too. Apparently
there are dogs that are trained to sniff bedbugs... those furry guys can sniff
anything :-) [1]

1: [https://www.nytimes.com/2012/12/06/garden/bedbugs-hitch-a-
ri...](https://www.nytimes.com/2012/12/06/garden/bedbugs-hitch-a-ride-on-
library-books.html)

------
bloak
If they are recent books (from about 1980) then they probably have a barcode
on the back cover, so use that. My guess is that it won't be worth trying to
automatically recognise older books from the cover: a lot of them had a dust
jacket, that goes missing, and a cover under the dust jacket that is not at
all distinctive. The title might be on the spine, but how many online images
show the spine clearly?

~~~
kabdib
I tried doing a book catalog about ten years ago. I got about 80% recognition
rate by using multiple numbers (ISBN, and the Library of Congress number) and
multiple online data sources. It was a pretty slow process, to the point where
simply keyboarding the information was easier and less error-prone, and I had
to manually enter the books that didn't get any online matches anyway.

Definitely not a "scan/beep/scan/beep" kind of thing. More like "scan . . .
uh, scan . . . scan, damn you, SCAN I say! (beep) Okay . . . now the first
problem is that 'The Sands of Mars' which I am holding is definitely not
'Great Montana Flapjack Recipes' on B&N, let's try the library of Congress . .
. . nope, not 'Annals of 1959 Steelmaking', so (tap tappity-tap...)"
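
The "multiple numbers, multiple sources" approach can be sketched as a simple fallback chain. The source functions here are hypothetical callables supplied by the caller; nothing below talks to a real catalogue.

```python
def first_match(identifiers, sources):
    """Try every (identifier, source) pair in order and return the first hit.

    `identifiers` is a list of (kind, value) pairs such as ("isbn", "...") or
    ("lccn", "..."). `sources` is a list of (name, lookup) pairs, where each
    hypothetical `lookup(kind, value)` returns a dict or None.
    """
    for kind, value in identifiers:
        for name, lookup in sources:
            try:
                record = lookup(kind, value)
            except Exception:
                continue  # one failing source should not stop the session
            if record:
                return {"source": name, "matched_on": kind, **record}
    return None
```

As the anecdote above shows, a hit is not necessarily a correct hit, so the returned record still needs a human glance against the book in hand before it is accepted.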

~~~
Kim_Bruning
The last time I looked at this (admittedly quite a while ago) the book bar
code contained the ISBN.

What was causing the mismatches? Bar codes that did not contain the ISBN? Non-
unique numbers?

~~~
mcguire
You would be surprised to see how many books have the wrong ISBN or have a
mismatch between the barcode and the ISBN. Not as much of a problem with major
publishers now, but some from the 80s and 90s were hilarious.
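
One way to catch such mismatches automatically is to compare the scanned barcode with the ISBN printed on the copyright page. Book barcodes are normally EAN-13 numbers, and an old ISBN-10 maps onto one by prefixing 978 and recomputing the check digit. A minimal sketch, with illustrative function names:

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to its 978-prefixed ISBN-13 form.

    The ISBN-10 check digit is dropped, not verified.
    """
    s = isbn10.replace("-", "").replace(" ", "").upper()
    if len(s) != 10 or not s.isascii() or not s[:9].isdigit():
        raise ValueError(f"not an ISBN-10: {isbn10!r}")
    body = "978" + s[:9]
    # ISBN-13 weights alternate 1, 3, 1, 3, ... over the first 12 digits.
    total = sum((1 if i % 2 == 0 else 3) * int(c) for i, c in enumerate(body))
    return body + str((10 - total % 10) % 10)


def barcode_matches_printed(barcode: str, printed_isbn: str) -> bool:
    """True if the scanned EAN-13 and the printed ISBN denote the same number."""
    scanned = barcode.replace("-", "").replace(" ", "")
    printed = printed_isbn.replace("-", "").replace(" ", "")
    if len(printed) == 10:
        printed = isbn10_to_isbn13(printed)
    return scanned == printed
```

Books where the two disagree can go straight to the manual pile instead of being catalogued under the wrong title.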

------
MayeulC
This sounds like a use-case inventaire.io ought to support. I'll try to ask
them about it. They use wikidata for filling up book metadata.

Otherwise, as stated elsewhere in this thread, Zotero can usually find books
with very little information: ISBN or title. It might be worth trying to set up
OCR with it.

In any case, if you go to the length of taking a picture for each book, you
might as well save them and make the dataset public, for OCR training purposes
(and a second pass). There is also the mechanical Turk option if you go this
way.

And as someone stated already, you should plan the physical layout in advance.

~~~
maxlath
Yep, inventaire.io could help there, to some extent: they could scan book
barcodes in bulk from the webapp
[https://inventaire.io/add/scan](https://inventaire.io/add/scan), which
should find data for most books. But then it gets tricky for books without a
barcode/ISBN, as they would probably have to fill in the data manually, which can
be quite some work for large inventories. No plans to add OCR, yet ;)

------
cconcepts
Wow, judging by the response this is a problem a lot of people think about. Am
overwhelmed by the helpful info. Obviously have to start at the low hanging
fruit as I am working with non-technical people and am relatively non-
technical myself. I just tested LibraryThing and it seems very fast and
accurate so will give it a whirl.

Again, thanks HN for the overwhelming response.

~~~
ordinaryperson
I use the paid version of Books from Sort It Apps:
[https://itunes.apple.com/us/app/book-list-library-isbn-
scann...](https://itunes.apple.com/us/app/book-list-library-isbn-
scanner/id476621639?mt=8)

All you have to do is hold the phone over the bar code for a second and it
automatically downloads all the relevant information. This is by far (IMHO)
the fastest way to catalog a mountain of books.

Not affiliated with the app or company in any way, just a happy user.

------
ghr
[https://www.reddit.com/r/DataHoarder/](https://www.reddit.com/r/DataHoarder/)
and
[https://www.reddit.com/r/datacurator/](https://www.reddit.com/r/datacurator/)
are good resources for this kind of thing.

------
mikepurvis
Surprised to see no mention of AbeBooks yet. We have an indie bookstore in
Waterloo which is integrated with them and it seems to work pretty well. He
tells me he still does most of his business in the IRL shop, but there's a
steady stream of people buying online as well. Plus, it's nice for him to be
able to quickly check how many of something he already has before committing
to buying a bunch more of them. See: [https://www.abebooks.com/old-goat-books-
waterloo-on-canada/1...](https://www.abebooks.com/old-goat-books-waterloo-on-
canada/1611996/sf)

I'm not sure what options there are for hardware integrations, but Abe
provides at least online inventory and ordering capabilities. I assume if you
had a barcode scanner capable of acting as a USB keyboard and entering ISBNs,
it would go pretty quickly.

------
rdl
My plan for books is to pull the rare/valuable ones, then subscribe for the
$100/mo 100 book/mo plan at [http://1dollarscan.com/](http://1dollarscan.com/)
and send them all the rest, produce PDFs, and pulp the books. I have maybe
3000 books in storage and this would be preferable to anything else I've
found, as I ultimately would rather consume them electronically.

~~~
walterbell
1DS $100/month is about 30 books of ~300 pages (3+ "sets" of 100 pages,
rounded up).
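
A worked version of that estimate, assuming the $100 plan buys 100 "sets" per month (which is how the two comments above fit together) and that a set is 100 pages, rounded up per book:

```python
PAGES_PER_SET = 100  # from the comment above: a "set" is 100 pages, rounded up


def sets_for(pages: int) -> int:
    """Number of sets one book consumes (integer ceiling division)."""
    return (pages + PAGES_PER_SET - 1) // PAGES_PER_SET


def books_per_month(sets_per_month: int, pages_per_book: int) -> int:
    return sets_per_month // sets_for(pages_per_book)


print(books_per_month(100, 300))  # exactly 300 pages: 3 sets per book
print(books_per_month(100, 320))  # just over 300 pages: 4 sets per book
```

Under these assumptions the two cases give 33 and 25 books a month, which brackets the "about 30" figure.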

Scanning is only worthwhile for books which are not already available in
electronic form ... somewhere.

If you bought from Amazon, there's sometimes an option to get the ebook
cheaply. Archive.org has many books. There are also e-books at public
libraries, so it may be enough to keep a list/photos/calibre of all your
titles and discard rarely-accessed books.

------
good-idea
A lot of people are suggesting querying Amazon for ISBN data - another option
is the ISBNdb API: [https://isbndb.com/](https://isbndb.com/) There's also the
OpenLibrary API (from the Internet Archive) which may include some more info
[https://openlibrary.org/](https://openlibrary.org/)
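
A minimal sketch of an Open Library lookup by ISBN, using only the standard library. It assumes the public Books API endpoint with `bibkeys`, `format=json` and `jscmd=data`, and it assumes the response is keyed by `ISBN:<number>` with `title` and `authors[].name` fields, which is my understanding of the documented shape; check the current API documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request

API = "https://openlibrary.org/api/books"


def build_url(isbn: str) -> str:
    query = urllib.parse.urlencode(
        {"bibkeys": f"ISBN:{isbn}", "format": "json", "jscmd": "data"}
    )
    return f"{API}?{query}"


def parse(isbn: str, payload: dict):
    """Pull title and author names out of a decoded response, or return None."""
    entry = payload.get(f"ISBN:{isbn}")
    if not entry:
        return None
    return {
        "isbn": isbn,
        "title": entry.get("title"),
        "authors": [a.get("name") for a in entry.get("authors", [])],
    }


def lookup(isbn: str, timeout: float = 10.0):
    """Fetch and parse one record; performs a network request."""
    with urllib.request.urlopen(build_url(isbn), timeout=timeout) as resp:
        return parse(isbn, json.load(resp))
```

Keeping URL building and parsing separate from the network call makes it easy to test against saved responses and to swap in another source later.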

------
8_hours_ago
Don’t forget about the Dewey decimal system. For the books with ISBNs, you can
sort them into boxes by their Dewey decimal number. If you don’t have time to
manually categorize the books without ISBNs, they can be put into “other” boxes
and left unsorted.
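
As a sketch of that box-sorting rule, a Dewey number can be reduced to one of the ten main classes, with everything unclassifiable going to the "other" box. Where the Dewey number comes from (a catalogue lookup, the copyright page) is left open here, and the box labels are just suggestions.

```python
DEWEY_MAIN_CLASSES = {
    0: "000 Computer science, information and general works",
    1: "100 Philosophy and psychology",
    2: "200 Religion",
    3: "300 Social sciences",
    4: "400 Language",
    5: "500 Science",
    6: "600 Technology",
    7: "700 Arts and recreation",
    8: "800 Literature",
    9: "900 History and geography",
}

OTHER = "Other (unsorted)"


def box_for(dewey):
    """Map a Dewey number such as '823.914' to a main-class box label."""
    if not dewey:
        return OTHER
    try:
        number = float(str(dewey).strip())
    except ValueError:
        return OTHER
    if not 0 <= number < 1000:
        return OTHER
    return DEWEY_MAIN_CLASSES[int(number) // 100]
```

Ten boxes plus "other" is coarse, but it is a sort that can be done on the fly while scanning, and each box can be subdivided later.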

~~~
dredmorbius
Library of Congress catalog information is more generally available, by ISBN
if not already on the copyright page, for any conventionally published book
since 1970.

[http://eresources.loc.gov](http://eresources.loc.gov)

------
achenatx
For books with an ISBN (some of these can also scan the cover):

[https://www.collectorz.com/book/isbn_database.php](https://www.collectorz.com/book/isbn_database.php)

[https://bookriot.com/2016/01/14/8-reasons-catalog-
books/](https://bookriot.com/2016/01/14/8-reasons-catalog-books/)

[https://www.goodreads.com/blog/show/913-goodreads-hack-
scan-...](https://www.goodreads.com/blog/show/913-goodreads-hack-scan-a-book-
cover)

------
GnarfGnarf
Photograph the books, a dozen at a time. Put a box number label next to the
books. Put the books in their box, glue the label to the box. Stack the boxes.

Sort the boxes by height, line them up in a row. Put slats of wood between the
rows to distribute and stabilize the load.

I wrote a program to automatically generate simple HTML files to display the
images. See sample:

[http://kyber.ca/b/index.html](http://kyber.ca/b/index.html)

Use OCR to digitize title & author.
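
The program mentioned above is not shown in the thread, so the following is only a guess at the general idea: a short script that writes one HTML page listing every box photo in a directory. It assumes each photo file is named after its box label, e.g. `box-0042.jpg`.

```python
import html
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_index(photo_dir, out_file="index.html"):
    """Write an HTML page with one heading and image per box photo."""
    photo_dir = Path(photo_dir)
    photos = sorted(
        p for p in photo_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    rows = []
    for p in photos:
        label = html.escape(p.stem)   # box label shown as the heading
        src = html.escape(p.name)     # relative link to the photo
        rows.append(
            f'<h2 id="{label}">{label}</h2>\n'
            f'<img src="{src}" alt="Books in {label}" width="800">'
        )
    page = (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<title>Book boxes</title></head>\n<body>\n"
        + "\n".join(rows)
        + "\n</body></html>\n"
    )
    out = photo_dir / out_file
    out.write_text(page, encoding="utf-8")
    return out
```

Because the page is static, it can be regenerated after every photo session and hosted anywhere, and titles typed or OCR'd later can be added next to each image.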

~~~
GnarfGnarf
Hire Mechanical Turks or cheap offshore labor to type titles & authors from
the photos (no need to ship the books).

------
callmeal
When I did this for my (admittedly medium sized) collection, I used Booxter
([https://www.deepprose.com/](https://www.deepprose.com/)) and a cuecat
scanner to catalog all those books.

Was a simple process of having enough boxes and labels, and I did that anytime
I had some free time. Scan a bunch of books, drop them in a box, slap a label
on the box, wait for booxter to find and fetch the metadata, update the label
in booxter and repeat.

Will take time, but is easily doable.

------
52-6F-62
Where is that? I really would like to pay a visit...

That story also reminds me of this fellow (who actually might get me to the
middle of nowhere SK): [https://www.macleans.ca/news/canada/canadas-most-
inconvenien...](https://www.macleans.ca/news/canada/canadas-most-inconvenient-
bookstore-is-a-treasure-on-the-prairies/)

(Edited to add Apple News links without ads if anyone uses it:

Free version: [https://apple.news/Ar3trUQ-
cR7C-c9L3YzPYjg](https://apple.news/Ar3trUQ-cR7C-c9L3YzPYjg)

Issue version:
[https://apple.news/AQD4nDgB4SKi6yTcZy60tPw](https://apple.news/AQD4nDgB4SKi6yTcZy60tPw))

There seems to be a ton of relevant help in this thread and that seems
exciting.

Like someone pointed out—something like OCR might be a best first step as it
seems like a data entry task at a glance.

It does sound like there may be a significant amount of physical, tedious work
involved no matter what software solution you find. Sometimes you have to
accept that aspect and push through. Your best bet might be to recruit some
physical help there—start a fund or a labour drive or something. Recruit book
lovers, etc. Seems worthwhile. Maybe he would donate books to helpers.

------
Adamantcheese
Use one of the solutions listed below, but you HAVE to do sorting on the fly.
You need to have places to put books and sort them by some general genres and
you HAVE to throw out books that aren't worth the time due to damage or any
other reason a book would be deemed recyclable. With that many books, a
proper library style cataloging system may be your best bet.

That being said, if you do want to do image comparison for covers, books
without covers usually have a copyright page with most of the info on it. Use
that to determine what a book is when the other method fails. Throwing
together some cheap bookshelves with plywood and 2x4's will greatly help with
the finding part, but while scanning use some big bins to do a rough sort.

And I can't stress enough you HAVE to throw out books. It's clear that there's
a space issue and if he's willing to get them for free but has a hard time
getting rid of them, that's hoarder behavior, not just eccentricity.

------
tsjq
that'll need a bunch of volunteers / friends to help with this work. also,
check this podcast episode. might get some info / contacts
[https://www.npr.org/sections/money/2014/11/10/363103753/text...](https://www.npr.org/sections/money/2014/11/10/363103753/textbook-
arbitrage-making-money-off-used-books)

once you've started this sorting / cataloging work: ask visitors not to
reshelve the books. have a central location (table / bins) for them to put the
books, so the volunteers / barn-man can put them back on the right shelves.

also, what exactly is the barn-man's objective? just collect books and not
bother about further? or, be the most helpful to book-lovers? or, make good
money from these books ?

------
teddyr009
A simple approach would be to ask book lovers in the area to volunteer
for this task. Borrow some barcode scanners and computers. Give 'em whatever
books they like, and it becomes a kind of get-together for bibliophiles.

------
cik
Been there... since my library is now a little over a thousand physical books,
and in multiple languages.

If you have a Mac, get a copy of Delicious Library ([https://www.delicious-
monster.com/](https://www.delicious-monster.com/)) and a compatible barcode
scanner, like the Flic.

If you have an Android phone, and you're happy with dealing with your phone
and CSV export, you'll probably be okay with Libib
([https://play.google.com/store/apps/details?id=com.libib.app](https://play.google.com/store/apps/details?id=com.libib.app)).

The biggest issue is that there are _tonnes_ of books (especially if like me
you have older ones) that predate ISBN. That kinda sucks - but it's life.

~~~
gshdg
A lot of the older ones can be looked up by their Library of Congress number,
IIRC from digitizing and barcoding a school library’s catalog a couple decades
ago.

FWIW, that effort took a half dozen people about 3 months for somewhere on the
order of 50,000 books.
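
Turning that anecdote into a per-book rate, under the assumption (not stated above) that all six people worked full time at 40 hours a week:

```python
def books_per_person_hour(books, people, months,
                          weeks_per_month=52 / 12, hours_per_week=40):
    """Average throughput, assuming everyone works full time throughout."""
    person_hours = people * months * weeks_per_month * hours_per_week
    return books / person_hours


rate = books_per_person_hour(books=50_000, people=6, months=3)
print(f"{rate:.1f} books per person-hour, {60 / rate:.1f} minutes per book")
```

That works out to about 16 books per person-hour, or a little under four minutes per book. If the helpers were part time, the real per-book time would be shorter.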

~~~
cik
100%. But given that I want it to be automatic, the barcode scanner integration
with the various tools just doesn't do it.

The flip side is that I have books in English, French, Hebrew, and Arabic. The
religious Hebrew books don't have ISBNs, unless they came from a North
American publisher. The same is true of the Arabic books I have. The French on
the other hand went ISBN furiously - or at least enough that it scans true :)

~~~
gshdg
What percentage of each? It sounds like this book barn could use some
organization in general, and sorting by language might be a good first step.
Then you can scan the English and French as lower hanging fruit, and maybe
recruit extra help to deal with the Hebrew and Arabic manually, or something?

------
bartimus
Sell 5 random surprise books for $15 (ex shipping). Include a box to send back
any books they don't want (or any other book). Process returned books (take
pictures, put barcode, register title+author). For the next order give $3
discount for every book they sent back previously.

------
jccalhoun
It can definitely be done but I don't know the details.

A couple times a year the local Half Price Books Outlet does a "fill a bag for
$20" event and every time there are at least a couple people there with
shopping carts full of bags of books.

They have dedicated bar code scanners attached to their phones and will scan
books at around 1 a second. I don't know what software they are using but
clearly they are looking up prices to see what they can get to sell for a
profit.

I use goodreads to keep track of my own book collection and using the camera
and the goodreads app usually takes 30 seconds plus to focus on the bar code
and then to look it up. So whatever they are using is much faster than that.

~~~
slm_HN
Yep, I've seen the exact same thing at my local used bookstore on normal
business days. Someone with an ISBN scanner methodically scanning every book
and buying whatever the app told him.

------
thaumaturgy
First, is this really a problem that needs to be solved? Personally, his place
sounds like my favorite kind of book shop. A lot of bookworms prefer wandering
through dense forests of precariously-balanced piles of books. Is he getting
those people, or is he getting people that are expecting Barnes & Noble?

If they really do need to be cataloged, then the next thing is to forget all
about trying to inventory the entire thing. Instead, you're going to partition
the collection into "easy to catalog" and "hard to catalog": pick a section of
the barn and make this the organized area. Get a barcode scanner
([https://www.newegg.com/Barcode-
Scanner/SubCategory/ID-583](https://www.newegg.com/Barcode-
Scanner/SubCategory/ID-583)) and throw together a quick API client that'll
take an ISBN and display a title, author, edition, and picture. If it comes up
correct, great: book goes into the cataloged section. If it doesn't, it goes
somewhere else. Make it really simple, so that a single keystroke can accept
that book into inventory.
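
A bare-bones sketch of that intake loop. The `lookup` and `confirm` callables are placeholders supplied by the caller: `lookup` would wrap whichever book API is chosen, and `enter_to_accept` shows one possible single-keystroke confirmation.

```python
def intake(scans, lookup, confirm):
    """Split scanned ISBNs into cataloged records and a pile for later.

    `lookup(isbn)` returns a dict or None; `confirm(record)` returns True when
    the operator accepts the displayed record.
    """
    cataloged, later = [], []
    for isbn in scans:
        isbn = isbn.strip()
        if not isbn:
            continue
        record = lookup(isbn)
        if record is not None and confirm(record):
            cataloged.append({"isbn": isbn, **record})
        else:
            later.append(isbn)  # not found, or the operator rejected the match
    return cataloged, later


def enter_to_accept(record):
    """Show the record and accept it on a bare Enter keypress."""
    print(f"{record.get('title')} / {record.get('author')} / {record.get('edition')}")
    return input("Enter = accept, anything else = set aside: ") == ""
```

Since most USB barcode scanners type the code followed by Enter, `scans` can simply be `sys.stdin` during a live session.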

Grocery stores have to regularly inventory everything on the shelves. I worked
for an outfit once that wanted to do it all in-house, so we bought the
commercial Telxon handheld wireless devices and I set about figuring out their
software. Turned out that they just wanted to speak basic telnet to a server
at a pre-configured IP address, so I put together a sloppy little telnet
server interface and staff were able to count the entire store right on the
devices in a few hours. That's way more complicated than what you'll need to
do, so, y'know, your thing is doable. You'll have the added benefit of free
online book databases and better hardware and easier-to-hack-together
software.

Also might not be a bad idea to talk to your local librarian. They're book
nerds too and he or she might have an actual library science degree. This
would be right up their alley.

~~~
sseagull
> Also might not be a bad idea to talk to your local librarian. They're book
> nerds too and he or she might have an actual library science degree. This
> would be right up their alley.

I second this. Also, maybe check with university libraries or university
MLS/MLIS programs (Masters of Library Science). This is not a new problem, and
they would be aware of existing tools/methodology. Also, maybe you could get a
grad student/intern to help.

~~~
nik61
Smartphones of both flavours can load cheap or free apps that are quite
effective enough to read barcodes and identify books. Librarything and its
various catalogue tools can help with the metadata too. That said, the advice
to get specialist help is well-founded.

------
westondeboer
I am also grading books in my kids' library. They don't have a librarian and
they have a stack of 1,000 books that have been donated.

I am grading them by reading level A-Z. Currently I am googling the book and
then adding "reading level" to the end and then if it has it, it will show up,
or I can find the Lexile number and use that as a grade also. I am using the
speech to text command in google, so it doesn't take that much time.

This is a hassle and I am looking into other ways to speed up the process,
and/or to get other parents to volunteer if it were an easier process.

------
sandreas
You could also ask at the forum of

[http://diybookscanner.org/](http://diybookscanner.org/)

Perhaps there are users with experience...

~~~
BeetleB
I second DIY Book Scanner. I don't own one but I did scan a book once with it.
The scanning part is incredibly fast. I'm sure there are solutions out there
for creating a PDF (with OCR).

~~~
sandreas
Once upon a time (years ago) I wrote a tool called bookbuilder, which did
exactly this :-) Take a bunch of camera photos of a book on a dark background,
find the edges and extract the text part :-) I'm not sure it still works,
because it was Java 6... but it uses the excellent BoofCV library
([http://boofcv.org](http://boofcv.org)). If you would like to try it, give it
a shot (tesseract 3 is needed):

[https://mega.nz/#!EVxA0ZoD!6Uy5A4HexJbewXqkOLsW-4sj5IO5LOGef...](https://mega.nz/#!EVxA0ZoD!6Uy5A4HexJbewXqkOLsW-4sj5IO5LOGefTfN0z-acQE)

    java -jar bookbuilder-0.2.jar --input-path=input --output-file=output/output.pdf --ocr-embed-layer --rotation-degrees=180

------
influx
There’s a whole universe of folks selling used books on Amazon FBA. I suggest
starting with a Google search of exactly that.

There are apps which allow you to scan UPC codes and look up a price on Amazon.
I’d personally sort the books by market value. Sell the books that are
profitable, trash the ones which are not, save the ones which are very rare or
have no UPC code, and use the money to grow the storage space.

~~~
phonebanshee
Except you're talking about a guy who seems to be happy getting containers
filled with almost certainly worthless books. I doubt his values and yours are
aligned.

------
niedzielski
I have a related problem on a much smaller scale (only a few hundred books) in
that I wish to make full digital copies of my books. I reached out to
Archive.org but they can't use them due to copyrights. I'm looking into
[https://1dollarscan.com/](https://1dollarscan.com/) but it's a destructive
scanning technique.

------
xiconfjs
While we are a bit on this topic: is there an alternative to calibre [1] for
managing a shitload (50000+) of ebooks which is still performant? Especially
ebooks which have no ISBN (PDF, whitepapers, etc.), for which the only
information is in their EXIF file data.

[1] [https://calibre-ebook.com](https://calibre-ebook.com)

~~~
walterbell
Performant for which function? Searching? Parsing metadata from imported
files? Manual editing?

------
qubex
I raised the same question several years ago on this very forum. I got some
good answers but none that ultimately satisfied my needs, but they could be
useful for you:
[https://news.ycombinator.com/item?id=9631362](https://news.ycombinator.com/item?id=9631362)

------
ryanmarsh
I worked for a company that scanned and catalogued many books in the ‘00s.
There are two primary challenges to solve: nondestructive scanning and speed.

1\. In order to get a good scan (back then) we had to lay each page flat
against a piece of glass (no matter the orientation). This tended to damage or
destroy the binding by the time scanning was complete.

2\. An average of ten seconds per scan (from page flip to page flip) is
blazing fast (including rescans). For a 200 page book this is 33 minutes. To
scan a library of 200 books at this rate requires 3.2 man years of work
(normal 40 hour work week + holidays).
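
A quick back-of-the-envelope check of the figures in point 2, as a sketch only.
The 10 seconds per page and 200 pages per book come from the comment; the 1,800
working hours per year is an assumption standing in for "40 hour work week +
holidays". Under those numbers 200 books comes to about 111 hours, while 3.2
person-years corresponds to roughly ten thousand books.

```python
SECONDS_PER_PAGE = 10       # from the comment above
PAGES_PER_BOOK = 200        # from the comment above
WORK_HOURS_PER_YEAR = 1800  # assumption: 40-hour weeks minus holidays


def hours_for(books: int) -> float:
    """Scanning time in hours for a number of 200-page books."""
    return books * PAGES_PER_BOOK * SECONDS_PER_PAGE / 3600


def books_per_person_years(years: float) -> int:
    """How many books fit into a given number of person-years of scanning."""
    return round(years * WORK_HOURS_PER_YEAR / hours_for(1))


print(f"{hours_for(1) * 60:.0f} minutes per book")
print(f"{hours_for(200):.0f} hours for 200 books")
print(f"{books_per_person_years(3.2)} books in 3.2 person-years")
```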

One way to speed this process drastically is to use a bulk scanner. This
requires slicing the binding off the book and feeding in the book as a stack
of pages, scanning the cover separately. Obviously this completely destroys
the book.

Good luck.

~~~
zoomablemind
These days, using a couple of dedicated hi-res cameras may be a much faster
way to acquire the page images.

A scanner's workbench could be rigged with screens for live preview and QC.
Then assemble/OCR in software. The main manual task is page turning, the rest
could be fixed (light, exposure, alignment etc)

------
juskrey
This would be my MVP: I'd implement a simple inventory app based on ISBN
scanning and simply enumerated boxes with, say, 50 books each. Scan ISBN - put
in the next empty box, take another box when full, and so on. Then based on
title demand, I'll sort popular titles in their own boxes.
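
The box scheme above could be sketched roughly like this. It is a minimal
illustration, not a finished app: the `BoxInventory` class and its method names
are made up for the example, and the scanned ISBNs are assumed to arrive as
plain strings from whatever barcode reader is used.

```python
from collections import defaultdict

BOX_CAPACITY = 50  # "say, 50 books each" from the comment above


class BoxInventory:
    """Assign each scanned ISBN to the current box; open a new box when full."""

    def __init__(self, capacity=BOX_CAPACITY):
        self.capacity = capacity
        self.boxes = defaultdict(list)  # box number -> list of ISBN strings
        self.current_box = 1

    def scan(self, isbn):
        """Record one scanned book and return the box number it goes into."""
        if len(self.boxes[self.current_box]) >= self.capacity:
            self.current_box += 1
        self.boxes[self.current_box].append(isbn)
        return self.current_box

    def find(self, isbn):
        """Return every box number that holds a copy of this ISBN."""
        return [box for box, isbns in self.boxes.items() if isbn in isbns]


# Tiny capacity so the rollover to a second box is visible.
inventory = BoxInventory(capacity=2)
for scanned in ["9780306406157", "9780131103627", "9780306406157"]:
    print(scanned, "-> box", inventory.scan(scanned))
print(inventory.find("9780306406157"))
```

Keeping the box number as the only location key is what makes the later
re-sorting by demand cheap: moving a title just means updating which box it is
listed under.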

~~~
GnarfGnarf
20 to 25 books is about the max that can be comfortably lifted.

------
Floralegeium
I work with scanning documents for business purposes. I went to a sales meeting
with a company called Biels, which has now been bought and is called Instream.

While at this meeting they were displaying a book scanner that you could place
a book in; it would flip each page, then take a high resolution photo, and
had options for OCR software which would read the entire page and present any
questionable words or characters the OCR could not identify. This machine and
software was pitched to museums and large libraries. I would highly suggest
asking a local museum or library if they have any hardware that would be able
to archive the books you're describing.

I tried searching for the exact machine but I could not locate it, I want to
say Canon was the vendor.

I wish you success with your endeavour.

------
thecupisblue
Use an OCR service such as Firebase ML Kit text recognition or Amazon's similar
offering, take pictures cover by cover, and ping an API (even Amazon or eBay
might do) to see if the book exists and what it sells for on average.

It also shouldn't be hard to up the speed by taking pictures of a stack of
books: take an image of the stack and crop it book by book. Training a model
to recognise books shouldn't be that hard, but you could also use a CV
service (Firebase, Amazon, Azure again), and then ping the API for each book
it found in the stack. This could probably be the fastest way if you can take
a panorama and have it search from that.

Anyways, if you do it - try to get the price, ISBN and editions from the
results.

------
jcelerier
I used Tellico to scan my library, it can automatically look up the books from
Amazon with their ISBN if you have a barcode scanner (else you have to type
them by hand...)

[http://tellico-project.org/](http://tellico-project.org/)

~~~
mcguire
Amazon shut down that API last fall, I think. I've been using one of tellico's
other options since then. (I don't remember which offhand.)

------
swayvil
Dump them all in a shredder. Blow the shreds through a well-lit tunnel full of
digital cameras. Assemble the books from the images. Now it's just a software
problem.

(This isn't my idea. Either Rudy Rucker, Vernor Vinge or Cory Doctorow thought
it. I forget exactly who.)

~~~
ahazred8ta
The ham-handed Librareome digital preservation project from Vernor Vinge's
Rainbows End

>> "The raging maw was a "NaviCloud custom debinder". The fabric tunnel that
stretched out behind it was a "camera tunnel" ... thousands of books that had
already been sucked into the "data rescue" equipment"

~~~
swayvil
Has anybody tried that? It seems like a fun software problem.

------
80mph
LibraryThing has an app, or you can order a CueCat scanner.
[https://wiki.librarything.com/index.php/Adding_and_importing...](https://wiki.librarything.com/index.php/Adding_and_importing_books)

------
mongol
Google has a Books API. Look into that. There are smartphone apps that solve
the problem of books that have barcodes. No matter what, this will be a huge
task to complete. I scanned my small library (2-3 shelves) and was quite tired
of it in the end.
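
For the Books API route, a lookup by ISBN might look roughly like the sketch
below. It uses the public `volumes` endpoint with an `isbn:` query and reads
`title` and `authors` from the first result's `volumeInfo`; the helper names
are made up for illustration, and quota limits, API keys and missing records
are not handled here.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://www.googleapis.com/books/v1/volumes"


def lookup_url(isbn: str) -> str:
    """Build the query URL for a single ISBN."""
    return f"{API_URL}?{urlencode({'q': f'isbn:{isbn}'})}"


def parse_volume(response: dict) -> Optional[dict]:
    """Pull title and authors out of a decoded API response, if any."""
    items = response.get("items")
    if not items:
        return None  # no match: fall back to manual entry
    info = items[0].get("volumeInfo", {})
    return {"title": info.get("title"), "authors": info.get("authors", [])}


def lookup(isbn: str) -> Optional[dict]:
    """Fetch and parse one record (needs network access)."""
    with urlopen(lookup_url(isbn), timeout=10) as resp:
        return parse_volume(json.load(resp))


# Offline demonstration with a hand-written response of the expected shape.
sample = {"items": [{"volumeInfo": {"title": "Example", "authors": ["A. Author"]}}]}
print(parse_volume(sample))
```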

------
zimpenfish
Surprised that no-one has produced a Vivino-alike scanner for books.

(Although I suspect the range of book covers is somewhat larger than the range
of wine labels...)

[https://www.vivino.com](https://www.vivino.com)

------
Grustaf
My thought is that the entire point of an old fashioned second hand book shop
is to be able to wander around and explore. If everything is catalogued, his
barn instantly turns into a very bleak version of Amazon.

------
jonsen
Tangential anecdote:

I once entered a used-book store looking for an old math book. No one there
except a grumpy-looking man with wild hair and a big beard sitting at a desk
in the far corner.

I start browsing the shelves.

“WHAT ARE YOU LOOKING FOR?”

“Um, eh, Play with Eternity.”

“WE DON’T HAVE IT.”

------
anoncow
Not sure if this will help -
[https://aws.amazon.com/rekognition/](https://aws.amazon.com/rekognition/)

~~~
fauria
Using computer vision may provide a good enough result for this use case. An
interesting approach would be to segment the book piles and bulk scan the
spines. There are projects that already tackled this problem:

[https://www.cs.bgu.ac.il/~ben-
shahar/Teaching/Computational-...](https://www.cs.bgu.ac.il/~ben-
shahar/Teaching/Computational-
Vision/StudentProjects/ICBV151/ICBV-2015-1-PavelRubinson/index.php)

Won't be as accurate as barcode scanning, but will be definitely less time
consuming.

~~~
davewasthere
That's fantastic. I was after a similar solution and after playing with
OpenCV, I can see how they put the pieces together.

Annoyingly, I was trying to scan multiple books in charity shops, when one of
my favourites started putting their own stickers directly over the barcode
(when it existed).

------
GBiT
I have experience working in the document storage business. You need to
barcode and index all the books. Barcode all locations and scan each book to
the container it is in. It will be like an Excel table with 3 columns: book
name, barcode (ISBN or something) and location barcode. If someone looks a
book up in the catalog, you will know its location. If the books have ISBNs,
it's possible to write a script to pull book info by ISBN.
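
A minimal sketch of that three-column table, assuming it is kept as a CSV file.
The column names, the `BOX-0001` style location codes and the sample titles are
placeholders. The ISBN-13 check is the standard checksum (digits weighted 1 and
3 alternately, total divisible by 10) and is useful for catching misreads
before a row is saved.

```python
import csv
import io
from typing import List

FIELDS = ["title", "isbn", "location"]  # the three columns described above


def is_valid_isbn13(isbn: str) -> bool:
    """Check the ISBN-13 checksum: weights 1,3,1,3,... must sum to 0 mod 10."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def locate(catalog_csv: str, query: str) -> List[str]:
    """Return location barcodes of rows whose title contains the query."""
    reader = csv.DictReader(io.StringIO(catalog_csv))
    return [row["location"] for row in reader
            if query.lower() in row["title"].lower()]


# Placeholder catalog; in practice this would be read from a file.
catalog = (
    "title,isbn,location\n"
    "Example Title One,9780306406157,BOX-0001\n"
    "Another Example,,BOX-0002\n"  # no ISBN: older book, entered by hand
)
print(is_valid_isbn13("978-0-306-40615-7"))
print(locate(catalog, "example"))
```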

------
bshep
A while back I wrote an ISBN barcode scanner which would look up the item on
Amazon and find the price. The script uses your webcam as the source for
the image. I'm sure you could adapt it to your needs; it's very simple and has
minimal error checking, so beware.

[https://github.com/bshep/ISBNbarcodescanner](https://github.com/bshep/ISBNbarcodescanner)

------
szafranek
I'm surprised nobody mentioned
[https://www.libib.com/](https://www.libib.com/). It comes with a mobile app
that has a barcode scanner with an option for manual entry.

Yes, some older books, especially in languages other than English, are not in
its database, leaving you with the manual option, but it will let you index
the books that are there in no time.

------
andylynch
For books that are not that old, you will often find the info you need on the
copyright page - for US publications, the Library of Congress CIP info is
there; see [http://www.loc.gov/publish/cip/](http://www.loc.gov/publish/cip/)
. Other countries have similar programmes eg the British Library does the
same.

------
georgespencer
You might try Delicious Library: [https://delicious-
monster.com](https://delicious-monster.com)

~~~
jrgd
I was going to second that; it's the first thing that came to my mind—but then
it doesn't work for books without barcodes (or does it now?). I worked with
version 2 a while back, it was great. The iPhone app to scan books without the
laptop nearby requires version 3 though. The Mac app could also check the
price the book sells for online.

------
krekligit
Internet archives? [https://www.atlasobscura.com/articles/marion-stokes-
televisi...](https://www.atlasobscura.com/articles/marion-stokes-television-
news-archive?utm_source=facebook.com&utm_medium=atlas-
page&fbclid=IwAR161Fm3Bu3aJ-0w6QP6_XLvpO0x9E80ZfyjKc-sW65zwz7I4-Z2BzhgtWA)

------
giarc
My local library has a 24/7 return system. You simply put the books on a
conveyor belt and it takes them in and scans them. I imagine it reads the RFID
but you could get a similar system to scan the barcode, you just have to
insert them back side up. Would be quicker than scanning with a handheld
barcode scanner.

------
BigBalli
Please do let me know more regarding your pain points. I released
[http://mybooklist.club](http://mybooklist.club) Obviously it already includes
manual insert and barcode scanning but now I'm working to implement adding by
image recognition (of the cover).

------
dredmorbius
This is a complex project, though also a well-developed space -- it's much of
what libraries do.

Numerous queryable catalogues of book and other materials exist, with Worldcat
arguably the most developed of those:

[https://www.oclc.org/developer/develop/web-
services/classify...](https://www.oclc.org/developer/develop/web-
services/classify.en.html)

The US Library of Congress also has a huge (if intimidating) amount of
information available.

[http://eresources.loc.gov](http://eresources.loc.gov)

[http://fortune.com/2017/05/17/library-of-congress-free-
recor...](http://fortune.com/2017/05/17/library-of-congress-free-record-
release/)

Figuring out what you hope to accomplish, how, with what resources (people,
software, equipment, space, etc.), in what timeframe, and with what throughput
(how fast are materials arriving and leaving, what is the current backlog) are
all considerations. And what end this will serve; book sales, in-person or
online, and what is sufficient to that end is also significant.

~~~
dredmorbius
Abebooks has a freely-available inventory-management system, HomeBase:

 _HomeBase is AbeBooks' free inventory management software and one of the most
widely adopted programs for booksellers worldwide._

 _This easy-to-use program streamlines inventory management and bookselling on
AbeBooks. HomeBase helps take care of everything from maintaining your
inventory database, keeping track of buyers and issuing receipts, to uploading
your active listings to sell on AbeBooks. Plus, you can also use HomeBase 3.0,
to send your inventory to other marketplaces such as Amazon._

[https://www.abebooks.com/homebase/software-inventory-
managem...](https://www.abebooks.com/homebase/software-inventory-management-
system-catalog/index.shtml)

There's a longer list of software and tools here:

[https://www.whiteunicornbooks.com/home/bookinventoryprograms...](https://www.whiteunicornbooks.com/home/bookinventoryprograms.html)

------
__initbrian__
Maybe look into how public libraries get books back onto shelves
[https://www.bibliotheca.com/library-return-
sorting/](https://www.bibliotheca.com/library-return-sorting/)

------
rajkpal
[http://www.k2.t.u-tokyo.ac.jp/vision/BFS-
Auto/](http://www.k2.t.u-tokyo.ac.jp/vision/BFS-Auto/)

though this is to scan whole books in case there are rare ones..

------
ejdanderson
The amazon seller app does exactly this. I think eBay might have it built in
as well, but I’m fairly certain with amazon you can scan the barcode or cover
of a book.

------
anonu
Ideally you'd just do a pass through with a high-res video camera, making sure
the book spines are facing out to capture their names, then run the footage
through some image filter to pick up the book names.

Then you'd have to run some algo to match each name with its ISBN and tag it
with the general location.
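
The matching step could be approximated with fuzzy string matching, since OCR
of spines will rarely give exact titles. This is only a sketch using Python's
standard `difflib`; the `known_titles` mapping (title to some identifier such
as an ISBN) is assumed to come from an existing catalogue, the identifiers here
are placeholders, and the 0.6 cutoff is a guess that would need tuning.

```python
import difflib
from typing import Dict, Optional


def match_title(ocr_text: str, known_titles: Dict[str, str],
                cutoff: float = 0.6) -> Optional[str]:
    """Return the identifier of the closest known title, or None if no match."""
    by_lowercase = {title.lower(): ident for title, ident in known_titles.items()}
    hits = difflib.get_close_matches(
        ocr_text.lower(), list(by_lowercase), n=1, cutoff=cutoff
    )
    return by_lowercase[hits[0]] if hits else None


# Placeholder catalogue: identifiers are stand-ins, not real ISBNs.
known_titles = {
    "Rainbows End": "id-0001",
    "Playing with Infinity": "id-0002",
}

print(match_title("Rainbovvs Encl", known_titles))  # typical OCR misread
print(match_title("zzzz", known_titles))            # nothing close enough
```

Anything that comes back as `None` would go onto a pile for manual entry.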

Once you get the process down you could run a new video every few weeks.

This is kind of like Google Street view for the book barn.

Am I dreaming? Could this work in practice?

------
secfirstmd
The easiest way by far is to download Goodreads and use the barcode scanner in
their app and their lists feature.

------
emmelaich
Google Docs will OCR.

As a first pass, just take a picture of a whole bunch of book spines. Some
will be OK, some not.

------
elcomet
You should ask on reddit on r/DataHoarder. They are the best for this kind of
stuff.

------
paulcarroty
Maybe a light hardware scanner would be helpful, something like what workers
use in warehouses.

------
sandwall
1st, where is this treasure trove?

2nd, barcode scanning would certainly be the most effective method.

------
kmfrk
Photograph the spines of the books on the shelves and OCR the titles and
authors?

------
FZ_BA
You could scan the barcodes and use something like TELLICO to make a catalogue

------
thegabriele
Yep, charge each customer 1$ + at least 5 casual books indexed into a database

------
oflebbe
I would recommend checking with archive.org whether the books are already
available online and, if not, whether they are interested in scanning them or
getting the scans.

------
viraptor
Ask a professional. And by that I mean - get in touch with Jason Scott:
[https://twitter.com/textfiles](https://twitter.com/textfiles)

~~~
jocmeh
If we say books and professional, I think of the Internet Archive. They
developed their own software which is used on the TT Scribe system, see
[https://archive.org/details/tabletopscribesystem](https://archive.org/details/tabletopscribesystem).
These people run an amazing operation, but even for them this still involves a
lot of manual labor. In the end, what it takes is a person to grab a book and
type in the title.

~~~
dredmorbius
Jason Scott is the Internet Archive's Free Range Archivist.

~~~
textfiles
HI

~~~
ebcode
Is it true what vessenes said above about the Internet Archive receiving a
dataset of book spines? And would it be possible for the IA to release that
dataset publicly?

------
libguy
So where is this barn of books?

I’d like to have a look. Thanks!

------
Theodores
Hay On Wye is a town in Wales known for its book shops. Maybe this is the
business model that you need to look into.

What Hay on Wye is known for is a literary festival. So there was a pivot to
this a few decades ago that has worked.

This is the guy that started it:

[https://en.wikipedia.org/wiki/Richard_Booth](https://en.wikipedia.org/wiki/Richard_Booth)

Note the way it started, buying library stock from America that was available
due to libraries closing.

Some of the Hay book shops are really good, there is a former cinema that you
could get lost for hours in. Some other book shops are more like 'extras'.
They might have books in cases on the pavement with prices being pennies. This
stuff could be fairly pulped, but, collectively it gives the whole town this
aura of literature that is way beyond what the local sheep farmers necessarily
go in for.

Now if a tourist visits Hay for the festival then they spend £££ on books but
they also spend a lot more on cups of tea, admission fees to see performances,
accommodation and whatever else. A given tourist might spend pennies on books
but in so doing spend many pounds in the town. They might not even read the
books purchased; the books might serve more as souvenirs, and far from generic
ones.

The reputation from the festival is enough to bring a respectable amount of
tourists to Hay throughout the rest of the year.

It also works with a sponsor, normally the Hay festival works with people who
have a vested interest in it being successful, so you get a lot of coverage on
BBC's Radio 4.

Hay also has splendid scenery going for it as well as it being in Wales,
proper. There are towns nearby that are just as pretty with similar scenic
backdrops but nobody remembers the names of those places. The books thing -
which is the effort and inspiration of one man - has put the place on the map,
literally.

So, rather than the high tech solution, maybe the preinternet solution has
some pointers. Get some local store fronts that are closed premises to become
book shops. Segment the collection so that some shops are more specialist than
others. Have some shops in less prime locations so that collectively there is
the same thing going on as with Hay on Wye. Create a fake literary hub and
then make it into a real literary hub by putting on the ten day festival.

If you get the council and local businesses in on the act then you might be
able to get the whole thing started. Build it and they will come works for Hay
even though it is the middle of nowhere with just sheep for local population.

Trying to shift the product online for a pittance is no fun at all, the
festival and tourist location thing could be much more exciting. Try and twin
the town with Hay to get started...

~~~
Freak_NL
> […] has put the place on the map, literally.

I'm pretty sure Hay On Wye was quite literally on a lot of maps before that.
_Figuratively_ speaking you are probably correct.

