Making my bookshelves clickable (jamesg.blog)
115 points by JNRowe on Feb 17, 2024 | 29 comments


What I would really like is a tool that takes a video or series of photos and automatically catalogues the contents. This would be nice from a ‘document all the things’ perspective, but really, really convenient in case of a fire or theft claim.

If you haven’t already, I strongly urge everyone reading this comment to stand up and do a video walkthrough of your house, today. Do all of your book spines, jewellery, DVDs, games, clothes, tools, cutlery, etc. Take photos of your bike’s serial numbers (usually under the hub).

Store the videos and photos in a cloud album or even a free tier somewhere. Email it to yourself, whatever, just don’t forget to do it.

It might take half an hour or so, but this evidence is priceless (in terms of time; it does actually have a monetary value) if you ever need to claim insurance in case of a fire or burglary.

It simply isn’t possible to remember everything you own.

Some insurers demand photographic evidence of recent ownership - I found this out the hard way (who here has a photo of themselves with their bike?! I even had the receipt!)


You may like "Writing documentation for your house", written by hsiao.dev, and its corresponding HN thread: https://news.ycombinator.com/item?id=38444577.


It's also good for Airbnb hosts who want to check whether anything is missing after guests leave.


I would really love a control-f for the real world.

Imagine you have a list of wines you want to try, or used books you’re hoping to buy.

In the store, you open your phone and scan the shelves with your camera; if it finds any matches from your lists, it shows them on screen.
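A rough batch version of this isn't much code: OCR a shelf photo and fuzzy-match the text against your list. A minimal sketch, assuming pytesseract is installed; the file name and wishlist are placeholders, and vertical spine text would need rotation handling that this skips:

    import difflib
    import pytesseract
    from PIL import Image

    wishlist = ["The Design of Everyday Things", "The Pragmatic Programmer"]

    # OCR the shelf photo and keep the non-empty lines
    text = pytesseract.image_to_string(Image.open("shelf.jpg"))
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # Fuzzy-match each OCR'd line against the wishlist
    for line in lines:
        hits = difflib.get_close_matches(line, wishlist, n=1, cutoff=0.6)
        if hits:
            print(f"Possible match: {hits[0]!r} (OCR read: {line!r})")

The live AR version is essentially this run per camera frame, plus drawing the hits back over the shelf.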


Same. I've imagined a 'spines.com' for books, where the work of linking book spines to ISBNs, etc., has been crowdsourced, and you can point your phone camera at a shelf and look up reviews, etc.


When I visit spines.com it redirects to a self-publishing tool called booxAI.com. I don't think that's what you were referring to, or did the spines tool sell its domain name?


Apple iOS does this, kinda sorta, but it's not real time. It runs text recognition on the images in your photo stream, and you can search for that text in your photos using the search bar.


Apple's lidar is absolutely suited to this, and I've always thought there is an opportunity to integrate it into the dollhouse add-on in Home Assistant.

Or, create a digital twin of your garden, and simulate light shadows throughout the day after adding or removing a tree. Add pruning schedules to a fruit tree.

It’s trivial to do a scan, but it hasn’t really taken off in a practical sense.


It took me a while to realize digital twin got autocorrected to digital town here :).


Vivino can sort of, kind of do that. It has a rapid multi-scan mode for shelves. Not quite the pointy AR experience yet, though.


Gemini can help with this, but it has a lot of overhead compared to dedicated models.


I wish that when people published things like this, they would take the time to notice that their requirements.txt and README don't actually work; e.g., try it out in a fresh VM or container divorced from their working environment. Not even the arguments match what's in the README.


It blows my mind how much effort people, particularly python ml people, put into creating, training, blogging, promoting their github repo and then give so little thought to whether anybody will be able to use it. You're lucky if you get an incomplete and unversioned requirements.txt and no mention of what python version was used. If it requires conda, just give up. You might get an injunction to "install pytorch" and you're on your own.


While watching an interview recently, where the interviewee was sitting in front of their bookshelf, I was trying to discern the book titles to add to my reading list. I screenshotted the person/bookshelf and asked ChatGPT+/GPT-4 to list all the books. It could only identify a tiny fraction.


This is a nice time-saving tool for "minimalist" information hoarding. A pet project of mine is to thin my bookshelf down to only the books I regularly reach for and store the "never going to read" books out of sight.

The idea is that if I can save the details of the "never going to read" books and acquire digital copies, it may be easier for me to psychologically let go of the physical copies and reclaim the storage space.

I was going to take a photo of my crowded bookshelves and manually put the ISBNs and titles into a spreadsheet, keeping the photos simply for extra reference. Your project, making the photo itself clickable, is a great bridge between the data and the artifact.


This is interesting. For me, not reaching for certain books seems to be a consequence of lazy "rotation", i.e., these are the same books I'd eventually find time to read if I saw them around often enough, so that when I have some leisure I don't forget about them.

Anyway, this is our (my wife's and my) hypothesis, so we are currently working on rotating books; let's see how that works out : - )


For those not interested in taking photos, this Virtual Bookshelf project was posted some time ago:

https://github.com/petargyurov/virtual-bookshelf


HTML has image maps. They've been around longer than many of the people reading this post have been alive. You don't need SVG and JS for this.


It is certainly odd that they didn't use them, especially since image maps are mentioned at the beginning of the article.


You'd still need something to outline clickable areas. Map areas do have a visible outline when focused, but I doubt that there is much else.


> something to outline clickable areas

Any graphics editor will let you do that.


What's the benefit of image maps vs SVG here?


Weird question, but let's give it a shot. Off the top of my head:

Standards, semantics, simplicity of implementation + time-to-implement, cost at runtime...

What's the impetus for re-implementing image maps with JS instead of just using the browser-native implementation?
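For comparison, the browser-native route is tiny. A rough Python sketch (the box coordinates and links below are made up): given the spine boxes a detector already produces, you just emit a <map>:

    # Turn detected spine boxes into a plain HTML image map -- no JS needed.
    # Box coordinates and hrefs below are placeholder values.
    boxes = [
        ("https://example.com/book-1", (10, 20, 60, 300)),
        ("https://example.com/book-2", (65, 20, 118, 300)),
    ]

    areas = "\n".join(
        f'  <area shape="rect" coords="{x0},{y0},{x1},{y1}" href="{href}" alt="book">'
        for href, (x0, y0, x1, y1) in boxes
    )
    html = (
        '<img src="bookshelf.jpg" usemap="#shelf" alt="bookshelf">\n'
        f'<map name="shelf">\n{areas}\n</map>'
    )
    print(html)

The browser handles the click targets from there: no onclick handlers, no SVG overlay.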


Looks interesting, but surely image maps (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/ma...) would be more suitable? No need for `onclick` handlers then, either.

(They are mentioned at the start as an option so maybe there was a reason they were disregarded)


Works great in Vision. But now I want a browsable spatial model when I tap any book in my field of view.

Cool demos are so frustrating!


I was expecting to find a project where you add a switch behind a book and "clicking" it opens a secret entrance behind the bookshelf, but this is also very cool.


Very nice project. Seems like it would be the ideal kind of thing for AR/XR.


Would it make sense to have the user click and then use that point as a SAM prompt? It might let you find a book even if the initial SAM query doesn't find it.
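Roughly, with the segment-anything package, that would just mean passing the click as a foreground point prompt. A sketch; the checkpoint path and click coordinates are placeholders:

    import cv2
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
    predictor = SamPredictor(sam)

    image = cv2.cvtColor(cv2.imread("bookshelf.jpg"), cv2.COLOR_BGR2RGB)
    predictor.set_image(image)

    click = np.array([[412, 230]])      # pixel the user clicked (placeholder)
    masks, scores, _ = predictor.predict(
        point_coords=click,
        point_labels=np.array([1]),     # 1 = foreground point
        multimask_output=True,
    )
    best_mask = masks[np.argmax(scores)]  # highest-scoring candidate spine mask

SAM should snap to the spine under the click, so a detector miss could be patched interactively this way.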


Post author here. I like this idea. I plan to explore it and make a more generic solution. I'd love to have a point-and-click interface for annotating scenes.

For example, I'd like to be able to click on pieces of coffee equipment in a photo of my coffee setup so I can add sticky note annotations when you hover over each item.

For the bookshelves idea specifically, I would love to have a correction system in place. The problem isn't so much SAM as it is Grounding DINO, the model I'm using for object identification. I then pass each identified region to SAM and map the segmentation mask to the box.

Grounding DINO detects a lot of book spines, but often misses 1-2. I am planning to try out YOLO-World (https://github.com/AILab-CVC/YOLO-World), which, in my limited testing, performs better for this task.
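In outline, the current pipeline looks something like the sketch below; detect_spines() is a stand-in for whichever detector (Grounding DINO or YOLO-World) proposes the spine boxes, and each box becomes a SAM box prompt:

    import cv2
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    def detect_spines(image):
        """Stand-in for the detector: return [x0, y0, x1, y1] boxes for book spines."""
        raise NotImplementedError

    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
    predictor = SamPredictor(sam)

    image = cv2.cvtColor(cv2.imread("bookshelf.jpg"), cv2.COLOR_BGR2RGB)
    predictor.set_image(image)

    spines = []
    for box in detect_spines(image):
        masks, _, _ = predictor.predict(
            box=np.array(box),           # box prompt instead of a click
            multimask_output=False,
        )
        spines.append({"box": box, "mask": masks[0]})  # mask mapped back to its box

A correction pass could then fall back to the click-as-point-prompt idea above for any spine the detector misses.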



