It's funny. As soon as he described his problem, I suspected ChatGPT would enter the picture. It's often significantly better than search engines for finding the name of even an obscure work from a description, so of course folks on book-finding subreddits would use it a lot.
But the author's absolutely right to warn that it also regularly fails us, and the author's also right to celebrate the folks who are trained specifically in finding this sort of information in ways that the automated tools can't replicate yet.
I was sent a photo of a page from a book with a great piece of writing. He didn't know the book. I OCR'd the page and pasted it into ChatGPT. It led me on a merry dance where it stated unequivocally that it was a book it couldn't have been. It then started making up books from similar authors. Every time I said, 'there is no such book', it apologised and then made up a new book. It was like talking to a salesman trying to bullshit its way through a pitch.
I put a short piece of it into Google books and it found it! I asked ChatGPT about the actual book and it claimed to know it!
It was a book called Blood Knots by Luke Jennings. I bought it, and before I read it I saw the friend who sent me the excerpt, and gave it to him. A year later I saw the same book, shelf-soiled, in an independent store. It was worth the wait; it was a great read.
I also saw David Allen Green (author of the above) ask his question on Bluesky on my first day using it. Somehow I feel part of this story.
I typed a reply and deleted it, but I've had the same experience. Also, you don't need to OCR: with the phone app you can typically just snap a picture.
Beyond books, it's really awesome at finding movies even with super weird things I happen to remember - I assume it's trained on quotes, scripts and maybe fan knowledge from IMDb or something.
It did the same thing with song lyrics. Yes, it was a niche melodeath metal song, but the lyrics are very distinct, and Google/Bing can't seem to do exact string searches anymore, yet GPT was confidently incorrect about which band it was.
They were also making an interesting point about how ChatGPT behaves: it treats everything as relevant. Whereas the librarian who found the book systematically discarded possible 'facts' and substituted others (goblins->demons) to work out what was going on. Not sure any AI does this currently.
ChatGPT does do that for me, when I'm using it for tasks like David Allen Green's book hunt.
This has yet to help in my experience. If it can find the book, it (so far) hasn't needed to change the details I provided; if it doesn't know it, it will change them to something thematically similar and still not find what I wanted (and if I insist on requiring certain story elements that it had changed, it will say something along the lines of "no results found" but with more flowery language).
I suspect that, given a reasonable prompt, it would absolutely discard certain phrases or concepts for others. I think it may find it difficult to cross-check and synthesize, but "term families" are sort of a core idea of using multi-dimensional embeddings: related terms sit close together in embedding space (small distances, high cosine similarity). I'm not super well versed on LLMs, but I do believe this would be represented in the models.
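To make the "term families" point concrete, here's a minimal sketch using sentence-transformers (the model name is just a common default I picked, nothing from the original comments): related words like "goblin" and "demon" end up much closer in embedding space than an unrelated pair.

```python
# Minimal sketch: related terms have high cosine similarity in embedding space.
# "all-MiniLM-L6-v2" is an assumed model choice, not one specified above.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

words = ["goblin", "demon", "spreadsheet"]
embeddings = model.encode(words, convert_to_tensor=True)

# Expect the goblin/demon pair to score noticeably higher than goblin/spreadsheet.
print("goblin vs demon:      ", util.cos_sim(embeddings[0], embeddings[1]).item())
print("goblin vs spreadsheet:", util.cos_sim(embeddings[0], embeddings[2]).item())
```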
I think a more robust approach would be to restrict the generative AI to generating summaries of book texts. First summarize every book (this only has to be done once), and then use vector search to find the most similar summaries to the provided summary. Small mistakes will make little difference, e.g. "goblin" will have a similar embedding to "demon", and even entirely wrong information will only increase the number of books that have to be manually checked. Or better yet, develop an embedding model that can handle whole books at once and compare the vectors directly.
Perhaps somebody with more compute than they know what to do with could try this with Project Gutenberg as a proof of concept.
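For what it's worth, the summarize-once-then-vector-search idea is pretty simple to prototype. Below is a rough sketch under some assumptions of my own: the embedding model is an arbitrary choice, the book summaries are invented placeholders (a real run would generate them from, say, Project Gutenberg texts), and brute-force dot products stand in for a proper vector index.

```python
# Sketch: embed precomputed book summaries once, then rank them against a
# user-provided (possibly slightly wrong) description by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

def embed(texts):
    # Normalized vectors, so a plain dot product equals cosine similarity.
    return np.asarray(model.encode(texts, normalize_embeddings=True))

# One-time step: summarize every book (however you like) and embed the summaries.
# These summaries are made-up examples, not real book data.
book_summaries = {
    "Book A": "A boy befriends a goblin living beneath his town and keeps it secret.",
    "Book B": "A memoir of fishing, friendship and rivers in England.",
}
titles = list(book_summaries)
index = embed(list(book_summaries.values()))

# Query time: small mistakes ("demon" instead of "goblin") should still rank Book A first.
query = embed(["A story about a child and a demon under the streets"])[0]
scores = index @ query
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {titles[i]}")
```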
That works for description searches, I'd imagine. However, I'm the type who'd remember "book with a boy who had a yellow ball" type things that only happen on one page of a book.
It's also interesting that years of trying on Twitter and Reddit failed, but asking on Bluesky succeeded. I'm certainly not claiming that Bluesky is some kind of great leap forward compared to Twitter. But it could be that being a new service it just isn't as crowded with bots, spam, and BS -- thus allowing the signal to come through.