So that's a downvote from me. Please try to find a source that doesn't require you to register an account.
Also explaining why you're downvoting is a good idea.
OP: Paywall Link
You: Here’s the non-paywall link.
Not that hard. And if you think about it, OP typed the answer “Ruby on Rails”. Would you have downvoted if he didn’t provide a source?
However, it's lacking in any context explaining what you're trying to achieve and why.
It's probably obvious to some people but for me, it's not, which I think is a shame.
not dissing the author, genuinely trying to understand the spectrum of data science needs
I'm just shocked Amazon has been able to own this niche with so little effort.
It's a small moat, but definitely penetrable with more than a little effort.
Book publishers or people who've worked in book publishing will know that the book database is one area you don't want to mess with unless you know what you are doing. ISBNs are not the be-all and end-all of the story, and when you start taking into account special editions, covers, ebook editions, and language translations, you'll start to realize that the book catalog system going back in history, including the Dewey Decimal system, is a marvel of human achievement.
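To make the ISBN point concrete, here is a minimal sketch. The check-digit arithmetic is the standard ISBN-13 rule (alternating weights of 1 and 3) and the ISBN-10 conversion uses the usual 978 prefix. The `Edition` record and its `work_id` field are hypothetical, only there to show that a catalog needs some work-level key above the ISBN, since each format and translation carries its own number.

```python
from collections import defaultdict
from dataclasses import dataclass


def isbn13_check_digit(first12: str) -> str:
    """Standard ISBN-13 check digit: weights alternate 1, 3, 1, 3, ..."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to ISBN-13 by prefixing 978 and recomputing the check digit."""
    digits = isbn10.replace("-", "")
    core = "978" + digits[:9]  # drop the old ISBN-10 check digit
    return core + isbn13_check_digit(core)


def is_valid_isbn13(isbn: str) -> bool:
    digits = isbn.replace("-", "")
    return (
        len(digits) == 13
        and digits.isdigit()
        and isbn13_check_digit(digits[:12]) == digits[12]
    )


@dataclass(frozen=True)
class Edition:
    work_id: str   # hypothetical work-level key, not part of any ISBN standard
    isbn13: str
    fmt: str       # e.g. "hardcover", "ebook"
    language: str


def group_by_work(editions):
    """Collect every edition (each with its own ISBN) under its work."""
    grouped = defaultdict(list)
    for edition in editions:
        grouped[edition.work_id].append(edition)
    return dict(grouped)
```

Deciding which editions share a `work_id` is the part that no checksum can do for you, and that is where the human cataloging effort goes.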
Of course establishing a good quality index is going to take work. People often forget that quality takes human work and effort.
EDIT: I lied. I changed the number from my original estimate of a "few hundred" to "hundred thousand". The Goodreads Librarians group has 103,718 members as of just now - so it's actually a large number of humans submitting fixes to their catalog.
If you take a look at the kind of discussions taking place, those are the kinds of things any competitor to Goodreads needs to know about.
I _love_ that I can take a book I enjoyed, see it's on a list of "Best Magic Systems", and note what was rated even better for its magic system
A simple method of discovery for me
I don't think there's much value left on the table in the niche, though. Kindles have first-class Goodreads sync and even a Goodreads button in their global navbar. And Goodreads' competitors, for the few people who don't want to use Goodreads, already have a deep rut of incumbency.
Even you, who supposedly has great issues with Goodreads, apparently weren't bothered enough to even see if competitors existed all this time, much less before writing your comment. Doesn't bode well for the Goodreads competitor market, lol.
LibraryThing: 0 results
Goodreads: 2,000+ results and they're well sorted
And of course Goodreads has issues of its own, but none of them are show-stoppers for most people, least of all the people who just use it as a glorified Excel spreadsheet.
I only chuckle about this because, like many enterprising HNers, I myself have considered building a Goodreads competitor in the past and even managed to build the ol' weekend prototype (i.e. 0.001% of the work). It's one of those projects where you start and, after you get some of the easy things done like fuzzy search, you go "wait, wtf am I doing? Who would switch to this?"
Using and improving OpenLibrary is also alluring, but pretty hard to do without an application with actual users that have some sort of "edit book" functionality that you can then moderate and submit upstream to the OpenLibrary data source.
For example, look how ListenNotes.com lets users edit its podcast database: https://www.listennotes.com/podcasts/the-joe-rogan-experienc... -> the "Edit" tab.
Different usage than reddit
So basically, keep using Goodreads for now?
One optimization though is separating your loading tasks from compute tasks. This makes the pipeline more resilient and makes backfilling/reprocessing less of a headache.
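A rough sketch of what that separation could look like, assuming a simple file-based staging area. The function names, the JSONL layout, and the `book_id`/`rating` fields are all made up for illustration and are not taken from the project being discussed.

```python
import json
from pathlib import Path


def load_raw(fetch, staging_dir: Path, batch_id: str) -> Path:
    """Loading task: fetch records and persist them untouched to staging."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    out = staging_dir / f"{batch_id}.jsonl"
    with out.open("w", encoding="utf-8") as fh:
        for record in fetch():
            fh.write(json.dumps(record) + "\n")
    return out


def compute_average_rating(raw_path: Path) -> dict:
    """Compute task: reads only from staging, never from the source system."""
    totals = {}
    with raw_path.open(encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            running_sum, count = totals.get(record["book_id"], (0, 0))
            totals[record["book_id"]] = (running_sum + record["rating"], count + 1)
    return {book: s / n for book, (s, n) in totals.items()}
```

Because the compute step only touches the staged file, a backfill or a bug fix in the aggregation means re-running `compute_average_rating` on old batches without fetching anything again.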
For example, someone could show the disparity between a New York Times bestseller and the book getting the most activity on Goodreads (added to the most shelves, for example).
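One way to sketch that comparison, assuming you already have a bestseller ranking and a shelf count per title in hand. Both inputs and the function name are hypothetical, and this says nothing about how you would actually obtain either dataset.

```python
def rank_disparity(bestseller_rank: dict, shelf_counts: dict) -> list:
    """Return (title, bestseller_rank, shelf_rank, difference) rows,
    largest absolute difference first."""
    # Rank titles by shelf count, most-shelved first (rank 1).
    by_shelves = sorted(shelf_counts, key=shelf_counts.get, reverse=True)
    shelf_rank = {title: i + 1 for i, title in enumerate(by_shelves)}

    rows = [
        (title, rank, shelf_rank[title], rank - shelf_rank[title])
        for title, rank in bestseller_rank.items()
        if title in shelf_rank  # only titles present in both datasets
    ]
    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)
```

A negative difference means the title sits higher on the bestseller list than its shelf activity would suggest, and a positive one means the opposite.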
Anyone run this who could comment on the metrics and, consequently, server sizing?
I'd recommend taking a look at dbt for a refreshing approach to this domain. The AWS EMR Redshift approach is great if you _know_ you'll need all the configurability, but chances are you won't, and even with that said, the above stack provides it as necessary.
I've also seen the opposite, with 5 star ratings on an unreleased book because the author (as a person) is liked by the community
Amazon tends to want every part of itself to be in ship-shape, and giving itself a massive discount would discourage efficiency in non-AWS parts of the business.
Disclosure: neither a current nor former Amazon employee.
Any recommendations for an alternative?