Hacker News
In Amazon’s Bookstore, Orwell Gets a Rewrite (nytimes.com)
104 points by jbegley 60 days ago | 45 comments

Wow, I haven't been confident buying electronics or anything with a battery on Amazon for a while; now I guess I'll add books to the list. Lately, even buying the same brand of toilet paper has gotten me a different quality each time.

Amazon is becoming less reputable than eBay at this point... at least with eBay, the seller has individual ratings and is not hidden behind whatever brand of product they are selling.

I don't know why people rag on eBay so much. eBay is great. Well, for certain things at least. The thing people don't get is that eBay is basically like one giant outdoor sale that one company has organized, and then leased space to a bunch of small vendors who have all set up their own little tents. Some of those vendors suck, others don't, and you're going to find stuff there that you won't see at a large brick-and-mortar retail store.

If I want to find some used auto part for a model-year 2001 car, eBay is the first place I'll look, and I'll probably find it there, because tons of auto recyclers use eBay to sell their parts. If I want to buy a small quantity of some odd item, or an item only available in other countries, eBay is again the first place to check. In short, eBay is perfect for finding weird things you won't find anywhere else. You do have to be careful who you buy from, though, but the feedback ratings do a tolerably good job of helping you there, and that's something you won't get from some independent website.

Amazon has indeed gotten pretty bad in a lot of ways, because you don't actually know what you're going to get; it depends on who ships the thing to you. It's still OK, I think, with sellers that are more independent and ship their own stuff, but for those the prices usually aren't very good, and you might as well just go to eBay.

I don't know if this is still true in 2019, but two years ago eBay didn't have proper 2FA, only SMS verification. The problem was that someone found a way around it, logged in to my account, and bought a lot of expensive stuff.

I reported it to eBay; they reset my password and opened a case to refund me. I removed the perpetrator's address, changed all my credentials, and two days later it happened again (because, again, there was no real 2FA). After this happened twice in the same week, I got tired of it, got my refund, and closed my eBay account.

You would think that in this scenario eBay would block the perpetrator's IP, or at least stop the delivery address from being set to the same one the perpetrator used the first time, but they did nothing of the sort. They let that person order more stuff through my account and left me with a lot of headaches.
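For contrast with SMS codes: the app-based or hardware-token 2FA discussed in this subthread is typically TOTP (RFC 6238), where the code is derived from a shared secret and the current time, so nothing travels over the phone network where it could be intercepted or redirected. Below is a minimal Python sketch using only the standard library. The secret and timestamp are the RFC's published test values, and nothing here describes how eBay or PayPal actually implement their tokens.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) with HMAC-SHA1."""
    # The counter is packed as an 8-byte big-endian integer.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte choose a 4-byte window.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): HOTP with the counter set to the current time step."""
    return hotp(secret, unix_time // step, digits)


def verify(secret: bytes, candidate: str, unix_time: int,
           step: int = 30, window: int = 1) -> bool:
    """Accept a code from the current step or up to `window` steps either side."""
    for k in range(-window, window + 1):
        t = unix_time + k * step
        if t < 0:
            continue
        # Constant-time comparison so timing does not leak how many digits matched.
        if hmac.compare_digest(totp(secret, t, step, len(candidate)), candidate):
            return True
    return False


if __name__ == "__main__":
    secret = b"12345678901234567890"  # RFC 6238 Appendix B test secret (SHA-1)
    print(totp(secret, 59, digits=8))  # RFC test vector: 94287082
```

The `window` parameter is a design choice: accepting neighbouring time steps tolerates clock drift between phone and server at the cost of a slightly longer validity period for each code.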

It’s still SMS 2FA. What I do is not keep any payment cards saved to the account.

You used to be able to use your PayPal "football" back in the day (a PayPal-branded VeriSign hardware 2FA token) or the VIP app, but they have slowly been removing those in favour of SMS 2FA, to the point where I don’t think you could even add a VIP token if you tried (it used to be that you could add one, but they would try their hardest to make you use SMS instead).

If you had it enabled back in the day, it’s still active on your account (on both eBay and PayPal), but you will often find your login flow disrupted if you still use the “old style” 2FA. For example, on some login pages (but not all) you can log in by appending your code to your password, but it’s hit and miss, and IIRC you can’t use the PayPal app at all if you have 2FA enabled and have to do business via the website.

Note: I’m aware that PayPal and eBay are two separate companies now. But for the longest time they acted as one, so their application flows still feel very similar to each other.

I was using PayPal on my account, and they didn't have legit 2FA back then either (now they do, thankfully). It blows my mind that something as crucial and as attractive to attackers as a payment system wouldn't have legit 2FA in 2017, when random sites with far less important stuff to lose, like Twitch, did.

I will be adding books to the list now too, but the number of categories I won’t shop for on Amazon has ballooned in just the past couple of years. It’s amazing to browse the reviews of brand-name products ranging from kitchen gadgets to yard tools, to even board games, and see customers saying things like “much cheaper plastic than the one I bought at Target” or “huge downgrade in card printing and game piece quality.” It’s not impossible that some of these might legitimately be massive cost-cutting measures by the OEMs, but I have to think most are the result of Amazon’s rampant counterfeiting problem.

In fact, I don’t think I really even have a product category blacklist anymore; it’s more of a whitelist. The only things I will buy at Amazon are products that should be impossible to fake (beyond the industrial design), like an iOS device.

It's definitely more work to buy authentic products than it used to be. These days I find myself aborting purchases at the checkout phase, when I can review the actual shipper of the product. If it's "fulfilled by Amazon" or a shipper I already trust, I consider it good to go; otherwise, I back out and start over with the same product and a different product link. Now, with Amazon fulfilling for other shippers, it's going to be really hard unless Amazon does the heavy lifting to ensure product authenticity on the way into their warehouses. If they don't, I guess it'll be time to give up on Amazon entirely.

This book scanning junk is definitely a next level problem. Wow.

> Now with Amazon fulfilling for other shippers

They've been doing this for years. That's what Fulfilled By Amazon is. And no, they don't reliably check product authenticity.

There have been fake iPhones on eBay for a while, with copycat operating systems full of malware.

1984 was published in 1949, 70 years ago. The author died in 1950. If it weren't for ridiculous US copyright laws, reputable public domain copies would be available at reasonable prices and there would be no black market for low-quality knock-offs.
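For reference, the arithmetic behind that complaint, as a rough sketch under two simplified assumptions: the US term for a work published in 1949 with renewed copyright is 95 years from publication, while many other countries (the UK and EU among them) use the life of the author plus 70 years. In both cases the term runs to the end of a calendar year. Every other wrinkle of copyright law is ignored here.

```python
def pd_year_by_publication(publication_year: int, term: int = 95) -> int:
    """First year in the public domain under a fixed term counted from publication.

    Terms run to the end of the calendar year, so the work becomes free on
    1 January of the following year.
    """
    return publication_year + term + 1


def pd_year_by_death(death_year: int, term: int = 70) -> int:
    """First year in the public domain under a life-plus-`term` rule."""
    return death_year + term + 1


if __name__ == "__main__":
    # Nineteen Eighty-Four: published 1949; Orwell died in 1950.
    print("US, 95 years from publication:", pd_year_by_publication(1949))  # 2045
    print("Life plus 70 years:", pd_year_by_death(1950))  # 2021
```

So under a life-plus-70 rule the novel would be free from 2021, while in the US it does not enter the public domain until 2045.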

>reputable public domain copies would be available at reasonable prices

So I read a lot. Mostly on the Kindle. Lots of public domain classics; I love Conrad and Melville.

I'm also pretty happy to kick in ten or fifteen bucks to avoid spending time finding the best formatted/least incorrect copy, even if the book is public domain.

But you know what? When I buy public domain e-books, there's so much crap mixed in that's worse than Gutenberg that these days I often go to Gutenberg first if my favorite publishing houses don't have a copy.

It's actually easier, I think, for me to find quality e-books of books that are still under copyright.

I mean, this is mostly me complaining about Amazon and how they aren't really serving my needs even when I'm begging them to take more of my money... but it's also a point that finding reputable public domain copies is awfully hard.

For those of my tastes who are looking, my favorite publishing house for that sort of thing is Melville House Publishing, but they have a somewhat limited selection of public domain reprints. (They do have their 'Art of the Novella' series, which comes out in the cutest little paperbacks, and their Kindle formatting is consistently good.)

On the other hand, Gutenberg formatting isn't great, but it's consistently not terrible, which is more than I can say for a lot of the dreck you find in the Kindle store. And a lot of people have a different time/money equation; sometimes books under copyright aren't available on the Kindle at all, and paper copies can get pricey (and are much harder to read, for me at least). So public domain is certainly a good thing that should be preserved. I'm just saying it's not a panacea when it comes to badly done e-books.

Found this here on HN a little while ago. Public domain ebooks but with higher formatting standards than Gutenberg. Less selection as a result but might be worth checking out:


Thank you, I'll check it out. Huh, the format they use doesn't work with Amazon's 'Send to Kindle' functionality.

Barnes and Noble sells it for $7.99. What is a reasonable price?

I buy most of my books on eBay from the seller "thrift.books" and from some other Goodwill eBay/physical stores. They're used, and generally around $3.00 with free shipping. Some Goodwill stores price every book at $1.00 and have nearly everything you'd want. I'd never buy a new book again, especially from Amazon.

Agreed, thrift.books seems to be the most reliable way to get a nice physical copy of an older book.

I still buy some new books, but almost all books I have bought for ~10 years have been from Thrift Books (or some other similar seller). Great way to build a library. Many desirable books, often in like-new condition.

I just go straight to https://www.thriftbooks.com/ generally. I imagine it's the same source.

The writer of this piece did not note that the pen name of the editor of the "high school edition" of "Down and Out in Paris and London", Moira Propreat, is an obvious pun on the phrase "more appropriate". Was it so obvious it didn't need mentioning?

Perhaps we should think about establishing a public listing of checksums for important texts?

Not sure exactly how it would work; you'd probably only want to run it against the main text, so there would remain some flexibility for chapter headings, forewords, etc.
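One way the "main text only" checksum could look, as a minimal sketch: hash the body after a light normalization, so that headings and blank lines can vary between editions. The rule for what is not main text (here, lines like "CHAPTER IV" or "Chapter 12") and the decision to collapse runs of spaces are assumptions made up for illustration; as the replies point out, choosing those rules is the hard part.

```python
import hashlib
import re

# Hypothetical rule for lines that are *not* main text: "CHAPTER IV", "Chapter 12", "Part 2", ...
HEADING = re.compile(r"^\s*(chapter|part|book)\s+(\d+|[ivxlc]+)\.?\s*$", re.IGNORECASE)


def main_text_digest(text: str) -> str:
    """SHA-256 of the body text, ignoring heading lines, blank lines, and runs of spaces."""
    kept = []
    for line in text.splitlines():
        if HEADING.match(line):
            continue  # headings may legitimately differ between editions
        collapsed = " ".join(line.split())  # normalize spaces and tabs within a line
        if collapsed:
            kept.append(collapsed)
    body = "\n".join(kept)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


edition_a = "CHAPTER I\nIt was a bright cold day in April,\nand the clocks were striking thirteen."
edition_b = "Chapter 1\n\nIt was a bright  cold day in April,\nand the clocks were striking thirteen."
edition_c = "Chapter 1\nIt was a bright cold day in April,\nand the clocks were striking twelve."

if __name__ == "__main__":
    print(main_text_digest(edition_a) == main_text_digest(edition_b))  # True
    print(main_text_digest(edition_a) == main_text_digest(edition_c))  # False
```

Collapsing whitespace is exactly the kind of choice that breaks down for poetry, where line breaks and spacing can be part of the work.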

But what is the "main text" and who has the definitive copy?

At first glance, most people would think of chapter text and headings but even that can vary from edition to edition and printing to printing and country to country and translation to translation. How are typos/corrections handled and who has the right to make them?

Then we need the schema/structure for the content. And that schema has to preserve whitespace because while it's not important most of the time, other times it's vital like in poetry. Obviously, we need to be careful of character encodings too.
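The encoding point is easy to trip over: the same visible text can be stored as different sequences of Unicode code points, so any checksum needs a normalization step first. A small sketch with Python's standard `unicodedata` module; choosing NFC, and keeping whitespace exactly as-is so that line breaks in verse still count, are assumptions for illustration rather than a settled schema.

```python
import hashlib
import unicodedata


def canonical_digest(text: str) -> str:
    """SHA-256 over NFC-normalized UTF-8; whitespace is deliberately left untouched."""
    normalized = unicodedata.normalize("NFC", text)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# "café" stored two ways: precomposed é (U+00E9) vs. "e" + combining acute accent (U+0301).
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# Two lines of verse, and the same words re-flowed onto one line.
verse = "Tyger Tyger, burning bright,\nIn the forests of the night;"
reflowed = "Tyger Tyger, burning bright, In the forests of the night;"

if __name__ == "__main__":
    # The raw bytes differ, so a naive checksum would treat these as different texts...
    print(composed.encode("utf-8") == decomposed.encode("utf-8"))  # False
    # ...but after normalization they hash the same.
    print(canonical_digest(composed) == canonical_digest(decomposed))  # True
    # Whitespace is preserved, so re-flowing a line of verse changes the digest.
    print(canonical_digest(verse) == canonical_digest(reflowed))  # False
```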

But most artists and writers consider their "work" to not just be the final product itself but the things that go around it like cover art, dedications, etc.

At first glance, this is an "easy" problem but gets ugly quickly. It's also fascinating though.

* I spent a few years at the Library of Congress working on their digital preservation project, so I lived and breathed these questions. When we worked with records (as in the musical kind), the album and cover art were just as important as the actual music most of the time.

> But what is the "main text"

Right. See this incredible example. Both versions are authentic, in a way.


> Mitchell himself explains the reasons for the discrepancies in an interview quoted in Eve’s paper: they occurred because the manuscript of Cloud Atlas sat unedited for around three months in the US, after an editor there left Random House. Meanwhile in the UK, Mitchell and his editor and copy editor worked on the manuscript, but the changes were not passed on to the US.

Interestingly, I feel like we could lean on some work in the field of genomics, where comparing different copies of a sequence, each of which may contain 'errors', is a routine task.

Search engines also seem to do something like this already: https://en.wikipedia.org/wiki/MinHash. MinHashing is also used in genomics. Whitespace, if handled appropriately, is just more characters.
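A toy version of the MinHash idea applied to two editions of a text: break each text into overlapping character shingles (whitespace included, as suggested above), keep the minimum hash under many seeded hash functions, and estimate Jaccard similarity as the fraction of signature positions that agree. The shingle size, the number of hash functions, and the use of salted BLAKE2b as the hash family are arbitrary choices for this sketch.

```python
from __future__ import annotations

import hashlib


def shingles(text: str, k: int = 5) -> set[str]:
    """Overlapping k-character substrings; whitespace is treated like any other character."""
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def seeded_hash(seed: int, s: str) -> int:
    """One member of a hash family: BLAKE2b with the seed used as its salt."""
    h = hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=seed.to_bytes(16, "little"))
    return int.from_bytes(h.digest(), "little")


def minhash(text: str, num_hashes: int = 128, k: int = 5) -> list[int]:
    """Signature: for each hash function, the minimum hash over all shingles."""
    grams = shingles(text, k)
    return [min(seeded_hash(seed, g) for g in grams) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature positions that agree, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


original = "It was a bright cold day in April, and the clocks were striking thirteen."
scanned = "It was a bnght cold day in Apri1, and the clocks were striking thirteen."
unrelated = "Call me Ishmael. Some years ago, never mind how long precisely."

if __name__ == "__main__":
    sig = minhash(original)
    print(estimated_jaccard(sig, minhash(scanned)))    # high: mostly the same shingles
    print(estimated_jaccard(sig, minhash(unrelated)))  # near zero
```

The appeal for large collections is that each text is reduced to a fixed-size signature once, and comparing signatures is cheap compared with diffing full texts.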

But most literature won't be available as flat text files, I imagine. Some sort of image-to-text converter would be needed, which I bet exists, but it may require tweaking to allow a finer-grained representation of whitespace.

Authors publishing new texts could release some kind of checksum to go with them... or, to venture into waters I don't know much about, could a blockchain be used in some way to keep a record of edits to a text?
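The part of a blockchain that matters for an edit history is a hash chain: each revision records the digest of the previous entry, so silently rewriting an old revision breaks every link after it. A minimal in-memory sketch; the `Revision` record, its fields, and the sample revisions are hypothetical, and there is no distribution, signing, or consensus here.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """One entry in a hypothetical edit log for a text."""
    text: str
    note: str
    prev_digest: str  # digest of the previous revision; "" for the first one

    def digest(self) -> str:
        payload = json.dumps(
            {"text": self.text, "note": self.note, "prev": self.prev_digest},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_revision(log: list[Revision], text: str, note: str) -> None:
    """Add a revision that commits to the digest of the current last entry."""
    prev = log[-1].digest() if log else ""
    log.append(Revision(text, note, prev))


def verify_chain(log: list[Revision]) -> bool:
    """True if every entry points at the real digest of the entry before it."""
    for i, rev in enumerate(log):
        expected = log[i - 1].digest() if i > 0 else ""
        if rev.prev_digest != expected:
            return False
    return True


history: list[Revision] = []
append_revision(history, "Draft text, version 1.", "first printing")
append_revision(history, "Draft text, version 2.", "typo fixes")
append_revision(history, "Draft text, version 3.", "new foreword")

# Quietly rewrite the first entry after the fact.
tampered = list(history)
tampered[0] = Revision("Draft text, altered.", history[0].note, history[0].prev_digest)

if __name__ == "__main__":
    print(verify_chain(history))   # True
    print(verify_chain(tampered))  # False
```

The chain only detects tampering relative to a head digest someone has recorded elsewhere; whoever controls the whole log can rewrite it and recompute every link, which is where publishing or signing the latest digest would come in.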

I'm sure someone out there has put a lot of thought into guaranteeing the authenticity of a text.

Edit to add: this is interesting to think about in terms of all media. Wasn't there a headline just last week about Boris Johnson editing some of his old videos? How do you guarantee that the information you viewed a year ago is the same today?

Ok - your points are all valid, but they are also huge. So if we try to address them all up front, we will get nowhere for a long time.

What would the MVP of this be? What if we had a register of signed checksums with reputation and community selection (I trust these folks, but not these folks, verified artist wins)?

IMO this is just good old-fashioned quality control. When you can't trust your wholesaler, you randomly sample the shipment and test it, rejecting the whole shipment (or exacting penalties) if the product falls outside an acceptable threshold.

The publisher could supply a manuscript. It could be compared against the manuscript database (note: not with a checksum), and if it's too similar to other copyrighted works, it should be flagged for manual review, and rejected if identical. That way you leave the door open for new translations/editions from different publishers, which should be treated as separate products.
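The screening step could be sketched like this, with Python's `difflib` standing in for whatever fuzzy comparison a retailer would really run. The `screen_submission` name, the 0.8 review threshold, and the one-entry catalogue are hypothetical; the point is only the three outcomes: reject exact copies, send near-copies to a person, and accept the rest as separate editions.

```python
from __future__ import annotations

import difflib

REVIEW_THRESHOLD = 0.8  # hypothetical: at least this similar goes to a human reviewer


def screen_submission(manuscript: str, catalogue: dict[str, str]) -> tuple[str, str | None]:
    """Classify a submitted manuscript as "reject", "review", or "accept".

    Returns the decision and the title of the closest catalogue entry (None on accept).
    """
    best_title: str | None = None
    best_ratio = 0.0
    for title, text in catalogue.items():
        if text == manuscript:
            return "reject", title  # identical to an existing listing
        ratio = difflib.SequenceMatcher(None, manuscript, text).ratio()
        if ratio > best_ratio:
            best_title, best_ratio = title, ratio
    if best_ratio >= REVIEW_THRESHOLD:
        return "review", best_title  # near-copy: a person decides
    return "accept", None  # different enough to be treated as its own product


catalogue = {
    "Nineteen Eighty-Four (excerpt)":
        "It was a bright cold day in April, and the clocks were striking thirteen.",
}

if __name__ == "__main__":
    print(screen_submission(catalogue["Nineteen Eighty-Four (excerpt)"], catalogue))
    print(screen_submission(
        "It was a bnght cold day in Apri1, and the clocks were striking thirteen.", catalogue))
    print(screen_submission(
        "Call me Ishmael. Some years ago, never mind how long precisely.", catalogue))
```

`SequenceMatcher` compares every pair directly, which is fine for a sketch but would need something like the MinHash prefilter mentioned earlier in the thread to scale to a real catalogue.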

Then, when the shipment of physical books arrives, you sample it via OCR plus text diffing, and if it's outside your standards, you reject it.
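The receiving-dock check could be sketched as plain acceptance sampling: pick a few random copies from the shipment, diff each one's text against the reference, and reject the lot if too many fall below a similarity floor. The OCR step is out of scope (each copy is assumed to already be plain text), and `inspect_shipment`, the sample size, the similarity floor, and the allowed-failure count are all hypothetical parameters.

```python
from __future__ import annotations

import difflib
import random


def inspect_shipment(
    reference: str,
    ocr_texts: list[str],
    sample_size: int = 5,
    min_similarity: float = 0.98,
    max_failures: int = 0,
    rng: random.Random | None = None,
) -> bool:
    """Acceptance sampling: return True to accept the shipment, False to reject it.

    `ocr_texts` holds one OCR'd text per copy in the shipment.
    """
    rng = rng or random.Random()
    sample = rng.sample(ocr_texts, min(sample_size, len(ocr_texts)))
    # Count sampled copies whose text strays too far from the reference.
    failures = sum(
        difflib.SequenceMatcher(None, reference, text).ratio() < min_similarity
        for text in sample
    )
    return failures <= max_failures


reference = "It was a bright cold day in April, and the clocks were striking thirteen."
good_shipment = [reference] * 20
bad_shipment = ["It was a bnght cold day in Apri1, and the clocks were striking thirteen."] * 20

if __name__ == "__main__":
    rng = random.Random(0)
    print(inspect_shipment(reference, good_shipment, rng=rng))  # True
    print(inspect_shipment(reference, bad_shipment, rng=rng))   # False
```

In practice the similarity floor has to leave room for the retailer's own OCR errors on genuine copies, and every sampled book has to be scanned, which is part of why this gets expensive.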

But like the article mentions, that's expensive.

I've been brainstorming about this as applied to government legal texts and their republication.

Nature (June 6) has a very interesting article on attribution of authorship that seems relevant and is on my reading list:

Credit Data Generators for Data Reuse, pp. 30-32.

Or a signing certificate by a CA?

I'm genuinely curious - can someone explain how Amazon has avoided litigation on this for so long?

Who's gonna litigate them? When you employ tens of thousands of people across the USA, all of a sudden a bunch of politicians, lawyers, and judges root for you and wish you all the best, each for a different reason. I mean, which prosecutor, who ultimately answers in some way to a big-city mayor whose future depends on how many Amazon employees get hired or fired, is going to shit in their own sandbox and go after Bezos?

Sure, if you, me, or Joe Doe opened an online store and had 1% of the counterfeit problem Bezos currently has, we would be SWAT-teamed and locked in jail, probably without bail, with our finances frozen so we could barely afford a lawyer. But that's corporate America. This is business, and that's how things work here in the grand scheme of things when you become too big to jail or litigate.

If you're wondering about some weird social ill plaguing society from the tech industry, 99% of the time the root cause of it is Section 230.

Section 230 of what code / title?


Sounds like the "rewrites" are scanning errors (counterfeiters are probably digitizing the original works, and errors are creeping in), and in the case of the "gibberish" copy, quite clearly a character encoding issue.

Nothing in the article points to intentional alteration, despite what its tone suggests.
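The "gibberish" symptom is consistent with a classic mojibake mistake: text saved as UTF-8 and then read back as Windows-1252 (or Latin-1), so each non-ASCII character turns into two or three junk characters. A short illustration with Python's built-in codecs; that this specific pair of encodings was involved in the copies the article describes is an assumption.

```python
# A string containing a curly apostrophe (U+2019) and an accented letter (U+00E9).
original = "Orwell\u2019s caf\u00e9"

# Encode correctly as UTF-8, then decode the bytes with the wrong codec (Windows-1252).
garbled = original.encode("utf-8").decode("cp1252")

# The mistake is reversible as long as no bytes were lost or replaced along the way.
repaired = garbled.encode("cp1252").decode("utf-8")

if __name__ == "__main__":
    print(garbled)               # Orwellâ€™s cafÃ©
    print(repaired == original)  # True
```

Patterns like "â€™" where an apostrophe should be are a strong hint of this specific mix-up rather than deliberate editing.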

Most of it is scanning/encoding errors, true, but the example of the high school edition of _Down and Out_ really does seem to be intentional alteration -- there's no way that rewording "Come here, my chicken" to "Come here" is a technical issue. For whatever reason, a human wanted that change.

Amazon had its share of trashy classic copies for a while.


Could it just be that bootleg operations are scanning the books with camera rigs, using Tesseract or some similar OCR to extract the text, and then running it through Microsoft Word's spell check to manually resolve conversion errors?
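That hypothesis would explain real-word substitutions: a corrector that simply picks the closest dictionary word can turn an OCR slip into a different but valid word, which a spell check then waves through. A deliberately crude sketch, with `difflib.get_close_matches` standing in for a real spell checker; the tiny word list and the misread token are invented for illustration, not taken from the article.

```python
import difflib

# Hypothetical miniature dictionary. Real spell checkers use full word lists,
# word frequencies, and context, none of which this sketch models.
DICTIONARY = ["the", "tube", "tab", "then", "tribe"]


def naive_correct(token: str) -> str:
    """Replace an unknown token with the dictionary word that difflib rates most similar."""
    if token in DICTIONARY:
        return token
    matches = difflib.get_close_matches(token, DICTIONARY, n=1)
    return matches[0] if matches else token


if __name__ == "__main__":
    # Suppose OCR misread the "h" in "the" as a "b".
    print(naive_correct("tbe"))  # tube
```

Here "tbe" becomes "tube" rather than "the", because plain character similarity knows nothing about which letters OCR tends to confuse or which words are common.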

When Gnutella file sharing first became popular, I noticed that if I searched for .txt files, I could find popular books of the day. Sometimes I would download them and do a find-and-replace operation to change the names of characters, sometimes to my own name or those of other people, just to be silly. I think I did this with all the Harry Potter books. I also recall creating a heavily modified version of Fight Club with a happy ending. This isn't really what the article is about, but it does show that some edits may well be intentional. None of the works I modified, as far as I recall, were in the public domain at the time.

And about a decade ago there was a brief humor fad that consisted of modifying public domain classic literature to add fantasy or SF features -- "Sense and Sensibility and Sea Monsters" or "Android Karenina" and so on -- but obviously the people buying those knew that they weren't buying the original works.

Who could forget "Pride and Prejudice and Zombies"? They even made a movie of that one.

PaPaZ was the one that almost got me into Audible. The lady that was reading it did it in an "aristocratic" delivery that just added to the humor.

It certainly sounds a lot more interesting than the original version.

I love how the High School edition of Down and Out in Paris and London says it was edited by Moira Propeat...... in 2105.

Power corrupts; absolute power corrupts absolutely.

If you're in the game of monopolising online sales, you're not here for a long time, you're here for a good time.

Fake items have made a lot of people rich. Amazon needs a cut of that pie. How else do you maintain growth year in and year out?

This is scary & ridiculous... we have to deal with #fakebooks now as well as #fakenews?

It would be great if people would stop posting paywalled articles.
