
Amazon is probably trying to correct publishers who copy+paste their hardcopy texts (from InDesign or wherever) into eBook format, taking with them artifacts from the print designer, like forced hyphenation. These artifacts from the print world are not useful in eBooks, since the reader is automatically breaking up lines based on the screen size, font, etc.

http://indesignsecrets.com/field-guide-to-indesign-hyphens.p...

I presume that this is what happened because a) she has a paperback & eBook edition of this book, b) this happens a lot, and c) someone complained. I would hope that Amazon would search the eBook and not only look at the total number of hyphens, but also find a few examples of words that were probably broken up for print layout reasons. If so, the author should remove the hyphens and resubmit the eBook.

If that's not the case, it's very silly. Obviously hyphens are useful and shouldn't be banned.




This would be easily rectified if they looked up the words on each side of the hyphen. If they are actual words, then that hyphen should not count. If they are just partial words, then they should. But removing a book from circulation based on an obviously shoddy algorithm should never happen. As a Kindle author myself, I'm having second thoughts about the platform. Censorship by robots isn't exactly an attractive feature.
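A minimal sketch of that lookup idea, for illustration only. The word list, the regular expression, and the function name `suspicious_hyphens` are all hypothetical; nothing here reflects how Amazon's check actually works. A hyphen is flagged only when at least one side of it is not a known word.

```python
import re

# Hypothetical tiny word list; a real check would load a full dictionary.
KNOWN_WORDS = {"well", "known", "second", "hand", "line", "break"}

# Matches simple two-part tokens such as "well-known" or "hyphen-ated".
HYPHENATED = re.compile(r"\b([A-Za-z]+)-([A-Za-z]+)\b")


def suspicious_hyphens(text, known_words=KNOWN_WORDS):
    """Return hyphenated tokens where at least one side is not a known word."""
    flagged = []
    for match in HYPHENATED.finditer(text):
        left = match.group(1).lower()
        right = match.group(2).lower()
        # Both halves are real words: treat the hyphen as intentional.
        if left in known_words and right in known_words:
            continue
        flagged.append(match.group(0))
    return flagged


sample = "A well-known author found hyphen-ated words in a second-hand copy."
print(suspicious_hyphens(sample))
```

With this toy word list only `hyphen-ated` is flagged. Invented words, names, and transliterations would still be flagged, so the result is better read as a list of candidates for review than as a verdict.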


> If they are actual words, then that hyphen should not count.

That would have side-effects e.g. https://en.wikipedia.org/wiki/Double-barrelled_name


Of course there will be exceptions, and it wouldn't need to be perfect. It just needs to be better than counting hyphens alone, which the suggestion above inarguably is.


Not to mention fantasy/SF love of invented languages. I'd hate to be HP Lovecraft if this becomes common.

Or have to quote transliterated Arabic, for that matter.


Names are proper nouns and therefore capitalized, so the edge case is resolved until you find a proper noun with a capital letter in the middle of the word and the hyphen falling just before that capital. This seems to be limited to foreign words (German in particular) and company names where the author inserted a hyphen. https://en.wikipedia.org/wiki/CamelCase#Current_usage_in_nat...
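A rough sketch of the capitalization heuristic, assuming a hypothetical helper `looks_like_hyphenated_name` that treats a token as a name when both halves start with a capital letter:

```python
import re

# Matches a single two-part hyphenated token such as "Bowes-Lyon".
HYPHENATED = re.compile(r"([A-Za-z]+)-([A-Za-z]+)")


def looks_like_hyphenated_name(token):
    """True when both halves of a hyphenated token start with a capital."""
    match = HYPHENATED.fullmatch(token)
    if match is None:
        return False
    left, right = match.groups()
    return left[0].isupper() and right[0].isupper()


print(looks_like_hyphenated_name("Bowes-Lyon"))   # a double-barrelled name
print(looks_like_hyphenated_name("hyphen-ated"))  # a line-break artifact
print(looks_like_hyphenated_name("Camel-Case"))   # the edge case described above
```

The last call shows the weakness: a word with an internal capital that was split just before that capital passes as a name.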


Well then they shouldn't do it at all or rethink their algorithm altogether.


Or do the sane thing most other people do: back up an imperfect but effective algorithm with competent human review.


Check the comments after the post. That's mentioned as a possible reason and the author says that's not the case. Apparently Amazon is complaining about real hyphenated words, not line-break hyphen-ated words.

[edit: It appears, though, that the author is stylistically abusing hyphens, so while they may not be wrong, they're grating to most readers. So, should Amazon be in the business of banning books on the grounds of poor style, rather than technical grounds that are inarguable? (If the book contained the word hyphen-ated, that would be wrong unless it was dialogue and the speaker was pausing between syllables.)]


If "stylistically abusing" punctuation is worthy of blocking an ebook, then I have to assume Huck Finn and Cloud Atlas are next on the list of books to remove for their heavily punctuated attempts to represent speech patterns.


furyg3 knows that the book only has proper hyphens. Amazon took down the book because they thought it was full of line break hyphens. It wasn't, but that's what Amazon thought because they never actually looked.


So really the title of the post should be "Collateral damage when Amazon went to war against bad punctuation," which casts Amazon's motives in a different light.


I don't believe that satisfactorily disproves this position. It is very easy to believe that the automated system trying to prevent lazy formatting is just interpreting a high prevalence of hyphenated words as line-break hyphens.

Graeme merely asserts that his book was free of those, which we knew going in. It does nothing to say why Amazon was raising the issue, and it wouldn't shock me that an automated system would be unable to tell the difference.


Thanks for making that point, this makes a lot more sense now. I really hope this is the case. (Though the author does state in a comment that they don't have any of those errors in their text.)


> she has a

Btw, the author's name appears to be "Graeme", which seems to always be male:

http://www.behindthename.com/name/graeme


This explains the strangely-located hyphens I've been seeing in a book I'm reading currently. This must be exactly how the book was submitted.



