This reminds me of a post by an author who had his own book's price on Amazon driven down in a bidding war between bots [1]. It's a different matter, but the post begins with this:
Before I talk about my own troubles, let me tell you about another book, “Computer Game Bot Turing Test”. It's one of over 100,000 “books” “written” by a Markov chain running over random Wikipedia articles, bundled up and sold online for a ridiculous price. The publisher, Betascript, is notorious for this kind of thing.
It gets better. There are whole species of other bots that infest the Amazon Marketplace, pretending to have used copies of books, fighting epic price wars no one ever sees. So with “Turing Test” we have a delightful futuristic absurdity: a computer program, pretending to be human, hawking a book about computers pretending to be human, while other computer programs pretend to have used copies of it. A book that was never actually written, much less printed and read.
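For anyone unfamiliar with the technique the quote describes: a word-level Markov chain just records which words tend to follow which in a source text, then random-walks those statistics to emit superficially plausible prose. Below is a minimal sketch of that idea; the function names, the two-word state size, and the input file are my own illustrative assumptions, not whatever Betascript actually runs.

    import random
    from collections import defaultdict

    def build_chain(text, state_size=2):
        """Map each tuple of state_size consecutive words to the words seen after it."""
        words = text.split()
        chain = defaultdict(list)
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            chain[state].append(words[i + state_size])
        return chain

    def generate(chain, length=50):
        """Random-walk the chain to emit `length` words of statistically plausible nonsense."""
        state = random.choice(list(chain.keys()))
        out = list(state)
        for _ in range(length):
            followers = chain.get(state)
            if not followers:  # dead end: restart from a random state
                state = random.choice(list(chain.keys()))
                followers = chain[state]
            out.append(random.choice(followers))
            state = tuple(out[-len(state):])
        return " ".join(out)

    # Feed it a pile of Wikipedia article text and it will happily emit "book" copy.
    corpus = open("wikipedia_dump.txt").read()
    print(generate(build_chain(corpus)))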
Amazon needs to clamp down on this nonsense, seeing how it has already infested the Kindle store, where you can find thousands of books that are either automatically generated, consist entirely of public domain content from Project Gutenberg, or are mixed together from Wikipedia content (where they violate the license).
Ironically, the only thing not automatically generated in Parker's books is the description (and probably a bunch of fake reviews). Very much into scam territory here.
Aren't these the same books that commonly break Wikipedia's license requirements?
In particular the Kindle versions: if they take data from Wikipedia, then the book needs to be licensed under the CC-ShareAlike license and users could redistribute copies freely. If not, then the company behind the book is basically committing copyright infringement, on a large scale and for profit.
As for copying texts from the public domain, I find that written language evolves faster than copyright expires. For example, written Swedish has changed so much in the last 100 years that middle-aged adults cannot understand 80% of the words in many books which are still under copyright.
There's nothing in the CC-ShareAlike license about only being able to use the content for non-commercial purposes. I think that as long as you release the Kindle book under the CC-ShareAlike license, you could still charge for it and it would be legal.
How well are the sources curated? One of the sample books is a medical text to be used by doctors etc. I would hate to see parts of that book sourced from unreliable places.
Seeing as this thing can apparently put together books on new topics on the fly, I am highly doubtful of the quality.
Some of these books are absolutely ridiculous. I can't imagine there's a market for "Finished Weft Knit Fabrics Made of Broad Fabrics Measuring at Least 12 Inches Wide That Have Been Knit and Finished in the Same Establishment Excluding Hosiery: World Market Segmentation by City."
I have a feeling the Amazon reviews for his books may be particularly unrepresentative of their quality—something strikes me as sarcastic about most of the reviews... [1]
The title you give feels a bit like a parody of the old Soviet regime's five-year plans.
About the reviews: now that he's sorted out book creation, he can work on review creation. Disturbingly, there may be more money in that (from shady characters) than in book creation.
That's the downfall of systems like this. The algorithms don't (yet) have the intuition that humans do; a human creating that book would instinctively know that the market needs that data segmented by the zodiac sign of the fabric company owner, not by city.
Interesting. I just published my first work myself, slightly along the lines of this, though obviously nowhere near the scale.
I basically just took a freely available dictionary data source and compiled it into a nicely readable, Kindle-compatible format.
I put it up on Amazon through their KDP and almost instantly had sales, which was a cool experience.
The whole thing got me thinking about the many other opportunities for throwing a little code at compiling data into useful volumes for platforms like the Kindle.
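To give a sense of how little code this kind of project takes, here is roughly the shape of it, reduced to a sketch. The tab-separated input format and the file names are illustrative assumptions, not my actual sources; the idea is just to turn a public-domain word list into simple HTML that KDP can ingest (or that kindlegen can convert to a .mobi).

    import html

    # Assumed input: one "word<TAB>definition" pair per line (illustrative format).
    def compile_dictionary(src_path="dictionary.tsv", out_path="dictionary.html"):
        entries = []
        with open(src_path, encoding="utf-8") as src:
            for line in src:
                if "\t" not in line:
                    continue
                word, definition = line.rstrip("\n").split("\t", 1)
                entries.append((word, definition))
        entries.sort(key=lambda e: e[0].lower())

        with open(out_path, "w", encoding="utf-8") as out:
            out.write("<html><head><meta charset='utf-8'><title>Dictionary</title></head><body>\n")
            for word, definition in entries:
                out.write("<p><b>%s</b>: %s</p>\n" % (html.escape(word), html.escape(definition)))
            out.write("</body></html>\n")

    compile_dictionary()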
The first time I heard about the 'long tail' was at a business conference in 2008 (09?), and this was used as an example.
It did a wonderful job of breaking through the assumptions of many in attendance - people who were tuning out because "my industry is too complex" or "we're too creative for this to apply" suddenly realised that if books could profitably be automated and printed on demand, then there were opportunities for them if they wanted to consider them.
I will resist the kneejerk 'oh you can't automate that because humans are special snowflakes' aspect that I'm sure will be well covered by others.
What concerns me about this is factual accuracy. It seems to me that these books, while they may compile information, can at best only reproduce the consensus about a certain topic. This type of system could also most likely be used to easily manufacture consensus, which might be even more worrisome.
I'd bet $100 that he could set this system on any psychology/sociology topic and he would be hailed as an innovator.
Both Google and Amazon need to clamp down on these sorts of things. A few short years ago, if I was considering a new product XYZ, I used to be able to type "XYZ review" into Google and quickly get a pretty comprehensive list of reviews on the topic. Now, I get page after page of auto-generated fake reviews that source some specifications from the product page and throw in meaningless accompanying text. It has absolutely destroyed the utility of Google for that particular use.
Patenting an algorithm, i.e. a logical machine built solely on maths, is pretty much like patenting a natural law of physics. Not to mention the workflows in the patent are obvious statement after obvious statement after obvious statement. This is absurd beyond description, and of course it doesn't help innovation one bit. Kudos to the patent examiners for their deep analytical skills.
On a side note, if the cost of a book is described as just electricity and hardware, there are only two logical conclusions:
- open source licences are getting abused, as pointed out by belorn
- authors are getting their content used and sold as if the "author" were that system
It would probably be less shocking if you were thinking of expanding your toilet seat factory, and you needed a reference for the financial projections you were proposing to the bank in support of an expansion loan.
I've, more than once, spent several hundred dollars on "reports" from "well-established analyst firms," that happened to be the only source of up-to-date, relevant data. Most of those, upon receipt, could have easily been automatically generated from the dataset, and the value of the human was difficult to ascertain.
[1] http://carlos.bueno.org/2012/02/bots-seized-control.html