This reminds me of a post by an author who had his own book's price on Amazon driven down in a bidding war between bots [1]. It's a different matter, but the post begins with this:
Before I talk about my own troubles, let me tell you about another book, “Computer Game Bot Turing Test”. It's one of over 100,000 “books” “written” by a Markov chain running over random Wikipedia articles, bundled up and sold online for a ridiculous price. The publisher, Betascript, is notorious for this kind of thing.
It gets better. There are whole species of other bots that infest the Amazon Marketplace, pretending to have used copies of books, fighting epic price wars no one ever sees. So with “Turing Test” we have a delightful futuristic absurdity: a computer program, pretending to be human, hawking a book about computers pretending to be human, while other computer programs pretend to have used copies of it. A book that was never actually written, much less printed and read.
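For anyone unfamiliar with the technique the quote describes: a word-level Markov chain just records which words tend to follow which in a source text, then random-walks those statistics to emit superficially plausible prose. Below is a minimal sketch of that idea; the function names, the two-word state size, and the input file are my own illustrative assumptions, not whatever Betascript actually runs.

    import random
    from collections import defaultdict

    def build_chain(text, state_size=2):
        """Map each tuple of state_size consecutive words to the words seen after it."""
        words = text.split()
        chain = defaultdict(list)
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            chain[state].append(words[i + state_size])
        return chain

    def generate(chain, length=50):
        """Random-walk the chain to emit `length` words of statistically plausible nonsense."""
        state = random.choice(list(chain.keys()))
        out = list(state)
        for _ in range(length):
            followers = chain.get(state)
            if not followers:  # dead end: restart from a random state
                state = random.choice(list(chain.keys()))
                followers = chain[state]
            out.append(random.choice(followers))
            state = tuple(out[-len(state):])
        return " ".join(out)

    # Feed it a pile of Wikipedia article text and it will happily emit "book" copy.
    corpus = open("wikipedia_dump.txt").read()
    print(generate(build_chain(corpus)))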
Amazon needs to clamp down on this nonsense, seeing how it has already infested the Kindle store, where you can find thousands of books that are either automatically generated, consist entirely of public domain content from Project Gutenberg, or are mixed together from Wikipedia content (where they violate the license).
Ironically, the only thing not automatically generated in Parker's books is the description (and probably a bunch of fake reviews). Very much into scam territory here.
Aren't these the same books that commonly break Wikipedia's license requirements?
In particular the Kindle versions: if they take data from Wikipedia, then the book needs to be licensed under the CC-ShareAlike license and users could redistribute copies freely. If not, then the company behind the book is basically committing copyright infringement, on a large scale and for profit.
As for copying texts from the public domain, I find that written language evolves faster than copyright expires. For example, written Swedish has changed so much in the last 100 years that middle-aged adults cannot understand 80% of the words in many books which are still under copyright.
There's nothing in the CC-ShareAlike license about only being able to use the content for non-commercial purposes. I think that as long as you release the Kindle book under the CC-ShareAlike license, you could still charge for it and it would be legal.
How well are the sources curated? One of the sample books is a medical text to be used by doctors etc. I would hate to see parts of that book sourced from unreliable places.
Seeing as this thing can apparently put together books on new topics on the fly, I am highly doubtful of the quality.
Some of these books are absolutely ridiculous. I can't imagine there's a market for "Finished Weft Knit Fabrics Made of Broad Fabrics Measuring at Least 12 Inches Wide That Have Been Knit and Finished in the Same Establishment Excluding Hosiery: World Market Segmentation by City."
I have a feeling the Amazon reviews for his books may be particularly unrepresentative of their quality—something strikes me as sarcastic about most of the reviews... [1]
The title you give feels a bit like a parody of the old Soviet regime's five-year plans.
About the reviews: now that he's sorted out book creation, he can work on review creation. Disturbingly, there may be more money in that (from shady characters) than in book creation.
That's the downfall of systems like this. The algorithms don't (yet) have the intuition that humans do; a human creating that book would instinctively know that the market needs that data segmented by the zodiac sign of the fabric company owner, not by city.
Interesting. I just published my first work myself, slightly along the lines of this, though obviously nowhere near the scale.
I basically just took a freely available dictionary data source and compiled it into a nicely readable, Kindle-compatible format.
I put it up on Amazon through their KDP and almost instantly had sales, which was a cool experience.
The whole thing got me thinking about the many other opportunities for throwing a little code at compiling data into useful volumes for platforms like the Kindle.
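To give a sense of how little code this kind of project takes, here is roughly the shape of it, reduced to a sketch. The tab-separated input format and the file names are illustrative assumptions, not my actual sources; the idea is just to turn a public-domain word list into simple HTML that KDP can ingest (or that kindlegen can convert to a .mobi).

    import html

    # Assumed input: one "word<TAB>definition" pair per line (illustrative format).
    def compile_dictionary(src_path="dictionary.tsv", out_path="dictionary.html"):
        entries = []
        with open(src_path, encoding="utf-8") as src:
            for line in src:
                if "\t" not in line:
                    continue
                word, definition = line.rstrip("\n").split("\t", 1)
                entries.append((word, definition))
        entries.sort(key=lambda e: e[0].lower())

        with open(out_path, "w", encoding="utf-8") as out:
            out.write("<html><head><meta charset='utf-8'><title>Dictionary</title></head><body>\n")
            for word, definition in entries:
                out.write("<p><b>%s</b>: %s</p>\n" % (html.escape(word), html.escape(definition)))
            out.write("</body></html>\n")

    compile_dictionary()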
The first time I heard about the 'long tail' was at a business conference in 2008 (09?), and this was used as an example.
It did a wonderful job of breaking through the assumptions of many in attendance - people who were tuning out because "my industry is too complex" or "we're too creative for this to apply" suddenly realised that if books could profitably be automated and printed on demand, then there were opportunities for them if they wanted to consider them.
I will resist the kneejerk 'oh you can't automate that because humans are special snowflakes' aspect that I'm sure will be well covered by others.
What concerns me about this is factual accuracy. It seems to me that these books, while they may compile information, can at best only reproduce the consensus about a certain topic. This type of system could also most likely be used to easily manufacture consensus, which might be even more worrisome.
I'd bet $100 that he could set this system on any psychology/sociology topic and he would be hailed as an innovator.
Both Google and Amazon need to clamp down on these sorts of things. A few short years ago, if I was considering a new product XYZ, I used to be able to type "XYZ review" into Google and quickly get a pretty comprehensive list of reviews on the topic. Now, I get page after page of auto-generated fake reviews that source some specifications from the product page and throw in meaningless accompanying text. It has absolutely destroyed the utility of Google for that particular use.
Patenting an algorithm, i.e. a logical machine built solely on maths, is pretty much like patenting a natural law of physics. Not to mention the workflows in the patent are obvious statement after obvious statement after obvious statement. This is absurd beyond description, and of course it doesn't help innovation one bit. Kudos to the patent examiners for their deep analytical skills.
On a side note, if the cost of a book is described as just electricity and hardware, there are only two logical conclusions:
- open source licences are getting abused, as pointed out by belorn
- authors are getting their content used and sold as if the "author" were that system
It would probably be less shocking if you were thinking of expanding your toilet seat factory, and you needed a reference for the financial projections you were proposing to the bank in support of an expansion loan.
I've, more than once, spent several hundred dollars on "reports" from "well-established analyst firms," that happened to be the only source of up-to-date, relevant data. Most of those, upon receipt, could have easily been automatically generated from the dataset, and the value of the human was difficult to ascertain.
[1] http://carlos.bueno.org/2012/02/bots-seized-control.html