Now that giant book stores are on their way out, it seems like we should be ready to end the pretense of retail channel relationships and marketing as being worth virtually all the money in the tech book production value chain.
(2) http://www.npr.org/2012/01/08/144804084/a-self-published-aut... around 5:30 in the audio
It's all digital, though, with no print-on-demand option.
For anyone who's not familiar with his work, look these over:
At present, all of our books are released as PDFs. Once the MEAPs are published, they will be converted to the mobile formats EPUB and MOBI. We understand the desire for mobile formats, and we hope in the near future to have all books available in mobile formats, MEAPs included. You can find all titles we have available in mobile format here: http://www.manning.com/catalog/mobile
Each book is converted manually to ensure that everything transfers to the new format as the author intended it to appear. This is a painstaking process and does take time. Since each book differs in its number of pages and images, we do not have a set time frame for when each book will be available, but know that as soon as the final ebook is complete it is sent to be converted.
"When the book is finished and the ebook is created, a kindle epub version will be created at that time as well."
Nathan, do you think you'll be including pseudo-code, or will one need to be a Clojure programmer to best leverage your book?
There is a big emphasis in the book on using multiple languages together. This is reflective of how I myself have architected systems, with our team using Ruby, Python, Clojure, and Java for the same product. Chapter 2 is about creating a schema for your data using Thrift, for example.
I ask because I'm intrigued by this kind of design, but not the server cost that seems to be associated with it for a newly launched (and potentially unproven) product.
While I think these techniques can scale down, the current crop of Big Data technologies (esp. Hadoop) don't scale down very well. That is, they have a lot of overhead for small amounts of data. So while these techniques can work for "small data", it's going to be relatively more costly. For big data, the overhead is amortized. In the future, I do see scaling down as an important evolution for these technologies.
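To make the amortization point concrete, here is a back-of-the-envelope sketch. The cost model (a fixed overhead plus a per-gigabyte cost) and every number in it are hypothetical, chosen only to illustrate the shape of the argument; they are not measurements of Hadoop or any other system.

```python
def cost_per_gb(fixed_overhead, cost_per_gb_processed, gigabytes):
    # Total cost is modeled as a fixed startup/coordination overhead plus a
    # per-GB processing cost. Dividing by the data size shows how the fixed
    # part is spread (amortized) across the data.
    return (fixed_overhead + cost_per_gb_processed * gigabytes) / gigabytes


# Hypothetical numbers in arbitrary cost units.
small = cost_per_gb(fixed_overhead=60.0, cost_per_gb_processed=1.0, gigabytes=1)
large = cost_per_gb(fixed_overhead=60.0, cost_per_gb_processed=1.0, gigabytes=10_000)

print(small, large)
```

With these made-up inputs the fixed overhead dominates the small job and nearly vanishes in the large one, which is the sense in which "small data" is relatively more costly.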
Not being huge into Java isn't helping either. Would I be better served by biting the bullet and doing things in Java initially, or can I skip right to Jython or JRuby or Clojure or something?
Sam wrote the pallet-hadoop tool which can spin up Hadoop clusters at the click of a button ( https://github.com/pallet/pallet-hadoop ). Although if you're on AWS you're better off just using EMR.
You don't need to use Java. I do everything in Clojure (using Cascalog and Storm's Clojure DSL).
have you tried
and then migrate to multi node
Of course, this won't net you any benefit (in fact, performance will be slightly worse), except that it will be relatively easy to scale out and add servers later on.
"In the past decade the amount of data being created has skyrocketed. More than
30000 gigabytes of data are generated every second, and the rate of data creation is
How could you even hope to put a number on the rate at which data is being generated? What does it even mean to generate data?
Would make for a fun (& meaningless) interview question!
Free food and drinks.
Signup here: http://www.airbnb.com/meetups/zjw9ks5q9-nathan-marz-of-twitt...
We needed a meetup tool for all of the awesome Community Meetups that we throw around the world, so two of our engineers (Raph: https://github.com/Raphomet & Horace: https://github.com/warpdude) built an Airbnb Meetup tool. We like to dogfood, so we use it for our nerd meetups as well.
As for the meetup being on a busy Thursday, it just happened to be a time that worked for everybody.
One of the things I would like is recommended naming conventions for the various objects in Storm. For example, what's the best way to name a StreamID? Should it include information about the spout/bolt it originates from and the bolt it is going to? I spend a lot of time fretting over these names and I still feel like I'm not getting it right.
In general I think of streams as not going to a particular bolt, but something that is provided that anyone can subscribe to. So in the WordCountTopology, the stream of words isn't "intended" for the word counting bolt, it's just data that can be used by anyone else in the topology. This is a consumer-focused way of looking at it (consumers know their inputs) rather than producer-focused (producers know their outputs).
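A minimal sketch of that consumer-focused view, in plain Python rather than Storm's API. The `Topology` class and its `subscribe` and `emit` methods are hypothetical names made up for illustration. The producer emits to a named stream without knowing who listens, and each consumer declares its own inputs.

```python
from collections import defaultdict


class Topology:
    """Toy in-process model: streams are named, consumers subscribe to them."""

    def __init__(self):
        # Maps a stream name to the list of consumer callbacks subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, stream, consumer):
        # The consumer declares its input; the producer is not involved.
        self._subscribers[stream].append(consumer)

    def emit(self, stream, value):
        # The producer only names the stream it provides.
        for consumer in self._subscribers[stream]:
            consumer(value)


def split_sentences(topology, sentences):
    # Producer: emits words on the "words" stream, unaware of its consumers.
    for sentence in sentences:
        for word in sentence.split():
            topology.emit("words", word)


topology = Topology()
counts = defaultdict(int)
seen = []


def count_word(word):
    counts[word] += 1


# Two independent consumers of the same stream.
topology.subscribe("words", count_word)
topology.subscribe("words", seen.append)

split_sentences(topology, ["the cow jumped", "the moon"])
```

Seen this way, a stream name like "words" describes what the stream contains rather than which bolt it is headed to, so adding a second consumer requires no renaming.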
Screenshot of my account page - http://imgur.com/Hu7qT
The URL is - http://beta.manning.com
To clarify, the time limit applies only to the links that are sent via email to you. The latest version of the MEAP and the published versions are always available for download.
It lists all of your ebooks with quick download links and last-updated times for MEAP books. I've been using it for a while now with no issues.
The emailed links expire, but the accounts page is permanent as far as I know.
Or is this question not applicable at all, because the architecture makes no assumptions about the type of data storage and the requirements and usage scenarios are completely different?
MEAP, btw, is a great program... for any of you guys who haven't ever bought a book this way, it's pretty cool. Getting updates as they're delivered and being able to provide feedback as the book is being developed is pretty gnarly.
Many commercial grid computing products try to be all in one -- that is, handle storage and computation. They don't apply to every problem because they only have one kind of storage meant for certain kinds of tasks.
The architecture in Big Data is a general-purpose way to compute arbitrary functions on arbitrary data, at scale and in realtime. Every data problem you'd ever want to do can be described as a function on data, which is why this architecture is so general-purpose. I recommend reading Chapter 1 in the book (which is free to download from the webpage for the book) where we explain these ideas much further.
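As a rough illustration of "a function on data", here is a sketch in plain Python. The page-view records, field names, and query functions are all hypothetical, and nothing here reflects the book's actual tooling. It only shows that different questions are different functions over the same raw dataset.

```python
from collections import Counter

# Hypothetical raw dataset: one record per page view.
pageviews = [
    {"url": "/home", "user": "alice"},
    {"url": "/home", "user": "bob"},
    {"url": "/about", "user": "alice"},
    {"url": "/home", "user": "alice"},
]


def views_per_url(data):
    # One query: a function from the whole dataset to per-URL view counts.
    return Counter(record["url"] for record in data)


def unique_visitors(data, url):
    # Another query over the same data: distinct users who viewed one URL.
    return len({record["user"] for record in data if record["url"] == url})


home_views = views_per_url(pageviews)["/home"]
home_visitors = unique_visitors(pageviews, "/home")
```

At this size a plain list is enough; the general-purpose claim is that the same framing holds when the dataset no longer fits on one machine.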
I'll also be making announcements about the book on Twitter, of course.