The data I'm loading is stuff like tags - e.g., <itemid>\t<tagid>. In human terms, "Dress A has a ruched collar." MapReduce can handle data like this, even when it arrives unordered.
The data I'm reading is results computed from the loaded data - e.g., an index: <tagid>\t[<itemid1>, <itemid2>, ...] (where each itemid has been tagged with tagid). In human terms, "here are all the dresses with a ruched collar."
(Actually, we do considerably more than this, and we wouldn't need Hadoop just to build an index. But an index is the simplest example I could give.)
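To make the index example concrete, here is a minimal sketch in plain Python of the map and reduce steps, run in memory rather than on Hadoop. The function names and the sample tags are made up for illustration, and it assumes each input line is exactly <itemid>\t<tagid>. It is not how our real job is written.

```python
from collections import defaultdict

def map_phase(lines):
    # Turn each "<itemid>\t<tagid>" line into a (tagid, itemid) pair.
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        itemid, tagid = line.split("\t")
        yield tagid, itemid

def reduce_phase(pairs):
    # Group item ids by tag id; input order does not matter.
    index = defaultdict(list)
    for tagid, itemid in pairs:
        index[tagid].append(itemid)
    # Sort each list so the result is the same for any input order.
    return {tagid: sorted(items) for tagid, items in index.items()}

def build_index(lines):
    return reduce_phase(map_phase(lines))

# Hypothetical sample data, deliberately unordered.
sample = [
    "dressC\truched_collar",
    "dressB\tv_neck",
    "dressA\truched_collar",
]

for tagid, items in sorted(build_index(sample).items()):
    print(f"{tagid}\t{items}")
```

On a real cluster the grouping by tagid is done by the shuffle between map and reduce, so the reducer only sees one tag's items at a time. The dictionary here just stands in for that step.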
The original data is very boring. It's only after aggregation and calculation that it becomes worth reading.