

Map Reduce: A simple introduction (2010) - awjr
http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/

======
briffle
Map Reduce seems very interesting, but every example I have seen explains it
in terms of counting frequency of words in documents. I would love to have
someone explain it with an actual business example. I can't think of many real
world uses where counting the frequency of words would matter to most
businesses (besides maybe some analysis of log files).

~~~
yid
Think of the SQL GROUP BY functionality as an example. You have a giant
terabyte-scale text file that is a CSV of observations (say the raw US census
results). You want to group people by some facet (say zipcode and household
income) and compute some complex, arbitrary function for each group.

Your map function scans the rows and outputs the key-value pair
(zipcode+","+income, csv line). All the csv lines in a group go to the same
instance of the reduce function, where you can run any code you like (compute
averages, do deep learning, etc). The output is whatever you wanted to compute
for each group.

This is a pretty simple example, but it does demonstrate where the power of
MapReduce comes from: arbitrary code in the map and reduce functions, with
only restricted one-way communication from the mapper to the reducer. It
also should help you understand the glib "you can implement SQL on mapreduce"
comment below, which is what Apache Hive does.
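
A minimal single-process sketch of the GROUP BY idea described above, in Python. Everything here is an assumption for illustration: the sample rows, the column layout (zipcode, income bracket, age), and the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the in-memory "shuffle" only stands in for what a real framework such as Hadoop would do across machines.

```python
import csv
from collections import defaultdict

# Hypothetical census-like rows: zipcode, household income bracket, age.
CENSUS_CSV = """\
94110,high,34
94110,high,40
94110,low,25
10001,low,61
10001,low,59
"""

def map_fn(line):
    """Emit one (key, value) pair per CSV line; the key is zipcode + "," + income."""
    zipcode, income, _age = next(csv.reader([line]))
    yield zipcode + "," + income, line

def reduce_fn(key, lines):
    """Receive every line of one group and run arbitrary code; here, the mean age."""
    ages = [int(next(csv.reader([line]))[2]) for line in lines]
    return key, sum(ages) / len(ages)

def run_mapreduce(lines, map_fn, reduce_fn):
    """Single-process stand-in for the framework: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)  # the "shuffle": same key, same reducer
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))

results = run_mapreduce(CENSUS_CSV.splitlines(), map_fn, reduce_fn)
for key, mean_age in results.items():
    print(key, mean_age)
```

The mapper never sees more than one line and the reducer never sees more than one group, which is what lets a real framework spread both phases over many machines. This is roughly what `SELECT zipcode, income, AVG(age) ... GROUP BY zipcode, income` would compute.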

------
usujason
This is a good explanation and I would love to share it, but the poor spelling
and grammar make it un-shareable.

~~~
nocman
Perfectionist much?

While I can agree that pointing out the spelling and/or grammar mistakes could
be constructive, calling the article "un-shareable" because of them seems
terribly over-the-top.

~~~
snoman
Unless you're already familiar with the material, good spelling and grammar lend
credibility to the content of the article. Similar to a code smell, it makes
you ask "are you sure you know what you're doing?"

~~~
nocman
While I agree that numerous spelling and grammar mistakes can indicate a lack
of maturity in writing skills, I also know that some of the most brilliant
people I've ever worked with were absolutely terrible at spelling. One guy in
particular was an amazing programmer. He produced libraries of code that were
fast, efficient, and easy to read and understand. They were also very well
documented. I know, because I went through the documentation and fixed the
dozens and dozens of spelling errors. If I didn't know otherwise, I'd have
thought he made a game out of how poorly he spelled words.

I can appreciate numerous spelling/grammar errors making you analyze an
article with a bit more scrutiny. However, I can't think of a single example
of an article with those kinds of errors where I couldn't figure out from the
content whether I thought the author really knew what they were talking about.

At any rate, I think there is a big difference between an article needing a
little bit more scrutiny, and the article being "un-shareable".

~~~
usujason
Just my personal opinion, no disrespect intended. I have personally authored
posts that needed a bit more time on the proofreading table and I did worry
that people would question my authority on the subject due to grammatical
errors in the article.

~~~
nocman
Fair enough.

Just to be clear, I don't want to give the impression that I don't think
spelling/grammar matter (even for blog posts). I just think it is easy to get
so pedantic about it that you place far too much weight on them.

I have my own set of pet peeves, and am probably guilty of allowing the
violation of one of them to taint my view of an article too quickly and too
often.

And of course, there are always those articles that are _so_ bad grammatically
that it looks like a first-grader wrote them (but I don't think that's the
kind of article we were discussing).

------
jacquesm
I love the punchline. Besides the ridiculous requirements and the deadline,
the way it is dealt with upon completion is way too close for comfort; I may
have worked at some point in the distant past for that particular boss.

------
deutronium
Am I right in thinking MapReduce seems to be going out of fashion somewhat,
as even Google themselves have moved towards using something they term
MillWheel, which is a stream-based processing system?

I have seen one paper on a streaming MapReduce solution though.

~~~
bcuccioli
Wow. I worked on MillWheel as a summer intern the summer before last. At the
time it was a team of about 11 people. I'm honestly pretty surprised to see
this comment as I thought it was just a small internal research project.

Have you seen any references to it in the wild other than the Google Research
paper?

~~~
deutronium
Oh, maybe I'm wrong. I really thought I saw something that said it was used
for the index creation. I'm just having a look over the papers I've read.

They do definitely seem to have switched from MapReduce though at least -
[http://www.theregister.co.uk/2010/09/09/google_caffeine_expl...](http://www.theregister.co.uk/2010/09/09/google_caffeine_explained/)

------
triplesec
This is a fantastic explanation. Anyone got any favourite uses of this / other
resources to share of similar clarity?

~~~
pdpi
I'm a big fan of (and occasional poster to) Reddit's Explain Like I'm Five,
where you can ask questions and people are supposed to answer them in simple,
accessible terms (if not in the "Little Timmy" literal 5yo sort of speech).

You can find it at
[http://www.reddit.com/r/explainlikeimfive](http://www.reddit.com/r/explainlikeimfive)

------
sarreph
So this is essentially analogous to division of labour efficiencies. Or am I
missing something?

~~~
patrickmay
It's more a technique/architecture for parallelizing highly parallelizable
tasks.

------
sebnukem2
The many typos in the text make it somewhat difficult to read...

------
mirajshah
Great explanation, quite cheeky as well :)

------
RankingMember
At first I thought you were making fun of the CEO with the misspellings. Then
I continued reading and realized you just don't use spell check. ;)

Interesting article, thanks.

