
Hadoop ported to R (and it's trivial) - Anon84
http://blog.revolution-computing.com/2009/11/hadoop-ported-to-r.html
======
tmountain
Misleading title at best, as Hadoop is a framework for the management and
execution of map/reduce while the article demonstrates a map/reduce operation
in R. I guess Hadoop has been ported to Clojure as well since I can just say:

(apply + (range 10))

And get an answer of 45. Hadoop must have been ported to MySQL as well:

SELECT type, SUM(price) FROM products GROUP BY type

It's not that I mind people pointing out the obvious (map/reduce style
constructs have been around forever), but I'm accustomed to a port meaning
something being migrated from one platform to another with a comparable
feature set.
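
To make the distinction concrete, here is a rough single-process sketch of the same map/shuffle/reduce shape as the SQL query above, written in Python purely for illustration. The sample rows and the function names (map_phase, shuffle, reduce_phase) are made up for this example and are not from the article or from Hadoop. Everything Hadoop actually adds (distribution, fault tolerance, the filesystem) is exactly what is missing here.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample rows standing in for a `products` table: (type, price).
products = [("book", 12.0), ("toy", 5.0), ("book", 8.0), ("toy", 2.5)]


def map_phase(rows):
    # Emit one (key, value) pair per input row.
    return [(ptype, price) for ptype, price in rows]


def shuffle(pairs):
    # Group values by key; in a real cluster this step moves data between nodes.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Fold each key's values down to a single sum.
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


# Equivalent in spirit to: SELECT type, SUM(price) FROM products GROUP BY type
totals = reduce_phase(shuffle(map_phase(products)))

# Equivalent in spirit to the Clojure one-liner: (apply + (range 10))
range_sum = reduce(lambda a, b: a + b, range(10))

print(totals)
print(range_sum)
```

All of this runs in one process on one machine, which is the point: the map and reduce calls are the easy part.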

------
jbooth
What a stupid article. "It's not quite that simple, of course".

So the distributed petabyte-scale filesystem that scales to thousands of
nodes, the recent work on append which guarantees fresh writes will be visible
to all hosts, the efforts at map-locality which will run your map function on
the host where the data split is located, compression layers to improve I/O
throughput...

Trivial?

~~~
kscaldef
I'm sure they could bang that out in an afternoon. A weekend, at most.

</sarcasm>

I really wish people would actually read the MapReduce paper before trying to
talk about what it is.

~~~
pgbovine
amen, the MapReduce paper is all about the distributed system they built
around the core algorithm.

this article would be akin to somebody 5 years ago saying "hey guys, it's easy
to implement Google's search engine in 5 lines of R ... here's the calculation
for PageRank, the rest is just detailz!"

------
jhammerb
In addition to the points below about Hadoop's implementation of MapReduce,
Hadoop consists of many subprojects which perform complex tasks: HDFS, a
reliable, petabyte-scale distributed file system; HBase, a clone of Google's
BigTable; Pig and Hive, which provide higher-level syntax, columnar storage in
HDFS, and a persistent metadata repository; and ZooKeeper, a coordination
service for distributed systems.

I'm not impressed by how REvolution's new management is approaching marketing.

~~~
pgbovine
sorry to sound trollish, but the sad part is that this sort of marketing-speak
masquerading as technical blogging might actually be able to earn companies
'street cred' with customers. their potential clients might be like "oh wow
you guys really CAN implement Google's famous MapReduce or Yahoo's Hadoop in a
few lines of R, and you even wrote about it on a techie blog, we'll buy it!"

------
Semiapies
"It's not quite that simple, of course"

No, it isn't, or else every language with map and reduce functions would have
a trivial, one-liner version of Hadoop.

Does anyone have any familiarity with the MapReduce package?

