The Elephant was a Trojan Horse: On the Death of Map-Reduce at Google

timr · on June 26, 2014

"and the real contribution from Google in this area was arguably GFS, not Map-Reduce"

This, a million times over. Map-reduce is not difficult to implement. Implementing a distributed, petabyte-scale filesystem to hold the data being accessed by thousands of workers is what's difficult.

It's just a shame that Hadoop (and HDFS) are what we're saddled with in the outside world. They're a total disaster area, from configuration difficulty to memory usage to speed to monitoring. But since HDFS is the only commonly used distributed FS, you're pretty much bound to Hadoop (and the rest of the horrible Apache ecosystem).

The world needs a good, stable, well-written distributed filesystem (ideally, one not written in a bloated language designed for remote controls and set-top boxes).

random3 · on June 26, 2014

First, I don't believe that Map-Reduce is not used inside Google anymore. Second, the map-reduce pattern is actually a very useful, basic, common computation. It's equivalent to the SQL JOIN and it's not something that you can really do without. Perhaps the large chunk (batch) approach is not ideal for many of the use-cases Map-Reduce is being tried with (e.g. "interactive" querying with Pig or Hive). But that doesn't mean it's not useful. If you're optimizing for throughput you'll generally want to read/process in the batches optimized for some underlying sizes (it could be page size, it could be blocks, etc.).

Also a system is a lot more than just the compute framework. It needs to deal with various inputs/outputs, do scheduling, etc.

Between the distributed storage and the distributed processing, I'm not sure it's easy to decide which one could be more difficult, either.

Saying that Hadoop is a disaster is not far from saying we live in an awful world. Working with them for years doesn't make me the most objective person; however, given the huge adoption, I'd say they may not be that bad. Moreover, using something like Cloudera Manager makes it trivial (which sometimes makes me wonder why the vanilla version hasn't been improved...). (BTW there's QFS and other distributed file systems.)

I wonder, why is the Apache ecosystem horrible?

I get it that you don't like Java. Fair enough. What would be your language of choice for the next gen, stable distributed file system? Go, Rust, JavaScript?

wyager · on June 26, 2014

>What would be your language of choice for the next gen, stable distributed file system?

Here's my heavily biased subjective opinion on this entirely hypothetical software:

I think we should do one or both of two things:

A) Do it in very clean, fast, simple C. Put an emphasis on speed and simplicity.

B) Do it in very reliable, secure, simple Haskell. Put an emphasis on correctness and simplicity.

With some effort, the C one could be correct and the Haskell one could be fast.

I mention these two languages because they compile to native code and have very good cross-platform support. You won't have any trouble running either of these on embedded devices (which I can't say for Java or Go. Go has some weird compiler bugs on ARM platforms, and the JVM is frequently too memory intensive for embedded). C has an advantage of allowing the absolute minimal implementation, and Haskell has an advantage of allowing a massively concurrent implementation. Yada yada yada

Of course, it could be that the question is completely irrelevant. Just define a spec for a DFS, and then let different implementations pop up in whatever language is best suited to that implementation's specific details.

nl · on June 26, 2014

> You won't have any trouble running either of these on embedded devices (which I can't say for Java or Go. Go has some weird compiler bugs on ARM platforms, and the JVM is frequently too memory intensive for embedded).

Why is this important in this use-case? If the DFS is being used for data processing then presumably the nodes are reasonably capable machines.

There may well be a different use-case for a DFS for embedded and resource-constrained devices. That's not what Google or Hadoop is doing though.

vidarh · on June 26, 2014

The biggest limiting factor even in our relatively low-density populated rack is heat and power. With off the shelf servers and relatively low density, I can trivially exceed the highest power allocations our colo provider will normally allow per rack. The more power you waste on inefficient CPU usage, the less you can devote to putting more drives in.

nl · on June 26, 2014

The OP's claim is that memory is the limiting factor in the case of Java. I don't entirely agree, but even if I did it would almost certainly be a fixed overhead per machine, and unlikely to be a problem on server class machines.

Also, the read/processing characteristics of compute nodes often means the CPU is underutilized while filesystem operations are ongoing.

srean · on June 26, 2014

I will leave an elliptical meta-comment: those whose competitive advantage lies in others not getting it right have little interest in correcting misconceptions. You might be interested in this anecdote: https://news.ycombinator.com/item?id=7948170

nl · on June 26, 2014

But how much of that is Java, and how much is Hadoop?

Spark runs on the JVM, and much, much faster than Hadoop on similar workloads (Yes, I understand it isn't just doing Map/Reduce, but the point is that Java doesn't seem to be a performance limitation in itself).

srean · on June 26, 2014

Indeed, and as I said it did surprise me that Hadoop was so much slower. But the buck really stops at resources consumed per dollar of usable results produced, and in that Java is going to consume a whole lot more. At large scales, running costs far exceed development costs. BTW my point was not only about Java but also about your assessment of the hardware.

delian66 · on June 26, 2014

CPU and memory resources spent on an inefficient filesystem implementation are just wasted resources, not available for your workload. Keep in mind that the inefficiencies are multiplied over all your cluster nodes.

wyager · on June 26, 2014

There are many use cases for a DFS. Some of those cases involve relatively low-resource nodes.

nl · on June 26, 2014

I agree. But that's a different use-case to what HDFS is designed for.

liquidcool · on June 26, 2014

I don't think large scale distributed file systems written in C are hypothetical. I'm pretty sure this is exactly what MapR has done - replace the Java-based HDFS with C, retaining the API. GlusterFS by Red Hat is another DFS.

fh973 · on June 26, 2014

As someone who is currently implementing a next-gen distributed file system, I can highlight one aspect: you have a lot of concurrency and asynchronous processing. Thus you need at least reference counting.

dasil003 · on June 26, 2014

Can you really do Haskell on embedded? I thought its heavy abstraction away from memory as a concern made it pretty much a non-starter for the foreseeable future.

codygman · on June 26, 2014

According to Atom, you can even do "hard realtime embedded software":

http://hackage.haskell.org/package/atom

wyager · on June 26, 2014

Embedded meaning "ARM running an OS", yes. Embedded meaning "OS-less microcontroller", not so much. You'd have to use an embedded programming DSL for that, which isn't really ARM anymore.

zem · on June 26, 2014

ats will probably be an interesting best-of-both-worlds third option soon, though from what little I've seen of it it is currently harder to write code in than either haskell or c. but once you do put the work in to write your proofs etc. both correctness and speed should fall out naturally.

pedroo · on June 26, 2014

> It's equivalent to the SQL JOIN...

Don't you mean GROUP BY?

e12e · on June 26, 2014

I think parent meant equivalent in the sense of being "[a] very useful basic, common computation. (...) not something that you can really do without."

walshemj · on June 26, 2014

Fortran or PL/1G gets my vote. Writing Map Reduce in Java is painful compared to the PL/1G mappers I was writing back in the 80's on PR1ME mini computers.

shiven · on June 26, 2014

Wait, what? A distributed file system written in Fortran? Is it even possible in a language like that?

gtirloni · on June 26, 2014

Possibly yes, since it can interoperate with C. That's not what most people use it for (networking) so you would be swimming against the current.

I don't think it would attract a following among developers anyway... perhaps if you emphasize it compiles to Javascript, who knows ;)

walshemj · on June 26, 2014

No, my bad: I thought they were asking what you would use instead of Java for the mapper/reducer code.

having said that at BT we did have a distributed ISAM database where the core code was F77!

dsymonds · on June 26, 2014

MapReduce is difficult to implement well. It's just ironic that the hard part is the part that's not in the name: the shuffle phase.
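For anyone who hasn't seen the three phases spelled out, here is a minimal single-process sketch in Python. It is only an illustration: the function names are made up for this example, and it says nothing about how Google's MapReduce or Hadoop actually implement the shuffle. On a real cluster the middle step is the one that has to move and sort data between machines, which is where the difficulty lives.

```python
from collections import defaultdict

def map_phase(documents):
    """Emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle_phase(pairs):
    """Group all values by key. On a cluster this is the cross-machine step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's list of values into a single result."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    return reduce_phase(shuffle_phase(map_phase(documents)))

counts = word_count(["the elephant", "the trojan horse"])
```

Here the shuffle is a single in-memory dictionary, which is exactly the part that stops being trivial once the pairs no longer fit on one machine.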

srean · on June 26, 2014

Heartily agreed! Or at least Hadoop does it poorly. It is really that bad. I say this with hesitation and a lot of reluctance: it's an open source project and I have not contributed anything towards it, so it is really unfair of me to complain.

Here is my personal experience with these tools at Google and at Yahoo (lightly edited from an old comment)

==

I have had the opportunity to try out Google's implementation of mapreduce implemented in C++ way back in time (6 years ago). These would run on fairly impoverished processors, essentially laptop grade of that time. Have done stuff on Yahoo's Hadoop setup as well, these used high end multicore machines provisioned with oodles of RAM. If I were to be generous, Hadoop ran 4 times slower as measured by wall clock times. Not only that, Hadoop required about 4 times more memory for similar sized jobs. So you ended up requiring more RAM, running for longer and potentially burning more electricity. This is by no means a benchmark or anything like that, just an anecdote.

That Hadoop would require much more memory did not surprise me, that was expected. What was really surprising was that it was so much slower.

Four times might not seem like much, but I was being generous to Hadoop. It makes a big difference when you can make multiple runs through the data in a single day and make changes to the code/model. Debugging and ironing out issues is a lot more efficient when your iteration loop is shorter.

I think Hadoop (by virtue of its comparative crappiness) gave Google a significant competitive edge over the rest, probably still does.

timr · on June 26, 2014

A big part of implementing the shuffle correctly at a large scale is having a good distributed filesystem upon which to build.

dsymonds · on June 26, 2014

Sure. I didn't mean to imply a good distributed FS was unnecessary or easy. Just saying that MR isn't easy.

necubi · on June 26, 2014

Let me suggest the Quantcast File System (QFS) [0]. It's much closer to GFS as described in the paper (crucially it uses Reed-Solomon encoding to reduce storage requirements), it's highly tunable to different workloads, and it's written in C++. Quantcast uses it to store petabytes of data and for running map reduce. Unfortunately it hasn't seen much uptake outside of Quantcast, despite being a clear improvement over HDFS.

[0] http://quantcast.github.io/qfs/

(Disclaimer: I used to work for Quantcast).

jwr · on June 26, 2014

Reed-Solomon for forward error correction, to provide redundancy? But isn't Reed-Solomon really geared towards single-bit errors, while in the real world our storage tends to fail with multiple missing blocks?

I thought erasure codes were a much better approach.

necubi · on June 26, 2014

It's not used for tolerating disk errors (typically in a production context you have RAID for that, and failures tend to be for the entire disk). It's used to reduce storage requirements via striping. See the QFS paper (http://db.disi.unitn.eu/pages/VLDBProgram/pdf/industry/p808-...) for a good description of how this works. The basic idea is that with RS you can get fault tolerance comparable to 3x replication by splitting the data into 6 pieces stored on different servers, plus three parity blocks. This requires 1.5x storage rather than 3x while still tolerating the loss of any three machines of the nine.
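The storage arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch with made-up function names; it models the overhead of a 6+3 layout versus three full copies and does not touch any QFS code.

```python
def storage_overhead(data_blocks, redundant_blocks):
    """Raw bytes stored per byte of user data."""
    return (data_blocks + redundant_blocks) / data_blocks

def tolerated_losses(redundant_blocks):
    """For Reed-Solomon and for plain copies alike, the data survives as long
    as no more than `redundant_blocks` of the pieces are lost."""
    return redundant_blocks

rs_overhead = storage_overhead(6, 3)           # 6 data stripes + 3 parity stripes
replication_overhead = storage_overhead(1, 2)  # 1 original + 2 extra copies
```

With these inputs `rs_overhead` is 1.5 and `replication_overhead` is 3.0, and the 6+3 layout survives three lost pieces where three plain copies survive two.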

erikano · on June 27, 2014

>[T]ypically in a production context you have RAID for that, and failures tend to be for the entire disk[.]

Does QFS run in addition to other file systems on the storage nodes or does it manage disks directly? You see, I was thinking that maybe ZFS + QFS might be a good idea and would like to know if it's possible. Also, is QFS available for FreeBSD and/or SmartOS storage nodes and clients? How about CoreOS and Debian, are storage nodes and clients available for those?

staunch · on June 26, 2014

I think MogileFS is closest to being what most people need. I'd love to see a Go version with all the lessons learned from that project.

https://code.google.com/p/mogilefs/

qohen · on June 26, 2014

What about Nokia's Disco project, which uses Erlang and Python?

http://discoproject.org/

From http://disco.readthedocs.org/en/develop/intro.html

What is Disco?

Disco is an implementation of mapreduce for distributed computing. Disco supports parallel computations over large data sets, stored on an unreliable cluster of computers, as in the original framework created by Google. This makes it a perfect tool for analyzing and processing large data sets, without having to worry about difficult technicalities related to distribution such as communication protocols, load balancing, locking, job scheduling, and fault tolerance, which are handled by Disco.

Disco can be used for a variety data mining tasks: large-scale analytics, building probabilistic models, and full-text indexing the Web, just to name a few examples.

Batteries included

The Disco core is written in Erlang, a functional language that is designed for building robust fault-tolerant distributed applications. Users of Disco typically write jobs in Python, which makes it possible to express even complex algorithms with very little code.

rdtsc · on June 26, 2014

I am more excited about a newer one --

LeoFS

http://leo-project.net/leofs/

all open source.

They started with S3 compatible API and are adding multi-data-center replication and NFS support.

EGreg · on June 26, 2014

Why is it difficult to build a distributed, petabyte scale filesystem? Isn't the search through indexes easily partitionable horizontally? Is it a problem of eventual consistency? I am not sure what the huge issue is and would like to learn.

vidarh · on June 26, 2014

It's "trivial" to build a distributed, petabyte scale filesystem.

It's hard to build a cost-effective, reliable and fast distributed, petabyte scale filesystem that's suitable for a wide range of workloads.

Consider that you need to minimise the number of copies of data to keep costs reasonable, yet the fewer copies of data, the lower your IO capacity for accessing that data is (since readers/writers will contend for the IO capacity of a small number of storage nodes), so you want to maximise the number of copies of data to maximise throughput. Yet the higher the number of copies to maintain, the more IO it takes to spread each write out through your storage network. Soon enough you start running into "fun" problems such as not being able to naively push writes out to each storage server they are meant to go to for data that needs to be replicated widely, because you'll be bandwidth constrained, but instead needing a fan-out even for simple writes.
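A rough way to see the fan-out point, sketched in Python. The model and names are assumptions made for illustration (a simple forwarding tree, every link equal, no pipelining), not a description of any particular filesystem.

```python
def naive_push(size_gb, replicas):
    """GB the writer itself must send if it pushes every copy directly."""
    return size_gb * replicas

def fanout_tree(size_gb, replicas, fanout):
    """Return (max GB any single node sends, hops to the furthest replica)
    when each node forwards the write to at most `fanout` others.
    Assumes fanout >= 1."""
    reached, level_width, depth = 0, 1, 0
    while reached < replicas:
        level_width *= fanout   # nodes added at the next level of the tree
        reached += level_width
        depth += 1
    return size_gb * min(fanout, replicas), depth

direct = naive_push(1, 12)        # writer uplink carries all 12 copies
relayed = fanout_tree(1, 12, 3)   # load is spread, at the cost of extra hops
```

In this toy model the direct push puts 12 GB on the writer's uplink, while the tree caps any one node at 3 GB but needs 2 hops to reach the last replica.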

You'll also want to minimise operational headaches; a disk going dead or an entire server failing needs to be handled transparently, as every additional disk or server you add increases the odds of a failure per unit of time.

(Compare with the naive approach for just a 1PB system: I can "easily" get about 200TB per off-the-shelf storage server with hardware RAID. Let's say 150TB usable space; get about 14 of them to let you replicate stuff across two servers, and put GlusterFS on it. It'll work. It'll also be expensive, horribly slow for a number of workloads, and a regular disk replacement nightmare.)
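The numbers in that parenthetical, and the point about failure odds growing with every device, can be worked through in a short Python sketch. The helper names are hypothetical, and the failure model assumes independent devices with the same per-period failure probability, which real hardware only approximates.

```python
import math

def servers_needed(target_tb, usable_tb_per_server, copies):
    """Servers required to hold `copies` full copies of `target_tb` of data."""
    return math.ceil(target_tb * copies / usable_tb_per_server)

def p_any_failure(p_single, devices):
    """Chance that at least one of `devices` independent devices fails in a
    period, given each fails with probability `p_single` in that period."""
    return 1 - (1 - p_single) ** devices

servers = servers_needed(1000, 150, 2)  # 1 PB, 150 TB usable per box, 2 copies
```

`servers` comes out at 14, matching the estimate above. Calling `p_any_failure` with any fixed per-device probability and a growing device count shows the chance of at least one failure climbing towards 1.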

EGreg · on June 27, 2014

If you need to minimize the number of copies then yes, you need to have some "risk management" software to estimate which machines are more reliable, and which files are more important, and then assign those files to enough replicas to be able to statistically guarantee some SLA. Then you need failover where at least one of the replicas is always available.

The routing table should be small enough to fit in RAM on every machine, and consulted for each request. It would be updated when failover occurs. The table would consist of general rules with temporary exceptions for specific partition ranges that are being failed over.
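Roughly what I have in mind, as a Python sketch. Everything here is hypothetical (the class, the modulo rule, the use of crc32 as a stable hash); it only shows the shape of "general rule plus temporary exceptions", not a production design.

```python
import zlib

class RoutingTable:
    """General rule: partition p lives on nodes[p % len(nodes)].
    Exceptions: partition ranges temporarily redirected during failover."""

    def __init__(self, nodes, partitions):
        self.nodes = list(nodes)
        self.partitions = partitions
        self.exceptions = []  # list of (first_partition, last_partition, node)

    def partition_of(self, key):
        # crc32 is used only because it is stable across runs and machines
        return zlib.crc32(key.encode("utf-8")) % self.partitions

    def fail_over(self, first, last, replacement):
        """Redirect an inclusive partition range to a replacement node."""
        self.exceptions.append((first, last, replacement))

    def recover(self, first, last):
        """Drop the exception for a range once its home node is back."""
        self.exceptions = [e for e in self.exceptions if (e[0], e[1]) != (first, last)]

    def lookup(self, key):
        p = self.partition_of(key)
        for first, last, node in self.exceptions:
            if first <= p <= last:
                return node
        return self.nodes[p % len(self.nodes)]

table = RoutingTable(["a", "b", "c"], partitions=12)
home = table.lookup("file-1")
```

The table stays tiny because it stores rules and a handful of ranges rather than one entry per file.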

You can store indexes in files, in a similar way. Just avoid joins and make like a graph database: first load documents from the index and then do mapreduce to get the related documents.

But besides that, I can see how maybe multi user concurrent access might necessitate eventual consistency algorithms for each app, but that's it.

gatehouse · on June 26, 2014

Well, one key element of the original map-reduce paper is the way the data is spread around. Instead of building a giant NAS with specialized (expensive) systems, and then building a bunch of specialized (expensive) compute systems, and then shipping massive quantities of data around on fast (expensive) network, the map-reduce system is built on a bunch of well balanced systems in terms of CPU/ram vs. disk, and the job is designed in a way that it can be distributed to these systems and data transfer is minimized.

So in a way, everything is happening in the storage nodes and they need to be much more than just a filesystem.

karamazov · on June 26, 2014

Dealing with frequent machine failure is one major issue, for example.

The original GFS and MapReduce papers (http://static.googleusercontent.com/media/research.google.co... and http://research.google.com/archive/mapreduce.html) go into detail.

ithkuil · on July 1, 2014

I just want to add to the other good replies:

* today's petabyte feels lighter than 10 years ago. It's not only disk space, it's bandwidth: from network to bus to ram to cpu.

* often a matter seems trivial when you have a proof that a given design actually works. But someone had to build it the first time and get it right; understand which of the many were important, invest and risk. You can find spectacular failures even with fewer unknowns.

* the devil is in the detail.

EdwardDiego · on June 26, 2014

> (ideally, one not written in a bloated language designed for remote controls and set-top boxes).

Oh cool, subjective language argument time!

mukundmr · on June 26, 2014

What about the Cassandra Distributed File System? It probably doesn't scale as much as HDFS though.

wyager · on June 26, 2014

>ideally, one not written in a bloated language designed for remote controls and set-top boxes

I don't know what you'd have to be smoking to use Java in a remote control.

riffraff · on June 26, 2014

java was designed for that

http://en.wikipedia.org/wiki/Oak_(programming_language)

e12e · on June 26, 2014

Indeed. And it runs today on (among other things) SIM cards.

See eg: "Defcon 21 - The Secret Life of SIM Cards"

https://www.youtube.com/watch?v=31D94QOo2gY

wyager · on June 26, 2014

I know. My comment still stands.

tptacek · on June 27, 2014

It really doesn't, in that it doesn't actually build an argument of any sort.

wyager · on June 27, 2014

Didn't think I needed to! Seemed self-evident enough. How's this:

Java is a high-level language designed for heavy usage of dynamic allocation and related features (like GC). Remote controls, being very simple devices that benefit strongly from low power consumption, tend to be designed with low-resource microcontrollers. These devices are not generally capable of managing the entire java feature set. Therefore, to use java on a remote control, you must either use an unnecessarily complex microcontroller, or a crippled subset of Java, in which case you might as well use C or something.

tptacek · on June 27, 2014

Java is not "designed for heavy usage of dynamic allocation". You're confusing the language, the runtime, and the reference JVM. The HotSpot-derived reference JVM is certainly not an embedded design, but the reference JVM looks nothing like (for instance) Javacard.

wyager · on June 27, 2014

> Java is not "designed for heavy usage of dynamic allocation".

How, exactly, would you port the entire java core language to a platform without dynamic allocation? There exist subsets of java that work without it, but they're essentially not the same language at that point.

tptacek · on June 27, 2014

That's silly. A huge fraction of all C libraries rely on dynamic allocation, and so following your logic, C isn't an embedded programming language, despite being the lingua franca of embedded programming.

As for "how you'd 'port' Java to such an environment", again, look at Javacard.

wyager · on June 28, 2014

>A huge fraction of all C libraries rely on dynamic allocation, and so following your logic, C isn't an embedded programming language

That's not my logic at all. The C core language doesn't require dynamic allocation. In fact, the core language has no concept of it. All dynamic allocation comes from library functions, not language features. This is why we use C for embedded programming.

>As for "how you'd 'port' Java to such an environment", again, look at Javacard.

Again:

"However, many Java language features are not supported by Java Card (in particular types char, double, float and long; the transient qualifier; enums; arrays of more than one dimension; finalization; object cloning; threads). Further, some common features of Java are not provided at runtime by many actual smart cards (in particular type int, which is the default type of a Java expression; and garbage collection of objects)."

Java Card is almost nothing like Java as most of us know it.

tptacek · on June 28, 2014

Embedded programming is nothing like programming as people on HN know it. The dev who writes a native-code Markdown gem for Rails is going to be surprised at how different the experience of writing a SPI bus driver is.

So I don't find your argument very compelling. You have to do better than to point at how different an experience it is to code in an environment without object cloning and threads. That argument is almost tautological! You have to show how Javacard Java is fundamentally dissimilar as a language to Java. But almost the entire list of language features you cited here are absent because they don't make sense in the Javacard programming environment, not because they've been replaced with some other alien language concept.

At any rate: you were wrong to begin with when you scoffed at the idea of Java being used for remote controls, given that small consumer electronics were the original problem domain for the language that became Java, and you're wrong today, given that there are relatively popular and very successful small-form-factor embedded environments based on Java.

wyager · on June 29, 2014

>The dev who writes a native-code Markdown gem for Rails is going to be surprised at how different the experience of writing a SPI bus driver is.

Again, you're taking what I said and twisting the logic beyond recognition. No one uses Rails to do embedded programming. No one expects normal Rails and embedded Rails to be the same (because there is no embedded Rails).

Embedded C and "desktop" C are more or less exactly the same. They are the same language, in just about every way.

This is absolutely not true with standard Java and embedded Java subsets (like Java Card). There are huge differences in the language itself, like those I mentioned. Half the reason people use Java is the memory management features. Java minus these features is a fundamentally different language. Not to mention the lack of certain fundamental types (No floats, no multi-dimensional arrays, etc.) and other weird quirks of systems like Java Card.

> you were wrong to begin with when you scoffed at the idea of Java being used for remote controls, given that small consumer electronics were the original problem domain for the language that became Java,

Argumentum ad antiquitatem, or maybe argumentum ad auctoritatem (towards Sun). Just because Java was intended to be used for something does not mean it's any good at that thing.

>and you're wrong today, given that there are relatively popular and very successful small-form-factor embedded environments based on Java.

Argumentum ad populum. Just because a lot of people use some subset of Java for embedded programming doesn't mean it's a good idea. Lots of people use PHP too; it's not because it's a good thing to do; it's because PHP programmers are cheap. If I had to hazard a guess, that's the same reason people use Java in embedded environments.

jmillikin · on June 26, 2014

  > This morning, at their I/O Conference, Google revealed
  > that they’re not using Map-Reduce to process data
  > internally at all any more.

This is incorrect, so monumentally so that I couldn't continue reading. It's as if the author had opened an article about climate change with "Now that advancing glaciers have rendered Algeria uninhabitable..."

MapReduce doesn't work well for low-latency pipelines because it's got a high fixed overhead, but it's still the undisputed king of medium-latency and latency-insensitive workloads.

jbigelow76 · on June 26, 2014

You may want to correct Urs Hölzle, Senior VP of Technical Infrastructure at Google, then or at least tell him to choose his words better.

From today's I/O keynote video https://www.youtube.com/watch?v=wtLJPvx7-ys#t=9454

This is the exact quote:

    "... and today even when you use map-reduce, which we invented over a decade ago, it's still cumbersome to write and maintain analytics pipelines, and if you want streaming analytics you are out of luck. And in most systems once you have more than a few petabytes they kind of break down. So we've done analytics at scale for awhile and we've learned a few things. FOR ONE, WE DON'T REALLY USE MAP-REDUCE ANYMORE. It's great for simple jobs but it gets too cumbersome as you build pipelines, and everything is an analytics pipeline."

emphasis mine

Of course the word "really" in the middle of the sentence gives semantic wiggle room, but it's still a pretty big statement.

gdy · on June 26, 2014

>"FOR ONE, WE DON'T REALLY USE MAP-REDUCE ANYMORE" And this is said in the context of talking about streaming analytics.

jbigelow76 · on June 26, 2014

But Urs also said, paraphrasing this time, that once you get into petabytes of information everything pretty much becomes streaming analytics.

Since I would assume that any non-trivial service that Google provides is in that petabyte neighborhood it explains why he would say that Google isn't using MR anymore.

gaius · on June 26, 2014

I am pretty sure that Google didn't invent map-reduce, which has been around since the 1970s at least.

This guy may work for Google, but he's a clown.

seanmcdirmid · on June 26, 2014

How many big data jobs were being processed by MapReduce in the 70s, 80s, early 90s? Ya, that's right: none. Sanjay and Jeff were the first to apply the combination of map-shuffle-and-reduce as we know it today to big data processing.

Also, Urs Holzle is not a clown.

walshemj · on June 26, 2014

British Telecom used map reduce in billing systems for the dialcom (telecom gold) platform in the 80's - that was on the largest (non black) prime minicomputer site in the UK.

Back then 17x 750's would be roughly the same as one of the 5k-plus clusters that Yahoo et al. use.

We even sold the system to NZ telecom

seanmcdirmid · on June 26, 2014

Interesting. What kind of distributed file system were they using?

walshemj · on June 26, 2014

We used the normal file system (Prime's, probably descended from ITS) and had a load of JCL written in the CPL (Prime JCL) language to sync everything up over our Cambridge ring to two sites.

(we had oxford street dug up for our 10MBs link)

supermatt · on June 26, 2014

A dfs isn't a requirement for map/reduce.

seanmcdirmid · on June 26, 2014

From http://en.wikipedia.org/wiki/MapReduce:

> MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.

...

> The "MapReduce System" (also called "infrastructure" or "framework") orchestrates by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance.

...

> The name MapReduce originally referred to the proprietary Google technology but has since been genericized.

So it would be quite impossible to have a MapReduce system without distributed computing infrastructure; even if you were doing mapping and reducing, it wouldn't be MapReduce.

supermatt · on June 26, 2014

I see no mention of a distributed file system there. Local storage is not a requirement of distributed processing.

1stop · on June 26, 2014

How do you do distributed processing without a distributed filesystem? Do you mean you'd load the filesystem into memory and send it to the "processors"?

supermatt · on June 26, 2014

The data could be stored on a network device, such as a file server or database, for example. It could indeed be local, but it needn't be distributed.

In the example GP gave, the data could possibly have been stored in a database queried using segmentation via consistent hashing (a basic way to distribute jobs across a known number of workers).
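As a minimal sketch of that kind of segmentation, here is a small consistent-hash ring in Python. The class and helper names are invented for this example, the worker and key names are placeholders, and nothing here is taken from the system GP described.

```python
import bisect
import hashlib

def stable_hash(text):
    """Map a string to a large integer that is the same on every run and machine."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, workers, points_per_worker=64):
        # Each worker owns several points on the ring to even out the load.
        self._ring = sorted(
            (stable_hash(f"{worker}#{i}"), worker)
            for worker in workers
            for i in range(points_per_worker)
        )
        self._points = [point for point, _ in self._ring]

    def worker_for(self, key):
        # The first point clockwise from the key's hash owns the key,
        # wrapping around to the start of the ring at the end.
        index = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]

ring = HashRing(["worker-0", "worker-1", "worker-2"])
owner = ring.worker_for("account-42")
```

The useful property is that adding a worker only moves keys onto the new worker; keys never shuffle between the existing ones, so each worker's slice of the database stays mostly stable.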

srean · on June 26, 2014

...defeating the entire purpose of large scale parallelism on commodity machines. OTOH if you have a way of achieving order 500X parallelism with a centralized commodity server or database, I would love to hear.

EDIT @supermatt Ah I see, we differ in the definition then: to me it isn't bigdata/largescale unless it churns through big amounts of stored data. Bitcoin mining is nowhere in the ballpark of this; it's an append-only log of solutions computed in parallel.

supermatt · on June 26, 2014

How on earth do you think bitcoin mining pools work (as an extremely trivial example)? They coordinate ranges between a number of workers. The stored size of those ranges is minuscule in comparison to the data of the hashes between those ranges calculated on each 'miner'. These 'coordinators' absolutely work as a centralised 'commodity' storage server (or database) resource for 500x+ parallelism.

'Big Data' means 'Big Data', not 'Big Storage'. They are completely different things.

seanmcdirmid · on June 26, 2014

Big data doesn't mean big computation, it actually means big data on lots of disks across many nodes. They are completely different things.

You might be into HPC, but that's not what Sanjay and Jeff did. HPC and big data loads are quite different.

supermatt · on June 26, 2014

The bitcoin example may be a bit oversimplified, and may indeed lean more towards HPC. The example was intended to illustrate data locality (as per the parent question), not the actual computation.

Big Data may incorporate data from various 3rd party, remote, local, or even random sources. For example, testing whether URLs in a search engines index are currently available. This may be a map/reduce job, it may utilize a local source of urls, but it will also incorporate a remote check of the url.

As I said a few links up: DFS is not a requirement for map/reduce.

seanmcdirmid · on June 26, 2014

All MapReduce frameworks I know about today are built on DFSs. There are definitely plenty of frameworks that support map and reduce that don't (e.g. MPI), but these aren't systems based on what was described in the OSDI 2004 paper where the word MapReduce was introduced.

I guess people just fixate on the terms map and reduce when the focus of MapReduce really was....shuffle.
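To make the role of the shuffle concrete, here is a single-process sketch of the three phases. It is only a toy: in a real MapReduce system the shuffle moves and groups data across machines, whereas here it is an in-memory dictionary. All function names are hypothetical.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key; this is the step that crosses machines at scale."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Run the reducer once per key over that key's grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    for word in line.split():
        yield word.lower(), 1


def count_reducer(_key, values):
    return sum(values)


lines = ["the quick brown fox", "the lazy dog", "The end"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
```

The user supplies only `word_mapper` and `count_reducer`; everything in between, which is where the distribution and fault-tolerance work lives in a real system, belongs to the framework.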

supermatt · on June 26, 2014

I think the problem is that we are talking about two different things.

The very start of the paper describes the term and its methodology (which is what we are discussing), and then goes on to explain Google's own implementation using GFS (which you seem to be getting hung up on).

seanmcdirmid · on June 27, 2014

Keep in mind that this whole thread is about "MapReduce", which Holzle was talking about, not the more generic map and reduce that has been around since the 1800s (and they will continue mapping and reducing in their new dataflow framework, they just won't be using MapReduce). Now for the paper:

> Our abstraction is inspired by the map and reduce primitives present in Lisp and many other functional languages.

Inspired doesn't mean equivalent.

> Our use of a functional model with user specified map and reduce operations allows us to parallelize large computations easily and to use re-execution as the primary mechanism for fault tolerance.

They are using map and reduce as a tool to get something else.

> The major contributions of this work are a simple and powerful interface that enables automatic parallelization and distribution of large-scale computations, combined with an implementation of this interface that achieves high performance on large clusters of commodity PCs.

They are very specific about what the contribution is. All work that has claimed to be an implementation of MapReduce has followed their core tenets. Even if MPI has a reduce function, it is not MapReduce because it is based on other techniques.

I'm really tired of people who claim there is nothing new or even significant when there clearly was. Ya, everything is built on something these days, but so what? In the systems community, MapReduce has been a huge advance, and now we are moving on (at least for streaming).

supermatt · on June 27, 2014

I'm still in the camp of there being nothing new here. Now GFS may be a different matter, but that was part of a different paper, and not a requirement of this one. Which is why I have kept stating that a DFS is not a requirement.

seanmcdirmid · on June 28, 2014

If that's what you believe, then you are going to miss out on the last 10 or so years of systems research and improvements. And when Google stops using MapReduce but the new thing still uses map and reduce, you are going to be kind of confused.

nl · on June 26, 2014

I've seen MapReduce done against fairly significant amounts of data stored (10s of TBs per run) on a SAN running over fibre. The compute nodes weren't particularly cheap either - I guess they were commodity machines, but quite a long way from the "cheapest possible" things Google uses.

But it was still useful: it was a good computing model for letting as many compute nodes as possible process data.

That might not be what Google was trying to achieve, but it's difficult to argue that it isn't MapReduce.

walshemj · on June 26, 2014

Databases? We should be so lucky :-) This was old-school ISAM files updated with Fortran 77 and 4 different log files, all with multiple types of records.

Our "Mappers" did quite a lot of work compared to most modern map functions.

walshemj · on June 26, 2014

In our case the first stage synced up all the required file systems and applied all the required updates before kicking off the mapper stage.

walshemj · on June 26, 2014

Effectively yes, each worker machine had an identical copy of the required ISAM files, which were kept in sync by our system.

We had to build a lot of the functionality that comes out of the box in more modern systems like Hadoop.

dbc1012 · on June 26, 2014

I don't know about Mr Holzle but you're wrong about map/reduce. I'm aware of two significant counterexamples. I'm sure there are others.

Teradata's been doing map/reduce in their proprietary DBC 1012 AMP clusters since the 80's, providing analytical data warehousing for some of the world's largest companies[1]. Walmart used them to globally optimize their inventory.

MPI systems have been supporting distributed map/reduce operations since the early 90's (see MPI_REDUCE[2]).

1- http://www.cs.rutgers.edu/~rmartin/teaching/fall99/papers/te...

2- http://www.mpi-forum.org/docs/mpi-1.0/mpi-10.ps

walshemj · on June 26, 2014

What does falsely claiming that Google invented MR make him, then?

gaius · on June 26, 2014

I see the Google fanboys and wannabes are out in force on this thread.

seanmcdirmid · on June 26, 2014

I see the crazies are out trying to redefine MapReduce as just being map and reduce and completely missing the point. But whatever, they've probably never seen big data loads and are definitely not involved in the industry.

gaius · on June 26, 2014

Ooh, scary big data.

I could run your workloads in Excel without breaking a sweat. But go on kidding yourself.

seanmcdirmid · on June 28, 2014

I don't think Excel scales to 10 or 100 TB of data.

gaius · on June 28, 2014

In all seriousness tho', I was running data sets that big in Oracle, in 2006. You can see why I don't take "big data" seriously.

ithkuil · on July 1, 2014

There's certainly a hype around big data nowadays, often even up to the point of being ridiculous.

The point is that people are starting to use this term to describe something that's not even technical anymore, let alone describe the actual amount of data: merely using data to drive decision making.

This is not a new thing [0], yet there is a clear trend that shows how this kind of dependency is shifting from being auxiliary to being generative; some of the reasons are:

1. cheaper computing and storage power

2. increased computing literacy among scientists and non-scientists alike.

3. increased availability of digitalised content in many areas that capture human behaviour.

When there's demand, there's opportunity for business. One thing that is new and big about Big Data is the market. It should be called "Big Market (of data)".

It's an overloaded term. IMHO it's counterproductive to let the hype around Big Data as a business term pollute the discussion about what contribution Google and others have made in the field of computer science and data processing.

So what did Google really invent? Obviously the name and concept behind MapReduce weren't new, nor was the fact that they started to process large amounts of data.

Size and growth are two key factors here. Although it's possible that the NIH syndrome affected Google, it's possible that existing products just weren't able to solve those two requirements. It's difficult to tell exactly how large the scale is, given that Google is not very keen on releasing numbers, although it's possible to find some announcements like [1] "Google processed about 24 petabytes of data per day in 2009".

20P is 10000 times more than 200 T. Stop to think for a moment about what 10000 means. It's enough to completely change the problem, almost any problem. A room full of people becomes a metropolis; a US annual low-wage salary becomes 100 million dollars, more than the annual spending of Palau [2]. Well, it's silly to make those comparisons, but it's hard to think of anything that, scaled by 10000, doesn't change profoundly. Hell, this absurdly long post is well under 10k!

To stay in the realm of computer science, processor performance didn't increase by a factor of 10000 since PDP-11 from 1978 to Xeon from 2005 [3].

Working at that scale poses unique problems, and that's where the real contributions to the advancement of the field made by the engineers and the engineering culture at Google lie. If anything, just knowing it's possible and having some accounts of what they focused on is inspiring.

This is the Big Data I care about. It's not about fanboyism. It's cool, it's real, it's rare. Arguing about who invented the map reduce mechanics is like arguing that hierarchical filesystems were already there, hence any progress made in that area by countless engineers is just trivial.

[0] Historical perspective: James Gleick , http://en.wikipedia.org/wiki/The_Information:_A_History,_a_T...

[1] http://dl.acm.org/citation.cfm?doid=1327452.1327492

[2] https://www.cia.gov/library/publications/the-world-factbook/...

[3] http://www.cs.columbia.edu/~sedwards/classes/2012/3827-sprin...

rbanffy · on June 30, 2014

What was big data in the 70s, 80s and 90s? We just didn't call it map-reduce at the time.

gaius · on June 26, 2014

"Big data" is not a thing, and neither is "the cloud", while I'm here.

seanmcdirmid · on June 26, 2014

Well, then, you really don't understand the value of their contribution, which in your mind is just "map" and "reduce."

dsymonds · on June 26, 2014

Though it seems to be quoting Urs accurately enough. All I can guess is that he meant MR isn't used for some particular context. MR is most definitely still heavily used inside Google.

jrockway · on June 26, 2014

Apparently the context was streaming analytics, which is a workload that MapReduce is poorly suited to.

mtdewcmu · on June 26, 2014

Well, you would expect there to be legacy applications around for a while, if they decided not to use MR in future algorithms.

thrownaway2424 · on June 26, 2014

"For one, we don't really use Map-Reduce any more." https://www.youtube.com/watch?v=wtLJPvx7-ys#t=9454

sylvinus · on June 26, 2014

Not so sure about this. I was at the conference and captured the exact wording from Urs, "We don't really use MapReduce anymore" (https://twitter.com/sylvinus/status/481864384981913601)

tupshin · on June 26, 2014

Projects like Apache Spark have demonstrated the power of a more complex DAG (Directed Acyclic Graph) approach that allows for more precise control over the data-processing flow, compared with the simpler execution model of M/R. All of the major Hadoop vendors are pivoting, and simultaneously adding support for Spark (which can work with the Hadoop stack, but is not part of it), while also supporting the development of one or more technologies that are trying to retrofit Hadoop into a more powerful model, such as Tez, from Hortonworks.
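The difference from a fixed map-then-reduce pipeline can be illustrated with a toy DAG runner. This is plain Python, not Spark's or Tez's API; `run_dag` and the stage names are hypothetical, and it assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def run_dag(stages, deps, inputs):
    """Run each stage after its upstream stages.

    stages: name -> function taking a dict of upstream results
    deps:   name -> set of upstream stage names
    inputs: name -> precomputed value for source nodes
    """
    results = dict(inputs)
    for name in TopologicalSorter(deps).static_order():
        if name in results:
            continue  # source node supplied by the caller
        upstream = {d: results[d] for d in deps.get(name, ())}
        results[name] = stages[name](upstream)
    return results


stages = {
    "evens": lambda up: [n for n in up["numbers"] if n % 2 == 0],
    "squares": lambda up: [n * n for n in up["numbers"]],
    # A stage may consume several upstream results, which a single
    # map -> shuffle -> reduce pass cannot express directly.
    "total": lambda up: sum(up["evens"]) + sum(up["squares"]),
}
deps = {"evens": {"numbers"}, "squares": {"numbers"}, "total": {"evens", "squares"}}
results = run_dag(stages, deps, {"numbers": [1, 2, 3, 4]})
```

With plain M/R, a flow like this would typically be expressed as several chained jobs, each writing its intermediate output to the distributed filesystem.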

chris_va · on June 26, 2014

'The company stopped using the system “years ago.”'

Hm, as a former Google engineer, that statement (from the journalist) is not accurate. Though the definition of map reduce is malleable, so it's hard to say what was meant in the first place.

nevi-me · on June 26, 2014

Didn't Urs mean that it's not used for analytics anymore?

dpritchett · on June 26, 2014

As a non-PhD who doesn't work at Google I'm having trouble reading between the lines. What exactly is the offered improvement here?

Decoupled (compared to Hadoop) systems with distributed data and JIT processing?

EdwardDiego · on June 26, 2014

The offered improvement is, I guess, processing Google-sized 'Big Data' without the significant issues of Hadoop-style clusters. Beyond that, there aren't enough details.

To give you an example, data import speed has always been an issue with Hadoop - Facebook are quite proud that they can import 320TB of data into their Hadoop clusters in a day.
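For a sense of scale, the quoted figure can be converted to a sustained rate. The sketch below assumes decimal units (1 TB = 10**12 bytes) and an import spread evenly over 24 hours, neither of which is stated in the comment.

```python
TB = 10**12                      # assumed decimal terabyte, in bytes
GB = 10**9                       # decimal gigabyte, in bytes
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_gb_per_s(tb_per_day: float) -> float:
    """Average ingest rate in GB/s for a given daily volume in TB."""
    return tb_per_day * TB / SECONDS_PER_DAY / GB


rate = sustained_rate_gb_per_s(320)
print(f"{rate:.1f} GB/s sustained across the cluster")
```

Under those assumptions, 320 TB per day works out to roughly 3.7 GB/s in aggregate.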

tsotha · on June 26, 2014

Yes, it all seems very hand-wavey to me. "Map reduce is finished, and it's been replaced by... some other thing."

rdtsc · on June 26, 2014

Paradigm shift -- new type of cloud processing engine + making it available to the outside.

EdwardDiego · on June 26, 2014

Make sure you crystallise your synergies while going forward.

riffraff · on June 26, 2014

> It will also be no surprise to me when, eventually, Hadoop does the same, and the elephant is finally given its dotage.

AFAIU, hadoop has already moved to support different workloads than MR when it introduced YARN[0]

[0] http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yar...

ambrood · on June 26, 2014

That isn't "hadoop supporting different workloads". That is a Hadoop cluster supporting different frameworks. They are two different things.

ohashi · on June 26, 2014

So what's replacing it in the open source world? What are some real world problems Map Reduce is being used for now that some other solution is better suited for?

toast0 · on June 26, 2014

A lot of problems people are solving with Hadoop and friends are best solved by getting a big box with lots of ram and lots of SSDs. Current generation dual processor socket server boards go up to 1 TB of ram, a few years ago that kind of ram would require several servers; and the Haswell EP Xeons, expected towards the end of the year will support even more ram.

ohashi · on June 26, 2014

So you're essentially saying it was a stop-gap solution until hardware caught up for many use cases.

EdwardDiego · on June 26, 2014

Argh, there are far too many firms who have Hadoop clusters who don't really need them.

erikano · on June 27, 2014

>[P]aradoxically, it’s easier to build robust, fault-tolerant systems from unreliable components[.]

Giving this some thought, I think I see a parallel here to how most of the rest of nature works; the "components" that make up a creature often appear to be as simple as they can afford to be.

rlpb · on June 26, 2014

So they're not using MapReduce, a particular implementation of the map/reduce concept. But are they still using the concept having just changed their implementation? I've read the article and this is still unclear to me; I think commentators are conflating the two.

_pmf_ · on June 26, 2014

Map-reduce is a brute force approach. It's obvious that intelligent, specific data handling strategies are orders of magnitude better for production systems if there's the engineering power to pull it off.

jeffdavis · on June 26, 2014

The author discussed the data flow model -- would you consider that to be a "specific data handling strategy"?

Also, why do you think developing for a map-reduce framework requires fewer engineering resources than developing for, e.g. a dataflow framework?

coldcode · on June 26, 2014

Computing darwinism. Many things are tried and most of them fail, but each try teaches something useful for the next generation.

kitd · on June 26, 2014

I'd say it teaches the current generation. IME the next generation tend to repeat the mistakes of their predecessors much more easily than our industry should be happy with.

tezka · on June 26, 2014

useless truism.