
I once saw the co-founder of Cloudera saying that Google exists in a time-warp 5-10 years in the future, and every now and then it gives the rest of us a glimpse of what the future looks like.

Felt exaggerated at the time, but it often seems like the truth.

I thought you were exaggerating at first, but 5-10 years is actually very true in some areas.

GFS only translated into HDFS after 5 years, same with MapReduce, BigTable etc.

And still doesn't work nearly as well.

I have always suspected so, since Google still has not open-sourced its MapReduce, but I haven't found any actual evidence directly comparing Hadoop MapReduce and Google's. There could also be technical issues preventing Google from open-sourcing its implementation, although I don't know of any. Now that we are moving on to more sophisticated models and implementations, I am hoping more details will emerge.

I don't know anything about MR in particular, but the open-sourcing process at Google often gets bogged down by how interconnected the codebase and ecosystem are. Releasing an open source version of a component means reimplementing a lot of stuff that the component depends on but that Google isn't ready to release yet. Look at Bazel/Blaze: it took them a long time to get it out, and the open version doesn't have feature parity. Also take a look at their governance plan going forward, which hints at the difficulties.[1]

That said, I suspect a large part of the reason is that Google's MapReduce is highly optimized for running in a Google datacenter, and they're generally pretty tight-lipped about the real secret sauce in their infrastructure.


> There could also be technical issues preventing Google from open sourcing its implementation, although I don't know any.

My understanding is that sharing details takes very valuable employees' time away from fixing important internal issues, and that this matters a lot more than any fear that those details might leak. For the handful of companies that can and need to implement an equivalent, poaching those employees is simpler than reverse-engineering from blog posts, and it makes implementation easier.

MapReduce and its associated infrastructure are arguably at the center of Google's massive, proprietary codebase. I suspect opening that would involve opening much of their core code.

It is in SOME ways an exaggeration—they're also tackling familiar problems—but GFS/Colossus and the data center administration are probably accurate.

However, Facebook and Amazon almost certainly have similar challenges. I don't think they are necessarily unique so much as so huge that the rest of us have neither the money nor the problems to solve.

Some parts of Google definitely, but others not so much. If anything they were late on social networks and they still haven't really caught up.

I think he was referring to the technology Google is using to build products, not the products per se.
