

Are we moving away from good "at scale" programming? - switch33

So I was watching a video that came out recently about developing in Python: http://vimeo.com/63377197

Watch the guy who talks a bit later. He works for Disney and explains his project, called fastpipe, which he developed to address many concerns about how modern code uses parallelism.

He makes the point that software developers are moving away from developing on their single developer machines, and are instead giving themselves a large headache by using node-based multi-CPU programming!

Hadoop seems horribly inefficient at best, given its overhead, and reliability seems to be becoming more and more of an issue.

He was able to achieve decent computing results using Python, which is known to be a slow language, by finding good, reliable approaches to parallel code development. He also makes a lot of interesting assumptions about reliability that show he has been programming for years.

I was wondering what other people think of some of the things he is saying. Are we moving away from the right decisions for developing parallel code?

It seems like we are getting away from learning how to split things up into processes correctly, without conflicts. And we don't always know how to pick the more reliable code.
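For anyone who hasn't watched yet, the gist is doing the parallel work on one machine instead of a cluster. A toy sketch of that idea using plain Python multiprocessing (my own illustration, not fastpipe's actual API):

```python
from multiprocessing import Pool

def transform(record):
    # Stand-in for real per-record, CPU-bound work
    return record * record

if __name__ == "__main__":
    records = range(1_000_000)
    # Fan the work out across local cores instead of a cluster;
    # chunksize batches records so per-task IPC overhead stays low.
    with Pool() as pool:
        total = sum(pool.map(transform, records, chunksize=10_000))
    print(total)
```

No job scheduler, no HDFS, no serialization across the network: for data that fits on one box, this is often all the parallelism you need.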
======
switch33
I think I will try posting this again if it doesn't get more traction. I
really think this is Hacker News material. It explains a lot about what is
going on at a bunch of the major tech companies, and it involves programming,
both of which fit the subject of most of the posts on this discussion site.

Maybe I'll try reddit's r/programming if I can't get more traction tomorrow.

~~~
asperous
Yeah, right now HN is pretty much broken

------
asperous
I think the core problem he was talking about is that people use tools that
are bigger than they actually need.

If you have 100TB+ of data, you might need a cluster. Think about it: Google
has to deal with all of the pages on the internet; pretty much no one else has
that much data. So why are we using their methodologies?

~~~
switch33
This is exactly what I'm talking about. Even laypeople could use their
regular computers for more interesting activities, like machine learning on
decent-sized data, if the advice weren't so focused on "just throw a cloud
instance up with Hadoop and run stuff that way."

There are many people out there who could accomplish some of these tasks with
better-written code. There is also probably a need for a better Hadoop: one
that is less about clusters/multiple CPUs and more about renting out single,
more powerful machines.

~~~
asperous
Hadoop is specifically for clustered computing; it doesn't really make sense
otherwise. Like the speaker said, it's really simplified (though you can do a
lot of stuff with map/reduce).

In some ways the guy is probably blowing off some steam. Simply grabbing a
bigger computer isn't always the best option; that's why clustered computing
was invented in the first place. But I do think he has a point that people
reach for it sooner than they need to. I've made that mistake before, and I
just ended up settling for the pipeline approach.
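For what it's worth, by "pipeline approach" I mean chaining stages on one machine, roughly like this (a minimal sketch with made-up stage functions, just to illustrate the shape):

```python
from multiprocessing import Pool

def parse(line):
    # Stage 1: turn raw input into a record
    return int(line)

def score(n):
    # Stage 2: compute something from the record
    return n % 7

if __name__ == "__main__":
    lines = (str(i) for i in range(100))
    with Pool(4) as pool:
        # imap yields results lazily, so stage 2 starts consuming
        # stage 1's output before the whole input has been processed.
        parsed = pool.imap(parse, lines, chunksize=16)
        scored = pool.imap(score, parsed, chunksize=16)
        print(sum(scored))  # 295
```

Each stage is just a function, so it's easy to test stages in isolation and swap one out when it turns out to be the bottleneck.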

~~~
switch33
The other point i'm trying to really make is maybe hadoop does serve some
purpose but there could be an easier/better approach for parallelism.

If we focused harder on reliability, another service "like Hadoop" would
probably be created, but for non-clustered, multi-CPU machines. And for
decently sized tasks we would have better, faster, and more reliable tools at
hand.

