

Project Tungsten: Bringing Spark Closer to Bare Metal - mateiz
https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html?utm_content=14636812&utm_medium=social&utm_source=twitter

======
rxin
And on a related note, congratulations to Matei Zaharia for winning the ACM
Best Dissertation Award for his work on Spark:
[http://awards.acm.org/doctoral_dissertation/](http://awards.acm.org/doctoral_dissertation/)

~~~
xtacy
Many congrats to Matei! Well deserved.

------
papertowebforms
This is really exciting and important, Matei. Can't wait to see it in action.

* Matei Zaharia is the creator of Apache Spark

------
anonymousDan
I'm not sure I completely buy their claim that network/disk IO is no longer
the bottleneck in many situations. Their recent NSDI paper
([https://kayousterhout.github.io/trace-analysis/](https://kayousterhout.github.io/trace-analysis/))
finds the workloads compute bound on a 20 node cluster, but they never
evaluate a larger cluster with fixed data size. The availability of on-demand
cloud resources would potentially make it easier to relieve the compute
bottleneck by just increasing the cluster size.

~~~
kayoust
For most workloads, increasing the number of machines does not increase the
amount of data sent over the network, so the ratio of computation to network
bandwidth stays the same. As a result, increasing the cluster size doesn't
make workloads any more I/O bound.

At some point, a larger cluster will be more bandwidth-constrained because of
oversubscription, but given the network utilizations we saw (<5% at the median
in Figure 5), a cluster would have to have pretty high oversubscription for
the network to become the bottleneck.

The one caveat is workloads such as matrix computations, where the data sent
over the network increases with the number of machines.
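
To make the scaling argument concrete, here is a rough back-of-the-envelope
sketch in Python. Everything in it is an assumption for illustration, not
something taken from the paper: the data size, CPU cost, per-machine bandwidth
and core count are made-up numbers, the function names are hypothetical, and
the model ignores oversubscription entirely. It compares a fixed-size
all-to-all shuffle with a workload where every machine needs all of the data
(a stand-in for the matrix case).

```python
# All constants below are hypothetical, chosen only to illustrate the trend.
DATA_BYTES = 1e12    # total data shuffled, fixed regardless of cluster size
CPU_SECONDS = 1e5    # total CPU work, fixed regardless of cluster size
BW = 1.25e8          # per-machine network bandwidth in bytes/s (about 1 Gbps)
CORES = 8            # cores per machine


def shuffle_ratio(n, data_bytes, cpu_seconds, bw, cores):
    """Network time / compute time for an all-to-all shuffle on n machines."""
    # Each machine holds data_bytes / n and keeps about 1/n of that locally,
    # so it sends the remaining (n - 1) / n share over the network.
    sent_per_machine = (data_bytes / n) * (n - 1) / n
    network_time = sent_per_machine / bw
    compute_time = cpu_seconds / (n * cores)
    return network_time / compute_time


def broadcast_ratio(n, data_bytes, cpu_seconds, bw, cores):
    """Same ratio when every machine needs a full copy of the data."""
    # Each machine must receive everything it does not already hold,
    # so total network traffic grows with the number of machines.
    received_per_machine = data_bytes * (n - 1) / n
    network_time = received_per_machine / bw
    compute_time = cpu_seconds / (n * cores)
    return network_time / compute_time


for n in (20, 200, 2000):
    s = shuffle_ratio(n, DATA_BYTES, CPU_SECONDS, BW, CORES)
    b = broadcast_ratio(n, DATA_BYTES, CPU_SECONDS, BW, CORES)
    print(f"n={n:5d}  shuffle ratio={s:10.3f}  broadcast ratio={b:12.3f}")
```

Under these assumptions the shuffle ratio is proportional to (n - 1) / n and
flattens out as machines are added, while the broadcast-style ratio is
proportional to (n - 1) and keeps growing.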

~~~
anonymousDan
Interesting. What about e.g. map-reduce workloads with large fan-in for the
reduce step?

------
choppaface
Are there JIRA issues related to these ongoing projects? I'd be interested to
see the current thoughts on LLVM code gen and the "Spark Binary Map."

~~~
rxin
Here's the JIRA ticket:
[https://issues.apache.org/jira/browse/SPARK-7075](https://issues.apache.org/jira/browse/SPARK-7075)

------
batbomb
That's a clever name. Somebody at Spark has apparently used a TIG welder
before.

