
Disk Locality in Datacenter Computing Considered Irrelevant (2011) [pdf] - espeed
http://www.cs.berkeley.edu/~ganesha/disk-irrelevant_hotos2011.pdf
======
jandrewrogers
A number of the assumptions in the paper are questionable in hindsight.

Disk bandwidth consistently exceeds network bandwidth by a significant factor
in common systems and this is likely to be the case for the foreseeable
future. A platform with a good I/O scheduler can easily and demonstrably turn
that into extra throughput on the same hardware for many use cases.

Their corroborating examples are platforms that happen to have poor I/O
scheduling. In those cases the effects of poor scheduling would be expected to
dominate any difference in bandwidth, so you would expect the available
bandwidth to have no practical effect. But there are many platforms that do
implement good I/O schedulers, and on those, differences in bandwidth
materially affect performance because the scheduler can actually exploit the
extra bandwidth.

While not entirely their fault, their assumptions about the cost of SSDs are
incorrect. The difference in cost/GB is down to a relatively small integer
factor and shrinking, well under a single order of magnitude. And these can
deliver enormous local bandwidth in cheap systems. In many cloud clusters now,
the cost of a large, local SSD JBOD (how you would want to use it) is less
than the cost of the server, even with cheap servers. The cost of using SSD
over spinning disk has become increasingly marginal for many applications.

In summary: platforms with good I/O schedulers see real benefits to disk
locality, assuming the application is I/O intensive. Platforms with poor I/O
schedulers not so much. Make the appropriate choice for your use case and
platform.

~~~
vidarh
The thing that struck me is that they are looking at what networks are
_available_. But to date I've never worked on a single system using 10GbE or
above. Of course they are out there in large numbers within specific niches.

But most people are still stuck on 1 Gbps. At the same time, a steadily
increasing proportion of those systems have SSDs that can do 2 GB/sec+ reads.
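
A rough unit check of that gap (the 2 GB/sec figure is the one quoted above; nothing here is measured):

```python
# Compare a 1 Gbps network link to a fast local SSD, in common units.
NETWORK_GBPS = 1.0      # gigabits per second (1 GbE)
SSD_GB_PER_S = 2.0      # gigabytes per second, sequential read

network_gb_per_s = NETWORK_GBPS / 8          # 8 bits per byte -> 0.125 GB/s
ratio = SSD_GB_PER_S / network_gb_per_s      # ~16x
print(f"Local SSD is ~{ratio:.0f}x the 1 Gbps link")
```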

------
kijiki
Also relevant: [http://research.microsoft.com/pubs/170248/fds-final.pdf](http://research.microsoft.com/pubs/170248/fds-final.pdf)

~~~
PantaloonFlames
What is the date of this publication? (Also, why do authors decline to date
their papers?)

~~~
scott_s
It probably slips their mind. This is the author's copy of a conference paper.
The conference version is dated by virtue of being a part of that conference's
proceedings:
[https://www.usenix.org/system/files/conference/osdi12/osdi12...](https://www.usenix.org/system/files/conference/osdi12/osdi12-final-75.pdf)

------
gtrubetskoy
The disk-locality speed-up is increasingly due not to bandwidth but to latency
(which is directly related to physical distance), and latency still matters.

To provide some perspective: in 1 cycle of a 1 GHz CPU, light (or electric
potential) travels about 30 cm. So communication with a disk (or whatever)
that is 1 m away (could be the same rack) takes 20 times longer in each
direction than with one that is 5 cm away. Now consider communicating across a
football-field-sized datacenter.
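
A back-of-the-envelope sketch of that arithmetic, assuming ~30 cm of propagation per nanosecond (free-space speed of light; signals in copper or fiber are somewhat slower):

```python
# Pure propagation delay at various distances, ignoring NICs, switches,
# and serialization. Assumes ~30 cm per nanosecond.
CM_PER_NS = 30.0

def one_way_delay_ns(distance_cm: float) -> float:
    return distance_cm / CM_PER_NS

print(f"5 cm:  {one_way_delay_ns(5):.2f} ns")        # ~0.17 ns
print(f"1 m:   {one_way_delay_ns(100):.2f} ns")      # ~3.3 ns, 20x the 5 cm case
print(f"100 m: {one_way_delay_ns(10_000):.0f} ns")   # ~333 ns across a large facility
```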

~~~
wmf
I don't think the numbers support this. Mechanical disk latency is ~10ms while
datacenter network latency is <100us, so the network contributes 1% extra
latency. NVMe over fabrics is worse off because the flash latency is only
~100us and the network adds ~10us, but that's still nowhere near 20x.
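
A quick sanity check of those ratios, using the rough figures quoted above rather than any measurements:

```python
# Relative latency added by the network, in microseconds.
HDD_ACCESS_US = 10_000     # ~10 ms mechanical disk access
FLASH_READ_US = 100        # ~100 us flash read
NETWORK_RTT_US = 100       # datacenter network, upper bound
NVMEOF_ADDED_US = 10       # added by NVMe over fabrics

print(f"Network over HDD:    +{NETWORK_RTT_US / HDD_ACCESS_US:.0%}")    # ~1%
print(f"NVMe-oF over flash:  +{NVMEOF_ADDED_US / FLASH_READ_US:.0%}")   # ~10%
```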

------
PantaloonFlames
This is really great for people who build out datacenters, but a large portion
of consumers of compute+data will lease that from providers like Amazon and
Azure, which means they cannot enjoy the benefit of the disk I/O ~= local
network I/O equation. There's no guarantee that a parcel of EC2 nodes is in a
single rack, or even in a single datacenter, and so network I/O is not
comparable to disk I/O.

Am I mistaken?

~~~
wmf
According to the docs, EBS goes up to 500 MB/s while a d2.8xlarge can read
3,500 MB/s from local disk, so local is still faster (if you can actually use
that much throughput).

------
fleitz
It's not really irrelevant considering the paper outlines that it's important
to keep data in the same rack.

Disk locality is not that important when you're fully async, as throughput
tends to be similar. However, if your code is not fully async, you usually pay
significant latency penalties, and if multiple nodes are able to access the
data then you pay a huge price in caching.
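
A minimal sketch of the async point, simulating a hypothetical remote read with a fixed 1 ms round trip (an asyncio toy, not a real storage client):

```python
import asyncio
import time

REMOTE_LATENCY_S = 0.001  # assume ~1 ms round trip to a non-local disk

async def remote_read(block_id: int) -> bytes:
    # Hypothetical remote block read; latency dominates, not transfer time.
    await asyncio.sleep(REMOTE_LATENCY_S)
    return b"\x00" * 4096

async def sequential(n: int) -> float:
    start = time.perf_counter()
    for i in range(n):
        await remote_read(i)                 # each read waits on the last
    return time.perf_counter() - start

async def pipelined(n: int) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(remote_read(i) for i in range(n)))  # overlap waits
    return time.perf_counter() - start

async def main() -> None:
    n = 100
    print(f"sequential: {await sequential(n):.3f}s")  # ~n * latency
    print(f"pipelined:  {await pipelined(n):.3f}s")   # ~1 * latency

asyncio.run(main())
```

Fully overlapped requests see roughly the same throughput wherever the data sits; serialized requests pay the remote latency on every read.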

------
Upvoter33
The paper draws poor conclusions about SSDs/flash, which are increasingly
becoming the performance tier in the datacenter.

------
mwilcox
For disk, sure, but not for Flash / NVM

~~~
PantaloonFlames
Did you read the paper? It discusses Flash / SSD and explains why SSD does not
disrupt the conclusion.

