
The Seven Deadly Sins of Cloud Computing Research (2012) - jcr
https://www.usenix.org/conference/hotcloud12/workshop-program/presentation/schwarzkopf
======
contingencies
TLDR - added to
[https://github.com/globalcitizen/taoup](https://github.com/globalcitizen/taoup)
...

Sin #1 - Unnecessary distributed parallelism: It does not always make sense to
parallelize a computation. Doing so almost inevitably comes with an additional
overhead, both in runtime performance and in engineering time. When designing
a parallel implementation, its performance should always be compared to an
optimized serial implementation in order to understand the overheads involved.
If we satisfy ourselves that parallel processing is necessary, it is also
worth considering whether distribution over multiple machines is required. The
rapid increase in RAM and CPU cores can make local parallelism economical and
worthwhile. Establish the need for distributed parallelism on a case-by-case
basis.
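The comparison the authors call for can be sketched as follows. This is a minimal illustration (not from the paper): a serial sum-of-squares kernel timed against a local multiprocessing version, so the parallel overhead is visible before anyone reaches for a cluster. The function names and chunking scheme are my own.

```python
import time
from multiprocessing import Pool

def square_sum(chunk):
    # CPU-bound kernel: sum of squares over a slice of the input
    return sum(x * x for x in chunk)

def serial(data):
    return square_sum(data)

def parallel(data, workers=4):
    # Split into roughly equal chunks and fan out to local processes
    size = len(data) // workers
    chunks = [data[i * size:(i + 1) * size] for i in range(workers)]
    chunks[-1].extend(data[workers * size:])  # fold in the remainder
    with Pool(workers) as pool:
        return sum(pool.map(square_sum, chunks))

if __name__ == "__main__":
    data = list(range(1_000_000))
    t0 = time.perf_counter(); s = serial(data); t_serial = time.perf_counter() - t0
    t0 = time.perf_counter(); p = parallel(data); t_parallel = time.perf_counter() - t0
    assert s == p
    # On small inputs the parallel version often loses to the serial one,
    # because process startup and pickling dominate the actual work.
    print(f"serial {t_serial:.3f}s, parallel {t_parallel:.3f}s")
```

Only if the local parallel version wins by enough to matter is it worth asking whether multiple machines are needed at all.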

Sin #2 - Assuming performance homogeneity: Virtualized cloud environments
exhibit highly variable and sometimes unpredictable performance. Base reported
results on multiple benchmark runs, and report the variance.

Sin #3 - Picking the low-hanging fruit: It is unsurprising that it is easy to
beat a general system by specializing it. Few researchers comment on the
composability of their solutions.

Sin #4 - Forcing the abstraction: In some application domains, it is unclear
if a MapReduce-like approach can offer any benefits, and indeed, some have
argued that it is fruitless as a research direction. Ideally, future research
should build on [iterative processing, stream processing and graph
processing], rather than on the MapReduce paradigm.

Sin #5 - Unrepresentative workloads: The common assumption in academic
research systems is that the cluster workload is relatively homogeneous. Most
research evaluations measure performance by running a single job on an
otherwise idle cluster.

Sin #6 - Assuming perfect elasticity: The cloud paradigm [...] its promise of
an unlimited supply of computation. This is, of course, a fallacy. Workloads
do not exhibit infinite parallel speedup. The scalability and supply of
compute resources are not infinite. There are limits to the scalability of
data center communication infrastructure. Another reason for diminishing
returns from parallelization is the increasing likelihood of failures and
vulnerability to "straggler" tasks.
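The speedup limit mentioned here is Amdahl's law: any serial fraction of the work caps the benefit of adding workers, before stragglers and failures make things worse. A small illustration:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup is bounded by the serial fraction of the work."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

if __name__ == "__main__":
    # Even with 95% of the work parallelizable, the speedup can never
    # exceed 1 / 0.05 = 20x, no matter how many machines are added.
    for n in (8, 64, 1024):
        print(f"{n:>5} workers: {amdahl_speedup(0.95, n):.1f}x")
```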

Sin #7 - Ignoring fault tolerance: Many recent systems neglect to account for
the performance implications of fault tolerance, or indeed of faults occurring.
For each system, we should ask whether fault tolerance is relevant or
required. If it is, it makes sense to check precisely what level is required,
and what faults are to be protected against; consider and ideally quantify the
cost.
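One simple way to start quantifying that cost (my own back-of-the-envelope model, not from the paper): if each task attempt fails independently with some probability and is retried until it succeeds, the number of attempts per task is geometric, so the expected total work inflates by 1 / (1 - f).

```python
def expected_executions(num_tasks, failure_prob):
    """Expected total task executions when each attempt fails
    independently with probability failure_prob and is retried
    until it succeeds (geometric retries, mean 1 / (1 - f))."""
    return num_tasks / (1.0 - failure_prob)

if __name__ == "__main__":
    # At a 1% per-attempt failure rate, 10,000 tasks cost roughly
    # 101 extra executions; at 10% it is over 1,100.
    for f in (0.01, 0.10):
        print(f"f={f}: {expected_executions(10_000, f):.0f} executions")
```

Real systems add checkpointing, replication, and recovery latency on top of this, but even the crude model makes the overhead a number one can report rather than ignore.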

------
pm90
Note that this is targeted more at academic research. Not to undermine its
importance, of course.

It's good to see something like this out there. There are literally thousands
of papers published that try to demonstrate "speedups" but are usually not
very reproducible or useful. The "publish or perish" mentality is responsible
for a lot of this. The only way to stay up to date with cutting-edge,
meaningful research seems to be to attend conferences, talk to other
researchers in the field, and try to publish in the better journals.

------
dkural
This assumes most academics are more interested in good research than
publishing papers or looking objectively at software they spent years writing
without any fundamentally new ideas.

~~~
seanmcdirmid
Systems is a much more practical academic field. For example, Googlers are
heavily involved in high-end systems research, and they really are just
interested in good research rather than in increasing their paper counts
(which doesn't help them much in their careers in industry). The tier-one
systems conferences, OSDI and SOSP, also accept far fewer papers a year than
those in most fields (~30 vs. 60+), with about the same number of active
researchers. The culture of your field really helps in this regard.

------
dang
The paper is
[https://www.usenix.org/system/files/conference/hotcloud12/ho...](https://www.usenix.org/system/files/conference/hotcloud12/hotcloud12-final70.pdf).

~~~
nkurz
I was about to post the same link.

I wonder if 'dang' followed the same process I did: click the submitted link;
read the interesting abstract; scroll to the bottom only to be scared by the
embedded video; recover for a second from the panic of being without plain
text; scroll back up scrutinizing the small print for a link to actual paper;
then click with an awkward urgency, feeling relief at having found my way back
to safe ground.

Then I asked myself, "Why did the submitter link to the container page rather
than to the real content? Maybe 'dang' should change the link?" But then I
wondered, "What if there are people who actually prefer the video? It's hard
for me to picture, but presumably such people exist. How will they be able to
watch the video if the link goes straight to the PDF?"

So as a survey for my sanity, I have to ask: How many people watched the video
without looking at the paper first? How many looked at the paper without
giving any consideration to the video? Second, does this correlate with the
first 'sin' in the paper? Do those who prefer linear text also frequently
wonder if unnecessary distributed parallelism tends to hamper performance?

~~~
jcr
Nate, we don't really have a convention or guideline for this situation. The
guidelines only tell us to add [pdf], [video], and similar to the title if
it's a link to a media file. Though it's not exactly stated, it means a direct
link to the media file itself, rather than a link to the page where the media
files can be downloaded. For example, we (almost?) never see "[video]" tags in
the titles of stories linking to youtube or vimeo.

On previous USENIX submissions I've put some variation of "[pdf/slides/video]"
in the title to indicate multiple things are available, but this just created
unnecessary work for poor `dang` who had to edit the title to remove it. I
really do try to avoid causing work for the moderators, so this time, I did
not put the extra info in the title. I was proud of myself for learning from
my mistake... until I saw `dang` adding a comment with a direct link to the
paper pdf. (sigh, I'll probably never learn ;)

If there is a "right" answer, then I still haven't figured it out yet.

As for my access pattern, I actually found the USENIX page from another site
listing various papers, so that's what I was looking for when I loaded the
page. I made sure USENIX actually did provide a download link to the full
paper PDF (sometimes they don't/can't), and then started downloading the video
with wget in the background while I read the paper... I still haven't watched
the video yet, but I will.

~~~
nkurz
I appreciate the insight. I wasn't trying to criticize your choice of link,
just wondering if others shared my personal reaction. Similar to linking to
the abstract page for arXiv papers, I think you chose the "right" option.

I think the FAQ is out of date with regard to adding an explicit [PDF], as the
system now adds this automatically for links where the type is apparent. But
I'm always impressed at how efficient the new 'dang' API is for adding
publication dates to submitted papers!

