

Softlayer Cloud: An operations guy's rant about really bad service - kovyrin
http://kovyrin.net/2011/05/02/cloudlayer-bad-story/

======
mgkimsal
"This was enough and we’ve finally made the decision to give up and go back to
real hardware only we control and manage. It took us more than a month of
work, but we think we’ve got pretty good system built as the replacement for
Softlayer cloud based solution."

This can't possibly be a good end solution! Haven't you read everyone else's
blog posts and forum comments that there's _no possible way you can ever build
anything as good as _insert_vendor_name_here_ has done with their cloud_?

There's _0_ chance that you'll be able to respond to a 2400% increase in
traffic/usage in under 43 seconds! And you'll never ever be able to replicate
around the globe instantly, and even if you could, it'll cost _so_ much more
than _cloud_vendor_ because they have economies of scale you can never
achieve. Ever. Seriously.

I just can't believe someone would consider - you know - owning and managing
hardware that they control with their own SLAs and support people in place.
_Where's your sense of cloudsourcing?_

~~~
DanBlake
> There's 0 chance that you'll be able to respond to a 2400% increase in
> traffic/usage in under 43 seconds!

You don't need to be on a cloud to support that. Dedicated is so much
cheaper/more powerful than typical cloud machines that you can have a surplus
of machines ready for your bidding and still pay less than the cloud, and end up
with way more processing power.

Also, you don't need to own/manage the hardware: hosting companies will lease
it to you and repair it when things go wrong.

~~~
kovyrin
Yeah, even if one could scale up his web/app farms 2400% so fast, I'd really
like to see a database that scales up/out this well :-)

~~~
reitzensteinm
You should look into NoSQL. It's web scale!

~~~
kovyrin
MongoDb? ;-)

------
rrwhite
Wow this post was like reading a post from myself in the not so distant
future. We had the _exact_ same experience on SoftLayer CloudLayer. Random I/O
failures, read-only mode and of course the most frustrating part was the
support process around that.

We had a week last month where we would lose a couple of boxes a day to this
issue, at which point we said enough was enough and switched completely over to
dedicated boxes (still from SoftLayer). Everything has been smooth sailing
since.

To their credit, SoftLayer (and probably more specifically our awesome account
rep) stepped up and refunded the money we spent on CloudLayer in March
(after some cajoling on my part), but their general reaction is that of a
company that's completely disjointed. When I got really pissed and started
bitching on Twitter in March, I received inbound contacts from multiple people
on their sales/marketing/account mgmt team. Unfortunately these people didn't
seem to be well connected to each other and were even less connected to IT
(so they could empathize but not really effect change) and "management" (this
faceless part of the company which seems to love to cut off its own nose to
spite its face).

The oddest (and probably most infuriating) part of the whole thing is how
everyone at SoftLayer went to great lengths not to acknowledge that the
CloudLayer product had issues. They always seemed shocked by my assertion that
something was very wrong with the CloudLayer product and asserted that I was alone
in seeing these issues. To this day I've still not read a single "we know
something's wrong and we're fixing it" post from them, and that's the part that
bothers me the most because it implies that they're either incompetent or
disingenuous. Neither is a trait you want in an infrastructure
provider.

~~~
kovyrin
Damn, this is exactly what I was experiencing: when I'd ask "wtf? how come
your management doesn't see the trend?" I'd constantly receive "you're the
only customer experiencing this problem!" bullshit.

~~~
MediaBehavior
> ...bullshit.

Worse: Lie. Bald-face lie.

------
armored
Yet another opaque cloud, where a vendor uses some disingenuous language to
disguise or confuse exactly how many resources you are actually getting for
your money. Cores != hyperthreads. Don't get me started on "dynos" or "small",
"large" and "extra large".

~~~
cubicle67
at least Heroku are pretty upfront about what a dyno is

<http://devcenter.heroku.com/articles/dynos> (linked from
<http://www.heroku.com/pricing#1-0>)

~~~
armored
Yeah but that doesn't really tell you anything about the underlying hardware.
Your performance will vary depending on which nodes your dyno is running on.
Hardware abstraction is a great thing, I just think that a little more
transparency is in order.

------
awakeasleep
Does anyone have any contacts at Softlayer?

This could be a great opportunity for them to explain how their business
works, and give some potential customers insight into how they're working on
things.

Or... it could be an opportunity for a marketing-speak filled cover-up which
would damage their credibility further.

~~~
hartror
Seconded, I have been eying off their cloud service as a way to manage our
server costs as we have some pretty wild but predictable swings in load
throughout the day/week. We currently have dedicated hosting with Softlayer so
a cloud in the same data centre as our existing databases etc would be great.

This, however, has made me pause; we are a small team and I like my sleep. A
response would be awesome.

ps. pretty happy with Softlayer overall btw.

~~~
plusbryan
My data on SL Cloud is about 6 months old now, but it's seriously far behind
their dedicated offering. A poor control panel lacking vital functions,
frequent downtime, and slow I/O mire it.

~~~
MartinCron
I am using SL Cloud currently and I can back up the reports of slow io and
clumsy control panel. Uptime, however, has been great.

We're going to try moving to their "Bare Metal" cloud db offering to take
advantage of local storage (10x faster writes in my tests).

------
kaib
I find it fascinating that cloud has become almost a synonym for rapid
provisioning and VPS based systems. I think this is really short sighted.
Having a great uptime or latency does not automatically follow from being able
to spin up tons of servers. And being able to provision dedicated servers in
hours is still pretty magic to me. If nothing else, the focus on provisioning alone
is obscuring other critical issues like network connectivity and I/O bandwidth
and latency.

mgkimsal is spot on with his (sarcastic) comment, you need to think for
yourself.

------
Joakal
> So, many times a week we would wake up, see an instance dead (all critical
> processes locked in D-state), call Softlayer, and spend hours (literally) to
> bring it back up.

Wouldn't keepalived or heartbeat help with failover? If I saw that an instance
had died, I would delete it and start a new instance from the latest backed-up
image. I assume that would take ~15 minutes, compared to relying on support.
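The delete-and-redeploy approach first needs a reliable "this instance is hung" signal. A minimal sketch, assuming Linux `/proc`, of one way to get it: flag the instance when every critical process has been stuck in uninterruptible sleep (D-state, as in the quoted symptom) for several consecutive checks. The process names, thresholds, and the `replace_instance` callback are hypothetical placeholders; the callback stands in for whatever delete/reprovision call your provider offers and is not a real Softlayer API.

```python
import os
import time
from typing import Callable, Iterable, List, Optional, Set, Tuple


def parse_stat(stat_line: str) -> Tuple[str, str]:
    """Return (comm, state) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so take the text between the first '(' and the LAST ')'.
    """
    start = stat_line.index("(")
    end = stat_line.rindex(")")
    comm = stat_line[start + 1:end]
    state = stat_line[end + 1:].split()[0]
    return comm, state


def read_stat_lines(proc_root: str = "/proc") -> List[str]:
    """Collect the stat line of every process currently visible in /proc."""
    lines = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                lines.append(f.read())
        except OSError:
            # The process may have exited between listdir() and open().
            continue
    return lines


def critical_all_in_d_state(stat_lines: Iterable[str], critical: Set[str]) -> bool:
    """True if at least one critical process is running and all of them are in D state."""
    states = [state for comm, state in map(parse_stat, stat_lines) if comm in critical]
    return bool(states) and all(s == "D" for s in states)


def watchdog(
    sample: Callable[[], List[str]],
    replace_instance: Callable[[], None],  # hypothetical hook: delete + redeploy from image
    critical: Set[str],
    threshold: int = 3,        # consecutive hung samples before acting (assumed value)
    interval: float = 60.0,    # seconds between samples (assumed value)
    max_checks: Optional[int] = None,
) -> bool:
    """Call replace_instance() once the hang persists; return True if it was called."""
    consecutive = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        # A brief D-state is normal during disk I/O, so require it to persist.
        if critical_all_in_d_state(sample(), critical):
            consecutive += 1
        else:
            consecutive = 0
        if consecutive >= threshold:
            replace_instance()
            return True
        time.sleep(interval)
    return False
```

On a real box this would run as `watchdog(read_stat_lines, my_replace_hook, {"mysqld", "nginx"})`, ideally from a separate host, since a script on the hung instance can itself block on I/O.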

~~~
kovyrin
The thing is, all our cloud instances were redundant, so SL downtimes almost
never caused real service disruptions on the site. But when one instance dies,
there is a huge chance that another one (or two, five, ten) will die soon as well.
As for the automated kill/deploy scripts - really often even their own cloud-
related portal features don't work (like when you try to restart/reset an
instance and their APIs and portal just time out, etc).

------
citricsquid
We had a good experience (from what I remember) with using Softlayer cloud to
supplement our physical hardware when we had high load. We'd bring online ~5
cloud instances to handle the load.

------
jread
I've done some benchmarking of softlayer cloudlayer, and their "SAN"-backed cloud
servers have about the worst IO performance of any provider. Even the largest
16 and 32GB instances performed extremely poorly. EC2 EBS is a much better
performing storage platform. Stay away from cloudlayer if you have anything
even remotely IO intensive.

<http://blog.cloudharmony.com/2010/06/disk-io-benchmarking-in-cloud.html>

------
lux
How is their cloud storage? I was comparing prices between services the other
day and it seemed pretty competitive, but haven't yet tested latency and such.
Wondered what other HNers have found compared to Amazon and Rackspace in terms
of service/issues. Thanks!

------
wildmXranat
I was running softlayer dedicated boxes for about 18 months. It was a much
better experience than what looks like a horrible cloud setup.

~~~
kovyrin
Yeah, we've been running 100+ dedicated boxes for 3 years now and our experience has been really
great. Hence the rant: why couldn't they provide the same quality of service
and support for their cloud offering?

~~~
mikiem
Because cloud computing is a more complicated way of delivering CPU, memory,
disk (space and performance) and network. Complicated always has a much lower
likelihood of being better, more durable, cheaper or more reliable.

~~~
kovyrin
You know, we never complained about cpu, memory, disk or network performance.
The only real complaints were about the fact that no cloud instances had
uptime of more than a week. I'm totally OK with shit being slow (we could
always optimize or scale out), but when it is plain broken - this is when I
need to wake up and fix things.

