"This was enough and we’ve finally made the decision to give up and go back to real hardware only we control and manage. It took us more than a month of work, but we think we’ve got pretty good system built as the replacement for Softlayer cloud based solution."
This can't possibly be a good end solution! Haven't you read everyone else's blog posts and forum comments that there's no possible way you can ever build anything as good as _insert_vendor_name_here_ has done with their cloud?
There's 0 chance that you'll be able to respond to a 2400% increase in traffic/usage in under 43 seconds! And you'll never ever be able to replicate around the globe instantly, and even if you could, it'll cost so much more than _cloud_vendor_ because they have economies of scale you can never achieve. Ever. Seriously.
I just can't believe someone would consider - you know - owning and managing hardware that they control with their own SLAs and support people in place. Where's your sense of cloudsourcing?
> There's 0 chance that you'll be able to respond to a 2400% increase in traffic/usage in under 43 seconds!
You don't need to be on a cloud to support that. Dedicated is so much cheaper and more powerful than typical cloud machines that you can keep a surplus of machines ready to do your bidding, still pay less than the cloud, and end up with way more processing power.
Also, you don't need to own/manage the hardware: hosting companies will lease it to you and repair it when things go wrong.
[insert sound of loud cheering from those of us that have a clue -- again]
Upvote. Sorry that there probably aren't enough people with a firm grasp on reality to lift your comment out of the hazy twilight. Take heart in knowing that the scent of coffee is getting stronger with every passing day.
While I do agree that you may not be able to handle a 2400% increase in under 43 seconds, I don't think this is a realistic expectation.
I don't think that all products will have to deal with a 2400% or even a 100% increase in traffic. It really comes down to what the product is. Most products never see such gains. So while there is 0 chance they can take the increase, there is also only a low chance (probably under 10%, maybe even under 1%) that the traffic will ever increase that much.
Wow this post was like reading a post from myself in the not so distant future. We had the exact same experience on SoftLayer CloudLayer. Random I/O failures, read-only mode and of course the most frustrating part was the support process around that.
We had a week last month where we would lose a couple boxes a day to this issue at which point we said enough was enough and switched completely over to dedicated boxes (still from SoftLayer). Everything has been smooth sailing since.
To their credit, SoftLayer (and probably more specifically our awesome account rep) stepped up and refunded the money we spent on CloudLayer in March (after some cajoling on my part), but their general reaction is that of a company that's completely disjointed. When I got really pissed and started bitching on Twitter in March, I received inbound contacts from multiple people on their sales/marketing/account mgmt team. Unfortunately these people didn't seem to be well connected to each other and are even less connected to IT (so they could empathize but not really effect change) and "management" (this faceless part of the company which seems to love to cut off its own nose to spite its face).
The oddest (and probably most infuriating) part of the whole thing is how everyone at SoftLayer went to great lengths not to acknowledge that the CloudLayer product had issues. They always seemed shocked by my assertion that something was very wrong with the CloudLayer product and asserted that I was alone in seeing these issues. To this day I've still not read a single "we know something's wrong and we're fixing it" post from them, and that's the part that bothers me the most because it implies that they're either incompetent or disingenuous. Neither of which is a trait you want in an infrastructure provider.
Damn, this is exactly what I was experiencing: when I'd ask "wtf? how come your management doesn't see the trend?" I'd constantly receive the "you're the only customer experiencing this problem!" bullshit.
Yet another opaque cloud, where a vendor uses some disingenuous language to disguise or confuse exactly how much resources you are actually getting for your money. Cores != hyperthreads. Don't get me started on "dynos" or "small", "large" and "extra large".
Yeah but that doesn't really tell you anything about the underlying hardware. Your performance will vary depending on which nodes your dyno is running on. Hardware abstraction is a great thing, I just think that a little more transparency is in order.
This could be a great opportunity for them to explain how their business works, and give some potential customers insight into how they're working on things.
Or... it could be an opportunity for a marketing-speak filled cover-up which would damage their credibility further.
Seconded. I have been eyeing their cloud service as a way to manage our server costs, as we have some pretty wild but predictable swings in load throughout the day/week. We currently have dedicated hosting with Softlayer, so a cloud in the same data centre as our existing databases etc. would be great.
This, however, has given me pause: we are a small team and I like my sleep. A response would be awesome.
I've had great luck with Softlayer dedicated hosting in the past -- just as good as rackspace dedicated, and about half the cost. The only limitation was not being able to do custom VLANs between servers in multiple cabinets (which I needed for a multicast logging solution). I've never tried their cloud product, though (they should offer discounts to the hackernews affiliated organization...)
Disks seem to be the real weakness of all cloud solutions so far. Local storage is the only cost effective way to have high performance disk, and that makes provisioning cloud instances much harder. There might actually be some stuff to do in this space using dual-ported SAS or SATA with dual port SAS drives, purely for failover.
Yes, their dedicated server hosting is just awesome - the best I've seen! But their cloud stuff is terrible, and this is what really frustrated me and made me guess that it is a poorly integrated 3rd party service: their cloud offering sucked, they could not offer any reasonable support for it, etc.
We were in a similar situation. We were extremely happy with their dedicated server, virtually never had to use their support and never had to worry about the server going down.
But their cloud offering is a whole different story. I lost track of the number of times we just could not connect to the server. This was not a disk problem but a network problem. Twice we raised support tickets: once it took around 6 hours to resolve and the other time it took 3 hours. In all that time we only got generic replies that asked us to wait for updates. These two major incidents were in addition to at least three times where the server was not accessible for like 30-40 minutes each time, where we did not raise a support ticket. All this in a span of less than a month.
We moved to the cloud for the same reason, in terms of cost and usage. But the pain of not being able to access a server when you need it is just not worth it. We are testing out Rackspace. Hopefully that will be a better experience.
My data on SL Cloud is about 6 months old now - but it's seriously far behind their dedicated offering. A poor control panel lacking vital functions, frequent downtime, and slow I/O mire it.
I find it fascinating that cloud has become almost a synonym for rapid provisioning and VPS based systems. I think this is really short sighted. Great uptime or low latency does not automatically follow from being able to spin up tons of servers. And being able to provision dedicated servers in hours is still pretty magic to me. If nothing else, the focus on just provisioning is obscuring other critical issues like network connectivity and I/O bandwidth and latency.
mgkimsal is spot on with his (sarcastic) comment, you need to think for yourself.
> So, many times a week we would wake up, see an instance dead (all critical processes locked in D-state), call Softlayer, and spend hours (literally) to bring it back up.
Wouldn't keepalived or heartbeat help with failover? If I saw that an instance had died, I would delete it and start a new instance from the latest backed-up image. I assume it would take ~15 minutes compared to relying on the support.
The thing is, all our cloud instances were redundant, so SL downtimes almost never caused real service disruptions on the site. But when one instance dies, there is a huge chance that another one (or two, five, ten) will die soon as well. As for the automated kill/deploy scripts - quite often even their own cloud-related portal features don't work (like when you try to restart/reset an instance and their APIs and portal just time out, etc).
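For what it's worth, here is a rough sketch of the kill/redeploy idea from this sub-thread, with retries and backoff because the provider API itself may time out. Everything in it is hypothetical: `ProviderTimeout`, `call_with_retries`, `replace_dead_instances` and the callables passed in are illustrative names, not SoftLayer's actual API, and a real script would also need a cap on how many instances it replaces at once.

```python
import time


class ProviderTimeout(Exception):
    """Hypothetical error raised when a provider API call times out."""


def call_with_retries(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run action(), retrying with exponential backoff on ProviderTimeout."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return action()
        except ProviderTimeout:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller page a human
            sleep(base_delay * 2 ** attempt)


def replace_dead_instances(instances, is_alive, destroy, create_from_image,
                           image_id, sleep=time.sleep):
    """Destroy each instance failing its health check and start a replacement.

    Returns a dict mapping each dead instance to its replacement.
    All callables are assumptions supplied by the caller.
    """
    replaced = {}
    for inst in instances:
        if is_alive(inst):
            continue
        call_with_retries(lambda: destroy(inst), sleep=sleep)
        replaced[inst] = call_with_retries(
            lambda: create_from_image(image_id), sleep=sleep)
    return replaced
```

This only helps if the API eventually answers; when the portal is down for hours, as described above, the retries run out and you are back to calling support.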
We had a good experience (from what I remember) with using Softlayer cloud to supplement our physical hardware when we had high load. We'd bring online ~5 cloud instances to handle the load.
I've done some benchmarking of softlayer cloudlayer and their "SAN"-backed cloud servers have about the worst IO performance of any provider. Even the largest 16 and 32GB instances performed extremely poorly. EC2 EBS is a much better performing storage platform. Stay away from cloudlayer if you have anything even remotely IO intensive.
How is their cloud storage? I was comparing prices between services the other day and it seemed pretty competitive, but haven't yet tested latency and such. Wondered what other HNers have found compared to Amazon and Rackspace in terms of service/issues. Thanks!
Yeah, we've been running 100+ dedicated boxes for 3 years now and our experience has been really great. Hence the rant - why couldn't they provide the same quality of service and support for their cloud offering?
Because cloud computing is a more complicated way of delivering CPU, memory, disk (space and performance) and network. Complicated always has a much lower likelihood of being better, more durable, cheaper or more reliable.
You know, we never complained about cpu, memory, disk or network performance. The only real complaints were about the fact that no cloud instances had uptime of more than a week. I'm totally OK with shit being slow (we could always optimize or scale out), but when it is plain broken - this is when I need to wake up and fix things.