
cat /proc/cpuinfo, or don't trust your cores to Rackspace (part I) - inaka
http://www.rubyrescue.com/blog/2009/10/27/cat-proccpuinfo-or-dont-trust-your-cores-to-rackspace-part-i/
======
jbyers
Don't trust any servers to anyone. When we get a new server we check its stats
against reality (have had upside and downside surprises on CPUs), run bonnie++
to make sure IO is as expected (it hasn't been due to exotic RAID problems),
and run memtester to see if we have bad RAM (had that too). Takes more time,
sure, but no surprises later.
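
The checks described above can be scripted as part of commissioning a box. A minimal sketch, assuming bonnie++ and memtester are installed, and with EXPECTED_CORES as a placeholder for whatever core count you actually ordered:

```shell
#!/bin/sh
# Post-delivery sanity checks for a newly provisioned server.
# EXPECTED_CORES is a placeholder -- set it to the core count you ordered.
EXPECTED_CORES=4

cores=$(grep -c '^processor' /proc/cpuinfo)
echo "logical CPUs seen by the kernel: $cores"
if [ "$cores" -ne "$EXPECTED_CORES" ]; then
    echo "WARNING: expected $EXPECTED_CORES cores, kernel sees $cores"
fi

# Disk throughput check (-d is the test directory, -u the unprivileged
# user to run as); skipped if bonnie++ is not installed.
if command -v bonnie++ >/dev/null 2>&1; then
    bonnie++ -d /tmp -u nobody
else
    echo "bonnie++ not installed; skipping IO test"
fi

# RAM check: exercise 256MB of memory for one pass.
if command -v memtester >/dev/null 2>&1; then
    memtester 256M 1
else
    echo "memtester not installed; skipping RAM test"
fi
```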

~~~
jacquesm
That's the right attitude.

For extra points do serious burn-ins, especially on network hardware, and keep a
good eye on those error counters as well as mcelog, in case you've got faulty
RAM in there.

It's all part of commissioning a server, especially if you host at a cheap
outlet like rackspace.

~~~
cperciva
_a cheap outlet like rackspace._

I've heard rackspace called lots of things, but this is the first time I've
heard someone call them "cheap". Have their prices gone down lately?

~~~
jacquesm
On a relative scale they're cheap. What they call 'managed hosting', though, is
not what I'd call managed hosting. I think they call it managed hosting
because they will do backups for you or something like that :)

The Planet/EV1, which was my choice when hosting in the US earlier was quite a
bit cheaper, but service there was absolutely terrible.

It got to the point where I reprogrammed the DRAC cards to lock out their sys
admins.

After The Planet took over we had all kinds of issues, then finally they had
an explosion in a transformer in one of their datacenters taking down all of
our stuff for days on end. After that we moved out.

They said it had nothing to do with them and that we would be credited for the
downtime if we stayed at least another 6 months, but by then we had already
signed up elsewhere and restored from backups. I figure if you're willing to
run your operation that close to the red line, then we should be taking our
business elsewhere. They were lucky nobody got hurt.

Right now we're hosting in three places: leaseweb, mojohost and virtual
access. VXS is by far the best but expensive, leaseweb is somewhere in the
middle, and for high volume mojohost is absolutely unbeatable.

btw, you have me curious what else you've heard rackspace called :)

~~~
cperciva
_btw, you have me curious what else you've heard rackspace called :)_

Most of what I've heard about rackspace is that they had a good reputation
once, but have been resting on their laurels and now have higher prices and
worse service than other hosts -- but this is all 2nd hand and I have no
direct experience with them, so I was curious to hear other perspectives.

~~~
jacquesm
Ok. I think if they worked a bit harder at justifying that 'managed hosting'
bit then they would actually be worth it.

With both mojohost and leaseweb I basically only bug them when there are
network or hardware issues; the rest of the problems are mine. VXS is a
different story: that's where all the web servers are, and their operators
help with warding off all kinds of attacks, proactively scan for security
issues and so on. 24x7 access to the cell phone of the manager of the hosting
facility.

It's very addictive, that level of service.

They charge a pretty penny for it, but imo it's worth it, it is still much
cheaper than having a full time sysadmin for our stuff, and they probably do a
better job of it.

------
jacquesm
If you never run top '1' then you shouldn't be operating servers for
customers.
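
(Pressing '1' inside top toggles the per-CPU view. For a non-interactive look at how many cores the kernel has actually brought online, the same information is in /proc/cpuinfo -- a quick sketch:)

```shell
# Logical CPUs the running kernel has online:
grep -c '^processor' /proc/cpuinfo

# Same answer via coreutils:
nproc

# Model name and clock speed per CPU, to compare against what you ordered:
grep -E 'model name|cpu MHz' /proc/cpuinfo
```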

~~~
inaka
attempting to ignore the snarkiness, but failing - and what does that say
about the dozens of rackspace engineers configuring and monitoring and
supporting the box over the past two years?

~~~
jacquesm
There once was a really nice quote here on HN: "you can't outsource
responsibility".

If in two years' time you've never _ever_ had a look at what kernel you are
running, especially while tuning a system for performance, you have only
yourself to blame.

Don't tell me you're running a 'stock' kernel and never bothered tuning it for
your application, or considered upgrading it. Also, in your resources list you
should have the exact machine configuration, there are tools to retrieve that
sort of info automatically.
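
(A sketch of such a retrieval, using uname for the kernel and, where installed, dmidecode and lshw for the hardware inventory -- both of those want root for full output:)

```shell
# Kernel release and build string; an SMP kernel typically shows
# "SMP" in its version string.
uname -r
uname -v

# Hardware inventory, if the tools are installed:
if command -v dmidecode >/dev/null 2>&1; then
    dmidecode -t processor -t memory || echo "dmidecode needs root"
fi
if command -v lshw >/dev/null 2>&1; then
    lshw -short || echo "lshw needs root for full output"
fi
```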

Then, when you're done, store it in <http://inventory.sf.net/> or something
like that.

It's typical that the people at rackspace would simply drop in the requested
hardware, and that you yourself deal with the configuration.

The smart money is on running some tests after they've done that to make sure
it went ok. Asking for a CPU upgrade and not checking if they're operational
is just plain stupid.

I figure you literally asked rackspace to upgrade the CPU, and that's what
they did.

Did you explicitly ask them to install an SMP kernel with a specific version
and they didn't do it? Or did you expect them to do it, but didn't check
whether they actually had until today?

Two full years of trying to tune a box for performance without noticing this,
then publicly blaming rackspace, is simply cheap: an attempt at pinning the
blame on rackspace for something you should have noticed long ago yourself.

Kudos for writing about it, but the title should be "How I messed up". That's
taking responsibility, and then making sure it never, ever happens again.

~~~
sailormoon
I really do not dig this tone. The guy is obviously not a system admin. He
paid top dollar for rackspace managed hosting precisely so he wouldn't have to
do the kinds of things you mention.

"You can't outsource responsibility" is utter nonsense. It is completely
impossible to "own" responsibility for everything important in a complex
society. Meaningless platitudes should not distract from the fact - Rackspace
did not do their job.

Yes, he messed up. He messed up by making assumptions and not checking
Rackspace's work more closely. That's not the same as messing up in your own
work. His post is a reminder to be more careful checking on the work of your
"upstream". There's no need to pile on with the "if you didn't know 'top 1'
you shouldn't be running a startup!" etc.

~~~
thaumaturgy
I'm not a big fan of the tone either; however, jacquesm is spot-on in his
assessment.

For one thing, my understanding of Rackspace's business practices -- and I've
only dealt with them peripherally, so I might be a bit wrong here -- is that
they "manage" things like their network, and the actual server hardware, and
stuff like that. So, if you want a CPU upgrade, sure, they'll do that. If you
need your server rebooted, they'll do that too. But, they don't have anyone
sitting there monitoring your system's performance metrics and doing your
sysadmin duties for you.

The way I read it, Rackspace did do their job: they upgraded the hardware. It
was up to the server admin -- not Rackspace -- to check that the _software_
was then configured correctly.

And finally, I don't generally agree with statements of the form, "If you
don't know X, you shouldn't be doing Y", but ... looking at dmesg and top are
both really, really, really standard sysadmin operations. Entry level stuff,
really. Sysadmin work doesn't just mean messing around with Apache's
configuration; there are many more nuances, and it's likely that their system
is vulnerable to problems that they don't even know about.

~~~
jacquesm
The tone is probably in large part because the OP does not take any
responsibility for his own part in this and instead is pointing his finger at
a third party that _may_ have been partially at fault. But that is by no means
sure.

This is typical of what I think is a real problem in society: the
'externalization of blame'.

Inability to see your own responsibility is a serious issue, and it is really
pervasive. If I were in the OP's position I would be headbutting a piece of
concrete for 20 minutes to make sure I never ever made a mistake like that
again, and I would thank rackspace for finally finding the fault that I could
have noticed in 5 minutes two years ago.

That's why you have post-delivery checklists, burn-in tools and inventory
management: staples for everybody that has a root prompt (#) on machines that
do customer work.

I'll try to keep my 'tone' better under control, apologies for that.

At least it wasn't in Dutch ;)

------
ericwaller
After _the conversion from Apache [+passenger] to Nginx [+unicorn] overall
throughput went up by a factor of 3._

I noticed a similar gain (10-12 req/s to 25-30 req/s) for action-cached pages
after switching out nginx+passenger for nginx+thin. This is on a 256MB slice,
and seems totally counter to the general opinion that passenger is great for
VPSes.

~~~
jcapote
Have you tried nginx/passenger? I ask because I wonder how much of that gain
is part of switching to nginx rather than switching to unicorn...

~~~
inaka
On a separate application and server, yes, and it seems faster than
apache/passenger, but it still suffers from the 'passenger choke', where
touching tmp/restart.txt seems to pause all web traffic for 10-15 seconds, so
we never considered it here...

