

New Year's Resolutions for SysAdmins - zdw
https://www.usenix.org/blog/new-years-resolutions-sysadmins

======
Ecio78
I should "Finally learn IPv6" but I know that I will procrastinate again and
again..

------
reitzensteinm
"Check that your backups are working the way you think they are."

I'm a bit horrified to read this here. If you're a sysadmin and don't have
both automated and manual testing of backups, it's hard to imagine what else
was a more important use of your time.

There aren't many things that could bankrupt a healthy business overnight, but
catastrophic data loss is certainly one of them.

An analogous entry for a lawyer might be to pay attention while reading
contracts.

~~~
bowlofpetunias
> what else was a more important use of your time

Keeping shit running _now_ is always more important than keeping shit running
in case of catastrophic failure.

Because a catastrophic failure is only catastrophic if you actually have
something of value in the first place.

In the real world it's a balancing act in which there is no room for
absolutism. Choosing which compromises to make is the hardest part of any job
that comes with a level of responsibility.

~~~
reitzensteinm
I'm talking specifically about _dedicated_ sys admins, since that's what the
blog post is about; I'm no stranger to cowboy coding on a project that may or
may not ever be worth anything. I've lost data once before to corrupted
backups and I don't regret not investing more engineering effort. It was an
MVP, and making better products gives me more leverage.

But by the time you're hiring full time for the position, outside of a few
edge cases where maybe you're SnapChat and you are growing 20% a week, it's
probably time to settle down a bit and be sensible. At which point, if testing
your backups to completion to avoid catastrophic data loss isn't #1 on the
todo list, it's #2.

~~~
vidarh
It may be #2 perpetually in many cases because the boss will not listen when
you insist it's important. For many it then starts slipping down the list, as
what is best for the company is often not best for the employee: For many it
becomes a reasonable (for them _personally_ ) risk to take to bet that they'll
do better from keeping the boss satisfied now rather than spend time on
backups to avoid a major disaster after they've left. And yes, that means
gambling that the major disaster won't hit while you're still there.

I'm not saying this is how it should be, but it is how it often becomes if the
sysadmin, or whoever takes on those responsibilities, doesn't report to someone
who _also_ sees data integrity as priority #1 for the sysadmin.

I've worked in places where the CEO's e-mail client configuration is the #1
priority for the guy that should have been focusing on server backups, for
example, and where prioritising the backups would be a bad career move for the
person in question.

(Yes, that is a huge warning sign that it's best to find a different job)

------
pchander
Excuse my ignorance, but what's wrong with nslookup and ifconfig?

~~~
jaryd
short answer: `/bin/ip` replaces `ifconfig` -- it's newer and more powerful

more info:
[http://www.tty1.net/blog/2010/ifconfig-ip-comparison_en.html](http://www.tty1.net/blog/2010/ifconfig-ip-comparison_en.html)

~~~
nailer
Aside from ifconfig not being maintained (which is reason enough not to use
it), I always wondered specifically what was broken with it.

Then I worked for an arbitrage desk at an investment bank. They used virtual
addresses for different IPs, on top of vlans connected to different exchanges,
redundantly (i.e. bonded).

Not a single IP-having interface appeared using ifconfig.
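
For anyone who wants to see what `ip` reports without eyeballing its output, here is a minimal Python sketch. It assumes an iproute2 recent enough to support the `-json` flag, and the function names are mine, purely for illustration. It lists every address on every interface, including secondary addresses that have no alias label, which as far as I know are the ones ifconfig tends to miss.

```python
import json
import subprocess


def parse_ip_addr_json(text):
    """Map interface name -> ["address/prefixlen", ...] from `ip -json addr show` output."""
    result = {}
    for iface in json.loads(text):
        addrs = [
            f"{info['local']}/{info['prefixlen']}"
            for info in iface.get("addr_info", [])
            if "local" in info
        ]
        result[iface["ifname"]] = addrs
    return result


def list_addresses():
    """Run `ip -json addr show` and parse it (assumes iproute2 with JSON support)."""
    out = subprocess.run(
        ["ip", "-json", "addr", "show"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_ip_addr_json(out)
```

Calling `list_addresses()` on a Linux box returns a dict keyed by interface name; bonded and VLAN interfaces show up like any other, each with its full list of addresses.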

------
jaryd
I'd also recommend any sysadmin to start looking into an automation framework
like Chef, Puppet, Ansible, or any other of the myriad options.

~~~
pault
I just started digging into chef with the intention of using it with AWS opsworks
and... holy complexity. There just doesn't seem to be any obvious entry point,
as far as I can tell. I've spent two entire days searching and there doesn't
seem to be anything in between "hello wordpress" and "read this 300+ page user
manual". Can anyone recommend a hands on guide for setting up a multi-node
stack with opsworks that isn't just plain vanilla templates? I was really
impressed by Richard Seroter's videos on AWS at pluralsight, but they are more
of a 30,000 foot overview and gloss over implementation details, for the most
part (even at 10+ hours). The project I'm working on is mostly a pretty
straightforward node service running on postgres, but it uses some custom
services that need to scale horizontally, and host sensitive data that should
probably be running in a VPC.

Edit: I'm also having a difficult time trying to figure out how to get up to
speed with chef in the specific context of opsworks, since most of the chef-
specific courseware I can find assumes I'm using a hosted chef server, and I'm
not quite sure how that overlaps with the opsworks flavor. It's starting to
feel like I'll be at this for another week before I can get even a simple
stack running.

~~~
mateuszf
Try with ansible.

~~~
deckiedan
Agreed.

As well as the reasonably good ansible documentation, check out the
ansible-examples github repo.
( [https://github.com/ansible/ansible-examples](https://github.com/ansible/ansible-examples) )

There are a few initial 'conventions' to figure out (roles, group vars,
playbooks), but then it's so incredibly simple to get into.

My dad is playing with a raspberry pi at home, wanting to set up a web kiosk
thing. I'd been doing something similar a few months ago for work, so I sent
him my ansible playbook. He could follow it as if it were a plain text
checklist/tutorial, without needing ansible at all. If he wants to automate
it, he can.

That, I think, is one of the benefits of ansible. It's not a lot more work
than just writing an overview of what you need to do in the first place, but
with the benefit of it being repeatable and automated.

------
ams6110
_3. "Check that your backups are working the way you think they are."_

3.1 "Check that you can actually restore a file/recover a database"
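
A rough sketch of what an automated version of 3.1 could look like for plain files, in Python. It assumes you have already restored the backup into a scratch directory and that the source is a snapshot which has not changed since the backup ran. `verify_restore` and `sha256_of` are hypothetical helper names, not part of any backup tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Compare every file under source_dir with its restored copy.

    Returns a sorted list of relative paths that are missing or differ.
    """
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        # A file counts as a problem if it was not restored or its bytes differ.
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(str(rel))
    return sorted(problems)
```

An empty list means every file came back byte-for-byte identical. This only covers files; a database would still need an actual restore plus a query or two against the restored copy.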

~~~
protomyth
3.1.1 "Do it daily to a box in another building while dumping the backup to
offsite storage"

------
AmVess
I learned #2 very early on. Further, the specs I give for a project are the
very minimum. If you aren't willing to do things correctly the first time, I'm
not willing to be a client.

------
blueskin_
#2 is the most important in that list.

