Devops/Sysadmin Cheatsheet (rubytune.com)
241 points by wlll on Jan 11, 2013 | 107 comments

Some random thoughts:

- Instead of 'while true;' you can use the shorter 'while :;'. ':' is a null command.

- On OS X, dtruss is kinda, sorta like strace (it's a wrapper around dtrace, and the dtruss name comes from Solaris).

- For basic host to IP resolution, I prefer ping as it calls gethostbyname(), like most other programs will do. host/dig are suitable for querying/testing DNS independent of how the local box is configured as they call the resolver library directly. For example, host bypasses nsswitch.conf.

(Separately, OS X's stub resolver has some neat tricks. e.g., using /etc/resolver/<domain_name> to route DNS requests for a particular domain to a specific name server.)

- I had never heard of http://michael.toren.net/code/tcptraceroute/ before. I don't think I've ever encountered a situation where the issue was outbound ICMP/UDP packets being blocked, but rather it's the return ICMP Time Exceeded packets.

- 'find . -size +100M' is shorter than 'find ./ -size +100000000c -print'.

- ctime is NOT when a file was created, though this is a common misconception. Unix does not record when a file is created. Rather, ctime is the last time the file's status (i.e. inode information aka metadata) was changed. This differs from mtime, which records the last time the file's data was changed. ctime is a superset of mtime: any change that updates mtime also updates ctime. atime is the last time a file was accessed, unless the filesystem is mounted with noatime (common for NFS file systems). atime/mtime can be set to arbitrary values (permissions allowing) via the utime() system call.

- Modern find supports -delete instead of '-exec rm {} \;'. In any case, my muscle memory still defaults to '-print0 | xargs -0 rm'.

- 'dd if=/dev/zero of=file.txt count=1024 bs=102' seems like an odd way to do that. I think that 'bs=1024 count=102' might be more efficient depending upon buffering?
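To make the dd point concrete, here is a minimal sketch, assuming a POSIX-ish shell with mktemp available (the temp files just stand in for file.txt). Both invocations produce the same 104448-byte file, but dd does one read/write pair per block, so the first issues 1024 small writes and the second 102 larger ones. Whether that is measurably faster will depend on the system.

```shell
# Two ways to create the same 104448-byte file of zeros.
# Assumption: temp files from mktemp stand in for file.txt.
a=$(mktemp)
b=$(mktemp)

# 1024 blocks of 102 bytes each: 1024 small writes
dd if=/dev/zero of="$a" bs=102 count=1024 2>/dev/null

# 102 blocks of 1024 bytes each: 102 larger writes
dd if=/dev/zero of="$b" bs=1024 count=102 2>/dev/null

# Both files end up the same size (1024 * 102 bytes)
wc -c < "$a"
wc -c < "$b"
```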

You can use TCP/UDP/GRE for your traceroute packets with the -P switch:

traceroute -P udp <hostname>

This is super useful if you're probing a network that's hostile to ICMP packets - it's a lot more reliable than ICMP in many scenarios, from my experience.

Regarding ctime, some Unix versions (or I should say filesystems) really do store the creation time (birthtime) of the inode. FreeBSD's find has the -Bmin switch for searching by true creation time.

Can anyone link something about this? We have two contradictory opinions and I'm not sure who to believe. The Wikipedia article on ctime isn't at all helpful either:


I really wasn't clear about what I wanted to say, sorry. birthtime is different from ctime on those systems. Ctime is last change on the inode as parent said.

I just wanted to note that some Unix systems really store the creation time. See http://www.daemon-systems.org/man/fstat.2.html for an example.

THANK YOU for the insight and effort. Enlightening. Updated the find, MUCH better. Will play with the others and modify the sheet as I go!

> (Separately, OS X's stub resolver has some neat tricks. e.g., using /etc/resolver/<domain_name> to route DNS requests for a particular domain to a specific name server.)

I was about to give up and install dnsmasq locally to get around a VPN-related issue. Thanks for the tip!

ctime is also useful because it's harder to forge. You can normally set the mtime on your files to whatever you want, but forging the ctime normally requires either directly modifying the filesystem structures on disk, or changing the system clock.

ctime is trivial to manipulate. For instance, 'touch -t 1111111111 foo'

You are wrong; -t does not change the ctime. It changes the atime and mtime. You cannot change the ctime using touch.
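This is easy to check on a Linux box. A sketch assuming GNU coreutils (stat -c and date -d are GNU options; BSD and OS X stat use -f instead): touch -t sets atime and mtime to the requested value, but the kernel stamps ctime with the current time because the inode was just modified.

```shell
# Assumes GNU stat/date. %Y is mtime and %Z is ctime, both in seconds since the epoch.
f=$(mktemp)

touch -t 1111111111 "$f"      # YYMMDDhhmm: 2011-11-11 11:11

mtime=$(stat -c %Y "$f")
ctime=$(stat -c %Z "$f")

date -d "@$mtime"             # the forged 2011 timestamp
date -d "@$ctime"             # the moment touch ran, not 2011
```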

Great list. A few more (related or building on those there)

Quick mysql "why is my database under so much load suddenly" that doesn't rely on the slow query log:

    pt-query-digest --processlist h=<host> --interval=0.01 --print > /tmp/queries.out
    pt-query-digest /tmp/queries.out
Tracking down what files exactly are filling up the disk. Go to /, and follow the big numbers:

    du -smc *
Combine strace (for the file handle number), lsof (to map the file handle to an ip/port) and netstat (to find the process on the remote host) to track RPC calls that are hung.

For 'ps auxww', throw on an f to get a visual representation of child processes.

And finally, ssh tunnels (which can be chained together) to get past bastion servers easily:

    Host <alias>
        HostName <ip>
        ProxyCommand ssh -q <bastion host, can be another alias in .ssh/config> "nc -w 3600 %h %p"

With recent enough SSH (> 5.4), SSH tunnels can be achieved with

  Host <alias>
    HostName <ip>
    ProxyCommand ssh -W %h:%p <bastion host>
Which doesn't require nc (or anything else) installed on the bastion host.

I prefer using tcpdump instead of processlist when dealing with pt-query-digest. You get everything instead of just what happens to show up during the interval specified. This is, of course, assuming that you don't allow localhost socket connections.

Very true - extremely fast queries can slip between the processlist checks. The problem I've encountered with tcpdump is that you can cause packet loss by running it. It's not common, but it happens enough that I've become gun shy about using it on production systems.

The pt-query-digest function will cause additional load on the DB, but it won't interrupt communications.

What OS will drop packets while tcpdump is running?

Linux. :)

If tcpdump can't keep up with the incoming traffic, the kernel will drop the packets in its buffer (or rather overwrite them with new packets).

Throw in TCP's flow and congestion control protocols, and dropped packets can have disastrous effects on your database.

Google has many references you might find useful on this subject.

Thanks for that! I'll take a look at the pt-query-digest stuff again and add!

That's not devops - that's system administration. Not complaining; it's still a well-presented list.

Seriously! The term is becoming useless when it's just a synonym for 'sysadmin' or 'random linux cruft'

Thanks! You are completely right. Just basic sysadmin stuff. It's titled that way simply because a) I thought I'd see if I could hang with the term and b) the "source" of these commands (including myself) is developers who specialize in rails ops work (so we do write app code, but specialize on the server side)

Christ, just admit you did it so you could be associated with the term when someone Googles your name.

YES, this is all part of my master plan to be THE KING OF DEVOPS!!11one.

Meta question: where do you actually learn this stuff? I am a developer with an intermediate desktop level command line knowledge learnt mostly on the job but would appreciate a crash course on system-level command line, diagnostics, devops, etc.

In my experience, you pretty much learn this on the fly. You don't know how to do X, you google it, if you do X often enough, you'll eventually learn. After a while you end up mastering most of the unix suite (ls, grep, awk, du, etc..) so instead of googling stuff you write it.

Unlike fundamentals (think books like SICP, Introduction to Algorithms, K&R, the Dragon book, et al), this is just a collection of useful commands that bring barely any collateral learning aside from learning the 'UNIX' way.

If you really want to put in the effort to learn this, and you don't have any projects or anything that requires this knowledge, I've noticed lots of tech offices always have a copy of UNIX in a Nutshell around. I've checked it out myself and it's pretty useful (though I already 'know' this stuff), so I'm not sure how good it is for learning from the ground up.

This is a great reply. "On the fly" is definitely my experience too. Needing to do something. Doing things slowly, inefficiently, with googling until you find or get told the more efficient way of doing things.

Also, talking with/demanding explanations/debating with geekier than thou friends has been really helpful with the bigger concepts.

Learning by doing is probably the way most people learn their unixcraft. It also helps if you have a friend or a team member who is a lot more experienced than you, a go-to guy with infinite patience for your stupid questions.

I had the privilege of working with a badass russian unix hacker for some time, he taught me how to do black magic with find, grep, sort and uniq.

I second the idea of learning on the job, especially by things going bad.

I have the feeling ops people can be somehow.. "measured".. by how many things they had exploding in their face, and then learned how to fix them.

For example, looking at TFA, the first thing I thought is "oh, `df -i` should be there too".

Because I ran out of inodes enough times that I learned to check that (though I am not a sysadmin).

[Edit: oh gads, I see now the commands are in text boxes and the full command is not visible. It is "ps aux | head -1 && ps aux | sort -k 4 -nr | head" which, while maybe correct, is even more ridiculous.]

> ps aux | head -1 && ps aux | sort

Is a wasteful construct and doesn't even do what is claimed "List the top 10 memory hogs". Depending on what you consider "memory" something like this is shorter and correct.

ps aux --sort=-resident|head -11

Other errors: They list the same command, "du -hs", twice. I believe they meant "df -h" for "overview of all disks". Although, strictly, that's an overview of mounted file systems.

Thanks! I've updated that with your version, much MUCH more concise. (Also fixed the df thing, definitely copy paste error!)

Okay, I'll admit it. I don't actually know what devops is. I know what developers do and what sysadmins do. Is devops just a buzzword for one person who can do both? Or does it mean something other than that?

I will admit that I have a fuzzy understanding as well. My take on it is a holistic view and integration between development and operations instead of formal or informal walls between the two.

Wikipedia: http://en.wikipedia.org/wiki/Devops

In small companies it almost always exists by accident. I think the "hype" is more around large companies that have huge barriers and sometimes friction between development and IT or sustaining Operations.

I don't really know how to state what we do aside from "Writing the glue/lube between the application and the hardware."

Deployment automation, monitors, backups, cleanup...

I'm the author of this cheatsheet and I have no clue really. Just wanted to try the word out, see if it felt right.

But yes, my impression is that it's normally a developer who "trends towards doing systems administration" – so someone who knows the codebase, but can provision and wrangle servers. So, someone like me (I write full stack apps, but my "speciality" is the server end of things).

After giving it a thought, i'd say it's just a new buzzword for sysadmin.

Because, at least in my understanding, it's totally in the working area of sysadmins to set up a server, install and configure software, configure monitoring, etc. The typical sysadmin will also write scripts and stuff to do that.

My bet is that it just came up by some angry sysadmins who felt like they need to differentiate from the dumber kind of admin who can barely touch a shell.

Of course a software developer may also write deployment scripts and similar stuff, so that's where it becomes fuzzy.

I never really thought about that term, but thanks to your question i just figured that i will drop the term from my vocabulary. It's too fuzzy, it's too much of a buzzword.

At the moment, it's mostly an overloaded buzzword. :) The best definition I've found is that it's a culture of collaboration and integration between development and operations, rather than a role for one specific person.

Ben Rockwood from Joyent has a good conceptual talk at http://www.youtube.com/watch?v=h5E--QSBVBY , "DevOps Demystified".

I would warn anyone who reads this to understand what each command does before actually running anything on a machine.

Things like "sudo !!" are INCREDIBLY dangerous, and I would never put that on a cheat sheet.

Anyone running commands blindly without knowing what they do deserves what they get. That goes for potentially dangerous commands or not.

Or put in a slightly nicer way - Blindly running commands is going to turn into a learning experience, don't do it on a production machine.

I suppose if you type it at any random point it might be. It's more useful when you've typed something that you forgot to sudo.

If you are running sudo, the long-term payoff is probably higher if you never use sudo !!. The very small chance of a huge disaster is still a lot higher than up arrow. Unless you are on a laggy connection and up arrow delays and you push it twice and end up running the wrong command!

If you have lots of jitter and lag, i suggest using Mosh which I've found to be amazingly awesome when using it on Amtrak's spotty free wifi (3G uplink).

I've never had any incidents with `sudo !!`, however, I should point out that zsh (maybe other shells as well) replaces the `!!` with the full command and requires another press of Enter.

    $ rm -rf /root
    rm: cannot remove `root': Permission denied
    $ sudo !!  # Does not run this, but returns the line below.
    $ sudo rm -rf /root  # Then you say, "Wait I dont want to do this."

Interesting. Bash on my OS X box shows the command you're running, but doesn't give you a chance to cancel it. It looks like this:

  $ rm -rf /root
  rm: cannot remove `root': Permission denied
  $ sudo !!
  sudo rm -rf /root
  # /root is now kaput

I think it's meant for when you run a command normally, but then realize you should have sudoed (sudid?) it. Though in that situation, I would still be inclined to do "↑ esc a sudo ". It's not a lot more work, but it is a lot less uncertain.
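For what it's worth, bash can be made to behave like the zsh session above. A minimal sketch for ~/.bashrc: with the histverify shell option set, an interactive bash loads the result of a history expansion into the editing buffer instead of running it straight away, so `sudo !!` needs a second Enter. The BASH_VERSION guard is just my own precaution so the line is harmless if some other shell sources the file.

```shell
# ~/.bashrc
# With histverify set, "sudo !!" is expanded and shown on the prompt for
# review; it only runs after you press Enter a second time.
if [ -n "$BASH_VERSION" ]; then
    shopt -s histverify
fi
```

After reloading the shell, `shopt histverify` should report the option as on.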

Indeed. I find ctrl-p ctrl-a is faster than going to the cursors.

Whoops, thanks. Should have been ctrl-a, but spelling out muscle memory is, like, hard.

Shoulda sudone it!

I use this one like a dozen times a day.

  user@host$ less /var/log/syslog
  Permission Denied

  $ sudo !!
  sudo less /var/log/syslog

This is exactly how I use it. A time saver for mundane "whups" moments.

why would that be bad? You would get to read your file.

It's not bad. It's extremely useful. If someone is so negligent that they run something potentially destructive and then immediately sudo !!, they probably shouldn't be administering systems.

Truthfully, I'm more comfortable using the up arrow and re-running it manually. At least I can inspect what the command was before running it.

Obviously, this doesn't help when using Fabric or scripting, but you shouldn't need "sudo !!" in those scenarios.

So, what's wrong with sudo !! then?

It's too easy to have a sticky key and type sudo !1 by accident. As a rule, it's not a good idea to sudo from history without examining it first. So use up arrow or CTRL-P and then edit the command.

It re-executes your previous command -- this time with root privilege.

If and only if you meant to do that, you're fine.

Why else would you type it in? I can't imagine many people saying "hm, that command I didn't intend to run didn't work. Maybe I should run it as root instead."

Hopefully if you're in a NOPASSWD user/group, you know what root can do if you're careless; if you're not, there's yet another point where you'd have to go out of your way to do damage with this shorthand.

The "only if" part is false: it is certainly possible to be fine having done sudo !! without meaning to do it.

This is just a small curated collection from myself and old friends — mainly rails devops. I'd be interested in hearing from other rails devs/ops what you use on the command line on a daily basis, what cool thing you know about that no one else does, etc! I'll be adding more as we go....

It would be nice if the title mentioned it was rails focused.

I didn't mention Rails in the title because most of the commands aren't Rails specific.

It's not. It's a poor title. Nothing there is Rails specific.

Sorry about that! I didn't submit the link otherwise I would have mentioned it.

Though...the only way it's really "rails focused" is that the commands are collected over time and sourced from my fellow rails developer/sysadmins. A lot of these are basic/general linux commands.

For getting around linux I've found no better helpers than manpages and the cheatsheets by Peteris Krumins[1]. I notice this isn't offered as a pdf or image, is the idea that someone will come and visit the site when their server is in trouble? That said, I didn't know about some stuff here like scriptreplay -- thanks!

[1]: http://www.catonmat.net/projects/cheat-sheets/

Check also the Unix Toolbox: http://cb.vu/unixtoolbox.xhtml I've got a copy of the pdf in my dropbox just in case.

Here's a first version, still working on it though: https://www.dropbox.com/s/a0rx9c28euwokaz/cheatsheet-pdf-ver...

I'd second a quick pdf version of it as well, I'm sure more than a few of us have a directory with cheatsheets in somewhere on their drive as a 'just in case'.


Note: not affiliated with them; just one of the top Google results. Also, I'd take a screenshot (and then export to pdf if an image file wasn't acceptable) since the pdf conversion via pdfcrowd splits it into multiple pages.

Here's a work in progress, still needs some loooovvvee: https://www.dropbox.com/s/a0rx9c28euwokaz/cheatsheet-pdf-ver...

Nice work!

PDF is coming! And thanks for linking to catonmat, that stuff gets into some serious yummy detail.

Here's one I like for testing bandwidth between two machines:

  host1$ while : ; do nc -l 6666 > /dev/null; done

  host2$ pv /dev/zero | nc host1 6666 
  156MiB 0:00:17 [9.46MiB/s] [    <=>              ]

You should check out iperf as well.

Looks like a useful tool. Thanks!

A little gotcha: "iptables -L" doesn't list the NAT table rules; do "iptables -L; iptables -t nat -L" instead.

I also almost always add '-n' to iptables commands to disable reverse name resolution, which can sometimes take a while. Similarly when calling netstat.

Me too. Even more important is that a host on multiple networks may have the same PTR value for each address, and thus one can lose information in the reverse lookup. "host w.x.y.z" ( or Solaris "nslookup -type=ptr z.y.x.w.in-addr.arpa" ) is available if you don't recognize the IP address.

dig -x <ip>

Will also correctly do a PTR query for the reverse entry for that IP. Also functions correctly with IPv6, so you don't need to remember to split the IPv6 address up into a lot of dots :P
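For anyone who hasn't seen what dig -x saves you from typing, here is a rough sketch of the IPv4 case. ptr_name is a made-up helper name, and 192.0.2.10 is a documentation address rather than a real host. It only builds the name to query; it does not do a lookup.

```shell
# Build the in-addr.arpa name for an IPv4 address by reversing its octets.
# ptr_name is a hypothetical helper; it does no validation of its input.
ptr_name() {
    printf '%s\n' "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

ptr_name 192.0.2.10    # 10.2.0.192.in-addr.arpa
```

dig -x 192.0.2.10 queries that same name, and for IPv6 it does the equivalent nibble-by-nibble expansion under ip6.arpa.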

I always use 'iptables -L -n -v --line-numbers' - verbose will show you stuff like which interface the rule is set to, and line numbers can be useful when adding a rule before another with 'iptables -I'

I use -n with almost any network-related command. Probably overreacting, but I hate to just sit there staring at a black rectangle.

This is a UNIX cheatsheet. Nothing devops about it.

And I concur with the comments about "sudo !!" - !! in general I've totally removed from my UNIX vocabulary. At least do something like !?string[?]

Checks the speed of the network, if you are looking for good connectivity on your VPS.

    wget cachefly.cachefly.net/100mb.test -O /dev/null
Check the write speed of the disk. Mostly used to see what kind of write speed you get on the machine.

    dd if=/dev/zero of=iotest bs=64k count=16k conv=fdatasync && rm -rf iotest

Devops/Sysadmin Cheatsheet for Linux

Most of the commands listed rely on various GNU extensions to the various utilities or are only applicable to Linux.


strace -> dtrace/ktrace/

lsof -> sockstat/fstat

watch -> no idea

sudo -> Almost never installed by default on FreeBSD

ps aux --sort=-resident|head -11 -> --sort is not valid ...

And the list goes on ...

If you're going down that path, add windows servers to your list and find more things missing.

I don't know a lot of people running Ruby services on top of Windows ...

I don’t know a lot of people running Ruby services on FreeBSD, either. I’m pretty sure I know one or two who, at least, develop on Windows.

But, yeah, no one’s stopping you from making your FreeBSD version of the cheatsheet, what with the open source spirit and all… :P

There's a few typos in this, in case it helps:

    ps auxww -> ps auxww -H (H is hierarchy)

Faster than lsof, and only displays files (although most things in unix are files!): lsof -p -> ls -l /proc/$PID/fd

Run something forever: watch command

Overview of all disks: du should be df.

Find files over 100mb: find . -size +100M

Low hanging fruit for size: ls -al | sort -nk5

Files created (modified) within the past 7 days: find . -mtime -7

Find files older than 14 days: find *.gz -mtime +14 -type f
This will break when you have more archived files in the directory than the shell's glob expansion can support. Use: find . -mtime +14 -type f -name '*.gz' and it will run quicker too.

TCP Sockets in use: "netstat -antp" will be faster and also lists the process id.

EDIT: formatting
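A small sketch of the find suggestion above, assuming GNU touch for the relative dates. The file names are invented for the demo. Quoting '*.gz' hands the pattern to find itself, so the shell never has to expand it into an argument list.

```shell
d=$(mktemp -d)
touch -d '20 days ago' "$d/old.log.gz"   # old archive: should match
touch -d '20 days ago' "$d/old.log"      # old, but not a .gz
touch -d '2 days ago'  "$d/new.log.gz"   # a .gz, but too recent

# find does the pattern matching; prints only old.log.gz
find "$d" -mtime +14 -type f -name '*.gz'
```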

Instead of:

  ls -al | sort -nk5
I suggest:

  ls -larS

I upvoted you the other day and it seems coming back today i have the upvote option again? Weird.

Either way, have an upvote!

Thanks for catching the ps aux and for the refinements! Deploying new version now...

Small typo, "Overview of all disks" should be

  df -h

instead of

  du -sh

Thanks for noting. Last minute copy/paste fail :(

Be careful when you think you really understand the meaning of atime/mtime of findutils.

E.g. man find

  -atime n
    File  was  last  accessed  n*24  hours ago.  When find figures out how many 
    24-hour periods ago the file was last accessed, any fractional part is 
    ignored, so to match -atime +1, a file has to have been accessed at least 
    two days ago.
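To see the rounding in action, here is a sketch assuming GNU find and GNU touch (the -d 'N hours ago' form is a GNU extension). touch -d sets both atime and mtime, so the two files look like they were last accessed 36 and 50 hours ago.

```shell
d=$(mktemp -d)
touch -d '36 hours ago' "$d/36h"   # 1.5 days, truncated to 1
touch -d '50 hours ago' "$d/50h"   # just over 2 days, truncated to 2

find "$d" -type f -atime +1   # "more than 1": prints only 50h
find "$d" -type f -atime 1    # "exactly 1": prints only 36h
```

So a file accessed a day and a half ago is not "older than one day" as far as -atime +1 is concerned.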

Thanks for this nice list.

Small nitpick/question: why put the commands in inputs? Do you edit them on that page? Or is it just to format them?

I tried to do a copy/paste on the page contents to my personal offline notebook but the most important bits, the commands, didn't paste.

Ack! I'm sorry! I started out with "pre" elements, but they were causing problems with copying and pasting (it looked odd, and it added a dreadful newline so the command would execute on paste!)

The plan is to have a pdf shortly; that will most definitely be copy and pasteable.

    du -h --max-depth=1 -x
I use this frequently to keep an eye on disk used by user directories. It returns the disk space used by each directory in the current directory.

So does:

    du -sh *

Correct with one important caveat: because * is a shell expansion, it will not process any directories starting with a ".". e.g., imap maildir directories.

That's fair, but for the proposed use, checking the size of home directories, .* directories are of little concern.

Honestly, when I have to get fine grained enough to dig into disk usage using 'du', I'm rarely concerned with hidden directories (and I typically append a 'c' to the command as well, so if I do have to pay attention to . directories, I will notice the discrepancy between reported sizes).

We have lots of imap maildir directories; e.g., ".Junk", ".Trash", ".Sent", ".Inbox", for each mail account on the server. It's kind of a special case though; I'll agree that in all other cases, your version is simpler and easier to remember.

Your local maven repo (.m2/repository) can get huge.

du -sh .* *

Problem solved.

You might be right. I have some recollection of .* stabbing me in the back once before in this command, and I know I put -x (do not traverse file systems) in there for a reason -- but I don't have any notes on why, so I can't say anything smart about it.

The trouble with that is that it matches '.' and '..', which you probably don't want. I don't tend to have a single character after the dot in a name, so I usually use '.??*'.
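The difference is easy to see in a scratch directory. A sketch with made-up directory names; .[!.]* is another common pattern, shown here only for comparison.

```shell
d=$(mktemp -d)
mkdir "$d/.a" "$d/.Junk" "$d/.Sent" "$d/visible"

# Run in a subshell so the cd does not leak into the calling shell.
(
    cd "$d" || exit 1
    # A dot plus at least two more characters: skips . and .., but also misses .a
    echo .??*       # .Junk .Sent
    # A dot plus a non-dot character: skips . and .., catches .a, misses names like ..foo
    echo .[!.]*
)
```

Neither pattern matches '.' or '..', so something like du -sh .??* * stays inside the current directory.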

this is a computer operator cheatsheet. there are no systems administration tasks being implemented here, just basic unix.

Just a UX suggestion, but I would highlight the text on the input focus for easy copy/paste.

Good point. I considered doing this, but then it ends up being a "surprise" to users when they quickly go to triple-click (select all) and the interaction is different than expected.

What I find really useful is you can immediately type some characters to filter, and then tab to the command you want. It's automatically selected in that case.

I'll likely stick a "copy" button on hover shortly — had some initial problems with z-indexes (it uses flash) but this solves the "one click to copy" issue without affecting the default and expected interaction.

Nice list

Thank you!
