The sad state of sysadmin in the age of containers (vitavonni.de)
970 points by Spakman on Apr 22, 2015 | 443 comments



This bothers me as well. Even tasks as simple as adding a repository are now being "improved" with a curl | sudo bash style setup[1].

However, installing from source with make was (and remains) a mess. It may work if you're dedicated to maintaining one application and (part of) its stack. But even then it usually leads to out of date software and tracking versions by hand.

Many people have this weird aversion to doing basic sysadmin stuff with Linux. What makes it weird is that it's really simple. Often easier than figuring out another deploy system.

(The neckbeard in me blames the popularity of OSX on dev machines.)

[1] https://nodesource.com/blog/nodejs-v012-iojs-and-the-nodesou...


I agree that the "just curl this into bash" instructions are a nightmare - on any platform.

I think a lot of this is a result of what I like to call the "Kumbaya approach to project/team management":

This is where you have a team (either for a single project or a team at a consulting agency, etc) that is effectively all development-focused staff, possibly with some who dabble in Infrastructure/Ops. In this environment, when a decision about something like "how do we get a reliable build of X for our production server deployment system" needs to be made or a system needs to be supported, no idea is "bad", because no one has the experience or confidence to be able to say "that's a stupid idea, we are not making `curl http://bit.ly/foo | sudo bash` the first line of a deployment script"[1]

[1] yes this is an exaggeration, but there are some simply shocking things happening in real environments that are not far off that mark.

Edit: to make it absolutely clear about what I was referring to with [1]:

The specific point I was making was running something they don't even see (how many people would actually look at the script before piping it to bash/sh ?) from a non-encrypted source, and relying on a redirection service that could remove/change your short url at any time.

Unfortunately I was stupid enough to ddg it (duckduckgo it, as opposed to google it) and apparently this exact use-case was previously the recommended way of installing RVM[2]

[2] http://stackoverflow.com/questions/5421800/rvm-system-wide-i...



I think part of this is because there aren't any trusted, fully open source, artifact repositories that work with the various package indices out there.

Like, most of the way deployment should work is that you come up with some collection of packages that need to be installed and you iterate through and install them. Bob's your uncle.

Thing is, all the packages you need live out in the wild internet. Ideally, you'd just be able to take a package, vet it, and put it in your local artifact store and then when your production deployment system (using apt or yum or pip or gems or maven or whatever) needs a package, it looks at your local artifact store and grabs it and goes about its business. Never knowing or touching the outside world.

And your developers would all write their apps to deploy through the normal packaging methods that everyone and their mother is already familiar with and they could just put them into the existing package index as well.

But you've gotta lay out pretty serious moola (from when I last looked into available solutions to this) or set up a half dozen different artifact stores if you want to do things that way. And good luck managing your cached and private artifacts if you do. And on top of that developers don't necessarily know how to set up a PyPi or a RPM index or whatever so that the storage is reliable and you've got the right security settings or whatever else. (I know I sure don't and I'm not really interested in reading all of the ones I'd end up needing).


"And on top of that developers don't necessarily know how to set up a PyPi or a RPM index or whatever so that the storage is reliable and you've got the right security settings or whatever else. (I know I sure don't and I'm not really "

Setting up RPM is shockingly easy. It can get more complex, but the basic system is:

  REPOBASE=/srv/www/htdocs/
  createrepo -v  $REPOBASE
  gpg -a --detach-sign --default-key "Sign Repo" $REPOBASE/repodata/repomd.xml
  gpg -a --export "Sign Repo" > $REPOBASE/repodata/repomd.xml.key
That will create a repo from all the .rpm files in the REPOBASE. Also you will of course need a GPG key pair, but that can be generated with `gpg --gen-key` where you give it a description of "Sign Repo" (or change the above commands to the key description you used).

Then you get to decide on the deployment machine if you want to trust the repo (or you don't trust any and import the key via some other process, aka direct gpg import).
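On the consuming side it's basically just a repo file pointing at that directory. A minimal sketch (hostname is made up for illustration):

  # /etc/yum.repos.d/internal.repo -- hypothetical client-side config
  [internal]
  name=Internal vetted packages
  baseurl=http://repo.example.internal/
  enabled=1
  # verify the signed repodata/repomd.xml from the createrepo step above
  repo_gpgcheck=1
  # gpgcheck=1 would additionally require the .rpm files themselves to be signed
  gpgcheck=0
  gpgkey=http://repo.example.internal/repodata/repomd.xml.key

Or skip the gpgkey URL entirely and import the key out-of-band with `rpm --import`, as mentioned above.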

Of course you can find a bunch of more detailed explanations with $SEARCHENGINE, but if it takes more than a day to figure it out, you're doing something wrong.

Building a set of RPMs isn't that much harder if you have a proper build system. But these are the kinds of things you give up when you decide to grab the latest immature hotness created by someone on their day off.


With docker, as referenced in TFA... you can simply vet a base image, and use that for your application... upgrades? create a new/updated base image and test/deploy against that.


And how do you "simply vet a base image"?


Same as everything: look at how it was built. Many of the images are built by CI systems according to Dockerfiles and scripts maintained in public GitHub repos. Audit those, then use them yourself if you're worried about the integrity of the services and systems between the code and the repository.


> Unfortunately I was stupid enough to ddg it (duckduckgo it, as opposed to google it) and apparently this exact use-case was previously the recommended way of installing RVM[2]

Not only "previously", it's the current recommended way to install rvm. From their front page:

>> curl -sSL https://get.rvm.io | bash -s stable

[1] http://rvm.io/


It's also (one of the) recommended ways to do it for Docker[1]. I've noticed a few blog posts that touch on "here's how to use Docker for X" suggest piping it straight into `sudo sh` without so much as looking at what's going to be run first. Sigh.

[1] http://get.docker.io/


Oh I agree there are problems still, but it's an improvement over the previous setup - it's using HTTPS and it's calling the RVM domain - before it was plain HTTP to bit.ly


Also the installer is now signed via GPG.

https://rvm.io/rvm/security


But then there's a circular dependency because the GPG key is retrieved by the bash script that is wget'd.


It doesn't have to be circular. The script is secured by HTTPS (and hopefully has the key embedded in the script itself?) which can then retrieve the installer and verify it using the key.


The problem is that in this scenario, the GPG key and signature serves no practical purpose.

The whole security, whether GPG is invoked or not, relies on the security of the HTTPS connection alone.

If the HTTPS cannot be trusted alone, then everything is lost, as a compromised HTTPS connection can be used to supply both a compromised GPG key and a compromised package, or, indeed, anything at all that is legal to `| sudo bash`...

And HTTPS security boils down to:

1. The difficulty of altering (or exploiting privileged position wrt) the global routing table to setup MitM or MitS scenarios.

2. The difficulty of obtaining a valid looking certificate for an arbitrary domain.

Any situation where a government actor is the adversary poses intractable challenges to both 1 and 2 above. (And before you say NSA/GCHQ would never care about XYZ, consider China...)


Even if you trust "normal" https certificates, it's still a much more risky proposition. Those certificates only really say that somebody controls the domain - not (in general) that they actually own it or are responsible in any way, and, more critically, they don't vet whether somebody is trustworthy or not. You can easily get some other similar-sounding domain as a malicious agent, and validly get an https certificate for that.

So even if you trust https works, it's still a tricky proposition - it's not really similar to a distro's package distribution channel.


Indeed, and I didn't even go over trusting the actual source of the bash script or the security/integrity of the server(s) it's hosted on even if the cert is all A-OK.


It gives people a way to choose the level of security they care about. Those who are willing to trust HTTPS can trust HTTPS. Those who aren't can obtain the GPG key and check its signature by another mechanism (WoT) and manually verify the package signature.


Those who would go out of their way to do the GPG check are also the same people who are horrified by `curl .... | sudo bash`


Yes, that's the point. Those who aren't horrified can do that. Those who are can get the package "by hand" and do the GPG check themselves.


Same issue when people post GPG keys on their website. You can't verify them.


That only becomes applicable if you use their "manual" install steps on that security page.


Am dealing with this situation right now. Apparently wget -qO- https://get.docker.com/ | sh as root[1] is the "supported" way of installing Discourse

[1]: https://github.com/discourse/discourse/blob/master/docs/INST...


No, that's the Discourse install instructions' quick, hand-wavy way of telling you to install Docker if you don't already have it. If your cloud environment already has Docker installed, you can skip that step.

Are you really trying to say that the instructions for installing Docker should be considered in-scope for a guide to install Discourse on a cloud server?

They've included a short snippet that will get you Docker, in the way recommended by Docker, for whatever your base system is. Many production systems do not move at the pace of Docker development, so it's not practical to run Docker from your distribution's package archive. Some distros will not have distributed packaged Docker releases at all.

What's wrong with these instructions? If you are really "dealing with this" right now, it is worth noting that something like 20 or more supported platforms have specific Docker installation instructions from the Docker website.

https://docs.docker.com/installation/#installation

From a quick sample of those instructions, only the Ubuntu instruction page uses the wget|sh method, and it's using an SSL connection to Docker's own website to add an apt source with signatures in the supported way. This way should work on any Debian-based or Yum-based distro, and writing the instructions like this most likely saves Discourse from getting a lot of "How do I docker" issues and e-mails from their clueless users.
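For reference, what that convenience script boils down to on an apt-based system is roughly this (a sketch only - the real signing key, repository URL and package name come from Docker's own documentation and have changed between releases):

    # add the vendor's signing key and apt source, then install through apt as usual
    wget -qO- https://example.com/docker/gpg | sudo apt-key add -
    echo "deb https://example.com/docker/repo ubuntu-trusty main" | \
        sudo tee /etc/apt/sources.list.d/docker.list
    sudo apt-get update && sudo apt-get install docker-engine

So the piped script is mostly a shortcut for the signed-repository setup you'd otherwise do by hand.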

So, would you prefer that part just says "installing Docker is out of scope" or should the Discourse developers go through every distro and cloud system and document the specific instructions for that? To do that would completely defeat the purpose of even using Docker at all.


I concur - to elaborate, I'm not actually the one installing Discourse. I support a 'Kumbaya' group of data scientists who have never heard of docker.

Quick directions like these aren't questioned by users who just want to get things done, and they invite security risks just as the parent & article suggest.


There are many more depressing examples of this at http://curlpipesh.tumblr.com


Funny tumblr but makes me care-confused.

I understand that curl pipe sh could have security problems but I also don't see it as that much different than the "normal" and "ok" way of doing things. I would consider something like the below pretty normal.

  wget https://whatever.io/latest.tgz
  tar xzf latest.tgz
  cd whatever-stable
  ./configure && make
  sudo make install
Because of familiarity, we aren't going to be too worried about what we are doing. If we are on a secure system (like a bank or something) then we've probably already gone through a bunch of hoops (source check, research) and we mitigate it like anything else.

What is so different about

  curl https://whatever.io/installer.sh | sudo bash
We didn't check the md5s in the first example, so yolo, we don't care about the content of the tarball we just `make install`-ed. We're assuming the webserver isn't compromised and that https is protecting the transfer. Is it because the tarball hit the disk first? Does that give us a warm fuzzy? Is it because "anything could be in installer.sh!!!?! aaaaah!". Well, anything could be in Makefile too right? Anything could be in main.c or whatever.

I agree that curl sh | sudo bash makes my spidey sense tingle. But if I really cared, I would read the source and do all the normal stuff anyway. So I think it's some kind of weird familiarity phase we're all in.


Outside of a development environment, you'd run that ./configure && make install step on a build slave that creates a nice RPM or Debian package of it for you which you can install without fear that the build scripts install backdoors, download obsolete software or wipe the filesystem.

With a good build system (eg. autotools) writing an RPM spec takes almost no time at all and if you have the proper infrastructure in place for building packages, you can have something workable in a very short time.

Self-packaged RPMs also don't need to be quite as high-quality as ones you might want to include in a distribution, so if it makes sense for your use case, it's perfectly okay have "bloat" (eg. an entire python virtualenv) in your package.
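To give a sense of scale, a spec for a well-behaved autotools project can be little more than this (name, version, license and file list are placeholders):

  Name:           whatever
  Version:        1.0
  Release:        1%{?dist}
  Summary:        Internally packaged build of whatever
  License:        MIT
  Source0:        whatever-%{version}.tar.gz
  %description
  Internal package; see upstream for details.
  %prep
  %setup -q
  %build
  %configure
  make %{?_smp_mflags}
  %install
  make install DESTDIR=%{buildroot}
  %files
  /usr/*

Run it through rpmbuild -ba on the build machine, drop the result into an internal repo, and the production side only ever sees yum.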


> With a good build system (eg. autotools)

Yikes. Have we sunk this far?


I wouldn't consider what you presented as the "normal" or "ok" way of doing things either, especially not on anything resembling a live (i.e. not development/sandbox) environment.

A distro (or official vendor, or possibly a trusted third-party) repo of pre-built, signed packages would always be my first choice.

If one of those isn't available, my next step would be to create a package for the tool in question, part of which is setting up a file for `uscan` to download new source archives, and compare against the signatures.

In this scenario we (as in the organisation) are now responsible for actually building and maintaining the package, but we can still be assured that it's built from the original sources, we can still install it on production (and even dev, staging, whatever) servers with a simple call to apt/aptitude, and dependencies, removal, upgrades, etc are still handled cleanly.
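For the curious, the uscan piece is just a small debian/watch file; something like this (upstream URL and tarball pattern are invented):

  # debian/watch -- hypothetical upstream location
  version=3
  opts=pgpsigurlmangle=s/$/.asc/ \
    https://example.org/releases/ tool-(\d[\d.]+)\.tar\.gz

With the upstream signing key stored under debian/upstream/, uscan will then refuse any new tarball whose detached signature doesn't check out.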


About "ok". You're right. I probably used a loaded word without context. I too use whatever default package repo, followed by "extras" or whatever is available. You described a sane and nice process. I guess my point is, at some point we are are assuming "many eyes" (the binaries might be built with the previously mentioned make;configure steps) unless you are auditing all sources which is unlikely. Especially unlikely on dev machines. Even after that it seems like there is an infinite continuum of paranoia.

I find it interesting that binary packages have existed for decades and yet `rpm etc` knowledge is rare. Why did curl sh become popular? Why doesn't every project have rpm|deb download links for every distro version? Why don't github projects have binary auto-builds hosted by github? I'd argue that it's too difficult. Binary packaging didn't succeed universally. For deployment, containers are (in the end) easier.

But the original article is conflating container concepts and user behavior (not wrongly). If docker hub does end up hosting malware-laden images, it would be interesting emergent behavior but it would be orthogonal to containers. Like toolbars. Toolbars probably aren't evil. A vector for evil maybe?


> I find it interesting that binary packages have existed for decades and yet `rpm etc` knowledge is rare

What makes you think the knowledge is rare? Among developers who actively target linux distributions I would imagine the opposite is true.

Even a number of the referenced curl|bash offenders are just using that as a "shortcut" to add their own apt/yum repos and calling apt-get/yum to install their binary package(s).


Your first example allows for:

* Using checkinstall to create a local deb/rpm which can be easily installed/removed later, instead of "make install".

* What if installer.sh says "rm -rf /tmp/PACKAGE-build" and the connection is interrupted just after the first "/"? You now have "rm -rf /". Oops.

* configure will tell you what files it needs, and apt-file will tell you what dependencies to install.

* I know what make install does. I know make. Who wrote installer.sh? Do they know anything about writing good software? Steam wiped out home directories; who knows what these people do.


It's bad because `sh`, `bash`, etc. don't wait for the script that's being piped into it to finish downloading before it starts executing it. So, for example, if you're running a script with something like

    # remove the old version of our data
    sudo rm -rf /usr/local/share/some_data_folder
and the network connection cuts out for whatever reason in the middle of that statement (maybe you're on a bit of a spotty wireless network), the resulting partial command will still be run. If it were to cut off at `sudo rm -rf /usr`, then your system is in all likelihood going to be hosed.


Because now your ability to install your mission-critical software is dependent upon https://whatever.io actually being up. Which it certainly won't be forever.

Or, you know, maybe someone updated the whatever.io installer to make it 'better'. But you are trying to debug some problem and you made one image last month and another one this month and you're pulling your hair out trying to figure out why they are different. Oh, it's because some text changed on some web site somewhere.

You've taken a mandatory step and put it outside your sphere of control.


Good point. I guess you could still wget the script though. It's maybe like ./configure over http? I guess even if you could do it, it's probably not culture. A Dockerfile would probably just curl sh the thing and not wget it. So the default culture probably does depend on whatever.io being up.


>[1] yes this is an exaggeration

No, it's not :(


It's just automated copy-pasting of commands you don't understand from the internet, which is something everyone who runs Linux (and is not a wizard) does all the time.

It's really really bad, but people will continue doing it until commands/things become so easy we can actually understand what we're doing. Unfortunately, this has never been a priority in Unix-land as far as I've gathered.


> It's really really bad, but people will continue doing it until commands/things become so easy we can actually understand what we're doing.

But it isn't all that hard to understand a clean Unix. I have never copied or typed a command that I don't understand.

One problem may be that most Unices these days are not as clean anymore as, say, OpenBSD or NetBSD. E.g. the recent X stack, with D-BUS, various *Kits, etc. is quite opaque. This madness was primarily contained to the desktop and proprietary Unices, but seems to be spreading to server Linuxes these days as well (and no, this is not an anti-systemd rant).


> But it isn't all that hard to understand a clean Unix. I have never copied or typed a command that I don't understand.

Well, good for you. I can assure you that it's not the case for almost anyone who approached Linux after the likes of Mandrake were released and/or tried to make it work on anything different from a traditional server.

I'm all for trying to understand what one is doing (and I wholeheartedly agree with TFA's point), but the reality is that very few people in the world really understand all intricacies of one's operating system. This does not excuse poor security practices, but it explains their background.


That's why you get someone who is capable of understanding it.

You wouldn't hire some high school kid who's just about taught themselves HTML by reading a book for a week, and get them to write your web application from ground up. You'd hire someone who knows what they're doing. Why is it seen as any different for Operations work? There is a reason systems administration is a skilled field, and a reason they're paid on a par with developers.


I think the reason this happens less and less is that sysadmins are cost centers, not revenue generators. When you have developers do that work (poorly or not), you don't have a group that's purely cost. Those costs get hidden in the development group.


Whoops replied to wrong comment!

However yes, the issue of a team that "doesn't make money" is very real. Maybe it should be "marketed" like legal or accounting: it doesn't make money, it saves money caused by SNAFUBAR situations.


Indeed, the costs merely get hidden and a lot of system decisions boil down to one of:

1. I saw it done that way in some blog.

2. We did it like that at my last job.

3. Seems like it works.


That high school kid needs to install a web server. Is he going to hire someone? No. He's going to copy a curl command.


I expect her to say "How do I install software on this platform?" "Oh! /(apt|yum|dnf)/!"

/(apt-get|yum|dnf) install (apache2|httpd|nginx|lighttpd)/


It's probably okay for him.


> I'm all for trying to understand what one is doing (and I wholeheartedly agree with TFA's point), but the reality is that very few people in the world really understand all intricacies of one's operating system.

One of the problems (as I tried to argue) is that most Unices have become far more complex. The question is if the extra complexity is warranted on a server system, especially if bare Unix (OpenBSD serves as a good example here) was not that hard to understand.

Of course, that doesn't necessarily mean that we should look back. Another possibility would be to deploy services as unikernels (see Mirage OS) that use a small, thin, well-understood library layer on top of e.g. Xen, so that there isn't really an exploitable operating system underneath.


What seems to be the source of this push is that some entity wants Windows Group Policy-like control over what users can and can't do etc.

This is because they want to retain their ability to shop for off-the-shelf hardware, while getting away from a platform that has proven less than functional for mission critical operations (never mind being locked to a single vendor).

What seems to be happening is that there is a growing disdain for power users and "admins". The only two classes that seems to count are developers and users, and the latter needs to be protected from themselves for their own good (and developer sanity).


> I have never copied or typed a command that I don't understand.

Note that it's also trivial to change what goes into the clipboard. Copying and pasting commands from potentially untrustworthy sites should be ruled out too, even if understood.


https://xkcd.com/1168/ comes to mind. And yes, I Google half of the command invocations too (but usually type them in by hand so that I can remember them faster instead of copy-pasting).


I don't get this. Tar isn't that hard.

    x = eXtract files from an archive
    f = File path to the archive
    c = Create a new archive from files
    v = print Verbose output
    z = apply gZip the input or output
That's 99% of common tar right there. The remaining one percent is:

    j = apply bzip2 to the input or output
        (I admit, j is a weird one here, though that has made it stick in my memory)
    --list = does what's on the tin
    --exclude = does what's on the tin
    --strip-components = shortcut for dropping leading directories from the extracted paths
I haven't used a flag outside of these in recent memory.


It isn't, but neither are dozens or hundreds of other commands you encounter when working with the command line. I managed to memorize a few invocations of tar (I listed them in another comment) but, for instance, I very rarely create a new archive so I'm never sure what flag I need to use.

Part of the problem is that each command line utility has its own flag language, and equivalent functions often have different letters. For instance, very often one command has "recursive" as "-r" while another has it as "-R". It's impossible to remember it all unless you're a sysadmin.


Those case differences have meaning: -r is generally not dangerous while -R is; it's capitalized to make you stop and say hmmm, should I do this? All commands have the same flag language, command -options, and are all easily documented by man command; it quite literally couldn't get any simpler, and there's no need to memorize anything since you can look up any flag on any command with the same man command. Those who find it confusing haven't spent the least bit of effort actually trying, because it's actually very simple and extremely consistent.


> Those case differences have meaning, -r is generally not dangerous while -R is; it's capitalized to make you stop and say hmmm, should I do this. All commands have the same flag language

Except with cp, -R is the safe one and -r is the dangerous one. And there are tons of little inconsistencies like this.


As I said, generally. All human languages have inconsistencies, the command line is by far one of the most consistent ones any of us deal with.


It may be more consistent, but is not easier - humans are generous with regard to input, they can infer intentions from context. I could type in "please unbork this" to a human and he'd know precisely that he has to a) untargzip it, b) change the directory structure and c) upload it to a shared directory for our team.


Welcome to working with computers that can't think; easier is not an option, they can't infer your intentions, so your point is what? Consistency is what matters when working with machines and the command line is a damn consistent language relative to other available options.


That's exactly my point.


Frankly, if you're going to rely on a magic recipe from the web for production, you should absolutely document it locally and go through the process of understanding each command.

As a former sys admin, I did that all the time. Who the hell can remember how to convert an SSL certificate to load it into a Glassfish app server? Didn't mean I couldn't step through all commands and figure out why it did that before I loaded the new cert... And next time, I just need to go to my quick hack repo for the magic incantation.


I agree with this. Despite my familiarity with so many command line tools, I do forget invocations. And so I have a wiki page I share with my coworkers to share particularly useful (or correct) invocations of dangerous tools.

On a Unix based system, tar is just used so frequently and for so many purposes, that not understanding it feels a bit like working in a shop and not knowing how to use a roll of tape.


You don't have to be a sysadmin to be comfortable with command line tools. If you want to fully utilize your *NIX system you have to learn how to use that shit, it really isn't that hard.

(I'm a developer.)


I am comfortable with command line tools. I just don't remember every switch and flag I happen to use twice a year, and the fact that command line utilities are totally inconsistent in subtle but significant ways, coupled with the overall unreadability of man pages and lack of examples in them makes this process difficult.


I'm a very proficient user of command line tools, but I don't remember everything: my shell history is set to 50,000 lines, and it's the first thing I search if I've forgotten something.

Sequences of commands sometimes get pasted into a conveniently-located text file; if I find myself repeating the operation I might turn it into a script, a shell function for my .zshrc, or an alias.

Just 10 minutes ago:

    mysqldump [args] | nc -v w.x.y.z 1234
    nc -v -l 1234 | pv | mysql [args]

(after an initial test that showed adding "gzip -1" was slower than uncompressed gigabit ethernet.)


One way to remember these commands without necessarily going "full sysadmin" is to use them on a daily basis. Whether I am developing, managing files, debugging, or really doing anything other than mindlessly browsing the web, I always have at least one (and often many) xterms open. The huge selection of tools and speed of invocation provided by a modern *nix command line is invaluable for many tasks that are not directly related to administrating a system.


I usually get tar right on the first try. I only have to remember 2 variants (extract file and create file):

    tar xf ./foo #automagically works with bz2 and gz files
    tar cf /tmp/out.tar . #add z for compression


That second one will create a tarbomb[1], which isn't necessarily wrong and maybe it's what's right for your application, but for more general usage this is friendlier:

    tar cf <mydir.tar> <mydir>
[1] http://www.linfo.org/tarbomb.html
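Listing the two archives shows the difference (file names invented):

    $ tar tf tarbomb.tar      # scatters into whatever directory you extract in
    ./config.yml
    ./bin/run
    $ tar tf mydir.tar        # everything stays under one top-level directory
    mydir/config.yml
    mydir/bin/run

And if someone hands you a tarbomb anyway, `mkdir somedir && tar xf tarbomb.tar -C somedir` keeps the mess contained.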


And some of those switches are just for convenience, e.g.:

    tar c . | gzip > /tmp/out.tar.gz


Oh cool. So that works? I've already memorized:

    tar -xvvzf foo.tar.gz
    tar -xvvjf bar.tar.bz2
    tar -xvvf  baz.tar
Thanks!


I would argue that anyone who is reasonably comfortable in a command line would resort to `man command`, `command --help` or `command -h` before googling for usage.


I think, occasionally, it's a lot easier to grok a command through googling than reading the built-in help. A fair amount of built-in *nix documentation I have run across is mediocre or unhelpful.


I often find that GNU man pages are heavy on explanation of options and light on purpose and practical usage (the latter is tucked away in info pages). That's not necessarily the wrong way to do manpages, but I much prefer OpenBSD-style manpages, which seem to be better at providing practical information.


Recursively searching through all files in the current folder (aka the normal use case for grep) is accomplished by using "grep -r". It's on line 270 in "man grep". And that assumes that you know what grep is at all. Would it have hurt so much to call grep "regexsearch" instead? Maybe -r could be the default?


I think a lot of people would hate having it be recursive by default.


If it were up to me it would be called `find` and it would have flags to find files or text within files.


All the core unix tools have the problem of predating the vowel generation (http://c2.com/cgi/wiki?VowelGeneration).


'grep' isn't a case of disemvowelment (there's an e!), it's just a weird mnemonic that's outlived its referent.


Recursive is not the default use for grep. stdin-stdout filtering is.

"regexsearch" is more work to type and more space taken up everytime 'grep' appears in a command-line. And says nothing about recursion.


Recursion is caused either by -R or -r on nearly all commands and is pretty standard, and r is virtually never the default on any command because that would be a bad idea. And yes, having to type regexsearch rather than grep would have been a bad idea; while grep isn't a great name it's far preferable to someone who types constantly. Search or find would have been better names, names need to be both short and descriptive on the command line, and short comes first.


Use the built-in search.

Edit: the rest of my comment (somehow submitted too soon!)

    man grep

    /recurs<enter>


or use 'grep':

    $ man grep | grep recursive
                  directory,  recursively,  following  symbolic links only if they
                  Exclude  directories  matching  the  pattern  DIR from recursive
           -r, --recursive
                  Read all files  under  each  directory,  recursively,  following
           -R, --dereference-recursive
                  Read all files under each directory,  recursively.   Follow  all


Nah, man pages are usually completely useless. I use man when I remember exactly what I want to do and just am not sure whether the flag was -f or -F. For everything else there's google.


Being a few years gone from working purely in tech, and having a decade of OSX desktop usage finally made me feel I'd gotten complacent. So I installed OpenBSD. Two things of note have happened:

1. I routinely need to look things up that are a bit murky in the deep recesses of my memory.

2. I am reminded continually of how nice it is to have man pages that are well written, are easily searchable, reference appropriate other pages, and are helpful enough to remind you of big picture considerations that you didn't realize you were facing when looking for a commandline flag.


Can you give an example of what you might turn to google for (and what you'd search for) that is more productive than checking a manpage/help output?


OK, recent simple example:

Google query: git display file at revision. Immediate answer (without even having to click any links, it's in the result description): `git show revision:file`

Total time: 5 seconds

Trying to reproduce with man and help:

  man git
search for display, finds nothing

start scrolling down

notice git-show (show various types of objects); sounds like a likely candidate

  git show <revision> <file>
..no output

  git show -h

  usage: git log [<options>] [<since>..<until>] [[--] <path>...]
     or: git show [options] <object>...
.. useful

  man git show
  man git-show
OPTIONS <object>... The names of objects to show. For a more complete list of ways to spell object names, see "SPECIFYING REVISIONS" section in git-rev-parse(1).

  man git-rev-parse
a lot about specifying revisions, nothing about how to actually specify a file

Give up. Google it.


One reason to keep reading man pages is that you will likely discover new things you did not expect. Also, reading man pages helps you understand the tool's philosophy/workflow, if the man page is well written (which is often the case). This holds for any kind of documentation as well.

When I google something, I usually do not remember the answer to my question; the only thing I remember is the keyword to put in my future query to get the same answer. You will get your answer quicker, but you won't learn much. So personally, I prefer reading man pages (when I can) to using google.


I too find it much easier to google for actual working examples of commands rather than the abstract documentation in the manual.

Rsync for example, where trailing slashes make a difference and it's not obvious from skimming the manual.

Looking at working code/commands often works better than piecing it together from the manual imo.


I never use man pages, to be honest, and I'm quite comfortable on a command line. Reading long-ish things in a terminal kind of sucks, for me, and even if I end up reading a man page in Chrome it's nicely formatted and has readable serif fonts and is easily scrolled with the trackpad on my laptop.


I probably haven't read a man page "cover to cover" since high school. Usually I just need to read a couple lines about a specific flag or the location of some configuration file which I can find quickly with a simple search or by scanning the document with my eyes.


Your terminal doesn't scroll with wheel/trackpad?


The wheel or trackpad scrolls the terminal's scrollback, not the pager program that happens to be running in it.

(I can imagine some sort of hackery that determines if less or something is running and scrolls that, but it sounds like a huge mess. Is that actually what you're doing? Does it send keypresses? What if you're in a mode where those keypresses do something besides scrolling?)


No I'm talking about scrolling in the actual program running - it's most useful in a pager obviously, but it also works for editors, and it works both locally (OS X, built-in Terminal.app) and over SSH on Debian hosts.

I'll be honest - I have ~no idea~ (edit: apparently there are xterm control sequences for mouse scrolling) how it's actually implemented, but several tools have some reference to mouse support (tmux, vim, etc) in option/config files, so it's probably available for your distro/platform and just needs to be enabled.

Further edit: (or PS. or whatever):

`less` pager supports mouse scrolling. `more` pager does not!
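If anyone wants to try it, these are the sort of options being referred to (exact names vary between versions, so treat this as a sketch):

    # ~/.tmux.conf (tmux 2.1+; older releases split this into mode-mouse and friends)
    set -g mouse on

    # ~/.vimrc
    set mouse=a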


I just tried this on debian and my mouse wheel scrolls less inside of my terminal (and returns my previous line buffer when I type 'q').


It can do continuous scrolling of the terminal or line-by-line scrolling of the pager. Both are poor options for trying to actually read prose content inside the terminal, IMO, and opening a browser is easier.


What do you mean by "continuous" versus "line-by-line" scrolling? When I use the mousewheel to scroll a man page in xterm it behaves and appears the same as when I use the mousewheel to scroll a webpage in Chrome (the content moves smoothly up and down, disappearing at the top and bottom edges of the viewport).


Some man pages are really obscure though. I am thinking of PolicyKit and find, which can be as long and as arid.


It's not the same by any measure.

When you read the script in a browser, then paste it into a terminal, you know that "scp -r ~/.ssh u@somehost.com" isn't there.



Fair point.


Okay, but this relies on CSS trickery. If you had navigated to a text URL this would not be a vector.


What's a text url? The only way I can see this not being a vector is if you browse with css (and javascript for good measure) turned off. Or use lynx.


A page of text? With Content-type: text? An example being a shell script?


Do you think the average user copying and pasting administrative commands into their shell will stop to check the content encoding of the document they are copying from? Do you trust your browser not to try rendering an ill-defined document with an ambiguous extension?


Do you check the Content-type: header of the response for text/plain before copying? If you do, you'd be in the minority.


This is why you have a strong passphrase on your ssh private key.... right?


Hum... No. It's trivial to use those scripts to do all kinds of harm. A strong passphrase only protects against this one example.

For example, it won't protect against stealing the .ssh folder and installing a keylogger on your computer.


Copy-pasting from the internet can be just fine, for things like (for example) yum install <blah>, because the tool itself has built-in checks to make sure that, before executing anything, you have a valid, non-corrupt package from someone you trust.


The point is that what ends up on your clipboard can be different from what you see and if a new line is there, then the command executes before you have a chance to change your mind.
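For anyone who hasn't seen it, one classic variant is simply hiding extra text inside the bit you select, so the clipboard gets more than the eye does. Browser behaviour varies, but the idea is roughly (illustrative HTML only):

    <!-- the reader sees only the curl command; the hidden span and its
         trailing newline are what actually land in the clipboard -->
    <p>curl https://example.com/install.sh<span style="font-size:0"> | sudo bash
    </span></p>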


It's easy enough to download a given/checked version of the script at http://foo.com/ubuntu/install and have that copied and run inside your docker image... for that matter, it's usually adding a given repository to your repo manager, then installing a given package from that software's corporate sponsors.

I don't think the problem is as rampant as it's made out to be in TFA... that said, most people don't look at said script(s), so it's entirely possible something could have been slipped in. For that matter, I think the issues outlined in the article relate more to overly complicated Java solutions (the same happens in the .Net space) that are the result of throwing dozens of developers, some with more or less experience than others, at a project, and letting a lot of code that isn't very well integrated slide through whatever review process does or doesn't exist.


In my experience, this is mainly describing the sad state of sysadmin work at tech startups. Larger and profitable tech companies tend to take sysadmin work a bit more seriously and give more resources and authority (and pay...) to their TechOps/Devops/Security teams.


It seems reasonably common in agency-type companies - at the start their "infra" is often an account with a managed web hosting company, and when their needs grow it doesn't always become a core part of the business.


> I think a lot of this is a result of what I like to call the "Kumbaya approach to project/team management"

I'm totally stealing this :)


I honestly don't see the issue.

I see the issue with doing it for the general public ala RVM, but internally where you control everything I don't see the issue with curl into sh.


It's also the recommended way to install Kubernetes.


> Many people have this weird aversion to doing basic sysadmin stuff with Linux. What makes it weird is that it's really simple. Often easier than figuring out another deploy system.

While I agree with the article's main points - the GNU build system is far from simple. Basically an arcane syntax limited to unix-based systems, and 5 or 6 100+ page manuals to cover it.

It doesn't excuse it - but I think it's easy to see why people turn to curl | sudo bash as the author puts it.


Maintaining autoconf/automake stuff is a pain. Using it is usually as simple as "configure;make;make install".

It doesn't do dependency management though, which is an externalised cost. But that's what rpm/deb do.

I see the attraction of containers and disk image based management. It's much less time consuming. But it's very much the opposite of ISO9001-style input traceability.


> Using it is usually as simple as "configure;make;make install".

"Usually" indeed. Because if it breaks, you do need to know the implementation details to figure out what's wrong.


That's the same for "wget|sh", apt-get, npm or any other system. Now, if the argument is that configure tends to break more often and for more obscure reasons, I can tentatively agree with that.


This is the reason why all these standalone things bundle everything into their installation process.

The problem is installing 206 different pythons on my system just makes it more likely that something else is going to break.


… which is one of the pressures driving Docker adoption. Each process tree gets its own root filesystem to trash with its multitude of dependencies. DLL hell, shared library hell, JDK hell, Ruby and Python environment hell… a lot of it can be summed up as "userland hell". Docker makes it easy to give the process its own bloody userland and be done with it.


I think this falls under the heading of "I'm old", but I already have one machine to maintain. Replacing it with N machines to maintain doesn't feel like a win to me.


I'd actually disagree with that. Auto* breaks less often than wget or npm, IME.


My experience is that "configure;make;make install" has a much higher probability of success (>95% regardless of whether you are running the most up-to-date version of the OS) than something like cmake (which seems to hover around 60% if you try to build on slightly older systems).


Sorry, I don't know what ISO9001 is, but isn't deploying an image extremely conducive to traceability? No non-deterministic scripts are run on production servers.


http://www.askartsolutions.com/iso9001training/Identificatio...

ISO9001 often turns into its Dilbert parody of bureaucracy, but the core ideas are sound: if you have some sort of failure of production, it's useful to know what went into the production process and where it came from. So in the case of deploying images, then yes: you get repeatable copies of the image. Provided you know where the image came from. Images themselves aren't usually stored in a version control or configuration management system. It may not be obvious where the image came from. And, if an image is made up of numerous "parts" (ie all the installed software), you need to know what those parts are. If an SSL vulnerability is announced, what is the process for guaranteeing that you've updated all the master copies and re-imaged as necessary?


Have you ever seen it implemented in a way that added value? I agree that in theory ISO9001 makes sense, but it's been a slow-motion disaster everywhere I've seen it actually tried.


I haven't seen it successfully implemented in the software industry. Manufacturing is much more OK with it. I'm not arguing for ISO9001 itself, just that reproducibility and standardisation of "parts" are things we should consider.


Somebody has to build the containers and/or images, and it's on them to make that an automated, repeatable process.


I have never understood why so many people are not OK with using the command line.

A few years back we had an issue where a mysql script was over the limit for phpmyadmin - my fairly experienced colleague was unaware that you could log in and use mysql from the CLI.


Could be a generational thing as well. I work with some devs who've always used Windows/OS X GUIs exclusively for everything and are terrified of command-line anything. Either there's a GUI for it, or it might as well not exist. Younger guys usually.


The command line is modern-day voodoo. There are a ton of commands, each with a specific use, each with their own specific incantation, which can be mixed in extremely powerful ways. But my theory is that the main reason people would prefer not to use it is that improper usage can be harmful and sometimes destructive.

The same reason people prefer to use garbage collected and dynamically typed programming languages.


> improper usage can be harmful and sometimes destructive.

It is merciful that GUI environments are immune to these deficiencies.


I'd like to think this post is an exaggeration.


Unfortunately not; there are a lot of developers who can only use phpmyadmin or their CMS's GUI - and from what I am told, being able to code basic SQL joins is not something you can take for granted.


Crazy talk. And these 'developers' are pulling in six figures?


Insanity.


> the GNU build system is far from simple

Which is why there's alternatives – cmake, waf, …


> Which is why there's alternatives – cmake, waf

Gah. Because I want to drop a metric ton of python code into my own source tree just to build. (gnulib is bad enough...)

Personally I like make. I understand it. I've used it for something like 20 years now. If there are problem domains it doesn't work for, they aren't problem domains I encounter. (Like so much of Linux software in the past 5 years or so, I find myself saying "this seems like an interesting way to solve a problem I simply don't have".)


> Gah. Because I want to drop a metric ton of python code into my own source tree just to build. (gnulib is bad enough...)

Each their own poison. Personally I don't like them either, but pretending it's autoconf or curl|sh is an oversimplification.


Many people have enormous amounts of experience with anti-patterns yet very little self reflection to identify them.

This is an obvious example:

http://en.wikipedia.org/wiki/Inner-platform_effect

Obviously a config / deployment system, like any other system, will start small and simple and "save a lot of time" but after an infinity of features are bolted on, it'll be infinitely worse than just using a bash script. Even worse, you probably figure out your deployment by hand on one system using bash, then need to translate what worked on a command line into crypto-wanna-be-bash config system (probably creating numerous bugs in the translation) then using wanna-be-bash to slowly poorly imitate what you'd get if you just used bash directly...

The last straw for me was trying to integrate some freebsd servers and /usr/ports had like six versions of cfengine, none of which worked perfectly with the three versions on the legacy linux boxes. Screw all that, instead of translating bash command line operations into pseudo-bash I'll just use bash directly. IT is an eternally rotating wheel and the baroque inner platform deployment framework has had its day... and being an eternally rotating wheel it'll have its day again in a couple years. Just not now.

Not throwing the baby out with the bathwater, a strict directory structure, and modularity and library approach to error handling and reporting and logging which you can steal from the deploy systems is a perfectly good idea.

Unix philosophy of small perfect tools means I'm using git instead of my own versioning/branching system, and using ssh to shove files around rather than implementing and static linking in my own crypto and SSL system.


I agree with you in principle, but in practice shell scripts are really not the best tool for this sort of job: they tend to be write-only (in the sense that they can be difficult to read months or years later) and can become very hairy and difficult to maintain.

I'd prefer something like scsh (or a Common Lisp or elisp version thereof) for this sort of work: access to a full-fledged programming language and easy access to the Unix environment.


"can become very hairy and difficult to maintain."

I've found that to be a social problem or management problem more so than a technical one. There's an old saying, from before my time, that a Fortran programmer can write Fortran in any language. In a bad environment a new system will always be cleaner than the old system, not because it's technologically immune to dirt - it'll dirty up as badly as the old system unless the social problems or management problems are fixed. You really can write write-only Puppet scripts. Or you can write readable bash. Or even Perl.

Also most deployment seems to revolve around securely and successfully copying stuff around, testing files and things, and running shell commands and looking at the return code. Shells are pretty good at running shell commands like those in a maintainable, easily readable and troubleshootable fashion. It's possible that a deployment situation that more closely resembles a clojure koan than the previous might have some severely blurred lines. And there's always the issue of minimizing the impedance bump between the automated deployer and the dude writing it (probably running commands in a shell window) and the dude troubleshooting it at 2am (by looking at the deployment system in one window and running commands in a shell window next to it to isolate the problem). I would agree that cleaner library/subroutine type stuff in shell would be nice.

And you are correct, scsh is really cool but two jobs later some random dude on pager duty at 2am is more likely to know bash or tcsh. Principle of least surprise. I suppose if only scsh guys are ever hired... Then again as per above most deployment is just lots of moving stuff around and running things so its pretty self explanatory. But if the work is trivial, don't deploy a howitzer to swat a fly.

Maybe another way to look at it is if you're doing something confusing or broken, plain common language will clear things up faster and more accurately than using an ever more esoteric domain specific language. Or some folk saying like "always use the overall simplest possible solution to a complex problem".

There is the "don't reinvent the wheel" argument. I have a really good network wide logging system, a really good ssh key system for secure transfer of files, a strong distributed version control system to store branches and versions, a strong SSL infrastructure, a stable execution environment where upgrading bash probably won't kill all my scripts, a strong scheduled execution system... I don't need a tight monolithic collection of "not so good" reimplementation of the above, running that is more painful that rolling my own glue between the strong systems I already have. And using the monolith doesn't mean I get to abandon or ignore the "real" strong infrastructure, so the only thing worse than running one logging/reporting system is having to admin two, a real enterprise grade one and a deployment-only wanna be system. I did the puppet thing for many years. So sick and tired of that.


Thank You for the Wikipedia link - I was looking for the name of the "thing" people are doing when they write all those WebGL JavaScript frameworks and such. Now I know that they are creating poor replicas of things that normally run on the desktop itself.


:-)

I managed to get Hadoop running on a small cluster from scratch; Michael Noll's tutorial is a good starting point.

Full stack should mean you can and have used a soldering iron in anger, and also have at least a CCNA level of networking knowledge.


When you say anger, do you mean to threaten the developer who wants to run `chmod 777 /var/www` when their just-installed php app released in 2003 won't allow uploads?

Edit: Maybe I should have added a /sarcasm to my comment?


    alias fix-permissions="chmod -R 777 /"


> alias fix-permissions="chmod -R 777 /"

  # Intercept sudo: if someone tries `sudo chmod -R 777` on / or a
  # top-level directory, force a reboot instead of running it.
  function sudo
    # fewer than four words: not our pattern, pass straight through
    if not test (count $argv) -gt 3
        command sudo $argv; return;
    end
    # only care when the target is / or a directory directly under it
    if not contains $argv[4] "/" (ls / | awk '{print "/"$1}')
        command sudo $argv; return;
    end
    if test \( $argv[1] = chmod \) -a \( $argv[2] = '-R' \) -a \( $argv[3] = 777 \)
        command sudo reboot -f
    else
        command sudo $argv
    end;
  end;


I think branding them is going a bit too far

      ...
For a first offense


"in anger" means used in a real-life situation, not just playing around.


I think he got that.


Re-connecting a pin to a cpu that broke off should be enough qualification. Anger will be present in spades.


Christ it's annoying enough just using a credit-card to fix bent pins. Hats off for reconnecting them!


I have found that a lot of these platform-as-a-service providers are way more complicated than doing things from scratch.


I've seen people copy pasting stuff along the lines of `wget --no-check-certificate | sudo sh` into their terminals from some random internet source.

I'm pulling my hair out, saying: are you even aware of what you're doing?


What do you expect them to do, download the .tar.gz, extract it, read every line of code and then make; make install? Or just make; make install? How is that any different?


You can usually get PGP signed hashes for tarballs distributed by serious entities. If someone is distributing software and provides no way to check that it is genuine, you shouldn't run it...


I posted a slightly provocative tweet about this, and the CEO of NodeSource took exception... sad days.

https://twitter.com/kylegordon/status/590860756075294721


He seems to be way nicer and more professional than you..?


If by “nicer and more professional” you mean “super condescending”.


What? Only after two hours did he tell kylegordon that he (kylegordon) was cute.

And at that point kylegordon had earned it.


There's nothing wrong with curl | sudo bash style setups as long as it's over https and the certificate gets checked.

The advantages are that it's easy and you can make it work on almost all unix-like systems out there.

The only disadvantage is that you have one additional weak point: the server can get contaminated. Before, you had to contaminate one of the many developer machines / build machines.

The situation wasn't any better before. Install media always got downloaded without ssl encryption or any certificate checks. This is still the same, but at least you won't get a hacked kernel today if you use secure boot.


No, it's just plain bad.

To pick one example why...

Just because it's easy to run doesn't mean it's easy to support or maintain. Chances are `curl | bash` scripts aren't designed for your particular OS, so it's yet another form of software that you have to learn how to update, as opposed to using the OS-level update mechanism, such as yum, apt, or even brew to some extent. Being a good sysadmin doesn't stop at installing the software. Most of the hard (boring) work is in maintaining systems and keeping them updated and secure. Blind install scripts make this job impossible.

There is a very big difference between installing something on your dev machine to just get it started and deploying something into production. `curl | bash` is okay for setting something up on a dev machine where the only one that needs to use it is you. For productions machines, it's completely inappropriate[1].

[1] This is somewhat mitigated by things like Docker, but I'd still argue that you don't want to have an ephemeral installation method for containers either. You should have fixed versions that are installed by either a package manager or at least a Makefile.
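Concretely, in a Dockerfile that means preferring something like this over piping an installer (base image, package name and version are placeholders):

    FROM debian:7
    # pin an explicit version from a vetted/mirrored repo instead of curl|bash
    RUN apt-get update && \
        apt-get install -y --no-install-recommends somepackage=1.2.3-1 && \
        rm -rf /var/lib/apt/lists/*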


Not to mention that in plenty of environments production systems don't have access to the internet to begin with, so curl/wget | bash is a non-starter.


> There's nothing wrong with curl | sudo bash style setups as long as it's over https and the certificate gets checked.

Even assuming the URL's publisher is trustworthy (which is a poor assumption to make, ever), you forget SSL / HTTPS is broken, that the NSA has established MITM on the entire internet, that your installation process (which should be both versioned and repeatable) now has zero versioning and all the entropy of the network plus bonus entropy.


I'm guilty of using this method in my side project (https://github.com/grn/bash-ctx). My goal was to solve the installation problem quickly. I absolutely would love to offer proper installation methods. However, my experience with building *.deb packages makes me think that it's not something that I'd like to do (especially as it's a side project).

The question, therefore, is: what is the simplest alternative installation method for OS X and Linux?
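
For a small shell tool, one low-effort middle ground is asking users to clone the repo and copy the script into their PATH themselves - at least then the install is inspectable and tied to a git revision. Something like the following (the file name is a placeholder; use whatever the repo actually ships):

    git clone https://github.com/grn/bash-ctx.git
    cd bash-ctx
    less the-script                      # read it before installing it
    install -m 0755 the-script /usr/local/bin/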


Some of this is self-inflicted.

Go look up how you install Snort or Bro on CentOS. You either have to install from source, or install an RPM from their website, which may or may not have issues. This means you lose dependency management and update management. Pure madness.


Choose your method of death:

1. Run this totally opaque command which might DTRT, and might completely pwn your system.

2. Prepare for 4 hours of dependency hell.


Alternatively, learn Gentoo.


I've decided that unless you're ok with running a very restricted set of ancient applications, don't even try to use CentOS. I've seen multiple billion-dollar companies who can't seem to avoid f'ing up the yum repos on CentOS.

I'm not able to go full docker on my machines @work, but I do have some statically linked tarballs. There is a reason apps that deploy in hostile environments (skype, chrome, firefox) bundle most of their dependencies.


Many people have this weird aversion to doing basic sysadmin stuff with Linux

Like developers who won't write SQL and insist on an ORM.


I agree that many of these convenient setups are embarrassingly sloppy, but it's the sysadmin's responsibility to insist on production deployments being far more rigorous. No one can tell you how to build hadoop? Well, figure it out. Random Docker containers being downloaded? Use a local Docker repo with vetted containers and Dockerfiles only.

I don't even allow vendor installers to run on my production systems. My employer buys some software that is distributed as binary installers. So I've written a script that will run that installer in a VM, and repackage the resulting files into something I'm comfortable working with to deploy to production.

If a sysadmin is unable to insist on good deployment practices, it's a failure of the company or organization or of his own communication skills. If a sysadmin allows sloppy developer-created deployments and doesn't make constant noise about it, then they aren't doing their job properly.
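
For the Docker side, the vetted-images-only setup can be as small as a private registry that production hosts are allowed to pull from (the hostname and image names here are made up, and this assumes the registry is served over TLS or explicitly whitelisted with --insecure-registry):

    # Run a private registry on infrastructure you control
    docker run -d -p 5000:5000 --name registry registry:2

    # Build from a reviewed Dockerfile, tag against the internal registry, push
    docker build -t registry.internal:5000/myapp:1.0.0 .
    docker push registry.internal:5000/myapp:1.0.0

    # Production hosts only ever pull from the internal registry
    docker pull registry.internal:5000/myapp:1.0.0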


> it's the sysadmin's responsibility to insist on production deployments

What decade are you from? No startups are hiring sysadmins to do any kind of work anymore. They're hiring "dev-ops" people, which seems to mean "Amateur $popularLanguage developer that deployed on AWS this one time."

That's the whole problem with the dev-ops ecosystem. None of these dev-ops people seem to have any ops experience.


> No startups are hiring sysadmins to do any kind of work anymore.

Then maybe people should be willing to work for more grown-up businesses.

HN tends to get a distorted view of what's important in the tech industry. The tech industry is way, way, way bigger than startups, and there are still plenty of companies that recognize the value of good sysadmins.

Let the startups learn their lesson in their own time.


The alternative is that many of the startups don't learn this in their own time, and they go on to become bigger, more successful companies who can set the tone and shift the market. Of course, if they're actually able to succeed by doing so, then that says something too. Although the trend of many data breaches certainly wouldn't decline in that case.


>Although the trend of many data breaches certainly wouldn't decline in that case.

Exactly. Successful and profitable are not mutually exclusive with "secure" or "well-architected". At least until those last two come to bite you later and start eating into your profits.


Sony is a great example of this.


Did the PR hit actually translate into a monetary hit and eat into their profits?


I don't know about the cost of the negative PR, but the compromise itself cost them $15 million in real costs (http://www.latimes.com/entertainment/envelope/cotown/la-et-c...) and potentially much more (http://www.reuters.com/article/2014/12/09/us-sony-cybersecur...) once you count the downtime involved and potential lawsuits, settlements, and other fallout over the breach of information. IIRC there were some embarrassing emails released regarding some Hollywood big-wigs, for example.

It should be a huge cautionary tale for any big organization that doesn't have good internal security, but unfortunately this isn't the first such case in history, and it almost certainly won't be the last.

But that doesn't mean there aren't other smart businesses out there.


$15M sounds like a rounding error for Sony. It sounds like a rounding error as well when compared to the cost of brand-name IT solutions when deployed in a company of Sony's size.


> That's the whole problem with the dev-ops ecosystem. None of these dev-ops people seem to have any ops experience.

Thanks for painting all of us that do "devops" with a wide brush. If you're a dev shall we enumerate all of the XSS and SQL injection holes you've added to products over your career?


Well, in my experience XSS and SQL injection come from the "devops" kind of developer: someone who claims to code without wanting to learn the basics (complexity, DB, ...).

So, nice try, but the troll doesn't work.

And startups are made by the "devops" kind of businessman, who doesn't care about correctly computing cost vs. price because that is so 20th century.


You're absolutely right, about everyone in the industry. How did you become so astute with your observations?


While I think that devops can be a useful term, lately most people take it to mean 'I'm a rails developer but I know how to use docker and the aws control panel'.


Well, like I said, in this case, "it's a failure of the company".


>> No one can tell you how to build hadoop? Well, figure it out.

I get the impression that several people working on debian couldn't work this one out!


I think most people who use debian would tend to install things using debian packages, which in this case usually means adding cloudera to your apt sources list and using apt-get.

It is a pretty straightforward process:

http://www.cloudera.com/content/cloudera/en/documentation/cd...


I think the complaint was that it's difficult figuring out how to build Hadoop from source. That page you linked is how to install pre-built binaries, which you rightly point out is fairly trivial.


Sure, I agree that debian people would want to install debian packages.

What debian users/hackers/amateur admins like me really want is packages that are first class citizens, that the debian guys have picked up, sanitised, analysed and made part of the system.

I'll take software from the debian repos every time if I can. And it's pretty damning if people who are familiar with build systems and package creation can't figure it out!


Hadoop is insane. The elephant is fitting. Is it really the best choice, or has someone done something cleaner in golang or c++11?


> Is it really the best choice, or has someone done something cleaner in golang or c++11?

What does the language have to do with the program?

Hadoop is what it is because it's a complex problem with a fittingly complex solution. Simply re-writing it in your pet language won't somehow make it "better".


Go and modern C++ are both quite a bit more terse than Java. They also produce binaries which don't necessarily require a runtime to be available on every server (just ABI compatibility).

(I have no horse in this race, I am just writing what I think the grandparent comment was referring to)


> They also produce binaries which don't necessarily require a runtime to be available on every server

Just like Java[0]. It is just a matter of choosing the right compiler for the use case at hand.

[0] - http://www.excelsiorjet.com/ (one from many vendors)


Cool concept, I didn't realise this existed. Can you run Hadoop and friends under this? I've worked at companies with over 500 servers in a Hadoop cluster and literally never once heard about anything other than using Oracle's JRE aside from one proposal to use OpenJDK which was shot down pretty quickly.


I don't have experience with Hadoop.

Almost all commercial JVMs have some form of AOT or JIT caching, especially those that target embedded systems.

Sun never added support to the reference JVM for political reasons, as they would rather push for plain JIT.

Oracle is now finally thinking about adding support for it, with no official statement if it will make it into 9 or later.

JEP 197 is the start of those changes, http://openjdk.java.net/jeps/197

Oracle Labs also has SubstrateVM, which is an AOT compiler built with Graal and Truffle.


Way back in the day, GCC's gcj compiler would do AOT compilation of Java; however, I believe development stopped around JDK 5 support.


If I am not mistaken, most of the developers abandoned the project to work on the Eclipse compiler and OpenJDK when those projects became available.

GCC only keeps gcj around due to its unit tests.


There's also things like exec4j which bundles everything including a JVM into an executable which one can just run... and things like AdvancedInstaller and Install4j will also allow one to bundle a JVM.

So producing a binary which doesn't require a separate runtime really isn't a problem.


Since you mention it, Java 8 brings bundling and installers support into the reference JDK.


C++ does usually require a runtime.


C++'s runtime is small and ubiquitous. Depending on how the software is written (if it allows disabling exceptions and RTTI), it might be the same size as C's runtime, which is practically (but not totally) nonexistent.

I'm not an expert on Java, but my experience with it is that its runtime is fairly huge and requires custom installation.


C++'s runtime is worse than Java's in that sense. Most JVMs can run most Java bytecode, but your libstdc++ has to be from the same version of the same compiler that your application was compiled with.


It was quite surprising for me the first time I did a little embedded work and discovered I couldn't run binaries that were compiled against glibc on my musl-libc based system, and vice-versa. I had initially thought they all just supported the same c89 spec so should work...


Yep. It's 99% ABI compatible, but that 1% will kill you.

For that matter, as you allude even C has a runtime.


Which C++ runtime is ubiquitous? I can think of at least 3 C++ runtimes (MS, libstdc++, libc++).


I spent an entire day last week attempting to build hadoop with LZO compression support. There are many outdated guides on the internet about how to do this, and I eventually gave up and spent a few hours getting the cloudera packages to install in a Dockerfile so I could reproduce my work later.

Figuring out which software packages I needed, how to modify my environment variables, which compiler to get, and where to put everything in the correct directory was the entire difficulty.

If it were written in Go instead of Java, I could have done `go get apache.org/hadoop` and it would have been done instead of giving up after hours of frustration.

Go has almost no new features that make it an interesting language from a programming language perspective. Go's win is that it makes the actual running of real software in production better. Hadoop's difficulty is exactly why InfluxDB exists at all.


> If it were written in Go instead of Java, I could have done `go get apache.org/hadoop`

This complaint is just about packaging, and not the language itself. Any project can have good or bad packaging scripts, and for Java there are plenty of ways to make it "good".

Not to mention, the BUILDING.txt document clearly states they use maven[1] and to build you just do: mvn compile

> Go's win is that it makes the actual running of real software in production better

This might just be a familiarity issue, because once you launch the program, all things are equal.

And yes, you can bundle a JVM with your java app, which makes it exactly like Go's statically linked runtime and just as portable without any fuss.

[1] https://github.com/apache/hadoop/blob/trunk/BUILDING.txt
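
For what it's worth, building the distribution tarball yourself is roughly the following (from memory - check BUILDING.txt for the exact prerequisites, e.g. a JDK, Maven and protobuf):

    git clone https://github.com/apache/hadoop.git
    cd hadoop
    mvn package -Pdist -DskipTests -Dtar   # the distribution tarball should land under hadoop-dist/target/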


> no new features

Go gets us better performance and concurrency out of the box.


> Go gets us better performance

Than Java? At best, Go performs on par with Java, but is often measured 10-20% slower.[1][2][3]

This is usually attributed to the far more mature optimizing compiler in the JVM, which ultimately compiles bytecode down to native machine code, especially for hot paths. Java performance for long running applications is on par with C (one of the reasons it's a primary choice for very high performing applications such as HFT, Stock Exchanges, Banking, etc).

> concurrency out of the box.

Java absolutely supports concurrency "out of the box"...[4]

[1] http://zhen.org/blog/go-vs-java-decoding-billions-of-integer...

[2] http://stackoverflow.com/questions/20875341/why-golang-is-sl...

[3] http://www.reddit.com/r/golang/comments/2r1ybd/speed_of_go_c...

[4] http://docs.oracle.com/javase/7/docs/api/java/util/concurren...


Hell, if we look at real-world-ish applications, the techempower benchmarks show go at easily 50% slower than a bunch of different Java options.


>What does the language have to do with the program?

I happen to agree with you wholeheartedly; if you spend enough time here, though, you'll see the inevitable comment about how anything made in PHP is worthless, insecure garbage and anyone who spends their time developing a PHP application is an amateur at best.

This isn't really a comment aimed at you, just wanting to point out how often that notion gets challenged.


http://www.pachyderm.io is a modern alternative.


Apache Spark is a good replacement for Hadoop now. It's written in Scala.


Spark is a good replacement for MapReduce. MapReduce != Hadoop.


Fair enough, but the original article was about Hadoop MapReduce wasn't it? It specifically says:

"without even using any of the HBaseGiraphFlumeCrunchPigHiveMahoutSolrSparkElasticsearch (or any other of the Apache chaos) mess yet."


Surely at minimum Hadoop developers could tell you!


Have you ever been on a project where the developers didn't know how to build it? It's a strange situation, with huge environments being passed from one computer to another, and treasured with more care than the code itself.


This happened to me about a decade ago. A very smart sysadmin in the company created an Acronis image for machine deployments. They very carefully documented everything they changed, and how to recreate it. Then someone else created an image from one of the imaged machines without documenting what they changed. This happened a couple dozen times until the image was pretty much a mess of hand-installed binaries, configuration hacks, etc. It literally took another person 6 months to untangle what was actually on the machine by md5summing the crap out of everything, guessing at versions until they found a match, and documenting it.

That sounds like the state of a lot of docker images.
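
These days the package managers can at least do part of that forensic work for you, assuming most of the files came from packages in the first place:

    # Debian/Ubuntu: list installed files whose checksums no longer match their package
    debsums --changed

    # RHEL/CentOS: report files that differ from what the RPM database expects
    rpm -Va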


Well fuck me. I just spent two weeks fiddling with Vagrant and Docker and finally got everything up and humming only to come into this thread. Going to refrain from slapping the SysAdmin title on myself for now.


Docker is awesome, but you shouldn't be using blind base images. Use Dockerfiles, they're self-documenting.
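
A minimal sketch of what that looks like (the base image and package are just examples):

    # Every step that produced the image is written down and reviewable
    FROM debian:wheezy
    RUN apt-get update && \
        apt-get install -y --no-install-recommends nginx && \
        rm -rf /var/lib/apt/lists/*
    CMD ["nginx", "-g", "daemon off;"]

Then `docker build -t myorg/nginx:1.0 .` gives you an image that anyone can audit or rebuild from scratch.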


Unless you build your own base images... odds are you will be using something someone else built. Even the host OS probably wasn't compiled by you.

In general, my base images are often debian:wheezy, ubuntu:trusty or alpine:latest ... From here, a number of times I've tracked down the dockerfiles (usually in github) for a given image... for the most part, if the image is a default image, I've got a fair amount of trust in that (the build system is pretty sane in that regard)... though some bits aren't always as straight forward.

I learned a lot just from reading/tracing through the dockerfiles for iojs and mono ... What is interesting is often the dockerfile simply adds a repository, and installs package X using the base os's package manager. I'm not certain it's nearly as big of a problem as people make it out to be (with exception to hadoop/java projects, which tend to be far more complicated than they should be).

golang's onbuild containers are really interesting. I've also been playing with building in one node container with build tools, then deploying the resulting node_modules + app into another more barebones container base.
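
A rough sketch of that build-in-one-container, run-in-another pattern (image tags and file names are illustrative):

    # 1. Build node_modules inside a full image that has the compilers and headers
    docker run --rm -v "$PWD":/app -w /app node:0.12 npm install --production

Then the Dockerfile for the runtime image only has to copy the already-built app in:

    # Runtime image: use whatever more barebones base you prefer
    FROM node:0.12-slim
    COPY . /app
    WORKDIR /app
    CMD ["node", "server.js"]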


Well, you have to trust something somewhere. Unless you're always compiling from source (which you can do with Docker), and you've read the source, etc.. but even then, you have to trust the compiler and the hardware.

Anyway, yes, you can make your own base images. But images `should` be light enough that you can build them each iteration. I've done dev stacks where literally each `save/commit/run of a test` built the docker container from the dockerfile in the background! With the caching docker does, it really doesn't add any overhead to the process.

> What is interesting is often the dockerfile simply adds a repository, and installs package X using the base os's package manager.

Yup! Pretty much. Other than some config stuff for very specific use cases (VPN, whatever.)


A legend at one company about 5 years ago is that the company's next world-shaking product was being built partially with a single computer that was shipped around from office to office, because no one knew how to build the build environment again. Again this was circa 2010. :-)


I think more disconcerting is the rise of "sysadmins" who think they're qualified sysadmins because they know how to bash and docker.


This is hardly a new problem- and in many ways, I'm not sure it's a problem at all compared to the company cultural issues brought up by skywhopper.

Whether it's programming or system administration, you're always going to have new people getting excited about the sudden power they've learned. Being able to make computers do things opens up this whole new world, and when people find themselves in that world they may end up overestimating their skills and underestimating how much they need to grow. What they lack in understanding they make up for in enthusiasm, and with experience they become more knowledgeable about what they don't know.

If we waited until they were "qualified" for jobs they would never get the experience to become qualified. At the same time there is more than enough room in the current job market to support people of lower skillsets, and for some companies that's considered an investment (junior people tend to turn to senior people over time).

This is where it becomes a company culture issue. If a company is smart they'll have a few senior people making sure things are held to the right standard, and a few junior people who can get things done but need some guidance and direction. However, lots of companies (especially the smaller ones who may be more constrained by budget) go for the cheaper route and would rather hire someone junior as their main support. The problem isn't that the sysadmins aren't qualified sysadmins, it's that they're junior system admins who have been hired for the wrong job. Companies that fail to value experience tend to suffer as a result.


I've found that there isn't an easy ramp into system admin from university -- most of the talent comes from dogmatic self learning in computer repair shops or subpar IT shops. All the good guys at $BIG_SOFTWARE_COMPANY seem to be in their 30s after putting in years doing /tedious/, but extremely useful, work for little pay.


My uni used student sysadmins to run hosting for Open Source projects. Great experience on production infrastructure without big dollars on the line when mistakes are made.

http://osuosl.org/about


Amen to that. When I see some of the job descriptions in postings for DevOps/sysadmin roles, I wonder: is there really someone out there with all the skills that are asked for?


I'm reasonably certain there isn't - not for the pay band offered.


Wanted:

3-5 years of linux system administration experience

3-5 years of windows 2000/2010 administration experience

3-5 years of networking level tcp/ip experience with custom protocols

3-5 years of c++ experience

3-5 years of .net experience

3-5 years of .....

I think more than half the job postings out there are created by entry/mid-level HR people who find similar job descriptions on other sites and copy-paste the requirements. That has propagated into the monster job descriptions you see now.

I noted this as well: for the pay these companies are offering, anyone with the level of experience they're asking for would laugh and move on. It's almost as if the posting is a Trojan horse - only those stupid enough to apply to a post like that are the kind of employees they're looking for.


As a hiring manager, it's very easy to filter these people out at the interview stage.

Being a system administrator requires a very specific personality type that has little to do with experience and more to do with attitude and critical thinking.

Sadly, people are right that startups are skipping past admins, thinking they're not needed anymore. Then later they need to hire one to clean up the giant mess.


It's really that easy. Pick your favorite software that happens to have broken SSL certs (such as RVM as of a few months ago), and tell them to install it. If they balk at the prospect of disabling SSL cert checking on the wget command, then they're worth their weight in gold.


most of the startups fail before any system cleanup is necessary


@skywhopper "it's a failure of the company or organization or of his own communication skills" <~ Oh man, ever had a rant from The Management like "we pay you to do what we say"? No one usually cares about the communication skills of a sysadmin. Yes, it's a failure of the organisation. The sad truth is that most organisations are failures in this respect. Sysadmin today is either a marginal job at a small company, where people respect you, or a job at a medium or large company where he or she is just a peon.


make is the least-auditable build tool imaginable. You don't have to obfuscate a Makefile, they come pre-obfuscated; you could put the "own me" commands right there in "plain" Make. Not to mention that it's often easier to tell whether a Java .class file is doing anything nefarious than whether a .c file is. How many sysadmins read the entire source of everything they install anyway?

Maven, on the contrary, is the biggest single source of signed packages around. Every package in maven central has a GPG signature - the exact same gold standard that Debian follows. The problems Debian faces with packaging Hadoop are largely of their own making; Debian was happy to integrate Perl/CPAN into apt, but somehow refuses to do the same with any other language.
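
For instance, the signature sits right next to each artifact on Central and can be checked by hand (the coordinates here are an arbitrary example, and you still have to decide for yourself whether to trust the release manager's key):

    BASE=https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/2.6.0
    curl -O $BASE/hadoop-common-2.6.0.jar
    curl -O $BASE/hadoop-common-2.6.0.jar.asc
    gpg --verify hadoop-common-2.6.0.jar.asc hadoop-common-2.6.0.jar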

> Instead of writing clean, modular architecture, everything these days morphs into a huge mess of interlocked dependencies. Last I checked, the Hadoop classpath was already over 100 jars. I bet it is now 150

That's exactly what clean modular architecture means. Small jars that do one thing well. They're all signed.

Bigtop is indeed terrible for security, but its target audience is people who want a one-stop build solution - not the kind of people who want to build everything themselves and carefully audit it. If you are someone who cares about security, the hadoop jars are right there with pgp signatures in the maven central repository, and the source is there if you want to build it.


Makefiles don't really enter into it and getting software signed by the developer isn't that valuable or useful.

The value of debian is not that they package (or repackage) everything into deb files but that they resolve versioning and dependency conflicts, slip security fixes into old versions of libraries (when newer versions break API/ABI), and make it possible to integrate completely disparate software into a system. They also have a great track record at it.

Maven does not do any of these things; Maven does nothing to protect the system administrator from a stupid developer, it just makes it easier for their code to breed and fester.

You must understand that the sysadmin has an enormous responsibility that is difficult for programmers to fully appreciate: You don't feel responsible for your bugs, you don't feel responsible for mistakes made by the developer of a library you use, and you certainly don't feel responsible for the behaviour of some other program on the same machine as your software, after all: Your program is sufficiently modular and scalable and even if it isn't, programming is hard, and every software has bugs.

But the sysadmin does feel responsible. He is responsible for the decisions you make, so if you seem to be making decisions that help him (like making it easy for you to get your software into debian) then he finds it easier to trust you. If you make him play whackamole with dependencies, and require a server (or a container) all to yourself, and don't document how to deal with your logfiles (or even where they show up), how or when you will communicate with remote hosts, how much bandwidth you'll use, and so on: That's what Maven is. It's a surprise box that encourages shotgun debugging and using ausearch features to do upgrades. Maven is a programmer-decision that causes a lot of sysadmins grief a few months to a few years after deployment, so it shouldn't surprise you to find that the seasoned sysadmin is hostile to it.


Debian has a terrible track record. Just look at the OpenSSL/Valgrind disaster. As a former upstream developer myself (on the Wine project), I saw every Linux distro find unique ways to mangle and break our software, but Debian and derived distros were by far the worst. We simply refused to do tech support for users who had installed Wine from their distribution, the level of brokenness was so high.

You may feel that developers are some kind of loose cannons who don't care about quality and Debian is some kind of gold standard. From the other side of the fence, we do care about the quality of our software and Debian is a disaster zone in which people without sufficient competence routinely patch packages and break them. I specifically ask people not to package my software these days to avoid being sucked back into that world.

As a sysadmin you shouldn't even be running Maven. It's a build tool. The moment you're running it you're being a developer, not a sysadmin. If there are bugs or deficiencies in the software you're trying to run go talk to upstream and get them fixed, don't blame the build tool for not being Debian enough.


I don't know agree with that.

Debian feels like a distribution maintained by a bunch of sysadmins: People who have shit to do, and who understand that the purpose of a machine is to get stuff done, not to run some software.

A lot of sysadmins believe since they are responsible for the software, they need to be able to build stuff and fix some stuff themselves (i.e. it can't wait for upstream). In my experience, it's usually something stupid (like commenting out some logspam), but it's critical enough that I can imagine a lot of shops making it mandatory to ensure they can do this.

Really proactive sysadmins do try to run fuzzers and valgrind and do try to look for bugs rather than waiting for them to strike. And sometimes they get it completely wrong, as in the OpenSSL/Valgrind disaster, but they usually ask first[1].

Now I don't agree with everything Debian do, and I don't want to defend everything they do, either, but I think programmers in general need to bring a certain amount of humility when dealing with sysadmins: Because when these sysadmins say that they're not going to package hadoop because the hadoop build process is bullshit, it isn't appropriate to reply "well you guys fucked up openssl, so what do you know?"

One thing that would help is if we didn't look at it as Programmers on one side of the fence and Sysadmins on another side. Programmers have problems to solve, and sysadmins have problems to solve, and maybe you can help each other help solve each other's problems.

[1]: http://marc.info/?l=openssl-dev&m=114651085826293&w=2


I find it weird that you consider 'packaging' to be something a sysadmin should do, but 'building' to be something they should not do. Aren't they both forms of 'prepping code for use'?

And then state that you don't want your own software packaged. So, if a sysadmin is not allowed to build and not allowed to package, how are they supposed to get your code into production? "curl foo | sh"?


I don't consider packaging to be a sysadmin task. On any sane OS (i.e. anything not Linux/BSD), packaging is done by the upstream developers. That doesn't happen on Linux because of the culture of unstable APIs and general inconsistencies between distributions, but for my current app, I am providing DEBs and woe betide the distro developer who thinks it's a good idea to repackage things themselves ...


Well, we're going to have to agree to disagree there, because I think Windows packaging is fucking insane.

One of the things I loved about my move to Linux and .deb land was that if I uninstalled something, I knew it was uninstalled. I didn't have to rely on the packager remembering to remove all their bits, or even remembering to include an uninstall option at all. Or rely on them not to do drive-by installs (which big names like Adobe still do, out in the open). And not have every significant program install its own "phone home" mechanism to check for updates. The crapstorm that is Windows packaging is a fantastic example of a place where developers love and care for their own product, but care not a jot for how the system as a whole should go together.


I read that the other way around. He specifically asks people to refrain from packaging his software. I think the implication is that he does want sysadmins to run his carefully constructed build scripts in order to install the application.


You really nailed it. Among my duties are systems administration for a company that works with a lot of software development vendors. We have a user acceptance team that makes sure that we get what we ordered, that the QC stays at a high level. So functional problems, that's their deal. But they're not sysadmins, they can't easily see what developer choices make administering the servers more complicated, more fragile, more expensive, or more insecure. This shifts my job from the end of the process (here, run this!) to the beginning (hey guys, let's use these tools instead, it'll make everyone's lives easier).

As such I'm very pro containers as they will eliminate a ton of deployment effort and allow me to manage different environments much more easily. But it means that there needs to be a much bigger magnifying glass on the container contents early in the process as opposed to the moment of deployment.


> The value of debian is not that they package (or repackage) everything into deb files but that they resolve versioning and dependency conflicts, slip security fixes into old versions of libraries (when newer versions break API/ABI), and make it possible to integrate completely disparate software into a system.

Maven has exactly the same capabilities as deb does - you can depend on versions, depend on a range of possible versions, exclude things that conflict and so forth. And it puts even more emphasis on fully reproducible builds (with the aid of the JVM) - in that respect it's closer to nix than apt.

> But the sysadmin does feel responsible. He is responsible for the decisions you make, so if you seem to be making decisions that help him (like making it easy for you to get your software into debian) then he finds it easier to trust you. If you make him play whackamole with dependencies, and require a server (or a container) all to yourself, and don't document how to deal with your logfiles (or even where they show up)

Wow, self-important much? Too many sysadmins seem to forget that the system exists to run the programs, not the other way around.

> If you make him play whackamole with dependencies, and require a server (or a container) all to yourself, and don't document how to deal with your logfiles (or even where they show up), how or when you will communicate with remote hosts, how much bandwidth you'll use, and so on

On the contrary, maven makes the requirements much simpler. I have literally one dependency, a JVM, so it can run on any host you like (no need to worry about shared library conflicts with some other application). It needs to download one file (shaded jar) from our internal repo, and execute one command to run it. That's it.
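
i.e. the whole deployment is roughly (host and artifact names made up):

    curl -fsSL -o myapp.jar https://repo.internal/releases/myapp-1.4.2-shaded.jar
    java -jar myapp.jar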

> That's what Maven is. It's a surprise box that encourages shotgun debugging and using ausearch features to do upgrades.

No, it's just the opposite. All the dependencies and project structure are right there in declarative XML. It's what make should have been.


> No, it's just the opposite. All the dependencies and project structure are right there in declarative XML. It's what make should have been.

When make was written most machines would have just exploded at the sight of a typical build.xml, and downloading tens or hundreds of packages from anywhere was simply out of the question.

Also, 'dependency' means something completely different in make as opposed to maven - I don't think modern build systems even care much for make-style deps.


> When make was written most machines would have just exploded at the sight of a typical build.xml, and downloading tens or hundreds of packages from anywhere was simply out of the question.

Sure. But the notion of doing things declaratively existed (Prolog predates make by five years). And the biggest difference between make and the scripts that preceded it is that it's more structured, with a graph of targets rather than just a list of commands.

If you add the ability to reuse libraries of targets (something that sort-of exists via implicit make rules), restrict targets to something a little more structured than random shell commands, and - yes - add the ability to fetch dependencies (including target definitions) from a repository, you end up with something very like maven.


> The value of debian is not that they package (or repackage) everything into deb files but that they resolve versioning and dependency conflicts, slip security fixes into old versions of libraries (when newer versions break API/ABI), and make it possible to integrate completely disparate software into a system. They also have a great track record at it.

The analog of the aspects of what Debian does that you're talking about here in the HStack world are companies like Cloudera, who, surprise, make their stuff available as debs and PPAs.

Building your own Hadoop from source and complaining that the resulting product is unvetted is sort of like doing a git pull of all the Linux dependencies and building that.


But the problem is that the ones doing the vetting (i.e. Debian) have given up on making a vettable distribution because the build is so broken.


But Debian isn't the Debian of Hadoop. Cloudera is.

Why should we assume the Debian Foundation is the sole trusted source of every type of software?


What you are saying makes no sense.

And yes, if I'm using Debian and didn't add any PPA or extra sources, then the Debian Foundation IS the sole trusted source of software. And you do that because you know they won't fuck up the system, which (and that's the whole point of this thread) the others certainly don't.

Now Debian is telling you: we see no way to distribute this software and guarantee what you are getting or that it won't fuck up the system.

Do you think I'd consider installing that junk?


So everyone who runs Hadoop is installing junk? Seems like plenty of other companies have been able to build businesses on it without adhering to your Debian-only rules...


Signed packages isn't about just being signed.

I could sign anything I like, but that doesn't make it any more secure for you to curl it into /bin/bash.

Signatures are about who signs it, and that's not something mvn has solved at all. Mvn is a free-for-all of binary code that very well could own my system, whereas debian is a curated collection of software which the debian maintainers have signed as being compiled by their systems with no malign influence and having met at least some bar.

I'd trust foo.jar signed by debian over foo.jar signed by bobTheJavaBuilder@gmail.com anyday... and mvn only gives you the latter.

So yeah, sure, they're signed, but it doesn't actually matter if you don't take the time to hook into the chain of trust (and believe me, mvn does not ask you to trust your transitive jar dependency 50 down the line) or have a trusted third party (debian) do their own validations.


> whereas debian is a curated collection of software which the debian maintainers have signed as being compiled by their systems with no malign influence and having met at least some bar.

And not only that, by shipping the source and requiring that binaries can be built from the source, who signs it is no longer blind trust. Others can audit it.

Reproducible builds should improve this even further.


> whereas debian is a curated collection of software which the debian maintainers have signed as being compiled by their systems with no malign influence and having met at least some bar.

This comes with a huge tradeoff, and I guess it's that tradeoff that makes developers like myself opt to sometimes even pipe the cURL to bash. I almost never download any software I actually plan to use through official system repositories, because whatever comes out of apt-get, it's almost always two years behind the last release and missing half the features I need. Sure, I'll apt-get install that libfoo-dev dependency, because I don't care what version it is as long as it's from the last decade. But for any application I actually need to use, it's either git repo or official binary download.


> whatever comes out of apt-get, it's almost always two years behind the last release and missing half the features I need

As a sysadmin, I love that, but I've had to come to terms with the fact that some developers have the attention span of hummingbirds who had Cap'n Crunch for breakfast ("two years old" is still very new software from an administration perspective).

So, I've basically accepted the fact that whatever stack the developers use I'll build and maintain directly from upstream -- with the price being that the version and extensions/whatever used are frozen from the moment a given project starts.


"Small jars that do one thing well. "

Oh, the "unix" philosophy.


Who can't read a Makefile? Who can't at least read the output of make -n? It's terrifying to me that you're suggesting that people can't and don't.

It's not even a security thing. I've had poorly-written Makefiles that would have blown things away thanks to an unset variable on a certain platform, for example.
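
For the curious, auditing what an install would actually do is just a dry run away (for a typical autoconf-style project):

    ./configure --prefix=/usr/local
    make -n install | less    # prints every command install would run, without running any of them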


> Who can't read a Makefile? Who can't at least read the output of make -n? It's terrifying to me that you're suggesting that people can't and don't.

Can I read a Makefile? Sure. But 90%+ of Makefiles these days are 12,000-line automatically generated monstrosities. It's not worth my time to open the Makefile in a text editor on the off chance it isn't one of those, and I'd be amazed if many people did.

make -n you can do I guess. But unless you're also auditing all the source code I'm not sure there's a lot of value in it.

> It's not even a security thing. I've had poorly-written Makefiles that would have blown things away thanks to an unset variable on a certain platform, for example.

Yep. Maven doesn't do that.

