The sad state of sysadmin in the age of containers (2015) (vitavonni.de)
558 points by xg15 4 months ago | 426 comments



Ex-Amazon here. Most grumpy system engineers did not disappear: we got hired by Google/Amazon/etc. to build large-scale infrastructure... and sometimes sell it back to you as a service.

Believe it or not, most of the underlying infra does not run on the popular technology of the year. Far, far from it. That's why it works.

Modern devops, with its million tools that break backward compatibility every month, sometimes becomes the running joke at lunch.


Grumpykins here. I think the term "Modern devops" sort of nails it, but not quite how you used it. Most departmental/enterprise sysadmins/engineers of lore who had even the slightest necessity for a life outside the box scaled anything resembling automation to its breaking point. Combine that with knowing and serving the reasons for their existence (developers, users, etc.) and devops is nothing new: it is simply the necessary manifestation of progress at scale (albeit positively devoured by managers speaking business, not entirely unlike "agile").

"Use what works" definitely presents a lot more choices these days and likely will forever more.

"Use what works well" is something different where "well" implies helpful, dependable, predictable, manageable and so on that will continue to scale with your needs. Only breaking things down the "old-school" way will lead towards success, stability, security and life outside the box.

Good devops is still, primarily, good engineers engineering good things, for themselves and others.

Granted, the article is from 2015, but my impression is the author is not just cranky, but scared.


Agreed. DevOps has been a thing for a long time. The funny thing is that the core of the DevOps philosophy, to unite development and ops through code, is still a rarity. SO MANY big companies have entire departments for DevOps that are basically either developers doing release management or sysadmins writing pseudo-code for infrastructure while making it somehow less accessible to development teams.


DevOps is uniting dev and ops through code? I believe uniting them through good (early, often, honest, etc.) communication and collaboration, instead of "you broke x," "you have to fix y," and "this is the other department's (dev or ops, depending on who says it) fault," is the first and most important thing in DevOps. The code is just a tool to make this collaboration easier and automate it, once the other things have already happened and everyone has understood they work towards the same goal and not as enemies.


A good sysadmin would not look like they are doing much work (everything is humming along and can self-heal minus physical problems), but a good devops person is constantly busy.


I sort of agree, but a good sysadmin was never idle on the inside. I'm seeing good devops people getting worked well beyond what I'd consider reasonable expectations, i.e. "Oh, look, you can do everything! Here's everything!".

They're being perverted into a role carrying a full load of pure operations with shit for processes (and, often, systems) and an expectation that you have time to automate and shore up all the shit and technical debt accumulated along the way.

Can most good or even extraordinary developers simultaneously be elbow deep on a dozen unrelated products and actually get reasonable traction? I can barely keep one glass castle together, myself.


This is exactly my sentiment and why I moved away from SRE and back to SWE. I felt busy all the time doing development of tools and infrastructure while at the same time absorbing the role of operations.

I never had the time to properly finish a project that I was proud of delivering. Turning those into services so we could leverage self-service was a dream that most of the time never happened; we were left with half-done systems requiring tons of manual intervention (lots of toil) while having to move fast to the next thing...


I think a lot of data engineers and transcoding folks would have similar reports. But you’re right; the problem with DevOps is the reach of their usefulness. If your whole company is built on code, your DevOps team will always be overworked and under appreciated.


Great sysadmins get fired unless they learn to pretend to look busy. They automate everything till they have nothing to do.


Logging is your friend here. You can spend days scrolling through logs, doing an occasional grep and making disapproving noises. Bonus points for developing some graphs for the next meeting.


My introduction to statistics came as a sysadmin, trying to show that I was doing 'stuff'.


Can you elaborate on that?

What's a "devops person" to you, and what keeps them busy? And why can't they have systems that hum along, self-healing and automated?


Most of the tools of the modern "devops person" are undergoing constant development themselves.

So, not only is there the responsibility of creating this self-healing automated infrastructure, but keeping tools and resume up-to-date, as well.


What fascinates me about this is, and sorry for being morbid, but what happens when y'all die? Does knowledge of the lower levels of the stack go away with your generation, or will there be enough of us young ones picking the important stuff up?


There's rarely anything old school sysadmins have learned that hasn't come from experience.

Been there, done that, fought that shit the first time. And the second. And the third. (It's amazing how often I find myself solving what are essentially the same problems over and over.) It's one reason you'll find we push back on the "ohh shiny". There are many wonderful and fascinating things coming out. Tech is an amazing field to be working in. But it's also ridiculously frustrating, because no one pays any attention to _why_ things are done the way they are, or _why_ approaches haven't worked in the past. (I'm all for re-introducing past failed approaches, as long as there's evidence those reasons have been investigated.)

You'll find a common trend amongst us in that most of us sort of ended up in the role accidentally. Schools and colleges teach you to become a developer. Few people head to college with the view of specialising in the ops side of things.

Even speaking as a comparatively old-school sysadmin, my strengths come from being flexible and adaptable. What I do today is nothing like what I was doing 5 years ago, and what I did then is nothing like what I was doing 5 years before that, and so on down the line. The field is constantly in flux.

I just have the best part of two decades of experience to both anticipate the problems and get to a diagnosis quicker when things do go wrong.

Even as the older sysadmins die off I'm fairly confident there will be newer ones to replace them, because people are going to continue to learn from the problems they run in to.


Ansible is one "ohh shiny" thing that has greatly increased my productivity as a sysadmin. Before that I would automate what I could with ssh and pdsh and scripts, but it was never as well polished as Ansible.

I'm even using Ansible for ad-hoc stuff (tweaking a config, restarting a service) because it's easier to do that from a management server than to log in to some remote host, get oriented as to the OS distribution and version, and run commands in the shell there.
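For instance, a one-off restart across a group is a single command (the "web" group and nginx here are stand-ins for whatever your inventory actually holds):

    # ad-hoc run against an inventory group, with privilege escalation (-b)
    ansible web -b -m service -a "name=nginx state=restarted"
    # a one-line config tweak works the same way
    ansible web -b -m lineinfile -a "path=/etc/sysctl.conf line='vm.swappiness=10'"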


I like Ansible with Vagrant as well - it makes for a nice, clean way of deploying to development environments while also being nicely self-documenting and not limiting (you can drop back to shell). It's a lovely tool for the most part.

Edit: The thing I really like about Ansible is how unsexy it is. It's just a nice, sane way of doing largely what you could do yourself with ssh and bash, but in a language that doesn't make you want to cry.

I've been around Linux since the '90s, and Ansible feels comfortable, predictable and stable - what you would want in a piece of software that can be mission-critical in the most fundamental sense.


It would be even nicer if people weren't advocating it as a configuration management system, only as what it is: a deployment system.


I think I achieved my perfect balance of tooling for current systems with Ansible and Docker.

Ansible even automates provisioning in AWS. I never really liked CloudFormation's way of creating stacks, so I began to use Ansible to document the application's stack, and have used it to deploy systems running in EC2 with RDS, ElastiCache, SQS, SNS, DynamoDB, etc.

After provisioning/configuring, I'd end up with an instance in EC2 with Docker installed, and from there our CI/CD would just trigger the deployment playbook, which would simply do a `docker pull` of the version tagged for release and start the container.
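In shell terms, the playbook boiled down to roughly this on each host (registry, image name and ports here are invented for illustration, not our actual setup):

    # what the deployment playbook effectively ran on the instance
    docker pull registry.example.com/myapp:1.4.2
    docker stop myapp 2>/dev/null || true
    docker rm myapp 2>/dev/null || true
    docker run -d --name myapp --restart=always -p 80:8080 registry.example.com/myapp:1.4.2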

Ansible also helped install our Splunk forwarders, as running them from Docker was still a hassle not so long ago. So we had the best of both worlds: configurability of the host machine completely with Ansible, and packaging and predictability of deployment through Docker.

I advocate this stack as simple enough to learn and use, with widely used tools, minus their fancy (and often broken) features. Even though they can still be a bit immature, they are production-ready enough.


Who advocates it as a configuration management system? I've not seen that...



Everyone compares it to Chef and Puppet


Ansible? I could see using it to build a container that then got "orchestrated", I guess, but ... hmmm. I've never really looked at them as doing the same thing. (Nor have I looked at either Chef/Puppet for CM. Maybe I'm just stupid... My ex- certainly thinks that I am...)


It calls itself one, I imagine many people also consider it one. What is a true configuration management system?


I used puppet -- unhappily -- for a while before discovering SaltStack.

Salt, like Ansible, is a response to the Puppet/Chef hegemony (Ruby and DSLs and tacked-together bits that are a nightmare to install and upgrade in themselves).

I'd suggest Salt did some things much more cleanly than Ansible, and while it can be an orchestrator and a deployment system, it also excels as a configuration management system... but I think typically people are talking about configuration management systems that are tightly coupled with more formal change management systems (of which I've found pretty much none that work well).


It's a legit concern. There was a NANOG panel about this exact thing. I believe the quote was, "Take a look around. We're all old and greying. We have a severe pipeline problem." And then, much to the AWS dude's dismay, the topic shifted towards blaming cloud services, because no one takes the time to learn how any of this works anymore.

Want to guarantee your child's future employment? Don't just teach them to code (the machines will do that). Teach them how to build networks and truly understand network protocols.


I’m going to teach my children how to navigate the world of insane Harry-Potter-esque rules which all IaaS/PaaS platforms enforce upon you. They will become software language lawyers and be masters of the electric Disney dollar.

You know like “ahh don’t call the messaging endpoint more than 800 mega-milli-times per mega-nano-second or it will cost you three bazillion CPU credits, but only on three and a half cores which will starve all your instances, issue an invoice and proceed to melt your credit card.”


DDoS is now a billing issue.


> Teach them how to build networks and truly understand network protocols.

I don't know how the situation is in the US, but in my country network engineering is actually quite a popular field of study (we have college-level education in network engineering).

The one thing that stands out, though, is that it's mostly done by youngsters who either have sysadmin experience or worked in IT before. Almost everyone who comes straight from high school goes into Software Engineering.

I think this is mainly because networking is quite an invisible field, so to speak. Many people don't even know your job exists, and many young people only see the shiny, hip side of things (Software Engineering).

Being good at network engineering is hard, especially once you get past entry-level work and actually start being responsible for designing large-scale networks. Mainly because building a network is a major financial investment where guaranteeing performance is hard without either a ton of experience or a shitton of lab time.

This kind of work pays very, very well though.


What country?


Netherlands


> Don't just teach them to code (the machines will do that). Teach them how to build networks and truly understand network protocols.

Building networks and understanding protocols is something machines can do TODAY. The entire internet was built to survive a nuclear war. It can reshape itself, and it most definitely understands network protocols. By definition, protocols are the language of machines over the network. And it's been like this for a few decades.

The only reason for knowing low-level network protocols, in a world where machines can code anything (which makes them better analytical thinkers than humans), is to beg the machines for mercy in their ancient tongue.


> It can reshape itself, and it most definitely understands network protocols

It really, really can't! That quote about the Internet interpreting censorship as damage and routing around it? Or the one about information "wanting" to be free? Taken wildly out of context.


Then what about routing protocols such as RIP, OSPF, etc? I'm sorry, but I don't know what quotes you're referring to.


Routing protocols convey state information between routers, but really that is just table stakes.

So, what do network engineers do? Classically: set up and troubleshoot those systems. Currently: transitioning from manual work to building systems that deploy, monitor, and remediate routers and such. In other words, the same stuff sysadmins->(SRE|PE) folks do, and undergoing a similar transition.

SRE methods are quite applicable to neteng: https://landing.google.com/sre/book/chapters/eliminating-toi...


> Then what about routing protocols such as RIP, OSPF, etc?

Don't forget BGP. I don't think they do what you think they do, at least, not to the extent you think they do them. There is a hell of a lot of manual work in running any sizeable network even within a single organisation.


And what exactly do they do that I don't understand?

> There is a hell of a lot of manual work in running any sizeable network even within a single organisation.

I'm not trying to say they do everything and no manual work is required. I'm trying to say machines are already doing part of it. OP believes that, given a world where machines can code, they can't design or maintain networks, which I find truly ridiculous, since machines are pretty far from doing any kind of "coding" today, but they do networking and network protocols pretty well.


> since machines are pretty far from doing any kind of "coding" today

The first attempt at a system to turn plain English that even managers could write into executable code was COBOL, in 1959. So you're right in a sense, even 59 years later - but also wrong if you think networks are any more advanced than this. The Internet really cannot "reshape itself" and probably never will be able to.


RIP and OSPF are interior routing protocols---that is, they're used for routing within an organization (or autonomous system in Internet lingo) and deal with technical routing issues (fastest link, most bandwidth, etc). BGP is for routing between organizations and deals more with political issues than technical ones (we need to send all traffic here due to contracts, unless it goes down, then shift traffic over there; and refuse routing information from such-n-such organization because they don't have their act together).


Routing protocols are how computers communicate network admin instructions to each other really really fast.

They're still just tools wielded by humans, even at a distance. Even today.


Moreover, who is going to write the next protocols we need?

This isn't a field that is done inventing. Not even remotely close.


OP was talking about a future where developers become obsolete because machines take over the development sector. Do you think writing protocols will be something humans will do better than machines?


Yes. Machines might come up with something that's 'good enough' though.


Been working in this industry since '86.

You lose and gain and you should always be mindful of what is coming. Humility is good. I love you young guys, your ideas keep coming and they are mostly good.


Same problem as making sure your system doesn't lose data when a server dies. Make sure you have enough copies of the knowledge by propagating it between people. Try to have some kind of offline recording (books?) for recovery from a disaster where you lose everyone. Have an idea of how to recover at a business level if you do lose the data forever.

The trouble is making sure these plans actually work. This is why Netflix randomly execute some of their employees every month.


> This is why Netflix randomly execute some of their employees every month.

I certainly hope this practice doesn't start catching on with other employers.


"This is why Netflix randomly execute some of their employees every month"

Would that be their King Kong application?


That's part of the reason why my last two hires have been at the beginning of their careers. For both of them, it was their first major sysadmin responsibility, after jobs involving tech support and occasional Linux experience.

The key is to pick smart people who are good at learning and find complex systems interesting. Then, of course, you need to have interesting projects for them to work on.


It's a very real concern. We have a thing called a "bus plan" for all of our tech employees (6 of us - small non-tech company). It basically attempts to cover everything that we would need to know if one of us gets hit by a bus.


They'll never die.

Some monkeys will just keep swinging branch to branch. Some monkeys check out the tree.


What makes you think there aren’t young people doing systems administration? Our last two hires in my most recent job were 23 and 27 respectively. Sure, they’re getting trained in the new hot cloud stuff... as the grumpy seniors figure it out first and set patterns... but they are still doing daily work with some rather ancient stuff.


I'm not saying there are no young people doing sysadmin. What I'm trying to say is that if the new 'infrastructure' that all sysadmins learn is not an open UNIXy system where you can grok all the internals if you want to, but closed systems owned by 2-3 major cloud players, then we kinda maybe have a problem in 20 years?

Of course, one can argue that that will just cause a new wave of openness and the cycle continues.


As a "young" (30) sysadmin/devops dude I think that open, Unixy system is Kubernetes. I can take an application, dockerize it, write a helm chart and run it anywhere.

The risk is in treating anything as a black box, whether its a managed service or a container you pull from Dockerhub. It's something you'll get burned by eventually and need to learn from experience.


You set up and manage K8s and all the kernel tunings for the host systems so the NAT layers are optimized?


Yeah, and my point was that we start the young people that we hire on the open stuff. Then we move them up and on to the other open stuff, which runs on top of the cloud vendors.

(As almost everyone else points out, the closed cloud vendor stuff is nowhere near flexible enough for most moderately complicated use cases unless you’re running at serious scale.)

This is a false crisis.


My company seems to split responsibilities based on age. The 40/50-year-olds deal with Oracle Linux, AIX and Solaris. The under-30 hires are more focused on cloud and mobile. We're all expected to have a footing in Windows and Oracle DB.


Everyone will be running on AWS, Google Cloud and Azure, who know how to operate infra that doesn't crash all the time.


> That's why it works.

Touché. Remember, there's value in battle-tested and proven, e.g. https://www.quora.com/Why-do-satellites-use-old-processors


Would be interested to get your opinion on Puppet/Ansible/Chef/CFEngine/SaltStack


CFEngine is basic text manipulation; it's not comparable to the rest.

Puppet and Chef were the first generation. I wouldn't recommend them. All the companies and people I know using Chef migrated away from it after many disasters. Nowadays, it's only mentioned in interviews to find out whether candidates have real-world firefighting experience.

Ansible is good. Used that for managing hundreds of machines at multiple jobs (some of which migrated from Chef). It's been bought by Red Hat, it's well maintained, and I think it has the brightest long-term future.

Not sure about SaltStack. Never had the opportunity to try it. I'd be a bit worried, though, about its long-term prospects, because I don't think it has much backing or user base.


> Not sure about SaltStack. Never had the opportunity to try it. I'd be a bit worried, though, about its long-term prospects, because I don't think it has much backing or user base.

SaltStack is a well-thought-out solution, in my opinion. It makes more logical sense and is less of a muddled mess than either Chef or Puppet, and it has miles better performance than Ansible.

I know quite a few shops who use it. It's definitely smaller than Ansible, though.


+1 for Salt -- I wish it had better docs or examples of how to build out a larger system; it's hard to start with, IMHO, even if you know Ansible well. The existing docs read like man pages, without even the helpful examples.

At my last gig, I wrapped Salt deploys with a small Slack bot, so users would fire deploys from Slack; you could see what was going out and who was pushing. It was a very, very nice, simple, fast solution that should scale to hundreds of machines easily.


SaltStack is around. Lots of big orgs take the time to understand it. Ansible is more popular because you can use it with just one playbook; SaltStack requires you to think about your environment and design your configuration management properly.
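For anyone who hasn't seen it, the basic master-side interaction is small; a sketch (the 'web*' glob and the nginx state are invented examples):

    # from the salt master: target minions by glob, then apply a state
    salt 'web*' test.ping          # confirm the targeted minions respond
    salt 'web*' state.apply nginx  # apply the nginx state tree to the matches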


I use Salt across multiple thousands of machines, and I feel like I've barely scratched the surface of what it can do. I wrote some custom utilities for it, and added some functionality to handle physical deployments of an OS with Redfish (the new iLO/iDRAC API).

Salt is not without warts, but it's definitely worth checking out.


CFEngine, at least version 3, was probably the furthest away from string manipulation (and I was given the impression that manipulating text file content was considered a bad idea with it). What killed it was promise theory, which is actually a great theory and works quite well, but it made writing the bundles painfully hard, and the results hard to maintain. Also, during the early days of v3 it was probably lacking a ton of essential functions, so even if you were trying to do things the right way you would bump into feature limitations. I think this put a lot of people off adopting it widely, and is why Chef and Puppet did so well.

Puppet and Chef are actually quite good, and I still prefer them to Ansible for a number of reasons. I've certainly run them fine in environments of many thousands of servers, though I can understand that they can implode for some people at scale if they design their deployments or structure their manifests/cookbooks a certain way. That said, I've certainly seen Ansible fold on much smaller infrastructure, but that is also down to a number of factors that can be avoided or mitigated. Idempotency with Puppet is really strong, which is something you want if not every single system in your environment is ephemeral. With Chef it's almost as good, though not always on the first run; with Ansible you have to specifically consider and aim for it when writing your playbooks.

The fact that you get used to having Chef or Puppet run, e.g., every half an hour is a good thing, where Ansible runs are more ad-hoc. This leads me to another thing that bothers me: people treat e.g. Puppet and Ansible as if they're conflicting choices for the same tasks. They have a lot in common, but Puppet is more for managing and ensuring changes in an idempotent, non-conflicting way, while Ansible is more for ad-hoc or orchestration tasks. I think it's good to use both, but also to be sure what you use each for, since either can do a bit of what the other is good at but doesn't do it so well.

For example, I would consider using Ansible to do deployments and releases, rotate SSH keys, execute failovers, or even to install the Puppet agent for the first time. I would use Puppet to deploy and update monitoring agents and configuration, user access, ensure directory permissions, configure system things like rsyslog, logrotate, Postfix, ntp, etc.


> This leads me to another thing that bothers me and that is where people think it's a situation of having to use e.g. Puppet or Ansible as if they're conflicting choices for the same tasks.

That's mainly because Ansible folks advertise it as a configuration management tool, while in fact it's a deployment tool. The former needs asynchronous operations, especially because a node that is supposed to be reconfigured can be temporarily down. The latter needs to be executed synchronously, with reports being read as they come by an operator.

There are several other operation modes that are useful for a sysadmin, like running a predefined procedure with parameters supplied from the client, or running a one-off command everywhere (even on the servers that are currently down, as soon as they are up), but we don't have many tools to cover those cases.


I make my living as a CFEngine consultant. CFEngine runs every 5 minutes (it's lightweight enough to do that). The evolution was: CFEngine 1 ran once a day; CFEngine 2 ran once an hour; CFEngine 3 runs every 5 minutes. Self-healing infrastructure.


The concept of self-healing is a bit weird for me. Surely you want to investigate the cause before it heals?

Funny that we have tools like tripwire which have the opposite idea of the world.

My dream would be to have both functionalities in a single tool.

Bidirectionality! If you solve a problem on one machine you could pull that fix then push the same fix out to other machines as a preventative measure.

Some mix of git/osquery/augeas could do this.


> Ansible is good. Used that for managing hundreds of machines at multiple jobs. It's been bought by Red Hat, it's well maintained, and I think it has the brightest long-term future.

A lot of folks I know have been bitten by Ansible's performance (Ansible has a central master that runs playbooks against each node, rather than having nodes "pull" from a central master).


Ansible has a pull mode that can be turned on. There are some trade-offs with it from the normal operating model, but it's there when you get large enough to need it.

https://docs.ansible.com/ansible/2.4/ansible-pull.html
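Roughly, each node runs something like this itself, typically from cron (the repo URL is a placeholder):

    # the node fetches the repo and applies local.yml against itself
    ansible-pull -U https://git.example.com/infra.git local.yml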


Ansible has a very, very low barrier to entry. You go from 0 to 100 in a very short time. It makes a lot of sense to use it when you just begin building your infrastructure.

Later on you can run Ansible Tower, deploy Ansible agents everywhere, and basically use Ansible under the same client/server model like all the other tools.

Salt is eerily similar to Ansible; it's just geared towards client/server. Being experienced with Ansible, it was weird at first to use Salt because everything looked familiar, yet slightly different.


You can also get Tower's functionality with the free version called AWX, although not as polished. https://github.com/ansible/awx


I believe you are talking about Ansible tower, the paid tool from RedHat that gives a centralized server.

Ansible is not centralized. It configures servers over SSH and can operate from any user or host that has SSH access.


Yeah, any user or host singular. Execution of a playbook is driven by one machine which can be a bottleneck.


Yeah but when you run a playbook it's running from a single machine which is calling out via SSH


Not necessarily from a single machine. It's pretty easy to divide your network and control the divisions from git clones of your Ansible files.

Ultimately you could have a git clone for every machine and only ever run it against localhost.
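Something like this on each machine, say (repo URL invented):

    # no central master: the node applies its own clone locally
    git clone https://git.example.com/infra.git /opt/infra
    cd /opt/infra && ansible-playbook -i localhost, -c local site.yml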


Yes. The host will run 100% CPU to handle the hundreds of SSH connections.

I've been reconfiguring 300 to 800 hosts many times a day and never had a problem. I think it would take a few thousand hosts for the performance to be noticeably slow, and I am really not sure that other tools or systems could take it much better.
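Worth noting the default is only 5 parallel forks, so a fair amount of "Ansible is slow" pain is just that knob (50 below is an arbitrary value):

    # raise the fork count so a run fans out over more concurrent SSH sessions
    ansible-playbook -f 50 site.yml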


I know our SREs once screwed up the config for sshd, and considered themselves very lucky that they had Puppet on the machines and could push a fixed configuration (if they had used exclusively Ansible, that'd have been the end of it - no way to connect or to deploy a new configuration).

[edit] To clarify - ansible is great, and we use it. Just saying that, as everything, it still has (sometimes subtle) downsides in various scenarios. If it works well for you - great, but maybe others really were bitten by it.


There's nothing stopping you from having an sshd instance dedicated for use just by Ansible, on a different port/different network, on every node. Whether that's simpler or more complex, I don't know.

But "have two ways in" is a basic principle of sys admin (typically via traditional network and some out of band console access).


When I worked with physical machines, they had embedded management systems, which were on a physically separate network from the machines' main interfaces, ran a little embedded SSH server, and would (amongst other things) give you a console on the machine.

Simpler machines should still have serial consoles, and you can get those on the network via a terminal concentrator or a serial-to-ethernet adaptor.

I would love it if Ansible could control machines over an interface like that, rather than via SSH. Then you wouldn't even need to run SSH on machines which don't need it, which is most of them.


Well, teach your sysadmin to use the system configuration tester when they edit a system configuration file.

Nothing to do with Ansible really, except that Ansible lets you prevent that easily.
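For the sshd case above that's a one-liner, and Ansible's template/copy modules have a validate parameter for running exactly this kind of check before a file is swapped in:

    # test the candidate config first; sshd -t exits non-zero on errors
    sshd -t -f /etc/ssh/sshd_config.new \
      && mv /etc/ssh/sshd_config.new /etc/ssh/sshd_config \
      && systemctl reload sshd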


> Well, teach your sysadmin to use the system configuration tester when they edit a system configuration file.

Wrong. Teach your sysadmin not to overload a single service with different functions (debugging channel, user-facing shell service, running remote commands, file upload, and config distribution channel), especially not one that should not be used in batch mode without human supervision.

When you write an application, you don't put a HTTP server in database connection handling code, but when it comes to server management, suddenly the very same approach is deemed brilliant, because you don't run an agent (which is false, because you do, it's just not a dedicated agent).


Are you advocating running multiple sshd instances in this case?


Good heavens, no! You'd only have two different instances of the same service, which is difficult to work with correctly.

For serving as a debugging channel and user-facing shell access, SSH is fine (though I've never seen it managed properly in the presence of nodes being installed and reinstalled all the time). But for everything else (unattended):

* you don't want commands execution, port forwarding, or VPN in your file server

* you don't want remote shell in your daemon that runs parametrized procedures -- but you do want it not to break on quoting the arguments and call results (try passing shell wildcards through SSH)

* you don't want port forwarding and remote shell in config distribution channel; in fact, you want config distribution channel itself to be reconfigured as little as possible, so it should be a totally separate thing that has no other purpose whatsoever

* you don't want to maintain a human-user-like account ($HOME, shell, etc.) for any of the above, since they likely will never see a proper account on the server side; you want each of the services to have a dedicated UID in /etc/passwd, own configuration in /etc/$service, own data directory, and that's it

Each of the tasks above has a daemon that is much better at them than SSH. The only redeeming quality of SSH is that it's there already, but it becomes irrelevant when the server's expected life time gets longer than a few days.


Yes, because everybody knows that testing eliminates all bugs.

(it's not that testing is useless - far from it; but I thought the HN crowd knows better than to respond to issues with "that's because you didn't do enough testing!")


I'd venture to say you're wrong about Salt. It's being used at some large enterprises. I use it (at one of the large tech companies) on thousands of servers, with plans to up that by an order of magnitude or more. Of all of the solutions mentioned, it has been the most powerful, while also being the most scalable.

Other than that, my experiences line up with yours almost exactly.


I love SaltStack; it's more of a Python framework for managing systems over ZeroMQ than it is pure configuration management. Compared to Ansible it's more complex, but it's faster, reactive, and significantly more flexible. I'd highly recommend it over Ansible for larger environments. For smaller ones, it depends on whether the steeper learning curve is worth it.


Ansible starts getting painful around 1500 nodes.


+1 for Ansible from me.

Of all the tools, I heard of Puppet first, and so I'm assuming it was first on the scene? From my limited experience, it seems Puppet is the most widely used tool for that reason. Not necessarily the best of the bunch, but first on the scene. Considering the effort required to roll these tools out, I assume whatever is deployed first will stay the tool of choice.

I've tried out Puppet, SaltStack, and Ansible, in that order.

What I didn't like about Puppet is that once you deploy a change, the actual change can happen on the "client servers" at any point within the next 20 minutes. I may be off on the exact duration, but I remember that changes were applied at any point within that range of time. To me that does not sound like a great idea. What if you want to switch over your web servers at a specific moment? And Puppet requires a dedicated command/control server.

Next I tried SaltStack. I liked it enough. Now that I think about it and hear someone else mention it, yeah, SaltStack is similar to Ansible. What drove me away from SaltStack was that you essentially need a dedicated command/control server from which all SaltStack commands are sent out to the SaltStack "client servers". I did not want to dedicate resources (and money) to a server that is rarely used. And the personal web/lab servers I manage can grow and shrink from 2 servers to 10.

Next I tried Ansible. I think Ansible is the perfect choice for me. I only needed to 'devop' a handful of servers, and I also wanted to learn a tool that many businesses seemed to want on a resume. So I picked Ansible, and it's been great. Some operations are not as flexible as doing them with a shell script (and I assume the same issue exists for the other tools), but I've had good luck combining Ansible with little bits of shell script to get the result I need.

The best part of Ansible is that any Mac or Linux machine can be used as the "command server", provided that you have the SSH key pair on your Mac or Linux machine.

Lastly, some may not like the ad-hoc way of doing things on Ansible, but I prefer it that way.


> I heard of Puppet first, and so I'm assuming it was first on the scene?

CFEngine was first. It's based on a kind of maths called "promise theory", and it solved the problem of having many different kinds of Unix owned by many different groups and needing a consistent way of saying "all machines belonging to group X need to have user Y and package Z"; it would abstract away the slightly differing syntax between Solaris, SunOS, IRIX, OSF/1, Ultrix, yadda yadda. This is a problem that doesn't really exist anymore.

Chef, I think, came next. It was written by people who knew Ruby but didn't know maths, so they used CFEngine terminology like "converging", but Chef doesn't really do that; it just runs Ruby scripts. If CFEngine was a scalpel, Chef is a mallet. Chef and Puppet are related somehow - same group of devs had a falling out and went their own ways, something like that. They are much of a muchness.

Ansible is cool because it recognises the reality of why CFEngine isn't relevant nowadays: most organisations are running just one particular Linux distro so you can do away with the abstraction and get all the benefits without the complexity.


> it's based on a kind of maths called "promise theory"

Promise theory is not math, despite its name. It doesn't predict anything, it doesn't explain any phenomena. It's an architectural approach. Brilliant, led to a really great software (CFEngine), but it's not "maths".


Basic concepts of promise theory (10 minute video by Mark Burgess who came up with it): https://www.youtube.com/watch?v=2TPsB5WuZgk

2014 introductory article: https://www.linuxjournal.com/content/promise-theory—what-it

Basic book on the subject: https://www.amazon.com/Thinking-Promises-Designing-Systems-C...

It's not "maths" like arithmentic but it's "maths" like graph theory:

Promise Theory, in the context of information science, is a model of voluntary cooperation between individual, autonomous actors or agents who publish their intentions to one another in the form of promises. It is a form of labelled graph theory, describing discrete networks of agents joined by the unilateral promises they make.

https://en.wikipedia.org/wiki/Promise_theory


> It's not "maths" like arithmentic but it's "maths" like graph theory

It's less like graph theory and more like inversion of control: an architecture, not a set of theorems and their proofs. Even Burgess' own book you mentioned is nothing like a mathematical handbook.

I'm a great fan of Mark Burgess and his promise theory, but calling it a mathematical theory or a mathematical domain is simply incorrect.


I hear you.

The book I mentioned (Thinking in Promises) is the introductory-level public book.

https://www.amazon.com/Promise-Theory-Principles-Application... is the heavy-duty scientific stuff.

"In Search of Certainty" (https://www.amazon.com/Search-Certainty-Science-Information-...) is somewhere in-between.

I would say promise theory is its own kind of logic and notation. Thanks for the correction @dozzie.


Actually the sequence was CFEngine, Puppet, Chef.

See http://verticalsysadmin.com/blog/relative-origins-of-cfengin...


> [...] the actual change can happen on the "client servers" at any point within next 20 minutes. [...] What if you want to switch over your web servers at a specific moment?

You don't. Configuration management is the wrong operation mode for a synchronous change. Still, you could order all your Puppet agents to run their scheduled operation now instead of leaving them to wait for their time.
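e.g. by kicking the agents by hand, or over whatever orchestration channel you already have:

    # force an immediate catalog run on a node instead of waiting out the interval
    puppet agent --test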


Ansible all the way. Chef and Puppet have too much overhead in comparison. Ansible is agentless; you can either use a centralized server for deployments or have every instance configure itself. Also, Ansible is YAML-based, which is both a strength and a weakness.

Chef is a runner up. Love their community and Chef is pretty straightforward once you learn the lingo.

Puppet doesn’t really work for modern Git development workflows (Hiera and r10k are duct tape) and testing Puppet is kludgey. Also, most of the docs you’ll find for it stopped getting updated in 2015 or so.


I've used chef, ansible and saltstack in small startup and large scale enterprise environments.

Ansible is just about the easiest and most flexible thing going, but once you hit "very large scale" you're going to get bit by its performance and start worrying about when you actually update things. Ansible Tower starts to look good then, but it's not the well-walked path and brings you all sorts of other issues about how you distribute secrets to bootstrap things.

Chef is kind of nice when you don't have a lot of environments that you need to manage and about as flexible as you need it to be in those situations.

SaltStack shines when you really have a lot of heavy lifting to do and the Event System, Beacons and Reactors will honestly blow your mind with the complex things you can achieve in a way that's simple to reason about and maintain.

That said, there's really like 3-4 majorly different ways you can (or would want to) use Salt and understanding it and its documentation is a large cognitive investment. You will likely run into major pain at some point down the road if you choose to use it. I would only use it again if I had a really good reason to -- pretty much if there's no other alternative. I would not at all bother using it to try and do typical sysadmin automation tasks.

Strange side-note: The best managed Salt environments I've worked in or looked at were all masterless, whether at small or massive scale. It's my probably-wrong opinion that traditional master/minion SaltStack is always going to cause you enormous problems eventually when you need to either scale out or pivot on something.


I am genuinely curious: how much ops is hand-written in just plain old bash or simple scripts?


Bash is everywhere but in small quantities. If it's more than a page of bash, it's probably time to rewrite in something with stricter rules and fewer surprises, or better libraries, or both. Perl or Python is quite common at that level.


The Bash scripts are now contained in Dockerfiles. Much better :)


Is Puppet one of those tools?


Depends; heck, I have seen some guys do magic with CFEngine and FreeBSD.

(FreeBSD is a nice OS from an ops perspective anyhow, and has a lot more common sense and sane defaults in a lot of places.)


> Modern devops ... million tools ... running joke

So fucking true, bro


You are my hero.


> Modern devops, with its million tools that break backward compatibility every month sometimes becomes the running joke at lunch.

Ironically, modern devops and its million broken tools are a primary source of revenue for cloud providers, helping pay for your lunch in the first place.


You might be onto something: efficient designs are bad for cloud providers. On the other hand, shitty designs that get hacked are bad PR for them.

Thing is, none of it matters. Bottom line matters.

They care about attracting company decision makers. Decision makers are engineers in small-to-medium businesses, and managers in big ones. Sadly, it's the big ones that matter for the bottom line. So the target is mid-management: flashy PowerPoint presentations and 'conferences' that allow for justified travel and stay.

Good mid management, with tech background, exists, but is a minority.

Not all is lost, truth is out there (c) Mulder :))


Risk arbitrage.

Present capability vs. future catastrophic risk.

Technology is debt.


I honestly can't tell whether you're making a sincere but unintelligible argument, or just trolling.

Do you disagree with what I said?


Yes, re: last.

Tech-as-debt is a notion I'm playing with.

https://mastodon.social/@natecull/99318348047974414

https://plus.google.com/104092656004159577193/posts/jS1K9Mto...

Obvious antecedent: technical debt, Ward Cunningham.


The clearest explanation of why this happens is at the end:

> Before, admins would try hard to prevent security holes, now they call themselves "devops" and happily introduce them to the network themselves!

1) The merging of devs into the sysadmin role was a product of the work of sysadmins (particularly systems change control and security compliance) not being valued in our culture.

2) Devs were delighted to be free of the shackles placed upon them by sysadmins who were encumbered by the concerns expressed in this article.

If you were a devop who resolved to fix the problems bemoaned in this article, my guess is you would turn around in 60 days to discover you'd become a sysadmin.


I recall the idea of "devops" from this book: https://landing.google.com/sre/book.html

The stated goal of putting both systems administrators and software engineers on the same team is to reduce friction and increase communication. One of the worst, productivity-killing situations you can find yourself in when developing network software and services is caused by the traditional "old school" mentality of separating the two camps. When your software developers operate independently of your systems engineers and administrators they're forced to make assumptions about infrastructure, operations, and compliance goals. Both teams have the same goals so why are they not on the same team? I think some "old school" system administrators don't realize how costly such communication mistakes are. Getting 6 months into a development project to be told you cannot have a critical piece of infrastructure _for reasons_ is a costly, costly mistake.

Containers are a smart solution to the build problem. Don't build your containers from public, untrusted images! Build your own images. Run your own, protected registry. You still have all of the compliance and validation necessary, and you don't end up debugging failed builds because one machine out of a thousand is running some minor shared-library version not supported by your software.
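A minimal sketch of the private-registry part (registry:2 is the stock Docker registry image; the image name is made up, and a real setup wants TLS and auth in front):

    # run a private registry, then push an internally built image into it
    docker run -d -p 5000:5000 --name registry registry:2
    docker tag myapp:1.0 localhost:5000/myapp:1.0
    docker push localhost:5000/myapp:1.0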


> Don't build your containers from public, untrusted images!

The author is complaining that you can't build these private trusted images. Software developers have got it in their heads that containers are a way to package & distribute software. They're not; that's what the OS's package managers are for. If your software requires Docker as a build dependency, you have failed to properly package your software.

As a concrete example, look at Ubiquiti's UNMS.[1] Their package consists of downloading & installing Docker binaries on your system, not tracked by the OS package manager, and then running a bunch of containers built from these public untrusted images you just told me not to use.

They also conveniently ignore the fact that I already have a Redis server, I already have a PostgreSQL server, I already have an NGinx proxy. (Plus I guarantee my database servers are better tuned for my hardware than some random image from Docker's library.) It is not up to some random software developer where I should be drawing the isolation boundaries on my infrastructure. They also make the big assumption I want to use Docker to manage my containers in the first place. Perhaps my company already uses Solaris LX-branded zones, or LXC, etc.

Now imagine if, instead of spinning up a PostgreSQL database container, it used MS SQL as its database of choice. You think I'm going to let some random developer dictate whether or not I should spin up another SQL Server instance and pay MS for another round of cores/CALs?

Yes - you can build your own containers, and they're fantastic - if software developers properly package that software for ease of installation & configuration. Software developers should not be dictating what container/virtualization framework I use, what configuration management I use, etc.

[1]: https://help.ubnt.com/hc/en-us/articles/115012196527-UNMS-In...


There are public trusted images, like the so-called official repositories on Docker Hub [1]. As long as you build your images based on official repo images, you're probably fine. Just don't depend on untrusted images; instead get their dockerfile/config files, and build the images yourself.
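i.e. instead of pulling somebody's derived image, something like this (repo URL and tag invented):

    # fetch the dockerfile and build it yourself on top of an official base
    git clone https://github.com/vendor/app-docker.git
    cd app-docker && docker build -t myorg/app:1.0 .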

To me, a Docker image seems like an ideal way to distribute some proprietary device-management web software like Ubiquiti UNMS, rather than requiring some obscure version of some database or whatever other dependency to actually be installed and maintained by their clients. You can spin that image up on a server or group of servers, or on Amazon ECS or a bunch of other providers, in a matter of minutes. With enough motivation, you could even export the image and manage the environment manually.

[1] https://hub.docker.com/_/php/


It is the ideal way to distribute this kind of software. They just bundled way too much into one container.


This comment makes way more sense to me than the original blog post. Yes, nobody should be relying upon Docker as their distribution platform; that's pretty terrible. Ubiquiti, I've observed, seems pretty uncomfortable even just supporting the major distros. I actually wrote some Docker stuff to pull down their .debs, crack them open and install the binaries inside on a Fedora/CentOS system. That's closed source for ya.


> I actually wrote some Docker stuff to pull down their .debs, crack them open and install the binaries inside on a Fedora/CentOS system.

Why would you want to do that, though? Treat the whole thing as a black box running inside Docker and be done with it. The second you crack it open, you get to support it. Let Ubiquiti support it; after all, that's what you are paying them the big bucks for.


Because... they only offered .deb files and I wasn't running Debian or Ubuntu (nor do I like to bother with them in containers I'm building myself, because I have no clue about Debian).

The package in question has since finally offered .rpms, but I haven't had the time/interest to update it. This is wifi software I'm running personally; Ubiquiti only supports the Windows/Mac versions of it in any case.


Ubiquiti has always done this, even before containers were "hot."

If you install their rpms or debs for any of their properties, you're almost always getting a copy of Mongo or some other dependent service... and it is probably going to be incompatible with whatever version your package manager has or you're already running (version-constraints-wise, not actual compatibility-wise).

This is an indictment of Ubiquiti, not containers in general. If their software were properly built, they'd be shipping you a docker compose setup or something with N different containers that you could substitute out (at a network level) for your own.
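To sketch what "properly built" could look like, each backing service in its own substitutable container (names and versions invented):

    # every service is its own container on a shared network...
    docker network create unms-net
    docker run -d --name unms-db    --network unms-net postgres:9.6
    docker run -d --name unms-redis --network unms-net redis:4
    docker run -d --name unms-app   --network unms-net -p 443:443 vendor/unms:latest
    # ...so to reuse your existing PostgreSQL, skip unms-db and point unms-app at yours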


I once worked at a company which separated IT into 3 teams: developers, DB-sysadmin (ops), and QA (who also managed deployments). Releases were supposed to go in a waterfall model from the Dev group -> QA group -> Ops. QA wanted Dev to submit Word documents for each release with blanks to be filled in with server names. However Ops was so distrustful of Dev that it was not enough for them to lock us out of Prod using regular security tools, we were also not allowed to know the NAMES of servers in Prod or how currently deployed systems were grouped.

Every release was an Abbott-and-Costello "Who's on first?" routine. Do you have any idea how hard it is (especially in computing) to ask for something without being able to utter its name?

QA: "You left servername blank on this deployment document."

Dev: "I know; Ops won't tell me. Just ask them for where the service is currently."

QA: "Ops says there's 5 unrelated legacy services with that same project name, on different servers."

Dev: "5? I only knew about 3. You know, if I could query the schemas of the Prod DB, I could tell you in a jiffy which one it is."

Ops: "Pound sand. If you want look at databases that's what the dev DB server is for."

Dev: "Erm, OK well can I give you a listing of the Dev DB schema and you tell me if it looks like the one the Prod service is talking to?"

Ops: "Oh I see you want us to do your job for you now? You can compare the schemas."

Dev: "OK..."

Ops: "Just tell us which DB server you want the schema pulled for."

Dev: "But you won't tell me the server names."

Ops: "No."

My point is this is how bad communication can be when ops and dev are not on the same team.


I have trouble imagining the incentives driving Ops in that conversation.


Devs hardcode things in their software in a rush, making the software tougher to deploy and operate, causing greater incident rates and therefore page-outs. Devs interested in greater resilience and stability in their software should be opting for dependency injection of pretty much every damn thing in the world around them, whether it's a network service or a file-system location. Otherwise, presume that it can go away at any time. A common pattern among developers trying to save time, which costs more in the long run, is to hardcode a path to an executable. A simple /usr/local/bin/ path buried in an infrequent job, present on developer machines but never in prod, is all it would take to cause an incident in prod that costs the company millions. I say this both as someone who has written this kind of thing and as someone who has had to fix others committing the same error in their code, with QA passing it along.

Ops tends to be where the brunt of technical debt is truly buried. Bad code is one thing but seeing the code in action with real world data is a different beast altogether.


The easiest way to ensure the stability of the systems that you own is to prevent anyone from changing things on them.


Ideally the guarding of valuable company (and incidentally customer) data.


The thing is that any separation of the roles is ineffective. Things shift around some if you embed an ops guy into the dev team directly, but it doesn't resolve the core problem. This applies to DBAs as well as ops, or any other software-side segmentation.

The core problem is that there are "ops guys" and "dev guys". That creates conflicting incentives, even within the same team. It creates tension and a dynamic centered around bandying work around so that it's "the other guy's problem" in some situations, and hoarding logic onto the one segment so that there isn't an "obstruction" in getting things done in others. Moving the "segmented" guys directly into your team just makes these politics closer to the heart, which is not always an improvement.

Teams should be composed of whole-platform "generalists" (in quotes because they really should be good at stuff, whereas "generalist" implies they aren't; here I just mean a competent non-specialist), where any single individual would be comfortable/capable performing any particular task that may come up. Of course, each member will have preferences and habits, little "skews", but it is important that these skews are controlled and used for mutual education, and not allowed to "flanderize" someone from "the guy who knows SQL better than the rest of us" into "full-fledged DBA who hasn't committed any C# for 3 years".

The right axis for separation is hardware v. software. If it's software-related, your dudes should be equally yoked, such that any SQL ticket would be assigned to any member of the team, or any "devops"/sysadmin/deployment ticket assigned to any member of the team.

These systems, from the OS up, are all part of the same thing, and they're all tightly integrated. Making the workload of the individual people on the team also tightly integrated is the only way to make sure that incentives align properly and that the most effective technical decisions are made, instead of decisions motivated, consciously or not, by offloading blame or other political/effort/convenience considerations that cause the overall system to suffer.

If you get into a sticky situation that requires specialized help from someone who has lived and breathed MySQL Server night and day, well, that's what consultants are for. Consultants would also be useful for inspections/sign-offs. But your core team can't tolerate being segmented out by component/implementation detail.

> Containers are a smart solution to the build problem.

Linux "containers" are a variety of things. True OS "containers" don't exist on Linux, but there are some rudimentary approximations. A Docker image is essentially a zip file, and sure, zip file-ish things may work fine for uploading artifacts to systems. Dockerfiles are unequivocally terrible, however.


I agree with parent, but I think you're taking it too far. I don't think there are enough skilled generalists to pull off your ideal, and I think software/infrastructure is too complex to allow for generalists in the breadth you describe.

I'm a security person who knows pretty good Python and simple database stuff (SQLite). I think I'm in the top 50% (humbly) of my field, probably higher.

But I don't know front-end, containers/CICD, or distributed systems worth a damn.

I do agree with the parent's idea that teams should have embedded resources. A "VM security team" operating firewalls and infrastructure and policy auditing should not only have security experts, but their own devops group that automates the crap out of everything, using 2018 best practices. Currently, my team's "dev" group is a separate team in another area whose work queue is fed by multiple, distinct teams. It makes learning and understanding our requirements really tough for them.


> But your core team can't tolerate being segmented out by component/implementation detail.

And yet tolerate it will, because it is somewhat impossible to hire a team composed entirely of people who are each experienced and competent in writing and designing frontends, writing and architecting backends, deploying and maintaining whatever backing services you're using, build and release engineering, Linux, networking, etc. And what are junior developers supposed to do?


> And yet tolerate it will, because it is somewhat impossible to hire a team composed entirely of people who are each experienced and competent in writing and designing frontends, writing and architecting backends, deploying and maintaining whatever backing services you're using, build and release engineering, Linux, networking, etc.

You're right that everyone is not going to start out knowing everything. No matter how senior you get, there will always be areas you know better or areas that you prefer, which are the "skews" I referred to in my original comment. When a new framework or technology or whatever is introduced, only one or two people will know it. That's all fine.

Docker is the epitome of the broken segmented model. Devs hate and resent ops telling them they can't do things. Docker promised devs that if you spend a half-hour writing instructions to build an archive that contains your app's file tree and to pull in a completely untrusted OS userland `nice-mans-alpine:4.x.malware-free`, those annoying ops people will get out of your hair, and you can go ahead pulling `bad-actors-handy-4line-totally-safe-lib` from npm to your heart's content. No more complaints about that package not being approved, or the dependencies not installed, or the runtime too slow, ha!

The whole comment thread on the original article is a case in point. Someone who is responsible for the whole software side of real systems will be horrified at the suggestion of such recklessness. However, developers who're only accountable for pushing "at least one commit per day!", and consider security and performance someone else's problem, will be thrilled at the prospect of "tearing it up with some 10x coding" while they silence "the Luddites". (who, sidebar, were too dumb to see the beauty in JavaScript back in the 90s! Pshaw!)

Which dynamic do you want to encourage?

> And what are junior developers supposed to do?

The same thing that everyone else is supposed to do: learn it, gradually, as needed. Read the docs. Seek mentorship from team members who have that "skew" (formalize this process if necessary). Read the changelogs. Read the code. Figure it out!

Many will protest and say it's outside of their comfort zone. Some will protest and say this is inefficient. That may be true in the short-term, but the system will invariably suffer if you do hard segmentation on the software work, because the falsely-separated concerns won't understand each other and end up setting up territories.

People will hate the DBA because they won't understand why he cares about "boring crap" like "normal form". People will hate the sysadmin because they won't understand why he cares about "boring crap" like "not being woken up at 3am". Your front-enders will be more gregarious and have better haircuts, leading to prioritization of front-end concerns.

Essentially, the project becomes driven by blame-shifting, protectionism, and which software-side segment has the more attractive people, because the concerns are fungible enough that any side could potentially handle them. That makes it a political competition. The project is no longer driven by technical prudence or efficiency. It's no longer about the tradeoffs involved in solving the problem at layer X instead of layer Y.

The dividing lines from OS up are arbitrary. We can't all be experts in all of it, but we can all have the expectation that we need a basic grasp over the whole system, by which I mean the WHOLE SYSTEM, and that we should become competent in the major elements used to build it, and patiently nurse this competence over time.

One team member should be able to handle 90% of the tickets that come in independently, whatever elements of the stack are affected (sysadmin, application code, database, frontend, etc.), and when they hit one of the 10% they can't do independently, they should consider it their responsibility to seek mentorship and learn the skills so that after several such rounds, they can do it independently.

The only real question is whether there are enough people capable of this out there. I think there would be if we set it up as a general expectation. I'm not sure if there are when we've already accepted the segmentation as a fact of life.


>The only real question is whether there are enough people capable of this out there. I think there would be if we set it up as a general expectation.

That strikes me as merely wishful thinking. It's not as if there isn't already research on human cognitive abilities in general.

Do you have any scientific basis for thinking engineers are merely being held back by our acceptance of specialization, rather than by inherent cognitive limitations?


Once the downvotes start coming in, people read comments uncharitably, and the thread gets lost, but to be clear, I'm not advocating for anything that is beyond the cognitive capacity of typical software developers.

One and two-man startups provide ample evidence that working knowledge of the whole platform is not beyond human cognitive scope, even if getting this to be accepted at large requires some extra cultural encouragement and support, and some professional management of individual "skewing".

Once more, it's not that everyone has to be a hardcore expert in everything all at once. You don't want them to be.

You just want your main people to know each platform component well enough to be able to make a reasoned decision about the trade-offs involved in using one or the other for a specific task, and then to be able to own that decision as a group.

If they can't or won't do that, the platform decisions become political instead of technical. I've seen this over and over again: massive technical problems get routed around because the Java developers have been told they can't touch Ruby, or the C# developers have been told they can't touch SQL, and the real problem never gets fixed. That's because we only recognize naive, scared "specialists" who insist they can't look at the piece of Python that's holding up the thing because they're just PHP developers, instead of rounded, capable "generalists" who can be trusted to call in help when they're getting in over their heads, and who may take an occasional "inspection" or two to make sure they're aligned with best practices.

General contractors are not electricians, but they can do a lot of routine work that involves electrical fixtures, sockets, and outlets. You call the electrician for the face-melting stuff.

General practitioner MDs are not dermatologists, but they can do a lot of work that involves routine skin disorders. They'll prescribe creams for fungal infections, rashes, acne, etc. They'll let you know you need to call in a dermatologist for the "skin-melting" stuff.

In software, we don't say "call the DBA for the database-melting stuff." We say "the DBA will write all of the SQL for you." That just doesn't seem to add up to me.


I apologize if I seemed particularly uncharitable, and I think you may well be right that I thought you were advocating for greater depth of knowledge than you were.

However, I still disagree with your premise that it's merely our attitude at large somehow holding people back. Startup founders don't refute my suggestion that there's a cognitive limitation involved, since they're relatively rare and may well have greater capacity to be the generalists that you're proposing. I'm also not convinced that, even among founders, they're as broad generalists as you're suggesting.

You go on to give non-computer examples of generalists and specialists, yet you don't address how it is that specialists are (admittedly only implicitly) ok there but not in computer tech.

To reiterate my point about cognitive capacity, if true specialists are desirable, then I allege asking them to be more of a generalist makes them a less competent specialist and therefore less valuable on the market. That's an alternate explanation for extremity of specialization than preconceived notions.

Now, personally, I share your desire for greater breadth of knowledge among all technical professionals, if for no other reason than they might have a greater appreciation for my own specialization. I just don't think it's realistic.


> I apologize if I seemed particularly uncharitable, and I think you may well be right that I thought you were advocating for greater depth of knowledge than you were.

No need, it wasn't really meant to be directed toward your comment specifically. I just noted that comments get read uncharitably once they're greyed out, as a way to remind people that it's not likely someone would advocate such caricatures.

> Startup founders don't refute my suggestion that there's a cognitive limitation involved, since they're relatively rare and may well have greater capacity to be the generalists that you're proposing.

You're right, and I thought of this when I used that example. But by the same token, we can take it out a level further: professional software developers have already shown themselves as having higher-than-average cognitive abilities, because the truth is that the average human doesn't have the cognitive capacity to become a professional software developer. If they did, we'd all be paid much worse.

How far off are founders from professional software engineers? How far off are professional software engineers from the median of adults? How much additional cognitive load is required to be operational in a handful of extra platform components, especially if all those components run the same type of hardware? All good questions that I don't think either of us have ready answers for.

The other thing is that even if this is out of reach for the "average developer", it wouldn't mean that it's not an ideal to strive toward, or necessarily even unrealistic in all cases.

> You go on to give non-computer examples of generalists and specialists, yet you don't address how it is that specialists are (admittedly only implicitly) ok there but not in computer tech.

Specialists should exist -- as external reference points in consulting groups.

If you want your life's mission to be building SQL queries, join a database consultancy and deal only with the SQL problems that your clients couldn't figure out on their own and decided they needed to pay $$$ to solve. If SQL and database design is truly your passion, you'll be much happier this way than you would be as a staff DBA redesigning the same rote EAV schema for Generic Business App #29, working slavishly to finish the code for that report that Important Boss #14 needs on his desk ASAP.

Creating a referral-style economy creates a lot more room in the marketplace for specialist consulting groups and gives more specialists greater reward (monetary and emotional). It simultaneously allows "generalists" to stay focused on the big picture of building and maintaining a robust and prudent system overall.

I think it's worthwhile to consider how generalist v. specialist operates in other knowledge fields, and what lessons we can take from that.

I am confident that a generalist ethos is for the best, but I'm not sure we'll get there without better cultural underpinnings, so I'm not making these statements purely out of self-righteousness (maybe only like 80% ;) ).

This dialogue has already been informative and has helped me refine my ideas and hopefully learn to present them somewhat better. Thanks! :)


Phew, this has been a good exercise. Let me clarify the thesis.

The thesis is NOT that a crew of superhumans can supersede all DBAs, security engineers, and infra people in the world.

It is rather that you can be a great software-side engineer, and that you can skew/focus on a few primary concerns, and develop and maintain a working knowledge in the others, sufficient to service your core project's needs.

Specialists can be called in as spot checkers, auditors, or short-term implementers, but they shouldn't be needed for the day-to-day of building, maintaining, and deploying your software.

In software, everything goes down to the same place: the system hardware. And these days at least, this is pretty much homogeneous between software segments. If you know how this functions, the differences are in the modes of expression and the conventions, not really the principles. We can learn the varying conventions well enough to be serviceable in all the elements that we send down to hardware -- not necessarily expert, but good enough for day-to-day work.

I'm not saying that everyone on the team should be better than the best DBA guy you've ever met. I'm saying that everyone on your team should be reasonably confident with SQL. Specialists have a place in your friendly local <security/database/whatever> consultancy.


> In software, everything goes down to the same place: the system hardware. And these days at least, this is pretty much homogeneous between software segments. If you know how this functions, the differences are in the modes of expression and the conventions, not really the principles.

Interesting that you mention this, since I think it's become something of a self-fulfilling prophecy, especially with giant cloud IAAS providers making one-size-fits-all choices of hardware to sell.

I certainly agree with you that the basic principles are the same, but that ignores the performance (and, arguably, reliability) possibilities that open up when not limited by the hardware (including network) choices of others.


The thing is that any separation of the roles is ineffective. [...] The right axis for separation is hardware v. software.

Any separation is ineffective except along this one particular completely arbitrary dividing line? If that were true we'd still be hunting and gathering and nothing else.


> Any separation is ineffective except along this one particular completely arbitrary dividing line? If that were true we'd still be hunting and gathering and nothing else.

Hardly arbitrary -- hardware is fixed at the time of manufacture. Hardware engineers should be well-acquainted with software concerns and needs, but the years-long feedback cycle and the real expenses associated with hardware development create a natural barrier for work separation, require a different work cadence and much more stringent processes, etc.

This is not to say that a good hardware engineer shouldn't contribute to software and vice-versa, but it is to say that the roles are sufficiently divergent that it makes sense to place them in different segments. That is not the case with anything this side of the operating system, as far as I'm concerned.


It's arbitrary when you claim there are no sensible divisions in software. I think your entire lengthy argument is a sort of elaborate fantasy about how much better the world would be if everyone was just like you or at least, just like you imagine yourself to be. It's fun but not a particularly realistic or constructive way to look at the world.


> It's arbitrary when you claim there are no sensible divisions in software.

It's about the fungibility of the problem space. I don't know how you expect your core team to make reasonable decisions about the tradeoffs if they a) don't understand more than one of the platform elements; and/or b) don't have any responsibility or accountability for the tradeoffs that get made, because now it's another segment's problem. Indeed, when I've been on teams primarily comprised of non-generalists, these decisions were almost always a matter of bureaucracy and politics.

> I think your entire lengthy argument is a sort of elaborate, lengthy fantasy about how much better the world would be if everyone was just like you or at least, just like you imagine yourself to be.

I've worked on teams that were mostly "generalist" and teams where the "generalist" type was either absent or artificially constrained. My perspectives are drawn from those experiences, and have developed based on a hard-earned worldview that says people reliably act in favor of their own expedience. Doesn't seem very fantastic to me. ¯\_(ツ)_/¯


I don't know how you expect your core team to make reasonable decisions [...]

That's how most everything is made, not just software. In the case of software, Fred Brooks added an essay titled "Parnas was right, and I was wrong" in the 20th anniversary edition of The Mythical Man Month about this topic. Itself published over 20 years ago.


> Don't build your containers from public, un-trusted images! Build your own images. Run your own, protected, registry. You still have all of the compliance and validation necessary and you don't end up debugging failed builds because one machine out of a thousand is running on some minor shared library version not supported by your software.

You have just lost all the speed to production advantages of containers.


"speed to production" is not meant to be the primary advantage of containers.

"knowing exactly what you're running and being able to reproduce it" is meant to be the primary advantage of containers.

What you're basically saying is "if your container system admins do their job properly rather than throwing security and reliability out of the window, it can take a bit longer than not bothering". This is trivially true, but not really the point agentultra was making.


If that's the reason, then this is broken from the start.

BTW the more I learn and use nix the more I see that this is the proper solution to what docker is currently marketed to fix.


That's how I always did it (building containers ourselves), and once the pipeline is in place, it's barely more work than pulling public images.

Speed-to-production advantages are absolutely not due to pulling untrusted containers. If anything, it makes your life harder.

Hard to imagine any serious production setup not doing this... In most cases, you need to modify the containers anyway to suit your needs, and how else are you going to rebuild them all when the next OpenSSL update comes out?
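
The pipeline really is small once it's in place. Roughly something like this, where the registry host, app names, and paths are all placeholders:

    # Build an in-house base image from a vetted upstream and push it
    # to a private registry; every app image builds FROM this base.
    docker build --pull -t registry.internal.example/base:latest -f Dockerfile.base .
    docker push registry.internal.example/base:latest

    # When the next OpenSSL advisory lands: rebuild the base (above),
    # then rebuild and republish the app images against it.
    for app in api worker frontend; do
        docker build --no-cache -t "registry.internal.example/$app:latest" "apps/$app"
        docker push "registry.internal.example/$app:latest"
    done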


Not true at all. It's a one-time setup cost, and you get in-house knowledge, secure tools, an optimized dev workflow, etc.


Have you really? Building a base container to base all further images off of takes about a half hour with our build system. Further app builds are down to 10 minutes at a max and can honestly still be optimized. How exactly are you losing all the speed advantages?


No, you still get that. You are mistaking initial speed to production (longer with containers) for amortized speed to production when you scale.


> shackles placed upon them by sysadmins

Well, potentially unpopular opinion here, but an awful lot of sysadmins brought their looming obsolescence on themselves. I'm an app (as in "a program that runs on a computer", not an iOS add-on) developer, always have been. I get requirements from the business types, code it up in vi or Eclipse or whatever, get it working, and then they (the business) want to deploy the working app out to production so people can use it and the business can make money off of it. And, for decades, sysadmins have been a brick wall of pure hostility. They're not all like this, but a lot more are than aren't. Like, I get it - you're overworked and the demands on you are unreasonable. Yeah, me too. But I just work here, man. You're right, I don't know how to do your job, that's why I sent you an e-mail asking you what steps are needed to deploy an app into production since it's not documented anywhere. But rather than just tell me what you need so I can go gather that up, you're going to unload on me because you feel overworked and unappreciated, but you're sure as hell not going to unload on a manager or somebody with actual power, you're going to take it out on the developers who have no pull or voice.


Actually, as a sysadmin, I sympathize with you, since I consider that kind of situation to be a sign of, essentially, bad system administration. It also sounds like it might be at a larger company.

Personally, I've always considered it a significant part of my job to make developers' jobs easier, especially with something like deployments and dependencies.

As such, I disagree that we've brought our own "obsolescence" on ourselves, but I do agree that those of us who have perhaps forgotten that ours is a service profession have hastened its demise.


Like I said - it’s not all of them, but a lot of them. And yes, this is endemic in big companies, not in startups.


Sadly, this attitude among my fellow sysadmins is one of the reasons I avoid larger companies.


> e-mail asking you what steps are needed to deploy an app into production since it's not documented anywhere

Two points, immediately:

1) Can I autoscale the app, as in what is the data/files persistence model?

2) Did you write Readme.md with a) build steps b) networking requirements c) data sources d) authentication model e) database ACID requirements ?


I feel like there has always been a contingent of sysadmin / ops folks who preferred the "Better to ask for forgiveness than permission" model. They still hate when things break, (so not quite fans of developers with a "move fast and break things" philosophy) but they care more about big picture improvements and ease of upkeep than enforcing any particular process. Detecting problems and being able to roll back is typically more valuable than preventing mistakes in many cases. It may be somewhat driven by laziness, but it actually works out pretty well for collaborating with the fast-moving developer types. It also does depend on being in an environment that is tolerant to occasional mistakes or outages.

It makes sense that these types naturally gravitated towards the devops models. I'm really not sure where this leaves the more compliance-minded systems folks though.


> It makes sense that these types naturally gravitated towards the devops models. I'm really not sure where this leaves the more compliance-minded systems folks though.

Working for profitable businesses where stability is valued over velocity.


It's rough because, like many backend type jobs, the best thing that can happen is nothing breaks. Incremental improvements in stability or scalability will not be noticed, but every single change you make is a massive risk of a page at 2am, all-nighters trying to fix things, outage reports, incident reports, root cause analysis reports, etc. You're stuck between process and outcome.

You have to constantly fight the urge to just never touch anything.


This is definitely a rant that obscures the underlying point: the introduction of _untrusted_ or _unreliable_ network resources, frequently hidden in a string of dependencies.

I'm baffled how often I see someone throw this sort of craziness - "go fetch this thing from some random third party" - into very important places, such as the startup procedures of a container. It's something I see in the culture of the two-person startup just trying to get something out the door. It's definitely "technical debt", and frequently, it won't get removed. Thus, you try to scale up to meet load, and all these new instances time out on the same external resource that's randomly having problems... boom! At the worst possible time. Never mind the potential huge security gaps.

But the specific _tools_ aren't the issue here. It's the culture of "ship something now we'll deal with fallout later". A lot of people start using Docker and won't ever look at the Dockerfile, or, will add a Maven dependency and won't even check licenses or security updates for _any_ of the transient dependencies.

Cloud technologies and containerization make everyone just think "we can do things so fast now" and never, ever pay attention to details that can come back to bite you.

On the flip side, it's a good time to be in cybersecurity; because this cultural problem will never, ever, get solved. :)


At the end of the day it comes down to the fact that businesses just simply don't care (Equifax etc).

They like the idea of security and that's where it ends.

In many places if you try to "do things right" you will get fired in two months for being too slow/strict and they will happily replace you with a clueless easily trusting person who "goes and fetches things from random third parties".

Many times they get lucky enough to survive and they don't appreciate the risks that they took. That pace becomes the expected norm and sets the theme in the industry.

And when shit hits the fan the PR person writes a "we are oh so very sorry .. security is totally our number one priority" blog post. They blame and fire the poor bastard and replace him with another warm body.

When it comes to these "hidden" things like security, companies do not reward, and even punish, "doing things right", so on average and over the long term we end up where we are today.

When the culture sufficiently shifts towards being sloppy you will get hammered down quick if you try to be the voice of reason because it ends up being you vs everyone else (the norm).


> just simply don't care (Equifax etc).

And, honestly, why should they? Security breaches have yet to hurt an actual company (they hurt users plenty, but not the organization that's actually responsible).


Data breaches are climbing in cost to organizations. Here's a claim that in 2017 the average breach cost $3.62 million. https://www.scrypt.com/blog/average-cost-data-breach-2017-3-...

(I've seen similar claims in different ranges. Costs of breaches in the US are pretty high - over $7 million.)

Even Equifax probably wants back the $3-4 billion in valuation it's lost since the breach.

The solution appears to be buying tools to avoid and respond to breaches quickly, instead of engaging and building in security awareness. (Microsoft's security development lifecycle comes to mind.)

IMO, both approaches are likely cost effective, though I have no numbers or research to back that up.


You know, I used to agree with you. But the reality is you have to weigh the massive productivity boosts that things like docker bring to the table vs. the potential issues it can bring. To a large degree, good perimeter security mitigates a lot of the concerns of containers themselves running slightly out of date software.


> To a large degree, good perimeter security mitigates a lot of the concerns of containers themselves running slightly out of date software.

This is a very naive way of setting up a secure production environment.

Your perimeter security is worthless if you are loading unvetted public images which have malware or, even worse, unknown malicious code in them.

Having a data breach or hack on your hands is something which could kill the company. That risk is not worth a slight productivity boost gained because you or your ops team is unable or unwilling to build a proper private registry setup.


Yes, it's definitely not something related only to Docker; it's today's culture of trusting any code just because someone put it on Github.


There's so much leverage in that code, though. It's clear how the culture evolved.


As a "major theme", the author takes:

> Consider for example Hadoop. Nobody seems to know how to build Hadoop from scratch. It’s an incredible mess of dependencies, version requirements and build tools.

And as the major introduction to the blog post:

> I’m not complaining about old-school sysadmins. They know how to keep systems running, manage update and upgrade paths.

Huh? Old-school sysadmins know how to keep systems running, manage updates and upgrades. At the same time nobody knows how to build Hadoop from scratch. At the same time, Hadoop build instructions themselves have curl|sh scripts or mirrors and the wiki page is outdated. And it uses Java (and thus maven/ivy). And that downloads the internet.

According to the blog, Hadoop, maven/ivy/sbt/any dependency manager, package managers, and everything is broken. But the tagline is:

> This rant is about containers, prebuilt VMs

What does any of this have to do with the "Age of containers" and pre-built VMs? Is the author just talking about Gentoo/LFS-style "compile the whole system from scratch"?

This feels like an incredibly rushed rant. I can only envision the author needing to set up Hadoop for the first time, banging their head against it for a few days (it happens), and taking it out on everything.


I think the logic is: if we didn't rely on containers and prebuilt VMs, Hadoop would have had to be easier to build to be useful.


The point everyone seems to be missing, and the one I think most important, is that we're no longer building from trusted sources.

Build systems just download and run random code from the internet without verifying that it's the correct code, from the correct source.

It's a ticking time bomb.


There is SSL/TLS; unless it's done wrong (e.g., the dependency manager ignores invalid certificates), it's safer than the old "md5 of the file" systems.

Now, some dependencies are fraudulent (especially true in the Javascript world because it eventually targets a lot of user browsers), but nobody ever checked the sources anyway...


TLS only verifies that you have connected to the correct server. It can't verify whether the package on the server has been replaced by a malicious one. For that, you need a "md5 of the file" (these days, a sha256, because md5 has long been broken).


You need to make sure the hash is also not tampered with, both on the server and in flight to the user. How do you do that?

If the answer is: use TLS, there is no point in having the file hash at all.


No, the answer is to use PGP and a manifest hash.

This is how package managers work. TLS doesn't replace those.
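
It's roughly the same dance apt does with its signed Release file, and easy enough to do by hand. A sketch, with the URLs and filenames invented:

    # Fetch the artifact, the checksum manifest, and the manifest's
    # detached signature.
    curl -LO https://example.org/pkg-1.2.tar.gz
    curl -LO https://example.org/SHA256SUMS
    curl -LO https://example.org/SHA256SUMS.asc

    # Verify the manifest against a key obtained out of band, then
    # verify the artifact against the signed manifest.
    gpg --verify SHA256SUMS.asc SHA256SUMS
    sha256sum --check --ignore-missing SHA256SUMS

TLS still helps (it hides what you're fetching and stops trivial tampering in flight), but the trust anchor is the signing key, not the server.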


Which isn't really true; as a sysadmin (I'd say "former", but once you're a sysadmin, you're always a sysadmin), I've seen lots of things with horrible build and dependency nightmares, and that was before package managers, containers, and virtual machine images became de rigueur.

Think of a self-hosting programming language: you can't build it without a running installation of a previous version. (Anyone remembering "Reflections on Trusting Trust" at this point?) Or any application in an image-based language like Smalltalk. Development becomes path-dependent. It's inevitable to get into a situation where A and B cannot be made to work together, except in a derivative of a version that someone, somewhere made while holding their mouth the right way.

Pre-built containers and VMs are an admission that path-dependence is the way stuff is supposed to be.


I think that is the author's logic. Except it's not very logical, since Hadoop (or Bigtop) doesn't use either.


Picture this: you need to use Hadoop. Do you:

A) work through building it yourself, or

B) get a container that claims to have a running Hadoop and hope it works for you?

If B wasn't on the table, what would happen?


If I need to use Hadoop, I'll download one of the pre-built binaries that they offer on their site.

You'll notice from the Debian Wiki that its users gave up on building it back in 2010. That was three years before Docker even appeared. Almost nobody was using containers back then.


Hadoop isn't even that difficult to set up. I've built it from source, and installed it from binaries.

Containers are totally unnecessary here, just as they are for most java apps.


That's like saying, "If we didn't invent the internet, we would have never had privacy issues". OK, so if we didn't rely on containers - would hadoop have had a perfect set of packages for every distribution? Let's say that the packages for Arch linux were broken. What next?

That's the whole problem with the article. It takes a problem (building Hadoop was bad), correlates it to a completely different tool (because we have docker, hadoop build scripts are bad), and goes on to rant about everything else.


I've seen this brewing for a while, and getting worse and worse. Back in the 80's and 90's, there were developers who would code their own sorting or hashing routines rather than linking in some external library to handle this "solved" problem. The pejorative term "Not Invented Here" (NIH) grew to describe those developers, and they were shamed into reusing code whenever there was code to reuse. And in some cases (like sort routines), it makes perfect sense. However, NIH accusations have grown to mean "if there's something vaguely similar to what you're writing, you must use it, even if bending it to the case at hand takes more custom coding than you would have written in the first place". That culminates in things like the completely empty, useless (but enormous) Spring "framework" or, to a lesser extent, things like Angular, which sort of does some things but creates far more problems than it solves (and definitely adds more development overhead than it removes).


Interesting take on the NIH term. I thought this was more an ego thing for the big tech companies. They love to reinvent existing things to look like geniuses.


I think you're correct on the origin of the term, and I should have mentioned that - but in the past few decades, I've been accused of "NIH"-ing whenever I've "rolled something" of my own from authentication to IoC. Just because there's a library that has a particular description attached to it doesn't mean that it should be used as often as it can be.


The author has added an update to the bottom of the post which I think makes his main intended message clearer:

Update: it was pointed out that this started way before Docker: »Docker is the new ‘curl | sudo bash‘«. That’s right, but it’s now pretty much mainstream to download and run untrusted software in your “datacenter”. That is bad, really bad. Before, admins would try hard to prevent security holes, now they call themselves “devops” and happily introduce them to the network themselves!


It's a rant; my inclination is that it has to do with a very specific situation the author is facing at work.


The author should have grabbed the .sdeb or the debian build scripts and tore them apart if they really wanted to make a point (if, upon examining the build, there was one to make).

I mean, there is a lot of cognitive load/disconnect we're talking about. As an ops guy, I can't look into every package. That's why I trust the package manager (apt-get, yum, whatever) and all the build maintainers who either volunteer or work for Redhat/Canonical/SuSE/IBM/whoever.

Things get through. That's why we have all those security people out there who are digging around for bug bounties and find crap like the recent Ubuntu Snap package craziness.

Docker containers can be good. You can use an official Ubuntu or Alpine image, build your base, and create scripts to make sure your base containers don't go out of date. Most people don't do that. The official Docker containers are kinda a mess, but at least they're maintained. Grabbing some random container off Dockerhub? Yea, that's not going to end well; unless you just use their source to build your own. Or if it's a container continually maintained by the person/company who wrote the service.
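
Those "don't go out of date" scripts don't have to be fancy. A cron-able sketch, with the image names as placeholders:

    #!/bin/sh
    # Rebuild our base image only when the upstream image has moved,
    # so security fixes in the official image propagate to ours.
    set -eu
    before=$(docker image inspect -f '{{.Id}}' alpine:3.7 2>/dev/null || true)
    docker pull alpine:3.7
    after=$(docker image inspect -f '{{.Id}}' alpine:3.7)
    if [ "$before" != "$after" ]; then
        docker build -t mycompany/base:alpine .
        docker push mycompany/base:alpine
    fi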

Docker containers do need better security introspection and that's going to be a big deal going forward. But this article is all rant and some, but not enough, substance.


"Docker containers do need better security introspection and that's going to be a big deal going forward."

Exactly!

And npm. And maven. And every damn package system for every damn programming language since package systems are now a requirement.


Yes, but shouldn't you have separate "build" and "deploy" container images? You should "build" a particular version once, "deploy" the result into a test environment, test it thoroughly, and then "deploy" to production, right?

This is not my job (yet). Please tell me if I'm wrong, because I'll need to do it in the next few months.
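
To make the question concrete, here's the shape I have in mind: build the artifact once, tag it, and promote that exact image from test to prod without rebuilding (registry and names invented):

    # Build once, tagged with the commit.
    ver=$(git rev-parse --short HEAD)
    docker build -t "registry.example.com/app:$ver" .
    docker push "registry.example.com/app:$ver"

    # Deploy that tag to the test environment; once it passes, promote
    # the very same image to production -- no rebuild in between.
    docker tag "registry.example.com/app:$ver" registry.example.com/app:prod
    docker push registry.example.com/app:prod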


Can I complain about pre-built VMs?

I swear by the time I figure out my path problems I have 4 folders named /code/

And my working file system is /code/code/index.php

So when I run node.js (to compile my .scss) it messes up the folder pathing and ugh... why can't we do everything on production?

/rant


> None of these "fancy" tools still builds by a traditional make command.

Is there anything more "get-off-my-lawn" than "These tools don't use the thing I like!"


But I just don't understand why we have to have 47 half-built over-complicated build systems or job runners or whatever the new fad term is for every language, when there's something that does what they all do, is battle-tested, and has been around for decades.

Everyone repeat after me. Makefiles are not scary. I can write a shell script. Do I really need to learn grunt/gulp/webpack/npm/rake/fake/maven/gradle/ant and on and on and on?

Probably somebody has released another one in the time it's taken me to write this comment.


Makefiles aren't scary. But they're also not particularly good.

I use Rake (or Gulp, or whatever) because then I can use Ruby (or JavaScript, or whatever). Shell plumbing is fine for informal and small-scale stuff, and I make my code conform if somebody down the line (who may be me) wants to get out their duct tape, but the world is more complex than what /bin/sh can see. Shell is the lowest common denominator. Expecting everything to at all times be written in and for that lowest common denominator is not reasonable. We're a tool-using species and we refine tools over time to make them better. The profusion of tools happens because they iterate on each other to be better. If old tools were sufficient, people would use them because learning new ones is hard.

So, yes, you do need to learn those tools. Or invent a shell that isn't tooth-pullingly difficult to use with a JSON file (and do not say `jq`, I love `jq` as an inspector but it does not step to `JSON.parse` and a working subscript operator). Or change `make` so that a git checkout won't trigger a full rebuild. Lots of baseline, stump-simple things that `make` is just not going to do for you because it's built for a frankly outmoded method of development.


> I use Rake (or Gulp, or whatever) because then I can use Ruby (or JavaScript, or whatever).

You can use Ruby in Make. I don't know Ruby but here's some Python:

  SHELL = python
  .SHELLFLAGS = -c
  .ONESHELL:
  .DELETE_ON_ERROR:

  foo.txt: bar.json
      import json
      with open('$<') as input, open('$@', 'w') as output:
        output.write(json.load(input)['the_text'])
(Caveat: I've never actually tried this at scale.)

You can even use different languages for different recipes with per-recipe variable settings.


Ha, that's a neat trick! Trouble is, for either Python or Ruby it becomes tricky due to stuff like dependency management. You'll have to `bundle exec make` to get sane library paths for Ruby or `pipenv run` for Python, etcetera etcetera.

At that point I think you might as well just use a language-native one.

Great pull, though.


The GP does this via a neat hack, but you can also do this in a much more understandable fashion by simply having the body of every Makefile rule start out by shelling out to some script in your favorite language.

I think you're (understandably) misinformed about what Makefiles do because you've run into some bad ones. The thing they're doing is managing a N-level deep dependency tree in a declarative way. So if A->B->C you can run something to generate C, then B can run, and finally A, and this can all be done in parallel for hundreds of files.

On the individual rule level this is really simple, e.g. just turning a .c file into a .o file, then finally some rule that depends on all *.o being generated creates a program out of them.

The language-native ones are usually much worse. They're easier to use at the outset because they don't make you create this dependency graph, but that also means that they can't run in parallel on your N cores, and they usually suck at incrementally updating the build.
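
The whole idea fits in a toy Makefile (recipe lines must start with tabs):

    # prog depends on every .o; each .o depends on its .c.
    # `make -j8` compiles independent .o files in parallel and
    # rebuilds only what changed since the last run.
    OBJS := $(patsubst %.c,%.o,$(wildcard *.c))

    prog: $(OBJS)
    	$(CC) -o $@ $^

    %.o: %.c
    	$(CC) -c -o $@ $<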


> The GP does this via a neat hack, but you can also do this in a much more understandable fashion by simply having the body of every Makefile rule start out by shelling out to some script in your favorite language.

I'm not sure what you mean? How would this allow you to use, say, Python as the language for recipes? Just having Make drop straight into Python kind of defeats the purpose of Make.


You'd use Python as the language for the recipe that turns (in this example) a given .c file into a .o file, while leaving the Makefile to do what it's good at, declaring the DAG dependency tree needed to incrementally build it.

The point is that people conflate these two things. They open some random Makefile and see that it's mostly doing stuff with shellscripts, and think "oh I should do this all in Python", and then write some monstrosity that doesn't make a dependency DAG and thus can't run in parallel or be easily understood.

Instead they should have split the actual logic they found in shellscripts in to Python scripts the Makefile invokes.


Nevermind, I misread you. I missed "rule" where you said "beginning of every Makefile rule". (I thought you were suggesting just having the default rule run some enormous Python script, which I've unfortunately seen before.)


> Shell is the lowest common denominator. Expecting everything to at all times be written in and for that lowest common denominator is not reasonable

Is it that hard to learn shell? Why is it so painful? What makes it the "lowest common denominator"? I use it all the time, but I admit at work I am one of the few.

> Expecting everything to at all times be written in and for that lowest common denominator is not reasonable. We're a tool-using species and we refine tools over time to make them better.

This is too vague. What makes a Ruby based build or JS based build a more "refined" tool? It sounds like familiarity is the real issue here.

> If old tools were sufficient, people would use them because learning new ones is hard.

How many people even know Makefiles these days anyway? The "modern" approach seems to be, learn a programming language and then try to do everything inside of it. Some languages are more interested in this cloistered philosophy than others (like JS).

If anything, I think the reason these build tools keep being proliferated is because nobody wants to learn anything more than the bare minimum to "be productive" (which, depending on what you're working on, can be anything from pushing out customer demos for a company that will never sell, to microservices operating at scale). Learning a language and never leaving its paradigms/comforts is easy.


Your second paragraph got to the heart of it. If we want to use some standard build toolchain, it needs to use a nice language and not feel obscure. I was explaining to someone a bash script I wrote, and he said "why not use Python". There were reasons but... he was right, Python would be much easier to use and maintain, and we have a lot more developers who know it.

That said, Maven is incredibly suck-tastic.


Eh. It's not my favorite thing out there, but Maven's fine for what it is. It's designed explicitly for well-behaved Java artifacts. If your Java artifacts are not well-behaved, you're going to have a bad time--in my experience, most of those cases are doing things you probably shouldn't be doing.

(You may be a wizard and have a reason to do them, for sure--but that's what writing Maven plugins is for. Or not using Maven. You've got choices.)


Given the limitations of the platform, there really isn't such a thing as a well-behaved JVM library that depends on other libraries, unfortunately. Oracle really dropped the ball by only serving their own needs with the module system.


Can you expand on this? Having done a pretty decent bit of JVM development, I've never really run into issues even doing some not-out-of-the-box stuff.


What reasons? I did some ruby shell automation and it’s dead simple to get it working and working correctly. I imagine python has a similar story.


> `JSON.parse` and a working subscript operator

https://github.com/ingydotnet/json-bash

    source json.bash
    
    json='{"name":"Jason","friends":["Jimmy","Joe"]}'
    JSON.load "$json"
    joe=$(JSON.get /friends/1)
    JSON.put /friends/2 Jeff
    new_json=$(JSON.dump)
It's still not as trivial as javascript, but "tooth-pullingly difficult" is a little unfair.


Again: sh, not bash. If I have the freedom to specify bash, I can specify something better than bash or sh.


> but the world is more complex than what /bin/sh can see

That is blatantly false, but I get the premise behind it. Writing and maintaining complex stuff in bourne or bash is not fun or easy.

I always use bourne until certain level of complexity or awkwardness is reached. It's pretty easy to write and dead simple to troubleshoot.


Blatantly false? OK, put up. Forget `sh`'s JSON parsing, I'll make it easy on you--show me its arrays. Arrays, plural, you can't use $@ as a bail-out; I need more than one. Show me hashes. Show me sets. Show me the basic building-block primitives of software, because build pipelines are software. It's 2018. If your language can't do this stuff without babysitting, it effectively can't do it, because nobody's got the time to babysit your easy-25%-of-Perl language.

And yeah, I did say `sh`, because that is what you can practically be expected to have kicking around alongside `make` on a system where I can't just install something worthwhile. If I do have them, there's no reason to write shell scripts that are much harder to troubleshoot (and, thank you quotebarf, more likely to be incorrect) than just opening Pry.


Bash, but...

> --show me its arrays

  declare -a
> Show me hashes.

  declare -A
> Show me sets.

See above

> Show me the basic building-block primitives of software,

  thing() {
     local localVar
     # Yes, it would be nice to have lexical scope as well as 
     # dynamic scope, but...
  }
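
Usage is terse but workable (associative arrays need Bash 4+):

    declare -A h=( [alpha]=1 [beta]=2 )
    h[gamma]=3
    for k in "${!h[@]}"; do
        printf '%s=%s\n' "$k" "${h[$k]}"
    done
    # A "set" is just an associative array where only the keys matter:
    [ -n "${h[beta]+x}" ] && echo "beta is present"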


Sure. All of which are clunky, opaque, and harder to work with than any other language I have used in the last decade.

And besides: if I can install bash, I can install something specifically better for whatever I'm doing instead.


Clunky is subjective -- I think we can probably agree that there are clunkier, more opaque, and harder to work with constructs in programming than a slightly different syntax for array declaration.

What are these build systems where you need to install Bash? Bash 4 was released in 2009: it's been in every major Linux distribution for at least two major versions, ditto for FreeBSD... heck, even Solaris ships with it now.


You're right, there are clunkier, more opaque, and harder to work with constructs in programming. Like `sh` arguments. And like `sh` quotebarf.

And Busybox is commonplace. Systems that include Busybox usually do not include `bash`. And so, for my purposes, if I can specify `bash`, I can also specify, say, Ruby, which--while by no means perfect--makes life much, much easier.


Speaking as an (ex- (sort of)) sysadmin,

"All of which are clunky, opaque, and harder to work with than any other language I have used in the last decade."

Wrong question. A better one: Is it more clunky, opaque, and harder to work with than every other language that's appeared in the last decade? Because no one seems to agree on what is specifically better.


I disagree that it is the wrong question. "Every other language" doesn't matter because I don't value homogeneity and I think that homogeneity of programming language is a fool's errand. I am comfortable shipping production code in most of the languages in current use; in my estimation, none of the major build systems out there are as opaque or difficult to use correctly as make/shell.

(And I am what a current sysadmin would be if we did not call ourselves "devops engineers" now.)


And you are definitely not a sysad or any sort of sysadmin. The core mission of a sysad is to build the best environment possible while restricting that environment.

You seem to be the sort of developer that developers love and the sort of sysadmin that gets fired in the first week. Just my .02 after two and a half decades.

Not to say that you are entirely wrong or your approach doesn't have merit in the new world (especially SV). But it doesn't work for sysadmins and production environments anywhere but your bubble. Not yet.


Well, you're right, I'm not a sysadmin. I pervasively automate, which often sidelines sysadmins, when it doesn't make them redundant. I write code and I don't touch production machines except in extremity, neither of which apply to most (though by no means all) of the people I know who want to call themselves a sysadmin.

Anyway, the core mission of anybody touching the stack is to enable the business to achieve its goals. Nothing, and I mean nothing, more. "Restricting that environment" is appropriate in some environments, and a number of my clients bring me in to help with that. Facilitating developer velocity--and, yes, developers do tend to like me, because I'm good at this while achieving goals around security and uptime--is appropriate in, probably, more. Pays better, too, even if it shouldn't.


It's not that sysadmins cannot do the work you are rightfully proud of. If there are two basic things that differentiate your statements from those of a traditional sysadmin, it is these.

1. Design. 2. Discipline.

Where these two values are dispensable long-term, devops and the new world shine through. I've worked in both worlds and the only mistake is assuming one size fits all.


In general you seem like an absurd sort of creature. Neither here nor there. Bragging about your facility and business velocity. Everything you claim to do sysadmins were doing in 98 and with equal velocity and adequate coverage.

There were guys like this that were there before and helped everyone along: https://www.computerhope.com/people/don_libes.htm


At the risk of being too "meta", although I agree with what I believe is your point about good sysadmins having been advancing automation (and otherwise keeping business needs in mind), I worry that you're distracting a reader from that point by what reads as an ad-hominem attack in your first sentence.

I'm still not certain what point you were trying to make with "neither here nor there", however.


Have you SEEN how Bash array indexing works? It’s not exactly user friendly. Same with Bash hashtables (which I don’t think are really hashtables).

I love Bash and use it way more often than I should, but Bash is not a friendly language. Mocking and stubbing functions is really hard to do, which makes tests awkward, even with BATS. And you need to watch the file descriptor to which text is sent to avoid things getting globbed unexpectedly. And you need to properly scope your vars to avoid unexpected collisions. And everything regarding sourcing and importing dependencies. Etc.


>I use Rake (or Gulp, or whatever) because then I can use Ruby (or JavaScript, or whatever).

This is an anti-feature. When you need this, your build is too complicated.


But they're also not particularly good.

Million+ line codebases got, erm, made with make. In my experience, the people sneering at it most have far lesser demands on their tools...


Million+ line codebases were written entirely in COBOL.

The world advances.


Some were, I guess, but hardly all of them. The Linux kernel is built with Make, for example.


There have been several attempts at introducing a new build system, but they have all been shot down over the dependencies they would introduce.


If it ain't broke don't fix it!

How many features could people have implemented or bugs fixed or general progress made instead of obsessing over tooling?


It's weird that you use the word "advances" to refer to JavaScript-based build processes.


I've written a makefile from scratch.

My challenge to you: I want a makefile that has 20 third party dependencies and can be built on osx, linux, and windows.

I can do this within an hour with gradle, ant, or maven. The ecosystem doesn't exist for this in make, and anything I could come up with to make it possible would end up being a tool that would look like automake and the monstrosity that it entails.


That's a bit unfair, because make relies on the underlying system capabilities much more than Java does, and Windows' just isn't up to snuff. But for the other platforms, autotools definitely can do what you ask.


> But for the other platforms, autotools definitely can do what you ask

Ugh... autotools. Yeah, let's use what feel like build tools created way back in the '60s or something. Sure they "work" for some value of "work", but oh boy are they ugly and nasty. Good luck hiring anybody under the age of 40 who is gonna be willing to work on such a clunky old tool.

There is a reason why the world is moving to build systems that replace the Make toolchain.


"I can do this within an hour with gradle, ant, or maven."

For something that isn't JVM-based?


Certainly not for all but the most trivial cases, but I'd use the language's build tools.

The industry has moved to language-specific build tools and ecosystems, and the article is complaining about this in part, no?


"osx, linux, and windows"

and you've already made sure they are running the same JRE version, right?


Come on...bundle your JRE. I don't know why anyone wouldn't in the age of terabyte hard drives and ubiquitous "small" apps with sizes that make a bundled JRE with all the trimmings look lightweight.


And while you’re at it can you make it idempotent when building sub artifacts which inputs haven’t changed.

And can it do it all incrementally in parallel too please because enterprise shops tend to have a ton of code

Oh, and it would be really nice if, when someone else somewhere in the org has already compiled that thing, we could just use their computed binary to save the time of compiling locally.

THANKS!

;)


> But I just don't understand why we have to have 47 half-built over-complicated build systems

> Everyone repeat after me.

I don't mean to pick on you specifically here because this attitude comes up a lot. In short, a lot of people are doing a thing, a thing that you aren't familiar with, and your gut reaction is to say "everyone: stop doing that, and do what I say!".

Wait a second and consider that maybe there is a reason why this incredibly large number of developers are using these tools. That perhaps they evaluated various different options and decided that what they are using is more suitable than make. Maybe you could find out.


> Wait a second and consider that maybe there is a reason why this incredibly large number of developers are using these tools. That perhaps they evaluated various different options and decided that what they are using is more suitable than make. Maybe you could find out.

Here we have another fundamental "problem" between dev and ops.

The inherent friction comes from different areas of concern.

Devs want to build fast and create new features, but sadly (even with the whole devops notion) they are usually not thinking about viability in production.

Ops people need to keep things stable, something which is sadly undervalued at a lot of companies.


Put devs on 24x7 escalation above your level-1 ("Is it actually broken?") NOC. Developer attitude would change immediately.


Sadly I have seen far too many devs go "you are holding it wrong"...


"You are holding your paycheck wrong"


"It works for me."


Not really? A lot of build tools are chosen specifically to make the build process more reliable and understandable. For example, build tools like Maven handle dependency management, which is hugely beneficial in ensuring your builds are consistent and work the same in different environments. Makefiles are shell scripts.


Makefiles are definitely not shell scripts. Individual recipe lines are executed by a shell (/bin/sh by default), but you can change this. (Heck, you probably could even get away with Python if you really wanted.)

The rest of Make (which is 95% of what's in a Makefile) is its own language, which is actually not too bad for what it's intended for (munging around filenames), and has the flavor of a functional language.
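To illustrate how pluggable the recipe shell is, a toy sketch (assumes GNU Make 3.82+ for .ONESHELL):

    # SHELL swaps the recipe interpreter; make runs $(SHELL) -c '<recipe>'.
    # .ONESHELL hands the whole recipe body to a single interpreter call.
    SHELL := python3
    .ONESHELL:

    hello:
        print("hello from a Python recipe")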


"Maven handle dependency management"

https://maven.apache.org/pom.html#Dependency_Version_Require...

Don't forget to go the extra mile with your dependency versioning to ensure you actually get repeatable builds.


Not taking sides here, but you picked a bad example. Makefiles specifically handle dependency management, just designed from a compiled-language perspective: make sure you build this .so before you build this bin, that this directory exists, and so forth.
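i.e. something like this fragment (hypothetical names), where the "dependencies" are files all the way down:

    # make's dependency management: targets, prerequisites, and mtime ordering.
    app: main.o libfoo.so
        cc -o $@ main.o -L. -lfoo

    libfoo.so: foo.o
        cc -shared -o $@ foo.o

    %.o: %.c
        cc -fPIC -c $< -o $@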


That's fair, but it still reinforces the point: Makefiles are great from a compiled language perspective. Other build tools are better from other perspectives. It isn't wrong to choose a tool depending on your needs!


"Wait a second and consider that maybe there is a reason why this incredibly large number of developers are using these tools. That perhaps they evaluated various different options and decided that what they are using is more suitable than make. Maybe you could find out."

Have you considered that perhaps the previous commenter is familiar with the other tools? Or perhaps that the large number of developers have streamlined their particular workflows for their particular use case and have not considered the flexibility needed for other cases? Or perhaps that there is a cost associated simply with having 47 different build systems?


Suppose the previous commenter was familiar with all 47 build tools and has discounted a lot of them as extraneous. Is he advocating building an npm module with make? What about a jar? A NuGet package? A Cargo crate?

It seems more likely that they’re frustrated about the sheer amount there is to learn, and they probably know make quite well.

But there are real advantages to these tools. There are a lot of vanity libraries out there, but not many vanity build tools.


My real issue is the constant wheel-making impulse, which leaves us with a shattered landscape of people tripping and falling over busted-ass and abandoned wheels. I have a JavaScript front-end project that uses three different package managers and four different make-analog tools, plus some batch files sprinkled on top to orchestrate the common use cases of this monstrosity. Nothing that we are doing here is that complicated, except for this sea of shitty half-baked ecosystems around these tools, each of which was considered best-of-breed at one point.

What we want to do is slurp some text files up, apply some transformations to them, glob them together, run them through the minifier, and dump them into a final output. This is exactly what a decent makefile would be a good fit for. Instead, I got this, because apparently nobody wants to use anything that isn't the hot new way to do things, and so very few people have had enough exposure to know that tested tools for these kinds of things already exist. The last time I used make beyond trivial uses was a decade ago in college (incidentally, I actually did build jars with make; easier than fighting Eclipse...). But just knowing that something exists is more than half the battle, and the problem then is that it's frustrating as all hell watching the same ideas cycle round and round and round every couple of years.
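For what it's worth, that whole pipeline sketches out to a makefile along these lines (hypothetical paths; assumes some CLI minifier, say terser, on PATH):

    # concatenate + minify; transforms slot in as extra pipeline stages,
    # and incremental rebuilds come for free from the file dependencies.
    SRC := $(wildcard src/*.js)

    dist/app.min.js: $(SRC)
        mkdir -p dist
        cat $(SRC) | terser --compress --mangle -o $@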


I’ve wasted enough time tinkering with gulp, grunt and webpack to sympathize


> I’ve wasted enough time tinkering with gulp, grunt and webpack to sympathize

And yet, those tools fill a need that would be very hard to replicate with the toolchains that came before it. Good luck doing half of what gulp / webpack do from, say, a Makefile.


I'm not familiar with either of those particular two tools, and what you say may well be absolutely true for both of them.

It's just that this argument keeps getting used for every single new "reinvented wheel" (to borrow from the GP). Sometimes the argument is as strong as "it couldn't be done any other way" and sometimes it's as weak as "this one is just incrementally better," but it feels a little like crying wolf.

Was it really "very hard" to make the old wheel do what you needed, or perhaps somehow extend it or add a library, or was it just far more fun and exciting to build something from scratch?

I generally don't mind a proliferation of tools, except when they start to break or conflict with each other, which is, I believe, the GP's main concern, and at least tangentially related to the article.


The value of gulp / webpack et al. is the plugins.

I've had success in several JS projects using make in combination with the shell version of a lot of these plugins and npm scripts.

There is value in both approaches.


Well you use the right tool for the job. I hope to god you're not using a Makefile for a Java or Scala project. You better be using Gradle, SBT or Maven.

If you're building in Elixir, I hope you're using Mix and not a Makefile. If you're building in Rust, I hope you're using Cargo, or some other Rust specific build tool.

And Makefiles do get stupid complicated when you need things portable and to work on Linux, Mac and FreeBSD; or allow them to have optional dependencies. That's why we have autoconf and all that ./configure && make && make install craziness.


I wish there was something like an updated Make, a tool that works for everything but updated to 2018.

For instance, Make works based on timestamps and therefore interacts very poorly with git. Switch to another branch and you can get weird effects depending on which files were touched, often triggering needless rebuilds. And everyone uses git these days.

GNU Make, just using hashes instead of timestamps, would be a huge step forward.
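To make the failure mode concrete (a contrived sketch):

    out.o: in.c
        cc -c in.c -o out.o

    # $ make                                   # compiles
    # $ git checkout topic && git checkout -   # contents end up byte-identical...
    # $ make                                   # ...but mtimes moved, so it rebuilds anyway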


> I wish there was something like an updated Make, a tool that works for everything but updated to 2018.

Is ninja that tool?


> GNU Make, just using hashes instead of timestamps

Sounds like you're describing make on a system with ccache installed. Hashing incurs a significant performance hit: the first build with ccache is 20% slower than building without it [1]. Your modern make would likely be slower for people who just build something from source once and aren't doing incremental development.

https://ccache.samba.org/performance.html


ccache is just for C/C++...

Did you consider that it could be other things about ccache that make it slow? E.g., the need to fully expand C headers and turn them into a text stream (NOT needed for a build tool).

Everything with git is lightning fast, including operations that require a full directory-tree diff / hash.


Gradle does this. There are definitely a few extra seconds of overhead compared to make, but that's hardly intolerable.


That'd require hashing every file on every run to figure out what changed. Not a good idea.

There are modern tools that work for many things. Gradle supports native compilation these days as well as any JVM language, Bazel supports compiling many kinds of languages, and there's still SCons.


I've tried SCons, waf, CMake, etc., always ended up hating them, and returned to standard Makefiles. SCons is the worst I've ever tried.

As to the cost of hashing: "git status" runs in 0.009s, and that requires a full diff of my working directory...


>That'd require hashing every file on every run to figure out what changed. Not a good idea.

It uses the timestamp to decide whether a file needs to be re-hashed when checking for changes.


CMake?


CMake generates a standard Makefile for dependency change tracking, doesn't it? So just timestamp-based, not hash-based?


We have 47 half-built, over-complicated build systems because writing build systems is hard.

Writing correct Makefiles is also really hard, and good luck debugging them. Make is simple and beautiful for small, self-contained projects without many dependencies, but it does not scale well (and even then it's tricky, see [0][1]).

Having 47 different build tools with their own flaws is certainly bad, but they exist because of Make's shortcomings. Just saying "make is fine, use it" won't fix anything.

[0] https://cr.yp.to/redo/honest-script.html

[1] https://cr.yp.to/redo/honest-nonfile.html
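Even the small stuff bites. The classic .PHONY footgun, for instance:

    # Without the .PHONY line, a stray file named "clean" in the tree makes
    # make report "'clean' is up to date" and silently do nothing.
    .PHONY: clean
    clean:
        rm -f *.o app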



Did you mean to paste two different discussions? Those are identical.


Gah! Yes. The other one is:

* https://news.ycombinator.com/item?id=15044438


Systems besides make are used because they offer advantages make doesn't.


For many of those systems, the biggest advantage is often that tasks are written in the language of the application.

A lot of the Rake tasks I've encountered in my career would have been easier to write in bash. Not a majority, but a sizable minority. I suspect that in many cases, the gain was that the authors were more comfortable in Ruby than in bash.


I'm pretty comfortable in both, but I'll pretty regularly use Ruby via Pry for stuff I know I can do in bash. It's easier to write, much much much easier to write correctly, and it presents a unified interface to other developers.


Language specific build tools will usually behave predictably across all development platforms supported by the language. Once you start building on a third party, developer experience will be bounded by the quality of platform support of that third party. You don't want to send everybody off on a hunt for the right version of Python unless you are Python.

(Just picking Python as an example because of the recent xkcd. The same is true for everything else, e.g. a Windows computer will rarely contain exactly one make.exe, it's usually either none or a whole bunch of them)


I've used my fair share of build tools and I think makefiles are horrible: stringly typed, ad-hoc features, and a really bad language from a PL perspective.

Writing good build systems is genuinely hard, and I think make is not a good build system.


It is, unfortunately, better than most of its successors, at least in terms of generality.


As a counter-example, this is a few hand-picked sections from a Makefile for OpenSSL.

https://gist.github.com/anonymous/d885c3bb66a319d22f9d60c1ef...


Maybe you should spend more time learning what value they add.


Many of these are tools which download code dependencies from package repositories and integrate them into a codebase or build. Make doesn’t do that; even configure doesn’t do that.


Anyone remember the days when you didn't want to do that?


Days like today? As someone who is often working on machines with no network access, I vastly prefer build and deployment processes that don't need to go download crap.


careful, your dinosaur scales are showing.

make is so bad you need automake to manage it.

there are much better tools. sadly, nothing least-common-denominator enough to gain wide traction.

That said, for anyone distributing software, shame on them for not packaging their custom build so as to be runnable via ‘make all’ (just using make to drive everything else).
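Something like this (hypothetical layout), with make purely as the entry point:

    # Top-level driver: make orchestrates, the per-language tools do the work.
    .PHONY: all frontend backend
    all: frontend backend

    frontend:
        cd web && npm ci && npm run build

    backend:
        ./gradlew assemble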


Make is only bad under the Autotools mess. Every build-related struggle in an open-source project that uses Autotools can be traced to Autotools, not to make.

Autotools wasn't invented to overcome deficiencies in make, but deficiencies in C portability across Unix flavors.

Those deficiencies are greatly diminished today, both by POSIX standardization, and there being fewer viable surviving Unix variants that anyone cares to build for.


Don't tell them: show them. I've found that the mess underneath, illustrated by the PHK article below, gets quite a reaction.

https://queue.acm.org/detail.cfm?id=2349257


Exactly. Don't forget `pkg-config` [1], which I've found eliminates most cross-POSIXish-platform linking issues.

[1] https://en.wikipedia.org/wiki/Pkg-config
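For instance (libcurl standing in for any library that ships a .pc file):

    # pkg-config resolves the platform-specific include and link flags,
    # so the Makefile hardcodes no paths.
    CFLAGS += $(shell pkg-config --cflags libcurl)
    LDLIBS += $(shell pkg-config --libs libcurl)

    app: app.o
        $(CC) -o $@ $^ $(LDLIBS)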


I've used make for 15 years and never once needed automake. The company I'm currently at uses straight make without problems for a 500 kLOC code base of multiple languages, 3rd-party code, code generators, and unit tests. Our make code totals 1000 lines.

I've seen plenty of messes using SCons and Ant, more so than I've seen with make. Make is a solid tool.


Across multiple operating systems (Unix and Windows)? Does it fetch and install 3rd-party dependencies? Can a noob maintain the makefile without pulling their hair out?


Old way: we have one tool; it’s got a few quirks, but it does most of what we want and then gets out of the way. No one is impressed by this; tools are supposed to just work, aren’t they?

New way: we have a dozen tools; they collectively do less than the old one, and integrating them is a full-time job for someone, since any upgrade breaks something. But we all get to put the names of all of these things on our CVs! And that’s what’s important.

I wish I was kidding.


In reality it's all peachy until stuff doesn't work and no one knows why, how to investigate the issue, or remotely where to begin troubleshooting it. Change and evolution are good, but I think there is still a lot to be said for knowing the basics of anything. Layers of abstraction eventually hurt more than they help.


I think that may be behind some of these tales of 'luddite sysadmins'. Sysadmins need to keep things running, and complexity and dependencies make them nervous, even when they bring convenience. It's not about being a luddite; it's about being able to hold a mental map of how it all works, so that when it stops working you can dive in.


Not to mention getting called at 2 am because something didn't build or the release bombed, and having to examine, for the first time, some over-complicated build mechanism that goes through some pipeline while you're eyeballing large log files full of long exception chains.

Devs see the world as one where velocity and progress are among the most important things, while 'sysadmins' (a term no longer used by companies and recruiters, unfortunately) have to worry about keeping the applications actually up so they can be used.

DevOps just seems to have swept the sysadmin under the rug with some pretty words about breaking down barriers. It feels more like devs broke down the wall, sometimes. The amount of infrastructure chaos, mess, and confusion I see today (as a contractor bouncing around different places) seems higher than in the traditional infrastructure shops of 10+ years ago.


> It feels more that devs broke down the wall sometimes.

That's because only programmers get Peter Principled into the manager roles[1] that decide on whom to hire and how to run "Devops".

[1] or become startup founders


This is very true. Having a mental map of how things work is vital for correct (and good) troubleshooting.

Also, a ton of people use abstraction as an excuse not to learn a body of knowledge that contains fundamentals (especially on the systems side).

Compare network engineering, for instance, where having (quite) low-level knowledge of how protocols work and interact with each other is vital.


> Layers of abstraction eventually hurt more than they help.

This is just not true; otherwise everyone would need to know ASM or be able to easily contract someone who does.


Isn't that reductio ad absurdum, though, especially considering the original comment says that the layers of abstraction eventually hurt more than they help?


How are these tools any more abstract than make?


> "These tools don't use the thing I like!"

Pretty much. I know several admins who appear to be joining a growing pool of luddites who rail against anything new. They're particularly butt-mad about anyone drawing more salary than them. "DevOps" is their favored totem to direct their ire at.

I used to try and convince them otherwise, but it turned out to be a completely futile waste of time. At the end of the day, persistent FUD just means a more lucrative job market for the engineers who are pragmatic and fearless enough to give an honest try at making the newer paradigms work for their employers.


Also means fat contracting gigs for us luddites to clean up after the move-fast-break-shit-gtfo kids.


You know all those people who say, "Never rewrite a project! It'll always fail and take forever and cost too much and..."? I've spent much of the last decade getting paid reasonably well rewriting their projects. (Anyone remember mod_perl 1.x?)


Hey, if you want to take 18-month stints without healthcare to clean up mis-managed "digital transitions" at places like GE, more power to you ;-)


Hey, if JCL was good enough for IBM, it's good enough for me.


Sort of

Every build system is like Make, but friendlier for its particular language (IIRC Make was originally for compiling C). Make just so happened to become generic enough to build damn nearly anything and also get bundled into most Linux distros.

I think the author is arguing that having to install a shit ton of dependencies to use some other Make-like build system is garbage. That’s true in some cases. But I wouldn’t want to use a Makefile for packaging Node; npm is great for that and understands how Node works.
