Software Developers Should Have Sysadmin Experience (professorbeekums.com)



This might be controversial, but I don't think you get to be a half decent developer without being a reasonable sysadmin.

Maybe my experience is unusual, but I've never worked anywhere that the sysadmins knew more than the developers about how best to run their code in production, or, when things go wrong with it, how best to find the cause of the issue.

And I've never thrown code over a wall without having tested it in a representative environment.

The worst sysadmins get in the way of developers. Ones that scale down your CI server to the cheapest, most throttled one the hosting company has, leaving $800/day contract developers waiting nearly an hour for builds that run in 20 seconds on their laptops. And then they argue the toss about whether the CI server is cost effective, and every few months scale it down again despite the CTO saying it needs to be left alone.

When a sysadmin sees an issue in "their" environment that they understand there's a tendency for some of them to just see that issue as the only thing the developer has had to deal with that month. In all likelihood, in a productive company, it's the most trivial issue the developer has had to resolve that day.

Often this stuff goes more smoothly where the developers manage production (I mean, it's not as though, if you're going to drop one of the two groups of people, it's going to be them going) and there aren't people with separate job titles and the resulting friction between them.

Sorry. There must be great sysadmins out there struggling with terrible developers, I'm sure of it. I just haven't seen it.


I've done the dual sysadmin/developer thing for a small company, and the problem I experienced there was completely incompatible working modes.

Sysadmins must deal with interrupts (requests, crises, things driven by external schedules etc) and then in the rest of their time build systems to manage or reduce the interrupts. Developers are expected to produce work on a predictable schedule. Interrupts disrupt that schedule and obliterate the time for proactive work unless your management is very good at making it a priority.

The "prevention of information services" problem is certainly real though. Perhaps it could be addressed by embedding the sysadmins in the dev teams rather than having a department of their own, but then you have to fight org hierarchy.


I had exactly the same experience at a previous employer, almost word-for-word.

Having said that, the central assertion is still correct: the absolute best developers I've ever worked with were also top-tier sysadmins (or linux experts, depending on what you want to call it).


AMEN!!! In my current job, I am wearing both hats, and while I like that there is a certain variety in my work, users calling for help is highly disruptive when programming or doing some other stuff that requires deep focus.

The upside is that in a three-person IT department there is very little bureaucracy to fight, just the odd "organically grown" legacy system.


As an aside, did you know that the word "Amen" is actually an acronym in Hebrew for "El melekh neʾeman" (AMN), which translates to "God, trustworthy King"? (source: https://en.wikipedia.org/wiki/Amen)

I figured the etymology of that word was rather interesting. But yeah, I get the whole SysAd/Dev dual job. They're tough to balance and do effectively. SysAds are firefighters. When the nag(ios) alarm rings, we come a-callin.


From that same Wikipedia article:

The Talmud teaches homiletically that the word amen is an acronym

The etymology section shows the word has much more prosaic roots. The Talmudic acronym seems rather to be an interesting backronym.


> As an aside, did you know that the word "Amen" is actually an acronym in Hebrew for "El melekh neʾeman" (AMN), which translates to "God, trustworthy King"? (source: https://en.wikipedia.org/wiki/Amen)

I did not know that. ;-) Thanks!


The fact is that nobody really knows. The Talmud was compiled around 200 AD and is an exegesis. Egypt is the elephant in the room of Hebraic history. It is possible it is an Egyptian loan word, just like Moses -- "born" from water -- is an Egyptian name.


I've been in this situation for a long time; the worst part is that every so often you get assigned a PM who wants you to accept responsibility for meeting artificial development deadlines.


Developers IMO benefit greatly from having general engineering experience. It helps them understand how the part they build fits into the full product, where the narrow spots are, what is likely to break first, where formal documentation is insufficient / contradictory / wrong, etc.

Sysadmins, who often manage crises, acquire this experience one way or another (e.g. by researching options to fit a square peg in a round hole without leaks), so developers with sysadmin experience tend to all have it. I think, though, that the key part is the "engineer" part, and it can be acquired and used without sysadmin-imposed hassles (interrupts, crises, being underappreciated).


The two hats don't fit well at the same time; 27 years of both has not been easy, but the experience of wearing each regularly is invaluable.


that's a great point about the types of schedules for each discipline.


> I mean, it's not as though, if you're going to drop one of the two groups of people, it's going to be them going

I'm sorry you've had such an awful experience with sysadmin colleagues that you've developed such a corrosive attitude towards them. I've worked in lots of good environments, where dev/ops was being effectively practiced, and sysadmins there were the most effective force multipliers imaginable.


> awful experience with sysadmin colleagues that you've developed such a corrosive attitude

This is going to be a sensitive topic, but can we talk about "BOFH" culture somewhere on this thread? (Maybe I'm old and it's now dead, but I think some of it persists.)

I kind of understand it as a product of working in an environment where everything is urgent and nothing is appreciated, but when sysadmins come to resent the people they're supposed to be supporting then the force multiplier turns negative. Sysadmins develop strategies for reducing the number of requests at any cost, usually by making the experience as opaque and unhelpful as possible.


When I was reading BOfH a few years ago I got a different gist.

The BOfH is the archetype of someone who is excellent both with technology and politics. When you are in a service role, you have two competing priorities. You must deliver people the things they want, but also keep things nice and stable for yourself so you don't go crazy.

An important dynamic in organizations is laziness vs. intimidation. Political savviness allows you to apply intimidation to get the lazy to do what you want. You can threaten to fire, or raise an issue that could possibly get them fired and, even if it doesn't, won't make them look good. The BOfH is someone who can respond to political intimidation with adroit technical interventions to ensure that his second priority, ensuring a smoothly-running system, isn't threatened.

If you read the BOfH stories carefully, you see that the operator knows where his bread is buttered and is careful to remain on good terms with the people who really have the power in the company. The whole thing is a phenomenal read on organizational dynamics.


> Sysadmins develop strategies for reducing the number of requests at any cost, usually by making the experience as opaque and unhelpful as possible.

This is sometimes an organizational problem. I worked in a support role at VMware for about 5 years and this is what I observed:

- Support & IT departments typically have enough staff in the beginning

- The organization grows & the department grows to match the new work that exists

- At a certain point, the organizational view of Support & IT/Ops changes, and it's now viewed as a cost that you want to keep down.

- Leaders try to minimize the increases in budget, but the workload per sysadmin/engineer increases.

- The sysadmins/engineers have no control over the flow of new work, which affects the quality of work that gets done and can create a toxic environment.

It literally becomes impossible to handle all the incoming requests. Different people handle it differently. Good sysadmins would learn to prioritize properly, but due to the toxicity some people have trouble handling it so they end up developing strategies to make a certain number of requests "go away".

Anyway- just my two cents.


The SOLUTION to this problem is multifold:

1) Leadership: Stop viewing the IT/Ops/Support department as a "cost to keep down".

2) Leadership: Treat the department like they are manned by people.

3) Realize that not all requests are created equal. Some take minutes, some take months.

4) Determine a reasonable number of requests/tickets per sysadmin/engineer. Make sure to add padding for things like project work, sick time, vacation, professional development, and so on (a rough sketch of this follows at the end of this comment).

5) Hire proactively to prevent the determined threshold above from being surpassed.

6) From the IT/Ops department's perspective: realize that the incoming requests are coming from people who need your help and they are effectively your clients/customers. Treat them as if customer satisfaction is extremely important!

There are also other strategies where you give a subset of people the ability to work on projects and designate a different subset to be interrupted with urgent requests, and rotate the role. There are all kinds of things you can do to improve the situation :)
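To make point 4 above concrete, here's a back-of-envelope sketch (every number is hypothetical):

    hours_per_week = 40
    project_work = 8     # proactive/project time (part of the padding)
    overhead = 6         # meetings, professional development, admin
    focus_hours = hours_per_week - project_work - overhead  # 26 left
    avg_hours_per_ticket = 1.5
    capacity = focus_hours / avg_hours_per_ticket
    print("capacity: %.0f tickets/engineer/week" % capacity)  # ~17
    # Shave off ~10% more for sick time and vacation, then hire (point 5)
    # before sustained intake exceeds that number.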


> where everything is urgent and nothing is appreciated

in such an environment i wouldn't expect any other result. fix the environment, not the rational human response to it.


Sysadmins are, 99% of the time, responsible for production uptime. Sadly, management includes all of the software being written in-house in this expectation. Change means instability, means that Sysadmins' feet are held to the fire - this makes them resistant to change.

Software developers, on the other hand, are responsible for making changes. Adding features, pushing fixes, and so forth.

These two points of view are inevitably going to cause friction. Developers are only recently starting to be held responsible for production uptime and the pages that come along with that - and it's a good thing for both sides.

That "most trivial issue" for a developer is something the sysadmin was woken up for 3x in the past, and doesn't want to be woken up for again, so he pushes back. How can he not?


In 1997, when my code first started running in Live, I was given a pager and told "welcome to Ops". Every piece of software that company ran had a single dev name, easily read out from the binary, that was used as the contact when trouble occurred.


>>> The worst sysadmins get in the way of developers. Ones that scale down your CI server to the cheapest, most throttled one the hosting company has, leaving $800/day contract developers waiting nearly an hour for builds that run in 20 seconds on their laptops.

How likely is it that the sysadmins were told to 'just make it run cheaper, I don't care' by someone higher in the food chain?


> How likely is it that the sysadmins were told to 'just make it run cheaper, I don't care' by someone higher in the food chain?

Having worked in ops for > 10 years, this is how it usually goes.

The SA's job tends to involve a lot of scepticism and caution. You look for problems and try to solve them proactively. One (often easy) way to solve many classes of problems is to throw hardware at them.

Management always pushes back on this tactic. That's reasonable; they need to justify capital expenses (especially if you're self-hosted).

The core issue though is that capital expenses are easy to quantify, while "lost productivity" is much harder to fully account for. If I complain that some hardware upgrade which costs $x could improve productivity, I just don't have hard numbers on my end - it's all napkin math.

In many places reluctance to spend money on infrastructure is also, I think, a symptom of headcount-itis. Managers love to have more employees, and love to have more for them to do, because that makes managers seem more impressive to the org. My manager might have perverse incentives; keeping the SAs busy fighting scaling fires both makes his team look impressive because they're busier, and makes him look better because the capital expenditures are lower.

Obviously, head count is expensive, so this is usually a game of appearances rather than an effective strategy to improve the bottom line. Good insight into productivity is required to catch this kind of stuff, but in the real world I've found that a lot of places just don't have an org structure capable of weighing cost / benefit properly when it comes to infrastructure.


Thanks for such a detailed response :)


How likely is it that the sysadmins were told to 'just make it run cheaper, I don't care' by someone higher in the food chain?

If you're blindly following "orders" to reduce costs and doing things that push up costs elsewhere then you're not doing a good job. A good sysadmin (or the sysadmin's boss) should be able to pull up some numbers and say "Build tasks are being queued for an hour before they run. What impact is that having?", and call a wider meeting that brings together the higher-up-the-foodchain manager, the development team, and anyone else who might be affected. Ideally it'd be the higher up manager who calls that meeting of course, but they may not understand the technical issues.


I have long thought that one of the most effective acts of workplace sabotage available to a sysadmin is to implement management plans without question.


>If you're blindly following "orders" to reduce costs and doing things that push up costs elsewhere then you're not doing a good job.

It is the responsibility of the person above to know whether or not the orders they give should be given. If they need to ask for information from people below them, fantastic, please help them along.

Please don't fall on the sword for incompetent managers.


It is the responsibility of the person above to know whether or not the orders they give should be given.

Yes, and part of that is the people in their team(s) helping them and understanding that they're fallible and may fail to ask a pertinent question. Equally, the manager needs to be open to updates volunteered by their team without a prompt. Ultimately everyone does better if the entire group works together.


Everyone was told to see if they could find cost savings. This just wasn't one, though, in the bigger picture, and even after being told it wasn't one, by their line manager and separately by the CTO, they pursued it over and over. I don't believe anyone else further up the food chain was directly involved on their side.

All it needed was for the question to be asked on the company's internal board, and for them to listen to the answer. Had they tried it once or twice, I'd probably have forgotten about it in short order. This went on for a couple of years though!


My favorite is being told I can't store stuff I need on some enterprise storage solution because it is running out of space, when I know in my head a couple more terabytes of storage costs way less than the amount of effort that went into discussing it by all parties involved, and there's no effort to help me find an alternative. So it goes into a different bucket, like S3, which is what we were trying to avoid (for various on-premise benefits) in the first place.


I agree. I also think it should work both ways. The worst jobs I've ever had were when the sysadmins had the mindset that they own the servers and were unwilling to deviate from what they've read at the behest of the developers.

"I will force AV on reads on the developer boxes." "I will install AV on the production DB servers without telling anyone in the development group, then make the developers prove AV was the cause of production slowness before removing it two weeks later." "I will force this crazy group policy on developers and when they complain, I will totally ignore them."

A bad or uncompromising sysadmin (one and the same) makes development work a complete nightmare.

I half believe the reason developers are embracing cloud architecture so much is to take sysadmins out of the equation.

On a side note, a tip for developers. Always make friends with the sysadmins. Buy them lunch or something. Right or wrong, they can make your lives much better or much more miserable.


> I half believe the reason developers are embracing cloud architecture so much is to take sysadmins out of the equation.

It was literally true at one of my previous jobs. We couldn't install anything on our own dev machines without approval from Net Ops, not even Notepad++ (I don't think I ever got that installed, never got approval).

We once asked for a new server which mirrored the software of an existing server with two months lead time and got complaints that two months is not enough time to get a new server. I think we ended up getting it in three months, after the new project was supposed to be deployed to it.

Meanwhile we were starting to get into Azure, and we had a new server in Azure up and spinning with everything we needed installed on it in about 15 minutes.

The Lead Developer said, "We need to get as much stuff on the cloud as we can so we can stop dealing with this mess." We dealt with a lot of PHI there, though, so there was only so much we could do.


I've seen this as well. "Timeline from internal IT for provisioning a box and deploying our app is 6 months, and subsequent changes go through a ticketing system with a 2 week average turnaround. Or, we can have it running ourselves on AWS in 30 minutes."


Shit. So, you're saying that my mess of a team is kinda awesome by implementing reliable production-ready deployments within 2-4 weeks, and implementing changes to environments within like 30 minutes ("set key foo to bar in configs please?") to a week ("we need persistence!").

I guess IT in this place really is getting up to speed.


Yeah, that sounds spookily similar to the process we had, including the ticketing system changes timeline.


What you're probably not seeing is the CIO/CSO screaming at the SA to get AV deployed on every machine in the company by the end of the month, to meet some audit requirement checkbox or PCI compliance.


Exactly this. Audits don't care if there isn't any practical malware or if nobody can access the system outside 3306 and 22. Audits say "all production systems implement antivirus software" as a binary checkbox.


So the problem is effective communication? Why can't the sysadmin in this imagined scenario explain their actions this way?


When I have seen this problem, it's because the sysadmins are instructed (or have learned via experience) not to explain their reasoning to developers or end-users. Because if they did, it would become a discussion or argument that becomes a time sink, since there was very little chance they could change the mandate even if they agreed.

So they become intentionally opaque to move that discussion out of their laps and make it come via the development team managers confronting the operations managers and having the fight on that turf.

Such situations occurring is a sign that the organization is not set up effectively. This sort of confrontation shouldn't need to be happening.

Ideally the development team's lead and/or project managers are involved with, are informed ahead of time, or are even contributing to the policy decisions on the operational side.


Because telling someone that you did something because of compliance doesn't help. They still blame you personally even though the compliance standards are usually industry-wide or even defined by Congress as an act of law.


One of our web teams wanted to do a simple Wordpress deployment on LAMP. As sysadmins, that was no issue, even with clustered mariadb. But our DBA team doesn't have any experience outside of Oracle/MSSQL, and squawked about mariadb. After an hour of this BS, the manager of the web team spun up a few EC2 instances and got to work. Of course we don't have anyone familiar with EC2, so supporting that will be a learning curve for someone, but the manager is happy, he has his own sandbox without hassles from the DBAs.


"This might be controversial, but I don't think you get to be a half decent developer without being a reasonable sysadmin."

Couldn't this argument be applied to any developer for any discipline/speciality?

Sure, more knowledge/context is always better if reasonable to attain, but my experience suggests that your above concerns could also be addressed via team organization rather than expecting all developers to know all things.


Not particularly controversial at all, from my POV.

I was a sysadmin with various ISPs in various countries for 15 years before I "turned to the dark side". I'd been using Ruby for a few years with Puppet and Chef, and after dealing with one too many "flaky coders", I picked it up.

I have to say, coding is far more enjoyable, though both come in handy in my day-to-day life.

It sounds like you've dealt with a few "BOFH" sysadmins. Don't worry, we're not all like that, and those that have been on both sides of the team will probably see your way.

Tell your boss I'm available (remotely), by the way ;)


I have seen this situation too many times (exaggerated a bit):

D: I have noticed that task Frobnicate has not been running in Production for a month, then checked and it is not even added to scheduler!

SA: There is no mention of Frobnicate in the pipeline for scheduled tasks.

D: What pipeline? FancyPancyScheduler is bundled with application and tasks are defined in DB, I have done it in Staging and everything worked, Frobnicate is all the fuss in the team, you must have heard about it, why don't you check for changes in Staging?

SA: We have well defined pipeline to manage scheduled tasks, currently the executing agent is Cron, not FancyPancyScheduler.

---------------------

Developers and Admins have more or less the same goals (stable, maintainable and extensible), but on different pieces of the system (code vs infrastructure). In my short career I have seen problems arise where one party makes plans and changes according to current or even past (it worked like this earlier) state of the other party. This applies to both developers and admins.

So I sort of agree with your sentiment that developers need an understanding of system administration. Though, depending on team size, I believe it is entirely sufficient to have someone in Development who understands system administration and the actual infrastructure, and someone in Operations who understands development and the actual stack. This is where I hope DevOps will end up: arbitration between Development and Operations to ensure smooth sailing forwards. Because the debate "I will do it in code" versus "this must be done on the edge" (e.g. static assets in a website: served by the application or the web frontend?) will never be resolved.

Edit: formatting


> Developers and Admins have more or less the same goals (stable, maintainable and extensible), but on different pieces of the system (code vs infrastructure).

I disagree, because this generalizes both developers and admins too much for my own comfort. I've seen sysadmins get really sloppy in the name of getting something into production quickly, out of hubris, without thinking about the full lifecycle of an application (common with developer-turned-sysadmin engineers - I am one, and I tend to be more reckless because most of the errors I've observed would not have been caught by going super slow; adding more test code does not necessarily find the most critical errors, it just increases confidence). And especially in enterprise software, most developers are sitting on features and are nearly allergic to new trends, because their organizations value avoiding revenue loss far above losing growth opportunities.

Of course the stereotype is that operations wants things stable and manageable at the behest of the business, while developers want to deploy new stuff faster (because the idea of development in most places is to create something new). As modern infrastructure becomes increasingly code-driven and emergent, as opposed to manually formed and restrictively managed, sysadmins will have more room for errors, and that may change this in the future. Meanwhile, developers are increasingly under greater scrutiny by society when rolling out features, such that nobody can ignore the concerns, and they may eventually be forced into nearly waterfall-like development patterns. We can already observe this with the infection of Agile with enterprise bureaucracy / overmanagement back into the rest of the software industry, as many of the former smaller, agile tech companies become big behemoths themselves.


The DevOps approach definitely shifts some sysadmin-type roles towards the development teams. That said, there are things that belong in the realm of sysadmin responsibilities, both "old school" - say, setting up DB replication, VPNs or a puppet master server (not that "old school", but still) - and "new": things like Kubernetes and Fleet/CoreOS, for example, have plenty of configuration/maintenance complexity that is better suited to being handled by a dedicated sysadmin.


> thrown code over a wall without having tested it

this is the weirdest part of the whole devops mantra. like, I know how to evaluate the complexity and memory requirement of code way before I write it, and I guess most compsci grads should be able to do the same.

so either it's yet one more attempt at getting cheap labor into workable territory, or plenty of people, wherever this myth originated, are being cheated out of their money for a graduated curriculum that teaches nothing of value.


>the complexity and memory requirement of code

Those are only tiny slices of real production bugs. No amount of complexity analysis of your code ahead of time is going to protect you from all of the issues that arise with integrating any large system dealing with lots of requests. You run into all kinds of things like query optimization, kernel TCP tuning, load balancer problems, cache thrashing, high latency clients, out of spec clients, power failures, etc.

If you think knowing the theoretical behavior of your program in an ideal environment is enough, you are exactly the type that throws code over a wall without having tested it.


funny how most of the things you list are either stuff that can be audited in code alone (query optimization, cache thrashing) or totally out of the developer's control (load balancer issues, tcp tuning)

sure, if you lump them all together like that it might look like you have a point, except it falls apart when you attribute concerns properly.

or please explain, how would dealing with kernel tcp tuning part-time help Joe Random developer write better code?


Query optimization can't be audited in code alone. The indexes you need depend heavily on the database system that you are using in production. Do you know at what point your DBMS stops loading the whole table into memory? Do you know what data structure and algorithm it's using when you do a LIKE query?
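To make that concrete, here's a minimal sketch using SQLite (no DBMS is named in this thread, and the table and column names are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
        CREATE INDEX idx_users_email ON users(email);
    """)
    # SQLite's LIKE is case-insensitive by default, which disables the
    # prefix-index optimization on a default-collated column.
    conn.execute("PRAGMA case_sensitive_like = ON")

    for where in ("email LIKE 'bob%'", "email LIKE '%@example.com'"):
        plan = conn.execute(
            "EXPLAIN QUERY PLAN SELECT * FROM users WHERE " + where
        ).fetchall()
        print(where, "->", plan[0][-1])
    # The prefix pattern can SEARCH the index; the leading wildcard
    # forces a full SCAN, and nothing in the application code alone
    # would tell you that.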

Cache thrashing also can't be audited in code alone without understanding the architecture that the app is going to be deployed on. It's highly unlikely that the servers will have the same processor cache sizes, memory sizes, and NUMA architecture as the dev's laptop.

Load balancers are something a developer should know about as well. A developer has to consider the behavior required by the application (e.g. backend session persistence, headers injected, etc).

>please explain, how would dealing with kernel tcp tuning part-time help Joe Random developer write better code?

Joe might learn that connections aren't as cheap as he thinks and maybe it isn't a great idea for each client to require 50 connections for the app to function. He might also learn that TCP isn't very efficient on high bandwidth, high latency, lossy networks and decide to switch to UDP with error correction.
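If you want to let Joe feel that cost, here's a rough, self-contained sketch over loopback (real networks add a full handshake round trip per connection, plus kernel buffer memory and TIME_WAIT entries):

    import socket, threading, time

    # A throwaway local listener that accepts and immediately closes.
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(128)
    port = listener.getsockname()[1]

    def accept_loop():
        while True:
            conn, _ = listener.accept()
            conn.close()

    threading.Thread(target=accept_loop, daemon=True).start()

    # Pay the full connection setup/teardown cost 1000 times.
    start = time.perf_counter()
    for _ in range(1000):
        s = socket.create_connection(("127.0.0.1", port))
        s.close()
    print("1000 connect/close cycles: %.3fs" % (time.perf_counter() - start))
    # Even with zero network latency the cost is measurable; now multiply
    # by a 100 ms WAN round trip and 50 connections per client.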

Long story short, a good developer should know everything about the environment in which the app is intended to run. "It performed ideally on my laptop" is throwing code over the wall.

Civil engineers don't design bridges without understanding where the bridge will go. The same applies to software.


> Joe might learn that connections aren't as cheap as he thinks and maybe it isn't a great idea for each client to require 50 connections

so we're back to point one: you need devs that go through basic education, and to stop cheaping out by hiring Joe / or Joe should ask for a refund of his tuition fees.

> snip of stuff that one does not know off the bat

sure but it is knowable, it's not exactly hard. databases are predictable, building indexes in the right places is not an esoteric practice that can only be done by trial and error and rituals etc etc. literature is quite abundant, easy to process and complete with tradeoffs about different approaches and how they impact performance, maintainability etc.

99% of programmers aren't breaking new ground.


>so we're back to point one: you need devs that go through basic education

There is no basic education that covers the associated costs of a TCP connection in the kernel of a modern operating system or in the load balancers it passes through on the edge of the network.

>sure but it is knowable

So you're saying it is important for a developer to understand the infrastructure the code will run on. Thank you.

The reason I brought up all of those points is because they are things not covered in CS educations and they hammer "hands off" devs all of the time.

I've worked with tons of junior devs from all kinds of good schools (Stanford, MIT, UC Berkeley, etc) and they almost always get bitten by this stuff because they throw their code over the wall and don't make an effort to understand the operational environment. It has nothing to do with a good education, it has to do with a mindset of not operating in a vacuum.


True for most devs that do any backend work.

Many devs out there who work with Windows or do mostly front end often have little experience in that domain.

Seeing a lot of work get done at uni by students: one who actually did some backend work (a friend did a blockchain project recently) in fact did very little backend discovery; the job of getting the env up and running was delegated to another student.


> Maybe my experience is unusual, but I've never worked anywhere that the sysadmins knew more than the developers about how best to run their code in production, or, when things go wrong with it, how best to find the cause of the issue.

I've had the exact opposite experience. In most of the organizations I've worked in the "sysadmins" (mostly Systems Engineers/Operations Engineers actually) were stronger developers than the people who were actually developing the software. But that could just be a title shift, because what I've seen happen is that people who care about systems but have a development background gravitate towards operations roles and end up filling in as the "actually Senior" developer for the dev teams.

In the 15 years I've been doing this, I've only occasionally met someone who has stuck hard to the development side of the house but actually is competent when it comes to systems. Most developers have zero care about any of the lower level things like networks, hardware, and even backend software/databases which are required for their application to succeed. A common scenario is that the devs choose an inappropriate backend stack because they chose the easiest things to deploy rather than what is best suited for the use case. Then when things blow up, they beg for an ops team to be created, which usually starts by hiring people who are competent enough developers they can relatively painlessly replace the entire backend with something sane (e.g. Mongo to Postgres shifts are commonplace, because Mongo is a dog in the real world).

> The worst sysadmins get in the way of developers. Ones that scale down your CI server to the cheapest, most throttled one the hosting company has, leaving $800/day contract developers waiting nearly an hour for builds that run in 20 seconds on their laptops. And then they argue the toss about whether the CI server is cost effective, and every few months scale it down again despite the CTO saying it needs to be left alone.

Yeah, that does sound terrible. I agree. My top 5 jobs as a systems person are the following, in priority order:

1. Make sure production stays up for our customers so we keep making money. (5 9s targets)

2. Ensure the security (and compliance) of our systems so we don't get hacked and we maintain customer expectations about compliance.

3. Ensure the performance of our product/systems is up to customer expectations.

4. Make sure deployment automation is solid and streamlined so that deployments are frictionless.

5. Make sure new code is actually being deployed regularly and remove impediments to deployment so customers get features faster.

You'll notice a trend here I'm sure. The most important thing is the customer, then the developer. The biggest frictions I've seen between systems/development teams is when the development team believes that their desires/needs are the highest priority. The systems team is /not/ there to be at the beck and call of the development team, it's to be at the beck and call of the customer who is paying the company money. As much as possible I try to ensure the development team is having a frictionless experience, but if something will negatively impact the customer it is 100% my job to throw a roadblock in the way of the development team to prevent that. The customer of the company is my priority, and everything else is secondary.


I believe in small cross-functional teams, but I don't agree that you need to be a sysadmin to develop. Perhaps it's more your opinion of what a good developer means; most teams benefit from variety, in my experience. It sounds like you're biased towards certain types of organisations where there's a big gap between departments.


> This might be controversial, but I don't think you get to be a half decent developer without being a reasonable sysadmin.

It is an interesting exercise to generalize this statement in the context of general engineering.

It seems either your conclusion is held to be incorrect, or, we reach the conclusion that software development is not engineering.


There's no value in arguing over the semantics of "engineering". There are huge differences between software development and e.g. civil engineering, to the point that I would be dubious about any analogy that treated them as the same thing.


Chemical engineering, hydroelectric engineering, and power engineering come immediately to mind as engineering disciplines that deal with active systems that require operational management and control.


Sure, but those are still all very different from software development.


Of course. (In my opinion, /high software/ has more in common with mathematics, music, theatre-film-dance, and architecture than it has with engineering, and /low software/ is beginning to resemble boiler room operations.)

But here, as an example, is my BSEE alma mater: http://eng.rpi.edu/academics

And to this day, we hear about "software engineers" and "software engineering".

Per my OP: "It seems either your conclusion is held to be incorrect, or, we reach the conclusion that software development is not engineering."

Possibly, one reason for the prevalent problems in the pedagogical & human resource fulfillment aspects of the field is due to a miscategorization of the field.


Process engineering (~manufacturing) and logistics (~supply-chain) are not dissimilar to modern software workflow. The basic tools (modular management of complexity, discrete processing, statistics, monitoring, redundancy in processes/providers, feedback) are equivalent. In fact, I feel like a huge part of a successful software career is learning to see the similarities in disparate fields and draw from them positive architectural benefits, while keeping other-profession-spire-dwellers properly onside/placated.


Well, that is certainly correct, but it should be pointed out that one can say that about most organized (pseudo-)industrial production endeavors. But it seems incorrect to posit that that is the 'defining' characteristic of software development.

> In fact, I feel like a huge part of a successful software career is learning to see the similarities in disparate fields and draw from them positive architectural benefits, while keeping other-profession-spire-dwellers properly onside/placated.

Fully agreed. In fact that has been my guiding light in my own approach to software development. To clarify my view, I think software, very much like architecture, is a polyglot yet distinct discipline. It is not engineering. It is not mathematical logic. It is not process engineering. It is not logistics (provisioning). Etc. (Just like architecture is not civil engineering. It is not philosophy. It is not art. It is not environmental systems engineering. Etc. It is architecture.)

-- p.s. edit --

I would like to bolster my earlier statement that software development has more in common with architecture, theatre, film, etc., than with engineering:

I would like to propose and roughly define a notion of 'semantic gap'. A sort of soft measure of the degree to which the formally expressible definition of a 'production' falls short of permitting the realization of the 'product' without the intervention of the 'designer'.

With that definition in hand, I propose that "engineering" disciplines are those creative productions that have minimized the semantic gap to a degree that permits strict divisions of labor in the production.

Whereas the "arts" are those creative endeavors that are faced with an intrinsic constraint on the degree to which the semantic gap can be minimized, and this maximally reducible semantic gap requires subjective and/or contextual 'interpretation' of the formally expressed design.


I like your semantic gap notion; however, I am less convinced that overall mutual comprehension is the issue. Rather, the issues are clarity of expression of vision (at the earlier/design stage) and clarity of interface (beginning at the implementation stage).

By way of example, there are many successful artistic projects that utilized the talents of multiple artists in parallel (lots of murals and mosaics, for instance).

In larger scale computing projects, frequently the (mechanics of the) interfaces provide bigger problems than the vision statement or overall goal, whereas in artistic projects indefinable aesthetics may be the showstopper, despite perfect comprehension and collaboration.


>Often this stuff goes more smoothly where the developers manage production (I mean, it's not as though, if you're going to drop one of the two groups of people, it's going to be them going) and there aren't people with separate job titles and the resulting friction between them.

This is not personal criticism, but you know how I know you're not working in a highly regulated environment? Check out the Carnegie Mellon Capability Maturity Model (CMM) as a counterexample of where some companies go. Development is not at one remove but two from production support. There's an "operate" team between them and production environments, and in a regulated environment operate doesn't have privileged access either. That'll be a third team, due to separation of duties requirements.

Now imagine you're paged out to a call where your code is slow or failing and you're not even allowed to login to where the issue's happening. Fun, right?

This is why I'm absolutely loving the devops changes we're seeing now - because developers can control the environment without retaining control of it. My ideal is to apply some sensible defaults (no, you can't have all my crashdump space for your app logging; ask for more disk instead; no, you can't run GHOST/glibc/POODLE-vulnerable versions of libraries) and otherwise let the developers spec the OS as a template or dependency for their app. It's much better for me, since if I'm required to troubleshoot I know my requirements are met, and otherwise the developer may do as they wish. Everyone wins, and my control requirements are satisfied, because remember: developers are never allowed production access in regulated environments.

>Maybe my experience is unusual, but I've never worked anywhere that the sysadmins knew more than the developers about how best to run their code in production, or, when things go wrong with it, how best to find the cause of the issue.

I guess it depends on what you mean? The developer is in the best position to know what logging there is and how to enable it or increase verbosity. But they may be completely ignorant of how the operating system's TCP stack, memory management or other mechanisms work. Have you ever had to explain to someone that a Java out of memory error had nothing to do with the fact that Linux is using otherwise idle memory to buffer I/O and that they're misreading top output? That the actual issue is their object management and just increasing the JVM's heap size is at best a bandaid?
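For the top-misreading half of that, here's a minimal, Linux-only sketch (field names come straight from /proc/meminfo; MemAvailable needs a 3.14+ kernel):

    # Memory that top shows as "used" is often just the kernel's
    # reclaimable page cache, so a JVM OutOfMemoryError says nothing
    # about the host being out of RAM.
    def meminfo():
        fields = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":", 1)
                fields[key] = int(rest.split()[0])  # values are in kB
        return fields

    m = meminfo()
    print("MemTotal:       %10d kB" % m["MemTotal"])
    print("MemFree:        %10d kB  <- looks scarily low..." % m["MemFree"])
    print("Buffers+Cached: %10d kB  <- ...because of reclaimable cache"
          % (m["Buffers"] + m["Cached"]))
    print("MemAvailable:   %10d kB  <- what's actually usable" % m["MemAvailable"])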

If you have a developer who insists every issue is the operating system, sometimes the SA has to know how to dig in and run stack traces, probe tools (systrace, dtrace, whatever), jmx queries, etc until they can pinpoint the offending code.

As another example if you have an application that isn't draining queues quickly enough and therefore sending back tcp zero window frames upstream, what's the solution? A hypothetical lazy developer will say "it's the OS not queuing enough data, increase the OS buffers." A hypothetical lazy SA may say "it's the app not consuming packets quickly enough, rewrite the app."

In reality if we've all been paged to a priority one bridge the solution will probably be the combination of the two - tactical fix of increasing buffers to create some time for development to understand why the code isn't doing what it should and fix it.
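The tactical half of that fix, sketched on Linux (the sysctl path and the doubling behaviour are Linux-specific; exceeding the cap needs SO_RCVBUFFORCE and CAP_NET_ADMIN):

    import socket

    # Kernel-wide ceiling on what setsockopt() may grant:
    with open("/proc/sys/net/core/rmem_max") as f:
        print("net.core.rmem_max:", f.read().strip())

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("default SO_RCVBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))

    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
    # Linux doubles the requested value for bookkeeping and silently
    # caps it at rmem_max, so always verify rather than assume:
    print("after asking for 4 MiB:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))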


It's funny that you bring up CMMI in the discussion. CMM(I) is nearly the antithesis of Deming's approaches toward quality management. What's really interesting is that Deming's approaches were adopted by Toyota decades ago to historically great effect (sadly with few other large named successes in the business world), while Taylorist approaches (including CMMI) were adopted by a handful of other Japanese zaibatsu and especially by US companies back to the 19th century. These companies have had vastly different growth trajectories over time, but when it comes to quality, most consumers in surveys will associate Toyota with it over Hitachi, Mitsubishi, and Fujitsu (I believe all of these companies are full-blown CMMI 2.0+ adherents and champions). Similarly in the US, what has Six Sigma really done for companies that have adopted it? GE is hardly known for anything in the public eye resembling technical chops, for example, and most studies show that more than 70% of companies that adopt Six Sigma lag the S&P 500 upon adoption, with no long-term recovery afterward either (perhaps Six Sigma adoption is not a cause but simply correlates with poor performance, similar to private equity oftentimes getting a bad rap in the public eye).

When even the US military - one of the world's foremost investors in management and leadership research - has largely abandoned command and control (the military equivalent of Taylorism) we really need to ask whether structures that enforce a management/worker caste vs. one that empowers those closest to a problem are effective beyond any meaningful scale.


No doubt valid points about CMMI; I was exposed to it as part of a program to improve quality and in that specific instance it was constructive. However the program office oversight was shut down as all portions of the business were certified as "level 2" and without that level of structure and control most of the process died within a year.

Even so, my original point remains - in some kinds of highly regulated shops there's enough external pressure for controls and separation of duties that the developer simply cannot have access to production. I'm not defending either practice (CMMI or separation of duties), I'm just saying in some places it's reality, regardless of perceived drawbacks or overhead.


> Have you ever had to explain to someone that a Java out of memory error had nothing to do with the fact that Linux is using otherwise idle memory to buffer I/O and that they're misreading top output? That the actual issue is their object management and just increasing the JVM's heap size is at best a bandaid?

I've not seen a professional developer be confused about Linux using otherwise idle memory to buffer, no. I have seen that with sysadmins who mostly look after Windows boxes and were somewhat unfairly dropped in at the deep end.

I've seen JVM heap OOM errors be caused by both object management issues and applications that would legitimately benefit from larger heap sizes. Many, many times.

I think I'd fall off my seat if I saw a sysadmin use JMX to find an issue. I did see a security guy (so not really a sysadmin, but he was doing a related job) use strace once. He was remarkable enough to have his own Wikipedia page.


> Have you ever had to explain to someone that a Java out of memory error had nothing to do with the fact that Linux is using otherwise idle memory to buffer I/O and that they're misreading top output? That the actual issue is their object management and just increasing the JVM's heap size is at best a bandaid?

Is there somewhere to read about this? I think this might have come up with one of our projects. (At the time, from googling, I suggested they try mark and sweep - I didn't really have any idea, but was of the opinion they had lots of small objects.) I don't have much experience in Java but was trying to be helpful!


> but I don't think you get to be a half decent developer without being a reasonable sysadmin

I think you've just described me. And I don't see the argument against sysadmin.

> leaving $800/day contract developers waiting nearly an hour for builds that run in 20 seconds on their laptops.

Sorry I just can't take you seriously.


Honestly, I've heard worse out of sysadmins in some places.


Question: What must go wrong for a build to be 180x slower on a server than on a laptop?


Maybes: Packaging latency in archive formats (compress before upload, decompress after). Network latency on the upload/download. Block IO performance on the server. Virtualization overhead. Memory or processor constraints. Assumption of equivalence is spurious (eg. server is doing multi-architecture builds and full suites of tests including eg. regression tests). Yep, something like that.


Using an EC2 t2.nano instance for a build server.

When they're out of CPU credits it's game over.


Overcommit of resources, causing near constant thrashing to disk.


Hi, I'm a Sysadmin, and I've been a grumpy one through a larger part of my 15 years of experience. My main issue was that Developers were acting like Users: they don't care about what you have to deal with, they want things to 'just work'. In return, I've treated them like children, and in some instances yelled at them when they did dumb stuff. I've tried to educate them when possible, and was angry when the education didn't stick. At the time I was the 'King of the Hill' type of sysadmin - natural leader of a very small and tight team, kind of irreplaceable, and with enough years in the company behind me to consider myself a demi-god.

When I switched companies, I came across better developers. Some had decent sysadmin skills, but the main difference was that they actually took interest in how things worked past the 'git push', and when I asked / required them to make some changes that would make my life easier, they listened, discussed and adopted them when appropriate. With those same guys, I took interest in what they were doing, what their actual job was, and came up with ideas that would make things easier and run smoothly on both ends. After a while I figured out that they weren't actually better developers - they were better people. (Also, I figured out that being grumpy was not the best approach, and that patience, kindness and gratitude could get people to do more than snark, humiliation and flame-throwers.)

I guess my point is: you don't really NEED to have sysadmin skills to be a decent developer; what you really need is to care about what sysadmins do - be curious, talk with them and trust them when they say that your brilliant idea won't work in production.


I think there are definitely developers out there that give little to no respect to systems administrators.

I've seen this ignorance even in college professors. My first programming class in college was a CS class that had both CS and IT students in it, since it was required for both. The (CS) professor kept trying to convince students how much better CS was, and gave some good arguments (i.e. salary), but the most arrogant thing he said was that IT is a subset of CS and that by doing a CS degree you would understand everything it takes to be in IT. He also said that in IT you would be constantly fixing other people's computer problems, but as a software engineer you wouldn't need IT's help since you could fix things yourself.

The funny part is that part-way through my degree I realized the college didn't even offer a real CS degree; it was called "CIT with Computer Science Emphasis", and none of my advisers or professors mentioned that it would cause issues getting jobs outside of Utah. The best thing I did was leave that school and finish my CS degree elsewhere, which cost me a lot of credits and almost felt like starting over. I feel like I got scammed, but that's beside the point: I have yet to work for a company where a software engineer gets to manage his own computer without following IT guidelines, like my CS prof had described.


I couldn't agree more.

I didn't realize the importance of all the "admin stuff" before our newly hired sysadmin came to me and asked if I could help him figure out how to deploy the project I was working on. This ended up being a looong chat about monitoring, redundancy, architecture, security... you name it. What I've always thought of as installing and configuring software turned out to also touch on designing the software so that it works reliably and is easy to maintain.

I don't think I'll ever have plenty of sysadmin skills, but knowing even the general idea of what's important to sysadmins helps a lot. Also, being able to become another interruption in their day and consult them on ideas is priceless. :)


This, so much this. You don't need to be the sheep with 5 legs every manager wants, what you need is to be accessible to collaborate with people on problems.

That's an individual skill as well as a systemic one, though.


I've worked as both a dev and sys admin, and I think this is the most reasonable response so far.


I think it's more than that -- it likely stems from how the IT department is run at a particular company, too. I've dealt with many sysadmins that offer absolutely no transparency into their processes, and in many cases actively obfuscate it. So any attempt to take an interest seems to be interpreted as some sort of threat to their position.

Either way, this is a two-way street. And often times the culture of one group or the other gets in the way. Which is really unfortunate.


Mostly this.

Having been an admin myself, gradually moving more and more towards development, I understand what things in software are annoying for an admin to have to deal with. Most developers simply don't care. Grant full permissions or don't expect anything to work. Any objections and you're a troublemaker. A better attitude would have gone a long way, but firsthand experience works best.

In addition, it helps me greatly when there is no (decent) admin around. I know whether to suspect the software or the system it's running on, how to keep things running on a less than ideally configured/maintained system without completely compromising security, can help users when the problem they're having is not a problem with the software, but a problem to them anyway – they love the extra mile – et cetera.

It must be said that some admins are just as shortsighted. Knowing what kind of measures actually work for stability, security, and so on, I've come to strongly dislike those who only complicate the situation to no benefit, as well as those who point their finger at the software when it really is their system that's causing problems.


As a sysadmin, I love the devs who think "devops" and I make a point of saying nice things about them to the CTO.


As a developer with some very basic sysadmin knowledge, I'd say you have to have enough sysadmin skill to set up and administer your own system, but not enough to keep it running and deal with attacks and security in the OS. (Obviously, attacks and security in the software you're writing are still your responsibility.)

I say this because if I had to wait for a sysadmin every time I wanted to see if something worked, I'd spend a lot of time doing nothing. And it's likely that I couldn't even solve a lot of problems.

So I think you not only have to know what they do, but some of how to do it.


As a grumpy evil sysadmin, I think the good Professor misses where the real disconnect is, at least nowadays: stack management.

Why do things like Docker exist? Because developers got tired of sysadmins saying "sorry, you can't upgrade Ruby in the middle of this project". Why does virtualenv exist? A similar reason.

Containerized ecosystems (which is to say basically all of them now) are really a sign of those of us on the sysadmin side of the aisle capitulating and saying that developers can't be stopped from having the newest version of things, and I think that's a bad idea.

15 years ago, when a project would kick-off, as a sysadmin I'd be invited in and the developers and I would hash out what versions of each language and library involved the project would use. This worked well with Perl; once the stacks started gravitating to Ruby and Python it was a dismal failure.

Why? Because those two ecosystems release like hummingbirds off their ritalin. Take the release history for pip[1] (and I'm not calling pip out as particularly bad; I'm calling pip out as particularly average, which is the problem): in under two years, pip went from version 1.5.6 to 8.1.1 (!) through 24 version bumps, introducing thirteen (documented) backwards incompatibilities. Furthermore, there were more regression fixes from previous bumps than feature additions. You'll also notice that none of these releases are tagged "-rc1", etc., though the fact that regressions were fixed in a new bump the next day means they were release candidates rather than releases. Ruby is just as bad; the famous (and I've experienced this) example is that an in-depth tutorial can be obsoleted in the two weeks it takes you to work through it.
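For what it's worth, the usual defence against that churn is pinning exact versions so a rebuild can't silently move the target - e.g. a requirements.txt kept under version control (versions here are illustrative, not recommendations):

    # requirements.txt - generated with `pip freeze`, reviewed by hand,
    # and only bumped deliberately, one dependency at a time.
    pip==8.1.1
    requests==2.9.1
    SQLAlchemy==1.0.12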

Devs are chasing a moving target, and devs who haven't been sysadmins may have trouble seeing why that's a bad idea.

[1]: https://pip.pypa.io/en/stable/news/


As a Sys Admin turned Automation/Tools Engineer, I think you're missing part of the point. You've got the beginning of it right in saying that Sys Admins used to be involved in pinning down versions, and even in why that was necessary, but I believe you're incorrect in saying that the containerization technologies are bad for removing that.

Those technologies don't exist so Developers can get around Sys Admins and ignore your helpful advice. They exist to solve the problem that made the Sys Admin's role there necessary. They remove the underlying need for a Sys Admin to worry about the versions. Admins should see this as a good thing, but in my experience many dislike it because it takes them out of their Gatekeeper role. We shouldn't WANT to stop the Developers from having the newest version of things. They aren't kids playing with toys that we need to nanny over; they're doing work that creates value, and the fewer things we do to get in the way of that, the better.

If something breaks due to version changes, their testing should catch it. If things are breaking in production, we ought to get involved because there's some other problem, but before that we, as a profession, need to learn to get out of the way and let people work by letting technology handle the problems. The "Gatekeeper" mentality needs to die as quickly as it possibly can.


> They aren't kids playing with toys that we need to nanny over

In my experience (university), yes they are, and they should do that at home.

Why do you need the latest bleeding versions in the first place?

In my sysadmin experience, people believe software gets bad and deprecated as soon as the glorious next breaking version appears. I don't think I need to argue why this is an illogical stance.

With my developer's hat on: bumping to the next version mid-process reliably introduces more friction than it's worth. People think the next version solves that one weird issue but ignore that it introduces two new ones and that the software must be changed to fix five new incompatibilities.

But the solution is reliably to just not use the weird feature that caused the bug in the first place, and to think about what a clean solution would have been. And guess what: the result is a cleaner and more compatible code base. It's a tip that works for me again and again: if there is friction, think before spending the next several hours on an update that will soon lead to new problems.

It's great that you can for example compile Linux without too much friction. It's great that arcane shell scripts can run on any system. Stability (in a compatibility sense) is not a nice-to-have, it's basic sanity.


My comment to you two is -- why not both?

Stability, sanity, all that is amazing, and a must have.

But also bug-fixes, security improvements, and performance improvements are wonderful too, which tends to come with using up-to-date dependencies.

The problem with the latter, as you mentioned, is when it introduces breaking API changes and is wholly not backwards compatible. This is not a "kids playing with toys wanting to experiment" problem; this is a bad-software problem, which is why I like Go, and why I liked Java when I was doing it full time. If the language you use treats backwards compatibility as a first-class citizen, most likely the package authors will act that way too, and then the maintainers, and eventually the developers. Limit your software choices to those who care about not breaking everyone's shit every 2 weeks. Heck, even when I write my own APIs now, knowing only my company is going to use them internally, I am thinking about this.


Backwards compatibility is seriously underappreciated. When I tell developers to ensure that their changes are backwards compatible, they tend to look at me like I'm green.

I do not understand the disconnect that developers have with understanding all of the benefits that it brings. Yes, you have some extra code in your code base so it's less clean. You also have a stable environment as a result. The first affects only your personal preference. The latter affects all of your developers and users.

Unless you have a situation where it's impossible to maintain, not insisting on it is pure self interest.
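
The "extra code" in question is usually nothing more exotic than a shim. A minimal sketch in Python (the function names are invented):

  import warnings

  def fetch_rows(query):
      """The new, preferred API."""
      ...

  def fetchRows(query):
      """Deprecated spelling, kept so existing callers don't break."""
      warnings.warn("fetchRows() is deprecated; use fetch_rows()",
                    DeprecationWarning, stacklevel=2)
      return fetch_rows(query)

Old callers keep working, new callers get nudged; the cost really is just a few "unclean" lines.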


> I do not understand the disconnect that developers have with understanding all of the benefits that it brings.

Because they've never worked on an old codebase: front-end technologies change so often that everything just gets re-written anyway. It's a waste of time worrying about this when the code won't make it to its first birthday.

If you were speaking to seasoned C and DB developers about stability in the tools and the platform, you'd be preaching to the choir.


This gets to the complaint that so much of the open source ecosystem gets to version 0.8.6 (whether it's named that or not) and then gets completely rewritten "this time the right way". That's not actually a good thing.


As jwz put it,

> It hardly seems worth even having a bug system if the frequency of from-scratch rewrites always outstrips the pace of bug fixing. Why not be honest and resign yourself to the fact that version 0.8 is followed by version 0.8, which is then followed by version 0.8?


> Yes, you have some extra code in your code base so it's less clean. You also have a stable environment as a result. The first affects only your personal preference. The latter affects all of your developers and users.

Backward compatibility has real costs. You cannot restructure your code base as easily, you cannot deprecate bad ideas, you cannot extend it as easily, and so on. Sure, it also has real benefits (as you've stated), but highlighting the advantages while missing the disadvantages is not a useful approach; it only shows "your personal preference".


> Yes, you have some extra code in your code base so it's less clean. You also have a stable environment as a result. The first affects only your personal preference. The latter affects all of your developers and users.

No, the former results in a bloated code base full of old legacy crap that no one understands and everyone is afraid to touch because it might break. You have to insert weird workarounds because that bug is now a feature to some idiot, and since you provide backwards compatibility it lives forever.


Has software quality increased now that everyone is refactoring and rewriting everything for every release?

I don't think so.

It's just plain arrogance to believe you are a better developer than the guy who came before. Having some fear of breaking the code base is healthy, the same way that having a little fear that the chainsaw is going to cut off your leg makes you safer.


I like your answer. It's civilized, balanced, and I agree with every word of it :-)

I'll add as an anecdote that I do follow your practices (limiting dependencies). It works wonderfully on Debian stable (most of the software there is now >2 years old, the next version has just been soft frozen). I have the occasional package pulled from testing: I recently toyed with Perl6. And currently I use a newer version of python3-sphinx for a nicer doc syntax but I could do without. It causes no headaches at all.


For one thing, if there is any place to treat software like a toy and to play around with the latest version, it would be a college or university.

I don't particularly care about having my software on the latest version. I personally prefer using the old version for six months while the newest version gets the bugs worked out of it.

I know sysadmins value reliability and security, but it's really frustrating when every upgrade takes dozens of hours of work to approve. Questions like "What features do you need in the new version" miss the point. It isn't about the features of the software, it is about maintaining a modern code base.

Upgrades always have the potential to break things, but when you keep up with the upgrades it is easier to achieve the stability and security goals the sysadmin wants. When you upgrade often, it is easier to read the documentation and find where changes might break something, and when things do break it is easier to fix them. Upgrades that jump over several versions at a time are a nightmare to debug, and they create a lot of technical debt that you have to work out later.

Ultimately, sticking with a version of software because it works is trading a little stability now for an absolute mess down the line.


I said it elsewhere, but personally I'm a Debian stable evangelist. There is one major upgrade every 2 years or so. It often goes without friction. The rest is mostly security updates. Breakage between major upgrades is very rare.

I don't think this thread is about "maintaining a modern code base" at all, whatever that's supposed to mean -- my impression is that you've fallen victim to the hype train.

In my perception the thread is about always catching up with the latest and greatest. Would you say in all earnestness that my code is not modern because I make a point of developing against solid standards and don't constantly long for things that are not in my distribution (the software there is usually 0.5-2.5 years old)?

You can check some of my code at https://github.com/jstimpfle. Is it "not modern"? I'm a reasonable but not outstanding developer, and not saying that everything will work on your computer (since I'm usually the only tester) -- but I'm pretty sure I can get everything there to run on your computer with minor effort.

> When you upgrade often, it is easier to read the documentation and find where changes might break something, and when things do break it is easier to fix them.

No. Breakages are less frequent because the software is not brand new, and they are better known because all people using the stable release are on the same version. Documentation comes with the distribution, but I don't have any problems googling it by giving the version string either.

> Upgrades always have the potential to break things, but when you keep up with the upgrades it is easier to achieve the stability and security goals the sysadmin wants.

This thread was never about security, and I don't buy that argument. I don't think you are familiar with the concept of a stable release.

> Upgrades that jump over several versions at a time are a nightmare to debug, and they create a lot of technical debt that you have to work out later.

No. If you develop against solid standards you have less breakage. It's not about incompatibility with the most recent versions. That would be a stupid idea. It's about compatibility with releases other than the latest and greatest. This means not depending on the hot new features that are only in these versions, simple as that.


> I said it elsewhere, but personally I'm a Debian stable evangelist. There is one major upgrade every 2 years or so. It often goes without friction. The rest is mostly security updates. Breakage between major upgrades is very rare.

That's fine for an OS, but what do you think business customers would say if you said "sorry, that feature won't be added until the next release in 2 years' time"? That's where tools like pip come in: they let the software move faster, which it often needs to.


> what do you think business customers would say if you said "sorry, that feature won't be added until the next release in 2 years' time"?

We say that all the time; we have a two-year release cycle. And in our field (aviation) that's considered breakneck.


So what exactly are the latest-and-greatest libraries that you absolutely need to implement your own business critical features?

Please list more than only one. It's simple to make exceptions for exceptional requirements.


Well for me, all of the libraries I use because none of them exist as packages for any OS.

With most of them security fixes will only go into the latest version though, so once you get behind your system is insecure.

Applications aren't something you build and forget. An unmaintained project is a dead one.


> Well for me, all of the libraries I use because none of them exist as packages for any OS.

In conclusion you don't use any libraries that are packaged for any OS.

What libraries?

Also assuming that some libraries you use don't exist for your OS, that doesn't mean that you absolutely need the latest and greatest in a business critical way. So, not approved.

All in all, not too fond of the reasoning and the evidence you provide.


Your logic would prevent just about any app from just about any non-C ecosystem from running. Java, ruby, python, dotnet, rust, go -- they all have their own library management, and very few of those libraries will be available in an apt repository (let alone a compatible one).

Your policy may work in a university, but you'd be fired from any real business.


You are still not providing evidence.

You're also making bold claims ("any app from just about any...") that my personal experience simply cannot validate. It's very easy to write applications without fancy dependencies. Recently I did algorithms, systems and applications in C, C++, python, sdl, alsa, X11, Unix shell scripting, lp_solve, and some web programming in python, javascript, sqlite3. All rock solid and stable -- all of it will probably run on any Linux box from the last 5-10 years (python3 only arrived around 2008; forget about the lp_solve bindings and just use the command-line tool).

There are 2050 python3-* packages on my system. Not that I think it's a good idea to use most of them. What's "compatible"?

So what are the libraries you absolutely need? What is this week's secret sauce?


.net MVC, nUnit 3, jquery, knockoutjs, and nHibernate to name a few. I could name a dozen similar tools on the java stack. Pretty soon I'll be experimenting with rust with libui.

>There are 2050 python3-* packages on my system. Not that I think it's a good idea to use most of them. What's "compatible"?

2000 of them are random versions someone made a package for that are unknown to the core team and probably not receiving updates.

Your list of projects sounds like typical academic ones, not tools used by businesses that employ most software developers.

You're also missing the other benefit of these tools, that we develop against the deployed version. There are no compatibility issues because we develop on ubuntu and host production on redhat.


If you consider making websites, and tools and web applications for internal processes, "academic"... I also did some contract work where I created a server and a client for displaying advertisement media on commodity screens, in a tiled fashion. The tools were all there -- C, python, bash, X11, some media libraries -- and the versions were all fine (I worked around a bug in mplayer, though).

jquery, knockout: I don't know the first thing about bundling dependencies for client-side javascript code (I don't go super fancy there and don't need jquery or knockout -- I've tried writing single-page apps by hand and they are hugely complex), but anyway, isn't that independent of server installations? Don't you bundle these libraries in-tree? If so, it doesn't relate to the discussion.

.NET... It's MS, do you run on Mono? How does the question of requiring the latest version apply?


> There are no compatibility issues because we develop on ubuntu and host production on redhat.

Sorry, but that is a hilarious argument, almost straight from Gentoo is Rice: https://fun.irq.dk/funroll-loops.org/


> Your policy may work in a university, but you'd be fired from any real business.

Nah. If we define "real business" as something with a decent turnover, employing over 250 people, and being in business for over 8 to 10 years; a business that isn't actually in the business of writing software (the majority of what makes up global stockmarkets, or "real business" in most peoples' eyes) then you will find that OP's attitude and policy-making philosophy is right on the money. (Source: Was CTO in exactly the above type businesses for many years)


> Why do you need the latest bleeding versions in the first place?

Because the newest version has several features that we would like to take advantage of immediately?

Look at PHP 7.0 which introduced return types, and 7.1 which introduced nullable return types. These are features I really want in my application, so we upgrade.


No offense intended, but your university probably isn't competing for top engineers. Grad students and postdocs aren't professionals yet either.

The sysadmin role has traditionally been a focus in that environment (e.g. controlling access to cluster resources).


Define "professional". And let me claim that "top engineers" are actually the prudent ones - which you didn't refute.


I think we agree on the prudence of professional engineers.

The definition of 'professional' is up for debate, but I'd encourage people to weigh in on the following criteria (which are IEEE's, not mine):

- an appropriate engineering education background (ABET/EAC)[1]

- at least four years of engineering experience in your field and under the supervision of qualified engineers

- passed two exams (the Fundamentals of Engineering [FE] exam, which is now a computer-based test available essentially year round, and the eight-hour PE exam)

- kept current by as a minimum meeting your state's continuing education requirements.

-- http://insight.ieeeusa.org/insight/content/careers/97473

[1] I think it would be worthwhile to consider apprenticeships, equivalent to the 'law office study' path to attorneys' bar certification.


I think the pain of keeping any project fully up-to-date is (sometimes far) less than the pain of updating an out-of-date project.


If it is, your project has a problem.


> Those technologies don't exist so Developers can get around Sys Admins and ignore your helpful advice. They exist to solve the problem that made the Sys Admin's role there necessary. It removes the underlying need for a Sys Admin to worry about the versions. Admins should see this as a good thing, but in my experience many dislike it because it takes them out of their Gatekeeper role. We shouldn't WANT to stop the Developers from having the newest version of things. They aren't kids playing with toys that we need to nanny over; they're doing work that creates value, and the fewer things we do to get in the way of that, the better. If something breaks due to version changes, their testing should catch it. If things are breaking in production, we ought to get involved because there's some other problem, but before that we, as a profession, need to learn to get out of the way and let people work by letting technology handle the problems. The "Gatekeeper" mentality needs to die as quickly as it possibly can.

Well, that's all well and good when it's framed so that "Gatekeepers" are viewed as blockers.

However, we "Gatekeepers" are the ones that get paged and / or yelled at by a CTO when an application keels over. Not the developers. The developers get to sit in their sandbox (otherwise known as "production" in 2016/17) of ever-changing library versions that were only rapidly tested in QA. Then they play a game of Starcraft II, scan HN and go to bed. When something runs out of memory or crashes in the middle of the night, we get paged. So, hell yes we should be involved in the process.

Sincerely,

Gatekeeper


Sounds like an organizational failure to me.

When I started at my current company the traditional silo between dev and systems was there (although we were allowed to deploy our own stuff) - they managed everything we ran our apps on and we just deployed to servers they had already configured.

Over the past ~3 years we've made a lot of changes: the department manager for our IS team is present in our daily standup calls to relay information between our two teams, and we now have a couple of separate VMWare clusters dedicated to our applications, with the VMs running on them our responsibility for the most part. We are the first to get called for issues with our applications, and where necessary we work collaboratively with our systems team to resolve them - we don't throw blame around; it does no good.

I should add most of this is only possible because we have real DevOps people on our team (well, really, it's just me right now - we lost our other one and still need to hire a replacement) - not developers who know enough to copy a blob of crap to a server to run, but people who have real skills in both aspects. We are trusted to maintain things because we can do it right, and while it took a lot of work (and some unfortunate infighting) to get to this point, both of our departments are working great with this arrangement.

There are still kinks that need ironing out. We've not done an adequate job of writing documentation so our systems team can help with some failures (primarily on our Linux VMs; our whole systems team is Windows admins) if we aren't available - but it's on the radar, as is getting PagerDuty set up to escalate alerts to them if we don't respond in time (like having our PostgreSQL data volume fill up over the weekend, not a call I wanted to get at 10AM on Sunday).

So yeah, fix your culture issue, get people communicating daily between your teams, share responsibility for issues instead of placing blame.


> However, we "Gatekeepers" are the ones that get paged and / or yelled at by a CTO when an application keels over. Not the developers.

And that's why people are moving away from that model. It's part of the reason DevOps is being embraced as a model. Developers should be on call to support the applications they build. You get benefits all around.


"Should" is the key word here. As a sysadmin, I'd love to work closer with devs, especially during outages. Unfortunately, every time we bring up on-call, the room goes silent. This is very anecdotal, but IME devops has just become a way for devs to bypass sysadmins.

I wonder how many companies are doing it right vs doing it wrong? Any anecdotes from a proper devops group?


and we get yelled at when we can't deliver a feature because some gatekeeper is sitting atop his little throne in the kingdom of servers saying no. ;)

This is institutional failure.


Then place the blame on the gatekeeper. As a sysadmin, I'd be more than happy with you pointing the finger at me as the reason why you can't deliver a feature. Assuming, of course, that you've run the proper tests and gotten QA's approval.


Honestly. Developers should stop thinking that their job is to release features all the time at all costs. That's simply not true, and it's counterproductive for the business.


If only the business might learn that it's not their jobs to request new features all the time at all costs...

If only customers weren't fickle and might learn not to demand new features all the time whatever the costs...

It's turtles all the way down.


And in the end, it's the developer who always makes the call: does he ship half-assed, half-finished work every single time, or does he take the time to do some testing and not break production?


Developers do what the business wants them to.


Example?


The problem with your characterization of things is companies like the one I'm in where us "Developers" have replaced the gatekeepers entirely. We have no dedicated SysOps, and yet our production environment stays up just fine.


> It removes the underlying need for a Sys Admin to worry about the versions.

No, they really don't: they remove the ability of system administrators to administer versions of software across the total system.

This is bad, e.g. when a new OpenSSL vulnerability comes out (it being a day ending in -y) and every piece of software has to be updated.

> We shouldn't WANT to stop the Developers from having the newest version of things. They aren't kids playing with toys that we need to nanny over; they're doing work that creates value, and the fewer things we do to get in the way of that, the better.

I am a developer, and I disagree. We are, by and large, kids playing rather than adults making carefully considered decisions. We'd rather use v3.0.rc-1-awesome than 2.17.12, because the former is the version that adds an API that saves us from writing twenty lines of code, never mind that it is also untested, unstable and very likely insecure.

We need adult supervision. We need oversight. That's why I argue for using stable, LTS-style distributions, and running against the distro packages unless there is a very good business reason not to (and yes, 'we can't implement necessary functionality in a cost-effective timeframe' is a valid business reason). I'm not opposed to using the bleeding edge when it makes business sense; I'm opposed to developers using the bleeding edge because they like it, and keeping the business in the dark.


I agree with OP and would amend his statement to "15 years ago, when a project would kick-off, [sysadmins & architects would] be invited ..."

From an architectural point of view, microservices take the reductionist approach to system design to an absurd limit, and per my professional experience (fwiw & ymmv) are due to the general architectural illiteracy of the rank and file practitioners in this field.


Right, microservices isn't "architecture". It is - whether they know it or not - an admission that "we can't do architecture so we'll chuck it over the fence and let the ops people worry about how it all hangs together".


> microservices isn't "architecture".

Yes the 'no-architecture architecture'. It's very Zen. /s

> how it all hangs together

In case you are interested in rescuing :) young but promising talent in the field, next time you find yourself in a discussion about microservices "architecture", point out what a single-node application looks like under this approach: every function is a process, the call stack requires IPC, and the 'linker' is considered obsolete and outdated technology.


I once worked on an application that - no joke - comprised 5000 VMs each of which was running one "service" in a dedicated JBoss. It was laughably bad.


At what point do you determine who is responsible for securing the environment? The "Gatekeeper" mentality stems from this. There is no clear line in any organization, and I see the blame game all the time.


This is exactly the example I give when I try to explain to someone that managing through personal responsibility, as opposed to team and organizational responsibility, will grind your company's productivity to a crawl.


That's kind of like asking which employees at a bank are responsible for keeping cash in the vault. Hopefully it’s a group effort.


There are actually generally rules for which employees have responsibility over the vaults in banks. The employees aren't equally responsible. Roles and responsibility are defined strictly. Generally, the lowly tellers aren't allowed the same access to the money as the general manager, and they aren't held responsible to the same degree if money in the vault goes missing.


This has actually been a very insightful response to me.

After reading your comment it now occurs to me that Docker and other container systems are actually a huge organizational tool. One issue I have encountered at companies is keeping the IT and development departments on the same high level organizational incentives to keep political barriers from coming up between them (and conflicts arising).

Containers can help keep everyone's incentives aligned because system admins can focus on the actual administration aspects of the systems and infrastructure (things devs do not need to be concerned about, like vnet layouts and whatnot) while devs can focus on actual development and deployment without having to have everything confirmed and approved by the IT department.


At almost every shop I've ever seen, "sysadmins" are also the ones whose responsibility it is to at least attempt some sort of security practice and business continuation. Which opaque, "just run it" containers actively fight against. Did the developers actually audit what they've pulled in as dependencies? Did they make sure that they can be rebuilt if whatever package source goes away? Where is everything documented? Containers, as usually implemented, rather than "keep everyone's incentives aligned", instead damages the ability of the adults in the room to keep everything from falling apart.

(I have been both the sysadmin saying "no" and the developer mad at sysadmins saying "no". But going slower and doing our homework has never, ever hurt me or my employers.)


I've never seen sys-admins verify the security of dependencies outside of the major ones (like the language VM version). Security-wise they have been much more concerned with the security of data storage systems (databases, elastic search, etc...), the operating system, and the network in general.

Security of the application is very much the responsibility of developers, not system admins, as the developers have the best vantage point for understanding the implications of the software they are developing and integrating with.

If there are routine violations of security at the application level that aren't being caught by the developers working with those systems, then the company as a whole needs to sit down and make sure the development teams have the proper security procedures in place. Putting a department in charge of security that has all the accountability but no power to remedy the situation is a recipe for political fights between departments, and a disaster. Proper code reviews and experienced team leads should be able to catch more security issues than sysadmins will.

If your sysadmins are in charge of security review of the application, then they have to be in charge of security review of every low-level dependency at the individual package level. Otherwise your developers won't think about it because it's not their problem ("IT will review and let me know if anything's bad"), and it encourages them to lack accountability for the security of their own software.


Doesn't that constant updating and lack of version oversight create security risks?

Developers may not be as aware of those topics as sysadmins.


Isn't it just as bad to keep versions locked for months and years so we don't upset the delicate balance of versions on the production env?

I've worked at a place that was running PHP 4.4.9 until about 8 months ago. And they were upgrading to 5.4! I get that it was work to convert a lot of the older code base to 5.4, but it was already past EOL when they were switching to it. And 5.5 wasn't far behind it.

So in the near future they'll need to upgrade again (though they probably won't), and they'll probably jump to 5.6, which EOLs in two years (so they'll probably do it two years after it EOLs).


The lack of updating is also a massive security risk.


That's terrible. If I'm a Python dev, why should a sysadmin who doesn't even know Python tell me what version of a library I will use?

I have regression tests to catch if an upgrade breaks anything. What does a sysadmin have to approve or deny an upgrade? A little beard stroking and changelog reading?

I think the movement towards containers is, like you said, about keeping sysadmins off the code. Sysadmins add value in setting up the infrastructure and keeping it running. They subtract value when they want to tell developers what version of a library to use.
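
To be concrete about what "regression tests instead of beard stroking" looks like, assuming a pytest suite and the usual requirements.txt layout (both of which are my assumptions; any test runner works):

  $ pip install --upgrade -r requirements.txt   # take the version bump
  $ pytest tests/                               # the regression suite gates it
  $ git commit -am "Bump dependencies"          # lands only if the suite is green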


> why should a sysadmin who doesn't even know Python tell me what version of a library I will use

Because he maintains that installation and you don't? But, yeah, that's why virtualenv, Docker, etc. were invented, because devs kept getting sick of installations having consequences.

> What does a sysadmin have to approve or deny an upgrade?

Check for conflicts of this version of this library with other software currently in use (by other developers maybe, or even by the same developer). Add it to the watchlist on the dozen or so security mailing lists and newsfeeds he checks daily. Read the changelog and look for implementation problems. Read fora and look for performance problems people are reporting. Yes, beards get stroked during this process, but time and again we see that developers refuse to do this, and wind up coming to us when they break something because of that...


Developers tend to fall into the trap of believing we are better than sysadmins, but there is immense value in having talented admins around. I have recently experienced this first hand, watching someone breeze through server archaeology and virtualization tasks that I struggled through and might never have been able to accomplish in a reasonable amount of time. When admins and programmers recognize each other's strengths and play to them, it is a rewarding experience. We just have to realize that we're on the same team.

Also, docker (and containerization in general) is a wonderful thing for both of us. It decouples the fickle apps from systems (also moving targets) and the other apps which are constantly seeking out new and creative version incompatibilities. It makes migration and maintenance a much less frustrating endeavor with fewer surprises along the way.


>Because he maintains that installation and you don't?

So why is that an acceptable mentality for "in-house" developed software but if you buy something proprietary from a third party where you have zero say over what lib/langs are used, it's A-OK?

>Check for conflicts of this version of this library with other software currently in use (by other developers maybe, or even by the same developer).

That's not the case when using containers properly. Every service gets its own environment, so whatever version of lib-xyz is needed, even if incompatible with other parts of the project, is walled off for only the service that needs it.

>Add it to the watchlist on the dozen or so security mailing lists and newsfeeds he checks daily.

Ok this is where I completely agree with you as we have been working on this at our company. My personal solution seems pretty logical though so hear me out.

1) Build a Dockerfile that fully documents the install of your service as well as any OS-level dependencies. Ensure that any config files are external to the container to allow sysadmin access. (A sketch follows after this list.)

2) Document in a central location (say, an internal wiki) the external services, servers, repositories, developers, and admins responsible for the service.

3) Automate builds of containers from the repo and add automated testing post-containerization.

4) Sysadmins monitor repositories for changes to Dockerfiles, and wiki articles for new services, databases and libraries, taking note of library versions. If an issue with a particular lib or service is discovered, the config files can be edited to point to a new service, or a new container build can be triggered with zero changes to the source code but a forced update to the OS packages for the container.

In a tight situation where a developer might not be available on-call, the sysadmins have more control over a similar proprietary product but don't have a workflow for messing with source code (which they are likely not familiar with regardless).

There are solutions to the issues you raise (often trivial ones at that), they just require an adjustment to workflow and an increase in communication between developers and their sysadmins.
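
For step 1, a minimal sketch of such a Dockerfile (the service name and paths are invented):

  # Dockerfile -- doubles as install documentation for the service
  FROM python:3.5-slim     # a pinned, vetted base image, not :latest
  COPY app/ /opt/app/
  RUN pip install --no-cache-dir -r /opt/app/requirements.txt
  # config lives outside the image so sysadmins can change it without a rebuild
  VOLUME /etc/myservice
  CMD ["python", "/opt/app/main.py", "--config", "/etc/myservice/service.conf"]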


I see your point, and like I said I agree this is why Docker was invented and it's the best-in-breed at what it does (namely, being a tourniquet for a self-inflicted wound). My biggest concern really isn't "my problem" since I'm in ops: it's the leftpad worry. I still have teams starting Dockerfiles with "FROM centos:latest" because that's just the mindset they have: "Latest will fix any problems" rather than "Latest will introduce new problems".

And, ultimately, Docker lets that not be my problem, because they have to deal with it when the next leftpad happens. So, yeah: they should have at it. I guess I still think there's something to be said for the cathedral pace, though.


>I still have teams starting Dockerfiles with "FROM centos:latest" because that's just the mindset they have: "Latest will fix any problems" rather than "Latest will introduce new problems".

Well there are two ways to approach this IMO.

One: be proactive. Create a vetted centos (or whatever OS) base image for them to build off of.

The problem is, if you don't keep on top of it as a sysadmin, the developers will just figure out another way to wall you off.

Alternatively, accept that it doesn't matter what underlying OS they use, because a patched OS >> an unpatched one, and because, done correctly, the jailed nature of containers means minimal exposure even when the service has an exploitable lib.

Assuming "latest introduces new problems" too often builds an aversion to patching which can lead to worse issues down the road.

>And, ultimately, Docker lets that not be my problem, because they have to deal with it when the next leftpad happens. So, yeah: they should have at it.

Exactly! The only one responsible for libs are the parties directly leveraging them. Not that developers shouldn't make that info known. It has to be documented to remove the bus-factor of 1, and if it isn't the sysadmins should work with the devs to get it documented.

> I guess I still think there's something to be said for the cathedral pace, though.

I think it depends a lot on your resources as a department/company. You should always execute as quickly as feasible given your team size and work load. Otherwise technical debt has a way of piling up faster than you can offload it.


At my organization, leftpad is not a reason for SRE to tell developers they can't use dependencies. Instead, leftpad is a reason for SRE to run internal package mirrors for all our supported packaging systems (debian, pip, glide, Maven, etc) and ship forks of the build tools so that when you reference a 3rd party dependency, the URL is rewritten to one at our internal mirror. The internal mirror, in turn, goes out and downloads anything it doesn't already have.

They also maintain the base docker images that we're expected to use, as well as the docker build infrastructure.

Facilitation with guardrails, not blockers.
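
For the pip case, the URL redirection can be as small as a site-wide config file pushed to every build host (the mirror URL here is made up):

  # /etc/pip.conf
  [global]
  index-url = https://pypi.mirror.internal/simple/

Developers keep their normal "pip install" workflow; the mirror quietly caches whatever they pull.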


I have some perspective as someone who has done a bit of all of these jobs over the last 10 years as well as working in a hosting company that handled release management for large Fortune 500s.

> So why is that an acceptable mentality for "in-house" developed software but if you buy something proprietary from a third party where you have zero say over what lib/langs are used, it's A-OK?

Proprietary software generally has a support agreement and SLA for fixing things instead of getting the response "it works in dev!"

> That's not the case when using containers properly. Every service gets its own environment, so whatever version of lib-xyz is needed, even if incompatible with other parts of the project, is walled off for only the service that needs it.

That's why containers are great, but you have to remember most of the world isn't as fast as this community to adopt things, a lot of things are still being managed the hard way on shared servers with literally thousands of dependencies. Migrating to containers in these instances can't happen fast enough.

> There are solutions to the issues you raise (often trivial ones at that), they just require an adjustment to workflow and an increase in communication between developers and their sysadmins.

Implementing even trivial changes to processes that impact hundreds of people across multiple continents is often not trivial. Devs in India, devs in the US, hosting teams, release management, etc. A lot of those people are doing just enough to get by and not up-to-date tech wise, so not only are you implementing new tools and processes, but you're building out training programs around using them, etc.

These processes are old and will be modernized in time but that's the reality for a lot of "sysadmins."


>> Check for conflicts of this version of this library with other software currently in use (by other developers maybe, or even by the same developer).

> That's not the case when using containers properly. Every service gets its own environment, so whatever version of lib-xyz is needed, even if incompatible with other parts of the project, is walled off for only the service that needs it.

This illustrates why developers should have some experience with administering systems: do not deploy unrelated services on the same machine.

And you know what happens as a byproduct of this rule of hygiene? Suddenly the version conflicts disappear, at least for things that aren't broken anyway.


That's the tail wagging the dog (and I manage the sys admin department at my company).


> Because he maintains that installation and you don't?

Or not. Your devs are on call, aren't they? They are maintaining their own software, right?


> why should a sysadmin who doesn't even know Python

Because a Python sysadmin has been through all the transitions of packaging systems, all the nasty corners of "backwards compatible" changes, and knows how underlying changes to the operating system will affect your code, how the storage behaves under load, and why one tech is not "better" than another. If you really hired an admin (cough, "reliability engineer") for a Python codebase who doesn't know Python, well, that's a different question altogether.

> I have regression tests to catch if an upgrade breaks anything

You don't know what you don't know.

When you can reason about the multiple ways the above statement can fail, congratulations! You are now a seasoned sysadmin, the scorn of junior developers who just want to get things done (who incidentally read a great blog post the other day about a new packaging system that we should immediately transition to and by the way it's all backwards compatible).


>You don't know what you don't know.

That's the point of regression tests. The sysadmin also doesn't know. Unless he's the one writing the tests (and IME he's not) or he's painstakingly regression testing everything by hand (trust me, he's not doing that either), making him a gatekeeper for all library upgrades achieves very little except adding bureaucratic friction.


Look, do you want a gatekeeper or not? For your small little web project you don't need one, and a little downtime probably isn't catastrophic. But as soon as you are under audit rules you need it, and we call this specialized role the admin. When you grow bigger this will likely branch out to a dedicated change manager, at which point you hopefully have other specialized roles for security and other things as well.

I understand this does not make sense when you are not more people than can fit around a table, but as you grow you will feel the need for more and more specialized roles to fit the changing requirements. The first specialized role is probably the sysadmin (devops, reliability engineer, whatever you call it), and he or she should preferably be the one on the team with the most knowledge of how things work "under the hood", because that person is the one who can save you when things go haywire. Unless you trust this person to be more knowledgeable than you are in those areas, as they rightfully should be, you're going to have a problem.


>Look, do you want a gatekeeper or not?

No, ideally not - that's the idea behind https://en.wikipedia.org/wiki/Continuous_delivery

Where gatekeepers are required (because regression testing is not yet fully trusted enough for continuous delivery), QA should be the gatekeeper, not sysadmins.

>For your small little web project

My comments are based upon working on projects with a turnover of > ~1-1.5 million USD / day.

>But as soon as you are under audit rules you need it, and we call this specialized role the admin. When you grow bigger this will likely branch out to a dedicated change manager

Every time I've worked with somebody whose role was "change manager" this role was introduced:

* As a response to repeated downtime in the past caused by some kind of idiocy.

* They were required to "sign off" on releases purely as an added bureaucratic step to cover some manager's ass.

* They never once prevented or caught a production issue.

* They always slowed down releases.

> The first specialized role is probably the sysadmin (devops, reliability engineer, whatever you call it), and he or she should preferably be the one on the team with the most knowledge of how things work "under the hood", because that person is the one who can save you when things go haywire. Unless you trust this person to be more knowledgeable than you are in those areas

Ironically the whole idea behind devops (which I fully agree with) is that it should not be a specialized role - developers and ops teams should be blended.

This is precisely because if the two teams are separate and one throws code over the wall to the other then things will go wrong. Then a manager will insist on a gatekeeper.


I think a lot of this debate argues for the sysadmin role being part of the dev team. The only real way to satisfy both constraints (production stability and up-to-date fixes/features) is to have fast feedback between the stakeholders on the two sides.

In the python-specific case -- the requirements.in / .txt files for the virtualenv should be part of the software VCS, but the sysadmin should be able to edit & pin things just like the devs, so that they can bring their expertise to the container, rather than having to fight it.
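
That division of labor is roughly what the pip-tools workflow gives you; a sketch (the package choices are illustrative):

  # requirements.in -- loose, intent-level constraints, editable by devs and sysadmin alike
  requests>=2.9,<3
  celery~=3.1        # sysadmin pinned us to the 3.1 line after an incident

  $ pip-compile requirements.in   # emits requirements.txt with every version pinned

Both files live in the VCS, so a sysadmin's pin shows up in review like any other change.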

---

Mind you, my opinion might not scale - I'm part of a small enough team that I'm holding both those roles, but I try to make sure to spend time wearing both "hats", so that one role doesn't hog all the man-hours clocked.


IMO if a sysadmin wants to have visibility into the requirements.txt that's fine.

If they want to enforce a policy of pinning versions, that's very welcome (though I would do that anyway).

If they have specific, relevant comments about upgrades of specific packages - again, fine (though in practice they never do).

If they want to be a gatekeeper for changes to that file they can fuck off.


Another perspective: Why don't you want your software to be compatible with the system that your sysadmin provides? (Assuming that system is not completely obsolete).

Minimize your dependencies. It's incidentally also what leads to clean code bases.


>Minimize your dependencies. It's incidentally also what leads to clean code bases.

Oh hell no. I have wasted far too much of my life maintaining buggy, technical-debt-ridden reinvented wheels where there was a well-maintained package that could just have been used instead.


> Minimize your dependencies. It's incidentally also what leads to clean code bases.

You also crush velocity. Smart use of libraries lets you ship code 10x faster. Two identical businesses: one writes all their own code, one is smart about using libraries. Which one makes it to IPO first, and which goes bankrupt?


Are you arguing that you can keep velocity while building up technical debt?

The company "smartly" using libraries might stuck maintaining a monster of dependencies that only ever was meant to be an MVP. It will require 10x more engineers and while they might move fast at the beginning, they will only slow down over time.

The company minimizing its dependencies and paying attention to its stack will be able to add complexity over time without breaking a sweat. Its costs will be a tenth as much, and it will be able to run profitably.

I am not anti-library or anything, but e.g. adding SciPy to your python project just because you need a gaussian function in one place in your code is just lazy.

Managing dependencies wisely is one of the hardest things in software development. It's right after cache invalidation and naming things ;)
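
To make the SciPy example concrete, the gaussian in question is a few lines against the standard library (the parameter names are mine):

  import math

  def gaussian(x, mu=0.0, sigma=1.0):
      """Normal probability density at x -- no SciPy required."""
      return (math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
              / (sigma * math.sqrt(2.0 * math.pi)))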


In my experience, the vast majority of problems we've had with our mobile apps, especially on Android, have come from a developer deciding to use some random SDK to solve a simple problem because he didn't want to take the time to write it himself.

Then the app is broken or has memory crashes, or the final binary is 10x the size it needs to be. 9 out of 10 times it's a third party library. This is why I ban the use of them unless absolutely needed.


That only holds true (as much as it does) if you assume the goal is always an IPO.


If the world has already written 90% of the code I need, how likely is it that the remaining 10% is valuable enough to make a viable business?


Really likely. Code doesn't mean much; it's mostly about how you present it.


To call out something that nobody else mentioned: because when you go to use that newer version of pycurl or pyASN1, it's probably going to break the system-level tools for running patches, handling license enforcement, and keeping the database online for other teams.

I've had this problem over and over again (the last big one being with the US Census). Folks insisted on upgrading a python library, and auth to the hosts stopped working.


> That's terrible. If I'm a Python dev, why should a sysadmin who doesn't even know Python tell me what version of a library I will use?

What if the sysadmin can code in the same language as you, faster and with fewer bugs?


Better yet: what if the sysadmin can code better and faster in the same language and also three others?


That would be me!


  > 15 years ago, when a project would kick-off, as a sysadmin 
  > I'd be invited in and the developers and I would hash out
  > what versions of each language and library involved the
  > project would use
To wit: the new ideologies in infrastructure management are actually designed to solve the underlying problem that necessitated that kind of working setup. Why should the version of a lib in one part of the software somehow pose an existential threat to the infrastructure? Engrain the dependencies into contained, independently deployable pieces, and make it so that app-level code can evolve without bringing down the world with it. Make it easy to revert, and/or utilize phased rollouts, and you've got the ability to iterate quickly and keep pace with external dependencies, and it no longer has to be some scary thing that requires big back-and-forth meetings over mundane details.
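
Concretely, the "easy to revert" property mostly falls out of immutable, versioned artifacts; a sketch with invented registry and image names:

  $ docker build -t registry.internal/app:1.4.2 .   # an immutable, versioned artifact
  $ docker push registry.internal/app:1.4.2
  $ docker run -d registry.internal/app:1.4.2
  # rollback is just redeploying the previous tag; nothing on the host changed
  $ docker run -d registry.internal/app:1.4.1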

(As for software that releases often, maybe it's an over-correction, but there's a reason things don't work as they did in the glory days, and that's because they were never really that glorious.)

This doesn't necessarily rule out the expertise of systems administration, because the platforms for all of this need to be built and maintained, and there's still a lot of work to be done on network border security, etc. It's a movement that refocuses systems administration on systems administration, instead of having to be this big org arbiter of microdecisions, with all the baggage that goes along with trying to be the gatekeeper of all.


> Why should the version of a lib in one part of the software somehow pose an existential threat to the infrastructure?

Because that's how software developers wrote every dominant packaging system :P

There are tradeoffs to self-contained units. Disk space isn't so much of a practical concern these days, but security is very real: with a dozen apps, you could be at the mercy of a dozen different entities to update their embedded OpenSSL libraries.


Or the statically compiled application that "just works" and is "so easy to build and maintain". Lookin' at you, Golang.


Counter-wit: why can't people get their software to work with the existing libs? Hint: it's very rare that the existing libs are actually disqualifying.


This is the core of the problem between devs and sysadmins. Sysadmins come from a mindset of a polished, working system which never needs to change. They deliver stability and reliability to the business.

Devs come from a mindset of actively creating change, to add new features and deliver new value and product to the business. As a dev I do have to say that many devs don't have enough experience in operations to properly understand how to help sysadmins; many don't understand the complexities of that job.

These two perspectives are at odds, and they should be. The new tools, like docker, start giving everyone what they want... Devs pick their dependencies, and in theory, can't stomp on the sysadmins pristine environment.

To respond directly to your question: because there are new things available in new libraries that allow us to develop new features!


> To respond directly to your question: because there are new things available in new libraries that allow us to develop new features!

If it were only that, we would have an easy time. The new things you need to develop new features are few and far between.


> The new things you need to develop new features are few and far between

99% of web software written these days could fulfil identical use cases on an IBM 3270 from 40 years ago. You enter something into a form and it gets stored in a database. You enter something into a field and it generates a report. That's all Amazon, Facebook, Google, any e-commerce site are.

Sure it might be nice to use a new version of that new JS framework that all the twitterati are going crazy about, but does it deliver value to the business that justifies the risk and investment?


And yet none of those things did arise 40 years ago. All of the nuances of all the code written since then make a difference, despite duplicating "identical use cases".


Amazon was founded in 1994, so over 20 years ago and somehow they managed to succeed without AngularJS 3.7 or whatever the fashion of the month is.


You didn't come up with that idea, but it's not about "40 years ago".


People could do great things just with punch cards, yet somehow technology kept marching on.

If developers want to use newer stuff usually they have a good reason. The ability to hack around the deficiencies of old dependencies does not mean that one couldn't get a better, cheaper solution with newer technology.


> People could do great things just with punch cards, yet somehow technology kept marching on.

That's not the situation I've described - punch cards disqualify.

The situation I mean is where developers insist on writing software on version X, which doesn't compile on X-1 and is buggy on version X (and might not compile again on X+1). For a concrete example: new C++ features that aren't correctly implemented, and that lead to harder-to-read code and worse error messages when applied to the day-to-day problems these features were never meant for.


If it is as you say, then why upgrade ever? How would we even discover bugs in software until it is used?

To have progress we need to change things. When we change things, we may break things, regardless of tests.

To quote Dijkstra: "testing can be a very effective way of showing the presence of bugs, but it is hopelessly inadequate to show their absence" (from "The Humble Programmer").

Production is the only way to eventually discover the stability of any software, even with 100% test coverage. It's a necessary evil in the support of progress.


All I disagree with is testing bleeding edge third party software by heavily depending on it in your production systems.

Software needs to be tested. But your view that the whole world needs to jump on it at once is very black-and-white.


That is not my view. If you're relying on prerelease software, you're definitely playing with fire.


Pick your poison.

If you run into a bug or problem with a 3rd party component (open source library, commercial tool, whatever), one of the first things they are going to ask you to do is upgrade. The fact you're on an old version of some library is an easy (and sometimes correct) scapegoat for problems.

Put yourself in the 3rd party's shoes: if you spend a bunch of time trying to fix a problem that turns out to be a bug in a separate library that's already been fixed, that's entirely wasted time.

The same goes for direct usage: you're likely to spend time fixing problems that have already been fixed.


Upgrading the version of the library wouldn't be a problem if the concept of stable ABIs were as prevalent as it was 15 years ago. Back then, the major.minor version number system was used as a signal that it was safe to upgrade to a newer version of a library without worrying that the entire application stack was going to come falling down around itself because the developer of said library decided to rework some part of it without providing any backwards compatibility.

Put another way, a sysadmin could feel confident that moving from 1.52->1.53 would be a painless and transparent operation and that the provider of said library would continue to release 1.x branches with little ABI changes for some length of time. The expectation was that at some point the library provider would release a 2.0, which would require a more careful testing/deployment schedule likely with other upgrades to the system.

Today, that is all out the window; very few open source projects (and it's infecting commercial software too) provide "stable" branches. The agile throw-out-the-latest-untested-version mentality is less work than careful plan/code/test/release cycles followed by fix/test/release cycles.

This is a major rant of mine, as upgrading the vast majority of open source libraries usually just replaces one set of problems with another. Having been on the hook for providing a rock solid stable environment for critical infrastructure (think emergency services, banks, power plants, etc) I came to the conclusion that for many libraries/tools you had better be prepared to fix and backport bug fixes yourself unless you were solely relying on only libraries shipped in something like RHEL/SLES (and even then if you wanted it fixed fast, you had better be prepared to duplicate/debug the problem yourself).


> Put another way, a sysadmin could feel confident that moving from 1.52->1.53 would be a painless and transparent operation

This is what Semantic Versioning [1] aims to achieve, but as you highlighted, it still requires the maintainer(s) of the project to actually deliver stable software, regardless of what the version is. I think some people took "move fast and break things" a bit too literally.

A project that follows SemVer and has good automated test coverage is definitely on the right track though, and in general should be a pretty safe upgrade (of course it's important to know their track record).

"Move fast and break things ... in a separate branch with continuous integration running an extensive test suite" isn't quite as catchy but is what should be happening.

[1] http://semver.org/


Was there no automated testing that allowed you to go from 1.52 -> 1.53 with some degree of confidence?


> The same goes for direct usage: you're likely to spend time fixing problems that have already been fixed.

That depends on whether it's a feature or a fix release. Feature releases might or might not include bug fixes, but they typically include new bugs. I welcome localized fixes; however, they are not as common, because of constrained resources. (Fix releases are the idea behind Debian stable. Of course it only works to an extent.)

A different perspective: I prefer to keep the bugs that I already know about, and know how not to trigger.
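For what it's worth, PEP 440's compatible-release operator expresses exactly that preference mechanically; a minimal sketch, again assuming the 'packaging' library:

    from packaging.specifiers import SpecifierSet
    from packaging.version import Version

    fixes_only = SpecifierSet("~=1.52.0")   # shorthand for >=1.52.0, ==1.52.*

    print(Version("1.52.3") in fixes_only)  # True: bug fixes welcome
    print(Version("1.53.0") in fixes_only)  # False: new features mean new bugs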


Because those libraries have bugs, sometimes catastrophic ones. Sometimes they must update, due to API changes or other factors outside of their control. If your organization relies on keeping things static as a means to stability, one day that rule will have to break, and you may be pretty underprepared for it.


Because many of these old library versions go unmaintained.


> Why does virtualenv exist? A similar reason.

The reason why virtualenv exists is because different apps may have conflicting requirements, and you have apps that need to be deployed in different environments with different versions of different libraries. I know that even if I were developing against versions of libraries in system packages, I'd still end up having to use virtualenv in development (EDIT: I wrote 'production' here by accident) because my stuff gets deployed on different versions of Debian and RHEL, necessitating virtual environments if only so that I can make my development environment as close to production as possible.
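The stdlib 'venv' module (which has since absorbed most of virtualenv's job) shows the idea in a couple of lines; a minimal sketch:

    import venv

    # create an isolated interpreter + site-packages under .venv, with its own pip
    venv.create(".venv", with_pip=True)

    # after activating (". .venv/bin/activate"), installs land only in .venv,
    # so app A can pin libfoo 1.2 while app B pins libfoo 2.0 on the same box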

> In the year 2015, pip went from version 1.5.6 to 8.1.1 (!) through 24 version bumps, introducing thirteen (documented) backwards incompatibilities.

Much of that has been down to efforts in recent years to finally fix the major issues with Python packaging. It has settled down quite a bit. Also, the 1.* to 8.* change is because the initial '1' was dropped: 8.* is essentially 1.8.* in the old versioning scheme.

I'm not saying that this couldn't have been handled better, but it's not just a 'hummingbirds off of their ritalin' situation: Python spent many years with packaging stagnated, and what you're seeing is rapid development to fix the mess that years of PJE-related neglect caused.


> Because developers got tired of sysadmins saying "sorry, you can't upgrade Ruby in the middle of this project".

As a Ruby developer, I can only laugh at this particular example. No Ruby project I've ever worked on ever upgraded their gems midway through a project, much less the version of Ruby. Developing procedures for this kind of ongoing maintenance is just way too much to ask.

This stuff tends to get done years after the original devs have all moved on. Maybe they tried that kind of thing back in the early days, before I started working with Ruby, definitely not today.


The time I'm thinking of was a team that wanted to switch from Rails 1.2 to 2.0 along with whichever interpreter bump was required to make that happen (IIRC 1.8.5 to 1.8.6, but this was a decade ago; I'm pretty sure 1.9 hadn't come out yet). Halfway through a project.


Unreal.

Yep, that sounds like that 'long time ago' I was talking about. Nowadays you can do that, no sysadmin to tell you not to, but nobody bothers.


DevOps here - yes, they do bother. Pinned set of dependencies, but one of them updates? Upgrade all of the dependencies. But don't worry, it's all in a Docker container (I have a completely separate rant about Docker's compatibility-ignorant hummingbird).

Ironically enough, I think the current DevOps culture emerged partially because sysadmins got tired of saying no (if only so they could sleep through the night), so now they let developers tie their own nooses so they can be woken up at night.

It's wonderful to hand all of those software pages back to developers. And the developers do seem motivated to fix the bugs which wake them up at 3am, so it turns into a win all around. It's still hard to watch a new team come up to speed though, knowing how little sleep they will be getting over the next month because they made their new Docker program stateful...


I upgrade Python deps all the time. Java too. How often depends on the scope of the project.


Ironically, all-aboard-the-update-train was the actual reason I jumped off IIS back in the 90s, when I got badly burned by updating Windows NT. Automatic database connection pooling for IIS was dropped, and I started getting annoyed phone calls from clients whose websites were dying after updates.

One had to read MSDN every day to keep up with what might break on sites you had no control over.


The problem is that sysadmins don't know everything.

> in the year 2015, pip went from version 1.5.6 to 8.1.1

The only releases in 2015 were 6.x and 7.x.

There were 8 documented backwards incompatibilities, 4 deprecated the previous year, and 3 documenting a couple bugs that were fixed several days after the 7.0.0 release.

These are the sorts of thing an aware Python developer will know.


You're right; it was the period from December 22nd, 2014 to March 17th, 2016, so about 15 months centered around 2015.

We may be counting regressions differently; I'm including both adding and removing the spinner as a regression, for instance (since both the addition and removal added unexpected behavior).

Note that the undeniable regressions that occurred in releases during those 15 months included:

1. Exceptions raised in any command on Windows

2. Switching from not installing standard libraries to installing them back to not installing them

3. Blocking if the particular server pypi.python.org was down

4. An infinite loop on filesystems that do not allow hard links

Note that in that time they also added yet another internal package management system (incompatible with the existing two), changed the versioning semantics twice, and dropped support for versions of Python that were 3 years old at that point.

And, again, there's nothing particularly wrong with or bad about pip; this is just what a younger generation of developers are used to.


> You'll also notice that none of these releases are tagged "-rc1", etc., though the fact that regressions were fixed in a new bump the next day means they were release candidates rather than releases.

Releasing an RC often results in nobody using it, and hence the bug not being found even in several weeks - yet it gets caught almost instantly in a release… At least, that's my experience shipping various RCs that led to next-day regression fixes once the release proper shipped.

Yes, better testing would solve such issues, but at some point the line has to be drawn at "good enough", because there's ultimately a limit to what is reasonable.


Oh, definitely, and I want to be clear I'm not wagging my finger at pip here; it's a good project. It's just that for somebody like me on the far side of 40 (and sysadmins in general are a grayer cohort than devs), that's absolutely not a release tempo that I grew up with. I still shudder when I see a Dockerfile that begins "FROM whatever/latest..." because I have no idea if whatever/latest is going to be the next leftpad.
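The defensive habit is pinning; a hedged sketch using the Docker SDK for Python (the 'docker' package - my choice of illustration, and the image tag is arbitrary):

    import docker  # pip install docker; assumes a running Docker daemon

    client = docker.from_env()

    # client.images.pull("python", tag="latest")  # mutable: whatever got pushed last
    image = client.images.pull("python", tag="3.9-slim")  # pinned: same bits every deploy
    print(image.id)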


That might be a bit naive, but isn't that more of a package management problem than anything else? I can tell you that while I know this issue, I also know that on systems like Alpine Linux or FreeBSD it is less of a problem than it used to be, and containers (so, virtualization - even when it is OS-level) are potentially overkill and certainly not a solution. virtualenv and others seem way saner in this case.

For development as a whole it is really great though in my opinion.

Or you simply use something like this: https://bazel.build/


Interesting. Is this because the Perl ecosystem is more mature or because of the philosophy of backwards compatibility?

The "backwards compatibility" philosophy isn't so explicit for the ecosystem, mostly the language? Is the test-on-install-by-default making a big difference there?


That's a good question. I think it's not a coincidence that CPAN predated the widespread use of distributed source control systems whereas pypi and gems blew up just as mercurial and git were unseating svn and cvs. It's a different release philosophy (remember, in the 1990s you often didn't even get to see pre-release CVS commits of open source projects; that was an innovation of OpenBSD).

I also think the widespread use of VPSs rather than accounts on shared servers (again, containerization) was a factor. In the 90s and early 2000s, you usually (even in a corporate setting) had an unprivileged account on a server with a given version of apache and perl, your own cgi-bin directory, and possibly some latitude on a personal CPAN install directory. The lack of containerization meant you had to compromise between using newer software and breaking existing use cases.

So I guess I think it's not so much about Python vs. Perl per se but about the technologies available when those languages became popular among developers.


That would seem to be ironic. As a longtime Perl developer who switched (for pragmatic reasons -- a job) two and a half years ago, my impression is that Python [as a language] is much better suited to a business environment. What makes Perl such a wonderful language, and why I enjoy it so much, is exactly why it blows as a business language. Python's rigidity is very useful if you ever want someone to read and understand code written by someone else. So the idea that Perl ends up with the more mature package management is exactly the opposite of what I would have predicted.

That said, I haven't had any more problems with PyPi packages than I did in the past with CPAN. Yes, pip always wants to upgrade itself, but that sort of every-damn-day software upgrade cycle seems to have become quite prevalent, not just in the Python world.


You mean that Python has just one possible layout standard and so on, I guess? That is a different subject. (And sure, having no coding standard is bad for a project; Python removes that discussion to a degree.)

I think the real problem with Python/Ruby/etc is the surprising lack of an analogue to CPAN Testers.

It isn't just that all of CPAN is tested on different OS/Perl version combinations; it also stress-tests the Perl versions.


I'd say too much effort goes into reasoning about the wrong problem. What worries me the most is the "why": why do (too) many software developers not know about sysadmin work?

I have been involved as a consultant in large software projects in the last two years, and the vast majority of money lost in delays and bugs was caused by devs not understanding:

1) the difference between virtual memory and physical memory

2) the difference in costs of data storage per storage medium

3) the concept of network round-trips

4) hardware bandwidths

5) how to install and configure a web server on a workstation

6) how DNS works

7) how AD authentication works

8) what ORM frameworks do

9) how to write a raw database query (not necessarily SQL)

10) the difference between navigating through database records on a database server vs. an application server vs. a client

11) HOW TO INSTALL THEIR OWN WORKSTATION AND TROUBLESHOOT IT!!!

N) etc. ...and those are just the topics that I can immediately remember.

As I see it, it's not about "they should". For me it's about understanding how so many devs deal with such a level of ignorance of the systems they interact with on a daily basis. This situation hurt my feelings every time it happened, and I struggled to accept it. I am not a sysadmin nor a developer, but my daily work is insanely improved by my (even basic) understanding of how my workstation works and how to manage it.


I've worked with all kinds - from Windows devs who can't figure out how to install Visual Studio, to people who understand Windows, Linux, and macOS as well as basic system administration for each platform. The people who are most successful at rapidly developing high quality software are mostly in the latter group.

Would you trust an RF engineer who couldn't troubleshoot his own radio designs? Why would you trust a software engineer who can't troubleshoot his own software as deployed in a real world environment?


I feel the same. Someone who isn't willing to investigate issues with their own work machine, or to spend some time configuring it, is more likely than not to be less of a continuous learner (imo).

Looking at it short term, a well paid developer troubleshooting all issues on their work laptop could be seen as a waste of resources, but for me software development is a beautiful craft. I also wouldn't trust a carpenter who doesn't obsess over wood and tools. If I overhear two developers comparing notes on their tmux setup somewhere, I mentally upgrade them into the interesting category right away.


I recently questioned someone about this very subject. They wanted to hire a "CSS expert" because the "developer" didn't have a grasp of CSS after having developed the project in JS/HTML. I was so confused as to how that's possible.


There's a large gap between basic understanding of CSS and actually creating good CSS. Personally I avoid touching CSS as much as possible.


I suck at CSS and do a lot of js/html, but most of my work is on the back end. I can do OK with CSS, but it will definitely take me longer than someone who knows what they're doing. Usually we contract out the design/CSS for a few pages and I adapt that for the rest of the website.


CSS is not consistent and complete like a programming language. As someone else mentioned, doing things the correct way in CSS (responsive, cross-browser) is actually very hard, and often requires memorizing weird hacks.


In fairness to the folk who can't install Visual Studio - it's a genuine pain in the butt.

Last time I tried, half of the download links were absolutely non-functional. Their documentation didn't help much, either, since they pointed to the non-functioning links. I got a lot of shit for that one, but felt completely vindicated when it happened to someone else a year later.


> In fairness to the folk who can't install Visual Studio - it's a genuine pain in the butt.

It takes a while but I've never found it to be a pain in the butt and I've used it, off and on, since version 6 in the late 90s.

If the install errors out it gives you an error message, you google for it, figure out what the problem is (most common messages are easy to find solutions for), fix the problem, reinstall, done.


Then you were lucky.

I remember on the first day of a new job being given a folder containing all the MSDN subscription disks in the mid-2000s and being told to install visual studio.

I'd only ever used notepad as an editor before.

This was a massive stack of DVDs with multiple disks, but worse, there were a bunch of CDs listing different versions of a thing called "Visual Studio".

After 15 minutes of struggling and surreptitious googling because I didn't want to look stupid on my new job, a colleague walked by, went "oh", picked out the right disk and said "that's the one you need". And I had to do the same for multiple new starters.

Even today when you have to install something from MSDN, you search for "Office" and get a bunch of irrelevant language packs listed at the top which is definitely not what you want, then also have to know what x86 and x64 means, something a novice will not know, and know what "SP" means and that "SP2" is better than "SP1".


> Then you were lucky.

I worked in 3 Microsoft dev shops and 1 that had a mixture. As far as I know, nobody had issues getting it installed except for the rare, occasional error that could be Googled and fixed. I'm not sure I'd call that luck; it sounds like you just had a bad experience.

But yeah, back in the day it was a stack of discs (I think the last disc version I used had 2 discs for Visual Studio and 4 for the MSDN) but they were always clearly labeled: one for Visual Studio, one for additional add-ons and stuff, and the rest for MSDN documentation.

¯\_(ツ)_/¯


There were 40 DVDs in an MSDN subscription in 2004, so I'm not sure what you're talking about - you might have had the cheapo one?


I've worked on VS since VS 2003.

The first versions (200X) had their challenges at times, starting with locating the installer for the right edition and the license. The minimal setup for a working environment was split across 5 different installers to be executed in order (one VS pack per language + the Windows SDK + the debugger kit + the ATL/MFC package + the driver kit [if you dev drivers] + the DirectX SDK [if you need it]). Then you had to configure some PATH and library settings to link all of that together.

Last I checked, in the 201X editions, a lot has been regrouped into a single setup. That's enough for most development. And the optional packages have auto detection (and it ain't fucked if you run it twice).


OHHH

So the MSDN subscription is different. The MSDN subscription is the full Microsoft catalog of software. Every version of Visual Studio, every Windows, Office, MSDN documentation; it's literally everything.

The parents above were talking about just installing Visual Studio. When you purchased Visual Studio it was usually 2-6 discs in my experience (most containing the MSDN documentation). But the MSDN subscription is a very different beast. Granted, there should still have been a Visual Studio disc for the specific architecture you were using, and your group should have known whether they were using Professional, Team, etc., as you'd likely need the same.

That was a fun misunderstanding though :)


The MSDN subscription has (almost) all Microsoft software - it's tangential to Visual Studio. The MSDN documentation is separate and was only a few CDs.


`choco install VisualStudio2015Professional`. ;)

After all--we're all developers. Automate!


I'm a quite good programmer who is a pretty terrible sys admin.

Of your 11 points, I understand 1-10 quite well, but I'm not great at 11.

I think the skill sets are quite different, despite the fact that a lot of people have both. I was never really that "into computers", but I have a burning passion for building large software systems fast and well.

Metaphor: I love to travel to exotic locations across the planet. That doesn't mean I'm also interested in building airplane engines.

To be clear, I'm not bragging. It would be great if I was good at this stuff.


I don't think the OP is talking about low-level OS stuff like perhaps a flaky hard drive, but about installing the things that turn a vanilla machine into your working environment, plus being able to sort out things when something goes wrong with that environment. If you're in IDE Foo all day every day, you should know your tool enough to fix it when it goes wrong.

I've not had them myself, but I have talked to other sysadmins who've had devs that couldn't install their own IDE. These people weren't seniors, admittedly, but they were still drawing pay...


> the difference between virtual memory and physical memory

In my defense (I'm a dev), OSes don't make it clear. macOS becomes extremely slow when I load a big virtual machine, and yet displays "Swap 450KB, 500MB RAM free". Or with a sole text editor open after a long session it may say "Swap 750MB". In both cases my logic tells me the swap and free memory should read the opposite, so I can't match my knowledge to the OS's behaviour. Then comes Java, which adds another layer of memory limitations.
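A side-by-side view helps untangle the two ledgers; a minimal sketch with the psutil library (an assumption on my part - any OS-level tool works):

    import psutil  # pip install psutil

    vm = psutil.virtual_memory()  # physical RAM: total, available, etc.
    sw = psutil.swap_memory()     # pages the OS has pushed out to disk

    print(f"RAM  total={vm.total >> 20} MiB  available={vm.available >> 20} MiB")
    print(f"Swap total={sw.total >> 20} MiB  used={sw.used >> 20} MiB")

    # 'available' already accounts for reclaimable caches, which is one reason
    # the raw numbers in OS dashboards can look contradictory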

> how many devs deal with such a level of ignorance

I can speak to this because I was ignorant for 4 years, then met the right teams. It's impossible to learn, and to gain confidence in what you've learned, if you start ignorant - and ignorant devs know it. We constantly need help and don't understand how ticking a weird checkbox in Eclipse changes the compilation: without directly executing the original command line, you can't learn anything, and architects in those kinds of companies give you too many proxy tools ("SDKs") that you can't improve. You're on an old 14" screen anyway. Also, Windows is so inconsistent and weird that you just assume sysadmin work is for people from another planet. My skills only took off 4 years later, when I installed Linux, then Mac, and was thrown into open-source libs. It was so easy, in retrospect, and I'm so happy to have been in the right context.


It's more than just swap. The second answer has more detail: http://stackoverflow.com/questions/4970421/whats-the-differe...

Udacity has a pretty good OS summary course. I had personally forgotten what a TLB was until I watched it.


The list of things to learn is endless. The only way we can get anything done on a daily basis is to gloss over the magic being done somewhere else in the system.


But most of the outstanding ones are renaissance people.


> The list of things to learn is endless.

Yes; however, there are items (ACLs, AD, security, OIDs, etc.) that are a little more important than the new shiny JS framework.

I'd rather burn brain cells learning Haskell than trying to understand why so much effort is being spent on JS.


I've been a consultant on large projects and the vast majority of money that I've seen flushed down the toilet was on

1) Building the wrong piece of software/end-users not having enough influence on what gets built.

2) Lack of delegation, having people make technological/feature decisions about a product they only spend 5% of their time thinking about.

3) Organizational incentives not aligned correctly.

4) Not following software engineering principles that were discovered in the 70s (they also don't follow any software engineering principles discovered since then, but I'll give them a pass).


Should sysadmins have software development experience (e.g. DevOps)? For what values of X and Y should "X should have Y experience" hold? Should we go as far as...

"A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects." — Robert Heinlein, Time Enough for Love


Think about frontend designers (user experience people). They should know a little bit about TCP, SSL, HTTP, networking, monitoring, statistics, and everything related to frontend markup bloat, bandwidth, compression, etc.

I've worked with many of them: very good with Adobe products and the latest Apple gear, but clueless about why their final code is of poor quality.

Fortunately, many of them toned their egos down when the web started to move from markup (HTML+CSS) to code (JS). Until then, they took any feedback from a sysadmin as an attack on their work.

About developers and architects... they should simply be the final support (24/7 calls included) for their work and changes.

As a side note... a sysadmin IS A developer (a systems developer). An application developer should usually also be a systems developer if he/she is working on a project that includes a platform to run on (unless he/she is just uploading an app to an app store). An operator with limited privileges is just that. The concept of a "pure developer" or "pure sysadmin" is so 199x...


> They should know a little bit about TCP, SSL, HTTP, networking, monitoring, statistics, and everything related to frontend markup bloat, bandwidth, compression, etc.

This is why Ilya Grigorik's book, High Performance Browser Networking, exists. It's a great reference to that end as well.


This book is also available to read for free on http://chimera.labs.oreilly.com/books/1230000000545/index.ht...


Agree with most of the list, but why is TCP important for frontend developers? I'm struggling to find the argument for that one that isn't abstracted away, much like CPU pipelines are for backend devs.


A front end developer should understand that an HTTP request rides on a TCP connection, which begins with a handshake; receiving a file from a server over a fresh connection takes roughly 5-6 exchanges to and from the server. So if your UI consists of 1000 100-byte files, that's on the order of 6000 exchanges. You would probably be better off concatenating them into a single 100KB file.

But then you need to understand the differences between HTTP 1.1 and 2.0 and how they handle multiple requests from the same page. With 1.1 you would probably be better off with a single concatenated file; with 2.0, perhaps several. And then what about compression? Any good UI developer should be considering compression of that file, cache control, etc.
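The connection-reuse half of this is easy to demonstrate even outside the browser; a minimal sketch with the 'requests' library (the URLs are placeholders):

    import requests  # pip install requests

    urls = ["https://example.com/a.css", "https://example.com/b.css"]

    # naive: a fresh TCP (and TLS) handshake per file
    for u in urls:
        requests.get(u)

    # keep-alive: one connection reused across requests -- the effect HTTP/1.1
    # persistent connections and HTTP/2 multiplexing are after
    with requests.Session() as s:
        for u in urls:
            s.get(u)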

I see too many front end developers shy away from this stuff and create bloated monstrosities. They feel it's someone else's job... but if not the UI developer, then who??


I guess my point was more that if you just understand HTTP, you'll get all those insights already. A bit pedantic, I suppose.

This stuff does get pretty complicated. Mainly because the tech is evolving fast (browsers do a lot of tricks now). Even in HTTP 1.1, there are times where multiple files might be more interesting due to how large files are handled


But can you really understand HTTP without understanding TCP?


I prefer to work with a frontend engineer that knows a little bit about packet loss, routing, load balancing, white/black listing, retransmission, packet fragmentation, latency, mobile connections, socket limits, common error messages in network diagnostics, etc., than with an engineer that just knows "the page is slow".

Well, indeed, the higher level protocols, like HTTP, DNS, etc, are more useful on a daily basis.


I'm with the Heinlein sentiment. Practically everyone doing one will benefit from a stint as the other. I've had a foot in both graves my entire career and it's an amazingly useful skill group.

How you get there is up to you. My own path was freakishly meandering. Don't be afraid to get out of your depth, but try to have a mentor around to stop you drowning.

And remember that DevOps is a mindset, not a job. If someone tries to set up a "devops team", run away.


> How you get there is up to you. My own path was freakishly meandering.

Same here, though I was aided by being able to project real confidence when people came calling with the checkbook, and by being a quick study once I'd landed a gig. I learned to do what we would now call "sysadmin stuff" (manual administration of hardware) as a kid because I broke my Linux machines a lot. Then I went into the web development gristmill for a while. Ended up leading a multi-platform mobile team with zero mobile experience because "you're a good developer, you'll pick it up" (I did); I literally went into a devops role knowing no Ruby (to say nothing of Chef) under pretty much the same rationale.

"Fake it till you make it" is real, but then you gotta make it. ;)


To complement Heinlein:

Therein lies the best career advice I could possibly dispense: just DO things. Chase after the things that interest you and make you happy. Stop acting like you have a set path, because you don't. No one does. You shouldn't be trying to check off the boxes of life; they aren't real and they were created by other people, not you. There is no explicit path I'm following, and I'm not walking in anyone else's footsteps. I'm making it up as I go. - Charlie Hoehn


I'm in much the same boat; my path to get where I am was meandering and long.

I've worked with developers who fully understand the systems they are deploying on, and developers who develop for an abstraction of services rather than a real world environment. I tend to prefer the former to the latter because the deployment process is much less taxing, even though the latter tend to have better luck moving their software to another platform in the future.

But I'll echo you, every time I've learned something hard, it's because I got way out of my depth and had to learn how to swim all over again.


> Should sysadmins have software development experience (e.g. DevOps)?

I think this is kind of a given, today. I don't know many healthy, growing organizations for whom their "sysadmins" are not either originally developers or proficient in writing at least domain-specific code (I have always said that I am a software developer whose output is systems rather than web apps, because it's true; right now, with my current projects, I just wear both hats and get on with it!). Even the term "sysadmin" has largely disappeared in my neck of the woods; it has been largely replaced with "SRE" or similar, but that sort of position invariably seems to have development connotations.


I've been thinking a bit about my own lack of specialization lately...and, its negative impact on my value to a larger organization than my own company.

I suspect I'd have a hard time finding employment doing the kind of work I'd want to do at the kind of salary I'd expect to earn. I'm a decent programmer with broad but rarely deep experience, a better than average sysadmin with ridiculously broad experience, a passable designer (better than half of the "real", but merely average, designers I've worked with over the years, but so far behind the good ones that I'm hesitant to use the same term to describe what I do when I build websites), a passable sales person, a pretty good writer and copy editor, and the list goes on and on, because I've run my own companies for the past 17 years. I've touched everything that a business has to do, and I've somehow muddled through and kept the bills paid and the customers coming back.

But, I'm not a "rock star" at any particular task. I couldn't wow anyone with my algorithms knowledge, though I've always figured out how to solve the problems I needed to solve. That's not a very compelling sales pitch when talking about a $100k+/year job for a company that has a specialist in all of the above-mentioned roles.

So, I think it really depends on what you want out of life. If you want to maximize security and income, focus on a high value skill. Become the best in your market, or as close to it as you can manage. Eschew all distractions from that skill; don't fuck around with weird Linux distros, figuring out how DNSSEC works, building your own mail server, setting up CI, self-hosting all of your own services and web apps, or otherwise becoming a "jack of all trades, master of none". If, on the other hand, all of those distractions sound like the best reason to be in tech (and, that's the way it's always added up for me, even when it's cost me time and money), and you're willing to take on a lot more risk building your own business (whether consulting or building products), I guess being a jack of all trades isn't so bad.

But, and this is a big but: There's only so many hours in the day, and so many productive days in your life (and you also have to take time away from productivity to have a life outside of work/tech). As I get older I realize more and more that I have probably valued my time less than I should and valued my ability to effectively DIY my way to success too highly. I've spent many hours fucking around with stuff that I could have paid someone a few (or a few hundred, or a few thousand) bucks to make the problem go away, and it would have been worth it in a lot of those cases.


+100... if I could. I find myself in the same boat. I've refused to move to management, and along with that my ability to earn has taken a hit (for someone in their mid-thirties). Now, realizing I don't want to take the risk of building a business, I'm wondering if I should just focus on expertise in one specialization.


Development is sadly the job that is becoming the kitchen sink of "You ought to have skill X" where X is anything from sysadmin, computer science, business, security, product, bare-metal, networking, infrastructure, management, statistics, operating systems, math, social, and industry knowledge.

If you aggregate all of the "Every developer should know X" posts and blogs, the list would probably be very long. It only promotes shallow signaling instead of actual competence (I only need to know enough about X to make people think I know about X).

Meanwhile, your salary will still only compensate you for one skill set: software development.


Writing software is less useful than writing working, relevant software. The latter needs far more breadth than simply writing code.


Product managers, especially technical ones, can bridge that gap without being too deep in either skillset. A PM, on the other hand, isn't expected to know all of the nuances and gotchas or frameworks of whatever language the devs are working in.

Devs will slowly learn the relevant knowledge anyway, just at a slower pace than the immediate needs demand. And yes, after a certain number of years, that dev could be good at both. But you can't hire devs who are good at both for every position at every company.


I haven't really looked, but I can't think of posts of the form "Every manager should know X". From my (lack of) data, I posit that it's interesting how management is seen as a transferable skill, but software development isn't.


I think yes.

I see no harm in having basic experience in multiple fields, especially when those fields are related. Actually, I kind of "market" this concept among my circles: a mobile developer should know the basics of web development and vice versa. That way they communicate better.

I claim that already happens naturally. It's our drive to quickly build a niche (e.g. "I'm a professional X developer"), in addition to insecurities that lead to saying "X is not my responsibility", that gets in the way of expanding our fields of expertise.


It's pretty much impossible to be a sysadmin without writing some software, though I acknowledge we sysadmins have a tendency to use duct tape (there's a reason Perl is the sysadmin language) rather than a more solid adhesive. But whereas I've met tons of developers who have never maintained an installed system, I've never met a sysadmin who has not written a good deal of code.


> there's a reason Perl is the sysadmin language

That would be mostly Python these days. If someone were to touch Perl on my systems, they'd have a very bad day. Sure, I wrote stuff in Perl back in the day, but those days are over.

I see this as Perl's problem: to be good enough at Perl, you'd have to frequently use it, but Perl is in my eyes only suitable for quick hacky run-once scripts - which should not be written frequently, so you shouldn't be good at Perl. My Perl is pretty damn rusty these days.

If someone is still writing large scripts/apps in Perl these days, I question their judgement in technologies and ability to keep up with the times. Sure you can write larger scripts in Perl - but what's the point? You have to take care not to make things unreadable, while when using something like Python, it's much harder to make a script that's unreadable.


IDK. My blog is in Catalyst; I really like it for that.


> Should sysadmins have software development experience (e.g. DevOps)?

Yes. Where I work, they're all expected to be able to cut code. They might pair with devs at various stages, but a sysadmin who can't write code isn't going to be able to work with some of the more modern frameworks for orchestrations because they are in essence DSLs: they _are_ code.
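For instance, a minimal sketch with Fabric - one arbitrary example of such a framework, and the hostname is made up:

    from fabric import Connection  # pip install fabric

    # orchestration expressed as plain Python rather than a config format
    with Connection("web1.example.com") as c:
        result = c.run("uname -s", warn=True)  # warn=True: don't abort on nonzero exit
        print(result.stdout.strip())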


I strenuously disagree with some of these.

> Plan an invasion

This is actually a massive undertaking. An undergraduate at MIT taking a semester-long course on this will barely scratch the surface of it. Furthermore, you're never going to suddenly and unexpectedly need to know this. Any situation where you plan an invasion is going to be preceded by spending a long time getting into the position where people trust you with their lives and the fates of their nation.

> die gallantly

You're only ever going to be in this situation once, and probably not even that. Why does it matter how gallant your heart attack is?

These skills make a bit more sense in a world where most of us need to march off to war. Happily, we don't live in that world.

You should prepare for the situations you are only mildly unlikely to be in and where your skill matters.


It's literature, and SF, so it would be a mistake to take the quotation too literally. But since we're here; the thing to remember about Heinlein is that he was a romantic and a futurist at the same time. Hence his militarism wasn't so much about the enemy as about providing an opportunity for romantic heroism. Similarly with the other items in the list. It's not grounded in practical necessity but in a "renaissance man" / hero of a novel approach to being able to not just handle situations but show off in them.


>These skills make a bit more sense in a world where most of us need to march off to war. Happily, we don't live in that world.

Depends how old you are... some historians are keen to point out that the global political climate is very similar to the conditions just before WW1. Will you be of conscription age if WW3 does break out in the next decade?


> > die gallantly

> You're only ever going to be in this situation once, and probably not even that. Why does it matter how gallant your heart attack is?

I think it doesn't exactly refer to the act of dying, but is more along the lines of "The object of war is not to die for your country but to make the other bastard die for his." So you don't exactly want to die; rather, you want to avoid it, but should it happen, you want to make it very expensive.

But of course, it could also have nothing to do with war; it could be about how you face death, and how much of a burden you leave to your loved ones. (E.g. don't commit suicide leaving a note that tries to shift the blame to your family, or whatever).


As others have pointed out, Heinlein was writing literature, not management or career advice. That said, planning a modern, distributed web service probably isn't far off in complexity from the invasions he would have been picturing. As for dying gallantly... you're right that we'll all face it only once. Some go bitterly, some go cravenly, some 'rage against the dying of the light'. I'd say part of a life well lived is to face its end with brave, 'gallant', acceptance, whatever circumstance precipitates that end.


> Should sysadmins have software development experience (e.g. DevOps)?

Maybe. Not only do you better understand the needs of software developers, you are also able to automate a large chunk of your work. Paraphrasing Larry Wall: sysadmins should be Lazy. "Laziness: The quality that makes you go to great effort to reduce overall energy expenditure. ..."


I used to work on a product where the three major teams were Client, Server and Ops. Tens of millions of customers used our stuff.

Client and Server folks were on separate floors of the same building. We didn't interact much at all, except by trading bugs back and forth. The management chains met at a VP. Three or four times a year we tried to coordinate a release, and it always took at least a month. Getting a feature out the door might take a year, with all the paperwork and pipelining of release schedules.

Ops was in another building. The only things that both the Client and Server teams were sure of were that (a) Ops did a bunch of customization of our stuff so that it would work, and (b) they hated us.

Support was in another state. We were not allowed to talk to customers. Maybe once a year Support would fly in to talk to the teams about pain points. I think we did an okay job addressing these, but it took a long time, and customers suffered a lot.

I won't talk about the disaster that ensued when Scrum was thrust upon us, or the splinter projects that spun off to try to fix things (but wound up being lots worse).

You have to be close to the customer or you won't know if you're succeeding, or even on the same page. You have to know what your software is doing in production or you're just sitting in an ivory tower pontificating about angels-on-the-head-of-a-pin nonsense. You have to spend time in the trenches measuring and fixing stuff or you're hatching an unmanageable disaster. The good news is that most of this is actually kind of fun. The bad news is that, when managed badly, this can turn into a horrible and soulless grind of pager duty and making legacy code even more legacy to fix wee-hours downtime.

I still get a rush when I get feedback from a real, live customer, and I think that isolating your teams from customers is one of the worst things you can do to a team and to a product. Getting teams to work on ops and support aren't bad ways to improve this.


Sysadmin knowledge definitely helps, but so does an MBA, knowledge of writing, public speaking, design, user experience, networking. Oh, and the domain of the problem itself.

The skills I require of my developers depend on the rest of the team and the project.


The difference with this is that server-side software has direct operational consequences.

A server-side developer who never deals with ops is like a chef who never tastes the food. It's in theory possible to get right, but in practice the results tend to be poor.


That's true for backend, but frontend developers don't necessarily need a lot of ops knowledge if the team has a good separation of concerns. If frontend devs have to worry much about sys admin issues, I'd say that likely points to a flaw in the way things are being done.

Of course, the more someone knows, the better. Knowledge and experience in any area of technology can improve understanding of all the others. But nobody has time to learn everything, and there's way too much to learn. Time given to learning more about sys admin issues is time lost to other potentially valuable knowledge.

It's a good area to learn about, and it is essential to being a strong backend developer, but through good architecture and management practices, there are still plenty of ways to make very high value contributions for a developer who hasn't spent much time focusing on sys admin.


I think your definition of "sysadmin issues" is probably a little more limited than mine or 'wpietri's. I've had frontend developers insist that they could bake configuration settings into their webpacked artifact...which needed to be deployed into multiple environments because of course we weren't rebuilding something that had already been okayed in QA when we wanted to send it to prod.

You might say that's not a "sysadmin issue," but I have seen it happen three or four times now and in each case it was the "sysadmin" (read: devops engineer) who caught the problem and explained it to the offenders in question. (Maybe it's a "build engineer issue"...but at most companies I've seen, he or she is probably the "sysadmin", too.)


I would consider that sort of thing to be part of the required basic knowledge to be a competent frontend dev, so if you want to call that sys admin, then sure, there's a bit of it involved. If it's on your side of the fence then yes, it's your responsibility. I'm not sure how many actual sys admins would consider configuring webpack to be 'sys admin knowledge', but there is a spectrum and it's true that there are some ABCs that everyone needs to know, particularly when there are security implications. Still, it's a pretty far cry from unix, web servers, and databases.


You might consider it basic. I would have, insofar as I considered it pretty obvious even when I hadn't written a line of frontend code in five years, and webpack etc. weren't even on my radar then. ;) But my experience has led me to believe that it's not.

I think that's kind of the point of devops, though: something like a build system is a cross-cutting concern, and architectural decisions for an application need to involve people across the stack. Classifying somebody as "a sysadmin" is the problem in the first place, which is why I caveated my post as heavily as I did. My experience is that your "devops" or "sysadmin" people functionally become the "junk drawer programmers" who are relied upon for all sorts of weird stuff. I've been at jobs (and at clients) where I had to teach senior backend devs how to use VisualVM, or what the implications of using Kafka and CQRS are. I've been at gigs like the aforementioned where I had to explain the ramifications of webpack to people knowledgeable and capable enough to make React dance. And so my definition is probably necessarily more broad... but it's also stuff I've had to do in practice, so, enh.

And, 'cause fair is fair, I think a "sysadmin" who couldn't step in and write production-quality code (allowing for a little ramp-up) is probably an endangered species over the next ten years, too.


> but through good architecture and management practices, there are still plenty of ways to make very high value contributions for a developer who hasn't spent much time focusing on sys admin.

Huh. Hands up everybody who is working at a place with good architecture and management practices?

Knowledge of and experience with the direct downstream and upstream of one's work is different than general knowledge. It can in theory be made up for by high-ceremony processes and strongly controlled interactions. But in practice that mostly doesn't work in software development.

Front-end developers don't need much server ops knowledge because that's not generally the direct upstream or downstream for them. But the principle still applies; they will be better if they have experience with their upstream and downstream.


> [...] frontend developers don't necessarily need a lot of ops knowledge if the team has a good separation of concerns.

If there is a sysadmin on your development team and this sysadmin has a say in the system's architecture and development workflow, then OK, you don't need any of your programmers to be a decent sysadmin. Otherwise, you need to get a sysadmin.

> But nobody has time to learn everything, and there's way too much to learn. Time given to learning more about sys admin issues is time lost to other potentially valuable knowledge.

And then we have Homebrew, which I constantly hear is atrocious and breaks software on updates (incorrectly managed dependencies), and whose developers don't understand the necessity of using cryptography for transporting packages. We also have this whole idiocy of deploying software in production by recompiling it with `pip install', `rbenv install', and `npm install'. And then we also have incorrectly built RPM and DEB packages published by software developers (InfluxDB and ElasticSearch are common offenders, Riemann being another when I checked a long time ago).

And mind you, I don't use macOS, I don't develop web applications, and I package and host additional software myself - so how big must these issues be if they still got my attention? What was that again about your developer's valuable learning time?


We took it one step further on my team. The best way to ensure that the pager(duty) doesn't go off at 2am is to put developers in the first on-call group. Not only did it work, but the developers got a much more nuanced view of how systems operate. When we first started, I'd see developers using ping to determine whether a nameservice entry was correct. After a few months of handling almost all of our own ops, developers knew how each piece of the puzzle worked and how it all fit together, rather than the hazy, largely abstracted view they had before.
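(The ping example is the classic one: ping conflates name resolution with ICMP reachability. The direct check is a couple of lines; the hostname below is a placeholder:)

    import socket

    # ask the resolver for the actual records behind the name, nothing more
    for family, _, _, _, sockaddr in socket.getaddrinfo("myservice.internal", 443):
        print(family, sockaddr)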

But we learned that we needed it to go the opposite direction too. We had ops people who, when given a corporate-wide mandate to apply a security patch or some such task, would log into every machine and apply the patch, despite the fact that we'd been practicing immutable infrastructure with zero-downtime deployments. We hadn't given them the necessary exposure to the dev side to understand that you had to apply those fixes to a base image and trigger a redeploy. There was a lot of finger pointing a couple of weeks later after it was discovered that the fixes were overwritten by an application deploy.


I hope you increased their salaries together with the new 2am pager duty.


That's what I did with my tiny engineering team (3 folks), and the results have been great.

If you have a small organization (say <10 engineers), it's crucial that every developer writing server side applications can also do at least some sysadmin work. As the article says, it leads to deeper understanding and it really helps when thinking about scalability, fixing certain kinds of issues, etc. It also often shortens the feedback cycle and requires less throwing over the wall. As a bonus, you increase the bus factor.

Automation is the key though. If everyone connects to the boxes and does random things manually over ssh, nothing good will come out of it.

Still, you need to have a person or two who are responsible for the vision of the architecture/systems and who make sure that things don't go off the rails.


Somewhat related, I feel that mechanical engineers who design cars should have some experience servicing vehicles.

Sometimes engineers will not leave enough space, use weird fasteners, etc. that make a simple job much more complex.


It could be a deliberate form of obfuscation/discouraging of repair, or just the common trend of making things more complex than they really need to be.


That's possible, but lack of insight into what's easy or hard in manufacturing or construction is a pretty common problem. I saw it a number of times when working for a few different manufacturing companies, though none of them built cars.

A failure to understand construction concerns also played a role in the Hyatt Regency walkway collapse. The original design was poor, but redesign to address construction difficulties accidentally weakened the walkways further.

> Havens Steel Company, the contractor responsible for manufacturing the rods, objected to the original plan, since it required the whole of the rod below the fourth floor to be screw threaded in order to screw on the nuts to hold the fourth floor walkway in place. These threads would probably have been damaged and rendered unusable as the structure for the fourth floor was hoisted into position with the rods in place. Havens therefore proposed an alternate plan in which two separate sets of tie rods would be used: one connecting the fourth floor walkway to the ceiling, and the other connecting the second floor walkway to the fourth floor walkway.

> This design change proved fatal. In the original design, the beams of the fourth floor walkway had to support only the weight of the fourth floor walkway, with the weight of the second floor walkway supported completely by the rods. In the revised design, however, the fourth floor beams were required to support both the fourth floor walkway and the second floor walkway hanging from it.

https://en.wikipedia.org/wiki/Hyatt_Regency_walkway_collapse...


There was another problem -- the beams were spec'd as box section, but what was on the shop drawings was 2 channel sections with the flanges welded together. (like this: []). The attachment point that supported twice the design load was also compromised by the weld and less competent section. Whoever was checking the shop drawings didn't pick up on the importance of the change.

IIRC, either change on its own would have been marginally OK; the two together weren't. (By marginally, I mean it probably wouldn't have killed people but wouldn't be to code.)

When I was going through civil engineering, there was a big push to use a statistical basis for loads and resistances, rather than a blanket factor of safety. Loads vary and strengths vary - potentially normally distributed, probably not - but they're described by statistics at any rate. Blunders of this sort aren't, at least not on a per-project basis.


The BMW i8 is a good example https://www.youtube.com/watch?v=fxe_b2GRwok


That is a function of design requirements, not lack of foresight. If your manager tells you the cab needs to be 20% roomier without changing the outside dimensions, or they change the engine going into the thing without changing the structural integrity of the car, there goes your maintenance space.


I feel that managers of mechanical engineers who design cars should have some experience servicing vehicles.


And if the customers are imposing these ridiculous demands, then they should have some experience servicing vehicles.


I spent a couple summers working for the sustaining engineering department of a manufacturing company. There was quite an oil boom and they were having trouble hiring enough people for the shop floor, so they had the engineering staff come down to help for a few days. The mechanical engineers would bolt things together, the electrical engineers would do the wiring, etc.

Afterwards, one of the supervisors showed me the difference between wiring done by an electrician and wiring done by an electrical engineer. He pulled on each wire of the relay panel. The wires done by the electricians didn't move, but most of the wires done by the electrical engineers popped right out.


In a similar vein: some houses are so impractical to live in that architects should be punished by being made to do the household chores, especially the cleaning!


I don't think "should have" is the right statement; more like having at least one person on the team with sysadmin experience is extremely helpful. I know it has been for me.

Of course in HS/college I ran a website that was a frequent target of hate in the late 90s/early 2000s and it taught me about XSS and CSRF before people invented fancy terms to describe them. It also taught me about HTML/JS escaping, DoS/DDoS, SQL injection, and how all your defenses are useless if someone social-engineers their way into root and nukes everything. I have the assumption that everything is compromised and user input is toxic waste burned into my subconscious.
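The SQL injection lesson in particular compresses into a few lines; a minimal sketch with stdlib sqlite3 (the table and input are illustrative):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    user_input = "x' OR '1'='1"  # hostile input

    # wrong: string interpolation lets the input rewrite the query
    # conn.execute(f"SELECT * FROM users WHERE name = '{user_input}'")

    # right: parameter binding treats the input as data, never as SQL
    rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
    print(rows)  # [] -- the injection attempt matches nothing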


I worked for an ISP in the late 90s/early 2000s for about a year before my final year at uni; when I started it had 2,000 customers, and by the time I left we'd got to 750k customers.

It was the best training I could have had for writing software for the Internet. Just a couple of months ago we noticed one of our core apps was not scaling, no matter how many Docker containers we spun up in Mesos. We spent a week breaking it apart, using that experience, being able to make changes directly to the code base, and being able to talk to devops in a language they understood - and in under 5 days we managed to identify and fix 7-8 different issues.

I would value a dev with sysadmin experience _far_ higher than one without in a tech business with a headcount under 500: it's going to lead to fewer problems and issues in the short and medium term.


Anecdotal as it is, I started my career working as a sysadmin while studying CS. When I was younger I was quite interested in security (e.g. I followed defcon and CVE lists and read a lot of manuals and source code). I built server software, broke applications, and did a lot of reverse engineering. In that time I was forced effectively to learn about how the OS worked and how to utilize it, both Linux and Windows, from C APIs to scripting, package management, and everything else needed to effectively work in those environments, primarily around reverse engineering and vulnerability research.

That experience naturally led to me becoming a sysadmin during my studies at university. It was a fairly straightforward application of what I had already learned, with a much larger scale of management. The primary thing I gained from it was a drive to automate everything. When I started that job most of the sysadmin work was manual, but a few of us spent a huge amount of time focusing heavily on automation, and when I left, most of the work was automated and we were just doing meaningful firefighting and supporting development.

As an engineer, the main benefits have been understanding how my software is going to run in an actual software/hardware stack, easily jumping into a production environment and debugging complicated issues, being able to quickly have my OS do what I want, and that drive to automate everything. A lot of that informs how I build software and in general it feels like it makes me a lot more productive.


I have a very similar story as a background. To add to what you said: while maintaining and debugging software installs, I learned a lot about how and when things break. Especially, how important it is to keep things simple. This turned out to be invaluable when I began working as a software developer after graduating.


I would change it from "should have sysadmin experience" to "should have operating system knowledge, or ask the right questions".

The example in the article isn't what I tend to see in real life. What I do see are things like:

- Not knowing about various limits (number of open sockets, or listen queue depth for example), how to know you've hit one, how to deal with it.

- Not handling various error situations correctly (can't open file due to permissions for example)

- Security issues (a socket listening on 0.0.0.0 when localhost would work, for example)

- Making assumptions about things like "current working directory" or "certain environment variables will be already set for me"

For many of these, including a sysadmin or system-knowledgeable architect in the right discussions would suffice. (A couple of the items above are sketched in code below.)
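
To make a couple of those concrete, here's a minimal Python sketch (Unix-only, since it uses the resource module; the port and log path are placeholders, not anything from the comment above):

    import resource
    import socket

    # Know the per-process file descriptor limit up front, instead of
    # discovering it later as a mysterious "Too many open files" error.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("fd limit: soft=%d hard=%d" % (soft, hard))

    # Bind to loopback unless the service genuinely needs to be reachable
    # from other hosts; 0.0.0.0 exposes it on every interface.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 8080))
    srv.listen(128)  # the effective backlog is also capped by the OS

    # Handle permission failures explicitly rather than dying on a traceback.
    try:
        with open("/var/log/myapp.log", "a") as f:
            f.write("started\n")
    except PermissionError as err:
        print("cannot write log, check ownership/permissions: %s" % err)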


This is a short little article that just barely touches on a much deeper, often hidden issue: the state of system administration in business is abysmal.

It's not the developers' fault though, at least not as much as devops types would like you to believe. I think the author has a good point, in that it's good to get devs thinking about real-world environments at deploy time, but the real world is much more complex than concurrency of servers.

All that being said though, very rarely have I as a sysadmin of 10+ years seen problems so easily attributable to devs. Of course I haven't lived in hn/sv startup land either, so take that into account, but failures in systems I have seen have almost always been a failure of management, up to and beyond C level.

I could go into detail, but I'll save it for another time. Suffice it to say, what businesses need to be doing is getting better CTOs and CIOs who can bridge the gaping chasm between sysadmins and management.

Devs, you keep being Devs, and let the sysadmins be sysadmins. Cross train and communicate when you can, but don't fool yourselves, it is management that bears the responsibility and burden of you both. Management just doesn't like to admit that to themselves or anyone else, so don't play into this Devs vs sysadmins dialectic too much, lest ye find yourself the scapegoat next go round.


I agree that the dialectic shouldn't be played...but not that it's "cross-training." Rather, it's the same skillsets being applied in slightly different ways. There was a day in which "sysadmin" very often meant "shit-hot Perl slinger." It was before my day, but I know some of the graybeards who can still lay claim to it.

The sysadmin who can't write good, maintainable code is going to rapidly see their positions reduced to sinecures in large and slow companies. That may be enough to finish out a career, but I wouldn't bet on it if I was under 50 (and I don't bet on it; I have always, as said elsewhere in this thread, framed myself as "a developer whose output is configured systems" rather than "a sysadmin" for this reason). Similarly, in an age where infrastructure-as-code is becoming the norm, you had better be able to work with it or, as a developer, you are very limited in what you can do without being blessed by somebody else--and the set of environments where that's gonna fly is shrinking, too.

This is emphatically not to take any weight off of the shoulders of management, to be sure. But rather that I feel very strongly that what divide existed between these "disciplines" no longer exists, and both developers and traditional sysadmins need to move to catch up.


> The sysadmin who can't write good, maintainable code is going to rapidly see their positions reduced to sinecures in large and slow companies.

I completely disagree, but respectfully. What I think we are lacking in this discussion is a differentiation between what we mean by sysadmin as a product and as a job description. Systems administration is the aggregate management of the technical infrastructure.

In super small technical infrastructures, such as webdev startups, there is very little active sysadmining to do if things are done properly, but as complexity grows you need multiple people to perform all the various duties needed to maintain a system. In the performance of those duties there are various job descriptions with varying levels of requirements.

What I postulate is that traditional businesses, seeing the agility of dotcom and startup culture, have been attempting to cut costs in the technical infrastructure; but because the managers of that infrastructure (the sysadmins) haven't had anyone to advocate on their behalf, this has resulted in bottlenecks that hurt business performance.

I would be curious to hear what others have to say, but in the "shit-hot Perl slinger" days, usually the senior sysadmin was the slinger, and he reported directly to the CFO and CEO, if not ownership.

What I argue is that systems have gained in complexity, from a computing infrastructure standpoint, to the point that this old structure no longer worked well; hence the creation of the CTO/CIO class (the two roles are very similar). The problem is, in my opinion they aren't doing a good job.

Therefore, in the current business climate, my argument is that we already have good code-slinging sysadmins on hand (or can train them); what businesses need, and what the industry is lacking, is business- and politics-aware senior sysadmins to make up for that failing of the C's. Hence I disagree that the days of non-code-slinging sysadmins are numbered. Indeed, some of the best senior sysadmins I know live in meetings, but if they are performing well in that role, making big-picture, high-level decisions and then monitoring progress rather than slinging code, I don't think there is anything wrong with that.

Now, if we got CIOs/CTOs back on the right track, the demand for the political/business game could fall and those sysadmins could get back to hands-on work (which sometimes happens to involve code-slinging to solve problems). Devs are an entirely separate entity, but still part of the process, in anything other than pure web-startup-type businesses with little to no real infrastructure (e.g. work-from-home contractors).

Of course I want to qualify this: it's anecdotal, and I admit I haven't seen every environment, but I have seen many (as a contractor you see more insides than the guy going for retirement), from Fortune 500s to 2-man lawyer shops.

Any business that can see this issue coming and address it head on will be far ahead of the game. Those that don't will one day have a very rude wakeup call as the complexity increases exponentially and they don't have the structures in place to handle the demand, mostly due to lack of foresight/vision.


To me, you're just describing another layer of management. Which is fine--management is important!--but it's not the same thing as the actual implementor class that "sysadmin" usually refers to. And those people are inexorably going to be doing their administration via code. There isn't enough time in the day to waste on "pet" servers. They're cattle. Infrastructure as code is here, and it's just going to grow more extensive. If you can't write code effectively--both in terms of "hey, this code can be read by other people" and "this code can be automatically tested like any other code"--you're in trouble. One of my clients is a top-10 commercial bank in the United States that has moved entirely into the cloud (save for some legacy mainframe stuff they're working on); every team is required to provision and operate strictly with automated tools (both instance- and cloud-level provisioners), and their non-database servers are killed automatically every few months to ensure that they rotate successfully and without human involvement.

I'm pretty confident that I've seen where we're going with this. The management piece of the puzzle is totally important...but the practice is going to continue converging with every other bit of software development.


Software developers should also have customer-facing support experience (supporting other people's code) so that they realise, the hard way, the importance of having extensive logging and debugging capabilities accessible to those unfamiliar with the codebase.


Having someone who is cross trained is always better than not, but they will be harder to find and more expensive.

A frontend UI person who can write their own backends is great. A backend developer who knows javascript and can build their own frontend is great.

A coder who manages their own cluster is great too, as is a sysadmin who can write code.

And a developer who can do customer service, and therefore can fix a problem for every customer at once through modification of the application, is better than just someone who can answer the phone and make the customer happy.

All this is to say that someone cross trained will always add more value and will also cost a commensurate amount.


> All this is to say that someone cross trained will always add more value and will also cost a commensurate amount.

Man...I wish the latter were the case. I ended up going into consulting precisely because of the way salaried employees are valued in tech right now. I literally am all of those things--I am comfortable with, and have delivered at a high level on, mobile, web, backend, and infra projects--but the "market" loves valuing those roles individually, not the synergistic capability of being able to do all of them.

Consulting is fun, but finding a place where those talents actually are valued would be rad. (Anybody out there: have an opening for somebody who can deliver value at literally every level of your engineering organization while always being game to help bring your other developers up? Email's in my profile. ;) )


> but the "market" loves valuing those roles individually, and not the synergistic capability of being able to do all of them.

Find a company that values "founder's mentality" or "ownership", and that organizes projects in a way that can take advantage of it. Where I currently work, there is a lot of opportunity to "start up" new ideas, and driving them successfully benefits substantially from the founder-like ability and desire to wear many hats. Part of this focus on ownership is the principle that every product team owns their vision, roadmap, technology choices, system stack, etc. from top to bottom. The larger platform is organized as services layered on top of services, which one team provides to another with a high degree of isolation and autonomy. There is no central "operations team" to punt to, and while there might be central customer service teams, they primarily handle routine requests.

I bet you will also find this in places that care a lot about customers (are "customer obsessed"), and spend a lot of time thinking about and examining what their experience is. A relentless focus on improving the customer experience is one principle that can lead you to look anywhere and work on any problem.

P.S. A team I work with is starting up a new operations excellence group to drive the next level of improvement to our operations/availability/SLAs/efficiency, etc. in our division. They are looking for talented engineers comfortable wearing multiple hats, who can work both at the architecture level one moment and switch gears to get deep into code the next. If anyone's interested in driving engineering excellence cross-functionally across many teams in that sort of way, see my profile and hit me up!


If you claim to be an expert at more than two, then chances are you're either 1) not really that good at any of them or 2) would make a great founder!

If you're good in multiple areas the only way to really get paid what you're "worth" is to start a company and use your skills to create an amazing product.

Because you're right: the market doesn't value a true polyglot.


I don't claim to be an expert at anything so wide as a full category of software. Hell, I don't even claim to be an expert at a language. (I learned yesterday about Ruby flip-flops, ferchrissake.) Merely (well, "merely") that I can deliver a solid product at any of those levels. =)

As it happens that's the other side benefit of consulting--being able to build a product with downtime. And I am! But it's still a lottery, and risk mitigation is a thing. So I like hailing for interesting offers to come my way.


>Anybody out there: have an opening for somebody who can deliver value at literally every level of your engineering organization while always being game to help bring your other developers up?

Sure, just do our HackerRank coding challenge and implement some algorithm that will be completely irrelevant to your job, in a web browser, under time pressure.


While I agree with the general sentiment, the examples feel fairly contrived.

What I think is more practical is understanding when certain problems fall outside of your own scope or capabilities and when to engage someone else for their advice.


Sysadmins or devops are directly responsible for uptime, and ultimately for keeping services performant. It's a mindset and temperament that may help developers see solutions or foresee pitfalls while designing and implementing code: a zero-downtime attitude. Making and testing deployments, upgrades, and migrations to be as low-impact as possible, as a core function, is probably the biggest effect of putting a developer in a sysadmin's shoes.


What he's talking about is systems architecture, not sysadmin experience.

Of course developers have to write code for a logical model of a machine that isn't necessarily the same as the laptop they're working on. But I don't think anyone would ever dispute that.

Storing pictures on a local disk when the software is supposed to run on a cluster has absolutely nothing to do with sysadmin experience.


I came here to say this.

If a developer is building an app for 100K users the same way they'd build an app for 100 people, the systems architect dropped the ball- not the developer.


The post has a valid point even if the example is more about system architecture than system administration.

In my experience a developer doing one year in an Operations department gains some insight into what is going to be important in the longest phase of the software life cycle, that is, when it's exposed to customers and the company has to react quickly to problems. Proper logging, anything that can help pinpoint the cause of problems and even hot patch them at 3 AM.
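
To sketch what "proper logging" can mean in practice (a minimal Python example; the logger name and the suggested file path are invented for illustration):

    import logging

    # Timestamps, severity, and origin on every line: the things you want
    # at 3 AM, rather than bare print() statements.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        # in production you'd point this at a file or syslog, e.g.
        # filename="/var/log/myapp/app.log" (hypothetical path)
    )
    log = logging.getLogger("orders")

    def risky_operation():
        raise ValueError("bad input on row 42")  # stand-in failure

    try:
        risky_operation()
    except Exception:
        log.exception("order import failed")  # logs the full traceback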

The usual lean startup might give little thought to that (is it even going to have customers?) but established businesses do.

If it's 6 months to develop and it runs for 10 years (with maintenance and new features), which phase is more important to a company, development or production? I'm a developer who did a couple of years of operations and I have little doubt about it. It's where the company makes its money.


With the rise of devops (docker, puppet, ansible, etc...) , and cloud VMs, how often are sysadmins different from software devs?


With the rise of devops system engineering knowledge is being lost.

I often see developers trying to replace it with bloated, complex piles of hipster software. While a devops team spends weeks attempting to build a toy Google-like infrastructure on 50 nodes, another company is deploying code just fine using a 15-year-old setup with netboot + a disk imaging system + OS packages.


I don't think this is true at all. Which is not to defend or even endorse the Kubernetes wank, "devops" is also "hey, we have codified what our systems look like in code (with Chef/CloudFormation/etc.)." Which is a hell of a step up from what existed prior. That I occasionally have to then go debug why AWS's Xen network drivers are spitting the bit or why this application's disk access patterns are making EBS sad certainly keeps me very firmly in the details of the systems in question, too.

(As an aside: I mentally replace the words "hipster software" whenever I see it with "things I don't understand" and the sentences never seem to change their meaning.)


> Which is a hell of a step up from what existed prior.

Not accurate. A lot of new-stuff is just old-stuff we've had forever with a new name and branding. Worse, a lot of "new" stuff is too low quality to be reliable and/or gonna die/be obsoleted very soon.

Old shops who took development seriously have developed internal tooling, customized to their internal needs. It can be surprising how well it can hold the comparison against new hype software.


"A lot of" old stuff is low-quality--but it's what we've got and the creator works here and he likes it, so we use it. See? I can generalize too.

And, friendly advice: I assure you that you meant to write "I disagree", not "not accurate". Don't be That Guy. Never be That Guy.


If old and new both suck, there is no gain in migrating to the newer ;)


But they don't both suck. Newer tools reflect the needs of newer problems. Like, I can point to exactly what I get from my Chef/Cfer stack--I can wrangle hundreds of concurrent machines that need to update to a policy while also getting, in a standardized and predictable format, a full itemization of all changes applied to all systems. I can leverage the community, too, to help me bootstrap new features and functionality because we're speaking a shared language.

They might be new tools to you, and being skeptical of new stuff is fine (even healthy), but there's a point at which skepticism turns into Ludditism and it's well before the point where one starts to bloviate about "hipster software".


It's funny you'd quote Chef (or puppet), because they've been replaced by Ansible/Salt.

I spend a lot of time evaluating new tools, possibly more so than anyone else in this thread. There are very few novelties which solve a problem notably better than what existed before AND are reliable enough for serious usage AND don't come with so many drawbacks that they nullify their benefits.


Chef hasn't been replaced by Ansible or Salt. Each tool has a different value prop that makes sense in different contexts. I might use Ansible in some environments where I wouldn't use Chef--there aren't a ton of these environments, IME, but I've chosen to use Ansible in the past despite it not being my favorite for reasons like this. I would use Chef in other contexts where Ansible's ecosystem or tooling is lacking--I find that chef-zero and berkshelf are a better way to self-bootstrap on AWS when using CloudFormation, for example--or where I desire the greater flexibility and expressiveness of a Ruby-based DSL.

The black-and-white thing you're painting is, if I'm gonna be frank...a little messed up. I mean this in the best of ways: please enhance your chill. It's not that bad out here.


There are a lot of parts of ops that handle things that aren't easily scriptable. What do usage patterns look like, so you can better manage when to bring servers up/down to handle normal usage flow and have servers warmed up when they're needed? Are you using the best build options for your VMs? Are you using the best VMs for your application? Are you at the best datacenter for your application? Even with certain build options having better automation tools than before, there are still a lot of places where the world is far from a few scripts.


That question doesn't make any sense at all. That's exactly the same as saying: with the rise of do it yourself home repair, how often are plumbers different from carpenters?


The question is great. DevOps is a field that I really don't want to touch as a developer. I build frontend and backend as well, design databases and focus on UX and configure a LAMP stack, but that's where sysadmin ends for me.

What I don't want to mess with is the "DevOps" tools, like Docker and rest of cloud orchestration stuff. It's a very complex field and I couldn't care less. If I work on a project like that there's always somebody who can do that kind of setup.


DevOps, at its core, is just treating your configuration files and artifacts (IP/DNS, firewall rules, mail config, service and web app artifacts, libs and other stock software, SQL/NoSQL deployment descriptors, queue configs, users/roles/certificates) as development artifacts, and having an automated, reproducible deployment procedure. Going to a box and manually configuring something in an ad-hoc fashion using fancy GUIs is considered a no-go. Rather, you're supposed to craft/test your configuration artifacts and then check them in to e.g. git. If anything, this should be a workflow familiar to developers, hence "DevOps".
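
As a toy illustration of that workflow (hand-rolled Python purely for the sake of example; real shops would reach for Ansible, Chef, etc., and every path and service name here is hypothetical):

    import filecmp
    import os
    import shutil
    import subprocess

    SRC = "deploy/myapp.conf"      # lives in git next to the code
    DST = "/etc/myapp/myapp.conf"  # live location on the target box

    def apply_config():
        # Idempotent apply: copy only if the file differs, then reload.
        if os.path.exists(DST) and filecmp.cmp(SRC, DST, shallow=False):
            print("config unchanged, nothing to do")
            return
        shutil.copyfile(SRC, DST)
        # Assumes the service is managed by systemd on this host.
        subprocess.run(["systemctl", "reload", "myapp"], check=True)
        print("config applied and service reloaded")

    if __name__ == "__main__":
        apply_config()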

Docker and other Docker-like containers such as Rocket don't have anything to do with DevOps per se, but they facilitate easier automated deployment of said configuration artifacts. Technically, Docker & Co. are just remedies to classic DLL-hell situations, and have mostly been used for development and software evaluation purposes. Though they technically don't provide much more isolation than simple chroot jails, increasingly they're also used in place of VMs for density reasons, e.g. because you can run a whole bunch of services on a single machine with less footprint than VMs (on Linux, that is).

The frenzy with Docker and the other cloud orchestration stuff, as you say, is IMHO mostly because of Sillycon startups with insane amounts of venture capital buying their way into people's minds.

Edit: ok, the last remark was a bit snarky; sure, there's a need for mass-digestible tools for collaboratively editing DevOps artifacts and deployment plans, but I don't know any I like; feel free to point me to one (preferably without JSON and/or YAML configuration files)


A plumber who is also a carpenter is usually known as a "handyman". There's a reason apartment complexes big enough to have maintenance staff tend to hire that sort of person over a dedicated specialist.

If you're a large business you should be automating your sysadmins with devops/SRE techniques. If you're a small business you should be outsourcing that work to companies that have automated it for you.

That's why cloud providers and associated services have seen enormous growth.


I dunno, I think the question makes perfect sense. I am a software developer. I very often write Chef and a metric buttload of glue code in Ruby and Bash and occasionally PowerShell. That I can also write a Rails app and a React-Native app and a Dropwizard app (all things that I've touched this week) in addition to being able to set up Kafka in a fault-tolerant and scalable way is because it's all code.

Chef etc. aren't "home repair." (Well, maybe Docker, and it shows.) They're a way of expressing subject matter knowledge. Still gotta be a developer to do that expressing in a way that isn't going to kill you, either directly or because you write code that your fellows burn you at the stake for.


There's a reason we have these classes of people with certain labels. Until the 18th century, buildings were designed and built by "artisan craftsmen", people who had no formal training or education in building things (or in any capacity) but said, sure, I can make you a house. Then it would fall down and kill a family of five. All around the world today, architects and engineers are required to be licensed and to prove they know what they're talking about so this doesn't happen. As a result of this process, the whole profession developed and advanced the state of how we build things.

We're lucky that we work in an industry where our work is most likely not going to kill people. We don't have to be licensed to make a claim of who we are or what we can do. But it would be very disingenuous for me to call myself a professional software engineer just because I've written hundreds of thousands of lines of code, and for you to call yourself a sysadmin because you've set up some software. The mark of a professional is understanding, at a deep level, all of the hidden detail that goes into the work, and if we hold people to a standard, the titles will reflect that.

So, how often are software developers different from sysadmins? They're always different. They're completely different practices. As a person, you can indeed have two separate successful careers and become both, but that doesn't have anything to do with devops, which is merely the practice of dev and ops collaborating together.


I've read your post two or three times now, and I straight-up don't get what you're driving at. Yes, there are levels of subject matter expertise that you need to know--or at least be conversant enough with to deep-dive when necessary--for system administration. That's also the case for database programming or bioinformatics or graphics programming or whatever. But it is still expressed through the writing of code, in 2016--and that code, largely today and with increasing focus going forward, needs to be well-structured, testable, so on and so forth. It is still the development of software. It's a specialization within the same field. It may not have been twenty years ago; if it isn't generally today, it will be before I retire, and probably a lot sooner than that. And to that end, the question that was originally asked makes a lot of sense, to which the answer is "a sysadmin is-or-had-better-be a software developer; a software developer may or may not be a sysadmin."

"Set up some software," though? The condescension is both unnecessary and built upon some pretty shaky projections on your part. Can we not dance that dance, please? (ETA: Not least because I think your posts, for the most part, are really good and I have a lot of respect for you, especially around security topics.)


I would be horrified if you had to be a software developer to be a sysadmin. It would be like requiring an auto mechanic to be an engineer, or a cook to be a chemist or something.

In actual fact, the complexity of software design is antithetical to the job of system administration. If at all possible you should use the least code possible to do a given job, and rely on the reuse of tools to serve various functions. This is not coding, just like auto repair is not engineering. If you're writing code you are not adminning, you are developing.

There is a time and a place for software development in Ops, but it should be based around projects led by teams to serve specific functions that existing tools won't solve, again similar to auto repair because some tools are needed simply to repair things more efficiently, but then you use those tools and don't keep engineering new ones. The distinction is small but highly important, because code is often a source of headaches in Ops. This is coming from a guy who is usually hired to write code for Ops departments.

A software developer can of course perform some of the functions of a sysadmin, but again the complexity of the whole becomes too much to understand just by doing single tasks. Either job is too complex to learn without a lot of study.

I just find comparing the two in the context of one becoming the other invites oversimplification of either role, and devops doesn't help that comparison.


> If at all possible you should use the least code possible to do a given job, and rely on the reuse of tools to serve various functions.

I agree, on both counts--and you should write the least code possible and reuse the most tools possible when you're building a web app, too! And when I am managing systems and infrastructure, I am reusing tools that are configured and operated through the expression of domain-specific code in Ruby. When I'm solving a problem, I'm writing code in Chef or in Cfer/CloudFormation (if I'm touching a machine by hand that's outside of my burn-it-down test environment, something's going really, really wrong). I'm consciously writing that code in such a way that I can put it into an automated test environment based on preconditions and postconditions that map to the business logic of the system to ensure correctness when that code is then changed. If I need to solve a related problem later, I'll generalize the code for least-effort reuse at that point.

This is literally software development, to me. That the output involves a configured system instead of a web app or whatever is orthogonal to being software development. It still feels like you are drawing a distinction without a difference.


Not the same at all. Let me rephrase the question.

Reading the article, I take the sysadmin to be the person who constructs and maintains the environment the code runs in, including the physical machines, but also things like dependency management (I infer).

Given that just about everything except the physical machine can be handled through devops automation (from experience), and those things don't require "special" skills other than docker/ansible knowledge, why are sysadmins still considered separate from software devs?

On a more opinionated note, I think that using people, and not code, to set up the operating environment of an application is pretty inefficient nowadays.


I started as a dev many years ago, then moved into sysadmin / devops stuff for over a decade, and then moved back to more hands on dev stuff again a few years ago. I think your statement that "those things don't require "special" skills other than docker/ansible knowledge" is over-simplifying things a bit.

Writing a production quality Dockerfile or Ansible Playbook requires more than just a bunch of "apt-get install" statements. You need a bunch of knowledge about how each bit of software should be configured and tuned for each individual use-case.

Once it's been written, then the devs can happily automate all the things as much as they like. Writing it in the first place is where a lot of what was traditionally sysadmin knowledge comes into play.

Anyone with a bit of patience and practice can follow a recipe in a cook-book. Not everyone can write one.


I'll agree that I definitely oversimplified it. I think what I was getting at was that it was a similar set of skills to other types of software development.


Thinking like this is what leads to tens of thousands of Mongo databases being exposed online.

Anyone can follow a "how to install X" guide and install a few packages and get a working config to get going and then automate it with their tool of choice but that doesn't come anywhere close to a production ready system.

When your job focus is shipping product that's all you care about, get it out the door as fast as possible so why learn anything more than the bare minimum?


Getting it out the door will make you learn faster. The choice to stop learning is a personal one IMO.


Software is complicated and error prone and it takes a good amount of skill and time to write it well. Personally, I'm not impressed at all with most software made in an Ops environment. I think it usually sucks and that sucky quality, along with the complexity of debugging and maintaining it, makes it a net deficit to productivity. I have been writing software in Ops for a long time and it is just annoying how little work is done correctly in this space. I think most Ops automation is so error prone that you might as well hire an intern to stare at your network and you'll get as good if not better system stability.

In fact, this is how Ops used to work. You'd use humans to oil the machinery of the basic automation you put in place, and they were damn good at keeping uptime high and catching problems nobody thought of in their automation. Now people just automate and pray, it seems, and then play catch-up indefinitely, or waste time redesigning.

We totally automate the physical machines too. But wherever possible by using tools, not writing software. Honestly I wouldn't want to insult the practice of software engineering by comparing my work to theirs.


> Software is complicated and error prone and it takes a good amount of skill and time to write it well.

This we can agree on.

> Personally, I'm not impressed at all with most software made in an Ops environment. I think it usually sucks and that sucky quality,

Not sure what an ops environment is, although I'll agree with the general statement that most software I see sucks and is of sucky quality, but I think that relates back to your first point.

> In fact, this is how Ops used to work. You'd use humans to oil the machinery of the basic automation you put in place, and they were damn good at keeping uptime high and catching problems nobody thought of in their automation. Now people just automate and pray, it seems, and then play catch-up indefinitely, or waste time redesigning.

I'm not proposing that the rise of devops precludes the need for people specialized in its use and implications, but rather that sysadmin work is essentially becoming a branch of software engineering. When setting up a system IS writing code, why is the "software engineer" different from the "sysadmin"?


With the rise of cargo-cult-style management, how often are managers different from a script that purchases items from Amazon randomly and bulk emails randomly generated business plans?


They're not really and they shouldn't be (although systems knowledge here becomes increasingly important). This is the point of DevOps, that operations can be treated as a software problem and managed in an automated fashion. The role of "SysAdmin" as presented in this article is fading quickly.


I believe UC Berkeley's approach to teaching computer science is the correct one.

1. In CS150 one has to build a hardware computer.

2. In another class one has to build an OS.

3. In another class one has to build a database.

4. In another class one has to build a network.

Education is key. Understanding the TLB inside the CPU gives one an appreciation for context switching between the only two privilege modes the CPU has hardwired into it: kernel and user.

Being a sysadmin is insufficient as education. Every developer should know how to build a hardware computer from the ground up.

Computer science at Berkeley doesn't have a "Java" class. The language one programs in changes depending on the Professor and topic. That way one doesn't get too attached to a language.

Most developers do not understand the hardware implementation of a thread, or nowadays of a Docker container. As a result, both get used wildly incorrectly.

Threads are useful when blocking on IO, to keep the CPU from starving. However, switching threads simply to swap between CPU-bound jobs is inefficient. Just adding a thread doesn't guarantee equal service to users, just as adding processes does not.
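
In Python terms (where the GIL makes the contrast even starker), a rough sketch of the distinction; the URL and work sizes are placeholders:

    import concurrent.futures
    import urllib.request

    def fetch(url):
        # IO-bound: the thread mostly blocks on the network, so other
        # threads can run meanwhile. This is where threads earn their keep.
        with urllib.request.urlopen(url, timeout=10) as resp:
            return len(resp.read())

    def burn(n):
        # CPU-bound: extra threads on one core just add switching overhead.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        urls = ["https://example.org"] * 8  # placeholder work items
        with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
            print(sum(pool.map(fetch, urls)), "bytes fetched")
        # CPU-bound work belongs on processes (real cores), not threads.
        with concurrent.futures.ProcessPoolExecutor() as pool:
            print(sum(pool.map(burn, [2_000_000] * 4)))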

Developers are using Docker today with no clue why they are doing so... it's just that everyone is doing it and they do not want to appear stupid.

Experience as a sysadmin is no substitution for education.


The unspoken elephant in this thread is that sysadmins and developers have broken relationships surrounded by organizational dysfunction.

Personally, I feel that system administration experience of some sort is essential to development skills, and development experience of some sort is essential to system administration.

Communication and leadership skills are essential to both. It _used_ to be OK to be the jerk at the office if you were good enough at your job (sysadmin or developer). That's no longer the case, because we've progressed well past the point where a single developer or sysadmin can be enough of an asset to look past their inability to work well with others.

Operations needs stability. Developers need to hit moving targets. Sysadmins need to keep things manageable, compatible, and secure. These aren't incompatible goals, and usually when you get the techies talking they are sympathetic towards each other's challenges and helpful to one another. With as many project managers, engineering managers, manager managers, directors, and executives as there are in the mix, I wonder whether the problems in all these environments aren't more of a historic issue with leadership failings.


You do not know pain until you've had to deal with government sysadmins as a contracted developer. And the govt has a strong preference for bringing sysadmins into the civil service rather than contracting that work out (like they used to) these days.

Which means the ones with actual, marketable skills aren't working for the government.

Spending months waiting for a sysadmin to perform a task you know takes about 3 minutes is great fun.


I think the problem here is that decent admin jobs are getting harder to come by these days. Used to be that any reasonable organization needed an admin just to keep systems straight; now they all rely on less-skilled help desk types and make up the difference with outsourcing companies, which in turn are shedding personnel and relying on big vendor contracts rather than individual skills.

I predict sysadmin culture and skills will be the next Fortran.


The underpinning logic seems false.

To extend the analogy of the house on a slope, if you don't tell the developer you need a house designed to be built on a slope, that is a problem with the spec provided to the developer.

Although it can at times be more efficient for a dev to have sysadmin experience, making blanket statements requiring all developers to come with such a background is just sensationalism.


Wholeheartedly agree.

I think the lesson the author is really trying to convey is the importance of learning about distributed computing [1] concepts and not necessarily system administration.

[1] https://en.wikipedia.org/wiki/Distributed_computing


I'm a front-end developer with very little knowledge of sysadmin stuff, but I can't agree more: the best developers I've ever seen had massive sysadmin experience.

Can you point some good resources about sysadmin I could use to improve my knowledge on the subject? Note I'm a noob when it comes to all this server-side magic.


When I was getting into ops 10 years ago the book "The Practice of System and Network Administration" by Thomas Limoncelli was a great overview (very high level and not specific) that we all read.

The Phoenix Project by Gene Kim is kind of a more modern take on it, but both can be read easily by people outside of the field, and I doubt the first one will ever stop being relevant.

Both are more about establishing proper ops methodologies. If you're looking for something more low-level, like a programmer's guide to sysadmin work, I can't think of anything off the top of my head.


Limoncelli's newer book _The Practice of Cloud System Administration_ is awesome. It's really more about how to build distributed systems to be performant and operational.


I started writing a book about cloud system administration before I read this one, FWIW. It's worth your time.


Oh wow, I need to check that out. Wasn't aware he'd written a cloud-related one.


Yes, yes. We get it: we should know all things, keep up with all the latest trends, work 80+ hours every week, and do our company's IT support on the side for free while we code their software. Or companies could compensate us for our time and hire more god damn people if they need it. You know, just saying.


I've worked at startups where I was one of the initial developers (and the most experienced on the team), so I've been forced to learn a lot about sysadmin work and design (and a lot of surrounding dev tooling, for instance). All in all, knowing more about sysadmin work has changed my dev experience: I think a lot more about how my code is deployed, how it works in prod, how to get data to improve it, and so on.

I do feel all devs at one point should have some sysadmin experience - if not professionally, then informally.

That said, sysadmin tooling is strongly divided into two types - configuration and code.

I love tools that can be coded (Gulp, for example) over tools that are configured (like Grunt), as the former generally have a lot fewer undocumented "gotchas" and getting started is generally a lot simpler. (I'm looking at you, Webpack!!!)


Wherever I've gone, I have seen the war between developers and sysadmins being fought. In the enterprise the sysadmins have the upper hand; in frontend, mobile, and SMB shops, the developers.

Developers don't have the burden of maintaining and upgrading multiple applications on the same server, with conflicting dependencies, 24/7. Sysadmins don't feel the pressure from end-users to deliver new features yesterday.

At the moment developers are winning because of Docker. But when they grab that responsibility they will be the ones called when their 50,000 containers are being hacked because of vulnerabilities. And they will be under fire for the system being down and costing the company a lot of money, with everybody breathing down their necks.


A great reason to have cross-experience at the management level is to understand the different motivations of the two roles:

- devs are incented (paid) to introduce changes to the system... mostly in the form of new features.

- sysadmins are incented to stabilize the system and make it scale.

In organizations with uncoordinated or siloed management, this leads to infighting... sysadmins try to stop "troublesome" devs from making changes at all and actively slow down changes in the name of stability and developers at the other extreme try to get changes in without caring about stability and blame sysadmins for that.

Great management with experience in both will find ways to incent the teams to introduce regular change while maintaining stability.


I think it's more accurate to say that good developers need to have some level of systems expertise which probably means OS & networking fundamentals + practical Linux experience. This is because you need to have some understanding of how your program will execute on (or across) machine(s), and some know-how to get your program up and running on a real machine available to users on a real network. Whether you need to be an expert in the latest best practices for managing a fleet of servers compliant with whatever regulations, that's more for the sys admin (though it never hurts to learn more, there's just so many things we developers could spend our time learning).


I get the sentiment. Personally, I could say the same thing about control theory, systems theory, economics, operations research, and statistics. And I could also share dozens of anecdotes of bugs and suboptimal software that was produced as a result of this lack of experience.

But at some point, you have to acknowledge that developers can't know everything. Go deep or go broad, but accept that both paths have their limitations and benefits. Maybe in some cases you'll run into a bug you created due to your lack of sysadmin experience, but don't feel bad about it...your experience has led you to other types of expertise. Fix it, learn from it, and move on.


Actually, I would rather language and tool designers have a go at a true System Admin job. There is a reason PHP gets installed by default and Ruby on Rails does not.

Frankly, at the end of the day, most businesses don't want something that was "working" breaking. All the conflict comes from that one desire. There is a reason System Admins have to do the "Patch Tuesday" dance and despise any software that just updates by itself. Getting yelled at because Chrome updated to a version not supported by your ISV or your local developers is an amazingly fun experience.

Yep, System Admins have to be the company nanny. It sucks, but the apps need to keep working.


True for the most part, however PHP is a language while Ruby on Rails is a framework.


PHP comes with all the stuff to make a website. Ruby needs a framework, and Ruby on Rails is the most popular. I think you proved my point for me.


It's hard for me to "get into" developers who don't know how the system works. I don't mean they have to be full-fledged sysadmins, but it just seems natural to me that they should have a decent understanding of the electronics, how to assemble the hardware, and what's involved in tuning it and keeping it all running.

It may have something to do with my education being EE, but that's not the heart of it. I never practiced EE. I taught myself (more or less in order) building circuits, programming, building PCs from pieces, and elementary system administration. I know there are lots who never "get" all three, but the best I knew did.


In the case described by the author, it seems like most of the trouble would be resolved by the developer understanding the concept of "distributed file storage is hard" and coordinating with ops early - the dev doesn't necessarily need to know the details of setting that up, but they need enough understanding of the domain to realize there's work to be done on the issue.

For that matter, I suspect most sysadmin / ops teams would be happier if developers could only strip "just" out of their vocabulary (and vice versa) - nothing more annoying than "could you JUST do {thing asker doesn't understand and thinks is trivial}"


Just like you typically distinguish front-end and back-end developers, why wouldn't you distinguish dev from devops?

Sure, it helps if an individual is not entirely myopic, and would be great if that person can wear multiple hats with skillset spanning all those disciplines.

However, those different labels exist primarily because they align to distinct mindsets and approaches to a problem, in a mutually exclusive manner.

Do you want a jack of all trades, but master of none?

Seriously, how many individuals out there do you think have the capacity to develop their careers so that they have deep capabilities in all of those disciplines?


I think that there's good advice in this article, but a couple counter-points:

- Plenty of software developers work for small- to mid-size companies where the Node or Rails server running on your machine is nearly identical to the one running in production.

- Plenty of software developers work exclusively on the front-end, which means that the computer your code runs on in production is very very similar to the one you develop on (although then you have the added concern of testing on lower-end mobile devices, other browsers, etc)


I think everyone would benefit if they learned just a small amount about what their team members have to deal with. For example, project managers should learn a little about what programming involves while managing programmers and graphics+UX designers should learn a little about coding to make their designs more practical (i.e. identifying cases where a slightly better design requires a massive amount more coding effort and probably isn't worth it).


I think many professions should have sysadmin knowledge. In my work (biology) there is a lot of unnecessary clicking, not only in Excel (where Python/R would be much more efficient) but also in web interfaces to transfer data daily. The latter is often completely automatable using cron and rsync over ssh. Many repetitive actions are a shell script or command line pipe away. But people don't know.
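
For instance, a daily transfer like that boils down to a few lines (the paths and host are invented for illustration, and ssh keys are assumed to be set up already):

    import subprocess

    # Push yesterday's exports to the archive host. Unattended once ssh
    # keys are in place; schedule with a crontab entry such as:
    #   0 2 * * * /usr/bin/python3 /home/lab/sync_exports.py
    subprocess.run(
        ["rsync", "-az", "--delete",
         "/data/instrument/exports/",
         "lab@archive.example.org:/srv/incoming/exports/"],
        check=True,
    )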


> automatable using cron and rsync over ssh ... shell script or command line pipe ...

Where I come from, that's called "knowing how to use a computer".

Or at least it was, until Windows et al arrived.

(Interesting how the user-interface technology advance called the GUI decreases the efficiency of human/computer interaction.)

Except ... for those who still regularly use a Unix-like OS, that knowledge still is quite basic and, thus, surprisingly difficult to trumpet on a resume.


I understand the point here; it can be useful to understand the abstractions on top of which you are working.

But given that most engineering projects are CRUD apps without scaling problems, and PaaS companies like Heroku totally abstract away the system administration piece for those projects... I certainly wouldn't recommend a new engineer spend any time learning system administration.


I'm moving from dev to devops for a while, because of similar reasons. Excited to learn all these new things and peer beneath all the tools and abstractions I rely on! I have no doubt in my mind that it will make me a better developer - most abstractions have a tendency to leak, eventually.


This seems obvious to me. You'll never see your code again with the same attitude after being on call.


As a developer with practically no sysadmin skills, how would I go about improving these skills?


Small/micro AWS/Azure instances. Set up your own CI/CD pipeline. Read sysadmin books. Get comfortable with the environment shell (Powershell/bash/etc.) and write some scripts.

Bonus points if you make an actual product that solves a problem and also practice all of these skills.
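
A typical first script along those lines might be a disk-space check (a minimal sketch; the mount points are just examples):

    import os
    import shutil

    # Warn when any filesystem of interest is nearly full -- the classic
    # starter monitoring script.
    for path in ("/", "/var", "/home"):
        if not os.path.exists(path):
            continue
        usage = shutil.disk_usage(path)
        pct = usage.used / usage.total * 100
        flag = "  <-- getting full!" if pct > 90 else ""
        print("%s: %.1f%% used%s" % (path, pct, flag))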


I have been at this since the early nineties and can't comprehend that someone can be a developer without any experience of how to set up and administer the environment. I guess it's a normal thing now?


Just a small nitpick with the article: it probably meant "backend software developers". Because a graphics developer who mostly writes shader code would surely benefit more from artist experience.


It's incredible how apparently 99% of HN are web developers who don't even bother considering that other things exist. It's not at all representative of the larger industry.


Very much this. I don't pretend to be a dev, but I listen to and work with them to accomplish business goals. Sysadmin work is much more business-management oriented in the real world, and seeing so many devs say "I know how to admin my software in prod, therefore I am a sysadmin too" makes me realize just how little the HN/SV web-startup dev knows about real-world business systems administration for established businesses.

Once again, like I said in my other comment, even in web startup land, this is a failure of management, in particular CIOs and CTOs, and the management that selects them.

The problem is that companies can usually manage to survive for quite a while ignoring these problems. There is little short-term feedback for duct-taped systems... until that day when shit starts dying and no one can fix it. Or a crypto-ransomware variant hits the file server. Etc.


On top of the 20 years of experience we already require of junior developers?


Probably preferable to calling someone a developer after a 12 week bootcamp.


Just make sure they're 22, 23 tops. Can't have any of those 30+ year olds with their early onset dementia.


Careful with that should, or you might take someone's eye out.


As developers, the way we resolve this is by including the sysops department as a stakeholder that produces requirements for us to negotiate and implement.


The general trend in software development appears to be toward generalization over specialization. I'm not sure this is a good thing.


Funny, I always see this from the other side. I always find myself thinking "I wish sysadmins had more programming background."


Software developers should have experience in the things they need to get their job done. This is not a revelation.


Yes, sysadmin experience will give me vital insight into my work writing embedded code for dishwashers.

Snark aside, if the post was headed "Software developers need to understand the environment where their code will be running or they may not build it properly", which is the point being badly made by conflating "environment" with "some kind of commodity x86 with a standard OS", it would be a lot more applicable.


sysadmin really means "technical debt manager" most of the time these days.

I really spend the majority of my time trying to slot something into place in a way that no one will notice and that doesn't disturb existing stuff... often that means duplicating bad behavior because it's become a dependency.


Maybe it's only because I've only ever worked by myself, but it's hard to imagine a software developer who isn't also a sysadmin by necessity. How else do you debug things?

In what city or company is the barrier to entry for software developers so low?

I just can't imagine paying above median income for a software developer who can't install, to use an example given in this thread, Visual Studio.


Managers should have management experience and less nepotism experience. One step at a time, guys.


One thing I like about DevOps is never having to hear "it works on my machine!"


I always take "it works on my machine" as a synonym for "you need to explain your problem better". In my experience most problems that "work for me" are caused by somebody installing an unstable build and patching it with random crap they thought of.


I think the reverse is much more true, but maybe nobody argues about that anyway.


This needs to be mandatory for developers working on printer software.


Ironically I can't see the images at the bottom of that page.


Shameless plug perhaps, but this is free and will help here: https://www.cncf.io/event/webinar-cloudnativenetworking


I worked in the home office of WalMart Stores, the retailer, from the mid 90s until 2009 as a hybrid network engineer/developer. That is, my team worked in Network Engineering, but we wrote tons of code, because when there are millions of different IP addressable nodes on a centrally managed network, there will be code to manage it. (:

In that era, I think WalMart handled this apparent developer/sysadmin conflict quite well. Believe it or not, even though we (the whole company) had exceptional uptime, we were also extremely agile. LOTS of code was written and rolled out across thousands of sites, multiple times a day.

There were a number of keys to this, and I'll enumerate a few.

1. Most new hires to development positions (those with minimal previous experience) had to spend their first six months working at one of the four or so major help desks. Note: they were paid their target salary, but in an hourly fashion.

2. The various operations teams had complete and final say over what went out, and how incidents were handled. A couple of the big operational areas were Unix Operations, Network Operations, Windows Operations and Mainframe Operations. I called them 'teams' but each had a number of teams, and their own help desk.

3. Any time there was an impacting problem associated with a program developed by team X, a war room was called by the relevant operational area. A member of team X (a developer) had to stay in that war room (switching off between team members if it ran very long) until the production problem was not only fixed, but also deemed unlikely to recur, except in cases where that depth of fix would take a long time.

4. Every development team had a pager rotation, with rigorous expectations about responding to such pages. This was primarily to support the previous point.

5. Because of the enormous operational scale, all of the major operational areas had dedicated teams focused on automation. My team was that team for the networking area. Furthermore, most of the rest of the operational folk read/wrote code to some extent.

In short, incentives were aligned. Teams that wrote externally facing code felt pain if the stuff they wrote and released caused problems. Operational folks wrote/managed/interacted with tons of their own code in order to manage the enormous infrastructure. Also, ops folk were far more willing to let things move with velocity knowing that the people who actually wrote the code would be required to support it, globally, any time of the day or night.

Another, perhaps even more important reason we were so successful during those years (and the years before) was a strong and vibrant esprit de corps. The entirety of Information Systems was, at the time, around 2,000 people, and we were facilitating double-digit year-over-year growth of a 150 billion dollar company. We had over 5,000 remote sites in 15+ countries, with a diversity of software and infrastructure that was honestly pretty astounding. Each of those sites had quite a surprising amount of infrastructure.

We worked hard, and we produced huge velocity with fantastic uptime. For example, the network achieved six 9s of availability in a couple of quarters.

In the end, while things were sometimes contentious, we trusted each other, and only a minority of teams were forced to work bad hours for any length of time.


Agreed.


As software developers, we should notice and keep in mind that all these experiences are very important. Thanks for sharing an important article.


That’s interesting.


Sysadmin work is getting automated. There is no reason to learn it.


I have ten years of experience in systems and have written tooling in Python and JavaScript for Netflix, Stanford University, and Google (current). I have never been a software engineer officially, but I want to be. I am a PC gamer, anime lover, and a proud SJW. Any takers?


What are you on about?


I am interested in finding positions. Sorry if my post was ambiguous, I know it is a bit out of place here.


No. Developers do not necessarily need any experience as a sysadmin and I would advise against moving productive developers into this role for the purpose of misguided training.

The argument about understanding how code scales beyond one computer is fallacious. In fact it's quite easy to learn and practice all kinds of distributed high scale architectures without getting up from a desk.
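To make that concrete, here's a rough sketch of the kind of desk-bound practice I mean, using only the Python standard library (ports and topology invented for illustration): a few local "nodes" behind a naive round-robin dispatcher.

    import itertools
    import threading
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def make_node(port: int) -> HTTPServer:
        """Create one local 'node': an HTTP server that names itself."""
        class Handler(BaseHTTPRequestHandler):
            def do_GET(self):
                self.send_response(200)
                self.end_headers()
                self.wfile.write(f"response from node on port {port}\n".encode())
            def log_message(self, *args):
                pass  # silence per-request logging for the demo
        return HTTPServer(("127.0.0.1", port), Handler)

    ports = [8001, 8002, 8003]
    for server in map(make_node, ports):
        threading.Thread(target=server.serve_forever, daemon=True).start()

    # Naive client-side round-robin "load balancing" across the nodes.
    backend = itertools.cycle(ports)
    for _ in range(6):
        port = next(backend)
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as resp:
            print(resp.read().decode().strip())

Swap the dispatcher for consistent hashing, drop a node, add retries; all of it runs on one laptop.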

I will agree it's important to understand the people and work connected to your job. It's probably a good idea to eat lunch with the sysadmin, or QA, or UX person.

I would also agree it's important to have respect, empathy, and collaboration with others, but that doesn't mean you need to switch jobs.


I strongly disagree with this, from experience. There is no difference between a "developer" and a "sysadmin" in a healthy shop in 2016. It's all the same job. Until you actually put something into practice in a hostile environment you have no idea what rakes are going to be in that yard. I would be a bad developer if I did not understand how the systems my code runs on worked. I would be a bad sysadmin if I couldn't marshal my systems through code and fully and completely understand the software that would be run on them. (As it happens, I'm not too shabby at either, and it's additive.)

The mindset that you are describing is why that "devops engineer" that gets hired so very often ends up being burnt at both ends being the savior of the team because, unlike the developers, he or she does understand both sides of the equation and is able to solve the problems that the incurious developer cannot. I would strongly urge that you reconsider.

And, for your own sake, please excise the word "fallacious" from your vocabulary. It will help you, I promise.


>There is no difference between a "developer" and a "sysadmin"

Ridiculous. Many developers have specialties that take years to gain proficiency. If there were no difference we could just replace a math PhD doing computer graphics with a sysadmin. Sometimes it's possible, but the blanket statement does not hold up.

>I would be a bad developer if I did not understand how the systems my code runs on work

I agree, but so what? Becoming a sysadmin is not the only way to do that. You shouldn't confuse what you've seen work, with being the only way something can work.

>[your] mindset is why "devops engineers" get hired and very often end up being burnt at both ends

You’re reading way too much into this. I never said a sysadmin rotation couldn't be helpful for some. I said devs don't necessarily have to do it to be great at their jobs. I wouldn't pull a dev who was in the zone, so to speak, and being highly productive into a sysadmin role while misguidedly thinking it would have no impact on delivery.

>[sysadmins are] able to solve the problems that the incurious developer cannot

I never implied I wanted “incurious” devs who couldn't solve problems to the extent they needed a “savior” as you put it. Maybe you are projecting your past experiences onto others based on overzealous inference? I didn't say what you suggested, and I don't believe it.

> please excise the word "fallacious" from your vocabulary

What are you talking about? The definition is “based on a mistaken belief”. I claim the argument that devs cannot understand scale out architecture without being a sysadmin is based on a mistaken belief. Why are you so against that word?


> Why are you so against that word?

Because, to be frank: it makes you sound like a tool. That's what 'greglindahl was referring to in his post, 'cause I said exactly that before I edited it to assume good faith and be nice about it. But leading off your reply with "ridiculous" confirmed that you're okay with that. I'm not gonna play this game with you; I'm having way more interesting and way better conversations in this thread with people who don't wanna dance that dance. Haveaniceday.


That's unfortunate it came across that way. I did not intend to do anything other than make a good faith (and maybe passionate) argument.


Thank you for re-wording the 3rd paragraph before I could complain about it :-)


I maintain that the sentiment was not wrong.

...also, man, don't tell me you're on Hacker News while you're camping...


There is value in having breadth of experience: it lets you produce solutions that either have less impact on metrics you might otherwise ignore, or better fulfill requirements that are hard to observe without having been in the field.

Should *everyone* be in other roles full time? Depends.

Should *everyone* have had exposure to expand their knowledge? Yes, absolutely. It should also be refreshed so they don't forget.

Most adults from time to time are exposed to this for 'common day to day tasks'. Either in the form of being an assistant to someone else doing a task or in directly doing it themselves.


If you just talk to a Formula 1 driver but never drive yourself, does that make you a good driver? I don't think so.


Nope. Software developers should not be using databases directly; they should use APIs: REST, microservices, e.g. Firebase. Software developers should not be running servers; they should go serverless and deploy static HTML to a CDN,

etc. If you are still using Docker, Puppet, or worrying about scale/security... you are living in the past. (One example: https://github.com/struts3/what-demos )
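To illustrate the pattern (the project URL below is hypothetical), the client reads and writes through a hosted JSON API, such as Firebase's Realtime Database REST endpoints, rather than connecting to a database directly:

    import json
    import urllib.request

    BASE = "https://example-project.firebaseio.com"  # hypothetical project

    def put_user(user_id: str, data: dict) -> None:
        # The Realtime Database accepts plain JSON over HTTPS:
        # PUT /<path>.json writes the value at that path.
        req = urllib.request.Request(
            f"{BASE}/users/{user_id}.json",
            data=json.dumps(data).encode(),
            method="PUT",
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)

    def get_user(user_id: str) -> dict:
        # GET /<path>.json reads the value back.
        with urllib.request.urlopen(f"{BASE}/users/{user_id}.json") as resp:
            return json.load(resp)

    put_user("alice", {"plan": "free"})
    print(get_user("alice"))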


You're right. But like any big change it will take time. Say 5-10 years until most new stuff is serverless.



