The Lost Art of System Administration (matt-rickard.com)
114 points by tapanjk on Sept 10, 2022 | 95 comments



Most of those things listed are just technical details that change over time. Before all the Linux trivia there was the Solaris trivia, AIX trivia, Novell, HP/UX, Windows NT, OS/2, VMS, AS/400. Alpha, Sparc, VAX, tape robots, IPX/SPX, Token ring, VT100 dumb terminals. Old heads have forgotten more technical crap than exists today. (I can barely remember how to set up a Cisco router, and I used to work there.. I do, however, remember to turn off spanning tree)

The technical stuff changes but Systems Administration remains the same practice. You figure out how to put together a complex system, how to maintain it, keep it highly available. Operations is the work of keeping the business afloat through IT, and Systems Administration will be a part of that as long as there are Systems to Administrate. The title will change, because apparently people are squeamish about being lumped into a group with people who aren't all cutting edge. But the art won't be lost. It lives in every woman who stays up til 3am to ensure maintenance goes as planned, and every man who tells the developers, "No, I'm not giving you root on all the servers, tell me what you want to do. You just want to copy a file? You don't need root for that. ...."


> No, I'm not giving you root on all the servers, tell me what you want to do. You just want to copy a file? You don't need root for that.

I'm a developer, so I'm very biased.

My interactions with system administrators were more along the lines of "I can't give you access to this system, you have to [come to my desk / hop on a zoom call] and tell me what to type into the shell." and me "Please type tail dash f, no, not slash, dash, the horizontal line thing" and a whole lot of back and forth like this, after which I thank them politely and they feel very accomplished while I'm left guessing why that entire department is necessary...


Anecdotally I've had the opposite experience. The majority of developers I've interacted with didn't understand how anything on the systems admin side worked. This isn't to say all developers are like that, but that you may be more of an outlier in recent times. Also, your admin team sounds awful to work with.


Admittedly, that experience sucks. But there are a couple of reasons they did that:

1. They could give you root access. But then you might start making other changes, and over time that makes everything unmaintainable. By them just typing in what you want this one time, they limit the scope of your access to just one change, they are aware of what changed, and can push back on it. It's basically a crappy change management process.

2. This person has a lot on their plate (a way larger backlog than the average dev) and doesn't have time to research how to do the thing you want them to do. They are asked to be experts at everything, but they can't be. So they ask you to show them exactly what to do. Gets rid of ambiguity, lets you both troubleshoot any unexpected problems in real time.

3. In a lot of roles where one person/group has "power" over another, they take it for granted and unnecessarily gate-keep, adding red tape and blockers. DevOps is supposed to flip this situation on its head by giving you, the dev, the tools needed to do your job, without you making changes to the system that would affect system stability. This keeps the system stable while giving you agency. Unfortunately, doing all that is much more difficult than most people realize, so it rarely happens.

4. Like every role, sometimes there's idiots. I've met my fair share of both idiot devs and idiot sysadmins.


1. I don't want root on prod servers, sounds like a nightmare!

2. Yes, I'm sure this was the case.

3. Indeed!

4. I haven't! My original comment just took some situation out of context. I know nothing about what else the sysadmins were doing, they probably had other things on their plate about which I knew nothing...


A metaphor that just occurred to me.

When it comes to debugging running systems, nobody has it better than doctors. They knock the "system administrator" out and go right in with root access via scalpel to do what they know needs to be done.

SysAdmins can be a huge help in devops culture, but sometimes things go wrong and the surgeon needs to get in there.


In a properly designed and deployed system, no one (including the sysadmin) should need to run tail/less/grep or whatever. Logs should be in whatever centralized logging system is in use. Metrics should be in whatever centralized metrics system is in use. Config and secrets updates should be possible from the CI/CD/build system, without a full compile + test + deploy pipeline. Needing access to production just to run tail (or anything of that nature) is a sign that the system is not well designed and is being compensated for with "give me root access everywhere". Let's not forget that user privilege separation came in for some serious reasons.
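
To make "logs belong in the central system, not under tail on the box" concrete: one common pattern is to have the application emit structured JSON on stdout and let whatever log shipper the platform already runs (Fluent Bit, Vector, journald forwarding) collect it. A minimal Python sketch, where the formatter and field names are my own illustration rather than anything prescribed:

    import json
    import logging
    import sys


    class JsonFormatter(logging.Formatter):
        """Render each record as one JSON object per line for the log shipper to collect."""

        def format(self, record):
            payload = {
                "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
            if record.exc_info:
                payload["exception"] = self.formatException(record.exc_info)
            return json.dumps(payload)


    handler = logging.StreamHandler(sys.stdout)  # stdout, not a log file on the host
    handler.setFormatter(JsonFormatter())
    logging.basicConfig(level=logging.INFO, handlers=[handler])

    logging.getLogger("orders").info("order accepted")  # "orders" is just an example logger name

From there, the grep-style questions get answered in the central system instead of over SSH.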


You're of course right.

But also, are you shipping all the logs? Of course you're shipping application logs somewhere searchable, but what about GC logs, syslog, kernel logs, etc.? Is it never necessary to run ad hoc commands on your production systems?


How are GC/syslog/kernel logs special and not sent to central logging? Ad-hoc logs (such as thread dumps etc.) can also be triggered via safe application endpoints from restricted locations.
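
As a rough sketch of the "safe application endpoint" idea for ad-hoc diagnostics: a read-only thread-dump endpoint using only the Python stdlib. The path, port, and loopback-only binding are my own choices; in practice you would expose it only via whatever restricted network path already exists (bastion, port-forward, mesh policy).

    import sys
    import threading
    import traceback
    from http.server import BaseHTTPRequestHandler, HTTPServer


    def dump_all_threads() -> str:
        """Return a jstack-style dump of every Python thread's current stack."""
        names = {t.ident: t.name for t in threading.enumerate()}
        lines = []
        for ident, frame in sys._current_frames().items():
            lines.append(f'Thread "{names.get(ident, "?")}" (id={ident})')
            lines.extend(traceback.format_stack(frame))
            lines.append("")
        return "\n".join(lines)


    class DiagHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/internal/threaddump":  # illustrative path
                self.send_error(404)
                return
            body = dump_all_threads().encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)


    if __name__ == "__main__":
        # Bind to loopback only; reach it through the restricted path your
        # environment already enforces rather than exposing it publicly.
        HTTPServer(("127.0.0.1", 9100), DiagHandler).serve_forever()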

> "Is it never necessary to run ad hoc commands on your production systems?"

Ad-hoc commands in production AS ROOT should be the exception, not the norm, and reserved for on-call members, not every developer working on the team.

Even the most competent people, sysadmin or developer, can make a mistake.

I did not say no commands should ever be run on production; there could be non-root read-only accounts. I am opposed to everyone having root on production. I have seen enough data loss incidents or full blown outages because of mistakes.


Well done you, I guess? Nothing to disagree with. I'm just amazed this can be actual reality :)


It takes a while to get there - and a lot of discipline of those involved.


In an ideal world, you wouldn't need that. But most systems never matured to the point of being in that ideal state / world.


As the other child comment said, are you shipping _all_ logs? For a K8s cluster of a few nodes, sure, you might send `/var/log/*` out, but that doesn't scale. If you have weird Kubernetes problems like mysteriously dying containers that don't trip 137 (hello, non-init processes [0]), you will need to dive into kernel messages. For this particular example, there is to my knowledge one poorly-maintained package [1] that fixes this, but it also requires at the very least some Linux capabilities that might raise eyebrows with security, and as written it requires privileged containers. Thankfully this particular example will hopefully be fixed with cgroupsv2, but still.

SaaS DB like RDS is another example. I've seen mysterious growth on a read replica, which was fixed by restarting it. Temporary intrinsic tables were not the cause, if you were wondering. No errors were noted in the error log it shipped out, and of course the general log wasn't enabled because it generally gets absurdly large. Had this been a self-hosted installation, there may well have been clues that would have been available only by logging into the host.

"Cattle, not pets" is an admirable goal, but sometimes you have to play veterinarian on a few of your herd to figure out what illness is spreading, and how to inoculate against it.

[0] https://github.com/kubernetes/kubernetes/issues/50632

[1] https://github.com/transferwise/oomie
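
For the kernel-message digging mentioned a few comments up (containers dying without a clean exit code), the usual first stop is the node's kernel ring buffer. A rough Python sketch; the OOM-killer message wording varies across kernel versions, and reading dmesg may require elevated privileges depending on kernel.dmesg_restrict, so treat the regex as illustrative only.

    import re
    import subprocess

    # The kernel logs lines along the lines of
    #   "Out of memory: Killed process 1234 (gunicorn) ..."
    #   "Memory cgroup out of memory: Killed process 5678 (java) ..."
    # when the OOM killer fires; exact wording varies across kernel versions.
    OOM_LINE = re.compile(r"[Kk]illed process (\d+) \(([^)]+)\)")


    def recent_oom_kills():
        """Yield (pid, comm, raw line) for OOM kills visible in the kernel ring buffer."""
        out = subprocess.run(["dmesg", "-T"], capture_output=True, text=True, check=True)
        for line in out.stdout.splitlines():
            m = OOM_LINE.search(line)
            if m:
                yield int(m.group(1)), m.group(2), line.strip()


    if __name__ == "__main__":
        for pid, comm, line in recent_oom_kills():
            print(f"OOM kill: pid={pid} comm={comm}\n  {line}")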


Seems "not well deployed system" or "system not well understood by developer", what stops one reducing retention to few hours for node kernel logs or having read only access to certain parts, if at all there are so many kernel/ sys logs. I have never come across a system where application logs were not multiple orders of magnitude more in volume than syslogs. Not sure what RDS vs in-house DB example was meant to convey.

Ironically, these examples are only proving the point made in first comment.


I like the list of OSes you give, but System Administration is not the configuration of a single system but of thousands, with a mind towards performance and allocation of resources; it also includes stressing systems to see what will break. What is being administered is actually people, not machines, and perhaps ironically, relationships become of fundamental importance. Generally, sysadmins are not software developers; they are technology generalists and experts at any operating system and every application, even those they have never used.


I am a developer. Give me a VM I have root access to and walk away. That’s all I need from you.


Right. And then we’ll see how secure your app is running after you follow a bunch of random blog posts telling you to make all files world-writable, or to disable SELinux, or just turn off the firewall. And then the inevitable “it works for me” when we try to push to production and it all breaks.


Just assume for a second that you are talking to an experienced competent developer here. I have never done any of the things you mention. And I have never worked with a developer who did any of the things you mention.

I have however worked with a narcissistic, power-hungry, incompetent sys admin in the past. That doesn’t mean though that I automatically assume that all sys admins are the same. The one I am working with right now is highly competent and a pleasure to work with. And he is smart enough to simply give VMs with root access to developers who ask for them.


This seems a bit generalized. Yes, infrastructure as code/etc. are becoming more prevalent, but underneath there are still systems running.

While a lot of jobs can abstract away these things, they are still there, and very real. If more people had an understanding of the underlying operating systems and file systems we could mitigate a lot of vulnerabilities, or find performance issues that may otherwise be obstructed, or a myriad of other things.

Hacker News is in a bit of a bubble in that the answer to everything seems to be "kubernetes" or "$newHotSolution". I think part of it is that a lot of developers haven't actually worked with machines. At one point in time there was a post here about how hard it is to set up a LAMP stack, a task you could hand a first year sysadmin and they should be able to figure it out. Abstraction and automation are nice, but the underlying concepts are still important.


Comparing Kubernetes to a sysadmin manually provisioning a LAMP stack is like comparing a home kitchen to a commercial factory. They can both make a pizza, but one can make an order of magnitude more, at the same time.

They’re solving two different problems.


They are, but a lot of people think they need a commercial kitchen to make a bagel nowadays, because they've only worked in commercial kitchens.


We’re in agreement there. Right tool for the job.


Developers deploying static website in 2022 using K8S be like: https://xkcd.com/1319/


I'm a dude who uses Kubernetes to make pizzas. Kubernetes is absolutely a commercial-bakery-class machine, but much of its adoption is due to the fact that for just a bit more in price and effort, you can have that class of machine in your home and run real things on it.

Seriously: I run clusters from a few dozen nodes (down from a few hundred at my peak, sigh) down to a trio of Raspberry Pis in my living room. They're overkill by a little bit, but not by much. And it's definitely my ambition to make the tooling even easier and even more powerful, such that every small home can run something with enterprise level reliability.


I'd say Kubernetes was more like having a lathe in your shed. Almost no-one needs one but it sure does make some projects a lot easier.


I just got one and not sure I need it. The lathe always seemed so dangerous to me.


Off topic, but having a lathe in my shed is a definite life goal.


Wrong comparison, sorry. K8s is an enterprise thing, while LAMP is good for SOHO. So a cookie factory vs a small bakery.


This is a poor analogy overall, but I think it would be better to think of it thusly...

K8 is where you don't own the kitchen, and you lease it when you wish to cook. You aren't aware of how to maintain the kitchen, or buy the ingredients you choose to cook with. In fact, you aren't even able to tell if a mango is ripe or not when shopping, because you don't shop and don't know how to.

That's what K8s are.

Meanwhile, SysAdmins know how to maintain, manage, run the kitchen... as well as cooking the meal. SysAdmin knowledge scope is greater than K8 knowledge scope for this reason.

What AWS, what docker, what K8s have done, is outsource specific realms of knowledge and skills, so people don't have to "deal with that". But if one is outsourcing knowledge and skills, one cannot claim that this makes the work more sophisticated.


A couple of examples I've found:

Celery spawns n workers, defaulting to the number of logical processors available. As anyone familiar with cgroups can tell you, this is fraught with problems when containerized, since nearly all mechanisms to detect processor count (or available memory) lead back to `/proc`, which will dutifully report the host's information out to the container. This leads to questions like, "I requested 4 vCPUs; why do I have 4+n threads?"
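
A common workaround (a sketch, not anything Celery ships) is to compute the effective CPU count from the cgroup quota yourself and pass it to the worker explicitly. The file paths below are the standard cgroup v2/v1 locations; the fallback to os.cpu_count() and the names are my own choices.

    import math
    import os
    from pathlib import Path


    def container_cpu_count() -> int:
        """Effective CPU count from the cgroup CPU quota, falling back to os.cpu_count()."""
        # cgroup v2: "cpu.max" holds "<quota> <period>", or "max <period>" when unlimited
        try:
            quota, period = Path("/sys/fs/cgroup/cpu.max").read_text().split()
            if quota != "max":
                return max(1, math.floor(int(quota) / int(period)))
        except OSError:
            pass
        # cgroup v1: separate quota/period files; quota is -1 when unlimited
        try:
            quota = int(Path("/sys/fs/cgroup/cpu/cpu.cfs_quota_us").read_text())
            period = int(Path("/sys/fs/cgroup/cpu/cpu.cfs_period_us").read_text())
            if quota > 0:
                return max(1, math.floor(quota / period))
        except OSError:
            pass
        return os.cpu_count() or 1


    # Then start the workers with an explicit value instead of the default, e.g.
    # (the module name "cpucount" and app name "myapp" are hypothetical):
    #   celery -A myapp worker --concurrency="$(python -c 'import cpucount; print(cpucount.container_cpu_count())')"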

ORM in general. The worst example I've seen was a query against a massive table that used OFFSET + LIMIT to get n results from the end. It was also deliberately not using the PKEY, leading to a full table scan every time the query ran. If you aren't familiar with DBs, it may seem perfectly reasonable that querying a ~100 million row DB would take a long time, when in fact it could and should be extremely fast with a properly written query.
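
To illustrate the OFFSET + LIMIT problem and the usual fix, here's a small self-contained sketch using the stdlib sqlite3 driver; the table and column names are made up, but the SQL shape is the same on any large relational table: order on an indexed key and seek past the last row you saw (keyset pagination) instead of asking the engine to count off and throw away tens of millions of rows.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO events (payload) VALUES (?)",
                     [("row %d" % i,) for i in range(100_000)])

    PAGE = 100

    # Offset pagination: the engine still walks (and discards) every skipped row,
    # so a page near the end of a huge table degenerates toward a full scan.
    slow = conn.execute(
        "SELECT id, payload FROM events ORDER BY id LIMIT ? OFFSET ?",
        (PAGE, 99_800),
    ).fetchall()

    # Keyset pagination: remember the last id you saw and seek straight to it via
    # the primary-key index; cost stays roughly constant however deep you page.
    last_seen_id = 99_800
    fast = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, PAGE),
    ).fetchall()

In the case described above, the other obvious fix is simply ordering descending on the PKEY, since the rows wanted were at the end of the table anyway.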


You do realize some of the internet's biggest sites run on the LAMP stack, right?


With Apache httpd inside? They don’t have devops at all?


Not really. It's a spectrum.

The only real difference is the sense of smugness when it works. In the old days, deploying LAMP came with a sense of achievement. Save for patching, there wasn't much more work to do.

Kubernetes is basically the same level of effort, but the upkeep is a bit more.

Also the networking is batshit, and so is the aversion to swap.


Doesn't seem to me that there's a bubble here where the answer is always Kubernetes, because every time that topic pops up there are a lot of posts like yours.


Another part of the problem is that developers are often discouraged or outright not allowed to work with machines, due to "it's not your job" kind of arguments, or corporate security enforced by auditors.


You don't need corporate security or auditors; even in a small web-shop, administering production servers is a power reserved for wizards. Sure, you can administer your own developer workstation; perhaps you get a Linux VM to yourself, that you can tinker with.

It would be nuts to let every developer tinker with the production server, or the source-code repo, or the fileserver. The private VM gives them a playground where they can learn to play at sysadmin, if they want; real sysadmin, I would contend, is taking the burden of responsibility when shit happens. Nobody cares much about wizards, until shit happens; the wizard then becomes an essential scapegoat. The newsgroup alt.sysadmin.recovery wasn't created for nothing.


> It would be nuts to let every developer tinker with the production server, or the source-code repo, or the fileserver.

I think that it might be a good idea to have most of the configuration for servers be based on Git repos, with something like Ansible or another such solution (Chef, Puppet, Salt), so that server configuration wouldn't be treated that differently from versioned code (which also serves as a historical record).

Don't give developers access to just push changes for production servers, but absolutely let them create merge/pull requests with changes to be applied once proper review is done: ideally with a description of what these changes accomplish, references to other merge/pull requests for development/test/staging environments where they were tested beforehand (perhaps after prior tests against local VMs) and a link back to whatever issue management system is used.

Then, have an escape hatch that can be used by whoever is actually responsible for keeping the servers up and running, in cases an update goes bad or other manual changes are absolutely necessary, but other than that generally disallow anyone to access the server directly for whatever reason (or at least remove any kinds of write permissions).

Personally, I'd also argue that containers should be used to have a clear separation between infrastructure and business applications, but that's mostly the approach to use when dealing with web development or similar domains. Regardless, I find that it's one of the more sane approaches nowadays, the Ops people can primarily worry about the servers, updates, security, whereas the developers can worry about the applications and updates/security for those.


I grew up with Spectrums and MS-DOS with a turbo and a reset button.

I feel sad for my kids who will grow up with iPads and Windows 14 PCs with not the slightest clue on how things are running and the OS shielding them from crashing the thing or running viruses for fun.

Same thing for the young lads who come out of university today with their degrees. They're well spoken and can recite algo theory like nothing, but still have no clue what's underneath and why it works.

Have we reached that level of the civilisation that goes extinct, for forgetting how its own inner machines work?


> Have we reached that level of the civilisation that goes extinct, for forgetting how its own inner machines work?

Probably. Almost nobody under the age of 40 works in the "deep weeds" anymore (firmware, kernel, casually "Ring 0 and below" spaces). A decade ago, it was about 30. I'm the last wave, it seems, of people in these spaces, and that's quite terrifying.

But I get it. The hardware is maddeningly complex, rapidly getting more complex to solve the problems created by the last complexity, and the whole stack is cracking and crumbling around the edges (the uarch vulns and Intel pulling SGX from consumer chips are a reflection of this). Another decade, this entire low level group will have retired to something far away from computers, and... we'll see what happens. Nothing good, I fear.


> Almost nobody under the age of 40 works in the "deep weeds" anymore (firmware, kernel, casually "Ring 0 and below" spaces).

On PCs, maybe, but "kids these days" use microcontrollers like I used meccano parts.


In my experience, your point is mostly true in academia and business settings. The "foundation" layer of CS only brings a negligible number of jobs.

But with the global availability and affordability of computing devices, plus the global knowledge base of the web, curious geeks have never had so many opportunities to learn and practice.

Here is just a link which I hope will bring some joy to your day: https://steamcommunity.com/sharedfiles/filedetails/?id=28422...

Other sources of hope are retro computing communities, emulation devs, FPGA scene (Mister) for hardware side, assembly devs / optimization communities for software side.

Hardware changes slowly make current system design choices obsolete. Persistent RAM and TBs of RAM will require low-level OS redesign sooner than expected.


> Almost nobody under the age of 40 works in the "deep weeds" anymore (firmware, kernel, casually "Ring 0 and below" spaces)

A lot of the best people I know in that field (low level stuff) are younger than me.


I often wonder the same. I grew up at a time, where I had to learn assembly on the C64, to get my machine to handle > 300 baud with uploads/downloads, on a BBS I wrote.

On the hardware side, it was very easy to learn how to fix said machines, with schematics provided in the back of the user manual. This level of understanding is not as easily possible today, and my understanding grew outward from that.

I think two things, the need to understand low level to work with hardware competently in the past, and that it was fun to work with hardware in the past, have led to a theoretical golden age of people who understood it all.

I have deep understanding of baremetal, right through to frontend. Part of that comes from working with computing as it changed and morphed.

People today are graduating, entering the field, but are starting at a higher level.

One other thing to bear in mind. When I entered this field, it wasn't to make money. Salaries were abysmal, and on top of that, the field was seen as "boring" and "stodgy" to a great many. And there were often even poor employment prospects!

This meant that people who worked in computing prior to 2000 or so, did so because of joy. They didn't think "Oh, I can make a good living here, I choose this job". In fact, people would tell you not to work in computing, who on Earth would hire you!

So people from this era often were in this field, because they were drawn to it not for income, or job security, but instead because it was more important to work with the fun and joy of computing.

While there is absolutely nothing wrong with entering a field to make a living, or for job security, the motivations are different.

An example here: using a computer is different than building computing. Enjoying using a web browser, enjoying interacting online, that is not "enjoying computing". Yet most of the people I speak of were building circuit boards and coding in BASIC as kids.

How many CS graduates were coding apps and building webpages, for 100% fun, without any encouragement, when they were 5?

So I think the same capability is there, it is just that most people today are not interested in computing. They just happen to make a living there, and so those enjoying computing as a primary are awash in a sea of those who couldn't really care less.


Actually this was much of the reason for the creation of the Raspberry Pi.

Its creator, Eben Upton, was a director of studies at Cambridge University and realised that there were fewer CS applicants than in the past, and that many of those who did apply knew far less about computers and programming through their own "tinkering" than previously.

He saw that this was due to modern computers being far more opaque than the systems he had used as a kid (like the BBC Micro). So he decided to build something to address that issue: open enough to understand the workings of, and cheap enough that people wouldn't be discouraged from tinkering by the risk of breaking it.

It seems to have worked quite well...


As someone said, kids these days build computers with Arduinos and STM32s.


Do we work in the same field? I'm 32 and am the oldest person on my team of 10. We work on drivers and other low level software. Adjacent teams have similar mixes. At previous roles I saw similarly young demographics in my coworkers. I might agree with the premise that the overall number of engineers down in the weeds has increased at a snail's pace compared to the rest of the industry, which may give the impression folks are older by comparison.


The iPad is an appliance.

If you’re looking for buttons, something to hack on and break, try: https://www.raspberrypi.com/products/raspberry-pi-4-model-b/

Personally I think it’s never been a better time to be a hacker. $5 PCBs, free and good CAD software, and one can even make their own ASICs now. I wish I was a kid today!


HG Wells appears to have been one of the more prescient science fiction authors. In The Time Machine the Morlocks maintained the underground machines that fed and clothed the beautiful Eloi who lived above ground (and periodically they would come up to the surface to eat the Eloi).

Today the iPhone is one of the biggest status symbols the beautiful people covet, the most important thing in their lives (their Instagram account) depends on it. Yet none of them have a clue how it works. Like the Morlocks, hackers kind of understand how it works and can maintain it to a degree but can't really build a new one. We may be only one supply chain collapse and food shortage away from HG Wells' fiction becoming reality... :)


> Have we reached that level of the civilisation that goes extinct, for forgetting how its own inner machines work?

This thought was expressed in Isaac Asimov's short story "The Last Question" [PDF]: https://physics.princeton.edu/ph115/LQ.pdf

> Alexander Adell and Bertram Lupov were two of the faithful attendants of Multivac. As well as any human beings could, they knew what lay behind the cold, clicking, flashing face -- miles and miles of face -- of that giant computer. They had at least a vague notion of the general plan of relays and circuits that had long since grown past the point where any single human could possibly have a firm grasp of the whole. Multivac was self-adjusting and self-correcting. It had to be, for nothing human could adjust and correct it quickly enough or even adequately enough. So Adell and Lupov attended the monstrous giant only lightly and superficially, yet as well as any men could. They fed it data, adjusted questions to its needs and translated the answers that were issued. Certainly they, and all others like them, were fully entitled to share in the glory that was Multivac's

I wonder whether they'll have to deal with CrashLoopBackoff errors in 2061.


I grew up with Unix 1 and a soldering iron.

I feel sad for my kids who will grow up with ms-dos and high-level languages and not have a clue how things really work.

Whither civilization?


Kids these days and their electricity. They'll never learn to be a good barrel cooper.


"Have we reached that level of the civilisation that goes extinct, for forgetting how its own inner machines work?"

Extending this logic: most people in the West can't grow food, connect plumbing, or fix a broken chair. Only 10% of people will find the correct anchor for drywall or fix a flat on a bicycle.


I am 100% sure we can’t reboot civilisation. All the easily accessible resources needed to bootstrap were depleted years ago.


I consider 3 books essential reading for system administration. "UNIX and Linux System Administration Handbook" by Nemeth & Snyder, "Essential System Administration" by Aeleen Frisch and "TCP/IP Guide" by Kozierok. Yes, they're quite old but the expertise of these authors can't really be dated. Craig Hunt's "TCP/IP Network Administration" (2002) is also still worth reading. As The Cloud becomes the only system new devs come into contact with genuine system administrators are likely to decline in numbers but may become that much more valuable as specialists.


Isn't modern system administration very much still a thing, just with a different name: devops? I worked with a really bright (sysadmin/devops) guy that built out a FOSS configuration management service using saltstack and zabbix; a near real-time visualized platform of critical assets that also afforded a great deal of control, all from a distance.

I do very much agree that use of command line is declining, though; in general, technology - both its physical form and digital interfaces - has become more closed to the user. Really hope that changes.


There are other titles as well, like SRE or cloud admin/architect.


sysadmin used to be an actual title of a specific person responsible for managing the company's systems.

These days it's devops, but now there are no specific devops people and it's yet another task unrelated to engineering regularly thrown onto engineering's plate.


DevOps = Development & Operations... It used to be about one person/team practicing both. However, specialization is a thing, so there are "DevOps Engineers" who focus solely on getting container management and CI/CD pipelines set up, and not on writing actual application code. I don't see how it's unrelated to engineering, since it's all software that needs to be written/configured/understood.


I recall a push of "sysadmins are just glorified janitors, we don't need them" around 2005? or thereabouts. Since then it seems there's been a lot of work put into dealing with broken, chaotic systems and how to use them anyway.


Reminds me of the IBM story.

They had an IT department. Things were smooth. Nothing ever went bad.

So they sacked 50% of them, thinking they did nothing but hang around... and everything went to hell.

Someone can probably point to or correlate that story better.

But essentially the human nature of business is: if something runs smoothly, it needs fixing.


This seems like a very interesting story if true.

Anyone have a link and/or know more?


Somebody needs to choose the version of k8s, the underlying Linux kernel, the system options for that kernel, etc. It is still THE THING if you are trying to earn some money and don't have VC money to burn before Google buys you.

For example, in the place where I work (as a developer), operations found a good combination of kernel, driver version, NIC chip revision, and sysctls which allows us to use 20% less hardware for our typical load (or to serve 20% more clients on the existing hardware without buying more), and that is substantial for us.

Yes, as a developer I don't need to be a system administrator; for me it is containers and CI and automatic deployment, infra as code, and all this stuff.

But somebody NEEDS to build & support this infra. And replace failed SSDs.


Ironically, the one computer where I still have to maintain multi process discipline, with a boot and init system, hardware management, multiple users, and low level filesystem stuff is… my desktop PC!

We used to think “the year of the Linux desktop” would be the extension of the Linux ecosystem to include both servers and desktops. Now that Linux personal computers are finally mainstream — at least for people like us — will it instead turn out to be an exodus from the former to the latter?


> Learning how to effectively use the command line. Unix philosophy. Pipes, scripting, and tooling.

This is a complaint I used to hear about young software developers even a few decades ago. It's the same today, but those who were accused then are now making the same complaint about the current crop of developers. I guess it is a skill that people pick up late and slowly, but they eventually do pick it up nonetheless.


You mean the skill of complaining?


Definitely not. If you haven’t picked up the skill of complaining by your second programming task (be it shell or anything else), your humanity is in question.

Honing that skill can be slow.


I started as the System Administrator for a small service company in 1997, at that time it was 40 hours of work per week just keeping things running, and users happy. Over the years, things got more and more reliable, until eventually I just would show up, wait for things to break, then go home. At that point, they outsourced IT, a wise decision.

Everything is much more stable than it used to be. It might seem not to be the case, but I assure you, it is. Once something works, it tends to keep working.


It's not lost, it just evolved.

I've been a Linux sysadmin from the late 90s through to the mid/late 2010s, now I'm doing DevOps. My sysadmin-spidey-sense has helped me to solve so many issues, and my foundation of knowledge in systems/networking/etc helps me design better code and infrastructure.


Eh, maybe in some spaces. As a system software developer (computers embedded in the product), I spent today inspecting the environment variables and command line args of running processes.


This is pretty much why we call Arch Linux the "Glorious Arch". It's not only a daily machine or a server. It's Art!


It would be nice if there was a "more stable" version rather than "newest all the time no matter what".

I switched from Arch to Ubuntu after they shipped a broken avr-gcc, despite upstream GCC saying "roll it all back, guys", simply because it was the newest version. Doesn't matter that it produced deeply faulty output; newest is best!


"The network is down, why are we paying the sys admins?"

"The network is running perfectly, why are we paying the sys admins?"


Working with diverse and complex infrastructures, the system that you administer becomes less the individual machine's operating system and more the whole interconnected company-wide system. You still have to go down to one particular server and tune, configure, orchestrate or whatever, but the big vision should be present.


> Init systems and daemons have been replaced by single-process containers. General-purpose operating systems have been replaced with small (e.g., alpine) or even smaller ones (e.g., microkernels).

Valves have been replaced by transistors, transistors have been replaced by integrated circuits.


Isn't it knowing Kubernetes now? Which does kinda tend to require you to also know a fair bit about "single machine" admin tools, along with a whole slew of new complexity.


I have built a lot of stuff on kubernetes. Outside of certain edge cases I've had to use almost none of the actual system administration knowledge I have. The vast majority of things are relatively simple, or can be handled via a nice UI provided by <cloud_provider_here>.

In fact, I'd argue that kubernetes was really made to remove sysadmin from being needed. For large kubes obviously some skill is needed but in the average case a developer can do 99.9% of all the work needed. It was a huge boon for businesses that would otherwise have to pay both developers and sysadmins. Now they can pay one of them half the value!

The majority of other sysadmin tasks have been offloaded to cloud providers. For example, AWS makes setting up a network topology/compute/etc very easy. Sometimes these tasks are handled by SRE, a lot of times they are handled again by developers. Terraform really brought everything together in an absolutely brainless mechanism to get things started (with the subsequent pitfalls).


As someone who gravitated to the sysadmin side (it was easier to get work in than AI, especially if you grew up banging components together to get a working desktop since the 1990s)...

k8s is a great lever for sysadmins. It's the kind of tool we spoke of in fevered tones when we shared legends of "automating our job so we would only need to pick up the paycheck". Does it mean there's less need for absolute Ops-side numbers? Sure. Did we have the numbers before? hahaha, fat chance, nope. So in my experience, moving onto k8s often means my job involves less understaffing now, less panic, and less cursing. Most importantly, fewer reasons to visit the Scary Devil Monastery (alt.sysadmin.recovery).

And if the low amount of work starts to get depressing... you can always take another job[1] or start engineering the shit out of the infrastructure. Bonus points if you manage it without making it a problem for future you (so documentation and the like) and without raising costs.

[1] A Polish joke about sysadmins: "Hey, Joe lost a job, you heard?" "How can he deal with having only 5 full time jobs?"


> I'd argue that kubernetes was really made to remove sysadmin from being needed. For large kubes obviously some skill is needed but in the average case a developer can do 99.9% of all the work needed. It was a huge boon for businesses that would otherwise have to pay both developers and sysadmins. Now they can pay one of them half the value!

> The majority of other sysadmin tasks have been offloaded to cloud providers

Yes. There is a word for this process: deskilling — https://en.wikipedia.org/wiki/Deskilling

It is typically followed or accompanied by a loss of autonomy and/or other privileges in the workplace.


> a developer can do 99.9% of all the work needed

I've yet to meet a dev who knows how or wants to create and maintain the IaC to manage the infra.

> Terraform really brought everything together in an absolutely brainless mechanism

Understanding Terraform's limitations and being able to work around them in a way that is both readable and scalable is most definitely not brainless. Sure, I can write some .tf files to spin up an EKS cluster pretty easily. Load balancer? I can also do that in Terraform, or I can let the cluster manage its own. Uh-oh, a fork. Which one is better? What if I want to have multiple AWS accounts; how do I properly delegate IAM permissions across them? Etc.


In my experience once terraform moves beyond "brainless script" it becomes a massive footgun. Completely unexpected, unintuitive behavior can happen that destroys an entire cluster. You're often better at that point legitimately paying for an SRE team to manage things but "move fast and break things" tends to prefer detonating a cluster over paying professionals.


> Now they can pay one of them half the value!

They’re paying the other half of the value to the cloud provider, more or less. Ideally, perhaps a bit less. But that’s how business works. The cloud provider is operating these systems much better than 99+% of companies could hope to do.


All those sysadmin skills are needed to create the "serverless" runtime platform to run everything on.


Like everything that can be run on a computer, it is eventually abstracted. Once it is abstracted it can become commodified. Once it is commodified it is effectively irrelevant how it is run on the computer!


Absolutely! I tend to be the one deploying the physical architecture so it still feels very real haha.


Storage management is a real point here. When everything is a blob in a cloud, who really cares about the underlying details?


Storage and networking, which IMO were the hardest parts of traditional sysadmining anyway


The owners of the cloud (and, by extension, of "your" data)?

And you should too, BTW. Or one day you learn that the cloud you chose doesn't have BOTH backups and redundancy, for example. And you learn it the hard way.

Or maybe you'll learn that you need 10x more performance now (more throughput, less latency), and this cloud cannot provide it at any cost.


You're absolutely correct.

I was just moving an EBS volume between regions and doing the partitioning and thinking "wow, you just don't do this much anymore".


The fact that Linux systems need administration is one of the reasons why they suck.

Imagine after buying a macbook that you also need to hire a sysadmin to keep it running smoothly? The idea is laughable.

> Learning how to effectively use the command line. Unix philosophy. Pipes, scripting, and tooling. So much of programming is stitching things together.

This is really bad. Much of the complexity in contemporary software architecture arises from this mentality.

"Do things by stitching programs together" only works for relatively simple tasks. For everything else you need to program the thing properly.

One of the worst ways this manifests is in how websites are built: use a combination of a scripting language, a database server, an http server, a memory cache server, etc.

I'm really grateful we have things like Go and SQLite that make it possible to create fully functional websites as self-contained programs instead of the hot mess that everything else uses.


Windows needs sysadmins too, and Mac does as well. There are group policies and all sorts of other tools out there for managing Macs. Apple even has documentation on this.

Someone who is running a newer Linux desktop distro is not going to need sysadmin experience to web browse and read email, the same way someone using Windows or a Mac as a desktop isn't going to need sysadmin experience, either.

As for your websites, no one is going to build a simple website that needs a dedicated database server, a "memory cache server", or whatever. Those issues become relevant with scale.


>The fact that linux systems need adminstration is one of the reasons why they suck.

Do you buy and string together several MacBooks to run your business's custom built ERP system 24/7/365?

There isn't a vast ecosystem of open source solutions to enterprise problems because that's not what macOS is, or attempts to be.

I can't tell if this is a serious response or not, this comparison seems pretty ridiculous.


I realize you're being serious-but-funny, but I actually had to have that conversation with a development lead at a previous employer.


> "Do things by stitching programs together" only works for relatively simple tasks. For everything else you need to program the thing properly.

What do you think functions are?

> I'm really grateful we have things like Go and SQLite that make it possible to create fully functional websites as self-contained programs instead of the hot mess that everything else uses.

Yes, because as someone who will be called to troubleshoot it, I definitely want to trudge through whatever logging you may or may not have implemented in this monolith to solve a cache miss or 5xx error. /s


Just out of interest, I wonder what the largest and "most important" thing you've worked on is?


Good luck if you ever get security audited.



