Don't tug on that, you never know what it might be attached to (plover.com)
288 points by JoshTriplett on July 1, 2016 | 96 comments

As a sysadmin on a Windows network of ~100 computers, this story makes me want to cry, although maybe for the wrong reason:

I see weird problems of the sort "It did work before I went on my lunch break" on a fairly regular basis. How often would I like to go down the rabbit hole and explore these problems in such depth, but if I did that, I would hardly get any work done. The frequency at which our users run into these problems is just too high.

It is so frustrating to restart a computer or maybe re-install it and see the problem disappear, because now all hope of understanding what caused the problem in the first place is gone. And problems that disappear for no good reason have a tendency to return for no good reason, most often on a Friday afternoon, just when you're about to call it a day. ;-/

This is what I hate about closed source software in general. Every sysadmin out there has complaints about how terrible support is from XYZ vendor, but it's not just that the vendor can't provide support, it's that by going closed source, the vendor is the only one who can really track down these sort of bugs. With an open source stack, no matter what breaks in any software component, I'm not left high and dry hoping that some vendor who charges buckets of cash per incident is competent. As a sysadmin, I'm putting my neck on the line anytime something breaks. I need to be able to fix things when the vendor fails.

As a sysadmin, I'm putting my neck on the line anytime something breaks.

In many years of doing this type of work, I've noticed this fear of your neck being on the line is a myth. Heads rarely roll merely because something goes wrong.

I've also noticed almost every sysadmin believes that myth, and it can make us very difficult to relate to or even work with. Ironically, that's a problem that may actually cause you to lose your job.

Back when I used to lead the engineering group for a company, I was asked to also lead the IT and Ops teams. I was amazed at how many of the sysadmins lived in fear that anything going wrong, or taking a risk on something new that did not work out as expected, would get them fired. Virtually no one wanted to take any risks because of this fear. It took a long time of working with the team and demonstrating that people did not get fired when things went wrong. It was a real eye opener for me.

Issues like the one in the article are orthogonal to the openness of software. Quite often, the problem lies in a set of settings that each have perfectly legitimate reason to be what they are that together interact in a bad way, even though every component is working as it should.

This is similar to composability problems in cryptography. Naively combining primitives that individually hold certain security guarantees will quite often undo all of the security, instead of "stacking" those individual guarantees.

Usually you can trace the insecurity to mismatched assumptions, unhandled edge cases or a failure to consider global contexts (like where you accidentally turn one function into a decryption oracle for another).

This is why every web developer needs to learn to read C and C++, and how to navigate a codebase written in either of those, even if they never write any programs in them. Most of the software we use is built with those two languages. If you can't read them, you can't track down what's happening behind the curtain.

I think that's a little optimistic and to be honest, unrealistic.

A problem shared is a problem halved. In this case every web developer needs to have a network of people to collaborate with in order to diagnose the problem.

A good vendor will work with you to resolve the problem, not all closed source software is bad.

I posit that all software has bugs. If this is true, closed source vendor software allows you no insight into the inner workings of the code; you are always at the mercy of the vendor.

Open-Source software has bugs, but it gives you the ability to look into the issues directly; allowing you to figure it out if you want.

The cool thing in open-source is you can always absolve yourself of responsibility and hire a vendor or maintainer or others to help fix the problem if there is one. Which means you have N potential solutions to a problem, compared to just 1 with a closed source vendor.

I'm familiar with that theory, but from my personal experience I'd put the number of good-vendor software packages around 1%. And in retrospect, I'm not sure those cases were so much "good vendor" as "vendor over whom we had significant financial leverage".

From my personal experience I've only gotten consistently good support from Cloudera and Jetbrains, out of some 20-odd companies that I've contacted for paid support. So, for me the figure is more like 10% instead of 1%.

I am in a similar boat.

With DSC, I am increasingly excited by the idea that end-user computers can also shift to immutable infrastructure, or something approaching what we call immutable, an approach that carries real weight in the communities represented here.

The problem? Culture. So many people do not understand when I say the following things:

- Do not install with the GUI, please use the deployment system to document unattended installation.

- You should not have to log into a computer and manually install and customize things, especially if you forget to document.

- You should not be making changes, whether an install or a configuration tweak, without following this process AND updating the team in your notes, so that when I ask you about it, you in fact remember.

These practices produce fewer fires, as I see it, and are why I too have to re-image everything (although that is largely out of concern for malware, especially given the majority of our userbase).

Honestly, this is what got me most excited about NixOS. I mean, the underlying technology is spiffy, of course, but the real key is that it lets you configure everything declaratively while keeping track of different versions and actively pushes you in that direction. Doing things the "right" way is also the path of least resistance.

Sometimes this is a bit annoying when you just want something to work but can't just hack it because the system files are hidden all over the place and papered over with symlinks, but in the long run it means your system configuration is way easier to maintain.

I am very excited for NixOS, for me. In the professional IT sphere, outside of DevOps, this is so far away it is depressing. I spend my weekends with QubesOS, NixOS, and maybe in the future GuixSD and SubgraphOS as my therapy.

I have started to look into a career transition into DevOps (please do not laugh, feel free to downvote), not because of how cool it looks or the growing culture around it, but because I need a break from the mainstream: throwing everything and anything at the wall until it sticks, without rigor; or being told "your time is cheap, automation is a very, very low priority, since we pay you to do the tasks we dislike as more serious engineers" (engineers who also build some systems with checklists).

I read about DSC a while ago, and it sounds very appealing.

So far, though, I have not managed to actually get to know it personally, so to speak.

We have, over the past two years, tried to move as much of the configuration as possible to GPOs, although they bring their own share of problems. On Unix-like systems, one can use log files to track down problems most of the time; on Windows, it seems like logging is kind of an afterthought. I especially hate it when gpresult says it applied a certain setting when, upon inspecting the system, the setting clearly has not been applied (or was overridden by something else? Who knows?).

It is frustrating because GPOs seem like such a great idea in theory.

I have just read about it and the WMF (Windows Management Framework) docs, whitepapers, and framework. I have not played much.

GPO, and GPP (the preferences), are a nightmare.

- The gpresult utility and rsop.msc tell you what changed, but that does not mean much, because

- GPO is async and sometimes requires a shutdown, not just a reboot (from 5+ years of experience), so good luck dropping this crap on a dime; God help you

- The Registry.pol files are not easily auditable or usable outside gpresult and rsop.msc

- If you hit slow links, processing will be disabled; the link is not slow all the time, but when 2% of the computers boot at 6:54 AM on a Tuesday, the policy does not get applied

- A whole bunch of other stuff I forget in this rant

But I totally agree with you: such a pain. I have not had a lot of time for Salt, Puppet, and Chef. Whether in parallel with or thanks to DSC (I saw some PowerShell in one repo ... Ansible?), those tools are also a reality.

I use an expensive SCCM alternative, but I am seriously considering proposing a move to one of the Chef/Puppet/Ansible/Salt stacks, with SSH becoming a reality on Windows.

All I can say is thank God.

One of the biggest cultural changes I had when I jumped the fence from Windows to Linux was the reboot. On Windows, the culture was strongly "reboot first; if it still happens, then it's a problem". On Linux, it was a measure of last resort, because then you can't fix the problem (as you say).

Obviously the latter is the best way, but it's interesting that the culture of the two systems is so different, no doubt borne from Windows' legacy era.

Well, the other difference is that reboot isn't often needed to solve the issue, unless you're messing around with kernel modules.

This may well be changing though, as more and more "desktop-isms" claw their way deeper into the stack.

There are windows sysadmins who schedule nightly server reboots "to keep the machines healthy". I've seen them in action. It is scary. The scariest part? It helps.

Except you have to triage. I'll happily reboot any box which fails unusually because my overall system should be HA enough to survive that. It's only interesting if the failure count is high, and I'm running reasonably up to date code. Otherwise there's too many things in a day to get through.

I'm with Vacri; I'm very reluctant to do that. Small problems can be harbingers of big ones. Even when they aren't, small problems often confound the ability to solve big ones.

Most concerning for me, though, is what safety experts call "normalization of deviance". It's the process by which people become accustomed to small failures, which creates opportunities for big failures to happen. A big example is the Challenger disaster. [1]

I see shops with low bug rates, where people think a lot about quality. And I see shops that, thanks to high bug rates, are too busy fighting fires to ever spend much time on quality. I never see any place in between. And I think normalization of deviance is why.

[1] http://mikemullane.com/stopping-normalization-of-deviance/

This is where a good incident/problem management tracking system comes in handy. Sure, you can't chase down all the oddness happening on Windows, but there is nothing stopping you from having the incidents logged. What you do is clone the incident to an open problem record (regardless of whether you have a workaround or not) with all the details of what you saw and everything you did. Then keep it open until you determine the root cause of the problem.

When other incidents are logged, if you have defined the problem well enough, then you search for a problem record that matches the incident symptoms and link the incident to the problem record. The problem record also holds the workaround that you used to get the end user up and running, so you can use this if the issue is super critical.

If you find that you've linked a certain number of incidents to the problem, then you know it's actually worthwhile doing root cause analysis and spending the time figuring out what is causing the error, so you go down the rabbit hole - and you can justify the time to do so.

When you figure out the root cause, if it's a simple resolution that doesn't require a major change to the environment, then you may not have to do much to prevent the issue in future - it sort of depends on the complexity/needs of your environment and organization. But regardless, you raise a known error record and link the problem to it. In the known error record you document the problem and as many symptoms as possible (some people list workarounds here, others list them in the problem record, others prefer to keep workaround info strictly in incident records), the root cause, and how you resolved the issue fully.

Regardless, if (mainly from the known error record) you find you need to make a scheduled change that may impact environments, then you lodge a change request through the CAB processes you have in place.

Normally I find that for server and network infrastructure the change just requires coordination with teams who use the infrastructure, which, if you've set up your CMDB properly, you can work out by backtracking from the infrastructure configuration items to the linked services. I've found that if you have defined your service catalog properly, then you will have defined your operational services and linked these to business services that are mostly customer facing. This helps impact analysis and finding the correct window in which to make the change.

For things like fixing application bugs, I have found that it's still worthwhile raising a change request, then having that change moved into the development fix process with all appropriate testing, etc. Normally this then links into a wider release management process, which may actually require a new overarching change management request as other fixes are part of the change - sometimes you need to review how deploying the release might impact the environment in unexpected ways.

I think that's basically a big chunk of ITIL, and I found that if it's done correctly and busy-work is reduced (busy-work mainly caused by asking for too much info), when an appropriate setup of the service layer is made and the CMDB has been mapped well, then it actually can help medium to large organizations pretty effectively. The key is to define a catalog of services across the business; without this it's hard to know the impact of incidents, bugs, and any changes you may want to make in your environment.

You can start small though. Just go with broad categories like "printer", "CAD/CAM", etc and log reported/solved times and some text on both.

It's been a while since I worked with a Windows network, 15-18 years or so, but graphs from that were enough to prove that investing in multi-purpose, on-site, free-support network printers was a good idea. Support logs dropped by about 20% if I remember correctly (lots of crappy inkjets), and it saved the company some money in printer repairs and in not buying ink cartridges and toners all over the place.

After that budget talks and time for in-depth problem solving became easier.

We ended up with something ITIL-like naturally. We just started scripting solutions and shared them with each other. Some of those scripts ended up being pushed to clients so traveling sales people could remap network drives and do other simple things. Then we wrote a GUI for them - because clicking is easier, apparently. That didn't work properly, but it proved the case for remote control/monitoring/inventory software (well, control really, but it was IT buying the software, so..)

It probably did help a little that I wrote in C and my co-worker at the time thinks x86 assembly is self documenting.

Nowadays I develop on and use Linux for basically everything except gaming. Friday horrors persist though. This week it was trying to find a solution to a problem in someone else's code that includes SQL triggers, framework triggers, various code components, and quite a few custom SQL tables/relations that I haven't worked with before.

The problem with "newish" features like capabilities, file attributes, and SELinux is that they haven't been integrated into the traditional *nix utilities, and almost nobody knows what is going on with them. A few examples of the poor integration:

File attributes override the *nix permissions, such that you can set the immutable flag on a file and even root can't modify it until the flag is cleared: `chattr +i FILENAME && rm FILENAME` fails.

On distros that use capabilities, copying a file doesn't copy the capabilities by default, i.e. copy the ping program and the copy won't work unless you're root.

When SELinux blocks an action, the error message is almost always wrong. E.g. a program tries to make a TCP connection which it doesn't have permission for. Instead of an error message like "SELinux violation", you get an error like "No route to host". To debug, you need to look at the SELinux audit.log and try to match up timestamps of violations with when your program died.

Chris Siebenmann has been arguing for a long time that SELinux should have its own error code. (https://utcc.utoronto.ca/~cks/space/blog/linux/SELinuxSecuri...) It seems like a good idea to make it more user friendly.

*nix formatting suggests that the asterisk in *nix needs to be escaped. Apparently on HN that requires putting a space after the *, though, so you end up with "* nix".

Apparently HN's markdown implementation is supposed to leave the * alone as long as there is not another one at the other end. But there seems to be no upper limit on where that other end may be.

Also, it seems to only check if the * is near something else, not if it is before or after. Nor if the after is after a before (if that made any sense at all).

> Apparently HN's markdown implementation is supposed to

What made you think it's a Markdown implementation? It's pure text, with paragraphs delimited by empty lines, code blocks prefixed by a space (or two, I never remember), and emphasis marked by asterisks. There's nothing more.

Great debugging, and an example of the sort of behaviour that long dependency chains can expose.

Or rather, I did until this week, when it suddenly stopped working.

When this happens to me, the first thing that I ask myself is "what changed?", and I'm usually able to track down the cause to some configuration change. Incidentally, this is also why I never like modifying a working system unless it's absolutely necessary.

The fact that it was ultimately caused by security features that would be very important on a multiuser shared server, but are nearly irrelevant on the (presumably) single-user local machine he is using, suggests that perhaps we shouldn't be thinking in a "one size fits all" paradigm for OSes, since a lot of problems like this one stem from the unnecessary extra complexity introduced by such thinking.

> Incidentally, this is also why I never like modifying a working system unless it's absolutely necessary.

It's why I find auto-updating apps so infuriating. The trend is that every app, OS, and driver insists on being self-updating. It's going to be very difficult to maintain a reliable system if you're doing anything complex.

Privacy issues aside, that's another reason I never plan to use the continuously self-updating Windows 10.

On the other hand, chasing after CVEs is also infuriating, but in the opposite direction. auto-update makes it possible to live in an environment where security issues are found by the bucket load every day.

There really isn't a good answer either way, but between "breaks occasionally" and "needs a full-time admin, but updates are vetted", I prefer option 1 for my private systems, and option 2 for things that run in production.

This is where supported distributions come in. In Debian or Red Hat you can practically assume that the OS update will not break anything. (There are occasions where it does break something, but I don't remember anything major in a good few years.)

I hate apps that do one job and do it extremely well, yet insist I update just so I am on the latest version. It works; now stop bugging me. I don't care if you changed the color scheme!

A benefit of web apps is seamless updating for users. One click loads the updated webpage. They might not even notice their profile pic has changed from square to round!

Today you reload and see nicer buttons; tomorrow you reload and see them thanking you for the incredible journey you went on together that leaves them with new jobs and $hittons of cash, and you with a search for a new tool.

That often is not a benefit for the users though, and it is one of the reasons why I only use 'web apps' for throwaway work.

...and then good luck when the application that yesterday worked well today doesn't work at all, because somebody removed a small function critical to your workflow and you can't get it back. Seamless experience!

The basic problem there is the mixing of feature changes and security changes in a single stream.

So you can't just say you want security fixes only, no new or changed features.

The problem with trying to separate them is that often a security fix is put into code that had feature changes, and so you can't get the security fix without the feature changes.

Going the other way requires developers to maintain a variety of old versions of their code so they can backport security changes. Which is a lot of work for them for very little extra value.

Hence Debian's practice of back-porting security fixes on stable distros.

Also applies to Ubuntu, and probably Red Hat, though the latter's vastly smaller repos mean vastly greater reliance on third-party sources, with the concomitant risks of introducing/changing features when security fixes are wanted, or riding bareback without security updates.

There's also the inherent conflict between running current code and fixed code. Debian's legendary conservatism reflects a bias toward the latter, at least on its stable branches. Of course, you're welcome to lead and bleed on testing, unstable, or experimental, if you so choose.

IOW you can't get a developer to do straight maintenance work...

I think this is a serious issue, at least for developers. I've never been burned by a self-updating browser, but anything beyond that seems downright unacceptable. There are just too many fragile, hand-managed dependency chains at work even in good setups.

"I've never been burned by a self-updating browser..."

I develop browser-based software which, after one Chrome update, was rendered unusable by a bug in Chrome. Fortunately, Google pushed out a new version with a fix the next day.

I did suspect that "I've never been burned" just meant I hadn't developed enough for browsers...

This also seems like a good reminder that transparent updating only works if your team is good and responsive enough to fix mistakes on the fly. If you're rolling out one update a month, you'd better give people a choice so they can decline when you hand them faulty upgrades.

> this is also why I never like modifying a working system unless it's absolutely necessary.

this is safe, but it paints you into a corner over time, where you become paralyzed and can't improve anything. It needs better testing, so that changes are safe.

Exactly. It's analogous to the difference between big bang integration and continuous integration. The lesson there is: if something hurts, do it more often. Little steps let you know exactly what changed when something breaks.

Tests don't magically make changes safe or give the confidence to make changes.

A culture of many small changes means that you deal with smaller problems relatively quickly. The more you fall behind, the bigger the jump to where you should be, and it's not a linear relationship.

At one place where I work, we're on nodejs 0.10, which is several release versions behind. It's causing us a bunch of problems, because while 0.10 is still technically not EOL'd yet, npm modules behave like it is... however we've left it so long, that the jump to current stable is a giant task, which we don't have the time for given other business reqs.

Tests indeed don't guarantee safety, but lots of small changes are easier to deal with than the occasional massive change. It's also the basic concept behind version control.

This is my experience as well with node and shrinkwrap. I see people using shrinkwrap to avoid potential issues, but what ends up happening is they get stuck on old versions of dependencies and when there's a bug fix or new feature that's needed it can be very difficult to upgrade. Instead, I prefer to try to always keep my dependencies up to date, especially with new major versions to avoid exactly this problem.

No, they scientifically make changes safe and give confidence to make changes.

Do you think that, in case of the problem from this article, Perl devs should have had a test checking if their update doesn't break someone's Emacs when they try to use it in client-server mode, launching one via a Perl script and other via some other means, on a Linux with "capabilities" feature?

This story wasn't about trivial day-to-day developer bugs, but what kind of problems happen in really complex systems.

>this is safe, but it paints you into a corner over time, where you become paralyzed and can't improve anything.

As someone who went through the process of a painful, long-delayed upgrade not too long ago, I definitely second this, although as a much more generalized principle. I think it'd be more accurate to say that there's a fine, eternal balancing act between "work" and "meta work", and that this principle applies to way more areas of life than systems work. However much fun (or "fun") it may be, as mjd said there, most/all of us primarily have work to do using our tools ("tools" being in the most generic sense here, including knowledge) rather than work to do on our tools. To some extent, a few days spent on tools/skills is a few days not spent applying them, and it's all too easy to sink so much time going down various rabbit holes that "actual work" loses out. But of course, on the flip side, improving our tools/skill sets is key to realizing major boosts in long-term productivity, keeping up with changing standards, and so on.

I remember a few years back at one workplace when a number of senior engineers (50s/60s) all finally bit the bullet and started to work to get up to speed on the latest CAD developments. Or myself a decade back, when I decided I really needed to update my shell usage, read the full ZSH manual, and spent some time seeing how I could improve my speed in general. There were many significant projects going on, but then there always were, always something that "needs to be done next week!".

I personally find it can be a tough balancing act to weigh the savings gained from increased productivity down the road against the time expenditure needed to begin realizing them in the first place, particularly if "everything is working fine". I know that over the years I cumulatively lost plenty of time on manual involvement in tasks I could have automated, but each individual instance seemed trivial, and it was easy to default to just hacking something quick and getting on with the day vs deciding it'd be worth spending time to improve it for good.

Of course that's all assuming there aren't any other barriers in the way. My extremely oddball pain point on one workstation was that I'd enthusiastically built a tower Mac Pro OS X system around ZEVO, a short-lived attempt to salvage Apple's old ZFS work and bring a fully functioning version to OS X. And despite a few niggles (some of which didn't matter to me, like being CLI-only), by the time it was getting ready to go it was fantastic, nicely integrated and all that. I was pumped, it was exactly what I'd wanted under OS X ever since I'd seen Sun's original presentation, and I hopped fully onboard. But of course the company developing it promptly went under just as they were launching, were bought for IP/people by GreenBytes (which itself was subsequently acquired by Oracle), and after a single bug-fix release that was it. It only worked under 10.8 and not one version later, and there was no clear upgrade path (I really didn't want to revert that system back to pure HFS). So 10.8 was where I stayed until OpenZFS and in turn O3X came along to save the day, but by that point I was out of the habit of frequent upgrades there. Testing is definitely helpful (along with a nice rollback system) but sadly can't always save you; frequent upgrades definitely help keep key meta-knowledge fresh.

This was a really cool bug track down article though, and inspiring.

No reference to the full quote? :)

"You can check your anatomy all you want, and even though there may be normal variation, when it comes right down to it, this far inside the head it all looks the same. No, no, no, don't tug on that. You never know what it might be attached to. " - Buckaroo Banzai

The movie is quoting that from a neurosurgeon. I first saw that line in a book about the Massachusetts General Hospital.

In the movie the lead character is also the neurosurgeon :)

Fantastic movie. I don't remember that quote though. My go-to quote has always been "Remember, no matter where you go, there you are."

Which reminds me, I have to re-watch that movie!

> This computer stuff is amazingly complicated. I don't know how anyone gets anything done.


When my colleagues and I are tearing our hair out over a problem, I'll often exclaim "These computer things are hard!". It's delivered both as a joke and as a reminder that it's okay to take a while to figure out the problem in a particularly complex system. Eases the tension a touch.

In particular, the set of exported shell environment variables is a sinkhole of state that can potentially affect every program we run.

If only we had a way to deal with state without it being mutable...

Quote of the day for sure.

Mostly by accident.

What's funny is that in this case the dynamic loader was sanitising something irrelevant to the actual capability granted, which seems to me should be a bug.

Also, I'll be an advocate for just starting emacs with systemd, and never worrying about it again.

> What's funny is that in this case the dynamic loader was sanitising something irrelevant to the actual capability granted, which seems to me should be a bug.

EDIT: fixed incorrect description of the yak-shaving conclusion.

The dynamic loader did so because it ran with an extra capability that the user invoking it didn't already have. Most of that sanitizing exists to prevent the user from gaining those privileges themselves by invoking a more privileged program, such as by setting LD_LIBRARY_PATH. Sanitizing TMPDIR prevents a somewhat different class of vulnerabilities, such as using those extra privileges to write to files you normally couldn't. However, I don't think it makes sense to have a complex special case like "if you only have one of this subset of extra privileges, allow TMPDIR but don't allow all the other potentially dangerous environment variables"; that adds a significant amount of complexity and subtlety to already security-sensitive code.

Giving /usr/bin/perl itself extra capabilities effectively grants them to every user on the system, since you can use Perl to run arbitrary code. At that point, it would make more sense to just allow all non-root users to bind to arbitrary ports. I'm somewhat surprised that there isn't a sysctl to disable the reservation of ports 0-1023.

> However, I don't think it makes sense to have a complex special case like "if you only have one of this subset of extra privileges, allow TMPDIR but don't allow all the other potentially dangerous environment variables"; that adds a significant amount of complexity and subtlety to already security-sensitive code.

I think it'd make more sense to have a collection of lockdown functions for each capability, with the set of functions actually run being the union of the collections for each effective capability (full root just being the union of the collections for all capabilities).

Or, y'know, rethink root in general. Plan 9 had good ideas in this area …

No, it was the dynamic loader that did the sanitizing. Mark said that he thought of Perl's sanitizing, and had ruled it out, and then explicitly said it was the dynamic loader that did the sanitizing in this case.

Thanks for the correction; fixed. I misremembered that bit of the yak-shaving adventure when I went to write my comment.

The conclusion still holds, though: I don't think special-casing particular capabilities makes sense. And in the case of the dynamic linker, it doesn't actually have that information available; it relies on the AT_SECURE bit set in the process's "auxiliary vector" (see "man getauxval"), which the kernel sets when the process has any privilege its caller didn't have.

I have to agree. It seems the vast majority of daemons that run as root only do that to get the special port they want. It made a tiny bit of sense in the days of multi-user systems. Those days are over.

> I'm somewhat surprised that there isn't a sysctl to disable the reservation of ports 0-1023.

There is: /proc/sys/net/ipv4/ip_local_port_range

That does something entirely different; see Documentation/networking/ip-sysctl.txt (online version at https://www.kernel.org/doc/Documentation/networking/ip-sysct...). ip_local_port_range sets the range of ports used as source ports for outbound connections that don't bind to a specific port.

I checked for a sysctl controlling the ability to bind to privileged ports before writing my comment. The relevant code in the kernel compares against a hardcoded #define PROT_SOCK 1024, and doesn't have any means to disable that check. See inet_bind in net/ipv4/af_inet.c .

FreeBSD has the sysctls you are looking for to control the reserved port range, allowing unprivileged users to bind (curious that Linux doesn't):

    $ sudo sysctl net.inet.ip.portrange.reservedlow=10
    net.inet.ip.portrange.reservedlow: 0 -> 10
    $ nc -vvl 1
    $ sudo sysctl net.inet.ip.portrange.reservedlow=0 
    net.inet.ip.portrange.reservedlow: 10 -> 0
    $ nc -vvl 1
    nc: Permission denied

> the dynamic loader was sanitising something irrelevant to the actual capability granted

No. You can theoretically use tmp to gain any privilege. So anything extra at all must be blocked.

> You can theoretically use tmp to gain any privilege.

Not when you access it as yourself rather than root. The capability in question only grants access to open low ports; there's no way to combine that with files in a temp directory to get root.

This drives home the point that (almost) no one actually knows the details of their full stack.

And that present day computing is damn hard to reason about, because oh so much happens silently in the background.

You should definitely tug on things when you have a controlled test environment and the time to explore what-ifs.

Much like companies should try to replace their own products (before a competitor does), infrastructure teams need to force “predictable” upgrades in a controlled environment on a regular basis. For example: look at your dependencies, imagine what upgrades are likely to be required in the near future, and try making those upgrades on test systems to see what could go wrong.

That approach achieves three things. One, since you’re not in emergency mode and you’ve used a test environment, any problems that you do uncover are not going to cause a crisis. Two, if you do this semi-regularly then you’re likely to see only minor issues. Third, exploratory upgrades give you a lot of time to fix problems (whether it’s time for your own developers to make changes, or time to wait for an external open-source-project/vendor to make changes for you).

Environmental variables have caused some of my more troublesome debugging experiences:

1) On Windows, VisualVM not being able to find the IntelliJ IDEA process I had running. This happened because IDEA was started with Launchy, which had a different TMP directory set, because I used both RDP and the Console, or something.

2) On Linux, ibus IMEs not working at all in my browsers; happened because I was starting them in a tmux, and the tmux server was started in my previous login session, so the DBUS_SESSION_BUS_ADDRESS in the tmux was stale.

In the end it seems that all complexity stems from trying to enforce access privileges on a system that does not give a rat's ass about anything beyond 0s and 1s.

If we want to avoid said complexity, we basically have to go back to running only one process at a time, loaded directly from dedicated, removable storage when needed.

I read this, but I seem to have skipped over the part where he explains why this changed suddenly, when the behavior was documented.

What changed to make the perl become capable whereas previously it lacked the low port capability?

He usually started emacs directly. In this case he started emacs via a perl command, very very indirectly.

Also, a recent change was an admin setting TMPDIR, previously it had been left unset.

So it was a combination of two things that triggered it, and not one alone.

> So it was a combination of two things that triggered it, not one alone

And this basically sums up all the troubleshooting time sinks I've experienced in my career.

That is a good point. I still don't know the answer to that! I _think_ it's that the sysadmins added the new capability to Perl sometime in the last few weeks, and the problem then didn't appear until after the next time I reloaded the system configuration.

I will confirm this with the sysadmins and add it to the article. Thanks!

I checked with Frew on this, and we still don't know the answer.

That seems to be a site-specific configuration.

The real question is why in the world you'd move tmp.

I almost hesitate to say this, but it seems to me that emacs needs a new command line parameter to allow the user to specify the location of the socket file.

In case you're talking about emacsclient, that has one, --socket-name, as the article says.

If you actually are talking about "emacs" and not "emacsclient", then the argument you want is --eval '(setq server-socket-dir "/what/ever")'

I did not quite get why is he starting emacs with "git re-edit" with re-edit being some Perl script. What is the role of git in this setup?

Just that, when you run '$ git nonstandard-subcommand', then the git executable looks for anything executable in the path that is named 'git-nonstandard-subcommand'. Then it runs it. That's all. But at that point, git could have been doing something to the environment before invoking the custom script.

A pretty good showcase that the more features you have the harder it is for them to keep working correctly together.

> This computer stuff is amazingly complicated. I don't know how anyone gets anything done.

tl;dr Don't use an unnecessary number of scripts and tools to do what vim + git do just fine.

Why even bother to learn programming if you can't use it to automate common tasks?
