It used to be simpler to teach Unix (jpmens.net)
173 points by pcr910303 28 days ago | 236 comments

It was simpler to teach Unix because what we expected from Unix back then was simpler than what we expect from Unix these days.

Back then we expected programs like: `telnet`. A program that can be developed by one person (or a few at most) and requires just a few tools to be developed (gcc, gdb, vim, make).

Nowadays we expect programs like: `Twitter`. A program that cannot be developed by one person but needs hundreds (don't ask me why, I don't get it either), and requires hundreds of tools to be developed (html, css, sass, ruby, ror, ansible, git, javascript, babel, docker, docker-compose, yaml, json, make, vscode, kubernetes(?), mysql, nginx, ...).

I find comparing telnet to Twitter disingenuous. Back in the day, big banks/insurance companies had big applications, developed by big teams.

I'm not saying that what you point out is not true, but telnet is probably best compared to openssh, which seems to be developed by 2[0] devs and is way more complex. Both are tools, and both may be used for various over-the-network interactions with OSes.

> don't ask me why, I don't get it either

It's an ad network. It has to deal with compliance in many regions. There are mobile apps.

[0]: https://github.com/openssh/openssh-portable/graphs/contribut...

Anything can be developed by one person or by thousands of people. It depends on how the organisation is managed and what other software the software needs to support and communicate with.

Right. I wanted to focus, though, on the most "demanded" programs back then and now.

I would say back then, the most demanded programs were command line programs (small utilities that do one thing well).

Nowadays the most "demanded" programs are programs that want to attract as many users as possible (and so, they have to "scale", they have to be "distributed"), programs that need to grow as much as possible (the bigger the program the bigger the company the bigger the profit), programs that need to be rewritten in new tech stacks every 3 or so years.

I guess the mentality has changed along the way as well.

Demanded is a weird term here. No one demands "ls": not the banks/insurance companies back then, not the Twitter users today. It's a tool, not an application. You don't buy your computer (or cloud VM) to run "ls".

Back in the day, consumers were not in the software picture so much. Maybe some pre-PCs were catering for home users, maybe some BBS to dial into, but that was it.

This has changed. And this is why we have Twitter.

> programs that need to be rewritten in new tech stacks every 3 or so years.

Where do you get this from? I don't see such a trend at all. Okay, some library needs to be replaced, and we share more code nowadays (the stacks are deeper, but what would you expect? more cycles to burn and more ecosystems). But no one factors in the cost of constant rewriting to stay up to date with the hippest stacks.

> > programs that need to be rewritten in new tech stacks every 3 or so years.

> Where do you get this from?

I'd guess front-end dev. Three years is positively prehistoric laughing material for all the up-and-coming resume-driven-development front-end rockstars.

No it's not; get your facts checked. React.js, for instance, is 7 years old and here to stay. Angular 2+ and Vue.js are both younger but here to stay too.

Frontend dev has largely matured and stabilized, it's not the shit show it used to be anymore.

> Nowadays we expected programs like: `Twitter`. A program that cannot be developed by one person but hundreds (don't ask me why, I don't get it either),

I'd say this is wrong. Creating a Twitter clone is simple and can be done by one person. The hundreds are necessary for making it scale and keeping it manageable. And many of those hundreds are there for the content, not the tool: they filter out fake news and nasty behaviour, monetize the platform, and do all kinds of other things which are not in the realm of Unix tools themselves.

Twitter the server is hard and requires hundreds, twitter the client is easy and requires 1.

> Twitter the server is hard and requires hundreds, twitter the client is easy and requires 1.

You are severely underestimating the amount of work that goes into a (web) client application used by as many people as Twitter's: UX design and testing, accessibility, internationalization, browser compatibility (including lots of exotic browsers you may never have heard of)...

What client?

By client I mean anything that allows you to send tweets and read whatever tweets you are currently subscribed to; so, a theoretical command-line client for Twitter (assuming, of course, that the Twitter API allows that kind of thing nowadays, which I am not at all sure of).

Ok, I disagree. Such a client is even more work, because you need to add an additional API for connecting this client, and recreate the whole interface again in a different environment.

No, doing a Twitter clone is quite simple, because the interface is not very complicated. It's even on the usual list of tutorial examples these days, so in the worst case you can just copy & paste the whole interface from somewhere. The backend is also not that complicated for an average developer.

The hard part in doing Twitter is scaling up your poor clone, which works well enough as a single instance for a few hundred users, to an architecture of multiple instances serving millions of users.

But this comparison is unfair, because the original telnet client & server were also only a single-instance solution, not some upscaled monster for millions. Even though a Twitter clone still needs a bit more knowledge than telnet, it's still on a level one person alone can handle.

>The hard part in doing Twitter is scaling up your poor clone, which works well enough as a single instance for a few hundred users, to an architecture of multiple instances serving millions of users.

That's twitter the server right there.

twitter the client is sending in your tweets, and showing you a list of tweets from other people you are subscribed to in some order - probably chronological.

The Twitter client does not need to keep any data (between sessions); it should be usable by just sending requests for data to Twitter's public APIs and posting your tweets to the same (assuming, of course, as I said, that the Twitter APIs give access to this, which I am not sure of).

>Nowadays we expected programs like: `Twitter`

I didn't, and I'm sure I'm not alone. Let a website stay a website.

I expect it to be free and maintainable (from both developer and user perspectives).

> Let a website stay a website.

So you still need Chrome/Firefox, and Gnome/KDE/Mate/whatever to run them in.

I think the OP's underlying unhappiness is more that "the people who want to use unix now are fundamentally different people who want different things than the people who wanted to 'learn unix' 35 years ago".

(Though I'll bet 80+% of the people they taught to pipe ls to wc were within weeks using irc and mutt whenever they were looking the other way in class, because for most of us people are more fun than machines, and back then we used irc/usenet instead of Twitter/Facebook, but to fill the same sort of niche.)

It's not that hard to build a browser (I'll say this because I've done it, and there are plenty of people who have done it better.) It's just tedious and not something you can put on the app store.

> It's not that hard to build a browser

One that reliably runs all the javascript needed to support modern Twitter/Facebook/GMail/whatever style webapps? I mean _maybe_, if you're pulling in someone else's code as dependencies to deal with your javascript runtime and your rendering engine. A lot of people could hammer out lynx or w3m in a weekend, but not something that'll run Twitter. (Hell, the Safari Tech Preview occasionally chokes on Twitter...)

I liked the web better with HTML2.

There were simple technologies which did a lot too, like Hypercard.

The creeping complexity of the web isn't something which had to happen. I understand the reasons it did happen, but in an alternative timeline, without Netscape vs. IE, the dot-com boom/bust, and the whole Netscape meltdown, it might not have happened.

People were building sophisticated applications which did a lot in the days of NeXTStep, BeOS, SGI, and Amiga, without this massive complexity build-up. Altavista scaled well too.

Just like people got tired of "fancy" user interfaces and minimalism became somewhat fashionable, I hope the same will happen architecturally, possibly after yet another especially bad vulnerability and breach.

>One that reliably runs all the javascript needed to support modern Twitter/Facebook/GMail/whatever style webapps?

That can stay within the browser or some shared library. It absolutely doesn't need to affect unix/linux architecture in any way.

You're vastly underestimating the effort required to go from a simple HTML renderer to a full-blown web browser. See [1].

[1]: https://drewdevault.com/2020/03/18/Reckless-limitless-scope....

> So you still need Chrome/Firefox

I think for most people learning unix, understanding chrome/firefox boils down to knowing "there is a chrome/firefox repo somewhere, it gets compiled/packaged for your distro somewhere and has a bunch of dependencies". DE doesn't need to have any provisions for Twitter specifically either, apart from what is already essential for most GUI applications.

I get what you mean, I guess I just twitched reading "unix twitter client" being something we expect from unix.

Me neither. I guess that's something that has also changed. Back then, the "users" were a bunch of guys who required programs like `telnet` and `ls`. Today the "users" are practically everybody in the world (and the majority of the people out there wants Twitter, Instagram, Facebook).

Most of those technologies are deployment and infrastructure. There's Twitter the server program, and then there's Twitter the business. But whether an app runs on docker, or kubernetes, or is version-controlled by git, or what database it speaks to, is orthogonal to the technologies used in the app itself (or at least it should be). I wonder whether, if one had a source package for the Twitter server, you could just run it on an arbitrary device with an arbitrary configuration, or whether it is actually bound at the hip to docker or mysql or whatever configuration they chose for their own servers. Ideally, they should be separate concerns.

The problem with Unix is that people started building solutions on top of each other while not hiding the underlying abstractions. Then you get annoyances like the "/etc/hosts" or "/etc/network/interfaces" files being overwritten by some management tool.

How much is the basic features of twitter (account, tweets, feed), and how much is scaling and analytics?

telnet is a bad example; it is full of quirks and special cases.

"It used to be simpler"

I hear that more and more, often followed by: "In the 90's..." But then I compare the steps it used to take to setup my NextCloud install, including altering php.ini, installing and configuring mariadb, php-fpm, getting a free start-ssl cert, etc. to my current docker-compose.yaml and I think: No.

Edit: I also think we forget how we used to do things manually, like mounting USB drives with kernel 2.4. I love how Linux is today.
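For comparison, a minimal sketch of such a docker-compose.yaml for NextCloud (the official `nextcloud` and `mariadb` images exist; tags, passwords and ports here are illustrative):

```yaml
version: "3.8"
services:
  db:
    image: mariadb:10.11
    environment:
      MYSQL_ROOT_PASSWORD: example   # illustrative only
      MYSQL_DATABASE: nextcloud
      MYSQL_USER: nextcloud
      MYSQL_PASSWORD: example
    volumes:
      - db:/var/lib/mysql
  app:
    image: nextcloud:27
    ports:
      - "8080:80"
    depends_on:
      - db
volumes:
  db:
```

One `docker-compose up -d` replaces the whole php.ini/mariadb/php-fpm/cert dance described above.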

Nothing has become easier; everything is becoming more difficult as technology advances and we've just learned to automate the complexity.

But the automation comes with the downside that we're forgetting how to build simple systems at all.

Put differently: Try explaining your docker compose setup to a beginner that only knows how to use normal desktop software.

I often have to explain Docker (or containers in general) to non-tech people. I tell them I take an entire PC, with all software on it and make it a single function package that I can give to anyone and they can run it anywhere. Almost nobody understands the minutiae of their OS or their CPU. We may understand it a bit more but a Dockerfile or a docker-compose file simply says, "take a fresh full Linux install, install some stuff and run it."

It's a mind-numbing number of abstractions, but one can understand the high level if one understands what an OS (or a computer) is.

It IS easier, but I agree it's much more complex under the hood. I don't think that has to have downsides. A human can only comprehend so much, we must build on the skill-sets of our fellow humans.

Yes, we've all used that abstraction. And it is a lousy one. For starters, the novice's interpretation of a PC is peripherals (mouse, screen, keyboard, wifi) with a GUI, and then down from there. Your abstraction starts without a screen, without a terminal even.

It's a bad interpretation because the abstraction level is arbitrary. GUI applications? No. Terminal, yes, but we need to automate. Wifi? No, we virtualize the network. DNS? Depends, in k8s it is used for service discovery. Tooling? Package manager: yes. Network Center? No. Firewalls? I don't think so.

So, what constitutes a 'fresh linux install'? Why are there different distros (archlinux, ubuntu, alpine)?

What are the contracts for such a Docker image? Should it be secured? How is input and output arranged on the Docker image? What if I want to use a smarter way for logging (using UDP packets to a discoverable logging provider), can I tell all my Docker images to start using that?

How about monitoring and reliability? What constitutes a failed docker image? How do I detect it?

An abstraction is only good when it covers 99% of the underlying complexity. This abstraction is so leaky, it is more of a burden. Especially to beginners, who now are tasked with understanding Docker and understanding the limits of the abstraction.

Good technologies are explainable through solid abstractions. They are not 'leaky'. For me, Docker and k8s are so complicated because they are based on leaky abstractions and not hiding the underlying complexity sufficiently.

>For me, Docker and k8s are so complicated because they are based on leaky abstractions and not hiding the underlying complexity sufficiently.

Bingo. I've worked with so many slopped-together Docker houses of cards that like to pull the latest docker images regularly in their docker-compose, and then, surprise, something changes in that image's structure and the assumptions they were leveraging break the rest of the application. As the number of containers increases, the failure rate also increases. You can lock in versions, but for trendy users, "then why are you using docker, bro!?"

Containers aren't bad; they're actually great and useful technology. But using them for rapid development just in the hope of managing complexity often only makes things worse. They're far too often misused and abused rather than used for sane development purposes.

The reason for this is that Docker still doesn't have a lockfile like every (other?) dependency manager ever. Pinning a hash dooms you to use that hash forever, because there's no easy way to update under controlled circumstances.
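For reference, digest pinning is the closest thing Docker currently offers, and it illustrates the problem. A sketch (the sha256 value below is a placeholder, not a real digest):

```dockerfile
# Floating tag: each pull may silently fetch a different image.
# FROM nginx:latest

# Pinned digest: byte-for-byte reproducible, but updating means looking up
# a new digest by hand (e.g. via `docker inspect`) and pasting it here;
# there is no lockfile-style "update" command.
FROM nginx@sha256:<digest-resolved-via-docker-inspect>
```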

Well, it depends on your audience; in my case it is sufficient. But I agree that if someone really wants to learn, this just makes them ask questions about horrible overkill, and then you have to explain layers, and indeed it is a rabbit hole.

Still, once you understand it you can build on that. But tbh I agree that there are now students who lack the basic sysadmin skills needed to understand docker images from the inside. That said, I don't understand Kubernetes very deeply, so we complement each other nicely.

> It's a bad interpretation because the abstraction level is arbitrary. GUI applications? No. Terminal, yes, but we need to automate. Wifi? No, we virtualize the network. DNS? Depends, in k8s it is used for service discovery. Tooling? Package manager: yes. Network Center? No. Firewalls? I don't think so.

You can absolutely run all of the above in Docker containers. Just because most don't doesn't mean you can't or shouldn't!

For a user who is not concerned with the difference between kernel and userland, I think the analogy is good enough. Just throw in that it has to be a Linux system, and you can get away with explaining it the same way you would VMs, just more lightweight and reusing some of the "base system".

But how do you explain the need for a "whole system" just to run one application?

"Back in the day", you installed one package, edited one config, started the service, and it "just worked". Now I need to install docker, type some docker install command that pulls who-knows-how-many files and puts them in some 'weird' location, including the config file (which belongs in /etc). So instead of a 2 MB service, I now have a 1 GB docker image sitting on my system, with a config file hidden somewhere, and I, the user, don't see any benefits compared to just 'apt-get install'-ing something.

I mean, I understand, dealing with dependencies is hard for some developers, but sometimes it's just overkill.

Did it just work? Were there never any dependency issues or unsolvable complexities? Or dependencies that were solved differently when run a second time leading to a different output of software?

The "whole computer" analogy works because I can drag it anywhere, plug it in, and it runs exactly as intended. I can even copy it and run it 1000 times in parallel. When people complain about overkill I start talking about shared layers.

To me, apt-get install always 'just worked'. Same for yum install, emerge etc.

I understand it's easier to ship software as an "almost VM", but imagine everyone doing it: need a browser? download the firefox docker image. Video player? docker image! Text editor? Docker image!

If firefox can make it 'just work', and a whole (e.g.) lamp stack can 'just work', why not others?

Docker is a relatively new thing, and software worked for decades before that, without any real issues.

> To me, apt-get install always 'just worked'. Same for yum install, emerge etc.

Sure, as long as a third party did all the hard work of integrating the package into the system first. Have to install something outside the repo? Clear your schedule...

> and it "just worked"

For me, it frequently didn't. Sure, nginx will work out of the box, but I've had some difficult Apache installs, and less mature software? Often a dependency/configuration nightmare, and then, sometimes, it still would die with an opaque error message. And then something breaks and getting to a clean slate can be challenging.

That's one of the huge upsides of Docker for me: Stuff finally "just works" on my machines, too! And it actually "just works" on others' machines as well, which it has all along as they tell me, but it even does so when I'm watching. Amazing, really.

I think the config situation has improved as well – the most frequently used configuration options usually can be passed in as environment variables, so I don't have to pin things to non-default values just because the config file I copied and edited had that default value in its config at that particular version, only to break five versions later – neat. If I do need a config file, I keep it in git and bind-mount it in – the location is usually easy to find (e.g. in the docker run one-liner on the DockerHub page).

Image sizes can be crazy, that part needs work, many people don't write space-efficient Dockerfiles. I try to use alpine or scratch images where possible, that makes the problem quite manageable; and, frankly, the amounts of network, storage and ram needed for the overhead of an alpine image are a very fair bargain for the time and headspace I gain by being able to treat applications as a way darker-gray box than otherwise.

> But how do you explain the need for a "whole system" just to run one application?

Well, you see, Linux culture doesn't really consider the concepts of "system" and "application" to be independent. Everything has to be built to work together or the whole thing catches fire and explodes. Consequently, it's easier to build an entirely different system for every single application and package them all up together than it is to try and get them to work on the same system.

Weirdly, they seem to think this is a good thing.

No one's made a working alternative, except maybe Nix which took 20 years to show up.

Yeah, I mean, there never was RiscOS, MacOS (classic), or any other operating system...

Nix is just another overengineered solution to the problem that doesn't really exist on other platforms that are actually, you know, platforms.

I usually explain this one in terms of multiple projects: Project A uses postgres-11, Project B uses postgres-12, Project C uses postgres-11 with postgis and also redis.

Having them all in containers and a docker-compose.yml file makes it easier to quickly start an environment for a specific project.

Of course, like everything in life, it's a tradeoff; in this case we're trading disk space and some overhead for ease of use and portability.
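Sketched out, Project C's compose file might look like this (official images assumed; tags illustrative), while Projects A and B each pin their own postgres major version in their own files:

```yaml
version: "3.8"
services:
  db:
    image: postgis/postgis:11-3.3   # Project C: postgres 11 + postgis
    environment:
      POSTGRES_PASSWORD: example    # illustrative only
  cache:
    image: redis:7
```

Running `docker-compose up -d` in a project's directory brings up exactly that project's versions, isolated from the others.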

You needn't (and shouldn't) edit the config file in the image. Going into /var/lib/docker/images and editing things is bound to end poorly.

Most images have environment variables you can set that will control the config, or you can bind-mount a config file from anywhere on your hard drive.

> Now I need to install docker, type some docker install command that pulls who-knows-how-many files and puts them in some 'weird' location, including the config file (which belongs in /etc). So instead of a 2 MB service, I now have a 1 GB docker image sitting on my system, with a config file hidden somewhere, and I, the user, don't see any benefits compared to just 'apt-get install'-ing something.

On the other hand, I can `docker-compose up -d` on practically any platform in existence. I don't have to worry about what package manager this host uses, or whether the version they have supports the config flags I'm using. It's the same everywhere. I don't have to deal with the pain that is 3rd party apt repositories, and handling pinning when they want to ship a newer version of a library than the main OS does (I believe Salt does this).

That, perhaps, is worth less to you than to other people. I would _far_ rather handle setting up another couple Docker containers in Compose than I would handle figuring out which Apt repository they actually publish to, and handling the installation, and fixing the configs (because many services ship with broken configs, because they have no idea what your Postgres password is, for example. Not true of Docker, their docker-compose.yml typically includes setting the password).

This is especially true of multi-component systems, or systems that I'm testing out or using in dev. If I want to enable the Jaeger built into an application, it's one command to run it. In apt, that would have to install 4 or 5 daemons that I would then need to configure.

I also think complaining about disk space is rather disingenuous at this point. Yes, disk space costs money. No, Docker is unlikely to make a significant difference in the price of building your PC. 1 GB is huge for a Docker image (but many people are bad at trimming them down, so I'll accept it). In 2017, storage cost roughly $0.03/GB on an HDD, so you're paying 3 cents to store that Docker image; probably less at this point, since I saw an article about WD releasing $15/TB HDDs in the near future. The moderate performance impact seems more important to me.
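The back-of-the-envelope math, for what it's worth:

```shell
# Storage cost of a 1 GB image: 2017 HDD pricing vs the upcoming $15/TB
# drives mentioned above.
awk 'BEGIN {
  printf "1 GB image at $0.03/GB: $%.3f\n", 1 * 0.03
  printf "1 GB image at $15/TB:   $%.3f\n", 15 / 1000
}'
```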

And what do you answer when asked what the difference is between a container and a virtual machine? Because calling it a lightweight VM is obviously a lie, and it cost me too much time at work.

For the people I talk with that distinction is irrelevant. It's just management that wants to know how I can guarantee that the same code runs on our batch processing cluster and their cloud environment. And how I can develop one code base for both platforms.

>Nothing has become easier;

Try running an OS off a USB drive back when Windows 2000 was around: it was very, very difficult. These days I just run my OS off Samsung SSDs plugged in externally 24/7 on nearly all non-cloud-hosted machines.

If I go to a friend's house or bring my laptop around, I just boot off that external flash device. Separating your data from the machine is very flexible.

This is simple now because the hard work has been done. It's also way easier to learn programming these days, with StackOverflow and the virtually infinite resources of the modern internet. The 80s or 90s internet had some advantages over today, but holistically it's just really been getting better and bigger. Learning programming back then required decent manuals and access to other people with expertise.

> Put differently: Try explaining your docker compose setup to a beginner that only knows how to use normal desktop software.

And on the flipside: try explaining to people that running stuff on hardware is still viable, and that they don't need to break their software up into a dozen containers when, if you look critically, it's just another CRUD app with authentication.

I've had this discussion with some AWS-loving friends: now it all has to be a lambda or a cloud container, even for a simple CRUD server for a side project...

IMO things are not getting more difficult; we are just getting older, and some of us don't want to adapt.

I work with Go (systems programming) and containers today (I'm 40+, working with Unix since '94), and the whole development/deployment process is way easier/cheaper/better engineered than it was before, when I was writing and maintaining C-based services running on Unix. I remember the staging environment was a physical machine (an expensive spare machine idling), the test suite was bad, we needed a huge QA team, we used to debug stuff in production, and every deployment was a surprise. Software engineering best practices were almost zero.

> Try explaining your docker compose setup to a beginner that only knows how to use normal desktop software.

Which "beginner that only knows how to use normal desktop" is trying to run "docker compose setup"?

No. Some things are genuinely becoming easier and simpler, just because we have the extra oomph to make different trade-offs between implementation complexity and performance.

Simple example from outside of computing: if you have high quality steel available, the design of many machines becomes much simpler.

Why would you need to? Would a battery engineer need to explain everything simply? Would a bridge engineer? Do you think a normal user knows anything about how a graphics card works?

Accountants, financial managers, psychologists, chefs, any field that requires training has intricacies that can't be explained simply to a lay person

There are different shades of simplicity. Assembler is a very simple language, but it takes a lot of effort to write anything non-trivial in assembler. Rust is much more complex, but a lot of tasks are much easier to do with Rust than with asm.

NextCloud in the 90's ...?

I think the point is not about "add an abstraction layer" and "the problem is gone".

You could make a tar.gz of a chroot in the 90's too, or you could make a snapshot of a VM. Or you could have an abstraction layer too (I did a lot for provisioning apache+mysql), like addwebserver.sh "hostname" and addvhost.sh "customdomain" (which also included the php thing, backups, and compliant log rotation).
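A hypothetical reconstruction of what such an addvhost.sh might have looked like (the script name comes from the comment; the Apache config layout and conf directory are assumptions, not from the original):

```shell
#!/bin/sh
# Hypothetical sketch: generate an Apache vhost config for a domain,
# the kind of one-shot provisioning helper described above.
addvhost() {
  domain="$1"
  confdir="${2:-./vhosts}"   # assumption: configs collected in one directory
  mkdir -p "$confdir"
  cat > "$confdir/$domain.conf" <<EOF
<VirtualHost *:80>
    ServerName $domain
    DocumentRoot /var/www/$domain
    CustomLog /var/log/apache2/$domain.access.log combined
</VirtualHost>
EOF
}

addvhost customdomain.example
```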

I think the main point here was that UNIX (and the internet) was "easier to teach" because it was simpler.

Or... do we need to understand and install docker in order to teach "list files" on the first day?
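The first-day lesson itself hasn't changed, and needs no docker:

```shell
# The canonical first pipe: list files, then count them.
ls /etc | wc -l
```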

"It used to be simpler"

I don't know about that. You had the BSD vs System V at the time, and standard commands had different flags, default shells, etc.

Then shortly after, all the various RISC Unix flavors.

There is, of course, more than Linux these days, but it has so much market share on the server side.

AIX vs SunOS vs Solaris vs HP-UX vs Digital Unix vs etc etc.

I worked one place that had IRIX, AIX, Solaris, and Tru64 on one site, for a company of only 600 people. Each with different service management, shells, low-level (hardware/firmware), backup, etc all to learn. Old UNIX was a fucking shambles, with companies desperate to break things in small and large ways to try and create lock-in to their flavour.

Networking added to the fun. Token Ring. VAX/VMS networks. Then some normal TCP/IP. Then these new PC things with whatever it was they had, which certainly was not TCP/IP. It could even be Novell.

Then the network cards had DIP switches on them, and things like thin-net or big AUI network connectors.

This made things we now take for granted a highly skilled domain. Getting your IBM box to talk to the Sun workstation on the other side of the room was rocket science.

The different dialects of UNIX meant that there were niches for those who knew the difference between running a 'ps' command on a Sun box and on (say) an IRIX box.

Incidentally I am not sure it was about lock in for all companies. IRIX was about doing things in 3D and enabling those applications to work. They had no interest in tinkering with UNIX to make it different for the sake of it. It would be different because they had different hardware and 3D to be standardised.

I felt IBM was different just for the sake of it with AIX. The same with Sun except they dominated in some universities so were the de-facto standard for many people coming through education at that time.

Believe it or not, places like this still exist. I worked at one (eerily similar story), then moved to a much larger company with even more platforms at one site. It was pretty silly.

Try the same with Red Hat, Ubuntu and BSD to see how much it has improved; hint: nothing at all.

It was worse. AIX, for example, had SMIT. You had to use it, or changes were only temporary. There was a lot more difference in those RISC Unix distributions than your current examples.

Oh, and compiling open source programs across several of those. Real fun.

I started my UNIX knowledge with Xenix in 1992, followed by DG/UX, and followed up from there across HP-UX, Tru64, Solaris, AIX, Linux distributions, FreeBSD, ...

Looking at modern distributions I hardly see a difference given distribution specific directories, packages or tooling.

Well, for me "sh configure" has a much higher chance of producing a working Makefile than it used to, as one example.

> You had the BSD vs System V at the time, and standard commands had different flags, default shells, etc.

You still have that.

For example, I develop on a Mac but production is Linux. The differences between GNU TAR and BSD TAR are a continual pain in my side. More everyday commands like ps still trip me up, too.

It's not nearly as bad as back in college when I had to keep track of the differences among Solaris, HP-UX, Slackware, FreeBSD and NEXTSTEP, but still.
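One way to cope with the GNU/BSD split, assuming both dialects report themselves via --version (GNU tar does; bsdtar identifies itself as "bsdtar"):

```shell
# Check which tar dialect is on PATH before relying on GNU-only flags
# such as --wildcards, which BSD tar will reject.
if tar --version 2>/dev/null | grep -q 'GNU tar'; then
  echo "GNU tar"
else
  echo "BSD (or other) tar"
fi
```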

Any particular reason why you don't just install GNU coreutils via brew et al.?

I also develop on macOS for Linux, and using the GNU equivalents, with a quick function for path setting, is, well, not very hard.
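The function boils down to something like this. The gnubin directory is Homebrew's documented layout for coreutils; the /opt/homebrew prefix assumes Apple Silicon (it's /usr/local on Intel Macs):

```shell
# Prepend Homebrew's gnubin so unprefixed `ls`, `sed`, `tar`, ... resolve
# to the GNU versions; harmless where the directory doesn't exist.
use_gnu() {
  PATH="/opt/homebrew/opt/coreutils/libexec/gnubin:$PATH"
  export PATH
}

use_gnu
```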

I truly cannot stand “any reason why you don’t” questions. They immediately make me defensive.

I’m far more open to alternate approaches when they are laid out in a clear, detailed and unbiased way. Mutual respect and all that.

I am fine with "any reason why you don't" questions. Being asked one assumes that I am aware of the options, have gone through the pros and cons, and have made a decision on the matter. Clearly, I must have some reason for the decision, and so the person wants to know what reason that may be. Mutual respect and all that.

Laying out alternate approaches assumes that I haven't done any research into the matter at all. It may be that the reason I don't use something is that I have never heard of it, in which case having the alternate approaches laid out is useful. Otherwise, it is a waste of the person's time to lay out what may already be known.

The phrasing though sounds like I’ve better got to know about it “or else”.

Hmm, different connotations for different people, I guess. The phrasing to me sounds like a junior dev asking a senior dev why something was implemented the way it was, for the sake of learning. Thank you for letting me know how the phrasing could be interpreted, and I'll try to keep that in mind when asking similar questions in the future.

Is there a better way to phrase that question?

Have you considered installing GNU coreutils via brew et. al?

That is a great way to give the suggestion. It is not a great way to check what the reasons are why they did not.

Something like "Have you considered installing GNU coreutils? If so, I wonder what made you choose not to" might do the job. But really "Any reason why you didn't install GNU coreutils via brew" seems to directly ask the question.

It can come across a bit passive aggressive. But I don't think we should concede clear questions to passive aggressiveness.

"I would have gone with (installing MSDOS), what were your reasons for going the other way?"

What if, as a junior developer, you don’t have an opinion either way and just want to know how they do things?

Everyone has reasons - from "that's what the code I pasted from StackOverflow did" to "this is what the standards doc I wrote decided upon after months of rewriting"

Forcing you to examine your reasoning usually results in better understanding

Ah, what I had meant was in response to:

> I would have gone with (installing MSDOS)

what if the person asking the question wouldn’t have gone with any approach in particular? Do you find it more polite to suggest an approach nonetheless?

Yeah, there is almost always a better way than "you idiot, no one would do that." There needs, however, to be some Venn diagram overlap in ability and understanding - the discussion starts where you and they have a common shared understanding of the code, and you then expand their circle a little.

It's painful, and usually waaay easier to call people an idiot, so it's important to pay people to be teachers.

I could, but then I end up with everyone else on the team (also developing on Mac) being confused about why my shell scripts don't run on their computers.


I wouldn't worry about it. Maybe they thought they were upvoting in Linux.

Then tell them you installed GNU coreutils, and recommend they do the same.

Aside: has anyone else switched from Brew to MacPorts after one too many permission issues with Brew?

I tried Brew because it is always mentioned, but I went back to MacPorts because I found it easier (especially their version of flavors) and because of the permission issues.

Any particular reason why you don't just install BSD utils via pkgsrc et al?

Just out of curiosity: Have you considered creating a small Linux VM for your local development...? That might save some frustration.

Not OP, but normally a small Linux VM is going to be slow, and a big Linux VM is going to slow down anything else. It can become worth it, but it may also not be...

Slow for what...? That would depend very much on what kind of development one does. AFAIK there are lots of dev work that doesn’t really require much raw power from the dev environment. (Disclaimer: I’m not at all familiar with 'mumblemumble’s requirements.)

It depends a lot on your tech stack, but lots of techs require compilation of some kind, which is usually expensive both in terms of CPU and RAM. CPU time is sometimes shareable between your VM and your OS, but RAM usually isn't.

For my own tech stack for example, the most expensive part of a build in terms of compute resources is TypeScript compilation.

The difficulty of learning Linux doesn't lie in what flags you should pass to ls, or other GNU core utils idiosyncracies.

The difficulty lies in making sense of the stack of different pieces of software that make up a Linux system, and the terminology we use to communicate about them (e.g. what mounting a filesystem means, which was initially hard for me to understand, coming from Windows).

Also: most people think the terminal is something nerds use because they think it looks cool. They see that thing and it looks to them like a wall of Cantonese text — something they will never understand, partly because they think they can't, partly because they decided not to care.

In teaching the first steps should always be to make clear:

* why is this interesting, why would they care?

* why is this easier than it looks? (Take their fear away to get them started, reintroduce fear later as you see fit)

* show some examples that solve problems they always had, but never could solve any other way

Not all will learn, but even if you just managed to get people to understand why this makes sense and that it is totally possible to do things on your own, you succeeded.

The worst teachers in my life were those where I had to learn something incredibly complicated without having a remote clue what it was needed for.

my grossly over-simplified understanding of Linux (for the average end-user) if I had to "teach it" to somebody really quickly


processor + RAM in motherboard with hard disk and some kind of integrated graphics solution/external graphic card for video output, plus keyboard + mouse for input (hardware)

hit the power button, the motherboard initializes the processor and runs the BIOS, which jumps to the hard disk MBR (typically; ignoring CD/USB/net boot)

MBR jumps into bootloader on first sector on hard drive (GRUB or syslinux/extlinux) (this is my pre-EFI understanding)

bootloader loads kernel (with optional initrd) from hard disk into memory

kernel performs its own init, mounts root device (hard drive formatted in Linux-friendly file system like ext4 usually), sets up all hardware given bundled drivers/modules, calls /sbin/init or comparable

/sbin/init spawns TTYs, does network configuration, then starts the X server + a display manager (typically gdm, if I remember correctly, for GNOME?)

X has input drivers for keyboard + mouse, output driver to get graphics onto monitor(s) (not sure if Wayland finally killed X yet, gonna guess no)

from there, you can load /usr/local/bin/Chromium, which will create a window on DISPLAY=:0, and go on Hacker News (granted you have it installed with all of its wonderful dependencies, probably 300MB+ of executables, resources, and dynamic libraries)

the fact that there's 20 different ways to do a lot of the steps I just described is... interesting. the magic (to me at least) happens after /sbin/init. What all gets started? Why? I feel the latest Ubuntu probably spawns... 40-60 processes? systemd and this/that/the other.
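On a running Linux box you can peek at a few of these stages straight from procfs; a small sketch (assumes /proc is mounted, as on any normal Linux):

```shell
# What the bootloader passed to the kernel on its command line:
cmdline=$(cat /proc/cmdline)

# What PID 1 actually is (init, systemd, ...):
init=$(cat /proc/1/comm)

# Count processes by counting the numeric directories under /proc:
nprocs=$(ls /proc | grep -c '^[0-9][0-9]*$')

echo "pid1=$init processes=$nprocs"
```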

I kind of conceptualize Linux as an OS (as compared to, say, a microprocessor directly executing a single program) like this: in the kernel there is a big infinite loop, just like a videogame loop that "updates" everything. In this loop the OS iterates over a linked list of task structs (AKA processes in userspace) and either gives each process its fair share of CPU time according to some scheduler or skips over it if it is sleeping (which the majority are).

That's part of it. Now consider the CPU probably has multiple cores, so the "loop" as you have it, isn't a single thread. It's quite fuzzy.

Add in the many tricks CPUs employ, like branch prediction, re-ordering, et al. Things don't happen in sequential order anymore, and multiple chunks of that code are being operated on at the same time.

Even assembler is pretty much a high-level language nowadays, because once it's loaded in memory, lots of unseen permutations will happen to it at execution time. It's a language of macros after all.

A question of scope. Do you want people to understand every intricacy of the system? Do you want them to grasp the basic concept?

Or do you want people to be able to use it on a day to day basis?

These are very different courses

For my part I just want to answer basic questions that will help me understand the Linux kernel workings. The assembly and hardware details are a different, lower level issue.

CPUs are very constrained on the kinds of reordering they are allowed to visibly exhibit.

I agree with you but I feel like that isn't specific to Linux as an OS. Windows and Mac both have a kernel responsible for keeping processes ticking in an infinite loop (scheduling executing across threads) as well.

I think the original comment I was responding to was about the complications of Linux. To me... the complications of Linux aren't really in the 20+ years of "tech debt" that is "just make it boot and work for x86_64 like it used to work for i386", but moreso the 20-30 different configuration files and formats that happen after boot.

While it's a difficulty in understanding a Linux system as a whole, it means you can understand pieces without understanding the entire system. One of the major difficulties I've had learning about Windows systems is that it's one coherent system (as least compared to Linux), so it's hard to understand a component without also having to understand how it integrates with the entire system.

Just like the BSDs, mac, Android, ChromeOS,... there is a certain beauty that even if the whole design is a bit Frankenstein like, at least it was kind of thought out together.

> e.g. what mounting a filesystem means

Funny. I've been using Linux for 10 (?) years and I still don't know what mounting means. I just know it's something that needs to happen in order for me to access the contents of a disk. I don't really care about it either, I only care about accessing the data. Why isn't it automatic anyway? Windows does it automatically whenever possible, why can't Linux?

> Windows does it automatically whenever possible, why can't Linux?

Linux can do that as well, every desktop environment worth their salt does it by default.

To expand on this a bit, desktop versions of Linux will happily mount filesystems when they're attached. Server versions tend not to because with a server you likely have a specific use case for the device you just attached, maybe you want to use it for database storage, or extra swap space, or any number of other things. There's no intelligent way to divine that just from the fact a new drive was plugged in, so it doesn't.

Thats a part of it.

There is also no one canonical linux stack. Different distros do things differently (sometimes drastically so).

E.g. systemctl flags and argument ordering differs from sysv style /etc/init.d/foo commands (as just one example)

So to learn “GNU/Linux” really you need to learn several distributions and piece together the common components and why different approaches are better suited to different environments.
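As a hedged illustration of that fragmentation, the same "restart a service" operation is phrased differently per init style ("nginx" below is just an illustrative service name); a sketch that detects which style a machine speaks:

```shell
# Detect the init style; the service commands in the comments are what
# you would actually run on each kind of system.
if command -v systemctl >/dev/null 2>&1; then
    style=systemd     # would run: systemctl restart nginx
elif [ -d /etc/init.d ]; then
    style=sysv        # would run: /etc/init.d/nginx restart
else
    style=other
fi
echo "init style: $style"
```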

> There is also no one canonical linux stack. Different distros do things differently (sometimes drastically so).

Does this fragmentation help or hurt the ecosystem over the long-term? I personally think it's the #1 reason Linux never stole any sizable market share from Mac OS X/Windows. Hardware support just isn't there. Polish just isn't there. Instead I can name multiple init systems, multiple shells, multiple package managers all fighting for space in our "brains" to be remembered and used. :P

> I personally think it's the #1 reason Linux never stole any sizable market share from Mac OS X/Windows.

I assume you're speaking about the desktop. macOS never had any huge market share in this space. It's the iPhone (and its predecessor, the iPod) that allowed Apple to become as big as they currently are.

> Does this fragmentation help or hurt the ecosystem over the long-term?

Over the long-term it allows for abstraction layers, such as Ansible or Nix, to become status quo.

Most hardware systems are binary drivers that manufacturers don't want to open up in the first place. Think nvidia/broadcom

Yeah, but put yourself into the shoes of a hardware manufacturer: if you'd build a Linux driver, which version do you build against? There is no reference implementation that you might use as a "if it works here it should run on every Linux there is".

If we really want good hardware support on Linux, we need at least some reference that manufacturers can use to verify their hardware works.

That's not how it works. Linux devs will write high-quality drivers for free for any hardware that provides a spec.

The bad manufacturers like Nvidia and Broadcom intentionally block Linux compatibility for drivers they don't write themselves.

I'm confused. Drivers are kernel level right? All the distros use the same underlying kernel code (plus patches and binary blobs). But basically if a driver works for RHEL it will work for Ubuntu. Even better once the driver is mainlined in the kernel it will be supported going forward.

The problem for hardware manufacturers is that they think their drivers are valuable trade secrets and don't want to mainline them into the kernel...

The problem isn't distros, but different kernel versions. Linux has no stable driver ABI by design, so unless you open source all your code and get it in the mainline, you have to rewrite it for potentially every single kernel release.

Again, this is by design to try and force manufacturers to open their driver code. To me, that seems like a lost cause now thanks to the proliferation of binary blobs required for hardware anyway.

Source for it being by design?


> So, if you have a Linux kernel driver that is not in the main kernel tree, what are you, a developer, supposed to do? [...] Simple, get your kernel driver into the main kernel tree (remember we are talking about drivers released under a GPL-compatible license here, if your code doesn't fall under this category, good luck, you are on your own here, you leech).

In order for a driver to be in-tree, it has to be open.

Incidentally, this document is full of the kinds of arrogant and user-hostile arguments that the Linux community is known for, for example: "You think you want a stable kernel interface, but you really do not, and you don't even know it."

For what it's worth, backing your statement of "it used to be simpler" with an example of how easy it used to be to get the wrong answer does not make for a good argument.

The example provided: "how many files in a directory? just pipe `ls` to `wc` and because `ls` output is one file per line, you've got your answer!" Except newlines are totally valid characters to have in filenames. Try it:

    echo "foo" > 'test

    ls | less
    ls | wc -l
Pipe the output of `ls` to `less` to see the old-style one-line-per-file output. Then run `ls | wc -l` and see the wrong answer pop up.
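For contrast, a sketch of a count that survives such names, using NUL terminators rather than newlines (assumes a find with -print0 and -mindepth/-maxdepth, as in GNU findutils or the BSDs):

```shell
# Build a scratch directory containing a filename with an embedded newline.
dir=$(mktemp -d)
printf foo > "$dir/$(printf 'test\nfile')"
printf bar > "$dir/plain"

# Naive count: the newline inside the name adds a bogus extra line.
naive=$(ls "$dir" | wc -l)

# Robust count: terminate names with NUL and count the terminators.
robust=$(find "$dir" -mindepth 1 -maxdepth 1 -print0 | tr -cd '\0' | wc -c)

echo "naive=$((naive)) robust=$((robust))"   # → naive=3 robust=2
```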

Headline reaction: didn’t it used to be simpler to teach loads of things, especially using anything related to technology?

“Today in some versions of Linux ls puts single quotes around file names...”

I recommend a shell that’s intelligent enough to distinguish lines and present them downstream as strings including spaces. I hope not to come across such an odd implementation of ls ...

This is the default behavior in the GNU coreutils since version 8.25 so it will eventually be every major Linux distro.


I think the maintainers made a good decision with this change but this page exists because they received a lot of negative feedback. I’m kinda surprised to see people wanting ambiguous output.

> I’m kinda surprised to see people wanting ambiguous output.

It's visual noise, and because of column output I never found it ambiguous.

Not every major distro ships with coreutils.

It always surprises me when people are totally unreasonable. I guess I shouldn’t be surprised though.

The quotes are only used outside of pipes. One file per line, without quotes, has always been the behavior of ls when piped (but with shell globbing you rarely if ever have to pipe ls anyway).

Or rather, a newline after each filename. However, as filenames themselves can contain newlines, a single name can span several lines. These new fancy quoting styles actually finally make it possible to parse the output of ls (although I still can't think of a reason to actually do so).

Except filenames can contain quotes as well as newlines, so it is still not possible to parse the output of ls[1][2]. In fact, filenames can contain any byte except for NUL (the terminator) and forward slash (the directory separator). Dealing with this reliably means using NUL separators everywhere, as in `find -print0`, `grep --null`, or any of a shedload of similar flags. Unfortunately POSIX doesn't seem to support any of these, and it's unlikely that NUL support will be universally available everywhere (let alone implemented using the same flags).

[1] https://mywiki.wooledge.org/ParsingLs

[2] https://unix.stackexchange.com/q/128985/3645

Quotes are a non-issue. The classic problem is that we can't tell apart newlines that are part of filenames, and newlines that separate filenames; however, recent coreutils ls provides an alternate format where this is not an issue.

It's not pretty, and it shouldn't be done but it's doable (in many shells, but none of this is specified in POSIX):

  ls --quoting-style=shell-escape-always | while read -r escaped; do
    eval "filename=$escaped"
    echo "'$filename'"
  done

Doesn't work:

  $ cd "$(mktemp --directory)"
  $ touch $'foo\'\n\'bar'
  $ ls --quoting-style=shell-escape-always | while read -r escaped; do
  >   eval "filename=$escaped"
  >   echo "'$filename'"
  > done
There's no way to tell that that's one file rather than two. Seriously, parsing ls output reliably isn't possible. I've seen exactly one example of a legitimate ls in a script (just this week, for the first time in many years using Bash), and all that did was to check something like whether a directory was empty.

Are you sure it didn't work? How does the output look if you change the single quotes in "'$filename'" to some other character? (I mean, the point of this snippet is to parse the output of ls, i.e. assign each path to $filename, once, and then use it in some way. Of course the newline will still be there if you just echo the name.)

As per my original comment, literally any separator other than NUL (or slash if you're only ever dealing with filenames, not paths, even though that would be hella confusing) breaks with some filenames. Just use NUL.

With the given options, ls is no longer trying to separate "raw" filenames with any character. Rather, it encodes them unambiguously, and hence we can parse the newline-separated output just fine. You can see the same "trick" used in one of the links of your first comment (the one that warns against parsing ls output).

The example produces ambiguous output. But you can count the number of times echo was called to see that it worked.

    $ touch $'foo\'\n\'bar'
    $ ls --quoting-style=shell-escape-always | while read -r escaped; do
    >   eval "filename=$escaped"
    >   echo "[$filename]"
    > done

And this is why I find the idea of Powershell to be intriguing.

Some complexities added since the early days of Linux:

- systemd

- dbus

- disk encryption

- cgroups

Not even counting things that aren't technically part of Linux but are still quite common, like:

- configuration management tools (ansible, puppet, salt, etc)

- docker

- kubernetes

- every language's reinvention of 'make' and their own package managers

> - every language's reinvention of 'make' and their own package managers

Because those languages have a life outside Linux distributions, and even those cannot agree on which package manager one is supposed to use, nor on developer-specific packages that don't infect all users on the same machine.

And make is even outdated for modern C compiler toolchains with support for incremental compilation and linking. That is why everyone that cares about build performance is on ninja.

- every language's reinvention[s] of 'make' and their own package managers

.. you forgot the plural there, fixed it for you.

systemd and dbus aren't Linux features but userland software.

There are many Linux distributions that include neither of them, like Android and ChromeOS.


> Today in some versions of Linux `ls` puts single quotes around file names which contain white space likely in order to have those paths easier to copy and paste, but it does so only if !isatty().

I almost fainted. I believe this is a misprint: Linux puts the single quotes only if isatty().

The author's tweet, linked nearby, uses different wording and gets it right.

GNU isn't Unix, for one thing. The GNU project intentionally flouts historic behaviour in favour of pragmatism. I also find this line maddening:

>in some versions of Linux ls

Do you mean "in some configurations of GNU ls"? I wonder if educating people about computers these days, might have something to do with intentionally vague and gatekeeping language.

People dislike rms because he's seen as a weirdo, and too political, so GNU is hardly mentioned anywhere outside itself and outside FSF-allied ethical/political hacker circles.

Huh? GNU is hardly mentioned because Hurd was never finished. "Linux" is the name for (obviously) all the Linux based systems if you don't have a specific distro mind.

No one talks about an "Emacs" system or a "Firefox system" either. They talk about the relevant products where the label is disambiguating, like GNU Emacs.

The article is talking about GNU coreutils. If this article was talking about how Docker behaves, I'd hope the article would refer to it as "Docker" rather than "Linux".

It's one of the things I like about powershell, rather than piping "text" you pipe objects and it remains independent of the display formatting.

This is still the default behavior of `ls` in OpenBSD, very likely all BSDs. I also remember reading one man page on a Linux machine (which presumably is using GNU core utilities) that stated `ls` would always print one file per line if it was directed to anything other than stdout, e.g. a pipe.

I think you mean anything other than a terminal. Whatever its output is directed to, whether it's a pipe, a file, or anything else, is the stdout of the ls process.

Do those BSDs allow newline chars to appear in file names?

Yes, and they also don't add single quotes to filenames that contain whitespace characters when displaying them as mentioned in the article.

So then BSD's behavior isn't very useful then, right? You can't actually rely on 1 filename = 1 line.

No because in the case of the "ls" command, it replaces non-printable characters with a question mark '?', including newline characters. I believe the actual source code for this behavior can be seen here: https://github.com/openbsd/src/blob/master/bin/ls/utf8.c

So then BSD can't be guaranteed to accurately display the filename. Either way, you have to pick a side.

Can I hijack this to ask how best to 'learn Unix' (or Linux specifically, i confess I'm not actually sure on where the line is, my only non-Linux Unix experience is with Mac)?

I'd like to know more about namespaces, file organisation - what 'should' be where - processes and syscalls, etc.

(To be honest man pages just don't suit me for deliberate learning, I treat them the same as --help outputs, whereas for an answer to this question, for example, I'd find a textbook more helpful.)

For Linux, specifically, I can recommend the O'Reilly book "Linux in a nutshell". Now in its sixth edition, I have been recommending it for years and it's still my go-to book.

You learn it a little at a time as you need it. man pages are great. Did you know that there are different man page sections? `man 3`, for example, covers the programming man pages, mostly documenting the C libraries. There's also the `info` pages (or pinfo if you like vi key bindings). And `apropos` to help you find help.

Anyway, you don’t learn it before you start using it. The more you use the command line, the more you’ll find out that what you’re trying to do is easier or more scriptable there (with personal exceptions, like, I use a GUI file manager). You learn it as you need it.

Maybe I should have been more specific - I'm pretty comfortable on the command line, I use Arch, i3wm, pretty few GUIs.

I just feel there's a lot I don't know still, and is like to pick some more up preemptively rather than waiting until I need it and somehow guessing it does exist, or asking the right question to discover them as an answer.

Man pages are great when I want to know how to use something specific. I'm asking more for a textbook I can read cover to cover, and discover things I didn't know much or anything about, or hadn't thought to use.

Advanced Programming in the UNIX Environment by W. Richard Stevens and Stephen A. Rago.

As for the "file organisation", here is the Filesystem Hierarchy Standard (FHS): https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard.

There is also a lot of mention of "file organisation" in the book.

What’s weird is if I think about how I would learn C all over again, I know exactly what I would do. If I had to learn networking all over again, I would know what to do.

But Linux? I wouldn’t really know where to start. Maybe some old 80’s UNIX books. Out of newer books maybe one by Sobell or some of the O’reilly books. I think I tried learning from the “Unleashed” and SAMS books, which aren’t great.

There’s two things to learn. The technical aspect and all the commands. And then, perhaps more importantly, is the “UNIX philosophy”. I’ve seen too many systems administered and scripts setup like they were Windows systems.

The worst way to learn is by googling questions and topics, then reading the first page of search results which seem to be the same regurgitated cut and paste crap from the big “Linux admin” websites.

The AWK programming language, the Unix Programming Environment and the Oreilly Perl Books. For everything else, any serious distro provides specific man pages and handbooks.

Also learned a bit from the SAMS book, but can't recommend it

* https://linuxjourney.com/ for a nice overview

* https://github.com/learnbyexample/scripting_course/blob/mast... - my collection of resources, mostly for command line and bash scripting

* https://0xax.gitbooks.io/linux-insides/content/index.html linux kernel and its insides

Thanks - the 'journeyman' and 'networking nomad' sections of Linux Journey seem the sort of thing I'm looking for, if only they were a bit more in-depth.

If you want to really dive in you might want to check out linux from scratch. Or possibly arch linux which is a usable bleeding edge distribution that will require you to understand how to do the basics of maintaining your system.

Arch really clues you into how things work when you are used to point and click installs.

I do use Arch, suppose I ruined that joke by not mentioning it up front!

I found installing and using gentoo as my primary desktop to be a great learning experience. Linux From Scratch is often recommended, but is a significant jump higher in complexity.

I learned a ton from the first 6 chapters of "How Linux works: what every super user should know", 2nd edition.

A 3rd edition is coming out on March 2021.

"The Design and Implementation of the FreeBSD Operating System" is a great book I highly recommend.

I'm not really a fan of Plan 9 in general, but I like bringing it up when teaching UNIX. Its tools are usually very short, straightforward C.

And he didn't even go through the endless list of ls options; so much for the "each program does one thing well" cargo cult.

I honestly don't understand the nostalgia for the 1990s vendor-specific weird ass versions of Unix/similar stuff. All mutually different and incompatible and vendor-proprietary between IRIX, SunOS, HP/UX, ULTRIX, etc.

Linux, FreeBSD, OpenBSD and so forth are in a much better place now.

Red-Hat RPM, SusE RPM, DEB, tar.gz, Flatpak, Snap, X11, Wayland,....

Just another flavour of vendor-specific weird ass versions of Unix.

yeah, but even if I need to, I can easily run CentOS as a guest VM on xen, on a debian dom0.

not an uncommon way to run FreePBX for VoIP stuff which is based on centos.

A kludge that doesn't change the actual reality.

I was also referring to the general commonality of software, even if it might be packaged differently. If I do a yum install htop on a CentOS system, I'm going to get pretty much the same thing as apt install htop on a Debian or Ubuntu derived system.

Except if the said software tries to access folders or daemons that are distribution specific.

GNU is not Unix, the GNU utilities have all expanded drastically over the decades. Gnu ls has more LoC than the old Unix V6 kernel now.

BSD, on the other hand, IS Unix, and its utilities remain much smaller and leaner than the Linux ones.

if you mean "teach Unix" as "userland basic commands", I'm not sure, maybe, but Unix in general (including all flavors), device driver development, troubleshooting, no. If you are talking about end users being able to use Unix as their main desktop OS, the answer is "hell, no" too.

I'm working with BSD and Linux since the 90s, then around 2000-2005, I had to manage some IRIX, HP-UX and AIXs and they were all different, with different set of parameters for common tools or other specific management tools. I spent years to master shell scripting, hardening and troubleshooting on HP-UX and AIX.

Does anyone else remember the learn(1) program by Kernighan & Lesk? That's how I got started with Unix, and I sincerely think it was simpler back then.



What is Unix? Because the actual Bell-Labs Unix didn't sound easy to install or get ahold of.

Minix is trivially easy to work with but that's not Unix, just something similar.

Linux (with GNU or busybox) is super easy to find and often easier to run (both on qemu and on modern computers you can just stick the kernel somewhere the [virtual] machine can read it and then just power on.) Once you do that all the code is there. It's not unix though, it's not really even POSIX, it's just what everyone is currently doing with computers (but you really need terraform, docker, and node.js to have the complete picture.)

The author also tweeted a specific example before writing the blog post: https://twitter.com/jpmens/status/1310200703391068160 (direct link to image if you worry about twitter tracking pixels https://pbs.twimg.com/media/Ei7EmD8XgAIJjrd?format=png&name=...)

> likely in order to have those paths easier to copy and paste, but it does so only if !isatty().

I thought the reason was to aid shell scripts that assumed no whitespaces in file names, wasn't it? Also, I believe I've seen single quotes when using `ls` on a terminal, so the behavior is not only for `!isatty()`.

No, the reason is to aid human.

    $ touch 'a   b' c d
    $ ls
    'a   b'   c   d
Without quotes you would get something like

    $ ls
    a   b   c   d
and that would be confusing.

It wouldn't help shell scripts that assumed no whitespaces in file names anyway. You would get something like "'a", "b'", "c", "d" tokens which make no sense anyway.

The only sane way to iterate over files that I'm aware of is to use something like

    find . -type f -print0 | while IFS= read -r -d '' fname; do echo "fname: '$fname'"; done
But that's bashism...

All this is horrible incidental complexity stemming from absolute madhouse insane splitting and substitution rules in shell languages.

Splitting on spaces, expanding asterisks, shell scripting is a minefield. You would basically never want any splitting to happen. Applications like reading a CSV by splitting using IFS are dirty hacks, that break with the slightest additional file parsing complexity (escapes, quoting etc).

In Python you can just do `os.listdir('.')` and it will actually do what you want. No 5 layers of "oh, actually" and "yes, but what if", which inevitably happen in threads discussing the simplest shell operations. It just works as intended. If you only want the files and not the directories, you can do `filter(os.path.isfile, os.listdir('.'))`.

There is no reason to use shell scripts for anything more complex than a few lines or for one-off interactive work.

To be even more pedantic, you actually want to do

    while read -d '' fname; do
      echo "fname: $fname"
    done < <(find . -type f -print0)
Because if you wanted to export a variable from the loop body, it wouldn't work in your example -- the loop is running in a subshell when you pipe output into it. Process substitution (< <(...)) gets around this problem, and unlike <<<"$(...)" it actually streams the input and preserves the NUL delimiters (command substitution strips NUL bytes). For folks who suggested glob expansion -- that's okay for simple file listings, but find lets you do more complicated ones that can be quite useful. Not to mention that the suggested

    for fname in *; ...
Is not actually doing the same thing -- find lists files recursively.
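The subshell pitfall above is easy to demonstrate (a bash sketch; process substitution is the bash-specific way to keep the loop in the current shell):

```shell
#!/bin/bash
# A variable incremented inside a piped `while` loop is lost:
# the right-hand side of a pipe runs in a subshell.
count=0
printf 'a\0b\0' | while IFS= read -r -d '' f; do
  count=$((count + 1))
done
echo "after pipe: $count"                   # still 0

# Feeding the loop via process substitution keeps it in the
# current shell, so the variable survives.
count=0
while IFS= read -r -d '' f; do
  count=$((count + 1))
done < <(printf 'a\0b\0')
echo "after process substitution: $count"   # 2
```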

find(1) is super useful for any hierarchical or filtering case, and I use it often, but it can also be arcane and unwieldy. In many circumstances your cwd shell iterator can be simpler:

    for fname in *; do echo "$fname"; done
This skips dotfiles, of course, but that’s intentional. The point of dotfiles is to be skipped.

With bash or on GNU systems try

    printf "%q\n" "$fname"
instead of echo, to obtain an escaped string.
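For example (bash; the exact %q output format can vary between versions, but it always round-trips through the shell):

```shell
#!/bin/bash
fname=$'a   b\nwith newline'

# %q escapes the string so it can be reused safely as shell input.
quoted=$(printf '%q' "$fname")
echo "$quoted"

# Round-trip: eval-ing the quoted form recovers the original string.
eval "roundtrip=$quoted"
```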

I thought that it would iterate over space-separated tokens, but it works -- some magic here. Thanks, that's definitely the proper way to go.

It's because the result of a glob is not immediately subjected to shell expansion.

won't this echo "*" if there are no files in the directory?

I always have to look up the 'safe iteration' invocation when iterating files in a directory, because it involves jumping through a few more hoops than is really reasonable.

Yes. You can change the behaviour in bash by setting the nullglob option (shopt -s nullglob).
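A quick sketch of the difference (bash; run in an empty temp directory):

```shell
#!/bin/bash
dir=$(mktemp -d)
cd "$dir" || exit 1

# Default: an unmatched glob is passed through literally.
default_result=""
for f in *; do default_result="$f"; done
echo "default: $default_result"            # default: *

# With nullglob, an unmatched glob expands to nothing,
# so the loop body never runs.
shopt -s nullglob
nullglob_result="(loop not entered)"
for f in *; do nullglob_result="$f"; done
echo "nullglob: $nullglob_result"
```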

    find -exec


Won't work for complex loop body.

Isatty() is powerful. Powerful isn't always easy to work with. But, sometimes we want powerful. Like, knowing it's a tty so not echoing password. Or, Pike's plumber. Or maybe even what ls does?

I also got confused when it converted to more complex behaviour, but not everything about it is bad.

That's the problem with passing around unstructured text as opposed to data structures.

Of course, while it doesn't solve the entire problem, there are ASCII characters for things like "group separator", "record separator", and "file separator" that could be used for establishing structure in output. But they're not used often, and so we just parse plain text.
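As an illustration, the unit separator (0x1F) can delimit fields that themselves contain spaces and commas (a bash sketch; read's -r and a custom IFS do the parsing):

```shell
#!/bin/bash
# ASCII 0x1F ("unit separator") as a field delimiter: unlike a comma
# or a tab, it is vanishingly unlikely to appear in the data itself.
US=$'\x1f'
record="alice${US}New York, NY${US}engineering"

IFS="$US" read -r name city dept <<<"$record"
echo "name=$name | city=$city | dept=$dept"
```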

And, of course, it's valid to use these characters in filenames because apparently the idea of making any filename invalid is anathema.

Can you elaborate please? I'm curious.

Basically, most Unix utils output free-text data, and then one is expected to parse/filter that text to pipe into another util. PowerShell tries to operate on 'objects', not free text.

I'm well aware. I think the Unix approach might have worked back in the day, but it's kind of broken these days.

Maybe we shouldn't use complex structures if we don't really need them. Having, e.g., special characters that can't occur in normal strings -- only on boundaries -- makes parsing simpler. At the same time we remove the "hidden" layer of object organization, so whatever is seen in the shell can be byte-for-byte transferred to another computer and processed there.

I've seen arguments for object-oriented shells when we want to have more complex plumbing than traditional Unix shell piping, but not sure if those goals couldn't be achieved while remaining in text-only land.

Back in the day it was considered the only option for many applications.

It seems very questionable to be living with a severe restriction from the 1970s in 2020.

But... it can't be unbroken without breaking backward compatibility in all kinds of legacy code.

VMS had structured files in the 1970s. Plaintext streams were not considered the only option; this was a deliberate decision by the developers of unix because they regarded it as more flexible.

The trick is to create an alternative that can coexist.

Filenames can have newlines. If filenames were real string data structures, you wouldn't have the bad parsing assumptions people make: that file names contain no spaces (most realize they can), that they contain no newlines (many don't know they can), or that a file named '-rf' never exists in the current working directory (many people may not realize `rm *` may recursively and forcibly delete lots of things under a directory).
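The '-rf' hazard is easy to reproduce safely in a throwaway directory (LC_ALL=C pins the glob sort order so the '-rf' entry expands first; assumes a rm that accepts options after the command name, which GNU and BSD rm both do):

```shell
#!/bin/sh
export LC_ALL=C          # predictable glob sort: '-' sorts before letters
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir sub
touch sub/keep plain
touch -- -rf             # "--" ends option parsing, creating a file named -rf

# The glob expands to: rm -rf plain sub
# so "sub" is deleted recursively and only the '-rf' file survives.
rm *
ls -A                    # prints: -rf
```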

Newlines in file names screw up so many things (both on the command line and in visual output) that I think that anyone using them should just stop.

hah, I made a related mistake once: I had intended to move something from `file` to `~file`, and moved it to `-file` instead.

This breaks a remarkable number of things! I just deleted it from Finder, then remembered I could `rm '-file'`.

Typing those single quotes in a shell won't make any difference as far as rm is concerned (because it won't ever see them). However, some rm utilities are smart enough to remove files even if they resemble non-existing options. For others, the idiom is (either) "rm -- -file" or (preferably) "rm ./-file", although you can combine both of them.
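A sketch of those idioms in a throwaway directory:

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
touch -- -file

# The quotes are stripped by the shell before rm runs, so this
# still looks like an option to rm and typically fails:
rm '-file' 2>/dev/null || echo "rm refused '-file'"

rm -- -file              # "--" marks the end of options
touch -- -file
rm ./-file               # or hide the leading dash behind "./"
ls -A                    # directory is empty again
```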

A good example is a naive shell script that tries to parse the output of ls.


The official conclusion by GNU is that "ls" has already been a lost battle back in Unix's days when decisions were made to change its behavior between single-column and multi-column modes according to whether the output is a terminal or not, and since then more and more features have been progressively introduced. The mere existence of "ls" actually violates the GNU Coding Standard [0].

> Please don’t make the behavior of a utility depend on the name used to invoke it [...] Instead, use a run time option or a compilation switch or both to select among the alternate behaviors [...] Likewise, please don’t make the behavior of a command-line program depend on the type of output device [...]


> Compatibility requires certain programs to depend on the type of output device. It would be disastrous if ls or sh did not do so in the way all users expect. In some of these cases, we supplement the program with a preferred alternate version that does not depend on the output device type. For example, we provide a dir program much like ls except that its default output format is always multi-column format.

As a result, GNU provides a "dir" command since the early 90s, which is meant to be a consistent alternative to "ls". Yes, unlike what the myth said, it was not here to help CP/M or DOS users.

The issue of "ls" has been described in this StackExchange answer by Eliah Kagan [0],

> When its standard output is a terminal, ls lists filenames in vertically sorted columns (like ls -C). When its standard output is not a terminal (for example, a file or pipe), ls lists filenames one per line (like ls -1). When its standard output is a terminal and a filename to be listed contains control characters, ls prints ? instead of each control character (like ls -q). When its standard output is not a terminal, ls prints control characters as-is (like ls --show-control-chars).

On the other hand,

> Whether or not its standard output is a terminal, dir lists filenames in vertically sorted columns (like ls -C). Whether or not its standard output is a terminal, when dir encounters a control character or any other character that would be interpreted specially if entered into a shell, it prints backslash sequences for the characters. This includes even relatively common characters like spaces. For example, dir will list an entry called Documents backups as Documents\ backups. This is like ls -b.

[0] https://www.gnu.org/prep/standards/standards.html#User-Inter...

[1] https://askubuntu.com/questions/103913/difference-between-di...

Yes. The more important point of the article is that behavioral changes in *nix commands/utilities have enormous downstream consequences that often get ignored or lose out to some Platonic ideal of "better" or perfection



This appears to be quite a subjective claim; people comprehend/retain information extremely differently. Using the ls command as the example undermines the argument's credibility, as virtually every Unix user is aware of its importance. ls remains arguably the most fundamental Unix filesystem command -- why would it be difficult for users to understand its relevancy?

The author isn’t talking about teaching ls, but about using it to teach fundamental principles of Unix, in particular “do one thing well” in the context of piped standardised regular line-oriented text formats.

So ls is chosen because it is supposed to be the universally familiar archetype, but author is noting that corruption of the underlying principle in the default configuration of some variants & distributions has extended to something as basic as this.

Hey, it looks like you've copied more of your comment than you intended to ;)

I'm on Ubuntu 18.04 and ls doesn't output files in a multi-column format.

Before virtualization and containerization, UNIX machines needed careful love and feeding. But now they are just disposable. Configured by scripts, and thrown away when they are no longer needed.

I mean. Maybe? Ideally?

Mostly we’ve traded a class of problems (debugging unknowns and differences between systems) for a new class of problems (complexity of cloud providers, abstractions, and less debuggability).

Isn't that the UNIX philosophy, just developed further? Do one thing and easily compose, applied to hosts?

Contrary to what Docker and Kubernetes generation might think, we were already doing that in HP-UX and Tru64 glory days.

Turns out "retry, reboot, reinstall" was right all along.

in some ways it was simpler, but in a lot of ways it wasn't. reference materials back in the day were on actual goddamn paper, or in man/info pages, which seem weird and archaic nowadays. sure, maybe the tools were simpler, but they were sitting on top of a lot of complexity that you had to know about: fundamental hardware was more diverse in a way we largely don't have to worry about as often. docs seemingly haven't evolved as much though--i can't really point to an MDN of infrastructure knowledge, even if the old manpages are still around.

i'd say that the problem now is less a lack of simplicity and more that there just isn't a whole lot of interest in teaching practical usage and operation of the computer systems that underlie a lot of modern tech: from where i sit, /industry/ (individuals are whole different story) is interested in paying well primarily for new development both applications and infrastructure, but has decided that the quality, improvement, and maintenance are more of an afterthought. it's a microcosm of a larger human tendency to prioritize novelty, perhaps.

from where i sit, much of the opportunity for a good (well-paid) position is skewed towards greenfield development: if you can build new applications, or architect new infrastructure, great! industry loves you, and will pay you much dosh to do that. if you want to fix or improve existing things, or ensure that new stuff actually works, you're valued less.

QA/SDET as an attractive profession appears to have largely ceased to exist, cause you can just assign that work to the greenfield devs, and they can do it sorta--they'll probably focus more on new development, and the QA will suffer for it, but they'll produce enough testing and validation product to tick a box.

on the infrastructure end, architects are in great demand, but once that infrastructure is built, you don't "need" anyone to maintain or improve it, or diagnose issues as they arise. i've spent a lot of my career doing tech support work for infrastructure for odd career path reasons, and capable colleagues or counterparts are few and far between: the people i support, who maintain existing systems, seem to have not the slightest idea how they or their underlying protocols work, and other people in support roles seemingly sit on a binary divide of incredibly capable technical diagnosticians and troubleshooters and people who have been shunted into the role for lack of ability to perform in greenfield work, but can't perform in other roles either.

management has often asked me why they can't find people who are good at maintaining infrastructure or who are able to diagnose and address emergent problems, but the root of this seems quite obvious. it's not that the systems have grown too complex: they've always been complex, and while the locus of that complexity has perhaps shifted somewhat, i don't think our forebears were operating in some simple, easily-understood world that has ceased to exist. the problem is that nobody wants to pay anywhere near for an advanced brownfield skillset compared to what they'll pay for adjacent skillsets in greenfield work. the smart and capable people recognize this and move towards greenfield work even if they don't like it as much, and brownfield work is left with a sea of people who couldn't transition and can't deal with the complexity because they never could. the complexity or difficulty ain't new, but all the people that could deal with it were driven out.

My first foray into seeking linux help on the web, circa 1996

  me> hey how do you bla
  guru> rtfm
  me> wtfm
  guru> $ man bla
  me> oh shit thanks
To this day, I still use man on a near daily basis.

The BSD/Solaris man pages were always significantly more useful because they incorporated practical examples; GNU systems had a philosophy of putting more of that material in info instead. I found this pretty tedious, as anything arcane can be hard to derive from invocation specifications without examples.

Sounds like Arch.

In 1996??? No, it was redhat.

>if you can build new applications, or architect new infrastructure, great! industry loves you, and will pay you much dosh to do that. if you want to fix or improve existing things, or ensure that new stuff actually works, you're valued less.

Let the old flesh die. Maintaining it is Sisyphean, and it should have been written in Rust.

>the smart and capable people recognize this and move towards greenfield work even if they don't like it as much

People can actually like maintaining enormous, C/C++ legacy codebases instead of greenfield work in safer and smarter and easier-to-use technologies?

Masochists, I guess.

> People can actually like maintaining enormous, C/C++ legacy codebases instead of greenfield work in safer and smarter and easier-to-use technologies?

in a sense, yes? it's not masochism, it's perhaps recognition that sometimes it does make sense to recognize that old systems are imperfect, but weren't built entirely out of toothpicks and gum, and can be retrofitted to remove their worst parts, and that strong parts do exist and can be made stronger?

the alternative seems to be that we continuously build new toothpicks and new gum, but that somehow those will better by virtue of having been forged in a modern toothpick and gum factory that puts out perfect new shiny. it definitely has no faults of its own to be discovered several years down the line, nope.

I patch that behavior out of the GNU core utils on my Gentoo machines, and set the quoting style properly everywhere else.

This is one of the worst behavior changes ever to come out of GNU. It should never have been the default.

As someone who actually likes the new behaviour[+] would you mind elaborating why you dislike it so much? The only case I can think of is copying filenames into a GUI or something which doesn't support escaped paths, but the previous behaviour wasn't much better because you couldn't be sure what the full filename was.

[+]: Imho it makes it easier to copy-paste paths, as well as to spot whitespace / strange characters in filenames.

It's not about liking the behaviour, it's about the usual UNIX nonsense of arbitrary breaking decisions being made on the hoof for whimsical reasons - everywhere. Because no one cares enough to cooperate with others on these things outside of their own little sandpit.

Path handling is an important feature. It should be standardised and predictable. It doesn't even matter how it's standardised; what matters is that everyone uses the same system so there are no random surprises or thwarted expectations.

In a robust OS everything would be a lot more interoperable and standardised than it is in UNIX. Being able to pipe things around is not the killer feature it might be - not if you have to waste time pre/post translating everything for arbitrary reasons before you can do anything useful with it.

The main way you'd be interoperating with ls is through pipelines, and GNU ls doesn't do any pretty-printing of paths when used in a pipeline (besides, arguably the correct way of handling paths that may have newlines or spaces embedded is with `find -print0` but that's a whole different topic). Or is there some other aspect of interoperability you're referring to?

But if we're being honest, path handling (as well as structured data) in shell scripts and pipelines has always been one of the largest trash-fires in Unix -- while I don't personally like how PowerShell solved the problem on Windows, at least they tried to solve it.

If you use open source software long enough, you will eventually be disappointed in a design decision. In this case, it was a big surprise when a tool I'd been using since the early 1990s suddenly changed its output with absolutely no warning. This is both an expected result from using Gentoo, and simultaneously very disappointing.

I don't think I was the only person unhappy with this. The fact that https://www.gnu.org/software/coreutils/quotes.html exists seems to indicate that others feel as I do.

Furthermore, I was disappointed by the reaction from both the developers and other people leaping to their defense who felt that they'd been personally insulted by users suggesting that this may have been better as a non-default option. If I can set QUOTING_STYLE=literal everywhere, surely the distro maintainers who wanted this could have set QUOTING_STYLE=shell-escape?

I'd be the first to say that everyone is free to disagree with me. I have the source, I have a workaround, I adapt.
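For what it's worth, the QUOTING_STYLE escape hatch mentioned above can be sketched like this (assumes GNU ls; BSD ls ignores the variable):

```shell
#!/bin/sh
dir=$(mktemp -d)
touch "$dir/a   b"

# QUOTING_STYLE sets the default quoting for GNU ls output.
lit=$(QUOTING_STYLE=literal ls "$dir")
echo "literal:      $lit"              # a   b

esc=$(QUOTING_STYLE=shell-escape ls "$dir")
echo "shell-escape: $esc"
```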

I have been disappointed in FOSS design decisions before, but I've also been on the other side of such decisions as a maintainer -- and it's often the case that users aren't aware of all of the trade-offs that go into making "obvious" decisions. I try to be more understanding these days as a result -- yes, sometimes maintainers are wrong but all things being equal they probably know better than you or I what the right decision is.

For instance, one could argue that hiding the behaviour behind a flag makes the feature effectively useless (users that would benefit most from it would never know about the flag, and users who know enough to find the flag probably know about `find -print0` too). Punting the problem to distributions just means that everyone who is against the feature on general principle will now hound distributions for making the change (probably making arguments like "why are you making yourself incompatible with Debian X or Ubuntu Y.Z?") -- and will also result in the feature being unused and thus useless.

Now, is that enough of a reason to make a change to the default behaviour? I don't know, but to me it doesn't seem as though the right decision was "obvious". And again, the behaviour is only different when the output is displayed on an interactive terminal -- so the only breakage is the interface between the screen and your eyes.

The ls quotes behaviour still pisses me off. One time I let off steam by tweeting about that stupidity and @-ing the maintainer of that package. Of course he defended that decision and moaned about how his open-source maintenance work is under-appreciated and all he gets is abuse. It wasn't a productive conversation, but I realised the more he got criticized, the more he was going to defend that the decision wasn't wrong. (It is, though.)

Whatever you hoped to achieve, that was really counterproductive. You made someone who volunteers to maintain open source enjoy that just a little less, and accomplished nothing. Even the way you describe the interaction here makes you come across as deeply entitled and unpleasant.

I agree with this so much. What an unaware individual, to only just realize that calling someone's decisions stupid isn't a great way to get a change to happen.

This was many years after they added this change, and there was also a lot of debate/abuse when it came (I didn't take part), so I wasn't aiming to change minds. Fine, call me entitled, but stupid technical decisions are still stupid, and people shouldn't just be left to "enjoy" their work after doing something stupid affecting all Linux systems on the entire planet.


So you felt the need to rant at someone, knowing that it wouldn't be productive and that they'd experienced abuse over this in the past, but did it anyway? Why do you feel that's an appropriate way to act toward another person?

Ignoring that FOSS developers are basically working in the public good (and usually unpaid or underpaid relative to their impact), this is a childish way of acting towards anyone in an even remotely professional environment. The maintainer replying to you was actually a courtesy, but of course you see it in a negative light.

If every technical disagreement you have ends with you ranting/abusing the other person, you'll quickly discover you're the only one left in the room.

Behavior like yours is why we need codes of conduct. Did you stop to think that you can fix this yourself, either by recompiling ls itself or through a simple alias?

The quotes are actually a really useful feature. I generally try to avoid putting spaces in filenames, but it can be unavoidable. It helps when copying and pasting file names to other commands.

I'm happy that you've realized that the conversation wasn't productive and that it ended up entrenching the maintainer.
