That's my primary issue with UNIX culture. It took a huge step backwards by deciding to work with unstructured text. It wasn't a wrong turn, mind you - it was backtracking past known and understood best practices, and then picking a wrong turn. And only now do people seem to be rediscovering what was in common use in the era before UNIX - the virtues of structured text.
I guess our industry is meant to run in circles, only changing the type of brackets on each loop (from parens to curly on this iteration).
It's not like unstructured piped text is the only possible way to work. It's widely used precisely because it's so expedient. If you use structured data, then every program in the sequence has to understand the structure. If you just smash everything flat into a stream of text, you can then massage it into whatever form you need.
It's not always the best way to approach a problem, but it's not meant to be. It's duct tape. You use it where it's good enough.
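For instance (a sketch; ps column layout varies a little between systems), counting processes per user is nothing but flattening text and re-aggregating it:

    # skip ps's header row, keep the user column, count duplicates, rank by count
    ps aux | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn | head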
The way I see it, shell scripting and piping allow non-programmers to get their feet wet, one command at a time.
You run a command, look at the output; now you know what the next command in the pipeline will see, and you can adjust as needed.
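For example (a made-up session), the pipeline grows one stage at a time:

    du -s */                         # run it, eyeball the raw numbers
    du -s */ | sort -rn              # looks right, now sort biggest-first
    du -s */ | sort -rn | head -3    # and keep just the top three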
PowerShell etc. seems more programmer-oriented, in that one keeps thinking in terms of variables and structures that get passed around.
And this seems to be the curse of recent years. More and more code is written by programmers for programmers. Likely because everyone has their head in the cloud farms and only the "front end" people have to deal with the users.
UNIX came when you risked having admins and users sitting terminal to terminal in the same university room. Thus things got made that allowed said users to deal with things on their own without having to nag the admins all the time.
UNIX actually came when users were programmers at the same time. There was an expectation in the past that using computers involved knowing your way around your OS and being able to configure and script things.
If you use structured data in a standard format, you can have a single system-wide implementation of a parser and then have each program process the data it needs, already structured in a semantically-meaningful way.
In the current unstructured-text reality, each program has to have its own (usually buggy, half-assed) shotgun parser, and it has to reintroduce semantic meaning to the data all by itself. And then it destroys all that meaning again by outputting its own data as unstructured text.
It works somewhat OK until some updated tool changes its output format, or until you try to move your script to a different *nix system.
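A small illustration: getting a file's size by shotgun-parsing ls versus asking for the field directly. Even the direct way differs between systems, which is rather the point:

    size=$(ls -l file.txt | awk '{print $5}')    # fragile: depends on ls column layout and locale
    size=$(stat -c %s file.txt)                  # GNU coreutils; BSD spells it stat -f %z file.txt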
But if your data is structured in a semantically meaningful way, then your receiving program needs to understand those semantics. Maybe you could introduce new command line tools to refactor streams of data so as to adapt one program to another, but I can't see it being simpler and quicker (in terms of piping input from one place to another) than the current approach.
I do like the idea of a standard parser to avoid the ad-hoc implementation of parsers into everything.
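jq already plays something like that adapter role for JSON; a sketch (the URL and field names are made up):

    # reshape one producer's JSON into the two columns the next stage expects
    curl -s https://api.example.com/items | jq -r '.[] | [.name, .size] | @tsv' | sort -k2,2 -rn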
Your last comment gives a hint at the real problem, which is people using command line hackery in 'permanent' solutions. It's duct tape. You don't build important system components out of duct tape. Well, you shouldn't, anyway.
Well, I'm advocating for structured text, not binary. Mostly because I haven't seen a future-proof binary format yet, and editing binary formats would indeed require special tooling. For a data exchange protocol meant to be used between many applications, I think going with structured text instead of binary is a worthwhile tradeoff: slightly lower efficiency for much better accessibility.
EDIT: Some comments here are slowly making me like the idea of a standard structured binary more and more.
Let's say I want to know how many days it's been since an asset on my web server was modified. With bash + some standard unix tools, off the top of my head I have to do something like this:
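    # (sketch - the URL is illustrative) fetch the headers, fish out Last-Modified
    curl -sI https://example.com/asset.js | grep -i '^last-modified:' | cut -d' ' -f2-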
And that's just to get the last modified date in text form. Now I'm writing a script that parses that date, gets today's date, converts both to days, and subtracts. YUCK!
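Roughly like this (a sketch; date -d is GNU, BSD date would need -j -f instead):

    last_modified=$(curl -sI https://example.com/asset.js | grep -i '^last-modified:' | cut -d' ' -f2- | tr -d '\r')
    # tr strips the CR that HTTP headers carry - exactly the kind of gotcha being complained about
    mtime=$(date -d "$last_modified" +%s)         # parse the RFC 1123 date into epoch seconds
    echo $(( ( $(date +%s) - mtime ) / 86400 ))   # age in days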
Config files are less of a problem. The issue is with programs you want to use with pipes. Each has its own undocumented, arbitrary pseudo-structure, often with inconsistent semantics, optimized for viewing by a user.
systemctl status wibble.service : Displays human-readable information, with colours, line drawings, and variable-length free-form text that is exceedingly hard for a program to parse reliably.
Contrast with
systemctl show wibble.service : Outputs machine-readable information, in systemd's favourite format, that of a Windows INI file.
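Which makes extraction a one-liner; systemctl can even hand you a single property (wibble.service being the placeholder name from above):

    systemctl show wibble.service -p ActiveState --value     # prints e.g. active
    # or with plain text tools, since the output is stable key=value lines:
    systemctl show wibble.service | grep '^ActiveState=' | cut -d= -f2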
Yikes. Makes me glad I run Slackware. Though I seem to recall from The Init Wars that it was precisely this quality of systemd that made people lob the charge that it violated the "Unix philosophy".
Users are enjoined, over and over again, not to try to parse the output of any of those four (particularly ls). That is, those are the tools specifically not meant to be connected by pipelines, but merely used for operator convenience.
It's only "specialized" because we haven't been doing it, so it's considered special.
At some point you have to admit that what's meant for the computer is not always byte by byte the same as what's meant for the human.
We try to shove these two together and we screw up both of them.
Empower the computer to be the best that it can be by taking the human out.
Empower and respect the human by giving them their own representation.
The "I just want to read the bytes that are on disk" philosophy is inherently limiting and broken when the audience is really two very different things (humans vs computers).
My argument is that instead of fighting that we must embrace it.
Yes, because it:
a) is inefficient;
b) invites a lot of wrong assumptions about the storage format, e.g. about the underlying grammar, the maximum length of tokens, or the possible characters that could occur in the file (see the sketch after this list);
c) requires you to solve the same problems over and over again (how to represent nested lists, graphs, binary blobs, escaped special characters, etc.);
d) encourages authors to roll their own format instead of researching whether an existing format would already solve their case.
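For a taste of b): filenames may legally contain newlines, which silently breaks every line-oriented pipeline that assumes they can't (sketch, in an empty scratch directory):

    mkdir /tmp/demo && cd /tmp/demo
    touch $'report\n2024.txt'                       # one file, with a newline in its name
    ls | wc -l                                      # says 2: the line-based "format" just lied
    find . -type f -print0 | tr -cd '\0' | wc -c    # NUL-delimited: correctly says 1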
I agree with you that the other extreme - treating binary files as opaque black boxes that you can only access with specialized tools belonging to your application - is even worse. But I don't see why we can't reach some middle ground: have a well-documented, open, binary format that encodes a very generic data structure (maybe a graph or an SQLite-style database) and a simple schema language that lets you annotate that data structure. Then you can develop generic tools to view/edit the binary format even though every application can use it in its own way.
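SQLite is arguably the closest existing thing to that middle ground: the file format is documented and stable, and one generic tool can open any application's database (the table and column names below are hypothetical):

    sqlite3 app.db '.tables'                           # inspect any app's structure, no app-specific tooling
    sqlite3 app.db '.schema assets'                    # the schema makes the data self-describing
    sqlite3 app.db 'SELECT name, mtime FROM assets;'   # hypothetical table and columns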
Parsing structured text is slow and inefficient. Also, reading out just one part of a data structure stored as text often requires either walking through the file character by character or first slurping the whole thing into memory.
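For example, pulling one record out of a big JSON file still means parsing the whole file, whereas an indexed binary store can seek straight to it (file and field names made up):

    jq '.users[] | select(.id == 12345)' big.json            # jq has to read and parse all of big.json
    sqlite3 big.db 'SELECT * FROM users WHERE id = 12345;'   # can follow an index and touch a few pages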
Let's not forget that when your system crashes, all of these easy-to-read text files are actually stored in a binary format, sometimes scattered in pieces, and require special tools to extract.