The Rule of Silence (2006) (linfo.org)
338 points by riansanderson on Dec 13, 2016 | 310 comments



To play devil's advocate, part of the reason things like The Rule of Silence are talked about is because of the messy unix philosophy of treating everything like plain text.

If structured data was embraced we would have developed appropriate tooling to interact with it in the way that we prefer.

This runs very deep in unix and a lot of people are too "brainwashed" to think of other ways. Instead they develop other exotic ways of dealing with the problem.

Oh you don't like that output? Easy! pipe that crap into sed then awk then perl then cut then wc and you're golden!

When you get to that point you have to understand that you have chosen to ignore the fact that the data you are dealing with must be represented in something much closer to a relational database than lines of ASCII text.

Logging is another area where you see the consequences of this. A log is not a line of text. Repeat after me: a log entry is not a damn line of text.
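
A minimal sketch of what treating a log entry as a structure could look like, assuming JSON as the on-disk form. The field names (`level`, `ip_addr`, `status`) and the helpers `format_entry` and `select` are hypothetical, chosen only for illustration. Each entry is still written as exactly one line, so line-oriented tools keep working, but a field-aware filter avoids the false positives of a substring match.

```python
import json

def format_entry(entry: dict) -> str:
    """Serialize one structured log entry as a single line of JSON."""
    return json.dumps(entry, sort_keys=True)

def select(lines, **wanted):
    """Yield parsed entries whose fields equal every keyword given."""
    for line in lines:
        entry = json.loads(line)
        if all(entry.get(key) == value for key, value in wanted.items()):
            yield entry

# Hypothetical entries; the field names are illustrative only.
log_lines = [
    format_entry({"level": "error", "ip_addr": "10.0.0.7", "status": 500}),
    format_entry({"level": "info", "ip_addr": "10.0.0.500", "status": 200}),
]

# A plain substring search for "500" matches both lines (status and address).
substring_hits = [line for line in log_lines if "500" in line]

# A field-aware filter matches only the entry whose status is 500.
errors = list(select(log_lines, status=500))
```

Here the substring search returns both lines, while `select(log_lines, status=500)` returns only the first entry.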

"Oh but isn't it neat you can pipe it to grep?" NO! No it's not neat, maybe it was neat 20 years ago. Today I want that damn data in a structure. Then you can still print it out in one line and pipe it to grep all you want.

Another area where you see the unfortunate side effects of this philosophy is the mess of file-based software configuration.

Yes I get it, you like your SSH session and Emacs/Vim blah blah but that's short-sighted.

I want my software configuration stored in a database not in a bunch of fragile files with made up syntax that are always one typo or syntax error away from being potentially silently ignored.

The fetish for easily-editable ASCII files and escaping from structure is holding us back. Structured data does not automatically imply hidden and inaccessible, that's a matter of developing appropriate tooling.


In 1993, someone made exactly your argument in a Redmond board room, and so many people agreed that what you describe could be adequately called "The Windows Philosophy".

All settings in a database and not text files (the registry); a command-line that pipes data, not text (PowerShell). Tailored UIs to change settings, not magic invocations and obscure text file syntaxes.

I guess most developers on HN are also aware of the downsides of this philosophy. If not, try configuring IIS.


Surely you can reconcile structured representations and something like the Unix command line.

Imagine if the default wasn't bash, but something like Ruby + pipes (or some other terse language).

What is the argument for shell scripts not working on typed objects? How much time has been lost, how many bugs have been created because every single interaction between shell scripts has to include its own parser. How many versions of "get file created timestamp from ls" do we need?

Something Windows does get right is the clipboard. You place a thing on the clipboard, and when you paste, the receiving program can decide on the best representation. This is why copy-pasting images has worked so magically.

I could see an alternative system where such a mechanism exists for shell programs.


Wow, the clipboard is really a thought-provoking comparison. I'm not sure if many people are quite aware of what you said, unless they've done desktop programming: when an application puts something on the clipboard, it can put multiple formats, so that when something else wants to retrieve it, it can use whichever format it prefers. This is how you get such good copy/paste interoperability between programs.

What if pipes worked the same way? What if we added stdout++, stderr++, and stdin++, and when you write to stdout/err++, you can say which format you're writing to, and you can write as many formats as you like. And then you can query stdin++ for which formats are available, and read whichever you like. And if stdin++ is empty, you could even automatically offer it with a single "text" format, that is just stdin(legacy).
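
As a rough sketch of the idea, here is the "offer several formats, let the reader choose" model in miniature. Nothing here is a real pipe API: the `Offer` class, its method names, and the format names `json` and `text` are all assumptions made for illustration. The `from_legacy_text` constructor models the backwards-compatible case where only plain stdin exists.

```python
class Offer:
    """A payload offered in several formats at once, clipboard-style."""

    def __init__(self, formats: dict):
        # Maps a format name (e.g. "json", "text") to the rendered payload.
        self.formats = dict(formats)

    @classmethod
    def from_legacy_text(cls, text: str) -> "Offer":
        """Wrap plain stdin-style text as an offer with only a "text" format."""
        return cls({"text": text})

    def available(self):
        """List the format names the writer chose to provide."""
        return list(self.formats)

    def read(self, preferred):
        """Return (name, payload) for the first preferred format on offer."""
        for name in preferred:
            if name in self.formats:
                return name, self.formats[name]
        raise LookupError("no acceptable format on offer")

# A new-style writer offers two formats; an old-style writer offers only text.
rich = Offer({"json": '{"ip_addr": "10.0.0.7"}', "text": "10.0.0.7"})
legacy = Offer.from_legacy_text("10.0.0.7\n")

# The same reader works against both, preferring structure when available.
rich_choice = rich.read(["json", "text"])
legacy_choice = legacy.read(["json", "text"])
```

The reader gets JSON from the new-style writer and falls back to plain text from the legacy one, which is the gradual-adoption property the comment is after.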

The appeal of the Unix text-based approach is a kind of "worse is better". It is so simple and easy, compared to Powershell. The clipboard idea seems like it has a similarly low barrier to entry, and is even kind of backwards-compatible. It seems like something you could add gradually to existing tools, which would solve the marketplace-like chicken-and-egg problem.

You could even start to add new bash syntax, e.g. `structify my.log || filter ip_addr || sort response_size`. (Too bad that `||` already means something else....) Someone should write a thesis about this! :-)


Didn't the Amiga do something like this? (I'm not actually familiar with its OS, I've just seen allusions to how it handled file formats)


Amiga had datatypes.library, and also IFF filetypes (that were mostly lovely) designed by EA for DPaint. It's the way the world should work.

Let's say you're MS writing word for the Amiga.

They provide a datatypes description for doc files. This gives the ability to read and write the format, and fingerprint it (not based on extension).

Now any program, old or new, that wants to read or write doc files can do so. It's just there.


The problem is that now simply providing user-friendly output is not enough. For every program or script you throw together you need to provide the text output for the user, and then the type object stream for piping. And then, the user would need to read documentation to see how to access each piece of data, what data type it is, what data type the other command takes, and maybe consider how to convert one to the other.

... at which point you basically have a scripting language, so you could just as well use an existing one (e.g. Ruby).


> The problem is that now simply providing user-friendly output is not enough. For every program or script you throw together you need to provide the text output for the user, and then the type object stream for piping.

In PowerShell, if the result of a command is just an object, the object is pretty-printed to the console, which in practice ends up looking pretty much like what a Unix command would have given you.


With generic pretty-printing, your program output becomes generic.

Compare the output of "df -h" vs the PowerShell equivalent "gdr -psprovider filesystem", for example. One provides the data in dense (easy to follow) rows, while the other spaces it out across the whole screen, leaving large gaps of empty space around some columns while also cutting off data in others. The difference is especially noticeable if you have network shares with long paths.

PowerShell is probably nice for scripting, but I wouldn't want to have it as my shell.


You can pretty easily pass it to select and get just the properties you care about, or you could output to a different format with the various out- commands. I find it pretty good as a shell.


Options to select output? Sounds like advice from the Rule of Silence.

But.

- People above were just complaining about having to use sed to tweak output. I don't see why they would prefer a built-in filter to an external one. The external filter is far more flexible, and if that isn't enough, you can replace it.

- I'm generally not a fan of applications that tailor their output to what they think the human wants. Unless there is a deeply compelling reason, I want to see the same output on pts/0 as something down the pipeline. The reason for this is that picking up environmental hints to serve as implicit configuration is hacky, subject to error and can later be the cause of really difficult-to-find bugs.

Perhaps I'm just irredeemably brainwashed. If you like a typed command line, Redmond has your back. For me, wanting types is a signal that I should start considering whatever little shell hack I'm working on complex enough to take it to a language that wasn't designed for ad hoc interactive use.

And at the same time, I really, very much do not want my command line to look like C#.

I get that a lot of folks these days are mostly GUI users who maybe type a little at git or run test suites from the command line, and not much else. I get why things like Powershell are appealing to such folks[1]. But when the command line is your primary interface, strong typing and OO hoop-jumping is a huge waste of cognitive energy.

I do feel that Unix, to a first approximation, got the balance somewhere close to right. Loose coupling with a lot of shared commonalities instead of a rigid type system and nonexplicit magic works really well for me, and if tighter coupling is a good idea, then I'll build it.

[1] Why they want to radically change the command line instead of using their favorite language to do systems stuff from the comfort of wherever they spend most of their time, I do not get.


I feel compelled to point out, re-reading this, that you've misunderstood. You pass the output to the Select command and indicate the object properties you want; it's not a feature that has to be built into each command.


> People above were just complaining about having to use sed to tweak output. I don't see why they would prefer a built-in filter to an external one.

You don't see why someone would prefer 'select' to scraping with grep or awk?


Yeah, wow, all I have to do is write a bunch of regular expressions tailored to the unique output of this command instead of passing in a list of property names. It's so easy.


To add to that: a list of property names you don't even have to exactly know beforehand, since the shell can easily deliver you the exact data structure and metadata of any object (cmdlets, parameters, results,...) usable on the shell.

I find the idea baffling that basically having to have the whole syntax tree memorized instead of the shell providing it for me is somehow less of a "waste of cognitive energy".


I really think people are mistaking "I'm not familiar with it" with "it's bad and poorly designed" because they've forgotten what it was like to first use the Unix command line.


Slightly more readable posh version:

    Get-PSDrive | Where Provider -like '*FileSystem'


> in practice ends up looking pretty much like what a Unix command would have given you.

So... no real improvement, then?


The improvement is that piping commands together doesn't require tedious text-munging while the console output looks basically the same. Which I think you could have figured out if you thought a little harder before jumping to snark.


>What is the argument for shell scripts not working on typed objects?

Typed objects can make it harder to pipe commands together. How do you grep a tree when tree is an actual data structure and grep expects a list of items as input? You would need to have converters. Either specific converter between tree and list, or a generic one: tree->text->list.

>Something Windows does get right is the clipboard.

It's useful, but the actual implementation is pretty bad. Opaque, prone to security issues, holds only a single item, cannot be automated.


> Typed objects can make it harder to pipe commands together. How do you grep a tree when tree is an actual data structure and grep expects a list of items as input? You would need to have converters. Either specific converter between tree and list, or a generic one: tree->text->list.

To be fair, untyped objects also require converters, but at every boundary. That is, instead of having some pipes of the form `program -> mutually agreeable data structure -> program` and some pipes of the form `program -> unacceptable data structure -> parser -> program` (as happens with a typed language), you are guaranteed by a text-based interface always to have pipes of the form `program -> deparser -> text -> parser -> program`.


grepping a tree would not be hard. You simply go down to the leaves and "grep" on the properties themselves. Inversely, structured data lets you easily get to a property.
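
A small sketch of "go down to the leaves and grep on the properties", assuming the tree is an ordinary nesting of dicts and lists. The function name `grep_tree` and the sample data are made up for illustration. Each match is reported with the path that leads to it, which is the part plain text cannot give you.

```python
def grep_tree(node, needle, path=()):
    """Yield (path, leaf) for every leaf whose text contains needle."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from grep_tree(child, needle, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from grep_tree(child, needle, path + (index,))
    elif needle in str(node):
        # A leaf: compare against its text form, as grep would.
        yield path, node

# Hypothetical tree; keys and values are illustrative only.
tree = {
    "etc": {"hosts": "127.0.0.1 localhost", "motd": "welcome"},
    "tmp": ["localhost.log", "core"],
}

matches = list(grep_tree(tree, "localhost"))
```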

For example, if you grep the output of ls -l for a file named "1", you'll also get files with 1 in their timestamp. In text land, you have to edit the ls command to get simpler output. In structured land, you could edit your filter: ls -l | grep name~=1

You could imagine various structured data filter tools that could be built that wouldn't require modifying the input.

Though in this example, you can easily use awk to select the column, wouldn't it be nice to not have to worry about things like escape characters when parsing text?
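
To make the `name~=1` example concrete, here is a sketch comparing a substring match on the rendered line with a match on the name field alone. The `FileEntry` record and the `as_text` rendering are hypothetical stand-ins for what `ls -l` produces; they are not its real output format.

```python
from dataclasses import dataclass

@dataclass
class FileEntry:
    name: str
    size: int
    modified: str

def as_text(entry: FileEntry) -> str:
    """Render an entry roughly the way a long listing would."""
    return f"{entry.size:>6} {entry.modified} {entry.name}"

# Hypothetical directory contents.
entries = [
    FileEntry(name="1", size=120, modified="Dec 13 2016"),
    FileEntry(name="notes.txt", size=4096, modified="Dec 11 2016"),
]

# Text land: "1" also appears in sizes and timestamps, so both lines match.
text_hits = [e.name for e in entries if "1" in as_text(e)]

# Structured land: the filter looks only at the name property.
field_hits = [e.name for e in entries if "1" in e.name]
```

The text filter returns both files, and the field filter returns only the file actually named `1`, without touching the producer.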


How about protocol buffers and content negotiation? Your pipe figures out a proto that program a can emit and program b can consume.

ls -> repeated FileDescriptor files; | repeated string names;

Where FileDescriptor is whatever it needs to be, has all the info ls -l does. You have a hierarchy of outputs: if the next takes FileDescriptors, you give it FileDescriptors, if it doesn't you give it strings.

What would go to stdout goes through a toString filter.
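
The negotiation described above can be sketched in a few lines. This is not protocol buffers, only the selection logic: the producer lists what it can emit in order of preference, the consumer says what it accepts, and plain strings are the fallback. All names here (`negotiate`, `deliver`, `to_string`, `LS_EMITS`) are assumptions for illustration.

```python
def negotiate(emits, accepts):
    """Pick the first format the producer emits that the consumer accepts."""
    for fmt in emits:
        if fmt in accepts:
            return fmt
    return "string"  # lowest common denominator, as with today's pipes

def to_string(descriptor: dict) -> str:
    """The toString filter: what a terminal or a text-only tool would see."""
    return descriptor["name"]

def deliver(files, emits, accepts):
    """Hand the consumer the richest representation both sides agree on."""
    fmt = negotiate(emits, accepts)
    if fmt == "FileDescriptor":
        return fmt, files
    return fmt, [to_string(f) for f in files]

# A hypothetical ls: prefers descriptors, can always fall back to strings.
LS_EMITS = ["FileDescriptor", "string"]
files = [{"name": "notes.txt", "size": 4096}]

typed = deliver(files, LS_EMITS, {"FileDescriptor", "string"})
plain = deliver(files, LS_EMITS, {"string"})
```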


You are absolutely, completely correct. The two can certainly be reconciled!

There is one possible complication, though. The two would need to be reconciled in the same way by everyone who wants to write a shell tool. Given that even fairly simple standards (RSS, HTML, etc.) cause lots of failures to comply, what are the odds of near-universal compliance in a larger and more diverse ecosystem like shell utilities?


Image processing works pretty well in shell pipelines because image formats are generally self-identifying.
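
"Self-identifying" here means the first few bytes of the stream say what it is, so a tool at the end of a pipe does not need a filename or a flag. A minimal sketch for three common formats follows; the helper name `sniff` is made up, and real tools check many more signatures than this.

```python
# Leading byte signatures for a few well-known image formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]

def sniff(header: bytes):
    """Return the format name for the leading bytes, or None if unknown."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None

png_kind = sniff(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
text_kind = sniff(b"hello, world\n")
```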


> What is the argument for shell scripts not working on typed objects? How much time has been lost, how many bugs have been created because every single interaction between shell scripts has to include its own parser. How many versions of " get file created timestamp from ls" do we need?

Aside: that's what the stat command is for. My big concern with types is how would you make sure that the output of a command will always have the right types? Otherwise you'll have runtime type errors which would be just as bad as runtime parsing errors.


> Otherwise you'll have runtime type errors which would be just as bad as runtime parsing errors.

Parsing errors in script shells can easily go unnoticed (until you realize your data is corrupted).


> How many versions of " get file created timestamp from ls" do we need?

none? 'ls' is for humans. 'stat' is for scripts.
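
The same split exists in most scripting languages: ask the filesystem for the value instead of parsing a listing meant for people. A sketch using Python's `os.stat`, with one caveat: it reads the modification time (`st_mtime`), because a true creation time is not available on every platform. The function name `modified_time` is hypothetical.

```python
import os
import tempfile
from datetime import datetime, timezone

def modified_time(path: str) -> datetime:
    """Return the file's last-modification time as an aware UTC datetime."""
    return datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)

# Demonstrate on a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile() as handle:
    stamp = modified_time(handle.name)
```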


What you get from text plus the availability of source code is a set of practical documentation that ships right where you need it.

Microsoft forgot (and still forget) the documentation for Windows - if you want a nice A-Z reference for the bootloader, or kernel, or shell or IIS's configuration file or half the command-line tools, you're usually out of luck. The official place is often an inaccessible, badly-written knowledgebase article written from a task-based, usually GUI-based perspective.

It never mattered that they'd implemented a more coherent system full of better ideas, because the only way they'd tell you about it is through the GUI.

The MSDN CDs from 20 years ago were really good for a complete programmer's reference, but 1) I'm not sure how well they kept that up and 2) I could never find anything as comprehensive for sysadmins.


You should have kept looking at the MSDN and TechNet, because the idea that there's no documentation because it has been forgotten, or indeed the idea that there is no documentation, is utter nonsense.

Windows Server 2012 Command-line reference:

* https://technet.microsoft.com/en-gb/library/cc754340(v=ws.11...

Windows XP A-Z Command-line reference, that is actually named that:

* https://technet.microsoft.com/en-gb/library/bb490890.aspx

Doco for the wdsutil and md commands, picked at random:

* https://technet.microsoft.com/en-gb/library/cc771206(v=ws.11...

* https://technet.microsoft.com/en-gb/library/cc754711(v=ws.11...

bcdedit command references:

* https://technet.microsoft.com/en-gb/library/cc731662(v=ws.11...

* https://msdn.microsoft.com/en-gb/library/ff542205(VS.85).asp...

And so forth. There's a huge amount of doco, including reference doco.


This perspective sounds a bit out-of-date when you consider how heavily they're pushing to get PowerShell into sysadmins' workflow. The online MSDN docs are also often pretty good.


The Windows way of working is only "self-evidently" horrible if you're used to the Unix way of doing things. There are real defects that are legacy baggage (running as admin by default is not good, the registry probably shouldn't be a single database or at least should have better segregation between apps) but having a UI and having real objects in the shell isn't one of them. And I hardly think the registry and PowerShell, which came out nowhere near the same time, were conceived of at the same time.


> And I hardly think the registry and PowerShell, which came out nowhere near the same time, were conceived of at the same time.

I believe that the registry, in a way, caused PowerShell.

PowerShell works the way it does because Windows is structured data all the way down. A text-based shell a-la bash would not be very useful for Windows sysadmins. If you want to do Windows automation (e.g. on a cluster of windows servers or whatnot), you need to process and manipulate structured data (server settings, user permissions, AD groups, whatever). Hence, a shell and scripting language for doing just that.

If Microsoft hadn't moved from .ini files to the registry between Windows 3.1 and 95, I don't think PowerShell would've had the same design goals as it does now.


Perhaps, but PowerShell wouldn't look the way it does without the CLR being ready to hook into either. And I could easily see just having applications that interact with the structured data but return text -- that's what all the old stuff did.


S/PowerShell/Windows Scripting Host/ ... which was later than the registry, but not much.


I'd say PowerShell and the WSH are fairly different but I've worked very little with the latter.


The simple fact that the Registry doesn't have comments is ludicrous.


What database does?

To clarify: The registry is a database for OS or application stuff (caches, settings, etc.). It's not meant to be user-editable and outside arcane trouble-shooting stuff you're unlikely to ever have to venture in there.


Oracle, for one, has had it for a long time.

"Use the COMMENT statement to add a comment about a table, view, materialized view, or column into the data dictionary."

https://docs.oracle.com/cd/B19306_01/server.102/b14200/state...


Also PostgreSQL (since 7.0, released in 2000):

https://www.postgresql.org/docs/current/static/sql-comment.h...


Also mssql, the extended property ms_description can be applied to objects and columns and appears in SSMS as comments.


> It's not meant to be user-editable

I think that's kind of a stretch since regedit has been a part of windows as long as I can remember.


I'd say regedit is not really the preferred method for anyone to interact with the registry though.


I'd say that regedit is integral to day-to-day use of Windows.


Well I don't know what in the world you're doing but I almost never use it.


> arcane troubleshooting stuff

... is exactly the reason to have useful comments in there.


Config files do, which is what the Registry replaces.


Adding to that my favourite pet peeve - translated UIs for server processes.

Every time I come across a German installation of IIS or SQL Server I cringe. Googling the right solution and then trying to figure out how they've translated this option is something I can't stand.


For the best of both worlds, why not something that is both text and structured, like JSON or sexps?


Yes! I don't know how many times I've had to learn yet another one-off configuration syntax, logging format, or template language and wondered why oh why doesn't this use s-expressions (or even json?)... I fail to understand why the same people who are so allergic to lisp-like syntax will happily work on some project that makes you context switch between about a dozen different syntaxes once you count in all of the configuration, templating, and expression languages, some of which are embedded in string literals with absolutely no tooling or syntax highlighting.


I think IIS is a bad example, many of its config settings are a complete mess of partially obsoleted legacy settings. And some of it has no definitive reference, e.g. the threading and concurrency settings which seem to change with each IIS version, but no documentation is available that defines what settings are available in each version.


The philosophy sounds great.

Maybe the downsides are because of the execution?


Yeah, the registry has been solid for years now, which pretty clearly shows that the troubles were in implementation and not the philosophy. And powershell's object-passing has been pretty great from the start, so I'm not even sure how that's an argument against.


The windows registry is not too bad. You can easily store this stuff in a SQLite file though: https://www.sqlite.org/appfileformat.html

PowerShell's object-oriented nature is great btw!
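
For what storing settings in SQLite might look like in practice, here is a minimal key-value sketch using Python's built-in `sqlite3` module. The table name `settings` and the helpers `set_setting` and `get_setting` are assumptions for illustration; the linked SQLite page argues for the approach but does not prescribe this schema.

```python
import sqlite3

# In-memory for the example; a real application would pass a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
)

def set_setting(key: str, value: str) -> None:
    """Insert or overwrite one setting inside a transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            (key, value),
        )

def get_setting(key: str, default=None):
    """Return the stored value, or default when the key is absent."""
    row = conn.execute(
        "SELECT value FROM settings WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default

set_setting("listen_port", "8080")
set_setting("listen_port", "9090")  # overwrites the earlier value
port = get_setting("listen_port")
```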


Around the office when teaching PowerShell we say it takes about two Googles per line. That's not a compliment.


This is in line with my experiences with powershell (or powershit as we refer to it).

The canonical example of the complete failure and total friction is in the simple case of obtaining a file from a web server and sticking it on disk. These are the steps:

1. Try various built in cmdlets. Eventually find one that works.

2. It's a big file and gets entirely buffered in RAM and takes the machine out. You don't get the privilege of finding this out until you try dragging a 20GiB vhdx image from a build server to plonk on your SCVMM platform. That conks it completely.

3. So you think you'll be clever and just use curl. Oh no there's a curl alias!

4. Every damn machine gets sent a reconfig to remove this stupid alias.

5. Then when you do get there you find that the curl output pipe just pukes multibyte garbage out. 20GiB of it that took a long time to come across the wire.

6. You tell curl to save the file directly. Might as well have used cmd.exe by now!

So 8 Googlings and 3 hours later you got a fucking file. Every time I use powershell that is my life.


    man iwr
Oh, there's an OutFile parameter. Let's see what it does

    man iwr -param outfile
How nice, it writes the response directly to a file instead of to the pipeline. The curl alias actually points to the very same cmdlet.

Not to excuse bad examples on the internet (there are lots of people who fail to grasp PowerShell and still try to write articles and how-tos), but since PowerShell has built-in documentation for commands and parameters, this is actually fairly easy to figure out from within the shell. Admittedly, that's once you've learned the basic half dozen or so cmdlets you tend to use all the time.

You're free to ask me (or on SO, I tend to answer PowerShell questions there) if you're having trouble. Figuring out the above took literally just 20 seconds. Really. I'm kinda sick of people cursing and blaming tools just because they're different from what they're used to. If I did the same and complained about bash and Unix (which I rarely find the need or time to learn) I'd be tarred and feathered in seconds ...


It doesn't work for large files. Try iwr with outfile on a 16GB file on a 4/8GB machine.

Knowing all the edge cases, exceptions and places where reality breaks down is the problem.

Where software should indeed work with the principle of least surprise, Microsoft have patented the principle of most inconvenient surprise.

Also SO has one small comment about this which didn't exist when I discovered it. I had to use windbg to dump the CLR heap and find out that there were a crap ton of buffers...

And that's every day using powershell. It's even worse if you trip over someone else's cmdlets which aren't aware of the correct harry potter style incantations to issue that don't cause the universe to implode.

The whole thing is a joke. A bad one.


Unless you also write things to the pipeline via -PassThru, specifying -OutFile will read in 10K chunks and write them to the file. No memory apart from that buffer is used. Look at the source, it's public. My PowerShell instance uses 38 MiB of memory the whole time during the download, regardless of the file size.
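
The difference between the two behaviours argued about in this subthread is easy to show in isolation. The sketch below is not PowerShell's implementation; it is a generic fixed-buffer copy, with the 10 KiB chunk size taken from the comment above and the function name `copy_in_chunks` made up. Memory use stays at one chunk regardless of how large the source is, in contrast to reading the whole response into memory first.

```python
import io

CHUNK_SIZE = 10 * 1024  # 10 KiB, the buffer size mentioned in the comment

def copy_in_chunks(source, destination, chunk_size=CHUNK_SIZE) -> int:
    """Copy a readable stream to a writable one, one chunk at a time."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total

# Stand-ins for a network response and an output file.
payload = b"x" * (25 * 1024 + 7)
sink = io.BytesIO()
copied = copy_in_chunks(io.BytesIO(payload), sink)
```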


What's in your $PSVersionTable?

I'm doing this on windows 2008 server.


5.1 on Windows 10. Admittedly, I never really tried iwr on older versions. Heck, the last time I tried downloading something with PowerShell to a file I went the WebClient.DownloadFile route, so it's been a while.

You can probably install a more recent version of WinRM on that machine, though (or by now probably the open-source version of PowerShell).


To be fair there are many, many examples of downloading files using System.Net.WebClient. In fact that is how the first several search results for `powershell download file` tell you to do it despite Invoke-WebRequest being included in PowerShell since 2012. `get-help download` doesn't return anything useful. Contrast that with `apropos download`, which at least on FreeBSD includes wget in the search results.


Well, for many years I had that experience every time I tried to use UNIX command line. I think it's just the issue of familiarity.


How many Googles per line do Unix shell scripts need, assuming equal familiarity with both? As for me, I understand PowerShell fairly well, but hate having to deal with Unix utilities and for me the amount I have to google to get stuff done on a Unix command-line is significantly higher than with PowerShell.

It's just that once you learn how to use a tool you don't have to think about how to approach a problem anymore, you just do it. And then, when learning a completely different tool you have to learn again. Surprise.


I should've explained the background - these are C# coders coming to Powershell, not Unix admins.

The problem with Powershell is that it tried to have its cake and eat it too - it wants the lightweight IDE of Unix-philosophy tools, but the detailed structure of C# objects. The problem is that the former is tightly tied to the simple common api of raw text, and the latter is inherently dependent on an intellisense-oriented IDE and static typing that heavily hints the names and parameters of useful actions.

Powershell manages to combine the worst of both worlds - objects mean more complex APIs, but without the powerful IDE guiding you around those APIs.


I'm sorry what? Powershell has to be one of the better documented systems I've used in quite some time, and the syntax lends itself to a certain "learning velocity" if you will. I also find that the general standard for tooling is a bit higher than say, bash.

It's also been really neat to see how well Powershell lends itself to extending other applications.


For people who don't know the standard Unix tools, I expect much the same.


Well, sure, if you don't know PowerShell or use it often you'll need a lot of help. That just sounds like an argument for slavish adherence to existing conventions till the end of time -- in which case why bother with a new language?


Yep, it's really ground breaking stuff[1]

It actually is really good, but I do find it interesting that the first and last loser of the desktop wars got there a good long time ago.

[1] https://en.wikipedia.org/wiki/Workplace_Shell , https://en.wikipedia.org/wiki/Object_REXX


Powershell was a much later invention. Agreed otherwise.


lol at powershell in 1993


Tell me that somehow writing this comment was worth making a new HN account for.


> To play devil's advocate, part of the reason things like The Rule of Silence are talked about is because of the messy unix philosophy of treating everything like plain text.

The amount of tooling that surrounds text is vast and has evolved over decades. You cannot replace that with a single database and call it better.

I can place the majority of my config files in git and version them. I can easily perform a full text search on any log regardless of its syntax. I can effortlessly relocate and distribute logs to different filesystems.

> I want my software configuration stored in a database not in a bunch of fragile files with made up syntax that are always one typo or syntax error away from being potentially silently ignored.

So you would like your configuration stored as a bunch of made up data structures? Databases do not make you immune from typos and syntax errors, ask anyone who has ever written a delete without a where clause.

And what happens when the giant, all-knowing database your system depends on has a bug or vulnerability? When something on my linux box breaks I can drop to a recovery mode or lower runlevel and fix it with a kernel and a shell as dependencies.

I think you would be a lot happier with a configuration management system (puppet, ansible et al) and some log infrastructure without having to completely redo the unix philosophy and the years of experience and fine-tuning that comes with it.


> The amount of tooling

Or the amount of cruft.

> git

Any structured data can still be serialized and diff'd, but it isn't always the clearest. Where is the contrast here?

> made up data structures?

so standardize the non-text format

> Databases do not make you immune from typos

Depends on the constraints. There are few on text files, possibly excluding sudoers.

If you aren't sticking to good practice you can just as easily rm a text file.

> has a bug or vulnerability?

What happens when the kernel has a bug or vulnerability? There are quite a few mature db systems. Plus, all text files depend on the file-system, which is why you store root on something stable like ext (still depends on hdd drivers though, unless you have some ramfs on boot).

> years of experience and fine-tuning

Can you describe specifically what that experience is, and what the "fine-tuning" is?


You can rm a database as well. In multiple different ways in fact. You can also put constraints on text files by forcing editing via a helper program (much like visudo and crontab do). In that regard the text format isn't much different from a database format aside from the encoding of the data (it's probably also worth mentioning that you can - and some people do - store a database as flat text files if you wanted. They don't necessarily have to be binary blobs).
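
The helper-program approach can be sketched as "validate first, then swap the file in atomically". This is only the general pattern, not how visudo or crontab are implemented; `install_config` is a hypothetical name, and `json.loads` stands in for whatever syntax checker the real format would need.

```python
import json
import os
import tempfile

def install_config(path: str, new_text: str, validate) -> None:
    """Replace the config at path only if validate(new_text) does not raise."""
    validate(new_text)  # a syntax error stops here; the live file is untouched
    directory = os.path.dirname(os.path.abspath(path))
    # Write next to the target so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_text)
        os.replace(tmp_path, path)  # atomic swap of old for new
    except BaseException:
        os.unlink(tmp_path)  # clean up the temporary file on any failure
        raise

# Example: a JSON config guarded by json.loads as the validator.
workdir = tempfile.mkdtemp()
config_path = os.path.join(workdir, "app.json")
install_config(config_path, '{"port": 8080}', json.loads)
```

A later call with a typo, such as a missing closing brace, raises before anything is written, so the previously installed file stays in place.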

> "What happens when the kernel has a bug or vulnerability? There are quite a few mature db systems. Plus, all text files depend on the file-system, which is why you store root on something stable like ext (still depends on hdd drivers though, unless you have some some ramfs on boot)."

I'm not sure I get your point. What do kernel bugs have to do with text vs binary formats? Or the argument for or against centralised databases? Databases still need to store files on persistent storage so if your text files are compromised from a kernel vulnerability or instability in the file system then your binary database files will also be.


I'm not sure using a helper program has as many assurances around it, and manipulating data in a db often won't involve any 'rm' command, though copy/replacing a text file might.

> What do kernel bugs have to do with text vs binary formats?

you said "what if the db has a bug or vulnerability", my point is you have to rely on something, even the kernel. The difference is how stable these things are, and databases can be very stable.

> if your text files are compromised from a kernel vulnerability

not all kernel vulnerabilities will put the db at risk, it depends on the exposure to parts of the kernel. You can restrict the type and "fanciness" of the file-system a database will use if you know you don't need those additions, in the same way you use a stable fs for system files. You need a basic set of binaries one way or the other to access this data.


> I'm not sure using a helper program has as many assurances around it, and manipulating data in a db often won't involve any 'rm' command, though copy/replacing a text file might.

DBs still have files, they can be rm'ed. DBs also have other delete commands like 'DELETE FROM x'

My point is it's just as easy to "accidentally" delete data in a database as it is in text files.

> you said "what if the db has a bug or vulnerability", my point is you have to rely on something, even the kernel. The difference is how stable these things are, and databases can be very stable.

Someone else said that. I think the whole stability point is moot.

> not all kernel vulnerabilities will put the db at risk, it depends on the exposure to parts of the kernel. You can restrict the type and "fanciness" of the file-system a database will use if you know you don't need those additions, in the same way you use a stable fs for system files. You need a basic set of binaries one way or the other to access this data.

If a software vulnerability exposes text files like /etc/passwd then it can expose the database disk files in exactly the same way. Having a database format won't magically stop files from being read remotely.

It's also worth mentioning that most of the time it's not kernel vulnerabilities you need to be worried about (not that I'm saying they're not bad); any bug in software (e.g. Wordpress vulnerability) that allows an attacker to specify the source file to be read would put both your database config concept and the existing UNIX config layout at risk.


  The fetish for easily-editable ASCII files and escaping
  from structure is holding us back. Structured data does not
  automatically imply hidden and inaccessible, that's a
  matter of developing appropriate tooling.
Speaking as someone who's in the process of automating configuration management on Windows, I'll say that this is _much_ easier said than done. Imagine something like Active Directory Federation Services, which stores its configuration in a database (SQL Server) and offers a good API for altering configuration data (Microsoft.Adfs.PowerShell). Instead of using a generic configuration file template---something supported by just about every configuration management system, using a wide variety of templating mechanisms---I must instead write a custom interface between the configuration management system and the AD FS configuration management API. Contrast that with Shibboleth, which stores its configuration in a small collection of XML files (i.e., still strongly typed configuration data). These I can manage relatively easily using my configuration management system's existing file templating mechanism---no special adapter required. I can easily keep a log of all changes by storing these XML files in Git. I can put human-readable comments into them using the XML comment syntax. The same goes for configuration files that use Java properties or JSON or YAML or even ini-style syntax, not to mention all the apps that have configuration files that amount to executable code loaded directly into the run-time environment (e.g., amavisd-new's config file is Perl code, SimpleSAMLphp's is PHP, portmaster's is Bourne shell, portupgrade's is Ruby, and so forth).

In short, your configuration database scheme is like an executable, whereas text config files are like source code (literally, in some cases). I'd much rather work with source code, as it remains human readable while at the same time amenable to a variety of text manipulation tools. Databases and APIs are more difficult to work with, especially from the perspective of enterprise configuration management.

Edit: See also http://catb.org/jargon/html/S/SMOP.html.


I was on Windows for about 20 years and moved to Linux about a year ago. One thing I happen to like the most in Linux is file-based configuration. If you add a settings-database to Linux you would end up with the same mess that you find in Windows. In Windows you have the Registry (a giant database for OS and app settings) AND you have the config files. Great... settings are spread over a gigantic db AND the file-system. IMHO: a nightmare. If you add a db for settings there will still be config files, they won't go away, and you end up with an additional (huge) layer of complexity.


Don't forget group policies (2 or 3 layers of those, iirc)!


> If structured data was embraced we would have developed appropriate tooling to interact with it in the way that we prefer.

What kind of tooling might work with ad-hoc structured data and still get all the tools to talk with each other like in Unix? How would it work without having to write input parsing rules, data processing rules, and output formatting/filtering/composing rules for each tool?

I suspect that the reason it's not very popular to pass around structured data is that it's damn difficult to make various tools understand arbitrary data streams. Conversely, the power of text is that the tools understand text, only text, and do not understand the context, i.e. the user decides what the text means and how it needs to be processed. Then the tools become generic and the user can apply them in any number of contexts.


>it's not very popular to pass around structured data

There's an awful lot of JSON that gets passed around. That seems a reasonable compromise between readable text and some sort of structure.


JSON is an OK serialization format, and a terrible format for outputting human readable data. Let's take, for example, the output of 'ls -al' and imagine it were presented via JSON:

The keys 'mode', 'number of links', 'owner', 'group', 'size', 'last modified' would be repeated over and over again, stacked vertically.

Mode would remain an arbitrarily formatted string, or (worse) be broken into its own object for every attribute represented by the string.

A reasonably populated directory would fill multiple screens with cruft.

The formatting of the timestamp in the last modified field would still be an arbitrary string.

Comparing two files would require piping through another utility to extract just the appropriate fields and display them unadorned.

Sure, it might be moderately easier to consume in another program since you can simply iterate over a list and reference specific keys, but it's not really that hard to iterate over a list of lines and extract a specific field by number.

    ls -al | awk '{print $5,$9}'
vs.

    ls -al | jq '.[] | [.size,.name|tostring] | join(" ")'


So, how do you parse, read, and process JSON without additional instructions of how to interpret the data and what to look for?

All right, maybe you do have a common tool that implements a query language so that you can filter out certain paths and objects from the JSON data into a thinner data set with known formatting expected by the next command in the pipeline. Then you need to write that command and you need more instructions, possibly again in another language, to describe what you actually want to do with the data now that you know where to find it.

At this point you typically write a separate script file to do this because it's easier to express in full-blown programming language what you want to do with the tree of hashes and lists and values. On the other hand, programs for lines of text are quite short and fit on the command line.

I don't see an immediate value in structured data, and especially none that would outweigh the loss in general applicability and usability in comparison to text-based data processing.

Don't get me wrong: I would love to see a good prototype or sketch of how such a thing would work, and then try to imagine how I might be able to apply it to similar things for which I use Unix command line today. But I'm sceptical of "how" and also quite sceptical of "why".


I've started using jq (https://stedolan.github.io/jq/) in pipelines to parse/transform JSON output from commands that support it. It has some XPath-like notations for specifying elements within the JSON data tree. It's not perfect, but it's a good start and useful right now.


YAML is a superset of JSON, but is much more human readable:

http://yaml.org

This means you can give it JSON when you have JSON, but otherwise use the nicer format when possible.



But the programs don't really just "understand" text; it has to be munged into exactly the format they expect.


> Yes I get it, you like your SSH session and Emacs/Vim blah blah but that's short-sighted.

How so? I'm one of those people that like my SSH session, and in my case vim and "blah blah blah". I've contributed to countless open source software packages that you likely use with this method, and so have tons of other developers. Nothing is broken here, things are working great for everyone who reads the manual and follows it.

> I want my software configuration stored in a database not in a bunch of fragile files with made up syntax that are always one typo or syntax error away from being potentially silently ignored.

apache, postfix, haproxy, even vim are certainly not prone to silently ignore anything, just to name a few.

> The fetish for easily-editable ASCII files and escaping from structure is holding us back.

Holding us back from what?

I am both a developer and an administrator, and I've had all the fun with solaris/aix configurations that are often not stored in plain text that I care to have. If you also have this experience, and still feel the way you do, then I'd love to hear more. Otherwise, your rant comes off as "your way is hard, and I don't want to learn it!"

Look at all the structures available for that plain text you speak of... XML, JSON, YAML, the list goes on. You are free to use one of those, then you have that structure you crave. There are plenty of areas that could use revolution, but UNIX-like configuration files are not one of them. There is no problem here. If you are making typos or mis-configuring your software, then you have a problem of your own creation.


> How so?

As I mentioned in another comment in my view the problem is that something like the configuration is shared both by the humans and the computers. Because of this we settle on something that is not optimal for either group.

We end up with something that is hostile to both the humans and the computers, just in different ways.

In fact the argument of people like you for ASCII config files exactly demonstrates my point. You are fighting for your human-convenience against the machines.

> Holding us back from what?

By embracing and acknowledging that the humans and computers are not meant to share a language we free ourselves from this push-pull tension between human vs machine convenience.

We can develop formats and tooling that respects its human audience, that doesn't punish the human for making small superficial syntax or typo errors and so on.

And we can finally step the hell out of the way of computers and let them use what is suitable for them.

And at that point you could still have your SSH session and Vim/Emacs and blah blah blah and you could still view and interact with stuff as plaintext if you wanted to.

> apache, postfix, haproxy, even vim are certainly not prone to silently ignore anything, just to name a few.

It's not always a matter of silently ignoring something but due to the nature of the task it is certainly very easy to shoot yourself in the foot doing something that isn't technically an error but wasn't your intention.

For example you can silently break your cron jobs by leaving windows-newlines in them.

Perfect example of humans and computers sharing a language that is hostile to the human.

BAD BAD human! You stupid human why do you use bad Windows invisible characters? Use good linux invisible characters instead that are more tasty for your almighty lord Linux.
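
For what it's worth, the invisible-character case is at least cheap to detect before a file is installed. A minimal sketch, assuming the crontab is available as raw bytes; `find_crlf_lines` is a hypothetical helper, not something cron provides:

```python
def find_crlf_lines(data: bytes):
    """Return 1-based numbers of lines that end in CRLF (Windows newlines)."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        if line.endswith(b"\r"):  # a stray carriage return before the newline
            bad.append(number)
    return bad
```

A wrapper could refuse to install the file, or strip the carriage returns, whenever the returned list is non-empty.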


> It's not always a matter of silently ignoring something but due to the nature of the task it is certainly very easy to shoot yourself in the foot doing something that isn't technically an error but wasn't your intention.

I agree with you here, it's easy to write perfectly valid configurations that don't do what you intended. But throwing it all out seems like the baby going out with the bathwater to me.

In all seriousness, what if it was XML all around? I hate writing XML by hand, but that's part of the problem you are describing (i.e., human editing of raw config). XML parses very nicely, so the difficulty of coding tools to speak XML is almost non-existent.

All in all, it's just a big change you are proposing. Us UNIX people are incredibly change-averse. We keep getting burned :)


> All in all, it's just a big change you are proposing. Us UNIX people are incredibly change-averse. We keep getting burned :)

Well that's exactly my point. I don't propose that we deploy it tomorrow but I want to see people do it and think about it and talk about it so our grand children can have better computing. Instead of sticking to the way of our fathers for all eternity.

XML as you identified has the same problem.

You need a human-dedicated interface that captures the intent of the human and you then convert that to something that the computer likes and tell them here ... this is what the human has ordered.

When we share the same raw input file with the computers that's when we set ourselves up for trouble.

Do I have the ultimate solution? No. Can I still complain about it being a problem? Yes.

One "solution" is a graphical rich interface with things like auto completion and validation and all that so it's much more capable of capturing the true intent of the humans. Basically the same way we interact with other web sites.

Imagine your bank told you to append a new text line to the end of "transactions.txt" file if you wanted to transfer money.

Now I put solutions in quotation marks because I know a GUI has other practical limitations and problems but you get the idea.

My point is as humans eventually we have to learn to graduate from sharing a rudimentary text language with the computer for the sake of short-term convenience.


A lot of config files are already trivial for the computer to parse (by design). Most config files even follow a loose standard of some sort (e.g. INI, which is a sort of key-value system).

Would it not be easier to keep this format, and then add some integration into your favorite text editor that prevents you from accidentally editing the INI keys, or warns if you create a syntax error, etc? You could even make a generic INI gui editor that replaces appropriate values with checkboxes, sliders, etc.

I guess I don't see much wrong with the formats for most config files, but if you insist that there is a problem, and the problems you've outlined are caused only when editing them, then why not change the editor and preserve the format?

(That cron problem does sound bad though)
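
The "keep the format, add a checker" idea could look something like the following. This is only a sketch: `check_ini` and the `required` schema are illustrative inventions, and Python's `configparser` dialect stands in for "INI" even though real programs each parse their own variant.

```python
import configparser

def check_ini(text, required):
    """Return a list of problems found in INI text.

    `required` maps section name -> list of keys that must be present
    (an illustrative schema, not any standard).
    """
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.Error as exc:
        return [f"syntax error: {exc.__class__.__name__}"]
    problems = []
    for section, keys in required.items():
        if not parser.has_section(section):
            problems.append(f"missing section [{section}]")
            continue
        for key in keys:
            if not parser.has_option(section, key):
                problems.append(f"missing key {key} in [{section}]")
    return problems
```

An editor plugin or a pre-commit hook could run this on save, so a misspelled key is reported instead of being silently ignored.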


> "Oh but isn't it neat you can pipe it to grep?" NO! No it's not neat, maybe it was neat 20 years ago. Today I want that damn data in a structure. Then you can still print it out in one line and pipe it to grep all you want.

The systemd journal works how you describe, and it is very painful to interact with. I'll take plaintext logfiles any day of the week.

It's fine if you want to interact with the log in ways that have been designed into it. But:

- it's harder to work out what you can delete to free up space in an emergency

- it's harder to get logrotate to do what you want

- it's harder to use "tac" to search through the log from the bottom up

> I want my software configuration stored in a database

So now you can't put comments in your config, you can't (as easily) deploy config with puppet, or in RPMs. You can't easily diff separate configs.


I get it.

All the things that you mention can in theory be fixed over time.

The stuff I'm talking about is not for the next 6 months. It's not very meaningful to compare it against the current tools and landscape.

I can almost imagine a similar conversation in the past.

Someone saying "MAYBE ONE DAY WE CAN FLY!" and everyone's like "BUT OUR HORSES CAN ONLY JUMP 2 METERS HIGH! It would never work."

I understand your comment from a pragmatic point of view but none of those problems are big or important enough that we couldn't fix them in other ways.

Throwing away a rich structured piece of data and trading that for a dumb line of characters that needs to be re-parsed just so that it's easier to use logrotate and tac with them and so on is a losing trade.


> You can't easily diff separate configs.

Reminding me of one of the little niceties of Gobolinux.

You have a pristine per-version config copy sitting in the main dir, and a package-wide settings dir. It also provides a command (implemented as a shell script, as are most of Gobolinux's tools) that gets run upon installing a new package version.

If said command detects a difference between existing and new config files it gives you various options. You can have it retain all the old files, replace them with the new files, or even bring up a merged file that gives you the new lines as comments next to the old ones.


> ... the messy unix philosophy of treating everything like plain text. If structured data ...

You are forgetting a crucial point: plain text is very well defined. Actually, it was already defined when the first Unix tools were being written. Using plain text means that you can use grep to search the logs of your program, even if your program was written yesterday and grep was written 40 years ago.

Structured data? In which format? Who will define the UNIQUE format to be used from now on for every tool? The same people who chose Javascript as the web programming language?

Do you realize that choosing plain text prevented any IE6-like monstrosity from happening?


Clearly, you want to avoid the overloading caused by implementing a half baked version of Common LISP into every utility. Just use s-expressions.

Everyone can just boot into emacs :P.


You would think plain text was well-defined, but along came Unicode and even that's not true anymore.


I'm with you when it comes to structured data, but plz no more databases. These config files do not need to be centralized. I am thinking more in the direction of a parser that could check the validity of a configuration...


So, horribly unreadable xml config files?


Like S-expressions don't exist. Or even goddamn JSON. Come on, we don't have to jump from one stupidity (unstructured text) to another (using XML as a data representation format).


CSON or .desktop / .service or something similar is immediately understandable to most people and doesn't waste time with unnecessary tokens like XML does.


Not necessarily. JSON is not bad, _if_ you allow comments. Even plain-jane key/value config files can be sanity-checked. I suspect part of the problem is that anything fancy like that is awkward to do in C, so people take the lazy way out.
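
Allowing comments on top of plain JSON takes very little. A minimal sketch, assuming only whole-line comments starting with `//` or `#` are wanted; `load_commented_json` is a hypothetical helper and this is not any standard JSON dialect:

```python
import json

def load_commented_json(text):
    """Parse JSON after blanking lines that start with '//' or '#'.

    Only whole-line comments are handled. Trailing comments are left
    alone so that '//' inside string values (URLs, say) is not mangled.
    Comment lines become empty lines, so parser error line numbers
    still match the original file.
    """
    kept = [
        "" if line.lstrip().startswith(("//", "#")) else line
        for line in text.splitlines()
    ]
    return json.loads("\n".join(kept))
```

Because JSON strings cannot contain raw newlines, a line that begins with a comment marker can never be the middle of a string value.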


Maybe many tiny sqlite databases? Then you would not need to centralize your data in a single database, but could still do queries across different config files.
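
Querying across several small SQLite files is possible with `ATTACH DATABASE`. A rough sketch using Python's `sqlite3` module; the `open_configs` helper is hypothetical, and so is the `settings(key, value)` table used in the usage note:

```python
import sqlite3

def open_configs(paths):
    """Attach several SQLite files to one in-memory connection.

    `paths` maps an alias to a database file. Aliases are interpolated
    into SQL, so they are restricted to identifier-like names.
    """
    conn = sqlite3.connect(":memory:")
    for alias, path in paths.items():
        if not alias.isidentifier():
            raise ValueError(f"bad alias: {alias!r}")
        conn.execute(f"ATTACH DATABASE ? AS {alias}", (path,))
    return conn
```

With two files attached as `web` and `mail`, a query such as `SELECT value FROM web.settings UNION ALL SELECT value FROM mail.settings` reads from both. SQLite limits the number of attached databases (10 by default), so this suits a handful of config files rather than hundreds.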


SQLite is a huge win. All the power of SQL, and you don't have to introduce system-wide and non-local dependencies.


How is this a win? Instead of opening a text file we have to write update statements to change any setting. Sql is itself a verbose language that should have been discarded in the 80's.


Your complaints are, IMO, orthogonal to the UNIX philosophy (which, IMO, is also pretty orthogonal to the rule of silence; now I have upset both sides :) ).

The UNIX philosophy, as written in the article, is based on programs that do one thing well and work with other programs. The second part, "work with other programs", is the one that encourages (but does not require) simple, text-based I/O.

If A, B and C write programs and independently design some custom, structured, binary I/O the chance of them being compatible is nil. If they output text, the UNIX glue of pipes and text conversions makes them cooperate quickly and efficiently. Not elegant? Sure. But working well in no time.


That's my primary issue with UNIX culture. It took a huge step backwards by deciding to work with unstructured text. It wasn't a wrong turn, mind you. It was backtracking on known and understood best practices all the way and then picking a wrong turn. And only now people seem to rediscover what was in common use in the era before UNIX - the virtues of structured text.

I guess our industry is meant to run in circles, only changing the type of brackets on each loop (from parens to curly on this iteration).


It's not like unstructured piped text is the only possible way to work. It's widely used precisely because it's so expedient. If you use structured data, then every program in the sequence has to understand the structure. If you just smash everything flat into a stream of text, you can then massage it into whatever form you need.

It's not always the best way to approach a problem, but it's not meant to be. It's duct tape. You use it where it's good enough.


The way I see it, shell scripting and piping allows non-programmers to get their feet wet, one command at a time.

You run a command, look at the output, now you know what the next command in the pipeline will see, and can add adjustments as needed.

Powershell etc seems to be more programmer oriented in that one keeps thinking in terms of variables and structures that get passed around.

And this seems to be the curse of recent years. More and more code is written by programmers for programmers. Likely because everyone has their head in the cloud farms and only the "front end" people have to deal with the users.

UNIX came when you risked having admins and users sitting terminal to terminal in the same university room. Thus things got made that allowed said users to deal with things on their own without having to nag the admins all the time.


UNIX actually came when users were programmers at the same time. There was an expectation in the past that using computers involved knowing your way around your OS and being able to configure and script things.


If you use structured data in a standard format, you can have a single system-wide implementation of a parser and then have each program process the data it needs, already structured in a semantically-meaningful way.

In current, unstructured text reality, each program has to have its own (usually buggy, half-assed) shotgun parser, and it has to introduce semantic meaning back to the data all by itself. And then it destroys all that meaning by outputting its own data in unstructured text.

It works somewhat ok until some updated tool changes its output, or you try and move your script to a different *nix system.


But if your data is structured in a semantically meaningful way, then your receiving program needs to understand those semantics. Maybe you could introduce new command line tools to refactor streams of data so as to adapt one program to another, but I can't see it being simpler and quicker (in terms of piping input from one place to another) than the current approach.

I do like the idea of a standard parser to avoid the ad-hoc implementation of parsers into everything.

Your last comment gives a hint at the real problem, which is people using command line hackery in 'permanent' solutions. It's duct tape. You don't build important system components out of duct tape. Well, you shouldn't, anyway.


I can't disagree there. I've used SNMP in anger before. Underlying SNMP is the MIB.

People ran away screaming from it :)


Structured text is good. Very good, in fact. It might even be ideal. Structured binary data, less so, at least as a storage format.

I want to be able to look at your file format using tools that haven't been specialized to the task. Is that so wrong?


Well, I'm advocating for structured text, not binary. Mostly because I haven't seen a future-proof binary format yet, and editing binary formats indeed would require special tooling. I think - for a data exchange protocol meant to be used between many applications - going structured text instead of binary is a worthwhile tradeoff of slightly lower efficiency for much better accessibility.

EDIT: Some comments here are slowly making me like the idea of a standard structured binary more and more.


Let's say I want to know how many days since an asset on my web server has been modified. With bash + some standard unix tools, from the top of my head I have to do something like this:

    curl -svo /dev/null http://example.com/file 2>&1 | grep Last-Modified | cut -d ' ' -f 3-
And that's just to get the last modified date in text form. Now I'm writing a script that parses that date and gets today's date, convert them to days, and subtract. YUCK!

Wouldn't it be nice if your shell could do this?

    curl(http://example.com/file).response_headers.Last-Modified.subtract(date().now).days
I think it would be nice.
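
For comparison, the date arithmetic is short in a language with typed values. A sketch in Python, assuming the `Last-Modified` header value has already been fetched as a string (the network call is left out); `days_since` is a made-up name:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def days_since(last_modified, now=None):
    """Whole days between an HTTP Last-Modified value and `now` (UTC)."""
    modified = parsedate_to_datetime(last_modified)  # RFC 2822 style date
    now = now or datetime.now(timezone.utc)
    return (now - modified).days
```

The parsing and the subtraction both operate on datetime objects, so nothing has to be cut out of a text stream by column.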


Yes, but as it stands, most of those would have to be builtins for that to work. I would rather have:

    curl -j example.com|select headers.Last-Modified|time before now|time to days
Where the commands send and receive JSON.


Can you name me a single UNIX configuration file whose format is not "structured"?


Config files are less of a problem. The issue is with programs, which you want to use with pipes. Each has its own undocumented, arbitrary pseudo-structure with often inconsistent semantics, optimized for viewing by user.


What programs are you thinking of? Maybe this is my sysadmin bias but about 90% of my UNIX tools usage is on config files...


Let me suggest an incendiary example: systemd

systemctl status wibble.service : Displays human-readable information, with colours, line drawings, and variable-length free-form text; that is exceedingly hard for a program to parse reliably.

Contrast with

systemctl show wibble.service : Outputs machine-readable information, in systemd's favourite form of a Windows INI file.
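
The machine-readable form is simple enough to consume in a few lines. A sketch that assumes one `KEY=VALUE` pair per line; `parse_show_output` is a hypothetical helper and the sample text in the usage note is made up for illustration, not captured from a real unit:

```python
def parse_show_output(text):
    """Turn KEY=VALUE lines into a dict, splitting on the first '=' only."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '='
            props[key] = value
    return props
```

Given `"Id=wibble.service\nActiveState=active\n"`, the result maps `ActiveState` to `active`. Splitting on the first `=` matters because values may themselves contain `=`.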


Yikes. Makes me glad I run Slackware. Though I seem to recall from The Init Wars that it was precisely this quality of SystemD that made people lob the charge that it violated the "Unix philosophy"


ls, ps, du, df, ... pretty much all CLI tools. The kind you use in scripts.


All four of which users are enjoined, over and over again, not to try to parse the output of (particularly ls). That is, those are the tools specifically not meant to be connected by pipelines, but merely used for operator convenience.


...But how do you extract that data otherwise?


Instead of ls, find. Instead of ps, you parse the nodes in the /proc filesystem that ps itself parses. Ditto the /sys filesystem and du/df.
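
Reading /proc directly, instead of parsing ps output, might look like this on Linux. A minimal sketch, assuming each process directory exposes a `comm` file with the command name; `list_processes` is a hypothetical helper, and `proc_root` is a parameter only so the function can be pointed at a test directory:

```python
import os

def list_processes(proc_root="/proc"):
    """Return sorted (pid, command name) pairs read from <proc_root>/<pid>/comm."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                found.append((int(entry), handle.read().strip()))
        except OSError:
            continue  # the process exited while we were looking
    return sorted(found)
```

This is Linux-specific, which is part of the portability trade-off compared with the traditional tools.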


When those tools were written, /sys and /proc didn't exist.


It's only "specialized" because we haven't been doing it so it's considered special.

At some point you have to admit that what's meant for the computer is not always byte by byte the same as what's meant for the human.

We try to shove these two together and we screw up both of them.

Empower the computer to be the best that it can be by taking the human out.

Empower and respect the human by giving him/her their own representation.

The "I just want to read the bytes that are on disk" philosophy is inherently limiting and broken when the audience are two very different things (humans vs computers).

My argument is that instead of fighting that we must embrace it.


Yes, because it's: a) inefficient b) invites a lot of wrong assumptions about the storage format, e.g. about the underlying grammar, the maximum length of tokens or the possible characters that could occur in the file. c) requires you to solve the same problems over and over again (how to represent nested lists, graphs, binary blobs, escaped special characters, etc) d) encourages authors to roll their own format and not research if there maybe is an existing format that would solve their case.

I agree with you, the other extreme - treat binary files as sort of opaque black boxes that you can only access with specialized tools belonging to your application - is even worse. But I don't see why we can't reach some middle ground: have a well-documented, open, binary format that encodes a very generic data structure (maybe a graph or a SQLite-style database) and a simple schema language that lets you annotate that data structure. Then you can develop generic tools to view/edit the binary format even though every application can use it in their own way.


Parsing structured text is slow and inefficient. Also reading out just one part of a data structure stored as text often requires either walking through the file character by character or first slurping the whole thing into memory.
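
To illustrate the random-access point: with fixed-size binary records, record N sits at a known offset and can be read without scanning what precedes it. A sketch with an invented layout (a 4-byte id plus a 16-byte name); `RECORD`, `write_records` and `read_record` are hypothetical names:

```python
import struct

# Hypothetical layout: little-endian uint32 id + 16-byte NUL-padded name.
RECORD = struct.Struct("<I16s")

def write_records(handle, rows):
    """Append (id, name) rows as fixed-size records."""
    for ident, name in rows:
        handle.write(RECORD.pack(ident, name.encode("utf-8")))

def read_record(handle, index):
    """Seek straight to record `index` instead of scanning the file."""
    handle.seek(index * RECORD.size)
    ident, raw = RECORD.unpack(handle.read(RECORD.size))
    return ident, raw.rstrip(b"\x00").decode("utf-8")
```

The cost is the usual one for fixed layouts: names longer than 16 bytes are truncated, and the format is unreadable without knowing the layout.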


...But when your system crashes, having all that data in an easily accessible manner (regardless of what tools you have on hand) is a major win.


Let's not forget that when your system crashes, all of these easy-to-read text files are actually stored in a binary format, sometimes scattered in pieces, and require special tools to extract.


True, but they're less fragile: even if the text is garbled, you might be able to get some information out of it.


A text editor is specialised to the task of viewing and modifying text in a certain format.


Wrong. It's specialized to viewing and modifying any text, regardless of format. That's a huge difference.


First, rule of silence has nothing to do with plain text. It applies to any human-machine interface including GUI and physical knobs.

Second, you are right about structured data and all. The only thing is that it's either impossible or extremely hard to achieve. Many have tried, all of them failed. Windows now has a mix of registry, file and database configs which is a nightmare and is much worse than any unix. AIX has smitty and other config management solutions which are a bitch to work with if you want something non-trivial. Solaris is heading this direction (actually it's heading to the grave but it's another story) and it's also not nice. There are a lot of other OSes and programs which tried to do it but failed.

This is much like democracy: it's a terrible form of rule, too bad we have nothing better. This is exactly what's up with unix configs and data formats. It is possible to make some fancy format and tools which will achieve their goal like 80% of the time. But it will cause a huge amount of pain in the ass in the remaining 20%, and that is where it will be ignored and you'll end up with a mix of two worlds, which is worse than one.


The thing about today's Unix Philosophy, epitomized by both the BSDs and Linux is that it's backward-looking. There's a Garden of Eden myth and everything. Look at the reaction to systemd.

I remember getting fairly excited when Apple OSX first came out and quite a few of the configuration files were XML-based. Finally, a consistent format, but it wasn't pervasive enough. Even Apple couldn't see fit to break with the past.

I've even contemplated rewriting some core utils as an experiment to spit out XML (because I didn't know about typed objects at the time), but I lack the skillset.

I know we can't (and maybe we shouldn't) change Unix and its derivatives. There's too much invested in the way things work to pull the rug out. But, when a new OS comes along that wants to do something interesting, I hope the authors will take a look at designing the interface between programs rather than just spitting out or consuming whatever semi-structured ball of text that felt right at the time.

Wouldn't it be neat, for example, if instead of 'ls' spitting out lines of text, which sometimes fit the terminal window, sometimes not, which contain a date field which is in the local user's locale format, which is in a certain column which is dependent on the switches passed to the command, instead you get structured, typed information, ISO-formatted date and time, etc. On the presentation side, you can make it look like any old listing from ls if you like, rather than mashing the data and presentation layer together. I'd like to imagine such a system would be more robust than one where we could never change the column the date was in for fear of breaking a billion scripts.
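
A rough sketch of what separating data from presentation could look like for a directory listing, in Python. The record fields and the `list_dir`/`render` names are invented for illustration; this is not how any existing `ls` works:

```python
import os
from datetime import datetime, timezone

def list_dir(path):
    """Return one typed record per directory entry, sorted by name."""
    records = []
    for entry in os.scandir(path):
        info = entry.stat(follow_symlinks=False)
        records.append({
            "name": entry.name,
            "size": info.st_size,  # int, in bytes
            "modified": datetime.fromtimestamp(info.st_mtime, timezone.utc),
        })
    return sorted(records, key=lambda r: r["name"])

def render(records):
    """Presentation layer: ISO 8601 timestamps, fixed column order."""
    return [
        f'{r["size"]:>10} {r["modified"].isoformat()} {r["name"]}'
        for r in records
    ]
```

Consumers that want the date take `record["modified"]` directly, so the rendered column order can change without breaking them.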


I would be so excited to see new operating systems which depart from POSIX completely and introduce new abstractions to replace the dated notions of hierarchical filesystems, terminals, shells, shared libraries, IPC, etc. The sad truth is that everyone targets POSIX because there is so much software that can be ported in order to make the system usable.


I agree. The "Unix compatibility layer" has killed so many interesting projects over the years. If systems research were still an academic pursuit, maybe there would be some interest in bringing a system like this to fruition. It would require a long-term investment in the design of the system foremost.


I don't disagree with you, but how do we decide which structured data format to use as a replacement for plaintext? I have the sneaking feeling that a large part of why we still use plaintext is because it's established already as the standard, for worse or for better, and replacing it with a standard everyone could agree on proved impossible.


Use whatever and stick to it. The sane structured data formats differ in little but the shape of the braces and some punctuation. Had UNIX developers been thinking about this back in the day, they'd probably have chosen s-expressions or something else from that era. Now it may as well be JSON. The thing is, one format should have been picked, and a parser for it should have been available in a system library. Then we wouldn't have to agree on anything, we'd have one system standard to use.


Plaintext is the lazy way out. It's not even remotely a standard because everyone does it differently.

It's the shortest path from thinking "I need to persist this crap" to getting something working. Write the bytes to a file, sprinkle some separators, read and parse it back.


> When you get tot that point you have to understand that you have chosen to ignore the fact that the data you are dealing with must be represented in something much closer to a relational database than lines of ASCII text.

This is true. But when your needs aren't that complex, basic textual output sure is nice.

> The fetish for easily-editable ASCII files and escaping from structure is holding us back. Structured data does not automatically imply hidden and inaccessible, that's a matter of developing appropriate tooling.

Good plan. I'll set up a schema by which people can exchange data, and we'll get it standardized. Given the complexity of the relationships involved - and the fact that I really don't know how my data will be used downstream of me - I'd better make it something really robust and extensible. Maybe some kind of markup language?

Then we can ensure that everyone follows the same standard. We can write a compatibility layer that wraps the existing text-only commands and transforms the data into this new extensible markup language (what to call it though? MLX?). Then anyone who has the basic text tools can download the wrappers, learn the schema, and start processing output.

Then again, I could just do that grep | cut. The only thing I have to learn is the shape of the thing I'm grepping for, and the way to use cut - the basics take a few seconds, and no additional tooling is required. Best of all, chances are high that it'll work the same way 20 years from now (though likely with expanded options not currently available).

There's a lot to be said for having simple tools that accept simple input and produce simple output.

This doesn't mean it's the only approach - databases and structured data absolutely have their places in modern CLI tooling - but that has no bearing on the value of an ASCII pipeline.


You can just develop alternative tools and pipe JSON. Text and data in one.

Pipes can transfer arbitrary data, so it's just the tools that you don't like, not the underlying mechanism.


Yeah, but that's the point. Pipes are fine. The tools suck, though. UNIX would be infinitely better if it defaulted to piping structured text instead of making each tool have to implement its own shotgun parser.


...So you're talking about having text-serialized key-value objects (or any other kind of object), with standard deserializers and tools for manipulation?

That's actually a great idea. Better yet, it's actually viable now, unlike many proposals for "fixing" unix.


> ...So you're talking about having text-serialized key-value objects (or any other kind of object), with standard deserializers and tools for manipulation?

Yes, basically. Someone in power should just pick any format - modified JSON (without the "integers are really IEEE 754 floats" stupidity), s-expressions, whatever - and make standard deserializers part of the system library.


Heh heh, "someone in power." Who would that be?

There's no authority to complain to about all this...

Structured text formats come and go: S-expressions, SGML, XML, JSON, "modified JSON", etc etc etc...

Fortunately the Unix tools work the same with any byte stream, pretty much, so they've survived gloriously since the 1970s.

That's kind of the whole deal with "worse is better."


> There's no authority to complain to about all this...

Whoever accepts patches to base UNIX system libraries would be a good start.

> Fortunately the Unix tools work the same with any byte stream, pretty much, so they've survived gloriously since the 1970s.

So would a structured text format, except you wouldn't be discarding and then recreating semantic information at every junction of a pipe.


> Whoever accepts patches to base UNIX system libraries would be a good start.

That can be you! Get to work, or pay someone else to get to work if the skills required aren't in your set.


But then when you put structured data through pipes, why bother with making it text at all (over the wire, at least)?


Modern tools should offer the option to write out JSON (with a proper spec, please). I can definitely see the value of a 'ls' variant that can do this, and I remember people discussing JSON-based shells. But not either/or! For example, the Rust compiler can now be persuaded to write out error messages in JSON.


Modern tools are offering the option to write out JSON. Grepping /usr/share/man and /usr/local/share/man on my macOS system reveals a good dozen or so commands that take JSON as input or generate it as output. (Their man pages generally document the schema, too.) Most of the tools I've written myself also have JSON I/O.


* http://jdebp.eu./Softwares/nosh/guide/service-show.html

Jos Backus asked for the JSON option in 2013.


Great points, esp. about logfiles. So grateful to have discovered lnav -- http://lnav.org -- an amazing little CLI tool with an embedded sqlite engine and regex support. It solved all my logfile parsing / querying problems, and then some.


Our civilization has been using unstructured text for a very long time now...

I totally agree that Unix sux. We need a better philosophy. But you eventually have to come up with something that actually works in practice. I am still waiting.

I remember my excitement at the idea that things like CP/M and MSDOS running on personal computers were going to free us all from the tyranny of mainframe computers running things like Unix. We all know how that turned out. Everyone eventually just gave up and started emulating the top of a desk.

So Unix is good at messing with unstructured text? Good. Get back to me when you have something better that actually works.


I'm split, because there was this article saying the opposite: that text is actually a pretty good format because it's simple, compact, and easily debuggable. On the other hand, text is awful if you want to serialize data, because parsing and searching text is one of the most CPU-intensive things.

Now, it's true that we should recognize where text is really inadequate, especially when indexing and searching is needed. Webpages, for example, should not be plain text.

I think the problem lies in programmers not being able to properly use and understand how a database works. Databases and their engines are black boxes, so it's normal that fewer developers want to use a DB the way you describe. Meanwhile, dictionaries and B-trees are not very sophisticated, yet I see almost no programmers using them consciously. The less a programmer knows about the tools he has in his hands, the less benefit he will get from them, and so he will fall back on easier things.

So really my thought is that the tools are not accessible enough. The concepts of file and database are so distant that it's practically impossible to work with both, but to me it should be possible.


I can see your argument for consistent structure--I definitely don't love vimscript or nginx's custom configuration language. But that doesn't require we jump out of text--JSON and XML are viable, provided they remain geared toward hand editing.

It may be that, given proper tooling, database-driven configuration could be visible and accessible, but the fact is, I haven't seen tooling that pulled that off.


JSON and XML are both horrible in their own ways, especially for use as configuration languages.


True, but my point is that the alternatives being proposed are vaporware. JSON works now. IMHO JSON is the least shitty of the shitty solutions that exist.


I guess my point is that if JSON is the least shitty alternative then I can't really agree that Unix-type environments would be better off using it for everything, because JSON is a pain in the ass to type and edit with a regular text editor... the same with XML... and the same with S-expressions... YAML is a monster, CSV isn't really structured, and so on...

Basically there is no universal structural language that's ergonomic and nice for all uses, so it makes sense that the Unix hackers of yore preferred to create tiny custom languages for everything. It's also because "worse is better".

https://www.jwz.org/doc/worse-is-better.html


It doesn't matter if it is text or not, and if it is structured or not.

You missed the whole point about signal vs noise.

When I'm ALWAYS presented with a blob of something to decipher, it requires a context switch.

Nothing IS something, and it's a structured something.


Structured data may or may not imply "hidden and inaccessible", but there's a lot of correlation between the two.


Agree with log lines. We now output our logs as json format (one message per line but in json format). Often with lots of metadata. Now you can still use something like jq to parse/grep it. But you can also process it into something like elasticsearch and use all the lovely metadata.
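
A small sketch of the one-JSON-object-per-line idea in Python. The helper names `log_event` and `select` and the field names `level` and `message` are assumptions for illustration, not any particular logging library's API.

```python
import json


def log_event(stream, level, message, **metadata):
    """Write one log entry as a single line of JSON."""
    record = {"level": level, "message": message, **metadata}
    stream.write(json.dumps(record, sort_keys=True) + "\n")


def select(lines, **wanted):
    """Yield the parsed entries whose fields equal all the given values."""
    for line in lines:
        record = json.loads(line)
        if all(record.get(key) == value for key, value in wanted.items()):
            yield record
```

Each entry is still one line, so grep keeps working, while `select(open("app.log"), level="error")` matches on the field itself rather than on wherever the word "error" happens to appear in the text (`app.log` is a hypothetical file name).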


> This runs very deep in unix and a lot of people are too "brainwashed" to think of other ways. Instead they develop other exotic ways of dealing with the problem.

Well, powershell did solve that problem...


A somewhat decent middle ground, which my employer uses a lot, is CLIs that output JSON. You pipe them to JQ to do "selects" on the data structures in a reasonably clear and compact way.


Seems to me that while shell scripts can be used by anyone given a bit of time, wysiwyg style, powershell is made by programmers for programmers.


That's why I think PowerShell is cool.


The text-oriented nature of Unix may be a mistake, but the Rule of Silence is not concerned with it. It is about not overwhelming the user with information they are not interested in. GUIs can follow or violate the Rule of Silence just as well.


Talking about the rule of silence in GUIs, I wish I could slap the eurocrats who decided to force websites to show the cookie warning on a first visit of all websites. Did they even understand the consequence and wasted time of what they were doing? Having to click all the time to get this dumb warning off?

Of course some websites had to do it in an even dumber way than the law asks for. Like slashdot: http://i.imgur.com/5Fp0nmo.png

This is what greets the French every time slashdot decides to forget you agreed to let them put cookies on your computer and you need to click continue before you can get to the actual website.

The law actually made it worse for the people it's supposed to protect (those who might refuse cookies for privacy?) because those warnings will then stick around like glue if the site can't give you a cookie to remember that you accepted their existence.


The 'best' part is that the only way to remember the fact that the warning was shown (and not display it any more) is to use a cookie or something functionally equivalent to one.[0] So instead of empowering people who wish to protect their privacy, these warnings push people even further to keep cookies enabled.

[0] Storing it server-side, per IP address, is obviously impractical.


"GUIs can follow or violate the Rule of Silence just as well."

It would have been nice if this was permitted to be violated only by GUIs. Ask people about their first *nix experiences before the GUI-embracing era, and one of the few things they noticed was the continuous text-spitting. For instance, that happened during the boot and OS loading sequence (and still happens, only now hidden by default behind splash screens): a lot of reporting about all the things that were performed successfully. It's funny in this regard to see Unix's Rule of Silence respected more... outside Unix, where it is just common sense, with no need to be formulated as a rule.


Booting up is a bit of a special case that I'd actually be willing to carve out an exception for. When the boot-up goes wrong, it's often due to causes so deep that they leave you with very few means of diagnosing it or showing an error message. If the booting process hangs, having a rough idea at what stage it happened can be very helpful. Without these messages, discovering the cause of failure would be much harder.

The boot-up splash screen hardly even counts as "GUI" anyway.


Your comment was saved. Click OK to continue.


Precisely!


Note that the "rule of silence" (combined with the habit of writing documentation like longform essays) is also one factor that makes unix-like systems newbie-unfriendly. (Famous example: trying to exit vi)

I think the rule makes sense within the specific constraints *nix programs are usually expected to work in (two output channels with no structure except the one informally defined by the program and the convention that the output should be human- and machine-readable at the same time) but I don't see it as a general rule if better ways to filter the output are available.


> Famous example: trying to exit vi

To be fair, this has been fixed a long time ago. At least Vim (which is the Vi installed on most systems) shows the following message on startup:

  ~                     VIM - Vi IMproved                       
  ~                                                             
  ~                      version 7.4.1829                       
  ~                  by Bram Moolenaar et al.                   
  ~                          [...]
  ~        Vim is open source and freely distributable          
  ~                                                             
  ~               Help poor children in Uganda!                 
  ~       type  :help iccf<Enter>       for information         
  ~                                                             
  ~       type  :q<Enter>               to exit                 
  ~       type  :help<Enter>  or  <F1>  for on-line help        
  ~       type  :help version7<Enter>   for version info
On the other hand, it doesn't show this message when you call "vi" with a filename. But at least a beginner running "vi" for the first time should be taken care of by this.


Someone unfamiliar might just be dumped into vi because some other program thought it'd be great to open a text editor, e.g. for a commit message. You're unlikely to run vi intentionally unless you also know you want to run vi and how to exit, I guess.


Programs that put you into Vi /should/ be calling $EDITOR; if $EDITOR is set to Vi(m), you should know how to use it.

Also, if you ^C in Vi, you get the message:

  Type  :quit<Enter>  to exit Vim


And what if $EDITOR is not set? Vi is the standard POSIX editor, calling it makes sense.


Actually, defaulting from $VISUAL to vi and from $EDITOR to ex (or ed) is what -- strictly speaking -- makes sense.

Remember what the difference between $VISUAL and $EDITOR was intended to be. There's a whole range of places in Unix where there was, and even still is, a distinction between a line editor and a full screen editor. Consider, for just one example, the ~v and ~e commands in BSD Mail.
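
As a sketch of that convention (and only of the convention as described here; real programs such as git or mail clients each have their own lookup order), a hypothetical `choose_editor` helper in Python could look like this:

```python
import os


def choose_editor(full_screen, environ=None):
    """Pick an editor command following the $VISUAL / $EDITOR split.

    full_screen=True  -> $VISUAL, falling back to vi (a screen editor)
    full_screen=False -> $EDITOR, falling back to ex (a line editor)
    """
    env = os.environ if environ is None else environ
    if full_screen:
        return env.get("VISUAL") or "vi"
    return env.get("EDITOR") or "ex"
```

With neither variable set, `choose_editor(True)` gives `vi` and `choose_editor(False)` gives `ex`, which mirrors the ~v and ~e split in BSD Mail.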

Now enjoy the discussion at https://news.ycombinator.com/item?id=13113556 .


I do think distros should change their default $EDITOR to nano.


Debian has done exactly that, years ago.

It tends to annoy me, but it's easy enough to change :)


Yeah happened to me. It's probably standard for anyone trying to learn GIT.


To be fair, that's not enough either. Someone might accidentally have pressed 'a', and now good luck understanding how to get out of it without having a crash course on vim.


Yeah, whenever I run some program, I just press a random key like "a" to get started. ;)


It's not unreasonable to start typing in a text editor and you're likely to hit a or i sooner or later ...


^z

pkill vi


Actually I found the colon 'syntax' at the beginning of the command to be quite confusing - I wasn't aware that you're supposed to literally type shift + semicolon. That's completely unintuitive compared to 'regular' shortcuts you see in nano like ctrl + c.


I consider myself a major vim beginner, only using it for commit messages, very light file creations, and checking out config settings files when sshed into a server. I have never seen this screen.


> Note that the "rule of silence" (combined with the habit of writing documentation like longform essays) is also one factor that makes unix-like systems newbie-unfriendly. (Famous example: trying to exit vi)

    $ man foo
    *scroll to the end with the EXAMPLES section*
There should be an option for that. man --take-me-to-the-examples foo


(NOTE: The parent comment was edited. This response applies to the original parent comment that contained just the command line ("man foo" + EXAMPLE section) and nothing else.)

Wow, I haven't seen such a blunt and unhelpful RTFM comment for a while. This comment is inappropriate in so many ways:

1) The unix systems have an inconsistent documentation mix of man pages, info pages, "-h", "-help", "--help", HTML docs, separate manuals (e.g. Debian Administrator's Handbook) and so on.

2) "man foo" leads to: "No manual entry for foo"

3) "man vi", as well as "man vim" both lead to a manpage that has no EXAMPLES section at all (see https://www.freebsd.org/cgi/man.cgi?query=vi, https://www.freebsd.org/cgi/man.cgi?query=vim)

4) The Vi(m) manpages explain only the command line arguments, not the editor commands. The latter are available by typing ":help" in the editor.


You don't really need an "examples" section for vim, 90% of the time you just type "vim file" which the usage info probably covers well enough. You only need an "examples" section for something like zip.


> you just type "vim file" which the usage info probably covers well enough

This is wrong. If you just type "vim file", you don't get any usage info, not even in the status line. See also: https://news.ycombinator.com/item?id=13165795


You might find this interesting:

https://github.com/tldr-pages/tldr


That project is a symptom of manual pages not having good “EXAMPLES” sections. The examples on that web page should be contributed upstream to the manual pages of the software that they are for.


The issue isn't just the lack of EXAMPLES, but also with how man pages tend to be structured. They tend to be very "encyclopedic". There is a set ordering for sections, with a lot of them very verbose, and examples, when present, near the end. Options are often listed in alphabetic order, which doesn't usually correspond to how often they are used or useful.

Man pages are OK when you're first learning how to use something; but if you're already familiar with a command and just need to remind yourself of the specific sequence of options to achieve a desired result, they're not the most convenient.

I think it's useful to have a tool that fulfills the latter purpose without worrying about the former.


Microsoft documentation was mentioned earlier in this discussion. One of the things that MSDN and TechNet doco does is have both "X reference" and "using X" sections. Manual pages are reference doco, in this way of organizing things.

The FreeBSD, TrueOS, and related worlds put the "using" doco into what are often called "handbooks" or "guides".

* NetBSD Guide: https://netbsd.org/docs/guide/en/

* FreeBSD Handbook: https://freebsd.org/doc/handbook/book.html

* DragonFlyBSD Handbook: https://www.dragonflybsd.org/docs/handbook/

* TrueOS User Guide: https://www.trueos.org/handbook/trueos.html

* PC-BSD User Guide: http://web.pcbsd.org/doc-archive/10.1.2/html/pcbsd.html (viewable off-line directly in both PDF and HTML forms in /usr/local/share/pcbsd/doc/)

Some parts of the Linux world do the same. upstart had the Upstart Cookbook for example:

* http://upstart.ubuntu.com/cookbook/

The Linux Documentation Project was supposed to contain a wealth of this stuff, but large parts of it are seemingly moribund, and incomplete after decades or woefully outdated. Wikibooks tried to take up the slack with an "anyone can edit" Guide to Unix and a Linux Guide:

* https://en.wikibooks.org/wiki/Guide_to_Unix

* https://en.wikibooks.org/wiki/Linux_Guide

If you want examples and doco that works from the basis of what you usually want to do, then these handbooks and guides are the places to go, not reference manuals.


Whenever the discussion comes up about man pages and how documentation should be organized, I like to quote this section from the GNU coding standards about how Info documentation is structured:

----

Programmers tend to carry over the structure of the program as the structure for its documentation. But this structure is not necessarily good for explaining how to use the program; it may be irrelevant and confusing for a user.

Instead, the right way to structure documentation is according to the concepts and questions that a user will have in mind when reading it. This principle applies at every level, from the lowest (ordering sentences in a paragraph) to the highest (ordering of chapter topics within the manual). Sometimes this structure of ideas matches the structure of the implementation of the software being documented--but often they are different. An important part of learning to write good documentation is to learn to notice when you have unthinkingly structured the documentation like the implementation, stop yourself, and look for better alternatives.

[…]

In general, a GNU manual should serve both as tutorial and reference. It should be set up for convenient access to each topic through Info, and for reading straight through (appendixes aside). A GNU manual should give a good introduction to a beginner reading through from the start, and should also provide all the details that hackers want. […]

That is not as hard as it first sounds. Arrange each chapter as a logical breakdown of its topic, but order the sections, and write their text, so that reading the chapter straight through makes sense. Do likewise when structuring the book into chapters, and when structuring a section into paragraphs. The watchword is, at each point, address the most fundamental and important issue raised by the preceding text.

https://www.gnu.org/prep/standards/standards.html#GNU-Manual...


See also http://bropages.org

User-submitted and voted-upon examples for commands.

(hadn't seen tldr, looks great, I'll check it out)


Thanks for this, you've just made my day!


If you want a fast way to read the EXAMPLES section only for a command, here is a shell function which creates an ‘eg’ command which only displays the “EXAMPLES” section of manual pages:

  eg(){
      MAN_KEEP_FORMATTING=1 man "$@" 2>/dev/null \
          | sed --quiet --expression='/^E\(\x08.\)\?X\(\x08.\)\?A\(\x08.\)\?M\(\x08.\)\?P\(\x08.\)\?L\(\x08.\)\?E/{:a;p;n;/^[^ ]/q;ba}' \
          | ${MANPAGER:-${PAGER:-pager -s}}
  }

Usage:

  $ eg tar
  EXAMPLES
       Create archive.tar from files foo and bar.
             tar -cf archive.tar foo bar
       List all files in archive.tar verbosely.
             tar -tvf archive.tar
       Extract all files from archive.tar.
             tar -xf archive.tar
  $
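
For anyone who finds the sed program hard to read, here is roughly the same logic as a Python sketch: print from the heading line until the next line that starts in column one. It assumes the page text is already plain (no backspace overstriking), and `extract_section` is a name made up for this example.

```python
def extract_section(page_text, heading="EXAMPLES"):
    """Return the named top-level section of a plain-text manual page."""
    section = []
    inside = False
    for line in page_text.splitlines():
        # Inside the section, a line starting in column one is the next heading.
        if inside and line[:1] not in ("", " ", "\t"):
            break
        if not inside and line.rstrip() == heading:
            inside = True
        if inside:
            section.append(line)
    return "\n".join(section).rstrip()
```

Passing a different `heading`, such as "SEE ALSO", pulls out any other top-level section the same way.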


Here's mine:

  examples ()
  {
    man "$1" | less '+/^EXAMPLES'
  }
Usage:

  $ examples su
  EXAMPLES
       su -m man -c catman
              Starts a shell as user man, and runs the command catman.  You will
              be asked for man's password unless your real UID is 0.  Note that
              the -m option is required since user “man” does not have a valid
              shell by default.  In this example, -c is passed to the shell of
              the user “man”, and is not interpreted as an argument to su.
       su -m man -c 'catman /usr/share/man /usr/local/man'
              Same as above, but the target command consists of more than a
              single word and hence is quoted for use with the -c option being
              passed to the shell.  (Most shells expect the argument to -c to be
              a single word).
       su -m -c staff man -c 'catman /usr/share/man /usr/local/man'
              Same as above, but the target command is run with the resource
              limits of the login class “staff”.  Note: in this example, the
              first -c option applies to su while the second is an argument to
              the shell being invoked.
       su -l foo
              Simulate a login for user foo.
       su - foo
              Same as above.
       su -   Simulate a login for root.


You probably still want the MAN_KEEP_FORMATTING=1 part, to keep colorization and bolding etc. in the manual page. Also, your solution does not respect the user’s pager preference; the user might prefer to read man pages in “w3m”, for instance.


>You probably still want the MAN_KEEP_FORMATTING=1 part, to keep colorization and bolding etc. in the manual page

Underlining is the only formatting I care about and that works without MAN_KEEP_FORMATTING on FreeBSD.

I don't have colorization enabled in man pages on my laptop, and also I don't have bolding enabled either. I like it this way.

>your solution does not respect the user’s pager preference; the user might prefer to read man pages in “w3m”, for instance.

The pager preference of the user in this case is `less`. I know because the user happens to be myself :^)


> Famous example: trying to exit vi

The interesting thing here is that in order to make this argument people forget that at least some new users know the "Press F1 for help" dictum, because that particular part of Common User Access was drummed into them from the start of their encounter with computers. The new users press F1 in vim (not vi) and how to exit is the second and third items on the screen.

(Press F1 in actual vi, and, in some terminals at least, it very informatively inserts the letter "P" into the document. (-:)


I agree. I think the problem is that improvements for one group would hurt the other, and power-users (man, that sounds stupid!) don't want to develop features that don't provide value for them, or would worsen their experience. I wouldn't want to!

I am getting really comfortable with unix and I can say that the transition on a mac is not bad! I started using it with only rudimentary knowledge about how to navigate the shell, but you really don't need it for the most part. When you start digging deeper you explore more and more of the unix backend until it's like a second face to the computer. SSHing into a server is no inconvenience anymore, etc.

I think this is the reason for developing simple, stupid GUI programs that help beginners do beginner stuff. They will (sooner or later) obtain knowledge of the terminal, but I think it's critical that the first steps are not too challenging.

Note: I am speaking about developers. I don't think non-developers need to know how to navigate the terminal. It just doesn't provide any value for them.



Having debugged the Linux kernel and userspace programs extensively, I'd say this rule is golden. A typical user doesn't need logs unless something is really f*cked up. On the other hand, if you are running a production or development machine, you can enable as many log messages as you want, as most of them can be turned on via /proc/sys/kernel/printk (kernel messages), program parameters, or writing your own specific messages in code if that's not enough. Actually, I more often encounter the opposite problem -- there are so many log messages that it's hard to find a specific problem among them.
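
For reference, /proc/sys/kernel/printk holds four integers; kernel messages with a priority value below the first one (the console log level) reach the console. A small Python sketch for reading it, where the names `PrintkLevels`, `parse_printk` and `read_printk` are made up for this example:

```python
from typing import NamedTuple


class PrintkLevels(NamedTuple):
    console_loglevel: int          # messages with a lower value reach the console
    default_message_loglevel: int  # level for messages without an explicit one
    minimum_console_loglevel: int  # lowest value console_loglevel may be set to
    default_console_loglevel: int  # default value for console_loglevel


def parse_printk(text):
    """Parse the whitespace-separated contents of the printk sysctl file."""
    return PrintkLevels(*(int(field) for field in text.split()))


def read_printk(path="/proc/sys/kernel/printk"):
    """Read the current levels on a Linux system."""
    with open(path) as handle:
        return parse_printk(handle.read())
```

Raising the first value (as root) is what turns the extra kernel messages on.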


> there are so many log messages, that it's hard to find a specific problem among them.

This is exactly my experience, too.

Some programs get this even worse: They spam you with lots of useless information, yet when something goes wrong, you don't get the information you need. Instead, you have to rerun it to increase the verbosity even more.

So ... if I have to rerun it and have a hard time trying to reproduce the issue anyway, why did it spam me in the first place?


This is a part of the unix philosophy I often forget, but agree with just as much as the rest.

As an example: I love curl for piping the data to stdout per default, but I'm frequently annoyed by the progress bars I didn't ask for, especially if a script involves multiple curl commands.


Seconded. Though curl isn't the worst offender in that regard. Anything that touches TeX drives me nuts with its blatant disregard for this rule. Not only does every TeX engine spew pages of output while processing even the simplest documents, but there is no way to turn it off. (It's hardly the only way in which TeX makes my blood boil, but it's the most visible one.)

FFmpeg is also quite bad here, but at least you can use -hide_banner and/or -loglevel to alias the problem away and mostly forget about it.


Combined with the fact that the TeX engines also spew out a number of files that you don't want, because TeX isn't capable of traversing the source more than once.


One of the things I liked about typesetting with troff is that it follows Unix conventions (duh).


Modern Unix tools like curl are like Swiss Army knives with regard to the numerous combinations of options that they accept. A particular combination of options may change the behaviour of the program in such a way that it still succeeds, but does something slightly different.


Spolsky addressed this cultural difference between the Windows and Unix world in 2003: https://www.joelonsoftware.com/2003/12/14/biculturalism/

At its core, it is about putting humans before computers. Engelbart coined HCI as Human Computer Interface, not CHI. This philosophy steered my product designs ever since I read that as a teenager.


Honestly, I don't know if the rule of silence is actually all that good of an idea. Unix already gives us stdout vs stderr; it's one thing not to write useless information to stdout, but it could be useful to have a stdinfo or stdlog or what-have-you.

Granted, with too many options it could quickly get confusing (should this message go to stdout or stdinfo; is that message more informational or more debugging?), but I think that it could be managed.
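
A rough sketch of what a "stdinfo" could look like with today's plumbing, in Python. It assumes a purely hypothetical convention that file descriptor 3 carries informational messages when the caller has opened it; nothing in Unix defines such a stream, and `open_info_stream` is a made-up helper.

```python
import os


def open_info_stream(fd=3):
    """Return a text stream on `fd`, or a sink if the caller left `fd` closed."""
    try:
        os.fstat(fd)  # raises OSError if the descriptor is not open
    except OSError:
        return open(os.devnull, "w")
    # closefd=False: the descriptor belongs to the caller, so leave it open.
    return os.fdopen(fd, "w", closefd=False)
```

A program would write its chatter to `open_info_stream()` and stay silent by default; running it from a shell as `prog 3>info.log` would opt in to the extra output.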

Similarly, I think that Unix fell down by relying too much on unstructured text (in the sense that the structure isn't enforced, not in the sense that it's altogether absent): because of this, every single tool rolls its own format, and even very similar formats may have subtle incompatibilities.

I'd love to see a successor OS which builds on the lessons of Unix, Plan 9 and other conceptually brilliant OSes, but I fear the world will never see another successful operating system.


> Similarly, I think that Unix fell down by relying too much on unstructured text

This is what made Unix last. Text and keyboards are the universal computing interface that has survived since the 1970s.


Note that I'm not arguing for structured binary data. Structured text (e.g. s-expressions, JSON, even the-extensible-structured-text-format-which-shall-not-be-named) can last just as long. Indeed, S-expressions have existed since the 1950s.

There's no particular reason why /etc/passwd couldn't be:

    ((root nil 0 0 root /root /bin/bash)
     (daemon nil 1 1 daemon /usr/sbin /usr/sbin/nologin) …)
There are any number of similar dialects which could be used, of course, but the principle is obvious.
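
To make the idea concrete, here is a minimal sketch of converting colon-separated passwd entries into that kind of list. The function names and the quoting rule for fields containing spaces are my own assumptions, not any existing tool or standard dialect.

```python
def to_atom(field: str) -> str:
    """Render one passwd field as an atom (hypothetical quoting rules)."""
    if field == "":
        return "nil"  # empty fields become nil so every record keeps its arity
    if any(c.isspace() or c in '()"\\' for c in field):
        # Quote anything that would otherwise be ambiguous inside a list.
        escaped = field.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return field


def passwd_line_to_sexp(line: str) -> str:
    """Turn 'root::0:0:root:/root:/bin/bash' into a parenthesised record."""
    fields = line.rstrip("\n").split(":")
    return "(" + " ".join(to_atom(f) for f in fields) + ")"


def passwd_to_sexp(text: str) -> str:
    """Wrap all non-empty lines of a passwd-style file in one outer list."""
    records = [passwd_line_to_sexp(l) for l in text.splitlines() if l.strip()]
    return "(" + "\n ".join(records) + ")"


if __name__ == "__main__":
    sample = (
        "root::0:0:root:/root:/bin/bash\n"
        "daemon::1:1:daemon:/usr/sbin:/usr/sbin/nologin\n"
    )
    print(passwd_to_sexp(sample))
```

Going the other way (reading the list back) needs a real parser, which is where the choice of dialect would start to matter.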


What would your example achieve? You're making the format more verbose and error-prone (someone might easily forget to match a paren), without imposing any additional structure over what is already implied by line breaks.

Though I do agree with your overarching point that some of the formats/outputs could do with a more consistent structure. Perhaps something like YAML would strike a good balance between structure and conciseness/readability...


The example is not really interesting, because as you said there is already a simple structure. But programming with text becomes really tiresome after a while. For example:

    git blame -L ${LINE},+1 --porcelain ${FILE} | sed -n '/^author / {s/^author //; p}'
I would rather:

    (name (author (first (git blame file :line line))))
... where git blame returns a sequence of "blame" data from which I can retrieve the author easily (if you prefer pipes over function composition syntax, use threading macros). Then I don't have to worry about strange characters crashing my scripts randomly. Suppose I forgot the "^" anchor in my regexp (a fair assumption, since you assume people forget parentheses): there could be situations where I would match too many lines.
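
As a rough sketch of what "a sequence of blame data" could look like today, the porcelain text can be parsed once into records instead of being matched with a regexp at every call site. `BlameEntry` and `parse_blame_porcelain` are hypothetical names of mine; the parser assumes the documented `--porcelain` layout (a header line starting with the commit hash, `key value` lines, then the source line prefixed by a tab).

```python
from dataclasses import dataclass, field


@dataclass
class BlameEntry:
    commit: str
    final_line: int
    headers: dict = field(default_factory=dict)
    content: str = ""

    @property
    def author(self) -> str:
        return self.headers.get("author", "")


def parse_blame_porcelain(text: str) -> list:
    """Parse `git blame --porcelain` output into BlameEntry records."""
    entries = []
    known = {}      # commit -> headers; commit details are printed only once
    current = None
    for raw in text.splitlines():
        if raw.startswith("\t"):
            # The tab-prefixed line is the source text and closes the entry.
            current.content = raw[1:]
            entries.append(current)
            current = None
        elif current is None:
            # Header: <commit> <original line> <final line> [<group size>]
            parts = raw.split(" ")
            commit = parts[0]
            current = BlameEntry(commit, int(parts[2]), known.setdefault(commit, {}))
        else:
            key, _, value = raw.partition(" ")
            current.headers[key] = value
    return entries


if __name__ == "__main__":
    sample = (
        "a" * 40 + " 1 1 1\n"
        "author Ada Lovelace\n"
        "summary Initial commit\n"
        "filename notes.txt\n"
        "\tauthor of this line is somebody else\n"
    )
    print(parse_blame_porcelain(sample)[0].author)
```

The source line in the sample begins with "author " on purpose: because it is stored as `content`, it can never be confused with the author header, anchored regexp or not.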


> You're making the format more verbose and error-prone (someone might easily forget to match a paren), without imposing any additional structure over what is already implied by line breaks.

It's already error-prone (as anyone who's ever incorrectly edited /etc/passwd knows).

Ultimately, structured data (which is pretty much all data) should be edited with structure editors. Good text formats make it easy to write such structure editors.


> What would your example achieve? You're making the format more verbose and error-prone (someone might easily forget to match a paren), without imposing any additional structure over what is already implied by line breaks.

That particular example is fairly straightforward (at a simple level, passwd files aren't complex), but being able to express arbitrary nested structure would make various things a lot more straightforward. Line breaks and some sort of tab/colon/what have you work fine if everything has at most two levels of hierarchy, but it starts being painful after that.

Missing matched parens are a bit of a specious argument, since many of the random formats for files are fairly strict about what they parse, and the ones that matter (e.g. passwd, sudoers, crontab) are conventionally edited through tools that check the syntax before committing.


Maybe you shouldn't use arbitrary nested structures where you can go without?


Beyond text, structured data can give you actual functions or object instances. You can inspect a path object without dealing with escaped slashes, etc. For example, that's how I understand the "capabilities" security model: once you authenticate, you get an object which allows you to perform some tasks (https://en.wikipedia.org/wiki/Capability-based_security).


But column-oriented output is still structured and tools like AWK are meant to be a "programmable filter" on it. Reading or outputting deeply nested structures like JSON or S-exps would make it less practical to pipe programs together and instead have big "monolithic filters" with lots of options.


Parsing columns of text with awk or any other text tools is vexingly problematic. How do you even define "column"? What if columns contain whitespace? How do you differentiate that from the whitespace between columns?
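
A small illustration of the ambiguity, using made-up data: once a field contains a space, splitting on whitespace can no longer recover the original columns, whereas a format with explicit quoting rules (CSV here, purely as an example) round-trips the row.

```python
import csv
import io

# A hypothetical three-column row whose first field contains spaces and a comma.
row = ["my report, final.txt", "alice", "42"]

# Whitespace-separated output: the spaces inside the file name look exactly
# like column separators, so naive splitting yields five fields, not three.
flat = " ".join(row)
naive_fields = flat.split()

# CSV quotes the awkward field on output, so the column boundaries survive.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
parsed_fields = next(csv.reader(io.StringIO(buffer.getvalue())))

if __name__ == "__main__":
    print(len(naive_fields), naive_fields)
    print(len(parsed_fields), parsed_fields)
```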


That just illustrates the limitations of the pipe abstraction, as something only able to send sequentially-read unstructured data.

Also, jq, jshon and similar tools are a thing.


> This is what made Unix last.

An unfortunate turn of phrase. It's ambiguous. Do you mean last as in "endure"? Or do you mean last as in "last place"?


both :-D.


Emphasis "are". Still.


There are tons of good things to be said about "the unix philosophy". The philosophy itself is good (do one thing, play nice together) but the implementation is crap to be honest.

The worst thing about the unix philosophy implemented as a unix shell environment is that programs (often) have only one interface, and that's used both for interactive use and as a programming API. This means, for example, that when we realize git version N has terrible default behavior given some arguments, we can't fix the behavior of those arguments in git version N+1 because we would break its API.

And yes of course - structured in/out, sane encoding handling etc. is just missing.


If looking at the history of tech, but also the general history of humanity, has taught me anything, it's that assuming the current systems will never fall and be replaced will always end up making you look like a fool eventually.


To paraphrase Keynes, in the long run we are all fools. (Except for Alan Kay, apparently)


One consequence of "the rule of silence" is that sometimes it is not obvious if a command is processing data or waiting for input, there is no visual difference.


That's more an occasional annoyance than a real problem though. The only commands that don't prompt when waiting for input are commands that are written specifically for stream processing from stdin. In which case the user should already be aware of its behaviour because they're either already familiar with the command or have consulted its man page (or similar reference) before executing it. So unprompted waiting for user input is a bug in a shell script / command line pipe and thus usually pretty quick to debug.

Going back to the man page point: I will grant you that not everyone does check what a program does before running it. Sadly in those instances there's little you can do to protect them from themselves. It's similar to how you cannot protect people from blindly copying and pasting code from the internet. If someone is willing to run a command "blind" then the usefulness of the output is the least of their worries.


I was more thinking of a situation where you have misunderstood the syntax of a command or forgot to type an argument.


I did cover that point. :)


This is why well-behaved programs prompt for input. Like bash, e.g.


With cat being the notable exception :-)


And grep.

Programs that use the input and output to carry data around can not show prompts to the user.


A workaround is to prompt on STDERR. Data input will enter the same place, and output will be distinguished from the text for prompts.
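
A minimal sketch of that workaround, with a toy filter that upper-cases its input: the prompt goes to stderr and only when stdin is a terminal, so stdout stays clean for the next program in the pipe. The function name `shout` and the `interactive` override are assumptions of mine for the sake of the example.

```python
import sys


def shout(stdin=None, stdout=None, stderr=None, interactive=None):
    """Copy stdin to stdout in upper case, prompting on stderr if interactive."""
    stdin = stdin or sys.stdin
    stdout = stdout or sys.stdout
    stderr = stderr or sys.stderr
    if interactive is None:
        # Only prompt when a human is typing, not when input is piped in.
        interactive = stdin.isatty()
    while True:
        if interactive:
            stderr.write("> ")  # the prompt never touches the data stream
            stderr.flush()
        line = stdin.readline()
        if not line:            # EOF, e.g. ^D on a terminal
            break
        stdout.write(line.upper())


if __name__ == "__main__":
    shout()
```

Run on its own it shows a prompt; run as `printf 'a\n' | python shout.py | wc -c` (file name hypothetical) the prompt disappears and the downstream count is unaffected either way.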


You can often just check by hitting ^D, but yes, this is a workaround, not a fix.


Or if it has failed silently.


"Rule of Silence" sounds so dark. Nope, I'll stick with "no news is good news".


I think it fits well into the whole monastery style type of UNIX philosophy.


> There is no single, standardized statement of the Unix philosophy

They could at least link to tAoUP[1].

[1] http://www.catb.org/esr/writings/taoup/html/ch01s06.html


I love this rule, as it is the opposite of what modern computing does. When I plug something into a Windows PC I get a multitude of beeps and popups saying it did everything right. But if something goes wrong there is an eerie silence and I have to dig into the error hex dumps to hope I find anything useful at all (or just reboot and hope it works right on the second try).

