That's my primary issue with UNIX culture. It took a huge step backwards by deciding to work with unstructured text. It wasn't merely a wrong turn, mind you; it was backtracking past known and well-understood best practices and then picking a wrong turn. And only now do people seem to be rediscovering what was in common use in the era before UNIX: the virtues of structured text.

I guess our industry is meant to run in circles, only changing the type of brackets on each loop (from parens to curly on this iteration).




It's not like unstructured piped text is the only possible way to work. It's widely used precisely because it's so expedient. If you use structured data, then every program in the sequence has to understand the structure. If you just smash everything flat into a stream of text, you can then massage it into whatever form you need.

It's not always the best way to approach a problem, but it's not meant to be. It's duct tape. You use it where it's good enough.


The way I see it, shell scripting and piping allow non-programmers to get their feet wet, one command at a time.

You run a command, look at the output; now you know what the next command in the pipeline will see, and you can adjust as needed.
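
For example (just an illustrative session; nginx and the column number are arbitrary), a pipeline typically grows one stage at a time:

    ps aux                                   # look at the raw output first
    ps aux | grep nginx                      # narrow it down, check again
    ps aux | grep nginx | awk '{print $2}'   # keep just the PID column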

PowerShell and the like seem more programmer-oriented, in that one keeps thinking in terms of variables and structures that get passed around.

And this seems to be the curse of recent years: more and more code is written by programmers for programmers. Likely because everyone has their head in the cloud farms and only the "front end" people have to deal with the users.

UNIX came about when you risked having admins and users sitting terminal to terminal in the same university room. Thus tools got made that allowed said users to deal with things on their own without having to nag the admins all the time.


UNIX actually came about when users and programmers were the same people. There was an expectation back then that using computers involved knowing your way around your OS and being able to configure and script things.


If you use structured data in a standard format, you can have a single system-wide implementation of a parser and then have each program process the data it needs, already structured in a semantically-meaningful way.

In the current unstructured-text reality, each program has to have its own (usually buggy, half-assed) shotgun parser, and it has to reintroduce semantic meaning to the data all by itself. And then it destroys all that meaning by outputting its own data as unstructured text.

It works somewhat OK until some updated tool changes its output, or you try to move your script to a different *nix system.
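
A small illustration of the difference, assuming a tool that already offers a structured mode (iproute2's ip has a -json flag; the exact JSON field names may vary between versions):

    # unstructured: ad-hoc column slicing that breaks if the layout ever shifts
    ip addr show eth0 | grep 'inet ' | awk '{print $2}'
    # structured: one shared parser (jq), fields addressed by name
    ip -json addr show eth0 | jq -r '.[0].addr_info[] | select(.family == "inet") | .local'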


But if your data is structured in a semantically meaningful way, then your receiving program needs to understand those semantics. Maybe you could introduce new command line tools to refactor streams of data so as to adapt one program to another, but I can't see it being simpler and quicker (in terms of piping input from one place to another) than the current approach.

I do like the idea of a standard parser to avoid the ad-hoc implementation of parsers into everything.

Your last comment gives a hint at the real problem, which is people using command line hackery in 'permanent' solutions. It's duct tape. You don't build important system components out of duct tape. Well, you shouldn't, anyway.


I can't disagree there. I've used SNMP in anger before. Underlying SNMP is the MIB.

People ran away screaming from it :)


Structured text is good. Very good, in fact. It might even be ideal. Structured binary data, less so, at least as a storage format.

I want to be able to look at your file format using tools that haven't been specialized to the task. Is that so wrong?


Well, I'm advocating for structured text, not binary. Mostly because I haven't seen a future-proof binary format yet, and editing binary formats would indeed require special tooling. I think that, for a data exchange protocol meant to be used between many applications, going with structured text instead of binary is a worthwhile tradeoff: slightly lower efficiency for much better accessibility.

EDIT: Some comments here are slowly making me like the idea of a standard structured binary more and more.


Let's say I want to know how many days it has been since an asset on my web server was modified. With bash plus some standard UNIX tools, off the top of my head I have to do something like this:

    curl -svo /dev/null http://example.com/file 2>&1 | grep Last-Modified | cut -d ' ' -f 3-
And that's just to get the last-modified date in text form. Now I'm writing a script that parses that date, gets today's date, converts both to days, and subtracts. YUCK!
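
For the record, a rough sketch of what that script ends up looking like (GNU date assumed for the parsing; example.com is the placeholder from above):

    last_modified=$(curl -svo /dev/null http://example.com/file 2>&1 \
        | grep Last-Modified | cut -d ' ' -f 3- | tr -d '\r')
    # GNU date parses the HTTP date; divide the difference in seconds by a day
    echo $(( ( $(date +%s) - $(date -d "$last_modified" +%s) ) / 86400 ))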

Wouldn't it be nice if your shell could do this?

    curl(http://example.com/file).response_headers.Last-Modified.subtract(date().now).days
I think it would be nice.


Yes, but as it stands, most of those would have to be builtins for that to work. I would rather have:

   curl -j example.com|select headers.Last-Modified|time before now|time to days
Where the commands send and receive JSON.
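
You can get part of the way there with stock tools today, assuming a newish curl (7.83+ for the %{header_json} write-out) and jq as the one shared parser; the key-case normalization is just being defensive:

    curl -sI -o /dev/null -w '%{header_json}' http://example.com/file \
        | jq -r 'with_entries(.key |= ascii_downcase) | ."last-modified"[0]'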


Can you name me a single UNIX configuration file whose format is not "structured"?


Config files are less of a problem. The issue is with programs you want to use in pipes. Each has its own undocumented, arbitrary pseudo-structure with often inconsistent semantics, optimized for viewing by a human.


What programs are you thinking of? Maybe this is my sysadmin bias but about 90% of my UNIX tools usage is on config files...


Let me suggest an incendiary example: systemd

systemctl status wibble.service : Displays human-readable information, with colours, line drawings, and variable-length free-form text that is exceedingly hard for a program to parse reliably.

Contrast with

systemctl show wibble.service : Outputs machine-readable information, in systemd's favourite form of a Windows INI file.
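
And for scripting it can be narrowed to specific keys, still in that plain key=value form (wibble.service is just the placeholder from above; --value needs a reasonably recent systemd):

    systemctl show wibble.service -p ActiveState -p ExecMainStartTimestamp
    # or the bare value only, ready to drop into a pipe or a variable
    systemctl show wibble.service -p ActiveState --value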


Yikes. Makes me glad I run Slackware. Though I seem to recall from The Init Wars that it was precisely this quality of SystemD that made people lob the charge that it violated the "Unix philosophy".


ls, ps, du, df, ... pretty much all CLI tools. The kind you use in scripts.


Users are enjoined, over and over again, not to try to parse the output of any of those four (particularly ls). That is, those are the tools specifically not meant to be connected by pipelines, but merely used for operator convenience.


...But how do you extract that data otherwise?


Instead of ls, find. Instead of ps, you parse the nodes in the /proc filesystem that ps itself parses. Ditto the /sys filesystem and du/df.
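
A minimal sketch of that approach (GNU find assumed; the /proc layout is documented in proc(5)):

    # sizes and paths from find itself, NUL-delimited so odd filenames can't break the pipe
    find /var/log -maxdepth 1 -type f -printf '%s\t%p\0' | sort -zn
    # process info straight from the kernel, no ps output to scrape
    grep '^State:' /proc/self/status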


When those tools were written, /sys and /proc didn't exist.


It's only "specialized" because we haven't been doing it so it's considered special.

At some point you have to admit that what's meant for the computer is not always byte by byte the same as what's meant for the human.

We try to shove these two together and we screw up both of them.

Empower the computer to be the best that it can be by taking the human out.

Empower and respect the human by giving him/her their own representation.

The "I just want to read the bytes that are on disk" philosophy is inherently limiting and broken when the audience are two very different things (humans vs computers).

My argument is that instead of fighting that we must embrace it.


Yes, because it: a) is inefficient; b) invites a lot of wrong assumptions about the storage format, e.g. about the underlying grammar, the maximum length of tokens, or the possible characters that could occur in the file; c) requires you to solve the same problems over and over again (how to represent nested lists, graphs, binary blobs, escaped special characters, etc.); d) encourages authors to roll their own format rather than research whether an existing format would solve their case.

I agree with you that the other extreme - treating binary files as opaque black boxes that you can only access with specialized tools belonging to your application - is even worse. But I don't see why we can't reach some middle ground: have a well-documented, open, binary format that encodes a very generic data structure (maybe a graph or an SQLite-style database) and a simple schema language that lets you annotate that data structure. Then you can develop generic tools to view/edit the binary format even though every application can use it in its own way.


Parsing structured text is slow and inefficient. Also, reading out just one part of a data structure stored as text often requires either walking through the file character by character or first slurping the whole thing into memory.


...But when your system crashes, having all that data in an easily accessible manner (regardless of what tools you have on hand) is a major win.


Let's not forget that when your system crashes, all of these easy-to-read text files are actually stored in a binary format, sometimes scattered in pieces, and require special tools to extract.


True, but they're less fragile: even if the text is garbled, you might be able to get some information out of it.


A text editor is specialised to the task of viewing and modifying text in a certain format.


Wrong. It's specialized to viewing and modifying any text, regardless of format. That's a huge difference.



