It's still a good read, not only as a historical artifact, but also because many of the criticisms still apply to the spiritual descendant of Unix, Linux. It's funny to see that 20 years ago people were already complaining about the X Window System or the Unix security philosophy.
> Expect this stuff to change drastically in the next 5 years.
Bwahaha... is this your first encounter with backwards compatibility?
Changing this would be very hard, but that has nothing to do with anybody embracing some grand philosophy; it's that tons of existing software relies on Unix-like systems to, well... be Unix-like, because that's what the software was written for.
I've given it a lot of thought, and below is my one "reformist" idea to significantly improve things without breaking back compat too badly:
Invert the shell and terminal: every shell command (with unredirected streams) gets its own pty (see the sketch after the list below).
- Like those REPL "notebooks", you can scroll each output separately, without hacks.
- Backgrounded processes won't spew garbage when you are trying to type. They just continue running in their terminal. You can even background something like vim or a game that takes the whole terminal, and it's just fine!
- Rather than hacking scrollback onto processes that might use tty features in arbitrary ways, just straitjacket them by giving them plain pipes instead! The exploits from the original post are then impossible unless you explicitly opt in.
- PowerShell-esque "post-text" commands that dispense with the bullshit become far easier to manage: give them regular file descriptors too and negotiate a protocol (or whatever), just like the above, rather than a pty.
- stdin can be a nice form with a submit button, rather than a fixed (usually line-based) buffering policy.
- The shell itself might even be simpler, since there's less careful juggling of shared resources except where the programmer asks for it.
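You can get a taste of both modes with today's tools: util-linux's script(1) hands a single command a fresh pty of its own, and a plain pipe is the straitjacket case. A rough approximation, not the proposed design:

```sh
# per-command pty: ls sees its own tty, so it colorizes and formats
# as if interactive, while script(1) captures the output
script -qc 'ls --color' /dev/null

# the straitjacket: a plain pipe, so the program sees a non-tty and
# can't emit cursor/screen tricks the terminal would interpret
ls --color=never | cat
```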
I also use ssh multiplexing instead of tmux for extra shells. When I do need to use tmux, I hate that read-only mode prevents scrollback. The above builds nicely on this:
- The persistent remote shell keeps as many ptys as needed for chained commands that might still be running, or for separate "sessions".
- The local client can use native UI elements for everything, no TUI jank.
- Scrolling over many (live or saved) terminals, the normal scrollback case, is possible and essentially side-effect-free.
Anyone see any issues? I don't, for anything I do daily (git, coreutils, etc.)! I don't think I have time to build this, so please, someone else do.
You can suspend vim. If you try to run it in the background (e.g. with the bg command) then it just suspends itself again. To actually run it in the background on a shared terminal you need something like screen or tmux… which assigns the process its own pty.
Perhaps for most use cases involving vim it doesn't make that much difference—what would it be doing in the background anyway?—but there are other TUI programs that could be doing useful processing in the background if they didn't need to share the pty with other processes.
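For concreteness, here's roughly what that dance looks like in a job-control shell (transcript paraphrased; vim stops itself with "tty output" because it tries to reconfigure the terminal from the background):

```sh
$ vim notes.txt      # press Ctrl-Z inside vim to suspend it
[1]+  Stopped                 vim notes.txt
$ bg %1              # try to resume it in the background
[1]+ vim notes.txt &
[1]+  Stopped (tty output)    vim notes.txt
$ fg %1              # the only way forward is back to the foreground
```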
I've given this a lot of thought over the years, and my biggest conclusion is that there's no real "killer feature" in any of this that makes it spectacular, while maintaining backwards compatibility would be critical for anybody to adopt it. Still, I like the idea of splitting each command into a separate PTY. I wonder how good a TUI replacement for tmux built around this could be (yes, you've got to start with the TUI before you think about a completely new GUI, again for the interop).
It's not a direct match, but I think you'd be fascinated by aspects of Plan9's model. It certainly comes closer to what you've described than Unix does. Depending on exactly what you mean or you're envisioning, it might be really close.
The problem, of course, is that this either breaks existing software or requires new software to be written from scratch to use the new frameworks. This was (one of) the reasons Plan 9 never gained much traction, and it's the big hurdle for any contender to the TTY/shell throne.
(It's also worth pointing out that Microsoft themselves reworked their terminal interfaces to be more Unix-like for this very reason -- software compatibility.)
You’d need to implement a new protocol between terminal and shell. Maybe the shell would be in charge - allocating PTYs and multiplexing the output back to the terminal. Or maybe the terminal would be in charge where it would allocate PTYs for children and send them to the shell using FD passing. There are advantages both ways.
Integral to its success though would be getting the support for this new protocol upstream in bash. Bash is the most widely deployed shell and the only way to get to the point where you can log in to a new machine (or over SSH) and expect this to “just work”. Without that it would remain a niche and could join the graveyard of other improved shells/terminals that approximately no-one uses.
As such, the protocol would have to be minimal and easily implementable in C. systemd's protocols could be an inspiration: I've implemented both sides of sd_notify and LISTEN_FDS before, in more than one programming language, and it was very straightforward.
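For a sense of how minimal sd_notify is: it's a single datagram to the Unix socket named in $NOTIFY_SOCKET. A sketch in shell, assuming socat is installed (systemd-notify(1) does the same thing, and abstract-namespace sockets need extra handling):

```sh
#!/bin/sh
# Type=notify service: tell systemd we're ready with one datagram
if [ -n "${NOTIFY_SOCKET:-}" ]; then
    printf 'READY=1' | socat -u - "UNIX-SENDTO:${NOTIFY_SOCKET}"
fi
```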
No, I'm saying this because there are mailing lists and chat rooms where all of the terminal emulator developers are discussing these problems in order to solve them collectively.
The feature is "everything is text". The bug is the misunderstanding that "all text is identifiable". Names are garbage in, garbage out, and Unix names are far worse than non-Unix names in this regard.
Terminal sequences and Unicode both violate the identifier rules, and there is no u8ident library to sanitize this input or output.
When I type "ls", I'm getting a text-serialized list of objects. Why can't I just get the (serializable) list directly? So that I don't have to mess with (implicitly) converting text back to objects, often involving regex and hacks.
Because the objects are probably either relatively deeply encoded inside hundreds of plain-C stack and heap locations, or they're not even fully resident in RAM anymore by the time output occurs.
It's not that uncommon for ol' C hackers to directly write those stack and heap locations out to disk and call it a "file format." Trouble is, you're almost entirely at the whims of your platform and compiler as to what the actual layout of that is.
If you're thinking, "well that's dumb, why doesn't C have a standardized representation for those in-memory objects that hides platform differences", it does: printf and scanf.
Text isn't necessarily a great answer to that problem, but it definitely is an answer. Others include packed structs with htonl and friends and low-overhead serialization formats like protobuf, Thrift, and Avro. Inside, say, Google, you have "everything is a protobuf" instead of "everything is text," and it does end up working roughly as well as you might expect. That is to say, reasonably well, but with its own sets of problems that people won't ever stop complaining about.
You are not supposed to parse "ls" output (unless it is "ls -1"), because that format is only for humans, and the defaults change all the time.
If you are parsing "ls -l" output, you are doing something wrong.
Use your language's built-in features; every language has them (for example, in bash, use *-expansion and [-commands). If they don't work for some reason, there are "stat -c" and "find .. -printf", both of which produce text that is absolutely trivial to parse.
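For example, with GNU stat and find (BSD variants spell these differently):

```sh
# exactly the fields you want, one record per line: name, size, mtime
stat -c '%n %s %Y' -- *

# the recursive equivalent with find
find . -type f -printf '%p %s %T@\n'
```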
The reason you're not supposed to parse ls output doesn't have to do with the default formatting; I'm quite sure all widely used implementations of ls have always printed the equivalent of "ls -1" when the output is directed to a non-tty. The actual problem is that UN*X paths can contain all printable characters, including newlines, so if you don't plan to place additional restrictions on the file names you support, you can't at any point parse ls.
Of course, ls does more than just list file names so it can be tempting to utilise its features. coreutils ls has (relatively) recently received an additional output format that can be unambiguously parsed, but that's as far as the portability goes.
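The underlying failure mode is easy to reproduce, and NUL delimiters sidestep it, since NUL is the one byte a path can't contain (sketch; exact quoting behavior varies by ls implementation and whether stdout is a tty):

```sh
$ touch $'one\ntwo'    # a single file whose name contains a newline
$ ls | wc -l
2                      # line-based parsing now sees two "files"

$ find . -maxdepth 1 -type f -print0 | xargs -0 stat -c '%n'
```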
Like with Git, the usual set of common *nixy tools (coreutils as well as certain shell built-ins) contains tools in both the 'porcelain' (meant mostly for human consumption) and 'plumbing' (meant mostly for scripting and constructing pipelines) categories. Perhaps tutorials and reference documentation should place more emphasis on which is which, and why. Using the right tool for the job leads to better results and less frustration.
> If you are parsing "ls -l" output, you are doing something wrong.
> Use your language built-in features
The issue is that current shells _don't_ have a built-in structured data type for e.g. a `struct stat`. But why not?
`ls` calls some APIs and gets some in-memory `struct stat`s populated by the kernel. Then it throws away 90% of the data that the kernel copied to user space and serializes the rest as text. Why not pass the structs themselves to the next process? We currently can't (except with PowerShell?), so you have to write actual code.
This is a "bright line" between code and shells that could very well be blurred, but it would take an agreed-upon serialization format for posix-y data structures.
One of the big advantages of the shell is its exploratory nature -- you write pipelines one step at a time, looking at the intermediate result at each step. Any sort of complex serialization would break this.
This is why in shell, if you want programmatic "stat" output, you use the "stat" tool, not "ls". "ls" prints everything at once; "stat" has a custom output format, which means you print exactly the fields you want, so your intermediate results stay concise and readable. And since every element of `struct stat` is an integer, a simple space-separated format works very well.
(That said, I would not mind seeing more tools print JSON or JSON Lines. I add this functionality to many of the tools I write, and it is pretty powerful in conjunction with jq.)
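For instance, with the jc tool mentioned elsewhere in this thread (field names below assume jc's ls schema):

```sh
# convert ls output to JSON, then filter on real typed fields
jc ls -l /etc | jq -r '.[] | select(.size > 10000) | .filename'
```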
If you are in a situation where you are parsing the output of ls, you are already writing code, and a shell designed for interactive use is a terrible tool for that job. Use something else. There are many better high-level programming or scripting languages, which make it both easier to write parsing code and remove most of the need for it in the first place, since their APIs usually give you structured data.
Make it auto-serialize to text if the receiving end doesn't understand lists. Working with text directly doesn't get me any type safety, or any safety at all (unless I'm writing a one-off, where it matters less).
Ok cool, I can pipe ls into gz and pipe further into scp. Now what? How do I use it on the receiving end after unzipping it again? You still have to parse it into a structured list of files over there somehow to make use of it. Not everything is a tunnel.
Structured data can still be serialized easily; take JSON, for example, whose spec fits on a napkin.
Text is fine for humans. The trouble comes when you want the machine to do anything other than simply print out the literal text (and even that's tricky, what with Unicode and fixed-width columns).
The moment you want to do something else with the text, you have to parse it in some way. At that point it's no longer text, it's a poorly specified serialization format.
It's not really good, but calling it abysmal is just too negative in my opinion :) It does get a lot of things right, especially in the "not everything is text" department, and also a lot of things wrong (my main gripes are much of the syntax in general, case insensitivity, and usually more than a couple of ways to achieve the same thing). So in the end the learning experience/curve for me was the same as for bash (or languages like C++), and probably just as long/steep: it feels like suffering until you've learned all the caveats by heart, and in the process of doing that you've also learned how to actually use it; then it becomes OK to use. After that process I'm leaning towards favoring PS over bash, though. Now, it's possible there are already other shells that get more things right, but I really doubt that in my lifetime I'm going to spend (waste?) time on learning yet another shell. Or it must be convincingly good and provably fast to learn :P
I think one of my biggest peeves with PowerShell is that they dumped you into a god-awful console. I dunno if this has changed much, but good lord was the console horrible: no resize, nasty font, nasty color scheme, scrollback sucked, command history sucked, and copy & paste was a pain in the arse.
Also, the commands were super long. Sure, you could alias them to something shorter, but still...
That being said... powershell has a lot of potential.
(Also I’m woefully out of date... I haven’t touched powershell in a decade. I hope somebody tells me it’s all better now)
And yet the web is text too - and has lots of fun vulnerabilities relating to failure to escape content.
I do not expect text to be significantly replaced any time in the next 5 years. Accessibility - screenreaders, increased text size with text reflow - is one massive reason.
I'd say what we need is a --json flag on all pipe producers like ls, exa, find, grep, and ripgrep. Then you'd have the best of both worlds: something that is equally(?) parseable by humans and computers. (jq to the rescue.)
Adding a --json flag is just the start. Ideally, other grep-like programs would want to emit the same format. But what happens when some grep-like programs have additional features?
Suggesting a structured representation as an output format isn't a panacea. It's just a start, and it's not at all clear to me that it would be better than what we have now.
I agree; there is another can of worms waiting to be opened: agreeing on the schema of said JSON output. Also, structured output is probably way more useful downstream from the likes of ls, exa, and ps (say) than from grep-like tools.
I have created JSON schemas for the output of dozens of commands and file types with my jc[0] project. I tried to keep them as flat as possible, but there are many other ways it could have been done. Creating the schema often requires more thought than writing the parser itself.
With NUL-separated output instead of line-based output, people are doing fine. It's unstructured for sure, but how much structure can you put into something as general as grep?
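For example, GNU grep's -Z/--null flag composes safely with xargs -0 no matter what bytes the file names contain:

```sh
# NUL-separated matching file names, then a second search over them
grep -rlZ 'TODO' . | xargs -0 grep -c 'FIXME'
```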
grep could emit structured GrepResult records (see the sketch after this list) that include things like
- the pattern that was searched for
- the search options from the invocation (case, line folding, etc)
- the text of the matched segment
- the byte range of the matched segment within the file
- the line number(s) of the matched segment within the file
- the file name
- contents of any (named) capture groups from the search pattern
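ripgrep's --json mode emits essentially this shape today, one JSON object per event, which composes nicely with jq:

```sh
# each match is a {"type":"match","data":{...}} record carrying the
# path, line number, matched text, and submatch byte offsets
rg --json 'TODO' src/ \
  | jq -r 'select(.type == "match")
           | "\(.data.path.text):\(.data.line_number)"'
```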
I think a huge factor is the dominance of the keyboard as the most-used physical interface. Even in the emerging age of VR, keyboards are everywhere, even in software form on phones. I guess once we solve language recognition, so that it maps 1:1 with the input somehow, we could have a voice-only terminal/shell equivalent. The building blocks for that would look different, and not just be saying commands out loud.
I see the appeal; however, outside of areas where you completely own the stack (like inside a company), it feels idealistic to imagine any standard coming close to text in terms of pervasiveness.
I also worry about a bad standard becoming popular a la most of the web
I like the idea of PowerShell and how it's object-driven as well, but it doesn't get you away from text in the sense that you're still typing commands into a console.
Isn't the real solution to do away with the idea that we need to remain compatible with VT100 video terminals?
That's not what they are referring to. PowerShell passes typed objects between cmdlets when pipes are used. This is DEFINITELY better than plain text.
We've learned a lot since the unix philosophy was created. One of those things is that, if you want software that lasts, software that performs well, you need to be strongly typed.
Plain text is as far from strongly typed as you can get. Passing plain text between programs in 2020 goes against everything we've learned in the last 50 years.
Ad-hoc duck typing vs strong typing is a pragmatist vs theorist argument.
In theoretical terms, strong typing everywhere, always, is preferred. In practical terms, this devolves into a Java-esque nightmare of what's-the-type-for-the-parent-type-of-the-thing's-type absurdity, where people stop using your system because they don't want a second job learning its Borges-esque comprehensive type encyclopedia.
In practical terms, bare-minimum typing that prevents 80% of errors but doesn't require gyrations to cover the remaining 20% is what works. And given a choice between the two, more software is going to be built in systems that function this way.
Does it break all the damn time? Yes. Does it still get more done? Yes.
>Passing plain text between programs in 2020 goes against everything we've learned in the last 50 years.
I would have to say the opposite is true.
It's text-based interfaces that have stood the test of time, and they now even flourish in HTTP. We have more widely used text-based interfaces than ever before, where REST, HTML, XML, and JSON seem to rule most machine-to-machine communication.
Strongly typed binary RPC systems such as DCOM and CORBA, however, are now nearly gone.
> We've learned a lot since the unix philosophy was created. One of those things is that, if you want software that lasts, software that performs well, you need to be strongly typed.
Unix itself, for example, won't last even 10 years. Look for it to be gone by the mid-late 1970s as it gets replaced by strongly-typed objects passed directly between applications to control our fusion-powered flying cars.
Our continuing mission: To explore strange new platforms. To seek out new bugs and new software. To boldly shitpost where no one has shitposted before.
It's things like this that make me think terminals are irredeemably broken. One does not simply print a string to a terminal. I'm not aware of a standard way to handle user strings safely like you can on, say, the web, or any other UI platform.
The Unix way seems to be to not solve the problem at all, unless you're writing a text editor, and then you reinvent the wheel and solve it yourself.
My vtclean project [1] cleans terminal escape sequences. It handles the full ECMA-48 spec, every behavior described in all the vt100 documentation I could find, and the observed behavior of many command-line programs and terminals.
It has good test coverage, fails safe (if it doesn't understand a sequence, it still strips the escape character), and can preserve colors in cleaned up text. It is directly and indirectly used by quite a few projects. It receives extremely few bug reports, because I took the time to do it right.
If you want a good way to do this, probably start here. It's three short regular expressions and a simple state machine (to clean up single-line edits).
My only complaint about the code at this point is it operates on string instead of []byte, but I don't want to break the API and nobody has complained.
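If you only need a rough pipeline-grade approximation of the same idea, a single sed expression strips the common CSI sequences; it misses OSC strings, 8-bit C1 bytes, and the single-line-edit handling vtclean does:

```sh
# delete ESC [ <params> <final byte>: colors, cursor movement, erases
# ($'...' needs a shell like bash or zsh to produce the raw ESC byte)
sed $'s/\x1b\\[[0-9;?]*[a-zA-Z]//g' session.log > clean.log
```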
It could be interesting to extract this to a context-free grammar that could be implemented in different languages.
There are many reasons users might not want, or be able, to use Go, or any other language this might be published in. Given the terminal's provenance, I can't help but be absolutely certain (despite having no substantive evidence) that a lot of people working with C in Interesting™ situations would love something like this, for example.
(I must admit I skimmed it so fast (on my phone) I didn't realize the comment about RGB colors was only referring to the line directly underneath... and then didn't stop to think "RGB can't need all that", woops.)
The nice thing is this is definitely simple enough to be able to just directly port somewhere else, which I guess would make extracting a higher level representation premature-optimization-level superfluous complexity.
Thanks for writing this! I used it a few years ago to remove the ANSI output from HP/Aruba network devices that clear your screen or line which was a nightmare for scripting. It works like a dream.
Except: could you imagine a software ecosystem where 95% of the software is vulnerable to injection, nothing is ever done about it, there is no clear, easy solution for developers either, and you have a legion of users who turn technology into a religion and think there's nothing wrong at all with Unix terminals?
It also probably doesn't help that the problem is unfixable in general without rewriting so much software... and that things mostly work okay, most of the time.
> where 95% of the software is vulnerable to injection
As opposed to what, the ẃ͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́̂́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́᷅́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́̂́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͅͅͅͅͅͅͅͅé͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́̂́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́᷅́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́̂́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͅͅͅͅͅͅͅͅb͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́̂́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́᷅́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́̂́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͅͅͅͅͅͅͅͅ? :)
I'd posit that in this case the failure is git's, for piping content from some random internet connection to the terminal unfiltered. The terminal doesn't know where those control sequences come from. But any time you interface with something external, you have to be prepared for hostile behavior or content that's out of spec. In this case it's git that interfaces with something external. Remember that any software can be hacked or abused.
Some terminals support the C1 control set, and in particular 0x9b as the CSI is dangerous (it's a single character version of the ESC-[ CSI). So you have to watch out for that too.
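Concretely, on a terminal that accepts 8-bit C1 controls, this clears the screen without a single ESC byte, so filters that only look for 0x1b miss it entirely:

```sh
printf '\x9b2J\x9bH'   # one-byte CSI: erase display, then home cursor
```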
Right... and you have the Windows-1252 versus iso-8859-1 situation, which only differ in that the former puts printable characters where the latter has C1 control codes.
Of course your code is a library so you have the library author's prerogative to punt that problem to the users :)
What's unsafe about it? It doesn't break out of the terminal as a system that shows data to the user. The fact that you can sneak data that looks weird doesn't make it unsafe.
Some terminals support a programmable answerback string, and of course the ENQ control character to send it to the host. Some have programmable function keys and a command to read them, which often (always?) sends the programmed string as consecutive unencoded characters (instead of hex or something).
Off topic, but the person posting this mentioned the artist who did the avatar and of course I love their art style, and of course they aren’t taking commissions right now. How does one find good pixel art artists who might do a small commission or three?
Fiverr? I've had a generally decent experience with it, but I also follow the rule of three: always hire three different freelancers for any task, unless I have an established relationship with an artist who has done very similar things in the past.
DeviantArt has some awesome artists, but I have had bad luck getting through to them there. There are a couple of artists who definitely could have made some money from me, but couldn't, because I couldn't contact them (or they ignored my messages).
It's usually not hard to follow a few pixel artists and get a bunch more suggested to you on Twitter. I know Kiana Mosser should have commissions open within a few weeks. https://twitter.com/kianamosser
Many years of experience have taught me that it's normal for data fields to accept any sequence of bytes (in the case of string data, anything except 0x00), so this didn't surprise me too much; in fact, it's usually when something is unusually restrictive that it becomes notable.
I was honestly more surprised at how much data I could fit in that field. Git itself doesn't appear to have any limit at all, though GitHub doesn't allow more than a few dozen megabytes.
You can also use emoji to name any object in Active Directory, which is a heck of a lot of fun if you want to confuse the IT guy that needs to remote into your machine for maintenance.
Interesting. HN will let me submit a comment that contains emoji just fine (no error in dev tools network log or UI) but it doesn't show up in the thread (at least for me, if there's a stream of emoji from me testing that I can't see I apologize).
The problem with this attack is that you have to drop the 6c file into your PATH for it to work, because otherwise you need to use ./ to execute it. This makes the attack pointless: if the attacker can drop something into your PATH, you're already pwned, since they can just name their payload "ls" and wait for you to execute it.
Good thing I normally reach for "less" instead of "cat" nowadays. You get scrolling, and your terminal is safe from malicious injections or (much more likely) binary garbage.
And "git" for example applies "less" to pretty much all output by default, which makes most of the git-based attacks, including this one, irrelevant.
For animation, the real issue is frame timing. It looks like there's an animation API in kitty that allows this sort of control, so it'd work there, and iTerm2 supports inline animated gifs which I've already demo'd.
I'm not sure if there are any format specifications for the author and committer lines in a git commit message object. I guess it really depends on whether commands that are used to display the information (git show, git log, etc.) can handle arbitrary data. I wonder how commands like git format-patch and git am would deal with a commit like this.
It could probably be done in grayscale, even with some dithering, using the shading characters from Unicode Block Elements to support Terminal.app and other emulators.
why is this so astounding to those commenting on twitter? because git doesn't artificially limit the line length?
i've used ansi sequences in my zsh prompt for 20 years to make colors and move the cursor; it's just in-band ascii that is interpreted by the terminal emulator, no?
...and for this "display some art" hack specifically, it looks like it can be made to work with plain git log by using `\r` characters as newlines in addition to the escape sequences, but I haven't tested that much.
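(For reference, that kind of prompt is just raw escapes wrapped in %{...%} so zsh knows they occupy zero columns; a minimal example:)

```sh
# green user@host, reset color, then cwd and the prompt character
PS1=$'%{\e[1;32m%}%n@%m%{\e[0m%} %~ %# '
```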
https://web.mit.edu/~simsong/www/ugh.pdf