Terminal escape sequences in Git commit email field (twitter.com/ryancdotorg)
322 points by whack on March 29, 2021 | 157 comments



Related: The Unix Haters Handbook, chapter 6:

https://web.mit.edu/~simsong/www/ugh.pdf


Oh, wow, I'd heard of the unix haters handbook, but I never realized it was an actual published book.


It's still a good read, not only as a historical artifact, but also because many of the criticisms still apply to Linux, the spiritual descendant of Unix. It's funny to see that 20 years ago people were already complaining about the X Window System or the Unix security philosophy.


I will have a lot of fun reading this. Thank you!


I maintain chalk and a slew of other TTY related things.

Terminal emulators are, far and away, the most archaic computer technology we still use daily.

Believe me when I say, it's a very hard problem to solve since everyone has subscribed to Unix's philosophy of "everything is text".

As much as I dislike Microsoft's philosophies on software, they had the right idea with Powershell. Its execution is just abysmal.

Expect this stuff to change drastically in the next 5 years.


> Expect this stuff to change drastically in the next 5 years.

Bwahaha... is this your first encounter with backwards compatibility?

Changing this would be very hard, but that has nothing to do with somebody embracing some grand philosophy. It has to do with tons of existing software relying on Unix-like systems to, well ... be Unix-like, because that's what the software was written for.


I've given it a lot of thought, and below is my one "reformist" idea to significantly improve things without breaking back compat so bad

Invert the shell and terminal: every shell command (with unredirected streams) gets its own pty.

- Like those REPL "notebooks", you can scroll each output separately, without hacks.

- Backgrounded processes won't spew garbage when you are trying to type. They just continue running in their terminal. You can even background something like vim or a game that takes the whole terminal, and it's just fine!

- Rather than hacking scrollback onto processes that might use tty stuff in unexpected ways, just straitjacket them by giving them plain pipes instead! The OP exploits are then opt-in impossible.

- Having powershell-esque "post-text" commands that dispense with the bullshit are far easier to manage: also give them regular file descriptors and negotiate protocol or whatever, just like the above, rather than a pty.

- stdin can be a nice form with submit button rather than fixed (usually line) buffering policy.

- shell might even be simpler as there is less careful juggling of shared resources except where the programmer asks for it.
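For anyone who wants to poke at the "one pty per command" idea, here is a rough sketch in Python. It is only an illustration under assumptions: `run_in_own_pty` is a made-up helper name, it is Unix-only, and it relies on the Linux behaviour of reporting EIO on the master side once the child has closed its end. A real shell would keep the master open and hand it to the terminal UI instead of collecting bytes.

```python
import os
import subprocess

def run_in_own_pty(argv):
    """Run argv with stdout and stderr on a fresh pty; return the raw output bytes."""
    master_fd, slave_fd = os.openpty()
    try:
        proc = subprocess.Popen(argv, stdin=subprocess.DEVNULL,
                                stdout=slave_fd, stderr=slave_fd)
    except BaseException:
        os.close(master_fd)
        raise
    finally:
        os.close(slave_fd)  # the parent only needs the master side

    chunks = []
    try:
        while True:
            try:
                data = os.read(master_fd, 4096)
            except OSError:
                # Linux raises EIO here once every slave-side fd is closed.
                break
            if not data:
                break
            chunks.append(data)
    finally:
        os.close(master_fd)
    proc.wait()
    return b"".join(chunks)
```

The command sees a real terminal on stdout (`test -t 1` succeeds), yet nothing it prints reaches the terminal you are typing in until the caller decides what to do with the bytes.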

I also use ssh multiplexing instead of tmux for extra shells. When I do need to use tmux, I hate that read only mode prevents scrollback. The above builds nicely upon this:

- The persistent remote shell has as many ptys as needed, as before, for chained commands which might still be running, or separate "sessions"

- The local client can use native UI elements for everything, no TUI jank.

- Scrolling over many (live or saved) terminals, the normal scrollback case, is possible and very side-effect free.

Anyone see any issues? I don't for anything I do daily (git, coreutils, etc.)! I don't think I have time to make this, so please someone else do.


> You can even background something like vim or a game that takes the whole terminal, and it's just fine!

Focusing on this point, you can background vim just fine right now - I have tons of vim processes backgrounded all the time.


You can suspend vim. If you try to run it in the background (e.g. with the bg command) then it just suspends itself again. To actually run it in the background on a shared terminal you need something like screen or tmux… which assigns the process its own pty.

Perhaps for most use cases involving vim it doesn't make that much difference—what would it be doing in the background anyway?—but there are other TUI programs that could be doing useful processing in the background if they didn't need to share the pty with other processes.


I’ve given this a lot of thought over the years and the biggest conclusion I have is that there’s no real “killer feature” in any of this that makes it spectacular, and maintaining backwards compatibility would be critical to let anybody use it. Still, I like the idea of splitting each command into a separate PTY, I wonder how good a TUI replacement for tmux built around this could be (yes, gotta start with the TUI before you think about a completely new GUI, again for the interop).


It's not a direct match, but I think you'd be fascinated by aspects of Plan9's model. It certainly comes closer to what you've described than Unix does. Depending on exactly what you mean or you're envisioning, it might be really close.

The problem, of course, is this either breaks existing software or requires new software to be written from scratch to utilize the new frameworks. This was (one of) the reasons Plan9 never gained much traction, and it's the big hurdle for any contender to the TTY/shell throne.

(It's also worth pointing out Microsoft themselves reworked their terminal interfaces to be more Unix-like for this very reason -- software compatibility)


Yes please. I've often thought the same.

Mentioned this comment here: https://blog.williammanley.net/2021/03/31/seen-on-hn-invert-...

You’d need to implement a new protocol between terminal and shell. Maybe the shell would be in charge - allocating PTYs and multiplexing the output back to the terminal. Or maybe the terminal would be in charge where it would allocate PTYs for children and send them to the shell using FD passing. There are advantages both ways.

Integral to its success though would be getting the support for this new protocol upstream in bash. Bash is the most widely deployed shell and the only way to get to the point where you can log in to a new machine (or over SSH) and expect this to “just work”. Without that it would remain a niche and could join the graveyard of other improved shells/terminals that approximately no-one uses.

As such the protocol would have to be minimal and easily implementable in C. Systemd’s protocols could be an inspiration. I’ve implemented both sides of sd_notify and LISTEN_FDS before, in more than one programming language and it was very straightforward.
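To give a sense of how small such a protocol can be, here is a sketch of the sending side of sd_notify in Python: one datagram to the Unix socket named in the `NOTIFY_SOCKET` environment variable. The function name is just borrowed for illustration, this is not the libsystemd implementation, and it skips the fd-passing parts.

```python
import os
import socket

def sd_notify(state: str) -> bool:
    """Send a state string such as "READY=1" to the service manager.

    Returns False when NOTIFY_SOCKET is unset, i.e. nobody is listening.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" means a Linux abstract-namespace socket (NUL-prefixed).
        addr = b"\0" + path[1:].encode("utf-8")
    else:
        addr = path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), addr)
    return True
```

A terminal/shell protocol in the same spirit could be a handful of newline-separated `KEY=value` messages over a socket whose path is passed in the environment.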


I’m working on exactly this right now. I’d love to chat and show you what I’m up to; if you’re interested, email me at noah@coterm.dev


No, I'm saying this because there exist mailing lists and chat rooms where all of the terminal emulator developers are discussing these problems to collectively solve this issue.


We've tried many times and in many ways, but text is still the main interface for programmers (and even for a lot of end user tasks as well).

As long as this interface remains undisputed, "everything is text" is a feature, not a bug. Arguably, one of Unix's greatest.


The feature is "everything is text". The bug is the misunderstanding that "every text is identifiable". That names are garbage in, garbage out. Unix names are far worse than non-unix names in this regard.

Terminal sequences, unicode all violate the identifier rules. There is no u8ident library to sanitize this input or output.


> unicode all violate the identifier rules […] library to sanitize this input

Unicode standardises identifier rules. The default identifier as per UAX31-R1 can be checked with a regex:

    perl -E'say "my_identifier" =~ /\A \p{XID_Start} \p{XID_Continue}* \z/x'

I guess that's not what you mean, so an explanation with some details is needed here.


Yes, unicode is fine. What is broken is that nobody cares. Not even perl5 cares about non-identifiable identifiers.

Btw your snippet does not work for identifier validation. There is much more needed to pass the identifier security guidelines.

But here we just need a simple terminal escape sequence stripping library. Unicode is a bit harder. Only Java, Rust and I did that.


> it's a very hard problem to solve since everyone has subscribed to Unix's philosophy of "everything is text".

Look at it the other way round. Embrace text. Text is not the problem, non-text is. You'll take text from our cold, dead hands!


When I type "ls", I'm getting a text-serialized list of objects. Why can't I just get the (serializable) list directly? So that I don't have to mess with (implicitly) converting text back to objects, often involving regex and hacks.


Because the objects are probably either relatively deeply encoded inside hundreds of plain-C stack and heap locations, or they're not even fully resident in RAM anymore by the time output occurs.

It's not that uncommon for ol' C hackers to directly write those stack and heap locations out to disk and call it a "file format." Trouble is, you're almost entirely at the whims of your platform and compiler as to what the actual layout of that is.

If you're thinking, "well that's dumb, why doesn't C have a standardized representation for those in-memory objects that hides platform differences", it does: printf and scanf.

Text isn't necessarily a great answer to that problem, but it definitely is an answer. Others include packed structs with htonl and friends and low-overhead serialization formats like protobuf, Thrift, and Avro. Inside, say, Google, you have "everything is a protobuf" instead of "everything is text," and it does end up working roughly as well as you might expect. That is to say, reasonably well, but with its own sets of problems that people won't ever stop complaining about.


You are not supposed to parse "ls" (unless this is "ls -1"), because that format is only for humans, and defaults change all the time.

If you are parsing "ls -l" output, you are doing something wrong.

Use your language built-in features, every language has them (for example in bash, use *-expansion and [-commands). If they don't work for some reason, there is "stat -c" and "find .. -printf", which both produce text which is absolutely trivial to parse.


The reason you're not supposed to parse ls output doesn't have to do with the default formatting; I'm quite sure all widely used implementations of ls have always printed the equivalent of "ls -1" when the output is directed to a non-tty. The actual problem is that UN*X paths can contain all printable characters, including newlines, so if you don't plan to place additional restrictions on the file names you support, you can't at any point parse ls.

Of course, ls does more than just list file names so it can be tempting to utilise its features. coreutils ls has (relatively) recently received an additional output format that can be unambiguously parsed, but that's as far as the portability goes.
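A quick sketch of the "don't parse ls" route in Python, since the newline problem is easy to reproduce. `list_entries` is a made-up helper; it asks the OS for directory entries directly, so a file name is never split on whitespace or newlines.

```python
import os

def list_entries(directory):
    """Return sorted (name, size_in_bytes, is_dir) tuples without going through ls."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            info = entry.stat(follow_symlinks=False)
            entries.append((entry.name, info.st_size,
                            entry.is_dir(follow_symlinks=False)))
    return sorted(entries)
```

A file called `one\ntwo.txt` comes back as a single entry here, whereas line-based ls output would show it as two lines.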


Like with Git, the usual set of common *nixy tools (coreutils as well as certain shell built-ins) contains tools in both the 'porcelain' (meant mostly for human consumption) and 'plumbing' (meant mostly for scripting and constructing pipelines) categories. Perhaps tutorials and reference documentation should place more emphasis on which is which, and why. Using the right tool for the job leads to better results and less frustration.


> If you are parsing "ls -l" output, you are doing something wrong.

> Use your language built-in features

The issue is that current shells _don't_ have a built-in structured data type for e.g. a `struct stat`. But why not?

`ls` calls some APIs and gets some in-memory `struct stat`s populated by the kernel. Then it throws away 90% of the data that the kernel copied to user space and then serializes it as text. Why not pass the structs themselves to the next process? We can't currently (except with powershell?), so you have to write actual code.

This is a "bright line" between code and shells that could very well be blurred, but it would take an agreed-upon serialization format for posix-y data structures.
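Purely as a sketch of what an "agreed-upon serialization" might look like, here is a `struct stat` dumped as one JSON object per line in Python. The field selection and the `stat_record` name are my own assumptions, not any existing standard.

```python
import json
import os

# Hypothetical choice of fields; a real format would have to be agreed on.
STAT_FIELDS = ("st_mode", "st_ino", "st_nlink", "st_uid", "st_gid",
               "st_size", "st_mtime_ns")

def stat_record(path):
    """Serialize the lstat() result for path as a single JSON line."""
    st = os.lstat(path)
    record = {"path": path}
    record.update({field: getattr(st, field) for field in STAT_FIELDS})
    return json.dumps(record)
```

The next process in the pipeline can then `json.loads` each line (or use jq) instead of guessing at column positions.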


One of the big advantages of shell is exploratory nature -- you write pipelines one step after another, looking at the intermediate result at each step. Any sort of complex serialization will break this.

This is why in shell, if you want programmatic "stat" output, you use "stat" tool, not "ls". "ls" prints all at once. "stat" has a custom output format, which means you print exact fields you want, so your intermediate results are concise and readable. And since every element of the "struct stat" is an integer, a simple space-separated format works very well.

(that said, I would not mind seeing more tools print JSON or JSONlines. I add this functionality to many of the tools I write, and it is pretty powerful in conjunction with jq)


If you are in a situation where you are parsing the output of ls you are already writing code, and a shell designed for interactive use is a terrible tool for that job. Use something else. There are many better high-level programming or scripting languages, which both make it easier to write parsing code and remove most of the need to do so, since their APIs usually give you structured data in the first place.


You're looking for nushell:

https://github.com/nushell/nushell


Wow, that's pretty great, I'm going to try that out.


> When I type "ls", I'm getting a text-serialized list of objects. Why can't I just get the (serializable) list directly?

Because then the output of ls can be used by programs that don't know what a list is, or what a file is; and this is breathtakingly beautiful.


Make it auto-serialize to text if the receiving end doesn't understand lists. Working with text directly doesn't get me any type safety, or any safety at all (unless I'm writing a one-off, where it matters less).


"type safety" is the "structured programming" of the twenties


...As in, it will become so universal that languages without it are utterly unheard of, except for assembly and joke languages?


Ok cool I can pipe ls into gz and pipe further into scp. Now what? How do I use it on the receiving end after unzipping it again? You still have to parse it into a structural list of files over there somehow to make use of it? Not everything is a tunnel.

Structural data can still be serialized easily. Take JSON, for example, whose spec fits on a napkin.


You can. Use your favorite programming language's interface for listing the content of a directory!

`ls` is for you, the human.


In C, my favourite language, I would do this:

        system("ls");


Please never do this; do not use system, it executes $SHELL and you can fall victim to various PATH munging attacks. :)


As I've mentioned in other discussions here, I find jc useful for exactly this purpose -- https://github.com/kellyjonbrazil/jc


Text is fine for humans. The trouble comes when you want the machine to do anything other than simply print out the literal text (and even that's tricky, what with unicode and fixed width columns).

The moment you want to do something else with the text, you have to parse it in some way. At that point it's no longer text, it's a poorly specified serialization format.


libxo can help with that. Of course, the programs you want to use need to use it, but they'd also need to support whatever else.


+++ATHokay.


Actually text is now a problem... The old way of ASCII only stream of bytes was much superior.


Why are the next 5 years going to be different than the last 50?


> Its execution is just abysmal

It's not really good, but calling it abysmal is just too negative in my opinion :) It does get a lot of things right after all, especially in the "not everything is text" department. Also a lot of things wrong (my main gripes would be a lot of the syntax in general, case insensitive, usually more than a couple of ways to achieve the same thing). So in the end the learning experience/curve for me was the same as for bash (or languages like C++) and just as long/steep probably: feels like suffering until you've learned all the caveats by heart and in the process of doing that also learned how to actually use it, then it becomes ok to use. After that process I'm leaning towards favoring PS over bash though. Now it's possible there are already other shells which get more things right but I really doubt that in my lifetime I'm still going to spend (waste?) time on learning yet another shell. Or it must be convincingly good and provably fast to learn :P


I think one of the biggest peeves of mine with powershell is they dumped you into a god awful console. I dunno if this has changed much but good lord was the console horrible. No resize, nasty font, nasty color scheme, scroll back sucked, command history sucked, pain in the arse copy & paste.

Also the commands were super long. Sure there was the ability to alias them to something shorter but still....

That being said... powershell has a lot of potential.

(Also I’m woefully out of date... I haven’t touched powershell in a decade. I hope somebody tells me it’s all better now)


It's better.

Windows Terminal is gr8 and removes all those problems. You have alternatives too like ConEmu.

Long commands are OK, use aliases. Bash has long command arguments too and nobody complains.

Saying Powershell has a lot of potential is kinda lame - it's the best of them all.


Awesome to hear! I was hoping somebody would tell me it got way better.


And yet the web is text too - and has lots of fun vulnerabilities relating to failure to escape content.

I do not expect text to be significantly replaced any time in the next 5 years. Accessibility - screenreaders, increased text size with text reflow - is one massive reason.


I'd say, what we need is a --json flag on all pipe producers like ls, exa, find, grep, ripgrep. Then you'll have the best of both worlds: something that is equally(?) parseable by humans and computers. (jq to the rescue)


ripgrep does that already. But I just came up with my own format from whole cloth because there is no standard: https://docs.rs/grep-printer/0.1.5/grep_printer/struct.JSON....

Adding a --json flag is just the start. Ideally, other grep-like programs would want to emit the same format. But what happens when some grep-like programs have additional features?

Suggesting a structured representation as an output format isn't a panacea. It's just the start and it's not clear at all to me that it would be better than what we have now.


I agree there is another can of worms waiting to be opened: agreeing on the schema of said json output. Also structured output is probably way more useful downstream from the likes of ls, exa, ps (say) rather than from grep like tools


I have created JSON schemas for the output of dozens of commands and file-types with my jc[0] project. I tried to keep them as flat as possible, but there are many other ways it could have been done. Creating the schema many times requires more thought than writing the parser itself.

[0] https://github.com/kellyjonbrazil/jc


With NUL-separated output instead of line output people are doing fine. It's unstructured for sure, but how much structure can you put into something as general as grep?


grep can emit structured GrepResult records that include things like

  - the pattern that was searched for
    - the search options from the invocation (case, line folding, etc)
  - the text of the matched segment
  - the byte range of the matched segment within the file
  - the line number(s) of the matched segment within the file
  - the file name
  - contents of any (named) capture groups from the search pattern
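To make the list above concrete, here is a toy version in Python. Everything in it is hypothetical (the `GrepResult` field names, the `grep` helper); no real grep emits this exact shape, and only one search option is modelled.

```python
import json
import re
from dataclasses import dataclass, asdict

@dataclass
class GrepResult:
    pattern: str        # the pattern that was searched for
    ignore_case: bool   # the one search option this sketch models
    file_name: str
    line_number: int    # 1-based line on which the match starts
    byte_start: int     # byte range of the match within the file
    byte_end: int
    text: str           # text of the matched segment
    groups: dict        # contents of named capture groups

def grep(pattern, file_name, data: bytes, ignore_case=False):
    """Yield one GrepResult per match of pattern in data."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern.encode("utf-8"), flags)
    for m in regex.finditer(data):
        yield GrepResult(
            pattern=pattern,
            ignore_case=ignore_case,
            file_name=file_name,
            line_number=data.count(b"\n", 0, m.start()) + 1,
            byte_start=m.start(),
            byte_end=m.end(),
            text=m.group(0).decode("utf-8", "replace"),
            groups={name: value.decode("utf-8", "replace")
                    for name, value in m.groupdict().items()
                    if value is not None},
        )

def to_json_line(result: GrepResult) -> str:
    """Serialize a result as one JSON object per line."""
    return json.dumps(asdict(result))
```

Whether every grep-like tool could agree on these fields is exactly the open question raised earlier in this subthread.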


I think a huge factor is the dominance of the keyboard as the most used physical interface. Even in the emerging age of VR, keyboards are everywhere, even in software form in phones. I guess once we can solve language recognition, so it maps 1:1 with the input somehow, then we could have a voice-only terminal/shell equivalent. The building blocks for that would look different, and it would not just be saying commands out loud.


I see the appeal, however outside of areas where you completely own the stack (like inside a company) it feels idealistic to imagine any standard being close to text in terms of pervasiveness.

I also worry about a bad standard becoming popular a la most of the web


I'm unfamiliar, what's the idea of Powershell you mention here?


By default, it uses objects (structured data) instead of plain strings.

I'm also not extremely familiar with the concept, so at the moment, I'm reading this document to understand it better https://www.varonis.com/blog/how-to-use-powershell-objects-a...


Nushell has a similar philosophy to powershell. They sum it up as "everything is data". Their book gives more details.

https://www.nushell.sh/


I like the idea of Powershell and how it's object driven as well, but it doesn't get you away from text in the sense of you're still typing commands into a console.

Isn't the real solution to do away with the idea that we need to remain compatible with VT100 video terminals?


That's not what they are referring to. Powershell passes typed objects between cmdlets when pipes are used. This is DEFINITELY better than plain text.

We've learned a lot since the unix philosophy was created. One of those things is that, if you want software that lasts, software that performs well, you need to be strongly typed.

Plain text is as far from strongly typed as you can get. Passing plain text between programs in 2020 goes against everything we've learned in the last 50 years.


Ad-hoc duck typing vs strong typing is a pragmatist vs theorist argument.

In theoretical terms, strong typing everywhere, always, is preferred. In practical terms, this devolves into a Java-esque nightmare of what's-the-type-for-the-parent-type-of-the-thing's-type absurdity, where people stop using your system because they don't want a second job learning its Borges-esque comprehensive type encyclopedia.

In practical terms, bare minimum typing that prevents 80% of errors, but doesn't require gyrations to cover the remaining 20% works. And given a choice between the two, more software is going to be built in systems that function this way.

Does it break all the damn time? Yes. Does it still get more done? Yes.


>Passing plain text between programs in 2020 goes against everything we've learned in the last 50 years.

I would have to say the opposite is true.

It's text based interfaces that have stood the test of time, and now even flourish in HTTP. We have more widely used text based interfaces than ever before, where REST, HTML, XML and JSON seem to rule most machine-machine communication.

Strongly typed binary RPC such as DCOM and CORBA however are now nearly gone.


I agree with your comment, but also want to point out that binary RPC seems to have a comeback with gRPC.


> We've learned a lot since the unix philosophy was created. One of those things is that, if you want software that lasts, software that performs well, you need to be strongly typed.

Unix itself, for example, won't last even 10 years. Look for it to be gone by the mid-late 1970s as it gets replaced by strongly-typed objects passed directly between applications to control our fusion-powered flying cars.


If you think the presence of Unix today is proof of anything good, you do not have an active imagination.

What will computing look like in 500 years? Do we really believe that we reached the pinnacle of OS development in the 1970s?


Doesn’t even need to be strongly typed. Just having the ability to pass around JSON would fill about 80% of the imaginable use cases.


>Powershell passes typed objects between cmdlets when pipes are used. This is DEFINITELY better than plain text.

I know but my point is that the interface itself is still text.


Would you prefer dragging a pipeline together? Or is there some other way?


Until humans can type in binary or read binary unaided, the human interface portion will always be textual.


From the thread:

Our continuing mission: To explore strange new platforms. To seek out new bugs and new software. To boldly shitpost where no one has shitposted before.

Mission accomplished, LMAO.


Yeah, it's kind of my thing :-)

One of the previous times I hit the front page of HN it was for inserting snark into an X509 certificate and then using it on my site.

https://rya.nc/cert-tricks.html


I'd say the hat of a security researcher suits you well :)


It's things like this that make me think terminals are irredeemably broken. One does not simply print a string to a terminal. I'm not aware of a standard way to handle user strings safely like you can on say, the web, or any other UI platform.

The Unix way seems to be to not solve the problem at all, unless you're writing a text editor, and then you reinvent the wheel and solve it yourself.


My vtclean project [1] cleans terminal escape sequences. It handles the full ECMA-48 spec, and every behavior described by all of the vt100 documentation I could find, and observed behavior of many command line programs and terminals.

It has good test coverage, fails safe (if it doesn't understand a sequence, it still strips the escape character), and can preserve colors in cleaned up text. It is directly and indirectly used by quite a few projects. It receives extremely few bug reports, because I took the time to do it right.

If you want a good way to do this, probably start here. It's three short regular expressions and a simple state machine (to clean up single-line edits).

My only complaint about the code at this point is it operates on string instead of []byte, but I don't want to break the API and nobody has complained.

[1] https://github.com/lunixbochs/vtclean
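For readers who just want the flavour of regex-based stripping, here is a much-simplified sketch in Python. To be clear, this is not vtclean's code or its actual patterns: it only covers CSI and OSC sequences, does no line-edit simulation, and the names are made up. It does copy the fail-safe idea of dropping any escape character it doesn't understand.

```python
import re

# CSI: ESC [ (or the single C1 character U+009B), then parameter bytes
# 0x30-0x3F, intermediate bytes 0x20-0x2F and a final byte 0x40-0x7E (ECMA-48).
CSI = re.compile(r"(?:\x1b\[|\x9b)[0-?]*[ -/]*[@-~]")
# OSC: ESC ] ... terminated by BEL or by ST (ESC \).
OSC = re.compile(r"\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)")
# Fail safe: any leftover ESC or U+009B is removed so it cannot start a sequence.
STRAY = re.compile(r"[\x1b\x9b]")

def strip_escapes(text: str) -> str:
    """Remove CSI and OSC sequences, then any stray escape introducers."""
    text = OSC.sub("", text)
    text = CSI.sub("", text)
    return STRAY.sub("", text)
```

Something like `strip_escapes(author_email)` before printing would have neutralised the colour and cursor tricks in the linked tweet, assuming the input has already been decoded to text.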


It could be interesting to extract this to a context-free grammar that could be implemented in different languages.

There are many reasons users might not want/be able to use Go, or any other language this might be published in. Given the terminal's provenance I can't help but be absolutely certain (despite having no substantive evidence) a lot of people working with C in Interesting™ situations would love something like this, for example.


It's more than a CFG, it simulates line editing, so you need to port some code either way.


Ohh, that's what the code is doing.

(I must admit I skimmed it so fast (on my phone) I didn't realize the comment about RGB colors was only referring to the line directly underneath... and then didn't stop to think "RGB can't need all that", woops.)

The nice thing is this is definitely simple enough to be able to just directly port somewhere else, which I guess would make extracting a higher level representation premature-optimization-level superfluous complexity.

*Adds to bookmarks*


Thanks for writing this! I used it a few years ago to remove the ANSI output from HP/Aruba network devices that clear your screen or line which was a nightmare for scripting. It works like a dream.


This is the SAME type of injection attack we've been fighting on the web for years.

Apache 1.3 (way back when) would escape escape sequences before printing them to the log, to prevent this exact same thing.

It is the responsibility of the program dumping data to the terminal to escape things prior to dumping to the tty.


Except, could you imagine a software ecosystem where 95% of the software is vulnerable to injection, and nothing is ever done about it, and there is no clear easy solution for developers either, and you have a legion of users who turn technology into a religion and think there's nothing wrong at all with Unix terminals.

It also probably doesn't help that the problem is unfixable in general without rewriting so much software... and that things mostly work okay, most of the time.


> where 95% of the software is vulnerable to injection

As opposed to what, the ẃ͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́̂́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́᷅́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́̂́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͅͅͅͅͅͅͅͅé͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́̂́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́᷅́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́̂́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͅͅͅͅͅͅͅͅb͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔͔̩͔́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́̂́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́᷅́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́̂́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͥ́͗́́͗́́͗́́͗́́͗́́͗́́͗́́͗́ͅͅͅͅͅͅͅͅ? :)

(note: appearance of above may depend on browser)


Weird that you could post that but not emoji (tried to post a smiley face).

There must be some libraries or something that filter Unicode...


Yeah, I kind of think Unicode was a big mistake too :)

If anything is text, nothing is text.


What is something you feel that the text encoding system shouldn't be able to encode?


Non-text?


What is an example of non-text?


+1 agreed. Unicode (UTF-8) kills the old stream of bytes paradigm.


I'd posit that in this case it's git's failing to pipe content from some random internet connection unfiltered to the terminal. The terminal doesn't know where those control sequences come from. But anytime you interface with something external, you have to be prepared for hostile behavior or content that's out of spec. In this case it's git that interfaces with something external. Remember that any software can be hacked or abused.


If you want to make a string safe, filter out or escape all bytes / characters in the 0-31 range. Maybe keep newlines if you expect multiple lines.

Done -- as simple as it gets.


Some terminals support the C1 control set, and in particular 0x9b as the CSI is dangerous (it's a single character version of the ESC-[ CSI). So you have to watch out for that too.
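Putting the two comments above together, a minimal sketch in Python of "escape 0-31 plus C1". The `escape_controls` name is made up, and including DEL (0x7F) is my own addition rather than something either comment asked for. It works on already-decoded text, so the question of which encoding the bytes were in has to be settled first.

```python
def escape_controls(text: str, keep: str = "\n") -> str:
    """Replace C0, DEL and C1 control characters with visible \\xNN escapes.

    Characters listed in keep (newline by default) pass through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        is_control = cp < 0x20 or cp == 0x7F or 0x80 <= cp <= 0x9F
        if is_control and ch not in keep:
            out.append(f"\\x{cp:02x}")
        else:
            out.append(ch)
    return "".join(out)
```

Escaping rather than deleting keeps the attempt visible: an injected colour change shows up as a literal `\x1b[31m` in the output.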


Interesting: https://man7.org/linux/man-pages/man4/console_codes.4.html

I'll definitely need to add support for that. Possibly annoying to strip that with utf8 involved. Any idea how many terminals support 0x9b?


With UTF-8 you need to UTF-8 encode the 9b.

For example this shows red on xfce4-terminal in UTF-8 mode:

  printf "\xc2\x9b31mHi\e[0m\n"


Hm, but I won't know if the text/terminal is utf8 or not going in.


Right... and you have the Windows-1252 versus iso-8859-1 situation, which only differ in that the former puts printable characters where the latter has C1 control codes.

Of course your code is a library so you have the library author's prerogative to punt that problem to the users :)
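The ambiguity is easy to see in Python: the same byte 0x9b is a control character, a printable quote mark, or invalid, depending on which encoding you assume. `decode_candidates` is just an illustrative helper.

```python
def decode_candidates(raw: bytes) -> dict:
    """Decode the same bytes under three encodings to show how they disagree."""
    return {
        "iso-8859-1": raw.decode("iso-8859-1"),
        "cp1252": raw.decode("cp1252", "replace"),
        "utf-8": raw.decode("utf-8", "replace"),
    }

# The single-character CSI (U+009B) as it appears on the wire in UTF-8.
UTF8_CSI = "\x9b".encode("utf-8")
```

For `b"\x9b"` this gives U+009B (the C1 CSI) under ISO-8859-1, U+203A (a single angle quote) under Windows-1252, and the replacement character under UTF-8, while `UTF8_CSI` is the two bytes `c2 9b` used in the printf example above.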


...interesting


> I'm not aware of a standard way to handle user strings safely

I thought that this was what isprint() was for?


What's unsafe about it? It doesn't break out of the terminal as a system that shows data to the user. The fact that you can sneak data that looks weird doesn't make it unsafe.


Some terminals support a programmable answerback string, and of course the ENQ control character to send it to the host. Some have programmable function keys and a command to read them, which often (always?) sends the programmed string as consecutive unencoded characters (instead of hex or something).


Concrete example in sibling comment: https://news.ycombinator.com/item?id=26617412


Off topic, but the person posting this mentioned the artist who did the avatar and of course I love their art style, and of course they aren’t taking commissions right now. How does one find good pixel art artists who might do a small commission or three?


You can find forums where pixel artists look for work. Here's one such forum: https://pixelation.org/index.php?board=8.0


Oh wow exactly what I was looking for. Thanks!


Fiverr? I've had a generally decent experience with it, but I also follow the rule of three: always hire 3 different freelancers for any task, unless I have an established relationship with an artist who did very similar things in the past.


Oh that’s a good point. I’ve used them before. It was neat.


Try searching on skeb, it's a site specifically for art commissions and there are at least a few people doing pixel art on there.


Oh this is a super cool concept! Thanks!


DeviantArt used to be a good start a few years back. For pixel art you could also look at r/gamedev; lots of artists looking for work roam it.


They have some awesome artists there, but I have had bad luck getting through to them on DeviantArt. There are a couple of artists who definitely could have made some money from me, but didn't, because I couldn't contact them (or they ignored my messages).


I found Brian when someone I follow retweeted (or perhaps liked?) this post:

https://twitter.com/cmmrc2/status/1329991505524690944

I'd been looking for an artist for months at that point, and looked through his work and saw some nice portraiture and dropped him a DM.

I can't figure out who shared it, but I think it was another pixel artist who I liked and was hoping would open up commissions again.

I wish it were more common for artists to auction off at least some commission slots.


You might find someone for hire at r/gamedevclassifieds

https://www.reddit.com/r/gamedevclassifieds


Oh cool. I’ll look there.


This user does small commissions on occasion. http://imgur.com/gallery/qszHeHe


Love the Slavic turtle tax, thanks!


It's usually not hard to follow a few pixel artists and get a bunch suggested to you on Twitter. I know Kiana Mosser should have commissions open within a few weeks. https://twitter.com/kianamosser


Go to the Aseprite discord server. They have a channel exactly for what you want.


I'd go spelunking in indie game dev forums and spaces. There are almost certainly artists looking for a little bit of side work.


Follow good artists you like on various platforms, get in touch as appropriate (use your best judgement).


I recently found one for a side project via Instagram. Look under the #pixelart tag.


Try pixelation.org


Many years of experience have taught me that fields holding data being able to hold any sequence of bytes is normal (and in the case of string data, anything except 00), so this didn't surprise me too much; in fact, it's usually when something is unusually restrictive that it becomes notable.


I was honestly more surprised at how much data I could fit in that field. Git itself doesn't appear to have any limit at all, though GitHub doesn't allow more than a few dozen megabytes.


You can name PostgreSQL tables with poop emoji


You can also use emoji to name any object in Active Directory, which is a heck of a lot of fun if you want to confuse the IT guy that needs to remote into your machine for maintenance.


Okay, I'm going to try this now. This is perfect. I wonder if it'd break anything else.

This is gold.


Don't forget Wi-Fi also allows emoji in the SSID.


I wonder if I can use zalgo text for an ssid.


Yes, but there's a 32 byte limit which is a bit of a buzzkill.
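
To see how quickly 32 bytes runs out, here is a minimal Python sketch. It assumes the SSID is stored as UTF-8; the helper names are hypothetical.

```python
SSID_MAX_BYTES = 32  # the limit mentioned above


def ssid_byte_length(name: str) -> int:
    """Number of bytes the name occupies when encoded as UTF-8."""
    return len(name.encode("utf-8"))


def fits_in_ssid(name: str) -> bool:
    return ssid_byte_length(name) <= SSID_MAX_BYTES


# One emoji outside the Basic Multilingual Plane costs 4 bytes.
poop = "\U0001F4A9"
# Zalgo-style text: one base letter plus stacked combining marks,
# each of which (U+0301 here) costs 2 bytes.
zalgo = "Z" + "\u0301" * 15

print(ssid_byte_length(poop), fits_in_ssid(poop * 8), fits_in_ssid(poop * 9))
print(ssid_byte_length(zalgo), fits_in_ssid(zalgo))
```

So under that assumption you get at most eight such emoji, or a single letter with about fifteen combining marks stacked on it.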


I have an Outlook folder named (trash can) (fire) and a rule set up so anything older than 30 days in there is deleted.


Interesting. HN will let me submit a comment that contains emoji just fine (no error in dev tools network log or UI) but it doesn't show up in the thread (at least for me, if there's a stream of emoji from me testing that I can't see I apologize).


HTML/CSS also allows emoji in class names



The problem with this attack is that you have to drop the 6c file in your PATH for it to work, because otherwise you need to use ./ for it to execute. This makes the attack pointless because if the attacker can drop something to your PATH, you're already pwned since the attacker can just name his payload "ls" and wait for you to execute it.


Good thing I normally reach for "less" instead of "cat" nowadays. You get scrolling, and your terminal is safe from malicious injections or (much more likely) binary garbage.

And "git" for example applies "less" to pretty much all output by default, which makes most of the git-based attacks, including this one, irrelevant.


Reading that just made me want to re-watch 1995's Hackers


The next step is to add animation.


This same person tweeted a terminal animation out yesterday: https://twitter.com/ryancdotorg/status/1375880616789573633

Though I suppose it's not technically a git commit email, it's still a git command


It was still stuffed in the author email field.


That looks very graphical - not the ASCII characters of the portrait.


It should be possible: https://github.com/saitoha/libsixel

(Unless sixel requires NUL bytes in the encoding)


For animation, the real issue is frame timing. It looks like there's an animation API in kitty that allows this sort of control, so it'd work there, and iTerm2 supports inline animated gifs which I've already demo'd.


I'm not sure if there are any format specifications for the author and committer lines in a git commit message object. I guess it really depends on whether commands that are used to display the information (git show, git log, etc.) can handle arbitrary data. I wonder how commands like git format-patch and git am would deal with a commit like this.


Oh wow, can someone repost that ANSI art documentary that was shared recently? Great inspiration for my next commit.


Previous submissions over the last few days:

https://news.ycombinator.com/from?site=twitter.com/ryancdoto...


I've not had much success showing block characters in the standard Mac Terminal.app. Is there a particular font I need to use?


AFAIK Terminal.app doesn't filter control codes before rendering, although iTerm2 does have this as a feature, if that helps.


It can probably be done in grayscale, with even some dithering with shading characters from Unicode Block Elements to support Terminal.app and other emulators.


Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.


why is this so astounding to those commenting on twitter? because git doesn't artificially limit the line length?

i've used ansi sequences in my zsh prompt for 20 years to make colors and move the cursor; it's just in-band ascii that is interpreted by the terminal emulator, no?


By default, I'd expect data that likely came from third parties to be sanitized before output.


it was; note the tweet had you pass "--no-pager" to get cursor movement to work.

With default options, you only get colors, and this is pretty safe.


It's a bit more complicated than that. For example, if you do

git log --format=%ae | sort -u

the cursor movement sequences will be preserved.

I haven't delved too deeply into what git actually does here in terms of processing for "pager" output vs not.
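
If the output of a pipeline like that is going to end up on a terminal anyway, one option is to make control bytes visible first, roughly in the spirit of `cat -v`. A minimal Python sketch, where `caret_notation` is a hypothetical helper and not part of git:

```python
def caret_notation(data: bytes) -> str:
    """Render control bytes visibly instead of letting the terminal act on them."""
    out = []
    for b in data:
        if b == 0x0A:
            # Keep newlines so the one-address-per-line structure survives.
            out.append("\n")
        elif b < 0x20 or b == 0x7F:
            # C0 controls and DEL: ESC (0x1b) becomes ^[, CR becomes ^M, DEL becomes ^?.
            out.append("^" + chr(b ^ 0x40))
        elif b >= 0x80:
            # Simplification: show high bytes as hex rather than cat -v's M- notation.
            out.append(f"\\x{b:02x}")
        else:
            out.append(chr(b))
    return "".join(out)
```

Fed the raw bytes of an author email containing `ESC[31m`, this yields the literal text `^[[31m`, which the terminal just prints.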


...and for this "display some art" hack specifically, it looked like it can be made to work with plain git log by using `\r` characters as newlines in addition to the escape sequences, but I haven't tested that much.


Sure, sure, but why would such a sanitization remove useful things like terminal drawing codes? That would be untoward.


[flagged]


It may not be productive, but understanding the extensibility of our systems gives us a more holistic view of them.

It could be useful in unexpected ways in the future. Or not.


For someone who frequents a site called “hacker news” it sure seems like you don’t understand the hacker mentality.


Well, one can put characters from the Unified Canadian Aboriginal Syllabics block into Go identifiers, so this would be the next step, I suppose.



