Fixing Unix/Linux/Posix Filenames: Control Characters (Such as Newline), 2009 (dwheeler.com)
15 points by based2 on Mar 9, 2019 | 11 comments

The problem is not in the filesystem, it's in the shell. Stop writing shell scripts for tasks that need to be robust! None of these problems hit you when using a better programming language like Python. (Yes, Python 3 has a bit of an impedance mismatch between its native Unicode string type and the filesystem, but that just makes for slightly uglier code, not actual bugs.)
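To make the point above concrete, here is a minimal sketch of listing a directory in Python 3 without ever parsing text. It assumes a POSIX filesystem that accepts newlines in names; the helper name `list_names` and the sample filenames are made up for illustration.

```python
import os
import tempfile


def list_names(directory):
    """Return the raw entry names in `directory` as bytes, sorted."""
    # Passing a bytes path makes os.scandir yield bytes names, so nothing
    # is decoded and no separator character is ever parsed.
    with os.scandir(os.fsencode(directory)) as entries:
        return sorted(entry.name for entry in entries)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical awkward names: leading hyphen, newline, tab.
        for name in (b"-rf", b"two\nlines", b"tab\there"):
            with open(os.path.join(os.fsencode(tmp), name), "wb"):
                pass
        for name in list_names(tmp):
            print(repr(name))
```

Working in bytes is the "slightly uglier" part: it sidesteps the Unicode mismatch entirely, at the cost of having to decode names yourself before showing them to a user.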

I wonder how much effort it would take to make the shell robust when it comes to this. Obviously in Python 3 it's not a big deal to process UTF-8 strings with control characters, store them in a database and still have decent XSS protection with no ("manual") input filter. But with Python you lose the flexibility of pipes, combining commands tersely on the fly.

Of course, the problem here is in thinking that there is "the" shell, and that one needs to always use a POSIX compatible shell for interactive login. (The C shell users of the world should disabuse one of that particular notion, if nothing else.)

* https://news.ycombinator.com/item?id=11672023

* https://news.ycombinator.com/item?id=17989710

Surely the problem is the 'Everything is a stream of bytes' model?


ls | foo | bar

ls just prints newline-separated strings: no metadata, nothing to tell you what each one is.

In current practical terms that means avoiding the shell, though that isn't inherent. I guess PowerShell avoids this issue.

Hence Greg Wooledge's famous advice not to parse the output of ls. As catern points out, in a fully-fledged systems programming language, one can just directly call the library routines that read directories. And thereby obtain metadata without having to parse it from locale-dependent forms, too.
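As a rough illustration of calling the directory-reading routines directly, the sketch below uses Python's `os.scandir` and `DirEntry.stat` to get names and metadata as values rather than as text to be parsed. The function name `describe` and the choice of fields are assumptions made for the example.

```python
import os
import tempfile


def describe(directory):
    """Return (name, size_in_bytes, is_dir) for each entry, sorted by name."""
    rows = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # stat() returns numbers and flags, not locale-formatted text.
            info = entry.stat(follow_symlinks=False)
            rows.append(
                (entry.name, info.st_size, entry.is_dir(follow_symlinks=False))
            )
    return sorted(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "sub"))
        with open(os.path.join(tmp, "data.txt"), "w") as handle:
            handle.write("hello")
        for row in describe(tmp):
            print(row)
```

Because each field arrives as its own value, a newline or space inside a name cannot be mistaken for a record or column boundary.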

* https://mywiki.wooledge.org/ParsingLs

* https://unix.stackexchange.com/a/503711/5132

"Filenames are reasonable" is in practice a perfectly workable assumption. In nearly 20 years of using Linux, I've never – not once – been bitten by a filename starting with a hyphen, or one that contains a newline or control characters. It seems plenty dangerous to allow untrusted user input for naming files, so just don't do that and you're fine.

I don't get why you'd want to limit yourself to POSIX in 2019. Do you really need to run your script on AIX or what have you? In 99.9% of cases you just want GNU and macOS; as far as I know that includes the most important GNU tools.

Bash is really good enough for many tasks that involve gluing together a number of utilities or shuffling files around. To do this in Python or Node.js is to invite another world of hurt where you write easily 10x as much code and need to worry about package management all of a sudden.

I've come across some crazy filenames under UNIX before and most originate from SMB connections to PCs. Ones I've seen had '*' and '&' in their names amongst other combinations. The craziest I ever saw had several '#' in their name and that was interpreted by the bash script as a comment!

This isn't a problem if you properly use double quotes in your scripts.
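For comparison, the same quoting discipline exists outside the shell. The sketch below is a Python analogue, not the commenter's method: it uses the standard `shlex.quote` to build a command string in which `#`, `*`, `&` and even a newline stay literal. The helper name `build_command` and the sample filenames are hypothetical.

```python
import shlex


def build_command(program, *filenames):
    """Join a program and its filename arguments into one shell-safe string."""
    # shlex.quote wraps any argument containing shell metacharacters in
    # single quotes, so the shell passes it through as one literal word.
    return " ".join([shlex.quote(program), *map(shlex.quote, filenames)])


if __name__ == "__main__":
    command = build_command("wc", "a#b", "x*y", "p&q", "two\nlines")
    print(command)
    # Round trip: splitting the string gives back the original arguments.
    print(shlex.split(command))
```

Quoting does not help with a name that starts with a hyphen, since the program itself interprets that as an option; many utilities accept `--` before the filenames to mark the end of options.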

It's 10 years later; what has changed? Not that much. I am not aware of any Linux distro that comes with an LSM to prevent nonsense/dangerous names.
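In the absence of such an LSM, an application can at least check names itself before creating files. The following is only a user-space sketch of that idea, not an LSM; the function name `filename_problems` and the two rules it applies (no leading hyphen, no ASCII control characters) are assumptions drawn loosely from the problems discussed in this thread.

```python
def filename_problems(name):
    """Return a list of reasons why `name` is a risky filename (empty if none)."""
    problems = []
    if name.startswith("-"):
        # Many command-line tools would read this as an option.
        problems.append("leading hyphen")
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in name):
        # Covers newline, tab, escape, DEL and the other ASCII controls.
        problems.append("control character")
    return problems


if __name__ == "__main__":
    for candidate in ("report.txt", "-rf", "two\nlines", "..\x01"):
        print(repr(candidate), filename_problems(candidate))
```

A check like this only protects files the application creates itself; names that arrive over SMB or from other users still have to be handled defensively.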

All distros I have used use UTF-8 locales by default now, so that part of his lengthy argument could nearly be removed. In Yocto though, non-ASCII characters on the command line still cause havoc. Well, Yocto is not a distro, but a way to build your own. So one needs to replace the busybox versions of several standard tools like vi, less, etc. with the "full" versions.

In the '90s there was a case of corporate espionage (theft of integrated circuit IP) where the guy had copied the design databases into a Unix directory named '..^A' (control-A at the end) such that an 'ls -a' would show two '..' entries.
