
Don't run 'strings' on untrusted files - robinhouston
http://lcamtuf.blogspot.com/2014/10/psa-dont-run-strings-on-untrusted-files.html
======
kens
Am I the only person who thinks there's something fundamentally wrong with
computing if running "strings" could let someone take over your computer? (I'm
not being snarky; I seriously think the whole approach to security needs to be
redone somehow.)

~~~
lazyjones
There's something fundamentally wrong with still using unsafe languages for
system software.

C should have been buried decades ago, it's a toy language for small pieces of
code on isolated systems (and yes, I've written larger programs in C myself,
20 years ago).

Nowadays, C should not even be considered safe for implementing interpreters
and runtime systems for other languages. There are plenty of reasonably
portable choices available (and no, don't use Java / anything JVM based).

It can't be terribly hard to reimplement "strings" and similar software in a
modern language without such deficiencies (i.e. the many ways to shoot
oneself in the foot, overwrite the stack, etc.).
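To give a rough sense of how small the core of such a rewrite can be, here is a minimal Python sketch of the "dumb scan" behaviour: it reports every run of at least four printable ASCII characters (four is GNU strings' default minimum) anywhere in the input and does not try to parse any object-file format. The function name `extract_strings`, the `min_len` parameter and the exact "printable" set (ASCII 0x20-0x7e plus tab) are illustrative assumptions, not taken from any existing tool.

```python
import sys

# Assumed "printable" set: ASCII 0x20-0x7e plus tab, roughly what strings(1) uses.
PRINTABLE = set(range(0x20, 0x7F)) | {0x09}


def extract_strings(data: bytes, min_len: int = 4) -> list[bytes]:
    """Return every run of at least min_len printable bytes in data.

    A plain linear scan: no file-format parsing, and every index
    access is bounds-checked by the language itself.
    """
    found, start = [], None
    for i, byte in enumerate(data):
        if byte in PRINTABLE:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_len:
                found.append(data[start:i])
            start = None
    # A run that reaches the end of the input is still a string.
    if start is not None and len(data) - start >= min_len:
        found.append(data[start:])
    return found


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for s in extract_strings(f.read()):
                print(s.decode("ascii"))
```

Saved under a hypothetical name such as `strings_sketch.py`, it would be run as `python3 strings_sketch.py FILE...`.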

~~~
meowface
The problem is, until now there has not been a very good systems programming
language that allowed you to stay lean on memory without introducing any
performance overhead. Rust is certainly a contender, thankfully.

So, there is a perfectly good reason C is still prevalent to this day, even if
there are many security implications in doing so.

~~~
marcosdumay
Whatever. One can always write unsafe code. No matter in what language. A
really secure language is almost worthless because there's so much it can't
do.

No, the problem comes from the other end. Strings should not be able to
compromise a system. It only needs to read a user supplied file, and write to
stdout. There's no reason for it to be allowed to exec a piece of code.

~~~
meowface
Multiple factors are to blame. I agree this is dumb behavior here by strings,
but if the ELF parsing code were to be written in a language like Rust, it's
far less likely it would have a bug of this nature.

~~~
thirsteh
Rust... or pretty much any other language than C.

C is scary from a security perspective because it is both incredibly easy to
write code that has subtle but very serious bugs leading to e.g. arbitrary
code execution, _and_ it is easy to exploit those vulnerabilities.

------
kjetil
Even without such vulnerabilities, I would be wary of printing out stuff from
any untrusted files in a terminal. Most terminal emulators have been
vulnerable to escape character attacks at some point.

[http://marc.info/?l=bugtraq&m=104612710031920](http://marc.info/?l=bugtraq&m=104612710031920)
[http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2008-2383](http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2008-2383)
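One defensive habit against this class of attack is to escape control characters before untrusted text reaches the terminal. A minimal Python sketch follows; which characters are actually dangerous depends on the terminal emulator, so the set used here (C0 controls other than newline and tab, DEL, and the C1 range 0x80-0x9f) is an assumption rather than something taken from the linked advisories, and `sanitize_for_terminal` is a hypothetical name.

```python
def sanitize_for_terminal(text: str) -> str:
    """Replace control characters (including ESC, 0x1b) with visible escapes.

    Newlines and tabs are kept so the output stays readable; every other
    character below 0x20, DEL (0x7f) and the C1 range 0x80-0x9f is shown
    as a backslash-x escape so the terminal never interprets it.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\n\t":
            out.append(ch)
        elif code < 0x20 or 0x7F <= code <= 0x9F:
            out.append(f"\\x{code:02x}")  # e.g. ESC becomes the four characters \x1b
        else:
            out.append(ch)
    return "".join(out)
```

For example, an xterm title-setting sequence such as `"\x1b]2;pwned\x07"` comes out as inert text instead of being acted on.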

~~~
panzi
But I don't think that strings will print escape characters. It's the point of
strings to extract printable ASCII characters. Or am I mistaken?

~~~
theoh
No, you're right, I think the parent comment was referring to using cat or
grep on binaries.

~~~
malka
Even on 'valid' binaries, it still tends to mess up your terminal. I noticed
that pretty quickly when I started working with Linux. Are there really people
who work with cat and grep on binary files?

~~~
theoh
Perhaps not intentionally, but cat has valid uses for concatenating binary
files and sometimes they end up going to the terminal just by accident. As far
as grep goes, the answer to your question is "yes":
[http://stackoverflow.com/questions/9988379/how-to-grep-a-tex...](http://stackoverflow.com/questions/9988379/how-to-grep-a-text-file-which-contains-some-binary-data)

------
avian
I'm a regular user of this utility and this came as a complete surprise to me.
So much so that I checked the source myself before believing the article.

I thought "strings" was just a dumb scan over the file. Does this mean that
with a properly crafted binary it is also possible to hide strings from a
quick check with "strings"?

~~~
eli
Yes, though a properly crafted binary has always been able to hide from
strings with even a minimal amount of obfuscation or encryption. There's no
way to know all the strings a program can output without running it.
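A minimal Python sketch of eli's point: a byte-level scanner in the style of strings(1) (runs of four or more printable ASCII bytes) finds a string stored in plain form, but nothing once the same bytes are XORed with a one-byte key. The key 0x80 is an arbitrary choice that pushes every printable byte outside the ASCII range; a real program would undo the XOR at run time. The regex, function names and example string are all illustrative.

```python
import re

# Scanner in the style of strings(1): runs of >= 4 printable ASCII bytes.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def visible_strings(data: bytes) -> list[bytes]:
    return PRINTABLE_RUN.findall(data)


def xor_bytes(data: bytes, key: int) -> bytes:
    """Trivial obfuscation: XOR every byte with a one-byte key (self-inverse)."""
    return bytes(b ^ key for b in data)


secret = b"connect to evil.example.com"
# The "binary" stores only the obfuscated form; XOR 0x80 maps 0x20-0x7e to 0xa0-0xfe.
blob = b"\x00\x00" + xor_bytes(secret, 0x80) + b"\x00"
```

Scanning `blob` finds nothing, while applying `xor_bytes(..., 0x80)` again recovers the original bytes.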

~~~
avian
Of course, a properly crafted program can always arbitrarily obfuscate
strings.

But if you manage to trick libbfd into thinking it's looking at a particular
format, you can hide plain text from a simple "strings" invocation even in
files that are not executable. I've been using "strings" on all kinds of
files, not only executables, and assumed that it will always display all
sequences of printable characters present in the file.
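A simplified Python sketch of the effect avian describes. GNU strings' manual describes a default of scanning only the initialized, loaded sections of files it recognises as object files, with `-a` to scan the whole file. The sketch assumes a section-aware scanner that only looks at byte ranges taken from headers inside the (attacker-controlled) file, while a whole-file scan looks at everything. The file layout and the range below are made up for illustration; real object formats are far more involved.

```python
import re

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def scan_all(data: bytes) -> list[bytes]:
    """Whole-file scan: every byte is considered."""
    return PRINTABLE_RUN.findall(data)


def scan_ranges(data: bytes, ranges: list[tuple[int, int]]) -> list[bytes]:
    """Scan only the given (offset, size) ranges.

    In a format-aware tool these ranges come from headers in the file
    itself, so whoever crafts the file decides what gets scanned.
    """
    found = []
    for offset, size in ranges:
        found.extend(PRINTABLE_RUN.findall(data[offset:offset + size]))
    return found


# Hypothetical file: a short "header", one advertised section, and trailing text.
data = b"HDR\x00visible text\x00hidden note\x00"
advertised = [(4, 12)]  # the header claims the only section is bytes 4..15
```

Here `scan_ranges(data, advertised)` reports only `b"visible text"`, while `scan_all(data)` also finds `b"hidden note"`.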

~~~
eli
Ah, I see. Fair enough, but I don't think it was ever a great assumption that
"strings" would uncover all the text in a file. There are so many ways to
screw with a file at the byte level that could confuse "strings" but still
appear fine when read by an application.

------
Animats
It's time to start converting the low-level Linux/UNIX utilities to a language
with subscript checking. Go, or Rust (if and when it's finished), or D, or
something. We have some good options now.

~~~
thecatsass
Or, the authors of Linux/UNIX utilities should have implemented, and should be
implementing, bounds checking. It seems excessive to switch languages rather
than encourage stronger programming habits.
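What "implementing bounds checking" means in practice for a file parser: every offset and length read from the file has to be checked against the real buffer size before it is used. A minimal Python sketch, assuming a hypothetical record format (a 4-byte little-endian length followed by that many bytes). Python would not overrun memory even without the checks, but the explicit tests are the ones C code would need, and they turn a lying length field into an error instead of a silently short read.

```python
import struct


def read_record(buf: bytes, offset: int) -> bytes:
    """Read a length-prefixed record at offset, trusting nothing in the file.

    Hypothetical layout: 4-byte little-endian length, then `length` bytes.
    """
    # Check that the 4-byte header itself lies inside the buffer.
    if offset < 0 or offset + 4 > len(buf):
        raise ValueError(f"record header at {offset} is outside the buffer")
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    # Written as a subtraction so the equivalent C check cannot overflow.
    if length > len(buf) - start:
        raise ValueError(f"record length {length} overruns the buffer")
    return buf[start:start + length]
```

A length field of 0xFFFFFFFF in a 7-byte buffer is rejected instead of being used to read far past the end.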

~~~
_delirium
Are there any examples of that being successfully done? Even djb's software,
intentionally written to be minimalist and as secure as possible, has had
exploitable overflows (both qmail and djbdns have suffered from this). Every
Linux and BSD distribution (even OpenBSD) has suffered buffer overflows so
severe that arbitrary internet users could get remote root access. Etc.

~~~
vezzy-fnord
The Cyclone dialect of the C language comes to mind:
[https://en.wikipedia.org/wiki/Cyclone_%28programming_languag...](https://en.wikipedia.org/wiki/Cyclone_%28programming_language%29)

Unfortunately, it's a dead project, and as far as I recall, never compiled on
64-bit architectures.

~~~
_delirium
That's one approach, yeah. I used to think it was a likely one, but I now
think three others are more likely:

1\. A language that is low-level and safe _but also_ offers enough that is
interesting and new to build some buzz/interest, rather than "just" safety. Rust
is a candidate here, perhaps.

2\. Static analyzers in C advance to the point where a subset of C large
enough to be useful can be routinely checked for common types of errors. And
it then becomes socially expected that at least core OS stuff will be written
in that "checkable" subset of C, treating "unable to prove safety" warnings as
errors, or at the very least as suspicious.

3\. Mitigate it at the OS level with finer-grained access controls. Utilities
like strings(1) or objdump(1) are the easy case here: they do not need to
actually have permissions other than "read a file" and "print to output". Even
in the worst case, arbitrary code execution in objdump(1) should not be able
to delete your home directory, join a botnet, or email your ssh key somewhere,
because objdump(1) does not need those permissions. FreeBSD's libcapsicum
looks promising, in the sense that it is actually being implemented in the
base system, rather than just being yet another ACL proposal going nowhere
(Solaris/Illumos also has an actually-shipped privileges system, but I don't
know how extensively the base install itself uses it).

~~~
vezzy-fnord
1\. "Just" safety is hardly peanuts given the status quo. In fact, "just"
safety would be much more practical. It'd be much easier to port all the
existing code to a C dialect (like Cyclone) than rewrite from scratch in
something like Rust.

2\. I find compiler instrumentation (think AddressSanitizer and Mudflap) to be
more promising than static analysis. Much of the latter is still stuck in the
lint era and gives out too much noise. That said, tools like Coverity have come
a long way and I know a lot of FOSS projects use them frequently. I personally
haven't.

3\. Capsicum is quite promising, indeed. I like that it extends the existing
file descriptor metaphor and offers sandboxing based on namespaces instead of
system calls (unlike seccomp), as opposed to the crufty POSIX 1003.1e
capabilities which are underdeveloped and still limited to executable
processes, AFAIK. That said, we shouldn't just rely on sandboxing, jailing and
capability-based security. We need to fix underlying application bugs, as well
(the applications that implement the capabilities and sandboxing themselves,
particularly so!)

------
guns
Crap, so if `objdump` is likely vulnerable to overflows, and `ldd` is a simple
bash script ripe for abuse¹, is there a safe and easy way to determine dynamic
library dependencies in an executable?

¹ [http://www.catonmat.net/blog/ldd-arbitrary-code-execution/](http://www.catonmat.net/blog/ldd-arbitrary-code-execution/)

~~~
cesarb
There's also readelf:

    $ readelf -d /bin/ls
    [...]
      Tag                Type       Name/Value
     0x0000000000000001 (NEEDED)   Shared library: [libselinux.so.1]
     0x0000000000000001 (NEEDED)   Shared library: [libcap.so.2]
     0x0000000000000001 (NEEDED)   Shared library: [libacl.so.1]
     0x0000000000000001 (NEEDED)   Shared library: [libc.so.6]
    [...]

I don't know how careful readelf is with its input validation.
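As a partial answer to the question above, here is a minimal Python sketch (not readelf, and not how binutils does it) that reads the DT_NEEDED entries straight out of the file as data, checking every offset before use. It assumes a 64-bit little-endian ELF file, finds the dynamic section through the section headers (so it returns nothing for a binary whose section headers were stripped) and does not handle extended section numbering. The function names are hypothetical. Because nothing is loaded or executed, it avoids the `ldd` problem, and a malformed file produces a `ValueError` rather than a wild read.

```python
import struct
import sys

SHT_DYNAMIC = 6          # section type of .dynamic
DT_NULL, DT_NEEDED = 0, 1


def _unpack(fmt: str, data: bytes, offset: int) -> tuple:
    """struct.unpack_from, but refuse to read outside data."""
    size = struct.calcsize(fmt)
    if offset < 0 or offset + size > len(data):
        raise ValueError(f"need {size} bytes at offset {offset}, file has {len(data)}")
    return struct.unpack_from(fmt, data, offset)


def _c_string(table: bytes, offset: int) -> str:
    """Read a NUL-terminated string from a string table, with range checks."""
    if not 0 <= offset < len(table):
        raise ValueError(f"invalid string offset {offset} >= {len(table)}")
    end = table.find(b"\0", offset)
    if end == -1:
        raise ValueError("unterminated string")
    return table[offset:end].decode("ascii", errors="replace")


def needed_libraries(data: bytes) -> list[str]:
    """List DT_NEEDED entries of a 64-bit little-endian ELF file.

    Only the section headers are consulted; extended section numbering
    (e_shnum == 0) is not handled and simply yields an empty list.
    """
    if data[:4] != b"\x7fELF" or data[4:6] != b"\x02\x01":
        raise ValueError("not a 64-bit little-endian ELF file")
    header = _unpack("<16sHHIQQQIHHHHHH", data, 0)  # Elf64_Ehdr
    shoff, shentsize, shnum = header[6], header[11], header[12]
    if shnum and shentsize != 64:
        raise ValueError(f"unexpected section header size {shentsize}")
    # Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize
    sections = [_unpack("<IIQQQQIIQQ", data, shoff + i * 64) for i in range(shnum)]
    needed = []
    for _name, sh_type, _flags, _addr, offset, size, link, *_rest in sections:
        if sh_type != SHT_DYNAMIC:
            continue
        if link >= len(sections):
            raise ValueError(f"dynamic section links to missing section {link}")
        str_offset, str_size = sections[link][4], sections[link][5]
        strtab = data[str_offset:str_offset + str_size]  # slicing never overruns
        for entry in range(offset, offset + size, 16):   # Elf64_Dyn is 16 bytes
            tag, value = _unpack("<qQ", data, entry)
            if tag == DT_NULL:
                break
            if tag == DT_NEEDED:
                needed.append(_c_string(strtab, value))
    return needed


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print("\n".join(needed_libraries(f.read())))
```

Run on an ordinary dynamically linked binary, it should list the same library names as the `(NEEDED)` lines in `readelf -d`, though that is not guaranteed for unusual files.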

~~~
fabulist
The article mentions readelf as using the same buggy library :(

~~~
kayamon
readelf specifically doesn't use BFD - one of its major reasons for
existence is to _validate_ libbfd.

~~~
fabulist
Hmm, my mistake. You're completely correct.

From the readelf man page:

    This program performs a similar function to objdump but it goes into
    more detail and it exists independently of the BFD library,
    so if there is a bug in BFD then readelf will not be affected.

------
_delirium
> the Linux version of strings is an integral part of GNU binutils...

I think almost everyone ships that version of strings and objdump, fwiw.
FreeBSD and NetBSD ship an almost verbatim GNU binutils; OpenBSD's seems to
have more local changes (partly b/c it's based on an older binutils they've
diverged from), but its 'strings' still uses libbfd.

The only exception I ran across in some quick digging is that Illumos ships
the Solaris version of 'strings', and doesn't ship an 'objdump'. This
'strings' seems to have come via Sun via Microsoft via AT&T via UC Berkeley:
[https://github.com/illumos/illumos-gate/blob/master/usr/src/...](https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/strings/strings.c). Whether it's safer I haven't
investigated; it also parses ELF files, but via its own libelf rather than GNU
libbfd.

~~~
edwintorok
OpenBSD's strings outputs this:

    BFD: strings-bfd-badptr: invalid string offset 1179403647 >= 0 for section `'

------
bjackman
I'm probably going to sound painfully naive now... but why is that a security
risk? So libbfd reads past the end of a buffer and segfaults... so what? It's
not _writing_ or _executing_ anything untoward, so who cares?

~~~
jrockway
The security risk is that input from the file that strings is run on ends up
in places it shouldn't. One classic place is on the stack; after the random
data your function puts on the stack is a bunch of bookkeeping information the
compiler puts there, including the address to jump back to when the function
returns. If you read user data onto the stack carelessly, you can overwrite
the instruction pointer with attacker-controlled data, and the program will
return from the function, jump there, and begin executing code.

In this case, he was just fuzzing with "AAAAAAAAAAA" and the program ended up
at 0x41414141 ("AAAA"), which is alarming. The next step would be to figure
out where the input file is in memory, replace AAAA with that address, and
replace the data there with code. Now "strings" is executing CPU instructions
that were in the input file. That's bad.
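A toy model of the overwrite described above, in Python, to show why a run of "A"s turns into a jump to 0x41414141. It is a simulation, not real memory: the layout (an 8-byte local buffer immediately followed by a 4-byte little-endian saved return address, with the made-up value 0x08048123) is a hypothetical simplification of a 32-bit x86 frame, which in reality also holds things like a saved frame pointer.

```python
import struct

STACK_SIZE = 32
BUF_SIZE = 8           # the function's local buffer
RET_OFFSET = BUF_SIZE  # toy layout: saved return address right after the buffer


def unchecked_call(payload: bytes) -> int:
    """Simulate a C function that copies payload into an 8-byte local buffer
    without checking its length, then 'returns' to the saved address."""
    stack = bytearray(STACK_SIZE)
    struct.pack_into("<I", stack, RET_OFFSET, 0x08048123)  # legitimate return address
    # The copy runs until the input ends, not until the buffer ends.
    n = min(len(payload), STACK_SIZE)
    stack[0:n] = payload[:n]
    (ret,) = struct.unpack_from("<I", stack, RET_OFFSET)
    return ret  # where the CPU would jump next
```

`unchecked_call(b"A" * 12)` returns 0x41414141, and replacing the last four "A"s with chosen bytes (for example `struct.pack("<I", 0xDEADBEEF)`) picks the jump target, which is the "next step" described above.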

Some compilers do workarounds, like putting a canary value between the user
data and the compiler data, and checking the canary before returning. Some
compilers also randomize the location where things go in memory, so it's
harder for an attack to predict that address. Some OSes set certain pages to
"not executable", so even if an attacker can jump to that memory address, the
CPU won't run code from there. None of these are fixes; just barriers for the
determined attacker. (W^X is easy to get around, just call code that's already
in the binary legitimately, like execve()!)
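The canary idea can be sketched the same way, again as a Python toy model rather than real memory (GCC's `-fstack-protector` is one real implementation). The hypothetical frame is an 8-byte buffer, then a 4-byte random canary, then the saved return address. The copy is still unchecked; the canary only lets the overwrite be noticed before the function "returns". Here the canary is drawn fresh per call for simplicity, whereas real implementations typically choose it once at process start.

```python
import os
import struct

BUF_SIZE = 8


class StackSmashingDetected(Exception):
    pass


def call_with_canary(payload: bytes) -> int:
    """Toy frame: [buffer (8 bytes)][canary (4 bytes)][return address (4 bytes)]."""
    canary = os.urandom(4)
    frame = bytearray(BUF_SIZE) + bytearray(canary) + bytearray(struct.pack("<I", 0x08048123))
    # Same unchecked copy as before: the overflow itself is not prevented.
    n = min(len(payload), len(frame))
    frame[0:n] = payload[:n]
    # Epilogue check: if the canary changed, abort instead of returning.
    if bytes(frame[BUF_SIZE:BUF_SIZE + 4]) != canary:
        raise StackSmashingDetected("canary overwritten; aborting instead of returning")
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]
```

An overflow that runs linearly through the canary is caught, but an attacker who can leak the canary, or write past it without touching it, gets through; that is why it is a barrier rather than a fix.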

The fix is to only write to memory you've actually allocated; something the C
compiler will not help you with.

~~~
Confusion
> That's bad.

To emphasize to the casual reader: 'bad' is a bit of an understatement: it's
'game over'.

------
mihai_ionic
The binutils maintainers aren't exactly responsive when it comes to following
up on security-impacting bug reports:
[https://sourceware.org/bugzilla/show_bug.cgi?id=16825](https://sourceware.org/bugzilla/show_bug.cgi?id=16825)

------
roghummal
Is 'cat foo|strings' immune to the problems of libbfd?

~~~
ams6110
I wouldn't think so. The data being processed are the same.

~~~
mistaken
Using cat and piping it to strings actually works without crashing. It seems
that libbfd-style parsing is disabled on stdin.

------
101914
Despite what the blog comments suggest, gdb does not have to link to libbfd.

But what about objcopy? Much more important utility than strings(1), in my
opinion. I admit I rely on it and do not have a substitute at the ready.

At some stage we need a BSD alternative to the GNU binutils (aside from gcc
alternatives). I have seen it discussed several times over the years, but as
far as I know it does not exist?

------
Bentech
I wonder if this applies to 'file' as well.

~~~
greenyoda
I don't think so. "file" only needs to read the first few bytes at the
beginning of a file to guess what type of file it is, so it isn't likely to
have any buffer overflow problems.

~~~
_delirium
You might think so, but 'file' has its share of buffer overflows, integer
overflows, attacker-inducible infinite loops, etc. It does some more extensive
parsing for some kinds of files, and some of those end up with edge cases.
Here's one from two days ago, a buffer overrun in parsing ELF files:
[https://access.redhat.com/security/cve/CVE-2014-3710](https://access.redhat.com/security/cve/CVE-2014-3710).
Others: [https://security-tracker.debian.org/tracker/source-package/f...](https://security-tracker.debian.org/tracker/source-package/file)

------
pronoiac
So, can anyone suggest a safer alternative? Or should I use this as an excuse
to pick up a new language?

------
rajivkomar
Can't this bug be fixed?

