
OpenBSD's safe, new file(1) implementation - zdw
http://marc.info/?l=openbsd-cvs&m=142989267412968&w=2
======
TheDong
You may find the code for the current implementation in their CVS:
[http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/file/?s...](http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/file/?sortby=date&only_with_tag=HEAD)

You can also find a git mirror of the referenced commit at:
[https://bitbucket.org/braindamaged/openbsd-src/commits/c421d...](https://bitbucket.org/braindamaged/openbsd-src/commits/c421de15c47394de1ea2478edc0aa2c941aab2f8#chg-usr.bin/file/file.c)

~~~
cnst
OpenGrok on BSD Cross Reference is probably more useful for browsing it.

[http://BXR.SU/OpenBSD/usr.bin/file/file.c](http://BXR.SU/OpenBSD/usr.bin/file/file.c).

~~~
przemoc
Thanks for mentioning Super User's BSD Cross Reference (bxr.su)! Apparently
there are many more OpenGrok installations nowadays than there were 5 years
ago.

[https://github.com/OpenGrok/OpenGrok/wiki/OpenGrok-installat...](https://github.com/OpenGrok/OpenGrok/wiki/OpenGrok-installations)

EDIT: Not all of them are working, though.

------
brynet
There are many inherent problems with utilities like file(1); however, if
people are going to continue using them, then we should try to make the best
implementation possible.

This new implementation was created from scratch, carefully, with modern
coding practices by a very proficient programmer.

~~~
_delirium
I do think a cleaner, more careful file(1) implementation is a step in the
right direction. But it's still a bit unnerving to me that a utility like
file(1), which needs to do nothing but read a file and write text to stdout,
could possibly do things like overwrite other files, write to the network,
etc., when fed a maliciously crafted file that triggers a parsing bug. A
parsing bug in a utility that does nothing but read files and parse them
should not be able to do anything worse than force a misparse! The file(1)
binary simply does not need permission to do things like write to the network
or disk. This seems like a shortcoming of the traditional Unix permissions
model. A promising direction imo is the more fine-grained permission model of
Solaris/Illumos, which FreeBSD's Capsicum project is also moving toward.

~~~
geofft
While capability systems are totally awesome, you don't really need them for
this sort of application. Open the target file read-only, open the magic
database read-only, keep stdout open, and then even something as restrictive
as Linux's seccomp mode 1 sandbox (read(2), write(2), exit(2) and
sigreturn(2), nothing else) should be enough to let you print out an answer.
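A minimal sketch of that idea in C, assuming Linux with glibc (strict mode is Linux-only, so this is not OpenBSD code). The child process enters seccomp mode 1 with `prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT)` only after its descriptors are set up, then reads a few bytes, writes a one-word answer and exits. The toy check (ELF magic or "data") and the names `identify_strict`, `identify_buffer` and `SANDBOX_UNAVAILABLE` are illustrative assumptions. One catch: glibc's `_exit()` issues exit_group(2), which mode 1 does not allow, so the sandboxed child leaves through a raw `syscall(SYS_exit, 0)`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SECCOMP_MODE_STRICT
#define SECCOMP_MODE_STRICT 1          /* value from <linux/seccomp.h> */
#endif

#define SANDBOX_UNAVAILABLE 3          /* child exit code if mode 1 is refused */
#define PEEK 4                         /* the toy check only looks at 4 bytes */

/* Child side: in_fd/out_fd are already open, so once strict mode is on,
 * nothing but read(2)/write(2)/exit(2) is needed. Never returns. */
static void identify_strict(int in_fd, int out_fd)
{
    unsigned char buf[PEEK];
    size_t got = 0;
    const char *answer;

    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) == -1)
        _exit(SANDBOX_UNAVAILABLE);    /* not confined yet, so _exit() is fine */

    /* From here on only read, write, exit and sigreturn are permitted. */
    while (got < sizeof buf) {
        ssize_t n = read(in_fd, buf + got, sizeof buf - got);
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    answer = (got == PEEK && memcmp(buf, "\x7f" "ELF", PEEK) == 0) ? "ELF\n" : "data\n";
    if (write(out_fd, answer, strlen(answer)) < 0)
        syscall(SYS_exit, 1);
    /* glibc's _exit() would call exit_group(2) and get killed; use exit(2). */
    syscall(SYS_exit, 0);
}

/* Demo driver: hand `data` to a sandboxed child and collect its answer.
 * Returns the child's exit status, or -1 on error or abnormal death. */
static int identify_buffer(const void *data, size_t len, char *out, size_t outsz)
{
    int in[2], res[2], status;
    size_t used = 0;
    pid_t pid;

    if (outsz == 0 || pipe(in) == -1)
        return -1;
    if (pipe(res) == -1) {
        close(in[0]);
        close(in[1]);
        return -1;
    }
    /* Write the input before forking: at most PEEK bytes always fit in the
     * pipe buffer, and the parent never writes to a pipe whose reader died. */
    if (len > PEEK)
        len = PEEK;
    if (write(in[1], data, len) != (ssize_t)len) {
        close(in[0]); close(in[1]); close(res[0]); close(res[1]);
        return -1;
    }
    close(in[1]);
    pid = fork();
    if (pid == -1) {
        close(in[0]); close(res[0]); close(res[1]);
        return -1;
    }
    if (pid == 0) {
        close(res[0]);
        identify_strict(in[0], res[1]);   /* does not return */
    }
    close(in[0]);
    close(res[1]);
    while (used < outsz - 1) {
        ssize_t n = read(res[0], out + used, outsz - 1 - used);
        if (n <= 0)
            break;
        used += (size_t)n;
    }
    out[used] = '\0';
    close(res[0]);
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

`identify_buffer` is only a test driver. If the kernel refuses strict mode (for example, when the process already runs under a seccomp filter, as is common in containers), the child exits with `SANDBOX_UNAVAILABLE` rather than carrying on unconfined.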

------
brynet
I posted a portable version on Twitter a few hours ago, but this is just
something I put together quickly without much testing, so "YMMV".

[https://twitter.com/canadianbryan/status/591997995459051520](https://twitter.com/canadianbryan/status/591997995459051520)

~~~
brynet
I've updated the diff/tgz to include the new privsep functionality; the
systrace(4)-based syscall whitelisting was conditionalized on OpenBSD.

------
jaytaylor
The commit message is:

    
    
        New implementation of the file(1) utility.
        This is a simplified, modernised version
        with a nearly complete magic(5) parser
        but omits some of the complex builtin
        tests (notably ELF) and has a reduced set
        of options.
        
        ok deraadt
    

What is the significance beyond the obvious?

What options are now missing?

What are the tradeoffs? Beyond a simplified API, are there new capabilities,
functionality/use-cases, and/or improved performance?
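For readers wondering what a magic(5) parser actually does: each rule in the magic database says, roughly, "at this offset, read a value of this type, compare it with this value, and if it matches, print this message". The sketch below is a toy illustration of that idea, not OpenBSD's code. The rule syntax is a heavily simplified, assumed subset (only `string` without escapes and `belong`, no continuation levels), and `parse_rule`, `rule_matches` and `struct magic_rule` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy rule: "<offset> <type> <value> <message>". */
enum magic_type { MAGIC_STRING, MAGIC_BELONG };

struct magic_rule {
    size_t offset;
    enum magic_type type;
    char text[64];          /* used when type == MAGIC_STRING */
    uint32_t number;        /* used when type == MAGIC_BELONG (big-endian) */
    char message[128];
};

/* Parse one rule line; returns 0 on success, -1 on malformed input. */
static int parse_rule(const char *line, struct magic_rule *r)
{
    char type[16], value[64], *end;
    int rest = -1;                       /* set by %n to the message start */
    long off = strtol(line, &end, 0);    /* decimal, octal or 0x-hex offset */

    if (end == line || off < 0)
        return -1;
    if (sscanf(end, " %15s %63s %n", type, value, &rest) != 2 ||
        rest < 0 || end[rest] == '\0')
        return -1;                       /* missing type, value or message */
    r->offset = (size_t)off;
    if (strcmp(type, "string") == 0) {
        r->type = MAGIC_STRING;
        snprintf(r->text, sizeof r->text, "%s", value);
    } else if (strcmp(type, "belong") == 0) {
        char *vend;
        unsigned long v = strtoul(value, &vend, 0);
        if (vend == value || *vend != '\0' || v > 0xffffffffUL)
            return -1;
        r->type = MAGIC_BELONG;
        r->number = (uint32_t)v;
    } else {
        return -1;                       /* type not supported by this sketch */
    }
    snprintf(r->message, sizeof r->message, "%s", end + rest);
    return 0;
}

/* Test a rule against a buffer, never reading outside buf[0..len). */
static int rule_matches(const struct magic_rule *r, const unsigned char *buf, size_t len)
{
    size_t avail;
    const unsigned char *p;

    if (r->offset > len)
        return 0;
    avail = len - r->offset;             /* bytes available at the offset */
    p = buf + r->offset;
    if (r->type == MAGIC_STRING) {
        size_t n = strlen(r->text);
        return n <= avail && memcmp(p, r->text, n) == 0;
    }
    if (avail < 4)
        return 0;
    return ((uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
            (uint32_t)p[2] << 8 | (uint32_t)p[3]) == r->number;
}
```

Real magic(5) rules also support indirect and relative offsets, masks, comparison operators and continuation lines, which is where most of a real parser's complexity (and risk) lives.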

~~~
ploxiln
brynet, who posted below, tweeted a PDF-rendered man page for the new OpenBSD
file command:

[http://brynet.biz.tm/pub/file.1.pdf](http://brynet.biz.tm/pub/file.1.pdf)

It seems to have all the options I'd ever use. In fact I've never used any
options to file (though I do use it to quickly identify a handful of files,
often enough).

The only reason OpenBSD does this kind of thing is because they got annoyed
with the vulnerabilities in the classic file/libmagic source which everyone
seems to use. They probably disliked the code style / organization of the
source enough that they didn't want to patch it up (which they also do fairly
often for various ports). That said I've never personally looked at the
source.

~~~
brynet
> That said I've never personally looked at the source.

I'd say very few people do, which is kind of scary for a utility like file(1),
which people feed any random file to without giving it a second thought.

------
viraptor
While this simplification of tools is great, I'm a little disappointed that
it's a system-integrated app and that it happened in OpenBSD fairly quietly.
If it had first happened in something like RH or Debian, the package would be
separate from core and we would have had the opportunity for the following:

\- using a language with more safety guarantees than C

\- including seccomp (either bpf, or even just mode 1) by default

\- in the announcement saying: tested with XXX, YYY, ZZZ; fuzzed with ABC, DEF

It would be a great opportunity to create a poster child people could point to
and say: this is the way we should be (re)writing secure software, and these
are the awesome tools that helped. For OpenBSD the first two couldn't happen
(it's part of the base system, so it's C; there's no seccomp support). The
third could have, but didn't, at least not publicly (valgrind, clang-analyser,
[am]san, afl, etc. could all have been listed).

Maybe the next time something important is broken.

~~~
brynet
Don't let the work of a few OpenBSD developers discourage you from creating
your own file(1) implementation using your new favourite "secure" language of
the week.

~~~
viraptor
Everyone can write their own. Not everyone can substitute the implementation
used by a large distribution and make a proper announcement people will care
about. I don't understand why you are mocking the language-choice point,
though. Pretty much any language that doesn't make you manage memory directly
would be a good choice. This utility doesn't even have big performance
requirements.

Basically I'm saying that we can try very, very hard to write secure code in
languages which invite issues, or... try to eliminate whole classes of issues
at a time. Is that such a terrible idea?

~~~
brynet
New languages and technologies come and go and you're free to use them.
OpenBSD will continue to create new software while adhering to modern C coding
practices, even when that means defining them.

~~~
viraptor
> New languages and technologies

You mean decades-old languages like Perl and Python for example? Technologies
like the ones we can finally now afford to implement - actual syscall
filtering and selective capabilities dropping? Supporting utilities which only
matter at development time? How do those go away?

As I wrote in my first message: OpenBSD integrates `file`, has to use C and
doesn't implement seccomp. They couldn't become the poster child. But the next
`file` reimplementation probably won't make the news anymore.

People can improve the way we write software right now. But choose a project
at random and they don't care or don't use what's available for free. Some
promotion would be great when everyone's looking.

------
thrownaway2424
Corrected headline:

OpenBSD's new, untested file(1) implementation.

The only thing "safe" here is that the author was completely unharmed by the
unit tests that he failed to write. Where I work, 2,500 lines of new C code
without a single line of tests would be laughed off the code review system and
then probably revisited around performance review time.

Heroic programming is how we got to where we are today, running the world on
gigantic piles of untested gotos and questionable mallocs and frees. There are
no real heroes in programming, only people who haven't yet figured out how to
write tests.

~~~
brohee
Considering file is supposed to be invoked on unknown files, I fail to see how
tests other than fuzzing would give any assurance security wise. A very anal
code review on the other hand...

And the OpenBSD people are very aware of fuzzing; this new implementation of
file is a direct reaction to Michal Zalewski's findings...

~~~
thrownaway2424
A good, systematic approach to incorporating fuzzing findings would be as
follows:

1) Identify a harmful input through fuzzing.

2) Reduce the input to a minimal testcase.

3) Contribute the testcase and a fix to the existing code.

Note the absence of "rewrite the whole thing without tests" in this process.
From-scratch rewrites that may or may not have fewer bugs than their
predecessors are known as CADT.

[http://www.jwz.org/doc/cadt.html](http://www.jwz.org/doc/cadt.html)
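Steps 2 and 3 of that process typically leave behind a table of minimized inputs that is re-run on every build. A minimal sketch in C, assuming a hypothetical ELF identification check `classify_elf` as the code under test; the inputs are made up for illustration and are not the actual fuzzer findings against file(1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum elf_kind { NOT_ELF, ELF32, ELF64 };

/* Hypothetical code under test: inspects only the 16 identification
 * bytes at the start of an ELF file (magic, class, data encoding). */
static enum elf_kind classify_elf(const unsigned char *buf, size_t len)
{
    enum { IDENT_LEN = 16, CLASS_IDX = 4, DATA_IDX = 5 };

    if (len < IDENT_LEN)                             /* never read past the input */
        return NOT_ELF;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return NOT_ELF;
    if (buf[DATA_IDX] != 1 && buf[DATA_IDX] != 2)    /* little / big endian */
        return NOT_ELF;
    switch (buf[CLASS_IDX]) {
    case 1:  return ELF32;
    case 2:  return ELF64;
    default: return NOT_ELF;
    }
}

/* One minimized input plus the answer the detector must give for it. */
struct regression_case {
    const char *name;
    const unsigned char *data;
    size_t len;
    enum elf_kind expect;
};

static const unsigned char rc_truncated[] = { 0x7f, 'E', 'L' };
static const unsigned char rc_magic_only[] = { 0x7f, 'E', 'L', 'F' };
static const unsigned char rc_bad_class[16] = { 0x7f, 'E', 'L', 'F', 9, 1 };
static const unsigned char rc_elf64_le[16] = { 0x7f, 'E', 'L', 'F', 2, 1, 1 };

static const struct regression_case regression_cases[] = {
    { "empty input",         rc_truncated,  0,                    NOT_ELF },
    { "truncated magic",     rc_truncated,  sizeof rc_truncated,  NOT_ELF },
    { "magic but no header", rc_magic_only, sizeof rc_magic_only, NOT_ELF },
    { "invalid class byte",  rc_bad_class,  sizeof rc_bad_class,  NOT_ELF },
    { "minimal ELF64 ident", rc_elf64_le,   sizeof rc_elf64_le,   ELF64 },
};

/* Run every case, report mismatches on stderr, return the failure count. */
static int run_regression_cases(void)
{
    int failures = 0;

    for (size_t i = 0; i < sizeof regression_cases / sizeof regression_cases[0]; i++) {
        const struct regression_case *c = &regression_cases[i];
        enum elf_kind got = classify_elf(c->data, c->len);
        if (got != c->expect) {
            fprintf(stderr, "FAIL: %s (got %d, want %d)\n",
                    c->name, (int)got, (int)c->expect);
            failures++;
        }
    }
    return failures;
}
```

Each new fuzzer crash, once minimized, becomes one more row in `regression_cases`, so a later rewrite can be checked against every previously known problem.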

~~~
userbinator
CADT refers mostly to rewrites in which lots of new features and complexity
are being added - the most likely sources of bugs - not to ones like this that
go in the opposite direction.

~~~
acqq
Let me quote from the source of the CADT term:

[http://www.jwz.org/doc/cadt.html](http://www.jwz.org/doc/cadt.html)

"This is, I think, the most common way for my bug reports to open source
software projects to ever become closed. I report bugs; they go unread for a
year, sometimes two; and then (surprise!) that module is rewritten from
scratch -- and the new maintainer can't be bothered to check whether his new
version has actually solved any of the known problems that existed in the
previous version."

The OP rightly points out that there are no known tests that demonstrate the
known problems in the older implementations, and also no proof that the new
implementation passes them.

~~~
raverbashing
Don't compare the GNOME development process with the OpenBSD development
process.

They are very different, especially since one is a core utility and the other
a desktop environment.

I agree with that critique of the GNOME development process, but the OpenBSD
development process prides itself on security, and this is only the first
commit; new changes are being added (as can be seen in the log of that file:
[http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/file/fi...](http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/file/file.c?sortby=date)).

------
cbd1984
What makes this file(1) safer than any other?

~~~
steakejjs
lcamtuf notably found several ELF parsing bugs in file that may have been
exploitable.

The work he has been doing with AFL and Google's big fuzz farm, focusing on
utilities that are used daily without a second thought, is insanely important,
imho.
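The specifics of those bugs are in the advisories linked further down the thread. As a generic illustration only (not code from file(1) or libmagic, and not a claim about what those particular bugs were), one classic way a binary-format parser goes wrong is trusting header fields such as a table offset and entry count, so that a naive bounds check overflows and wraps around. A minimal sketch of the difference, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Naive check: off + count * entsize can wrap around 2^64 and pass. */
static int table_fits_naive(uint64_t off, uint64_t count, uint64_t entsize, uint64_t filesize)
{
    return off + count * entsize <= filesize;
}

/* Overflow-safe check: does a table of `count` entries of `entsize` bytes
 * starting at `off` lie entirely inside a file of `filesize` bytes?
 * No intermediate value here can wrap. */
static int table_fits(uint64_t off, uint64_t count, uint64_t entsize, uint64_t filesize)
{
    if (off > filesize)
        return 0;                       /* table starts past the end of file */
    if (entsize != 0 && count > (filesize - off) / entsize)
        return 0;                       /* too many entries for the remaining bytes */
    return 1;                           /* zero-sized tables trivially fit */
}
```

For example, with `off = 16`, `count = 2^62` and `entsize = 4` on a 4096-byte file, the product wraps to 0 and the naive check accepts the table, while `table_fits` rejects it.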

~~~
liveoneggs
link?

~~~
brohee
From the afl-fuzz page
([http://lcamtuf.coredump.cx/afl/](http://lcamtuf.coredump.cx/afl/)):

[https://www.freebsd.org/security/advisories/FreeBSD-SA-14%3A...](https://www.freebsd.org/security/advisories/FreeBSD-SA-14%3A16.file.asc)
[http://bugs.gw.com/view.php?id=409](http://bugs.gw.com/view.php?id=409)

------
rhapsodyv
What caught my attention most was the fact that they are using CVS. Nothing
against it; I just don't see CVS much these days.

Does anyone know how much impact this change could have? How many systems
rely on file?

Personally, every time I've used it, it was in a one-off script.

~~~
_asummers
CVS is brought up in almost every single OpenBSD thread, right before someone
complains about Comic Sans on their website. The OpenBSD guys refuse to use
git because they feel it is way too complicated for the task it serves. At
this point, CVS is almost their barrier to entry: if you're (royal you, not
rhapsodyv) going to complain a bunch about the version control tool, you're
probably not going to be an active member of their developer community.

Despite using old tools, the OpenBSD guys release their updates every 6 months
like clockwork, and have for the past two decades. I know of no other FOSS
project with that level of project management.

~~~
mercurial
> At this point, CVS is almost their barrier to entry: if you're (royal you,
> not rhapsodyv) going to complain a bunch about the version control tool,
> you're probably not going to be an active member of their developer
> community.

CVS is frankly hideous. Saying "git is way too complicated" is another way of
saying "CVS is way underpowered". However, a good reason to cling to this
relic of a bygone age is that, as far as I know, they have _everything_ in
CVS, and it must be a lot more convenient to be able to check out an arbitrary
part of their dev tree than messing around with git submodules.

That said, if their attitude is really "if you complain about CVS, you are not
worthy", that is bound to turn off many people, and for good reason.

~~~
ben_bai
CVS fits their development methodology just fine, so far. They have a central
CVS repo where development is done in "head", which gets branched every 6
months for a release. Contributions are supposed to never break "head", so
mostly small, easy-to-review patches are committed. Even big changes are
committed on a per-patch basis, working towards a bigger goal. There is also
AnonCVS, which mirrors the central CVS repo to dozens of mirror servers. In the
same way, you can mirror the CVS repo locally for yourself. Sure,
CVS+AnonCVS+diff(1) could be better (whatever that means), but it does the
job. Switching is hard, losing history is bad, and putting off old developers
is way more dangerous than discouraging new ones. Using mailing lists and CVS
is the price to pay to partake.

~~~
_asummers
An interesting data point in the OTHER camp is Emacs' decision to move to git.
But Emacs has much different goals as a project than OpenBSD.

~~~
adamrt
Just a note that Emacs was on BZR not CVS.

