Hacker News
An Empirical Study of the Reliability of Unix Utilities (1989) [pdf] (wisc.edu)
58 points by beefhash on Feb 7, 2018 | 17 comments

The section "comments on the results" is well worth reading.

> All array references should be checked for valid bounds. This is an argument for using range checking full-time. Even (especially!) pointer-based array references in C should be checked. This spoils the terse and elegant style often used by experienced C programmers, but correct programs are more elegant than incorrect ones

This always seemed to me like a flaw in the standard C library: there was no built-in array type that carried its own size and was used throughout the stdlib functions (especially the string functions).

I haven't done C in a while, but I've lost count of how many times I had to create a `struct array` just to make sure I always passed the pointer together with its length. It always felt like I was doing something wrong. Maybe I was. But the prevalence of overflow errors that size checking would have caught makes me think something was missing from the language or the standard libraries that could have helped a lot for many years.
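For readers who haven't written one, here is a minimal sketch of the kind of "pointer plus length" struct described above, assuming plain `int` elements. The names `struct int_array`, `int_array_get` and `int_array_set` are hypothetical, not from any library. Every access is checked against the stored length and reports failure instead of touching memory past the end, which is exactly the range checking the paper asks for.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": the data pointer travels with its length. */
struct int_array {
    int *data;
    size_t len;
};

/* Bounds-checked read: returns false instead of reading past the end. */
bool int_array_get(struct int_array a, size_t i, int *out)
{
    if (i >= a.len)
        return false;
    *out = a.data[i];
    return true;
}

/* Bounds-checked write: returns false instead of writing past the end. */
bool int_array_set(struct int_array a, size_t i, int value)
{
    if (i >= a.len)
        return false;
    a.data[i] = value;
    return true;
}
```

Usage: with `int buf[4]; struct int_array a = { buf, 4 };`, the call `int_array_set(a, 4, 1)` returns false rather than corrupting whatever follows `buf`. The price is that every call site has to check the result, which is the "spoils the terse style" cost the paper mentions.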

From what I understand, much of what seems missing, and possibly broken, in C comes primarily from the vast number of platforms it supports. It's surprisingly hard to build a language that supports x86, 4-bit microcontrollers, and everything in between. A couple of extra bytes attached to every array to hold its size is a big cost on embedded systems.

Ya, that was my intuition: C's always giving you enough rope to hang yourself and everyone around you. But I also wonder how many arbitrary-length strings embedded systems actually handle. Maybe don't define a general `array` type with a length, but just a string type (a `char*` that also carries its size), and have all the `str` functions use it. That alone would close off a huge number of exploits.
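A rough sketch of what such a string type might look like, under the assumption that the caller owns a fixed-size buffer. `struct str` and `str_append` are hypothetical names for illustration, not part of any standard library. The buffer is kept NUL-terminated so it can still be handed to existing C functions, but appends are checked against the stored capacity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string type: the buffer carries its capacity and current
   length, so appends can be checked instead of overflowing. */
struct str {
    char *buf;   /* storage, kept NUL-terminated for C interop */
    size_t len;  /* bytes in use, excluding the NUL */
    size_t cap;  /* total bytes in buf, including room for the NUL */
};

/* Appends n bytes from src; returns false (and changes nothing) if they
   would not fit. Assumes the invariant cap >= 1 and len < cap. */
bool str_append(struct str *s, const char *src, size_t n)
{
    if (n > s->cap - 1 - s->len)   /* free space, leaving room for NUL */
        return false;
    memcpy(s->buf + s->len, src, n);
    s->len += n;
    s->buf[s->len] = '\0';
    return true;
}
```

The design choice here is to refuse an append that doesn't fit rather than silently truncate; truncation avoids the overflow but can still produce wrong (and occasionally exploitable) results.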

A PC running MS-DOS 3.3 (512 KB) would count as an embedded system, or even a microcontroller, by today's standards.

I never had an issue with Turbo Pascal application size due to strings/arrays.

But why not add such capabilities to the standard library and simply not use them in code running on 4-bit microcontrollers? That's roughly what C11 does with its 'safe' variants of the stdlib string functions (the optional Annex K bounds-checking interfaces, such as `strcpy_s`).
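Since Annex K is optional and several major C libraries (glibc among them) don't provide it, here is a hypothetical `bounded_copy` written in the same spirit rather than the real `strcpy_s`. The assumption is the Annex K style contract: the destination size is passed explicitly, the copy only happens if the whole string fits, and on failure the destination is set to an empty string and a nonzero value is returned.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded copy in the spirit of C11 Annex K strcpy_s:
   copies src into dst only if it fits in dstsize bytes (including NUL).
   Returns 0 on success, nonzero on failure; on failure dst becomes "". */
int bounded_copy(char *dst, size_t dstsize, const char *src)
{
    if (dst == NULL || dstsize == 0)
        return 1;                 /* nowhere to write, not even a NUL */
    if (src == NULL) {
        dst[0] = '\0';
        return 1;
    }
    size_t n = strlen(src);
    if (n >= dstsize) {
        dst[0] = '\0';            /* fail closed instead of truncating */
        return 1;
    }
    memcpy(dst, src, n + 1);      /* n characters plus the NUL */
    return 0;
}
```

A function like this costs nothing on targets that never call it, which is the point made above: the safe variant can sit in the library without burdening code that can't afford it.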

Isn't the problem simply that the pointer to the first element and the pointer to the array are the same address? And that this is a key reason you can alias one array into many sub-arrays at runtime?

That is, you have an allocator that views the memory as a free list of tracked size, while the exact same memory is loaned out to other parts of the system for various other uses, some of which are other arrays.
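To make the aliasing concrete, here is a minimal sketch of an allocator handing out pieces of one backing buffer. For brevity it is a bump arena rather than the free list described above, and `struct arena` / `arena_alloc` are hypothetical names. The arena knows the size of the whole region; each caller just gets a pointer into it, so the same bytes are "an array" at two levels at once. Alignment is deliberately ignored, so this is only suitable for byte buffers as written.

```c
#include <stddef.h>

/* Hypothetical bump allocator carving one backing buffer into pieces.
   Only the arena tracks the total size; callers receive bare pointers. */
struct arena {
    unsigned char *base;
    size_t size;   /* total bytes in the backing buffer */
    size_t used;   /* bytes already handed out */
};

/* Returns n bytes from the arena, or NULL if it is exhausted.
   Ignores alignment: fine for char data, not for arbitrary types. */
void *arena_alloc(struct arena *a, size_t n)
{
    if (n > a->size - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += n;
    return p;
}
```

Nothing in the pointers returned says where one piece ends and the next begins; that knowledge lives only in the caller (or in a length field the caller chooses to keep).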


I think the problem is that you have to keep the array and its length together and always remember to call the appropriate array functions with the appropriate lengths. It doesn't help that many string/array functions have variants that expect (and only respect) null-terminated strings, i.e. strings where you don't keep a length at all and instead rely on a terminating null byte.
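The mismatch between "I know the buffer size" and "the function only looks for a NUL" is easy to show. Plain `strlen` trusts the terminator, so a buffer without one makes it read past the end. A sketch of a length function that respects both conventions, using only standard `memchr` (POSIX `strnlen` does the same job; `bounded_strlen` is a hypothetical name):

```c
#include <stddef.h>
#include <string.h>

/* Length of a NUL-terminated string, but never scans past max bytes.
   strlen() keeps reading until it finds a NUL, even beyond the buffer;
   memchr() stops after max bytes no matter what. */
size_t bounded_strlen(const char *s, size_t max)
{
    const char *nul = memchr(s, '\0', max);
    return nul ? (size_t)(nul - s) : max;
}
```

Note that the two conventions can also disagree the other way: a buffer of known length that contains an embedded NUL looks shorter to every `str` function than it really is.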

The language could have decided that the first byte holds the length and used that convention in the string/array functions. Instead we have the convention you point out, where the array is referenced only by its first element and array indexing is just sleight of hand for pointer addition.
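For comparison, here is a sketch of the "first byte is the length" convention, written in C. This is the layout Turbo Pascal's `string` type used, which also shows its main limit: one length byte caps a string at 255 characters. `pstr`, `pstr_from_c` and `pstr_len` are hypothetical names for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-prefixed ("Pascal-style") string: byte 0 holds the
   length, bytes 1..len hold the characters, so at most 255 characters. */
typedef unsigned char pstr[256];

/* Builds a pstr from a C string; returns 0 on success, -1 if too long. */
int pstr_from_c(pstr dst, const char *src)
{
    size_t n = strlen(src);
    if (n > 255)
        return -1;
    dst[0] = (unsigned char)n;
    memcpy(dst + 1, src, n);   /* no terminator needed */
    return 0;
}

/* O(1) length: read the prefix instead of scanning for a NUL. */
size_t pstr_len(const pstr s)
{
    return s[0];
}
```

Length lookups become constant time and every function can check bounds, but taking a substring now means copying it into a new buffer with its own prefix rather than just pointing into the middle of the old one.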

I get that. I'm saying the reason you can't universally have an array type with its length attached as a struct is aliasing: you couldn't have two arrays share the same values in memory. Otherwise, the allocator couldn't keep track of the memory through an array-like reference.

Consider that in C you can take an array and pass it in two parts to two different functions, each given a different length. You couldn't do that if the array had its length as an intrinsic property.
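The split described above looks like this in ordinary C. `sum` and `sum_in_two_parts` are illustrative names. The same array is passed as two separate "arrays", each with a length chosen by the caller; neither call knows the other half exists.

```c
#include <stddef.h>

/* An ordinary C function that works on "an array" given pointer + length. */
long sum(const int *xs, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += xs[i];
    return total;
}

/* Splits one array at index k and hands each part to sum() separately.
   The lengths come from the caller, not from the array itself. */
long sum_in_two_parts(const int *xs, size_t n, size_t k)
{
    if (k > n)
        k = n;   /* clamp the split point so both parts stay in bounds */
    return sum(xs, k) + sum(xs + k, n - k);
}
```

Usage: for `int xs[5] = {1, 2, 3, 4, 5};`, `sum(xs, 3)` sees `{1, 2, 3}` and `sum(xs + 3, 2)` sees `{4, 5}`. If `xs` carried a fixed intrinsic length of 5, `xs + 3` would have no sensible length of its own.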

You could split the function space into operations that work on part of an array and those that work on a full array, so I'm not claiming this is a complete blocker. And I don't know which is ultimately better. We know where we are, and it's easy to speculate that we could have avoided a lot of errors. I just don't know if I agree that it's guaranteed.
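One way to get both whole arrays and parts of arrays without giving up checking is a slice: a (pointer, length) view that may alias a larger array, where creating a sub-view is itself bounds-checked against the parent. This is roughly what languages like Go and Rust do; the C below is only a sketch, and `struct slice` / `subslice` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slice: a (pointer, length) view that may alias part of a
   larger array. Views can overlap; each one still knows its own length. */
struct slice {
    const int *ptr;
    size_t len;
};

/* Produces the view [start, start + count) of s; false if out of range.
   Written as two comparisons so start + count cannot overflow. */
bool subslice(struct slice s, size_t start, size_t count, struct slice *out)
{
    if (start > s.len || count > s.len - start)
        return false;
    out->ptr = s.ptr + start;
    out->len = count;
    return true;
}
```

The aliasing objection goes away because the length lives in the view rather than in the array: many slices can point into the same memory, each with a different length, and an allocator can still manage the underlying block however it likes.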

Oh, I see what you're saying: you can have `char * x = "foo"` and `char * y = &x[1]` (or something; I forget the exact syntax). That's a very good point! So to be consistent you'd need a whole host of other functions to deal with substrings.

I guess that makes the case that what they chose is probably correct, but it's still quite disappointing that the stdlib doesn't do more to keep you from shooting yourself in the foot with such common operations and data types.

"The ability to overflow an input buffer is also a potential security hole, as shown by the recent Internet worm."

Interesting data point: in 1989 it was still a novel idea that buffer overflows were related to security.
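The worm the paper refers to is the 1988 Morris worm, which among other things overflowed a stack buffer in `fingerd`, a program that read its request with `gets()`. `gets()` takes no buffer size, so it cannot stop writing; it was eventually removed from the language in C11. A minimal sketch of the bounded alternative using `fgets()` (`read_line` is a hypothetical helper name):

```c
#include <stdio.h>
#include <string.h>

/* Reads one line from in into buf, never writing more than size bytes.
   gets() had no size parameter and so could not make this check;
   fgets() stores at most size - 1 characters and always NUL-terminates. */
int read_line(FILE *in, char *buf, int size)
{
    if (size <= 0)
        return -1;
    if (fgets(buf, size, in) == NULL)
        return -1;                     /* EOF or read error */
    buf[strcspn(buf, "\n")] = '\0';    /* drop the newline, if present */
    return 0;
}
```

An overlong line is cut off rather than overflowing the buffer; the remaining characters stay in the stream for the next call, so a real program still has to decide how to handle (or reject) them.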

Already in 1979 it was clear that C developers needed help to write proper code.

> To encourage people to pay more attention to the official language rules, to detect legal but suspicious constructions, and to help find interface mismatches undetectable with simple mechanisms for separate compilation, Steve Johnson adapted his pcc compiler to produce lint [Johnson 79b], which scanned a set of files and remarked on dubious constructions.

Dennis M. Ritchie, https://www.bell-labs.com/usr/dmr/www/chist.html

It's good to be reminded that the state of the art WRT program stability has improved, though it's nearly 30 years later and things still aren't "perfect"... also, I didn't realize "fuzzing" as a term was 30 years old.

I remember someone saying in another article that they did it in the 1950s by taking piles of punch cards out of the trash and running them through programs. Probably the earliest example.
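The basic idea, from trash-bin punch cards to the paper's random input streams, fits in a few lines. The sketch below is in-process rather than feeding a separate Unix utility as the paper's tool did, so instead of watching for crashes and hangs it checks an invariant the function under test must always satisfy. `count_words` is a hypothetical toy target and `fuzz_count_words` a hypothetical harness; building with a sanitizer such as AddressSanitizer would additionally flag out-of-bounds reads.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy function under test: counts whitespace-separated words in n bytes. */
size_t count_words(const unsigned char *s, size_t n)
{
    size_t words = 0;
    int in_word = 0;
    for (size_t i = 0; i < n; i++) {
        int space = (s[i] == ' ' || s[i] == '\t' || s[i] == '\n');
        if (!space && !in_word)
            words++;
        in_word = !space;
    }
    return words;
}

/* Minimal fuzz loop: feed random byte strings (NULs and control bytes
   included) to the function and count invariant violations. */
int fuzz_count_words(unsigned seed, int iterations)
{
    unsigned char buf[256];
    int failures = 0;
    srand(seed);                       /* fixed seed: reproducible runs */
    for (int it = 0; it < iterations; it++) {
        size_t n = (size_t)(rand() % (int)sizeof buf);
        for (size_t i = 0; i < n; i++)
            buf[i] = (unsigned char)(rand() % 256);
        /* Invariant: words need a separator between them, so n bytes
           can hold at most (n + 1) / 2 words. */
        if (count_words(buf, n) > (n + 1) / 2)
            failures++;
    }
    return failures;
}
```

Fixing the seed matters in practice: when a random input does trigger a failure, you want to be able to reproduce it.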

"I didn't realize "fuzzing" as a term was 30 years old."

When you read papers from the 60s, 70s and 80s, it's amazing how many things we think of as new today were already being discussed back then.

Including safer systems programming.


Barton Miller (one of the authors of the linked PDF) coined the term (though I'm sure the practice predates him).
