and offered a simple adjustment to C to fix it. I quote from the article:
All it needs is a little new syntax:
void foo(char a[..])
BTW I must take issue slightly with your characterization of the problem. Arrays decay to pointers immediately when referenced, not just when passed to functions. I suppose a C implementation could do bounds checking on non-pointer-mediated accesses to local arrays, but I've never seen an implementation do that (except those that do bounds checking more generally). Did yours?
In fact, the way I put it once was, "C does not have arrays. It has pointers and notations for allocating and initializing memory; it does not have arrays." (Meaning, of course, "array" in the sense of a first-class object, with bounds checking.)
Arrays decay to pointers when dereferenced, but this wouldn't impede array bounds checking. I didn't bother putting in static array bounds checking in Digital Mars C because pretty much all the interesting cases were buffers passed to a function.
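To illustrate why the interesting cases are the function boundaries, here is a minimal example of the decay (nothing Digital Mars specific about it):

    #include <stdio.h>

    void foo(char a[16])              /* the 16 is ignored: a is really just a char* */
    {
        printf("%zu\n", sizeof a);    /* prints sizeof(char *), e.g. 8, not 16 */
    }

    int main(void)
    {
        char buf[16];
        printf("%zu\n", sizeof buf);  /* 16: the bound is still known here... */
        foo(buf);                     /* ...but it is lost at the call boundary */
        return 0;
    }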
I know C doesn't have first-class arrays. That's what my enhancement suggestion was for.
Slicing is one common operation that's O(n) with a simple vector representation. I agree it's nice if it can be made O(1), but concatenation is still O(n) with that representation. This is why I'm a fan of ropes or other more exotic sequence data structures that can do all the common operations in log time or better.
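To make that concrete, concatenation on a rope is just building one small node; roughly (a minimal, unbalanced sketch):

    #include <stdlib.h>
    #include <stddef.h>

    /* Minimal rope node: concatenation allocates one small node that points at
     * both halves, so it is O(1); indexing walks the tree, which is O(log n)
     * if the tree is kept balanced. */
    struct rope {
        struct rope *left, *right;   /* children; NULL for a leaf */
        const char  *leaf;           /* leaf text; NULL for an internal node */
        size_t       len;            /* total length of this subtree */
    };

    struct rope *rope_concat(struct rope *a, struct rope *b)
    {
        struct rope *r = malloc(sizeof *r);
        if (!r)
            return NULL;
        r->left = a;
        r->right = b;
        r->leaf = NULL;
        r->len = a->len + b->len;
        return r;                    /* no copying of the underlying text */
    }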
> I know C doesn't have first class arrays. That's what my enhancement suggestion was for.
Fair enough. I think it's a good suggestion, but I'm skeptical that it would see wide use. Here's why. It has been straightforward for a long time to declare a struct type that holds a base and bounds, pass that struct by value, and then do one's own bounds checking. (The K&R-era compilers didn't have struct passing by value, but it was introduced in Unix v7, as I recall. That was a long time ago.) This isn't quite as convenient as what you're proposing, but it's not hard, and I don't think I've ever seen anyone do it. (Not that I spend my time reading other people's C code.) It certainly hasn't become standard practice.
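The whole thing is only a few lines; roughly (hypothetical names, nothing standard about any of it):

    #include <assert.h>
    #include <stddef.h>

    /* A base-and-bounds pair passed by value, with the bounds check done by hand. */
    struct span { char *base; size_t len; };

    char span_get(struct span s, size_t i)
    {
        assert(i < s.len);           /* the check the language doesn't do for you */
        return s.base[i];
    }

    void zero_fill(struct span s)
    {
        for (size_t i = 0; i < s.len; i++)
            s.base[i] = 0;
    }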
Still and all, if you can persuade the committee to put it in the language, and also define new standard library functions that make use of it, I agree it would be a good thing.
I don't disagree with you, but I don't see how it's a disastrous problem, either.
Get an SVR4 API manual and do the same. After doing this I don't find this argument to be compelling.
All Unix system-call wrapper functions were specified this way, because it is guaranteed that they will need to move the string data from a user-space data structure to a kernel-space data structure (copying it wholesale in the process) before manipulating it. The NUL-terminated string can, in this case, be thought of as an encoded wire format for passing a string between the userland process and the kernel.
And because the Unix system-call wrappers--the most visible and useful parts of the C stdlib--followed this convention, C programmers generalized it into a convention for passing strings in general, even between functions in the same memory space that could use whatever registers they liked.
If you're not going to iterate over everything in the passed buffer, though, C intended you to do things more like sprintf: take a destination buffer and an explicit size; write to the destination buffer, without first scanning for a NUL as a place to begin and without appending a NUL afterward; and return an explicit count of bytes written.
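Something in that style might look like this (a hypothetical function, just to show the shape of the interface):

    #include <stddef.h>
    #include <string.h>

    /* Caller supplies the buffer and its size, nothing scans for or writes a NUL,
     * and the byte count comes back. */
    size_t copy_out(char *dst, size_t dst_size, const char *src, size_t src_len)
    {
        size_t n = src_len < dst_size ? src_len : dst_size;  /* clamp to the buffer */
        memcpy(dst, src, n);
        return n;                                            /* bytes written */
    }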
I call it my billion-dollar mistake. It was the invention of the null reference in 1965.
I.e., the array-decay-to-pointer behaviour is a more expensive problem.
It would convert corruption/security bugs into other, much safer runtime errors.
To get rid of the runtime error altogether, we need a stronger type system or a different form of static analysis, as mentioned here.
Also, the extra bounds checking might be somewhat expensive in some scenarios, especially as the fat pointers would consume many more cache lines than thin pointers.
Back then, C optimizers were no better than those of Object Pascal, Modula-2, and similar languages. It all came down to profiling and switching off bounds checking in specific code sections.
Of course, compilers got better with time, while at the same time safer languages lost out as UNIX spread through the industry, creating the idea among younger generations that only C can produce fast code.
Regehr's suggestion of using assertions hidden in code avoids that: isn't it better? You could have just a comment directive like
@ arraycast char* a \represents char a[..]
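Or, in the assertion style, something like the following (a hypothetical example, not Regehr's exact formulation):

    #include <assert.h>
    #include <stddef.h>

    /* The length travels as a separate parameter, and the relationship between
     * pointer and length is stated as a checkable assertion rather than new syntax. */
    void fill(char *a, size_t a_len, size_t n)
    {
        assert(n <= a_len);          /* the bounds assertion "hidden" in the code */
        for (size_t i = 0; i < n; i++)
            a[i] = 0;
    }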
The filesystem was about the only place where strings got manipulated, and there filenames were stored in fixed-size (14-byte) buffers. Since all code could know strings were at most 14 bytes, a more complex (pointer, length) type was deemed a waste of memory and performance. Everything else followed from that.
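For reference, the V7 on-disk directory entry was roughly this (from memory, not a verbatim copy of the header):

    /* A fixed 14-byte name, NUL-padded, with no length stored anywhere. */
    #define DIRSIZ 14
    struct direct {
        unsigned short d_ino;        /* ino_t was a 16-bit integer on the PDP-11 */
        char           d_name[DIRSIZ];
    };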
This has its benefits; _anything_ fancier would have embedded a whole set of assumptions about typical string handling usage for which the design had optimised.
That no safer string-handling library became popular can be seen as an indictment of C or a tribute to its designers' tasteful restraint.
If I compare the effort we've had to go through building things that are allowed into a banking environment the OpenSSL project is very underfunded. Somehow we have to solve this "tragedy of the commons" problem in open source. OpenSSL must be worth millions of dollars to so many companies, but almost no-one cares about it. It seems very strange that the project is not getting BIG corporate sponsorship, given the value of the data it helps to protect.
Crowdfunding seems to be the obvious solution. But I asked in a previous Heartbleed discussion whether people would be willing to try it for OpenSSL, and got no response.
Someone else commented that maybe Google will take over OpenSSL, or at least become a major contributor to it. It probably is in their interests to make sure this doesn't happen again -- maybe it's worth a large enough amount of money to them to justify dedicating a couple of engineers to it for a couple of years, which is what it seems to need.
As has been explained elsewhere, OpenSSL has a wrapper around malloc() and free() so that it re-uses allocations. This means that the 64k of buffer used to send the heartbeat response is data that has never been free()d and has previously been written to by the process.
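Roughly this pattern, with made-up names rather than OpenSSL's actual freelist code:

    #include <stdlib.h>

    /* Hypothetical fixed-size-block freelist: freed blocks are chained for reuse
     * instead of going back to the system, so a "new" block can still contain
     * whatever was last written into it, and tools that only watch malloc()/free()
     * never see the reuse. */
    #define BLOCK_SIZE 4096
    struct block { struct block *next; };
    static struct block *freelist;

    void *pool_alloc(void)
    {
        if (freelist) {
            void *p = freelist;      /* reuse: old contents are NOT cleared */
            freelist = freelist->next;
            return p;
        }
        return malloc(BLOCK_SIZE);
    }

    void pool_free(void *p)
    {
        struct block *b = p;
        b->next = freelist;          /* the memory stays owned by the process */
        freelist = b;
    }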
To make the static analysis spot this kind of thing, I'd guess you'd have to mark/label the OpenSSL allocators as being 'like' malloc. Likewise, valgrind would also not spot the faults without extra knowledge.
(I am the architect for the SCA team.)
Considering that any non-trivial piece of software has a fairly large number of bugs, you can dial the sensitivity way down and still find some bugs, and the high specificity looks really impressive: "Most of what it labelled as bugs really were!"
Consider the hypothetical situation where your piece of software has 40 bugs in it (but you don't know that).
If you see a report of 10 bugs and 6 of them are real, that looks more impressive than a report of 100 bugs where 20 of them are real, despite the fact that the latter actually found more bugs. In fact there is a non-trivial chance that the first 5 or 10 bugs you look at in the latter will be false-positives.
Obviously, if your specificity goes too low, the payoff of inspecting each report starts to become questionable, but I think that to successfully market an analysis tool, you need to sacrifice more sensitivity for specificity than a rational analysis of the cost/benefit of detecting the bugs early would merit.
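To put rough numbers on that trade-off, using the hypothetical figures above (precision here meaning the fraction of reports that are real, recall the fraction of the 40 bugs found):

    #include <stdio.h>

    /* Hypothetical figures from the example above: 40 real bugs in the code base. */
    static void report(const char *name, int flagged, int real)
    {
        printf("%s: precision %.0f%%, recall %.0f%%\n",
               name, 100.0 * real / flagged, 100.0 * real / 40);
    }

    int main(void)
    {
        report("Tool A (10 reports, 6 real)  ", 10, 6);    /* 60% precision, 15% recall */
        report("Tool B (100 reports, 20 real)", 100, 20);  /* 20% precision, 50% recall */
        return 0;
    }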
A small typo, there's a sentence after the Frama-C code block that ends "[...] and that a subsequent memcpy() of the two regions will return zero" which should have a s/memcpy/memcmp/. Quite obvious, but in an article such as this it's best to be precise of course.
Some bugs can always be found statically, but there may always be other bugs that evade all the static analysis you happened to do.