

Why can't we all use standard libraries for commonly needed algorithms? - snth
http://queue.acm.org/detail.cfm?id=1944489

======
haberman
Counterexample: zlib.

 _Everyone_ uses zlib. It's hard to find a binary it's not linked into. I
doubt anyone reimplements it -- what's the point? Maybe some people copy it
into their source tree, but what's wrong with that?

The key? Make sure people have heard of it, and make it as unobtrusive as
possible. The author's "libmd" fails at the former, and PGP, GnuPG, and
OpenSSL fail at the latter. There is no good reason not to Just Use Zlib.
Quoting from their website: "A Massively Spiffy Yet Delicately Unobtrusive
Compression Library."

zlib is my inspiration.

~~~
tzs
We tried to use something other than zlib once. The application needed to make
a copy of a CD-ROM and compress the copy. This was back in the day when a
couple gigs was a big hard drive, and 200 MHz was a fast CPU. We first tried
zlib but it wasn't fast enough for the client. We looked around, and found
another library, written by a graduate student, that was a lot faster. It
didn't compress as well, but it was good enough for the client, given the
speed. The only problem was it was under GPL, and no way was the client going
to open source the application.

We contacted the grad student, and worked out a tentative deal to license the
library for around $2k and started working out the details. Then he got all
paranoid about Microsoft. Most of our products were Windows products--so what
if Microsoft bought us? Would they end up with a Microsoft-wide site license
for his library? He wanted to put into the contract terms that would terminate
the license if our company changed ownership. Those kinds of terms are tricky.
Would we lose the license if we went public? What if one of the two major
owners (who together owned about 99% of the stock) died and the stock went to
their kids?

The CEO (who was one of those two owners) didn't have time to deal with that,
said screw it, and told me to find something else.

So I went back to zlib, and put a slider in the ripper settings to control
speed vs. compression. The implementation was to simply skip compressing some
of the blocks when ripping. The slider controlled what percentage of blocks
were compressed.

PS: ~12 years later, we're still owned by the same owners, and Microsoft has
never tried to buy us.
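
The block-skipping scheme described above can be sketched in a few lines
(Python standing in for the original code; all names here are illustrative,
not the actual product code):

```python
import zlib

def compress_blocks(blocks, fraction):
    """Compress roughly `fraction` of the blocks; store the rest raw.

    Each output entry is (compressed_flag, payload), so the reader knows
    which blocks to inflate. The slider maps directly to `fraction`.
    """
    out = []
    budget = 0.0
    for block in blocks:
        budget += fraction
        if budget >= 1.0:          # this block gets compressed
            budget -= 1.0
            out.append((True, zlib.compress(block)))
        else:                      # skip compression to save CPU time
            out.append((False, block))
    return out

def decompress_blocks(entries):
    """Reassemble the original data from mixed raw/deflated blocks."""
    return b"".join(zlib.decompress(p) if c else p for c, p in entries)
```

At `fraction = 0.0` it degenerates to a plain copy; at `1.0` every block goes
through zlib.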

~~~
burgerbrain
_"So I went back to zlib, and put a slider in the ripper settings to control
speed vs. compression. The implementation was to simply skip compressing some
of the blocks when ripping. The slider controlled what percentage of blocks
were compressed."_

How did that resolve your inability to license zlib?

~~~
wlesieutre
zlib didn't have licensing issues; it just wasn't fast enough.

------
cpeterso
Linux developer Rusty Russell wrote a helpful article about designing C
libraries to be easily configured and reused:

 _"On C Library Implementation":_ <http://rusty.ozlabs.org/?p=140>

For such an important subject, there is little practical discussion or good
examples of library design. Compare the number of books about C++ object-
oriented design versus the design and packaging of C/C++ libraries.

The best reference I know is John Lakos's excellent (but ancient: 1996!) book
on the subject: _"Large-Scale C++ Design."_ Here are some excerpts from the
book.

[http://blog.codeimproved.net/2009/03/the-large-scale-c-
softw...](http://blog.codeimproved.net/2009/03/the-large-scale-c-software-
design-rules-in-practice/)

<http://www.cs.unc.edu/~stotts/COMP204/lakos/>

------
tptacek
PHK picked a really bad example with secure hash functions.

All of these algorithms are so simple, and have such a simple interface, that
copy-pasting the C code is actually easier than linking a library.

And in their most common use case, they're coupled with a bunch of other
crypto code which already exists in one of several large crypto libraries
(like cryptlib and openssl). So most of the critical mass of "roll this into a
library" effort goes into those libraries. Meanwhile, the people who just want
hashing don't want to link all of OpenSSL.
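
The "simple interface" point is concrete: a hash exposes only init, update,
and final, which is why a single-file C implementation is easier to vendor
than a library is to link. Python's hashlib shows the same three-call shape:

```python
import hashlib

# init / update / final: essentially the whole interface of a hash function.
h = hashlib.md5()
h.update(b"hello, ")   # data can be fed incrementally
h.update(b"world")
incremental = h.hexdigest()

# Feeding the data in one shot gives the same digest.
one_shot = hashlib.md5(b"hello, world").hexdigest()
assert incremental == one_shot
```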

------
jrockway
Incidentally, I really want to use the shared memory logger from varnish in my
own app, but it seems to be quite embedded in Varnish. (It is not tightly-
coupled code-wise, but it is build-wise.) Anyway, although it's commonly
needed, it's not a library... and Varnish is the project of this article's
author! :)

(I'm not complaining, I'm going to try and library-ize this code very soon.
But it's odd that someone that feels the need to write about how people should
do this hasn't done it in his own project yet.)

~~~
dwc
Interesting, indeed.

> _But it's odd that someone that feels the need to write about how people
> should do this hasn't done it in his own project yet._

I'm afraid it's not that odd at all. When I hear people talk about code reuse
these days it means something about how they design things, not about whether
they actually reuse the code or make it available for reuse by someone else.
This detachment from actual reuse has also allowed some drift in the design
principles, so that "reusable code" may not be. _Sigh._

------
wglb
I remember when Communications of ACM used to have computer science articles.

Who says _one of the bedrock ideas of good software engineering is reuse of
code libraries holding easily accessible implementations of common algorithms
and facilities._ is really the case? One of the better pieces of software
known (qmail) avoids libraries because of unknown side effects.

Software reuse, I think, is the desire of a certain kind of management, and
not so much of computer scientists or engineers. How much reuse is in PAIP or
TAOCP?

~~~
_delirium
This is _ACM Queue_ rather than _CACM_, which is more software-
engineering/practitioner oriented than CS-oriented.

On the other hand, _CACM_ isn't really that CS-oriented anymore, either:
<http://cacm.acm.org/magazines/2011/3>. Partly it's because "CS" is so big a
field that it's not clear what a technical yet completely general CS journal
would look like. So it writes more about computing as a field, with a few
articles on emerging areas thrown in, while the technical articles go to more
specialized venues (like a PL, systems, or AI journal).

------
abecedarius
People often point to monoculture as a security problem: "Clones are
vulnerable. Computers need sex!" I'm not endorsing this argument, but it's
curious how neither side seems to mention the other when they're making their
points.

I once reimplemented much of SNMP because the standard library for it seemed
too crufty; a year or two later a whole lot of programs were hit by the
announcement of vulnerabilities in that library. (Unfortunately the employer
who was going to open-source my code never got around to it.)

------
T_S_
_"overwhelmingly this is just pointless copy-and-paste of identical source
code in blatant disregard of Occam's three-quarters-millennia-old advice."_

So isn't this analogous to asking why there are so many implementations of the
quadratic formula in various books? I think the problem is real, but this
doesn't seem to be the proof of it.

If programming allowed us to use equational reasoning then such cutting and
pasting wouldn't be considered a problem at all. We would simply recognize the
formulas, perhaps with the aid of a computer. Unfortunately we tossed away that
option the first time someone designed languages that allowed statements like
"x = x + 1". Maybe it's time to rethink that decision.

~~~
lincolnq
Hmm. An interesting observation, but is mutation really the problem here? I'm
not sure it is... we are able to transform any program with mutation into an
equivalent program where each variable is assigned once upon allocation. (If
you don't already know what it is, this transformation is called static single
assignment.)
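
A hypothetical before/after makes the transformation concrete (straight-line
code only; loops additionally need phi-nodes to merge values):

```python
# With mutation: the name x refers to two different values over time,
# so "x" cannot be replaced by any single expression.
def f(x):
    x = x + 1
    x = x * 2
    return x

# Single-assignment form: every value gets a fresh name, so each name
# can be substituted by its defining expression (equational reasoning).
def f_ssa(x0):
    x1 = x0 + 1
    x2 = x1 * 2
    return x2
```

Both compute the same function; only the naming discipline differs.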

Anyway, it's really hard (undecidable in the general case) to compare two
programs and determine whether they do the same thing. Is that what you're
proposing?

And even if you could compare programs like that reliably, it seems
impractical to apply it to the giant body of today's source code, but maybe
you have an idea for doing that?

~~~
T_S_
In Haskell I frequently find myself searching libraries for functions
(algorithms) based on the input and output type. The list is often short. Also
there is some literature on deriving algorithms simply by specifying such
types.
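
That kind of type-directed lookup can be imitated even outside Haskell; here
is a toy sketch over Python's function annotations (Hoogle does this properly
against real type signatures, and all names below are made up for
illustration):

```python
import inspect

def search_by_type(funcs, arg_type, ret_type):
    """Toy Hoogle: find one-argument functions whose annotated
    parameter type and return type match exactly."""
    hits = []
    for f in funcs:
        sig = inspect.signature(f)
        params = list(sig.parameters.values())
        if (len(params) == 1
                and params[0].annotation is arg_type
                and sig.return_annotation is ret_type):
            hits.append(f)
    return hits

# Illustrative candidate functions.
def shout(s: str) -> str:
    return s.upper()

def length(s: str) -> int:
    return len(s)

def double(n: int) -> int:
    return n * 2
```

Searching for `str -> int` among the three candidates returns just `length`;
as the comment says, the list is often short.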

If we had a standard algebra for programs, written in some relative of Haskell
or an EDSL thereof, then searching for algorithms that mapped between certain
types could be made easier. True, the case of proving two versions equal would
be undecidable sometimes, but not always. In that case it could time out.

I don't believe this would work for imperative languages because the semantics
of side effects are often not available, but I could be wrong, or they could
be restricted enough so that an SSA specification could do what you say.
Bottom line: types would help, purity or the equivalent would help.

Until then please excuse me while I reinvent the wheel...

------
alok-g
I have always wondered about this. Even algorithms commonly discussed in
computer science books are not in the C or C++ standard libraries. (PS: This
discussion is not about the negatives of C++.) I once emailed Dr. Bjarne
Stroustrup about this; more precisely, about the absence of even simple tree
algorithms from the C++ standard library. He just pointed me to the
implementation of some sorted list using a red/black tree "internally".

~~~
jbapple
Can you explain in more detail what you were looking for?

> I once emailed Dr. Bjarne Stroustrup about this; more precisely, about the
> absence of even simple tree algorithms from the C++ standard library. He
> just pointed me to the implementation of some sorted list using a red/black
> tree "internally".

What tree algorithms did you need to use?

Also, I suspect "some sorted list" was std::set or std::map.

~~~
alok-g
I think you are right about std::set or std::map.

I was looking for standard tree algorithms like AVL trees, splay trees, etc.
and graph algorithms like shortest path, minimum spanning trees, etc.
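
For what it's worth, the graph algorithms being asked for are short enough
over stdlib primitives that standards committees have tended to leave them
out. A minimal shortest-path (Dijkstra) sketch over a binary heap, purely as
illustration:

```python
import heapq

def dijkstra(graph, src):
    """Shortest-path distances from src.

    graph: {node: [(neighbor, weight), ...]} with non-negative weights.
    """
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

About twenty lines, but every project re-derives its own graph representation
first, which is part of why no two copies are link-compatible.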

~~~
jbapple
> I was looking for standard tree algorithms like AVL trees

What did you need to use AVL trees for that std::set/std::map could not
provide?

------
skybrian
If you want the same algorithm to work everywhere, you can't write the library
in C. It will have to be written in some intermediate language with generators
for all common scripting languages. The closest thing I know of that's widely
used is the set of code generators for protocol buffers. We need something
like protocol buffers for algorithms.

Ben Laurie has started on this: <http://www.links.org/?p=864>

~~~
unwind
Reading his pseudo code with its explicit size information ("pad_byte =
pad_byte plus32 1;") made me think, of all things, of LLVM's SSA assembler
syntax ("%pad = add i32 %pad_byte, 1"). :) Not surprising, of course: making
the type visible in each operation is typically necessary in assembly.
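
The explicit width matters because a portable intermediate language can't
assume the host's integer size; in a language with unbounded integers, the
`plus32` of the pseudocode has to be spelled out by hand, e.g.:

```python
MASK32 = 0xFFFFFFFF

def plus32(a, b):
    """Wrapping 32-bit addition, as the pseudocode's `plus32` denotes."""
    return (a + b) & MASK32
```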

------
itsnotvalid
Even when a standard library is well suited, there are often many "variants"
of it with different language bindings.

------
swaits
phk. definitely one of my heroes. big time contributor to FreeBSD, but most
importantly, inventor of "THE BEER-WARE LICENSE".

------
anonymous246
Weak article. I thought he would talk about some more examples. And then talk
about the difficulties in learning a library vs. rolling your own. Easy-to-
copy code (MD5) was copied. Well duh.

Lots of people have talked here about the difficulty in reusing OpenSSL. I
once had the distinct displeasure of trying to reuse ffmpeg as a library.

In addition, why must every language have its own standard library? Is it
possible to have a source code format for standard libraries which can be
translated into other languages?

There's a good article waiting to be written about this topic, but this is not
it.

~~~
astrange
What problems did you have with FFmpeg libraries?

Assuming you mean the libraries that come with FFmpeg, and not turning
ffmpeg.c into a library.

~~~
anonymous246
Wasn't intuitive, undocumented, had to jump through lots of setup even for
simple things (no smart defaults); I had to manage the memory for all buffers,
including temp buffers, myself. It almost seems like ffmpeg allocates no
memory on its own. Sorry, I can't be more specific; it was a while back.

------
iuygtfhnjmn
In a language that left the string library as an exercise for the student?

Where you can't even link libraries built with different versions of the same
compiler on some platforms.

~~~
pkaler
To be fair, Unicode didn't exist until the 1990s. C was originally invented in
the late 60s and early 70s.

The C language and standard library are meant to be minimal. They assume very
little; they barely assume that Unix/POSIX exist. string.h has to work on
server, desktop, console, and mobile CPUs as well as microcontrollers. And
this is the minimal set of functions that will work almost everywhere.
<http://en.wikipedia.org/wiki/String.h>

~~~
Teckla
C is outdated, even by systems programming language standards. Sorry, but it's
true.

There's still room in the standard for one's complement arithmetic; it denies
all knowledge of multi-threaded programming; it has a bunch of
impossible-to-use-safely functions in its standard library; etc. (This is
just a tiny sample: there's so much else wrong that you could almost write an
entire book about it.)

In my opinion, they should start by removing the substantial cruft. That would
leave plenty of room for some sorely needed additions/modernizations.

In the name of backwards compatibility, they'll remove too little, they'll add
too much unnecessary stuff, and eventually another systems programming
language will catch on and start to slowly-but-surely replace C.

~~~
VladRussian
> and eventually, another systems programming language will catch on and start
> to slowly-but-surely replace C.

My personal impression from Linus Torvalds' posts (manifests :) over the years
is that it won't happen while he is conscious ...

For me it looks like there's a Catch-22: whenever a language fixes a
"deficiency" of C, it makes the language unsuitable for systems programming.
It is similar to the multiplication table: 99% of people can rely on
calculators (very trustworthy, much safer, no side effects on the brain
tissue :), except for the people who design the calculators.

