

Language for Unix command line utilities? - gn

I code for a small biocomputing company. We download nucleotide sequence and taxonomy information in a number of unrelated formats from a number of public repositories and run various kinds of translation and analysis on it. Much of what we write are small command line tools that search, summarize, or transform certain types of (large trees of) text files. These programs look and feel a lot like traditional core Unix utilities; our most widely used programs are essentially just a specialized version of diff and a specialized version of grep, respectively. We used to prototype most of our utilities as shell scripts or in Perl; we redid shell scripts in Perl or C (or sometimes Java) if they became performance bottlenecks.

Some years ago we decided to move from Perl to Python for new projects because Perl programs had a way of always ending up as maintainability nightmares and because Perl seemed on the way out anyway. It largely worked, but we were never really, truly happy with our Python code. I suspect part of the reason is that Python can be (or at least _feel_) less succinct than even C if you do lots of low-level file system stuff with close error checking. The true reason is probably largely aesthetic. We can't explain what's wrong, we're just vaguely uneasy.

What other alternative to C should we be looking at? Ruby? Haskell? Is Go there yet? We have very open minds and are willing to consider pretty much anything that gives us reasonably easy and unmolested access to syscalls and their return values.
======
gaius
I do a lot of command-line-tool-writing, my language at the moment is Haskell,
tho' I still count myself as a Haskell beginner, it's proving to be very
productive. The old adage that if Haskell code compiles it works is mostly
true; bugs are caught up-front rather than after running in the wild for a
bit; type inference, explicit pure/IO and functional composition are real
boons. I need to build more familiarity with the libraries before I can be as
productive in the short term (e.g. for "one offs") as I am in Python but
already I believe in the long term (because one-offs never are!) I'm pulling
ahead.

OCaml would be a good choice too. Both of these languages work very naturally
with tree-like structures. Profiling/code coverage in both is very easy. IMHO
there's no need to go to C for any but the most performance-critical code (and
remember that your I/O etc is already in C in the kernel). The C approach of
checking the return value of every syscall (i.e. no exceptions) is very
cumbersome.

Case in point today: rather than persuade our Unix guys to roll out Expect
across a bunch of new machines, I rewrote a ~200 line Expect script I had in
~60 lines of Haskell and deployed a binary instead of a script.

~~~
mblakele
On OCaml, <http://ocaml.janestreet.com/> might interest the OP.

------
aidenn0
Only thing I can think of is AWK, but that's only slightly more readable than
perl and is probably less maintainable since perl has vastly superior
profiling and debugging tools.

I mostly use Python for the sorts of things you are mentioning. And from what
you're saying you don't like about Python, I suspect that going to Ruby or
Haskell or such is going to be worse. Python can more easily call the
underlying C routines than either of those.

It would be nice if you could provide an example of something you think is
inelegant and/or awkward in Python so that we could figure out which direction
to point you.

I do a lot of coding in common lisp and some programming in haskell, but
wouldn't recommend either of those based on what I've heard from you so far.
There's a few dataflow style languages I've seen that would probably allow
very succinct code, but they were all toys and performed quite poorly.

~~~
gn
> It would be nice if you could provide an example

For me personally the main source of unhappiness is error messages. In C I can
say

    if (!(f = fopen(name, "r"))) die(name);

where _die_ is a tiny function that prints _name_, followed by whatever
strerror has to say on the subject, formatted in the usual fashion. One line,
done with it. The obvious, conventional Python equivalent is four lines long
because both try: and except: insist on a line of their own. Since I cannot
tell Python to produce succinct unixy error messages instead of rambling stack
traces I have to catch and examine more or less every plausible exception.
Some exceptions I can deal with close to the base level of my call stack in a
butt-ugly forty-line catch-all clause but a large proportion of my syscalls
end up taking three lines extra each. I know it's a trivial problem, but I
agree with pg that you tend to be more productive the more of your actual
application logic you can see.

> I do a lot of coding in common lisp

We did experiment with clisp a while back; it turned out not to be a natural
fit for problems that involve a lot of pathname, datetime, and stat info
manipulation. If there was a reasonably modern Lisp that let me say things
like (localtime (nth 9 (stat "/foo"))) I would go looking for it this very
afternoon.
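
For comparison, here is roughly what that composition looks like in Python. This is only a sketch: it assumes field 9 means the modification time (as in Perl's stat list), and `mtime_localtime` is a made-up helper name, not a library function.

```python
import os
import time


def mtime_localtime(path):
    """Return the file's modification time as a local struct_time."""
    # os.stat() exposes named fields, so st_mtime stands in for "field 9".
    return time.localtime(os.stat(path).st_mtime)


# Example: format the current directory's mtime in a conventional way.
stamp = mtime_localtime(".")
print(time.strftime("%Y-%m-%d %H:%M:%S", stamp))
```

If the path does not exist, `os.stat` raises an `OSError` carrying the errno and filename, so the error information is not lost.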

~~~
huwigs
> Since I cannot tell Python to produce succinct unixy error messages instead
> of rambling stack traces

`sys.excepthook` is how you can do that.

Without:

    
    
    x = {}
    def f():
        print(x['foo'])
    f()
    # ...

    Traceback (most recent call last):
      File "stack.py", line 6, in <module>
        f()
      File "stack.py", line 4, in f
        print(x['foo'])
    KeyError: 'foo'
    

With:

    
    
    import sys

    def short_err(exc_type, exc, tb):
        sys.stderr.write("error: tracebacks too long\n")

    sys.excepthook = short_err

    x = {}
    def f():
        print(x['foo'])
    f()

    # ...

    error: tracebacks too long
    

So don't worry about catching exceptions if you're just printing errors.
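
If you want the hook to print something closer to what a C `die` would, a possible sketch follows. The names `format_unix_error` and `unix_excepthook` are made up for illustration, and the `prog: file: reason` layout is my assumption about what "unixy" means here; the `filename` and `strerror` attributes are standard on `OSError`.

```python
import os
import sys


def format_unix_error(exc, prog):
    """Return a one-line, Unix-style message for an exception."""
    if isinstance(exc, OSError) and exc.strerror:
        # OSError carries the failing path (if any) and the strerror text.
        if exc.filename is not None:
            return f"{prog}: {exc.filename}: {exc.strerror}"
        return f"{prog}: {exc.strerror}"
    # Anything else: fall back to the exception type and its message.
    return f"{prog}: {type(exc).__name__}: {exc}"


def unix_excepthook(exc_type, exc, tb):
    """Replacement for sys.excepthook that prints one line to stderr."""
    prog = os.path.basename(sys.argv[0]) or "python"
    sys.stderr.write(format_unix_error(exc, prog) + "\n")


# Install the hook; uncaught exceptions now produce a single line.
sys.excepthook = unix_excepthook
```

With this installed, a bare `open(name)` on a missing file should end the script with one line on stderr instead of a traceback, and the interpreter still exits with a non-zero status.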

~~~
gn
> `sys.excepthook` is how you can do that.

Awesome. Thank you kindly.

> error: tracebacks too long

I like your style.

------
ggchappell
> ... Python can be (or at least feel) less succinct than even C ....

That's an interesting statement. Certainly, Python can be less succinct than
_Perl_, particularly for small scripts where a quick "while(<>) {" and a
regexp get most of your work done. But C??

> ... if you do lots of low-level file system stuff with close error checking.

Hmmm. In my experience, C's I/O libraries tend to make error checking
something we leave by the wayside. Is there any chance that the real reason
your Python scripts are longer is that you actually check for, and properly
handle, the errors there, while in C you often don't?

In any case, I'll echo a comment from aidenn0:

> It would be nice if you could provide an example of something you think is
> inelegant and/or awkward in Python so that we could figure out which
> direction to point you.

------
chromatic
With a modern version of Perl and the autodie pragma active (in core since
Perl 5.10.1), your dissatisfaction with verbose handling can often simply
disappear.

------
CyberFonic
Python is just fine. O'Reilly have a great book "Python for Unix and Linux
System Administration" if you'd like some great suggestions and ideas.

------
konad
Go is certainly there. It even has an easy-to-use syscall package.

See the File example <http://golang.org/doc/progs/file.go?h=syscall>

