
Shell-conduit: Write Shell Scripts in Haskell with Conduit - dbaupp
http://chrisdone.com/posts/shell-conduit
======
covi
I still don't see why this is better (by any reasonable metric) than just
plain Bash. Is there really any type safety we gain here? Is it more
convenient than Bash? The only thing I see is the argument about quoting
arguments.

~~~
chrisdone
Forgive me for assuming you aren't a Haskeller and therefore aren't part of
the target audience (maybe you are), but I'll try to make up for the
assumption by elaborating some general benefits.

The process arguments aren't type-safe, no. But the code that uses them can
be. I can also use existing Haskell libraries mixed with this (which are type-
safe). For example:

    tail' "-f" "foo.txt" $= withRights (intoCSV def $= CL.map (\[a,b,c] -> b <> "\n")) $= grep "--line-buffered" "^4"

This line streams lines from foo.txt, parses each one with the CSV conduit
parser[1], takes the second field, and then feeds that into grep, which spits
out only the values that match the regex ^4. Or if I wanted to match only
valid email addresses, I could import Text.Email.Validate and use this
conduit:

    … $= CL.filter isValid $= …

Which, again, comes from a normal Haskell package[2] which validates RFC 5322
emails with a full parser. I can also just extract the domain part of valid
emails:

    … $= CL.mapMaybe emailAddress $= CL.map domainPart $= …

I'm operating on structured, well-typed data in between normal UNIX pipes, in
a streaming manner. I think that's pretty sexy.

Finally, when I decide this script is getting more complex than a mere script,
I don't have to worry about maintaining it or refactoring it. I'm already
using Haskell!

There's also a personal benefit to me: my editor support for Haskell is very
good, while editor support for Bash is comparatively lacking.

I don't think it's more convenient than Bash if you take a myopic view of
scripts that you write.

[1]: [https://hackage.haskell.org/package/csv-conduit-0.6.3/docs/Data-CSV-Conduit.html#v:intoCSV](https://hackage.haskell.org/package/csv-conduit-0.6.3/docs/Data-CSV-Conduit.html#v:intoCSV)

[2]: [http://hackage.haskell.org/package/email-validate-2.0.1/docs/Text-Email-Validate.html](http://hackage.haskell.org/package/email-validate-2.0.1/docs/Text-Email-Validate.html)

~~~
covi
Hey, nice examples. However, those examples only show that you are already
operating/programming outside the target use case of a Bash script -- and
that's why a general-purpose language like Haskell comes in handy. IMO, if
one is to operate fully within the scripting level and not worry about more
complicated processing, Bash is easier. So I'm saying this is not really an
apples-to-apples comparison.

(FWIW I did program in Haskell and I loved it.)

~~~
chrisdone
Fair. That's partly what motivated my desire to be able to script within
Haskell in the first place; I often find myself treading that gray area --
halfway through a Bash script (or a pipe chain) before realising I can't
express what I want to express. A bunch of times I've given up and started
writing a Haskell file. That's my personal experience, anyway. So I'd rather
avoid bumping my head on the ceiling altogether.

Although I just saw this comic which seems somewhat germane :)
[http://www.catacrac.net/crac/w-16](http://www.catacrac.net/crac/w-16)

------
cf
So is conduit the recommended streaming IO library now? I saw there were a
bunch of implementations of iteratees and was waiting for the community to
coalesce around one.

~~~
chrisdone
Conduit and Pipes are popular, as far as community support, libraries and
implementations go. Secondarily there are also io-streams and machines. I
don't think the community has settled or will settle on a single one.

~~~
carterschonwald
Yup! Chris is absolutely correct. There are several different choices of libs
for streaming computation; they make different tradeoffs, behave differently,
and are tuned for different workloads.

~~~
cf
Are there any summaries of those tradeoffs?

------
cies
See the discussion on Reddit:

[http://www.reddit.com/r/haskell/comments/2h017v/shellconduit...](http://www.reddit.com/r/haskell/comments/2h017v/shellconduit_write_shell_scripts_in_haskell_with/)

------
dscrd
"Its syntax is insane"

=>

    (do source
        source $= conduit)
    $$ sink

vs.

    source
    source | conduit

=> hmm.

~~~
chrisdone
[http://stackoverflow.com/questions/4277665/bash-how-do-i-compare-two-string-variables-in-an-if-statement](http://stackoverflow.com/questions/4277665/bash-how-do-i-compare-two-string-variables-in-an-if-statement)
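The linked question is about exactly the kind of string-comparison pitfall
being alluded to. A minimal sketch of the failure mode (the function names are
mine, for illustration): an unquoted empty variable simply vanishes inside
`[ ... ]`, so the test becomes malformed (exit status 2) instead of giving a
clean "not equal" (exit status 1).

```shell
# Quoting decides whether an empty variable survives word splitting.
quoted_eq()   { [ "$1" = "$2" ]; }          # safe: empty args stay as args
unquoted_eq() { [ $1 = $2 ] 2>/dev/null; }  # buggy: empty $1 disappears

quoted_eq "" "x"   || echo "quoted:   exit $?"  # exit 1 -- a real comparison
unquoted_eq "" "x" || echo "unquoted: exit $?"  # exit 2 -- [ = x ] is malformed
```

The buggy version works fine right up until a variable is empty or contains
whitespace, which is why the bug tends to surface only in production.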

------
tinco
Oh man, I just spent a lot of time building a project that executes a bunch of
shell scripts and feeds them into haskell pipes. I most definitely could've
used this. Perhaps I will still port it since it looks so clean.

~~~
danidiaz
You mean the "pipes" library? I have a bunch of helper functions for pipes &
process here:
[http://hackage.haskell.org/package/process-streaming](http://hackage.haskell.org/package/process-streaming)

------
x3ro
From the article: "Its syntax [Bash] is insane, incredibly error prone, its
defaults are awful, and it’s not a real big person programming language."

A "real big person programming language"? Wow, that guy's got some issues :D
It seems like his definition of one is "a functional programming language"?
I'd assume that there is way more "real world, big importance" stuff written
in Bash than there is in Haskell.

~~~
jerf
Every so often, after a particularly painful bout with interactive Bash, I
find myself mentally designing some shell replacement based on some other
language, sometimes custom, sometimes something like Python. And I always end
up reminding myself of the same conclusion... nobody will use the resulting
interactive shell, because the bash defaults and error handling and stuff
_mostly_ make sense interactively (I mean, I can quibble, but let's admit that
most of us are pretty comfortable with them in interactive mode), and making
everything more explicit in some hypothetical replacement will also make
everything more annoying to use. It's hard for me to make anything that's more
terse than Bash, and for an interactive shell that's a big deal.

However, the interactive use case makes one big assumption, which is that you,
a human being, are sitting there, statement by statement, and looking at all
the output. It assumes the "return codes" of commands are advisory, and not
something you literally want to see on every command. It assumes that if
you're having trouble with some escape sequence you can interactively work out
what you're doing with "echo" or "ls" or something. It's fundamentally built
around expecting to be interactive.

It is unsurprising that that assumption doesn't work out well when trying to
program in it. While it makes sense for interactive Bash to be somewhat sloppy
with things and expect the human to pick up the pieces -- and to be clear,
that is 100% true, not sarcasm -- that's a terrible thing in a programming
language. It takes a bare minimum of set -e (stop on errors) and set -u (stop
if an unset var is used) just to turn it into a semblance of a safe language,
and that's still just putting lipstick on a pig.
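For a concrete illustration of what those two flags buy you (a hypothetical
sketch; the variable and function names are made up): without `set -u`, a
typo'd variable silently expands to the empty string and the script blunders
on -- exactly the "pick up the pieces" behavior described above.

```shell
# A typo'd variable name ($dset instead of $dest) under the default
# rules vs. under `set -eu`.
without_flags() {
  dest="/tmp/out"
  echo "copying to: $dset/file"      # typo: $dset is unset -> "/file"
  echo "without flags: still going"  # the script blunders on
}

with_flags() (                       # subshell body keeps set -eu local
  set -eu                            # abort on errors and on unset variables
  dest="/tmp/out"
  echo "copying to: $dset/file"      # fatal under set -u
  echo "with flags: still going"     # never reached
)

without_flags
with_flags 2>/dev/null || echo "with flags: aborted at the typo"
```

The unflagged version happily constructs a path rooted at "/" from the typo;
the flagged version dies at the first use of the unset variable.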

I'm not terribly convinced that a shell can be optimized for both interactive
and programmatic use... certainly the two modes will be fighting with each
other in the design even if you pull it off, and the whole will be more
complicated than an interactive-optimized language + "just use
Perl/Python/etc". Of course it's too late to remove shell scripting from bash
or UNIX, but the more serious the task you're trying to do, the less you
should be reaching for shell scripting to do it with.

Unless you're doing something in which you really don't care that the script
hit an error halfway through and it's just fine for the entire script to keep
blundering along despite having no idea what's going on at that point.

~~~
chubot
There is some truth to what you say, but bash has functions, which largely
mitigate the problem. The easiest way to make bash sane is to put everything
in functions. Then everything can be tested interactively. I don't know why
people put 300+ lines of code at the top level -- that makes changes very
difficult to test.

A good trick for this is to put:

    "$@"

at the bottom when you are testing. Then you can do:

    ./myscript.sh func-name arg1 arg2 ..

If you really want, you can change it to

    main "$@"

before deploying so it only runs the main function. Those lines, and
constants, should be the ONLY things at the top level of a bash script IMO
(even a 10 line one).
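
Putting the pieces above together, a complete skeleton of that layout might
look like this (the function names and the DEST constant are invented for the
sketch; pipefail is left as a comment since it's a bashism):

```shell
#!/bin/sh
# Only constants and the dispatcher live at the top level;
# all the logic is in functions, so each one is testable by itself.
set -eu   # (add -o pipefail under bash)

DEST="/tmp/demo-backup"   # a constant: fine at the top level

backup() {
  echo "backing up to $DEST ($1)"
}

restore() {
  echo "restoring from $DEST"
}

main() {
  backup "$@"
}

# While developing:  ./myscript.sh backup nightly
#                    ./myscript.sh restore
# Before deploying, swap this line for:  main "$@"
"$@"
```

Any function becomes runnable from the command line, which makes interactive
testing of a single change cheap.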

Don't stare at bash code without running it -- that's crazy. I have taught
Python and also admonish people to not "stare" at their code; just form a
hypothesis and run it. I suspect this is where Haskellers have problems,
because they think that not running your code is a good thing.

I love shell; it's one of my favorite languages now. I don't think there are
really that many warts once you know it. You're right that set -e and set -u
are generally good ideas, along with set -o pipefail.

~~~
taeric
"Don't stare at bash code without running it -- that's crazy. I have taught
Python and also admonish people to not "stare" at their code; just form a
hypothesis and run it. I suspect this is where Haskellers have problems,
because they think that not running your code is a good thing."

This was awesome. And, I agree, probably close to the mark.

It is somewhat odd, in this day and age when computers run so bloody fast,
that so much effort seems to be spent on not just running the application.

Of course, this is clearly easy to take too far. I feel that a lot has been
lost by folks that did not first draft out their intentions on paper. (Well,
at least I know I have wasted a fair bit of time in that regard.)

