

Ask YC: if Haskell is the hammer, what should be the nail? - dhbradshaw

Hi, guys.  I've been thinking of learning Haskell, and so have been casting about for a good project to start in it.

I found a top ten list of Haskell projects (http://haskell-news.blogspot.com/2008/01/top-10-most-popular-haskell-programs.html) and it looks like things are sparser than I might have hoped.

Anyway, if one wanted to break into that top-ten list by using Haskell's strengths, what kinds of projects would one take on?

Also, if anyone would like to join me, that would only add to the fun.
======
SwellJoe
I've just started watching the Simon Peyton Jones videos on the subject (from
OSCon last year), and I've been kinda thinking I might tackle some systems
monitoring problems (I usually use Perl for that sort of thing...and most
every other sort of thing, lately). I, too, have been struck a bit by how low-
level most of the libraries and such are, at this point (then again, compared
to CPAN almost every language has a dearth of high level libraries). And I'm
still far too much of an amateur at the language to do much about it. But it
is an interesting language, and probably useful to tinker with even if no
useful code ever comes of it.

~~~
rcoder
You may find systems monitoring to be a bit challenging in Haskell if you're
accustomed to Perl. By design, the normal POSIX abstractions that are so
tightly integrated into Perl are held at arm's length in Haskell, if available
at all; something as simple as forking a child process or switching effective
UIDs will require far more code in Haskell than in Perl.

That being said, if you want to be able to _reason_ about system configuration
states, and write concise deterministic rules that dictate the triggers for
and transitions between such states, Haskell may be worth some
additional attention.

~~~
eelco
I don't think that's necessarily true. Take a look at all the System.Posix
modules:
[http://haskell.org/ghc/docs/latest/html/libraries/unix/Syste...](http://haskell.org/ghc/docs/latest/html/libraries/unix/System-Posix.html)
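
For what it's worth, the basics are only an import away. A tiny sketch using the unix package (Linux/POSIX only; `forkProcess` is the direct analogue of Perl's `fork`, `getProcessStatus` of `waitpid`):

```haskell
import System.Posix.Process (forkProcess, getProcessStatus)
import System.Posix.User (getEffectiveUserID)

main :: IO ()
main = do
  uid   <- getEffectiveUserID          -- Perl's $>
  child <- forkProcess (return ())     -- fork a (do-nothing) child
  _     <- getProcessStatus True False child  -- blocking waitpid
  putStrLn ("euid " ++ show uid)
```

It's a few imports longer than the Perl equivalent, but hardly "far more code".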

~~~
rcoder
That's exactly my point, though -- there's a standard module for POSIX
integration, but it's not part of the language syntax itself. It's a subtle
thing if you don't do systems code often, but it becomes apparent after
working with Perl for a while how much of it really is a domain-specific
language for UNIX systems operations.

Haskell, as a general-purpose functional language, is always going to require
just a few more keystrokes/lines of code to accomplish the equivalent system
operations.

~~~
eelco
Since I don't have any experience with doing system code in Perl, a small
example would be nice.

~~~
SwellJoe
I just talked about this in a JavaScript thread a few days ago as a reason
JavaScript isn't quite ready to take the place of Ruby, Perl, and (kinda)
Python on the server (though it probably will in the next year or two).

The most obvious example in Perl is its file processing abilities (which are
scary at times, and astonishingly beautiful in their conciseness--thus Perl is
still the master of one-liners). Like Haskell, JavaScript also has some file-
related libraries...but they're also off in the ghetto of a clunky library, as
it has been (historically) in SmallTalk and Lisp. The difference between
making a function call and using a core language feature can be subtle...but
it's definitely a friction point (I hate using Python, despite its many
positive aspects, for many system related tasks because regexes are in the
ghetto, for example).

So, here's a simple example:

    
    
        while (<>) {
            for my $chunk (split) { # split $_ on whitespace
                # do something with each word
            }
        }
    

This is the entirety of the code for processing a file word by word (<> is a
magic filehandle that slurps in files named on the command line, and not really
recommended in code run by untrusted folks...so in real-world software there
would be a bit more boilerplate, but not much...still, it's a good example). Add
in that regexes are first-class citizens and a filehandle in Perl can be a
pipe or a network socket or stdin/stdout, and you have an exceedingly low-
friction environment for building system-related tools.

That's not to say that Haskell can't overcome the fact that systems-level
stuff isn't in the core language (most Perl additions in the past several
years have also been in the form of libraries rather than more keywords and
new syntax--at some point it makes sense to put things into libs rather than
making the language bigger). I don't know enough about Haskell to say.

But, I can say that I've been quite intimidated by the amount of code I need
to write to do things that are one-liners or a handful of lines in Perl. I'm
sure some of this is my lack of knowledge, and some of it is the lack of CPAN
(in ten years, if Haskell is extremely lucky and extremely successful, it'll
have a selection of libraries on par with CPAN of today). But I'm having fun
tinkering, regardless. Worst case, I'll learn something new.

I had a lot of fun with mjd's Higher Order Perl (which exhibits most of the
major functional techniques, like currying, recursion, infinite iterators,
memoization, etc. using Perl), so it'll be cool to "go native" for a while,
and what better way to learn than by doing tasks I'm already familiar with in
a new language...I may get some new perspective on how things can be done in
Perl (since it has lots of functional features) and maybe I'll even find some
new features that can be added to our products uniquely easily by introducing
some Haskell code.

An interesting source of perspective on what makes Perl magical for systems-
related tasks would be a perusal of the perlvar manpage. Perl has a bunch of
"magic" special variables, many of which are related to how Perl behaves when
given files to munch on. Folks find this intimidating, but it's a source of
great power, particularly for one-liners and pipes (Perl is very much of the
UNIX culture, and Perl fits into a long line of pipes as well as grep or awk
or sed). Of course, for many classes of problem you would _never_ use most of
those special variables. But, for systems related code, it's hard to beat.
(I've tried. I spent a few years in a Python shop, and was constantly amazed
by how verbose my code had to be in Python vs. doing the same task in
Perl...it was also generally a lot slower. I _like_ Python for lots of stuff,
but systems tools ain't exactly its strong suit. I may find the same is true of
Haskell.)

~~~
fusiongyro
Take a look at Don's reimplementation of some classic Unix tools as Haskell
oneliners: <http://www.cse.unsw.edu.au/~dons/data/Basics.html>.

The mere fact that interact lets you make a pure function into a Unix-style
string -> string utility should show that you are exaggerating a bit.

    
    
      interact (unwords . something . words)
    

handles basically your whole simple example.

Another simple example: capitalize every word of the input:

    
    
      import qualified Data.Char as Char
      
      main = interact (unwords . map (\(x:xs) -> Char.toUpper x : xs) . words)

~~~
SwellJoe
_show that you are exaggerating a bit_

No, it merely shows that I don't know Haskell--I didn't claim Haskell couldn't
be as concise as Perl for this kind of thing...just that I don't know how to
make it as concise. The link helps very much, thanks. (And, just so no one is
misled, the same example could be done as a Perl one-liner, as well, using the
magic variables I mentioned in the prior comment. I just figured I'd make it
readable, since there are some anti-Perl bigots around just waiting for the
chance to say, "I knew it! It's nothing but line noise!".)

Also worth noting...I love that I can write code like the example you've shown
in Perl (with some minor syntactic differences, but it does have first class
functions and expressions as arguments), and the fact that Haskell seems to be
_entirely_ made up of code like that made it seem very appealing on first
glance. That slurping a file can be written in Haskell in one line and some
lib imports makes me...umm, I'm embarrassed to say...a little giddy. And it
makes me wonder why I didn't start using Haskell sooner.

------
JulianMorrison
Right now, the area Haskell absolutely thrashes everything else is speedy and
lightweight concurrency (it even stomps Erlang in the Debian Shootout). So if
you can find a project that needs fluid responsiveness, multi-connection or
multi-CPU scaling, Haskell is the ideal tool.
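
A minimal sketch of what those lightweight threads look like (just `forkIO` and `MVar` from the standard libraries; the task count and the squaring "workload" are made up for illustration, not a benchmark):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Fork one lightweight thread per task and collect each result
-- through its own MVar. GHC's green threads are cheap enough that
-- spawning thousands of them like this is routine.
parSumSquares :: [Int] -> IO Int
parSumSquares xs = do
  boxes <- mapM (\n -> do
                   box <- newEmptyMVar
                   _ <- forkIO (putMVar box (n * n))
                   return box)
                xs
  sum <$> mapM takeMVar boxes

main :: IO ()
main = parSumSquares [1 .. 1000] >>= print
```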

~~~
dhbradshaw
Okay, here's what comes to mind based on those criteria: operating systems,
databases, large numerical simulations, intensive graphical manipulations,
mind-modeling. Do these sound right?

~~~
JulianMorrison
It's not really much use for OSes unless you're a research project, because
the compilers target an OS and not bare metal.

Also, it's slower than C on actual processing (about the speed of Java). So
numerical stuff is out, unless the benefits of being parallel outweigh the
benefits of being fast. (This may improve as the back-end optimization project
shows results.)

I think the perfect target would be any sort of massively multi-user network
server.

------
rcoder
Personally, I think there are some very interesting possibilities for using
Haskell in the highly-secure web application space. The most trivial example
would be simply using the type checker to protect against SQL injection and
cross-site scripting attacks by representing user input with a different type
than query parameters or HTML output.
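
As a sketch of that idea (the types, the `sanitize` helper, and its deliberately naive quoting rule are all hypothetical, and parameterized queries remain the right tool in practice):

```haskell
-- Raw request data and query parameters get distinct types, so the
-- compiler rejects any path from one to the other that doesn't go
-- through an explicit sanitizing step.
newtype UserInput = UserInput String
newtype SqlParam  = SqlParam String

-- The only way to turn user input into a query parameter; the
-- quote-doubling rule here is illustrative, not production-grade.
sanitize :: UserInput -> SqlParam
sanitize (UserInput s) = SqlParam (concatMap escape s)
  where
    escape '\'' = "''"
    escape c    = [c]

-- A query builder that accepts only SqlParam, never a bare String.
byName :: SqlParam -> String
byName (SqlParam p) = "SELECT * FROM users WHERE name = '" ++ p ++ "'"

main :: IO ()
main = putStrLn (byName (sanitize (UserInput "O'Neill")))
```

Passing a `UserInput` (or a plain `String`) straight to `byName` is a type error, so forgetting to sanitize fails at compile time rather than in production.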

I've also thought for a while that Haskell would be a great environment in
which to implement a static analysis tool to check the security of existing
application code. In particular, I think that the PHP community could really
use an information-flow and type-checking tool external to the core runtime
which could be used to run a quick "sanity check" over source code. The
simplicity of the language (relative to, say, Ruby or Perl) makes it a prime
target for parsing and analysis, and the large body of existing code in the
wild makes for an interesting set of test cases.

~~~
tptacek
There's this meme about type checking defeating SQL Injection that I don't
really understand.

There are basically three situations I see injection problems recurring in
modern web code:

* The stupid cases where people are interpolating input strings directly into query strings, so that a query for "O'Neill" will accidentally break your SQL.

* The not-so-stupid cases where column sorts and query builders pass limits, sort orders, and groupings directly from user inputs.

* The cases where stored procedures resort to dynamic SQL.

The first problem is solved not by better type checking, but by switching to
parameterized queries, where the query string is parsed prior to argument
binding.

The second problem is solved by not passing SQL literals in and out of input.

The third problem isn't even happening in the application's programming
language.

Which of these problems is handled well by application type checking?

As for XSS attacks, I'm again skeptical. If the problem was as easy as type
checking, you'd solve it trivially by output filtering everything from the
database, neutralizing HTML metacharacters. It's not that easy: there are lots
of times when you really do need to honor HTML in input.

~~~
eelco
The main problem is that both SQL and HTML are often simply represented by
strings. The programmer has to keep track herself to make sure everything is
escaped and unescaped at the right moment.

The point is that you can use Haskell's type system to get guarantees about
and keep track of escaping. It's really light-weight to add new types. Also,
dynamic typing would probably mess this up.

Still, all this mainly falls on the shoulders of the library designer.
But, looking at some of the available libraries (such as Text.XHtml) this
works really well.

In the case of Text.XHtml the easy/default case when using a string in HTML is
that it gets escaped. When you want to 'parse' a string into HTML you have to
be explicit. That makes it really hard to accidentally forget to escape HTML.
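
The shape of that design can be sketched with a plain newtype (this shows the idea, not Text.XHtml's actual API):

```haskell
-- Escape-by-default: the only Html values you can build from a
-- String without shouting about it are already escaped.
newtype Html = Html { rendered :: String }

-- The default path: any String is escaped on the way in.
toHtml :: String -> Html
toHtml = Html . concatMap escape
  where
    escape '<'  = "&lt;"
    escape '>'  = "&gt;"
    escape '&'  = "&amp;"
    escape '"'  = "&quot;"
    escape c    = [c]

-- Trusting a string as raw markup requires this explicit,
-- greppable call, so code review can spot every use.
unsafeRawHtml :: String -> Html
unsafeRawHtml = Html

main :: IO ()
main = putStrLn (rendered (toHtml "<script>alert(1)</script>"))
```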

The way I look at Haskell's type system is that it's a great tool for easily
enabling 'safe' programming. It won't work automatically, but it gives you the
opportunity to let the type checker take care of guaranteeing that everything
will work as expected ;)

~~~
tptacek
Yeah, I think this is a bit naive. Of the three SQLi cases I mentioned, only
the first is due to the app language's handling of string input and query
strings, and that case is just as easily handled by parameterized queries.

The second case is not due to the fact that the same type is used for input
and query strings; when a web app passes DESC or ASC or LIMIT 100 in via POST
arguments, that's a design problem type systems don't solve.

Likewise, type systems might fix the simplest XSS problems, but the nasty ones
occur in code that is explicitly trying to handle input that has been
laundered through the database and must include HTML characters.

~~~
rcoder
I still don't see why "type systems don't solve" the problem of keeping data
domains separate. If user input is of a different type than SQL query
components, you simply can't allow GET or POST arguments to hit the database
un-sanitized. Yes, you can perform the check by hand (i.e., fix the "design
problem"), but as we've all seen, programmers don't do that _consistently_,
which leaves us patching the same class of vulnerability time and time again.

Type systems also don't have to be the algebraic types of Haskell; SELinux DTE
and FlowCaml/Jif information flow analysis both fit loosely under the umbrella
of "type checking," and yet allow for very fine-grained and interesting
security properties of complex, real-world systems to be asserted and
enforced.

------
quasimojo
what isn't haskell for?

i've been throwing it at everything from fastcgi programs to scrapers to
scripting to systems stuff

maybe the one place it isn't well suited is the realm of throw-away
quickies. haskell code takes longer to write. that doesn't mean you are
writing a lot of code. i still see a role for perl/python in throwaway scripts

haskell has practically every meaningful cool concept in CS built in. it is
_the_ functional language (for now). it is very fast and as the number of
cores rises, haskell's performance will start to leave traditional tools in
the dust. with the new GC it will destroy erlang on its own turf.

haskell takes time to learn and time to code. if you are in a rush, it is not
for you. otherwise, haskell can handle almost any problem

in any case, you can come to haskell or it can come to you. the next ten years
will see functional concepts get woven into every language.

