To anyone who developed a real world application in Haskell, what are the downsides of it?
What is in your opinion more laborious than needed? Are libraries sufficiently available and documented? Is doing I/O fine? Is working with a DB easy? Please share with people interested in learning Haskell and putting it to work in a real-world application!
PS: I'm asking for negatives only because I'm very interested in and positive about Haskell, but I wonder if I'm missing some gotchas...
Most of the issues mentioned so far are trivial or just warts. The real issue is understanding the design patterns that work well enough to build high-performance applications. Also, there is quite a learning curve before you can really exploit the power and productivity of Haskell.
We built an algorithmic trading system, and almost everything else, in Haskell. Our code base is over 100K lines of human-written code.
The major library gap was a time library (you can find a version of what we have been thinking about releasing at https://github.com/time-cube/time-cube). We use our own build system that drives cabal and ghc; otherwise, managing many libraries is just painful.
We found composing applications as conduits to be a very effective design pattern that minimizes many laziness issues. Monad transformers are very powerful, but the machinery for them (like forking) is not fully cooked.
Maintaining the codebase is far easier with Haskell than with anything else I have worked with (C/C++, C#, Java, etc.). Refactoring in Haskell is great.
You can't fight the language. Fighting with Haskell will cause great pain. When you go with the flow, using lots of strong types and higher-order functions and applying things like a monoid instance to a problem, the language is a joy to work with.
Debugging is more painful than it has to be. There are still times when you need some real GDB wizardry.
Haskell's record system (analogue of C's structs) has two major deficiencies.
1. Modifying a record value (that is, making a copy of a record value but with different values in one or two of its fields) is unnecessarily complicated and uncomposable. This makes modifying a subfield painful.
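To make the pain concrete, here is a minimal sketch with hypothetical types; note how updating one inner field forces you to spell out every enclosing layer:

```haskell
-- Hypothetical types, just to illustrate the verbosity.
data Address = Address { street :: String, city :: String } deriving Show
data Person  = Person  { name :: String, address :: Address } deriving Show

-- Changing one nested field means re-wrapping every layer by hand,
-- and none of this composes if the nesting gets deeper:
setCity :: String -> Person -> Person
setCity c p = p { address = (address p) { city = c } }
```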
2. Field names are in the global namespace. Thus you cannot have e.g. a field named `map` (conflicts with the standard `map` function); nor a local variable named `owner` in the same scope that you use a field named `owner`; nor may two different record types both have a `name` field. C had this problem in the 70s (which is why e.g. struct tm has tm_sec and tm_min fields instead of sec and min fields), but they solved it a long time ago.
The solution to deficiency 1 is to use lenses. Use the lens package from Hackage, but don't read its documentation at first: it generalises the problem exceedingly well, but this makes it harder to understand at first glance. Instead seek out a basic tutorial. At the cost of a short line of boilerplate for each record type, this works well.
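For the curious, the representation the lens package uses can be sketched with nothing but base (the types and field names below are hypothetical; the real package generates the per-field boilerplate with `makeLenses`):

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity (..))

-- The van Laarhoven representation used by the lens package.
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

data Address = Address { _street :: String, _city :: String }
data Person  = Person  { _name :: String, _address :: Address }

-- The one line of boilerplate per field that `makeLenses` would generate:
city :: Lens' Address String
city f a = (\c -> a { _city = c }) <$> f (_city a)

address :: Lens' Person Address
address f p = (\a -> p { _address = a }) <$> f (_address p)

set :: Lens' s a -> a -> s -> s
set l x = runIdentity . l (const (Identity x))

-- The payoff: nested updates compose with plain (.):
move :: String -> Person -> Person
move = set (address . city)
```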
There is no satisfactory solution to deficiency 2. Some people define each record type in its own module, and import each module qualified. I don't think this scales well. I prefer to put a record-type prefix on each of my field names (i.e. the same thing C programmers were forced to do in the 70s).
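The prefix convention looks like this in practice (hypothetical types); each record effectively gets its own "name" field without a namespace clash:

```haskell
-- Prefixing each field with its type name, C-struct style:
data User    = User    { userName    :: String, userEmail    :: String }
data Project = Project { projectName :: String, projectOwner :: String }
```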
Both are true, but I consider the second problem to be more like a trade-off. In PHP I miss field extraction from an array so often; in Haskell it is just `map fieldName` and voila. Works just as well with namespaces -- I understand it does not scale when creating data types, but you normally don't import all of them at the same time into another module anyway.
There's currently a Google summer of code project implementing overloaded record fields (http://www.google-melange.com/gsoc/project/google/gsoc2013/a...) - which should kinda solve problem 2. You still wouldn't be able to define a field `map` for the same reasons you stated, but you would be able to have multiple record types sharing the same field names.
In my opinion the module system (http://www.haskell.org/onlinereport/modules.html) is a bit weak. For example: "It is not possible, however, to hide instance declarations in the Prelude. For example, one cannot define a new instance for Show Char."
Instances can't be explicitly imported either.
Another thing I don't like is if you have two different functions with the same signature but different implementations meant to give swappable functionality, there's no way of specifying that explicitly. As a user of a library, you just have to realize the functions can be swapped out with modules. For example:
When given the ability to qualify the instances you import you lose confluence of instance resolution. This is BAD.
We rely on confluence to enable us to move class constraints to where they are used rather than have to carry them around in every object.
Scala tries to get away with having the ability to explicitly pass dictionaries around and the result is frankly a muddled mess. Monad transformers wind up nigh unusable, sets can't use efficient hedge merges, etc.
Explicitly enumerating instance imports is one of those things that seems like a good idea until you actually explore its consequences.
I can't defend the ByteString API on the other hand. ;)
That said, I do not find it particularly "Haskelly".
> Another thing I don't like is if you have two different functions with the same signature but different implementations meant to give swappable functionality, there's no way of specifying that explicitly. As a user of a library, you just have to realize the functions can be swapped out with modules.
Haskell is possibly better in this regard than many other languages, as you can make existing functions become polymorphic without modifying them. (say for example, in most OOP languages, you would need to add an additional interface to each class where you want to use a method interchangeably, which you can't always do.)
In Haskell, you can just create a new typeclass for the particular functions you want to be swappable, and add an instance definition for each.
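A minimal sketch of that pattern (all names hypothetical): a fresh typeclass captures the common signature, and each value-level tag selects an implementation, so the choice is explicit in the types rather than hidden in module imports:

```haskell
-- A typeclass capturing one swappable "render" operation.
class Render a where
  render :: a -> Int -> String

-- Two tags, one per implementation; neither needed to know about the
-- typeclass when it was first written.
data Plain = Plain
data Fancy = Fancy

instance Render Plain where
  render _ n = show n

instance Render Fancy where
  render _ n = "<<" ++ show n ++ ">>"

-- Callers stay polymorphic; the implementation choice is visible
-- at the call site rather than implied by which module was imported.
report :: Render r => r -> [Int] -> String
report r = unwords . map (render r)
```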
For me, the biggest downside is lack of solid embedded-device support -- Arduino (Atmel AVR), Android (ARM), iOS (ARM).
After using Haskell pretty much full-time for 10 years, writing C and Java code makes me sad. The support for the above mentioned platforms is in-progress, but is not yet mature.
There are some neat things like Atom, which uses a Haskell DSL to target Arduino.
My other issue is that the garbage collector in GHC is not really sufficient for real-time audio applications because it can pause for too long. GHC HQ has tried to tackle this in the past -- but there is a reason why it is a research topic :)
If your application requires interfacing to a C++ world -- you are not going to have fun. Though there might be a GSoC project for that this summer?
Also, GUI stuff is somewhat lackluster. There are bindings to gtk, etc. And they can get the job done. But they don't really capture the essence of what makes Haskell awesome. We are still searching for the GUI abstraction that really clicks.
The runtime is somewhat immature. It locks up oddly sometimes under heavy load. Dealing with latency and queuing issues around gc pauses is much less understood/documented than in the JVM world. The set of best practices in general for doing intense things with the ghc runtime is just still young and sparse.
STM can exhibit something that looks a hell of a lot like livelock.
Error handling is brutal. Catching all classes of exceptions (at the place you want to catch them!) for recovery is surprisingly tricky. This isn't necessary in theory with things like MaybeT, but in practice, lots of odd libraries use things like partial functions and the error function.
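One concrete wrinkle: exceptions thrown from pure code (`error`, `head []`, and other partial functions) escape a naive `try` unless the value is forced first. A small sketch using base's Control.Exception:

```haskell
import Control.Exception (SomeException, evaluate, try)

-- `evaluate` forces the thunk inside IO so `try` actually sees the
-- exception; `try (return (head xs))` would let it leak out later,
-- surfacing wherever the thunk happens to be demanded.
safeHead :: [a] -> IO (Either SomeException a)
safeHead xs = try (evaluate (head xs))
```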
Not having tracebacks in production code is painful
The library community is thriving but it has a lot of volatility. Things break each other quite frequently. Semantic versioning either isn't enough to save it or hasn't been adhered to strictly enough.
Thunk leaks and other consequences of unexpected laziness aren't as common as people worry about, but they're kind of a pain to track down when they occur
Strict vs. Lazy bytestrings, String, Text, utf8-string, etc. You may find yourself doing a lot of string/bytestring type conversion
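Assuming the text and bytestring packages, the usual conversion round-trips look like this (the helper names are mine):

```haskell
import qualified Data.ByteString    as BS
import qualified Data.Text          as T
import qualified Data.Text.Encoding as TE

-- String -> Text
toText :: String -> T.Text
toText = T.pack

-- Text -> strict ByteString, UTF-8 encoded
toBytes :: T.Text -> BS.ByteString
toBytes = TE.encodeUtf8

-- and back again (decodeUtf8 throws on invalid UTF-8)
backToString :: BS.ByteString -> String
backToString = T.unpack . TE.decodeUtf8
```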
There are still wars raging about the right way to do efficient, safe I/O streams. Conduit vs. Enumerator vs. Pipes etc. They're all turning into pretty compelling projects, but the fact that there are N instead of 1 is sometimes a drag when you're dealing with libraries and dependencies.
There are not a lot of good open source "application server" type frameworks that really handle thread pooling, resource exhaustion, locking, logging, etc, in robust nice ways. We have one internally, and I'm sure a bunch of other haskell-using shops do too, but the ones on hackage are not nearly sophisticated enough (IMO) and I suspect not very battle tested against the kinds of ugly queuing problems you run into in highly loaded environments.
If I think of more, I'll add em... these are off the top of my head.
Let's begin by stating that Haskell is great, but there is a lot of stuff I don't like about it:
1. Way too many user-defined operators. Haskell lets you define almost anything as an infix operator, which library authors love to (ab)use. So you get operators like ".&&&." (without the quotes) because they are functions reminiscent of the boolean AND operation.
2. But weirdly enough, many operators aren't generic. String concatenation is performed with "++" but addition with "+".
3. Incomplete and inconsistent Prelude. It has words and unwords for splitting and joining a string on whitespace, but you don't get to specify the delimiter the way the join and split functions in other languages let you do.
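For illustration: what the Prelude gives you versus what you have to write (or pull from the split package) yourself. `splitOn'` below is a hand-rolled sketch, not a standard function:

```haskell
-- The Prelude only splits on whitespace:
ws :: [String]
ws = words "a b c"              -- ["a","b","c"]

-- An arbitrary-delimiter split has to come from the split package
-- (Data.List.Split.splitOn) or a helper like this one:
splitOn' :: Char -> String -> [String]
splitOn' d = foldr step [[]]
  where
    step c acc@(cur : rest)
      | c == d    = [] : acc          -- start a new chunk at each delimiter
      | otherwise = (c : cur) : rest  -- otherwise grow the current chunk
    step _ []     = [[]]              -- unreachable; keeps the match total
```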
5. There are four different "stringish" types in Haskell: List, LazyList, ByteString, LazyByteString. A function like splitStringWith works on one of the types, but not the three others for which you need other functions. Some libraries expect Lists, other ByteStrings or LazyByteStrings so you have to keep converting your string to the different types.
6. Most Haskellers seem content with just having type declarations as the API documentation. That's not a fault of Haskell per se, but imho a weakness in the Haskell community. For example, here is the documentation for the Data.Foldable module: http://hackage.haskell.org/p
7. This is very subjective and anecdotal but I've found the Haskell people to be less helpful to newbies than other programming groups.
> 5. There are four different "stringish" types in Haskell: List, LazyList, ByteString, LazyByteString. A function like splitStringWith works on one of the types, but not the three others for which you need other functions. Some libraries expect Lists, other ByteStrings or LazyByteStrings so you have to keep converting your string to the different types.
I guess you mean String, ByteString, Lazy ByteString, Text, Lazy Text?
> 6. Most Haskellers seem content with just having type declarations as the API documentation. That's not a fault of Haskell per se, but imho a weakness in the Haskell community. For example, here is the documentation for the Data.Foldable module: http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Foldable.html
But there's more documentation than just type signatures on that page! Anyway, being able to rely on type signatures as documentation is testament to the expressivity of the type system, encourages small reusable combinators, and is a great strength, in my opinion.
> 7. This is very subjective and anecdotal but I've found the Haskell people to be less helpful to newbies than other programming groups.
Strange, I've always heard they're one of the communities most helpful to newbies. I don't have any evidence one way or the other, though.
Regarding 4, the split package is the widely accepted standard for splitting lists in various ways and you will likely already have it installed since quite a few packages (142 to be exact) depend on it.
Regarding 5, I have never heard of LazyLists, nor can I find them on Hackage, nor does that name make any sense, since normal (:) lists are already as lazy as it gets. (Lazy) ByteStrings are, as the name indicates, more for raw binary data, but can also be used for text. The modern standard for strings is Text.
The OverloadedStrings extension makes it considerably easier to work with literals of the different string-like types.
Regarding 6, I find documentation in Hackage packages quite good. Your example, Foldable, not only has a decent explanation of each function in the typeclass definition, but there are also good tutorials (e.g. Typeclassopedia) since it is definitely a non-trivial typeclass.
> 2. But weirdly enough, many operators aren't generic. String concatenation is performed with "++" but addition with "+".
Some see concatenation as addition, others as composition. Funnily enough, in Haskell addition is reserved for numbers and composition for functions. I'm guessing that's why "(++)" was chosen for lists, with strings as a particular case.
> 2. But weirdly enough, many operators aren't generic. String concatenation is performed with "++" but addition with "+".
Concatenation and addition don't share the same algebraic properties (concatenation is not commutative). Haskell has a mathy bias (for example see the monoid typeclass, which I guess you can use if you want to use the same operator for concatenation and for addition) so this choice isn't very surprising.
Besides, it seems like Haskell chooses to have more specialized functions/operators in Prelude while their generalizations are reserved for other modules. For example "map" for lists, "composition" (.) for functions, but "fmap" for functors in general.
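A quick illustration of that specialise-in-Prelude, generalise-elsewhere split: map is fmap restricted to lists, while fmap works for any Functor:

```haskell
-- fmap generalises map beyond lists:
xs :: [Int]
xs = fmap (+ 1) [1, 2, 3]       -- [2,3,4], same as map (+1) [1,2,3]

my :: Maybe Int
my = fmap (+ 1) (Just 1)        -- Just 2

me :: Either String Int
me = fmap (+ 1) (Right 1)       -- Right 2; a Left value passes through untouched
```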
1. What library are you complaining about? Haskell has a pretty small, well established set of operators. If a library is defining more operators than you like, then don't use it. That is no different than a library defining functions with names you don't like, it happens in every language.
2. Why on earth would string concatenation be the same operator as addition? That isn't the same operation. I want to know if I used + that I have to have gotten a number, or else it won't compile. Getting a string would be very unhelpful. If you just want any monoid, then there is a generic operator for that, it is <> or mappend.
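A short illustration of the distinction: (<>) from Data.Monoid (re-exported by the Prelude in recent GHCs) is the generic append, while (+) stays numeric:

```haskell
import Data.Monoid (Sum (..))

-- (<>) is the generic monoid append; (++) is its list specialisation
-- and (+) stays reserved for numbers:
concatenated :: String
concatenated = "foo" <> "bar"       -- "foobar", same as "foo" ++ "bar"

summed :: Int
summed = getSum (Sum 2 <> Sum 3)    -- 5: the additive monoid on numbers
```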
5. The different types are for different purposes. String is a list of chars. You use normal list functions on it. Bytestring is not a string, it is an efficient container for storing bytes. It is used for things like networking. Text is an encoding aware, efficient representation of strings. Use this for text data. There is an IsString typeclass for generic functions that operate on any type that can behave like a string.
7. I've always heard the exact opposite from everyone, and my experience has also been the opposite of yours. The only other computer related community I have seen that is as helpful as haskell is postgresql.
1. HXT, Lens, Arrows, etc. I think (as in, this is my opinion; someone else may think the number of operators they contain is reasonable) they introduce way too many infix operators. They are also pretty important to the Haskell ecosystem, so you can't just choose alternatives with less "line noise."
2. I think the question should be: why on earth should they not use the same operator symbol? In most other mainstream languages they do. I know the answer is "because [Char] isn't an instance of the Num typeclass," but that misses the point.
5. Both Text.Regex and the split library someone else mentioned operate on strings ([Char]). Data.Aeson uses ByteStrings. So not everything in Haskell-land uses Text when it should, which leads to lots of annoying string packing and unpacking. Contrast this with Python (3), where you have (unicode) strings and (byte) buffers. Only. If you want more efficiency there are 3rd-party lazy implementations of them, but they aren't forced upon you like in Haskell.
1. I don't think they define too many specialized operators personally; Haskell's heritage is more "mathematical" than many other languages so it makes sense that many variables are single letters and operators are defined with symbols instead of reallyLongAndDescriptiveName. It's part of learning the idioms of the language.
2. They should never use the same symbol. As the parent said, if you want the generalized notion of a monoid then use that (<> or mappend), otherwise each library should not be conflating its instance of a TypeClass with other libraries' instances of a TypeClass - it is confusing and you end up with stuff like:
import Some.Lib as L
import Some.Other.Lib as OL
(L.+) 2 3
(OL.+) "2" "3"
(Sum 2) <> (Sum 3)
"2" <> "3"
Which is inevitable for some libraries (Fay for example re-defines a lot of Prelude functions) and sometimes accidental; but it is confusing and bad design unless you define a way to generalize it (Monoids!).
5. Data.Aeson uses ByteStrings for the JSON serialization, when encoding/decoding you provide your own instance defining the Types (can be Text/String/ByteString) so I don't understand your gripe here; Text.Regex shouldn't be used for Data.Text, that's what this package is for: http://hackage.haskell.org/packages/archive/text-icu/0.6.3.5...
When you want to encode and decode JSON data of unknown format, you need to use the encode and decode functions directly, which operate on ByteStrings. You really don't want to create a new record for each possible kind of JSON data container you have.
text-icu is an experimental 3rd party binding and doesn't work like pcre anyway. It's not something you would use over Haskell's standard library regexp support, so you still have to deal with X number of different string types.
True re: encoding or decoding; but it does make sense to me to keep it in ByteStrings as that is an efficient representation - from ByteStrings you can do what you want with it (convert it to Text).
I personally haven't used the Text.ICU package but I know many, big, packages that retain the "experimental" flag. So that shouldn't necessarily deter usage; just because it's 3rd party doesn't mean it's bad either - many fundamental components of the Haskell ecosystem are 3rd party! There are also many libraries that implement bindings - I don't necessarily see why that is a bad thing either; obviously a pure Haskell approach would be nice but in some cases it does defeat the purpose to reinvent something.
What I do agree with you on, though, is the lack of clarity in what should be used. I think the Haskell Platform is a good start in that direction but there are few docs written on "this is how you deal with strings and unicode in Haskell and all the libraries we recommend for it".
The good news is, this language has the capability to serve both the research needs of academics and the practical needs of implementers; so uncovering this stuff is very good.
2. Enough languages don't conflate ring addition and string concatenation (in TIOBE order: C, Obj-C, PHP, Visual Basic (somewhat), Lisp, Ada, MATLAB, Lua). Furthermore, there are at least two "sensible" monoids on numbers (sum and product), but just one for strings (concatenation). `+` implies something more specific than a generic monoidal concatenation operator. Though I think `++` was a poor choice, one of many in the standard Prelude.
1. I would not say that HXT is important to the Haskell ecosystem. Lens, yes. Arrows, maybe. The lens library has a lot of infix operators, but they were carefully designed in a structured way. Once you understand the naming patterns you're good to go, and you will be able to figure out the operators you need by construction according to the convention.
1. I don't know that I would consider any of the 3 you mentioned as being so important to the haskell eco-system. Does anyone use Arrows? Lens is new, and there is widespread "whoa that is way too many operators dude" reaction to it. It is not widely used yet at all.
2. Because they are not the same operation. The fact that some languages make them the same operator is not an indication that doing the wrong thing is good. And I do not believe most mainstream languages make that mistake. Again, there is a name for the general concept of "things with an identity and some form of append operation". The name is monoid, it defines the exact operator you want: <>
5. I don't see the problem here. Of course aeson encodes to and from bytestrings, json specifies an encoding already. The entire purpose is to take data, and encode it as a bunch of bytes. You shouldn't be trying to split encoded json as if it were a string, so what is the issue? Your contrasting it with python weakens your point, as it makes it clear that haskell only has one more option than python does.
Depends on what you are doing. The library eco-system used to be a weak link in Haskell, but I see it improving. To clarify, there were (and still are) a lot of broken and/or poorly documented and/or unmaintained libraries on Hackage, or several libraries for doing the same thing with no indication of which is the best choice. I suspect that is, to some degree, the case in any open-source eco-system, though. Recently, thanks to the efforts of giants like Edward Kmett, there has been an influx of great, well-documented libraries on Hackage. And of course, you are welcome to contribute new packages and improvements to existing ones.
Working with DBs is easy, especially if you use HaskellDB. There are bindings for non-relational DBs, as well as a DB written in Haskell (acid-state).
As for the language itself, you might find it tricky to develop computation intensive applications with large run-time data-sets due to garbage collection (but that is true for any garbage collected language). Other than that, it's one of the best performing languages in the Debian PL shootout. And the fact that concurrency is (comparatively) easy means you can make use of those extra cores.
Monad transformers and monads are fine, you just need to learn how to use them.
To sum up: it depends on what you do and what you consider a "real world application". Might be a good idea to elaborate. For example, are compilers, games, web apps, automated trading systems, android apps considered "real world"? Because any of these has been done in Haskell.
I mentioned real world app to mean "not a toy project".
That is, I meant a reasonably large, structured, maintainable code base. I am thinking of Haskell as the language to use for a new project, and am interested to know more about the potential problems and downsides I should be aware of. Also, are there situations where one should absolutely avoid Haskell?
Yes, I think you should avoid Haskell (and any language with managed memory) on embedded systems or in very performance-critical applications. Beyond that, it's going to be a choice of whether there are enough well-supported libraries that help your cause versus some other language. It would help to know what domain your new project is going to be in.
You've mentioned web apps, so, to be specific, I think the Haskell web app frameworks (Happstack, Yesod and Snap) are mature. There aren't nearly as many utility libraries, as there are, say, for Rails. But that, in my opinion, is compensated by greater correctness guarantees and performance.
I'd encourage you to join the haskell-cafe mailing list: it's a great place to get help if you get stuck.
I think that the larger your project gets, the more Haskell will prove to be a win compared to other languages. The value of Haskell's type system in aiding the management and maintenance of large codebases is difficult to overstate.
We wrote our RF radio mesh coordinator software in Haskell, and it's been a great success. Working with binary data formats (various building control protocols) in Haskell is the kind of thing that spoils you forever.
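As a taste of why binary formats are pleasant, here is a hand-rolled big-endian Word16 reader using only the bytestring package; real protocol code would more likely use the binary or cereal packages, but the pattern-matching style is the same:

```haskell
import           Data.Bits       (shiftL, (.|.))
import qualified Data.ByteString as BS
import           Data.Word       (Word16)

-- Read a big-endian Word16 off the front of a ByteString, returning
-- the value and the remaining input, or Nothing on short input.
word16be :: BS.ByteString -> Maybe (Word16, BS.ByteString)
word16be bs = case BS.unpack (BS.take 2 bs) of
  [hi, lo] -> Just (fromIntegral hi `shiftL` 8 .|. fromIntegral lo, BS.drop 2 bs)
  _        -> Nothing
```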
The one issue I've run into is that ghc can't cross compile. If you want to run your code on ARM, you have to compile an ARM version of ghc (QEMU comes in handy here).
Honestly I think the biggest downside is that there aren't enough commercial endeavors using Haskell, and thus there are horrifyingly few people working full time on many core pieces of the ecosystem. Yes, my biggest critique is that all the great stuff in the Haskell ecosystem is the result of a small collection of smart folks helping out in their spare time.
It makes me wonder what magic would happen if those folks could work on the ecosystem full time!
I have to say that one of my favorite things currently about Haskell is how nice and easy the C FFI is to use. So darn simple! (I'm also GSoC-mentoring some work to provide a nice C++ FFI tool too.)
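For readers who haven't seen it: binding a C function really is a single declaration. A tiny sketch against libc's sqrt, with no binding generator involved:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

-- One declaration is the entire binding; `unsafe` is fine here
-- because sqrt never calls back into Haskell.
foreign import ccall unsafe "math.h sqrt"
  c_sqrt :: Double -> Double
```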
There are so many great tools in the Haskell ecosystem, for every problem domain. It's not perfect, and there's always room for more improvement, but those improvements are happening, and the more people invest in supporting the community, the more those improvements happen!
For example, one thing I'll be exploring in the near future is how to do good NUMA locality-aware scheduling of parallel computation. It looks like I might be able to safely hack support in via a user-land scheduler (though I'll find out once I get there).
My principal work right now is building numerical computing / data analysis tools, and some of the things I'm doing now would be simply intractable in another language.
I found it a lot slower than more imperative-style languages. I've been writing in C-like languages for something like 30 years, so I suppose that's not unexpected. I found that the type system sometimes got in my way. And combining monads (even with monad transformers) was also a faff. In the end I suppose it depends what you're trying to make. I think Haskell is great for DSL applications and less so for things like web dev. Though that being said, Yesod is a pretty nice framework.
The thing is, if you feel like the type system is "getting in your way" you have to step back and ask, "do I really understand what I'm achieving here?" Of course, it's very easy to feel like you know what you want and you can't figure out the types for it, but you have to realize, if the type check fails then what you asked for just doesn't make sense, like adding an Int to an (IO Int).
The whole point of the type system is that it substitutes compile-time errors (that are admittedly esoteric) for obscure run-time bugs that might not even ever show up except on that one person's machine and use-case.
But yeah, it's never fun to see a screenful of type errors.
Monad transformers are just monads wrapping other monads. I actually like it and think it's a really clean way to organize your state, IO, etc. For example, you could have a Reader wrapping a Writer wrapping a State wrapping IO. The reader is your configs, the writer is for error logging, the state is for some application map, and the IO is for doing networking or something. That's an extreme case but it's nice to lay it out in that way.
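A sketch of a stack in that spirit, using the transformers package directly (the config string and counter are hypothetical stand-ins for real application pieces):

```haskell
import Control.Monad.Trans.Class  (lift)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import Control.Monad.Trans.State  (StateT, get, put, runStateT)

-- A Reader for configuration wrapping State wrapping IO.
type App = ReaderT String (StateT Int IO)

step :: App ()
step = do
  cfg <- ask                    -- read-only configuration
  n   <- lift get               -- mutable-style state
  lift (put (n + 1))
  lift (lift (putStrLn (cfg ++ ": " ++ show n)))  -- IO at the bottom

-- Peel the layers off in the reverse order they were stacked:
runApp :: App a -> String -> Int -> IO (a, Int)
runApp m cfg s0 = runStateT (runReaderT m cfg) s0
```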
That's not an extreme case at all, I'd even argue RWS(T) (or its 'manual' version) is a very common stack.
As an example, I use RWST in Kontiki, my implementation of the Raft consensus protocol, where the R part contains configuration (e.g. the set of nodes in the cluster), the W part is filled with 'commands' to be executed after an FSM transition as a reaction on some event (incoming message, timeout,...), where 'commands' are things like 'send this message to node N', 'broadcast this message to all nodes', 'append this entry to the replicated log' or 'log this message (for debugging)', and finally the S part provides access to the current state.
Every FSM transition returns the new state, and a list of commands to execute (& the updated state of the S part, but that's discarded).
All you need to provide is an 'interpreter' for the commands.
This approach makes testing very easy: I can start at any desired state and event (that's just pure data), run the FSM on that (which is also a pure operation, although that's slightly complicated due to the MonadLog implementation which should underlie RWST, but that's besides this discussion), and check whether the output state & commands are the ones I expect, without mocking any networking or intercepting other IO.
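A toy version of that shape, with hypothetical names rather than anything from Kontiki, shows why the testing story is so pleasant: the whole transition is pure, and runRWS hands back the result, the new state, and the accumulated commands in one go:

```haskell
import Control.Monad.Trans.RWS (RWS, ask, get, put, runRWS, tell)

-- R = cluster config, W = commands to interpret later, S = FSM state.
data Cmd = Send String | Log String deriving (Eq, Show)

type FSM = RWS [String] [Cmd] Int   -- node names, commands, a term counter

-- A made-up transition: bump the term and broadcast to every node.
onTimeout :: FSM ()
onTimeout = do
  nodes <- ask
  term  <- get
  put (term + 1)
  tell [Send n | n <- nodes]
  tell [Log "timeout"]

-- Pure testing: pick any starting state, run, inspect everything.
-- runRWS onTimeout ["a","b"] 0 == ((), 1, [Send "a", Send "b", Log "timeout"])
```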
Records are annoying, but can be worked around. The compiler is incredibly slow and uses tons of RAM (I need 2GB to compile my simple little web app for example). The slow compile times can start to really kill productivity on large projects. The web frameworks are all pretty focused on trying to reproduce industry worst practices rather than doing things right, so if you are doing web development and you don't want your app to be a mess, you are kinda on your own. That's pretty much it.
edit: to clarify on the web thing, when I say "on your own" I mean you won't be able to get much from existing tutorials and examples since you will want to do everything differently. Not that you will have to write your own framework.
The only Haskell web application framework I feel is well-designed is Snapframework. My experiences with Yesod echo your sentiment and Happstack doesn't quite cut it. Scotty is too low-level (from my cursory investigation).
The compiler is slow, yes, but it is also doing a lot of work for you; the benefits of using Haskell outweigh the time to compile, for me personally.
It seems like snap is headed in the yesod direction. By that I mean the "faithfully replicate the mistakes of php/mysql worst practices as interpreted by rails", not the "conflate type safety with DSLs". Happstack doesn't appear to be overly damaged by phpisms, but happstack-foundation isn't useful for a lot of people because it uses acid-state for storage. Also, HSP is terrible and using it should be a criminal offence. So you basically have to build your own collection of stuff as you go on top of snap or happstack-server and the various glue packages for happstack.
I don't buy the "but the compiler is working hard" excuses for GHC's horrible performance. OCaml is a similar language, and it compiles similar projects in 1/10th the time and 1/20th the RAM. It is a huge issue on large projects: waiting 30 seconds for a single-line change to compile is brutal, and that 30 seconds ends up being 5 minutes because you get distracted while waiting.
> faithfully replicate the mistakes of php/mysql worst practices as interpreted by rails
I assume you're talking about things like snap-extras and restful-snap. Those libraries are not a part of the Snap Framework. They are simply libraries that I and my coworkers are using to help us ship apps more quickly. We put them on hackage because we thought others might be able to benefit from them. They are still very young and it is uncertain what direction they will finally take. The core framework still consists of just snap-core, snap-server, heist, and snap.
If you as a Snap user still think this is a sign that the framework is going in the wrong direction, we'd love to get your input and involvement.
There's also snap-app http://hackage.haskell.org/package/snap-app which I'm using on hpaste, ircbrowse and haskellnews =p Which sort of backs up his point about making your own utility libraries on top of snap. Although this was implemented for hpaste years ago before snap got all that weird snaplet lens stuff, and I copy-pasted it for ircbrowse and ended up librifying it so I could cabal install it on my different projects.
OCaml is not similar at all. OCaml, for better or worse, is much closer to its compiled form than Haskell is. Haskell requires much more analysis and transformation than OCaml during compilation in order to achieve the performance of GHC-compiled code.
You are going to have to be detailed and specific if you want that statement to be taken seriously. OCaml is very obviously quite similar to Haskell. What specifically is so much more work about compiling Haskell? And why does only GHC suffer from this extra work, while other compilers were able to compile Haskell code in reasonable time and space?
The main problem is laziness and to some extent purity. And partly GHC is a bit of a pig.
Pretty much everything starts out life as a thunk, and the compiler does its best to translate these into strict values. That affects when things are evaluated and therefore the performance: whether things can be stored on the stack fully evaluated, kept in registers, or boxed on the heap as things that need forcing, etc.
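To make the thunk problem concrete, here's a minimal sketch (the function names are mine, not from any library): the lazy accumulator builds a chain of unevaluated `(+)` thunks, while the bang pattern forces it at each step, which is the kind of strict value GHC's strictness analysis works to derive for you.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Without optimization, the accumulator in sumLazy is a growing chain
-- of unevaluated (+) thunks; it only collapses when the result is
-- finally demanded.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc []     = acc
    go acc (x:xs) = go (acc + x) xs   -- acc stays a thunk

-- The bang pattern forces the accumulator at every step, so it can
-- live as a plain machine integer -- the result GHC's strictness
-- analysis has to prove safe before it can do the same automatically.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []     = acc
    go !acc (x:xs) = go (acc + x) xs
```

Both compute the same result; the difference is purely in when evaluation happens, and figuring that out is part of what the compiler is paying for.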
The good news is that purity makes it significantly easier to do inlining and reductions/transformations bordering on whole-program compilation. Code that looks like a load of function calls usually gets compiled down (as you can see with the -fext-core flag) to a couple of case expressions.
You have to do this to get reasonable, baseline, performance in Haskell. In OCaml—or ML, or any strict, mutating language—you get that for free, as a starting point. They mirror the architecture underneath so they're already fast from the get-go.
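As a sketch of what purity buys the optimizer (the rule below mimics the map-fusion rewrite that base itself ships; the function names are mine): because neither function has side effects, GHC is free to inline and merge the two traversals without changing the result.

```haskell
-- Safe only because double and inc are pure: merging the two
-- traversals cannot change any observable effect.
{-# RULES "map/map-fuse" forall f g xs. map f (map g xs) = map (f . g) xs #-}

double, inc :: Int -> Int
double x = x * 2
inc x = x + 1

-- With the rule (or GHC's own fusion) firing, this compiles to a
-- single traversal rather than building an intermediate list.
pipeline :: [Int] -> [Int]
pipeline = map double . map inc
```

In a strict, impure language the compiler would have to prove the two loops can't interfere before merging them; here purity makes the rewrite valid by construction.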
It's not to do with type checking or anything like that. Type checking with GHC is very fast. If you use -fno-code, it can compile, e.g., a 9Kloc codebase of 65 modules in 1.361s on my machine. However, if you enable code generation, it takes 9.532s, and that's with -O0: with -O2 it takes 32.910s. Fay, my unoptimized, non-optimizing Haskell→JS compiler, will codegen that codebase in 1.940s. It would do it in less if I optimized the compiler's parser and codegen. But it's fast because I don't do anything. I don't do any flow analysis or strictness analysis or simplification or generation of core or compiling to C--. Fay's generated JS is not fast as a result. Compare that to if I were writing a Lisp compiler to JS, which would be easy because the source and the target are basically the same execution model.
I understand optimizing Haskell is a harder problem, but I don't mean producing optimized binaries, I only need to do that once in a while. I have to compile it thousands of times while developing, and turn off optimizations there since it is just a waste of time. But non-optimizing GHC is still incredibly slow. Wasn't Yhc much faster than GHC back when it existed? Surely it must be possible to make GHC fast enough that development isn't seriously hindered?
I dunno about Yhc. Jhc, at least, is slower than GHC. It depends on the settings and the project in my experience.
If you're on a project like a Yesod app, then you have to conservatively rebuild a lot more than you would normally, because Template Haskell doesn't lend itself well to incremental compilation, and the compiling itself is slower too. Depending on the size of the project you're facing 5, 15, even 30 seconds of waiting. That is actually monstrous and I hate it as much as you do. I cannot stand waiting for my app to rebuild; I turn to reddit, or YouTube. The more time waiting for feedback, the less interactive and less enjoyable a programming experience is for me. So I think we're on the same page on that.
On the other hand, if you're just using normal Haskell code with reasonably modularized file structure, GHC will be able to rebuild incrementally and link just that module quickly. E.g. if it's a web app, you can have MyProject.View.Person and just change something in a blaze-html template and hit F12 in your editor (as I do) and it'll rebuild and restart in a second. I mean, to rebuild hpaste entirely from scratch takes 5 seconds:
$ time cabal build --ghc-options='-fforce-recomp -O0'
Linking dist/build/hpaste/hpaste ...
Which is great. hpaste is only 4kloc, but your average codebase is between 3 and 15 kloc. In a normal development cycle you're changing just one or two modules at once, so with incremental compilation the refresh cycle is very fast. If you just want to type-check, there is the -fno-code flag which brings it down to 1.2s to typecheck the whole project.
Another approach I've taken with an IRC server (hulk) is running it inside GHCi. That works surprisingly well. Especially if you have your run function return the state as a mutable reference, then you can take a look at it while it's running and update it. Here's hpaste running in GHCi:
λ> :set args "hpaste.conf"
λ> tid <- forkIO main
Listening on http://0.0.0.0:1234/
USER hpaste * * *
λ> :t tid
tid :: ThreadId
λ> killThread tid
λ> tid <- forkIO main
Listening on http://0.0.0.0:1234/
USER hpaste * * *
It's currently running on http://chrisdone.com:1234/ (I'll take it down later). Doesn't seem slow! Updating the code takes some milliseconds with :r and restarting is a case of killThread tid and fork again, some other milliseconds.
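The run-it-in-GHCi setup is roughly this (everything below is a hypothetical sketch, not hpaste's actual code): the entry point forks the server and hands back both the ThreadId and the state as a mutable reference, so you can inspect or tweak it from the REPL and killThread/re-fork after a :r.

```haskell
import Control.Concurrent (ThreadId, forkIO, killThread, threadDelay)
import Control.Monad (forever)
import Data.IORef

-- Stand-in for real server state.
newtype AppState = AppState { hitCount :: Int } deriving Show

-- Fork the "server" loop and return its ThreadId plus the live state,
-- so both stay pokeable from the GHCi prompt while it runs.
startApp :: IO (ThreadId, IORef AppState)
startApp = do
  ref <- newIORef (AppState 0)
  tid <- forkIO . forever $ do
    threadDelay 100000  -- pretend to handle a request every 100ms
    modifyIORef' ref (\(AppState n) -> AppState (n + 1))
  pure (tid, ref)
```

In GHCi: `(tid, ref) <- startApp`, then `readIORef ref` to peek at the running state, `killThread tid` to stop it, and `:r` plus a fresh `startApp` to pick up code changes.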
So yeah, I feel you on dev time. I'm more inclined to the approaches that lead to more immediate development cycles; I can't stand waiting. I'm an Emacser/ex-Common-Lisper. GHC could be faster, but there are definitely circumstances which exacerbate its performance, I reckon, and ways to combat it (as above).
I know that's not a very good answer, more of a workaround than a reason. I don't know why GHC is particularly slow if it's a lot slower than Yhc was. Other than all its separate build steps, I get the impression from what people say that it's just mounted up and become quite hairy. And memory usage has never been much of a concern for the compiler. That's a pity and I'm all for work being done on improving its performance.
How is laziness making ghc slow? Completely strict code doesn't compile any faster, and writing completely lazy ocaml code doesn't compile any slower. How would purity make compilation slower? I want specifics here because it is a specific, factual claim. My opinion on frameworks was left somewhat vague as it is simply my opinion and I had no reason to think anyone wanted more specifics on it. I'm happy to be more specific if you like, what did you want to know?
You might also take a look at GRIN. Here's a quote from the GRIN paper:
"For a lazy language like Haskell, compilers typically compile one module at a time. At first sight, this might appear as a good opportunity to optimize several procedures at once. However, it seems as if this does not apply very well to low level optimizations, like those presented in this paper, where the actual dynamic control flow is important... In a lazy language, a function that is local to a module in the source code might very well escape from the module at run time (if it is built into a closure) and then be called from somewhere else."
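The escape the paper describes is easy to reproduce. In this made-up example, step is local to its defining module, yet any other module that calls the returned closure is invoking step at run time, so a module-at-a-time analysis can never see all of its call sites.

```haskell
-- Imagine this living in its own module that exports only
-- makeCounterStep.
makeCounterStep :: Int -> (Int -> Int)
makeCounterStep k = step
  where
    -- Local to the module in the source code, but it escapes inside
    -- the returned closure and can be called from anywhere at run
    -- time -- exactly the situation the GRIN quote describes.
    step n = n + k
```

A whole-program approach like GRIN's sidesteps this by seeing every call site at once, at the cost of giving up separate compilation.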
I'm very interested in your thoughts on frameworks, but this isn't really the right forum. It's usually pretty easy to catch me in #snapframework on IRC.
Unfortunately, I don't know of anything. http://database-programmer.blogspot.ca/ is good for learning about how to use a database as a database instead of a persistent hash table, but it doesn't really address the why of it so much.