How we secretly introduced Haskell and got away with it (channable.com)
313 points by philippelh on March 3, 2017 | 182 comments

In the 1990s I did research on the efficacy claims of object-oriented programming versus procedural programming. This article bears a striking resemblance to those claims. Case study after case study showed that object-oriented code had fewer bugs because compilers caught them, etc. However, almost every study was similar to this report: it was a rewrite from procedural to object-oriented. There is strong evidence to suggest that throwing away one's first prototype produces the same benefits that were attributed to object-oriented over procedural programming. The conclusion of my research was the same as others': unless one compares the same project written from scratch, without any sharing of design or code, these studies' claims are correlation, not causation.

There's a lot of truth in that. A few days ago I wrote a comment about rewriting a program from C++ to C and coming in at about 1/4th the code size. However, I didn't mention C or C++ in that comment because the real savings came from reevaluating requirements, lessons learned, refactoring, and so on.

Some time back, someone here on HN posted a comment about rewriting a Java app in... Java. Same basic story of large savings, without changing language at all.

Studying the actual effects of language choice is hard (the time, effort, and budget required are beyond most constraints), so people don't do it.

I don't know how I came across it, but Thomson's Rule for First-Time Telescope Makers seems relevant here:

: "It is faster to make a four-inch mirror then a six-inch mirror than to make a six-inch mirror."


There is another aspect. Changing languages. Rewriting in a language that is sufficiently different from the original forces you to look at the problem with new eyes. You effectively have greater mental coverage of the problem domain.

Articles like: We switched from language X to language Y with the conclusion that language Y is better smile

My favorite is when they make another post half a year later saying they changed from Y to Z.

It seems that following the trend and using the latest and greatest tools doesn't necessarily mean the new tools are better, but that you get to rewrite your application from scratch, with much more knowledge about the problem domain than before.

> My favorite is when they make another post half a year later saying they changed from Y to Z.

Sometimes I wonder if the real subtext is "we have such high turnover that almost nobody was around when the first system was designed. We rewrote it in a new language, and now we're all much happier because we all understand it much better, having been involved in its design." Wait a year, repeat.

Being a holdover employee through 4 rewrites, I'd say that can be one factor. Another is that the business landscape changes too. All these factors combined make it difficult to say for certain which of them had an objective impact.

And they inevitably fail to mention the pissed-off product or QA manager or customer who realizes their favourite feature or corner case or bugfix is missing from the rewritten version because the developer didn't understand or notice that aspect of the original.

Instead of blaming the developers every time a feature is missing, consider who is setting the priorities for development. In places where developers are being paid for their work, this is rarely the developers themselves. It's always interesting to see how readily end users blame developers specifically for whatever's going on, apparently without understanding that there are usually all kinds of managers who may not understand the software or the users (or both) but just want to sit on developers to crank out features as quickly as possible.

Now someone has to sound exasperated at everyone jumping to blame managers.

> It hardly seems worth even having a bug system if the frequency of from-scratch rewrites always outstrips the pace of bug fixing. Why not be honest and resign yourself to the fact that version 0.8 is followed by version 0.8, which is then followed by version 0.8?

LinkedIn: Our iOS app is web-based! Simple and fast! We save money!

2 years later...

LinkedIn: Our iOS app is native! Simple and fast! We save money!

This is so true. I usually write things three times: the first time to get a feel for the space, the second to solve the problem at hand. This gives me actual understanding, and then I'm ready to really solve the problem, usually with a fraction of the memory requirements, one or more orders of magnitude in speed, and less code to boot.

Very rarely do I hit anything near to an optimal solution the first time out of the gate.


1st iteration: 'does it' (barely) but kinda looks hacky
2nd iteration: better. actually works.
3rd iteration: looks clean, concise, pro, something to be proud of!

3rd time is a charm in software.

It's unbelievable how much clearer and more concise the code is 3rd time around.

If you can ever actually get away with this on a project ... highly recommended.

I would love to have the opportunity to do this in my working-life projects, as most of the time, while I'm doing iteration 1, I have various others breathing down my neck about getting the next list of tasks in the sprint done as well. Time, and keeping others away, is my hurdle.

The interesting thing is that in the longer term the third iteration will be the winner because it is more maintainable and most likely much easier to understand.

So, ship version one, early, and figure out whether it's worth keeping at all, or whether to kill it and not waste time on versions 2 and 3.

Given the awfulness of so many OOP-based frameworks that I encountered in the 90s, I began to seriously consider that OOP wasn't a better paradigm for most things and that instead it was worse, and simply caused people to rewrite code to the point that it overcame the procrustean bed of inheritance and whatever other tools the language of choice was providing.

It is interesting that Go and Rust eschewed OOP inheritance for simple encapsulation. Information hiding, not implementation inheritance, turned out to be the big practical benefit of OOP. Inheritance leads to fragile code because it allows unseen dependencies on private implementation details in far-flung classes and files.
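The same information-hiding benefit is available in any language with abstract types; a minimal Haskell sketch with a hypothetical Counter type (in a real module you would export only the type and the functions, keeping the `Counter` constructor private, so callers can't depend on the representation):

```haskell
-- Hypothetical example: information hiding without inheritance.
-- Export list would be (Counter, newCounter, tick, value); the
-- constructor stays private, so the Int inside can change freely.
newtype Counter = Counter Int

newCounter :: Counter
newCounter = Counter 0

tick :: Counter -> Counter
tick (Counter n) = Counter (n + 1)

value :: Counter -> Int
value (Counter n) = n
```

Callers can only go through `newCounter`, `tick`, and `value`, which is exactly the "unseen dependencies are impossible" property described above.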

Many C++ game engines also moved away from huge class hierarchies of game objects to property-based game objects that were collections of behavior handlers.

There are many calls for, and proposals to include, some form of inheritance in Rust. Among other things, people have found that Rust is poorly suited to modeling HTML/XML hierarchies, retained mode GUIs, industrial simulations, code specialization, and some forms of video games. Inheritance is a missed feature.

What is clear to me is that Rust does incredibly well without it, but wouldn't be hurt by having it. Most of the egregious sins of OOP are mitigated by having far better abstractions available in Rust. ML-style modules, ADTs with pattern matching, type classes, first class functions, etc. Much like scala, if you give them capable FP, OOP stops being abused and starts being used appropriately.

Aggregations and interfaces cover 99% of what you need. Implementation inheritance and virtual variation points create coupling that is too tight to be workable for evolving systems.

Maybe 99% of what you need, but more like 70% of what I need. I'm not sure what you mean by aggregations, but I design industrial simulations relatively frequently and I've seen the suggestions for replacing inheritance with composed typeclasses and they are not even close to a true alternative. Rust's current troubles with GUI development would suggest the same pattern there too.

Can you elaborate on that? What's an industrial simulation (compared to a "regular" simulation) and why do you like to solve it with inheritance?

An industrial simulation, in my context, is modeling the stateful flow of things through industrial processes. It typically involves millions of instances of thousands of types. There are typically much fewer very well defined processes, most of which would be trivial to implement with a simple impl trait with default methods. But some require stateful members in order to implement a default behavior, but Rust traits do not have fields/members, just functions.

Rust traits and Scala traits (or abstract classes) are more or less equivalent with this one exception. I was just pointed to a proposal that would allow for this in Rust, but it is not yet implemented [0]. Essentially, you would be able to define type members, and then "link" them to a member of the struct/enum that is being "impl"-ed. While ergonomically not quite the same as scala, it is effectively the same as inheriting an abstract class with a constructor parameter.

[0] https://github.com/rust-lang/rfcs/pull/1546
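Haskell typeclasses share this limitation (methods but no fields), and the usual workaround is the one the RFC makes ergonomic: expose the state as a method that each instance implements. A hypothetical sketch:

```haskell
-- A "trait" whose default behavior needs per-instance state. Since
-- typeclasses have no fields, each instance supplies the state via
-- an accessor method, and the default method builds on it.
class Process p where
  queue :: p -> [Int]       -- stateful "member", exposed as a method
  throughput :: p -> Int    -- default behavior that uses the member
  throughput = length . queue

data Conveyor = Conveyor { conveyorItems :: [Int] }

instance Process Conveyor where
  queue = conveyorItems     -- "link" the member to a record field
```

This works, but with thousands of types each instance must repeat the linking boilerplate, which is the ergonomic gap the RFC is about.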

There was an interesting talk about industrial simulations at the International Conference on Functional Programming this year.

(The gist was that avoiding mutation and keeping everything purely functional allowed them to roll back and forth in time with no problem, or try out multiple different futures. That's somewhat orthogonal to inheritance.)

It's the same story in Java - inheritance used to be the first sledgehammer you reached for whenever you saw a nut, but these days it's barely used. Effective Java, the definitive text on the proper way to write Java, tells you to avoid inheritance [1], and that was published in 2008.

[1] https://books.google.co.uk/books?id=ka2VUBqHiWkC&lpg=PA81&ot...

My personal suspicion is that the problem there wasn't OOP, it was frameworks.

… and dogma. There was a lot of blather about how Everyone Serious needed this complex structure, and a lot of people superstitiously followed it without asking whether, e.g., advice from a 2k-engineer megaproject in a different industry was applicable to their 3-person in-house app.

The modern counterpart is probably scalability — I see a lot of parallels in all of the Google/Facebook-envy applied to “big data” problems which can fit on an iPad.

I think OOP will always run into the problem that it requires a taxonomical theory about your problem, which never holds up in reality. (This is just another way of saying, inheritance has to be a forest/bundle of disjoint trees. Yes, there's multiple inheritance, but I'm pretty sure acknowledging that is a mortal sin among OO types.) Haskell is slightly better, since typeclasses are less topologically constrained, but I suspect requiring ANY categorical theory about your problem is going to create mischief in the long run.

The problem is people not using abstract taxonomies.

If people keep trying to force code organization around the business jargon, they will keep getting the same awful result. It does not matter if they are writing OOP, Abstract Data Types, FP, or direct bits manipulation with assembly.

But then you end up growing your taxonomy ad-hoc. Also, Python does multiple inheritance sanely, and so does Eiffel.

Python might do nearly as well as any language could do with multiple inheritance, but "sanely" is relative, like finding the sanest way to drive two cars at once.

Multiple inheritance is perfectly sane, what isn't (or at least is less) sane is implicit invocation of methods defined in interfaces (and inheritance from a parent class is, among other things, taking an interface with a default implementation.)

Unfortunately, explicit interfaces were a very late idea in OOP implementations (at least in terms of implementation in a major language; I'm not sure if the idea was around earlier) and weren't part of v1 of any major language as a core approach, so we've got a bunch of languages where we either accept multiple inheritance being messy or don't have MI at all to avoid the mess.

But if you had exclusively explicit access to interface (including inherited) methods, MI would be clean.

Yeah, pretty much everyone and his brother had a framework:

- Apple, MacApp

- Microsoft: MFC, and others

- Borland: I forget what they called their stuff

- Anyone remember Taligent? Apple, HP and IBM, all getting together in a money-burning party...

- NeXT / Apple revival: NextStep / OpenStep, etc.

- Any number of minor players, including folks with Honest to Goodness Smalltalk implementations (none of which have survived to this day, I believe)

- Java stuff that I have mercifully forgotten

... they were nearly all crazy, and nothing was portable. So much for the promise of OOP :-)

That's all well and good except for the fact that your standards are impossible to reach. Even if I do write the exact same project with two different languages, there will always be differences in either the skill of the teams or, if the teams are the same, the amount of experience the team has when they tackle the project.

What we can do is compare similar types of projects in different languages. And the things we can say there are pretty significant. For instance, at my last job using Angular we experienced a particular bug in production a couple times. In my current job our frontend is written in Haskell. I don't make definitive statements that often, but in this case I can definitively say that there is a ZERO chance that bug will happen in our codebase. I can say that because the type system guarantees that that class of bug can't possibly happen.
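As an illustration of the kind of guarantee meant here (a hypothetical example, not the actual bug): in JavaScript, accessing a field of a possibly-missing value fails at runtime, while in Haskell the possibility of absence lives in the type, so forgetting to handle it is a compile error rather than a production incident.

```haskell
import Data.Maybe (fromMaybe)

-- A lookup that can fail; the Maybe in the type forces every caller
-- to decide what happens in the Nothing case.
lookupName :: Int -> Maybe String
lookupName 1 = Just "alice"
lookupName _ = Nothing

-- The compiler rejects any version of this that ignores Nothing;
-- there is no way to "forget" the missing-user case.
greeting :: Int -> String
greeting uid = "Hello, " ++ fromMaybe "guest" (lookupName uid)
```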

It seems pretty easy to control for this: just write the functional version first, and if it is as bug-free, your results obviously sidestep this criticism.

Not really. Even then you'll get criticism that the teams were more skilled in one language, or that they didn't use the best practices of the other language. Also, even if you could manage to normalize for all these variables, you're still only going to be comparing the cost to create the project. You're never going to get someone to maintain the two projects side by side for years and compare the total maintenance costs. That is where the biggest benefit of languages like Haskell comes in.

The fact of the matter is there never will be absolute proof of these questions. The naysayers will always have straws to grasp at.

Doing a rewrite in a different language and justifying language choice by a successful outcome of the rewrite is grasping at straws to justify the language change.

Every time I've been involved in a rewrite, including ones using the same language before/after, the outcome has been good. The act of doing a full rewrite is where the benefit comes from, it's hard to separate that from a language switch.

It wasn't a rewrite. It was two separate web apps. Type safety categorically prevents classes of bugs that I've seen in real Javascript apps on multiple occasions. Haskell frontend apps that I worked on both before and after the JS app never suffered from that class of bug.

On a political level, it may often be easier to justify a rewrite which switches languages to take advantage of bullet point features (promotional claims like 10x faster, totally secure, no more bugs, etc.) than it is to simply ask for time to do a rewrite, which will make many average managers start fantasizing about firing you. If you write a blog post afterward, that seems to make the company look good.

I'm not sure what you're getting at. He's saying Haskell's type safety makes impossible the bug that was in his JavaScript project. As far as I can tell, it's impossible to write a type safe system in pure JavaScript.

edit: To clarify this thought a bit, obviously you can transpile code in a type safe language to JavaScript. At the abstraction level of the code written, the project is type safe. The generated JavaScript itself is not; however this isn't a problem if the transpiler is correct. The same principle applies to Haskell compiling to non-typesafe machine code.

If Haskell is better than Python, then writing the first version in Haskell and the second version in Python should yield two versions that are closer in quality than if you did the reverse order. Assuming you can measure quality, this is a totally valid hypothesis to test!

Additional hypothesis: the result of this test depends strongly on which one you are more familiar with and which one you prefer. If I hate writing Cobol, I'm going to spend that part of the test pining for the other language and not using whatever the up-to-date idioms are for making Cobol manageable, but just fighting Cobol until I can switch to the language I really wanted to use.

So, anyone doing this should consider trying to get a sample of people who are more or less indifferent between the options, so they don't sabotage the test.

> That's all well and good except for the fact that your standards are impossible to reach.

Right, yes. Perhaps this entire exercise is a complete waste of time. Perhaps we should be investing in skilled people who understand their problem domain and then just trust them to do the best they can, rather than trying to find silver bullets inside programming languages.

If it was a complete waste of time, we might as well write our web apps in machine language. There's never been a side-by-side study comparing machine language to modern languages that meets the scientific bar put forth. So even though you can't definitively prove causation, the history of programming languages has demonstrated that multiple substantial advances have absolutely happened. There's no reason to think there can't be more.

How about let's do both. Both are valid, especially when there are silver bullets like type safety available. Hire the best domain experts and teach them best tools.

To be clear, our new application is better (faster, scales better, more maintainable), for the most part because of the improved architecture.

As we were going to do the rewrite anyway, we considered more alternatives than just Python. For programs that have to be reliable and maintainable, I prefer a language with a strong static type system. Haskell has that, and it has a few practical benefits over other tools that made it an excellent choice for our use case.

I remember just three things about the book "Show Stopper!: The Breakneck Race to Create Windows NT and the Next Generation at Microsoft" (1994): the death march, that they tested the OS by writing a file over and over (!), and that they thought the graphics part written in the new object-oriented C++ would save them a lot of time (it didn't).

Yeah, fascinating book; weren't the graphics issues due in part to the lack of maturity of C++ tools at the time though? (I don't remember exactly because it has been a while since I read it)

Sounds right. Additionally, they didn't even reimplement what they had, the article gives the impression that they basically gave up on trying to reinvent relational query optimization ("We heavily modified pyDatalog to query Redis as its knowledge base") and focused on just solving the problem at hand.

Wouldn't a complete rewrite of a module usually be a lot easier if the program is procedural?

I think that is one of the big factors stopping that from actually happening with OO code. A rewrite is a lot harder if you tap into what exists elsewhere in the codebase.

I bet you have a lot more insight into this than me though.

I don't really think so. With OOP you can easily replace whatever part of the system you want. (Obviously only if you have actually written OO code, not mixed procedural spaghetti in an OO language.) You "just" write an implementation of your interfaces that makes all the unit, integration, and acceptance tests green. With procedural programming you have no interfaces and, more importantly, testing was non-existent. Seriously, I thought that in 2017 the advantages of OOP over procedural were obvious; I feel like I have time-traveled 15-20 years back, to when it was still debatable.

Procedural code can also be well compartmentalized and tested.

I was also until recently quite convinced OOP was the only way to go, but I'm seeing signs everywhere that a lot of the design-problems I've met over the past few years are at least magnified by OOP.

The abstraction promised by OOP is a good thing; however, very few people are able to consistently make good, reusable, maintainable abstractions. To the point where it becomes a weakness rather than a strength. I don't want a billion wrapper objects that obfuscate my code and make its surface area bigger than it has to be. Often I struggle to understand code more because of how it was separated than because of the complexity of what it actually does.

I liked Rich Hickey's talk "Simple Made Easy" [1] and "Brian Will: Why OOP Is Bad" [2].

[1]: https://www.youtube.com/watch?v=rI8tNMsozo0
[2]: https://www.youtube.com/watch?v=QM1iUe6IofM

I'm not so sure about that. I think a complete rewrite is easier if you correctly separated concerns and compartmentalized needs. OOP is supposed to enforce/encourage that, but it's fairly easy to not treat your objects as black boxes with APIs, and then you've put constraints on replacing a component. Procedural code doesn't necessarily tout its ability to encourage that, but well-planned and well-implemented functions can give you the same benefit.

In the end, it's all up to the programmer.

Absolutely! Thanks for standing up for rigor here.

(The actual argument for functional programming is its adoption by elite programmers like Standard Chartered's Strats team, Facebook's anti-spam group, and Jane Street as a whole.)

On the one hand, I have definitely had this experience - I rewrote a library three times, with large gains in performance, readability, reliability, etc, each time (but mostly on the third time, which was from scratch).

That said, isn't it also fair to say that with a better understanding of the requirements and problem, you may determine that a different language / paradigm is a better choice, in the same way you may decide a different class division is better?

What about skill too? Assuming the same programmer(s) worked on the same projects, they would be more skilled when coming to do the second project. Experience.

So what was the result of your research? I am no fan of OOP style, but I believe it has advantages over procedural style, and I also believe that functional style has advantages over OOP style.

Do you have anything published? I'd be very interested in reading.

Sorry, this was course work for a graduate class.


I'm just starting with Haskell and PureScript. So far I'm liking the latter better. It solves a few of their gripes with respect to strings, laziness and records, plus has a more granular/extendable effects system and cleans up the standard typeclass hierarchy. Also `head []` doesn't crash.

Of course Haskell is more mature, has support for multithreading and STM, and compiles to native, so it's more performant. But PureScript integrates with JS libraries and seems "fast enough" in many cases. I think it's more interesting as a project too: the lack of a runtime and of laziness means the compiler code is orders of magnitude simpler than Haskell's, so I could see a larger community building around it if it catches on.

Given that they were on Python earlier, I wonder if PureScript would have been a better choice.

I think PureScript should catch on. The runtime performance story is much more predictable than Haskell's; it integrates trivially with the most valuable target platform, JS; it has small output; and it fixes Haskell's warts while still being pure.

Aside from apps at work, I made some simple physics demos with it http://chrisdone.com/toys/ Performance seems good.

While I really like it, I think it needs a more familiar syntax (to mainstream devs) to be anything more than a niche thing. I'd like to see something in the middle, kind of going the TypeScript route, where valid, pure JS is valid WhateverScript, with generally JS-y syntax but with PureScript's additional features and effects system.

It may end up really ugly though: how would you define operators while preserving JS semantics (e.g. no currying)?

You use purescript at work? Are they internal-facing apps or external-facing? Would love some more details.

> Aside from apps at work

Now I'm curious :)

> It solves a few of their gripes with respect to strings, laziness and records, plus has a more granular/extendable effects system and cleans up the standard typeclass hierarchy. Also `head []` doesn't crash.

Check out ClassyPrelude[1]. It's an (opinionated) alternate Prelude that wraps many things up into much more "modern" interfaces. `head` has been replaced with `headMay` (which, as you can figure, returns a `Maybe a`). Most functions can now handle `Text` fairly seamlessly. For an application developer, it's fantastic.

[1]: https://hackage.haskell.org/package/classy-prelude
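The `headMay` idea is small enough to sketch inline (this is the shape of the function, not ClassyPrelude's exact source):

```haskell
-- A total version of head: the empty-list case becomes a Nothing
-- value instead of a runtime crash.
headMay :: [a] -> Maybe a
headMay []      = Nothing
headMay (x : _) = Just x
```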

I can't imagine using a haskell-like language without laziness. It's what makes it possible to actually write small reusable functions.

Tell me, have you ever used foldr in PureScript? It just doesn't lead to reusable logic there, so I have no idea why you would.

But in Haskell, foldr is used everywhere. Laziness means that logic built with it is actually reusable.

Here is foldl implemented using foldr in PureScript, along with an example of using laziness to gain modularity:
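The snippet itself isn't shown above, but the classic construction looks like this (a sketch in Haskell notation, not the commenter's exact code): foldr builds a chain of functions that threads the accumulator left to right, and laziness lets foldr-based logic short-circuit.

```haskell
-- foldl written in terms of foldr: the fold builds up a function
-- (b -> b) that threads the accumulator through left-to-right.
foldlViaFoldr :: (b -> a -> b) -> b -> [a] -> b
foldlViaFoldr f z xs = foldr (\x g acc -> g (f acc x)) id xs z

-- Laziness buys modularity: this foldr stops at the first odd
-- element, so it works even when the rest of the list is infinite.
anyOdd :: [Int] -> Bool
anyOdd = foldr (\x r -> odd x || r) False
```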


> I can't imagine using a haskell-like language without laziness. It's what makes it possible to actually write small reusable functions.

I don't understand your point. Why is laziness a requirement to write small reusable functions? Are you thinking about currying?

OCaml is (relatively) similar to Haskell and is not lazy. Function currying does not require laziness.

It's unrelated to currying.

It's also hard to explain, but if you're used to Haskell, working with an eager-by-default language such as OCaml or ML is mildly annoying. You can adapt, of course, but it does seem as if gluing stuff together is trickier with eager evaluation.

There are downsides to lazy-by-default, of course.

There's a classic paper that answers this well: "Why Functional Programming Matters."

(Not a quick read, but not too huge, and it is a classic that is well worth reading sometime).

Folds are used in PureScript all the time! Sure, foldr is less useful because it doesn't have the nice laziness-preserving properties, but you can have strict left folds, which are tail recursive and thus run in constant space.

Elm, another haskell-like, _strict_ language, models entire applications around the strict left fold over events https://guide.elm-lang.org/architecture/

We actually do the same in Jobmachine. The application is driven by a strict left fold over incoming events and current state.
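A minimal sketch of that pattern, with hypothetical event types (not Jobmachine's actual events):

```haskell
import Data.List (foldl')

-- An application's state as a strict left fold over incoming events:
-- each event updates the state, and foldl' runs in constant space.
data Event = Deposit Int | Withdraw Int

applyEvent :: Int -> Event -> Int
applyEvent balance (Deposit n)  = balance + n
applyEvent balance (Withdraw n) = balance - n

runApp :: [Event] -> Int
runApp = foldl' applyEvent 0
```

This is the same shape as Elm's update function folded over its event stream.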

There's a reason I said foldr. In Haskell, even foldl is implemented with foldr. It's that powerful of a tool. In contrast, foldl is far weaker, and necessarily strict in its input. It loses nothing moving to mandatory strictness. But foldr loses everything.

The hard thing to swallow with purescript is whether row types are really worth the complexity they add.

I think it's a much easier pill to swallow than laziness.

I don't really use foldr even in Haskell. It's almost always a performance problem.

> I wonder if PureScript would have been a better choice.

I have an aversion (based in large part on prejudice) to things that involve JavaScript and its ecosystem :)

I hear many good things about Purescript’s effect system, but I haven’t studied it in detail. This is definitely one of the areas where there is room for improvement in Haskell.

Regarding the type class hierarchy and head being partial, those weren’t really an issue in practice.

You don't have to study its effect system in detail, there's not much to it. Instead of IO a you have Eff e a, where e is a record of effects, using PureScript's records support. The neat thing is that statements in the Eff monad tend to get compiled to x; y; z in the resulting JavaScript, which is great, you don't pay a performance penalty. Check out the source code in this demo: http://chrisdone.com/toys/elastic-collision-balls/ There is still some overhead for currying, but there's a lot of room for decurrying saturated calls.

You might be happy to hear there is a PureScript native compiler. I'm also averse to JS things, I use PS but don't use the node-based tools to build it.

That was informative, thanks!

A while ago I wanted a better JS for a project, so I tried PureScript and liked it. I have a little experience with Haskell, and PureScript was easy to understand and more consistent. However, when I tried to do UI components I chose Halogen, and damn, it was too complicated; I ended up going back to ES6 and React.

What was complicated about it? I'm just getting started with it and it seems pretty straightforward after two days. In fact, I'd say it's almost stupidly easy -- one state handler, one change monitor, and one renderer, done. None of the mangling local state/props/component/JSX/global state/flux/redux stuff in React. So many knobs and options and plugins, the React ecosystem strikes me as a zoo comparatively.

The comparison between Stack and Python build tools is striking:

> No messing around with virtualenvs and requirement files. Everybody builds with the same dependency versions. No more “works on my machine” and “did you install the latest requirements?”.

I wonder why the Python ecosystem, which is much more mature, doesn't provide a build tool as delightful as Stack (which is less than 2 years old).

Stack relies on package sets (aka "snapshots"), for which a CI system (Stackage: http://stackage.org/) runs a daily build job to check that all the packages build and pass tests together. That requires some money to keep running, and buy-in from package authors, as there is maintenance overhead for each release. It took a few years for Stackage to get enough packages to be generally useful, and then we wrote the Stack tool, which defaults to using Stackage snapshots.

There was (and still is, a little bit) resistance to the whole idea of Stackage from the community; people liked the idea of build plans magically being figured out on demand. It's an interesting research problem, and it can be hard to let go of an interesting problem when a solution side-steps it altogether. I believe many people eventually changed their minds after experiencing the simplicity of using Stack and no longer having build hell be a substantial part of their development cycle.

Python would likely have to go through the same process. Although with Stack and Yarn (and Quicklisp) providing frozen builds, the idea has some precedent, which makes it an easier sell. Debian and a bunch of other OSes do it like this too, but experience shows programmers don't pay attention to that.
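For context, a Stack project pins everything with one small config file; a minimal sketch (the snapshot name is illustrative):

```yaml
# stack.yaml: the resolver names a Stackage snapshot, which fixes the
# version of every package in it, so all machines build identically.
resolver: lts-8.5
packages:
  - .
# Anything outside the snapshot is pinned here explicitly.
extra-deps: []
```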

Stack enables application development in Haskell, as opposed to just library development. A proper library doesn't have more than 20-ish dependencies, in my opinion, and manually handling these and their version bounds is not a problem.

But when writing applications with hundreds of dependencies, manually figuring out a mutually compatible dependency range for all packages just isn't an option. At least not if you want to spend time prototyping code, rather than think about dependency ranges.

hpack solves additional problems with the .cabal format (sane defaults as opposed to build failure), and I highly recommend it, for application development at least. I just discovered it a month ago and now I wouldn't be able to live without it.

Probably for the same reason I greatly prefer cabal to stack. Stack assumes it knows better than me. Cabal just does what I tell it to do. As a domain expert, I greatly prefer the latter. It does what I want, nothing more, nothing less. Stack is a mysterious "solution" to a problem I don't have that works by doing everything differently than I do.

Stack was created because not everyone is a domain expert. A lot of people don't want to be domain experts. They just want something that works without having to know all the details. It was only able (in the business sense) to be created because so many people look at Haskell skeptically anyway, and take any excuse to back away from it. The people behind the development of stack also run a major advocacy initiative trying to get people to use Haskell, so they found it to be an important thing to build.

You don't need to try to get people to use Python. It's already broadly accepted. When people run into trouble, they just say it's the price of using Python, and aren't willing to make the exchange of giving up power to get rid of a minor inconvenience. So there's no business incentive in the Python ecosystem for making the tradeoffs stack does in the Haskell ecosystem.

I am a "domain expert" and that is why I use stack for Haskell projects. It's much preferable to let somebody else handle the burden of ensuring that certain dependencies are compatible with each other.

> Stack is a mysterious "solution" to a problem

There's nothing mysterious about stack. It's just a group of people who step up and say "I am responsible for package $x" and then work together to find stable sets of versions that are guaranteed to work together.

The whole process happens out in the open, for example here is an issue tracking a compatibility breaking change in a common HTML library: https://github.com/fpco/stackage/issues/2246

Cabal made me never use Haskell ever again. I work in two different locations and at home. None of the three locations ever worked the same, and all had different issues with Cabal. After hours and hours of trying different things, I walked away into the wonderland of Racket.

They're working really hard on improving it, though. Cabal 2.0 will have a Nix-style build system, in which multiple versions of the same dependency can be installed globally (so no separate sandbox per project). This will solve most of the cases where Cabal breaks down, and gives us almost the same usefulness as Stack. However, you will have to make sure that there is actually a feasible build plan by setting up your version bounds correctly. With Stack, other people take care of this for you and you never touch the version bounds, which is relaxing but also gives you less control.

A nice feature of stack that cabal AFAIK will not provide is that it takes care of installing GHC in multiple versions. I think that's very important for newcomers.

> relaxing but also gives you less control

More like "leads you to typically exercise less control". You can override versions of packages in a stack snapshot.

Cool I'll give it another go once they release it.

Consider this situation: three different developers are working on the same application. They should all have the exact same dependencies installed, right? Therefore they should be working off of a freeze file of some kind.

Why use an entirely ad-hoc freeze file when you can start from a known-working snapshot (which some of them might already have installed on their machines!) and modify it from there? I find this the perfect option in this kind of situation, and so object to saying that Stack is just for non-experts.
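To make that concrete, a snapshot-based freeze could look something like this hypothetical stack.yaml (package names invented): the team shares this file, and everyone resolves to identical versions.

```yaml
# Hypothetical stack.yaml: pin a known-good Stackage snapshot, then
# list only the overrides the snapshot doesn't already cover.
resolver: lts-8.3
packages:
  - .
extra-deps:
  - acme-internal-lib-0.1.0.0  # invented package, not in the snapshot
```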

The whole "Do you have the dependencies and a Python env installed? No? Then you can't run this script/program." dance was one of the main reasons I switched from Python to Rust. Cargo, the (very good) package manager, comes with the language, and because Rust is a compiled language you build all the dependencies into your executable, so you aren't dependent (heh) on the user having a runtime installed that may or may not have all the dependencies at the required versions.

Indeed, Rust + Cargo and Haskell + Stack are very similar in this regard. Both have great package managers, and both produce a shippable executable with only a few dependencies on system libraries. One notable difference is that Stack downloads the compiler, whereas for Rust, every version of the language comes with a compiler and a Cargo. This ensures that you can check out a year-old commit and still build your project with Stack (modulo breaking changes in Stack, which so far I have not encountered), whereas for Rust the compiler version is not pinned.

> for Rust the compiler version is not pinned

Oh, you just need another layer of abstraction! Install rustup, and then (from memory, might be slightly wrong):

  rustup install 1.15.1
  rustup run 1.15.1 cargo build
rustup will take care of getting hold of the right versions of cargo and rustc, and then use them to run the build. I admit that it's not as nice as having the build tool download the right version of everything, but it does work, and you could hide this inside a pretty small shell script or function if you wanted it to be neater.

I have pygradle generate a PEX file for all of my command scripts. Not as seamless as cargo or stack, but it works without rewriting everything.

I should probably have clarified that I wanted the "seamless" way of making the final executable. IIRC, I tried (really hard) and failed to get some method of "freezing" to work for Python, which made me wary of the prospect of trying something like that in Python in the future (now past).

pyinstaller works great. No issues whatsoever and I'm happily deploying Python 3.6 to machines as far back as RHEL 5.

Single binary to deploy, no dependencies except libc, life is great.

I guess this is true if you don't need to interact with any system libraries.

Could you give an example of what "system libraries" would pose a problem in my example of using Rust with cargo?

Also note that by "no runtime installed" I mean no runtime as in "no Python runtime", "no JVM" etc. not necessarily "no libc"

EDIT: formatting

>Could you give an example of what "system libraries" would pose a problem in my example of using Rust with cargo?

Specifically things like xorg libs, libmpeg, libsdl, and such. Not that Rust would have a problem interfacing with them, just that they would need to be present regardless of whether or not someone was just trying to run a distributed binary.

Agreed that you wouldn't need a VM like CPython or the JVM. However, Rust isn't unique in that department. Almost all languages that compile to binary executables have this advantage.

> Specifically things like xorg libs, libmpeg, libsdl, and such. Not that Rust would have a problem interfacing with them, just that they would need to be present regardless of whether or not someone was just trying to run a distributed binary.

That's why stuff like that is AFAIK usually either distributed with the binary or is absolutely required to have present on the system, regardless of the PL, if you want to/can only distribute a "naked" binary.

> Agreed that you wouldn't need a VM like CPython or the JVM. However, Rust isn't unique in that department. Almost all languages that compile to binary executables have this advantage.

Didn't mean to suggest this is unique to Rust, which is why I wrote

> because Rust is a compiled language.

EDIT: formatting

I think I may have misunderstood your point after re-reading.

I thought you were implying that compiled binaries do not have dependencies. Now I can see that is not the case.

I wonder why the Python ecosystem, which is much more mature

I hope I'm not being too pedantic but Python's ecosystem is much larger than Haskell's, it isn't really more mature. Haskell and Python are very similar in age as languages go.

Maturity comes from:

* Millions of person-hours being poured into a language...

* ...Over a long enough time period that the language can go through several develop-eval-improve cycles - that take real world use cases (And not one-liner bubble sort implementations) into account.

In this sense, it doesn't matter whether or not Haskell was invented in 1890, or 1990. #2 is required for maturity, but so is #1.

(I am not a huge Python fan.)

Absolutely. In fact, one of the big things holding back Haskell has been the immaturity of its libraries (because for a long time they were built by hobbyists and academics, with little industry support, in a language where lots of things were new and old architectures didn't work well). Happily that's now mostly behind us: https://github.com/Gabriel439/post-rfc/blob/master/sotu.md (editor support being the main exception)

You can't get 9 women together and produce a baby in 1 month. A lot of developments within a programming language ecosystem originate from new ideas discovered outside of it. It doesn't matter how many people you have working on a project if the crucial piece of tech they need hasn't been discovered yet.

No, but you also can't study 1 woman having 9 babies and conclude yourself an expert on pregnancy. I'd prefer to get my advice from the doctor who studied 9 women having one baby each. Note what I said about a mature language having to go through several develop-eval-improve cycles.

Diversity matters. A language that one person tinkered on for thirty years is far less likely to be useful than one that ten people tinkered on for three years. Or, in the case of Python vs Haskell, a hundred people who tinkered on it for twenty-five years.

It is important to note that person-hours are not remotely fungible, and some contributions are even negative in the sense of "developing ecosystem maturity".

I don't really know how to compare.

Python's ecosystem is certainly much more complete, and stable in the sense that radically new concepts don't appear every day.

Haskell's ecosystem is more reliable in the sense that this feature you are using will probably not disappear in a year, and libraries have less conflicts.

I think newer often tends to be easier because, as in this case, it has the benefit of hindsight. Also, the old tools tend to get more complicated as time goes on.

"It is said that Haskell programmers are a rare species, but actually the majority of developers at Channable had used Haskell before."

Could you imagine if this wasn't the case? The hurdle to actually get people excited about a language such as Haskell especially moving from something like Python would potentially be huge. Kudos for already having that problem solved.

We're based in Utrecht, where Haskell is part of the mandatory curriculum and the language of choice for many master's courses. Because of this, almost all our developers are somewhat familiar with it, or have at least had one course in it in the past, which really helps a lot!

In many European countries there are workplaces that only hire people with relevant academic degrees.

A course on functional+logic programming is often placed in the 2nd or 3rd year of a typical European 3-3½ year CS degree.

At one company I experimentally wrote OCaml and named the resulting native binaries whatever.py. No one ever looked at them. So there is a lot of scope for shenanigans.

While it's an interesting look at a change you introduced, that blog title might not come across quite as intended.

If you're having to "secretly introduce" tech, and "get away with it", that suggests there are unnecessary and unproductive constraints on your work; maybe even suggesting that you'd get in trouble for actually daring to make things better.

The "secretly" part refers to the part of the story where we had one hour to build a quick prototype to pitch to our boss. The title is a bit tongue-in-cheek. It's not that we built this thing in secret for months and then just deployed it. That's also not what the article says :)

We had been planning on replacing Scheduler for a while now, and had already written down some mumblings about what the new design should look like. We were also already discussing whether we would switch away from python back then.

I think the exact opposite of what you are saying is true. We got the freedom to experiment with something new, and to actually make things better along the way.

>I think the exact opposite of what you are saying is true. We got the freedom to experiment with something new, and to actually make things better along the way.

Sure, and from what I read I mostly took it that way. My original point was just that maybe a bit of caution would be good in the choice of title. If I was just skimming through the titles on HN, or skimming the article, it could be easy to get the wrong impression of channable.

Not the author, but I think that is exactly what they wanted to convey in the title. The implication is that they're fighting the man and won.

There's a tradition of programmers laying claim to subversively Making Things Better in spite of the bean counters. Sometimes, it is even true, as far as it goes.

Glad to see Haskell used in production.

It's kind of funny that build reproducibility (which was a major issue before Stack) is one of the strong points.

I wonder if, for your project, using Cloud Haskell would have been more appropriate. I have a feeling some of the problems you found could have been solved with that.

You can deploy Python code as a static binary that includes the interpreter along with all dependencies. I heavily use this in production and life is great - deployment means copying one single binary, reverting means running an older one instead. No external dependencies, no pip upgrades, just libc.


> No messing around with virtualenvs and requirement files. Everybody builds with the same dependency versions. No more “works on my machine” and “did you install the latest requirements?”.

While this is nice, of course, I'm not sure this outcome is unique to Haskell/Stack. It seems like you could accomplish a similar level of reproducibility by building a Docker image or bundling dependencies in some other way.

We are actually using Docker for generating the virtualenv that we ship and running tests now. The motivation for doing this is being able to control the environment; we can run tests and build a package on CI, and we can build the same package locally when CI is down. We don’t use Docker in production.

It is not clear to me how Docker solves the issue of pinning dependencies; I would rather have a file that states the exact version of every package to install, than an opaque blessed container image that has some versions installed, and I do want to have the versions used under source control. Generating the image would not be reproducible (in the sense of having the same Python files inside it) without pinning versions somewhere anyway, right? Or am I missing something obvious?

My understanding is that Docker only stays reproducible if after every change you kill the container and start a new one. Otherwise, a particular change may only work because of a side effect of a change you introduced earlier and then deleted.

This isn't a huge issue, but still it's nice in declarative systems like Stack and NixOS not to have to worry about that kind of thing.

I'm the author of the post. I'll be happy to answer any of your questions :)

Not a question, but with regards to:

> * We use all of the five different string types. It is annoying, but it is not a major problem.

cs[1] and the OverloadedStrings extension are all you need, in my experience.

[1] https://www.stackage.org/haddock/lts-8.3/string-conversions-...
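For readers unfamiliar with the extension: OverloadedStrings makes string literals polymorphic over the IsString class, so a literal can become whatever string type the context demands. A minimal base-only sketch (JobId is an invented example type):

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- With OverloadedStrings, every string literal is desugared to
-- 'fromString literal', so it can take any type with an IsString instance.
import Data.String (IsString (..))

newtype JobId = JobId String deriving (Eq, Show)

instance IsString JobId where
  fromString = JobId

describe :: JobId -> String
describe (JobId j) = "job " ++ j

main :: IO ()
main = putStrLn (describe "fetch-42")  -- the literal becomes a JobId
```

The `cs` function from string-conversions does the analogous job at runtime, converting between the concrete string types.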

I like the Protolude prelude replacement for its lack of partial functions and its toS function for string conversions.

Thanks for the write up! In the beginning you mention that you ran into some bugs in the Python version that would have been caught by the Haskell type checker. Can you go into more detail about what those bugs were?

The most serious one was that we were submitting jobs (as JSON) that were missing a few metadata fields. In Python we passed around dictionaries, and even though we had JSON schema validation in place, this slipped through. In Haskell, we define a record type and the corresponding serializers. It is more code, but what you get is that invalid data cannot exist at runtime: it simply cannot be represented.

Also, a compiler refuses to compile your code if you make a typo in a field name.
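The post's actual types aren't shown, but the idea can be sketched without any JSON library: decode untyped input into a record at the boundary, so a job with missing fields simply cannot be constructed (Job and its fields are invented names):

```haskell
-- Hypothetical sketch: decode a string-keyed map (standing in for a
-- parsed JSON object) into a record; a missing field fails loudly at
-- the boundary instead of surfacing deep inside the scheduler.
import           Data.Map (Map)
import qualified Data.Map as Map

data Job = Job
  { jobId      :: String
  , jobProject :: String
  } deriving (Eq, Show)

decodeJob :: Map String String -> Either String Job
decodeJob m = Job <$> field "id" <*> field "project"
  where
    field k = maybe (Left ("missing field: " ++ k)) Right (Map.lookup k m)

main :: IO ()
main = do
  print (decodeJob (Map.fromList [("id", "1"), ("project", "p")]))
  print (decodeJob (Map.fromList [("id", "1")]))  -- "project" is missing
```

Downstream code only ever sees a `Job`, so the "missing metadata" class of bug is ruled out by construction.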

I'm wondering if integrating MyPy into the build, or even using Cython for performance, could have helped.

We do actually use Mypy! I wrote about my experience with it here:


Did you try PyPy? If so, how did it perform? If not, why not? PyPy adoption is at only 1% or so, and it'd be nice to see more PyPy usage.

The real issue with the Python scheduler was its algorithmic complexity. Using a faster implementation would have bought us a few extra months or maybe even year, but it would only have postponed the need for a real solution.

"Our lead developer Robert usually comes in a bit later, so we had about an hour to build a working prototype" - why did you only have an hour? What would have happened if the working prototype was not done when he came in?

How did you debug the space leaks? Can anything be done to mitigate/avoid them?

The GHC runtime has built-in support for memory profiling, it can produce a graph that shows a breakdown of the heap over time. After trying various combinations of flags I managed to produce a graph where one part was clearly growing over time. The corresponding function was a recursive function with two arguments, the first never changing in recursive calls. I rewrote that to a single-argument nested function, and that made the leak go away.
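The function in question isn't public, so this is only the shape of the refactor described (names invented; whether the two-argument version actually leaks in practice depends on the GHC version and optimization flags):

```haskell
-- Before (illustrative): the first argument never changes across
-- recursive calls, yet is threaded through every iteration.
sumSteps :: Int -> Int -> Int
sumSteps stepSize n
  | n <= 0    = 0
  | otherwise = stepSize + sumSteps stepSize (n - 1)

-- After: capture the constant argument once; the nested helper closes
-- over it, so only the changing argument is passed along.
sumSteps' :: Int -> Int -> Int
sumSteps' stepSize n0 = go n0
  where
    go n | n <= 0    = 0
         | otherwise = stepSize + go (n - 1)

main :: IO ()
main = print (sumSteps 5 3, sumSteps' 5 3)  -- both compute the same value
```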

Instead of using separate monad transformers, we use a single "World" that knows how to provide Redis, logging, IO, and other typeclass instances.

There is a RealWorld that runs on top of IO and a FakeWorld that runs on top of pure State for unit testing.

This means that we have to wrap every single API into our own "SupportsRedis" and similar typeclasses, but in the end I think it's worth it! Unit tests are super fast and not flaky at all.
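A minimal sketch of this pattern with invented names (the real code wraps Redis and more; logging stands in for all effects here):

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
-- Effects live behind a typeclass; production code gets an IO-backed
-- instance, while tests get a pure State-backed one.
import Control.Monad.State

class Monad m => SupportsLog m where
  logLine :: String -> m ()

-- The production world: real IO.
newtype RealWorld a = RealWorld { runReal :: IO a }
  deriving (Functor, Applicative, Monad)

instance SupportsLog RealWorld where
  logLine = RealWorld . putStrLn

-- The test world: collect log lines in pure State.
newtype FakeWorld a = FakeWorld { runFake :: State [String] a }
  deriving (Functor, Applicative, Monad)

instance SupportsLog FakeWorld where
  logLine s = FakeWorld (modify (++ [s]))

-- Program code is polymorphic in the world it runs in.
program :: SupportsLog m => m ()
program = logLine "starting" >> logLine "done"

main :: IO ()
main = print (execState (runFake program) [])
```

Running `program` in FakeWorld yields the logged lines as a pure value, which is what makes the unit tests fast and deterministic.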

Nice article. Out of curiosity, what do you write with? Vim/ghc-mod? Emacs/Intero? IntelliJ IDEA plugins? Atom/VSCode, etc.?

I tried Spacemacs once but I found it too magic. I decided I should learn proper Emacs instead of running a random playbox of plugins on top of plugins, but I haven't had the time to properly learn Emacs yet.

I have tried various Haskell plugins for Vim in the past, but they always tended to break, so I gave up fixing my config and threw them all away.

Now it's just plain Vim (with some non-Haskell-related plugins). Next to it I have a terminal that reruns the tests when a file changes: `stack test --file-watch`. It's simple, but it always works.

I'm not sure if the Vim stuff has gotten any better lately; I haven't checked. So if you have any suggestions, please tell :)

Have you tried running tests from `stack repl --test`? It's so much faster. I'd love it if I could use file-watch and the repl together.

Have you tried ghcid? In my experience it was a bit quicker than `stack test --file-watch`.

In case anyone was wondering, the 'stack' command in the article refers to https://docs.haskellstack.org/en/stable/README/. Which actually looks kind of wonderful.

Whenever I see Haskell demonstrations, everything looks like just interface declarations.

You can do beautiful interfaces with, e.g., Java too. But where is the meat, where anything actually happens? I rarely see that in these posts. Yes, I could look up the source, but I don't have time to read through it randomly.

This looks just so nice and stuff just magically works?:

runWorkerLoop :: (MonadBackingStore m, MonadLogger m, MonadIO m) => WorkerState -> m WorkerState

And monads to boot! (Are monads Haskell's equivalent of Java factories? I kid, I kid :)

The runWorkerLoop function logs a few lines and sends out an initial job request (by enqueueing an event in Redis). It then calls the nested function `go`, which dequeues one event off a TBQueue (a thread-safe bounded queue), matches on the event, and calls the right function to handle it. If the event was not a "stop" event, `go` calls itself to run the next iteration of the loop. `go` takes a WorkerState as an argument, which is how it keeps track of which jobs are running and whether there is an unanswered job request.

In reality the signature is a bit uglier, I simplified it for the post because the point was about effects. In particular we also pass in the configuration, Redis connection details, and a callback to manipulate the TBQueue.
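A stripped-down sketch of that loop shape (event and field names invented; the real code also handles job requests, configuration, and Redis):

```haskell
-- go dequeues one event off a TBQueue, updates the worker state, and
-- recurses until it sees a Stop event.
import           Control.Concurrent.STM
import           Data.Map (Map)
import qualified Data.Map as Map

data Event = JobStarted String | JobFinished String | Stop

newtype WorkerState = WorkerState { running :: Map String () }

go :: TBQueue Event -> WorkerState -> IO WorkerState
go queue st = do
  ev <- atomically (readTBQueue queue)
  case ev of
    JobStarted j  -> go queue st { running = Map.insert j () (running st) }
    JobFinished j -> go queue st { running = Map.delete j (running st) }
    Stop          -> pure st  -- do not recurse: return the final state

main :: IO ()
main = do
  q <- newTBQueueIO 16
  mapM_ (atomically . writeTBQueue q) [JobStarted "a", JobFinished "a", Stop]
  final <- go q (WorkerState Map.empty)
  print (Map.size (running final))
```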

> The issue here is that we cannot run runWorkerLoop and runFetchLoop with forkIO, because we don’t have the right transformer stack.

Am I understanding correctly that this is because, while you can lift e.g. runFetchLoop to something of type `m ()`, it's not possible to use forkIO on it, since forkIO requires an input of type `IO ()`? Isn't that just a consequence of the fact that Haskell has no possible way of knowing if your side effects can be contained in the IO monad?

It's not about side effects, it's about bookkeeping. When you have a type that indicates that you can do IO, but you're also carrying around bookkeeping data implicitly for things like configurations and data sources, forkIO represents a hard problem.

If the implicit configuration is updated, there's no way to communicate that across threads. The same is true with all the other things monadic layering can provide. How do you call a continuation that points to a different thread? That doesn't even make sense.

So.. Why lie in your type and pretend that those things all make sense? Why not make the type explicit about what makes sense and what doesn't? That way, when someone wants to do something that has no a priori way of making sense, they're required to define how to handle it, such that it makes sense in their specific use case. And that's what the post says they did.

All in all, it's things working as designed. Places where you need to stop and think are set up such that you need to stop and think to use them, instead of barging ahead unaware of the issues.

That sounds about right.

At the shallowest level, we can't pass `m ()` to forkIO unless m ~ IO, 'cause the types don't match.

But beyond that, there is the question of how that extra context would be passed through. For something like ReaderT this is straightforward. But consider StateT: a `put` in one thread can't be visible in the other.
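For ReaderT specifically, the manual unlifting looks roughly like this (names invented):

```haskell
-- forkIO only accepts IO (), so we grab the environment with 'ask' and
-- re-run the ReaderT layer inside the child thread by hand.
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad.Reader

newtype Env = Env { envName :: String }

worker :: ReaderT Env IO ()
worker = do
  name <- asks envName
  liftIO (putStrLn ("worker in " ++ name))

forkWorker :: ReaderT Env IO ()
forkWorker = do
  env <- ask
  _ <- liftIO (forkIO (runReaderT worker env))  -- unlift by hand
  pure ()

main :: IO ()
main = do
  runReaderT forkWorker (Env "prod")
  threadDelay 100000  -- crude: give the forked thread time to print
```

This works because the Reader environment is immutable and can simply be shared with the child thread; for StateT there is no equally obvious answer.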

Good luck with hiring!

Shouldn't be hard.. Haskell to me so far feels like an ecosystem with way fewer jobs / freelance gigs than there are eager-to-go-commercial enthusiasts hacking away in their spare time..

They won't need luck, plenty of people would love to program in Haskell professionally.

See my reply to a similar comment: https://news.ycombinator.com/item?id=13786992

Makes me wonder how long they would have lasted had the initial implementation been done more simply (i.e., not in their PyDatalog fork).

> but if we could get it done, there would be no going back

The naïveté in this simple statement is so cute.

The list of concerns is also pretty naïve. The main problem you are going to encounter with this project is hiring. If you want to grow this project or if the main developers leave the company, I bet it will get rewritten in a different language in no time.

See https://news.ycombinator.com/item?id=13784085 for why hiring is unlikely to be a problem for them.

I also encourage you to find _any_ experience report that tells of difficulties finding the right candidates for a Haskell job.

It looks like jobmachine is a private repo, maybe don't include the url in your blog post :)

If the URL to a private repo is not secure, then a whole bunch of people (including GitHub) have a big problem, and exposing jobmachine is the least of their worries ;-)

It's not a security problem, or else my comment would have been much more adamant about removing the link. It's mostly just reader confusion. Seeing the git clone url led me to believe the project was open sourced and was disappointed to find that not to be the case.

We do have a different small tool in Haskell that we are likely going to open source though :)

I don't think there is any harm done in putting it in there. All GitHub URLs follow a standard schema, so if you wanted to you could guess it anyway.

The point of the code snippet was simply to highlight how nice the Haskell build tools are compared to Python's.

I couldn't help but try to see if they were nice enough to opensource it, though, hah.

It's currently specialized for our specific use case and would not be useful for anyone else, I think. I'm sure that we will keep iterating on it though, so perhaps it will become more general-purpose in the future so we might look into open-sourcing it then. But I can't promise anything, sorry!

Actually, if you wanted a statically typed, compiled functional language like Haskell while keeping the declarative logic paradigm of Prolog, then the Mercury programming language was made exactly for you!

Haskell is a bad language, in my opinion, because you can't tell what the big-O run time of any operation is. Instead you just have to "trust" that it'll be fast enough.

More on this: https://www.reddit.com/r/haskell/comments/1f48dc/what_does_t...

All of the answers seem insufficient. Basically you can't estimate Haskell run-time unless you are very familiar with the internal Haskell engine.

Completely untrue. I'm not a Haskell evangelist (I appreciate it for what it is), but I thought I would at the very least point out that the documentation for most basic data structures (e.g. Data.List[1]) is not only thorough but also has a link on the far right of the documentation site that takes you straight to the source code, and it's usually easy to tell what it's doing. Any developer should be able to grok that code and determine the run-time complexity.

[1] https://hackage.haskell.org/package/base-

I think he's talking about how it can be hard to tell what's already been evaluated and what hasn't.

I'm not a C evangelist, but I would point out that most documentation of standard C functions is very detailed and the code is shown in its man page. Any developer should be able to grok that code and determine its safety. Honestly...

They really should be able to.

You're completely missing the point: people are not computers. They make mistakes, including while they check things. That's why you don't want to rely on tests, but rather on formal proof.

The complexity of many Haskell collection APIs is actually documented, unlike in many other languages. These complexities are not changed by the runtime; only constant factors and memory use are affected by optimisations, and this is no different from any other high-level language that optimises code.

As a huge Haskell proponent: this is a totally legitimate question, sorry you're being downvoted.

Writing extremely performant Haskell is a very specialized skill. Happily, Haskell is still extremely fast even without fine optimizations.

It depends on what you want to do: if you're writing a moon lander and don't know anything about GHC internals you may be overreaching yourself=) But for most things like web apps etc. knowing the basics is enough.

There are various mature options for scheduling jobs with dependencies between them.

Why did you choose to write your own, regardless of the language?

Good question! None of the existing options that we investigated supported all of our requirements. In particular, all the arrows that can exist in the dependency graph are known ahead of time, but the nodes are not. This means that a job can depend on a job that does not exist yet. (A user can add extra feeds to download, and the merge job should wait for them, even if it had been submitted already.) Furthermore we have a few specific constraints such as “per project, only one of job type x or y may be running at the same time”.

I'd use haskell if it weren't lazy by default and didn't use Cabal. Aka I'd use OCaml/F#.

> Our lead developer Robert usually comes in a bit later, so we had about an hour to build a working prototype.

I'm guessing he is a Python developer and likely he is no longer the lead.

Don't worry, I'm still here.

Yes, but are you still the lead? :)

I'm still the lead.


Well, he's the co-founder of Channable so... :P

What's the reasoning?
