
Why GitHub used Haskell for Semantic - yakshaving_jgt
https://github.com/github/semantic/blob/master/docs/why-haskell.md
======
phodge
> An example of this is the concept of resumable exceptions. During Semantic's
> interpretation passes, invalid code (unbound variables, type errors,
> infinite recursion) is recognized and handled based on the pass's calling
> context. ... Porting this to Java would require tremendous abuse of the
> try/catch/finally mechanism, as Java provides no way to separate control
> flow's policy and mechanism. And given Go's lack of exceptions, such a
> feature would be entirely impossible.

Not knowing much about FP, I'd love to see a more in-depth article
explaining this problem domain a bit more and showing some side-by-side
examples of Haskell's specialized call sites compared to a Java
try/catch/finally solution (although Python would be a better procedural
exception-based language to compare to).

One of the main reasons I don't care much about Haskell is because without any
side-by-side comparisons of Haskell vs <insert procedural language> I don't
understand what the Haskell advantages are, and I don't know when I'm dealing
with a problem space where Haskell would help me.

~~~
hardwaresofton
> One of the main reasons I don't care much about Haskell is because without
> any side-by-side comparisons of Haskell vs <insert procedural language> I
> don't understand what the Haskell advantages are, and I don't know when I'm
> dealing with a problem space where Haskell would help me.

This is one of haskell's biggest problems. It's just enough outside of the
normal flow of imperative languages (yet usable for the same problems) that
you can't tell how much of an improvement it is unless you try it. Also,
people who write haskell are more inclined to share
lofty/abstract/interesting-to-other-haskellers code rather than your normal
day-to-day code that is massively improved/safer and benefited from haskell's
features.

I've mentioned this before, but one example of where it became apparent to me
how much haskell had changed what I expected from a language was non-nullable
types. It's starting to be really common in languages now (typescript, kotlin,
etc), but if you are used to writing imperative languages, the worry of
nil/None/null is ever present, and a concept like Optional<T> is actually
quite foreign looking. If you really think about it, it means that _none_ of
your language is safe -- none of your functions are safe because they said
they wanted a String but you might have gotten a null that _looks_ like a
String to the typechecker and will blow up at runtime.
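
A minimal sketch of what that looks like in Haskell. The user table and the function names are made up for illustration; the point is only that `String` and `Maybe String` are different types and the typechecker will not let one stand in for the other.

```haskell
-- Hypothetical user table; the names are invented for this sketch.
users :: [(Int, String)]
users = [(1, "alice"), (2, "bob")]

-- Takes a String. There is no null value of type String to guard against.
greet :: String -> String
greet name = "hello, " ++ name

-- Absence is visible in the type: the caller gets Maybe String, not String.
findUser :: Int -> Maybe String
findUser uid = lookup uid users

-- `greet (findUser 3)` is rejected by the typechecker; both cases
-- must be handled before a plain String is available.
greetUser :: Int -> String
greetUser uid = case findUser uid of
  Just name -> greet name
  Nothing   -> "no such user"
```

Here `greetUser 1` gives `"hello, alice"` and `greetUser 3` gives `"no such user"`.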

Another key improvement in haskell is the removal of class-based code-sharing
(i.e. inheritance) -- the separation of behavior and data is really important,
and most languages are starting to come around to this now (go w/ structs +
interfaces, java w/ data classes, kotlin w/ data classes, rust w/ structs +
traits), but haskell (and other ML languages) have been there for a while.

Yet another key improvement in haskell is the errors-as-values paradigm that
is everywhere. If some function has a possibility of failure, then it _should_
return `Maybe TheThing` or `Either AnError TheThing` (see how nice and legible
those types are?) -- this forces explicit checks on failure and allows cases
where there isn't a chance of failure (just `TheThing`) to speed ahead without
nullchecks and be fairly certain. This actually pressures you into trying to
sequester failure across your codebase -- you try to write functions that have
signatures like `TheThing -> SomeArgument -> OtherThing` (see how legible
that is?), to minimize the amount of `Maybe x` or `Either error x` you have
to deal with -- this is often if not always good for codebases.
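
To make that concrete, here is a small sketch under made-up names (`ParseError`, `parseAge`, `nextYear` are all hypothetical). One function can fail and says so in its type; the other cannot and needs no checks.

```haskell
-- Hypothetical error type, for illustration only.
data ParseError = Empty | NotANumber String
  deriving (Show, Eq)

-- Can fail, so the failure is part of the return type.
parseAge :: String -> Either ParseError Int
parseAge "" = Left Empty
parseAge s
  | all (`elem` ['0' .. '9']) s = Right (read s)
  | otherwise                   = Left (NotANumber s)

-- Cannot fail, so there is nothing to check.
nextYear :: Int -> Int
nextYear age = age + 1

-- The caller has to look at both cases to get at the Int.
describe :: String -> String
describe input = case parseAge input of
  Left err  -> "bad input: " ++ show err
  Right age -> "next year: " ++ show (nextYear age)
```

The failure handling is confined to `describe`; `nextYear` stays a plain `Int -> Int`.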

Maybe this is something I can help with, I write about pedestrian haskell a
bunch, and I've been meaning to do a blog post on why haskell is better than
<your language>, something to really rustle the jimmies.

BTW, the quote about resumable exceptions is actually referring to a concept
called a monad, which can be incredibly hard to grasp if you don't look in the
right places (there are a lot of bad tutorials out there), or don't give your
brain long enough to marinate in the concepts. If I were to take a stab at
explaining it simply, in this case it's like a combination of exceptions-as-
values (i.e. _not_ go's approach, and _not_ java's approach) and the value
that is being passed around has enough state in it to continue, stop, fix
itself, whatever else. When something goes wrong in most imperative languages,
you kind of get the hell out of dodge, and you lose access (usually) to
whatever work was done up until the function boundary -- it doesn't _have_ to
be this way but it usually is.
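
A toy sketch of that idea, and only a sketch: this is not how Semantic implements it, and every name here (`Eval`, `Expr`, `strict`, `lenient`) is invented for the example. The "exception" is an ordinary value that carries the rest of the computation, so the caller picks the policy: abort, or supply a value and carry on.

```haskell
-- A computation either finishes, or pauses at an unbound variable and
-- hands back a continuation that resumes once it is given a value.
data Eval a
  = Done a
  | Unbound String (Int -> Eval a)

-- Toy expression language (hypothetical).
data Expr = Lit Int | Var String | Add Expr Expr

-- Sequencing: the remaining work is kept inside the continuation.
andThen :: Eval a -> (a -> Eval b) -> Eval b
andThen (Done a)      f = f a
andThen (Unbound x k) f = Unbound x (\v -> andThen (k v) f)

eval :: [(String, Int)] -> Expr -> Eval Int
eval _   (Lit n) = Done n
eval env (Var x) = case lookup x env of
  Just v  -> Done v
  Nothing -> Unbound x Done
eval env (Add a b) =
  andThen (eval env a) $ \x ->
    andThen (eval env b) $ \y ->
      Done (x + y)

-- Policy 1: give up at the first unbound variable.
strict :: Eval a -> Either String a
strict (Done a)      = Right a
strict (Unbound x _) = Left ("unbound variable: " ++ x)

-- Policy 2: substitute 0 and resume where evaluation stopped.
lenient :: Eval a -> a
lenient (Done a)      = a
lenient (Unbound _ k) = lenient (k 0)

-- 1 + (y + 2), with y possibly unbound.
example :: Expr
example = Add (Lit 1) (Add (Var "y") (Lit 2))
```

With an empty environment, `strict (eval [] example)` is `Left "unbound variable: y"` while `lenient (eval [] example)` is `3`: the same evaluator, two different handling policies, and the work done before the failure is not thrown away.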

~~~
theamk
It's interesting that any time I read about Haskell, I realize most of the
features can be found in other languages.

Functional programming and lazy evaluation are common in Apache Spark
("analytics engine for large-scale data processing."). You cannot write a good
pipeline if you think in terms of imperative language.

Non-nullable types can be found in Java (@NonNull) and in C++ (references).
C++17 got std::optional type.

We have languages without inheritance, like Go, Rust and so on.

Errors-as-values are pretty common as well (C++'s Boost has had
boost::system::error_code for a while now, and of course there is Go again).

Even monads find themselves in other languages -- using
org.apache.spark.rdd.RDD is pretty close to IO monad.

I find this an unfortunate downside of many Haskell tutorials -- they often
claim there are unique features that are present in Haskell only, but on
closer inspection, it turns out those features are present / can be trivially
added in many other languages as well.

~~~
hardwaresofton
Hey that was kind of my point -- but I think you have it in reverse, Haskell
has had a lot of this stuff _for a long time_ (as in most of them since its
inception), and it's trickling down to other languages now.

But to make some concrete counter points:

\- Apache Spark is not a general purpose programming language (you're totally
right about FP and lazy evaluation being important in DAG-land of course)

\- "Non-nullable types can be found in Java", yeah except them being _the
default_ is the big innovation, along with the recognition of the problem, and
facilitation of the worldview that recognizes the issue. Optional didn't show
up until a few years ago (Java 8?), first class functions weren't a thing
without subclassing till around then too, Function references, Functional
interfaces, etc. I'm less familiar with C++ and it's commendable that it's
adopting new things and people are moving forward, but it's basically the gold
standard of footguns with type-system scopes attached (again I don't write C++
on a daily basis and haven't felt just how much better the new editions are).

\- Go and Rust _learned_ from Haskell, Rust heavily so. BTW these days I'm
more and more of the opinion that Rust is the one more worth praising of the
two.

\- What go does is kind of error as values, but it's also kind of not -- I
mean a near complete lack of use of exceptions at all. The distinction is
subtle, but coding to _always_ handle the error case (because it _is_ the
result) is different from having a sometimes-present error code that you
sometimes check.

\- Again, you're right that Monads are everywhere -- Haskell didn't invent the
concept, but it is one of the places you can go to see it actually used
functionally and _learn_ from what people are doing with it (never mind all
the novel papers).

Haskell is one of the few places that all these features come together to form
a coherent whole.

------
_bxg1
I'm really curious why Haskell has seen so little adoption in industry. Is it
just the difficulty? Or a chicken-and-egg effect with tooling and libraries?

One thing I've wondered is if it actually isn't ideal for a lot of cases. FP
is beautiful for certain things. But in some domains (or pieces of domains),
state and mutation aren't just unfortunate implementation details, but a core
element of the problem space. For these cases, the FP answer is usually
"recompute and replace" (generally with immutable data structures that make
this efficient). This can be syntactically clunky when it's a major part of
your application, and not just a necessary evil to be swept out to the edge.
The most successful languages let you be pure-functional where it makes sense
and then stateful where it makes sense. Haskell doesn't, really (from my
cursory reading about it).

~~~
dmix
It's 100% the difficulty. Anyone saying otherwise is lying because they want
Haskell to be popular, adopted it very early on in their career when they
could really invest in it, or is a natural at this type of stuff and simply
doesn't know any better.

I've learned about 10 different languages now and Haskell easily had the
highest learning curve. In most languages I was able to get semi-productive
within a couple of days, at least to do some minor stuff that works. But Haskell was a
real commitment that took months until I was comfortable doing real stuff.

You have to learn about how to mutate data and how to manage state using
monads and functors, how to query through complex objects using lenses, even
getting it to parse some complex data coming from a JSON feed into a usable
form requires a good understanding of the type system.

But ultimately besides when I first learned Clojure (my first exposure to FP),
I don't think there's a language that has taught me as much about programming
as Haskell. It was very rewarding and something I still continually dabble
with and learn from on the side.

I've yet to pull the trigger and actually build a full side project with
Haskell. Which is typically my biggest test. I've found Erlang/Elixir to be
the far better middle ground from my day-to-day work in Ruby/JS when I want
something modern, fast, and functional.

Once PureScript becomes stable I have a feeling I'll be diving harder into
Haskell and may finally make that full commitment it requires for a real
project.

~~~
_bxg1
You seem like someone who could answer this: why use PureScript over Elm?

~~~
T-R
I've worked in Elm a decent bit, and used PureScript a little.

Elm is a very opinionated language - it's very deliberately missing some
abstraction power (typeclasses), and some functions that are the bread-and-
butter of every functional programmer have steadily been getting removed from
the base libraries, so if you're used to Haskell, you'll find yourself falling
back to duplicating code by hand a lot. Elm also makes certain stylistic
choices into parse errors - "where" clauses are strictly forbidden, and
indentation preferences are strictly enforced. It's basically taken an awkward
edge-case from Haskell's indentation rules and made it not only a requirement,
but a prerequisite to seeing if there are any other errors in your program.
The back-and-forth trying to get the compiler to accept things that would just
work in Haskell, but don't because of someone's stylistic preferences, is
absolutely maddening.

PureScript, from the bit I've used it, is like a strict version of Haskell
with row polymorphism, a feature Haskellers have been hoping for for a while.
I've chosen Elm over PureScript in the past because of PureScript's dependency
on Bower (which I think has changed since then), but that's the only reason.

~~~
pyrale
> indentation preferences are strictly enforced. It's basically taken an
> awkward edge-case from Haskell's indentation rules and made it not only a
> requirement, but a prerequisite to seeing if there are any other errors in
> your program.

Some parsing is less efficient than Haskell because Elm doesn't have 20+ years
of PhDs working on it, but there is no such thing as compiler-enforced
formatting. I can't think of compiler errors regarding format that are the
expression of a choice, as you put it, rather than the expression of less
manpower.

Likewise, `where` clauses aren't forbidden, they are simply not implemented,
which, given that you can already use `let.. in`, is not especially shocking.
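
For readers who haven't seen both forms, a tiny sketch in Haskell (Haskell rather than Elm, since Elm only has the first form). The function names are made up; the two definitions compute the same thing.

```haskell
-- The helper bound with `let ... in`, before its use ...
areaLet :: Double -> Double
areaLet r =
  let square x = x * x
  in  pi * square r

-- ... and with a `where` clause, after its use.
areaWhere :: Double -> Double
areaWhere r = pi * square r
  where
    square x = x * x
```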

~~~
T-R
The omission of where clauses was explicitly a stylistic choice[1].

This is perfectly valid Haskell:

    
    
        #!/usr/bin/env stack
        {- stack script --resolver lts-12.19 -}
        data Test = Test {count :: Int}
        
        test = let
          something = Test {
            count = 5
          }
          in 5+(count something)
        
        main = putStrLn $ show test
    
    

To get the equivalent Elm to compile, it _must_ be indented like this:

    
    
        test = let
                   something = {
                     count = 5
                     }
          in 5+(something.count)
    

Note that `something` _must_ be indented beyond the beginning of `let`, and
the closing curly brace _must_ be indented to the same level as `count`. These
are both not a warning, but a parse error - you can confirm it with Ellie[2].
If that were due to a lack of resources, it would absolutely be
understandable, but this also was an explicit choice[3] which developer time
was spent implementing.

[1]
[https://github.com/elm/compiler/issues/621](https://github.com/elm/compiler/issues/621)

[2] [https://ellie-app.com/5KRg4g5ZMkba1](https://ellie-app.com/5KRg4g5ZMkba1)

[3]
[https://github.com/elm/compiler/issues/1573](https://github.com/elm/compiler/issues/1573)

------
norswap
I don't like these posts (for any language, not only Haskell). Invariably they
will list many points, most of which are subjective or tangential.

There are one or two core points (here I guess it is "Control Flow") that
would benefit from being explored in much more depth. I think you'll find these
points aren't as solid as they appear (cf. other threads in the comments).

Using Haskell is fine. You don't have to justify it. It's enough to like it
and to be comfortable and productive with it.

Own your gut. Don't make pseudo-truth argument lists to justify your
decisions.

~~~
mebassett
I agree with you. It's sufficient to like using a tool and be productive with
the tool to justify using it.

But sometimes, solid engineers who are productive with certain tools are
working at the behest of unenlightened managers. Articles that say "BigCo uses
X for Y because of Scientifically Sounding Reasons" help those engineers use
the tools they like and are productive in.

------
dmix
Seems to be the same reason Facebook used it and one of the well-worn areas
that Haskell has proven itself (language parsing/analysis).

The section about their day-to-day experience of programming serious software
with Haskell compared to other languages is really interesting though.

------
spopejoy
> it's worth mentioning that Semantic, as a rule, does not encounter runtime
> crashes: null pointer exceptions, missing-method exceptions, and invalid
> casts are entirely obviated, as Haskell makes it nigh-impossible to build
> programs that contain such bugs.

This I think is the real drug of Haskell.

There are lots of challenges involved with moving to Haskell for production
software, and the article notes some of them. But once your code builds and
you ship -- man, there is nothing remotely like it. v1.0 software running for
weeks in prod without a bug.

Then, the super-sauce is: maintainers don't break your already-working code.
Sure new code might have bugs, and deep semantic changes to code can break
anything, but workaday fixes to one corner of the codebase simply can't break
all the other corners. v2, v3, v4 all migrate to production with none of the
working stuff falling over.

PS I don't buy that the "control flow" feature couldn't have been done in Java
-- however I would bet huge dollars that you couldn't do it in a way that
wouldn't require massive investment in maintainers understanding a complex and
subtle data-driven control flow pattern that any single dev could easily
subvert, intentionally or not. I was a huge fan of FP-style coding in Java,
but it required buy-in from all maintainers; as soon as somebody got lazy and
went back to mutability etc there was nothing to stop them.

~~~
emilypi
There's another "real drug" hidden in the subtext of your post and Semantic's
post, right in between

"Sure new code might have bugs, and deep semantic changes to code can break
anything..."

and

"Its language features allow concise, correct, and elegant expression of the
data structures and algorithms we work with."

\- the notion that fundamental abstractions (not just the ones somebody thought
up like the Gang of Four) compose well. This is something that we've seen the
industry slowly migrating towards in the form of functional JS flavors like
typescript and purescript, Java's lambdas, Scala's Cats and so on.

The ability to refactor and as importantly be confident that your refactor is
not messing with the semantics of anything upstream is a hidden feature of
static types applied to expressive abstractions.

So far, I have only seen this achieved with Haskell's libraries, type system,
and reliance on fundamental mathematical concepts while also being able to
avoid any "there's something rotten in Denmark" moments. The content in the
Hackage ecosystem is a wonder of the modern programming world, warts and all.

There are many rough edges to the development process, as always, but at least
in Haskell you can limit the language's contributions to that edgeset. It
actually makes you want to produce good code and be a better developer as a
result!

------
Insanity
Apart from being a great language, Haskell also has a great community. I
recommend learning the lang and interacting with the people (irc, #haskell on
freenode for example). :)

------
tottenhm
"Semantic" looks pretty neat. Are there any previous threads or announcements
about Github's goal in developing it (i.e. services for which they plan to use
it)?

~~~
spatulon
I saw on Twitter[1] that they're using it to show which methods/functions
changed in a pull request[2].

[1]:
[https://twitter.com/rob_rix/status/1134537990095720450](https://twitter.com/rob_rix/status/1134537990095720450)

[2]: [https://github.blog/2017-07-26-quickly-review-changed-
method...](https://github.blog/2017-07-26-quickly-review-changed-methods-and-
functions-in-your-pull-requests/)

------
gluegadget
> Editor tooling is sub-par (especially compared to language communities like
> Java and C#) and finicky - we often end up just compiling in a separate
> terminal.

[https://github.com/ndmitchell/ghcid](https://github.com/ndmitchell/ghcid) is
a great tool that does exactly the last part.

------
namelosw
This week I need to do some crazy validation in C#. I wrote a bunch of plain
imperative code. Then I thought I could use applicative validation with LINQ.

I tried to write Maybe, Either and Validation, and wanted to extract an
applicative and monadic interface. Then I found there are huge pitfalls and
difficulties in doing it properly.
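
For comparison, the shape being described is short in Haskell. A minimal sketch with a hand-rolled `Validation` type; the `User` record and the two validators are hypothetical, and real projects would more likely pull the type from a library.

```haskell
-- Like Either, but the Applicative instance accumulates errors
-- instead of stopping at the first one.
data Validation e a = Failure e | Success a
  deriving (Show, Eq)

instance Functor (Validation e) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1
  Success _  <*> Failure e2 = Failure e2
  Success f  <*> Success a  = Success (f a)

-- Hypothetical record to validate.
data User = User { name :: String, age :: Int }
  deriving (Show, Eq)

validateName :: String -> Validation [String] String
validateName n
  | null n    = Failure ["name is empty"]
  | otherwise = Success n

validateAge :: Int -> Validation [String] Int
validateAge a
  | a < 0     = Failure ["age is negative"]
  | otherwise = Success a

-- Every field is checked; all failures are collected.
mkUser :: String -> Int -> Validation [String] User
mkUser n a = User <$> validateName n <*> validateAge a
```

`mkUser "" (-1)` reports both problems at once, `Failure ["name is empty","age is negative"]`, which is the behaviour that distinguishes applicative validation from a short-circuiting `Either`.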

I also tried the same thing in JavaScript; while it's not a safe language,
the equivalent idea is quite easy to express.

The thing I found disturbing is that typical statically typed languages limit
how we can express what we imagine. Every now and then I work on a Java or C#
project, and when I try to express some abstraction I mostly run into
difficulties. When I try to explain those ideas to colleagues, they mostly
have never thought about such things.

On the other hand, the dynamic nature of JavaScript (and of course Ruby,
Python, etc.) makes it easier to express and try new ideas. I can explain
new ideas to programmers of dynamically typed languages much more easily.
There's definitely a "box" for programming languages which prevents people
from thinking "out of the box".

A lot of people tend to think of Haskell as a Perlis language. And I think
statically typed languages without a flexible design are Anti-Perlis
languages.

~~~
lmm
It sounds like you're trying to judge Haskell based on your experience of Java
and C#? Don't do that. The Java/C# type systems have substantial limitations,
but that's not a problem of type systems in general, it's a problem of
C++-family languages specifically.

~~~
namelosw
On the contrary, I'm not judging Haskell and Haskell is one of the most Perlis
languages to me.

I'm judging Java, C#, etc. (statically typed but not flexible) to be
Anti-Perlis because they draw too many boundaries around their users'
imagination.

------
mlazos
The control flow section perplexed me a bit, just because you can embed a DSL
in Haskell with monads doesn’t mean that “control flow isn’t embedded in the
language”. You still have a main entry point and all functions are executed
top to bottom albeit with lazy semantics. You could write a DSL and interpret
it with C# with all of the properties they want. Haskell is better for this
sort of task but their reasons seemed off. Am I misunderstanding that section?

~~~
willtim
Using C# as an example, foreach and IEnumerable are baked into the language as
is try/catch and more recently Async/Await. These are all just library
functions in Haskell and often more general (e.g. forM in Haskell works for
any Monad not just IO). Because they are library functions they can be
changed/customized very easily. In C#, how could foreach be made to support a
different stream type, exceptions be made checked or Async/Await be made to
suspend across stack frames?
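
As a rough illustration of the "works for any Monad" point, here is the same library function `forM` (from `Control.Monad` in base) used in three different monads. The helper names are made up for the sketch.

```haskell
import Control.Monad (forM)

safeRecip :: Double -> Maybe Double
safeRecip 0 = Nothing
safeRecip x = Just (1 / x)

-- In Maybe: the loop yields Nothing as soon as one step fails.
allRecips :: [Double] -> Maybe [Double]
allRecips xs = forM xs safeRecip

-- In the list monad: the same loop enumerates every combination.
coinFlips :: Int -> [[Bool]]
coinFlips n = forM [1 .. n] (\_ -> [False, True])

-- In IO: the familiar "for each" with side effects.
printAll :: [Int] -> IO [()]
printAll xs = forM xs print
```

`allRecips [1, 0]` is `Nothing`, and `coinFlips 2` lists all four outcomes; no language support beyond ordinary functions and the `Monad` class is involved.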

For this reason, Haskell is probably closer to a general purpose language than
many of the imperative systems programming languages (especially Go).

~~~
bakhy
C# has LINQ. Implement Select and SelectMany extension methods for whatever
you like, and you can use the LINQ syntax with your type just as easily as
with IEnumerable. Foreach and async/await are baked in, that's true, but the
LINQ syntax is easily extendable to new use cases.

~~~
ddellacosta
I wonder why this is?

[https://www.infoq.com/interviews/erik-meijer-
linq/](https://www.infoq.com/interviews/erik-meijer-linq/)

(from "3\. How does LINQ work?")

 _What we have done, and what the mathematicians call Monads, we have
identified these sets of operations, we call them standard query operators; we
have a list of about 25 standard operators that you can apply to any data
model._

------
mcny
Request change to

Why Github uses Haskell for its newly released Semantic package

~~~
StavrosK
Why?

~~~
jonahx
Because in its current form the subject and verb don't agree.

~~~
StavrosK
Only in American English. Groups are plural in British English. "The police
are corrupt" vs "the police is corrupt".

~~~
theli0nheart
I have no idea if what you're saying is generally true, but I can say with
certainty that no one would ever say "the police is corrupt" in American
English.

~~~
mlthoughts2018
A better example might be a British soccer club, e.g. Chelsea are corrupt.

------
dev_dull
> _Semantic is a singular project and we often find ourselves at the edges of
> modern computer science research._

This is exactly where I DON'T want to be when creating production software.
If that’s thrilling to you then by all means. As for me, I like my software
boring.

------
nullwasamistake
I don't understand why they didn't use Java with ANTLR. It can also generate
parsers in many other languages, but the Java version supports more advanced
stuff.

There's already parsing support for many languages, and the parser itself is
world-class. It's used internally by tons of systems.
[https://en.m.wikipedia.org/wiki/ANTLR#Projects](https://en.m.wikipedia.org/wiki/ANTLR#Projects)

I mean, I guess Haskell is cool, but their criticism of Java sounds like more of
a design choice than a deal breaker. Did they really need to reinvent this
wheel in an unpopular language where almost nobody can reuse/improve their
work?

I don't mean to be dismissive, but when your plans are to open source
something corporate sponsored, you should do it in a way that benefits the
community significantly. There's 20 other languages they could have chosen
that fulfilled that objective better.

~~~
saghm
> I don't mean to be dismissive, but when your plans are to open source
> something corporate sponsored, you should do it in a way that benefits the
> community significantly

This is a super weird objection to me; is your argument that open sourcing
something that might not be useful to others is worse than just keeping it
closed-source? Even if literally nobody else ever gets any use from this, I
don't see why it's harmful for it to be open sourced. More generally, I feel
like companies releasing everything that they don't have a strong business
reason not to as open source is strictly positive.

~~~
nullwasamistake
More that if you're going to open source something that widely useful, you
should write it in a common language.

Otherwise the public benefit to open sourcing it is less. In this case, the
project is almost as useful as binaries. It's unlikely that many will be able
to integrate it into existing systems or even have the Haskell skills to work
with it.

~~~
saghm
I guess I just have a fundamentally different view on open source than you. I
don't see anything wrong with a company writing a tool in the way that is best
for them given their current circumstances (e.g. skills of the team working on
it) and then open sourcing it. The team that wrote this obviously felt that
Haskell was their best choice for this project, and I don't begrudge them for
open sourcing it just because it might not be useful for others.

------
jrockway
> given Go's lack of exceptions, such a feature would be entirely impossible.

Haskell is a nice programming language, but ultimately if a program written in
language A can run on your computer, a program can be written in language B
that can run on your computer and do the same things.

If you think "return foo, err" is a lot different than returning "Right foo" or
"Left err", then you might want to think more about how you think about
computer programs.

~~~
bjt
"return foo, err" can return 4 kinds of responses:

\- foo, nil

\- nil, error

\- nil, nil

\- foo, error

What's the right behavior when the function returns both a result and an
error? What's the right behavior when it returns two nils? At best, you make
everybody agree to language conventions to prevent that from happening.

In Haskell (or even Swift and Kotlin with their optional types) the compiler
guarantees those cases cannot happen. In Go, you just hope they never happen
because almost no code handles them.

I code in Go every day. But I can't defend its decision here. I'd much rather
have a richer type system where the compiler can guarantee that I'll get an
error or a result, but not both or neither.
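
The four cases can be modelled directly. This is only a sketch: `GoStyle` is a made-up Haskell stand-in for a Go `(value, error)` pair, not anything from either language's standard library.

```haskell
-- A Go-style pair modelled in Haskell: each slot is independently
-- optional, so four shapes exist.
type GoStyle e a = (Maybe a, Maybe e)

-- Either has exactly two shapes: an error or a result.
-- Converting forces a decision about the two awkward cases.
toEither :: GoStyle e a -> Maybe (Either e a)
toEither (Just a,  Nothing) = Just (Right a)
toEither (Nothing, Just e)  = Just (Left e)
toEither (Nothing, Nothing) = Nothing -- neither: no sensible Either
toEither (Just _,  Just _)  = Nothing -- both: no sensible Either

-- Going the other way is total; nothing is ambiguous.
fromEither :: Either e a -> GoStyle e a
fromEither (Right a) = (Just a, Nothing)
fromEither (Left e)  = (Nothing, Just e)
```

With `Either e a` as the return type, the last two equations of `toEither` have no counterpart, so there is no convention to agree on.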

~~~
jrockway
I think you just treat an error being returned as the value being ignored,
which is what 99.9% of go programs do.

As for two nils being returned, I think it's reasonable for a program to run
successfully without returning a result. Search for some work to do, return a
list of tasks -- if there are none, the list is empty (nil) but there was no
problem checking, so there is also no error. I don't see a problem.

As for the compiler checking, try running "func f() (value, error)" like "foo
:= f()". It blows up. What you do with the error is up to you; all Haskell
adds with the Error monad is that it short-circuits to the end of the do {}
block with the error value. No different than a Java exception; all of the
upsides, all of the downsides.
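
For anyone unfamiliar with the short-circuiting being described, a minimal sketch using plain `Either` from the Prelude (`safeDiv` and `calc` are hypothetical names):

```haskell
safeDiv :: Int -> Int -> Either String Int
safeDiv _ 0 = Left "divide by zero"
safeDiv x y = Right (x `div` y)

-- Each step runs only if the previous one returned Right;
-- the first Left becomes the result of the whole block.
calc :: Int -> Int -> Int -> Either String Int
calc a b c = do
  x <- safeDiv a b
  y <- safeDiv x c
  pure (x + y)
```

`calc 100 5 2` is `Right 30`, while `calc 1 0 2` stops at the first division and returns `Left "divide by zero"`.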

Regardless, I stand by my original point that if you want to handle runtime
errors, Haskell doesn't really add anything over any other language. Semantic
has cases that handle errors. So would the program written in any other
language.

The author should have just said, "I wrote it in Haskell because I felt like
it" instead of making up reasons that simply aren't true.

