Building a neural network from scratch in Haskell (stanford.edu)
137 points by allenleein on Feb 13, 2020 | hide | past | favorite | 36 comments

Maybe it's just me, but this looks like a bunch of incomprehensible gibberish code. I'm sure I could understand it if I spent many hours poring over it, but why would you ever do that? When written out in Python (or even in JS!), the whole thing is so much simpler looking and more closely resembles the underlying math. This is especially true if you use a package like Numpy. I know he said he didn't want to use any libraries, but if the underlying primitives in the problem are vectors/matrices, then it seems like you are reinventing the wheel in a very substandard way that doesn't aid in understanding in any way and results in something that isn't beautiful, isn't high performance, and is confusing for someone to read, even if that person is familiar with the subject matter!
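For what it's worth, here's the shape of what I mean. A forward pass in NumPy reads almost like the math. This is a hedged sketch: the single hidden layer, the layer sizes, and all names are mine for illustration, not taken from the article.

```python
import numpy as np

# Illustrative sizes: 784 inputs (28x28 pixels), 30 hidden units, 10 classes.
rng = np.random.default_rng(0)
W1, b1 = 0.01 * rng.standard_normal((784, 30)), np.zeros(30)
W2, b2 = 0.01 * rng.standard_normal((30, 10)), np.zeros(10)

def relu(z):
    # Elementwise max(z, 0), exactly as written in the math.
    return np.maximum(z, 0)

def forward(x):
    z1 = x @ W1 + b1   # hidden pre-activation
    a1 = relu(z1)      # hidden activation
    z2 = a1 @ W2 + b2  # output scores
    return z1, a1, z2
```

The matrix products and elementwise ops map one-to-one onto the equations, which is the readability argument in a nutshell.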

> but if the underlying primitives in the problem are vectors/matrices, then it seems like you are reinventing the wheel in a very substandard way that doesn't aid in understanding in any way and results in something that isn't beautiful, isn't high performance, and is confusing for someone to read

You mean like...both of the languages you listed?

There's an obviously superior, faster, simpler language when working with vectors (APL), but people are obsessed with new languages.

If you really think it can be done in Python better than in Haskell, why not demo it in Python? You'll get internet points, and if you're right, you'll have something to show for it.

> Why not demo it in Python?

Not OP, but here: https://gist.github.com/stfwn/62e51d86ca4ff155becd3c6a14adf6...

You should be able to wget the file and run it (Python 3) from start to finish without any set-up and get ~88% accuracy on the test set.

It uses all the data (not one-sixth like in the blog posts) and does 200 iterations by default, so here's the loss plot on the training set if you want to skip all the fun: https://i.imgur.com/F57zmXV.png

This will probably shock you not at all, but I find this way easier to read than Python code. You can’t really compare with Numpy code since the author deliberately avoids loading libraries.

Of course, I’m more familiar with Haskell. The fact you’re more familiar with imperative languages isn’t the argument for readability you think it is.

If someone told me they wrote a neural net in Haskell, my first reaction would be "wow, that's cool!" After all, I don't think I can do that. I wouldn't care if the code was gibberish.

The scarcity of comments and type signatures, along with the slightly odd formatting and syntax highlighting there, makes it less readable.

I agree that this would look more readable if they used a library for the vector math, instead of doing everything by hand.

I've never found a functional programming language that's easy to read.

The best way to understand functional programming without learning it is to learn an imperative language like Ruby that everyone says is easy but that is actually hard. Then it's:

programming you know -> experience you don't have yet -> code everyone uses

Ruby makes the jump to the last step without explaining the middle one. That's what the whole convention over configuration part is about.

Functional programming languages do the exact same thing. All of those arrows and symbols and syntactic sugar transpile directly to Lisp. They're basically shorthand. Unfortunately, I've never seen tutorials talk about that translation.

To get the middle part, it's probably best to start with the low-hanging fruit. Probably learn something like spreadsheets, then Scheme, then ClojureScript, then F#. I never made it as far as Haskell or Scala.

I always get lost somewhere around monads and impurity. And all FP languages fall down at that point in similar ways anyway. You either treat mutable variables as imaginary numbers that aren't examined until they must be (lazily), or throw the rules out the window and let variables be reassigned or renamed to themselves with new values, which breaks the whole point of using FP in the first place. It's pretty much an open problem, and the failure to solve it satisfactorily is why no FP language has caught on yet in the mainstream IMHO.
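Concretely, the "reassigned or renamed to themselves with new values" option is just explicit state threading: pure functions return a new value and the caller rebinds the name, instead of mutating in place. A toy sketch, not tied to any particular FP language:

```python
# Imperative style: the state object is mutated in place.
def bump_in_place(state):
    state["n"] += 1

# Functional style: no mutation; a new value is returned and the
# caller rebinds the name to it ("renamed to itself with a new value").
def bump(n):
    return n + 1

n = 0
n = bump(n)  # rebinding, not mutation: the old 0 was never changed
```

Whether that rebinding "breaks the whole point" of FP is exactly the disagreement here; the languages differ mainly in whether the compiler forces you to thread state explicitly (e.g. via a state monad) or lets you shadow names freely.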

I love Haskell as much as the next PL nerd, but the community has a real code-golf problem. An example from the blog post:

    deltas :: [Float] -> [Float] -> [([Float], [[Float]])] -> ([[Float]], [[Float]])
    deltas xv yv layers = let
      (avs@(av:_), zv:zvs) = revaz xv layers
      delta0 = zipWith (*) (zipWith dCost av yv) (relu' <$> zv)
      in (reverse avs, f (transpose . snd <$> reverse layers) zvs [delta0]) where
        f _ [] dvs = dvs
        f (wm:wms) (zv:zvs) dvs@(dv:_) = f wms zvs $ (:dvs) $
          zipWith (*) [(sum $ zipWith (*) row dv) | row <- wm] (relu' <$> zv)

    descend av dv = zipWith (-) av ((eta *) <$> dv)

    learn :: [Float] -> [Float] -> [([Float], [[Float]])] -> [([Float], [[Float]])]
    learn xv yv layers = let (avs, dvs) = deltas xv yv layers
      in zip (zipWith descend (fst <$> layers) dvs) $
        zipWith3 (\wvs av dv -> zipWith (\wv d -> descend wv ((d*) <$> av)) wvs dv)
          (snd <$> layers) avs dvs

Writing this in 2-3x as many lines, with clear variable names for some of the intermediate expressions, would make it so much clearer. Haskell has a nasty reputation for "you have to study the shit out of it to make heads or tails of the code", and I'm pretty certain that 90% of it comes from how terse Haskellers try to make their code.

Just add intermediate expressions and annotate their types, maybe even with some type synonyms for intermediate types, because code is for humans.
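To illustrate, take the `descend` one-liner above, transliterated with named intermediates. This is a loose Python sketch of the same step; `eta = 0.002` is a placeholder learning rate (use whatever the post defines), and weights stay plain lists of floats as in the post:

```python
eta = 0.002  # placeholder learning rate, not necessarily the post's value

def descend(weights, gradients):
    """One gradient-descent step: w <- w - eta * dw, elementwise.

    Same computation as the Haskell `zipWith (-) av ((eta *) <$> dv)`.
    """
    scaled = [eta * d for d in gradients]            # (eta *) <$> dv
    return [w - s for w, s in zip(weights, scaled)]  # zipWith (-)
```

The point isn't Python vs. Haskell; the exact same naming could be done in Haskell with a `let` or `where` block, and it would read just as well.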

All these posts build NNs in Haskell, OCaml, or another language to solve an extremely simple problem, which is fine if you just want to have some fun. But if the proponents are really serious, they would put out a detailed tutorial solving a complex task (e.g. building a state-of-the-art language model), showing the ease or difficulty of the process and how using a typed functional language helps in debugging the model, which is supposed to be the biggest selling point of these languages. That would also show whether it is viable for a real-life practitioner to invest time in learning these languages/frameworks.

People have done this, but with Hasktorch, which provides Haskell bindings to the PyTorch C++ libraries. The static typing definitely helps when you want to ensure that the input/output shapes of a layer are consistent.

Can you please point to such a tutorial? I am interested.

There's no proper tutorial AFAIK. Just examples in the hasktorch repo.

Learning the basics has its place. People have to start somewhere. A tutorial for more complex, real world problems is a totally different thing. Both are interesting in their own ways.

If you have no background, why would you want to learn the basics in a language no one uses rather than the dominant language / framework of the specific domain? Seems to me an extra cognitive load to bear.

I am learning the basics of neural nets using Haskell because it’s far easier for me to reason precisely and mathematically with Haskell than with a referentially opaque and imprecise language such as Python.

Because it only takes one user to make a language more popular. For example, Grammarly's first developer used Lisp as his main language, and their linguistics code is still written in a Lisp-family language.

As a Haskell developer, I'd be more comfortable reading Haddock-generated documentation for the types of symbols; unfortunately, Haskell doesn't have many good-quality libraries in the ML field yet.

I have written quite a lot of Haskell in my life, but never a line of Python. For me this tutorial makes sense; it shows the underlying maths. So I don't really understand your negative attitude.

fast.ai did this with Swift in their course: https://course.fast.ai/part2

I have been casually playing around with Swift for TensorFlow since it was announced. It shows promise, but I question its long-term support and development.

The Julia language with the Flux deep learning library is another very interesting but not mainstream path to take.

What are your concerns around support and development?

Google has a fairly sizable team around it; it has a fast path to adoption via the large pool of Swift devs and fast.ai, and of course, Google hype.

Chris leaving doesn't seem to be an issue: https://twitter.com/clattner_llvm/status/1222032740897284097

Google has kind of a reputation for killing projects.

I hope there is no long term support problem, but I don’t know. I am sort-of addicted to Lisp languages, but I admit there is a lot to like about Swift, and TensorFlow turtles all the way down in Swift is a great idea.

Not really my scene, but I've been liking the look of Torsten Scholak's Hasktorch:


I think that Hasktorch is a very cool project but it is not turtles all the way down: it is a Haskell API on top of PyTorch.

The Haskell bindings for TensorFlow are a little bit difficult for me to work with. When HaskTorch gets more mature and stable, it will hopefully be easier to use than the TensorFlow bindings.

In TypeScript (from scratch, recognizing hand-written digits in the browser):

Live: https://deep-learning.stackblitz.io/

Live edit code: https://stackblitz.com/edit/deep-learning

Nice. I really recommend that practitioners implement simple neural architectures from scratch for learning, but use TensorFlow, PyTorch, MXNet, etc. for production and serious research.

New frameworks like Swift for TensorFlow and Julia's Flux are a little easier to understand if you read the code, but it's still complex stuff.

Played with this for ~5 mins and it's insanely bad (maybe it "guessed" right on 3/15 tries?), i.e. only slightly better than random guessing, even with very clear handwriting :(

You had bad luck I guess. I just did 15 tries and it only got one wrong. Not really sure what this says though.

Are you talking about the "Sample" button, or drawing the digits yourself? It seems to have a very good accuracy on the samples, but gets a lot of my hand-drawn digits wrong.

Seems like a classic case of overfitting to be honest.

Ah I didn’t know you could draw yourself. That explains the difference.

The book linked at the top of the article is a great read[1]. Very easy to follow along if you want to do a similar thing yourself.

[1] http://neuralnetworksanddeeplearning.com/

Where is 8? I flipped the sample like 100+ times. Didn't encounter a single #8. I guess Haskell bytes back sometimes.

If this is a failing of the language, then at least blame the right one.


This is really not a good look for Haskell.
