You mean like...both of the languages you listed?
There's an obviously superior, faster, simpler language when working with vectors (APL), but people are obsessed with new languages.
If you really think it can be done in Python better than in Haskell, why not demo it in Python? You'll get internet points, and if you're right, you'll have something to show for it.
Not OP, but here: https://gist.github.com/stfwn/62e51d86ca4ff155becd3c6a14adf6...
You should be able to wget the file and run it (Python 3) from start to finish without any set-up and get ~88% accuracy on the test set.
It uses all the data (not one-sixth like in the blog posts) and does 200 iterations by default, so here's the loss plot on the training set if you want to skip all the fun: https://i.imgur.com/F57zmXV.png
Of course, I’m more familiar with Haskell. The fact that you’re more familiar with imperative languages isn’t the argument for readability you think it is.
The best way to understand functional programming without learning it first is to learn an imperative language like Ruby, one that everyone says is easy but that is actually hard. Then it's:
programming you know -> experience you don't have yet -> code everyone uses
Ruby makes the jump to the last step without explaining the middle one. That's what the whole convention over configuration part is about.
Functional programming languages do the exact same thing. All of those arrows and symbols and syntactic sugar transpile directly to Lisp. They're basically shorthand. Unfortunately, I've never seen tutorials talk about that translation.
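The "shorthand" claim is easiest to see with do-notation, which the compiler mechanically rewrites into plain nested function application before compiling further (into GHC Core rather than Lisp, but the spirit is the same). A minimal sketch of that translation:

```haskell
-- Do-notation is sugar for nested (>>=) and lambdas; the two
-- definitions below are the same program before and after desugaring.
sugared :: Maybe Int
sugared = do
  x <- Just 2
  y <- Just 3
  return (x * y)

desugared :: Maybe Int
desugared = Just 2 >>= \x -> Just 3 >>= \y -> return (x * y)

main :: IO ()
main = print (sugared == desugared)
```

Both evaluate to `Just 6`, since each `<-` becomes a `>>=` with a lambda binding the same name.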
To get the middle part, it's probably best to start with the low-hanging fruit. Probably learn something like spreadsheets, then Scheme, then ClojureScript, then F#. I never made it as far as Haskell or Scala.
I always get lost somewhere around monads and impurity. And all FP languages fall down at that point in similar ways anyway. You either treat mutable variables as imaginary numbers that aren't examined until they must be (lazily), or throw the rules out the window and let variables be reassigned or renamed to themselves with new values, which breaks the whole point of using FP in the first place. It's pretty much an open problem, and the failure to solve it satisfactorily is why no FP language has caught on yet in the mainstream IMHO.
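For concreteness, the first approach described above can be sketched in Haskell: "mutation" is simulated by threading state through pure functions, which is exactly what the standard `Control.Monad.State` (from the mtl package) packages up. A minimal hand-rolled version, so the snippet needs no libraries:

```haskell
-- A tiny hand-rolled State monad: a stateful computation is just a
-- pure function from an initial state to (result, final state).
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad (State s) where
  State g >>= f = State $ \s -> let (a, s') = g s in runState (f a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- Reads like imperative code that mutates a counter, but nothing is
-- ever overwritten: each step produces a fresh state value.
tick :: State Int Int
tick = do
  n <- get
  put (n + 1)
  return n

main :: IO ()
main = print (runState (tick >> tick >> tick) 0)  -- (2,3)
```

Whether this counts as a satisfying solution or as bookkeeping that a compiler should hide is, of course, exactly the disagreement above.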
deltas :: [Float] -> [Float] -> [([Float], [[Float]])] -> ([[Float]], [[Float]])
deltas xv yv layers = let
  (avs@(av:_), zv:zvs) = revaz xv layers
  delta0 = zipWith (*) (zipWith dCost av yv) (relu' <$> zv)
  in (reverse avs, f (transpose . snd <$> reverse layers) zvs [delta0]) where
    f _ [] dvs = dvs
    f (wm:wms) (zv:zvs) dvs@(dv:_) = f wms zvs $ (:dvs) $
      zipWith (*) [sum $ zipWith (*) row dv | row <- wm] (relu' <$> zv)

descend av dv = zipWith (-) av ((eta *) <$> dv)

learn :: [Float] -> [Float] -> [([Float], [[Float]])] -> [([Float], [[Float]])]
learn xv yv layers = let (avs, dvs) = deltas xv yv layers
  in zip (zipWith descend (fst <$> layers) dvs) $
     zipWith3 (\wvs av dv -> zipWith (\wv d -> descend wv ((d*) <$> av)) wvs dv)
       (snd <$> layers) avs dvs
Just add intermediate expressions and annotate their types, maybe even with some type synonyms for intermediate types, because code is for humans.
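As one possible sketch of that suggestion applied to the snippet above (the synonym names here are my own invention, not from the original post):

```haskell
-- Type synonyms that name the intermediate shapes in the network code.
type Activations  = [Float]     -- one output vector per layer
type Biases       = [Float]
type Weights      = [[Float]]   -- one row of input weights per neuron
type Layer        = (Biases, Weights)
type LearningRate = Float

-- The original signatures, restated with the synonyms (as comments,
-- since the definitions live in the quoted code):
-- deltas :: Activations -> Activations -> [Layer] -> ([Activations], [Activations])
-- learn  :: Activations -> Activations -> [Layer] -> [Layer]

-- descend, with the free variable eta lifted into a named parameter:
descend :: LearningRate -> Activations -> [Float] -> Activations
descend eta av dv = zipWith (-) av ((eta *) <$> dv)

main :: IO ()
main = print (descend 0.1 [1.0, 2.0] [1.0, 1.0])
```

The synonyms erase to the same types at compile time, so this is pure documentation: the signatures become readable without chasing `[[Float]]` through the code.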
As a Haskell developer, I'd be more comfortable reading Haddock-generated documentation for the types of these symbols; unfortunately, Haskell doesn't have many good-quality libraries in the ML field yet.
The Julia language with the Flux deep learning library is another very interesting but not mainstream path to take.
Google has a fairly sizable team around it, it has a fast path to adoption via the large pool of Swift devs and fast.ai, and of course there's the Google hype.
Chris leaving doesn't seem to be an issue: https://twitter.com/clattner_llvm/status/1222032740897284097
The Haskell bindings for TensorFlow are a little difficult for me to work with. Once Hasktorch gets more mature and stable, it will hopefully be easier to use than the TensorFlow bindings.
Live edit code: https://stackblitz.com/edit/deep-learning
New frameworks like Swift for TensorFlow and Julia’s Flux are a little easier to understand if you read the code, but it’s still complex stuff.
Seems like a classic case of overfitting to be honest.