
L2 Regularization and Batch Norm - signa11
https://blog.janestreet.com/l2-regularization-and-batch-norm/
======
samgd
Myrtle has a similar series that includes discussion about batch normalisation: [https://myrtle.ai/learn/how-to-train-your-resnet/](https://myrtle.ai/learn/how-to-train-your-resnet/)

[https://twitter.com/dcpage3/status/1141700299071066112](https://twitter.com/dcpage3/status/1141700299071066112)

Disclaimer: I work at Myrtle!

~~~
stared
I looked at it, and it is very good: solid baselines, explanations, and visualizations, and it goes deeper than the typical "it is a black box, but if you copy & paste it will work".

I was surprised by the nice network vis (and I did dive into the subject before: [https://medium.com/inbrowserai/simple-diagrams-of-convoluted...](https://medium.com/inbrowserai/simple-diagrams-of-convoluted-neural-networks-39c097d2925b)). The only thing that looks clunky is the text logs for training (a shameless plug: [https://github.com/stared/livelossplot](https://github.com/stared/livelossplot)).

------
phonebucket
Jane Street is noted for its use of OCaml, so it's interesting to see that its researchers do indeed use Python (judging from the code in that post, at least).

~~~
joshvm
They've also used OCaml for deep learning: [https://blog.janestreet.com/deep-learning-experiments-in-oca...](https://blog.janestreet.com/deep-learning-experiments-in-ocaml/)

Although it seems to be an OCaml binding into TF, rather than a native
implementation.

~~~
stochastic_monk
There’s a similar tch-rs project wrapping libtorch, and in general, declaring neural networks is particularly intuitive in functional languages.

------
ackbar03
Wow, I didn't know Jane Street even posts these kinds of articles. I always assumed they were super secretive.

~~~
ovi256
This is a popularization of things already published in open papers, so it
does not reveal anything specific about their activities. Any place employing
deep ML practitioners could have written this.

It could even be a red herring: the most popular application of batch norm is in deep CNNs, and those are mostly used for computer vision problems. CV does not seem important for option pricing, which is AFAIK Jane Street's big money maker. Of course, I could be very wrong about this. People have tried image data as auxiliary input alongside financial data, and you can apply deep CNNs to 1D data like time series (see WaveNet applied to time series forecasting).
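
On that last point, here's a minimal sketch of the WaveNet building block, a causal dilated 1D convolution, assuming PyTorch; all shapes and sizes are purely illustrative, not from any real trading model:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CausalConv1d(nn.Module):
        def __init__(self, channels, kernel_size=2, dilation=1):
            super().__init__()
            # left-pad so the output at time t only sees inputs at times <= t
            self.pad = (kernel_size - 1) * dilation
            self.conv = nn.Conv1d(channels, channels, kernel_size,
                                  dilation=dilation)

        def forward(self, x):  # x: (batch, channels, time)
            return self.conv(F.pad(x, (self.pad, 0)))

    x = torch.randn(8, 16, 128)   # 8 series, 16 features, 128 time steps
    block = CausalConv1d(16, dilation=4)
    print(block(x).shape)         # torch.Size([8, 16, 128])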

------
1980phipsi
I wasn't familiar with batch normalization before, but I've had to do something similar in Stan to enforce that some model parameters (not data) had exactly mean 0 and standard deviation 1.
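
For anyone else new to it: the core of batch norm is exactly that standardization, applied per mini-batch and per feature, followed by a learned scale and shift. A minimal NumPy sketch (not the blog's code):

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        # x: (batch, features); standardize each feature over the batch
        mu = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mu) / np.sqrt(var + eps)
        # learned scale (gamma) and shift (beta) restore expressiveness
        return gamma * x_hat + beta

    x = np.random.randn(32, 4) * 3.0 + 7.0
    y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
    print(y.mean(axis=0), y.std(axis=0))  # ~0 and ~1 per feature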

------
SethTro
David Wu wrote KataGo, which is doing new and exciting stuff in Go AI. He's writing a ton of code and performing even more experiments.

~~~
nestorD
Have you got a reference for KataGo?

edit: never mind: [https://blog.janestreet.com/accelerating-self-play-learning-...](https://blog.janestreet.com/accelerating-self-play-learning-in-go/)

------
bmh
Excellent article! I'd be really curious to see a treatment of the same topic, but with Adam.

~~~
jing
[https://www.fast.ai/2018/07/02/adam-weight-decay/](https://www.fast.ai/2018/07/02/adam-weight-decay/)
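
The gist of that article: with Adam, adding an L2 penalty to the loss is no longer equivalent to decaying the weights directly, because the penalty's gradient gets rescaled by Adam's per-parameter adaptive step sizes; decoupled weight decay (AdamW) restores true decay. A minimal PyTorch illustration of the two options (model and hyperparameters are placeholders):

    import torch

    model = torch.nn.Linear(10, 1)

    # L2-as-gradient-penalty: the decay term passes through Adam's
    # adaptive scaling, so it is not a true weight decay
    opt_l2 = torch.optim.Adam(model.parameters(), lr=1e-3,
                              weight_decay=1e-2)

    # Decoupled weight decay (AdamW): weights are shrunk directly,
    # independent of the adaptive gradient scaling
    opt_wd = torch.optim.AdamW(model.parameters(), lr=1e-3,
                               weight_decay=1e-2)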

