
Google patents application: Batch normalization layers - wwilson
https://patents.google.com/patent/US20160217368A1/en
======
dynamite-ready
Prior to this, Google also successfully patented dropout layers, is that
right? If so, what's the implication for hobbyists (like myself),
businesses, researchers, etc.? It's hard to imagine not being able to use
such an abstract concept in your code for fear of litigation.

For a layman like myself, it almost sounds like someone patenting the idea of
a chair, or a table.

~~~
mkagenius
If you start to become a revenue threat to Google, they will come after you.
Until then, you should be fine.

~~~
v7p1Qbt1im
Except, Google has never used patents offensively. Or at least I can't recall
them doing it. They mostly patent stuff as a defense against patent trolls
and companies like Oracle.

------
heyitsguay
Well, they attempted to patent batch normalization back in 2015; it looks
like the application status is still pending. Which in some ways is worse,
since the filing date is closer to when the technique was first becoming
popular, and from the description it sounds like they tried to patent the
general computational method, not any specific implementation.

Can people patent computing F(x) when F is just some function with such low
descriptive complexity? Where's the cutoff?
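For reference, the transform in the Ioffe & Szegedy paper (linked downthread) really is that compact. Per feature, over a mini-batch of size m, it is just:

    \mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad
    \sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2,

    \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad
    y_i = \gamma \hat{x}_i + \beta

with a learned scale \gamma and shift \beta per feature.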

~~~
hprotagonist
The Alice Corp. decision really undercut the “do it in a computer” patents, or
at least that’s how I understand it.

[https://www.supremecourt.gov/opinions/13pdf/13-298_7lh8.pdf](https://www.supremecourt.gov/opinions/13pdf/13-298_7lh8.pdf)

------
dynamite-ready
Apparently, the patent has already been granted in Europe, according to this
page: [https://piip.co.kr/en-us/news/batch-normalization-layers-goo...](https://piip.co.kr/en-us/news/batch-normalization-layers-google)

~~~
rerx
This is extremely surprising to me, but they really seem to have been granted
the patent:
[https://register.epo.org/application?number=EP16704121&tab=m...](https://register.epo.org/application?number=EP16704121&tab=main)

Here is the document describing the claim:
[https://register.epo.org/application?documentId=E2DY02O02738...](https://register.epo.org/application?documentId=E2DY02O02738DSU&number=EP16704121&lng=en&npl=false)

~~~
dx034
Did they check it for validity when it was granted, or does that only happen
once someone disputes it? I'm not sure how thorough the checks are upon
application.

------
antpls
I see two possible explanations for this patent:

- Google is fairly open about deep learning development, and they want to
protect the ecosystem from patent trolls. It's a defensive patent, there to
eventually punish unfair players who don't want to play the collaborative
game.

- Google noticed OpenAI, which (legally) built things on some of Google's
findings in the field. Now OpenAI is aiming to become a multi-billion-dollar,
successful "non-profit" company, and Google wants its share of the money if
that happens.

------
DoctorOetker
Isn't this outdated retro technology compared to layer normalization?

>Training state-of-the-art, deep neural networks is computationally expensive.
One way to reduce the training time is to normalize the activities of the
neurons. A recently introduced technique called batch normalization uses the
distribution of the summed input to a neuron over a mini-batch of training
cases to compute a mean and variance which are then used to normalize the
summed input to that neuron on each training case. This significantly reduces
the training time in feed-forward neural networks. However, the effect of
batch normalization is dependent on the mini-batch size and it is not obvious
how to apply it to recurrent neural networks. In this paper, we transpose
batch normalization into layer normalization by computing the mean and
variance used for normalization from all of the summed inputs to the neurons
in a layer on a single training case. Like batch normalization, we also give
each neuron its own adaptive bias and gain which are applied after the
normalization but before the non-linearity. Unlike batch normalization, layer
normalization performs exactly the same computation at training and test
times. It is also straightforward to apply to recurrent neural networks by
computing the normalization statistics separately at each time step. Layer
normalization is very effective at stabilizing the hidden state dynamics in
recurrent networks. Empirically, we show that layer normalization can
substantially reduce the training time compared with previously published
techniques.

[https://arxiv.org/abs/1607.06450](https://arxiv.org/abs/1607.06450)

just forget batch normalization

instead of computing the mean and variance over a mini-batch, compute them
over the summed inputs to all the neurons in a layer, for each training case;
result: now you are using the same function at train and test time, putting
it on a more rigorous mathematical footing, and it's straightforward to apply
to RNNs (a quick NumPy sketch of the difference below)
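A minimal NumPy sketch of the contrast, assuming a 2-D activation matrix of
shape (batch, features): batch norm reduces over the batch axis, layer norm
over the feature axis, so layer norm needs no running batch statistics at
test time:

    import numpy as np

    x = np.random.randn(32, 64)  # activations: (batch, features)
    eps = 1e-5

    # Batch norm: per-feature statistics across the mini-batch (axis 0)
    bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

    # Layer norm: per-example statistics across the layer's units (axis 1),
    # so each example is normalized independently of the rest of the batch
    ln = (x - x.mean(axis=1, keepdims=True)) / \
         np.sqrt(x.var(axis=1, keepdims=True) + eps)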

~~~
p1esk
*Isn't this outdated retro technology compared to layer normalization?*

Yeah, no.

Train ResNet-50 with batch norm, and with layer norm (or weight norm, or
group norm), and you will see batch norm is still the best.
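If anyone wants to run that comparison themselves, here is a minimal sketch,
assuming torchvision's resnet50, which accepts a norm_layer factory that gets
called with the channel count at each normalization site:

    import torch
    from functools import partial
    from torch import nn
    from torchvision.models import resnet50

    # Baseline: torchvision's ResNet-50 uses nn.BatchNorm2d by default
    model_bn = resnet50()

    # GroupNorm with 32 groups, as in the Group Normalization paper
    model_gn = resnet50(norm_layer=partial(nn.GroupNorm, 32))

    # nn.LayerNorm needs the full normalized shape up front, so it doesn't
    # drop in directly; GroupNorm with num_groups=1 (normalizing over all of
    # C, H, W per sample) is the usual stand-in for layer norm in convnets
    model_ln = resnet50(norm_layer=partial(nn.GroupNorm, 1))

    x = torch.randn(8, 3, 224, 224)
    for model in (model_bn, model_gn, model_ln):
        print(model(x).shape)  # torch.Size([8, 1000])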

------
nightcracker
What happened to prior art? Why are all these garbage patents getting
approved?

~~~
dmh2000
the theory and original paper was developed at google (according to
Wikipedia). my guess is that if they had the original tech, then they can
patent it. but by publishing the paper in 2015 they created prior art that
stops others from claiming it. anyone please correct me if this isn't right.

[http://proceedings.mlr.press/v37/ioffe15.pdf](http://proceedings.mlr.press/v37/ioffe15.pdf)

------
hgoel
"Don't be evil" died a while ago, so it's free real estate for then anyway.

------
mlthoughts2018
Why is Google so busy patenting useless layer types?

