Google patents application: Batch normalization layers (patents.google.com)
63 points by wwilson 15 days ago | 32 comments

Prior to this, Google also successfully patented dropout layers, is that right? This being the case, what's the implication for hobbyists (like myself), businesses and researchers, etc? It's hard to imagine not being able to use such an abstract concept in your code, for fear of litigation.

For a layman like myself, it almost sounds like someone patenting the idea of a chair, or a table.

If your interests ever run counter to Google's in the future, and they decide it's worth going after you, they can sue you for using those layers.

For most people it probably doesn't matter. Other big tech companies need to take notice.

It feels like there should be a change in international IP laws to address patents as broad as this. I mean, in this case, even without expert analysis, I feel these network layer patterns can quite easily be likened to the widely used Gang of Four design patterns, or to other concepts with a highly technical basis. Like lenses, for example.

More precisely, network layer pattern descriptions surely lie in the class of a fact or discovery - https://www.prv.se/en/patents/applying-for-a-patent/before-t... - which should make the claim close to unenforceable at any level.

Unless the patent we're discussing covers a very specific implementation, which is uniquely efficient.

>> Other big techs need to take notice

Megacorps have nothing to worry about. They have enormous patent war chests and routinely violate each other's patents without even thinking twice. Engineers at such companies (including Google) are discouraged from even looking at patents, ever, not to mention deliberately researching anything patent related.

Correct. At my Fortune 100 tech company, engineers are told not to look up patents, and at my level we're asked to generate two patents a year just to keep the war chest full.

If you start to become a revenue issue for Google, they will come after you. Until then you should be fine.

Except, Google has never used patents offensively. Or at least I can't recall them doing it. They mostly just patent stuff for defense against patent trolls and companies like Oracle.


That probably depends greatly on some of the programs listed on the page above (check out the one at the very bottom; it suggests to my non-lawyer mind that Google won't do so unless you sue them over patents first). Again, not a lawyer.

Disclosure: work at Google, do not speak for Google. Just helping with pointing out public info.

How would anyone be able to tell if you train with dropout layers?

Or with batchnorm for that matter. The multiplicative part of BN can be folded into weights after training, and the additive portion can be folded into bias. Nobody would know you used BN in the first place. Dropout disappears altogether at inference time.
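To make the folding concrete, here is a minimal numpy sketch (all names hypothetical) of absorbing a trained batch-norm layer's running statistics and affine parameters into the preceding linear layer, leaving a plain weight matrix and bias with no visible trace of BN:

```python
import numpy as np

rng = np.random.default_rng(0)

# A trained linear layer followed by batch norm in inference mode,
# using running statistics (mu, var) and affine parameters (gamma, beta).
W = rng.normal(size=(4, 3))                                  # 3 inputs -> 4 outputs
b = rng.normal(size=4)
mu, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)  # running stats
gamma, beta = rng.normal(size=4), rng.normal(size=4)         # BN scale/shift
eps = 1e-5

def linear_bn(x):
    y = W @ x + b
    return gamma * (y - mu) / np.sqrt(var + eps) + beta

# Fold BN into the linear layer: scale each output row of W, adjust b.
s = gamma / np.sqrt(var + eps)
W_folded = s[:, None] * W
b_folded = s * (b - mu) + beta

def linear_folded(x):
    return W_folded @ x + b_folded

x = rng.normal(size=3)
assert np.allclose(linear_bn(x), linear_folded(x))
```

The deployed model is then just `W_folded` and `b_folded`, indistinguishable from a network trained without BN.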

Most of the time, litigation is just to harass and tire the weaker opponent; the truth may not even matter.

Hopefully they would include it in their Open Patent Non-Assertion Pledge:


Well, that is also Google's fear: that some patent troll will come along and patent the tech.

You don't need to patent something to prevent others from patenting it, prior art prevents patenting by itself.

The existence of Google's patents on things like these is proof otherwise.

Yes, it is sometimes possible to prevent a patent from being filed with prior art, but you need to know about the application to provide the prior art, or you need to hope the patent examiner does due diligence, which they often don't.

So you have two options:

1. Monitor all patent applications and reactively file prior art against any that might be problematic.

2. Just file patents on the things

Patent examiners are more likely to find patents than other forms of prior art, so by doing (2) you largely accomplish the goal of (1) anyway. And in a lawsuit, you don't need to spend time proving that your patented thing was prior art, so option (2) is cheaper there too.

In other words, patents prevent patenting, prior art avoids losing lawsuits, but not-losing a lawsuit is still expensive.

Well, Google attempted to patent batch normalization back in 2015, and it looks like the application status is still pending. Which in some ways is worse, since 2015 is closer to when it was first becoming popular, and from the description it does sound like they tried to patent the general computational method, not any specific implementation.

Can people patent computing F(x) when F is just some function with such low descriptive complexity? Where's the cutoff?
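For context, the transform in question really is only a few lines. A numpy sketch of the training-time forward pass (parameter names are mine, not from the patent):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch x of shape (batch, features) to zero mean
    and unit variance per feature, then scale and shift with learned
    per-feature parameters gamma and beta."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(size=(8, 3))
y = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
# With gamma=1, beta=0 the output has ~zero mean and ~unit variance per feature.
assert np.allclose(y.mean(axis=0), 0, atol=1e-6)
```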

The Alice Corp. decision really undercut the "do it on a computer" patents, or at least that's how I understand it.


Such patents include the RSA patent, so I think it is on pretty solid ground.

Apparently, the patent has already been granted in Europe according to this page - https://piip.co.kr/en-us/news/batch-normalization-layers-goo...

This is extremely surprising to me, but they really seem to have been granted the patent: https://register.epo.org/application?number=EP16704121&tab=m...

Here is the document describing the claim: https://register.epo.org/application?documentId=E2DY02O02738...

Did they check it for validity when it was granted, or will that only happen once someone disputes it? I'm not sure if checks are that thorough upon application.

I was under the impression that Europe didn't allow for software patents.

That's true, but batch normalization is not software any more than multiplication is.

Are you saying that Europe allows patents on pure math?

No that's not what I'm saying

I see two possible explanations to this patent :

- Google is rather open about deep learning development, they want to protect the ecosystem from patent trolls. It is a defensive patent to eventually punish unfair players who don't want to play the collaborative game

- Google noticed OpenAI, which (legally) built stuff using some of Google's findings in the field. Now OpenAI is aiming at becoming a successful multi-billion-dollar "non-profit" company, and Google wants its share of the money if that happens.

Isn't this outdated retro technology compared to layer normalization?

> Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which are then used to normalize the summed input to that neuron on each training case. This significantly reduces the training time in feed-forward neural networks. However, the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent neural networks. In this paper, we transpose batch normalization into layer normalization by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case. Like batch normalization, we also give each neuron its own adaptive bias and gain which are applied after the normalization but before the non-linearity. Unlike batch normalization, layer normalization performs exactly the same computation at training and test times. It is also straightforward to apply to recurrent neural networks by computing the normalization statistics separately at each time step. Layer normalization is very effective at stabilizing the hidden state dynamics in recurrent networks. Empirically, we show that layer normalization can substantially reduce the training time compared with previously published techniques.


Just forget batch normalization.

Instead of computing mean and variance over a batch, compute mean and variance over all the summed inputs to the neurons in a layer, for each training case. Result: you now use the same function at train and test time, putting it on a more rigorous mathematical footing, and it's straightforward to apply to RNNs.
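As a numpy sketch, the only change from batch norm is the axis the statistics are taken over, which is why train and test behave identically (names are mine, for illustration):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize x of shape (batch, features) per training case:
    statistics over the feature axis, so there is no dependence on
    the batch and no separate inference-time behavior."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

x = np.random.default_rng(0).normal(size=(2, 5))
y = layer_norm(x, gamma=np.ones(5), beta=np.zeros(5))
# Each row (training case) is normalized independently.
assert np.allclose(y.mean(axis=-1), 0, atol=1e-6)
```

Because each row is normalized on its own, the same function also applies at each time step of an RNN.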

*Isn't this outdated retro technology compared to layer normalization?*

Yeah, no.

Train ResNet-50 with batch norm, and with layer norm (or weight norm, or group norm), and you will see batch norm is still the best.

What happened to prior art? Why are all these garbage patents getting approved?

The theory and original paper were developed at Google (according to Wikipedia). My guess is that since they had the original tech, they can patent it, but by publishing the paper in 2015 they created prior art that stops others from claiming it. Anyone please correct me if this isn't right.


"Don't be evil" died a while ago, so it's free real estate for then anyway.

Why is Google so busy patenting useless layer types?
