For a layman like myself, it almost sounds like someone patenting the idea of a chair, or a table.
For most people it probably doesn't matter. Other big tech companies need to take notice, though.
More precisely, network layer pattern descriptions surely lie in the class of a fact or discovery - https://www.prv.se/en/patents/applying-for-a-patent/before-t... - which should make the claim close to unenforceable at any level.
Unless the patent we're discussing covers a very specific implementation, which is uniquely efficient.
Megacorps have nothing to worry about. They have enormous patent war chests and routinely violate each other's patents without even thinking twice. Engineers at such companies (including Google) are discouraged from even looking at patents, ever, not to mention deliberately researching anything patent related.
That probably depends a great deal on some of the programs listed on the page above (check out the one at the very bottom; to my non-lawyer mind it suggests Google won't do so unless you sue them over patents first). Again, not a lawyer.
Disclosure: I work at Google and do not speak for Google. Just pointing out public info.
Yes, it is sometimes possible to prevent a patent from being granted by citing prior art, but you need to know about the application to submit that prior art, or you need to hope the patent examiner does due diligence, which they often don't.
So you have two options:
1. Monitor all patent applications and reactively file prior art against any that might be problematic.
2. Preemptively file patents on the things yourself.
Patent examiners are more likely to find patents than other forms of prior art, so by doing (2) you're more likely to accomplish the goal of (1) anyway. And in a lawsuit you don't need to spend time proving that your thing was prior art if it was patented, so it's cheaper there too.
In other words, patents prevent patenting, prior art avoids losing lawsuits, but not-losing a lawsuit is still expensive.
Can people patent computing F(x) when F is just some function with such low descriptive complexity? Where's the cutoff?
Here is the document describing the claim: https://register.epo.org/application?documentId=E2DY02O02738...
- Google is rather open about deep learning development and wants to protect the ecosystem from patent trolls. It is a defensive patent, meant to eventually punish unfair players who don't want to play the collaborative game.
- Google noticed OpenAI, which (legally) built things using some of Google's findings in the field. Now OpenAI is aiming to become a multi-billion-dollar, successful "non-profit" company, and Google wants its share of the money if that happens.
>Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which are then used to normalize the summed input to that neuron on each training case. This significantly reduces the training time in feed-forward neural networks. However, the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent neural networks. In this paper, we transpose batch normalization into layer normalization by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case. Like batch normalization, we also give each neuron its own adaptive bias and gain which are applied after the normalization but before the non-linearity. Unlike batch normalization, layer normalization performs exactly the same computation at training and test times. It is also straightforward to apply to recurrent neural networks by computing the normalization statistics separately at each time step. Layer normalization is very effective at stabilizing the hidden state dynamics in recurrent networks. Empirically, we show that layer normalization can substantially reduce the training time compared with previously published techniques.
just forget batch normalization
instead of computing the mean and variance over a mini-batch, compute them over all the summed inputs to the neurons in a layer, for a single training case; result: you now use exactly the same function at train and test time, which puts it on a more rigorous mathematical footing, and it's straightforward to apply to RNNs (see the sketch below)
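Roughly, the difference is just which axis you take the statistics over. A minimal NumPy sketch, not the paper's code; the function names and epsilon are my own, and the learned gain/bias from the paper are included for completeness:

    import numpy as np

    def batch_norm(x, eps=1e-5):
        # x: (batch, features). One mean/variance per feature, computed across the
        # mini-batch, so the output for one example depends on the other examples.
        mean = x.mean(axis=0, keepdims=True)
        var = x.var(axis=0, keepdims=True)
        return (x - mean) / np.sqrt(var + eps)

    def layer_norm(x, gain, bias, eps=1e-5):
        # x: (batch, features). One mean/variance per example, computed across that
        # example's own summed inputs, so train and test do exactly the same thing.
        mean = x.mean(axis=1, keepdims=True)
        var = x.var(axis=1, keepdims=True)
        return gain * (x - mean) / np.sqrt(var + eps) + bias

    x = np.random.randn(32, 128)   # 32 training cases, 128 summed inputs in the layer
    out = layer_norm(x, gain=np.ones(128), bias=np.zeros(128))
    print(out.mean(axis=1)[:3], out.std(axis=1)[:3])   # ~0 mean, ~1 std per example

Note that a real batch_norm also has to keep running averages for test time, which is exactly the train/test mismatch that layer norm avoids.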
Train ResNet-50 with batch norm, and with layer norm (or weight norm, or group norm), and you will see batch norm is still the best.