
Score matching with Langevin Sampling: a new contender to GANs - sebg
https://ajolicoeur.wordpress.com/the-new-contender-to-gans-score-matching-with-langevin-sampling/
======
Fede_V
Alexia is one of the most original researchers working on GANs: she invented
the relativistic discriminator (a brilliant and obvious idea in hindsight:
[https://arxiv.org/abs/1807.00734](https://arxiv.org/abs/1807.00734)) which is
one of the easiest tweaks you can make to instantly boost your GAN results.

~~~
derefr
There's gotta be a better word for this than "obvious."

The key thing about ideas that are "brilliant and obvious in hindsight" is
that the world was already ready for them, and so nothing needed to change for
them to happen; i.e., they don't have any prerequisites that aren't already in
place. They "just" needed someone to actually notice that there were some
pieces that could be fit together in a novel way.

There's no word I know of that captures this idea of "the world being ready
for" the idea, though. Is the idea "incremental" in hindsight? "Elegant" in
hindsight? "Free" in hindsight?

~~~
nmstoker
Apposite?

~~~
jimmydorry
Apposite: apt in the circumstances or in relation to something.

Neat, I learned a new word.

------
fxtentacle
In short, this is a super cool approach to replace the discriminator in GANs
with something that doesn't need to be trained and provably converges to the
correct result.
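For intuition, the sampling side can be sketched with unadjusted Langevin dynamics. The snippet below is a hypothetical minimal illustration, not the paper's method: it samples from a standard normal whose score (the gradient of the log-density with respect to the data) is known in closed form, whereas score matching trains a network to estimate that score from data.

```python
import numpy as np

def score(x):
    # Analytic score of a standard normal: d/dx log p(x) = -x.
    # In score matching, a neural network would estimate this from data.
    return -x

def langevin_sample(n_steps=2000, step=0.01, n_chains=5000, seed=0):
    rng = np.random.default_rng(seed)
    x = 5.0 * rng.standard_normal(n_chains)  # start far from the target
    for _ in range(n_steps):
        # Langevin update: drift along the score plus Gaussian noise.
        x = x + step * score(x) + np.sqrt(2 * step) * rng.standard_normal(n_chains)
    return x

samples = langevin_sample()
print(samples.mean(), samples.std())  # both close to the target N(0, 1)
```

With a small step size and enough iterations, the chain's distribution approaches the target. The approach discussed in the post additionally anneals the noise level over multiple scales, which this sketch omits.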

"256×256 images cannot be done reliably without 8 V100 GPUs or more!"

That's quite sad because that means this approach is far out of reach for any
hobby researcher and for most universities.

~~~
deeeeeplearning
AWS has V100s available, so most universities with a decent budget should be
able to swing this.

~~~
fxtentacle
Except that cloud-ified V100s are significantly less powerful than having
direct access to the hardware. Last time I checked, in AWS they're actually
external devices mapped in over GBit Ethernet, which is significantly slower
than the 8GB/s that PCIe x4 offers.

~~~
p1esk
I routinely switch between AWS 8x V100 instances and on-premise 8x V100
servers and I observe no difference in speed (time per epoch).

~~~
Reelin
Presumably that depends on maximum PCIe bandwidth consumption before your
workload bottlenecks elsewhere? A 2018 benchmark
([https://www.pugetsystems.com/labs/hpc/PCIe-X16-vs-X8-with-4-...](https://www.pugetsystems.com/labs/hpc/PCIe-X16-vs-X8-with-4-x-Titan-V-GPUs-for-Machine-Learning-1167/)) seems to indicate that x8 isn't generally
a bottleneck for common (at the time) workloads. x8 is a far cry from the
claimed gigabit ethernet though!

~~~
p1esk
AWS is tricky in terms of how storage is provisioned - I don't remember the
details, but it's easy to put your datasets on storage that is connected to
your GPU servers over a 1Gb link. That could easily become a bottleneck.
Datasets should live on Elastic Block Storage or something like that,
connected over high-speed links. Again, it's been a while since I looked into
this, so I don't remember the details.

~~~
Reelin
The earlier comment claimed that the GPUs (!!!) were located elsewhere on the
network; I suspect that the scenario you describe is what they intended to
refer to.

(IIRC AWS offers compute optimized instances with a volume that's guaranteed
to be backed by blocks on a local NVMe drive.)

~~~
nl
I think they are confused with AWS Elastic Inference. That is a different
thing which does have network attached accelerators:

 _Amazon Elastic inference accelerators are GPU-powered hardware devices that
are designed to work with any EC2 instance, Sagemaker instance, or ECS task to
accelerate deep learning inference workloads at a low cost. When you launch an
EC2 instance or an ECS task with Amazon Elastic Inference, an accelerator is
provisioned and attached to the instance over the network._

[https://aws.amazon.com/machine-learning/elastic-inference/fa...](https://aws.amazon.com/machine-learning/elastic-inference/faqs/)

------
rjeli
Reminds me of JEMs from Grathwohl et al last year [0]. They train a generative
model by treating it as an Energy Based Model using stochastic Langevin
gradients. I’m curious how it relates to the models in this post.

[0] [https://arxiv.org/abs/1912.03263](https://arxiv.org/abs/1912.03263)

------
GistNoesis
From the Pascal Vincent link: "Beware that this usage differs slightly from
traditional statistics terminology where score usually refers to the
derivative of the log likelihood with respect to parameters, whereas here we
are talking about a score with respect to the data."
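
In symbols, the distinction the quote draws (using the standard definitions, nothing specific to this post):

```latex
% "Score" as used in score matching: gradient w.r.t. the data x
s(\mathbf{x}) = \nabla_{\mathbf{x}} \log p(\mathbf{x})

% "Score" in classical statistics: gradient w.r.t. the parameters \theta
s(\theta) = \nabla_{\theta} \log p(\mathbf{x}; \theta)
```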

------
toxik
I wish this essay wasn’t littered with emojis :(

~~~
aperrien
I learned something new and valuable today. The emoji make no difference at
all to me, as they do not affect the quality of what I have learned by any
discernible measure.

~~~
notsuoh
If Geoff Hinton or Francois Chollet add emojis to their writing, it is
visually less pleasing in my opinion, but I agree with you overall. When it's
someone I don't know, though, it makes me trust the writing less: I can't
easily verify the contents of what is being discussed for work like this, so
it does make a difference because I guess it's less "professional" in a way.

~~~
sheikheddy
I've noticed it more and more in the ML research community; I think it's
mostly the influence of Twitter and Medium articles. For a blog it really is
fine though, and I'm comfortable with language evolving in an imprecise
fashion as long as the emojis don't try to do much more than add flavor.

~~~
grp000
Most math-heavy content is written very drily. Seeing emojis makes reading it
feel more comfy and human. If someone maps the Greek alphabet to emojis, you
could enjoy the integration on a deeper, more intimate level. The math of
emotion!

------
natch
Someday the field will figure out that not all images of interest are squares.
That will be a great advance. I realize some people have hacked up personal
branches of projects to support non-square rectangles but it really needs to
become mainstream or else we’re going to stay stuck in this “AI square
winter.”

~~~
p1esk
I don't get it - is there anything preventing you using non-square images
today?

~~~
natch
Just the massive inconvenience of having to track down and implement the
changes necessary to add non-square image support on most platforms. So if you
think that is a minor thing that can be dismissed as nothing, then no. Or one
can always crop, or resize, both of which throw away information, or pad, for
which the underlying behavior is undocumented. I just don’t think any of these
are great options. Do you?

If the answer is “you can always write your own” that is true, but it’s just
underlining my point that the problem is not yet solved.

~~~
p1esk
_support on most platforms_

What do you mean by "platforms"?

~~~
natch
Deep learning frameworks: Keras, Pytorch, Tensorflow, Core ML, etc.

~~~
p1esk
You might be confused. There's nothing in any of those platforms that dictates
you use square images.

~~~
natch
It's possible I am confused, yes. But I'm just going by what is in all the
documentation and tutorials I have encountered.

It may have been solved in a lab somewhere, but the solution hasn't made it
out into code usable by mere mortals, as far as I can tell. You may be
applying a very special meaning of the words "nothing" and "dictate"; it's
just that it's very well hidden how to do this.

I'm not alone. Here are examples of other people struggling with non-square
images and not succeeding:

[https://stats.stackexchange.com/questions/240690/non-square-...](https://stats.stackexchange.com/questions/240690/non-square-images-for-image-classification)

[https://github.com/tanakataiki/ssd_kerasV2/issues/10](https://github.com/tanakataiki/ssd_kerasV2/issues/10)

[https://github.com/allanzelener/YAD2K/issues/51](https://github.com/allanzelener/YAD2K/issues/51)

[https://github.com/eriklindernoren/PyTorch-YOLOv3/issues/277](https://github.com/eriklindernoren/PyTorch-YOLOv3/issues/277)

[https://stackoverflow.com/questions/49893741/tensorflow-cnn-...](https://stackoverflow.com/questions/49893741/tensorflow-cnn-for-non-square-image)

[https://github.com/ml-hongkong/keras-transfer-learning-for-o...](https://github.com/ml-hongkong/keras-transfer-learning-for-oxford102/issues/3)

~~~
p1esk
You are confusing platforms with individual models. A platform, such as
Pytorch or Tensorflow, does not care what shape your input is. You design and
train a model for whatever inputs you want. On the other hand, if someone
trained a specific model architecture (e.g. YOLOv3 or Resnet-50) on some
dataset of square images, then yes, that particular pre-trained model will
expect the same input size and shape it was trained on. Does that make sense?
If you take a beginner-level course on deep learning (e.g. Stanford CS231n or
the FastAI course) you will immediately realize there's nothing (in any sense
of the word) that prevents you from using any input shape to train your
model.
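
To see why the frameworks don't care, here is a hypothetical, framework-free sketch: a naive 2D convolution in NumPy applied to a non-square input. The shape arithmetic is the same as what conv layers in Pytorch or Tensorflow use, and nothing in it requires height to equal width.

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive 'valid' 2D convolution; only requires H >= kH and W >= kW."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.random.default_rng(0).random((32, 48))  # non-square "image"
k = np.full((3, 3), 1.0 / 9.0)                 # 3x3 box filter
y = conv2d_valid(x, k)
print(y.shape)  # (30, 46): just (H - kH + 1, W - kW + 1), no squareness needed
```

Fully-connected heads are the usual sticking point: they fix the flattened feature count, which is why pre-trained classifiers appear to demand one input size.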

If you don't want to learn the tools you use, you will need to find someone
who will train a model on your images. If you're willing to pay for it you
will find plenty of help. However, if you want to neither learn nor pay, then
what exactly are you complaining about?

