
Glow: Better Reversible Generative Models - ryanmercer
https://blog.openai.com/glow/
======
liquidise
Ethics/Implication question: How much time is left until video/photographic
evidence ceases to be a reliable factor in courts? I know that while projects
like this are happening, so are projects that work to identify fakes. If the
former manages to outpace the latter, this could become a real problem. My
mind tends to think of a dystopian extension of cops planting drugs on
suspects, where they could literally invent photographic evidence.

~~~
aerovistae
Would it be possible for phones/cameras to embed cryptographic information in
video/photo files in such a manner that proved their origin, and which was
tied to the pixels in image/video?

I don't know if that's a silly suggestion.

~~~
throwawaymath
Sure, you could design a camera that signs photos with a private key stored in
a hardware security module, then only trust signed photos and videos. It would
have to be resistant to physical tampering and side channel attacks. One way
to do this would be to stream bytes to the HSM as they're recorded, which
would then output the bytes and their signature for local storage.
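
A minimal sketch of that streaming idea in Python, with an HMAC standing in
for the HSM's asymmetric signature (the key handling and chunking here are
illustrative, not a real design):

```python
import hashlib
import hmac

# Hypothetical device key; in a real camera this would never leave the HSM.
DEVICE_KEY = b"key-held-inside-the-hsm"

def sign_stream(chunks):
    """Hash recorded bytes incrementally as they arrive, then sign the digest.

    Streaming the hash means the HSM never has to buffer the whole
    recording; only the final 32-byte digest gets signed.
    """
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    digest = h.digest()
    # HMAC stands in for the HSM's real (asymmetric) signature scheme.
    signature = hmac.new(DEVICE_KEY, digest, hashlib.sha256).digest()
    return digest, signature

def verify(chunks, signature):
    """Re-derive the digest and check the signature in constant time."""
    digest = hashlib.sha256(b"".join(chunks)).digest()
    expected = hmac.new(DEVICE_KEY, digest, hashlib.sha256).digest()
    return hmac.compare_digest(signature, expected)
```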

The tricky bit here is closing the analog loophole (using this camera to
record a carefully constructed, high resolution fake) and preventing the HSM
from signing anything which wasn't recorded by the camera lens.

~~~
drdeca
If GPS satellites were to cryptographically sign their signals, would that be
sufficient to prevent spoofing of GPS signals (assuming the receiver of the
signals treated the signing appropriately)?

Like, I can imagine maybe there could be an attack where one could record gps
signals at some nearby place, and then play them back in slightly different
orders/rates to try to fool a receiving device into thinking it is somewhere
else (nearby).

But I don't know how much of a time delay would be needed to do that. Could a
timestamping service work quickly enough to prevent this attack for
internet-connected devices?

I tried looking at how GPS signals work (how frequently they send time
information, and how detailed it is for civilians), but it seemed complicated
and I got confused and/or didn't try hard enough, so I never arrived at an
answer for how long it would take to spoof positions if one could only delay
real signals that one received.

Edit: the purpose of making the GPS coords unspoofable would be to make it so
that even if the screen-in-front-of-camera attack were done, it would have to
be done at the same location and time as the recording claims to have
happened.

~~~
darkmighty
> If GPS satellites were to cryptographically sign their signals, would that
> be sufficient to prevent spoofing of GPS signals (assuming the receiver of
> the signals treated the signing appropriately)?

Nope, classic replay attack case (just record the GPS signal and replay to the
device at the desired location). You'd need a true time source within the
device, e.g. an atomic clock, to make it work (so you'd authenticate the
signed time against true time).

---

There is another way, however. If we assume the hardware is tamper-proof
(otherwise drastically different methods are needed), then with strict
timings we can devise a challenge-response system that's immune to replay
attacks thanks to relativity: simply transmit a signal A, have a known 3rd
party (e.g. US
government servers in cellphone towers) sign your signal Sig(A) and
retransmit, and check that the delay matches the propagation delay you'd
expect from the cellphone tower distance, plus the fixed (and immutable since
it would be gov-controlled) processing delay. Your tamper-proof crypto-camera
would record its location and whether it trusts the location. Using cellular
signals is also better because GPS doesn't work indoors and is sensitive to
interference.
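
The timing check could look something like this sketch (all constants and
names are illustrative, not a real protocol):

```python
SPEED_OF_LIGHT = 299_792_458.0    # m/s
TOWER_PROCESSING_DELAY = 50e-6    # fixed, tower-controlled delay (assumed)

def expected_round_trip(distance_m):
    """Round-trip time for a signed challenge to a tower distance_m away."""
    return TOWER_PROCESSING_DELAY + 2.0 * distance_m / SPEED_OF_LIGHT

def location_trusted(claimed_distance_m, measured_round_trip_s,
                     tolerance_s=1e-6):
    """Trust a location claim only if the measured delay matches physics.

    A replayed signal recorded elsewhere necessarily adds propagation
    delay, which shows up as a round trip longer than the claim allows.
    """
    expected = expected_round_trip(claimed_distance_m)
    return abs(measured_round_trip_s - expected) <= tolerance_s
```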

Since we're adding a cellular connection to our device, it would also be a
good idea to log its position on the state-controlled servers (again can be
done with cryptographic safety assuming non-tampered device), along with some
kind of intrusion detection system. As soon as it detected a tampering
attempt, it would relay that attempt to the servers, recording the intrusion
and invalidating the authenticity of subsequent recordings; self-destruction
of the key would probably also be wise.

---

And now that I think about it, you'd probably want to put several keys/auths
in the device, from different organizations -- not only governments. That
way, if the government's authentication checks out but an NGO's doesn't, you
can suspect a government-backed forgery attempt (and vice versa).

------
DoctorOetker
Can someone please explain to me what a "1x1 convolution" is?

I've been wondering ever since I first started reading about 1x1 convolutions
a while back.

My background is not in artificial neural networks, but I understand the
single-neuron operation: a linear combination of inputs (plus an optional
bias/offset term), so this part behaves like any linear correlator, followed
by a nonlinear but typically differentiable sigmoid compression.

I understand how convolutional neural networks operate, and that the synaptic
weights correspond to filter kernel weights (like point spread functions, or
impulse responses).

Given this engineering-like interpretation, can someone explain to me what
use convolution with a 1x1 filter has?

~~~
QuadmasterXLII
The trick is that convolutional neural nets operate on multichannel images.
Your intuition that the synaptic weights form point spread functions is
correct, but you have to remember that the activation of each neuron depends
on a sum of point spread functions, one for each channel in the previous
layer.

I like to think of 1x1 convolutions as pixel-wise dense layers.

Here’s a toy example of a useful 1x1 convolution: you could convert a color
image to greyscale by doing a 1x1 convolution with the kernel (.33, .33, .33)
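
To make that concrete, here's a tiny numpy sketch (my own illustration, not
the post's code) showing that a 1x1 convolution is just the same linear map
over channels applied independently at every pixel:

```python
import numpy as np

def conv1x1(image, kernel):
    """1x1 convolution: a per-pixel linear map across channels.

    image: (H, W, C_in) array, kernel: (C_in, C_out) array.
    matmul contracts the channel axis only; no spatial mixing happens.
    """
    return image @ kernel

# The toy example from the comment: RGB -> greyscale with kernel (.33, .33, .33).
rgb = np.random.rand(4, 4, 3)
grey = conv1x1(rgb, np.full((3, 1), 0.33))  # shape (4, 4, 1)
```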

~~~
DoctorOetker
thank you _very much_, I suppose we can all agree that calling it a 1x1x3
(Width x Height x Color) convolution would have been a lot clearer.

again, thanks for the succinct and clear explanation!

~~~
daenz
Just to elaborate a bit more: a 1x1 convolution is often used to reduce the
number of channels coming out of the previous layer, which keeps the total
parameter count manageable.
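
For instance (made-up layer sizes, just to show the arithmetic), squeezing
channels with a 1x1 conv before a 3x3 conv cuts the weight count
substantially:

```python
# Illustrative parameter arithmetic (weights only, biases ignored).
c_in, c_out, bottleneck = 256, 256, 64

direct = 3 * 3 * c_in * c_out              # a single 3x3 conv
squeezed = (1 * 1 * c_in * bottleneck      # 1x1 conv down to 64 channels
            + 3 * 3 * bottleneck * c_out)  # 3x3 conv on the smaller tensor

# direct is 589,824 weights; squeezed is 163,840, roughly 3.6x fewer,
# while the 3x3 spatial receptive field is preserved.
```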

~~~
DoctorOetker
yes, thank you, I understand the motivation. The reason I never understood
before is quite simple: the description "1x1 convolution" (correctly) conveys
that they are not doing convolutions across pixels.

It is good to point out what they _aren't doing_, but they should also point
out what they _are doing_.

Humor me by considering:

"Today I did not go to the zoo, I did not go to school, ..., and I did not go
by foot, nor by a vehicle without wheels, nor by a vehicle with only one
wheel, nor by a vehicle with 3 or more wheels, and the vehicle was not
motorised, and I did not need to stand up on the vehicle"

While true, and possibly important to point out what I didn't do, it's
generally more helpful to describe what I _am doing_, like "I rode my bicycle
to the supermarket"

------
brunt
Manipulating Neil deGrasse Tyson's face in the demo yields some hilariously
bad-looking results.

~~~
omgwtfbyobbq
If you max out smiling and beard, the result kinda looks like Tim Meadows.
Granted, that could be just me being yet another annoying white person. I'll
ask my wife about it later just in case.

------
dnautics
Interesting thing about the default image, "beard" on a woman creates a more
typically "masculine" jawline but doesn't add facial hair.

Increasing blondeness also increases smile.

~~~
NegatioN
EDIT: it seems the post below me is actually correct, and my post is
incorrect in this case.

Just as an aside here: "blondeness" and "beard" are probably just the labels
the authors found correspond most closely to the latent variables in this
case. That means there won't be a perfect translation between those words and
what the variables actually respond to in the network.

So although the training data may have been biased toward more smiling blonde
people, it doesn't have to have been. It might be that whatever this latent
variable encodes simply does something else in edge cases where there are few
examples.

~~~
matheist
Not so. They split the data between "bearded faces" and "non-bearded faces"
and compute the vector from one to the other, and use that to alter a given
face. (It reminds me of the typical word2vec example: king - man + woman ≈
queen.)

See their code snippet halfway down their page, or "Semantic Manipulation" on
page 8 of their linked paper.
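
A rough sketch of that vector trick (function names are mine, not the
paper's; encode/decode with the actual Glow model is omitted):

```python
import numpy as np

def attribute_vector(z_with, z_without):
    """Direction in latent space from 'without attribute' to 'with'.

    z_with, z_without: (N, D) arrays of latent codes for faces with and
    without the attribute (e.g. beards). The difference of the two group
    means gives the manipulation direction.
    """
    return z_with.mean(axis=0) - z_without.mean(axis=0)

def manipulate(z, direction, alpha):
    """Move a face's latent code along the attribute direction by alpha."""
    return z + alpha * direction
```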

~~~
nerdponx
This sounds like yet another appearance of the tank-sky problem.

~~~
blt
There is no mechanism in this paper (or in a standard VAE or GAN) to
encourage a single human-understandable semantic quantity to be captured in a
single dimension of the latent code. So, in general, it won't happen:
"blondeness" is spread out over all dimensions of the latent code. The method
in this paper (summarized by matheist) is therefore totally reasonable. There
are other autoencoder-type generative models that try to concentrate each
attribute in one dimension, usually by using the class labels as an
additional input, but that is not the focus of this paper.

~~~
hanrelan
Do you have a link to a paper that does this by any chance? I'm interested in
learning more but not sure what to search for.

------
homarp
Code is now available:
[https://github.com/openai/glow](https://github.com/openai/glow)

------
girlpower32
Glow is also the name of the machine learning compiler that's built inside
PyTorch: [https://github.com/pytorch/glow/](https://github.com/pytorch/glow/)

------
tw1010
I mean, it has potential. But lots of things have potential. I think I'll be
more interested in a few years, but right now it mostly just looks too weird
(not even uncanny valley) to feel like it could be used for anything.

------
chris_mc
Reminds me of the Ocean's 12 (13?) scene where the brothers are in the boring
machine and are manipulating the stolen FBI mugshots to disguise the group
members before they get found out.

