
Apple machine learning in 2020: What’s new? - dsr12
https://machinethink.net/blog/new-in-apple-machine-learning-2020/
======
saagarjha
Remember when Apple created a “Machine Learning journal”? Well, it seems like
they’ve stopped publishing to it and now have gone back to introducing stuff
at presentations, if at all:
[https://machinelearning.apple.com/](https://machinelearning.apple.com/)

~~~
jldugger
IIRC, that blog is paired with conference proceedings, and COVID has thrown a
lot of that up in the air.

~~~
DonaldPShimoda
I'm not in ML, but in PL and HCI almost all conferences have proceeded on
schedule, just in a virtual format.

The only exception I'm aware of is HOPL (History of Programming Languages).
They still published the papers/proceedings as usual, but postponed the
physical gathering rather than meeting virtually, because the conference
convenes only once every 10-15 years.

------
singhrac
I think it's still wild that neither TensorFlow nor PyTorch works on Apple's
MBP GPUs - AMD's ROCm doesn't run on anything but Linux, and NVIDIA drivers
aren't supported if you want to use an external GPU.

~~~
grej
This, combined with Microsoft's roadmap for a WSL that works with CUDA GPUs,
is going to cost Apple a lot of ML/AI/HPC developer mindshare. Yes, we do a
lot of our work on remote machines, but that's not always the most convenient
way to experiment. I doubt my next machine will be a MacBook.

------
teruakohatu
> For example, the camera on the iPhone is different than the camera on the
> iPad, so you may want to create two versions of a model and send one to
> iPhone users of the app and the other to iPad users.

Are app developers shipping models that are so brittle they cannot handle a
different revision of Apple's camera?

I can understand shipping more complex models for devices with a better
CPU/GPU (or whatever Apple's AI accelerator is called), but not for different
cameras!

~~~
sillysaurusx
Image augmentations are _hard_ to add to training. It may seem easy, but it
requires a lot of thought.

(To back up a bit: image augmentations are how you solve the problem of "how
do I make my model robust across different cameras?" It might be tempting to
gather labeled data from a variety of cameras, but that doesn't necessarily
result in a model that can handle newer, higher-res cameras. So one solution
is to distort the training data with augmentations so that the model can't
tell which resolution the input images are coming from.)
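
For instance, here's a minimal sketch of what that looks like in a TensorFlow
input pipeline (the specific ops and ranges are illustrative, not a
recommendation):

    import tensorflow as tf

    def augment(image, label):
        # Randomly perturb color and orientation so the model can't key
        # on camera-specific statistics (white balance, exposure, etc.).
        image = tf.image.random_hue(image, max_delta=0.08)
        image = tf.image.random_brightness(image, max_delta=0.1)
        image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
        image = tf.image.random_flip_left_right(image)
        return tf.clip_by_value(image, 0.0, 1.0), label

    # Assumes `dataset` yields (image, label) pairs with float images in [0, 1]:
    # dataset = dataset.map(augment, num_parallel_calls=tf.data.experimental.AUTOTUNE)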

The other way to deal with it is to just downscale the camera's image to, say,
416x416. But that introduces a question: can different cameras give images
that look different when downscaled to 416x416? Sure they can! Cameras have a
dizzying array of features, and they perform differently in different lighting
conditions.

To return to the point about image augmentations being hard to add: it's easy
to explain what your training code _should_ do ("just distort the hue a bit"),
and there seem to be operations explicitly for that:
[https://www.tensorflow.org/api_docs/python/tf/image/adjust_h...](https://www.tensorflow.org/api_docs/python/tf/image/adjust_hue)
But when you go to train with them, you'll discover that backpropagation isn't
implemented, i.e. they break in training code.
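
You can probe this yourself by taping a gradient through the op (a quick
check, not production code):

    import tensorflow as tf

    image = tf.Variable(tf.random.uniform([64, 64, 3]))

    with tf.GradientTape() as tape:
        out = tf.image.adjust_hue(image, delta=0.1)
        loss = tf.reduce_mean(out)

    # If the underlying op has no registered gradient, this raises a
    # LookupError (or comes back None) instead of giving you a gradient.
    print(tape.gradient(loss, image))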

I've been trying to build a TensorFlow equivalent of Kornia
[https://github.com/kornia/kornia](https://github.com/kornia/kornia), which is
a wonderful library that implements image augmentations using nothing but
differentiable primitives. Work is a bit slow, but I hope to release it in Mel
[https://github.com/shawwn/mel](https://github.com/shawwn/mel) (which will
hopefully look less like a TODO soon).
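
The idea, roughly, is to express every augmentation in terms of ops that
already have gradients, so backprop works for free. A sketch of two such
augmentations (my own illustration, not Kornia's actual code):

    import tensorflow as tf

    def adjust_brightness_diff(images, delta):
        # Plain tensor arithmetic, so gradients flow through automatically.
        return tf.clip_by_value(images + delta, 0.0, 1.0)

    def adjust_contrast_diff(images, factor):
        # Scale each NHWC image around its per-channel spatial mean,
        # again using only differentiable primitives.
        mean = tf.reduce_mean(images, axis=[1, 2], keepdims=True)
        return tf.clip_by_value((images - mean) * factor + mean, 0.0, 1.0)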

But all of this still raises the question of _which_ augmentations to add.
Work in this area is ongoing; see Gwern's excellent writeup at
[https://github.com/tensorfork/tensorfork/issues/35](https://github.com/tensorfork/tensorfork/issues/35)

Training a model per camera isn't necessarily a terrible idea, either. In the
future I predict that we'll see more and more "on-demand" models: models that
are JIT optimized for a target configuration (in this case, a specific
camera).

Robustness often comes at the cost of quality / accuracy
([https://arxiv.org/abs/2006.14536](https://arxiv.org/abs/2006.14536) recently
highlighted this). In situations where that last 2% of accuracy is crucial,
there are all kinds of tricks; training separate models is but one of many.

~~~
spott
Why are you trying to backpropagate through data augmentations? I've never
done that (or heard of it being done). Usually I just apply the augmentations
to the input samples and then feed the augmented samples to the network.

Differentiable augmentations aren't necessary unless the augmentations are
midstream (so you have to propagate gradients through them to parameters above
the augmentations, which is weird) or have learnable parameters (at which
point you aren't learning how to work on different views of the same sample,
you are learning how to modify a sample to be more learnable, which is a
different problem from the one you are trying to solve).

Don't get me wrong, augmenting samples to reduce device bias is a _hard_
problem, but you might be making it harder than it needs to be.

~~~
gwern
The data augmentations we are interested in are in fact 'midstream': they
augment the examples before they pass into the D or the classification loss,
but you must backprop from there back through the augmentation into the
original model, because you don't want the augmentations to 'leak'. The G is
_not_ supposed to generate augmented samples; the augmentation is there to
regularize the D and reduce its ability to memorize real datapoints. It would
probably be better to think of them as a kind of consistency or metric loss
along the lines of SimCLR (which helped inspire these very new GAN data
augmentation techniques). It's a bit weird, which is perhaps why, despite its
simplicity (indicated by no less than _4_ simultaneous inventions of it in the
past few months), it hasn't been done before. You really should read the
linked GitHub thread if you are interested.
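
Schematically, the generator update looks something like this (a TensorFlow
sketch with G, D, and z standing in for a generator, a discriminator, and a
batch of latents; not the exact formulation from any of the four papers):

    import tensorflow as tf

    def t_augment(x):
        # Built from differentiable ops only: a random brightness shift
        # plus a random horizontal flip of the NHWC batch.
        x = x + tf.random.uniform([tf.shape(x)[0], 1, 1, 1], -0.1, 0.1)
        flip = tf.random.uniform([]) > 0.5
        return tf.cond(flip, lambda: tf.reverse(x, axis=[2]), lambda: x)

    with tf.GradientTape() as tape:
        fake = G(z, training=True)
        # D only ever sees augmented images, real and fake alike, so it
        # can't simply memorize raw datapoints.
        g_loss = -tf.reduce_mean(D(t_augment(fake), training=True))

    # The gradient flows back through t_augment into G, so G never learns
    # to produce augmented-looking samples.
    g_grads = tape.gradient(g_loss, G.trainable_variables)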

~~~
spott
Ah! I can see that in a GAN architecture. That makes much more sense.

It wasn't clear from your original post that you were augmenting _generated_
images, not real data.

~~~
gwern
You're augmenting the real data too.

------
ur-whale
Apple's secretive engineering culture is anathema to the ML world and to the
kind of sharing that the likes of DeepMind, OpenAI, and Google AI are doing.

This is IMO very visible in the "output" Apple has produced in the ML space:
mostly infrastructure, and very little in the way of innovative tech and
research.

And changing culture is a very hard proposition from a management perspective,
unless you build a complete skunkworks-like independent entity within the
mothership.

------
elpakal
I hope Apple makes a CoreML -> Keras (or other model format) converter. That
would make it much more appealing for me to use their GUIs and buy a Mac.

~~~
a-wu
Why not use something like ONNX?

[https://onnx.ai/](https://onnx.ai/)
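
For the CoreML-to-ONNX direction specifically, the onnxmltools package
advertises a Core ML converter; if its documented API still holds, the
conversion looks roughly like this ('example.mlmodel' is a placeholder path):

    import coremltools
    import onnxmltools

    # Load the Core ML model spec and convert it to ONNX.
    coreml_model = coremltools.utils.load_spec('example.mlmodel')
    onnx_model = onnxmltools.convert_coreml(coreml_model, 'Example Model')
    onnxmltools.utils.save_model(onnx_model, 'example.onnx')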

