Apple machine learning in 2020: What’s new? (machinethink.net)
95 points by dsr12 3 days ago | 25 comments





Remember when Apple created a “Machine Learning journal”? Well, it seems like they’ve stopped publishing to it and now have gone back to introducing stuff at presentations, if at all: https://machinelearning.apple.com/

They are still publishing in other venues, like this pretty recent CVPR paper and its accompanying repo:

https://github.com/apple/ml-quant


IIRC, that blog is paired with conference proceedings, and COVID has thrown a lot of that up in the air.

I'm not in ML, but in PL and HCI almost all conferences have proceeded on schedule, just in a virtual format.

The only exception I'm aware of is HOPL (History of Programming Languages). They still published the papers/proceedings as usual, but have postponed a physical gathering instead of meeting virtually because the conference convenes only once every 10-15 years.


Huh? Their posting was never consistent. They had a gap from December 2018 to June 2019. Their last post was in December 2019. It's likely they just don't post in the months leading up to WWDC.

I think it's still wild that neither TensorFlow nor PyTorch works on Apple's MBP GPUs - AMD can't run ROCm on anything but Linux, and NVIDIA drivers aren't supported if you want to use an external GPU.

This, combined with Microsoft's roadmap for a WSL that works with CUDA GPUs, is going to cost Apple a lot of ML/AI/HPC developer mindshare. Yes, we do a lot of our work on remote machines, but it's not always the most convenient way to experiment. I doubt my next machine will be a MacBook.

There seems to be ongoing work on Vulkan Compute support for TensorFlow. But the mlir repo moved at the end of 2019 and I don't see where (or if) the discussion and PR continued, because the new repo doesn't even use GitHub Issues.

  https://github.com/tensorflow/mlir/issues/60
  https://github.com/tensorflow/mlir/pull/118
  https://github.com/tensorflow/mlir/

> For example, the camera on the iPhone is different than the camera on the iPad, so you may want to create two versions of a model and send one to iPhone users of the app and the other to iPad users.

Are app developers shipping models that are so brittle they cannot handle a different revision of Apple's camera?

I can understand shipping more complex models for devices with better CPU/GPU or whatever Apple's AI accelerator is called, but not different cameras!


There might be other reasons for shipping different models for the iPad vs. the iPhone. For example, if the iPad is more often used indoors than outdoors, you could ship a version of your big CNN fine-tuned to that smaller set of classes.
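A rough sketch of what that fine-tuning could look like, assuming a Keras backbone (the backbone, class count, and dataset names are all illustrative, not anyone's actual pipeline):

  import tensorflow as tf

  NUM_INDOOR_CLASSES = 40  # the reduced, indoor-only label set for the iPad variant

  # Hypothetical setup: reuse a shared backbone, freeze it, and train a new head
  # on the smaller label set. Only the head's weights change.
  base = tf.keras.applications.MobileNetV2(
      input_shape=(224, 224, 3), include_top=False, pooling="avg", weights="imagenet")
  base.trainable = False

  model = tf.keras.Sequential([
      base,
      tf.keras.layers.Dense(NUM_INDOOR_CLASSES, activation="softmax"),
  ])
  model.compile(optimizer="adam",
                loss="sparse_categorical_crossentropy",
                metrics=["accuracy"])
  # model.fit(indoor_dataset, epochs=...)  # then export this variant separately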

Another reason is that the more powerful iPad processor can handle larger networks.

Depends on the model. E.g. image-enhancement or super-resolution models are sensitive to the particular camera and can be trained to fix artifacts introduced by specific cameras.

The SE has one camera, the 11 has two, the iPad has two plus LiDAR ...

Fairly precise camera calibrations remain important in photogrammetry applications.

Models pay attention to really weird things in their training data.

Someone was telling me about an ML cancer-detection system that was unexpectedly training on the ruler found in most images of cancer.

I can see how models trained on one image sensor could inadvertently optimize for that sensor's size or geometry.


Image augmentations are hard to add to training. It may seem easy, but they require a lot of thought.

(To back up a bit: Image augmentations are how you solve that problem, "How do I make my model robust across different cameras?" It might be tempting to gather labeled data from a variety of cameras, but that doesn't necessarily result in a model that can handle newer, higher-resolution cameras. So one solution is to distort the training data with augmentations so that the model can't tell which resolution the input images are coming from.)
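For instance, one way to implement that resolution-hiding augmentation is to randomly downscale each training image and resize it back up. This is just a sketch; the op choices and ranges are illustrative:

  import tensorflow as tf

  def hide_resolution(image, out_size=416):
      """Throw away a random amount of detail so the model can't key on sensor resolution.

      image: float32 tensor of shape [H, W, 3] with values in [0, 1].
      """
      scale = tf.random.uniform([], 0.25, 1.0)                   # random downscale factor
      small = tf.cast(scale * tf.cast(out_size, tf.float32), tf.int32)
      image = tf.image.resize(image, [small, small])             # simulate a lower-res camera
      return tf.image.resize(image, [out_size, out_size])        # back to the model's input size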

The other way to deal with it is to just downscale the camera's image to, say, 416x416. But that introduces a question: can different cameras give images that look different when downscaled to 416x416? Sure they can! Cameras have a dizzying array of features, and they perform differently in different lighting conditions.

To return to the point about image augmentations being hard to add: It's so easy to explain what your training code should do ("just distort the hue a bit"), and there seem to be operations explicitly for that: https://www.tensorflow.org/api_docs/python/tf/image/adjust_h... but when you go to train with them, you'll discover that backpropagation isn't implemented, i.e. they break in training code.

I've been trying to build an equivalent of Kornia (https://github.com/kornia/kornia) for TensorFlow; Kornia is a wonderful library that implements image augmentations using nothing but differentiable primitives. Work is a bit slow, but I hope to release it in Mel https://github.com/shawwn/mel (which will hopefully look less like a TODO soon).

But all of this still raises the question of which augmentations to add. Work in this area is ongoing; see Gwern's excellent writeup at https://github.com/tensorfork/tensorfork/issues/35

Training a model per camera isn't necessarily a terrible idea, either. In the future I predict that we'll see more and more "on-demand" models: models that are JIT optimized for a target configuration (in this case, a specific camera).

Robustness often comes at the cost of quality / accuracy (https://arxiv.org/abs/2006.14536 recently highlighted this). In situations where that last 2% of accuracy is crucial, there are all kinds of tricks; training separate models is but one of many.


> To return to the point about image augmentations being hard to add: It's so easy to explain what your training code should do ("just distort the hue a bit"), and there seem to be operations explicitly for that: https://www.tensorflow.org/api_docs/python/tf/image/adjust_h.... but when you go to train with them, you'll discover that backpropagation isn't implemented, i.e. they break in training code.

Why not do the data augmentation during preprocessing (so that the transformations don't have to be done by differentiable transforms)? I.e., map over a tf.Dataset with the transformation (and append to the original dataset).
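Concretely, something like the sketch below is what I mean; nothing in it needs to be differentiable, and the augmentation ops, ranges, and dummy dataset are purely illustrative:

  import tensorflow as tf

  def augment(image, label):
      # Camera-style jitter applied purely as preprocessing, outside the trained graph.
      image = tf.image.random_hue(image, max_delta=0.1)
      image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
      return image, label

  # Dummy (image, label) dataset so the snippet runs on its own.
  images = tf.random.uniform([8, 416, 416, 3])
  labels = tf.zeros([8], dtype=tf.int32)
  dataset = tf.data.Dataset.from_tensor_slices((images, labels))

  augmented = dataset.map(augment, num_parallel_calls=tf.data.experimental.AUTOTUNE)
  combined = dataset.concatenate(augmented).shuffle(1024).batch(32)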


Why are you trying to backpropagate over data augmentations? I've never done that (or heard about it being done). Usually I just do the augmentations on the input samples and then feed the augmented samples to the network.

Differentiable augmentations aren't necessary unless the augmentations are midstream (so you have to propagate gradients through them to the parameters above, which is weird) or have learnable parameters (at which point you aren't learning how to work on different views of the same sample, you're learning how to modify a sample to be more learnable, which is a different problem).

Don't get me wrong, augmenting samples to reduce device bias is a hard problem, but you might be making it harder than it needs to be.


The data augmentations we are interested in are in fact 'midstream': they augment the examples before they are passed to the D or the classification loss, and you must backprop from that loss back through the augmentation into the original model, because you don't want the augmentations to 'leak'. The G is not supposed to generate augmented samples; the augmentation is there to regularize the D and reduce its ability to memorize real datapoints. It would probably be better to consider them as a kind of consistency or metric loss along the lines of SimCLR (which has helped inspire these very new GAN data augmentation techniques). It's a bit weird, which is perhaps why, despite its simplicity (indicated by no fewer than 4 simultaneous inventions of it in the past few months), it hasn't been done before. You really should read the linked GitHub thread if you are interested.
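To make that concrete, here is a minimal sketch of the idea (not the code from the linked thread); only differentiable ops are used in the augmentation so gradients reach the G, and the loss and op choices are illustrative:

  import tensorflow as tf

  def diff_augment(x):
      # Differentiable augmentation: per-sample brightness jitter and random horizontal flip.
      batch = tf.shape(x)[0]
      x = x + tf.random.uniform([batch, 1, 1, 1], -0.2, 0.2)
      flip = tf.cast(tf.random.uniform([batch, 1, 1, 1]) < 0.5, x.dtype)
      return flip * tf.reverse(x, axis=[2]) + (1.0 - flip) * x

  def d_step(G, D, real, z, d_opt):
      with tf.GradientTape() as tape:
          # Both real and generated images are augmented before the D, so the D
          # never sees raw samples and can't simply memorize the real datapoints.
          d_loss = (tf.reduce_mean(tf.nn.softplus(-D(diff_augment(real)))) +
                    tf.reduce_mean(tf.nn.softplus(D(diff_augment(G(z))))))
      d_opt.apply_gradients(zip(tape.gradient(d_loss, D.trainable_variables),
                                D.trainable_variables))

  def g_step(G, D, z, g_opt):
      with tf.GradientTape() as tape:
          # Gradients flow back through diff_augment into the G, but the G is
          # never trained to produce augmented images: the augmentation stays
          # outside it, so it can't "leak" into the generator's outputs.
          g_loss = tf.reduce_mean(tf.nn.softplus(-D(diff_augment(G(z)))))
      g_opt.apply_gradients(zip(tape.gradient(g_loss, G.trainable_variables),
                                G.trainable_variables))

  # Toy models, just to show the wiring:
  G = tf.keras.Sequential([tf.keras.layers.Dense(8 * 8 * 3, activation="tanh"),
                           tf.keras.layers.Reshape([8, 8, 3])])
  D = tf.keras.Sequential([tf.keras.layers.Flatten(), tf.keras.layers.Dense(1)])
  d_opt, g_opt = tf.keras.optimizers.Adam(1e-4), tf.keras.optimizers.Adam(1e-4)
  d_step(G, D, tf.random.uniform([4, 8, 8, 3], -1.0, 1.0), tf.random.normal([4, 16]), d_opt)
  g_step(G, D, tf.random.normal([4, 16]), g_opt)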

Ah! I can see that in a GAN architecture. That makes much more sense.

It wasn't clear from your original post that you were augmenting generated images, not real data.


You're augmenting the real data too.

> Training a model per camera isn't necessarily a terrible idea, either. In the future I predict that we'll see more and more "on-demand" models: models that are JIT optimized for a target configuration (in this case, a specific camera).

Meta-learning, or perhaps learning camera embeddings to condition on, would be one way, although that might all be implicit if you use a deep enough NN and train on a sufficiently diverse corpus of phones+photos.
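A hypothetical way to wire up the camera-embedding version in Keras; everything here (the backbone, sizes, and number of camera ids) is illustrative:

  import tensorflow as tf

  NUM_CAMERA_MODELS = 16   # e.g. one id per (device, camera module) pair
  EMBED_DIM = 32

  image_in = tf.keras.Input(shape=(224, 224, 3))
  camera_id = tf.keras.Input(shape=(), dtype=tf.int32)   # which camera took the photo

  backbone = tf.keras.applications.MobileNetV2(
      input_shape=(224, 224, 3), include_top=False, pooling="avg", weights=None)
  features = backbone(image_in)                                                 # [batch, 1280]
  cam_vec = tf.keras.layers.Embedding(NUM_CAMERA_MODELS, EMBED_DIM)(camera_id)  # [batch, 32]

  # Condition the classification head on the learned camera embedding.
  x = tf.keras.layers.Concatenate()([features, cam_vec])
  out = tf.keras.layers.Dense(10, activation="softmax")(x)
  model = tf.keras.Model(inputs=[image_in, camera_id], outputs=out)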


Apple's secretive engineering culture is anathema to the ML world and to the kind of sharing that the likes of DeepMind, OpenAI, and Google AI are doing.

This is IMO very visible in the "output" Apple has produced in the space of ML: mostly infrastructure, and very little in the way of innovative tech and research.

And changing culture is a very hard proposition from a management perspective, unless you build a complete skunkworks-like independent entity within the mothership.


I hope Apple makes a CoreML -> Keras (or other model format) converter. That would make it much more appealing for me to use their GUIs and buy a Mac.

Why not use something like ONNX?

https://onnx.ai/
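If I remember right, onnxmltools has a Core ML front end for exactly that direction. Roughly like this, though the API names here are from memory and may have changed, so treat it as a sketch:

  import coremltools
  import onnxmltools

  # Load an existing Core ML model and convert it to ONNX, which other
  # frameworks' tooling can then import. "model.mlmodel" is a placeholder path.
  coreml_model = coremltools.utils.load_spec("model.mlmodel")
  onnx_model = onnxmltools.convert_coreml(coreml_model, "ExampleModel")
  onnxmltools.utils.save_model(onnx_model, "model.onnx")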



