Corporations like Facebook and Microsoft do not like interoperability... until and unless it's in their best interest.
The goal here is to make AI frameworks easily interchangeable: build your model with whatever framework you like; it will run unchanged on all other frameworks/platforms/stacks.
Regardless of the motivations, I actually think this is GREAT NEWS, and I hope that Amazon and maybe Apple, and eventually hopefully Google too will introduce and promote compatibility with this standard.
We all benefit from more interoperability.
TensorFlow (and stacks built over it) has the greatest mindshare, which allows Google to both optimize the capabilities of its ML offerings based on TensorFlow, knowing most developers will be familiar with it, and to increase profit margins on those services at the same time. TensorFlow makes a significant part of the ML stack convenient and accessible, and releasing it for free opened the flood gates for Google to expand the machine learning industry significantly while shifting the profit generation away from software and onto the cloud (data storage + compute resources).
Sublime execution, really. Once machine learning passes through this hype cycle (over the "trough of disillusionment" and onto the "plateau of productivity"), Google will likely have made the majority of the machine learning industry accessible enough for most software engineers to pick up something like TensorFlow and hit the ground running, which will gradually exert downward pressure on ML-specialist salaries for all but the top end. And this will happen even as the industry keeps expanding, which means Google comes out ahead, earning even more profit even if salaries plateau or decrease.
1) Compilation speed for a jumbo CNN architecture: TensorFlow took 13+ minutes to start training every time the network architecture was modified, while PyTorch started training in just over 1 minute. PyT suits my style of interactive coding far better.
2) Memory footprint: I was able to fit a 30% larger batch size with PyTorch than with TensorFlow on Titan X cards, with the exact same CNN architecture.
Both frameworks have had major releases since May, so these metrics may well have changed by now. However, I ended up adopting PyT for my project.
Google is actually very scared of PyTorch, since nearly all new AI research papers are using PyTorch. This means that in 1-2 years or so, most companies will be using PyTorch for training and Caffe2 for deployment. TensorFlow doesn't even have an implementation of the latest ImageNet winner, DenseNet, but PyTorch does!
I have sources on the TensorFlow team saying they are scrambling to make another higher-level wrapper for TensorFlow at the level of PyTorch.
EDIT: I second that regardless of motivation, this is a step forward.
> We developed ONNX together with Microsoft to bridge this gap and to empower AI developers to choose the framework that fits the current stage of their project and easily switch between frameworks as the project evolves.
ML workflows tend to be very heterogeneous, with people using everything from sklearn to DL4J to play with data. Standardized serialization formats make it possible to bring results from those experiments back into a hardened, productionized environment for serving the model.
While checking the wikipedia page, I'm surprised to see that it is still being developed (with a 2016 release): https://en.wikipedia.org/wiki/Predictive_Model_Markup_Langua...
Please note that in 1997 it was NOT ridiculous to use XML for this.
Frameworks (say, DL4J) have been using Keras as a loose way to share models between frameworks. It'll be fascinating to see if Theano / Tensorflow / DL4J / MXNet walk this path as well.
> Currently, our tracer works with many common neural networks, but not some of the more advanced programs in PyTorch such as those with dynamic flow control. Over time, we will enhance ONNX and the tracer to support these programs, so that developers can leverage full flexibility of PyTorch with the high-performance robust deployment capabilities of Caffe2.
It is useful, of course. But it's rare for contemporary models not to use dynamic flow control. In fact, PyTorch is popular precisely because it encourages this dynamism.
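A toy, framework-free sketch of why tracing and dynamic flow control clash: a trace records only the ops actually executed for one particular input, so any data-dependent branching gets frozen into the exported graph. (Pure Python for illustration; this is not the real PyTorch tracer.)

```python
# Hypothetical "model" whose loop count depends on runtime data.
def model(x, steps_remaining):
    ops = []  # a stand-in for the op sequence a tracer would record
    while steps_remaining > 0:
        x = x * 2            # stand-in for a real layer
        ops.append("mul2")
        steps_remaining -= 1
    return x, ops

# Tracing with steps_remaining=3 captures exactly three "mul2" ops.
# Replaying that static trace would be wrong for any other step count,
# which is why dynamic control flow can't be captured by tracing alone.
_, trace = model(1.0, 3)
print(trace)
```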
I know a lot of cynics look at this and think it's all politics. Honestly, as a framework developer, I am glad something like this exists.
We would like to build an easy-to-maintain set of benchmarks for perf comparisons and have found that hard to do. We were originally going to lean on Keras with our model import for this, but if we can also get, say, DyNet and Chainer on board with this, I think it will help the ecosystem as a whole move forward.
(If you're asking why it's hard: the main reason for us was the primary Java interface, which we are fixing with a new Python interface based on pyjnius.)
Overall, I don't think the OpenCL folks will be able to pull this off (the field moves too fast). I feel like this is something the people who actually build the tools should do.
Ultimately, I would be surprised if TensorFlow didn't hop on board at some point. A lot of ML models are being built in PyTorch first now. No one loses by having those easily accessible.
Does ONNX aim to serve as a human-readable and human-maintainable network description for experimentation and research, or does ONNX aim to serve only as a model export interchange format (primarily to be machine-interpreted)?
This could serve very well in the latter case, but would be quite problematic in the former -- where conditionals, for loops, etc. are essential to a compact and readable representation of many network architectures.
Reading a static neural network architecture descriptor of a very deep neural network is akin to reading assembly code from a compiler that does loop unrolling. Static network descriptions work great when pushing a model to production, but are much less suitable for the research and development stages (IMO).
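To make the unrolling analogy concrete, here is a hypothetical sketch (plain Python, not any real export format) contrasting a compact programmatic description with the static node list an interchange format would store:

```python
# Research-friendly description: the repetition is expressed once,
# e.g. "for i in range(50): x = relu(conv(x))".
depth = 50

# Export-friendly description: every layer appears explicitly, like
# assembly from a compiler that has unrolled the loop. (Node names and
# op names here are made up for illustration.)
unrolled = []
for i in range(depth):
    unrolled.append({"op": "Conv", "name": f"conv_{i}"})
    unrolled.append({"op": "Relu", "name": f"relu_{i}"})

# 100 explicit nodes for what the loop states in a single line.
print(len(unrolled))
```

Fine for a machine to consume when deploying; much less pleasant for a human to read or maintain during research.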
If I have my own framework and I want to operate with this standard, I need to have the ability to map to and from ONNX's format. I don't think it is intended for people to write directly in ONNX's format to maintain compatibility. If it were, I'd imagine it would fall flat very quickly as people wouldn't want to rewrite everything in the hopes that their framework of choice will adopt it. Seems the intent is more like .odt to me than, say, Markdown.
It is also under the 'onnx' namespace, presumably because it was a team effort.
It looks like Microsoft often uses MIT (vscode, dotnet, CNTK, ChakraCore), does this mean Microsoft was the driving force (even though it's a Facebook blog)?
* Just noticed the link to Microsoft's blog at the bottom of the page.
Earlier in the year we provided compatibility between Caffe2 and Qualcomm's SNPE library, which follows a similar philosophy.
I want to clarify that Apple advertises Keras support for use with CoreML, but the converter uses a graph from the TensorFlow backend. This raises the question: have you spoken with anybody from the TensorFlow (or Keras) communities about collaborating?
To answer your question: I had no knowledge of this ONNX project before the public announcement today. Speaking purely for myself, if I wanted to develop a universal model exchange format, the first step I would take would be to get in touch with the makers of the frameworks that account for 80-90% of the market -- TF, Keras, MXNet. But maybe such a strategy was thought to be superfluous in this case -- for instance, because ONNX may not actually be intended as a universal model exchange format.
Doesn't it sound like a better long-term option than a corporate-driven political initiative?
They invented the format so I'm sure we'll see everything facebook does in the new ONNX format. From the placement of ads to whatever...
(sorry for the sarcastic comment)
I understand there could be something unique in this format, but really keen to understand what.
// Note [Protobuf compatibility]
// Based on experience working with downstream vendors, we generally can't
// assume recent versions of protobufs. This means that we do not use any
// protobuf features that are only available in proto3.
// Here are the most notable contortions we have to carry out to work around
// these limitations:
// - No 'map' (added protobuf 3.0). We instead represent mappings as lists
// of key-value pairs, where order does not matter and duplicates
// are not allowed.
proto3 was designed to be a simplification of proto2 schemas. It removed several features such as extensions and field presence. Each .proto file specifies whether it is a proto2 or proto3 schema.
Protobuf 3.0 added several features such as maps, but these features were added to both proto2 and proto3 schemas.
Sorry this is kind of confusing.
- not using extensions, as proto3 does not support them
- not using map, as proto2 does not support it
and we basically landed on restricting ourselves to the common denominator among all post-2.5 versions. Wondering if this sounds reasonable to you - always great to hear the original author's advice.
Plus, I am wondering if there are recommended ways of reducing protobuf runtime size. We use protobuf-lite, but any further wins would definitely be nice for memory- and space-constrained problems.
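The map workaround described in the quoted comment might look roughly like this. This is an illustrative sketch only, not the actual ONNX schema; the message and field names are hypothetical:

```protobuf
// Illustrative only -- not the real ONNX .proto. A proto2-compatible
// stand-in for map<string, string>: a repeated key-value message, with
// "order does not matter, duplicates are not allowed" enforced by
// convention rather than by the wire format.
syntax = "proto2";

message StringStringEntry {
  optional string key = 1;
  optional string value = 2;
}

message Metadata {
  // Would be `map<string, string> props = 1;` if protobuf >= 3.0
  // could be assumed; instead:
  repeated StringStringEntry props = 1;
}
```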
You are only using proto3 schemas if you start your .proto file with 'syntax = "proto3";': https://developers.google.com/protocol-buffers/docs/proto3 But if you do this, your .proto file will no longer work with Protobuf <3.0.
We have been working on some optimizations to help the linker strip generated code where possible. I recommend compiling with -ffunction-sections/-fdata-sections if you aren't already, and --gc-sections on your linker invocation.
Arrow is based on Google FlatBuffers, but has undergone substantial engineering specifically from a big-data/machine-learning perspective. Instead of building directly over Protobuf, it might be interesting to build this on top of Arrow (in exactly the same way that Feather is built on top of Arrow: https://github.com/wesm/feather).
We chose protobuf mainly due to the good Caffe adoption story and also its track record of compatibility with many platforms (mobile, server, embedded, etc.). We actually looked at Thrift - which Facebook created - and it is equally nice, but our final decision was mainly to minimize the switching overhead for existing users such as Caffe and TensorFlow.
To be honest, protobuf is indeed a little bit hard to install (especially if you have Python and C++ version differences). I would definitely be interested in taking a look at possible solutions - the serialization format and the model standard are more or less orthogonal, so one may see a world where we convert between different serialization formats (JSON <-> protobuf, as an overly simplified example).
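A toy sketch of that orthogonality, using only Python's standard library (the graph structure here is made up for illustration, not the real ONNX schema): the same model description can travel through two entirely different wire formats and round-trip to the identical structure.

```python
import json
import pickle

# A toy model description -- a stand-in for an ONNX-style graph.
graph = {
    "nodes": [{"op": "Conv", "inputs": ["x", "w"], "outputs": ["y"]}],
    "inputs": ["x", "w"],
    "outputs": ["y"],
}

# The same schema can be serialized as JSON...
as_json = json.dumps(graph)

# ...or as a different binary format entirely (pickle here as a mere
# placeholder for protobuf); the model standard doesn't change.
as_pickle = pickle.dumps(graph)

assert json.loads(as_json) == pickle.loads(as_pickle) == graph
```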
Projects like ONNX define said mapping for a specific domain (in ONNX's case, by agreeing on a proto schema for ML models, and its interpretation).
To use a simplistic metaphor: protobufs are the .docx format; onnx is a resume template you can fill out in Word.