Facebook and Microsoft introduce ecosystem for interchangeable AI frameworks (fb.com)
335 points by runesoerensen on Sept 7, 2017 | 82 comments

Translation from corporatespeak: "Our AI frameworks are losing developer mindshare to TensorFlow, which is controlled by Google, so we are joining forces."

Corporations like Facebook and Microsoft do not like interoperability... until and unless it's in their best interest.

The goal here is to make AI frameworks easily interchangeable: build your model with whatever framework you like; it will run unchanged on all other frameworks/platforms/stacks.

Regardless of the motivations, I actually think this is GREAT NEWS, and I hope that Amazon and maybe Apple, and eventually hopefully Google too will introduce and promote compatibility with this standard.

We all benefit from more interoperability.

The way in which Google released TensorFlow (and the meteoric success it's enjoyed) is an excellent example of a company successfully commoditizing its complements using open source software.

TensorFlow (and stacks built over it) has the greatest mindshare, which allows Google to both optimize the capabilities of its ML offerings based on TensorFlow, knowing most developers will be familiar with it, and to increase profit margins on those services at the same time. TensorFlow makes a significant part of the ML stack convenient and accessible, and releasing it for free opened the flood gates for Google to expand the machine learning industry significantly while shifting the profit generation away from software and onto the cloud (data storage + compute resources).

Sublime execution, really. Once machine learning passes through this hype cycle (over the "trough of disillusionment" and onto the "plateau of productivity"), Google will likely have made the majority of the machine learning industry accessible enough for most software engineers to pick up something like TensorFlow and hit the ground running, which will gradually exert a downward pressure on ML specialized salaries for all but the top end. But this will continue even as the industry continues to expand, which means Google will come out ahead earning even more profit even if salaries plateau or decrease.

Two problems. 1) Tensorflow sucks. 2) People that are actually experts in deep learning prefer Pytorch or Caffe2, so your mindshare thing is wrong. Once the dust of the Tensorflow hype train clears, people will choose tools that are actually any good.

Why do you believe "Tensorflow sucks", and why do you believe PyTorch or Caffe2 is better, and for what metrics?

Two metrics stood out in my experiments with CNNs back in May 2017:

1) Compilation speed for a jumbo CNN architecture: Tensorflow took 13+ minutes to start training every time the network architecture was modified, while PyTorch started training in just over 1 minute. PyT suits my style of interactive coding far better.

2) Memory footprint: I was able to fit a 30% larger batch size with PyTorch than with Tensorflow on Titan X cards. Exact same CNN architecture.

Both frameworks had major releases since May, so I am sure these metrics might have changed by now. However I ended up adopting PyT for my project.

This post is spot on.

Google is actually very scared of PyTorch since nearly all new AI research papers are using PyTorch. This means in 1-2 years or so, most companies will be using PyTorch for training and Caffe for deployment. Tensorflow doesn't even have an implementation of the latest ImageNet winner, DenseNet, but PyTorch does!

I have sources on the Tensorflow team saying they are scrambling to make another higher level wrapper for tensorflow at the level of PyTorch.

Jumbo CNNs are not the battleground. The real battleground is distribution. The first framework that scales out without placing much onus on the programmer will win, IMO. Facebook already showed that Caffe2 scales to 256 GPUs for ImageNet. Tensorflow needs to show it can scale as well. PyTorch needs to work on usability: model serving, integration in ecosystems like Hadoop, etc.

Today I learned: "commoditize your complements". Quite interesting compact way to put it.

It apparently isn't a new strategy, and is in the shared internet mind sphere. https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/

As another commenter mentioned, I can't take credit for it. I first read it from Joel's blog.

The corporations that have the data are also very interested in open development of machine learning algorithms. They know that the data is key for getting value out of the algorithms anyway...

EDIT: I second that regardless of motivation, this is a step forward.

You got it right. Tensorflow is HUGE in the AI community, Facebook and Microsoft are scared. I, for one, am a huge fan of Tensorflow and don't plan on learning anything new to do the work I'm already doing. Facebook and Microsoft should just join Google's coalition and use Tensorflow company-wide.

Actually, Tensorflow is huge in the data science community, but PyTorch is taking over ML research. TF is last year's cool thing. That's one reason why fast.ai just adopted Pytorch for their courses. Tensorflow is surprisingly low-level and needs libraries like Keras to be used easily. The field of Python ML tools moves fast...

I hope the ML community won't have to suffer the same kind of fatigue syndrome that has plagued the JavaScript community for the last decade.

Excuse the newbie question but are PyTorch and/or TensorFlow sufficient for developing ML algorithms that can be used directly by a user-facing application? Or is there typically some translation layer involved?

Right, until you don't like something that Google does with Tensorflow in the future or breaks compatibility with your app. I think alternatives in complex abstractions like these are always a good thing. Only a few companies can afford to do it so we have a guarantee that you won't have tens of them. Just a viable alternative.

It's been a while since I played around with tensorflow, how difficult is it to deploy a model as a web service? I'm currently using azure and looking for other options (besides prediction.io).

It's really designed around ease of use. What you're looking for is called TensorFlow Serving.

Cool, thanks for the info.

I'd love to know what sort of ML/AI communities/forums/slacks/discords/whatever you participate in, since I'm interested in joining some.

I don't take Tensorflow seriously anymore... PyTorch is the way to go (gotta try caffe2 at some point).

The implication is that this will make your ML better somehow. Not sure why that would be.

from the article

> We developed ONNX together with Microsoft to bridge this gap and to empower AI developers to choose the framework that fits the current stage of their project and easily switch between frameworks as the project evolves.

ML workflows tend to be very heterogeneous, with people using everything from sklearn to DL4J to play with data. Standardized serialization formats make it possible to bring results from those experiments back into a hardened productionized environment for serving the model.


True. No matter what their intention is, they are doing the right thing. I don't like a DL framework being controlled by Google, like what they do with Android. This would probably motivate Facebook/MSFT to develop a shared intermediate representation for DL frameworks, which would benefit so many people.

Whatever their incentives this hyper competition between companies is always good for the consumer. Same with iOS vs Android.

Predictive Model Markup Language (PMML) was released in 1997 to solve a very similar problem (but for lots of predictive models). I don't know that it ever really caught on. That was also promoted by the tool builders but not by the users.

While checking the wikipedia page, I'm surprised to see that it is still being developed (with a 2016 release): https://en.wikipedia.org/wiki/Predictive_Model_Markup_Langua...

Please note that in 1997 it was NOT ridiculous to use XML for this.

PMML is definitely in use. There are occasional compatibility issues, but for the present, it is the standard for exchanging models.

PMML is alive and well for sure. It really only covers simpler model types so something aimed at deep nets models is a welcome addition. However, PMML is still alive and well because plenty of ML will continue to be based on simpler model types and modelers don't want to be tied down on choice of modeling tech even for simple model types.

You might also be interested in openscoring https://github.com/openscoring/openscoring . It lets you run PMML models and score examples through a rest API.

Automatic productionizability of PyTorch models through Caffe 2 will speed the transition from research to production dramatically - this gives the environment a chance to compete with Tensorflow / Tensorflow Serving.

Frameworks (say, DL4J) have been using Keras as a loose way to share models between frameworks. It'll be fascinating to see if Theano / Tensorflow / DL4J / MXNet walk this path as well.

I was curious about their implementation since PyTorch and Caffe2 semantics are very different. Unfortunately, the authors write:

> Currently, our tracer works with many common neural networks, but not some of the more advanced programs in PyTorch such as those with dynamic flow control. Over time, we will enhance ONNX and the tracer to support these programs, so that developers can leverage full flexibility of PyTorch with the high-performance robust deployment capabilities of Caffe2.

It is useful, of course. But it's rare for contemporary models to not use dynamic flow. In fact, PyTorch is popular because it encourages this dynamism.
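To make the limitation quoted above concrete, here is a minimal sketch in plain Python. It is not the actual PyTorch/ONNX tracer: `Traced`, `trace`, and `replay` are hypothetical names invented for illustration. It only shows why recording a single concrete run cannot capture a branch that depends on the input's value.

```python
# Toy illustration only: this is NOT the PyTorch/ONNX tracer, just a sketch
# of why recording one concrete run cannot capture data-dependent branches.


class Traced:
    """Wraps a float and records every arithmetic op applied to it."""

    def __init__(self, value, tape):
        self.value = value
        self.tape = tape

    def __mul__(self, k):
        self.tape.append(("mul", k))
        return Traced(self.value * k, self.tape)

    def __add__(self, k):
        self.tape.append(("add", k))
        return Traced(self.value + k, self.tape)


def model(x):
    # Dynamic control flow: which branch runs depends on the input's value.
    if x.value > 0:
        return x * 2
    return x + 10


def trace(fn, example_input):
    """Run fn once on an example input and return the recorded op list."""
    tape = []
    fn(Traced(example_input, tape))
    return tape


def replay(tape, x):
    """Execute a recorded op list on a new plain-float input."""
    for op, k in tape:
        x = x * k if op == "mul" else x + k
    return x


graph = trace(model, 3.0)               # only the x > 0 branch gets recorded
eager = model(Traced(-1.0, [])).value   # the real model takes the other branch
static = replay(graph, -1.0)            # the recorded graph cannot
print(graph, eager, static)
```

For a negative input the eager model and the replayed trace disagree, because the `if` was evaluated once at trace time and never made it into the recorded graph.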

Hi, Adam from DL4j here. I'm really excited to see this. We will work on supporting this format as well. We have a new PyTorch-like API coming out (dynamic graphs etc), so this fits nicely into that.

I know a lot of cynics are looking at this and are thinking this is all politics. Honestly: I am glad something like this exists as a framework developer.

We would like to build an easy-to-maintain set of benchmarks for perf comparisons and have found that to be hard to do. We were originally going to lean on keras with our model import for this, but if we can get, say, dynet and chainer on board with this as well, I think it will help the ecosystem as a whole move forward.

(If you're asking "why" it's hard, main reason for us was the primary java interface, we are fixing this with a new python interface based on pyjnius)

Overall, I don't think the OpenCL folks will be able to pull this off (the field moves too fast). I feel like this is something the people who actually build the tools should do.

Ultimately, I would be surprised if tensorflow didn't hop on board at some point. A lot of ML models are being built in pytorch first now. No one loses having that easily accessible.

Hey Adam, I don't know about the "OpenCL folks" but we're able to use OpenCL to exceed TensorFlow + cuDNN throughput on the same hardware for real networks. We (Vertex.AI) mostly care about embedded platforms and we currently use Keras as our research front end but we're open to collaborations.

I am talking about the neural net format Khronos is working on. We likely don't overlap. I am mostly focused on banking, gov, and telco. We are supporting this out of interest in being able to run these neural nets in our distribution. We have a few embedded use cases but we will likely write our own graph executioner for ARM and the like.

Ah, I don't know too much about what Khronos is doing. We have thought about your work from the standpoint of supporting DL4j on a range of datacenter accelerators, we do tune for NVIDIA etc with good results. We're pretty framework and hardware neutral.

Feel free to file an issue on nd4j. That is our tensor lib. https://github.com/deeplearning4j/nd4j

> To directly export this code, ONNX would have to support conditionals [...]

Does ONNX aim to serve as a human-readable and human-maintainable network description for experimentation and research, or does ONNX aim to serve only as a model export interchange format (primarily to be machine-interpreted)?

This could serve very well in the latter case, but would be quite problematic in the former -- where conditionals, for loops, etc. are essential to a compact and readable representation of many network architectures.

Reading a static neural network architecture descriptor of a very deep neural network is akin to reading assembly code from a compiler that does loop unrolling. Static network descriptions work great when pushing a model to production, but are much less suitable for the research and development stages (IMO).
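A rough sketch of the loop-unrolling analogy, in plain Python with made-up layer names (`build_network`, `conv0`, `relu0`, and so on are hypothetical, not any framework's API): the loop-based definition is a few lines long, while the static description it exports is a flat list with one entry per layer.

```python
# Hypothetical layer names; a sketch of "compact definition vs. static export".


def build_network(depth):
    """Compact, loop-based description of a deep stack of conv/relu blocks."""
    layers = ["input"]
    for i in range(depth):
        layers.append(f"conv{i}")
        layers.append(f"relu{i}")
    layers.append("output")
    return layers


# The exported, static form is the fully unrolled list.
static_description = build_network(50)
print(len(static_description))   # one entry per layer, no loop left to read
print(static_description[:5])
```

The unrolled list is easy for a machine to consume, but the structure a researcher cares about (a block repeated `depth` times) is no longer visible in it.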

my guess is it is the latter.

If I have my own framework and I want to operate with this standard, I need to have the ability to map to and from ONNX's format. I don't think it is intended for people to write directly in ONNX's format to maintain compatibility. If it were, I'd imagine it would fall flat very quickly as people wouldn't want to rewrite everything in the hopes that their framework of choice will adopt it. Seems the intent is more like .odt to me than, say, Markdown.

Great move by Facebook to look beyond Caffe's legacy prototxt file format, which has become somewhat standard in computer vision, and unite Pytorch and Caffe2. Unfortunately there are limitations to the neural network logic that can be represented in configuration files, especially when dealing with dynamic networks. Most networks used in vision are static so that's where I expect this will add the most value. My only wish is that Caffe2 followed Pytorch's interface. After spending 5 minutes with Pytorch it's pretty clear that they got the interface design right.

Michelangelo, announced by Uber 2 days ago (https://eng.uber.com/michelangelo/), uses another format for storing models/features and for serving data. It would be interesting to hear why they didn't use protobufs for storing features instead of their own system. My guess would be that they have a mix of scikit-learn and tensorflow, and they didn't want to be too tightly coupled to tensorflow. Still, this move by FB and Microsoft is what happens when you're desperate: your frameworks are not being adopted, last-chance roll of the dice.

It's interesting that this tool has no PATENTS grant.

Yangqing here from Facebook. We consciously made it an MIT license as ONNX is intended to be widely shared by a lot of participants, and MIT seems to be more widely agreeable among different parties co-owning it. It's also simpler.

Oh, so now there is no danger of those nasty patent trolls, like in the React case?

Interesting, it is also MIT instead of the usual BSD + Patents Facebook uses. [1]

It is also under the 'onnx' namespace, presumably because it was a team effort.

It looks like Microsoft often uses MIT (vscode, dotnet, CNTK, ChakraCore), does this mean Microsoft was the driving force (even though it's a Facebook blog)?

* just noticed the link to Microsoft's blog at the bottom of the page [2].

[1] https://github.com/onnx/onnx

[2] https://www.microsoft.com/en-us/cognitive-toolkit/blog/2017/...

While this is awesome I'm personally wondering when big companies like FB and MS are going to standardize on a format for predictive models in general. [0] PMML exists but there are very few companies that are standing behind it.

[0] http://dmg.org/pmml/v4-3/GeneralStructure.html

Just imagine the world we will live in, when Facebook or Microsoft wins the race to general AI.

I imagine it will be like the Manhattan Project, impossible to contain to one company or country for long.

I'm curious if FB/Microsoft considered Khronos Neural Network Exchange Format (NNEF) - https://www.khronos.org/nnef?

So "AI" is equivalent to "neural networks" nowadays, eh?

That's not implied by the title. Neural Networks is a member of Machine Learning, which is a member of AI. PyTorch, Caffe 2 and CNTK are Deep Learning Frameworks. All Deep Learning Frameworks are AI frameworks, but not all AI frameworks are neural network / deep learning frameworks.

I realize you are joking, but in some fields, like computer vision, yes.

pretty much

Why would you want to pile another black box on top of the black box?

It's black boxes all the way down.

Any TensorFlow devs who might be able to comment on plans to interoperate with TensorFlow/reasons why it's not on the cards?

I am a regular TensorFlow (and Keras) contributor and I saw the announcement when you did. In either situation, Google contributors participating without speaking to the TensorFlow community or the Facebook and Microsoft authors not reaching out to the TensorFlow community, it's a bad look. However, it's entirely possible we'll hear more next week at the TensorFlow symposium.

Now, it would be lovely if coremltools could support ONNX and convert ONNX models to CoreML.

Yangqing here (created Caffe and Caffe2) - we are much interested in enabling this path. Historically CoreML has provided Caffe and Keras interfaces, and having ONNX / CoreML interop would help a lot for everyone to ship models more easily.

Earlier in the year we provided compatibility between Caffe2 and Qualcomm's SNPE library, which follows this similar philosophy.

Hi, Yangqing! Nice project. B)

I want to clarify that Apple advertises Keras support for use with CoreML, but the converter uses a graph from the TensorFlow backend. It begs the question, have you spoken with anybody from the TensorFlow (or Keras) communities about collaborating?

CoreML supports Keras but not TensorFlow because Keras models form a well-structured subset of all possible TensorFlow graphs. It would be quite difficult to support completely arbitrary TensorFlow graphs, but supporting every Keras layer is relatively straightforward.

To answer your question: I had no knowledge of this ONNX project before the public announcement today. Speaking purely for myself, if I wanted to develop a universal model exchange format, the first step I would take would be to get in touch with the makers of the frameworks that sum to 80-90% of the market share -- TF, Keras, MXNet. But maybe such a strategy was thought to be superfluous in this case -- for instance, because ONNX may not actually be intended as a universal model exchange format.

To be fair, CNTK (BrainScript) has quite an impressive list of features to support dynamic control structures (in a symbolic fashion, compared to PyTorch, which delegates much of the dynamic control structure to the underlying language, Python). I think Tensorflow and CNTK are probably the only two frameworks that pursued such an implementation strategy. IMHO, looking back, supporting control structures may not be that useful (see the recent attention-based models, all of which can be unrolled to ordinary graphs), but it is so interesting to implement!

Thanks! Haven't yet, but our TPMs are going to reach out for collaborations. I wish we were in grad school mode where latency is <1 hour, but it pays to get things proper across multiple companies. Kindly stay tuned.

I have heard about something similar in the open source community:


Doesn't it sound like a better long-term option than corporate-driven political initiative?

Does that mean that we can see the AI implementations facebook uses?

They invented the format so I'm sure we'll see everything facebook does in the new ONNX format. From the placement of ads to whatever...

(sorry for the sarcastic comment)

ONNX plans to support TensorFlow as well: https://github.com/onnx/onnx/issues/3

Doesn't seem to have support for control structure (If, While etc.).

Why a new format? Why couldn't protobuf, hdf5 or the Apache Feather/Arrow format (which counts Hadley and McKinney as contributors) be used?

I understand there could be something unique in this format, but really keen to understand what.

PyTorch/ONNX dev here. ONNX is a proto2 format https://github.com/onnx/onnx/blob/master/onnx/onnx.proto -- we definitely wanted to make it easy for people to parse and load ONNX exported graphs :)

Yangqing here (caffe2 and ONNX). We did use protobuf and we have an extensive discussion about its versions even, from our experience with the Caffe and Caffe2 deployment modes. Here is a snippet from the codebase:

  // Note [Protobuf compatibility]
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  // Based on experience working with downstream vendors, we generally can't
  // assume recent versions of protobufs. This means that we do not use any
  // protobuf features that are only available in proto3.
  //
  // Here are the most notable contortions we have to carry out to work around
  // these limitations:
  //
  // - No 'map' (added protobuf 3.0). We instead represent mappings as lists
  //   of key-value pairs, where order does not matter and duplicates
  //   are not allowed.
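The "mappings as lists of key-value pairs" workaround from that note can be sketched roughly as follows. This is plain Python rather than protobuf, and `dict_to_pairs` / `pairs_to_dict` are hypothetical helper names, not anything from the ONNX codebase. The decoder enforces the two stated rules: order does not matter, and duplicate keys are not allowed.

```python
def dict_to_pairs(mapping):
    """Encode a dict as a list of (key, value) pairs, as a map-free schema would."""
    return [(key, value) for key, value in mapping.items()]


def pairs_to_dict(pairs):
    """Decode pairs into a dict; order is ignored, duplicate keys are rejected."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


attrs = {"kernel": 3, "stride": 1}   # made-up attribute names for illustration
pairs = dict_to_pairs(attrs)

# Reordering the pairs decodes to the same mapping.
roundtrip = pairs_to_dict(reversed(pairs))
print(roundtrip == attrs)
```

The cost of the workaround is that uniqueness becomes a convention every reader has to check, instead of something the schema guarantees.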

Hi there, I work on the protobuf team at Google. A friendly note that protobuf 3.0 and proto3 are actually two separate things (this is admittedly confusing). Protobuf 3.0 was a release that added support for proto3 schemas, but Protobuf 3.0 continues to support proto2 schemas.

proto3 was designed to be a simplification of proto2 schemas. It removed several features such as extensions and field presence. Each .proto file specifies whether it is a proto2 or proto3 schema.

Protobuf 3.0 added several features such as maps, but these features were added to both proto2 and proto3 schemas.

Sorry this is kind of confusing.

Thanks so much @haberman! Yep, the whole thing is a little bit confusing... We basically focused on two things:

- not using extensions, as 3 does not support it

- not using map, as 2 does not support it

and we basically landed on restricting ourselves to use the common denominator among all post-2.5 versions. Wondering if this sounds reasonable to you - always great to hear the original author's advice.

Plus, I am wondering if there are recommended ways of reducing protobuf runtime size, we use protobuf-lite but if there are any further wins it would be definitely nice for memory and space constrained problems.

Hmm, I'm not sure I quite get your strategy. If you want to support back to Protobuf 2.5, then you'll need to use proto2 schemas (https://developers.google.com/protocol-buffers/docs/proto). Proto2 schemas will support extensions forever (even after Protobuf 3.0), so there's no need to avoid extensions.

You are only using proto3 schemas if you start your .proto file with 'syntax = "proto3";': https://developers.google.com/protocol-buffers/docs/proto3 But if you do this, your .proto file will no longer work with Protobuf <3.0.

We have been working on some optimizations to help the linker strip generated code where possible. I recommend compiling with -ffunction-sections/-fdata-sections if you aren't already, and --gc-sections on your linker invocation.

So what we do is to keep syntax=proto2, but allow users to compile with both protobuf 2.x and protobuf 3.x libraries. Minimum need is 2.6.1. We kind of feel that this gives maximum flexibility for people who have already chosen a protobuf library version.

Makes sense! All I'm saying is that there is no need to avoid using extensions if that is your strategy. Extensions will work in all versions of the library you wish to support.

pretty cool and thanks for that reply! Did you look at something like Arrow/Feather, which is looking to get adopted as the interoperable format in R/Pandas ... and maybe even Spark. There's been quite a bit of momentum behind it to optimize it for huge usecases - https://thenewstack.io/apache-arrow-designed-accelerate-hado...

It is based on Google Flatbuffers, but is undergoing enough engineering specifically from a big data/machine learning perspective. Instead of building directly over Protobuf, it might be interesting to build it on top of Arrow (in exactly the same way that Feather is based on top of Arrow https://github.com/wesm/feather).

Interesting data, thanks!

We chose protobuf mainly due to a good caffe adoption story and also the track record of it being compatible with many platforms (mobile, server, embedded, etc). We actually looked at thrift - which is Facebook owned - and it is equally nice, but our final decision was mainly to minimize the switching overhead for existing users such as Caffe and TensorFlow.

To be honest, protobuf is indeed a little bit hard to install (especially if you have python and c++ version differences). Would definitely be interested in taking a look at possible solutions. The serialization format and the model standard are more or less orthogonal, so one may see a world where we can convert between different serialization formats (JSON <-> protobuf as an overly simplified example).

I want to second considering Arrow. In addition to Pandas and Spark, it is (or was) considered for scikit-learn model interchange.

A tip: If your comment includes code, indent it at least two spaces (before copying from your editor) and it will be formatted correctly.

Learned one more thing today!

hdf5, Feather, Arrow, protobufs, json, xml -- all solve the problem of binary representation of data on disk. They all leave the question of how to map said data to a specific problem domain up to the developer.

Projects like ONNX define said mapping for a specific domain (in ONNX's case, by agreeing on a proto schema for ML models, and its interpretation).

To use a simplistic metaphor: protobufs are the .docx format; onnx is a resume template you can fill out in Word.
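A small sketch of that distinction, assuming a deliberately simplified toy schema (the field names in `REQUIRED_NODE_FIELDS` and the `validate_model` helper are invented for illustration and are not the real ONNX schema): the serializer, JSON here, only moves bytes around, while the agreed schema is what says a document is a model.

```python
import json

# Hypothetical, simplified schema; NOT the real ONNX schema.
REQUIRED_NODE_FIELDS = {"op_type", "inputs", "outputs"}


def validate_model(doc):
    """Check that a decoded document follows the agreed (toy) model schema."""
    nodes = doc.get("nodes")
    if not isinstance(nodes, list):
        raise ValueError("model must contain a list of nodes")
    for node in nodes:
        missing = REQUIRED_NODE_FIELDS - set(node)
        if missing:
            raise ValueError(f"node is missing fields: {sorted(missing)}")
    return doc


# Any serializer can store the bytes; only the schema says what they mean.
payload = json.dumps(
    {"nodes": [{"op_type": "Relu", "inputs": ["x"], "outputs": ["y"]}]}
)
model = validate_model(json.loads(payload))
print(model["nodes"][0]["op_type"])
```

Swapping `json` for another serializer would leave `validate_model` untouched, which is the sense in which the two layers are independent.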

can we send messages through it?
