Facebook is a huge company with lots of money, but a production-ready ML framework is a HUGE undertaking. I don't get how tech companies can be recruiting and interviewing year-round and paying huge salaries, and then put those engineers under a layer of management that thinks two different C++ ML frameworks with the same goals is a good idea.
Facebook's work on Tensor Comprehensions, Halide (not originally FB, but they have contributed heavily), Glow, and PyTorch all contributed to the ML space by offering alternatives (with innovative UX/technical differences) to the TensorFlow ecosystem. Not all of these contributions were novel, but I respect FB's choice to work on something for the sole purpose of not having its direction beholden to the whims of Alphabet (see: Swift for TensorFlow).
I just don't see what this adds which FB isn't already working on within a different project. What am I missing?
[1] https://pytorch.org/cppdocs/
I don't see a contradiction here; in fact, it makes sense.
You have some goals and a couple of promising approaches. It's hard to say which will work better, but there's enough budget and enough people to just try both and see.
I've heard anecdotes of similar strategies at banks, which have sufficient budget to hire two parallel teams to build literally the same product, sometimes without even knowing about each other. At the end, the one that ends up being faster/better gets used.
I guess it's like a microcosm of free market competition within an org, as opposed to top down planning.
From my understanding, Tensor Comprehensions and Halide are both very tentative research projects.
> not having its direction beholden to the whims of Alphabet (see: Swift for TensorFlow).
I don't think this is an accurate recreation of the history that led to FB working on pytorch.
It appears as though Flashlight is built on ArrayFire. I haven't seen how gradients are managed in arrayfire-ml, but perhaps the autograd implementation in PyTorch was a bottleneck and this is a ground-up approach.
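To make concrete what an "autograd implementation" involves, here is a toy reverse-mode autograd for scalars built from closures. This is purely illustrative and is not how arrayfire-ml, Flashlight, or PyTorch actually manage gradients; the backward pass is ordered by hand here, whereas a real framework records a graph and walks it in reverse topological order.

```cpp
// Toy reverse-mode autograd for scalars, illustrative only; not how
// arrayfire-ml or Flashlight actually manage gradients.
#include <cstdio>
#include <functional>
#include <memory>

struct Node {
    double value = 0.0;
    double grad = 0.0;
    std::function<void()> backward = [] {};  // pushes this node's grad to its inputs
};
using Var = std::shared_ptr<Node>;

Var constant(double v) {
    auto n = std::make_shared<Node>();
    n->value = v;
    return n;
}

Var mul(const Var& a, const Var& b) {
    auto out = constant(a->value * b->value);
    Node* o = out.get();  // raw pointer avoids a shared_ptr cycle in the closure
    out->backward = [a, b, o] {
        a->grad += b->value * o->grad;  // d(ab)/da = b
        b->grad += a->value * o->grad;  // d(ab)/db = a
    };
    return out;
}

Var add(const Var& a, const Var& b) {
    auto out = constant(a->value + b->value);
    Node* o = out.get();
    out->backward = [a, b, o] {
        a->grad += o->grad;
        b->grad += o->grad;
    };
    return out;
}

int main() {
    // y = x*x + x; dy/dx at x = 3 is 2*3 + 1 = 7.
    auto x  = constant(3.0);
    auto xx = mul(x, x);
    auto y  = add(xx, x);
    y->grad = 1.0;
    // Backward pass, hand-ordered here; a real framework automates this.
    y->backward();
    xx->backward();
    std::printf("dy/dx = %f\n", x->grad);  // 7.000000
    return 0;
}
```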
Editing, as I didn't address your second point. I can neither confirm nor deny that the motivation for creating PyTorch was related to FB's desire to avoid depending on an Alphabet-managed codebase. I know there are lots of reasons why programmers prefer the UX of the PyTorch Python API (I do as well), and there are probably other reasons I can't recall off the top of my head. I am only saying that PyTorch already contributes to the ML ecosystem by the sole virtue that it isn't a Google product.
https://github.com/microsoft/DeepSpeed
https://pytorch.org/docs/stable/jit.html
Halide is still quite active, and was used in products at Adobe and Google circa 2016-2017. Not sure about the current state of industry usage though.
Maybe for training? But there aren't that many CPU-bound components there, and you can write those in native code.
What would you say the motivation is for yet another ML framework, but in C++ this time?
> Flashlight’s modular internals make it a powerful research framework for research frameworks.
> We’re already using Flashlight at Facebook in our research focused on developing a fast speech recognition pipeline, a threaded and customizable train-time relabeling pipeline for iterative pseudo-labeling, and a differentiable beam search decoder.
> Our ongoing research is further accelerated by the ability to integrate external platform APIs for new hardware or compiler toolchains and achieve instant interoperability with the rest of Flashlight.
They found it easier and faster to make a framework optimized for their research than to iterate on a larger and more complex codebase. There is less cognitive overhead and fewer things to change when experimenting with a new idea, and since they've optimized for fast rebuild times, it is much faster to try out new things. They get native integration with the C++ ecosystem for free, and since their team all know C++ well, it makes sense to just do it in plain C++.
FB makes $1.3M/employee while average pay is $120K, i.e. less than 10%. Employees are therefore actually very cheap, and FB can afford to waste a lot of employee resources, like building 10 ML frameworks; even though ML-related employees earn $1M+, it still wouldn't move the needle financially for FB.
As far as Facebook engineering resources are concerned, I believe Facebook takes the approach that supporting high-visibility open source projects is itself a recruiting draw for pulling talented engineers into its umbrella.
Using modern C++:
> Modern C++ also obviates the need for tasks like memory management while providing powerful tools for functional programming.
> Flashlight supports doing research in C++ with no need to adjust external fixtures or bindings and no need for adapters to do things like threading, memory mapping, or interoperating with low-level hardware.
Easy integration with other C++ libraries:
> Flashlight makes it trivial to build new low-level computational abstractions. You can cleanly integrate CUDA or OpenCL kernels, Halide AOT pipelines, or other custom C/C++ code with minimal effort.
Customizable with fast build times:
> And when you change Flashlight’s core components, it takes just seconds to rebuild the entire library and its training pipelines, thanks to its minimalist design and freedom from language bindings.
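To illustrate the "integrate custom C/C++ code" point above, this is the sort of thing that becomes trivial when the whole stack is C++. The tensor type and the fused op below are stand-ins invented for this example, not Flashlight's actual API; in a real setup the same function signature could just as well front a CUDA/OpenCL kernel or a Halide AOT pipeline.

```cpp
// Illustration only: a hand-written "kernel" plugged straight into C++ training
// code, with no Python bindings or adapters in between. The Tensor type here is
// a stand-in, not Flashlight's actual API.
#include <cmath>
#include <cstdio>
#include <vector>

struct Tensor {                    // toy dense float tensor
    std::vector<float> data;
};

// A custom fused op written as ordinary C++; behind this signature could sit a
// CUDA/OpenCL kernel or a Halide AOT pipeline instead.
void fused_scale_gelu(const Tensor& in, float scale, Tensor& out) {
    out.data.resize(in.data.size());
    for (size_t i = 0; i < in.data.size(); ++i) {
        float x = scale * in.data[i];
        // tanh approximation of GELU
        out.data[i] = 0.5f * x * (1.0f + std::tanh(0.7978845608f * (x + 0.044715f * x * x * x)));
    }
}

int main() {
    Tensor x{{-1.0f, 0.0f, 1.0f, 2.0f}}, y;
    fused_scale_gelu(x, 2.0f, y);          // called directly from "training" code
    for (float v : y.data) std::printf("%f\n", v);
    return 0;
}
```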
I guess performance was considered less critical than clarity/flexibility. But it seems that people are discovering that complex code tends to be hard to read/modify no matter the language...
A lot of it is also asynchronous for performance: the Python code just enqueues more work onto a queue that some native C++ code processes.
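The underlying pattern looks roughly like this (a generic producer/consumer queue, not PyTorch's actual dispatcher): the caller enqueues work and returns immediately, while a worker thread drains the queue.

```cpp
// Sketch of the "enqueue and return" pattern described above; a generic
// producer/consumer queue, not PyTorch's actual dispatcher.
#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

class WorkQueue {
public:
    WorkQueue() : worker_([this] { run(); }) {}
    ~WorkQueue() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();               // drains remaining work before exiting
    }
    // Returns immediately; the actual work happens on the worker thread.
    void enqueue(std::function<void()> task) {
        { std::lock_guard<std::mutex> lk(m_); tasks_.push(std::move(task)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;   // done_ set and queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool done_ = false;
    std::thread worker_;
};

int main() {
    WorkQueue q;
    for (int i = 0; i < 3; ++i)
        q.enqueue([i] { std::printf("processed op %d\n", i); });
    return 0;                          // destructor joins the worker
}
```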
For TensorFlow, the Python code traces an entire computation graph that is stored as a protobuf and then executed by a native C++ stack, potentially remotely/distributed. Serving ML with TensorFlow does not involve any Python code in many scenarios.
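As a rough sketch of that Python-free serving path, assuming the TensorFlow C++ SavedModel loader API; the model path and tensor names below are placeholders specific to whatever model was exported from Python.

```cpp
// Rough sketch of serving a traced/saved TF graph from C++ only. The model
// path and tensor names are placeholders; they depend on the exported model.
#include <vector>
#include "tensorflow/cc/saved_model/loader.h"
#include "tensorflow/cc/saved_model/tag_constants.h"
#include "tensorflow/core/framework/tensor.h"

int main() {
    tensorflow::SavedModelBundle bundle;
    tensorflow::SessionOptions session_opts;
    tensorflow::RunOptions run_opts;
    // Loads the protobuf graph + weights exported by the Python side.
    TF_CHECK_OK(tensorflow::LoadSavedModel(session_opts, run_opts,
                                           "/path/to/saved_model",
                                           {tensorflow::kSavedModelTagServe},
                                           &bundle));

    tensorflow::Tensor input(tensorflow::DT_FLOAT, tensorflow::TensorShape({1, 4}));
    for (int i = 0; i < 4; ++i) input.flat<float>()(i) = 1.0f;

    std::vector<tensorflow::Tensor> outputs;
    // Tensor names depend on the exported signature; these are placeholders.
    TF_CHECK_OK(bundle.session->Run({{"serving_default_input:0", input}},
                                    {"StatefulPartitionedCall:0"}, {}, &outputs));
    return 0;
}
```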
Python is still quite useful for scientists to quickly glue everything together, describe their datasets, or produce graphs and other data analyses once they collect results.
Most quants do not want to learn complex build systems with quirky behavior on different platforms, wait through very long compile times after making small changes to the code, decipher dense and incomprehensible error messages, or deal with the host of other painful problems that come with writing C++.
Python just works really well on almost all platforms.
The biggest downside of Python is its parallelism, which means there is a lot of hackiness around writing parallel code. In most cases we can break things down and run different tests independently of one another, but in many other cases we have to resort to awkward workarounds, multiprocessing, and other tricks.
Which C++ version are you guys using?
The reasoning was the same: all the heavy lifting was done in fast native code, and everything written in Lua was mostly glue code without real performance impact.
Turns out the engine was slow and difficult to maintain because of the many interfaces. They ditched it a few years later...
This is probably related in some ways to Amdahl's law.
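For reference, Amdahl's law: if a fraction p of the runtime is affected and that part is sped up by a factor s, the overall speedup is S = 1 / ((1 - p) + p/s). So if the scripting glue only accounts for a small p, even making it infinitely fast barely changes the total runtime.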
However, a couple of years later, Epic is bringing scripting back with Verse.
Because the problem is not the scripting, but how it is done.
I imagine that the engine you mention did not use any kind of compiler for Lua (e.g. LuaJIT), nor did it batch calls across the marshalling layer.
The code was a mess: absolutely no batching, quite the contrary.
Many interesting features in this framework; autograd looks neat too.
I guess deployment in real-world C++ apps will be easier than with PyTorch or TensorFlow, especially at the edge in scenarios with little or no network access.
Julia has a fast-maturing data-wrangling super-project (Queryverse).
C++ seems ideal for me right now because it is the only other language with a somewhat mature stack (perhaps Julia as well, but I haven't played too much around with that).
In this case, Julia absolutely is worth checking out. It does static analysis on its intermediate representation to automatically identify and isolate statically inferrable regions of programs, and then stitches them together if it ever encounters any dynamism.
Julia's type system is extremely powerful and expressive and can do a lot of things that are incredibly difficult in fully static languages precisely because it does allow dynamism.
- Python allows for higher level description of algorithms, which means researchers can focus more on the ML stuff and less on low level details.
- There is no performance gain in going from Python to C++, because in both cases the models are compiled to specific binary formats to be executed on dedicated hardware. TensorFlow enables accelerators not only for training, but also for data transformations and preprocessing.
The backend of TF/PyTorch is written in C++ anyway, so the more complex the model, the less time it spends in the glue code (frontend) written in Python. Therefore, rewriting complex models entirely in C++, for example by using the TF/PyTorch C++ API, probably won't improve performance much.
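For concreteness, a minimal model in the PyTorch C++ API (libtorch) looks roughly like the sketch below; the heavy ops it calls (linear, relu) dispatch to the same native kernels the Python frontend uses, which is why the rewrite mostly removes only the thin Python glue.

```cpp
// Minimal libtorch (PyTorch C++ API) sketch. The ops here dispatch to the same
// native kernels the Python frontend calls.
#include <torch/torch.h>
#include <iostream>

struct Net : torch::nn::Module {
    Net() {
        fc1 = register_module("fc1", torch::nn::Linear(784, 128));
        fc2 = register_module("fc2", torch::nn::Linear(128, 10));
    }
    torch::Tensor forward(torch::Tensor x) {
        return fc2->forward(torch::relu(fc1->forward(x)));
    }
    torch::nn::Linear fc1{nullptr}, fc2{nullptr};
};

int main() {
    Net net;
    auto x = torch::randn({32, 784});       // fake batch
    auto out = net.forward(x);
    std::cout << out.sizes() << std::endl;  // [32, 10]
    return 0;
}
```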
In this paper the author rewrites some ML models in Rust using tch-rs (a Rust binding for the PyTorch C++ API) and finds the performance is not much better (some models even perform worse):
I'm not sure about that. It's mostly an English/American usage split, isn't it?
Sample size of 1, but this North American immediately thought "oh no, how unfortunate" on seeing the headline. (Admittedly, I do listen to more podcasts than the average person.)
that said, interesting decision to ditch the scripting language. when i think of these sorts of things, i immediately think that embedding or supporting a scripting language makes a ton of sense. i'm curious what the thinking was to just go full c++? i suppose they just decided that modern c++ was easy enough and this was better for simplifying production?
(Although not, let's be honest, without amusement.)
I assume they are going for "batteries included" connections?
Is this some sort of performance art comment?
Perhaps because I can't see very well, I was uncertain when reading the article title. But unless you're quite innocent (i.e. not a user of the internet), I don't see how you could avoid knowing about such devices. Knowing many programmers, I also know that they span the full spectrum from Bible study to BDSM (though that might be a horseshoe), and many are prone to juvenile humour which can become sexually toned. So I know that someone will use it in this context. When you become a manager, you choose to avoid creating such problems. Some of you probably see this as tone policing; you are correct, suck it up.
For those who don't understand what Hamming distance is:
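It's the number of positions at which two equal-length strings (or bit vectors) differ; for bit strings that's just the popcount of their XOR. A minimal C++ version:

```cpp
// Hamming distance between two 64-bit values: count of differing bit positions.
#include <bitset>
#include <cstdint>
#include <cstdio>

int hamming(uint64_t a, uint64_t b) {
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}

int main() {
    std::printf("%d\n", hamming(0b1011, 0b1001));  // prints 1
    return 0;
}
```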