Hacker News
Flashlight: Fast and flexible machine learning in C++ (facebook.com)
150 points by andyxor 26 days ago | 91 comments

This seems really cool, but I don't get why they would pour work into this while simultaneously building a C++ front-end for PyTorch[1]. Per the blog post, both frameworks have the goal of empowering ML researchers to iterate on ML models in such a way that it becomes easier to reason about performance than it would be talking to a bunch of dynamically linked object files behind an interpreter.

Facebook is a huge company with lots of money, but production-ready ML frameworks are a HUGE undertaking. I don't get how tech companies can be simultaneously recruiting and doing interviews year-round, paying huge salaries and then putting them under a layer of management that thinks two different C++ ML frameworks with the same goals is a good idea.

Facebook's work on Tensor Comprehensions, Halide (not originally FB but they have contributed heavily), Glow, and PyTorch all contributed to the ML space by offering alternatives (with innovative UX/technical differences) to the TensorFlow ecosystem. Not all of these contributions were novel, but I respect FB's choice to work on something for the sole purpose of not having its direction beholden to the whims of Alphabet (see: Swift for TensorFlow).

I just don't see what this adds which FB isn't already working on within a different project. What am I missing?

1 = https://pytorch.org/cppdocs/

> huge company with lots of money... [yet] management that thinks two different C++ ML frameworks with the same goals is a good idea

I don't see a contradiction here, in fact it makes sense.

You have some goals, and a couple of promising approaches. It's hard to say which will work better, but there's enough budget and people to work on them to just try them both and see.

I've heard anecdotes of similar strategies at banks, who have sufficient budget to hire two parallel teams to build literally the same product, sometimes without even knowing about each other. At the end, the one that ends up being faster/better gets used.

I guess it's like a microcosm of free market competition within an org, as opposed to top down planning.

Flashlight is much lower level and gives more fine-grained performance control. For instance, I don't think there is really any way to do real-time speech recognition that is fast with PyTorch because of how it is architected.

From my understanding, Tensor Comprehensions and Halide are both very tentative research projects.

> not having its direction beholden to the whims of Alphabet (see: Swift for Tensorflow).

I don't think this is an accurate recreation of the history that led to FB working on pytorch.

This is a very interesting claim. I find it credible because it stands to reason that projects like DeepSpeed[1] and TorchScript[2] wouldn't need to exist if inference performance of research PyTorch models was satisfactory for production, but often it isn't.

It appears as though Flashlight is built on ArrayFire. I haven't seen how gradients are managed in arrayfire-ml, but perhaps it is the case that the autograd implementation in PyTorch was a bottleneck and this is a ground up approach.

Editing as I didn't address your second point. I can neither confirm nor deny that the motivations for creating Torch were related to FB's desire not to depend on an Alphabet-managed codebase. I know there are lots of reasons why programmers prefer the UX of the PyTorch Python API (I do as well), and there are probably other reasons I can't recall off the top of my head. I am only saying that PyTorch already contributes to the ML ecosystem by the sole virtue that it isn't a Google product.

[1] = https://github.com/microsoft/DeepSpeed [2] = https://pytorch.org/docs/stable/jit.html

PyTorch is based off of Torch, which was first released in 2002, predating TensorFlow by 13 years.


Torch (Lua) predates TensorFlow and had already been LeCun's pet project for a few years by that point. But Lua as a language was unpopular at the time, so PyTorch was a welcome addition. Even without PyTorch (or Caffe2) in an alternative timeline, I would imagine FB would have been stuck with Lua Torch for quite some time.

Don’t forget that the Lua Torch backend is C, while PyTorch is mostly C++.

Adding to the above - Tensor Comprehensions was path-finding research and is no longer maintained. The git repo is frozen (archived) as a research artifact.

Halide is still quite active, and was used in products at Adobe and Google circa 2016-2017. Not sure about the current state of industry usage though.

I'm unfamiliar with the word "path-finding" in this context. From the usage, is it similar to groundbreaking?

Halide is still used by Google in a few different places.

I ship an app (Talon) which runs many kinds of wav2letter ASR models on consumer CPUs. It has real-time inference pipelines for both pytorch and flashlight. (I wrote the pytorch code). Performance of both inference pipelines is fine and comparable between the frameworks for me. I'm not sure what you're talking about wrt speech recognition performance.

Maybe for training? But there aren't that many cpu bound components there, and you can write those in native code.

Fair enough, I've followed your stuff a bit and you definitely know more about Flashlight than I do.

What would you say the motivation is for yet another ML framework, but in C++ this time?

The end really helped me with understanding their motivations:

> Flashlight’s modular internals make it a powerful research framework for research frameworks.

> We’re already using Flashlight at Facebook in our research focused on developing a fast speech recognition pipeline, a threaded and customizable train-time relabeling pipeline for iterative pseudo-labeling, and a differentiable beam search decoder.

> Our ongoing research is further accelerated by the ability to integrate external platform APIs for new hardware or compiler toolchains and achieve instant interoperability with the rest of Flashlight.

They found it easier and faster to make a framework optimized for their research than to try to iterate on a larger and more complex codebase. I think there is less cognitive overhead and there are fewer things to change when experimenting with a new idea, and since they've optimized for fast rebuild times, it is much faster to try out new things. They get native integration with the C++ ecosystem for free, and since their team all know C++ well, it makes sense to just do it in plain C++.

I'm a bit late to the reply, but I wanted to thank you for the succinct response. This is the exact sort of thing I was looking for when I made my comment.

Oh sweet summer child, it’s so refreshing that you think big famous companies always act logically!

I tend to agree. If they said they had cleaned up the C++ backend that PyTorch uses and were releasing it separately on its own with some usability enhancements as Flashlight, I could see that - they wouldn't have to maintain some other new thing. But this is apparently a completely separate thing.

Have you ever thought that it is not good for a big company like FB to be dependent on a single language/technology? This is the reason why these companies will invest in multiple solutions in different languages.

> paying huge salaries and then putting them under a layer of management that thinks two different C++ ML frameworks with the same goals is a good idea.

FB makes $1.3M/employee while average pay is $120K, i.e. less than 10% of that. Employees are therefore actually very cheap, and FB can afford to waste a lot of employee resources, like building 10 ML frameworks. Even if ML-related employees earn $1M+, it still wouldn't move the needle financially for FB.
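To make the ratio above concrete, here is a quick sanity check. It only reuses the two figures quoted in the comment ($1.3M made per employee, $120K average pay); neither number is verified here, and the variable names are purely illustrative.

```python
# Figures as quoted in the comment above; not independently verified.
made_per_employee = 1_300_000  # USD per employee per year
average_pay = 120_000          # USD per employee per year

# Share of what each employee brings in that goes back out as average pay.
pay_share = average_pay / made_per_employee

print(f"Average pay is {pay_share:.1%} of what FB makes per employee")
```

With these inputs the share comes out a little above 9%, which is consistent with the "less than 10%" claim.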

The documentation for flashlight focuses heavily on low level customizability and rapid rebuilding. My suspicion is that pytorch is fundamentally not designed around rapid iteration of the core model code. Flashlight targets a different use case.

As far as Facebook engineering resources are concerned, I believe Facebook takes the approach that supporting high-visibility open source projects is itself a recruiting draw for pulling talented engineers under their umbrella.

btw, Swift for TensorFlow has been discontinued: https://github.com/tensorflow/swift

This makes a lot of sense when done well. Researchers need to have full control and quickly change things with the least amount of cognitive overhead for comprehending the engine. For people who know C++ well, it is a minimalistic C++ library with exactly what you need and nothing else. No cross language concerns or bindings, just plain C++.

Using modern c++:

> Modern C++ also obviates the need for tasks like memory management while providing powerful tools for functional programming.

> Flashlight supports doing research in C++ with no need to adjust external fixtures or bindings and no need for adapters to do things like threading, memory mapping, or interoperating with low-level hardware.

Easy integration with other C++ libraries:

> Flashlight makes it trivial to build new low-level computational abstractions. You can cleanly integrate CUDA or OpenCL kernels, Halide AOT pipelines, or other custom C/C++ code with minimal effort.

Customizable with fast build times:

> And when you change Flashlight’s core components, it takes just seconds to rebuild the entire library and its training pipelines, thanks to its minimalist design and freedom from language bindings.

Given the very high computing requirements of Machine Learning, I've always been perplexed by the seemingly widespread and unquestioned preference for Python over native code (typically C/C++).

I guess performance was considered less critical than clarity/flexibility. But it seems that people are discovering that complex code tends to be hard to read/modify no matter the language...

Python isn't really driving the compute-intensive part of ML, actually. Whether it's JAX, PyTorch, or TensorFlow, the code is really mostly native. Convolutions are implemented by hand in highly optimized libraries (Intel MKL-DNN, Nvidia cuDNN) and the Python glue is really just a light "dispatcher".

A lot of it is also asynchronous for performance: the Python code just enqueues more work to a queue which some native C++ code processes. For TensorFlow the Python code traces an entire computation graph that is stored as a protobuf and then executed by a C++ native stack, potentially remotely/distributed. Serving ML with TensorFlow does not involve any Python code in many scenarios.

Python is still quite useful for scientists to quickly glue everything together, to describe their dataset, or when they collect results and need to produce graphs or other data analyses.
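As a toy illustration of the "enqueue and let something else process it" pattern described above: the sketch below uses a plain Python thread as a stand-in for the native runtime. It is not how PyTorch or TensorFlow are actually implemented, and `run_worker` and `dispatch` are hypothetical names made up for this example.

```python
import queue
import threading


def run_worker(work_queue, results):
    """Stand-in for the native runtime: drain the queue and run each op."""
    while True:
        item = work_queue.get()
        if item is None:  # sentinel: the front end has no more work
            break
        op, args = item
        results.append(op(*args))


def dispatch(ops):
    """Front end: enqueue every op without waiting for any of them to finish."""
    work_queue = queue.Queue()
    results = []
    worker = threading.Thread(target=run_worker, args=(work_queue, results))
    worker.start()
    for op, args in ops:
        work_queue.put((op, args))  # returns immediately; work happens elsewhere
    work_queue.put(None)
    worker.join()  # synchronise only once, at the very end
    return results


results = dispatch([(pow, (2, 10)), (sum, ([1, 2, 3],)), (max, (4, 7))])
print(results)
```

The front end never blocks on an individual op; with a single worker the ops still run in the order they were submitted.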

I work at a quant firm and we use Python because of how painful it is to build things in C++. Our framework is built in C++ but we then expose all of it to Python using pybind11 (amazing library).

Most quants do not want to learn complex build systems that have quirky behavior on different platforms, wait for very long compile times when making small changes to the code, dense and incomprehensible error messages, and a host of painful problems that one has to consider when writing C++.

Python just works really well on almost all platforms.

The biggest downside of Python is its parallelism, which means there is a lot of hackyness around writing parallel code. In most cases we can break things down and run different tests independently of one another, but in many other cases we have to make use of awkward workarounds, use multiprocessing, and other tricks.


Which C++ version are you guys using?

MSVC 2019, g++ 10, clang 11.

I was asking about the C++ version, not the compiler version. So, C++11 or C++14?

C++17 and some C++20.

Okay. I suggest you try Cling, the Clang-based C++ interpreter. It JIT-compiles C++ and everything is instant.


Python (in common ML frameworks) is really just wrapping well-optimized native code. The overhead is minimal. The advantage here is about easy access to the internals. I don't know that any material speedup is expected just because it's written in C++.

I've worked at some point on a commercial game engine written in C++ at the core but with many Lua components and API, for convenience.

The reasoning was the same: all the heavy lifting was done with fast native code, and everything written in Lua was mostly glue code without real performance impact.

Turns out the engine was slow and difficult to maintain because of the many interfaces. They ditched it a few years later...

I think you could say the same thing about JavaScript on the web: it is mostly glue code, and all the heavy lifting is done by optimized C++ or Rust code in the browser. And yet it is difficult to run a web mail client if your computer is a bit old...

This is probably related in some ways to Amdahl's law https://en.wikipedia.org/wiki/Amdahl%27s_law
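To put rough numbers on the Amdahl's law point, here is a small sketch of the standard formula: if a fraction p of the runtime is made s times faster, the overall speedup is 1 / ((1 - p) + p / s). The fractions and the 100x factor below are made-up illustrative inputs, not measurements from any engine or framework.

```python
def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when `accelerated_fraction` of the runtime becomes `factor` times faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# Illustrative only: native code makes the "heavy lifting" part 100x faster,
# while the remaining glue code runs at its original speed.
for fraction in (0.5, 0.9, 0.99):
    print(f"{fraction:.0%} accelerated -> {amdahl_speedup(fraction, 100):.2f}x overall")
```

Even when 99% of the work is accelerated 100x, the untouched 1% holds the overall speedup to roughly half of that, which is one way to read the "glue code" experience described above.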

That was the same reasoning why Unreal dropped UnrealScript.


However, a couple of years later, Epic is bringing scripting back with Verse.


Because the problem is not the scripting, but how it is done.

I imagine that the engine you mention did not use any kind of compiler for Lua, e.g. LuaJIT, nor batch calls across the marshalling layer.

The same happens on the Web: JIT-compiled JavaScript can outperform WebAssembly if one keeps switching between layers all the time instead of batching requests.
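A minimal sketch of the batching point, assuming a hypothetical `Boundary` class that just counts how often the script/native boundary is crossed. It does not measure real marshalling cost; it only shows that the same work can cost one crossing instead of a thousand.

```python
class Boundary:
    """Hypothetical script/native boundary that counts how often it is crossed."""

    def __init__(self):
        self.crossings = 0

    def scale_one(self, value, factor):
        self.crossings += 1  # one marshalling round trip per value
        return value * factor

    def scale_many(self, values, factor):
        self.crossings += 1  # one round trip for the whole batch
        return [v * factor for v in values]


values = list(range(1000))

# Chatty style: cross the boundary once per element.
chatty = Boundary()
chatty_result = [chatty.scale_one(v, 2) for v in values]

# Batched style: hand over all the data in a single call.
batched = Boundary()
batched_result = batched.scale_many(values, 2)

print(chatty.crossings, batched.crossings)
```

Both styles compute the same result; whatever fixed cost each crossing carries is paid 1000 times in the first case and once in the second.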

The engine used LuaJIT, but this is not available on iOS.

The code was a mess, absolutely no batching, quite the contrary.

Python is basically a configuration file for the native code that actually does the machine learning. Python is easier to write than native code or most other complex configuration formats (see terraform issues).

Notebooks may also be a big player, being able to visualize experimental changes rapidly.

I sometimes use Talon, a voice control app mostly used by developers. Iirc the developer incorporated Flashlight but encountered significant bugs and slow response times to issues and ended up switching to a different framework. At the very least it didn't feel much tested for real-world usage yet.

Does anyone have experience with the JIT in ArrayFire? https://arrayfire.com/performance-improvements-to-jit-in-arr...

Many interesting features in this framework, autograd looks neat too.

I guess the deployment in real world C++ apps will be easier than PyTorch or Tensorflow, especially at the edge in scenarios with little or no network access.

Happy to speak to JIT in ArrayFire if you have any specific questions. I am ArrayFire's CEO/Co-Founder.

Great work by the team. I like that they re-used ArrayFire instead of Yet Another Tensor Library. It feels refreshing to use native code again to build ML pipelines for these kinds of tasks.

Thank you!

Very cool to see more C++-based machine learning efforts. The language still needs a good dataframe abstraction (maybe XFrame?), but with matrix algebra provided by Armadillo/Eigen and other long-time machine learning libraries like mlpack, Shogun, Shark (and if you want to include C, Darknet), personally I think the future is bright for machine learning in C++, especially for production and deployment applications.

Does anyone else think that C++ makes more sense for ML work than Python? I'd been thinking so for years. Both for deployment/performance and data wrangling purposes.

Take a look at Julia's Flux.jl. It has a really nice API and is quite intuitive to use at a low level, and has higher level components as well.

Julia has a fast maturing data wrangling super-project (Queryverse).

I'm guessing you'd have (slightly) fewer deployment options with Julia though.

100% agree, and there are a number of efforts in the space. mlpack (https://www.github.com/mlpack/mlpack/), Shogun (https://www.shogun-toolbox.org/), and Shark (https://www.shark-ml.org/) are three that have been around for over a decade now. They're a little niche because C++ is not that popular for data science, but they are generally pretty fast (especially mlpack, which focuses on speed).

Absolutely. In 2010 I used FANN for my MSc thesis research, and found it pretty easy to make my own little training sim for stock price data on top of it. I haven't done any ML work since, but I always scratched my head over how Python became the most popular language for this domain.

stock price ml? why aren’t you rich?

Who says they're not...

In general, I think languages with static typing are preferable.

C++ seems ideal for me right now because it is the only other language with a somewhat mature stack (perhaps Julia as well, but I haven't played too much around with that).

Julia is dynamically typed FYI. However, I find that usually when people say they want a statically typed language, they don't actually want static typing (which is just a restriction of language semantics), but instead they want a language with a powerful type system that does work for them.

In this case, Julia absolutely is worth checking out. It does static analysis on its intermediate representation to automatically identify and isolate statically inferrable regions of programs and then stitches them together if it ever encounters any dynamism.

Julia's type system is extremely powerful and expressive and can do a lot of things that are incredibly difficult in fully static languages precisely because it does allow dynamism.

In my opinion, no, it makes no sense to write ML in C++:

- Python allows for higher level description of algorithms, which means researchers can focus more on the ML stuff and less on low level details.

- There is no performance gain in going from Python to C++, because in both cases the models are compiled to specific binary formats to be executed on dedicated hardware. TensorFlow enables accelerators not only for training, but also for data transformations and preprocessing.

> There is no performance gain in going from Python to C++

The backend of TF/PyTorch is written in C++ anyway, so the more complex the model, the less time it needs to spend in the glue code (frontend) that is written in Python. Therefore, rewriting complex models in full C++, for example by using TF/PyTorch C++ API, probably won't much improve the performance.

In this paper the author rewrites some ML models in Rust using tch-rs (Rust binding for PyTorch C++ API) and finds the performance not that much better (even some models perform worse):


You will find that Modern C++ looks more like Python these days. It has really neat stuff. A comment is not enough to describe this.

yes. especially with AI on the edge.

Flashlight is a great project, and we're proud to be part of it. I'm with ArrayFire. If anyone has questions or feedback for ArrayFire, I'm here to be useful however I can!

Wonder why they didn't leverage TorchScript? However, this is welcome news for the ML community. Would love to see this library break away from its NVIDIA CUDA dependency and optimize for Apple M1 chips.

TorchScript is very complicated. Its internal API was not truly documented for external use last time I checked. You can't even export to TorchScript from libtorch C++; it only works from Python AFAIK.

Flashlight vs. Scikitlearn ...

I wish Flashlight had a way to easily export models to other frameworks.

That can be done if they support ONNX.

flashlight is such a common word used for an appliance. I would never think of anything else upon hearing this. Millions of people I assume as well.

So is Torch? A much more common word to refer to a flashlight.

> A much more common

I'm not sure about that. It's mostly an English/American usage split, isn't it?

Can this be statically compiled? Why would anyone opt for this instead of Torch or DLIB?

The name has unfortunate verbal similarities with a well known sex toy, especially when different accents are considered. I am actually astounded they went ahead with this name.

I'm going to venture a guess and say it's more likely that people will associate this with millions upon millions of flashlights that have been sold in the United States, rather than whatever it is you're talking about.

Really? In North American English, a flashlight is a handheld light, there is no thought of the toy you're referring to when it is mentioned. That "light" got its name as a play on flashlight. It's funny to see that now become the primary connotation in some people's minds.

> In North American English, a flashlight is a handheld light, there is no thought of the toy you're referring to when it is mentioned.

Sample size of 1, but this North American immediately thought "oh no, how unfortunate" on seeing the headline. (Admittedly, I do listen to more podcasts than the average person.)

you do realize that the recently introduced sex toy was both named and designed to resemble flashlights, right?

that said, interesting decision to ditch the scripting language. when i think of these sorts of things, immediately think that embedding or supporting a scripting language makes a ton of sense. i'm curious what the thinking was to just go full c++? i suppose they just decided that modern c++ was easy enough and this was better for simplifying production?

And yet millions of electric torches are sold in the United States each year without confusion.

(Although not, let's be honest, without amusement.)

I'd just choose a name with a greater Hamming distance from sexual harassment.

Now that I think about it... Docker does sound a bit like Dick-er.

There's also "docking," a sex act between two men, the details of which I'll let you look up on Google at your own discretion.

We really are an innovative species!

Until your super chaste name is used for something related with sex.

The sex toy came after flashlight the portable light source.

Is this not a thing in the US? https://en.wikipedia.org/wiki/Flashlight

I assume they are going for "batteries included" connections?

Flashlight vs torch (pytorch). ^^

Yes, I could see the construction, I just can't think why they wouldn't anticipate the problems it could cause down the track.

What? Nobody in their right mind thinks "fleshlight" immediately upon hearing the word "flashlight"

Is this some sort of performance art comment?

Reviving my old throwaway account to confirm that I indeed thought of "fleshlight" immediately after reading the announcement. Keep in mind that the average coder is far more likely to read Reddit daily (where this reference comes up surprisingly often) than to do things that require a flashlight (own a house somewhere remote, etc)

as a non-native english speaker I'll humbly admit that fleshlight was indeed the first word that popped into my mind when reading the title, because I never say flashlight but lampe-torche (while a "fleshlight" is an occasional fun present / recurrent meme for friends birthday or stuff like that)

As a non native english speaker I knew the word flashlight years before I knew that you could use torch to mean the same thing (and not just a burning stick). Websters even seems to claim that this is mostly a British thing.

Nobody calls flashlights torches in the US.

You spend too much time on the internet.

In fairness, so does the average programmer.

Wait until people accidentally an ML model?

the whole model !

I don't see such a significant problem.

Well this observation has produced a furious negative reaction, which I suspect is of both puritanical and chauvinist origin.

Perhaps because I can't see very well I had uncertainty when reading the article title. But unless you're quite innocent (i.e. not a user of the internet) I don't see how you could avoid knowing about such devices. Knowing many programmers, I also know that they span the full spectrum from Bible-studies to BDSM (though that might be a horseshoe), and many are prone to juvenile humour which can become sexually toned. So I know that someone will use it in this context. When you become a manager you choose to avoid creating such problems. Some of you probably see this as tone policing -- you are correct, suck it up.

For those who don't understand what Hamming distance is: https://en.m.wikipedia.org/wiki/Hamming_distance
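For anyone who would rather see it computed than read the article: a minimal sketch of Hamming distance for equal-length strings (the number of positions where the characters differ). The function name is just illustrative, and the first pair is the two words this subthread is arguing about.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("flashlight", "fleshlight"))
print(hamming_distance("karolin", "kathrin"))
```

Note that strings of different lengths (say "flashlight" and "torch") have no Hamming distance at all; an edit-distance measure would be needed for those.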

You accidentally the whole thing
