For NN or DL in general, correctness doesn't hinge much on code-quality concerns like the ownership Rust people love to talk about. It is more about numeric stability, under/overflow, and the like. The choice of programming language offers limited help there.
I don't think Rust has a killer app to offer the ML/DL community as of now; the focus is vastly different.
At the same time, comparatively tedious languages like Rust will never attract data science practitioners. They don't care about the kind of safety it brings, they don't care about improving performance in a component that's idle 99% of the time.
The bulk of the load in a DL workflow is CUDA code and sits on the GPU. Even intermediate libraries like cuBLAS would see marginal benefit, if any, from being reimplemented in Rust.
This is a cool project, but it has no chance to displace or even complement Python in the data science space.
Iff Rust helps us take it into production we will use it.
But it's a lot of ground to cover to reach Python's libraries, so I'm not holding my breath.
That said, Python's performance is slow even when just shuffling data over to NumPy.
The bottlenecks, in order, are: inter-node comms, GPU/compute, on-disk shuffling, serialisation, pipeline starvation, and finally the runtime.
Why worry about optimising the very top of the perf pyramid, where it will make the least difference? Why worry about spending 1 ms pushing data to NumPy when that data just spent 2500 ms on the wire? And why are you even pushing from the Python runtime to NumPy instead of using Arrow?
I agree with your general point; the role I'd hope for Rust to play, though, is not optimizing that top layer but replacing the mountains of C++ underneath with something safer and equally performant.
The problems usually arise when you do novel feature engineering, not during the actual model training.
But I was a C++ dev before, checking the assembly for performance optimization, so I guess I have more wiggle room to see when things are not up to snuff. If I got a cent for every "you are not better than the compiler writers, you can't improve this"...
Especially from the Java folks. They simply don't want to learn shit, which would be fine if they just weren't so quick with the lies/excuses when proven wrong.
In a "serving environment" where latency actually matters there are already a plethora of solutions for running models directly from a C++ binary, no python needed.
This is a solved problem and people trying to re-invent the wheel with "optimized" implementations are going to be disappointed when they realize their solution doesn't improve anything.
Yeah it’s not reasonable right now because Python has the best ecosystem. But that will not always be the case!
1. In TensorFlow and similar frameworks, the Python runtime is used to compose highly optimized operations into a trainable graph.
2. C++ is used to implement those highly optimized ops. If you have some novel feature engineering and you need better throughput than a pure Python op can give you, you'd implement the most general viable C++ (or Rust) op and then use that wrapped op from Python.
This is how large companies scale machine learning in general, though this applies to all ops not just feature engineering specific ones.
There is no way that Instagram is using a pure Python image processing lib to prep images for their porn detection models. That would cost too much money and take way too much time. Instead they almost certainly wrap some C++ in some Python and move on to more important things.
EDIT: clarification in first sentence
This smells like an overgeneralization. Often things that aren’t a bottleneck in the context of the problems you’ve faced might at least be an unacceptable cost in the context of the 16.6 ms budget someone else is working within.
It seems some commenters on this thread have not really thought through the lifecycle of a learned model and the tradeoffs existing frameworks exploit to make things fast _and_ easy to use. In training we care about throughput. That's great, because we can use a high-level DSL to construct a graph that trains in a highly concurrent execution mode or on dedicated hardware. Using the high-level DSL is what lets us abstract away those details and still get good training throughput. Tradeoffs still bleed out of the abstraction (batch size, network size, architecture, etc. affect how efficiently certain hardware runs), but that is inevitable when you're moving from CPU to GPU to ASIC.
When you are done training and you want to use the model in a low-latency environment, you use a C/C++ binary to serve it. Latency matters there, so exploit the fact that you're no longer defining a model (no need for a fancy DSL) and just serve it from a very simple but highly optimized API.
On my current project we can't use GPUs in production, so we can only use them for development. Not my call, but operations'. They have a Kubernetes cluster and a take-it-or-leave-it attitude.
We did end up using C++ for some things and Python for most. I'd feel comfortable with C++ or Rust alone if there were a great ecosystem for DS, though.
The poster you are replying to is 100% correct.
Still, it's more likely a figure used for exaggeration, for effect.
I do so not infrequently, and I don't see how Rust bindings would help me at all.
You clearly have more experience than I do if you can solve everything in Python instead of C++ and/or CUDA.
I would love to understand why Rust would be more effective for productionizing ML models than the existing infrastructure written in Python/C++/CUDA.
To me, the nice thing about switching to Rust for that kind of stuff was that it dramatically raised the bar of what I could do before reaching for those hyper-optimized descriptive libraries.
Want to calculate the levenshtein distances of the cross product of 100k strings? Sure, just load it into a Vec, find a levenshtein library on crates.io, and it'll probably be fast enough.
Could it be done in Python? Sure, but with Rust I didn't have to think about how to do it in a viable amount of time. Does that mean that Rust is going to take over the DS world? Probably not in the short term, Rust currently can't compete with Python's DS ecosystem.
But if I'm doing something that I don't know will fit into an existing Python mold (that I know about) then I'll strongly consider using it.
The thing is, for anything performance intensive and scientific, you're almost guaranteed to find a Python binding.
It has all these bindings because scientists are almost always writing either Python, C++, or Fortran (with a smattering of R or Octave on the side).
Want to do linear algebra? NumPy is always there, otherwise you can go lower level: https://docs.scipy.org/doc/scipy/reference/linalg.blas.html
Old-school optimization? https://www.roguewave.com/sites/rw/files/attachments/PyIMSLS...
Computer vision? https://pypi.org/project/opencv-python/
In fact, I'd basically argue against using most language-native implementations of algorithms where performance is at stake, because most implementations don't have all the algorithmic optimizations.
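As a quick illustration of the "go lower level" option (sizes made up): NumPy already dispatches to an optimized BLAS, and SciPy exposes the raw routine if you want it directly.

    import numpy as np
    from scipy.linalg.blas import dgemm

    a = np.random.rand(512, 512)
    b = np.random.rand(512, 512)

    c1 = a @ b               # NumPy already hands this to an optimized BLAS
    c2 = dgemm(1.0, a, b)    # or call the BLAS routine directly via SciPy
    assert np.allclose(c1, c2)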
Python has the best ecosystem, but Rust was made by a competent team so we will root for it.
I think the industry is moving to an 'MLIR'-style solution (yes, there is a Google project called exactly that, but I am referring to the general idea here), where the network is defined and trained in one place, then the weights are exported and delegated to an optimized runtime for execution.
If that trend continues, there will be very little reason to replace Python as the glue layer. Instead the flow becomes: everything is trained in Python -> exported to a shared format -> executed in an optimized runtime.
Rust's opportunity could be to replace C++ in this case. But do mind that this is also a competitive business, where the computation is pushed further down to the hardware implementations, like TPUv1 and T4 chips etc.
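One concrete, hedged sketch of that flow, using ONNX as the shared format (the model choice and file name are just placeholders):

    import torch
    import torchvision

    # Define/train in Python, then export graph + weights to a shared format
    # and hand it to an optimized runtime (ONNX Runtime, TensorRT, ...).
    model = torchvision.models.resnet18().eval()   # stand-in for your trained model
    dummy = torch.randn(1, 3, 224, 224)
    torch.onnx.export(model, dummy, "resnet18.onnx")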
It's amazing how many great tools there are available in Python, but it does sometimes seem like it's an under-powered tool which has been hacked to serve bigger problems than it was intended for.
Not to mention, this problem seems to be getting worse, not better. People are moving off of Python 2.7, which was the de facto LTS release of Python, leaving (currently) no LTS version of Python and no clear path for the community to establish a new one it will still support. There are so many releases with so many breaking changes in Python 3 within the last few years that there is seemingly no consensus and no way to compromise.
> It's amazing how many great tools there are available in Python, but it does sometimes seem like it's an under-powered tool which has been hacked to serve bigger problems than it was intended for.
This is becoming more and more clear with every release of Python, IMO. The language is evolving too quickly to be used on large projects, but it's still being used for that.
We have an entire test framework and set of libraries for high-performance, embedded-systems-level testing which is written entirely in Python. The volume of high-speed operations (timing tests, measurements, etc.) is obviously beyond what the language and libraries were intended for, yet the company keeps pushing ahead with this stuff. To mitigate the issue, we are developing more high-speed embedded systems to offload the work from the test framework and report measurements back to it. I think it's quickly becoming extremely expensive and will only get more so. The framework is extremely "pythonic" to the point of being unreadable, using a number of hacks to break through the Python interpreter. Jobs much better suited to readable C++ libraries are being implemented in unreadable Python spaghetti with no clear architecture, just quick-and-dirty whatever-it-takes Python.
I love python but I think it’s a quick-and-dirty language for a reason. What python does well cannot be beat by other languages (for example, prototyping), but I think it is often misused, since people can get something up and running quickly and cleanly in python, but it eventually has diminishing (and even negative) returns.
Raising StopIteration inside a generator in 3.7 now surfaces as a RuntimeError instead of silently ending the iteration, which broke several functions at my workplace.
3.8 has several further backwards incompatible changes incoming: https://docs.python.org/3.8/whatsnew/3.8.html#changes-in-pyt...
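A minimal repro of the 3.7 change (PEP 479); the function name is made up:

    # Pre-3.7, a StopIteration escaping from inside a generator silently ended
    # the iteration; from 3.7 on it is re-raised as a RuntimeError.
    def first_two(it):
        yield next(it)   # next() on an exhausted iterator raises StopIteration
        yield next(it)

    print(list(first_two(iter([1]))))  # [1] on 3.6, RuntimeError on 3.7+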
A good ML language is going to need smart and static typing. I am so tired of having to run a whole network just to figure out that there's a dimension mismatch because I forgot to take a transpose somewhere - there is essentially no reason that tensor shapes can't just be inferred and these errors caught pre-runtime.
> I had the impression that even in static languages having tensors with the exact shape as a parameter would stress the compiler, forcing it to compile many versions of every function for every possible size combination, and the output of a function could very well have a non deterministic or multiple possible shapes (for example branching on runtime information).
I was a bit lazy in my original comment - you're right. What I really think should be implemented (and is already starting to appear in PyTorch and a library named NamedTensor, albeit non-statically) is essentially "typed axes."
For instance, if I had a sequence of locations in time, I could describe the tensor as:
(3 : DistanceAxis, 32 : TimeAxis, 32 : BatchAxis).
Sure, the number of dimensions could vary and you're right that, if so, the approach implied by my first comment would have a combinatorial explosion. But if I'm contracting a TimeAxis with a BatchAxis accidentally, that can be pretty easily caught before I even have to run the code. But in normal pytorch, such a contraction would succeed - and it would succeed silently.
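For what it's worth, this is roughly what that looks like with PyTorch's named tensors today (runtime-checked, not compile-time; the axis names are made up):

    import torch

    x = torch.randn(3, 32, 32, names=('distance', 'time', 'batch'))
    y = torch.randn(3, 32, 32, names=('distance', 'batch', 'time'))

    x + y  # RuntimeError: axis names don't line up, instead of silently
           # succeeding the way an unnamed (3, 32, 32) + (3, 32, 32) would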
Julia, for example, does have an array library that does compile-time shape inference (StaticArrays), but it cannot scale to large arrays (over 100 elements or so) exactly because it gets too hard for the compiler to keep track. I'm definitely curious about possible solutions.
But AOT/JIT-compiled languages that can naturally talk to the GPGPU without second-language syndrome, like Julia, Swift, Java and .NET, will certainly be more attractive to data science practitioners.
I can already envision those life science guys that migrate to VB.NET when they have outgrown their Excel/VBA code, to start playing with ML.NET.
I don't see any other language even making a dent in the Python ecosystem without some kind of new killer feature that can't be quickly replicated in Python.
I don't see anything in the article's Python code that Numba's jit decorator couldn't handle. When Numba works (and it's rapidly improving), it's seriously impressive.
For this particular case, you should be able to get really good performance without sacrificing readability.
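Something like this sketch is all it should take (not the article's actual code):

    import math
    import numpy as np
    from numba import njit

    @njit  # compiled to machine code on first call, no C++ required
    def sigmoid_layer(w, b, a):
        # the kind of explicit inner loop the article writes in pure Python
        out = np.empty(w.shape[0])
        for i in range(w.shape[0]):
            z = b[i]
            for j in range(w.shape[1]):
                z += w[i, j] * a[j]
            out[i] = 1.0 / (1.0 + math.exp(-z))
        return out

    w, b, a = np.random.rand(30, 784), np.random.rand(30), np.random.rand(784)
    sigmoid_layer(w, b, a)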
Also Jax - https://github.com/google/jax
Well, fast.ai is using swift now.
... I think it's fair to say 'never say never'.
You're probably right, Rust isn't really the sweet spot for this stuff, but it's also the case that Python has some downsides that are pretty severe and well acknowledged.
eh, I give up. Believe whatever you want to believe.
My point is that even fast.ai still views S4TF as somewhat niche, and that data science practitioners as a whole still don't care.
None of these languages are going to be speedier on the GPU. But Python has serious design problems that manifest themselves when developing a large project.
You can only go so far with dynamic typing
With type inference, immutability, etc, Swift is far from tedious:
It’s not quite as nice as Python but it’s an enjoyable language.
> Because Swift for TensorFlow is the first serious effort I’ve seen to incorporate differentiable programming deep in to the heart of a widely used language that is designed from the ground up for performance.
> But Python is not designed to be fast, and it is not designed to be safe. Instead, it is designed to be easy, and flexible. To work around the performance problems of using “pure Python” code, we instead have to use libraries written in other languages (generally C and C++), like numpy, PyTorch, and TensorFlow, which provide Python wrappers. To work around the problem of a lack of type safety, recent versions of Python have added type annotations that optionally allow the programmer to specify the types used in a program. However, Python’s type system is not capable of expressing many types and type relationships, does not do any automated typing, and can not reliably check all types at compile time. Therefore, using types in Python requires a lot of extra code, but falls far short of the level of type safety that other languages can provide.
i.e. my point: OP's point is categorically false.
...not that swift is tedious. I <3 swift.
Validating the original point that nothing will replace Python for DL applications any time soon, but middleware will continue to be implemented in C++/Rust/Swift/whatever you fancy.
S4TF isn't the first and certainly won't be the last end-to-end non-Python DL stack. It might be worth highlighting as an example if it ever reaches mindshare above the noise floor among those stacks.
Well, the tldr: you’re wrong.
The more approachable reading: python isn’t going anywhere, but people are looking at other things for more than just low level implementations with a python wrapper.
...it’s early days yet, who knows where things will go... but maybe do a bit more reading and have an open mind?
There’s more to life than python.
IMO Swift strikes the best balance between strictness and productivity of any language I've worked with.
boy do I have a continent full of python developers to introduce you to... “python - by any means necessary” is their motto.
But in all seriousness, there are a lot of people (where I work) who started with python and have been daily driving it for so long that any other language is too tedious to get started with. Even if it means writing unreadable, hacky python, python still wins out for them.
I suspect there are a lot of similar people in data science.
A well written swift framework almost becomes a DSL for the problem domain, which is a great property for a data science tool to have.
Modern Rust isn't difficult to use. This is becoming a really tired meme from detractors. The compiler is incredibly helpful, non-lexical lifetimes are a thing, and unless you're doing a lot of sharing and parallelism, you can avoid many borrow checker problems until you learn RAII.
> Modern Rust isn't difficult to use. This is becoming a really tired meme from detractors.
I beg to differ. I've been programming professionally for over a decade, and I have shipped projects in a variety of languages, and I can safely say that Rust has a steeper learning curve and requires more cognitive overhead to use than many other languages. I find it relatively nice to work with rust in spite of this because the tooling is so great, but it's undeniable that Rust has made tradeoffs which sacrifice ease of use in favor of safety and performance.
I should be able to pull a Docker image and have S4TF immediately at my fingertips.
The two problems are: a. I don't have a GPU locally, and
b. I would rather have a compiler than just a REPL/Jupyter.
The top level comment from an early adopter in the last Rust release thread on HN was complaining about the tedious complexity.
I do research and prototyping in Python, but I have to deploy on mobile devices. I was going to roll my own implementation, but now that this exists, it's something I'm going to look into.
Don't see why Rust made this possible.
Calling Python code from Rust or Rust from Python is totally doable, and there is in my view no reason why you shouldn't use both in the use cases that suit them.
And the speed part is serious. Some guy once asked for the fastest tokenizer in any given language, and my naive implementation came in second place in his benchmark, right after a stripped-down and optimized C variant.
So using Rust for speed critical modules and interfacing them from easy to use Python libraries isn't exactly irrational.
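The bare-bones route is to expose a C ABI from a Rust cdylib and load it with ctypes (PyO3/maturin give you nicer bindings). Everything below (library path, function name, signature) is hypothetical:

    import ctypes

    # Assumes a Rust cdylib exporting:
    #   #[no_mangle] pub extern "C" fn token_count(ptr: *const u8, len: usize) -> u64
    lib = ctypes.CDLL("./target/release/libtokenizer.so")
    lib.token_count.argtypes = [ctypes.c_char_p, ctypes.c_size_t]
    lib.token_count.restype = ctypes.c_uint64

    text = "some corpus to tokenize".encode("utf-8")
    print(lib.token_count(text, len(text)))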
What would be the best route to do so in your experience?
Hence, Rust offers no performance benefit that isn't already there. It really only offers safety and modern language features, at the cost of being tedious to use.
For the record, I also use C, and I know that expert-level C can beat naive Rust at any time (expert-level Rust should allow you to do literally the same thing you do in C, just wrapped in an "I can handle this" unsafe block).
Your response tells more about yourself than it does about the subject at hand. Rust isn’t stealing your cookies and nor am I.
But while Julia targets the fast-and-concise niche that Python occupies (without compromising speed or power), it does not target the slower-but-more-correct niche (though there is a culture of testing, which is quite important for math-oriented problems, since the type system will not catch the most subtle and troublesome bugs). There is space for one language for exploratory/research work that can be deployed quickly in a fast, iterative cycle, and another for the next Spark/Flink or for critical production areas that need the extra effort (like self-driving cars), which could be Rust (or Scala, or Haskell, or Swift, or staying with C++/Fortran).
It's something more fundamental. Similar issues happen elsewhere in numerical computation. See https://en.m.wikipedia.org/wiki/Numerical_stability
It happens with ints too, but it is so much more obvious than with floats that no one would be surprised.
Repeating i += 0.1 a million times will produce the same value as i += 0.2 a million times if i is an int, since the rounding will make both effectively i += 0.
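The float version of the same loop is what bites people, because the error is small enough to miss:

    i = 0.0
    for _ in range(1_000_000):
        i += 0.1

    print(i)               # slightly off from 100000.0: the rounding error accumulates
    print(i == 100000.0)   # False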
Rust vs Python is a weird question because in reality no one writes their own neural network with numpy, and no one expects Rust to act like an interpreted language suitable for data science workflows. It would be more apt to compare Rust and C++.
Python is an interface to C, C++, and FORTRAN for a lot of stuff. There are even crossover libs for running R.
This is like comparing apples and steaks.
Also, does Rust have a GPU/CUDA backend yet?
Most of the major deep learning frameworks for Python (TensorFlow, Keras, Torch, MXNet, etc.) will not normally spend the majority of their time in Python. Typically, the strategy is to use Python to declare the overall structure of the net and where to load data from, and then the actual heavy lifting is done in optimized libraries written in C++ (or Fortran; I seem to recall BLAS implementations use Fortran).
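For example (a generic sketch, nothing from the article), the Python below only declares the graph; the matrix math runs in TensorFlow's C++/CUDA kernels:

    import tensorflow as tf

    # Python is just the declaration layer here.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    # model.fit(x, y) then spends nearly all of its time outside the interpreter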
Yep; poor wording on my part. Thanks. :)
I'm not sure rust is really aiming to be something used for data science workflows. I'm not sure the community will be putting much effort into making this a reality.
Rust seems great, to be honest, just not universally so. Nothing wrong with defining the boundaries.
I’m not totally convinced rust knows what it’s aiming to be, other than replacing c++...
While Julia's single language mantra is great, as long as things like Python exist, there will be a need for C/C++/Rust.
The only downside I've seen is that sometimes the programmer will want more safety. In such a scenario Rust for the "frontend" would be very useful.
So we have two concerns, frontend and backend. For the backend, Rust would be perfectly acceptable, but I'm not sure it fixes a safety issue in the other (C, etc.) languages - so perhaps little value there. For the frontend it only has value in some areas.
Regardless, I love Rust and would totally welcome any tooling that keeps me in Rust. However, I'm not an ML person, hah.
And this kind of workload really isn't the problem Rust wants to solve.
And how about Gonum (the Go equivalent)?
Finally, I'm currently going through the deeplearning.ai program. I've got one week left, and will experiment with building some apps. Which technical stack should I choose?
But for neural networks, people often prefer to use special hardware like graphics cards, since graphics cards are really good at doing relatively simple math on many pieces of data at once. So they create special libraries like TensorFlow that can send commands to the graphics card instead of doing the math on the CPU. (And they don't use Numpy because even though it's highly optimized, it's highly optimized for CPUs, and graphics cards are a lot faster than CPUs at running neural networks.)
1. Numpy doesn't run on GPU.
2. Numpy isn't high level enough for NN building.
3. Numpy doesn't have auto differentiation.
Other options solve some of these - Autograd solves 3, Jax solves 1 and 3, etc.
But if you want all 3 then you want to use Pytorch.
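A few lines of PyTorch cover all three (a hedged sketch; the shapes are arbitrary):

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    layer = torch.nn.Linear(784, 30).to(device)      # 2: NN building blocks
    x = torch.randn(64, 784, device=device)          # 1: runs on the GPU if present

    loss = layer(x).pow(2).sum()
    loss.backward()                                   # 3: autodiff, no hand-written backprop
    print(layer.weight.grad.shape)                    # torch.Size([30, 784])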
If I'm not doing research on new methods but want to build a model for a particular problem using well-known best practices, then all the custom code that my app needs and what I need to write is about the transformation and representation and structure of my particular dataset and task; but things like, for example, optimized backpropagation for a stack of bidirectional LSTM layers are not custom for my app, they're generic - why would I need or want to reimplement them except as a learning exercise?
That'd be like reinventing the wheel; for generic things like that I'd want to call a library where that code is well-tested and well-optimized (including for GPU usage) by someone else, and that library isn't NumPy. NumPy works at the granularity of matrix multiplication ops, but applied ML works at the granularity of whole layers such as self-attention or LSTM or CNN, which perhaps are not that complex conceptually but do require some care to implement properly in an optimized way; you can implement them in NumPy, but you probably shouldn't (except as a learning exercise).
They'll also, equally well, serve the needs of Rust programs which need serious number crunching (presumably a small niche); there is no "versus" in the comparison.
In my own experience, Rust has been excellent for the more boring side of data science - churning through TBs of input data.
They didn't. At the end of the article they discuss this.
"In fact it’s worse than that. One of the exercises in the book is to rewrite the Python code to use vectorized matrix multiplication. In this approach the backpropagation for all of the samples in each mini-batch happens in a single set of vectorized matrix multiplication operations. This requires the ability to matrix multiplication between 3D and 2D arrays. Since each matrix multiplication operation happens using a larger amount of data than the non-vectorized case, OpenBLAS is able to more efficiently occupy CPU caches and registers, ultimately better using the available CPU resources on my laptop. The rewritten Python version ends up faster than the Rust version, again by a factor of two or so."
The original python code was written to be easily understood; not performant. Shifting more of the work to the libraries improved the performance.
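For reference, the vectorized trick the article refers to is just NumPy's broadcasting matmul; roughly (sizes made up):

    import numpy as np

    batch, n_in, n_out = 32, 784, 30
    acts = np.random.randn(batch, n_in, 1)   # one column of activations per sample
    w = np.random.randn(n_out, n_in)

    # 2D @ 3D: w is broadcast across the batch axis, so the whole mini-batch
    # goes through BLAS in a single call instead of a Python loop
    z = np.matmul(w, acts)                   # shape (32, 30, 1)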
In which case, IMO, it's fairly surprising that the Rust implementation is only 2x slower.
This is why Python has eaten the world. Not because it's the best at any one thing, but because it brings all those things together - at which it is unparalleled, and unlikely to be surpassed anytime soon.
NumPy, SciPy, pandas, TensorFlow: all of those have very little actual Python code; it's C++ and even Fortran here and there.
This whole Python vs Bla thing is just silly nonsense. I know Python and some Bla, and so should you. Tonight someone will release SuperFantasticNewThing implemented in Bla, tomorrow someone else will wrap that in Python, and tomorrow night the rest of us will use PySuperFantasticNewThing, and that's exactly how it should be.
Thanks to ML.NET, I can enjoy the performance of the .NET JIT/AOT compilers while binding to the same TensorFlow C++ libraries that are wrapped for Python.
Swift and Kotlin enjoy similar bindings on their respective mobile platforms.
Then there is Julia.
Python might have gotten there first, but its kingdom is already getting slowly eroded.
Python is an ok interface language, in that it's script-like, dynamically typed and simple to comprehend. It's popular, which makes on-boarding efficient due to the sheer volume of tutorials online. And, it has built up a large ecosystem, because of the last two points.
That said, it's naive to suppose that python is the currently-ideal or future-ideal interface language. It's just ok, plus it's popular.
This is the same logic one might have used to conclude that the LAMP stack would be dominant forever in web development if you took a snapshot of the world in 2004.
On the other hand, you typically have to reach for something like C++ for a low-latency, high-throughput environment, and it can be a bear to write a server in. Anything more ergonomic would be welcome.
It's possible to transpile, but the tech is not mature yet.
My takeaway: stick to a GPU where everything is more parallel.