
Is there a reason why all the above software cannot perform as "fast" or "safe" as Rust when written in other programming languages? After all, every program compiles down to machine code/assembly.



As soon as you need to be able to access low-level pointers for performance, you run into a problem: you can easily end up holding onto a reference to memory that's been cleared, or otherwise invalidated, by some other piece of code. This type of bug is insidious, and can easily be missed by the most stringent unit tests and even fuzzing. Of course you can try to get around this with abstractions on top, as every higher-level language does, and as you can do in C++ if you're willing to build reference-counted pointers around absolutely everything... but these are not zero-cost abstractions.

What Rust does is track reference lifetimes at compile time, giving you certainty about who can safely "own" or "borrow" memory in every single line of code, without any runtime pointer indirections or other slowdowns. The language is built around this feature at every level, with "lifetimes" being a syntactic construct on the same level as types and mutability.
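
To make that concrete, here's a minimal made-up snippet the borrow checker rejects, because a reference would outlive the memory it points into (in C or C++ the equivalent compiles fine and is a use-after-free):

    fn main() {
        let r;
        {
            let s = String::from("hello");
            r = &s;        // borrow of `s`
        }                  // `s` is dropped (freed) here
        // error[E0597]: `s` does not live long enough
        println!("{}", r);
    }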

Imagine if you wanted to safely parse user-submitted JSON, maybe cache derived products from that JSON, and then make sure that when the input buffer is released, you weren't holding any handles into strings inside it. The only safe way to do that in any other language is to proactively copy the strings, or rigorously reference-count the input buffer. But Rust has your back here. If you use zero-copy deserialization from Serde ( https://github.com/serde-rs/serde/releases/tag/v1.0.0 ) then the compiler will refuse to compile if you are using any of that data longer than the lifetime of the original buffer, and do so without needing to add any runtime bookkeeping.
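
Roughly, a zero-copy Serde sketch looks like this (the struct and field names are made up for illustration; assumes serde with the derive feature plus serde_json):

    use serde::Deserialize;

    #[derive(Deserialize)]
    struct Event<'a> {
        name: &'a str, // borrows straight out of the input buffer, no copy
    }

    fn main() {
        let buf = String::from(r#"{"name":"login"}"#);
        let event: Event = serde_json::from_str(&buf).unwrap();
        drop(buf); // error[E0505]: cannot move out of `buf` because it is borrowed
        println!("{}", event.name);
    }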

Yes, it's an annoying language to learn because of that "fight with the borrow checker." I LOVE that the language designers and steering committee are so open to quality-of-life improvements for newbies, like that string warning. The language will only get easier to learn over time. It may never be what you use to make your next app, but if you're doing systems programming, it's the wave of the future.


So, I recently saw a study claiming that memory-management-related issues constituted 25% of attacks, with the rest coming from other vectors. I may be misremembering, but in any case, simply getting rid of memory faults does not get rid of all attack vectors.

Given this, I wonder if the (seemingly) added complexity of Rust could result in more attack surfaces of other kinds.

I don't know anything about Rust mind you.


Rust is "fast" because it runs close to the bare metal, like C or C++. It doesn't feature garbage collection, many of which stop the world to clear memory.

Other non-garbage-collected languages (i.e. those with manual memory management) lack Rust's memory safety semantics and are thus subject to segfaults, buffer overflow exploits, etc. Rust is extremely "safe" since it prevents most of these errors at compile time, and catches the rest (like out-of-bounds indexing) with run-time checks.


So why does C, for example, lack Rust's memory safety semantics? Is it something to do with the design of the language itself? Can Rust predict user input, whereas C cannot?


Because Rust (the language) makes you reason about and define the concepts of ownership, borrowing and lifetimes of variables, and enforces this to the degree that certain classes of bugs are not possible. C does not require (or natively support) that, so this information is not available to the compiler.

It's just like typed and untyped languages. Typed languages require more up front work in that you must define all the types and data structures and which functions can accept them. This is more work than just creating them ad-hoc and using them as needed, but it prevents certain types of errors by catching them at compile time. The ownership and lifetime information for variables is loosely equivalent to that. It prevents certain types of problematic usage. It isn't perfect, and sometimes you have to work around its limitations, but the same could be said of most type systems.
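
As a tiny sketch of the kind of "problematic usage" it rules out, the compiler won't let you mutate something while an outstanding borrow could still observe it:

    fn main() {
        let mut v = vec![1, 2, 3];
        let first = &v[0];   // immutable borrow of `v`
        v.push(4);           // error[E0502]: cannot borrow `v` as mutable
                             // because it is also borrowed as immutable
        println!("{}", first);
    }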

There are plenty of primers on this feature of Rust, I advise you to take a look, you might find it very interesting.


There is a lot of "undefined behavior" (UB) in C, including straightforward stuff like overflowing signed integer addition. More insidiously, multithreaded code can be quite hard to write in C, because it's very very very easy to trigger UB in your multithreaded code. For example, if you have a shared variable that's protected by a lock, it's pretty easy to accidentally forget to lock the lock (or lock the wrong lock) before accessing the variable, and now you've invoked undefined behavior. Rust doesn't allow you to make those mistakes.
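
For contrast, here's a rough sketch of what that looks like in Rust: the data lives inside the Mutex, so there is simply no way to reach it without taking the lock first.

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let counter = Arc::new(Mutex::new(0u64)); // the data is *inside* the lock

        let handles: Vec<_> = (0..4)
            .map(|_| {
                let counter = Arc::clone(&counter);
                thread::spawn(move || {
                    let mut guard = counter.lock().unwrap(); // must lock to get access
                    *guard += 1;
                }) // guard dropped here -> unlocked
            })
            .collect();

        for h in handles {
            h.join().unwrap();
        }
        println!("{}", *counter.lock().unwrap());
    }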


To be clear, Rust's model of locking data rather than locking code is really lovely, but that doesn't mean that it's not possible to mess up locks: Rust only prevents data races, not race conditions in general.

(However, you're correct in that it's not undefined behavior to mess up locking in Rust, at least not without an `unsafe` block involved.)


True, you can certainly deadlock in Rust or do other logic mistakes. What I was trying to get at is you cannot access data protected by a lock without holding the lock, and you cannot leak any references to that data past the unlock either, so you cannot stray into undefined behavior by accessing the same data from multiple threads without appropriate locking/synchronization (like you can do oh so easily in C). At least not without an `unsafe` block.
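
A sketch of the "can't leak references past the unlock" part:

    use std::sync::Mutex;

    fn main() {
        let data = Mutex::new(String::from("protected"));
        let leaked = {
            let guard = data.lock().unwrap();
            &*guard // error[E0597]: `guard` does not live long enough
        };          // `guard` is dropped (and the mutex unlocked) here
        println!("{}", leaked);
    }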

You know all this, of course. I'm just commenting for others' sake.


They're different languages with different philosophies. C provides a set of very powerful and potentially dangerous tools (direct memory access and management, for instance), and does not police how you use them. Rust wants you to carefully explain what you want to do with those tools via its ownership system, unless you opt for "unsafe".

Rust is like the safety mechanism on a sawblade that shuts off once it realizes it's cutting into your finger.


Not sure which C compilers you're referring to. Perhaps you mean Clang, which is also based on LLVM: https://clang.llvm.org/

Clang is a competitor to GCC.


I did not mention compilers in my comment. Do you mean that if I use LLVM to compile a C program then I get the same assurances as when I compile a Rust program?


LLVM doesn't speak C; it is an optimization and codegen layer. Both rustc and clang output LLVM IR.

Rust's main benefit is in the compiler itself, not optimization and codegen.


LLVM is a compiler construction toolkit, not a compiler.

If I understand correctly, no other language offers the same assurances. I remember Godbolt is a nice way to explore how code compiles down to assembly, so you can compare.

https://rust.godbolt.org https://gcc.godbolt.org https://go.godbolt.org


Help me out. What is a compiler construction toolkit?


Most compilers can be broken up into two steps, which is what we call a front end and back end. [1] The front end of a compiler does syntactical (parsing + lexing) and semantic analysis (type checking, etc). The back end of a compiler takes in an intermediate representation of the code, performs optimizations, and emits the assembly language for a target CPU. Clang is an example of a front end and LLVM is the back end for Clang. Clang and rustc both share LLVM as a back end, meaning they both emit LLVM IR.

[1] Many compilers have much more than two stages. For example, Rust has another intermediate representation called MIR.
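
If you want to see this split for yourself, both front ends can dump the IR they hand to LLVM; a small example (file names here are just for illustration):

    // hello.rs -- a trivial program to inspect.
    //
    // `rustc --emit=llvm-ir hello.rs` writes the LLVM IR the Rust front end
    // produced to hello.ll, and `clang -S -emit-llvm hello.c` does the same
    // for a C file. Comparing the two .ll files shows both compilers meeting
    // at the same back end.
    fn main() {
        println!("hello");
    }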


Many thanks, kind stranger.


If your team of programmers tried to build something like Servo in assembly, you would never complete the task because the task is so laborious and error prone. Finding and correcting all the bugs in your hand-written assembly isn't feasible.

So it's theoretically possible to express such a program in assembly language, but it's not something humans could realistically produce without tools such as Rust.


It might be able to, but there are no guarantees. That is the real benefit of strong and expressive type systems: they can prove properties of the program at compile time. An equivalent program could be written in C, brainf*ck, or even for a Turing machine, but it gets harder and harder to prove properties like memory safety the less structured the language.


Let's say you are using LLVM to compile a Rust program, and an "equivalent" C program. You can compile both of them down to IR, and then enforce type safety at the IR level. Doesn't that ensure that you can prove properties about the program at compile time?


Possibly, but types can encode far more than just the structure of data. Rust, for example, uses types to encode lifetime and ownership information. Haskell uses the IO monad to encapsulate non-determinism. Neither of those have equivalent concepts at the IR level.
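
For example (just a sketch), the lifetime annotation below is checked and then erased by the front end; at the LLVM IR level this is just a function shuffling pointers around:

    // `'a` ties the returned reference to the inputs; it exists only in the
    // Rust type system and has no counterpart in the generated IR.
    fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
        if x.len() > y.len() { x } else { y }
    }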

It's not a set law, but more expressive type systems almost always increase the class of properties that can be "easily" proved in a language. I work on a verification tool for C/C++ programs and we constantly struggle with the languages. Pointer arithmetic and aliasing dramatically complicate any possible analysis, and these problems are only exacerbated at a lower level IR/ASM level.


You can't enforce type safety at the IR level. LLVM IR has very little in the way of type information.


I've been manually writing some LLVM IR recently to prepare for a project involving JIT compilation, and LLVM's type system is actually shockingly expressive. The majority of the problems I run into stem from the fact that you have to copy-paste more often, and that leads to errors.

I wouldn't recommend anyone write real code using LLVM IR, but it's not as bad as you'd expect.


The hard part is getting the "equivalent" C program. :)


Just to add the same links in answer to your question.

https://gcc.godbolt.org https://rust.godbolt.org


> Is there a reason why all the above software cannot perform as "fast" or "safe" as Rust when written in other programming languages? After all, every program compiles down to machine code/assembly.

Yes, the reason is things like garbage collection and language runtimes. Every program does ultimately run as some form of machine code, but the amount and kind of code generated can vary very widely, not even considering things like VMs, where you have another couple of layers of abstraction that slow things down.


Yes, in theory all programs in all languages eventually run as machine code. However some programming languages (e.g. Python) will do things that you can't get away from, like reference counting. So you'll be running extra machine code and you can't "turn that off". The designers of those languages have made valid trade-offs that result in that.
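
By contrast, in a language like Rust reference counting is something you opt into per value rather than paying for everywhere; a rough sketch:

    use std::rc::Rc;

    fn main() {
        let owned = String::from("plain ownership, freed at end of scope, no count");
        let shared = Rc::new(String::from("counted only because we asked"));
        let also_shared = Rc::clone(&shared); // bumps the count explicitly
        println!("{} / {} ({} refs)", owned, shared, Rc::strong_count(&also_shared));
    }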


The GP comment mentioned Ruby, Python and JS, which are not suitable for this kind of thing.

Of course you can use some other compiled language instead of Rust, I think the choice boils down to productivity and ecosystem.

EDIT: where I said "some other compiled language" I should have really said "some other compiled and non-garbage-collected language"


Rust uses LLVM, just like Swift, Pony, Crystal, etc., to compile to native code. Java compiles to bytecode, which the JVM then has to translate to machine code.

Node.js, Java and Go use garbage collection so that programmers do not have to manage memory themselves.


You could use LLVM to compile any language to LLVM IR, and then to machine code using a backend. Does that mean every language has the same properties as Rust?

If Node/Java/Go use GC (or VMs), then aren't they more safe than Rust?


> You could use LLVM to compile any language to LLVM IR, and then to machine code using a backend. Does that mean every language has the same properties as Rust?

Nope, it largely depends on the language design.
> If Node/Java/Go use GC (or VMs), then aren't they more safe than Rust?

Yes and no. Memory allocation is one of the issues with GC. Go is generally safer for networking, but not for security, where Rust can manage memory securely.

This is a good read on Java vs Rust: https://llogiq.github.io/2016/02/28/java-rust.html

https://news.ycombinator.com/item?id=14173716 https://github.com/stouset/secrets/tree/stack-secrets

In fact, we don't have to argue much over GC, because the choice often comes down to the programmers available and the job market. GC was created on the idea that managing memory manually is hard for large-scale projects. It works well for Azul Systems, who have recently been advertising for LLVM engineers to get performance the JVM alone could not.


Parts of that are completely untrue. The JVM has the advantage of selectively JIT-compiling hot or small functions, whilst Rust has to compile everything ahead of time. Rust binaries are thus big, whilst with the JVM you don't ship native binaries, you ship small bytecode.

Go has the advantage of memory safety (via GC), plus better concurrency safety, which is lacking in Rust. There are concurrency-safe languages, but they are not mentioned in this thread.


> Rust binaries are thus big, whilst you don't ship JVM binaries, you ship small bytecode.

This is not borne out by practical experience. While we've not experimented much with Rust, our Java deployments are significantly larger than those of native languages like Go (and I expect Rust would actually be a bit smaller than that, since it requires less runtime than Go).

JVM deployed binaries are large, not especially because the bytecode is large, but because you have to ship all the bytecode for all your code and all its transitive dependencies; there's no linker and the semantics of the language make it essentially impossible to statically prove that individual functions or classes aren't needed. You can trim it down with tools like Proguard, but that's a non-trivial undertaking and prone to error, which again you won't know until runtime.

Plus the drawback that you need a relatively large VM to run a JVM binary, but you can run Rust binaries completely standalone (out of a scratch container if you want).

> Go has the advantage of memory safety (via GC), plus better concurrency safety, which is lacking in Rust.

I'm curious what you mean by "better concurrency safety". My understanding is that Rust attempts to statically prove that concurrent accesses are safe (e.g. https://blog.rust-lang.org/2015/04/10/Fearless-Concurrency.h..., especially the section on locks). Go does nothing of the sort - it provides some nice concurrency primitives and tooling to detect data races, but the compiler does nothing to actively prevent it.


Go, in my understanding, is not memory safe even with its gc, and also doesn't have "concurrency safety" because of this. That's why https://golang.org/doc/articles/race_detector.html exists.

Rust however prevents these kinds of errors at compile time.
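
A minimal sketch of what that rejection looks like; the compiler refuses to let a spawned thread borrow data it can't prove will outlive the thread and be accessed safely:

    use std::thread;

    fn main() {
        let mut data = vec![1, 2, 3];
        // error[E0373]: closure may outlive the current function, but it
        // borrows `data`, which is owned by the current function
        thread::spawn(|| {
            data.push(4);
        });
        data.push(5);
    }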

Which things were you thinking of that Rust was lacking here?


The Go race detector is still miles better than enforcing manual mutexes and locking in concurrent Rust code.

Much better would be a proper type system to get rid of those at compile time, of course. Look at Pony. And a better memory-safety system than RC.


You don't always need those; it depends on what you're doing. And furthermore, that _is_ the compile-time system that enforces things. Go's race detector can only prove the presence of races, not their absence.

As for Pony,

> Pony-ORCA, yes the killer whale, is based on ideas from ownership and deferred, distributed, weighted reference counting.

That's its GC.


What you mentioned about JIT is exactly what I thought of when IBM explained what they like about Swift compared to Java on the JVM.

More technical explanation: https://www.ibm.com/support/knowledgecenter/SSYKE2_7.0.0/com...

"In practice, methods are not compiled the first time they are called. For each method, the JVM maintains a call count, which is incremented every time the method is called. The JVM interprets a method until its call count exceeds a JIT compilation threshold. Therefore, often-used methods are compiled soon after the JVM has started, and less-used methods are compiled much later, or not at all. The JIT compilation threshold helps the JVM start quickly and still have improved performance. The threshold has been carefully selected to obtain an optimal balance between startup times and long term performance."

> Go has the advantage of memory safety (via GC), plus better concurrency safety, which is lacking in Rust.

Are there any examples you could list?

One of Go's GC issues comes up in this discussion with stouset: https://news.ycombinator.com/item?id=14174500


Java bytecode looks rather large to me.



