Rust (despite the common understanding) is not a memory-safe language in its entirety. It is a language designed to have a strict division of safe/unsafe which makes it easier for developers to compartmentalize code to achieve memory-safety.



Is there any practical programming language that is memory safe in its "entirety"? Python, for example, certainly is not. It has unsafe escape hatches (via ffi, at the very least). Yet, everyone I know of says and thinks of Python as a memory safe language. I do as well.

> which makes it easier for developers to compartmentalize code to achieve memory-safety

The problem here is that this is incomplete. Many many many languages have achieved this before Rust. Where Rust is (somewhat although not entirely) unique is bringing this compartmentalization into a context that (mostly) lacks a runtime and garbage collection.

I have no problems calling Rust a "memory safe language" precisely because I have no problems calling Java or Python "memory safe languages." What matters isn't whether the language is "entirely" memory safe. What matters is what its default is. C and C++ are by default unsafe everywhere. Rust, Java, Python and many others are all safe by default everywhere. This notion is, IMO, synonymous with the more pithy "memory safe language."


> Is there any practical programming language that is memory safe in its "entirety"?

This isn't possible. Eventually you are sitting at a block of memory and need to write the allocator. Maybe (like Python) your allocator is written in C and hidden from you, but there is always something that isn't memory safe sitting under your language.

You could write a language for an actual Turing machine, which, since it has infinite memory, is by definition memory safe. However, as soon as you need to run on real hardware you have to work with something unsafe.

You can of course prove a memory allocator correct, but it would still have to use unsafe in Rust. I suppose you could then implement this allocator in hardware and make Rust use that, but since that doesn't seem likely to happen, I'm going with: all languages have unsafe somewhere at the bottom.
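To make this concrete: even a do-nothing global allocator in Rust has to be written as an `unsafe impl`, because the compiler cannot verify the pointer contract for you. A minimal sketch (the names are mine; it just defers to the system allocator):

    // Illustrative sketch: a pass-through global allocator.
    use std::alloc::{GlobalAlloc, Layout, System};

    struct Passthrough;

    // The GlobalAlloc trait is `unsafe` by design: the compiler cannot
    // check that `alloc` returns valid memory for `layout`, or that
    // `dealloc` is only called on pointers this allocator handed out.
    unsafe impl GlobalAlloc for Passthrough {
        unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
            System.alloc(layout)
        }

        unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
            System.dealloc(ptr, layout)
        }
    }

    #[global_allocator]
    static ALLOC: Passthrough = Passthrough;

    fn main() {
        // Every heap allocation below now routes through Passthrough.
        let v = vec![1, 2, 3];
        println!("{:?}", v);
    }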


Yes, exactly. That's why I asked the question: to drive out the point that the ontology the GP was using was probably not terribly useful.

Although I did use the weasel word "practical" to narrow the field. If you don't limit yourself to general purpose languages, then I'm sure you can find one that is "entirely" safe.


That depends on your definition of "practical" and "entirety".

The article was about languages being used to implement Android. Clearly, no, you can't have an entirely memory safe language that can be used to implement Android, for the reason you said. But there's a wide gap between "practical for doing useful work of any kind" and "practical for implementing Android".

Then, "entirely". What's "entirely"? Entirely until you get to library calls? Entirely until you get to OS calls? Entirely including the OS? If you include the OS then again, you are right for the reason you said. But if you exclude the OS, I'm not so certain.


Sure, but the language doesn’t have to expose it to you. There’s a bunch of other processes running on your system too aside from your program, but the OS prevents you from scribbling all over their address space.


Rust is a systems programming language. If I have a new idea for an allocator, the expectation is that I can write the experimental version in Rust. If you never write an allocator or other such tricks, you don't need unsafe; you could use one of the other languages. Java doesn't have unsafe, but you cannot write a custom allocator in Java (well, you can, but it will be a manual process to use it; you have to drop back to C if you want Java to use your custom allocator by default).


> It has unsafe escape hatches (via ffi, at the very least).

Yep, ctypes is part of the stdlib and lets you corrupt the VM on the fly. Fun stuff like changing the value of cached integers and everything.

But ctypes being a terrifying pain in the ass, people tread very carefully around it. Cffi's a lot better, though it requires an external package. At the end of the day I think I'd be more inclined to bind through pyo3 or cython than write C in Python (which is what ctypes has you do, without even what little type system C has, to say nothing of -Wall -Weverything).


> But ctypes being a terrifying pain in the ass, people tread very carefully around it.

I'm not sure how much people treading carefully actually translates into safety in practice.

CPython in particular has ad-hoc refcounting semantics where references can either be borrowed or stolen and you have to carefully verify both the documentation and implementation of functions you call because it's the wild west and nothing can be trusted: https://docs.python.org/3.9/c-api/intro.html#reference-count...

This ad-hoc borrowed vs stolen references convention bleeds into ctypes as well. If you annotate an FFI function as returning `py_object`, ctypes assumes that the reference is stolen and thus won't increment the ref count. However, if that same function instead returns a `struct` containing a `py_object`, ctypes assumes the reference is borrowed and will increment the ref count instead.

So a harmless looking refactoring that changes a directly returned `py_object` into a composite `struct` containing a `py_object` is now a memory leak.

Memory leaks aren't so bad (even Rust treats them as safe after the leakpocalypse [1] [2]). It's when you go the other way and treat what should have been a borrowed reference as stolen that real bad things happen.
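(For reference, leaking in safe Rust really is this easy; `std::mem::forget` is a safe function:)

    fn main() {
        let v = vec![1, 2, 3];
        // Safe function: the destructor never runs and the allocation is
        // never freed, yet no memory-safety guarantee is violated.
        std::mem::forget(v);
    }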

Here's a quick demo that deallocates the `None` singleton:

    Python 3.9.13 (main, May 17 2022, 14:19:07)
    [GCC 11.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.getrefcount(None)
    4584
    >>> import ctypes
    >>> ctypes.pythonapi.Py_DecRef.argtypes = [ctypes.py_object]
    >>> for i in range(5000):
    ...     ctypes.pythonapi.Py_DecRef(None)
    ...
    0
    0
    0
    0
    0
    [snip]
    Fatal Python error: none_dealloc: deallocating None
    Python runtime state: initialized

    Current thread 0x00007f28b22b7740 (most recent call first):
      File "<stdin>", line 2 in <module>
    fish: Job 1, 'python3' terminated by signal SIGABRT (Abort)
[1]: https://rust-lang.github.io/rfcs/1066-safe-mem-forget.html
[2]: https://cglab.ca/~abeinges/blah/everyone-poops/


> Here's a quick demo that deallocates the `None` singleton:

As I said, you can trivially corrupt the VM through ctypes. However I don't think I've ever seen anyone wilfully interact with the VM for reasons other than shit and giggles.

The few uses of ctypes I've seen were actual FFI (interacting with native libraries), and IME it's rare enough and alien enough that people tread quite carefully around it. I've actually seen a lot less care taken with the native library on the other side of the FFI call than with the FFI call itself (I had to point out issues with that just this morning during a code review; if anything the ctypes call was over-protected, while the update to the .so's source had multiple major issues).


Well, there is still an important difference between Java and Rust: are you driving with a guardrail in an open field, or driving next to a cliff edge?

The JVM gives well-defined semantics even to bad executions, e.g. data races are well-defined. Safe Rust does prevent data races statically, but if one does happen due to a bad unsafe block, you are entirely on your own. While a memory-safety failure can abruptly stop either process, FFI is very rare in Java (it is an almost completely pure platform, Java all the way down), so in my experience the former is safer in this respect.


I don't have much Java experience, so I'll have to take your word for it. But it's not completely obvious to me that you're correct. We've moved from an absolutist idea of memory safety to trying to build an implicit ontology of tiers of memory safety based on usage. Now you're talking about going and doing surveys of code and trying to measure the relative frequency of certain things and then using that to drive a tiered hierarchy of memory safety in programming languages.

Sounds hard to do and you also haven't accounted for what problems are being solved in each language. I can pretty much decide to never ever use `unsafe` again, but I'll be leaving perf on the table. If I were writing Java, I would probably be fine with that. But I'm working on interesting problems that want the most perf possible, and so I do very occasionally justify `unsafe` when writing Rust.


As others mentioned, there is no absolute safety, not in memory, not in anything. The hardware can have bugs, the verification toolkit can have bugs, or the properties to be verified could have been incorrectly specified to begin with.

I’m just saying that corrupting the heap is much easier with Rust than with Java, and there is no coming back from heap corruption on a process basis, while most exceptional cases are recoverable by the JVM (hence the cliff analogy).

And Java can have surprisingly good performance, especially in multi-threaded code that has a non-predictable allocation pattern (where ARC is just not too good) — if you want significant performance improvements you really have to go down the inline asm road, which you can do from anywhere.


> there is no absolute safety

Now we've come full circle. I recommend you go back and read my initial comment in this thread and the comment I was responding to. You've veered far off course from there into waters in which we likely have very little disagreement of any consequence.

> And Java can have surprisingly good performance

Show me a regex engine written in Java that can compete with my own, RE2, PCRE2 or one of a number of production grade regex engines written in C, C++ or Rust. I'm not aware of any.

That Java can "have surprisingly good performance" is not a statement I'd ever disagree with in general terms. That has absolutely zero to do with anything I've written in this thread (or elsewhere, ever).

With all due respect, I think you've lost the script here.


You may be right, I’m not really disagreeing, I can absolutely stand behind this sentence of yours:

> Where Rust is (somewhat although not entirely) unique is bringing this compartmentalization into a context that (mostly) lacks a runtime and garbage collection

I just think that the model of “breaking down” is different between the two platforms and that might matter for some use cases.


Right, that's why I said:

> But it's not completely obvious to me that you're correct.

:-)

Which is to say, I don't know you're wrong. But it's a pretty subtle thing that requires a careful survey. And likely discussion of lots of concrete examples. It's far more nuanced than the thing I was responding to originally (not to you), which was this wrong-headed notion that Rust isn't memory safe "entirely." Because once you go down that path, the entire notion of "memory safety" starts to unravel. That is, of course Rust isn't "entirely" memory safe. Pretty much nothing practical actually is in the first place. I tried to force this issue by asking for counter-examples. The only good one I got was Javascript in browser, but that basically falls under the category of "programs in a strictly controlled sandbox" rather than "programming language" IMO.

I think this comment of mine might also be helpful, which reflects a bit on terms like "memory safety" and why they are a tricky but very common type of phenomenon: https://news.ycombinator.com/item?id=33825307


> It has unsafe escape hatches (via ffi, at the very least).

Playing devil's advocate, I can think of at least one language which has no escape hatches: Javascript running within a web page.


It is very easy to create a safe language. Hell, most brainfuck interpreters are likely completely safe: you allocate a large enough array and just iterate over the basic instructions, which only ever modify that array and print a character. A Turing machine in itself can do no harm.
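For instance, the core of such an interpreter fits in a page of entirely safe Rust. A sketch (output only, no `,` input; assumes the program's brackets are matched):

    // Illustrative sketch of a memory-safe brainfuck interpreter core:
    // the tape is a bounds-checked Vec, so the guest program can never
    // touch memory outside it.
    fn run(program: &str) {
        let prog: Vec<char> = program.chars().collect();
        let mut tape = vec![0u8; 30_000];
        let (mut pc, mut ptr) = (0usize, 0usize);
        while pc < prog.len() {
            match prog[pc] {
                // The data pointer wraps around rather than escaping the tape.
                '>' => ptr = (ptr + 1) % tape.len(),
                '<' => ptr = (ptr + tape.len() - 1) % tape.len(),
                '+' => tape[ptr] = tape[ptr].wrapping_add(1),
                '-' => tape[ptr] = tape[ptr].wrapping_sub(1),
                '.' => print!("{}", tape[ptr] as char),
                '[' if tape[ptr] == 0 => {
                    // Skip forward to the matching ']'.
                    let mut depth = 1;
                    while depth > 0 {
                        pc += 1;
                        match prog[pc] {
                            '[' => depth += 1,
                            ']' => depth -= 1,
                            _ => {}
                        }
                    }
                }
                ']' if tape[ptr] != 0 => {
                    // Jump back to the matching '['.
                    let mut depth = 1;
                    while depth > 0 {
                        pc -= 1;
                        match prog[pc] {
                            ']' => depth += 1,
                            '[' => depth -= 1,
                            _ => {}
                        }
                    }
                }
                _ => {}
            }
            pc += 1;
        }
    }

    fn main() {
        run("++++++++[>++++++++<-]>+."); // prints "A"
    }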

The hard part comes in allowing it to do something useful, while only allowing the parts I believe it should be able to do. E.g. plugging file system access into our brainfuck interpreter would make it quite unsafe. Node, for example, does have C FFI.


Yes, but that's not just a language. It's a language within a certain context. But it is a worthy mention.


I think a distinction can be made in that you never really need to use unsafe operations in python or Java. In rust, you need unsafe. Just about every data structure in the stdlib uses unsafe.

I think it's fair to call Rust a memory safe language. But I don't think it's on the same tier as a fully managed language like python.


I suppose reasonable people can disagree, but I don't think it's anywhere near as clear cut as you seem to be implying. You talk about data structures in std using unsafe, but you don't mention the heaps and piles of C code used to implement CPython's standard library.

It's not like you need `unsafe` in Rust to build every data structure. I build oodles of data structures on top of the fundamental primitives provided by std without using any `unsafe` explicitly whatsoever.

And it is not at all uncommon to write application code in Rust that doesn't utter `unsafe` at all. Even ripgrep has almost none of it. At the "application" level it has exactly two uses: one related to PCRE2 shenanigans and one related to the use of file backed memory maps. Both of those things are optional.

Then there's another whole perspective here, which is that if you're using Rust in the first place, there's a non-trivial chance you're working on something "low level" that might require `unsafe`. Whereas with Python you probably aren't doing "low level" work and just don't care much about perf within certain contexts. That has less (albeit not "nothing") to do with the design of the languages and more to do with the problems you're trying to solve.

To be clear, I am not saying you're definitely wrong. But as someone who has written many tens of thousands of lines of both Rust and Python, I would put them on the same or very very close level in terms of memory safety personally. Certainly within the same tier.


You make a good point that much of Python's stdlib is implemented in C. But you could implement Python's list in pure Python, safely. You can't implement something like that in Rust without unsafe.


You can implement lists in Rust safely; with enums, a list is four lines of safe code. You can even implement a doubly-linked list safely. You just can't do either of these things by wielding pointers willy-nilly. If you're willing to accept a performance tradeoff by implementing a list in pure, bootstrapped, FFI-free Python, then you can do the same in Rust.
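For the record, the enum version of a list really is about four lines of safe code; a sketch:

    // A singly linked list in entirely safe Rust: each node owns its
    // successor through a Box, so no raw pointers are involved.
    enum List<T> {
        Nil,
        Cons(T, Box<List<T>>),
    }

    fn main() {
        use List::{Cons, Nil};
        let list = Cons(1, Box::new(Cons(2, Box::new(Nil))));
        if let Cons(head, _) = &list {
            println!("head = {head}"); // head = 1
        }
    }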


You certainly can! And that's a good example, because it exposes just how important context is to this discussion. Perf matters in certain contexts. If you implemented a list in pure Python, do you think its users would find the overall perf of Python to be acceptable?


How would you implement a python list in python? I mean, what would you consider "acceptable" primitives to do so?


> I think a distinction can be made in that you never really need to use unsafe operations in python or Java.

You can't write any code at all in Python or Java without relying on unsafe operations. Both of them have their runtimes written in C/C++.

So based off of this unusual line of reasoning, Rust is strictly more memory safe than either of those as it's at least possible to have a Rust program without any unsafe code. That program will be of questionable value, sure, but it can at least exist at all whereas it can't for Python or Java.


I'm not sure going down this road is meaningful, because as soon as we get to machine-code generators you get a "reset" on safety: no matter what language you implement a compiler in, it can have logic bugs which will result in all sorts of serious bugs, including memory ones. This is true of both the Rust compiler and Java's JIT compiler.

Interpreters and the rest of the VM are a different beast: while they also have to be bootstrapped from some unsafe language one way or another, they are usually written in a much more expert, security- and correctness-oriented way than your average program. So while they can and do have bugs, they are exceptionally well tested and, well, I wouldn't expect the JVM to die under my program the same way you don't really expect the kernel to freeze either. This is also true of Rust's stdlib, I assume, but is it true of third-party libs?


>You can't write any code at all in Python or Java without relying on unsafe operations. Both of them have their runtimes written in C/C++.

This isn't a meaningful distinction, in the end. Hardware is unsafe too. Real production CPUs have bugs in them which lead to cache lines becoming corrupted, address translations being wrong, branches going to the wrong place, etc. under extremely weird conditions. But, in the end, we don't really do much about it because we trust that it probably won't impact us since we assume the people who built the SoCs or those who wrote the standard library did a good enough job.


> You can't write any code at all in Python or Java without relying on unsafe operations. Both of them have their runtimes written in C/C++.

Technically, pypy is a Python runtime written in Python.


There are at least three Java runtimes written in Java: Jikes RVM, Maxine VM, and GraalVM.


At least two of which use an unsafe dialect of Java for significant parts of the runtime, which I'm pretty sure you know well (maybe not Graal, but if not it's because it's bootstrapping on top of existing unsafe code).


Easily locatable via grep, and not full of UB and memory corruption issues, which is what the figure of 70% of security issues being due to memory corruption in C, C++ and Objective-C relates to.

At some level of the stack some assembly or compiler intrinsics are needed, just not at every line of code.


Jikes is the one I'm most familiar with and people working on its runtime absolutely suffered from UB and memory corruption issues... obviously not throughout the whole standard library but that's not the case for other JVMs either. In fact the Jikes people found it nicer to work in Rust than in Java on components like the garbage collector, because it was a better fit for working safely with this kind of code and they didn't have to write in a restricted subset of the language to avoid triggering the GC.


Since when does Jikes use Rust?

Also, bootstrapping a language always requires using a subset of it for the low-level layers; apparently it's not an issue that many parts of C and C++ cannot be implemented with only what the ISO standard provides.


> You can't write any code at all in Python or Java without relying on unsafe operations.

There are Python interpreters written in other languages: there is one in Rust, and there is Jython and IronPython.


By that logic, you can't write any safe rust at all because it relies on a compiler written in C++.

We are discussing the languages themselves, not any particular implementation.


Only the codegen and many of the optimization parts of the compiler are in C++. The rest of it is in Rust.


One could implement many data structures without unsafe, but with less efficiency. E.g. using an arena allocator


I would like to dispute the "with less efficiency" simplification, because depending on the size and usage patterns of your code, a doubly linked list or similar graph data structure backed by an arena will be faster than the way those data structures appear in books.
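A sketch of what I mean, with the arena as a plain Vec and indices standing in for pointers (entirely safe Rust; the names are illustrative):

    // A doubly linked list whose nodes live in a Vec "arena" and refer
    // to each other by index, so no `unsafe` is needed anywhere.
    struct Node<T> {
        value: T,
        prev: Option<usize>,
        next: Option<usize>,
    }

    struct List<T> {
        arena: Vec<Node<T>>,
        head: Option<usize>,
        tail: Option<usize>,
    }

    impl<T> List<T> {
        fn new() -> Self {
            List { arena: Vec::new(), head: None, tail: None }
        }

        fn push_back(&mut self, value: T) {
            let idx = self.arena.len();
            self.arena.push(Node { value, prev: self.tail, next: None });
            match self.tail {
                Some(t) => self.arena[t].next = Some(idx),
                None => self.head = Some(idx),
            }
            self.tail = Some(idx);
        }
    }

    fn main() {
        let mut list = List::new();
        list.push_back("a");
        list.push_back("b");
        // Walk forward from the head by following next indices...
        let mut cur = list.head;
        while let Some(i) = cur {
            print!("{} ", list.arena[i].value);
            cur = list.arena[i].next;
        }
        // ...and backward from the tail by following prev indices.
        let mut cur = list.tail;
        while let Some(i) = cur {
            print!("{} ", list.arena[i].value);
            cur = list.arena[i].prev;
        }
        println!();
    }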


Sure, but that is kind of what I mean. Safety in rust is something you actively have to think about and work around (at least some of the time). It doesn't just come for free like in python.


Point is, "for free like in Python" really means "the C implementation hides that for you".


Standard ML is entirely memory safe (some but not all implementations offer nonstandard escape hatches). I've heard someone here claim that the strictly standard version is a practical programming language, although I'm not sure I believe them.


Surely this is true, but I still have the feeling that libraries in Rust tend to have more unsafe code than Java, Python, C# or others, maybe even more unsafe code than needed. Perhaps this is related to the problem domain.


You would definitely need to control for domain.

A Rust library for some sort of mathematical modelling might well need no unsafe at all, while a Java library for controlling some hardware might soon turn into JNI talking to some C++ code and oops you're unsafe.

In C# you need to reach for unsafe to do some of the stuff Rust can just do safely anyway. Did you know a C# struct with an array of 8 ints in it doesn't actually have the eight ints baked inside the struct? It was easier in the CLR not to do that, so they didn't. Which means C# structs that look like a compact single object that surely lives in a single cache line don't actually work that way in safe C#. You need unsafe.
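For contrast, the Rust side of that comparison in safe code (a sketch; the array really is stored inline):

    // A fixed-size array field lives inside the struct itself:
    // 8 bytes of header + 32 bytes of array, no hidden pointer.
    #[repr(C)] // explicit layout; the default repr gives the same size here
    struct Block {
        header: u64,
        ints: [i32; 8],
    }

    fn main() {
        assert_eq!(std::mem::size_of::<Block>(), 40);
        println!("Block is {} bytes", std::mem::size_of::<Block>());
    }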


It does if you learn to use C# properly,

https://learn.microsoft.com/en-us/dotnet/api/system.runtime....

In actual native code produced by RyuJit, you don't need to worry about cache lines for single instances, because the struct might not even exist at all, the Jit having mapped fields into CPU registers instead.

When it matters, like the struct being part of an array, use StructLayout.


That link seems like it's about alignment rather than about arrays inside structures?


No, it is about alignment and packing; you use the StructLayout attribute alongside LayoutKind and FieldOffsetAttribute.

https://learn.microsoft.com/en-us/dotnet/api/system.runtime....

Your main issue was how structures arrange their fields.

Also regarding arrays and structs, as of C# 7 you can use fixed to declare static arrays inside structs, however these structs need to be marked as unsafe.


> as of C# 7 you can use fixed to declare static arrays inside structs, however these structs need to be marked as unsafe.

That is exactly what I was talking about.


Ah ok, somehow misunderstood that.

However, there are actually good reasons for it to be unsafe, although it is debatable whether that alone should justify it.

One is the interaction with the GC, in case it moves the data while there are references to its elements; another is stack size.

One way to get around it is to use AoS instead of SoA, which is anyway the best option if performance is the ultimate goal.


> It was easier in the CLR not to do that

This also has advantages, in that you don't need to allocate the struct in one contiguous memory block. An edge case of course, but there are domains where this is relevant.

There was an allocation bug once because unsafe code needs its memory allocated contiguously, but most memory checks only reported total available memory and failed to account for fragmentation.


> You need unsafe.

You need it for that feature. It is questionable whether you really want to mandate a special memory layout (because you can't really do that even in Rust; you don't have explicit control of struct alignment, padding, order(!)).


Rust absolutely gives you control over alignment, padding, and ordering. It’s just not the default. Ask for those things and you shall be given it.
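Concretely, all three are spelled as repr attributes; a quick sketch:

    // Ordering: #[repr(C)] lays fields out in declaration order.
    #[repr(C)]
    struct Ordered {
        a: u8, // 7 bytes of padding follow, so that `b` is 8-aligned
        b: u64,
    }

    // Alignment: force the whole struct onto a 64-byte boundary.
    #[repr(C, align(64))]
    struct CacheAligned {
        data: [u64; 4],
    }

    // Padding: packed removes it entirely.
    #[repr(C, packed)]
    struct Packed {
        a: u8,
        b: u64,
    }

    fn main() {
        assert_eq!(std::mem::size_of::<Ordered>(), 16);
        assert_eq!(std::mem::align_of::<CacheAligned>(), 64);
        assert_eq!(std::mem::size_of::<Packed>(), 9);
    }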


Could you point me to some resources on that? I only know about #[repr] options, but that isn’t absolute control (e.g. for having structs usable from rust and internal asm)


What is “internal assembly”? I’m not familiar with that term.

Is there anything else that the various repr options don’t give you? My team at work does OS dev in Rust, and haven’t ever run into cases where Rust can’t do what we need it to do in these cases.


* inline assembly, just my brain stopped working for a sec :D

Well, my specific case is writing a fast interpreter in Rust, where I would like to use elements like a stack frame from both inline asm and proper Rust code. In my first iteration I chose a dynamically sized u64 array, wrapped in a safe API, because I couldn't be more specific. But even with known-size elements the best I can do, to my knowledge, is Layout? Or just a raw pointer and a wrapper with helper functions, as otherwise I can't modify the object in question from both places.


Ah yeah no worries :)

It’s sort of tough because I am only familiar in passing with the patterns in that type of code, but Layout is an allocator API, so I’m not 100% sure why it would be used here. I’d guess that if I was doing something like this, I’d be casting it to and from a struct that’s defined correctly. This is one area where stuff is a little simpler than C, thanks to the lack of TBAA, though many projects do turn that off.


Rust code frequently is used in a systems programming context, where it interoperates with unsafe code or needs to occasionally overrule the compiler to satisfy performance requirements.


Maybe. I don't know. You'd have to collect some data.

> Perhaps this is related to the problem domain.

Yes, I included that possibility in my comment here: https://news.ycombinator.com/item?id=33821787


> Is there any practical programming language that is memory safe in its "entirety"?

Whatever can be compiled to BPF meets this requirement. The price though is that it wouldn't be very useful.


Right, that's why I used the word "practical."


JavaScript? It’s not typical to provide it with any access to unsafe APIs.


Someone already mentioned that. That only works if you restrict yourself to JavaScript in the browser. There's a huge ecosystem for using JavaScript outside of the browser.


It’s not very useful to talk about the memory safety of languages as a whole without looking at specific implementations. JavaScript in a browser is memory safe. JavaScript with access to /proc/mem is no longer memory safe. C on most hardware is not memory safe. C running on the abstract machine itself can be.


This looks like a comment in response to https://news.ycombinator.com/item?id=33820918 and not to me.

The high level idea of my original rebuke was this idea that Rust was somehow lesser because it isn't "entirely" memory safe, and that its purpose was to divide safe from unsafe. But that really misses some very big points, because the programming language implementations used to build programs virtually everywhere are similarly not "entirely" memory safe, and many many many languages before Rust divided safe from unsafe.

Notice how I modified my rebuke to include your caveat. Does my point change? Does the strength of my rebuttal change? Does anything materially change at all, other than using yet more word vomit to account for the caveat? No, I don't think there's anything materially different other than more words.

I tried to sidestep all of this by using the weasel word "practical." So next time I'll just say, "any practical non-sandboxed programming language." You might still chide me for confusing "programming language" with "implementation of programming language," but I've never much cared for that semantic because the ambiguity is almost always obviously resolvable from the context.

> It’s not very useful to talk about the memory safety of languages as a whole without looking at specific implementations.

Not sure I would agree with this, but it probably depends on what you mean. We can meaningfully discuss the memory safety properties of the programming languages (not just the implementations) of Rust, C and C++. I think you have to still acknowledge the practical realities of any particular implementation that others will use to build real programs, but I contend you need not do so more than what the language design does on its own already. Because languages aren't designed in a vacuum. Even if you can build an abstract machine, for example, C was not designed to be an abstract machine. It was designed to get stuff done in the real world, and the real world influenced that design. Same for Rust.

Things like CHERI will potentially change this conversation quite a bit. I was even thinking about it when I wrote my original comment in this thread. But I think it is, at present, covered by the weasel word "practical." It isn't practical to use CHERI yet, as far as I know.


I should probably preface this comment by mentioning that I don't think there is anything new in it for either of us. Nor do I think we actually disagree on any of the facts. My earlier comment, and this one, was really just a response predicated on what I think the colloquial meaning of "memory safety" is, and to whether a practical language can be "truly memory safe"…which of course depends on what you see a programming language as being.

Memory safety is, as you have already mentioned, not black and white: I wouldn't even put it on an axis, because that suggests the scale is one-dimensional, and I don't even think it is practical to discuss it in that context. I prefer to categorize languages (for a definition of "language") in a couple of rough groups where most of them hang out.

In the first group is C and C++ as you're typically used to it, where pretty much every operation can do something unsafe and there's really no safe subset of the language, much less safety by default.

The second group is the "safe by default" languages like Rust or Python or Java, where you can write functional programs in the entirely safe subset (which is usually the default). This is where things get more complicated, though, because what the unsafe bits look like differs. Some give you language-level constructs to do unsafe things, such as Rust (with unsafe) and Java (with sun.misc.Unsafe or whatever). I think CPython technically also falls here because of some weird implementation choices where you can corrupt memory, but it's really more in the other category, where you can do unsafe things via FFI and external interfaces. That's kind of where most Lua implementations live, or nodejs stuff.

Then you have the things which (usually intentionally) do not give you any of these things. That's JavaScript or WebAssembly in a browser. The final stop in this line is where you start placing significant limits to what the language itself can do, such as eBPF running in the kernel, or domain-specific parsers like WUFFS.

I've been pretty sloppy with what I call a "language" here, because you can always take a programming language and slap memory safety on it: though not trivially, you can sandbox it, pick some subset of it, put it in hardware, etc. (FWIW CHERI doesn't actually make C/C++ completely memory safe, it just helps.) And going the other way is pretty easy: you just add features to let programs mess with the execution environment.

I get that the comment you're replying to is trying to "well, actually" you, and I agree with the rest of your response, but the takeaway I have here is "you [the commenter you were responding to originally] are coming in with a definition of memory safety; yes, in this context Rust does have these escape hatches and this is what they do, but in the vernacular it is safe, because this is how we typically evaluate languages for this sort of thing". Which, again, is like 90% of what you wrote already. I just think it is probably worth bringing up that there is a pretty common environment for a popular language that actually takes things a step further than this, with whatever tradeoffs that entails. Not really a disagreement, just a "hey, I think this is worth mentioning".


Then even Java is not memory safe according to the implicit standard that you allude to here since one can use the `Unsafe` class.


Not for much longer; access has to be very explicitly specified at start time, so that (deliberate) hole is getting smaller and smaller. But of course native functions are a thing (though very infrequently used).


No language in use meets your definition of memory safe.


Perhaps the problem is with the term "memory safe".

No language can prevent a person from allocating a writable buffer and then reusing it without cleaning it. Do that on a server, and you have step 1 of a security vulnerability.

Or memory allocation requests that come in faster than the garbage can be collected.

Or a data container holding many/large references that will never be used. The difference between that and a lost pointer in C is moot in a practical sense.

All of these _can_ be prevented. But it's programmer care, rather than the language, that prevents them. Hence, the term "memory safe" is inaccurate. "Memory safer" would be more accurate, but far less catchy.


Yes, but this problem exists everywhere all the time in virtually any context, even outside of technology. It's a very general problem that plagues communication. My perspective on the matter is the following:

1. We love to simplify matters down to black & white thinking with absolutist statements.

2. Attention spans are short (and probably getting shorter), so we try very hard to be pithy.

3. General seeming statements are actually narrower than they appear.

4. When taking a statement at its literal absolutist meaning leads you to an absurd conclusion, you're "supposed" to use your own judgment to interpret it imprecisely rather than ridiculously.

"memory safety" fits these criteria pretty well, especially the third point. Clearly, you really can't have a programming language be practical/general-purpose while simultaneously being completely and totally "memory safe." It's just ridiculous given our current predominant operating systems and architectures. The pithiness of "memory safety" relies on you, dear reader, knowing that and interpreting "memory safety" as something more reasonable than that.

> No language can prevent a person from allocating a writable buffer, then reusing it without cleaning it. Do that on a server, and you have step 1 to a security vulnerability.

This is a good example of (4), where you interpret something generally, but it's actually much narrower. When folks say "memory safety," they are not referring to the problem you speak of here. The problem you speak of might be a vulnerability, but it is not, in and of itself, something that would be recognized as memory safety. A memory safety bug could lead to the circumstances you describe, but it is not necessary. (Some people like to claim that other people think memory safety is the only kind of safety that matters, but few people with any credibility actually espouse that view as far as I'm aware. But it's important to call out: if you fixed every single memory safety issue ever, you would not fix every single security or vulnerability issue.)

The important life lesson here is that jargon abounds, and a good skill to pick up is knowing when to recognize it. If we go around interpreting everything very literally, it's going to be a bad time.

We could also stubbornly demand that everyone use crystal clear, unambiguous, precise and accurate terms all of the time everywhere so that nobody ever gets confused about anything ever again. But of course, I'm quite certain that is simply not possible.


Ada seems to fit.


Ada has no "unsafe"? Ada has no FFI? Ada has no escape hatches or unchecked APIs whatsoever? Does it have any pragmas that can disable safety checks? Because if it does, it's not "entirely" memory safe.


Ada has "unchecked" operations:

Unchecked Access (unsafe pointers): http://www.ada-auth.org/standards/22rm/html/RM-13-10.html#I5...

Unchecked Deallocations ("free"): http://www.ada-auth.org/standards/22rm/html/RM-13-11-2.html#...

Unchecked type conversions (unsafe casting): http://www.ada-auth.org/standards/22rm/html/RM-13-9.html#I57...

Ada has FFI:

C / C++: http://www.ada-auth.org/standards/22rm/html/RM-B-3.html

COBOL: http://www.ada-auth.org/standards/22rm/html/RM-B-4.html

Fortran: http://www.ada-auth.org/standards/22rm/html/RM-B-5.html

Ada has pragmas both to enable additional checks and to relax them:

http://www.ada-auth.org/standards/22rm/html/RM-L.html

Just like with unsafe Rust, sometimes in Ada you need to turn off some safety checks or tell the compiler "I know what I am doing for this part" when interfacing with hardware or similar low-level stuff.


Ada has an FFI.


As well as address clauses, unchecked_conversion and address_to_access_conversion: extremely useful tools that give you the choice of when to write risky code, and generate a compiler note exactly where such risk lives.


A simple heuristic that I expect to work universally is "could I write a program that prints to my terminal on a linux machine?" and if the answer is "yes" then it does not fit.


The blog post speaks to this explicitly, in the "what about unsafe Rust" section. The tl;dr is that the unsafe sections make up a small fraction of the total code size, and it's much easier to audit the usage of unsafe, since the justification for each use is focused. Thus, the use of unsafe in Rust is not a significant driver of actual vulnerabilities.

I think this has always been the goal, but it wasn't obvious at the outset that it would be achievable. The fact that we now have empirical evidence in real shipping products is significant.


Disclaimer: I am sympathetic to the cause. I think Android needs to address security since they are processing personal data. I like how the Rust community tries to educate others on what 'memory safety' is and is not.

But I am completely baffled by arguments that count the number of unsafe blocks or code lines. Like this:

>the number of unsafe sections is a small fraction of the total code size

The combinatorial effects of code execution make the number of sections or the code size completely useless metrics for judging security. They do help with the mechanical part of auditing security, in the sense that they help to locate things. But locating things was never enough to judge whether security is there.



