Memory Safe Languages in Android 13 (googleblog.com)
618 points by brundolf on Dec 1, 2022 | 594 comments



Their notes about vulnerability severity are particularly interesting.

Defenders of C/C++ frequently note that memory safety bugs aren't a significant percentage of the total bug count, and argue that this means it's not worth the hassle of switching to a new language.

Google's data suggests that while this is true, almost all severe vulnerabilities are related to memory safety. Their switch to memory-safe languages has led to a dramatic decrease in critical-severity and remotely-exploitable vulnerabilities even as the total number of vulnerabilities has remained steady.


It's even worse. The majority, not of all bugs, but of all vulnerabilities (of all severities), comes from memory safety bugs. TFA: "For more than a decade, memory safety vulnerabilities have consistently represented more than 65% of vulnerabilities across products, and across the industry."

On top of that, memory safety vulnerabilities are disproportionately high severity: "Memory safety vulnerabilities disproportionately represent our most severe vulnerabilities. In 2022, despite only representing 36% of vulnerabilities in the security bulletin (NOTE: down to 36% from 65% because of moving from C++ to Rust and other memory safe languages), memory-safety vulnerabilities accounted for 86% of our critical severity security vulnerabilities, our highest rating, and 89% of our remotely exploitable vulnerabilities. Over the past few years, memory safety vulnerabilities have accounted for 78% of confirmed exploited “in-the-wild” vulnerabilities on Android devices."


> NOTE: down to 36% from 65% because of moving from C++ to Rust and other memory safe languages

Imagine if, in any other field, a process or technology were developed that cut the number of high-severity issues in half.

For example, a modification to the standard anesthesia protocols that demonstrably reduces anesthesia-related fatalities by 50% in clinical practice.

And now imagine, in reaction to this revolutionary development, thousands of anesthesiologists publicly said things like "what matters is not the technique but the skill of the physician", "good anesthesiologists don't make mistakes like that in the first place", "but this new technique takes 1%-3% longer than the previous one" or similar.

Utterly unthinkable, isn't it?

Yet in software engineering, this is exactly what has been happening every day for more than a decade.


> Utterly unthinkable, isn't it?

No. Ignaz Semmelweis faced it in the 1800s for daring to suggest that what we now know as germs made people sick and that hand washing could drastically reduce medical complications. He was able to prove it too.

By the end he was locked up in an asylum for his ‘crimes’.

Want more recent?

How many stories have you heard of instruments or gauze or whatever left in surgical patients? Of operating on the wrong part or person?

People are fallible. But checklists help a ton. We know that for sure. Why is aviation obsessed with following them? Because they work to increase safety.

Surgeons have resisted them. I don’t know the current state of it, but they were making that exact “good surgeons don’t need it” argument. I remember it being a plot point on an episode of a medical drama (ER? Or maybe Grey’s Anatomy).

There are probably tons of other examples in other fields.


Semmelweis is ancient history. He was active at a time when regulations and "best practices" simply weren't a thing anywhere.

Surgeons resisting checklists is new to me. Do you have a reference other than a TV show? My understanding until now was that checklists are extensively used in medicine.


Here are a couple articles to start pulling threads on

> Despite all the evidence, Gawande admits that even he was skeptical that using a checklist in everyday practice would help to save the lives of his patients.

> "I didn't expect it," Gawande says with a chuckle. "It's massively improved the kind of results that I'm getting. When we implemented this checklist in eight other hospitals, I started using it because I didn't want to be a hypocrite. But hey, I'm at Harvard, did I need a checklist? No."

https://www.npr.org/2010/01/05/122226184/atul-gawandes-check...

Not sure if this supports the reluctance idea since it says 93% use checklists, but most surgeons don't think it improves safety.

> Of the 353 survey respondents, 93.6% use SSCs and 62.6% would want one used in their own child’s operation, but only 54.7% felt that checklists improve patient safety.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6221594/


That first quote (as well as the rest of the article) supports the view that people tend to underestimate the usefulness of procedures they are not yet using and especially to overestimate their own abilities. It also shows that social pressure is the opposite of what it was in Semmelweis' time: even people who feel they are above such things start using them (and then some of them get convinced that they were wrong).

The part you quoted from your second link directly contradicts your claim that most surgeons don't think it improves safety (even if the 54.7% is among the 93.6% that use it, that still gives 51% who think it improves safety).


The usefulness of checklists in medicine, and the resistance to them, is mentioned several times in The Checklist Manifesto.

https://en.wikipedia.org/wiki/The_Checklist_Manifesto


That IS the central issue though: processes, techniques, and so on are not a THING until they are. And something has to change for that to happen.


We had a smaller version of that with masks during the pandemic. A cheap and easy way to reduce the problem a bit? Nah, let's not do that /s


Well, not exactly. There have been plenty of advancements in bridge / highrise construction over the past 100 years, but in practice we don't go around tearing down old infrastructure that is still functional because it was built with outdated designs and technologies, even when it could theoretically save lives. Buildings get "grandfathered" into meeting code all of the time.

Software is not terribly different from that. The cost of replacing foundational software is tremendous, much higher than just "adopting a new protocol".

Of course, new software should still take heed of this and try to improve.


This isn't primarily about replacing existing software. There are plenty of engineers that argue for continuing to use memory-unsafe programming languages. New projects written in C are being started every day.

This is the exact equivalent of physicians continuing to use unsafe medical procedures, and what's worse, many of those engineers defend their dangerous practices by claiming there is no real danger in the first place if the programmer is "smart enough".


> This isn't primarily about replacing existing software. There are plenty of engineers that argue for continuing to use memory-unsafe programming languages.

> New projects written in C are being started every day.

But it is mostly about existing software, even if it's not about replacing existing software. I write C++ code every day. I hate it, and I'd rather not. But I use C++ libraries written by my teammates, and reference implementations in prior C++ work in our past projects, and have a set-up C++ toolchain, and my company even has all sorts of written C++ docs and style guides and linters and macros and ... and ...

Even if we wanted to stop, there's so much extra stuff to consider. I get your point, <<why start a new thing knowing there's a better way?>>, and I would want to stop using C++, but most software isn't one-and-done like a surgery. It is a continuous commitment and ongoing operation, it is the tools used, and the knowledge learned, and the libraries built.

> This is the exact equivalent of physicians continuing to use unsafe medical procedures

Plenty of platforms don't support rust. Just because you've improved knee surgery, doesn't mean it works on an elbow... yet. And sometimes you still gotta perform surgery on elbows.

> if the programmer is "smart enough".

Can't defend this, but tbh I've never heard it.


>> This is the exact equivalent of physicians continuing to use unsafe medical procedures

> Plenty of platforms don't support rust. Just because you've improved knee surgery, doesn't mean it works on an elbow... yet. And sometimes you still gotta perform surgery on elbows.

That's a non-argument; obviously if it isn't even applicable, it's not a discussion. Also, most of what people code on does support Rust.

>> if the programmer is "smart enough".

>Can't defend this, but tbh I've never heard it.

The programmer is never smart enough. Decades of bugs have shown that.


> > if the programmer is "smart enough".

> Can't defend this, but tbh I've never heard it.

Take a look at this comment: https://news.ycombinator.com/item?id=33824934

> Modern C++ has many memory safety features. If a company has learned that its people fail to use them, then bad for them.

It is a somewhat common attitude in this type of thread.


> take a look at this comment

It’s hacker news. People say all sorts of inflammatory crap here. I discount anything said here, especially in response to an article about said topic.

If my C++-writing coworker shared that opinion with me, or someone at a conference, that’d be different.

> Modern C++ has many memory safety features. If a company has learned that its people fail to use them, then bad for them

I will admit I vaguely agree with this. My company has all sorts of tools that perform basic checks for memory safety and a style guide that is very opinionated. Just because we can’t switch easily doesn’t mean we can’t try to improve as an org.

I don’t subscribe to the belief that any programmer is truly smart enough to avoid making bugs, but I do think the organization maintaining the code has a responsibility to improve. Especially a large organization. Whether that’s better code review processes, automated tooling, or even soft-banning the use of certain unsafe practices.


> It is a somewhat common attitude in this type of thread.

Except the safety features are often a lot easier to use than the original C-isms that tend to cause the most issues. So it is less an issue of smarts and more one of bad habits: C strings instead of std::string, plain arrays instead of std::vector, implicit ownership instead of smart pointers, and so on.
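To make that concrete, a rough sketch of the contrast (a made-up example, the names are mine):

  #include <cstdlib>
  #include <cstring>
  #include <string>

  // C-ism: manual sizing, manual ownership, easy to get subtly wrong
  char* join_c(const char* a, const char* b) {
      char* out = (char*)std::malloc(std::strlen(a) + std::strlen(b) + 1);  // forget the +1 (or the null check) and you're in trouble
      std::strcpy(out, a);
      std::strcat(out, b);
      return out;  // caller has to remember to free()
  }

  // The "harder" modern alternative: sizing and ownership handled for you
  std::string join_cpp(const std::string& a, const std::string& b) {
      return a + b;
  }

Same story for std::vector over raw arrays and unique_ptr over owning raw pointers; the safe spelling is usually also the shorter one.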


And yet, you can happily have a UAF (use-after-free) through a string_view.

This is not about C++ developers using old school C approaches. The language is fundamentally dangerous. There were huge efforts within Chrome to get all memory managed by smart pointers even more powerful than those offered by the standard, and they still have UAFs all the time.
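For example, a contrived sketch (but this exact shape shows up in real code) that compiles without complaint and is a use-after-free:

  #include <iostream>
  #include <string>
  #include <string_view>

  // Returns a view into storage the caller is assumed to keep alive
  std::string_view first_word(const std::string& s) {
      return std::string_view(s).substr(0, s.find(' '));
  }

  int main() {
      // The temporary std::string dies at the end of this statement,
      // leaving v dangling into freed storage.
      std::string_view v = first_word(std::string("hello world"));
      std::cout << v << "\n";  // use-after-free
  }

Smart pointers don't help here; the lifetime bug lives in the non-owning view.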


The stats come from Google project(s) - you can be sure that they've used the best practices. If they've failed, rest assured that 90%+ of the rest of the devs will fail, and much worse.


> The stats come from Google project(s) - you can be sure that they've used the best practices

The first Google Style Guide for C++ I ever came across in the wild espoused C with classes, had "standard" in scare quotes, and banned most of boost for encouraging functional programming. I almost threw a fit when someone unironically tried to push that POS at work because "Google"; it was entirely nonsensical, especially given that we made heavy use of boost and math libraries with operator overloading.


> someone unironically tried to push that POS at work because "Google"

Especially since Google has a ton of automated tools to perform tests and analysis on code, and enforces certain behavior before you can merge your code in. Something that is probably missing from a smaller organization that’s simply adopting their style guide. Also probably missing is the alternative stdlib Google uses.


I'm one of those people who still write in C, and I like the experience. I've had a lot of fun, and haven't been burned by it, although statistically it's likely that I will be at some point.

I've tried a lot of languages in the past, and am currently not willing to dive into a whole new ecosystem, re-learn all the best practices, and unlearn what's worked very well for me with sometimes no good replacement. Best practice for Rust seems to be to avoid linked lists, for example. I don't know what the safe replacement is for memset and memcpy to anonymous data structures (void pointers -- generic code), but I have a sense that it is more painful. The general recommendations seem to be to switch to different, more complicated data structures with more failure modes, or to put boxes and Arcs around things.

I don't think this fits me - I like the feeling of understanding what I'm doing, and once in a while coming up with something that compiles and runs fast and robustly. If you can do that in Rust, good for you.

Another part is that it still seems easier to interface with existing ecosystems in C. I tried writing some Win32 Rust code once in an evening and I have to admit I failed. Maybe I picked the wrong bindings library or whatever. At this point in my life, I have little patience to spend my time like this.

I appreciate the work that is being done, and I feel it's not unlikely that at some point we all will switch. At this point though I feel I'm way more productive staying in my current habitat. And that's not only on me, but also that the ecosystem and developed practices likely are not quite ready for a complete switch.

Comparing the investment to simply washing hands and putting on gloves or following a checklist as a surgeon feels unfair to me.


Implementing a linked list in Rust is somewhat challenging because of the safety issues that arise. Luckily you don't have to, there's one in the standard library: https://doc.rust-lang.org/std/collections/struct.LinkedList....

The equivalent of a generic memcpy is probably something like a .clone() call on a generic type that implements Clone.


> The equivalent of a generic memcpy is probably something like a .clone() call on a generic type that implements Clone.

If you type "memcpy" into the documentation search, rustdoc will point you to

* https://doc.rust-lang.org/stable/std/primitive.slice.html#me...

* https://doc.rust-lang.org/stable/std/intrinsics/fn.copy_nono... (Though it should really point to the reexport at https://doc.rust-lang.org/stable/std/ptr/fn.copy_nonoverlapp... )

The latter will also mention

* https://doc.rust-lang.org/stable/std/ptr/fn.copy.html


> Implementing a linked list in Rust is somewhat challenging because of the safety issues that arise

Implementing a linked list in any language that is not memory-safe is challenging because of the same safety issues. Rust just points them out.


This isn't an intrusive linked list.

Probably not the Clone trait but the Copy trait.


My limited experience with Rust was that I need to know exactly what I want to do in the code, and the compiler is there to make sure I write that intent as actual code, not my assumptions about how the code will work.

I don't feel that I am less in control, just that all of that needs to be put in code, and not just go "okay, I know this part doesn't need a lock because I will never call it concurrently" and hope for the best.

Even writing for embedded (as in no OS, microcontrollers with tens of kilobytes of RAM) hasn't been too bad, although I haven't managed to convince the borrow checker to borrow a non-contiguous block of bits from a register yet... although that's what unsafe{} is for, after all.

> Comparing the investment to simply washing hands and putting on gloves or following a checklist as a surgeon feels unfair to me.

The closer one would be "read those 300 pages on how to do stuff safely and apply them". Once you get into good habits it's not a problem, but the investment is there.


It's not even about languages. Just using valgrind or sth similar as a part of CI would solve A LOT of issues.


> There are plenty of engineers that argue for continuing to use memory-unsafe programming languages

And why do you think it's wrong, per se?

> This is the exact equivalent of physicians continuing to use unsafe medical procedures

You mean with different safety tradeoffs?

Because what you propose is exactly like forcing very expert physicians to switch to a procedure they are novices at, one that hasn't been battle tested like the old one, which has proved to be very effective in most cases.

It's the same reason why patients prefer to be treated with established procedures, and why to undergo experimental treatments they need to sign a document that proves their informed consent.


I mean, if you want to use that comparison, a C/C++ bridge would have random holes dropping cars off the cliff below.

The issue is not "bridge is suboptimally designed" or "bridge will need some extra maintenance because something started to break".

The memory safety issues are "some cars randomly explode when passing that bridge" or "when a driver brakes 6 seconds after entering the bridge, every other driver dies".

> Software is not terribly different from that.

It is MASSIVELY different, especially anywhere near anything security-related. Most buildings don't have a group of people with hammers trying to find a weak point that would never matter under any actual use conditions, and then hitting that weak point in every similar building built anywhere in the world.

Please don't make horribly useless comparisons like that


> we don't go around tearing down old infrastructure

All the time! And the stuff we don't tear down, we retrofit.


> Yet in software engineering, this is exactly what has been happening every day for more than a decade.

Not really. Rust is the first major attempt to achieve c/c++ performance & capability while also being safe.

Prior to that nearly every memory safe language came with crippling tradeoffs, and that is why people rejected them. Especially as performance and efficiency took center stage again with the massive increase in battery powered devices & the general plateau of single core CPU performance over the last decade


More than a decade? It started back in the 1970's when Pascal and C were duking it out. Early programmers were mathematicians and they took their craft seriously, including safety. They designed Algol which Pascal and C came from but Pascal had the safety and C didn't. C won the battle because programmers felt saving a few CPU cycles was more important than safety. Unfortunately, Pascal had a few other issues that kept it from beating C. Wirth later came out with Modula-2 in the late 1970's, which was vastly superior to C, but it was too late to compete with C's popularity. We've been hamstrung by C and its derivatives ever since. The importance of safety became much more apparent when networking became popular but there was no turning back at that point. Now, finally, 50 years later, Rust is pulling us back toward safety in a C-syntax language.


> Utterly unthinkable, isn't it?

> Yet in software engineering, this is exactly what has been happening every day for more than a decade.

Consider what happened to poor Semmelweis when he discovered in the middle of the 19th century that hand-washing improved medical outcomes: doctors of the time were too proud to accept his results and drove him out of the profession (and to his early grave) rather than change their practices.

All these people who insist on continuing to write new C programs have the same mentality as those doctors who refused to wash their hands.


Woah.

A bit much, eh? I write programs in whatever I want as a hobby. Why does that make me a murderer? Everyone needs to chill about this.


That seems to be a very limited view of the issue, since the stakes with anesthesia-related issues are much higher (a human life).


Every human life is directly or indirectly impacted by software. Software controls transport, food and energy production, and all human communication. If all software suddenly stopped working, society would collapse instantly and hundreds of millions of people would die within a year.

The attitude "it's just computers, nothing truly important like medicine" might have been viable 40 years ago, but it certainly isn't anymore.


It's not like you make a bad git commit and suddenly the world stops working. There are checks and processes and redundancies that reduce the impact of human error dramatically.

The earth is still spinning, despite so many things not working everywhere. That's not just in software. In every system, there's relatively few single points of failure. As you zoom out, failure points disappear and new ones appear.

Sure, still sometimes someone notices a critical security flaw that's been in there for a while, and that had found its way into large parts of the infrastructure already. (The last one I heard of was not a memory vulnerability).


>It's not like you make a bad git commit and suddenly the world stops working.

https://qz.com/646467/how-one-programmer-broke-the-internet-...


Sure, I know this. I don't think this contradicts my argument. It didn't suddenly make running infrastructure exposed. But yeah, people needed to find a replacement to continue development (which I hope wasn't hard)?

If this breakage is an argument for anything, it is against depending on lots of code that you don't even know. The last Rust projects I tried to build all had on the order of 500 transitive dependencies, by the way.


I assume you're not familiar with https://en.wikipedia.org/wiki/Therac-25 ?


> Defenders of C/C++ frequently note that memory safety bugs aren't a significant percentage of the total bug count, and argue that this means it's not worth the hassle

This says more about those C++ defenders.


C-nile developers have been making incorrect arguments for a while now. The reality, which almost everyone can see, is that memory safe languages are pretty much always what you want to be using for new code. OS and security sensitive components are the prime targets for rewrites in more secure languages.

Now Google has put this to the test and has the data to prove it. We should not allow the worlds technology security to be held hostage by a group of people too lazy to adapt with the times.


> The reality, which almost everyone can see, is that memory safe languages are pretty much always what you want to be using for new code.

Nitpick, this is not quite true. Memory safe languages are what you should be using in contexts where security and reliability are critical. This is generally the case but there are some contexts where other concerns are genuinely more important. Of course when this is necessary is often misrepresented but these cases do exist.


writing exploits, for example, is an utter pain in today's safe languages :) C is unmatched in the sheer ease of manipulating raw bytes with some light structuring on top for convenience. I've tried writing exploits in Swift a handful of times but I always gave up after I found myself buried under a pile of UnsafeMutableRawBufferPointers.


You mean, manipulating strings of bytes? Bytes don't have to be memory, you can just use bytestrings in Python or whatever.

Raw memory access is something you normally need to create vulnerabilities, not to exploit them :)


har har :)

It's an ergonomics thing, not a "can't" issue. There is a reason I called out Swift in particular—their "unsafe" APIs are so horrid to use that they make you regret doing unsafe things in the first place. Plus, throw in FFI and now you've got an even worse problem because not only are you forced to use the unsafe types, but often a lot of the critical APIs you need to interface with (in *OS exploitation, mach is the worst offender) have such funky types due to their generic nature that you have to go through half a dozen different conversions to get access to the underlying data.


I still don't understand why would you prefer raw memory manipulation to bytestring manipulation. If you want, just make a Swift library that will implement the memory like you want but without unsafe raw access (but just a few methods over a byte array). Back in the days when I did CTFs, I used Python for writing binary exploits, never C.

https://github.com/hellman/libformatstr

You can do something like this, no need to work with raw memory.


Swift has the advantage that it can directly “speak” C, without any indirection needed. (C can already do this of course.) In particular operations like scanning an address space for things, easily expressing the layout of something, and so on are much easier to do in these languages. In theory you could do the same in Python but it’s often not worth the effort.


Meh, bad argument. How about you write better code in Rust to replace existing stuff. There is so much more to consider, and I think one of Rust's most repellent features is the preaching about its advantages. It definitely has some that are hard to deny. Still... how about just doing it, nobody is held hostage here.


> How about you write better code in Rust to replace existing stuff.

No matter what people do, it's never enough for the critics. Many people criticize Rust fans for "rewriting it in Rust"!

> how about just doing it

People absolutely are. That's what the article is about, even.


>The reality, which almost everyone can see, is that memory safe languages are pretty much always what you want to be using for new code.

Not everybody is writing security-critical code. For some things productivity and time-to-market is more important and security is not enough of a concern to justify dealing with a language with horrible compile times and a self-righteous, dogmatic community.


Where have time to market and productivity ever been a factor for C and the C ecosystem? My most "productive" languages have been declarative languages like Haskell (depending on how you measure that). I don't care if it takes an extra few minutes to compile either; that's amortized away in the long term when you don't have to deal with entire additional classes of bugs. Also, why is security never an important concern? There are a lot of security issues which are self-inflicted if you use C. Using something like Rust means a lot of issues simply don't exist. The optimal solution.


I spent a considerable amount of time learning Haskell but I always felt like a slave to the language. Oh, you want to do this other simple thing? Try language extension XYZ, but you'll have to learn some more language theory first. Also sorry but the extension isn't compatible with the extensions you're already using, and you need to require a new dependency on the outside interface.

There are some things that the language is good at, but at some other things (I think a lot actually) it isn't. At the very least, I wouldn't recommend it for writing a video decoder.


> sorry but the extension isn't compatible with the extensions you're already using

In nearly 10 years of professional Haskell I've never come across a practical situation where extensions were mutually incompatible. Can you name any extensions that are incompatible in a way that actually matters in practice?


No, I haven't followed the language in 6 years. I remember there was at least one instance but can't recall. Other extensions significantly change the semantics of language module interfaces in more obtrusive ways than I'd like - compared to say C where it's pretty easy to offer a simple interface.

Some of these things would make working in the language pretty painful. I remember trying some library to toy around with a basic 400 line OpenGL program. It always needed 8 seconds to rebuild at the time. I don't recall and probably didn't understand why, but I suppose it has to do with some extra type or template hackery in the library that would just overcomplicate everything, probably even at the outset.

What remains is the feeling of: I can't do this thing yet in the type system, so take that extension. Oh wow. Now I can't do this other thing that I also need; the only fix is another extension (if it's available yet). I couldn't get rid of this feeling of constantly having to hack around the language.

In my experience you need to have a very good overview and understanding of all the available tricks and extensions to be able to navigate your way around the language and not paint yourself into a corner. Maybe I have the wrong personality, the wrong motivations, or am just not smart enough. Obviously I'm not you, and am not Edward Kmett (who would resort to lots of GHC specific hacks and drop down to C++ as well).


> I remember trying some library to toy around with a basic 400 line OpenGL program. It always needed 8 seconds to rebuild at the time.

This is a fair criticism. Compile times are slow.

Extension confusion is not a fair criticism. Extensions typically remove restrictions. They don't create incompatible languages.

Granted, it's a bit annoying to have to turn them all on, one by one. These days one should just enable GHC2021 and forget about language extensions. That saves one having to be Edward Kmett, or from bothering to think about language extensions at all.


Which memory-unsafe language is more productive and has faster time-to-market than any managed language?


I'm pretty confident that systems programming (you know - moving data around) is easier with raw memory access compared to managed languages.


Is this a joke? Systems programming is a lot about accessing APIs, dealing with all sorts of intricacies like interrupts, different execution contexts, and managing memory as you said.

If you write in C and your program is complex enough, you will spend a lot of time just chasing segfaults and concurrency bugs and getting it to work the first time you write it. If your systems programming is in userspace, that's sort of fine. But if you're in kernel or on bare metal, the cost of debugging goes up by an order of magnitude. There's no debuggers on bare metal, and nobody can tell you how your program crashed—the device just stops responding, that's it.

That's why safe languages make even more sense in these restricted environments. If you get half as many memory safety bugs while you're getting your program to work, that can reduce your development time like 5-6 times.


> Systems programming is a lot about accessing APIs, dealing with all sorts of intricacies like interrupts, different execution contexts

So now you need to make your interrupts talk to your Java objects? Is this any safer?

Is it easier to get a VM running in your kernel (probably no mean feat to do that in the first place) and you'll never get any concurrency bugs? And if you reduce memory bugs by half, those will be easier to debug?

And you won't be annoyed that you can't guarantee to be able to link objects in queues (because of allocation failure) and access them with generic code to copy data, link/unlink them, and so on? You're fine to pay for callbacks and interfaces everywhere, both in terms of runtime performance as well as maintenance headaches?

I'm asking incredulously, but seriously. Because frankly I've never looked at a project like MirageOS or whatever. But given real world evidence of what has survived, I don't see why you should assume I'm joking.

That you can't debug a "bare metal" kernel isn't quite true, either. But sure, the more complex a system becomes, the more contemplation it requires to figure out problems. This is universally true, but you can't simply discuss complexity away. And adding complex object models on top without consideration doesn't make your task easier just like that.


An OS will always need some tiny assembly part, as some instructions needed for the kernel simply never get generated by compilers. Also, an OS itself is pretty much a garbage collector for resources; it could very well reuse/link into its own GC for better performance.

Are we really talking about the “price” of managed code, when C code is chock-full of linked list data structures? A list of boxed objects is more cache-friendly than that. It is simply not always that performance sensitive to begin with (e.g. does it really matter if it queries the available display connectors in 0.0001s or 10x that?)

Regarding MirageOS, they actually achieved better performance by using a managed language than contemporary C OSs for select tasks. This is possible due to context switches being very expensive, and a managed env can do away with (some of?) those.


Those linked lists give some nice guarantees that significantly simplify a lot of code, guarantees that you just can't have otherwise.

Context switching can be expensive, but I don't see what's specific about managed envs about that. Fundamentally you have to have trust in code, and need to have the hardware support to enforce authenticity of the trusted code in order to avoid context switching. A different, promising development to reduce context switches is CPUs growing more and more cores, and more and more kernel resources being available through io_uring and similar async interfaces.


I have a really hard time following anything you say here.

How did we start talking about Java and VMs? Is this some sort of strawman argument?

> But sure, the more complex a system becomes, the more contemplation it requires to figure out problems

Strawman again? I wasn't talking about complexity. I said that the more "systems" your programming is, the higher is the cost of memory safety bugs.


> How did we start talking about Java and VMs? Is this some sort of strawman argument?

Is it a strawman if my comment was in response to "managed languages"?

> Strawman again? I wasn't talking about complexity. I said that the more "systems" your programming is, the higher is the cost of memory safety bugs.

For clarity, you spoke about the cost of debugging of memory bugs. And I said, it's universally true that programs are harder to debug the more "systems" they get. The reason is that it typically isn't sufficient to simply trace an individual thread anymore. "Logical tasks" are served over a number of event handlers executed in various (OS) threads.

It's not first and foremost a refutation of what you said. But an observation that I even placed in opposition to my other statement that it's not quite true that you can't use debuggers with kernels. FWIW and so on. I don't get why you are calling "strawman" repeatedly, and don't get the aggressive tone of your comments.


We probably need to bring in some kind of criminal liability for the companies that only cared about time to market and put their users at risk.

After memory safe languages become a bit more battle tested, C/++ needs to be regulated like asbestos.


Not everybody's building a product that deals with sensitive user data. By your logic all code not written in proof languages should be illegal.


Anyone who writes a library that might get used in a context where it is presented with inputs derived from potentially malicious data, is writing security-critical code whether they acknowledge it or not.


Only because unfortunately liability is still not enforced by law as it should be.


> liability is still not enforced by law as it should be

I think whether there "should" be a law making you liable could depend on the details of the exploit.

If you get exploited via rowhammer, I don't think anyone would blame you. It would be unreasonable if every small business running a website could be sued if they didn't defend against electromagnetic interference within the RAM.

However, if you're Apple and say -- you could get pwned because someone clicked a button to register version 9000 on the public npm/pypi registry (https://medium.com/@alex.birsan/dependency-confusion-4a5d60f...) -- maybe I agree there's an argument for some accountability there :)


Yes, it definitely should.

Computing is the only industry where people accept living with tainted goods instead of forcing whoever sold them to pay back, cover the damage, or whatever.

We already have high integrity computing, digital stores with returns, consulting with warranty clauses, and some countries are finally waking up that computing shouldn't be a special snowflake.

https://www.twobirds.com/en/insights/2021/germany/the-german...


Just pointing out that all software is exploitable. And punishing the application developer might not be right if the vulnerability is caused by a lower level dependency. For example, log4j.

I agree that if there's a high social cost to a breach then the government should punish those involved. Also, the security of your software depends on your threat model and which threats are in scope and which you're willing to invest in protecting against. The tradeoff is ease of development and velocity. So maybe such laws will incentivize this process differently, and maybe it's a worthwhile change.

I look at computing as a big experiment. Personally, I am very careful to use trustworthy services and don't depend on software for anything critical (besides banking, but luckily FDIC). Most people don't take the same precautions and rely on it very heavily. It's obviously critical infrastructure at this point. Maybe it's time to stop thinking of it as an experiment, and maybe these laws make sense.

I don't like the concept for emotional reasons; to me it's sad and signals another step towards the end of the golden age of the internet.



> self-righteous, dogmatic community.

Worst Rust feature by far.

I understand how Rust solves some problems and these are indeed very important ones. But it still is a constraint that has to prove itself.

C is horrible, out of the question. Really dated too without thousands of band aids. C++ has millions of those. But why not start re-implementing stuff in Rust if that is so close to your heart?

We can also reimplement everything in JavaScript. It is memory safe too. Wait, where is the enthusiasm now?


> We can also reimplement everything in JavaScript

Look at JS package stats - that's exactly what's happening. Many of the apps and packages created today in JS would have been created in C/C++ a few years ago. People who learn programming today don't know what C/C++ is. If they need something low-level, it's Rust.


> For some things productivity and time-to-market is more important

Who would choose C/C++ and not something like Go in that situation?


You can judge a person by the company they keep.


Lie down with C++, wake up with bugs.


> and argue that this means it's not worth the hassle of switching to a new language.

Defenders of C++ argue that there's no reason to change the language, because new features around safety guarantees are being introduced into every C++ standard starting from C++11 at a remarkable pace, so remarkable that compilers implement them faster than the existing adoption rate. And the adoption rate speaks volumes about existing capacity to port/rewrite big codebases in entirely new stacks. The new stacks also tend to have fewer custom static code quality analyzers from third-party vendors, and they are used a lot in mission-critical C++ codebases.
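For reference, a small sketch of the flavor of feature usually meant here (far from exhaustive):

  #include <iostream>
  #include <memory>
  #include <optional>
  #include <string>
  #include <vector>

  // C++17 std::optional: "maybe absent" without sentinel values or raw pointers
  std::optional<int> default_port(const std::string& scheme) {
      if (scheme == "http") return 80;
      return std::nullopt;
  }

  int main() {
      // C++14 std::make_unique: scoped ownership, no naked new/delete pair
      auto buf = std::make_unique<std::vector<int>>(100);
      std::cout << buf->size() << "\n";

      if (auto port = default_port("http")) std::cout << *port << "\n";
  }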


> The new stacks also tend to have fewer custom static code quality analyzers from third-party vendors, and they are used a lot in mission-critical C++ codebases.

Are these static code quality analyzers detecting code quality problems that Rust and company are also vulnerable to? Or are they mostly looking out for the hundreds of legacy footguns that C++ still officially supports?


They focus on quality control and compliance to safety requirements in specific domains and industries, for instance MISRA.


This might be of interest https://github.com/PolySync/misra-rust


Google is one of the biggest C++ shops out there, and also authors and maintains many of the static analysis tools and safety features you mention.

If they’re saying that C++ can’t be saved, maybe they’re worth listening to.


> If they’re saying that C++ can’t be saved, maybe they’re worth listening to.

It might be true, but it also sounds like an appeal to authority. I suspect there also might be voices that are being silenced or aren't given a similar platform to speak up and provide an alternative viewpoint on the matter within the same organisation, because <team budget/political reasons why>. After all, there are greenfield projects that are being started in C++20 and people are enthusiastic about their prospects. I wouldn't just blindly dismiss their reasons in favour of Google ones.


An appeal to authority is a fallacy because it doesn’t actually mean anything. It’s false credibility.

If the authority comes along with a bunch of well researched and documented data from experiences in the real world… that seems worth listening to.

It’s no longer an appeal to authority. It’s just looking at evidence.


C++ is needlessly complex and puts too much of a cognitive burden on the developer. I just wasted a day of my life on an issue traced to an errant semicolon in a legacy C++ base. I've used the language for 20 years. It can't be saved.


No offense, but I haven’t heard of people wasting days on semicolons outside of memes and really junior developers. What was the issue?


In real-time systems with millions of lines of code, no debugging capabilities outside of logs, and user misuse use cases, you'd be surprised what can lurk beneath.
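For the skeptical, the classic shape of this kind of bug (not necessarily what happened here) is something like:

  #include <iostream>

  bool connection_lost() { return false; }   // hypothetical stand-ins
  void reconnect() { std::cout << "reconnecting\n"; }

  int main() {
      if (connection_lost());  // stray semicolon: the if now controls an empty statement
          reconnect();         // the indentation lies; this runs unconditionally
  }

Trivial to spot in a toy program, miserable in millions of lines where the only evidence is a log line that shouldn't be there.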


In my experience, a good auto code formatter helps a lot. You can’t hide a semicolon from a code formatter.


Are you suggesting a code formatter as a mechanism for static analysis? There are really good tools like Coverity, and free ones like cppcheck and clang-tidy, that will catch that and so much more. Using C++ without cppcheck and clang-tidy in your CMake and pipeline is like leaving the seat up. It takes so little time, and the benefit to others is great.

That said, they won't catch a ton of memory and thread safety issues. You'll need tests with 100% coverage for that. Or you could just write it in Rust and the compiler will catch it.


If it was about a semicolon, it sounds even more like not running with all warnings on. C++ is "bad" (shortening a lot) due to its compatibility, being able to also compile last century's code... but if today, on active code bases, you are not even running with at least that, why would you ever switch to Rust?

And I fully agree: everything cppcheck does should IMO have long since gone into the warning suite, and -Werror and -Wall should be the default.


Google’s C++ coding guide is (was?) not really up-to-date, so there is that.

Modern C++ is indeed a huge upgrade on what came before and with a good amount of static and dynamic analysis the state of low-level programming is much better now, but there really is no reason for new programs to be written with these. Besides the bottom of the stack, managed languages are more than fast enough for nigh everything.


Never mind you don’t give up any performance to use Rust anyway. They focused heavily on only incorporating features with zero cost abstractions.


> Besides the bottom of the stack, managed languages are more than fast enough for nigh everything

And yet, there is a lot of enthusiasm (at least here on HN) for web development in Rust...


There are plenty of people who want to write in Rust for the fun of it. But I'm not aware of too many people saying that you should avoid writing a web application in java/python/js in favor of rust. Pretty much the only people being told to stop doing what they are doing are the people starting new projects that process untrusted data in C and C++.


You and your parent are saying slightly different things with regards to web dev; they said there's enthusiasm for doing it, not that you shouldn't write web applications in those languages.

Someone can think it's fine to write a web application in java/python/js and Rust.


Google is great at engineering projects that look great in promotion packets. Just because they're big and funded doesn't mean they're good at this.

It seems plausible that splashy projects in new languages are better for careers than grinding through "stable" codebases using "boring" engineering practices.

I also gather that Google has a challenge, possibly for similar reasons, keeping their third party dependencies updated and up to standards. A lot of those are written in C and C++, probably.


//third_party is indeed a challenge, but it is definitely not the case that vulns are only coming from //third_party. I mean, the linked article is about the Android codebase (not attached to //third_party) and the Chrome codebase publishes a ton of vuln data (again, not attached to //third_party).


I was saying Google might be pessimistic about C and C++ for some reasons specific to Google culture, like inability to get engineers to care about "boring" work. I wasn't making the point you're addressing.


I assure you that "fix vulns caused by memory safety issues through some means other than a total language shift" is not boring work, but the sort of problem that will happily get people promoted to L8. It is just hard as hell.

One of the people most involved in the systems described in the blog post that are used to harden the C++ side of things is a L9 here.


I more meant the mid range engineer work to just buckle down, test, and fix things. As in, just owning and cleaning up a lot of important projects, including third party ones.

Designing the ultimate everything sanitizer with zero performance overhead would surely be impressive even at Google. Especially if it was actually adopted across the org.


But "buckle down and own it" doesn't actually prevent vulns in any sort of systematic way.

And I assure you that, despite the memes, code health efforts do end up with promos here. The org responsible for third_party and large scale code health had above average promo rates for ages.


They’re not saying that C++ can or can’t be saved. And there’s no “they”, there are hundreds of teams with different expectations and policies.

You’re merely reading what you want between the lines.


True, but what is said is:

  We continue to invest in tools to improve the safety of our C/C++. Over the past few releases we’ve introduced the Scudo hardened allocator, HWASAN, GWP-ASAN, and KFENCE on production Android devices. We’ve also increased our fuzzing coverage on our existing code base. Vulnerabilities found using these tools contributed both to prevention of vulnerabilities in new code as well as vulnerabilities found in old code that are included in the above evaluation. These are important tools, and critically important for our C/C++ code. However, these alone do not account for the large shift in vulnerabilities that we’re seeing, and other projects that have deployed these technologies have not seen a major shift in their vulnerability composition. We believe Android’s ongoing shift from memory-unsafe to memory-safe languages is a major factor.


They "believe" the major shift is due to Rust, while they continously improve also their C++ tools, and the count in also all violations (even mabe more theoretical ones?) found by those .. I have no doubts about the actual claim, but especially this quite sounds like they may have made more out of this correlation==causation than there maybe is, I believe ;)


I know some of the people who own the tools described above. I can assure you that if those tools were the primary cause of the reduction in vuln they'd be screaming it from the hilltops. A huge amount of work at Google goes into answering questions like "what actually accounts for this change." This is one of the benefits of the promo culture that is often criticized.


>If they’re saying that C++ can’t be saved, maybe they’re worth listening to.

Google's one of the worst C++ shops because their code standard basically forbids using modern C++, and their C++ is more like 90s Java than modern C++. It's no wonder they want to get away from it.


I...what?

I write C++ at Google, and it encourages use of modern C++ features, and many things you see adopted in std have roots in our libraries.

I'm curious what you think Google prevents us from using and why you think our C++ is like 90s Java.

https://abseil.io/tips has a lot of our philosophies and abseil is chunks of our internal libraries published externally.


I've weirdly heard people say this in the past too and I am equally baffled.

I think it stems from the fact that Google was slow at making C++11 available internally so there was a period of time where the rest of the world was using smart pointers and we couldn't. That may have just solidified a "Google uses old C++" meme out in the wild despite it being wildly out of date.


Programming languages are tools for a job. As the saying goes, a bad workman blames his tools. It's not worth taking anyone who blames defects on a programming language too seriously, whether it's Google or not.

Modern C++ has many memory safety features. If a company has learned that its people fail to use them, then bad for them.

Of course, there are languages that abstract memory safety to the point that they eliminate those types of mistakes. But languages are tools for a job, and only some tools are applicable where C++ is applicable. We should not bury C++ prematurely before answering the question - "what else is as fast and efficient as to replace it for OOP?" And if a project doesn't need fast and efficient code, then why is it using C or C++ in the first place?

Overall, selecting the correct tool for a job is more important than figuring out which tool is better in some abstract way.


> As the saying goes, a bad workman blames his tools. It's not worth taking anyone who blames defects on a programming language too seriously

Are you serious? A bad workman blames his tools, because workmen are responsible for their tools. A large part of being a good workman is identifying what tools are good and using them.

And C++ is a terrible tool for any task where you are not forced to use it because of existing libraries. All the memory safety features of modern C++ are a tiny, almost vanishingly small step in the right direction.

> "what else is as fast and efficient as to replace it for OOP?" And if a project doesn't need fast and efficient code, then why is it using C or C++ in the first place?

If you need fast and efficient code, why on earth would you be doing OOP?


>> As the saying goes, a bad workman blames his tools. It's not worth taking anyone who blames defects on a programming language too seriously

>Are you serious? A bad workman blames his tools, because workmen are responsible for their tools. A large part of being a good workman is identifying what tools are good and using them.

Also, C/C++ made into real-life tools would be one OSHA violation after another in the real world.


> A bad workman blames his tools, because workmen are responsible for their tools. A large part of being a good workman is identifying what tools are good and using them.

As I said in the comment to which you are responding, "selecting the correct tool for a job is more important than figuring out which tool is better in some abstract way."

> C++ is a terrible tool for any task where you are not forced to use it

Many game developers, OS developers, and massive hardware-software makers doing embedded programming who use C++ would disagree. What would you say to them?

> If you need fast and efficient code, why on earth would you be doing OOP?

For small projects, I could agree. What would your recommended alternative be for massive codebases in large tech companies that need fast and efficient code?

P.S. Please read https://news.ycombinator.com/newsguidelines.html about snarky comments. Thanks.


> For small projects, I could agree. What would your recommended alternative be for massive codebases in large tech companies that need fast and efficient code?

The poster wrote very clearly:

> where you are not forced to use it

If one is forced, there's obviously no option.

The idea is not that C/C++ should be replaced right now; rather, that devs finally understand that C/C++ should be avoided wherever possible.

I actually see this pattern used by some who defend C/C++: "C/C++ should be deprecated" - "No, it's impossible to eliminate C/C++ today".

Deprecation is not elimination. Linux started introducing it, and Google is doing as well, so it can be done gradually.


> As the saying goes, a bad workman blames his tools.

And a good workman puts his old/obsolete/dangerous/etc. tools behind him when something better shows up.

The bad workman, instead, continues blaming his tools, when the problem is that he CONTINUES using bad tools anyway!

P.S.: I studied mechanical engineering. Getting rid of bad tools fast is key there...


First rule of tool buying: never buy the cheapest when it comes to safety-related tools. C/C++ IS the cheapest.


Modern C++ has many memory safety features. If a company has learned that its people fail to use them, then bad for them.

This recapitulates an argument at least as old as C89. You can probably find Usenet posts deploying it to argue against the adoption of strncpy, because if people don't know how to use sizeof and strlen, then bad for them.


Yes, my argument sounds similar. But it's in support of modernity rather than primitivism.

C++ nowadays can be used in a very memory-safe way without much effort. In my professional experience, memory leaks and corruptions are sporadic in modern C++ code and common in old-style pre-C++ 11 code.

That's why I'm a bit skeptical of this article from Google. It seems reasonable that Android has quite a lot of pre-C++11 code. And the article seems to lump together two very different approaches to memory safety: pre- and post-C++11 style programming.


You can do some basic analysis about your assumption: go pick out a bunch of the CVEs and look at the age and style of the source code.

Another approximation is to look at the Android source tree to see what proportion of it is as old as you assume in your argument. There are 431 results for a search `"Copyright 200" filepath:.\.cpp`. There are 7465 results for `"Copyright 201" filepath:.\.cpp`. 4152 results for `"Copyright 202" filepath:.\.cpp`. 220 for 2012, 354 for 2011. If you exclude tests the ratio is even less favorable for your theory.

In case you're wondering the project policy is to add a copyright header at the time the file is created, they do not update years in headers arbitrarily. As a spot check the first file matching "Copyright 200" that wasn't just essentially C code wrapped in extern "C" was: external/angle/src/libANGLE/Config.cpp. This file contains the use of std::make_pair.

You can perform these searches yourself here: https://cs.android.com/search?q=%22Copyright%20200%22%20file...


Thanks, that was very insightful. Yes, as you say, the copyright year in a .cpp/.h doesn't necessarily say whether C++ 11 features are used.

I've looked at many "Copyright 201" and "Copyright 202" headers. I'd need to see more use of C++11 (or equivalent) smart pointers or containers to say that this codebase uses modern C++ memory safety features. Other modern C++ features (like std::make_pair that you mention) are easier to spot.

I expect this codebase to have many memory safety issues. It may not pass code review in a company/team that expects their people to use modern C++ memory safety features. After seeing it, I'm more convinced that the reason Google has so many problems with C++ in Android really is because they don't insist their engineers use modern C++ (or equivalent in-house containers/pointers).

Here's another insightful pair of searches:

" std::make_" filepath:.*\.cpp

"delete " filepath:.*\.cpp


"std::make_" isn't a great comparison (IMO) because Google has a widespread culture of noexcept so it wasn't critical to adopt this style for constructing smart pointers. std::unique_ptr<Foo>(new Foo()) was a thing for a while there. There is also an alternative absl::MakeUnique<T> that was available before we had std::make_unique available internally so you'll need to search for that too.

The style guide, C++ readability, and general code review has all but banned raw "new" for years and years. You can find plenty of CVEs where the root cause is a UAF on a managed object.


Yes, there are more examples of smart pointer initialization than just std::make_. I couldn't find instances of "absl::Make" in Android Code Search. But your point still stands, and I should add that not all new-delete pairs are evil. With that said, what I've seen still has too many raw pointers.

Thanks for the context about UAF. I am curious about this. Much of my C++ experience comes from working with in-house reimplementations of std/stl, so my question might be a bit stupid, but how is use-after-free of an object managed by smart pointers so prevalent? Should the smart pointer not be nulled after the object is destroyed? Maybe you have a good example CVE? Are these cases of using the raw pointer inside the smart pointer without checking it first?


> Should the smart pointer not be nulled after the object is destroyed?

I'm not sure what you are going for here.

The way this often happens is there is some module that owns an object with a unique_ptr, and references to that object are used elsewhere. But the ownership of the object is complicated, so a bug sneaks in where a non-owning reference to the object gets dereferenced after the unique_ptr is deleted. You can prevent this by using shared_ptr for literally everything, but that sucks for lots of reasons.


There are also other smart pointers, like std::weak_ptr and proprietary ones.

You can avoid the multi-ownership problems of shared pointers with weak-pointer member variables (which only need to be promoted to shared pointers within a given {} scope). Some other problems can be solved by marking objects as pending kill without destroying them immediately and ensuring all threads finish access before actual deletion.

Unreal Engine uses both weak pointers and object marking in a global object array. It also uses GC but that's besides the point.

Would the same approach to modern memory management not help Android?


Of course there are designs that help prevent bugs. Chrome is doing plenty of stuff like this but the ownership and lifetime design for something like a JIT are complex as hell and problems happen.

We've got like 30 years of people insisting that it really is possible to write safe C and C++ programs if you just follow the One True Way (TM), and it's never been the case. Each new One True Way helps, but it sure as hell doesn't solve the problem altogether.


I mentioned this elsewhere, but the real A/B test here would be to do a rewrite in the existing language and compare it to doing a rewrite in Rust with respect to memory safety, etc.


strncpy doesn't do what you want. Maybe you mean snprintf. Indeed, use sizeof, memcpy, and snprintf. Avoid strlen; it is rarely necessary.


strncpy is bad for other reasons, mind you.


> We should not bury C++ prematurely before answering the question - "what else is as fast and efficient as to replace it for OOP?"

We have an answer: Rust. It's no longer premature, bury it.


Rust is over-engineered in some areas, immature in others.

See how many reference types there are, how async is handled, and the underspecified unsafe semantics.

For higher-level tasks, I prefer a language with a GC like Go or Java. Rust can work with reference counting, but it doesn't mix well with the larger ecosystem. For lower-level tasks, the underspecified unsafe model makes it worse than C's aliasing problems.


> See how many reference types are there,

2, & and &mut. What else?


I guess the parent meant Arc, Rc, and the like.


I vastly prefer the "just write your code serially, never worry about async, then spawn 10,000 goroutines" approach over the async/await/Future nexus of bad ideas (just do message passing like Erlang or don't do it at all...), but I wouldn't use Go in place of where I'd use Rust and vice versa; they fill different (if overlapping in places) niches.


It is very freeing. And one concise, readable goroutine doing the channel reads and the socket ops becomes your connection pool, but is as easy to understand as a Network Programming 101 assignment.


How's Rust at doing OOP these days? (Assuming we're in a domain or situation where OOP is a good choice, of course.)

I would love to know about any projects that do OOP well in Rust.


IMO Rust has most of the best bits of OOP (the ability to encapsulate functionality in an object with private fields and present a restricted public interface), without the bad bits (inheritance, and complex soups of objects all holding pointers to each other that make code flow hard to reason about)


Rust isn't OOP though. Structs, traits, etc. let you approximate some aspects of OOP, but not much more than plain old C does.


Amen.


A bad workman may blame his tools, but a good workman uses the right tool for the job. If a better tool exists, use it.

(And sure, it doesn't apply to every niche yet, but it sure applies to a lot of them)


Languages encourage or prevent their users from doing things.

Concepts like syntactic sugar and syntactic salt do exist.

Approaches to problems may vary by language because the language environment shapes its users in some ways.


While new safety features in C++ may be impressive, Google's data shows that memory safety vulnerabilities are still a major issue. Switching to a memory-safe language like Rust can help reduce the risk of vulnerabilities and improve the overall security and reliability of a product. The potential benefits make it a worthwhile investment, even if it requires some effort to migrate from C++. #RustIsTheRealDeal


How much of the benefit comes from the rewrite itself? A more precise comparison would be rewriting that C or C++ in the same language but with memory safety in mind and see how things turned out.

The same question comes up when an existing system is rewritten from language A to language B and big performance gains are seen. The language could be the big cause, but so could the extra engineering effort itself -- updated design, fresh attention to the requirements, etc.


Google isn't rewriting more now than they were before, they're just discussing the use of C/C++ for new code. Presumably, if rewriting chunks of code were enough in its own right, they would never have had so many critical security flaws.


> if rewriting chunks of code were enough in its own right, they would never have had so many critical security flaws.

Reducing defects is one of the main reasons (others being maintainability, readability, better integration, and similar) for refactoring and rewriting code. There's usually not enough time/money to do it, especially for large codebases.

I quite like rewriting parts of a codebase to modernize it, and I have often closed tons of bugs in a short time this way. It is definitely effective. But not as cost-effective as deprioritizing bugs into "won't fix" territory, which is what many companies like to do.


I also agree that it's a presumption. I don't know that I agree with it, is all. It seems like more engineering attention and excitement is actually good for project quality, and maybe that's a confounding factor here. More data would help, though all this might never be definitively conclusive.


And yet, people still store a string_view in a field and then access it past the lifetime of the underlying string.

Yes, things have gotten better. Smart pointers are a godsend. Sanitizers are a godsend. Various static analysis tools work pretty well.

But even codebases that adopt all of these things religiously still are riddled with security vulns.


> Defenders of C/C++ frequently note that memory safety bugs aren't a significant percentage of the total bug count

Well, first of all, this is said but not proven.

But it's easy to prove that memory safety bugs are not a significant percentage of the total number of bugs, even Google agrees.

Vulnerabilities are not the same thing as bugs; vulnerabilities like Spectre or Meltdown are not due to a bug in the software, have a ubiquitous and immediate impact on 100% of devices, and are much harder to fix or mitigate; sometimes it could even prove impossible.

The same bias can be explained using the exact same words used in the article

"Despite most of the existing code in Android being in C/C++, most of Android’s API surface is implemented in Java. This means that Java is disproportionately represented in the OS’s attack surface that is reachable by apps."

It can be read as: of course most of the vulnerabilities are due to memory safety bugs, it's much harder to gain root privileges exploiting a bug on the colors of a specific element of the UI, assuming it would be possible.

It can also be read as: most of the userland software is based on Java, which is memory safe by default, assuming there are no bugs in the implementation of the JVM, which is entirely not Java.

Given that, the problems become

- rewriting the entire ecosystem in memory-safe languages requires rewriting everything from scratch, which is a task that even Google would have huge problems completing (reminder: Google is the number one killer of its own projects) in reasonable time or without wasting more money than it's worth. Is a half-complete, not-battle-tested rewrite actually safer? Historical data says it usually isn't.

- are users actually safer when memory is safe? I mean, memory safety bugs gave us jailbreaking for locked devices, while memory-safe languages gave us bugs like CVE-2021-44832

I wouldn't classify the issue as black/white, there's a lot of grey to be considered.


> Vulnerabilities are not the same thing as bugs; vulnerabilities like Spectre or Meltdown are not due to a bug in the software, have a ubiquitous and immediate impact on 100% of devices, and are much harder to fix or mitigate; sometimes it could even prove impossible.

Using a language-independent bug as an example in a discussion about language-caused bug vectors isn't exactly honest.

Rust would have stopped Heartbleed, for example, and that was one of the huge ones.


> Rust would have stopped Heartbleed, for example, and that was one of the huge ones.

Using a low-budget project with few developers maintaining one of the most used libraries in the whole world as an example of the perils of non-memory-safe languages is not exactly honest.

Heartbleed could have been easily fixed if the companies profiting from using OpenSSL donated a few more eyes to look at the code.

Similar to what happened with the Log4j bug, which had an enormous impact comparable to Heartbleed's and affected a fully memory-safe language.


> Using a low-budget project with few developers maintaining one of the most used libraries in the whole world as an example of the perils of non-memory-safe languages is not exactly honest

Why? That's the situation for an enormous amount of the code people use.


Because Log4j is part of that enormous amount of code you talk about, and it's probably used by much less experienced programmers on average, because Java is safe by design, isn't it?

Anyway, Heartbleed was discovered after many years; let's wait as many years and see what kind of bugs we'll find in code written today in different languages.

Full disclosure: I haven't written C code in a long time and have no intention of going back, but dogmatic programmers who believe in "saviours" are a real mystery to me.

The people who forget to free a resource are the same people who will forget to sanitize some input, meaning all of us make mistakes and will keep making them, in any language, and those mistakes will be abused by some malevolent actor.

Google problems are not everyone's problems.

Google's solutions to problems are not everyone's solutions to the same problems.

Assuming that what Google says is applicable everywhere is at best naive.


> all of us make mistakes and will keep making them, in any language

.. unless said language makes making those mistakes difficult or impossible. Sanitizing input for example has not been an issue for me for decades, as every framework I used handles that by default, I'd have to work extra hard to make a mistake there.

> Google problems are not everyone's problems.

In this case they are. Not only is memory safety an issue for many codebases that have at least some C somewhere, but Google products are also used by millions.

> Google's solutions to problems are not everyone's solutions to the same problems.

> Assuming that what Google says is applicable everywhere is at best naive

Any other time I'd agree with you, but I don't see anything Google-specific here.


> unless said language makes making those mistakes difficult or impossible

You're missing the point [1]

(or I was unclear)

Yes, improvements in neurosurgery can save lives, but the bulk of preventable deaths is in human mistakes [2] that are almost impossible to make impossible.

Just the same, the majority of bugs are not prevented by using Rust, only a minority of them, which are also arguably the hardest to find and exploit, while a SQL injection can be exploited by a script kiddie with average IQ.

[1] https://portswigger.net/daily-swig/mastodon-users-vulnerable...

[2] The three risk factors most commonly leading to preventable death in the population of the United States are smoking, high blood pressure, and being overweight.


> Just the same, the majority of bugs are not prevented by using Rust,

"For more than a decade, memory safety vulnerabilities have consistently represented more than 65% of vulnerabilities". That's not minority. We are getting into minority territory now because of Rust and other memory-safe languages.

> while a SQL injection can be exploited by a script kiddie with average IQ.

Use any framework and it's a solved problem.


Members of the C++ community are working on fixing that. The Herb Sutter CPP2 idea:

https://www.youtube.com/watch?v=ELeZAKCN4tY


cpp2 is more about freeing C++ from its syntax nightmare than about making it a safe language to work with.


Ah, yes, the age-old debate of memory safety vs. total bug count. It's like choosing between having a really bad headache or a really bad cold - either way, you're still feeling pretty lousy. But in all seriousness, I think Google's data shows that prioritizing memory safety can have a significant impact on the overall security of a product. I'm sure the C/C++ defenders will continue to argue their case, but at least now they have some hard numbers to contend with.

#RustForTheWin


...but still, even with Android's importance and Google's resources, they're not planning to "rewrite it in Rust", at least not for now - only new code will use Rust.


The overwhelming majority of bugs of any sort live in new code. The longer a piece of code has been around, the safer it generally is (with occasional high-profile exceptions). This means two things:

1) The most cost-effective way to eliminate the majority of memory bugs is to just start writing all new code in a memory-safe language. If you were going to write new code anyway, you may as well do it safely.

2) Going back and re-writing existing code that doesn't need to be changed may solve latent memory bugs, but it will likely introduce other regressions that could be worse for security or for user experience. If code doesn't need to change, it's often better to leave it as is.

Not that a rewrite is never called for, but it's not necessarily the best course of action by any metric (even when neglecting the cost).


Part of that safety is users working around known bugs, leading to inefficient solutions to whatever the code is supposed to do.

People tend to forget that they don't have to live with those problems, but they still have a cost


New code includes rewrites, like the Bluetooth stack.


Mostly new code. They've rewritten some core, high-importance pieces. But any mature OS is a massive codebase, so it just wouldn't be feasible to systematically rewrite the entire thing indiscriminately


Seems like drawing too many conclusions from evidence while arguing against a straw man?

Defenders of C might note that Android is Java and iOS is not, compare the security of those two systems, and say that clearly memory safety is a focus on the wrong thing. This is equally true but no more valid an argument.

The one that really bothers me in all these language-booster discussions (that we should and need to have) is the functional programming formal verification claims. We have no ssl library written in a memory-safe, functional language that has been proven correct that has dominated the space. Heartbleed wasn't yesterday.

I look down the list here: https://en.wikipedia.org/wiki/Comparison_of_TLS_implementati...

And I think something is not being discussed as far as replacing memory unsafe languages of critical security infrastructure. What is it?


WireGuard is a recent and prominent example of a system that has been formally verified (https://www.wireguard.com/formal-verification/). There are implementations in a variety of languages due to integration considerations.

You will find at the bottom of that page C implementations of curve25519 that are proofed and derived from F* and Coq. Curve25519 is a relatively simple implementation and only one part of any system that uses it. As you can see in both of these implementations the papers recognize a team of contributors each - this should provide some insight as to the cost of such work. That doesn't make it unimportant, it just makes it rare.


Out of curiosity did the Wireguard formal verification effort end up identifying any security issues in the implementation?


Now that is interesting. I didn't know that and will have to look further to understand what it means.

Wireguard seems to be written almost entirely in C, is that right?


The symbolic proofs of the protocol are independent of the implementation. The portions of the implementation that are formally verified are written in F* and Coq, and emitted as machine-generated C.


> We have no ssl library written in a memory-safe, functional language that has been proven correct that has dominated the space. Heartbleed wasn't yesterday.

Heartbleed didn't affect https://hackage.haskell.org/package/tls even though it isn't formally verified.


Does anybody at all use that library in production at scale, ever? Genuine question. Maybe they do?

Why hasn't this really good result meant _everybody_ now uses that library by default and has to justify using something else?

There is something here not being discussed, what is it?


There's a hint it was used at Dell in some capacity 6 years ago, judging by this comment https://www.reddit.com/r/haskell/comments/5gyrdv/what_is_war... The thread discusses "warp-tls" which is a webserver extension that uses that "tls" package as a dependency for TLS support.


OK, but this is surely not compelling evidence of literally anything. Perhaps, in fact, the opposite. If this is what we have for evidence and nothing more, then WHY???

There is something here, at least one thing, that seems to be dominating outcomes, and is not being discussed.

Nobody has even a half-suggestion of what it might be and that is not making it (or them) go away as problems that are not being solved.


rustls isn't formally verified but no critical CVEs have been found in it yet. The only CVE is one DoS.

And I don't know about "used at scale" but we use it in production for Pernosco.


> Safety measures make memory-unsafe languages slow
>
> Mobile devices have limited resources and we’re always trying to make better use of them to provide users with a better experience (for example, by optimizing performance, improving battery life, and reducing lag). Using memory unsafe code often means that we have to make tradeoffs between security and performance, such as adding additional sandboxing, sanitizers, runtime mitigations, and hardware protections. Unfortunately, these all negatively impact code size, memory, and performance.

Even more evidence that the negative performance impact of bounds checking is minimal, nay, it can even be positive.


No it isn't. It's just evidence that it's a trade-off you might want to make in order to achieve some other goal, specifically security.

But if "security" isn't remotely a concern for a given project (like almost anything graphics / gaming related), this is not at all evidence for changing anything. It could be that Rust's optimizer eliminates the bounds checking so regularly as to be a moot point, but this isn't saying anything of the sort. It's saying that the cost, whatever it was, was judged to be worth paying for the improved security for these projects


> But if "security" isn't remotely a concern for a given project (like almost anything graphics / gaming related)

Gaming platforms have gotten a lot less lenient over time, and with pretty much every game these days having online components, "security isn't remotely a concern" has become a lot less true.


Sure, but then there are things like HPC / offline graphics / simulation (VFX/CG), where performance is the be-all and end-all concern (or memory efficiency, sometimes at the expense of CPU time), and security isn't a concern at all there, with lots of things like random index lookups into sparse arrays / grids, etc. I know for a fact that bounds checks do make a bit of a difference there, as the data's random, so the branch predictors are close to useless in that situation...


Everybody who works with Maya, Flash (now Adobe Animate), etc. knows that they crash all the time, and often corrupt files, so you back them up every hour or so. Carmack insists, in a gaming context, on running heavyweight, high-false-positive-rate static analyzers because users don't like crashes.

When C++ dies (which in 20 years it will, and I wasn't that hopeful 20 years ago), people will look in the same bewilderment at excuses made for its insane behavior as they look now at mid-20th-century arguments against high-level languages (and assemblers before them - like Mel said, "you never know where it will put things so you'd have to use separate constants.")


At least in VFX, at the high-end Maya's only really used for modelling/UVing/layout now, other apps have taken over the rendering/lighting side of things...

But anyway, in my experience a lot of the crashes are often due to quickly hacked together plugins for the various DCCs written for artists, that don't have good error checking or testing, and it's not completely clear to me how that situation's going to improve that much with something like Rust, if the same programmer time constraints are going to exist in writing them: i.e. I think it's very likely people will just unwrap() their way to getting things to compile instead of correctly handling errors, so it will be the same situation from the artists' perspective: technically it may be a panic rather than a segfault, but from the artists' perspective, it will likely be identical and take the DCC down.


I kinda hate that .unwrap() even exists; it leads to excessively shitty error messages with no good context.
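
To illustrate (a toy sketch; the file path is made up):

    fn main() {
        // With .unwrap(), the panic only echoes the raw error, e.g.:
        //   called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, ... }
        // With .expect() (or something like anyhow's .context()), you can at least say
        // what you were doing when it failed:
        let config = std::fs::read_to_string("/etc/myapp/config.toml")
            .expect("reading /etc/myapp/config.toml");
        println!("{} bytes of config", config.len());
    }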


Panics can be caught, and those unwraps come with enough data that filing a bug report upstream is helpful enough to fix the bug.


Sure, but that (we do it) happens currently with C/C++ and signal handler traps which gather the callstack and collate them: the issue isn't usually that we don't know the crashes aren't happening or having the call stacks - the issue is hacky code that was written for one purpose is now being used for other things it wasn't designed for (because it is useful to artists, despite its limitations), and there isn't the time to go back and write it properly for the new expanded use-case. That's my point: a new safer language isn't going to improve much in this area without more development time provided to write better code, given the time constraints are going to be the same as they are currently.


It does change the situation if the plugin throws an exception on errors instead of causing a segfault that brings down the parent application.


> which in 20 years it will, and I wasn't that hopeful 20 years ago

That's too optimistic. C++ would easily die in 20 years if it didn't already have 30+ years of still-active legacy that can't easily be converted or rewritten.

I've recently even had to start new projects in C++ because platforms I depend on demand it or because I have to interface with existing code and libraries that still only exist as C++. I'm not a fan of the language by any means, but I'll eat my shoe if it's "dead" in 20 years for anything except maybe greenfield development.


Dead - no, dying COBOL-style - quite possibly.


How much C and C++ do we have now? How much COBOL did we have at its peak? I'm not sure the analogy holds for that reason alone.

What if the better analogy is updating building codes in Manhattan?


"random index lookups into sparse arrays" is almost always an anti-pattern in HPC. Successful data structures are designed for streaming access and fine-grained parallelism, even when the problem domain seems irregular. Bounds checks sometimes matter (less in the logic than in inhibiting vectorization), but can sometimes be safely eliminated using existential lifetimes/branding or different control flow.

Rust is starting to make inroads in HPC/scientific computing. The libraries have a ways to go for widespread end-to-end adoption, but to give a concrete example, a current project has drastically beaten OpenBLAS across a suite of matrix factorizations. It was developed over a few months by one person with much less arch-specific or unsafe code. (The library is on GitHub/crates.io, but the author isn't ready for a public announcement so I won't link it yet.) Expect to see lots more Rust in HPC over the next few years.


If you're talking about the library I think you are, from what I could see there weren't any tests of the numerics (nor comparisons of the output from competing libraries), which is quite concerning?

I do think Rust will make inroads, but more because of the better WASM toolchain, so loading data into the browser is significantly easier than with JS (e.g. https://crates.io/crates/moc).


Security is absolutely a concern in these areas (except maybe offline graphics).

In my experiences with university HPC clusters, security is very important because you have a lot of young students with no Unix experience accessing the resources. We've had real compromises of individual research machines because of this.

This happens all the time at research universities, but it's not always public. In one public example from my uni, hackers from China compromised a research machine, which was used to attack IT infrastructure, which led to PII including SSNs being compromised.


That's security of the generic infrastructure the code's running under though is it not? It's not security of say CUDA kernel code being executed on a GPU?

I'm talking about the actual HPC algorithm code heavily prioritising performance (or in some cases memory efficiency), at the expense of pretty much everything else (other than correctness, obviously).


Ah, I think I understand what you mean. Let me rephrase:

Students writing code are not prioritizing security or performance. (I've seen FEM analysis written in Matlab, large neural-networks written in nearly-pure Python, etc.) The real 'performance' priority is human time, at the cost of everything else. To this extent, extra security "for free" from memory safety is nice.

There are exceptions, of course. The 2012 AlexNet breakthrough was a result of performance-engineering, for example. But generally speaking, publish-or-perish rewards neither optimizing performance nor optimizing security.

So, students will be installing Docker images (which have super user privileges), sudo running bash scripts, sudo installing pip or npm packages. I've seen students replace libraries (including CUDA) with modded binary blobs from researchers from other universities. All to save time in pursuit of ~~interesting~~ publishable results.

These are horrible things I've seen during my time in academia. We (should) do virtualization, jails, firewalls, etc. to insulate the rest of us from these horrible things. (I'd add "keep machines offline", but that's rare, and even rarer because of the pandemic.) This insulation is imperfect, and many of those imperfections are due to memory safety flaws.


If your high performance code running on a sensitive cluster is vulnerable, then it opens up the rest of the system to exploitation also. How is it a problem of the infrastructure around the code, and not the code itself?


I work in HPC, and while security isn't an issue for your typical simulation code, correctness certainly is. Spending a million CPU hours on a supercomputer computing junk because memory unsafety caused the simulation to corrupt itself, and then writing a paper publishing those results, isn't good.

Many times when I've helped some researcher make their code run on a cluster I have discovered that the code crashes at runtime if bounds checking is enabled. The usual response is that "this can't be a problem because we've (or someone else) published papers with results computed with this program". Sorry sunshine, this isn't how it works. Maybe the corruption is entirely benign, but how can you tell?


That's why bounds safety by default, with optional unsafe access, is a thing. Remove bounds-safe access only after measuring its performance impact. But one should start with the safe thing, as for most of even HPC it doesn't really matter (not everything is the hot loop).


I take the opposite view: C programmers, out of an abundance of caution, put in bounds checks for code that will never be called with out-of-bounds data. As such, Rust is eliminating code that is otherwise being manually written. If you don't write the bounds check in C, and Rust determines that the equivalent bounds check isn't needed, the code should be the same (down to the assembly level). However, if you write a bounds check in C, the optimizer might not eliminate it.


Can you point to any such bounds check in C that an optimizer cannot eliminate but it can eliminate the equivalent one in Rust?

I'm sure it's possible to construct such a thing, but I cannot imagine it ever being common enough to show up on any sort of head to head comparison.


I would guess all the aliasing stuff will get you. In C it's very difficult for the compiler to know whether two pointers are aliased, if we change X maybe Y changes too (because actually X and Y were the same). In Rust if we can write to it then it isn't aliased, and if we can't write to it then nobody can change it, thus changing X definitely can't change Y and the emitted machine code is sometimes simpler as a result, doing what you naively expected rather than what the C needs to do just in case you're crazy and there is an alias.

Now, in modern C you can say you don't have aliasing, but you're probably wrong and so there's a high risk when you do that you now get "impossible" bugs because you swore to the optimiser that if X changes, Y is unaffected, then created a situation where that wasn't true and now your program has no defined meaning, which is going to be tricky to debug. So, on the whole C programmers do not use this, indeed in places like the Linux kernel they even turn off the C standard's very minimal aliasing rules (which forbid aliasing objects of different types), they just don't trust themselves.
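
A toy sketch of that exclusivity guarantee (nothing to do with Android's code, just an illustration):

    // `total` is behind `&mut`, so the compiler knows `src` cannot alias it and can
    // keep the running sum in a register across the writes. The equivalent C needs
    // `restrict` (a promise the programmer must get right) to allow the same thing.
    fn accumulate(total: &mut i64, src: &[i64]) {
        for &x in src {
            *total += x;
        }
    }

    fn main() {
        let mut total = 0;
        accumulate(&mut total, &[1, 2, 3]);
        assert_eq!(total, 6);
    }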


But the compiler doesn't need to care in that case because it's not bounds checking those pointers in the first place in C. So that's not going to give you slow C code from bounds checking that the optimizer failed to eliminate.

Like yeah there's aliasing changes, but in "idiomatic" C/C++ how is that getting you bounds checking that's not being optimized away fairly consistently?


Wait, previously you were talking about bounds checks which can't be optimised out, now you seem to be saying in C you wouldn't bother writing any bounds checks, which is a quite different claim.


I think what they're saying is that C compilers don't care about aliasing here because it's not actually bounds checking the pointers, it's just a numeric comparison between two random arguments. It's much easier for an optimizing compiler to eliminate a duplicated boolean check between two numbers this way because passing numbers from one function to the next has no aliasing concerns.


This is a misunderstanding of what's useful about aliasing information. A lot of C code can't be autovectorized because it can't tell that accesses to two arrays don't alias, for example. Similarly it often can't reorder / eliminate redundant updates to arrays because it can't tell they're not aliased. These aren't related to bounds checking specifically but they are things that can improve performance in Rust over C (theoretically, anyway; in practice LLVM is still very much tuned for C so it doesn't take advantage of a lot of this stuff yet, but it likely will in the future).


there's almost no bounds checking in rust code before the optimizer even looks at it because we use iterators and not goofy manually indexed for loops that are begging you to make a typo that crashes your code :)
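
A contrived sketch of the two styles, assuming nothing beyond std:

    // Indexed loop: each `v[i]` carries a bounds check, though LLVM can usually
    // prove `i < v.len()` here and remove it.
    fn sum_indexed(v: &[u64]) -> u64 {
        let mut total = 0;
        for i in 0..v.len() {
            total += v[i];
        }
        total
    }

    // Iterator: there is no index at all, so there is no bounds check to eliminate.
    fn sum_iter(v: &[u64]) -> u64 {
        v.iter().sum()
    }

    fn main() {
        let v = vec![1, 2, 3];
        assert_eq!(sum_indexed(&v), sum_iter(&v));
    }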


Yeah but idiomatic modern C++ is also using iterators and even before that there's no bounds checking to eliminate in the first place since operator[] is unchecked so the optimizer can't be struggling to eliminate it since it's not there.

The question isn't "does Rust have bad bounds checking optimizations" but rather "what is this mythical heavily-bounds-checked C code that the compiler can't optimize away?"


No the claim is always that Rust "must" be slower than C/C++ because it has pervasive bounds checking for array indexing.

Then people insist on wanting to replace every x[i] in prod with x.get_unchecked(i) only to learn that, not only was that indexing not slowing the code down (the branch is perfectly predictable in a correct program!), but actually any difference is so in the noise that the random perturbation is worse (or that the asserts were actually adding extra facts for more profitable optimizations in llvm).

There is definitely specific hot loops with weird access patterns where it can be high impact but those are the exception, not the rule, as the Android team demonstrated.


Trivially, anything using the Iterator trait.

I don't know that I've ever actually manually indexed an array over years of using Rust.


In the HPC / offline graphics / simulation world, there are lots of things like sparse arrays / grids, where you index into compacted grids and iterators wouldn't be that practical (i.e. one single item, although for things like filtering with surrounding cells they can still be useful sometimes). Bounds checks do make a bit of a difference there (it's definitely measurable above the noise threshold), and due to the random nature of the data, the branch predictors don't help avoid the overhead.


Sure, I know there's paradigms where manual indexing is important. GP asked for an example where bounds-checking could be eliminated in Rust, so I gave the first one I thought of.


I don't think your answer is relevant to the question:

> Can you point to any such bounds check in C


It's not just about bounds checks. I am way more aggressive about borrowing fields of other types, particularly mutable borrows, in ways that I wouldn't attempt in C, to protect myself from future code breaking invariants I'd have to rely on. This means I can write the hyper-optimized version of an algorithm in any language, but I'm more likely to even attempt it in Rust.


Another bit I first saw observed by Armin Ronacher, and which is indeed quite common, is that Rust does not require you to be defensive (and pay the price) around protected resources: e.g. you can hand out references to Rc'd or Mutex'd types, and you know it's safe, if a bit constraining.

And so you save the overhead of the extra refcounting or the re-entrant locks (though that would be unsound anyway), and you can safely use anything which works off of references.
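
A tiny sketch of what I mean (made-up names, nothing beyond std):

    use std::rc::Rc;

    // Borrowing from an Rc hands out a plain &str; the borrow checker guarantees the
    // reference can't outlive `shared`, so no defensive Rc::clone (and no extra
    // refcount traffic) is needed just to pass it around.
    fn shout(name: &str) -> String {
        name.to_uppercase()
    }

    fn main() {
        let shared: Rc<String> = Rc::new("android".to_string());
        let loud = shout(&shared); // deref coercion: &Rc<String> -> &str
        assert_eq!(loud, "ANDROID");
        assert_eq!(Rc::strong_count(&shared), 1); // still a single owner
    }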


There were (or are?) some older Call of Duty games being sold on Steam that had unpatched RCE vulnerabilities in them. Simply joining a server run by a malicious host can result in the player's system being totally compromised.

These games would go on sale once a year or whatever and attract new players, and people would post warnings in the Steam forums and whatnot to try to stop people from being affected by the issue, but I am sure some people either didn't listen or didn't notice the warnings.

I haven't looked into the issue in a few years at this point, but it's very possible that the games are still unpatched and being listed in the store to this day.

Anything that connects to the internet needs to be strongly concerned about security!


Similarly, there was an RCE discovered in Dark Souls III about a year ago. And even though modders had a fix for it before the details of the exploit were even publicly revealed, it took the devs 8 months or so to finally fix it.


That's why managed languages in themselves are not enough. Security on desktop OSs is a joke, with perhaps macOS being a bit ahead of the rest.

But especially desktop linux.. every random bash script could encrypt your documents, or leak out your browser cache, do whatever it wants..


I wonder how much of the positive performance is just because we have not proved that some impossible case is really impossible and so we have if statements checking for a situation that mathematically cannot happen just out of caution. (note that in many cases we don't even have the theoretical tools to prove the code)

Though that makes non-memory safe code more reliable in the case of a bit flip. (this is not a serious advantage - only some bit flips can be prevented this way)


Glad to see robust work here. This strongly supports what should already be obvious, but sadly is not always understood; that memory safe languages are radically safer than memory unsafe languages. The impact is blatantly demonstrated here.


When Heartbleed was a topic of discussion, some pointed out that Rust wouldn't have 100% protected from that vulnerability. So it is good to see some proof that using a safer language does in fact pay off in terms of fewer defects. I just wish there were some info around cost associated with development effort. Did the Rust code take longer to develop? If initial development was longer, what if we include time saved from reduced effort for bug resolution?


An example from an experiment to benchmark Rust and Java I did recently, where I sent files from one app to another: perf was good enough without tuning, with tuning I could triple the speed and final total time on both versions was comparable. Memory was much greater for Java (even with graalvm). The Rust version didn't suffer from any memory safety issues or race conditions when sending multiple files, but I did have a vuln where you could specify a relative path that could escape and write anywhere in the receiver's filesystem. Rust didn't protect me from that, and those are the kind of vulnerabilities that we'll continue seeing regardless of language. But the threat surface I had to be scared about was much smaller than it would have been in other languages. And because the fallible APIs are obvious, I handled many edge cases that I might have forgotten about otherwise.


> Rust didn't protect me from that, and those are the kind of vulnerabilities that we'll continue seeing regardless of language.

It didn't on its own, but it is worth noting that with type-safe languages, you can protect yourself from this by encoding that invariant into the type system.

Using Rust as an example, take a &std::path::Path (or &camino::Utf8Path or whatever) in your public API; have custom InternalPathBuf and InternalPath types that perform validation to ensure they aren't using relative paths to "break out" during construction, and then pass those around in your internal API. Bingo bango, now there's no way (short of transmuting, an `unsafe` operation) to pass invalid paths to the functions that hit the filesystem without a compile error. No redundant runtime checks required, and no need for you as the developer to keep track of which codepaths have already validated a Path and which haven't.
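
Something like this minimal sketch (InternalPathBuf is the hypothetical name from above; the validation rule shown is just an illustration, a real one would need to handle more cases):

    use std::path::{Component, Path, PathBuf};

    // Hypothetical internal type: constructing one proves the path contains no
    // parent-directory components, so the code that touches the filesystem can
    // accept it without re-validating.
    struct InternalPathBuf(PathBuf);

    impl InternalPathBuf {
        fn new(p: &Path) -> Result<Self, &'static str> {
            if p.components().any(|c| matches!(c, Component::ParentDir)) {
                return Err("path escapes the destination directory");
            }
            Ok(InternalPathBuf(p.to_path_buf()))
        }
    }

    // Only this function writes to disk, and it can only be handed a path that
    // already passed validation.
    fn write_file(dest: &InternalPathBuf, contents: &[u8]) -> std::io::Result<()> {
        std::fs::write(&dest.0, contents)
    }

    fn main() -> std::io::Result<()> {
        assert!(InternalPathBuf::new(Path::new("../../etc/passwd")).is_err());
        let dest = InternalPathBuf::new(Path::new("photo.txt")).expect("valid path");
        write_file(&dest, b"hello")
    }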

I'm sure you already know this, and I would imagine that Java can do the same, but it's a big step above languages like Python where you can do whatever you want to anything you want.

EDIT: lol while I was typing this you made a post about the same thing below.


This technique is mentioned in the article. It is called the Typestate pattern:

http://cliffle.com/blog/rust-typestate/


Great minds and all that :)

I guess it is also an often used square peg that fits really nicely in the square Rust typesystem hole (no, not talking about those typed holes, haskellers).


>Rust didn't protect me from that, and those are the kind of vulnerabilities that we'll continue seeing regardless of language

I'm still thinking about how we could integrate something like that into a language or the language's package manager. I'm unsure if it's possible.


The only thing I can think of is the use of newtypes around PathBuf that enforce things like expansion and make the check for you when restricting to a specific directory. Now that I'm writing this out, this feels like it could be a very useful small crate or addition to Camino. Thank you for making me think further about this. Of course, the impl would have an associated runtime cost for the check and a more involved API surface because it's asking the developer for more information. But once you do that you can have a TryInto<PathBuf> impl to pass it to any standard method.


It's possible and easy (have types for path coming from untrusted source), but it's a matter of a standard library rather than a language.


How about in the OS? This sounds like exactly the type of thing SELinux is meant to handle


Yeah, I think the thing is... path traversal is pretty trivial to solve. If you have a single-tenancy app and you just don't want the service accessing shit it shouldn't, just throw it in Docker. If you have a multi-tenancy app, just put every user behind a UUID.


In C++\Java this would be solved by a static analysis tool. For example Fortify covers this error.



Capabilities are a good way to mitigate this problem, yes. At minimum cap_std::fs would prevent "../" attacks.

"../" attacks are also just way less of an issue when you shove your programs into minimal containers, which at this point is more or less standard practice.


Rust and C++ are about equally difficult (or easy) to program in, the languages are much more alike than they are different.


Yeah, Rust is basically modern C++ idioms made mandatory by the compiler (move, copy semantics).


I think even heartbleed would be mitigated with (safe) Rust. IIRC heartbleed was caused by a missing bounds check, which allowed attackers to read past the message buffer and leak secrets from nearby memory. Safe Rust would just panic (crash) if you tried to slice past the end of the buffer.
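
A toy sketch of the mechanism (not the actual OpenSSL logic, obviously):

    // The "heartbeat" echoes back `claimed_len` bytes of the payload. In C a missing
    // check let this read past the buffer; in safe Rust the slice below panics
    // instead of returning adjacent memory.
    fn heartbeat_response(payload: &[u8], claimed_len: usize) -> &[u8] {
        &payload[..claimed_len]
    }

    fn main() {
        let payload = b"ping";
        assert_eq!(heartbeat_response(payload, 4), b"ping");
        // Panics ("range end index 16384 out of range for slice of length 4")
        // rather than leaking whatever happens to sit next to `payload` in memory.
        let _ = heartbeat_response(payload, 16_384);
    }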


I think this indeed echoes similar experiences and studies at other large companies (e.g. Apple, Microsoft, Meta, etc.) regarding the characteristics of their recent investments into rust code vs. C/C++ code. I don't think it is surprising to anyone at this point. But it is nice to see it re-confirmed.


Rust (despite the common understanding) is not a memory-safe language in its entirety. It is a language designed to have a strict division of safe/unsafe which makes it easier for developers to compartmentalize code to achieve memory-safety.


Is there any practical programming language that is memory safe in its "entirety"? Python, for example, certainly is not. It has unsafe escape hatches (via ffi, at the very least). Yet, everyone I know of says and thinks of Python as a memory safe language. I do as well.

> which makes it easier for developers to compartmentalize code to achieve memory-safety

The problem here is that this is incomplete. Many many many languages have achieved this before Rust. Where Rust is (somewhat although not entirely) unique is bringing this compartmentalization into a context that (mostly) lacks a runtime and garbage collection.

I have no problems calling Rust a "memory safe language" precisely because I have no problems calling Java or Python "memory safe languages." What matters isn't whether the language is "entirely" memory safe. What matters is what its default is. C and C++ are by default unsafe everywhere. Rust, Java, Python and many others are all safe by default everywhere. This notion is, IMO, synonymous with the more pithy "memory safe language."


> Is there any practical programming language that is memory safe in its "entirety"?

This isn't possible. Eventually you are sitting at a block of memory and need to write the allocator. Maybe (like python) your allocator is written in C and you hide it, but there is always something that isn't memory safe sitting under your language.

You could write a language for an actual Turing machine which since it has infinite memory is by definition memory safe. However as soon as you need to run on real hardware you have to work with something unsafe.

You can of course prove a memory allocator is correct, but it would still have to use unsafe in Rust. I suppose you could then implement this allocator in hardware, and make Rust use that - but since this doesn't seem like it will happen, I'm going with: all languages have unsafe somewhere at the bottom.
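
To illustrate: even a do-nothing pass-through to the system allocator has to be written as unsafe code (a minimal sketch):

    use std::alloc::{GlobalAlloc, Layout, System};

    // A pass-through allocator. The trait itself is unsafe to implement because
    // nothing below this line can be checked by the borrow checker.
    struct PassThrough;

    unsafe impl GlobalAlloc for PassThrough {
        unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
            System.alloc(layout)
        }
        unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
            System.dealloc(ptr, layout)
        }
    }

    #[global_allocator]
    static ALLOCATOR: PassThrough = PassThrough;

    fn main() {
        // Safe code on top of the unsafe foundation, as always.
        let v = vec![1, 2, 3];
        assert_eq!(v.len(), 3);
    }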


Yes, exactly. That's why I asked the question: to drive out the point that the ontology the GP was using was probably not terribly useful.

Although I did use the weasel word "practical" to narrow the field. If you don't limit yourself to general purpose languages, then I'm sure you can find one that is "entirely" safe.


That depends on your definition of "practical" and "entirety".

The article was about languages being used to implement Android. Clearly, no, you can't have an entirely memory safe language that can be used to implement Android, for the reason you said. But there's a wide gap between "practical for doing useful work of any kind" and "practical for implementing Android".

Then, "entirely". What's "entirely"? Entirely until you get to library calls? Entirely until you get to OS calls? Entirely including the OS? If you include the OS then again, you are right for the reason you said. But if you exclude the OS, I'm not so certain.


Sure, but the language doesn’t have to expose it to you. There’s a bunch of other processes running on your system too aside from your program, but the OS prevents you from scribbling all over their address space.


Rust is a systems programming language. If I have a new idea for an allocator, they want me to write the experimental version in Rust. If you never write an allocator or other such tricks you don't need unsafe - you could use one of the other languages. Java doesn't have unsafe, but you cannot write a custom allocator in Java (well you can, but it will be a manual process to use it - you have to drop back to C if you want Java to use your custom allocator by default).


> It has unsafe escape hatches (via ffi, at the very least).

Yep, ctypes is part of the stdlib and lets you corrupt the VM on the fly. Fun stuff like changing the value of cached integers and everything.

But ctypes being a terrifying pain in the ass, people tread very carefully around it. Cffi's a lot better, though it requires an external package. At the end of the day I think I'd be more inclined to bind through pyo3 or cython than write C in Python (which is what ctypes has you do, without even what little type system C has, to say nothing of -Wall -Weverything).


> But ctypes being a terrifying pain in the ass, people tread very carefully around it.

I'm not sure how much people treading carefully actually translates into safety in practice.

CPython in particular has ad-hoc refcounting semantics where references can either be borrowed or stolen and you have to carefully verify both the documentation and implementation of functions you call because it's the wild west and nothing can be trusted: https://docs.python.org/3.9/c-api/intro.html#reference-count...

This ad-hoc borrowed vs stolen references convention bleeds into cffi as well. If you annotate an FFI function as returning `py_object`, cffi assumes that the reference is stolen and thus won't increment the ref count. However, if that same function instead returns a `struct` containing a `py_object`, cffi assumes the reference is borrowed and will increment the ref count instead.

So a harmless looking refactoring that changes a directly returned `py_object` into a composite `struct` containing a `py_object` is now a memory leak.

Memory leaks aren't so bad (even Rust treats them as safe after the leakpocalypse [1] [2]). It's when you go the other way and treat what should have been a borrowed reference as stolen that real bad things happen.

Here's a quick demo that deallocates the `None` singleton:

    Python 3.9.13 (main, May 17 2022, 14:19:07)
    [GCC 11.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.getrefcount(None)
    4584
    >>> import ctypes
    >>> ctypes.pythonapi.Py_DecRef.argtypes = [ctypes.py_object]
    >>> for i in range(5000):
    ...     ctypes.pythonapi.Py_DecRef(None)
    ...
    0
    0
    0
    0
    0
    [snip]
    Fatal Python error: none_dealloc: deallocating None
    Python runtime state: initialized

    Current thread 0x00007f28b22b7740 (most recent call first):
      File "<stdin>", line 2 in <module>
    fish: Job 1, 'python3' terminated by signal SIGABRT (Abort)
[1]: https://rust-lang.github.io/rfcs/1066-safe-mem-forget.html

[2]: https://cglab.ca/~abeinges/blah/everyone-poops/


> Here's a quick demo that deallocates the `None` singleton:

As I said, you can trivially corrupt the VM through ctypes. However I don't think I've ever seen anyone wilfully interact with the VM for reasons other than shit and giggles.

The few uses of ctypes I've seen were actual FFI (interacting with native libraries), and IME it's rare enough and alien enough that people tread quite carefully around that. I've actually seen a lot less care with the native library on the other side of the FFI call than with the FFI call itself (I had to point out issues with that just this morning during a code review; if anything, the ctypes call was over-protected, while the update to the .so's source had multiple major issues).


Well, there is still an important difference between Java and Rust: are you driving with a guardrail in an open field, or are you driving next to a cliff edge.

The JVM has well-defined bad execution as well; e.g. data racing is well-defined. Safe Rust does prevent data races statically, but if they do happen due to a bad unsafe block, you are entirely on your own. While memory-safety failures can abruptly stop both kinds of process, FFI is very rare in Java - it is an almost completely pure platform, being Java all the way down - so in my experience the former is safer in this respect.


I don't have much Java experience, so I'll have to take your word for it. But it's not completely obvious to me that you're correct. We've moved from an absolutist idea of memory safety to trying to build an implicit ontology of tiers of memory safety based on usage. Now you're talking about going and doing surveys of code and trying to measure the relative frequency of certain things and then using that to drive a tiered hierarchy of memory safety in programming languages.

Sounds hard to do, and you also haven't accounted for what problems are being solved in each language. I can pretty much decide to never ever use `unsafe` again, but I'll be leaving perf on the table. If I were writing Java, I would probably be fine with that. But I'm working on interesting problems that want the most perf possible, and so I do very occasionally justify `unsafe` when writing Rust.


As others mentioned, there is no absolute safety - not memory safety, not anything. The hardware can have bugs, the verification toolkit can have them, or the properties to be verified could have been incorrectly specified to begin with.

I’m just saying that corrupting the heap is much easier with Rust than with Java, and there is no coming back from heap corruption on a process basis, while most exceptional cases are recoverable by the JVM (hence the cliff analogy).

And Java can have surprisingly good performance, especially in multi-threaded code that has a non-predictable allocation pattern (where ARC is just not too good) — if you want significant performance improvements you really have to go down the inline asm road, which you can do from anywhere.


> there is no absolute safety

Now we've come full circle. I recommend you go back and read my initial comment in this thread and the comment I was responding to. You've veered far off course from there into waters in which we likely have very little disagreement of any consequence.

> And Java can have surprisingly good performance

Show me a regex engine written in Java that can compete with my own, RE2, PCRE2 or one of a number of production grade regex engines written in C, C++ or Rust. I'm not aware of any.

That Java can "have surprising good performance" is not a statement I'd ever disagree with in general terms. That has absolutely zero to do with anything I've written in this thread (or elsewhere, ever).

With all due respect, I think you've lost the script here.


You may be right, I’m not really disagreeing, I can absolutely stand behind this sentence of yours:

> Where Rust is (somewhat although not entirely) unique is bringing this compartmentalization into a context that (mostly) lacks a runtime and garbage collection

I just think that the model of “breaking down” is different between the two platforms and that might matter for some use cases.


Right, that's why I said:

> But it's not completely obvious to me that you're correct.

:-)

Which is to say, I don't know you're wrong. But it's a pretty subtle thing that requires a careful survey. And likely discussion of lots of concrete examples. It's far more nuanced than the thing I was responding to originally (not to you), which was this wrong-headed notion that Rust isn't memory safe "entirely." Because once you go down that path, the entire notion of "memory safety" starts to unravel. That is, of course Rust isn't "entirely" memory safe. Pretty much nothing practical actually is in the first place. I tried to force this issue by asking for counter-examples. The only good one I got was Javascript in browser, but that basically falls under the category of "programs in a strictly controlled sandbox" rather than "programming language" IMO.

I think this comment of mine might also be helpful, which reflects a bit on terms like "memory safety" and why they are a tricky but very common type of phenomenon: https://news.ycombinator.com/item?id=33825307


> It has unsafe escape hatches (via ffi, at the very least).

Playing devil's advocate, I can think of at least one language which has no escape hatches: Javascript running within a web page.


It is very easy to create a safe language. Hell, most brainfuck interpreters are likely completely safe: you allocate a large enough array and just iterate over the basic instructions, which only ever modify that array and print a character. A Turing machine in itself can do no harm.

The hard part comes at allowing it to do something useful, but only the parts I believe should be able to. E.g. plugging in file system access to our brainfuck interpreter will make it quite unsafe. Node for example does have C FFI.


Yes, but that's not just a language. It's a language within a certain context. But it is a worthy mention.


I think a distinction can be made in that you never really need to use unsafe operations in python or Java. In rust, you need unsafe. Just about every data structure in the stdlib uses unsafe.

I think it's fair to call Rust a memory safe language. But I don't think it's on the same tier as a fully managed language like python.


I suppose reasonable people can disagree, but I don't think it's anywhere near as clear cut as you seem to be implying. You talk about data structures in std using unsafe, but you don't mention the heaps and piles of C code used to implement CPython's standard library.

It's not like you need `unsafe` in Rust to build every data structure. I build oodles of data structures on top of the fundamental primitives provided by std without using any `unsafe` explicitly whatsoever.

And it is not at all uncommon to write application code in Rust that doesn't utter `unsafe` at all. Even ripgrep has almost none of it. At the "application" level it has exactly two uses: one related to PCRE2 shenanigans and one related to the use of file backed memory maps. Both of those things are optional.

Then there's another whole perspective here, which is that if you're using Rust in the first place, there's a non-trivial chance you're working on something "low level" that might require `unsafe`. Whereas with Python you probably aren't doing "low level" work and just don't care much about perf within certain contexts. That has less (albeit not "nothing") to do with the design of the languages and more to do with the problems you're trying to solve.

To be clear, I am not saying you're definitely wrong. But as someone who has written many tens of thousands of lines of both Rust and Python, I would put them on the same or very very close level in terms of memory safety personally. Certainly within the same tier.


You make a good point that much of Python's stdlib is implemented in C. But you could implement Python's list in pure Python, safely. You can't implement something like that in Rust without unsafe.


You can implement lists in Rust safely; with enums, a list is four lines of safe code. You can even implement a doubly-linked list safely. You just can't do either of these things by wielding pointers willy-nilly. If you're willing to accept a performance tradeoff by implementing a list in pure, bootstrapped, FFI-free Python, then you can do the same in Rust.
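
For the avoidance of doubt, here is roughly what that looks like (a sketch, not a production list):

    // A classic cons list in entirely safe Rust: an enum plus a Box,
    // no raw pointers anywhere.
    enum List<T> {
        Cons(T, Box<List<T>>),
        Nil,
    }

    fn main() {
        // 1 -> 2 -> end
        let list = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
        if let List::Cons(head, _) = &list {
            println!("head = {head}");
        }
    }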


You certainly can! And that's a good example, because it exposes just how important context is to this discussion. Perf matters in certain contexts. If you implemented a list in pure Python, do you think its users would find the overall perf of Python to be acceptable?


How would you implement a python list in python? I mean, what would you consider "acceptable" primitives to do so?


> I think a distinction can be made in that you never really need to use unsafe operations in python or Java.

You can't write any code at all in Python or Java without relying on unsafe operations. Both of them have their runtimes written in C/C++.

So based off of this unusual line of reasoning, Rust is strictly more memory safe than either of those as it's at least possible to have a Rust program without any unsafe code. That program will be of questionable value, sure, but it can at least exist at all whereas it can't for Python or Java.


I’m not sure going down this road is meaningful, because as soon as we get to machine code generators you get a “reset” on safety: no matter the language you implement a compiler in, it can have logic bugs which will result in many sorts of serious bugs, including memory ones. This is true of both the Rust compiler and Java’s JIT compiler.

Interpreters and the rest of the VM are a different beast: while they also have to be bootstrapped from some unsafe language one way or another, they are usually written in a much more expert, security- and correctness-oriented way than your average program. So while they can and do have bugs, they are exceptionally well tested and, well, I wouldn’t expect the JVM to die under my program the same way you don’t really expect the kernel to freeze either. This is also true of Rust's stdlib, I assume, but is it true of third-party libs?


>You can't write any code at all in Python or Java without relying on unsafe operations. Both of them have their runtimes written in C/C++.

This isn't a meaningful distinction, in the end. Hardware is unsafe too. Real production CPUs have bugs in them which lead to cache lines becoming corrupted, address translations being wrong, branches going to the wrong place, etc. under extremely weird conditions. But, in the end, we don't really do much about it because we trust that it probably won't impact us since we assume the people who built the SoCs or those who wrote the standard library did a good enough job.


> You can't write any code at all in Python or Java without relying on unsafe operations. Both of them have their runtimes written in C/C++.

Technically, pypy is a Python runtime written in Python.


There are at least three Java runtimes written in Java: Jikes RVM, Maxine VM, and GraalVM.


At least two of which use an unsafe dialect of Java for significant parts of the runtime, which I'm pretty sure you know well (maybe not Graal, but if not it's because it's bootstrapping on top of existing unsafe code).


Easily localizable via grep, and not full of UB and memory corruption issues, which is what the figure of 70% of security issues being due to memory corruption in C, C++ and Objective-C relates to.

At some level of the stack some Assembly or compiler intrinsics are needed, just not at every line of code.


Jikes is the one I'm most familiar with and people working on its runtime absolutely suffered from UB and memory corruption issues... obviously not throughout the whole standard library but that's not the case for other JVMs either. In fact the Jikes people found it nicer to work in Rust than in Java on components like the garbage collector, because it was a better fit for working safely with this kind of code and they didn't have to write in a restricted subset of the language to avoid triggering the GC.


Since when does Jikes use Rust?

Also, bootstrapping a language always requires using a subset of it for the low-level layers; apparently it's not seen as an issue that many parts of C and C++ cannot be implemented with only what the ISO standard provides.


> You can't write any code at all in Python or Java without relying on unsafe operations.

There are Python interpreters written in other languages: there is one in Rust, and there are Jython and IronPython.


By that logic, you can't write any safe rust at all because it relies on a compiler written in C++.

We are discussing the languages themselves, not any particular implementation.


Only the codegen and many of the optimization parts of the compiler are in C++. The rest of it is in Rust.


One could implement many data structures without unsafe, but with less efficiency. E.g. using an arena allocator


I would like to dispute the "with less efficiency" simplification, because depending on the size and usage patterns of your code, a doubly linked list or similar graph data structure backed by an arena will be faster than the way those data structures appear in books.
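
A sketch of what such an arena-backed list can look like in safe Rust (names are illustrative): nodes live in a Vec and "pointers" are indices into it, so no unsafe is needed and the nodes stay contiguous in memory.

    // A doubly linked list backed by a Vec arena, written without any `unsafe`.
    struct Node<T> {
        value: T,
        prev: Option<usize>, // index of the previous node in `nodes`, if any
        next: Option<usize>, // index of the next node in `nodes`, if any
    }

    struct ArenaList<T> {
        nodes: Vec<Node<T>>,
        head: Option<usize>,
        tail: Option<usize>,
    }

    impl<T> ArenaList<T> {
        fn new() -> Self {
            ArenaList { nodes: Vec::new(), head: None, tail: None }
        }

        // Append a value at the back; returns the new node's index.
        fn push_back(&mut self, value: T) -> usize {
            let idx = self.nodes.len();
            self.nodes.push(Node { value, prev: self.tail, next: None });
            if let Some(old_tail) = self.tail {
                self.nodes[old_tail].next = Some(idx);
            } else {
                self.head = Some(idx);
            }
            self.tail = Some(idx);
            idx
        }

        fn get(&self, idx: usize) -> Option<&T> {
            self.nodes.get(idx).map(|n| &n.value)
        }
    }

    fn main() {
        let mut list = ArenaList::new();
        let a = list.push_back("a");
        let b = list.push_back("b");
        assert_eq!(list.get(a), Some(&"a"));
        assert_eq!(list.nodes[b].prev, Some(a));
    }

Whether this beats the textbook pointer-chasing version depends on the workload, but the "indices into an arena" pattern is a common way to get both safety and cache locality.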


Sure, but that is kind of what I mean. Safety in Rust is something you actively have to think about and work around (at least some of the time). It doesn't just come for free like in Python.


Point is, “for free like in Python” really means “the C implementation hides that for you”.


Standard ML is entirely memory safe (some but not all implementations offer nonstandard escape hatches). I've heard someone here claim that the strictly standard version is a practical programming language, although I'm not sure I believe them.


Surely this is true, but I still have the feeling that libraries in Rust tend to have more unsafe code than Java, Python, C# or others, maybe even more unsafe code than needed. Perhaps this is related to the problem domain.


You would definitely need to control for domain.

A Rust library for some sort of mathematical modelling might well need no unsafe at all, while a Java library for controlling some hardware might soon turn into JNI talking to some C++ code and oops you're unsafe.

In C# you need to reach for unsafe to do some of the stuff Rust can just do safely anyway. Did you know a C# struct with an array of 8 ints in it doesn't actually have the eight ints baked inside the struct? It was easier in the CLR not to do that, so they didn't. Which means C# structs which look like a compact single object that surely lives in a single cache line don't actually work that way in safe C#. You need unsafe.
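
For contrast, a hedged sketch of the Rust side of that comparison: the eight ints really are stored inline in the struct, with no indirection and no unsafe.

    #[derive(Copy, Clone)]
    struct Block {
        values: [i32; 8], // stored inline, not behind a reference
    }

    fn main() {
        // 8 * 4 bytes of payload, laid out contiguously inside the struct.
        assert_eq!(std::mem::size_of::<Block>(), 32);

        // An array of Blocks is one flat allocation: 4 * 32 bytes.
        let blocks = [Block { values: [0; 8] }; 4];
        assert_eq!(std::mem::size_of_val(&blocks), 128);
    }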


It does if you learn to use C# properly,

https://learn.microsoft.com/en-us/dotnet/api/system.runtime....

In actual native code produced by RyuJit, you don't need to worry about cache lines for single instances, because the struct might not even exist at all, the Jit having mapped fields into CPU registers instead.

When it matters, like the struct being part of an array, use StructLayout.


That link seems like it's about alignment rather than about arrays inside structures?


No, it is about alignment and packing, you use StructLayout attribute alongside LayoutKind and FieldOffsetAttribute.

https://learn.microsoft.com/en-us/dotnet/api/system.runtime....

Your main issue was how structures arrange their fields.

Also regarding arrays and structs, as of C# 7 you can use fixed to declare static arrays inside structs, however these structs need to be marked as unsafe.


> as of C# 7 you can use fixed to declare static arrays inside structs, however these structs need to be marked as unsafe.

That is exactly what I was talking about.


Ah ok, somehow misunderstood that.

However, there are actually good reasons for it to be unsafe, although it is debatable whether that alone should require it.

One is the interaction with the GC, in case it moves the data while there are references to its elements; another is stack size.

One way to get around it is to use AoS instead of SoA, which is anyway the best option if performance is the ultimate goal.


> It was easier in the CLR not to do that

This also has advantages, in that you don't need to allocate the struct in one contiguous memory block. Edge case of course, but there are domains where this is relevant.

There was an allocation bug once, because the memory used by the unsafe code needs to be allocated contiguously, but memory checks that only returned available memory failed to account for fragmentation.


> You need unsafe.

You need it for that feature. It is questionable whether you really want to mandate a special memory layout (because you can’t really do that even in Rust; you don’t have explicit control of struct alignment, padding, or order(!)).


Rust absolutely gives you control over alignment, padding, and ordering. It’s just not the default. Ask for those things and you shall be given them.
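
A small sketch of what "asking" looks like, via the #[repr] attributes (the struct names here are illustrative):

    // The default repr(Rust) is free to reorder fields; these attributes pin
    // the layout down.
    #[repr(C)]            // C layout: fields in declaration order, C padding rules
    struct Frame {
        tag: u8,
        value: u64,
    }

    #[repr(C, packed)]    // no padding at all (reads of misaligned fields need care)
    struct Packed {
        tag: u8,
        value: u64,
    }

    #[repr(C, align(64))] // force the whole struct onto a cache-line boundary
    struct Aligned {
        counter: u64,
    }

    fn main() {
        assert_eq!(std::mem::size_of::<Frame>(), 16);  // 1 + 7 padding + 8
        assert_eq!(std::mem::size_of::<Packed>(), 9);  // no padding
        assert_eq!(std::mem::align_of::<Aligned>(), 64);
    }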


Could you point me to some resources on that? I only know about #[repr] options, but that isn’t absolute control (e.g. for having structs usable from rust and internal asm)


What is “internal assembly”? I’m not familiar with that term.

Is there anything else that the various repr options don’t give you? My team at work does OS dev in Rust, and haven’t ever run into cases where Rust can’t do what we need it to do in these cases.


* inline assembly, just my brain stopped working for a sec :D

Well, my specific case is writing a fast interpreter in Rust, where I would like to use elements like a stackframe from both inline asm and proper Rust code. In my first iteration I chose a dynamically sized u64 array, wrapped in a safe API, because I couldn’t be more specific. But even with known size elements the best I can do - to my knowledge - is Layout? Or just a raw pointer and a wrapper with helper functions, as otherwise I can’t modify the object in question from both places.


Ah yeah no worries :)

It’s sort of tough because I am only familiar in passing with the patterns in that type of code, but Layout is an allocator API, so I’m not 100% sure why it would be used here. I’d guess that if I was doing something like this, I’d be casting it to and from a struct that’s defined correctly. This is one area where stuff is a little simpler than C, thanks to the lack of TBAA, though many projects do turn that off.


Rust code frequently is used in a systems programming context, where it interoperates with unsafe code or needs to occasionally overrule the compiler to satisfy performance requirements.


Maybe. I don't know. You'd have to collect some data.

> Perhaps this is related to the problem domain.

Yes, I included that possibility in my comment here: https://news.ycombinator.com/item?id=33821787


> Is there any practical programming language that is memory safe in its "entirety"?

Whatever can be compiled to BPF meets this requirement. The price though is that it wouldn't be very useful.


Right, that's why I used the word "practical."


JavaScript? It’s not typical to provide it with any access to unsafe APIs.


Someone already mentioned that. That only works if you restrict yourself to JavaScript in the browser. There's a huge ecosystem for using JavaScript outside of the browser.


It’s not very useful to talk about the memory safety of languages as a whole without looking at specific implementations. JavaScript in a browser is memory safe. JavaScript with access to /proc/self/mem is no longer memory safe. C on most hardware is not memory safe. C running on the abstract machine itself can be.


This looks like a comment in response to https://news.ycombinator.com/item?id=33820918 and not to me.

The high level idea of my original rebuke was this idea that Rust was somehow lesser because it isn't "entirely" memory safe, and that its purpose was to divide safe from unsafe. But that really misses some very big points, because the programming language implementations used to build programs virtually everywhere are similarly not "entirely" memory safe, and many many many languages before Rust divided safe from unsafe.

Notice how I modified my rebuke to include your caveat. Does my point change? Does the strength of my rebuttal change? Does anything materially change at all, other than using yet more word vomit to account for the caveat? No, I don't think there's anything materially different other than more words.

I tried to sidestep all of this by using the weasel word "practical." So next time I'll just say, "any practical non-sandboxed programming language." You might still chide me for confusing "programming language" with "implementation of programming language," but I've never much cared for that semantic because the ambiguity is almost always obviously resolvable from the context.

> It’s not very useful to talk about the memory safety of languages as a whole without looking at specific implementations.

Not sure I would agree with this, but it probably depends on what you mean. We can meaningfully discuss the memory safety properties of the programming languages (not just the implementations) of Rust, C and C++. I think you have to still acknowledge the practical realities of any particular implementation that others will use to build real programs, but I contend you need not do so more than what the language design does on its own already. Because languages aren't designed in a vacuum. Even if you can build an abstract machine, for example, C was not designed to be an abstract machine. It was designed to get stuff done in the real world, and the real world influenced that design. Same for Rust.

Things like CHERI will potentially change this conversation quite a bit. I was even thinking about it when I wrote my original comment in this thread. But I think it is, at present, covered by the weasel word "practical." It isn't practical to use CHERI yet, as far as I know.


I should probably preface this comment by mentioning that I don't think there is anything new in it for either of us. Nor do I think we actually disagree on any of the facts. My earlier comment, and this one, was really just a response predicated on what I think the colloquial meaning of "memory safety" is, and to whether a practical language can be "truly memory safe"…which of course depends on what you see a programming language as being.

Memory safety is, as you have already mentioned, not black and white: I wouldn't even put it on an axis, because that suggests the scale is one-dimensional, and I don't even think it is practical to discuss it in that context. I prefer to categorize languages (for a definition of "language") in a couple of rough groups where most of them hang out.

In the first group is C and C++ as you're typically used to it, where pretty much every operation can do something unsafe and there's really no safe subset of the language, much less safety by default.

The second group is the "safe by default" languages like Rust or Python or Java, where you can write functional programs in the entirely safe subset (which is usually the default). This is where things get more complicated, though, because what the unsafe bits look like differs. Some give you language-level constructs to do unsafe things, such as Rust (with unsafe) and Java (with sun.misc.Unsafe or whatever). I think CPython technically also falls here because of some weird implementation choices where you can corrupt memory, but it's really more of being in the other category where you can do unsafe things via FFI and external interfaces. That's kind of where most Lua implementations live, or nodejs stuff.

Then you have the things which (usually intentionally) do not give you any of these things. That's JavaScript or WebAssembly in a browser. The final stop in this line is where you start placing significant limits to what the language itself can do, such as eBPF running in the kernel, or domain-specific parsers like WUFFS.

I've been pretty sloppy with what I call a "language" here, because you can always take a programming language and slap memory safety on it: though not trivial, you can sandbox it, pick some subsets of it, put in hardware, etc. (FWIW CHERI doesn't actually make C/C++ completely memory safe, it just helps.) And going the other way is pretty easy, you just add features to let programs mess with the execution environment.

I get that the comment that you're replying to is trying to well acktually you and I agree with the rest of your response, but the takeaway I have here is "you [the commenter you were responding to originally] are coming in with a definition of memory safety, yes in this context Rust does have these escape hatches and this is what they do, but in vernacular it is safe because this is how we typically evaluate languages for this sort of thing". Which, again, is like 90% of what you wrote already, I just think that it is probably worth bringing up that there is a pretty common environment for a popular language that actually takes things a step further than this, with whatever tradeoffs that entails. Not really a disagreement, just a "hey I think this is worth mentioning".


Then even Java is not memory safe according to the implicit standard that you allude to here since one can use the `Unsafe` class.


Not for much longer: access has to be very explicitly specified at start time, so that (deliberate) hole is getting smaller and smaller. But of course native functions are a thing (though very infrequent).


No language in use meets your definition of memory safe.


Perhaps the problem is with the term "memory safe".

No language can prevent a person from allocating a writable buffer, then reusing it without cleaning it. Do that on a server, and you have step 1 to a security vulnerability.

Or consider what happens if requests to allocate memory come in faster than the garbage can be collected.

Or a data container holding many/large references that will never be used. The difference between that and a lost pointer in C is moot in a practical sense.

All of these _can_ be prevented. But it's programmer care, rather than the language, that prevents them. Hence, the term "memory safe" is inaccurate. "Memory safer" would be more accurate, but far less catchy.
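
A minimal sketch of that kind of reuse bug, in Rust for illustration; note that every access is in bounds and well defined, which is why no memory safety guarantee catches it:

    fn main() {
        let mut buf = [0u8; 16];

        // Request 1 fills the whole buffer with a secret.
        buf.copy_from_slice(b"hunter2 password");

        // Request 2 writes a shorter payload, but the response path
        // mistakenly sends the full buffer.
        buf[..5].copy_from_slice(b"hello");
        println!("{}", String::from_utf8_lossy(&buf)); // "hellor2 password": the old tail leaks
    }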


Yes, but this problem exists everywhere all the time in virtually any context, even outside of technology. It's a very general problem that plagues communication. My perspective on the matter is the following:

1. We love to simplify matters down to black & white thinking with absolutist statements.

2. Attention spans are short (and probably getting shorter), so we try very hard to be pithy.

3. General seeming statements are actually narrower than they appear.

4. When taking a statement at its literal absolutist meaning leads you to an absurd conclusion, you're "supposed" to use your own judgment to interpret it imprecisely rather than ridiculously.

"memory safety" fits these criteria pretty well, especially the third point. Clearly, you really can't have a programming language be practical/general-purpose while simultaneously being completely and totally "memory safe." It's just ridiculous given our current predominant operating systems and architectures. The pithiness of "memory safety" relies on you, dear reader, knowing that and interpreting "memory safety" as something more reasonable than that.

> No language can prevent a person from allocating a writable buffer, then reusing it without cleaning it. Do that on a server, and you have step 1 to a security vulnerability.

This is a good example of (4), where you interpret something generally, but it's actually much narrower. When folks say "memory safety," they are not referring to the problem you speak of here. The problem you speak of might be a vulnerability, but it is not, in and of itself, something that would be recognized as memory safety. A memory safety bug could lead to the circumstances you describe, but it is not necessary. (Some people like to claim that other people think memory safety is the only kind of safety that matters, but few people with any credibility actually espouse that view as far as I'm aware. But it's important to call out: if you fixed every single memory safety issue ever, you would not fix every single security or vulnerability issue.)

The important life lesson here is that jargon abounds, and a good skill to pick up is knowing when to recognize it. If we go around interpreting everything very literally, it's going to be a bad time.

We could also stubbornly demand that everyone use crystal clear, unambiguous, precise and accurate terms all of the time everywhere so that nobody ever gets confused about anything ever again. But of course, I'm quite certain that is simply not possible.


Ada seems to fit.


Ada has no "unsafe"? Ada has no ffi? Ada has no escape hatches or unchecked APIs whatsoever? Does it have any pragmas that can disable safe checking? Because if it does, it's not "entirely" memory safe.


Ada has "unchecked" operations:

Unchecked Access (unsafe pointers): http://www.ada-auth.org/standards/22rm/html/RM-13-10.html#I5...

Unchecked Deallocations ("free"): http://www.ada-auth.org/standards/22rm/html/RM-13-11-2.html#...

Unchecked type conversions (unsafe casting): http://www.ada-auth.org/standards/22rm/html/RM-13-9.html#I57...

Ada has FFI:

C / C++: http://www.ada-auth.org/standards/22rm/html/RM-B-3.html

COBOL: http://www.ada-auth.org/standards/22rm/html/RM-B-4.html

Fortran: http://www.ada-auth.org/standards/22rm/html/RM-B-5.html

Ada has pragmas to both enable more security measures or relax security measures:

http://www.ada-auth.org/standards/22rm/html/RM-L.html

Just like in unsafe Rust, sometimes in Ada you need to turn off some security features or tell the compiler "I know what I am doing for this part" when interfacing with some hardware or similar low-level stuff.


Ada has an FFI.


As well as address clause, unchecked_conversion and address_to_access_conversion. Extremely useful tools that give you the choice when to write risky code, and generate a compiler note exactly where such risk lives.


A simple heuristic that I expect to work universally is "could I write a program that prints to my terminal on a linux machine?" and if the answer is "yes" then it does not fit.


The blog speaks to this explicitly, in the "what about unsafe Rust" section. The tl;dr is that the number of unsafe sections is a small fraction of the total code size, and it's much easier to audit the usage of unsafe, as the reason to justify it is focused. Thus, the use of unsafe in Rust is not a significant driver of actual vulnerabilities.

I think this has always been the goal, but it wasn't obvious at the outset that it would be achievable. The fact that we now have empirical evidence in real shipping products is significant.
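
As a concrete illustration of what "focused" unsafe tends to look like in practice (a sketch, not code from Android): a small block, wrapped in a safe API, with a SAFETY comment stating the invariant the surrounding code upholds. That shape is what makes it greppable and auditable.

    /// Returns the first byte of `bytes`, if any.
    pub fn first_byte(bytes: &[u8]) -> Option<u8> {
        if bytes.is_empty() {
            return None;
        }
        // SAFETY: we just checked that `bytes` is non-empty, so index 0 is in bounds.
        Some(unsafe { *bytes.get_unchecked(0) })
    }

    fn main() {
        assert_eq!(first_byte(b"abc"), Some(b'a'));
        assert_eq!(first_byte(b""), None);
    }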


Disclaimer: I am sympathetic to the cause. I think Android needs to address security since they are processing personal data. I like how the Rust community tries to educate others on what 'memory safety' is and is not.

But I am completely baffled by arguments that count the number of unsafe blocks or lines of code, like this:

>the number of unsafe sections is a small fraction of the total code size

The combinatorial effects of code execution make the number of sections or the code size completely useless metrics for judging security. They do help the mechanical part of auditing security, in the sense that they help to locate things. But locating things was never enough to judge whether security is there.

