Safe-Linking – Eliminating a 20 year-old malloc() exploit primitive (checkpoint.com)
115 points by eyalitki 13 days ago | 28 comments





Blocking 15 out of 16 exploit attempts sounds great, but you have to ask yourself how many shots you get, and since most people don't think "this program crashed: I should never run it ever again", this isn't anywhere near as helpful as it seems. If you crash my webserver, I spawn a new one. You might even already have 16 webservers running right now! If the user clicks a link and their tab crashes, they might hit reload a few times before giving up. If you receive a text message or an email that crashes a parser on your phone, the retry might seriously happen in a tight loop.

Like, I hear about this sort of thing being a good idea constantly, but I have never once heard of it actually blocking an attack against any "normal" target. (I say "normal", as some super hardened military target might be configured to go into complete lockdown immediately upon a single unexpected event, with a protocol to call in a forensics expert to analyze logs to see if it is safe to restart the service before ending the outage.)

We can detect corruption 15 out of 16 times. In our tests, when we didn't detect a corruption, the program still crashed, but without a proper error message. One should remember that the pointer is masked, so an attacker without the secret mask won't be able to control the reveal()ed pointer. And using a garbled pointer won't give the attacker anything useful; it will most probably crash the target program.

Are you saying the technique is effective at preventing a successful attack nearly every time, but that pointer modifications are detected (thus logged) 15 out of 16 times?

Yes. We have 12 bits masking just the lower page, in addition to more than 10 bits above that. Chances to guess that correctly are VERY slim.

Interesting article -- thanks! I see they used the `key ^ P` encoding where the `key` is `L >> PAGESHIFT` to use the ASLR randomized bits from the free list position `L`.

In [mimalloc](https://github.com/microsoft/mimalloc) we use a similar strategy to protect the free list in secure mode. However, there are some weaknesses associated with using a plain _xor_ -- if the attacker can guess `P`, that reveals the key immediately (`P^key^P == key`); or, if there is a read overflow and the attacker can read multiple encodings, they can xor them with each other, say `(P1^key)^(P2^key)`, and then we have `(P1^P2)`, which may reveal information about the pointers (like alignment).

What we use in _mimalloc_ instead is two keys `key1` and `key2` and encode as `((P^key2)<<<key1)+key1`. Since these operations are not associative, the above approaches do not work so well any more even if the `P` can be guesstimated. For example, for the read case we can subtract two entries to discard the `+key1` term, but that leads to `((P1^key2)<<<key1) - ((P2^key2)<<<key1)` at best. (We include the left-rotation since xor and addition are otherwise linear in the lowest bit).

Just some thoughts. Of course, this may be too much for the use-case. However, we found only a little extra overhead from the extra operations (as most programs are dominated by memory access), so it may be of benefit.


FreeBSD solved that problem 22 years ago:

https://papers.freebsd.org/1998/phk-malloc/

(And got better malloc(3) performance at the same time.)


Nobody has solved "that problem".

No malloc has absolute detection of all corruption and even tools like Valgrind don't catch everything.

Let's not pretend that a few pointer hacks and checks have solved a problem.


The particular problem they boast about having solved simply does not exist in the malloc(3) I wrote 23 years ago, because it does not have the silly linked list to begin with.

In addition to all the security problems inherent in mingling metadata directly next to user-data, the linked list gives O(Nalloc) performance on free(3) and realloc(3) whereas my malloc had O(1) performance.

That may not seem like a lot, but when the 'O' is a page-in from disk, and Nalloc is from C++ code, it is nothing to sneeze at.

Not only did that make my malloc faster, but it would, not "could", but "would" detect several classes of malloc-usage errors, double-free etc, unconditionally.

Over the next 10-ish years, that practically eliminated entire classes of malloc-mistakes, making a lot of FOSS software safer, no matter which malloc you used it with.

Given the definition of the malloc(3) family API, there is no way you can detect all corruption without hardware support, but there are people working on that too, notably Robert Watson's CHERI project at Cambridge.

So yeah, nice pointer arithmetic, but how about people solved the real problem instead?

(On big memory and multi-core systems use jemalloc, on small memory and single-core systems use phkmalloc.)


That there are entire guides (see below) and even games written on phkmalloc exploitation proves kazinator's point rather than yours.

[1] http://www.ouah.org/BSD-heap-smashing.txt

[2] https://events.ccc.de/congress/2005/fahrplan/attachments/636...

[3] https://overthewire.org/wargames/vortex/vortex11.html

[4] https://www.usenix.org/legacy/events/woot11/tech/final_files...


No?

The only problem OP has solved is detecting overwriting of metadata inadvisably commingled with the user data.

That problem simply does not exist in phkmalloc, or jemalloc for that matter; writing past the end or before the beginning of an allocation does not automatically give you the ability to violate the internal integrity of the malloc implementation.

If you actually read the first link you pasted carefully, you will see what an amazing tour-de-force it was to exploit CVE-2003-0015, which was a run-of-the-mill double-free mistake in CVS.

As I remember it, there were proof-of-concept exploits against list-based mallocs in a matter of days, it took months before bbp's article appeared.

When I wrote phkmalloc, most of the exploitable malloc-related problems were usage problems, double frees, freeing modified pointers, writing past start or end of allocation etc.

If you had seen my presentation in New Orleans 1998, you would undoubtedly remember my handwritten slide with around 20 setuid programs phkmalloc had found issues in within the first year.

That is the primary reason why there are so few malloc(3) related exploits these days: phkmalloc and later jemalloc complained about them.

Those are mistakes which list-based mallocs do not find, because it is too expensive to walk the list all the time, and on top of that, they run slower, because the list is O(n)!

There is neither a reason, nor an excuse for using a list-based malloc in this day and age.

Neither OP nor I claim to have solved the "big" malloc problem, because, as I already explained: That takes hardware support.


Too bad we had to wait 22 years for a similar change to find its way into the more widely used glibc / uclibc(-NG).

Windows also XORs chunk metadata with a random secret from the PEB, at least in some versions.

Good!

I was initially surprised at their benchmark results, until I read more closely and noticed that they were benchmarking on a cloud VM. I wish people wouldn't do this for this form of test - it adds so many variables it's hard to take their benchmark too seriously.

We executed the benchmark suite multiple times, and results were consistent. It was way more stable than the same execution on a PC with multiple programs. Anyway, the maintainers of glibc performed the same checks and confirmed our measurements.

Thumbs up for reaching out to upstream to mainline it.

Did you talk to musl team as well?


Not yet. Working with each open-source project demanded a lot of effort, so we were hoping that integrating it into a few dominant libraries would attract the attention of the rest. We will happily assist them if they need/want our help.

The musl mailing list is very active, and communication with the maintainers is generally effortless. It would be interesting to see how this plays out. They are also in the process of rewriting their malloc implementation.

Thank you for your work!


And musl is currently working on merging its new mallocng.

In my experience, cloud VMs are often a more consistent benchmarking platform than your personal machine.

A desktop environment and consumer-grade hardware add a lot of noise.

Ideally you'd always benchmark on completely idle dedicated servers, but those are rarely available.


You can just run it on your local machine and dedicate some of the cores to it with a tool like "cset shield".

If you _really_ want to avoid any contention, then build the benchmark as a Linux kernel module and call stop_machine(), which will prevent anything else including interrupt handlers from running.


Not that surprising: CPUs can perform scalar computation in parallel with memory access, so I would expect a few scalar ops to be free in any code that is not scalar-bound.

Does this apply to jemalloc?

While ptmalloc/dlmalloc/tcmalloc use singly-linked-list metadata that is stored adjacent to the user buffers, in jemalloc there is no such metadata. jemalloc's design stores the sensitive metadata separately, thus removing the need to add a dedicated protection layer for it.

This is a crude, unreliable debugging aid being trotted out as a security enhancement.

That being said, everyone attacks the tcache because it’s so poorly protected. It clearly needs something better.

So you'd be opposed to merging this change upstream?

Yes; it wastes machine cycles.



