Microsoft's list of banned C functions (github.com/x509cert)
108 points by yagizdegirmenci 41 days ago | 57 comments

Note that Microsoft’s approach to taming the C language is a bit different from Git’s approach (https://github.com/git/git/blob/master/banned.h). Microsoft’s approach involves replacing the relevant functions with their Annex K variants, which is why the list is so long!
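For comparison, Git's approach can be sketched in a few lines. The BANNED macro below is modeled on the linked banned.h; the helper copy_name is hypothetical and only shows the kind of bounded replacement a caller might use instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Any use of a banned function expands to an undeclared identifier,
 * so the build fails and the error message names the function. */
#define BANNED(func) sorry_##func##_is_a_banned_function

#undef strcpy
#define strcpy(dst, src) BANNED(strcpy)
#undef sprintf
#define sprintf(...) BANNED(sprintf)

/* Hypothetical helper: copy with an explicit destination size.
 * Returns 0 on success, -1 if src did not fit (dst is then truncated). */
static int copy_name(char *dst, size_t dstsz, const char *src)
{
    assert(dst != NULL && src != NULL && dstsz > 0);
    int n = snprintf(dst, dstsz, "%s", src);
    return (n >= 0 && (size_t)n < dstsz) ? 0 : -1;
}

/* strcpy(buf, "x");  <- would no longer compile: the call expands to
 *                       sorry_strcpy_is_a_banned_function, which is undeclared */
```

Nothing is substituted automatically here; the header only makes the banned call fail to build, and the author picks the replacement.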

For example, you see memcpy() on Microsoft’s list. That doesn't mean that Microsoft simply “banned memcpy”; it means that Microsoft wants their developers to use an alternative function called memcpy_s(). The _s suffix means “secure”. These functions are defined in Annex K, which is an optional part of the C standard. If you look at memcpy_s(), it may seem a bit weird:

    errno_t memcpy_s( void *restrict dest, rsize_t destsz,
                      const void *restrict src, rsize_t count );

Note that this is more than plain bounds checking, and I believe it’s also intended to be used with static analysis tools to ensure that it’s being called correctly, and the API is designed to make static analysis tools easier to write!
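To make the extra destsz parameter concrete, here is a minimal portable sketch of the checks such a function performs. checked_memcpy is a hypothetical stand-in, not the Annex K or Microsoft implementation, and its error codes are an assumption; the real functions report violations through a constraint handler.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for memcpy_s; NOT the Annex K implementation.
 * Returns 0 on success, or an errno-style code after a failed check. */
static int checked_memcpy(void *dest, size_t destsz,
                          const void *src, size_t count)
{
    if (dest == NULL)
        return EINVAL;
    if (src == NULL || count > destsz) {
        memset(dest, 0, destsz);   /* leave no stale data behind */
        return src == NULL ? EINVAL : ERANGE;
    }
    /* Reject overlapping regions, which plain memcpy leaves undefined. */
    uintptr_t d = (uintptr_t)dest, s = (uintptr_t)src;
    if ((d <= s && s - d < count) || (s < d && d - s < count)) {
        memset(dest, 0, destsz);
        return EINVAL;
    }
    memcpy(dest, src, count);
    return 0;
}
```

The caller supplies the capacity of the destination separately from the number of bytes to copy, so a count that is too large is rejected instead of overflowing the buffer.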

As far as I can tell, Microsoft is really the only major user of Annex K. There may be a couple other users, but Microsoft is the big one. Microsoft reportedly pushed for the inclusion of Annex K into the C standard, implemented it in their own C toolchain, and wrote a bunch of tooling to work with the Annex K functions inside their massive legacy C codebase.

You see, sometimes you have to do things a bit differently if you have tons of legacy code and aren’t willing to spend the (prohibitively expensive) effort to rewrite it all with newer, safer techniques (use your imagination, like C++ or Rust). The _s Annex K functions are designed to replace the unsafe variants in a somewhat predictable, mechanical fashion so you can send a bunch of programmers into your legacy code base to fix security holes. And it does work.

Git is a much newer C code base, is much smaller than Microsoft’s C code, and doesn’t run in the kernel. In Git, you’d be expected to manipulate strings with strbuf (https://github.com/git/git/blob/master/strbuf.h), which is generally much safer and easier to use than the C stdlib. It’s just so much easier to make decisions like this in a newer codebase like Git’s.
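As a rough illustration of why a length-tracking buffer is easier to use safely, here is a simplified sketch. The sbuf type and its functions are hypothetical and much smaller than Git's real strbuf API; they only show the idea that the buffer grows itself, so callers never compute sizes by hand.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified growable string; not Git's strbuf. */
struct sbuf { char *buf; size_t len, cap; };

/* Append s, growing the allocation as needed. Returns 0, or -1 if out of memory. */
static int sbuf_add(struct sbuf *sb, const char *s)
{
    size_t n = strlen(s);
    if (sb->len + n + 1 > sb->cap) {
        size_t cap = sb->cap ? sb->cap : 16;
        while (cap < sb->len + n + 1)
            cap *= 2;
        char *p = realloc(sb->buf, cap);
        if (p == NULL)
            return -1;
        sb->buf = p;
        sb->cap = cap;
    }
    memcpy(sb->buf + sb->len, s, n + 1);   /* n + 1 copies the NUL too */
    sb->len += n;
    return 0;
}

/* Free the storage and return the buffer to its empty state. */
static void sbuf_release(struct sbuf *sb)
{
    free(sb->buf);
    sb->buf = NULL;
    sb->len = sb->cap = 0;
}
```

A zero-initialized struct sbuf is a valid empty buffer in this sketch, so usage is just sbuf_add calls followed by one sbuf_release.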

> Microsoft’s approach involves replacing the relevant functions with their Annex K variants, which is why the list is so long!

No, Microsoft's approach involves replacing the relevant functions with functions that are semantically different from (and incompatible with) the Annex K functions.

[EDIT: In case anyone is wondering, code written against the Annex K functions that Microsoft provides will compile error-free on a conforming implementation (the function signatures are the same), but because the two disagree about what some parameters mean, it will sometimes crash].

Is there any reason for MS to do this besides vendor lock-in?

I think what happened is Microsoft started using these before standardization, the definitions got changed during the standardization process, and Microsoft didn’t follow (presumably, to avoid breaking code).

This is not really an unusual turn of events, and not something peculiar to Microsoft. Standards development is rarely “standards-first”. Instead, you implement your proposal, put it into use, and then go through the standards process. This is usually the way it should be done, and it’s how WHATWG did HTML5, it’s how Google did HTTP/2, etc.

Yes, but at the time the Annex K definitions were out, there were so few people actually using Microsoft's implementation that Microsoft could have changed the implementation to behave the way the standard prescribes.

After all, ISTR the interface remains the same for most (if not all) the Annex K functions.

So few people other than Microsoft themselves, yes. And for various reasons I don’t think Annex K was going to get popular, and making improvements to standards compliance wasn’t going to change that.

They campaigned for Annex K based on what they implemented. Annex K got changed during standardisation.

Not completely their fault.

Cisco uses the _s variants heavily in IOS-XR. I wasn’t aware that this family of functions was part of a specific addition to the C standard.

This is true. When you use these functions, Visual Studio recommends during compilation that you replace them with the more secure variants.

To add to this, there are a couple of string types in Windows API in addition to C-strings, e.g., BSTR, UNICODE_STRING, etc., which are similar to strbuf and require usage of Windows API functions to manipulate or read.


The "extension" here (the _s functions) is standardized; while in the EEE strategy the extensions cannot be standardized: the whole point of the "extension" phase in EEE is to prevent interoperability.

Except that the microsoft versions of the functions are incompatible with the standardized versions.

(I don't think that that was done in bad faith, however; only that wg14 decided to change the definitions from the microsoft-proposed ones, and microsoft didn't feel willing to break compatibility.)

In case you're wondering, yes, sscanf is on there.

Not really relevant to the recent sscanf problems, because Microsoft would use sscanf_s instead. The sscanf_s function is basically just a security-enhanced version of sscanf that otherwise behaves the same. I think it’s an Annex K version but I don't know Annex K very well.

Can someone explain to me what's wrong with memcpy?

In addition to the other answers, memcpy of size zero is undefined in certain cases.

This and the other two bugs can be particularly nasty when doing something such as an arena allocator or any other means of user memory pools.

After an incredibly long slog on some particularly nasty corruption issues years ago, I was only able to identify the problem by using LD_PRELOAD to patch memcpy with some simple value and boundary checks for these three problems. Each of them was a source of the corruption.

I think the most destructive thing is that you can overflow the destination buffer. With memcpy_s you have to specify the destination buffer size so you can (supposedly) prevent it.

Ok fine. As far as footguns go, that's not the most subtle one.

Many things. Besides the already explained buffer overflow, overlap, and zero-size UB, the compiler also frequently chooses to ignore it. The secure variant will fill the dest buffer regardless of whether the compiler thinks this is a useless side effect. It's similar to the memset_s security problems that nobody cares to fix.
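The dead-store concern is easiest to see with wiping a buffer: a plain memset just before the buffer goes out of scope may be optimized away. One common workaround is sketched below, on the assumption that the compiler honors writes made through a volatile pointer, which mainstream compilers do in practice. The helper name wipe is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: clear n bytes through a volatile pointer so the
 * stores are not discarded as dead writes. A sketch, not a vetted
 * replacement for memset_s or a platform-provided secure-zero call. */
static void wipe(void *p, size_t n)
{
    volatile unsigned char *v = p;
    while (n--)
        *v++ = 0;
}
```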

It can do the wrong thing if the source and destination overlap. Use memmove() instead.
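A small example of the overlapping case: removing a prefix from a string in place moves bytes within the same buffer, so memmove is the right call. The helper drop_prefix is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: delete the first n characters of a string in place.
 * Source and destination overlap, so memmove (not memcpy) is required. */
static void drop_prefix(char *s, size_t n)
{
    size_t len = strlen(s);
    if (n > len)
        n = len;
    memmove(s, s + n, len - n + 1);   /* +1 carries the terminating NUL */
}
```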

I am happy to see snprintf isn't on the list. It's the best C99 string function, both safe and useful.

Totally unsafe. It also allows %n. If possible use a proper format library, or at the least the _s variant.

> Totally unsafe.

Please justify that claim. snprintf() limits itself to the specified buffer size (in the way safe from off-by-one errors) and always null-terminates. What more do you want?

(Ah, sorry, if it's about format strings, I know they're potentially dangerous. I think they're fine in practice though so long as you have compiler warnings for type mismatches and non-literal strings turned on.)

It took ages for Microsoft to add it, though. They only implemented it as part of Visual Studio 2015.

snprintf still has the typical format string vulnerabilities doesn't it? Information leaks, buffer overflows, etc.

You can do silly things with the format string, but it will never overflow the output buffer if you invoke snprintf properly, nor will it ever not be null terminated, and snprintf can even tell you how large a buffer you need.
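The "tell you how large a buffer you need" part looks like this in practice: call snprintf once with a NULL buffer and size 0 to measure, then allocate and format for real. The helper format_pair is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format "name=value" into a freshly allocated buffer
 * sized by a first measuring call. Caller frees the result; NULL on failure. */
static char *format_pair(const char *name, int value)
{
    /* With a NULL buffer and size 0, snprintf only reports the length needed. */
    int need = snprintf(NULL, 0, "%s=%d", name, value);
    if (need < 0)
        return NULL;
    char *out = malloc((size_t)need + 1);   /* +1 for the terminating NUL */
    if (out != NULL)
        snprintf(out, (size_t)need + 1, "%s=%d", name, value);
    return out;
}
```

With a fixed buffer, the same return value also detects truncation: if it is greater than or equal to the buffer size, the output did not fit.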

Oh yea, I was just thinking there are still limits to what 'safe' means though. My memory on the exact exploits was hazy so I had to look it up but I know it includes sending user defined data to the format string, which is not advised, but people do silly things all the time when coding which is why snprintf exists to start with. People can't be trusted to bounds check.

In my mind a safer version for daily use wouldn't support the more problematic format options but maybe other people use them far more widely than me.

Right, yeah, I should say snprintf() is safer than sprintf(), but it still has the unsafe features of printf().
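The usual mitigation for the format-string problem is to keep the format a literal and pass untrusted text only as an argument. A minimal sketch, with a hypothetical helper name:

```c
#include <stdio.h>

/* Hypothetical helper. The untrusted text is passed as an argument to a
 * fixed "%s" format, never as the format string itself, so sequences such
 * as "%n" or "%x" inside it are copied rather than interpreted. */
static int render_message(char *out, size_t outsz, const char *untrusted)
{
    return snprintf(out, outsz, "msg: %s", untrusted);
    /* Unsafe alternative: snprintf(out, outsz, untrusted); */
}
```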

Completely off topic, but how is “(c) Microsoft All rights reserved” compatible with the rest of this code’s license?

Copyright ownership is generally separate from any license

“All rights reserved” is a confusing (and perhaps meaningless) phrase that could be taken to mean “no license”.

It reserves all rights possible under copyright law. It then immediately grants some exceptions, in the form of a license. Failure to comply with the terms of the license brings the user into breach of contract, and it reverts to "all rights reserved". I don't think the "all rights reserved" part is necessary, since copyright in the US is automatic, but it doesn't change anything.

It's a remnant of the Buenos Aires Convention, under which a work defaulted to the public domain if you didn't expressly reserve copyright.

Because all the countries who signed that treaty didn't move to the Berne Convention right away, you had to still reserve your copyright or else lose it in those countries.

The last holdout signed the Berne Convention in 2000, so it's no longer an issue.

> It reserves all rights possible under copyright law.

Could you provide some source that corroborates this? Because my understanding is that this has no effect. It merely states something that is already true. That’s what I was hinting at with “possibly meaningless”.

It's a holdover from the time before the Berne convention where you needed to explicitly state what rights you were reserving when copyrighting a work. The US entered the Berne convention in 1988. Before that the Universal Copyright Convention (1952) was in force, which required a copyright notice and registration at the copyright office. Under the Berne convention (and the later WTO TRIPS agreement) neither of these things are required, so the entire "Copyright (c) Microsoft Corporation. All rights reserved." line is redundant. As you note, it merely states something that is already true. But it's become traditional to include something like that, often including the date, sometimes using the © symbol, etc.

It's not part of a copyright licence. It's a magic incantation that hasn't been necessary for some years.

* http://jdebp.uk./FGA/law-copyright-all-rights-reserved.html

She Tried it's

Lol. I have no idea how the above comment got posted. I just noticed I had lost karma points and wondered why. I check my comment history and find this strange comment. I remember reading comments on this post and then doing something else. either I have gremlins in my phone or I must have unknowingly posted this reply. Not intentional and I can't edit or delete it as it was posted a while ago. Please ignore it :-)

How can you ban memcpy effectively if it’s just a simple for loop everybody can implement in less than 20 seconds?

AFAIK, memcpy is not a simple loop: it can align its copies to suit the architecture. For example, under the right conditions (aligned memory), it can copy 32 or 64 bits at a time for most of the loop and handle the remainder with 8-bit copies (not necessarily in that order).

You don't replace memcpy unless you do it right. The difference between systems with a bad or good memcpy is abysmal.

Fun story. I once had a program hang (running on EFI) on a fairly banal line of C code. Turns out Clang injects memcpy into your code even if you don't call it. I hadn't defined memcpy anywhere so it just hung.

This is documented[0]:

> Note that it is assumed that a freestanding environment will additionally provide memcpy, memmove, memset, and memcmp implementations, as these are needed for efficient codegen for many programs.

GCC has effectively the same stipulation[1]:

> GCC requires the freestanding environment provide memcpy, memmove, memset and memcmp.

The main thing I'm confused about is why you didn't get a linker error.

0. https://clang.llvm.org/docs/CommandGuide/clang.html#cmdoptio...

1. https://gcc.gnu.org/onlinedocs/gcc/Standards.html

> The main thing I'm confused about is why you didn't get a linker error.

EFI isn't your usual freestanding environment. Your output is a dynamically linked PE2 (Windows) executable. The linker had spots for memcpy and others to be linked in dynamically, but they were not. Whatever was there caused a hang in TianoCore UEFI (which would eventually reset due to the watchdog timer).

> Turns out Clang injects memcpy into your code even if you don't call it.

so does gcc, it's pretty much a requirement of the C standard. https://gcc.godbolt.org/z/aaE5hn

> it's pretty much a requirement of the C standard

Why would the C standard mandate the presence of standard library functions in a freestanding environment? I always assumed compilers emitting calls to mem* functions were doing it because it was the easy solution just like linking to libgcc.

memmove() isn't on the list, so presumably the recommendation is to replace uses of memcpy() with memmove() due to its well-defined behaviour when source and destination overlap.

(Or with the Annex K variants -- memcpy_s(), memmove_s() -- if that floats your boat, and as this is Microsoft it surely does.)

Well, of course you can't stop out of bounds array access in C. But at least you can cover the major array operations, such as string and memory copy.

Presumably anyone sneaking in a memcpy in the backdoor would also be thwarted on code review.

On code review and/or by a linting tool. Microsoft seems to be using lots of static analysis, and recognizing a memcpy loop is pretty simple.

This is a fair point. memcpy will just be replaced by manual loops or std::copy_n or whatever. Don't know if that's better or worse.

No it will be replaced by memcpy_s which is just generally a more sane version.

Well memcpy_s is a Microsoft initiative and so Unix devs will reject it on principle.

Annex K never really got buy-in from the open source C stdlib developers—Glibc, Musl, etc.

I don’t think it’s an issue of whether it was rejected on principle, or because of its origin, I think that there just wasn’t enough interest in it. My feeling is that Annex K is not as useful without the right set of code review practices and static analysis tools. Many of the secure variants just take an extra size parameter—you would need some kind of assurance that the extra size parameter is somehow correct; without this assurance, the Annex K functions aren’t very useful. There’s not a great way to get that assurance without static analysis tools, as far as I know, because there are tons of unsafe ways to use the Annex K functions, like this:

    // Don’t do this.
    err = memcpy_s(dest, n, src, n);

If you’re just duplicating the “count” argument and pasting it in “destsz”, it will work but it won’t catch any of the errors that memcpy_s is designed to catch.
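The useful pattern is to take the destination size from the destination object itself, not from the copy length. The sketch below uses bounded_copy, a hypothetical stand-in for the size check in memcpy_s, and a made-up struct packet, so that it builds without Annex K support.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the size check in memcpy_s (not Annex K itself). */
static int bounded_copy(void *dest, size_t destsz, const void *src, size_t n)
{
    if (n > destsz)
        return -1;            /* refuse to write past the destination */
    memcpy(dest, src, n);
    return 0;
}

/* Made-up structure for illustration. */
struct packet { char header[4]; char payload[8]; };

/* The destination size comes from the destination object, so an oversized
 * n is caught. Passing n for both size arguments would always pass. */
static int set_payload(struct packet *p, const void *src, size_t n)
{
    return bounded_copy(p->payload, sizeof p->payload, src, n);
}
```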

Yes, annex k was generally judged to not be very helpful in practice[0]. However, I agree that it might make more sense as a component of an environment that mandates strict static analysis and code review. (Microsoft also has other extensions with similar roles, e.g. their in and out parameters.) In that context, it makes perfect sense: microsoft wanted to upstream some components of their static analysis pipeline, but those components ended up being useless without the rest of the pipeline.

0. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1967.htm

I think the only real solution is operations that can't fail, like C++ string operations. Of course even those can fail, but do so with an exception, which IMO is easier to deal with from a security standpoint by aborting the process, which is the default behavior. And an exception is only going to occur under pathological conditions (typically out-of-memory) which really wouldn't be solvable with any practical solution.

> operations that can't fail [...] can fail, but do so with an exception, which IMO is easier to deal with from a security standpoint by aborting the process

That's exactly what annex k does. Detection of a runtime-constraint violation results in a call to a constraint handler; which handler can abort the process if you want it to.
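The handler mechanism can be sketched portably as a replaceable function pointer. Everything below is hypothetical and only loosely modeled on Annex K's set_constraint_handler_s; the names, the handler signature, and the -1 return value are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler type, loosely modeled on Annex K's constraint handlers. */
typedef void (*violation_handler)(const char *msg);

/* Default policy: terminate the process on any violation. */
static void abort_on_violation(const char *msg) { (void)msg; abort(); }

/* Alternative policy for demonstration: count violations and keep going. */
static int violations_seen;
static void count_violation(const char *msg) { (void)msg; violations_seen++; }

static violation_handler current_handler = abort_on_violation;

static void set_violation_handler(violation_handler h)
{
    current_handler = h ? h : abort_on_violation;
}

/* Copy that routes every failed check through the installed handler. */
static int handled_copy(void *dest, size_t destsz, const void *src, size_t n)
{
    if (dest == NULL || src == NULL || n > destsz) {
        current_handler("handled_copy: runtime-constraint violation");
        return -1;   /* reached only if the handler returns */
    }
    memcpy(dest, src, n);
    return 0;
}
```

With the default handler installed, a bad call ends the process, much as an uncaught exception would; installing count_violation turns the same failure into an ordinary error return.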
