The trouble with struct sockaddr's fake flexible array (lwn.net)
96 points by signa11 15 days ago | 56 comments



I believe struct sockaddr was supposed to serve the role for which sockaddr_storage was later invented. That's why it has the padding array sa_data: so that if you actually define or allocate such an object it will be big enough to hold any address. In no way is that a flexible array, and you never want to be accessing it.

Probably, sockaddr_storage was introduced rather than making sockaddr's array bigger because by that time the definition of sockaddr had become a mystical sacred cow, not to be touched. Plus, that size of 14 is the same on every system. It is a one-size-fits-all approach, which is no way to define a maximum-size structure over all the address types if you want every platform to be able to make it a tight fit. The size is going to vary with the implementation due to different alignment and padding requirements.


It wouldn't help the kernel case here, but I've long thought that the socket TCP and UDP APIs are a poor fit for the general use case. Having to build and shuffle around the sockaddr structure is too low-level for the majority of uses. I think most programmers would prefer something like:

    int inet_connect(AF_TYPE, destination, port, options, &error);
Also:

    int inet_listen(AF_TYPE, port, options, &error);
So for example:

    int sock;
    inet_err err = NULL;
    sock = inet_connect(SOCK_STREAM, "google.com", 443, NULL, &err);
    if ( sock < 0 )
    {
        printf("Failed to connect: %s\n", err->message);
        return -1;
    }
Basically one level more abstraction in the API. The old API could stick around for people who are doing something weird, but this would cover most use cases, I think. Added bonus: adding IPv6 support to programs using this API is trivial; it's just a recompile.


> sock = inet_connect(SOCK_STREAM, "google.com", 443, NULL, &err);

You just added a DNS lookup to connect(2).

Also, you linked socket creation to all of this. This would make it hard to do an asynchronous version. Today you can create a socket and do a non-blocking connect(2) (after doing your asynchronous DNS lookup).

Lastly, I think if you tasked someone with creating that API, they would probably use the old APIs to build it ...
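
For what it's worth, a minimal sketch of such a wrapper on top of getaddrinfo() and connect() could look roughly like this (inet_connect, its string port, and the error-string out-parameter are all hypothetical, not an existing API):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>
    #include <unistd.h>

    /* Hypothetical convenience wrapper; error handling kept minimal. */
    static int inet_connect(int socktype, const char *host, const char *port,
                            const char **error)
    {
        struct addrinfo hints = { .ai_family = AF_UNSPEC, .ai_socktype = socktype };
        struct addrinfo *res, *ai;
        int err = getaddrinfo(host, port, &hints, &res);

        if (err != 0) {
            *error = gai_strerror(err);
            return -1;
        }

        /* Try each returned address (IPv6 or IPv4) until one connects. */
        for (ai = res; ai != NULL; ai = ai->ai_next) {
            int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
            if (fd < 0)
                continue;
            if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
                freeaddrinfo(res);
                return fd;
            }
            close(fd);
        }
        freeaddrinfo(res);
        *error = "no address could be connected";
        return -1;
    }

Note that getaddrinfo() already gives you most of the address-family independence the parent comment is after; the sockaddr shuffling is just hidden inside the addrinfo list.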


How often do you not want to support host lookup when doing a TCP or UDP connection?

Certainly you could implement these functions using the existing BSD socket libraries, but that misses out on having them used everywhere so you get "free" IPv6 transitions and whatnot.


That second parameter could accept a text IP-address eg "192.168.100.100". Like the address bar of a browser.


So the DNS lookup is still there; it's just optional. You would have to pass a flag telling it not to attempt the lookup, or do something fancy to detect whether the passed parameter is already an IP address.


This is already how all the DNS lookup APIs work.


The problem is with asking the kernel to perform DNS lookups.


If systemd had its way, it would. The local system resolver cache is the default on most systems. DNS is well understood. Why not put it in the kernel?

If you're doing that, though, you'd better dang well give me secure key storage in the kernel, too, or full TLS capability. Allow my program to load its RSA/ECC key at startup, then drop privileges to access that file, and have a kernel module handle the cryptography.


It extremely doesn't make sense to have a DNS resolver resident in the kernel: DNS resolution is policy-intensive, involves various upgrades to cryptographic transports, and often requires debugging, all weak points of the kernel. Against all that, there's no performance advantage, and doing it admits a new networking API that almost no applications will use --- most new code being written in higher-level languages that already abstract over system calls.


There's a huge performance advantage, if you ask systemd: multiple resolutions can be cached. This is incredible bullshit, of course, but it is one of many lies Poettering has gotten away with.


Caching resolutions can be a nice thing, and having it in one place can be useful so that throwing away the cache can be done consistently for all apps when needed (you likely should clear the cache when switching networks, or when the user asks you to), but none of that means it should be a kernel service.

If there's enough volume of requests that a kernel level cache meaningfully improves real world scenarios, something is broken in those scenarios. :p (but I think you agree with me)


I think the Plan 9 API is the sanest one: https://9fans.github.io/plan9port/man/man3/dial.html


The socket API was designed to be a low-level API. If you're writing applications, you probably want to use libcurl/WinInet or some other abstraction over all of the low-level stuff. Using raw connect() calls is like writing your own XML parser when a myriad of existing solutions are free and ready to use.

Your example would fit perfectly in a libinet library, but I doubt the C standard library is going to include improvements upon this. If you want good internet support, pick a library or, better yet, another language because plain C is terrible for managing arbitrary network inputs anyway.

There is also some stuff your code doesn't handle (do you retry when a connection fails but an SRV record lists multiple options? who is supposed to free the inet_err struct, because your example leaks memory? how do you handle attempts to connect over IPv6 on hosts still stuck on IPv4 only?) that I would expect a low-level C interface to be more explicit about.


There are lots of higher-level libraries that can do this nicely. But there has to be a low-level socket API that actually allows all the weird options to exist. Like protocols without ports.


Here's what I don't understand:

> As long as this usage remains, the checking tools built into both compilers must treat any trailing array in a structure as if it were flexible; that can disable overflow checking on that array entirely.

No, they don't? Why couldn't they just have a mechanism to suppress the check for this particular struct (like a whitelist)?


I'd think this is an extremely common pattern anyway (with the declared array length being 1, or even some other "minimal storage" value), and struct sockaddr's use of it is likely just the cherry on top.


It is in C, but not in the Linux kernel (according to TFA). That's why they're planning a big refactoring to get rid of this structure in the kernel.


The only embedded sizes I've ever seen are flex, 0, 1, and 14 (sockaddr's). It's trivial enough to exclude all of them.


No, I have seen many people use an arbitrary value to indicate "this is the amount it makes most sense to allocate this structure with", e.g. when you allocate it on the stack. If you need more than that, you allocate it with a malloc wrapper or the like, which returns one of arbitrarily large size.

What I have not seen is this happening in the middle of the struct, for obvious reasons; it's always the last element in the struct.
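
For illustration, the pattern being described looks something like this (struct message and message_alloc are made-up names):

    #include <stddef.h>
    #include <stdlib.h>

    struct message {
        size_t len;
        char   data[64];   /* "big enough for most cases", not a hard limit */
    };

    /* Larger payloads get over-allocated past the declared array; reads and
     * writes beyond data[63] are exactly what trips array bounds checkers. */
    static struct message *message_alloc(size_t len)
    {
        size_t size = offsetof(struct message, data) + len;
        struct message *m = malloc(size > sizeof(*m) ? size : sizeof(*m));
        if (m)
            m->len = len;
        return m;
    }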


> No, I have seen many people use an arbitrary value

Have you seen that in something that's ABI-critical, though? i.e. whose code simply cannot be changed due to backward compatibility, like is the case with sockaddr? Because otherwise I'd consider it a non-issue.


Everything is an ABI issue. Dunno what the point or alternative is here.


No? Not every struct is exposed to clients who can't change their code.


I still do not see the point. No one wants to change their code. No one wants to break their ABI. But the discussion is moot since I do not see what ABI break is being proposed here. My only guess is that the proposal you're envisioning is to forbid this pattern, which breaks a lot of perfectly working code, not just 'ABI'.


>flex, 0, 1

And each one can behave differently: https://lwn.net/Articles/908817/


Yeah it's interesting that a (relatively) small quirk in the Linux source has the "political leverage" to impact the whole C ecosystem.

I suppose since C was made for Unix at Bell Labs, and GCC is inextricably tied to Linux, you could say that C is first and foremost the language of Linux. Fair enough.


I suspect this is a problem in many code bases, but only a hugely important project like Linux has the leverage to try to get it addressed at the ecosystem level.


Either I don't understand the problem completely, or... why wasn't it possible to introduce something like an 'ex' address family that allows passing and disambiguating extended parameter format(s), which would include array sizes etc.? We had these *Ex functions everywhere in the Win32 API for an eternity; why couldn't the unices do the same trick?


You can introduce new APIs and new types to resolve the "is it flexible, or is it 14 chars"; that's more or less one of the explored approaches in the article.

It won't make existing uses any clearer or safer though, requiring rewrites to take advantage of them.


The array should not be touched, so the question is moot. The struct sockaddr type should only be used for pointers, which are cast to the correct type according to their family before they are dereferenced, with the exception that the sa_family member can be accessed through the sockaddr base.

For defining or allocating an object that can hold any address, sockaddr_storage should be used, as mentioned in the article.
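
As a rough sketch of that discipline, using the standard getpeername()/inet_ntop() calls (error handling trimmed):

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <stdio.h>

    /* Allocate a sockaddr_storage, hand the API a struct sockaddr *,
     * then cast to the concrete type based on the address family. */
    static void print_peer(int fd)
    {
        struct sockaddr_storage ss;
        socklen_t len = sizeof(ss);
        char buf[INET6_ADDRSTRLEN];

        if (getpeername(fd, (struct sockaddr *)&ss, &len) != 0)
            return;

        switch (ss.ss_family) {
        case AF_INET: {
            struct sockaddr_in *sin = (struct sockaddr_in *)&ss;
            inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf));
            printf("%s:%u\n", buf, (unsigned)ntohs(sin->sin_port));
            break;
        }
        case AF_INET6: {
            struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&ss;
            inet_ntop(AF_INET6, &sin6->sin6_addr, buf, sizeof(buf));
            printf("[%s]:%u\n", buf, (unsigned)ntohs(sin6->sin6_port));
            break;
        }
        }
    }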


I imagine the problem they're trying to address is ensuring that everyone _does_ only use correctly cast pointers; as defined, it's legal to use that 14 char array, it's just that it's never what you're meant to do.


In my opinion, the array should be marked obsolescent and removed (not necessarily physically, but the name gone from the member namespace). The 14 bytes are not enough for it to define storage for all types, which is why sockaddr_storage is there. It cannot meaningfully be used to access anything. Implementations could rename it to some __sa_data or whatever so that the size of the structure doesn't change, if that is important to them.


You don't need the Ex thing with socket addresses because they have a run-time type field: sa_family.

A function that takes a sockaddr * can dereference it to get at that field, and then know exactly which address type it's dealing with.


Respectfully, I think you're missing the point. Windows uses idioms like that too (though usually by filling an initial field with the size of the structure in bytes), but GP's point of an Ex-style API would be to completely eliminate kludges such as runtime type fields at the beginning of structs and move to something safer.


I don't see what. Type fields in structs are more or less the pinnacle of what it means to make things safer in C. :) :)

The same bind() API has to work for any kind of socket: AF_UNIX, AF_INET, AF_INET6, AF_X25 or what have you. Any kind of socket pairs with the matching address type, whose type is erased at the API level down to struct sockaddr *. But from the socket type, it can be inferred what the address type must be, and the appropriate network stack that is called can check that the type of the address matches.

I don't see how you'd get rid of this with a new bind_ex function, or why you would want to.

Of course we could have a dedicated API for every address family: bind_inet, bind_inet6, bind_unix, ... which is bletcherous.

Strictly speaking, I think we could drop the address family field from addresses, and then just assume they are the right type. The system APIs all have a socket argument from which the type can be assumed. Having the type field in the address structure lets there be generic functions that just work with addresses, e.g. an address-to-text function that works with any sockaddr.


If the goal here is to reduce attack surface or bug-prone code in the kernel, leaving the non-ex variants around unchanged wouldn't help much with that.


> Needless to say, these casts can be error prone; casting a pointer between different structure types is also deemed to be undefined behavior in current C.

This surprised me. What kind of mechanism is there in place to prevent the compiler from formatting /dev/sda when you write code like this?


This is one of the most unfortunate cases of UB in C. It's necessary because C largely doesn't forbid pointer aliasing (accessing an object using multiple, distinct pointers). But the compiler needs to have some idea of when pointers might alias; otherwise, it's forced to generate extremely inefficient code that redundantly loads pointers over and over again just in case the value changed unexpectedly.

Within the body of a single function, the compiler can see where addresses came from and figure out what pointers may or may not alias. But when pointers are passed in as function parameters, the compiler has no idea. Thus, the C standard allows compilers to assume that pointers to different types never refer to the same object; an assumption that's true for almost all code anyone would want to write. But this adds surprising undefined behavior to some things you might try, like casting between layout-compatible structs, or the fast inverse square root trick from Quake. (Casting a pointer to a different type can still be tricky to get right even if it wasn't UB though. Alignment requirements are another footgun: for example, it's not legal to cast a uint8_t pointer to a uint16_t pointer unless you're sure its address is even.)

The blessed-by-the-standard way to do type punning is to either use a union or a memcpy, not a pointer cast.
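
A minimal illustration of the difference (the cast version is the one current C deems undefined; compilers typically turn both into the same single move instruction):

    #include <stdint.h>
    #include <string.h>

    float bits_to_float_ub(uint32_t u)
    {
        /* Undefined behavior: a uint32_t object accessed through a float *. */
        return *(float *)&u;
    }

    float bits_to_float_ok(uint32_t u)
    {
        /* Well-defined: memcpy does the type punning. */
        float f;
        memcpy(&f, &u, sizeof(f));
        return f;
    }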

For comparison, Rust prohibits mutable pointer aliasing for safe reference types, so the compiler knows when references may or may not alias. (Raw pointers are assumed to always be aliasable unless the compiler can prove that they're unique, e.g. by being derived from a safe reference). This leads to more efficient codegen in many cases, but it also means that type punning through pointer casting is fully legal in unsafe code (provided validity and alignment requirements are met), since the compiler does not need the type information to figure out aliasing.


> But when pointers are passed in as function parameters, the compiler has no idea.

"restrict" was added to give the compiler an idea.


As far as standard ISO C goes, nothing. That's just how UB can work in theory, though in practice compiler (ab)use of UB is somewhat more indirect (optimizing under the assumption that UB is not present in the program, rather than observing UB and making decisions based on that).

Beyond that, I believe Linux and other kernels technically use a slight variant of C by taking advantage of compiler extensions/flags to better fit their use cases. For example, Linux compiles (compiled?) with -fno-strict-aliasing and -fwrapv and uses a GCC extension that allows type punning via unions [0], so that they can compile what the standard calls "incorrect" C code without worry.

I'm not sure whether recent versions of C have changed their stance on this particular UB, but since it'll probably be a while until they're adopted (if ever) kernels will be making do with their workarounds for a while longer.

[0]: https://lkml.org/lkml/2018/6/5/769


Type punning through unions is not undefined behavior in C, only in C++.
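
For example, this is valid C (C99 and later), though C++ calls it undefined:

    #include <stdint.h>

    static float bits_to_float_union(uint32_t u)
    {
        /* In C, reading a union member other than the one last written
         * reinterprets the stored bytes. */
        union { uint32_t u; float f; } pun = { .u = u };
        return pun.f;
    }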


Yep, you're right. Think I got confused with something else :(


Nothing in the language, though the permissions mechanism of the system may disallow it. Unless the compiler is running with root permissions.

C is not a safe language. The more abrasive among fans of the language would say "skill issue" and "git gud" if you want to avoid footguns.


It's not UB if the "common initial sequence" exemption applies as it does here.


The "common initial sequence" in the standard doesn't apply to pointer casts. It only applies to writing to one union member and reading from another union member.


I stand corrected. Thank you.


What kind of mechanism is there in place to prevent any compiler from just replacing your main() function with formatting /dev/sda?

That's nothing to do with the language, that's always on the compiler. Nothing stops the compiler from taking even correctly-specified code and doing whatever it wants.


"Undefined behavior could format your hard drive" is kind of a meme, but it's a meme with a point. A lot of programmers have the idea that UB is "what the hardware does", that when you go outside the bounds of the C program you just have to think at the machine code level to figure out what will happen. But that's not the reality with a modern compiler; what with all the levels of analysis, optimization, and transformation that happen between source code and machine code, the effects of undefined behavior are extremely unpredictable and often inconsistent and nondeterministic. The point of "undefined behavior could format your hard drive" is that if your code exhibits UB, the compiler is not required to translate it "correctly", and you can't make an attempt to reason about the behavior of the program.

See also: "What Every C Programmer Should Know about Undefined Behavior": https://blog.llvm.org/2011/05/what-every-c-programmer-should...


That's a rather obtuse response. You can typically operate under the assumption that the compiler works correctly. The GP's question was: what prevents the compiler from (correctly) exploiting UB to change the program's behaviour from the desired one? If the Linux devs want to rely on their compilers' output, they have to somehow be obeying the contract around UB.


I think the more reasonable assumption is that the practical needs of the biggest users will probably trump what any specification demands.

The thing stopping the compiler from doing dangerous behaviours in response to commonly abused UB is obviously that people wouldn't use the compiler if it did that. Just like how the thing stopping the compiler from doing dangerous behaviours in response to spec-legal code is that people wouldn't use it if it did that.


He was referring to the U in "UB".


The article could point out that real BSD sockaddrs only have an 8-bit field for the address family which leaves another 8 bits for the length of the structure.


Some platforms have a field called sa_len.


It's not really a struct sockaddr sa_data[14] problem, it's still a compiler problem: treating explicitly sized buffers at the end of structs as not explicitly sized. That's a compiler bug. 14 bytes is 14 bytes, and [1] is one element, not many or some.

And the APIs should follow. Folks who need more bytes should invent their own type, or set socket attributes like everyone else.


Why not just use a TLV-style structure?


That's essentially what it is.

You can determine the size by looking at sa_family.

Some platforms also have an sa_len field.

It's still unsafe if you allocate for the "base" type of struct sockaddr then try to use it for something larger, but that generally is not done. People usually allocate for the exact type they want, and only pointers to the structure are passed around, often opaquely.
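
A sketch of that "the family is the tag" idea (only a few families shown; for AF_UNIX the useful length actually depends on the path, so this returns the maximum):

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <sys/un.h>

    /* The address family plays the role of the TLV "type"; the length is
     * implied by it (or carried alongside as a socklen_t, as the socket
     * calls already do). */
    static socklen_t sockaddr_size(const struct sockaddr *sa)
    {
        switch (sa->sa_family) {
        case AF_INET:  return sizeof(struct sockaddr_in);
        case AF_INET6: return sizeof(struct sockaddr_in6);
        case AF_UNIX:  return sizeof(struct sockaddr_un);
        default:       return 0;  /* unknown family */
        }
    }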



