What is "proto" in your tuple? If we're talking about TCP, then it's always TCP,...

toast0 · on March 20, 2023

Well, when one says TCP/IP, they usually mean to include UDP and ICMP. Although ICMP doesn't have ports, so managing state is different.

UDP and TCP both use 4-tuples with the same information, so even though I think it's more common to have a separate table for UDP and TCP, you can conceptually consider it a 5-tuple. It's all a conceptual model, but I'd put protocol up front, {tcp, RemoteIP, LocalIP, RemotePort, LocalPort}, {udp, RemoteIP, LocalIP, RemotePort, LocalPort}, {unix, Path}, {netbios, IDontRememberHowItsAddressed}, {icmp, SomethingConfusing}, etc. If you can't handle multiple-arity tuples, you could make a nested 2-tuple for tcp and udp, like {tcp, {RemoteIP ... }}. It's all just conceptual notation though, so there are tons of ways to do it (you'll see I differ in both names and ordering compared to the other commenters, but that's not actually significant either).
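To make the notation a bit more concrete, here is a rough Python sketch of protocol-tagged keys of different arity living in one table. The class names (`InetFlow`, `UnixFlow`), the `nested` helper, and the example addresses and socket path are all made up for illustration; this is not how any particular stack stores its state.

```python
from typing import NamedTuple

class InetFlow(NamedTuple):
    proto: str          # "tcp" or "udp"
    remote_ip: str
    local_ip: str
    remote_port: int
    local_port: int

class UnixFlow(NamedTuple):
    proto: str          # always "unix" in this sketch
    path: str

# One table keyed by tuples of different arity. The leading protocol tag keeps
# a TCP flow and a UDP flow with identical addresses and ports distinct.
connections = {}
tcp_key = InetFlow("tcp", "203.0.113.7", "192.0.2.10", 51000, 443)
udp_key = InetFlow("udp", "203.0.113.7", "192.0.2.10", 51000, 443)
unix_key = UnixFlow("unix", "/run/example.sock")
connections[tcp_key] = "tcp state"
connections[udp_key] = "udp state"
connections[unix_key] = "unix state"

def nested(key: InetFlow) -> tuple:
    """Nested 2-tuple form for systems that want a fixed arity: (proto, (rest...))."""
    return (key.proto, tuple(key[1:]))
```

The nested form carries exactly the same information; it only moves the variable-length part one level down.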

MuffinFlavored · on March 20, 2023

> {tcp, RemoteIP, LocalIP, RemotePort, LocalPort}

Does the concept of Remote/Local IP have to do with NAT, or does it only get introduced when you discuss NAT?

justsomehnguy · on March 21, 2023

For the endpoint there is no difference (because Remote* are NATed and the endpoint sees them pointed at the router, not the original source[0]), but for the router performing the NAT it matters, and usually it's Reply*.

[0] Depends on the NAT type: SNAT always rewrites SourceIP (because otherwise the far system wouldn't know where to reply[1]), DNAT usually rewrites DestinationIP (because a system wouldn't reply to a received packet addressed to an IP which doesn't exist on that system).

[1] That's why NAT is not a security boundary: it's not trivial, but you can trigger a response for some system behind the NAT by writing a local (to that system) IP in SrcIP.
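A toy illustration of which header fields each NAT type touches, assuming a simplified `Packet` record (hypothetical, no real headers or checksums) and documentation-range example addresses:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

def snat(pkt: Packet, public_ip: str, public_port: int) -> Packet:
    """Source NAT: rewrite the source so the far system replies to the router."""
    return replace(pkt, src_ip=public_ip, src_port=public_port)

def dnat(pkt: Packet, internal_ip: str, internal_port: int) -> Packet:
    """Destination NAT: rewrite the destination to an address the inner host owns."""
    return replace(pkt, dst_ip=internal_ip, dst_port=internal_port)

# Outbound traffic from a host behind the router.
outbound = Packet("192.168.1.20", 40000, "198.51.100.5", 80)
on_the_wire = snat(outbound, "203.0.113.1", 61000)

# Inbound traffic hitting a forwarded port on the router.
inbound = Packet("198.51.100.5", 52000, "203.0.113.1", 8080)
delivered = dnat(inbound, "192.168.1.20", 80)
```

In both cases the untouched half of the packet passes through as-is, which is why the endpoint only ever sees the router's address on the rewritten side.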

toast0 · on March 20, 2023

I would use Remote and Local for host networking first of all; rather than src/dest, because when you send you're the src, and when you receive, you're the dst... you don't want to include both permutations in the table (unless you're both the source and the destination, ie: connecting to yourself).
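A small sketch of that idea, assuming a hypothetical `host_key` helper that is told whether the host is sending or receiving: both directions of one connection collapse to the same table key.

```python
def host_key(proto, src, dst, sending):
    """Build a (proto, remote_ip, local_ip, remote_port, local_port) key.

    src and dst are (ip, port) pairs as they appear in the packet; `sending`
    says whether this host is the packet's source.
    """
    local, remote = (src, dst) if sending else (dst, src)
    return (proto, remote[0], local[0], remote[1], local[1])

# The packet we send and the reply we receive produce the same key.
sent = host_key("tcp", ("192.0.2.10", 51000), ("203.0.113.7", 443), sending=True)
received = host_key("tcp", ("203.0.113.7", 443), ("192.0.2.10", 51000), sending=False)
```

With src/dst keys you would need two entries (or a reversal step on every lookup) to match both directions.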

For NAT, you need to have a way to calculate the 5-tuple for SideA when you have a 5-tuple from SideB, and vice versa; most often, that'll be a table lookup, either for the whole 5-tuple, or for 1:1 NAT, it could just be a lookup for the "Local" IP. In that case, maybe src and dest make more sense, and the NAT isn't really Local in my book.
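Roughly, the two lookup styles could look like this. `NatTable`, `reverse`, `ONE_TO_ONE`, and the addresses are invented for the sketch, and the 5-tuples here use src/dst ordering, (proto, src_ip, src_port, dst_ip, dst_port), since the NAT box is neither end of the flow.

```python
def reverse(flow):
    """Swap source and destination, giving the reply direction of a flow."""
    proto, src_ip, src_port, dst_ip, dst_port = flow
    return (proto, dst_ip, dst_port, src_ip, src_port)

class NatTable:
    """Whole-5-tuple lookup in both directions."""

    def __init__(self):
        self.inside_to_outside = {}
        self.outside_to_inside = {}

    def add(self, inside, outside):
        self.inside_to_outside[inside] = outside
        self.outside_to_inside[outside] = inside

    def outbound(self, pkt):
        return self.inside_to_outside[pkt]

    def inbound(self, pkt):
        # Replies arrive with src/dst swapped relative to the stored mapping.
        return reverse(self.outside_to_inside[reverse(pkt)])

# 1:1 NAT: only the inside host's IP needs a lookup; ports pass through.
ONE_TO_ONE = {"192.168.1.20": "203.0.113.20"}

def one_to_one_outbound(pkt):
    proto, src_ip, src_port, dst_ip, dst_port = pkt
    return (proto, ONE_TO_ONE[src_ip], src_port, dst_ip, dst_port)

nat = NatTable()
inside = ("tcp", "192.168.1.20", 40000, "198.51.100.5", 80)
outside = ("tcp", "203.0.113.1", 61000, "198.51.100.5", 80)
nat.add(inside, outside)
reply = ("tcp", "198.51.100.5", 80, "203.0.113.1", 61000)
```

A real NAT would also create entries on the first outbound packet and expire them; that is left out here.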

wyldfire · on March 20, 2023

Indeed - I fudged things a bit by talking about TCP there. It would have been clearer if I just discussed IP instead.

> TCP and UDP just have completely separate spaces of socket addresses

But so do SCTP, ICMP, IGMP, and ... -- so rather than enumerate the protocols we can just describe this property of IP.

derefr · on March 20, 2023

Yeah, but for ICMP and so forth, ports aren't a thing. So you don't really have a universal 5-tuple.

For a router or other middlebox, or an OS kernel, to do things like outbound-initiated-flow firewall-rule exceptions correctly, it must keep N different flow-state tables, one per transport-layer (L4) protocol; where each flow-state table's "primary key" is over a set of columns unique to that table / L4 protocol.

TCP and UDP just happen to be both the best-known L4 protocols, and to both use {srcIP, srcPort, dstIP, dstPort} as their "primary key" for flows; but this doesn't hold for other L4 protocols.
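As a sketch of "one table per L4 protocol, each with its own primary key": the names below are hypothetical, packets are plain dicts, and keying ICMP echo flows on the echo identifier is just one plausible choice picked for the example.

```python
from collections import namedtuple

IPPROTO_ICMP, IPPROTO_TCP, IPPROTO_UDP = 1, 6, 17

PortKey = namedtuple("PortKey", "src_ip src_port dst_ip dst_port")
IcmpEchoKey = namedtuple("IcmpEchoKey", "src_ip dst_ip identifier")

def port_key(pkt):
    return PortKey(pkt["src_ip"], pkt["src_port"], pkt["dst_ip"], pkt["dst_port"])

def icmp_echo_key(pkt):
    return IcmpEchoKey(pkt["src_ip"], pkt["dst_ip"], pkt["icmp_id"])

# Each known protocol brings its own key schema.
KEY_BUILDERS = {
    IPPROTO_TCP: port_key,
    IPPROTO_UDP: port_key,
    IPPROTO_ICMP: icmp_echo_key,
}

# One flow-state table per protocol; values here are just packet counts.
flow_tables = {proto: {} for proto in KEY_BUILDERS}

def track(proto, pkt):
    """Record a packet in its protocol's table; None if the protocol is unknown."""
    builder = KEY_BUILDERS.get(proto)
    if builder is None:
        return None
    key = builder(pkt)
    flow_tables[proto][key] = flow_tables[proto].get(key, 0) + 1
    return key
```

An unknown protocol number falls straight through `track`, because nothing in the table knows which columns would identify its flows.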

(Which is in turn why L4 protocols "must" be handled in kernel-land, for kernel firewalls, traffic-shapers, etc. to work: L4 flow-state doesn't have a universal schema for these services to work with; and because these services are implemented in statically compiled languages, they have to be built with compile-time knowledge of each known L4 protocol, so that they can have concrete implementations for each L4 protocol written or generated for each service. There's no way to just bring in (through some hypothetical FUSE-like "userland L4 protocol server" abstraction) more L4 protocols, and expect those kernel facilities to work with them. [And all the same goes for ASICs in L4 network routers, only more so.] Which is why we got the L4 protocol ossification we did. Modern protocols like QUIC being built on top of UDP, and SCTP so often being tunneled over UDP instead of running directly over IP, is a direct result of there being no universal 5-tuple!)

richardwhiuk · on March 20, 2023

You can handle L4 protocols in userspace. You can bind to a particular IP protocol number. You can even handle L3 in userspace.

Obviously if you do this you lose the ability for multiple applications to handle different "ports", unless you do the multiplexing in userspace as well.
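A hedged sketch of what that could look like in Python. Opening a raw socket generally needs root or CAP_NET_RAW, so `open_raw` is only defined here, not called. The 2-byte "port" prefix that `PortDemux` reads is an invented header for a made-up protocol, and 253 is one of the protocol numbers set aside for experimentation.

```python
import socket
import struct

EXPERIMENTAL_PROTO = 253  # reserved for experimentation and testing

def open_raw(proto=EXPERIMENTAL_PROTO):
    """Bind to an IP protocol number; typically requires elevated privileges."""
    return socket.socket(socket.AF_INET, socket.SOCK_RAW, proto)

class PortDemux:
    """Userspace multiplexing for a hypothetical protocol with a 16-bit port prefix."""

    def __init__(self):
        self.handlers = {}

    def register(self, port, handler):
        self.handlers[port] = handler

    def dispatch(self, l4_payload):
        """Route one L4 payload (IP header already stripped) to its handler."""
        (port,) = struct.unpack("!H", l4_payload[:2])
        handler = self.handlers.get(port)
        if handler is None:
            return False
        handler(l4_payload[2:])
        return True

demux = PortDemux()
received = []
demux.register(7, received.append)
```

Every application that wants a "port" has to go through this one process, which is the multiplexing cost mentioned above.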

derefr · on March 20, 2023

You have a unique definition of "handle" that doesn't seem to include "your OS's kernel packet filter keeps working to pre-filter these packets based on an L4 understanding of them before handing them to userspace, or after being handed them by userspace."

Which, if your machine is acting as something like a router/NAT/firewall, is kind of... the entire point of the box being there in the communication path.

twic · on March 20, 2023

But the structure of those spaces can be different! The only structure IP imposes is that every packet has a source and destination address. It's up to each protocol whether it has port numbers (like TCP, UDP, and SCTP), or not (like ICMP and IGMP), or some other mechanism for identifying flows.
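To illustrate at the byte level: the IPv4 header always yields a protocol number plus source and destination addresses, and anything past that depends on the protocol. The helper names below are hypothetical, `build_ipv4` leaves the checksum at zero and only exists to produce test input, and `describe` assumes a well-formed IPv4 packet.

```python
import socket
import struct

IPV4_HEADER = "!BBHHHBBH4s4s"              # fixed 20-byte IPv4 header, no options
PROTOCOLS_WITH_PORTS = {6, 17, 132}        # TCP, UDP, SCTP

def build_ipv4(proto, src, dst, payload):
    """Assemble a minimal IPv4 packet (checksum left as 0 for this sketch)."""
    header = struct.pack(IPV4_HEADER, 0x45, 0, 20 + len(payload), 0, 0, 64,
                         proto, 0, socket.inet_aton(src), socket.inet_aton(dst))
    return header + payload

def describe(packet):
    """Return what IP guarantees, plus ports only for protocols that have them."""
    fields = struct.unpack(IPV4_HEADER, packet[:20])
    ver_ihl, proto, src, dst = fields[0], fields[6], fields[8], fields[9]
    header_len = (ver_ihl & 0x0F) * 4      # IHL is counted in 32-bit words
    info = {"proto": proto,
            "src": socket.inet_ntoa(src),
            "dst": socket.inet_ntoa(dst)}
    if proto in PROTOCOLS_WITH_PORTS:
        # TCP, UDP, and SCTP all start with source port, then destination port.
        ports = struct.unpack("!HH", packet[header_len:header_len + 4])
        info["src_port"], info["dst_port"] = ports
    return info

udp_packet = build_ipv4(17, "192.0.2.10", "203.0.113.7",
                        struct.pack("!HHHH", 5353, 53, 8, 0))
icmp_packet = build_ipv4(1, "192.0.2.10", "203.0.113.7",
                         struct.pack("!BBHHH", 8, 0, 0, 1, 1))
```

For the ICMP packet, `describe` has nothing port-like to report; a flow identifier would have to come from protocol-specific fields instead.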