Originally, an IP address identified a destination host, not an interface card; it didn't matter over which device a packet arrived. The BSD people, though, tied IP addresses to the network interface, so it did matter over which device a packet arrived. This simplified outbound routing in BSD, but broke multipath TCP.
The mistake instead is that TCP is not fully layer 4: it is entangled with layer 3. Specifically, a TCP socket is defined by an (IP, port) pair, where the IP is layer 3. As such, it is fundamentally impossible to persist a TCP connection across multiple IPs.
There is no reason for this. If instead a socket were identified as (uuid, port) then after I change my ip address, I can continue receiving packets sent to (uuid, port). And the other side will still recognize packets from me, because only my IP changed, not the connection uuid.
You'd maybe need to add some spoofing defenses, but we need those in current TCP too.
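A toy sketch of that idea in Python (all names hypothetical, nothing here is a real protocol): a demultiplexer keyed on (uuid, port) instead of (source IP, source port) keeps accepting packets for the same connection across an address change, because the key never contained the IP.

```python
import uuid

# Hypothetical connection table keyed by (connection_uuid, port).
class ConnectionTable:
    def __init__(self):
        self._conns = {}  # (uuid, port) -> per-connection state

    def register(self, conn_id, port):
        self._conns[(conn_id, port)] = {"last_seen_ip": None}

    def handle_packet(self, conn_id, port, src_ip, payload):
        state = self._conns.get((conn_id, port))
        if state is None:
            return None  # unknown connection
        # The source IP is recorded only so replies know where to go;
        # it plays no part in identifying the connection.
        state["last_seen_ip"] = src_ip
        return payload

table = ConnectionTable()
cid = uuid.uuid4()
table.register(cid, 443)

# Same connection, two different source IPs: both packets are accepted.
assert table.handle_packet(cid, 443, "192.0.2.10", b"hello") == b"hello"
assert table.handle_packet(cid, 443, "198.51.100.7", b"world") == b"world"
```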
(intended in the most lighthearted way possible!)
I'm not one to defer to authority too easily, but it's my experience that, when someone with enough experience says something that sounds out of this world, it's a good idea to think about it for a bit.
I think a lot of the effort put into QUIC, HTTP/2,3,4,whatever, etc, could have been avoided if SCTP was more broadly adopted.
Do note that SCTP is heavily used in telecom. Every network element in that world needs to be redundant, along with the multiple routes to get to each of those redundant elements. SCTP helps hide some of that from the application layers.
It's actually the quadruple (local IP, local port, remote IP, remote port) in the steady state.
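A quick way to see that quadruple from Python (loopback demo; the kernel picks the ports, and each side's view is the mirror image of the other's):

```python
import socket

# Connect a client to a listener on loopback and read the 4-tuple
# that identifies the established connection from both ends.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
listener.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())
server, _ = listener.accept()

# (local ip, local port, remote ip, remote port)
client_quad = client.getsockname() + client.getpeername()
server_quad = server.getsockname() + server.getpeername()

# The server's (remote, local) is the client's (local, remote).
assert client_quad == server_quad[2:] + server_quad[:2]

for s in (client, server, listener):
    s.close()
```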
Example: layer 3 knows nothing about ports?
Just like a phone number isn't the same thing as an SMS thread.
Just to provide my 2 cents on the issue:
There are a lot of things at play... Rekhter's law ("Addressing can follow topology or topology can follow addressing. Choose one.") doesn't combine well with the fact that it's not desirable to have end-hosts participate in routing updates. When these two are taken together, they have implications on scalability of any routing solution, especially in the cases of end-host mobility and multihoming.
When "topology follows addressing" (as in TCP/IP), combined with the constraint of not wanting to advertise the end-host address (as someone mentioned, no ISP will accept your /32 on BGP), assigning the end-host a different address per attached network is the simplest solution. This indeed boils down to assigning L3 addresses to the interface. The TCP connection problem can be handled by tunnels (as in Mobile IP or LISP), but tunnels are rather expensive since the tunnel endpoint in the network has to maintain a lot of state.
In the case of "addressing follows topology", changes in topology, given the constraint of not advertising end-hosts via routing, require renumbering the network elements.
So, when taken as a pure L3 problem, it boils down to choosing which problem to solve: constantly tracking the end-hosts, or constantly renumbering the network.
The latter, when used in combination with recursive network layering, shows quite some promise. It requires fewer addresses, but I know from experience that it's not the easiest idea to sell.
MPTCP provides a pragmatic workaround on top of L4 that doesn't require tunnels. It's a shame that it's not completely transparent to the applications (it requires explicit code changes to enable). I agree with a lot of the concerns about getting this upstream in the kernel. I tried to play around with it a couple of times, but the distribution as a full kernel was a bit of a roadblock; it would be easier if it were distributed as a kernel patch. And it seems perfectly doable to implement it in user space, which might further speed up adoption.
Anyway, take care :)
In my mind, L2 addresses are not for routing across organizations, and L3 addresses are. So the L2 address identifies an interface, and a L3 address identifies a routable entity. The weird thing is that there is an almost one-to-one mapping between these.
It might make sense for the same host on e.g. a WiFi and Ethernet interface of the same organization to have the same L3 address. After all, the organization responsible for routing to that host can know those interfaces belong to the same host.
However, once you get into multiple interfaces at disparate organizations, things change. Take, for example, a phone with LTE from some provider and WiFi from some home ISP. There are two separate organizations responsible for routing to those interfaces. Hence, the decisions needed to route to those interfaces differ. This makes routing based on the same address a lot harder.
I think my argument boils down to "topology follows addressing" being highly beneficial in our federated world of routing on the internet. It allows every autonomous network to handle internal routing however they want.
Biggest thing is to deal with really easy hijacks where you tell a server that your victim's data should instead be sent to you. This is harder with current TCP.
But that doesn't solve the problems that MPTCP solves, i.e.: (a) break-before-make failover of a TCP connection across different network paths between two hosts, and (b) combining the resources of multiple end-to-end network paths in one connection.
Because even if a host uses the same IP address for different interfaces (and therefore a TCP connection can survive failure of one interface), it's not like that IP address is going to be individually globally routable. There's no way for some router in the middle of the Internet to know that you just walked out of range of the coffeeshop Wi-Fi and are now only reachable via a commercial LTE ISP, and would like to have incoming datagrams start arriving via the LTE interface (and ISP) instead. They won't let everybody's laptop be its own one-host AS and do a BGP announcement every time it loses a Wi-Fi interface, and even if they did, it would take too long to propagate to be useful. MPTCP (and the mobility in QUIC and Mosh) solves the problem by keeping the network ignorant of the roaming and letting the connection failover to a different network path by having the end hosts address each other at different IP addresses. Similar story, I think, for aggregating network paths between two individually multihomed hosts.
The problem is then how does the network know where my IP address is. When a packet aims at my loopback, how does it work its way to me?
The answer is the same as for interface addresses - we use routing protocols for that; today that's mainly OSPF (in local networks) and BGP (globally).
Convergence takes a relatively long time for BGP, and BGP (at least at a global scale) is limited to /24s. ISPs won't accept your /32 advert.
Even if they did, traffic still only goes in one direction from a given host. On top of that, theoretically if every BGP peer ran BFD you might see failovers in a few seconds, but that's not the real world.
> The basic idea behind the separation is that the Internet architecture combines two functions, routing locators (where a client is attached to the network) and identifiers (who the client is) in one number space: the IP address. LISP supports the separation of the IPv4 and IPv6 address space following a network-based map-and-encapsulate scheme (RFC 1955). In LISP, both identifiers and locators can be IP addresses or arbitrary elements like a set of GPS coordinates or a MAC address.
BGP is not the answer to multi path devices for many many reasons. Tackling it at higher levels (OSI 4-8) is the solution.
Modern day example: If you are connected over WiFi and 3G over two different ISPs, the packet for your (probably RFC1918, statefully NATted by your home gateway) address of the WiFi interface has absolutely zero chance of arriving over 3G (which probably has a different RFC1918 address, statefully NATted in the CGN in the mobile infra). And vice versa. So strong host model vs weak host model is irrelevant in this context.
MPTCP works absolutely fine in this scenario.
MIPv6 reportedly works to roam (assuming no NAT66 in the path), but can not use two paths at once.
At least Docker (as it was pointed out by a sibling reply to my comment) also sets the FORWARD chain policy to DROP.
How would an upstream L2/L3 device be aware of this and be able to handle it? We have things like LACP now, but that requires more physical ports and chipsets that understand the protocol. In the early days of TCP a router was just another UNIX box. But even something like LACP requires multiple links to the same upstream device; how would this have worked if the box was connected to, say, two different upstream switches or routers? Today we have things like Juniper's MC-LAG and Cisco's vPC, but those are proprietary and very expensive solutions.
In Ouroboros (our recursive network implementation: https://ouroboros.rocks/), we only identify the interface once (MAC address). Addresses are contained within each recursive network layer, and identify a specific process at each layer (similar to how an IP address should actually identify the host).
e.g. either historical discussion or example of other IPv4 system which implemented things as you describe
I don't think the pre-existence of IMPs applies, since we are talking about IPv4.
And a debugger ruins everything: if the other side sends something larger than the local receive buffer, it will usually disconnect after a while, as it will sense no one on the other end.
All the things that a debugger can "ruin" should just be parametric - increase buffer sizes / keep-alive times when debugging.
Also, there are more options than "kernel" and "each process for themselves" - you could have a "network/TCP daemon". QNX successfully does that for disk drivers and file system drivers - and likely network too. So does Minix.
It's just that historically, unix/linux/NT don't.
If the "minimum" was 2 hours then e.g. MSDN wouldn't be recommending that it be configured to 5 minutes would it? https://docs.microsoft.com/en-us/previous-versions/windows/i...
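For the record, this is configurable per-socket too; a sketch on Linux (option names are Linux-specific, and the values are just illustrative, mirroring a 5-minute idle setting):

```python
import socket

# Shorten TCP keepalive from the 2-hour default on one socket.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 300)  # idle seconds before first probe
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30)  # seconds between probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # failed probes before reset

idle = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE)
s.close()
```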
> And a debugger ruins everything: if the other side sends something larger than the local receive buffer, it will usually disconnect after a while, as it will sense no one on the other end.
"Everything"? Well when data isn't being sent then the connection could live on as long as it's kept alive by the kernel, right? Whereas with a userland implementation even that possibility becomes difficult.
You seem really intent on making unfounded blanket claims to rebut my point... but I feel like there's some validity to the point I'm making? It'd be more helpful to see if you can instead find parts of it that might have some truth to them.
> It's just that historically, unix/linux/NT don't.
Yeah, hence why this approach seems problematic...
You could put BitTorrent in the kernel. It makes about as much sense architecturally, but isn't too widely used.
For example, you can have a process that reads a few bytes from a TCP socket and then passes the socket to another process.
Unix tries to model all I/O, including networking, as operations on files. Realistically it is only possible to get this right if the kernel is involved.
Of course it is quite possible to come up with different models. But Unix seems to be uniquely powerful in its ability to create complex systems from lots of small processes.
The kernel being involved does not imply that all code has to be in the kernel. The FUSE file system interface is a well known way to run filesystem code as user processes. Likewise, there are ways to run device drivers in user space.
The disadvantage is that the extra context switches cause performance loss. So this approach is used for protocols that are rarely used and do not warrant a full kernel implementation.
Boo hiss; I wish the spec had simply been to truncate overlong packets at the MTU and indicate that with a flag; the peers could then figure out what to do when a packet arrived shorter than its original length. Handling it in-band would mean it was more likely to arrive. Instead we can fragment it, which is icky, because defragmentation sucks; or we can drop it and send an out-of-band message to the sender, but that message may not make it (and often doesn't). TCP could very easily adapt to "I sent 1480 bytes of payload, but my peer is only acking 1472 each time, maybe I should send 1472" --- much easier and quicker than "I keep sending packets and they don't get acked, maybe I should try sending smaller packets" 15 seconds later.
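Tangentially, the per-connection segment size the stack actually settles on is visible from userspace via TCP_MAXSEG on Linux; a loopback demo (so the number reflects the loopback MTU rather than 1500, but it's the same accounting the 1480-vs-1472 example is about):

```python
import socket

# Establish a loopback connection and read its negotiated MSS.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())
server, _ = listener.accept()

# MSS = path MTU minus IP and TCP headers; large on loopback.
mss = client.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)

for s in (client, server, listener):
    s.close()
```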
Handle socket and port allocation, and then forward all packets to the relevant process unchanged.
The application software is then responsible for reconstructing packets into a reliable stream.
(This is partly a joke and partly a description of what actually happens with TOE "TCP Offload Engine" network cards.)
So one has to tcpdump to observe the sub flows?
> But, naturally, there are users who want their unmodified binary programs to start using MPTCP once it's available. There is a working, if inelegant, solution to this problem. A new control-group hook allows the installation of a BPF program that runs when a program calls socket(); it can change the requested protocol to IPPROTO_MPTCP and the calling application will be none the wiser.
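For applications willing to opt in explicitly, the non-BPF path the quote alludes to is just a third argument to socket(); a hedged sketch that falls back to plain TCP where the kernel (or the Python build) lacks MPTCP support:

```python
import socket

# IPPROTO_MPTCP is 262 on Linux; socket.IPPROTO_MPTCP exists from Python 3.10.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)

try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    using_mptcp = True
except OSError:
    # Kernel without CONFIG_MPTCP (or mptcp disabled): fall back to TCP.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    using_mptcp = False

# Either way the application gets an ordinary stream socket; with MPTCP
# the kernel may add subflows behind the scenes.
assert s.type == socket.SOCK_STREAM
s.close()
```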