That being said, scp-the-protocol is actually very simple. There is no spec for it, only a number of interoperable implementations, but the protocol really is dead simple (it basically goes "file <length of name> <name> <size>, write <length> <data>, write <length> <data>", and so on). It achieves good throughput over SSH for large files, but because every file involves a few ping-pongs, it is RTT-bound for small files.
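To give a sense of how little there is to it, here's a hedged sketch of the file-announcement framing (the function names are mine; the real OpenSSH scp also exchanges a `\0` ack byte after the header and after the file data, which I've left out):

```python
# Sketch of the scp "announce a file" framing. Names are mine, and the
# '\0' ack handshake between each step is omitted for brevity.

def scp_file_header(mode: int, size: int, name: str) -> bytes:
    # A file is announced as "C<mode> <size> <name>\n", followed by
    # exactly <size> raw bytes of file content.
    return f"C{mode:04o} {size} {name}\n".encode()

def parse_scp_header(header: bytes) -> tuple[str, int, int, str]:
    # 'C' = file, 'D' = enter directory, 'E' = leave directory.
    kind = chr(header[0])
    mode, size, name = header[1:].decode().rstrip("\n").split(" ", 2)
    return kind, int(mode, 8), int(size), name
```

Round-tripping a header shows the whole "spec": `parse_scp_header(scp_file_header(0o644, 5, "hi.txt"))` gives back `("C", 0o644, 5, "hi.txt")`.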
SFTP is much, much more complicated. And the spec situation is much worse, because there are like a dozen drafts and half a dozen different versions of the protocol. SFTP also pulls in half of the POSIX file semantics. A naive SFTP client is RTT-bound for throughput: the read size is limited to 64K in OpenSSH, so with a 20 ms RTT you're only going to get at most ~3 MB/s.
SFTP is essentially NFS, but over a single SSH channel (and different in the details). You ask for file handles, and then you can issue requests against those handles. You can opendir() remotely, get a directory handle, and so on.
Like NFS, SFTP supports having multiple requests in flight (how many is implementation-defined, with no way to find out), so you can issue multiple reads concurrently to get around the 64K limitation. The problem: the maximum read size is also implementation-defined, with no way to find out, which makes this really quite complex, since you have to account for reads coming back out of order and for reads being shorter than requested without having reached EOF.

Say you want to transfer a 500K file in 256K chunks: you schedule two reads, one of 256K and one of 500K-256K = 244K. Call them r0 and r1. Now r1 comes back, but it only read 64K (or 8K, or 16K, or whatever the implementation felt like). You need to figure out that (1) you should hold this data back, because the data before the offset of r1 has not been read yet; (2) you need to issue another read for the contents from 320K to 500K; and (3) the implementation probably only does 64K reads (note: the SFTP read request length field is 32 bits... expectations and all), so you get smart and schedule a few more reads: r2 for 320-384K, r3 for 384-448K, and r4 for 448-500K. Now you wait for the responses and get, say, r3, r4, r1, r2. You need to hold all this data and shuffle it around correctly, then write it in order to the file (assuming you want to write the file sequentially, which is very reasonable if you want any chance at all of resuming the transfer).
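The hold-back-and-reorder bookkeeping above can be sketched in a few lines (a minimal illustration, not a real SFTP client; the class name is mine, and re-issuing follow-up reads for short replies is left to the caller):

```python
class SequentialReassembler:
    """Accepts (offset, data) read replies in any order and emits bytes
    strictly in file order, parking out-of-order replies until the gap
    before them is filled."""

    def __init__(self, sink):
        self.sink = sink          # called with each in-order chunk
        self.next_offset = 0      # next byte offset we may write
        self.pending = {}         # offset -> data, held until contiguous

    def on_read_reply(self, offset: int, data: bytes) -> None:
        # A short reply is accepted as-is; the caller must notice that
        # offset + len(data) is short of the requested end and schedule
        # a fresh read starting there.
        self.pending[offset] = data
        while self.next_offset in self.pending:
            chunk = self.pending.pop(self.next_offset)
            self.sink(chunk)
            self.next_offset += len(chunk)
```

Feeding it the r1-before-r0 case from the example: a reply at offset 3 is held back until the reply at offset 0 arrives, after which both are flushed in order.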
This is on top of SSH already having throughput issues in the basic protocol over long fat networks.
What scenarios are you talking about where chunks are important and you have to be concerned about ordering? Is this strictly for applications that perform large syncing jobs where to-the-limit performance is important?
It doesn't seem like a huge deal to deprecate scp and start using a short stanza of sftp for simple file transfers.
Is that why I've sometimes observed slower-than-expected transfers when using rsync over ssh to do a mass migration of server data from one data center to another? Can you recommend an alternative (besides writing the data to external media and physically shipping it)?
Also, make sure TCP window scaling is working. I was making transfers through an F5 Big-IP that was running a profile which disabled it.
If SCP/SFTP is the problem because of lots of small-ish files, use a tarpipe instead, something like `tar cf - . | ssh host 'tar xf - -C /dest'`: everything streams over a single pipe with no per-file round trips. Nothing beats tarpipes for small files.