
Fastcat – A Faster `cat` Implementation Using Splice - jontro
https://matthias-endler.de/2018/fastcat/
======
cojo
> "Nice, but why on earth would I want that?" I have no idea.

I know this is referring mostly to the `cat` portion and not the `splice`
portion of the article, but I'll throw in a quick shoutout to `splice` for
giving me one of the single biggest build performance wins in my time at Zynga
(and possibly across most teams at the company at the time).

We had a ruby script which ran the majority of the build, and as the game grew
we found that by far the slowest part was a loop which MD5 hashed each
individual asset and used that as its filename on our CDN for per-asset
versioning.

At its worst it was taking nearly an hour and a half; the code was basically
as inefficient as you could make it - multiple shell calls for each file
rather than any sort of inlining of the hashing process.

I wrote a basic C program using splice and an MD5 library which took the whole
process to under 10s. A bit overkill, perhaps, but the naive speedup I tried
first still took 1-2 minutes, and I figured a reduction of more than 99% was
worth the extra few hours to put it together knowing how many builds we ran
each day.

Definitely gave me a healthy appreciation for the cost of transferring to user
space that has stuck with me.

------
accrual
> In this case, if you notice that cat is the bottleneck try fcat (but first
> try to avoid cat altogether).

"Useless Use of Cat Award" [0] is the canonical text for avoiding unnecessary
use of cat, for those who haven't come across it yet.

[0]
[http://porkmail.org/era/unix/award.html](http://porkmail.org/era/unix/award.html)
(2000)

~~~
jgtrosh
Last time I posted a link to that, I received quite a few replies from people
who find it more natural to use `cat file | …` even when unnecessary. So even
though I agree with the intent of the page, I feel like it's useless to try and
evangelise every case. If cat is the bottleneck though, fair game.

~~~
e12e
First, if cat is slower than redirection from file (<file) - then I'd say
something is amiss. But more to the point - I think it's really a bug that
tools like gzip, grep, awk etc work on files at all. We do need a tool to feed
files to pipes (I think cat is a fine candidate for that), even when we only
con-cat-enate one file (the identity cat, if you will).

Maybe there are cases where a long string of awk|something|other|sort|uniq is
_not_ the problem, but forking an extra process for cat is.

And maybe there's a mismatch between pipes, files and mmap today. Splice seems
like a reasonable fix (if we splice all the things, awk, grep etc).

Finally, I just think:

    
    
      cat input.txt \
       | filter1 args \
       | filter2 args \
       | reduction \
       | output-formatter
    

reads better than having to tack on an <input.txt at the end, or special-casing
the first filter so that it also has to open a file.

~~~
Pete_D
You don't _have_ to put the redirection at the end, you can write

    
    
        < input.txt filter1 args # ... rest of pipeline
    

(But I agree that the cat version is more readable.)

~~~
e12e
True. I might not mind as much if it were possible to write, in a sane way:
"< input | (...)"

------
pantalaimon
I re-implemented it in C and for some reason O_APPEND is set on stdout by
default.

But aside from that it works just like the Rust version.

    
    
      #define _GNU_SOURCE
      #include <fcntl.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      
      #define BUF_SIZE 16384
      
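      /* Clear one status flag (e.g. O_APPEND) on fd. splice() fails with
         EINVAL when the target file is opened in append mode. */
      static void unset_flag(int fd, int flag) {
      	int flags = fcntl(fd, F_GETFL, 0);
      	if (flags < 0)
      		return;
      	flags &= ~flag;
      	fcntl(fd, F_SETFL, flags);
      }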
      
      int main(int argc, char** argv) {
      	int pipefd[2];
      	if (pipe(pipefd) < 0) {
      		perror("pipe");
      		exit(1);
      	}
      
      	unset_flag(STDOUT_FILENO, O_APPEND);
      
      	for (int i = 1; i < argc; ++i) {
      		int fd = strcmp(argv[i], "-") ? open(argv[i], O_RDONLY) : STDIN_FILENO;
      		if (fd < 0) {
      			fprintf(stderr, "%s: No such file or directory\n", argv[i]);
      			exit(1);
      		}
      
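      		/* Stop on EOF (0) and on error (-1); drain exactly what was queued. */
      		ssize_t n;
      		while ((n = splice(fd, NULL, pipefd[1], NULL, BUF_SIZE, 0)) > 0) {
      			while (n > 0) {
      				ssize_t m = splice(pipefd[0], NULL, STDOUT_FILENO, NULL, n, 0);
      				if (m <= 0)
      					exit(1);
      				n -= m;
      			}
      		}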
      
      		close(fd);
      	}
      
      	return 0;
      }
    

WTFPL if anyone cares.

~~~
pantalaimon
Turns out the buffer size is significant:
[https://imgur.com/a/f4LiHVI](https://imgur.com/a/f4LiHVI)

With 32 KiB buffers I get double the throughput of 16 KiB buffers; the peak
appears to be at 64 KiB, and after that it levels off.

~~~
jwilk
Direct link to the image:
[https://i.imgur.com/5jOb1yo.png](https://i.imgur.com/5jOb1yo.png)

------
the8472
Newer kernels also have the copy_file_range syscall (with compatibility shim
in glibc) which is supposed to use the most efficient copying approach
available between any two file descriptors. So it's more general than splice
or sendfile.

------
modells
There is a ruby gem for Linux called io_splice that does zero-copy IO. It
hasn’t been updated in a while, but that doesn’t mean it won’t work: it has no
dependencies other than a modern Linux. “Old” code that works still works;
novelty, job-securitization and API churn be damned when they don’t add value.

[https://rubygems.org/gems/io_splice/versions/4.4.0](https://rubygems.org/gems/io_splice/versions/4.4.0)

[http://www.bigfastblog.com/zero-copy-transfer-data-faster-in...](http://www.bigfastblog.com/zero-copy-transfer-data-faster-in-ruby)

EDIT: source to current stable coreutils’ cat
[http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob_p...](http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob_plain;f=src/cat.c;hb=e5dae2c6b0bcd0e4ac6e5b212688d223e2e62f79)

------
Rapzid
The most interesting thing about all this to me, other than the existence of
splice (I really should finish The Linux Programming Interface), is that you
need a pipe and two splice operations to move data between other file
types. There must be some dirty implementation detail forcing this, right?
Right?!

~~~
omn1
splice is implemented as a pipe, that's why. I think it's a beautiful design
because pipes have been around forever and they just work.

------
omn1
> Windows doesn't provide zero-copy file-to-file transfer (only file-to-socket
> transfer using the TransmitFile API).

Does anybody know if the Windows TransmitFile API can also be used to make
file-to-file copies?

~~~
Arnavion
It takes a socket handle.

------
type0
Great educational post, but really someone needs to make a long-awaited
version called long cat!

------
mcguire
Tl;dr: splice() is a Linux-only, zero-userspace-copy,
file-descriptor-to-file-descriptor copy that has to use a pipe for one FD.

Interesting, but less than earthshaking.

