

Scrap the SCP. How to copy data fast using pigz and nc - pmoriarty
http://intermediatesql.com/linux/scrap-the-scp-how-to-copy-data-fast-using-pigz-and-nc/

======
drinchev
I don't get it.

    
    
        nc -l 8888
    

Doesn't do any encryption, right?

If the answer is yes, then why are they compared here? Anyone on a public
network might sniff what you are actually sending / receiving.
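
To make the comparison concrete, the unencrypted pipeline the article builds looks roughly like this (hostnames, port, and paths are placeholders):

```shell
# On the receiving host: listen on a port, decompress in parallel, unpack
nc -l 8888 | pigz -d | tar xf - -C /restore/path

# On the sending host: pack, compress on all cores, stream to the receiver
tar cf - /data/path | pigz | nc receiver.example.com 8888
```

Nothing in this pipeline authenticates or encrypts, which is exactly the concern raised above.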

~~~
13throwaway
You could try piping it through ssh, though I don't know how that would affect
the speed.

    
    
        tar -cf - /u02/databases/mydb/data_file-1.dbf | pigz | ssh user@destination "pigz -d | tar xf - -C /"
    

~~~
drinchev
Yeah, but that needs commands executed on both hosts. SCP works on top of SSH
and only needs sshd running on the remote host and access to that machine.

All those solutions (from the OP) are faster, but they need some setup before
you can use them remotely from one host, and in the end you might end up
using good old SMB or NFS, which will give you speed (
[http://www.linuxquestions.org/questions/linux-
networking-3/s...](http://www.linuxquestions.org/questions/linux-
networking-3/smb-vs-nfs-performance-4175502751/) )

------
gpsarakis
Thumbs up for pv, which shows a progress bar. Using parallel compression may
have a serious impact on CPU resources, so it needs to be balanced.

Have you also done any testing with ssh + gzip?

Also, as you note at the end, the security concerns are not trivial.

------
huskyr
I wonder how this would compare to rsync using the '-z' (compress) option.

~~~
inyourtenement
Speaking generally, rsync is faster than straight copying if the files
partially exist in the destination first. If they don't exist, or they've
changed completely, rsync is slower due to the checksumming on both sides.

~~~
buster
There is no checksumming if the file doesn't exist.

------
moe
Looks like he reinvented _scp -C_?

~~~
gpsarakis
pigz does parallel compression, taking advantage of multiple cores. I am not
really sure if you can achieve this with scp.

~~~
reidrac
You can use pigz with SSH, since you can pipe commands over SSH (google it).
If nc is faster than scp, I guess encryption is a factor, but then they're
not comparable solutions to the same problem.

~~~
gpsarakis
Sure you can :). Naturally transfers via SSH "suffer" from the encryption
overhead but prevent MITM/network sniffing etc. The author points out that
(only) in a trusted LAN you could use this solution to make things go faster.

I guess the title should be a little milder - scp isn't going away, nor is
rsync via SSH for that matter.

------
aktau
I used pigz for compressing and then rsync for transferring my (Postgre)SQL
logs. The reason is that pigz has a built-in "rsyncable" mode, which lets
rsync avoid resending the whole log every time. The multicore compression is
just icing on the cake. gzip has a patch for an "rsyncable" mode, but it's no
longer properly included in the distros I use (Debian).

At some point, the only way to copy data faster is to not copy it at all.

I wrote a blog post about it here: [http://www.aktau.be/2014/10/23/pg-dump-
and-pigz-easy-rsyncab...](http://www.aktau.be/2014/10/23/pg-dump-and-pigz-
easy-rsyncable-backups-with-postgresql/)

------
adekok
The problem is that SSH uses a terrible buffering algorithm. This fixes it:

[http://www.psc.edu/index.php/hpn-ssh](http://www.psc.edu/index.php/hpn-ssh)

It's able to consistently fill 1G ethernet links.

~~~
dmm
> a terrible buffering algorithm

The main difference now is that hpn tweaks the TCP window to function better
on high-latency links.

> It's able to consistently fill 1G ethernet links.

Between two stock Debian sid laptops:

    
    
        dd if=/dev/zero bs=1048576 count=4096 | ssh <otherhost> dd of=/dev/null
        8388608+0 records in
        8388608+0 records out
        4294967296 bytes (4.3 GB) copied, 37.0848 s, 116 MB/s
    

That's over 90% of the theoretical bandwidth, right?

If you're really transferring huge files across fat WAN links, why not use
GridFTP like the supercomputing centers do? They transfer terabytes daily.

I wouldn't mess with patching openssh, especially with patches for old
versions of openssh.

------
zobzu
xz + scp will still be faster than this. It's not about the speed at which
scp goes; it's that the author doesn't compress before sending over scp, but
does when he sends over nc and whatnot...

heck you can just use rsync over SSH for simplicity.

~~~
wyldfire
> xz + scp will still be faster than this

That's not the case if the data you're sending is already compressed. But
you're right that it's not a fair comparison. scp/ssh also has integrated
compression with "-C".

~~~
zobzu
gzip:

    
    
        tar -cz /home/me/source/directory | ssh target tar -xz --directory /home/you/target/directory
    

xz:

    
    
        tar -cJ /home/me/source/directory | ssh target tar -xJ --directory /home/you/target/directory
    

ssh -C won't be as good. In particular, with a recent xz that's threaded, it
will probably be faster than the OP's solution given a good CPU.
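
With an xz new enough to have threading, the pipeline might become something like this (`-T0` is the threaded mode being referred to, using all available cores; host and paths are placeholders):

```shell
tar -cf - /home/me/source/directory | xz -T0 | ssh target "xz -d | tar -xf - -C /home/you/target/directory"
```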

------
inyourtenement
One good alternative I've used is socketpipe[0]. It handles logging into the
remote server, setting up a listener, and then piping the data over.

But, that tunnel still isn't encrypted. So, either encrypt your data first,
or set up a VPN tunnel. The inefficiency of scp isn't just due to the
encryption; there's overhead in the SSH protocol itself.

0
[http://freecode.com/projects/socketpipe](http://freecode.com/projects/socketpipe)

~~~
devonkim
A VPN setup seems like overkill. You can run stunnel and point your daemon
back through the stunnel-configured connection. It's not quite as easy to use
offhand as scp, but it's certainly not as awkward as a random series of
OpenSSL commands either, and it'll be easier to understand later what you
were trying to do on the production machine.

------
lxfontes
Combine this with docker export / import and you got yourself an easy way to
copy images between servers :)

If you still have the setup up and running, throw this in the mix:

    
    
      rsync -a -e 'ssh -c arcfour' source dest:destpath
    

and

    
    
      rsync -az -e 'ssh -c arcfour' source dest:destpath
    

still encrypted! The latter enables compression, but I've found it slower
when dealing with many small files (e.g. source code files).

~~~
zobzu
arcfour is arguably close enough to not encrypting these days ;)

------
kbar13
Maybe you can set up stunnel between the two hosts for SSL goodness, point
netcat to the stunnel port, and call it a day.
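
A sketch of what that setup might look like; the ports, hostname, and certificate path are illustrative, not from the article:

```ini
; receiver.conf (listening side): TLS in on 9999, plain to local nc on 8888
cert = /etc/stunnel/stunnel.pem
[recv]
accept = 9999
connect = 127.0.0.1:8888

; sender.conf (sending side): plain in on local 8888, TLS out to the receiver
client = yes
[send]
accept = 127.0.0.1:8888
connect = receiver.example.com:9999
```

Run `nc -l 8888` behind the receiving stunnel, and point the sending pipeline at `nc 127.0.0.1 8888` on the other side.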

------
spydum
Surprised to see nc slower than SSH on network throughput. All of the
author's benefits came from compressing the payload first. Curious whether
enabling compression at the highest level for SSH would have helped, or, as
others suggested, piping pigz over the SSH connection itself.

------
wyldfire
scp's beauty is in its ubiquity. That host over there that I want to copy to?
I need no extra configuration there because sshd is already running.

I agree that scp is dog slow and it can be pretty frustrating. As others have
mentioned, cranking down the cipher provides a modest improvement. Ad-hoc
netcat based solutions like this one are the way to go if you need throughput
and not security.

------
nicois
xz supports multiple threads, if CPU is no object. If you want to stream the
results, why not use ssh's -L option to create a secure tunnel, particularly
if the required bandwidth is not high.

~~~
krzyk
Which version of xz started supporting multiple threads?

The latest version from Debian testing (yes, I know it is not cutting edge)
says in the man page: "Multithreaded compression and decompression are not
implemented yet, so this option has no effect for now."

~~~
zobzu
They have had it in the git repo for quite some time now (months, I think),
but distros generally don't have it yet.

