In the same vein, protecting your SSH server with spiped[1] does 99% of the job. (i.e. no need to set up fail2ban, password auth is not a big deal anymore, it protects against out-of-date SSH servers and/or zero-day exploits, ...)
spiped looks like netcat with symmetric encryption. If your SSH server has password auth disabled, then all you're doing is moving the attack surface from one thing to another.
You're making a trade-off no matter which way you go. spiped probably has a smaller attack surface than sshd due to being less code, but it's also less "tried and true" than openssh. Not to mention, managing symmetric keys securely is more difficult than with asymmetric openssh keys where you generally only need to copy around the public key.
OpenSSH is secure enough to be exposed to the public internet as long as you keep it up to date and don't misconfigure it. But if you have a strong reason not to make it public, then I feel that something like WireGuard is really a better way to go.
> spiped looks like netcat with symmetric encryption
True. But to be more specific, it does symmetric encryption and authentication.
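To make the "authentication" half concrete, here is a minimal sketch of authenticating a message with a 256-bit pre-shared key, using HMAC-SHA256 from the Python standard library. This is not spiped's wire protocol; the helper names `tag` and `verify` are made up for illustration.

```python
import hashlib
import hmac
import secrets

# A 256-bit pre-shared key; both ends must hold the same bytes.
key = secrets.token_bytes(32)

def tag(key: bytes, message: bytes) -> bytes:
    """Return an HMAC-SHA256 tag over message (hypothetical helper)."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Constant-time check that received_tag matches message under key."""
    return hmac.compare_digest(tag(key, message), received_tag)

msg = b"hello from the client"
t = tag(key, msg)

print(verify(key, msg, t))                      # same key, same message
print(verify(secrets.token_bytes(32), msg, t))  # a peer without the key
```

Anyone who doesn't hold the key can't produce a valid tag, so their traffic can be dropped before it ever reaches the service behind it.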
> If your SSH server has password auth disabled, then all you're doing is moving the attack surface from one thing to another.
I get what you're saying. But I see spiped as port-knocking with a 256-bit combination. So basically, you are reducing the attack surface. In order for the attacker to get through, they need a vulnerability in spiped and in openssh-server. (If these probabilities are 50-50 each and independent of each other, the overall probability is 0.25.)
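The arithmetic here, as a quick sketch. It assumes the two vulnerabilities are independent events, which is an assumption rather than something anyone has measured, and the input numbers are purely illustrative.

```python
def p_both(p_spiped: float, p_sshd: float) -> float:
    """Probability that both layers are exploitable, assuming independence."""
    return p_spiped * p_sshd

# The 50-50 example from the comment.
print(p_both(0.5, 0.5))  # 0.25

# Illustrative smaller per-layer probabilities (made-up numbers).
print(p_both(0.01, 0.05))
```

If the two bugs are correlated (say, both daemons link the same vulnerable library), the combined probability is higher than the product.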
At the end of the day, spiped should run in a chroot, as an unprivileged user, so the attack surface of spiped is really low. If it gets compromised, the only thing the attacker can do is "be able to try to establish a connection to the SSH server".
The goal of spiped for me is to eliminate the need for constant monitoring of openssh vulnerabilities, and for installing fail2ban/blacklistd (which can lock legitimate users out).
Sort of. Not really. spiped operates at the level of individual stream connections, so you can e.g. make one end a local socket in a filesystem and use UNIX permissions to control access to it.
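A minimal sketch of the "local socket plus UNIX permissions" idea, in plain Python with no spiped involved. The socket path is hypothetical, and this only shows the listening side restricting who may connect.

```python
import os
import socket
import stat
import tempfile

# Hypothetical location; a real deployment would pick a fixed path.
sock_dir = tempfile.mkdtemp()
sock_path = os.path.join(sock_dir, "service.sock")

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)  # creates the socket file in the filesystem

# Owner-only read/write. On Linux, connecting to a UNIX socket requires
# write permission on the socket file, so other users are locked out.
os.chmod(sock_path, 0o600)
server.listen(1)

mode = stat.S_IMODE(os.stat(sock_path).st_mode)
print(oct(mode))
```

Whatever sits on the other side of that socket (a local daemon, or a pipe to a remote one) is then reachable only by processes the filesystem permissions allow.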
In fact that's exactly why I wrote it -- so I could have a set of daemons designed to communicate via local sockets and transparently (aside from performance) have them running on different systems.
Is it possible to use tarsnap's deduplication code on my own server? We're setting up an ML dataset distribution box, and I was hoping to avoid storing e.g. imagenet as a tarball + untar'd (so that nginx can serve each photo individually) + imagenet in TFDS format.
Has anyone made an interface to tarsnap's tarball dedup code? A python wrapper around the block dedup code would be ideal, but I doubt it exists.
(Sorry for the random question -- I was just hoping for a standalone library along the lines of tarsnap's "filesystem block database" APIs. I thought about emailing this to you instead, but I'm crossing my fingers that some random HN'er might know. I'm sort of surprised that filesystems don't make it effortless. In fact, I delayed posting this for an hour to go research whether ZFS is the actual solution -- apparently "no, not unless you have specific brands of SSDs: https://www.truenas.com/community/resources/my-experiments-i..." which rules out my non-SSD 64TB Hetzner server. But like, dropbox solved this problem a decade ago -- isn't there something similar by now?)
EDIT: How timely -- Wyng (https://news.ycombinator.com/item?id=28537761) was just submitted a few hours ago. It seems to support "Data deduplication," though I wonder if it's block-level or file-level dedup. Tarsnap's block dedup is basically flawless, so I'm keen to find something that closely matches it.
True, but a couple years ago I ported most of the Tarsnap dedup algorithms to Python. It wasn't too hard, just time consuming. I was hoping someone else did that in a thorough way, but I guess the intersection of "I love tarsnap's design!" and "I have the time to port it from C!" might not be too large.
> Redistribution and use in source and binary forms, without modification,
> is permitted for the sole purpose of using the "tarsnap" backup service
> provided by Tarsnap Backup Inc.
The codebase is a jewel. I love the design, the way it's organized, the coding style, the algorithms, everything.
Then I started making a mental map of tarsnap: How does it build its deduplication index? How does it decide where block boundaries start within a file? Etc.
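For readers who haven't seen the general technique behind those two questions, here is a minimal sketch of content-defined chunking with a hash-keyed block index. It is a generic illustration, not a port of tarsnap's algorithms: the rolling hash, the constants (`WINDOW`, `MASK`, `BASE`, `MOD`), and the function names are all assumptions chosen for brevity.

```python
import hashlib
import random

WINDOW = 16            # bytes covered by the rolling hash
MASK = (1 << 6) - 1    # cut when the low 6 bits are zero
BASE = 257
MOD = (1 << 31) - 1

def chunks(data: bytes):
    """Yield content-defined chunks of data using a polynomial rolling hash."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    h = 0
    start = 0
    for i, b in enumerate(data):
        if i - start >= WINDOW:
            # Drop the oldest byte so h covers only the last WINDOW bytes.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        # A boundary depends only on the last WINDOW bytes of content.
        if i - start + 1 >= WINDOW and (h & MASK) == 0:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]

def store(index: dict, data: bytes) -> int:
    """Add data's chunks to index (SHA-256 digest -> chunk); return how many were new."""
    new = 0
    for c in chunks(data):
        digest = hashlib.sha256(c).digest()
        if digest not in index:
            index[digest] = c
            new += 1
    return new

rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(1 << 16))

index = {}
first = store(index, data)
again = store(index, data)           # identical input: nothing new to store
shifted = store(index, b"X" + data)  # one byte inserted at the front
print(first, again, shifted)
```

Because boundaries are chosen from the content and not from fixed offsets, inserting a byte near the start typically disturbs only the first chunk or so; the rest hash to blocks that are already in the index.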
Eventually I started coding the algorithms in Python, mostly as a way of understanding the code. It's not actually as hard as it sounds, but you have to be rigorous. (It's a C -> Python conversion, after all, so there's not much room for error.)
My process was basically: Copy the C code into a Python file; comment out the code; for each line, write the corresponding Python; try to get something running as quickly as possible.
It worked pretty well, but I eventually lost interest.
Over the years, I've wanted a deduplication library, and 2021 is no exception. Someday I'll just roll up my sleeves and finish porting it.
OpenVPN may have its issues (complicated setup vs. e.g. Wireguard, but not vs. e.g. IPsec), but I wouldn’t call it “not good” and it predates spiped by a decade.
Ok. I don’t agree there. What I’ve heard from security experts is that WireGuard is vastly superior to OpenVPN.
Addendum: OpenVPN was released in 2001 and there were lots of cryptography-related systems from that era that certainly didn’t age well – IMO OpenVPN is one of those examples.
OpenVPN's encryption is just TLS. It uses OpenSSL for this, not rolling their own implementation. Yes, there are parts of SSL/TLS that haven't aged well, but... it's good enough for the world's web traffic.
> security experts is that WireGuard is vastly superior to OpenVPN
Superior doesn’t imply the other is “not good”.
> lots of cryptography-related systems from that era that certainly didn’t age well
This doesn’t really mean anything.
> IMO OpenVPN is one of those examples
That’s your opinion, but so far you’ve given no evidence.
As the other commenter said: OpenVPN is just TLS via OpenSSL. Yes, at some points it has used now-insecure algorithms, but so have web browsers and most everything else. One wouldn’t configure OpenVPN today the way they did in 2001.
Not that it necessarily means much, but AWS Client VPN is just OpenVPN. AWS, GCP, & Azure all support IPsec VPN which dates back to the ’90s. Just because something has been around for a long time doesn’t mean it hasn’t evolved its cryptography at all.
[1] https://www.tarsnap.com/spiped.html