This is a confusing claim. What exactly does "accepts arbitrary data from the public network" mean? (Most servers do that, they just choose not to process the data without additional validation.) And in what way is it critical to the functionality of the system?
Is the claim that, after calling pledge() and unveil(), the openrsync process is happy to satisfy arbitrary read/write requests from the other side of the connection, and so without them it is insecure?
Does openrsync view peer-induced memory corruption after pledge() or unveil() as a vulnerability? Or is the idea that the attacker can already "pump arbitrary data from the network onto your filesystem" and that the attacker gaining control flow is not a meaningful escalation of privileges?
My impression is that pledge() and unveil() are hardening tools, intended to limit the damage from a process that has already gotten out of control (in the same way that e.g. running Apache as non-root does not mean that you're actively fine letting attackers run code as www-data). Is that impression wrong? Is openrsync using them for the basic functionality of making sure that a file is only being rsynced to the filename given on the command line?
Typically, a process once compromised can do all sorts of things: touch files, access the network, execute programs, and so on. Among other things, OpenBSD’s security culture focuses on mitigating the damage done by compromised code through development practices such as privilege separation.
Traditionally this was done by splitting functionality into multiple processes, each serving a specific purpose such as doing network communication or parsing configuration, and dropping privileges in any way possible, such as chrooting and switching to a dedicated user. This reduces the attack surface, and with it the potential damage a compromised (sub-)process can do.
pledge() and unveil() are the latest evolution of OpenBSD's approach. pledge() whitelists the syscalls a process may make, and unveil() whitelists the parts of the filesystem it may access.
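As a rough sketch of how a program applies them (the webroot path and promise strings here are illustrative, not openrsync's actual ones; on non-OpenBSD systems the calls are compiled out so the sketch still builds):

```c
#include <unistd.h>

/* Reduce this process to read-only access under one directory.
 * A sketch under assumed paths/promises, not openrsync's real code. */
static int
drop_privileges(const char *webroot)
{
#ifdef __OpenBSD__
	if (unveil(webroot, "r") == -1)		/* whitelist one subtree, read-only */
		return -1;
	if (unveil(NULL, NULL) == -1)		/* lock the whitelist */
		return -1;
	if (pledge("stdio rpath", NULL) == -1)	/* stdio + read-only fs syscalls */
		return -1;
#else
	(void)webroot;	/* no-op on other systems */
#endif
	return 0;
}
```

A program calls something like this early in main(); from that point on (on OpenBSD), any attempt to write, open a socket, or exec another program kills the process.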
So your process reads this arbitrary data from the public network. You validate it through some function and pass the data on to the next stage of your program. But what if there’s a bug in your validator, and your process gets compromised?
If your process hasn’t had its capabilities reduced, the attacker can do practically anything, especially if the process has superuser privileges.
But if the program uses a multi‐process privilege‐separated architecture, your validation process can’t access the filesystem or the network and isn’t running as root. If it tries, the kernel will kill it for violating its pledge. All the compromised process can do is pass malicious data through whatever interface you’ve provided between your validator and filesystem processes, hopefully an interface that is simple, well‐defined, and well‐audited.
What if your filesystem process gets compromised? With pledge() it can’t access the network or execute external code. With unveil(), even its file accesses are limited to the files whitelisted earlier in the program. It can’t read your SSH keys or delete your photos.
Certainly, if the process can be compromised that’s a bug that needs to be fixed. But we see new bugs constantly in the software we use every day. It’s a safe bet to say we will encounter more. By using a secure architecture, the damage these bugs can cause is drastically reduced.
There’s a really good description and demonstration of privilege separation in another project by Kristaps, acme-client (a Let’s Encrypt/certbot alternative): https://kristaps.bsd.lv/acme-client/
Another such project is Google Chrome, which uses pledge() and unveil() on OpenBSD.
I do expect this is structured as you describe - that it has a validator, and that it uses these kernel features as additional hardening if the validator has a bug. But I would not describe that as requiring pledge() / unveil() and certainly not requiring it for functionality. So I don't know what the author means.
And I am worried, in particular, that the author means that the validator is not very strong, and the bulk of the validation is that it unveils the filesystem to the files it's supposed to write to and then blindly trusts the input, on the grounds that the worst the remote side could do is corrupt files but it could have just have sent different contents for the files in the first place. This seems unlikely to me, but I'm having trouble figuring out an alternate interpretation.
Knowing Kristaps, he probably considers strong privsep and privdrop basic functionality. That is after all why he developed acme-client in the first place; he acknowledged at the time the plethora of “lightweight” certbot alternatives but was more concerned with security architecture.
> That certainly isn't true of Chrome (I run it on non-OpenBSD and there are no functionality / security issues in doing so). That isn't true of OpenSSH, which also supports pledge() on OpenBSD and still is pretty secure on other OSes.
Chrome uses different techniques depending on the platform. On OpenBSD it uses pledge() and unveil(), while on Linux it uses seccomp. Kristaps isn’t a fan of seccomp’s complexity, as he mentions in the readme: “Linux's security facilities are a mess, and will take an expert hand to properly secure.” He’s not suggesting it can’t be done, and the Google Chrome team in particular has the kind of expertise he’s talking about.
For projects of less‐than‐Chrome scale, though, Kristaps feels that seccomp is too difficult: https://github.com/kristapsdz/acme-client-portable/blob/mast...
> And I am worried, in particular, that the author means that the validator is not very strong, and the bulk of the validation is that it unveils the filesystem to the files it's supposed to write to and then blindly trusts the input, on the grounds that the remote side could just have sent different files.
I don’t understand this interpretation. It’s not what I got from the readme at all. What kind of validation do you expect Kristaps to be overlooking?
Its license (ISC, of course) and size make it a great resource for studying rsync. I would like to have Dropbox on my phone via the legendary combination of rsync and cron. It might be nice to have a port to Java so it would work without JNI, but maybe that's just my fetish.
Tangentially related, I've been using a Time Machine-like wrapper around rsync(1) for a few years. It's very helpful for maintaining snapshots of my home directory.
Which is why bash on MacOS is from 2007.
Does anyone else know of instances where a company did a clean-room implementation of a previously FOSS tool so that they could make a paid/proprietary version? Usually it goes the other way.
It's important to remember that GNU's Not Unix, but OpenBSD userland is much more so. There isn't much reason to protect future forks if you expect that future software should start from first principles instead of extending software until it becomes a monolith that must be protected from its own developers.
The GPL does not place any restrictions on how software is used, so the (literal) users are not restricted.
It restricts how it is redistributed.
(I don't mean that as a plus or a negative, but just as a statement on one of the largest philosophical differences between the BSD-style and GPL licenses: Whose freedoms are being protected? Those of the final end user, or those of the developer?)
The GPL, being long and complicated (over 5000 words, and that’s just the GPLv3!), and with the ideological restrictions built in, is incompatible with many widely used free licenses, not least previous versions of itself. In any situation where social or legal barriers prevent the target audience from switching to the specific version of the GPL in question, any code I release under it is unusable to them.
Releasing my software under a simple, understandable, and permissive free license prevents this from ever happening.
I dislike proprietary software. I don’t use it or create it, and advocate against it wherever I can.
But given the choice between letting some Chinese featurephone developer use my code without “giving back,” and preventing swaths of the free software community I care about from using and improving my code for themselves, I will favor permissiveness every time.
It's kind of like arguing that a country where anyone can steal from anyone else with impunity is more free. Not when you consider the rights of the person being stolen from.
Also, even if a project is copylefted, people can still just do... exactly what they did here. Which, while different in the weak sense of “avoiding copyright” or maybe “avoiding patents”, in the context of systems code like this almost always results in the same code on both sides anyway. If the choice is between either giving the proprietary developers your code to use, or making them re-implement exactly what you wrote without your copy for reference—with no option for “they don’t implement it at all”—then exactly what is the point of choosing the latter over the former?
We aren't talking about non-free, we're talking about less-free.
> Just because some forks of a project aren’t free, doesn’t mean that the project itself isn’t free
No. The fact that there can be forks of the project that aren't open is what means that the project itself is less free than a project where all forks must be open.
> Especially if the project, like much of the e.g. MIT-licensed code in the world, is “done”
I don't consider this relevant to the argument at hand.
> then exactly what is the point of choosing the latter over the former?
Are you asking me what the point is of making something you don't want people to do hard for them, versus making it easy?
Additionally, "theft" as you put it, in this case, doesn't affect the original property owner.
Proclamations of rights aren't really useful if there's no means of ensuring those rights are not taken away.
(You know, since we're tossing grenades)
I hope the -c and maybe -X options make it.
I see its thoroughness as a feature, not a bug. It's very well written and I can just ignore the bits I'm not interested in. I wish more man pages were "too long" like this one is.
In that particular case the specs were reverse-engineered from actual source, but that's not a necessary part of the process. It's more common to have one team study the protocol, data going over the wire, disassembling, etc, then use the knowledge gained to write specs, and then another team implements the equivalent functionality from specifications only.
It Depends™ on how you judge.
The only thing I would change about rsync is its default, which IMO should be to copy all metadata supported by both sides. I.e., the default should be to make the destination as similar to the source as possible. Its default is to copy only the file data, and you must add options to say what else you want copied. To make matters worse, you can't just add every option, because if you ask it to copy something not supported by one side or the other it errors out. I may have missed it as I was reading the man page source, but openrsync didn't seem to change that.
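To illustrate with stock rsync (I'm assuming openrsync's defaults match, per its man page):

```shell
# Set up a demo source file with an old modification time.
mkdir -p src plain archive
echo demo > src/file
touch -t 202001010101 src/file   # give the source an old mtime

# Default: contents only. The copy's mtime is "now", not the source's.
rsync src/file plain/

# Metadata is opt-in: -t (times), -p (perms), -o/-g (owner/group),
# or the -a bundle (-rlptgoD) for "archive" behavior.
rsync -a src/file archive/
```

After this, archive/file carries the source's 2020 timestamp while plain/file is stamped with the transfer time.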
It's not often I see "open" to mean "non-GPL" in software :)
Before Git and SVN, we had CVS, and to check out code from a CVS repository you needed to have an account on the CVS server. If you wanted to contribute but didn’t already have a developer account, you were limited to writing patches against release tarballs or whatever alternative method upstream supplied.
One of OpenBSD’s major projects in the mid 90s was creating anonymous CVS, where anyone could check out code without any account. This came from Theo’s experience after losing his NetBSD account, where he found himself unable to make meaningful contributions anymore without the ability to cvs checkout, cvs diff, etc. So when he started OpenBSD, he had in mind to open up the development process to everyone, account or not.
This is described in the commentary for the OpenBSD 6.1 release song: https://www.openbsd.org/lyrics.html#61
Similarly, the stricter BSD crowd has issues with the Apache 2.0 license's clauses regarding revocation - see here: https://www.openbsd.org/policy.html