Openrsync imported into the tree (undeadly.org)
137 points by protomyth 7 days ago | 70 comments

> The actual work of porting, however, is matching the security features provided by OpenBSD's pledge(2) and unveil(2). These are critical elements to the functionality of the system. Without them, your system accepts arbitrary data from the public network. ... rsync has specific running modes for the super-user. It also pumps arbitrary data from the network onto your file-system. Do you want that running without specific mitigation in place?

This is a confusing claim. What exactly does "accepts arbitrary data from the public network" mean? (Most servers do that, they just choose not to process the data without additional validation.) And in what way is it critical to the functionality of the system?

Is the claim that, after calling pledge() and unveil(), the openrsync process is happy to satisfy arbitrary read/write requests from the other side of the connection, and so without them it is insecure?

Does openrsync view peer-induced memory corruption after pledge() or unveil() as a vulnerability? Or is the idea that the attacker can already "pump arbitrary data from the network onto your filesystem" and that the attacker gaining control flow is not a meaningful escalation of privileges?

My impression is that pledge() and unveil() are hardening tools, intended to limit the damage from a process that has already gotten out of control (in the same way that e.g. running Apache as non-root does not mean that you're actively fine letting attackers run code as www-data). Is that impression wrong? Is openrsync using them for the basic functionality of making sure that a file is only being rsynced to the filename given on the command line?

I’m trying to understand your question.

Typically, a process once compromised can do all sorts of things: touch files, access the network, execute programs, and so on. Among other things, OpenBSD’s security culture focuses on mitigating the damage done by compromised code through development practices such as privilege separation.

Traditionally this was done by splitting functionality into multiple processes, each serving a specific purpose such as doing network communication or parsing configuration, and dropping privileges in any way possible such as chrooting and switching to a dedicated user. Thus the attack surface is reduced, and the potential damage done by a compromised (sub‐)process is reduced as well.

pledge() and unveil() are the latest evolution in OpenBSD’s technique. pledge() whitelists syscalls, and unveil() whitelists the files that can be accessed.

So your process reads this arbitrary data from the public network. You validate it through some function and pass the data on to the next stage of your program. But what if there’s a bug in your validator, and your process gets compromised?

If your process hasn’t had its capabilities reduced, the attacker can do practically anything, especially if the process has superuser privileges.

But if the program uses a multi‐process privilege‐separated architecture, your validation process can’t access the filesystem or the network and isn’t running as root. If it tries, the kernel will kill it for violating its pledge. All the compromised process can do is pass malicious data through whatever interface you’ve provided between your validator and filesystem processes, hopefully an interface that is simple, well‐defined, and well‐audited.
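A rough sketch of that validator/worker split follows. It is written in Python purely for illustration and is not how openrsync (a C program) is structured; the function name `validate_in_child`, the JSON reply format, and the validation rules are all assumptions invented for this example. The idea it shows is that the parent only ever sees one small, well-defined reply from the child, never the raw input's side effects.

```python
import json
import subprocess
import sys

# Hypothetical validator program. It runs in its own process, reads untrusted
# bytes on stdin, and answers with one small JSON object on stdout.
VALIDATOR_SRC = r"""
import json, sys
data = sys.stdin.buffer.read()
# A real validator would drop its privileges at this point, before touching
# the input (on OpenBSD, for example, by calling pledge("stdio", NULL)).
ok = (
    0 < len(data) <= 64
    and data.isascii()
    and b"/" not in data
    and b"\0" not in data
    and data not in (b".", b"..")
)
json.dump({"ok": ok, "name": data.decode("ascii") if ok else None}, sys.stdout)
"""


def validate_in_child(untrusted: bytes) -> dict:
    """Hand untrusted bytes to a separate validator process and parse its reply."""
    proc = subprocess.run(
        [sys.executable, "-c", VALIDATOR_SRC],
        input=untrusted,
        capture_output=True,
        check=True,
        timeout=10,
    )
    # The only thing crossing the process boundary is this small JSON object.
    return json.loads(proc.stdout)
```

Calling `validate_in_child(b"notes.txt")` accepts the name, while `validate_in_child(b"../etc/passwd")` is rejected. If the child is compromised, the worst it can do through this interface is return a misleading reply, which is why the interface should stay small enough to audit.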

What if your filesystem process gets compromised? With pledge() it can’t access the network or execute external code. With unveil(), even its file accesses are limited to the files whitelisted earlier in the program. It can’t read your SSH keys or delete your photos.
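As a hedged illustration of what such a filesystem process might do at startup, the sketch below calls the real OpenBSD libc functions `pledge(2)` and `unveil(2)` through Python's `ctypes`. The helper name `restrict_to` and the chosen promise string are assumptions for this example; openrsync itself is written in C and its actual promises may differ. On any system other than OpenBSD the helper does nothing and returns `False`, which is exactly the porting gap under discussion.

```python
import ctypes
import sys


def restrict_to(directory: str, promises: str = "stdio rpath wpath cpath") -> bool:
    """On OpenBSD, confine this process to `directory`; elsewhere do nothing.

    Returns True if the restrictions were applied, False if the platform
    has no pledge(2)/unveil(2).
    """
    if not sys.platform.startswith("openbsd"):
        return False  # no equivalent call to make on this platform

    libc = ctypes.CDLL(None, use_errno=True)
    for fn in (libc.unveil, libc.pledge):
        fn.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
        fn.restype = ctypes.c_int

    # Make only `directory` visible, with read, write and create permission.
    if libc.unveil(directory.encode(), b"rwc") == -1:
        raise OSError(ctypes.get_errno(), "unveil failed")
    # unveil(NULL, NULL) locks the list so nothing more can be unveiled later.
    if libc.unveil(None, None) == -1:
        raise OSError(ctypes.get_errno(), "unveil lock failed")
    # Allow only basic I/O and file operations: no network, no exec.
    if libc.pledge(promises.encode(), None) == -1:
        raise OSError(ctypes.get_errno(), "pledge failed")
    return True
```

A caveat specific to this sketch: a Python interpreter needs to read its own library files, so a real Python program would probably have to import everything it needs before calling `restrict_to`. A C program like the one being discussed does not have that problem.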

Certainly, if the process can be compromised that’s a bug that needs to be fixed. But we see new bugs constantly in the software we use every day. It’s a safe bet to say we will encounter more. By using a secure architecture, the damage these bugs can cause is drastically reduced.

There’s a really good description and demonstration of privilege separation in another project by Kristaps, acme-client (a Let’s Encrypt/certbot alternative): https://kristaps.bsd.lv/acme-client/

Another such project is Google Chrome, which uses pledge() and unveil() on OpenBSD.

My question is that the README implies that pledge() and unveil() are required for functionality, to the point that porting to an OS without support for that is an inherently questionable idea. That certainly isn't true of Chrome (I run it on non-OpenBSD and there are no functionality / security issues in doing so). That isn't true of OpenSSH, which also supports pledge() on OpenBSD and still is pretty secure on other OSes.

I do expect this is structured as you describe - that it has a validator, and that it uses these kernel features as additional hardening if the validator has a bug. But I would not describe that as requiring pledge() / unveil() and certainly not requiring it for functionality. So I don't know what the author means.

And I am worried, in particular, that the author means that the validator is not very strong, and the bulk of the validation is that it unveils the filesystem to the files it's supposed to write to and then blindly trusts the input, on the grounds that the worst the remote side could do is corrupt files, but it could just have sent different contents for the files in the first place. This seems unlikely to me, but I'm having trouble figuring out an alternate interpretation.

> My question is that the README implies that pledge() and unveil() are required for functionality, to the point that porting to an OS without support for that is an inherently questionable idea.

Knowing Kristaps, he probably considers strong privsep and privdrop basic functionality. That is after all why he developed acme-client in the first place; he acknowledged at the time the plethora of “lightweight” certbot alternatives but was more concerned with security architecture.

> That certainly isn't true of Chrome (I run it on non-OpenBSD and there are no functionality / security issues in doing so). That isn't true of OpenSSH, which also supports pledge() on OpenBSD and still is pretty secure on other OSes.

Chrome uses different techniques depending on the platform. On OpenBSD it uses pledge() and unveil(), while on Linux it uses seccomp. Kristaps isn’t a fan of seccomp’s complexity, as he mentions in the readme: “Linux's security facilities are a mess, and will take an expert hand to properly secure.” He’s not suggesting it can’t be done, and the Google Chrome team in particular has the kind of expertise he’s talking about.

For projects of less‐than‐Chrome scale, though, Kristaps feels that seccomp is too difficult: https://github.com/kristapsdz/acme-client-portable/blob/mast...

> And I am worried, in particular, that the author means that the validator is not very strong, and the bulk of the validation is that it unveils the filesystem to the files it's supposed to write to and then blindly trusts the input, on the grounds that the remote side could just have sent different files.

I don’t understand this interpretation. It’s not what I got from the readme at all. What kind of validation do you expect Kristaps to be overlooking?

It is possible to read the readme[1] to imply that unveil is the only protection from escaping the root (e.g. with a ".." directory). The only way to know for sure is to dig through the code though.
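For reference, the kind of userspace check in question would look something like the sketch below. It is Python for brevity, the function name `resolve_under_root` is hypothetical, and it is not taken from openrsync's source; it only shows what "refusing to escape the root" means for a path supplied by the peer.

```python
import os


def resolve_under_root(root: str, peer_path: str) -> str:
    """Return the absolute path for `peer_path` inside `root`, or raise.

    Rejects absolute paths and any use of ".." (or symlinks that already
    exist on disk) that would land outside `root`.
    """
    root_real = os.path.realpath(root)
    # os.path.join discards root_real if peer_path is absolute, so an
    # absolute peer path is caught by the containment check below.
    candidate = os.path.realpath(os.path.join(root_real, peer_path))
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise ValueError(f"path escapes destination root: {peer_path!r}")
    return candidate
```

With a root of `dest`, a peer path of `a/b.txt` resolves inside `dest`, while `../escape` and `/etc/passwd` raise `ValueError`. A check like this can still be raced if something swaps in a symlink between the check and the write, which is one reason a kernel-enforced restriction such as unveil() is attractive as a second layer.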


You're not wrong but the point being made is that wouldn't you want a tool which writes data from the network to disk to have those mitigations enabled?

From a comment on the site: "(...) its (original rsync's) compressed manual page is almost as big as the compressed openrsync sources (...)"

Its license (ISC, of course) and size make it a great resource for studying rsync. I would like to have Dropbox on my phone as a legendary combination of rsync and cron. It may be nice to have a port to Java so it would work without JNI, but maybe that's only my fetish.

I just want to point out that rsync is, in fact, no longer ISC licensed but rather GPL (v3, at that), which is likely a big part of the reason this new implementation even exists.

rsync was never ISC licensed afaik. The parent is referring to openrsync's license.

Rsync was developed by the Samba people, it is under the same license (GPL).

Very cool news. rsync(1) is one of the first things I install on a new OpenBSD instance.

Tangentially related, I've been using a Time Machine-like wrapper [0] around rsync(1) for a few years. It's very helpful for maintaining snapshots of my home directory.

[0] https://blog.interlinked.org/tutorials/rsync_time_machine.ht...

I use rsnapshot [0] for the same thing.

[0] https://rsnapshot.org/

For those wondering what this is, see https://github.com/kristapsdz/openrsync

I'll try explaining it. It's a new clean-room implementation of rsync, written from scratch, which will become the new rsync in OpenBSD. The tree that it's been imported into is the OpenBSD CVS tree that contains OpenBSD, OpenSSH, OpenCVS, and other major projects.

I would not be surprised if, in a few years, this becomes one of the CLI tools installed on macOS, either as part of the default install or as part of the Xcode CLI tools.

Does macOS have any security features similar to pledge/unveil or any of the Linux hardening packages?

It has a port of PF.

I'm more interested to know about system call and filesystem access restrictors. I think pf is only a packet filter.

There's SIP and Keychain, but it does not prevent say Safari from accessing Mail or user memory in general. If macOS becomes an iOS port (instead of iOS being the derivative work of the barely used UNIX system called macOS) perhaps we'd see some of the iOS specific hardening. AFAIK that kind of sandboxing does not exist in macOS. How difficult would it be to port something like pledge or unveil to macOS?

Why? MacOS has a bunch of GPL stuff, such as bash, IIRC.

GPL2, I doubt you'll find any GPL3 code in there.

Which is why bash on MacOS is from 2007.

And it already has actual rsync.

It has 12 year old rsync due to Apple not wanting to ship anything that's GPLv3: https://bayton.org/2018/07/how-to-update-rsync-on-mac-os-hig...

Interesting. This is the first project I can think of where a clean-room implementation was done so that a project could use a less free license ("free" as defined by the FSF).

Does anyone else know of instances where a company did a clean-room implementation of a previously FOSS tool so that they could make a paid/proprietary version? Usually it goes the other way.

ISC is a more free license for its users. GPL protects theoretical future users of theoretical derivative software by restricting freedom for its users.

It's important to remember that GNU is Not Unix, but OpenBSD userland is much more so. There isn't much reason to protect future forks if you expect that future software should start from first principles instead of extending software until it becomes a monolith that must be protected from its own developers.

That is not precisely accurate.

The GPL does not place any restrictions on how software is used, so the (literal) users are not restricted.

It restricts how it is redistributed.

Apologies, I intended "user" in my comment to mean "a developer using the license". Thank you for clarifying.

This is the core difference between gnu and bsd - guaranteeing freedom for all current and future users VS all current and future distributors (in particular, the bsd guarantees the right to fork and close - often seen as essential for commercial use in a new software or software+hardware appliance; while gnu attempts to guarantee that any downstream user will always have the four freedoms).

It's so easy to forget that at the very end, there are people who are using software. Developers are middlemen for most code in the products they build (think dependencies). GPL cuts through that, and always has the end-user in mind.

WSL. The implementation, lxcore.sys, is a clean room implementation of the Linux kernel ABI.

How is the ISC (version of BSD license used by OpenBSD) less free than the GPL3? This is very far from a "paid/proprietary" version.

Using "less free" or "more free" in this context just leads to pointless semantic debates. What happened is that someone made a clean-room implementation of a copyleft program in order to have it available under a copyfree license. Both licenses are Free.


First time I see this, thanks. The website isn't explicit about this point, but from what I gather, "copyfree" isn't viral in the way GPL is. It seems to provide "Free as in Freedom", but unlike GPL, doesn't protect that freedom from being immediately taken away.

I think gp means that this code is allowed to be used in products that choose to limit the end users' freedoms.

(I don't mean that as a plus or negative, but as just a statement on one of the largest philosophical differences between the bsd-style and gpl licenses: Whose freedoms are being protected? Those of the final end user or those of the developer?)

It's much closer to a proprietary version than a GPL version would or could be, however.

Proprietary-friendly is not less-free.

How not?

I want my code to be usable by anyone developing free software of their own. I want them to be able to integrate it, modify it, redistribute their modified copies, and more.

The GPL, being long and complicated (over 5000 words, and that’s just the GPLv3!), and with the ideological restrictions built in, is incompatible with many widely used free licenses, not least previous versions of itself. In any situation where social or legal barriers prevent the target audience from switching to the specific version of the GPL in question, any code I release under it is unusable to them.

Releasing my software under a simple, understandable, and permissive free license prevents this from ever happening.

I dislike proprietary software. I don’t use it or create it, and advocate against it wherever I can.

But given the choice between letting some Chinese featurephone developer use my code without “giving back,” and preventing swaths of the free software community I care about from using and improving my code for themselves, I will favor permissiveness every time.

Because you are actually free to do what you want with it, instead of free to do what someone else wants you to do with it.

Yes, you are free to close it up and sell it, but everyone else then isn't free to use your changes. "More friendly to proprietary purposes" is "less free".

It's kind of like arguing that a country where anyone can steal from anyone else with impunity is more free. Not when you consider the rights of the person being stolen from.

Just because some forks of a project aren’t free, doesn’t mean that the project itself isn’t free. Especially if the project, like much of the e.g. MIT-licensed code in the world, is “done” for all intents and purposes and there is no reason at all to fork, proprietary or otherwise. (This comes up in the context of pure algorithms code a lot.)

Also, even if a project is copylefted, people can still just do... exactly what they did here. Which, while different in the weak sense of “avoiding copyright” or maybe “avoiding patents”, in the context of systems code like this almost always results in the same code on both sides anyway. If the choice is between either giving the proprietary developers your code to use, or making them re-implement exactly what you wrote without your copy for reference—with no option for “they don’t implement it at all”—then exactly what is the point of choosing the latter over the former?

> Just because some forks of a project aren’t free, doesn’t mean that the project itself isn’t free

We aren't talking about non-free, we're talking about less-free.

> Just because some forks of a project aren’t free, doesn’t mean that the project itself isn’t free

No. The fact that there can be forks of the project that aren't open is what means that the project itself is less free than a project where all forks must be open.

> Especially if the project, like much of the e.g. MIT-licensed code in the world, is “done”

I don't consider this relevant to the argument at hand.

> then exactly what is the point of choosing the latter over the former?

Are you asking me what the point is of making something you don't want people to do hard for them vs making it easy?

That's not a good analogy. In this case, people aren't being stolen from, they are freely giving it away for someone else to do as they wish.

Additionally, "theft" as you put it, in this case, doesn't affect the original property owner.

Digital analogies to theft rarely are good, but this one is passable. The main point is that this license grants software freedoms, but then doesn't do anything to protect it - thus enabling middlemen (like most of us devs are, for most software we write!) to immediately strip those freedoms away.

Proclamations of rights aren't really useful if they don't come with a means of ensuring that those rights are not taken away.

I wasn't using it as an example of being deprived of property, but as an example of how infringing on other people's freedoms leads a system to be less free than one that doesn't.

You have the freedom to create your own proprietary derivative, a freedom you lose with the GPL version of rsync.

On a very high level, LLVM/Clang happened because Apple needed a clean-room implementation of GCC.

And because the gcc code was an impenetrable mess--intentionally so in order to prevent people from making a non-GPL alternative.

Of gcc, or "a C compiler (with extensions as seen in the wild)"?

Well, Clang implemented gcc extensions long before it went for MSVC ones...

The BSDs have a strong preference for copyfree licenses. They tolerate copyleft programs, but try to switch to copyfree when possible. See for instance GCC -> Clang/LLVM.

Both the ASF and FSF have a variety of NIHed projects that appear to exist purely for license ideology reasons. The most famous that comes to mind is Apache Geronimo, a clone of JBoss that few people used but was bought by IBM for ~$120M IIRC.


(You know, since we're tossing grenades)

I hope this ends up being a lot simpler and easier to understand than the original rsync. The rsync manpage is way too long.

Rsync solved a complex problem that comes in many nuanced variants. It may seem trivial at the outset, but it is actually not. So I don't think that rsync has many features that are somehow unnecessary or bloat.

Well, the manpage for this is looking really good and it already has almost everything that I care about. The -a option isn't in yet, but it's in the TODO.

https://github.com/kristapsdz/openrsync/blob/master/openrsyn... https://github.com/kristapsdz/openrsync/blob/master/TODO.md

I hope the -c and maybe -X option make it.

> The rsync manpage is way too long

I see its thoroughness as a feature, not a bug. It's very well written and I can just ignore the bits I'm not interested in. I wish more man pages were "too long" like this one is.

What does a "clean-room implementation" mean?

The first (well-known) 'clean-room' implementation was when Phoenix implemented an IBM PC-compatible BIOS by having one team studying the IBM source (which was available), then writing up a specification for how it worked, handing that specification over to somebody else (they were Phoenix' legal team, IIRC), which then handed the specs over to another team that had never seen the IBM source. They sat down in their "clean room" (b/c it wasn't tainted by actual IBM source) and implemented a BIOS from specs only. In that way Phoenix was protected from any claims of copyright infringement: Nothing was copied, and the people writing the code had never seen the original source.

In that particular case the specs were reverse-engineered from actual source, but that's not a necessary part of the process. It's more common to have one team study the protocol, data going over the wire, disassembling, etc, then use the knowledge gained to write specs, and then another team implements the equivalent functionality from specifications only.

Not derived from the existing code. The reason it’s mentioned is to assert that openrsync is not subject to the original rsync’s GPL.

Is it any better than rsync?

It is better in some ways, and worse in some, both largely subjective. It has a different license, is smaller, less battle tested, from different developers, designed with different goals in mind.

It Depends™ on how you judge.

All openrsync implements is the equivalent of a fast "cp -a" across the network, plus it can also remove files if they don't exist. rsync does much more and over the years I've used most of it, so there is no way I would use openrsync. The upside is the manpage of openrsync isn't that much more complex than cp, which is a definite bonus if that's all you are doing.

The only thing I would change about rsync is its default, which IMO should be to copy all metadata supported by both sides. I.e., the default should be to make the destination as similar to the source as possible. Its default is to only copy the data, and you must add options to say what else you want copied. To make matters worse, you can't just add every option, because if you say you want to copy something not supported by one side or the other it errors out. I may have missed it as I am reading the man page source, but openrsync didn't seem to change that.

No. openrsync implements the rsync protocol. It doesn't have all of its options, but the protocol is what it is. Do you have any idea what you're talking about?

It is interesting that the "open" part of openrsync refers to the license: BSD, vs. the original rsync's GPL.

It's not often I see "open" to mean "non-GPL" in software :)

Fun bit of history: the “Open” here comes from OpenBSD; but the “Open” in OpenBSD came from the development process, not the license.

Before Git and SVN, we had CVS, and to check out code from a CVS repository you needed to have an account on the CVS server. If you wanted to contribute but didn’t already have a developer account, you were limited to writing patches against release tarballs or whatever alternative method upstream supplied.

One of OpenBSD’s major projects in the mid 90s was creating anonymous CVS, where anyone could check out code without any account. This came from Theo’s experience after losing his NetBSD account, where he found himself unable to make meaningful contributions anymore without the ability to cvs checkout, cvs diff, etc. So when he started OpenBSD, he had in mind to open up the development process to everyone, account or not.

This is described in the commentary for the OpenBSD 6.1 release song: https://www.openbsd.org/lyrics.html#61

rsync went GPLv3 a while ago, and many businesses don't trust some of the newer clauses that were added.

Similarly, the stricter BSD crowd has issues with the Apache2 license clauses regarding revocation - see here: https://www.openbsd.org/policy.html

Stallman himself makes a big fuss about "Open Source" not being equivalent to "Free Software". See: https://www.gnu.org/philosophy/open-source-misses-the-point....

I get the feeling the "open" part was because they were hoping to get it included in OpenBSD like OpenSMTPD, etc.

Yeah, that was my assumption too. It's coming from the OpenBSD community, so openrsync it is.
