Hacker News
Microsoft WinObjC: Restore original licenses (github.com/microsoft)
74 points by mikeash on Aug 7, 2015 | 67 comments



Speaking as someone whose job it is to make sure these issues don't happen at a similar large company:

No org can be 100% perfect all of the time, no matter what processes they put in place. This looks like just a regular process failure.

I don't know what MS's are specifically, but we do everything from training new employees to verifying headers on outgoing releases (much to the annoyance of plenty of engineers who just want to release shit). I suspect they do the same.

Replacing headers with a standard header is pretty common. However, we require code segregation to prevent the issue that happened here (i.e., all third-party code goes in third-party directories with an associated LICENSE file describing the license/notices). That makes it easy to tell that someone screwed up when replacing headers.
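
A check for that kind of layout could look roughly like the following. This is only a minimal sketch, not any company's actual tooling; the function name and the accepted license file names are assumptions made up for illustration.

```python
from pathlib import Path

# Hypothetical convention: file names that count as a license/notice file.
LICENSE_NAMES = {"LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING"}


def missing_license_dirs(third_party_root):
    """Return names of immediate subdirectories that contain no license file."""
    root = Path(third_party_root)
    missing = []
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue
        # Only look at files directly inside the component directory.
        names = {child.name for child in entry.iterdir() if child.is_file()}
        if not names & LICENSE_NAMES:
            missing.append(entry.name)
    return missing
```

Running something like this before a release flags third-party components that arrived without their notices, which is the cheap part of the problem. It does nothing for third-party code that was mixed into first-party directories.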

Doing that is uncommon for most companies in practice (at least, when I surveyed my counterparts, most wanted to be doing it, but it happens less than 50% of the time in practice, because they aren't closely enough involved with engineering and development practices to make it easy).

Most of them are happy to mix third-party code with their own code, or are really bad at enforcing the separation, and then when you go to release it, you don't notice things like what happened here.

Then again, we also tell new employees that we care about getting the notices and credits right not because we are worried about legal repercussions, but because of the golden rule (you'd be pissed if you asked for one small thing in exchange for using your code, and they couldn't be bothered to do it, and we don't want other people to feel that way about us).

I suspect that is pretty different too :)

But this seems like an object lesson in proving that. The people whose headers were changed feel aggrieved regardless of the fact that it was probably a simple process screwup / whatever. They don't care about that. They care that somebody decided they really wanted to use it, and that it saved them time/energy/helped/whatever, but despite that, when they made the choice to use it, they didn't seem to bother to put the one small thing the author asked of them on "the list of things to do before release"

(Regardless of whether this is actually the case).


When I had to go through this at Microsoft 5 years ago or so, the process was incredibly rigorous. Not only did we have to attest that all of our code was actually ours, but any duly integrated third-party code had to be vetted by a separate group to check that the third-party code contained no IP issues of its own.

The process was incredibly cumbersome and I'm somewhat glad they've scaled back a bit, but this is much too far.


> Replacing headers with a standard header is pretty common.

It shouldn't be. That's a fundamentally bad idea.

It'd be reasonable to put a standard header at the top, immediately followed by "Based on (other project):" and the header for that project. But changing those headers is never OK.


Sorry, I think you misunderstand. I meant: "Replacing your internal copyright/whatever headers" with a standard license header before releasing.

Not "erasing MIT license headers", which is completely wrong.


Ah, I see. As in, automatically replacing a "Internal use only, do not distribute" header or similar with a header suitable for external release? Sure, that makes more sense.


Yes, and my suspicion is that this is essentially what MS messed up.

Someone probably wrote a script (as often happens) that removes the first comment block (which they expected to be some internal-use + copyright comment) or something similar and adds their headers, and it got applied to these files.
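
For illustration, here is a minimal sketch of a more defensive version of such a script. It is not what Microsoft or anyone else actually ran; the marker string, the header text, and the function name are all hypothetical. The idea is to replace the leading comment only when it is recognizably the internal one, and otherwise to add the release header above whatever is already there.

```python
import re

# Hypothetical marker and header text, for illustration only.
INTERNAL_MARKER = "Internal use only"
RELEASE_HEADER = "/* Copyright (c) Example Corp. Released under the Example License. */\n"

# Matches a C-style block comment at the very start of the file.
LEADING_COMMENT = re.compile(r"\A\s*/\*.*?\*/[ \t]*\n?", re.DOTALL)


def apply_release_header(source):
    """Swap an internal header for the release header; keep any other leading comment."""
    match = LEADING_COMMENT.match(source)
    if match and INTERNAL_MARKER in match.group(0):
        # The first comment is recognizably ours, so it is safe to replace.
        return RELEASE_HEADER + source[match.end():]
    # Unknown or third-party header: leave it intact and add ours above it.
    return RELEASE_HEADER + source
```

The important design choice is the default: when the script does not recognize the first comment, it keeps it rather than deleting it.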

I know this, because we've stopped people from making the same mistake :)


> It shouldn't be. That's a fundamentally bad idea.

I don't think it's that bad of an idea. The MIT license says:

    The above copyright notice and this permission notice shall be
    included in all copies or substantial portions of the Software.
What that means is anybody's guess. I'd say if you include one copy somewhere you're pretty much off the hook.


It's theoretically possible to comply with that license while moving the notice elsewhere, sure. However, for the purposes of avoiding situations like this, it's a bad idea. Much easier to just keep the license in the file exactly as it was, and add a header above it if needed.


A source file is reasonably "a substantial portion"; in fact, a per-file license is applicable to that file, and does not cover other files.


Yup. I'm saying (guessing, really) the header doesn't necessarily have to stay in the file. You might as well put it somewhere else (e.g., docs, license file).

---

Per-file licenses are different. The only one I looked at is MPL2 and IIRC it mentions the little note doesn't have to be in the file.


Seconded - and it looks like this was an acquired codebase, which was put together probably under engineering constraints very different from those that exist in house for Microsoft and other similar large dev shops. Still, would expect Microsoft to have gone over it with even more care than usual before open sourcing it as a result...

Something for startups to keep in mind: if an IP acquisition by a large tech player is among your exit strategies, it behooves you to get your intellectual property ducks in a row, otherwise you're in for some awkward conversations with the acquiring company's legal team about the provenance of your codebase.


That's what confuses me about this. Why didn't Microsoft's legal team have those awkward conversations with this company during the acquisition?


We do, so no idea there


>> Someone at Microsoft needs to go back to the first commit and track down every bit of code that was taken, and investigate the origin of the code, and properly restore the copyright notices.

It is specifically to avoid these issues that most companies don't go through the trouble of open-sourcing their code. I'm not at all saying it's wrong to enforce licenses; I'm just pointing out the obvious: companies that don't want to get into legal trouble avoid the issue altogether and discourage open source. A well-known company I used to work for requires legal to sign off on anything that is to be open-sourced. A startup I currently work at had to be audited for license compliance before a round of funding. One primary reason I see is that most engineers don't understand licensing very well. They don't understand when it's safe to use GPL and when it's not.


At D one huge reason we use github is to establish an audit trail of where code that gets incorporated comes from, which:

1. discourages people from committing code that isn't theirs

2. if there is bad code committed, we can determine the extent of it and so take corrective action

3. it's a great defense against false claims of bad code


A similar requirement would apply to misappropriated third-party proprietary code.

If you're copying code from elsewhere, document where you got it from, and preserve all licenses. Don't delete copyright notices and license headers, ever. That's neither ridiculously complicated nor burdensome.

Complying with other license conditions is another matter, and that's why many companies have people or groups whose primary job is to understand and deal with those kinds of issues. But basic due diligence about copyright and license headers needs to be fundamental 101-level understanding.


Even though Microsoft has had some issues like this, I'm glad they are open sourcing their code. It's a huge risk for a company like them, but I'd take a poorly open sourced codebase, where they can fix the problems over closed source.

Hopefully they'll learn and get better at it as they go along. I really hope they don't get deterred from open sourcing their code and stop the current trend.


And license auditing is usually just a matter of grepping for license blocks or LICENSE files. Basic due diligence. It will not catch a case where someone takes one file and strips off the license header.
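
A minimal sketch of that kind of grep-style audit, to make the limitation concrete. The phrase list is a small, deliberately incomplete sample of wording found in common license headers, and the function name and the `{path: text}` input shape are assumptions for illustration.

```python
import re

# A few phrases that commonly appear in license headers; not exhaustive.
LICENSE_PATTERN = re.compile(
    r"Permission is hereby granted"
    r"|GNU General Public License"
    r"|Redistribution and use in source and binary forms"
    r"|Mozilla Public License",
    re.IGNORECASE,
)


def files_with_license_text(sources):
    """Given {path: text}, return the sorted paths whose text mentions a known license phrase."""
    return sorted(
        path for path, text in sources.items() if LICENSE_PATTERN.search(text)
    )
```

A file whose header was stripped simply does not show up in the result, so it looks exactly like first-party code to this kind of scan.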


There's been a brewing controversy on Twitter and GitHub today about missing/changed licenses on some of the code that Microsoft incorporated into their WinObjC release from yesterday. This GitHub issue seems to be the best summary of what's going on so far. The quick summary of the summary is that there are a bunch of files released under BSD/MIT-style licenses where Microsoft removed attribution, some files which are GPLd or even more restrictive, and at least one file which is not under an open source license at all.

Full disclosure: some of my code is in there, so I'm not an unbiased source.


This seems like a reasonable response:

  s5msft commented 3 hours ago 
  
  @cjwl We're on it and definitely want to make this right. As a bit of background, (some of) our source code was originally C++ based (marked up to act like Objective-C). We then ran that source through a tool to generate "real" Objective-C code.
  
  In any case, we're going through Foundation right now and will absolutely make this right.
They seem pretty serious to me, and honestly, their acknowledged mistake is pretty understandable. They also have loads of money that people could sue them for, so I'm sure they don't want to put themselves at risk either.


It's just not that simple: the source attribution for the entire code base is in question. Of the licensing that is known, the licenses are known to be incompatible and the result is non-distributable.

They also got the copyright headers wrong when trying to fix Cocotron's stripped copyrights:

https://github.com/Microsoft/WinObjC/issues/35#issuecomment-...


This is pretty common in proprietary corporate code, but surprising to see in a project that seemed destined to be open-sourced. For code that doesn't see the light of day other than in binary form, I've seen people cut-and-paste whatever they can find into a source base and then move on without a second thought.

I would expect better of Microsoft -- especially from a company that supported Oracle when it sued Google over Android's copying of an API (which included an incidentally small amount of actually-copied source). [1]

[1] http://www.infoworld.com/article/2613305/patents/microsoft--...


Reading the thread, there seems to have been an automated translator involved that looks like it did not retain comments.


The scrubbed copyrights are bad enough, but can be fixed. The incompatible licenses mean that this code is actually unusable.

I have no idea how MS is going to rectify this.


Yeah, that's what moved me from "they are jerks, oh well" to "this is something worth posting."

Note to Microsoft: I will be happy to write you a new ObjC runtime if you'd like, at my usual rates.


They should just use the Apple ObjC runtime, not that it's the cleanest piece of code, but... a lot of work has gone into it. Ditto for CoreFoundation. Oh well.


I don't believe Apple's runtime (or at least CF and other important parts) are open, are they?


IANAL but looking at ThirdPartyNotices.txt there is MPL code, so they're probably OK with APSL. Maybe not so with LGPL though (which would rule out Apportable's otherwise excellent Foundation implementation).

It's also kind of weird they're using OpenSSL instead of Microsoft's SChannel. Usually Microsoft are pretty good at only giving you one way to hang yourself in the security department - I'm a big fan of SSPI, even if the API is occasionally unwieldy. (And as soon as I wrote that, I remembered CryptoAPI vs CNG...)


Apple's runtime and most of CoreFoundation are available at http://opensource.apple.com along with a lot of other stuff.

I don't know offhand if their license would be suitable for Microsoft's purposes.


For the incompatible licenses, Microsoft can:

1. Buy the rights from the original owners (if you are the owner, you can relicense your code under a different license).

2. MS can reimplement from scratch the problematic pieces of code.

3. MS can hire someone to write these missing pieces for them.


How does relicensing work?

If I'm the writer of a GPL'd piece of software can I just decide to make it MIT one day?

What happens if I die tomorrow, could that code still be relicensed eventually?

What happens when there is no clear owner?


Yes, as long as you retain the copyright of the code you wrote (i.e. didn't explicitly assign it to someone else, or write it as a work for hire) then you can relicense it under whatever you wish at any time. You could make it available under GPL, CDDL, MIT, and the Beerware license all at once.

This stuff is relatively simple at its core. Copyright says, "You may not redistribute this work without permission from the copyright holder." An open source license says, "I grant you permission to redistribute this work, as long as you follow these conditions." Relicensing is just a matter of making another statement like that with different conditions.

If you die tomorrow, then the copyright transfers to your heirs, who could then do all the stuff you used to be able to do.

If there's no clear owner, then life becomes interesting. If there once was a clear owner who released the stuff with a license, that license is still valid, but relicensing isn't possible unless someone can demonstrate that they're the owner. If there was never a clear owner then you can't really use the stuff, although if you're brave you could proceed under the theory that if nobody claims ownership there is nobody to sue you.


I think the general consensus is that if you authored 100% of the code yourself, you could always change the license (but people could fork the latest version under the old license and keep going, as happened with X.org), but if you ever took any patches/pull requests/contributions without also receiving a copyright assignment, you need to hunt down every single contributor to agree to a license change. I think that happened with Mozilla and VLC, whilst for Linux it would appear that several core contributors have explicitly refused to re-license (for example to change from GPL2only to GPL2+).

Other examples include MySQL, which was dual-licensed GPL and commercial, and KDE/Qt which also had a dual-licensing (and then a re-licensing).

If all contributors also accept to transfer copyright/ownership up-front, re-licensing is easier.

I'm guessing re-licensing after death would require waiting whatever period (70+ years?) for copyright to expire.


"In some of these cases, we have explicit permission from the authors to relicense sections of relevant code." (from linked page, though only very recently added)


If that's actually the case, then the QPL/GPL license headers must be replaced.

Note that the commenting individual is the founder of the company acquired by Microsoft that provided this code.


I feel a bit ignorant here, but are there open source licenses that simply cannot co-exist?

For example, could I write program Foo.c (using CDDL license), which uses GPL'ed Bar.c and MIT'ed Dad.c (both unmodified)?


Most non-copyleft OSS licenses are compatible in some form, including with copyleft licenses such as the GPL; when used in a combined work, the combined work's license essentially reverts to the most restrictive of the licenses.

In some cases, such as with WinObjC, reverting to the most restrictive license makes the work unsuited for its intended purpose.

However, many copyleft OSS licenses have clauses that are genuinely incompatible with both other copyleft licenses, and less-liberal OSS licenses -- in such cases it's impossible to legally combine code under the licenses.

In your example, the CDDL and the GPL have incompatible copyleft restrictions; your resulting program wouldn't be distributable under the provided licenses.

A more common incompatibility example is that of the OpenSSL license and the GPL; GNU maintains a fairly complete list of incompatible licenses here: http://www.gnu.org/licenses/license-list.en.html#GPLIncompat...
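
As a rough sketch, a pre-release check could flag combinations like the Foo.c example. The table below contains only the two pairs named in this thread and is in no way a complete or authoritative compatibility matrix; the names `KNOWN_INCOMPATIBLE` and `conflicting_pairs` are made up for illustration.

```python
# Pairs named in this thread as incompatible in a combined, redistributed work.
# This is not a complete or authoritative list.
KNOWN_INCOMPATIBLE = {
    frozenset({"CDDL", "GPL"}),
    frozenset({"OpenSSL", "GPL"}),
}


def conflicting_pairs(licenses):
    """Return the pairs from `licenses` that appear in KNOWN_INCOMPATIBLE."""
    unique = sorted(set(licenses))
    conflicts = []
    for i, first in enumerate(unique):
        for second in unique[i + 1:]:
            if frozenset({first, second}) in KNOWN_INCOMPATIBLE:
                conflicts.append((first, second))
    return conflicts
```

An empty result only means that none of the listed pairs is present, not that the combination is actually fine.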


Yes. It's not particularly different from writing program Foo.c (using CDDL license) which uses files Bar.c and Dad.c you took from your former employer without permission. There is no copyright license allowing such a thing to be redistributed.

By default, all code is copyrighted and effectively proprietary. Open-source licenses are one way to grant permission to copy them. Fancy contracts with expensive lawyers are another way. You can only do what the licenses say you can do, and in particular, the GPL says that you may not apply "additional restrictions" and the CDDL has some clauses that the GPL doesn't. So there is no way to satisfy both licenses with regards to redistribution.

You can negotiate for a different license. (This seems to be what MS did with objfw.)


> but are there open source licenses that simply cannot co-exist?

The GPL can basically not coexist with any other license, though lots of licenses (though perhaps fewer than the FSF claims, especially for GPLv2 and previous) allow code to be relicensed under the GPL (which is different from the code coexisting.)

The reverse is emphatically not true: GPL code generally cannot be relicensed (downstream; the copyright owner can do whatever they want) to another license, except newer GPL versions if the optional "or any later version" clause is included with the GPL.


My understanding is that the GPL can usually coexist with licenses that are non-copyleft, or with some copyleft licenses, as long as the license doesn't require the resulting work have any license restrictions beyond what the GPL requires.

Anybody who's terribly interested in all of this will probably enjoy reading the GPL FAQ, especially this bit:

http://www.gnu.org/licenses/gpl-faq.en.html#WhatIsCompatible

IANAL, YMMV, HTH, WTFBBQ, ETC.


Yes, that's been a problem with at least ZFS on Linux and a lot of apps linking to OpenSSL earlier.

Some would even claim that CDDL was designed solely to ensure ZFS could not be integrated with Linux.


Microsoft is very serious about open source licenses, and I'm sure the team is currently sweating to get this right. Before Nadella, it was very difficult to get a permit to use OSS at all; with the new culture, openness (giving and taking) is encouraged, but some teams might have taken it too lightly. It's a learning process.


What is the proper procedure for open-source mashups? Setting aside GPL, say I have a project that lifts a single function each from separate projects that are individually zlib, mit, bsd, apache, mozilla, eclipse and boost. What do I do? Put a license url annotation on each function? What license can my project as a whole be?


Document the licenses for each, and include all the copyright notices and license headers, exactly as they appeared at the top of the files you took those functions from. The resulting file is under the conjunction of all of those licenses, requiring anyone who uses it to comply with all of them simultaneously. That's theoretically possible when the licenses don't conflict.
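
One way to keep that documentation honest is to generate a combined notices file from the original header text, copied verbatim. The sketch below is only an illustration; the function name, the tuple layout, and the section format are assumptions, and it complements rather than replaces keeping the headers in the source files themselves.

```python
def build_notices(components):
    """Concatenate each component's original notice verbatim under a labeled section.

    `components` is a list of (name, license_name, notice_text) tuples.
    """
    sections = []
    for name, license_name, notice_text in components:
        # The notice text is copied as-is; only trailing whitespace is trimmed.
        sections.append(f"== {name} ({license_name}) ==\n{notice_text.rstrip()}\n")
    return "\n".join(sections)
```

The point is that nothing in the notice text is paraphrased or regenerated; it is whatever the upstream file said.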


I hope they make it compilable with clang + mingw headers.


[deleted]


I think the problem is that it takes additional work to figure out what needs to be fixed (especially since the company that has reused your code might be trying to obscure the fact). It's not reasonable to expect the original author to do that work, on behalf of a party that appears to be infringing on their licence.

Even if you want MS to get this fixed and make a success of it (as I very much do) it's clearly their responsibility to figure out the correct licensing.


[deleted]


The author has done enough to show that something was copied. They could just point to that, and insist that redistribution stops now. Instead they are giving MS the opportunity to fix it, but it's not in any way the original author's responsibility to do it for them.


I like the dude who thinks that the GPL hasn't been proven valid in a US court and that migueldeicaza has trouble getting a job in the industry.


An (especially unhinged) HNer. We really added a lot of value to that github discussion over there. :(

https://news.ycombinator.com/threads?id=MTWomg


As I said in-thread, I've now promoted my back burner idea to allow repository owners to automatically lock GitHub threads when they appear here. There are a couple missing pieces from GitHub's end, namely API support and the ability to lock "other" threads like commits, so I suspect the functionality will become deleting non-owner/non-collaborator comments after a thread appears here. There are GitHub accounts like MTWomg whose sole purpose is trolling when a thread appears here, and it's becoming simply irresponsible to divert this community's conversation to another community.

I have contacted GitHub for clarity on these points, since as it stands, it seems like locking was a kneejerk "put out a fire" mechanism that is really unsupported and betrays GitHub not caring about abuse on their platform. They join Twitter in that regard.

HN destroys threads more than Reddit, in my experience, but both are worth hitting.


[flagged]


Following me from platform to platform is not going to get me to engage you. Remain irrelevant, please.


The irony of the comment calling Kim Dotcom a "piece of shit" is just astounding in context.

I was afraid my post might generate some crap in the comments over there, but I hope it's a net gain overall!


> I was afraid my post might generate some crap in the comments over there, but I hope it's a net gain overall!

With respect, it genuinely isn't. Nothing changed from you submitting this thread, because the resolutions were already in flight by the time this hit the front page.


> I was afraid my post might generate some crap in the comments over there, but I hope it's a net gain overall!

The "I can assure you that I do not read HN and am far more intelligent than you" comment from MTWomg was amusing but overall I'm afraid that one guy is capable of doing more damage to an unmoderated conversation than the rest of us are capable of mitigating. (how would we actually do that?)


It isn't really unmoderated, since the repository owner can delete comments. I'm pretty disappointed that they did not do so.

However, last I heard, the only way to ban a user from a GitHub organization's issue tracker (as opposed to an individual's) is to contact support.


If I were that guy I'd be more focused on working the problem. Although public perception is half of that, so yeah, that is a decent point.


[flagged]


Why is it a bad thing for Kim Dotcom to make money by using copyrighted works without the permission of the copyright holders, but a good thing for Microsoft to do it?

Edit: my works are protected by copyright just as much as anything Megaupload hosted. You appear to believe that the license on them is not legally binding. In which case, fair enough, but then standard copyright applies, which doesn't give anyone permission to redistribute them. As for malice, you repeatedly said you wanted to see Microsoft violate these licenses intentionally in order to invalidate them, so you seem to think it would be a good thing even if it's not what they actually did in this case.

Edit edit: you're right, if I release X and the license for X is invalid, then I have released X with no terms of use. With no terms of use, then the standard copyright protections written into the laws of the relevant jurisdictions apply without modification. In the case of US copyright law (which applies to both me and Microsoft) that says they can't redistribute my stuff without my permission. "No terms of use" does not equate to "therefore you can do anything you want."

Edit the third: what do you mean, "default back to you"? If you're right and the license holds no weight, then I never gave up those rights in the first place.

Edit the last: so your theory is that the part of my license which says "you can redistribute this" is legally valid, but the part which says "under the following conditions" is not? That's certainly an interesting legal theory. I am not convinced, to put it mildly.

OK, one more: how should those conditions be written in order to be valid?


Yeah, I am pretty curious about this theory of law where a judge can rule that a right ceases to exist. That's not what I understand a "right" to be.

Anyway, mikeash, you're spending way too much time caring. There's certainly good work to be done in making copyright law clear to hackers, but it won't help people who don't understand basic principles of law. Your time is better spent on more fruitful things. :)


it's quite irrelevant, but I was about to post the same -- the idea of Miguel "becoming unemployable" as a result of this is just delightful.


Yeah. Ironically, what would actually make me hesitate to hire someone is seeing them post on a public issue tracker that they believe the GPL is unenforceable and that blatantly violating it is a good idea. Leaving aside their critical thinking abilities, soft skills, etc. it's a serious risk of legal liability to have such a person write code for my company.


Indeed. They seem eager for Microsoft to intentionally violate the GPL in order to create a court case which they believe would result in invalidating the GPL. Who's to say they wouldn't see an opportunity for causing this to happen by incorporating GPL into whatever project they're working on at their job?


I hope that some day I may join him too at 'Unemployable Island'


you wish! they don't hire just anyone, at 'Unemployable Island'


That person has a lot of very strong opinions about opensource for someone who has never made a single commit to any public repository.


But he is also partly correct, the way some of the comments are phrased is pretty childish, rude or whatever you want to call it.


Oh no. First time I see something like this. Why can't Microsoft be like the good-guy companies, like Google (and Apple)?


I love open source, but I find it really disturbing that some code can't live together when licensed under different open source licenses. In my Java code I mostly avoid unnecessary libraries altogether. That's why I mostly build my stuff with only akka, undertow and the postgresql jdbc driver (and the jdbc stuff gets replaced as soon as my async postgres driver library does what I need), so that I only have Apache 2 code (and some BSD parts, which will be replaced).

Everything else is just too odd. And if I or anyone working under me pulls in a library that isn't APL2, I will refuse the PR. It just sucks how awful it is to write programs with awesome licenses that are incompatible. You just want to use existing things, which you can't do because people are dumb and don't write licenses that are happy together. I mean, why can't I write GPLv3 code and pull in other open source licensed things, too? It's still open. And that's the problem: the "open" in open source mostly isn't open. I see the source, but I can't do anything with it.



