Hacker News
License now displayed on repository overview (github.com/blog)
173 points by joeyespo on Sept 21, 2016 | 65 comments



...but it is only visible if you are logged in. That inconsistency is a little weird.


And as with much other information (such as the bloody fucking search), it only takes into account the repository's main branch and ignores all others; if you decide to relicense in a branch for some reason, that will not be displayed by GitHub.


What would you prefer GitHub do? Display multiple licenses? I see that situation as not really solvable.


Why not apply the same solution already applied to branches? Have a drop down that lists the different branches and their licenses, with it sorted such that the branches with different licenses are displayed first.


I would think that would be UI overload for such an edge case.


> What would you prefer GitHub do?

The license of the currently selected branch?


This seems like a pretty niche problem. Maybe in your ecosystem this is common, but I work in OSS and haven't seen this yet.

In all cases with more "complex licensing," you need to see the project's own license file. GitHub's little UI icon isn't legally-binding in any way. It's just a simple UI addition that works in the vast majority of cases.

In a Google search for "git branch different license," I don't see anyone else discussing licensing their projects this way. I'm not saying it isn't valid, it just seems really rare.

Your proposed solution wouldn't necessarily work for the (much more common, in my experience) situation where some branches, like deployment branches, omit the license altogether. In those situations, those branches are typically under the same license as the main project. (As always, though, the final word should be in the project's license file.)


I agree there's not an elegant UI solution beyond "use master branch's licence unless current branch has one" but it doesn't seem like too much of an edge case to me, particularly if you're serving a GitHub pages site on a gh-pages branch, or docs, which will very possibly come under a different licence.


Change what your main branch is?

The old license is still applicable to the old code. If you're changing the license for future development and not pushing it to the main branch for some reason, it looks like you've got a new main branch.


> Change what your main branch is?

1. That makes no sense; the "main branch" has a semantic meaning for the project, whether it's the current development HEAD or the current STABLE or whatever other concept you apply to it.

2. That doesn't fix the issue, which is that different branches can have different licenses (and with respect to search do have different contents).

> The old license is still applicable to the old code.

Yes, and if you change the "main branch" you end up with the same fuck up, namely that older releases/branches are now marked with the new license, which is not any more correct than the reverse.

That is, the license is metadata of a specific revision of a project (and technically subtrees can even have different licenses), not of the project as a whole; the project can be relicensed at any time.


It makes perfect sense. The code that GitHub displays when you hit the project page is from the main branch. It then follows that it makes the most sense to display the license that pertains specifically to the code which you see by default.

If one wants to checkout a non-main branch, ostensibly they can view the LICENSE file in that branch on their own.


> It makes perfect sense.

It doesn't make any sense to show the license of the main branch when I just switched to a different one, no. If they want to take the lazy route, just stop showing licenses on anything but the main branch.

And incidentally they could use that new feature to remove the search bar when not on the main branch since that's not going to search the current branch, but it won't provide any warning or indication that you're not searching what you thought you were.

> The code that GitHub displays when you hit the project page is from the main branch.

You are aware that github has a branch switcher right there on the "project page" right?


This seems very odd, I wonder if it's a bug or by design


I'm gonna guess it's temporary; I get the impression GitHub likes to test its changes before it releases them (I've been seeing the announced license indicator for several days), and that probably goes extra for logged-out users.


Yeah, I have been seeing it for a few weeks now and was actually looking for an announcement - I thought it was a bit weird when I didn't find one. The indicator not being shown for non-logged in users is probably just a consequence of a staged rollout.


Yeah I'm guessing they just turned it on for 100% and haven't removed the feature flag yet which hides it from logged out users.


Dual-licensing seems to not be supported by licensee [1]; MIT/Apache-2.0 is what’s used for Rust and a substantial fraction of the Rust ecosystem.

[1]: https://github.com/benbalter/licensee/issues/57


Anything we can do to improve the usability and detection of licenses is a big win in my book. Even the best intentioned people have a hard time consistently doing the right thing -- whether that's attributing the right copyright owner, reproducing the right copyright message, or otherwise abiding by the license, it can be hard to be 100% correct.

It would be nice if this were more programmatic: if licenses carried an identifying string so they could be clearly and consistently identified, along with machine-parseable metadata. If we make it easy to build automation around license files, we make it easy to do the right thing -- maybe even easier than doing the wrong thing.

I think about something like Webpack. If Webpack could just traverse your project structure, parse all your dependencies' license files, and generate the proper attributions, that would be amazing.
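One existing convention along these lines is the SPDX short identifier, which some projects place in a source comment as an `SPDX-License-Identifier:` tag. Below is a minimal Python sketch of reading such a tag, assuming the file actually carries one; the function name `find_spdx_expression` is hypothetical and not part of any tool mentioned in this thread.

```python
import re

# Matches a tag such as "SPDX-License-Identifier: MIT OR Apache-2.0".
# The capture stops at a line break or at "*" so a closing "*/" is not included.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:[ \t]*([^\r\n*]+)")


def find_spdx_expression(source_text):
    """Return the license expression from the first SPDX tag, or None if absent."""
    match = SPDX_TAG.search(source_text)
    if match is None:
        return None
    return match.group(1).strip()


if __name__ == "__main__":
    sample = "// SPDX-License-Identifier: MIT OR Apache-2.0\nfn main() {}\n"
    print(find_spdx_expression(sample))
```

This only extracts the declared expression as a string; it does not validate it against the SPDX license list, and a missing tag says nothing about how the file is licensed.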


You might be able to write that as a Webpack plugin with a couple of hours' work... could be neat.


This is really cool. One little detail I've noticed is that it doesn't seem to apply to private repositories, which seems like a bit of an oversight. The repo I'm checking clearly has an Apache 2.0 license text in the LICENSE file. Just because a particular copy of a repo is private doesn't mean that the code within it isn't still bound by an open source license.

EDIT: Before anyone replies with "Why would you want that?", it's fairly common to stage a project privately on GitHub before publicly releasing it. It'd be nice to see that the license is detected correctly before going public with it. As it is now, I don't know what'll happen until I go public, and my first public commit in the repo may well be fixing up something minor to get the license detected properly.


It seems to work fine on my org's private repos. Not sure if it's having trouble detecting the license in your case, or if it just hasn't worked through all the repos yet, or what.


It also doesn't detect if you aren't logged in, in my case.


This is nice, but the way they gather license data has serious limitations if it hasn't improved in the last year.

https://lwn.net/Articles/636261/


This is a tough problem to solve given how many different ways projects have to declare licenses. I run a project called git.legal and we do ALL of the following, but still don't quite get a 100% hit rate of finding a project's license:

1. Check for a license declaration in package manager metadata (e.g. package.json)

2. Check for a license.txt file or try to parse out a readme's license section, and do a full-text diff against known licenses

3. Check the readme for an extensive set of regular expressions matching known license identifiers and common declarations such as "licensed under ...".

4. Check for a consistent license declaration in project source file headers

This gets about a 97% hit rate, but even for "hits" it's sometimes unclear what specific license version a project is under. For example, many projects just say "licensed under MIT"...but "MIT" isn't a specific license. There are several versions of it and there's no way to know which version the author intends to use for the project. That might be a minute point, but this all adds up to a lot of uncertainty around licensing.

So, project authors, please use metadata and include a specific version :)
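The first three checks listed above can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not git.legal's implementation: `detect_license`, `KNOWN_PHRASES`, and `DECLARATION` are hypothetical names, the phrase table is a crude stand-in for a real full-text diff against known licenses, and the source-header check (step 4) is left out.

```python
import json
import re
from pathlib import Path

# Assumption: one distinctive phrase per license stands in for a full-text diff.
KNOWN_PHRASES = {
    "MIT": "permission is hereby granted, free of charge",
    "Apache-2.0": "apache license version 2.0",
}

# Assumption: a single pattern for declarations such as "licensed under the MIT license".
DECLARATION = re.compile(
    r"licensed under (?:the )?([A-Za-z0-9+-]+(?:\.[0-9]+)*)", re.IGNORECASE
)


def normalize(text):
    """Lowercase the text and collapse all runs of whitespace to single spaces."""
    return " ".join(text.lower().split())


def detect_license(project_dir):
    """Return (license, source) from the first check that succeeds, else (None, None)."""
    root = Path(project_dir)

    # Step 1: package manager metadata.
    manifest = root / "package.json"
    if manifest.is_file():
        data = json.loads(manifest.read_text(encoding="utf-8"))
        declared = data.get("license") if isinstance(data, dict) else None
        if isinstance(declared, str) and declared:
            return declared, "package.json"

    # Step 2: a license file compared against known license text.
    for name in ("LICENSE", "LICENSE.txt", "LICENSE.md"):
        path = root / name
        if path.is_file():
            body = normalize(path.read_text(encoding="utf-8"))
            for license_id, phrase in KNOWN_PHRASES.items():
                if phrase in body:
                    return license_id, name

    # Step 3: a "licensed under ..." declaration in the readme.
    readme = root / "README.md"
    if readme.is_file():
        match = DECLARATION.search(readme.read_text(encoding="utf-8"))
        if match:
            return match.group(1), "README.md"

    return None, None
```

Returning the source alongside the license makes it easier to see which check fired. Note that a README hit such as "MIT" still leaves the exact license version ambiguous, which is the uncertainty described above.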


I was thinking the same thing. Just a few years ago GitHub used to automatically "figure out" what programming language your project was in, and it was pretty bad.


Github uses an open source gem [0] to detect the license. You're welcome to contribute enhancements.

[0] https://github.com/benbalter/licensee


Looks like although the README says it only looks at LICENSE, it has improved and checks more files; nice!

https://github.com/benbalter/licensee/blob/3692df44ab32772a9...


I'm aware, I read the post.


Another feature copied from gitlab. Not that that is a bad thing.


That's rich considering Gitlab is basically a Github clone.


For quite a while github looked like a cheap gitlab knock off rather than the other way around due to gitlab actually innovating and adding new features and github stagnating.

Now that github is adding the features that gitlab already added, they are more neck and neck again.


Gitlab is a dying product.


... How? If anything, Gitlab seems to have grown a lot recently. As far as I can tell, the competition with Gitlab is one of the main reasons GitHub has started releasing so many features (many of which Gitlab had first).


If it's dying why would GitHub implement their features? Do they want to die too?


It's called "inheritance". It's an OOP concept, you functional kids won't understand.


A nice small enhancement. While many projects mention the license in the README, not all do. So now it is very easy to see at a glance which license a project is under. In this context, I really appreciate how GitHub offers to add the license while creating a new repository, with a quick list of the most common licenses. A good incentive to put a new project under a proper license from the start and to make sure that the correct license terms are attached.


It's a good idea to make this more visible, but they should also make it visible when a repository does not have a license. If they did the UI right this would encourage developers to license their code in the same way that Facebook encourages users to add a profile picture.


I don't think they should do that, for different reasons.

For example, lots of projects have more than one license, but GitHub seems to list only one. Incorrect information is worse than no information.

Besides, the current implementation only seems to pay attention to the most popular licenses (?)... arguably the CERN Open Hardware Licence is not amongst the most popular, but for some reason I was expecting GH to detect that one ;) So again: incomplete information.

Perhaps asking the repo owner to manually specify the licence could be an option, but if the repo owner cares about licences, I'm sure that information is available already.


No licence means copyright applies, I believe. So maybe if there is no licence, a generic "copyright applies, contact the author" message would suffice.


Please don't -- there's more than enough of that kind of nagging on today's web as it is.


I've noticed that it doesn't distinguish between the different versions of the CC licence. It just puts them all under CC BY 4.0


Just checked a few of my own projects. Github doesn't recognize LGPLv3 properly, where the license is split into a GPLv3 "COPYING" file and an LGPLv3 "COPYING.LESSER" file. They ignore the latter and just slap GPLv3 on the project.


> We use an open source Ruby gem called Licensee to compare the

> repository's LICENSE file to a short list of known licenses [...]

I have seen more people putting the license of their projects at the end of the README.md than in an individual LICENSE or LICENSE.md file, which is what this feature analyzes. I noticed the addition of this feature a few days ago while checking the repository for the Atom.io project, but without an official blog post I was left wondering why all of my repositories (which are MIT licensed) were missing it. I guess reading LICENSE(\.md)? is good enough.
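For illustration, a filename check in the spirit of `LICENSE(\.md)?` could look like the Python sketch below. The pattern and the name `is_license_file` are assumptions made for this example; they do not describe what Licensee actually matches. The pattern is deliberately a little broader than the one quoted above, accepting the British spelling and a few common extensions.

```python
import re

# Hypothetical pattern: LICENSE or LICENCE, optionally followed by .md, .txt or .rst.
LICENSE_FILE = re.compile(r"^LICEN[SC]E(\.(md|txt|rst))?$", re.IGNORECASE)


def is_license_file(filename):
    """Return True if the bare filename looks like a top-level license file."""
    return LICENSE_FILE.match(filename) is not None


if __name__ == "__main__":
    for name in ("LICENSE", "LICENSE.md", "LICENCE", "README.md"):
        print(name, is_license_file(name))
```

A check like this only finds candidate files; a license stated at the end of a README would still be missed, which is the situation described above.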


The tool they use explains why they don't just scrape the entire repo. Basically it's because software licences are legally binding and it's not a good idea to put your license at the bottom of an FAQ in your readme. Licensing needs to be clearly deliberate.


I think a less charitable explanation is that they hope this will pressure developers into making their license info more consistently available to the advantage of everyone -- like how Google adjusts its search ranking algorithm to punish certain kinds of behaviour/problems.

... which I would be perfectly okay with.


Yeah, that's also a valid point. I've contributed license PRs to several projects, just so that the next person doesn't need to trawl through the readme or source code to find the one line that specifies the license in the most ambiguous way possible.


I created a command-line tool with basically three columns which are:

- License.(md?)

- README.md parsing the license

- package.json reading the license (it's for NPM)

To try to solve this exact issue for a dependency tree:

http://github.com/franciscop/ianal


This is a really good development. Programmers shouldn't have to worry about copyright; in the world of software, it does far more harm than good and should be done away with entirely. But for now we do have to worry about copyright, and the proliferation of "open-source" Github repositories with no explicit license puts us at risk of being sued in the future if our projects are successful. Little nudges like this will help a lot.


I think that developers _should_ think about how they licence their software. Software licences are a cornerstone of the free software movement, and people should really consider what they believe people should have the right to do with their code.


Software licenses are a cornerstone of the free software movement because they are our only defense against copyright.

I agree that we need to keep thinking about what one person has the right to do with another person's code, but copyright is a terrible starting point for that thought process, centering as it does the business models of eighteenth-century stationers at the expense of the much broader interests of political dissidents, scientists, historians, patients dependent on medical devices, teachers, students, librarians, archivists, whistleblowers, anybody trying to fix broken things, tinkerers, and journalists.


How long does it take to index the file? I've added LICENSE.md to my project (https://github.com/tananaev/traccar), but the license doesn't show up at the top bar.


When I added one to one of my projects, the change was immediately detected.


I think it is because of the .md extension. I have LICENSE files (no .md) and they are picked up correctly.


I have my license in LICENSE.md, LICENSE.rst, etc. and they are picked up correctly. For example, https://github.com/susam/uncap and https://github.com/susam/ice.


Is there any way to disable this, or set it manually at least? I'm concerned there may be a situation where the license is detected incorrectly, and then Github brands your repository as being under a certain license, when it really is not.

Also, it seems to be missing a lot of licensing cases. You can have files under multiple licenses in one repository, and dual-licensed projects. There doesn't seem to be any distinction between "GPLv2 only" and "GPLv2 or any later version of the GPL", which is also a big deal. For example, Git and Linux are under "GPLv2 only", but are listed as GPLv2 just the same as a project licensed under "GPLv2 or any later version".

Even more conspicuous is the lack of support for licenses with additional clauses. For example, many compilers like gcc have additional exception clauses allowing you to link the compiler runtime to your library without incurring the GPL requirements. Swift has a similar exception clause for their runtime library. Even though it's Apache v2 which is permissive, Apache v2 still requires distributing the license/copyright info with all copies binary or source for the software, which would be extremely annoying for a compiler to require. LLVM is discussing switching to a similar Apache v2 plus exception as well.

There's also a lot of licenses missing. And I'm not talking about ones that are obscure or equivalent to MIT, I'm talking about for example the Boost license, which is designed specifically to deal with the binary distribution requirements problem, and is popular in the C++ community. [0]

Just looking at the little badge isn't telling you enough useful information. For example, say you are writing a GPLv3 program, and you want to use a library. Looking at the GitHub page shows you it has a GPLv2 badge. But if it's "GPLv2 only" instead of "GPLv2 or any later version" then you can't use it with your project. And the only way you can find that out is by reading the license, or at least the README if the author explains the license there. Similarly, just looking at the badge doesn't tell you if the authors have added any additional exception clauses that could be useful in legal compatibility for your program, so you'll have to click through for that as well.

In the end, I think this just provides potential for confusion, and doesn't really make the job of seeing what license the software is under any easier. There could be advantages in listing the license in a standard place and way, but it should allow the developer to manually select the license, and for arbitrary text input for additional notes, or for licenses that github hasn't added. If they insist on keeping it automated, then there should be a way to opt out (of course it'd be best if it was opt-in, but that's not going to happen, so I won't even go there).

0: The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.


Hey, Tim, I've talked a number of times with GitHub about the issue of lack of nuance in their license monikers on GitHub. I haven't gotten very far, but this change will cause me of course to raise it again.

As others have commented, kragen is jumping to excitement a bit too quick. GitHub's license data is not curated, and I'm quite sure determining a license of software is an undecidable problem; it just requires human judgment, and many self-report their own license incorrectly to GitHub.

GitHub has a lot of problems to solve before the license data they are presenting can be trusted. Even many very common programs have licenses that can't even be described with an SPDX moniker, so the "badge method" just isn't going to work.


> Is there any way to disable this, or set it manually at least?

It looks like they try to read `LICENSE`, so maybe if you avoid using the American spelling and just put `LICENCE`, github will skip it. I know gitlab had this bug/feature.


In order to distinguish between "GPLv2 only" and "GPLv2 or later", they can't just parse the LICENSE file, they need to look at comments in the actual source files to see if they contain "or (at your option) any later version" or some such disclaimer. This could lead to a lot of false positives and false negatives, especially if the project contains third-party code.

Setting the license manually, of course, would solve the problem.
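As a rough illustration of the header scan described above, the Python sketch below looks for the "or (at your option) any later version" wording in a source file header. The function name `has_or_later_clause` is hypothetical, and the caveats above still apply: a missing phrase does not prove "GPLv2 only", and third-party files in the same repository may give different answers.

```python
import re

# The wording that marks an "or later" grant in the usual GPL header notice.
OR_LATER = re.compile(
    r"or\s+\(at\s+your\s+option\)\s+any\s+later\s+version", re.IGNORECASE
)


def has_or_later_clause(header_text):
    """Return True if the header contains the 'any later version' wording."""
    # Comment markers such as "*", "#" or "//" often split the phrase across
    # lines, so replace them (and all whitespace runs) with single spaces first.
    flattened = re.sub(r"[\s*#/]+", " ", header_text)
    return bool(OR_LATER.search(flattened))


if __name__ == "__main__":
    header = (
        "/*\n"
        " * the Free Software Foundation; either version 2 of the License, or\n"
        " * (at your option) any later version.\n"
        " */"
    )
    print(has_or_later_clause(header))
```

Running this per file and comparing the results across a repository would show whether the declaration is consistent, but deciding what an inconsistent result means still takes human judgment.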


I noticed this a few days ago and I think it's a great addition. I hope they keep this feature.


This is great! I often find myself clicking through to the license file and trying to remember which license text it looks like. This'll be both easier and more reliable.


Neat.


Interesting that they're devoting screen real estate to something that can be found in the data section.


What is the data section? I don't see that on GitHub.


Err, the file browser area.


The language the project is written in can also be found in the data section, still there is the language bar :)



