
How the TensorFlow team handles open source support - melqdusy
https://www.oreilly.com/ideas/how-the-tensorflow-team-handles-open-source-support
======
jordigh
I don't like Google's CLA. This article spins it to make it sound like the CLA
is there to make sure the code can be used under the Apache license, but what
a CLA really does is shift blame away from Google to all of the external
contributors. The article even expands the CLA acronym incorrectly, to
make it sound like it's about the code instead of being about the contributor,
and the rights they're giving up for Google.

CLAs are one-way, for covering Google's ass. They do not benefit the
contributors.

http://ebb.org/bkuhn/blog/2014/06/09/do-not-need-cla.html

~~~
DannyBee
"I don't like Google's CLA. "

This much is apparent :)

"This article spins it to make it sound like the CLA is there to make sure the
code can be used under the Apache license"

Which it is.

", but what a CLA really does is shift blame away from Google to all of the
external contributors."

Well, no, actually, it doesn't shift "blame" at all (and you aren't clear on
"blame for what").

In fact, its main goal is to protect the end users and the project.

"CLAs are one-way, for covering Google's ass. They do not benefit the
contributors."

This first part is just flat out false. If you read even the first sentence of
each line of the CLA, you'll see it benefits more than just Google (and it
only benefits Google at all because they are the ones taking on the
liability), and benefits Google exactly as much as the end-users. The second
part is just silly: the goal of properly written CLAs is _not_ to "benefit
the contributors". What benefits do you think they need?

The goal of the CLA is to benefit _the project_ , and make sure the project
itself, and more importantly, the people who _use_ the software don't get
screwed, particularly if a contributor screws up.

So now i just think you read a blog post and think you agree with it, but
don't actually understand the situation enough to articulate coherent
arguments as to why.

"http://ebb.org/bkuhn/blog/2014/06/09/do-not-need-cla.html"

I respect Bradley a lot, but on this subject, he is simply wrong in this
situation. This is not entirely surprising. Bradley does not deal with a lot
of communities that involve attempting to maintain patent peace among various
corporations. He mostly deals with communities whose issues are copyright
related, and where most situations can be dealt with peaceably by removing
code.

In those communities, i think it would be reasonable to not give a crap about
CLAs and to think they were a waste of time.

However, all the world is not that simple.

~~~
jordigh
I was not surprised to see that you're a lawyer at Google, so of course you
think Google is doing the right thing.

As you know, what the CLA is doing is making sure Google can't get sued for
patents or copyright claims on the software. That's what I meant by shifting
the blame from Google to the contributors. Google can just say, look, we have
this CLA here, so it means we didn't do it, go talk to the one who signed the
CLA. Not our problem, says Google. So now it's up to the individual who signed
the CLA to deal with whatever legal fallout happens, a person who probably has
a lot less lawyers than Google to handle the problem.

If you really wanted reassurance that the code is being contributed properly,
there are other documents that could be signed which would be much more two-
way than Google's CLA. For example, GNU's copyright assignment includes a
promise from GNU that they will always keep the software free, with GNU
absorbing the legal burden to do so. KDE's CLA has a similar wording, talking
about free software principles, and it's also optional for contributors.

Now I am going to be more adversarial and unpleasant. I don't like the Google
hivemind. Google employees have a very uniform mode of thought on certain
topics. This includes the beliefs that the GPL is dangerous and must never be
used by Google unless forced to (i.e. only on software that came from outside
of Google), that the AGPL is the ultimate evil, and that under all
circumstances Google must be protected above everything. Google only legally
cares about Google, no more. You can pretend like the CLA also benefits the
other contributors, but that's not really the case. The external contributors
could take Tensorflow, fork it away from Google, keep working without a CLA,
and everything would be right. All that's necessary is the implicit agreement
to the free license (for example, the clause in the GPL that says using the
software grants you the rights provided by the GPL). Like you say, should a
problem arise, all they have to do is remove the offending code.

~~~
DannyBee
"I was not surprised to see that you're a lawyer at Google, so of course you
think Google is doing the right thing."

Actually, it's more than that if you really want to get into it! I came up
with Google's policies and CLA. But it's always great to attack people for
what they do instead of actually engage on the merits!

"As you know, what the CLA is doing is making sure Google can't get sued for
patents or copyright claims on the software."

This is false, so i don't know it. It's making sure the innocent end user
can't get sued. Google can and has gotten sued for what contributors do.
Surprise! In fact, in that case, the contributors were not even notified. We
took care of it. Double surprise, apparently.

" So now it's up to the individual who signed the CLA to deal with whatever
legal fallout happens, a person who probably has a lot less lawyers than
Google to handle the problem."

So just to be clear if an individual deliberately screwed up, you think Google
should handle the problem because they have more lawyers? That's an
interesting notion. Hey, i fucked up my neighbor's house, but he should worry
about it, because he makes more money than me. Past that, you think the end
users should pay the price? Because again, the issue you are complaining about
is there to protect the end users, not Google.

As I already mentioned, Google already tends to take care of these issues on
its own.

"If you really wanted reassurance that the code is being contributed properly,
there are other documents that could be signed which would be much more two-
way than Google's CLA."

Not really.

" For example, GNU's copyright assignment includes a promise from GNU that
they will always keep the software free, with GNU absorbing the legal burden
to do so. KDE's CLA has a similar wording, talking about free software
principles, and it's also optional for contributors."

Both of these, which i'm intimately familiar with, have precisely the same
issue you complain about in the first paragraph. So again, your argument seems
non-coherent.

Neither of these CLAs indemnifies contributors or otherwise prevents them from
being sued. In fact, they both do exactly the same as the Google CLA in this
regard.

" This includes the beliefs that the GPL is dangerous and must never be used
by Google unless forced to (i.e. only on software that came from outside of
Google),"

This is also just false. You literally have no idea what you are on about. In
fact, if you bothered to look farther, you'd see that i was a GCC maintainer
and have contributed to and used GPL software (both FSF and non-FSF) at Google
for many years.

Google happily uses and contributes to tons of GPL software, and has no such
policies or thoughts. Again, i'm the person who made the policies, so i would
know. When I first joined one of the first things I did was get Google to sign
a blanket copyright assignment with the FSF.

"that the AGPL is the ultimate evil, "

Whatever this is supposed to mean. We have several practical problems with the
AGPL that other companies have as well. We don't avoid it for ideological
reasons. Interestingly, if you ask Eben, you'd find that he also has concerns
with how it achieves its goals.

"You can pretend like the CLA also benefits the other contributors, but that's
not really the case. "

I actually explicitly have said, numerous times, the CLA is mainly for the
protection of end users and the project.

"Like you say, should a problem arise, all they have to do is remove the
offending code."

This is where, like Bradley, you simply have no idea what you are talking
about. That only works in very simple situations.

Overall, your argument comes off as "i have an axe to grind with Google".
Nothing you have stated makes a lot of sense. The people you hold up on a
pedestal have precisely the same legal issue you refer to, but you randomly
change arguments about what exactly your issue is.

As you surely also know, Google's CLA is a copy of the Apache CLA with one of
the obligations removed because we felt it was too onerous (there are no other
changes). So your real problem is with the thousands of projects that use
Apache CLAs.

~~~
jordigh
So, if Google doesn't have a problem with the GPL or doesn't think it's
dangerous, what's the software that originally came from Google that was GPL-
licensed? Your gcc example doesn't count, because like I said, Google was
coerced to use it for gcc. When did Google license something under the GPL
because they wanted to defend their copyleft?

And yeah, the practical problem with the AGPL that you have is that you think
your company will be destroyed if the world could see your source code. That's
why you tell all of your employees that they can't even consider _touching_
software that's AGPLed. So, what are you hiding there?

The only reason Google will use GPLed software is because they managed to de-
fang it. Once the AGPL put the fangs back in, oh no, can't have that.

For Google, like for Apple, Microsoft and Facebook; free software is only
intended for relatively unimportant scraps. The real meat, the real code, must
remain secret, must remain safe. That's how you can control the users, how you
can better target ads at them. Imagine the havoc if we could see the source
code for AdSense or Gmail!

On the other hand, thank you for clarifying how the CLA works. No sarcasm
here, I'm honestly thankful about that. Sorry, I was wrong about that.

~~~
tptacek
I think most people in the ecosystem are happy that Google doesn't tend to use
the GPL, since doing so would make it more difficult to use that code
commercially.

In commerce, GPL'ing at the origin is mostly a tactic for protecting the
commercial value of code; for instance, Sourcefire kept Snort under the GPL so
that nobody could effectively compete with them using the Snort codebase,
since they had the asymmetric ability as copyright owners to make private
enhancements while competitors needed to publish; Sleepycat GPL'd their
database so that commercial software projects that wanted to use BerkeleyDB
had to pay for an alternate license, &c.

Taking from a GPL project and refusing to contribute back to it on GPL terms
is antisocial, but then, Google doesn't do that (they do the opposite,
contributing more than any normal software firm).

But declining to originate software under the GPL isn't antisocial. I
appreciate what the GPL does and have used it for projects myself, but
virtually anyone who works professionally knows to think twice before adopting
GPL'd libraries and thus constraining their future options.

I don't think this criticism of yours really makes any sense.

Also, as a bystander to this little debate on HN, the sincere thanks you just
gave was a good start, but you also cast aspersions on his motives for
discussing this here, and I think you owe him an apology as well. It's an HN
rule not to say those kinds of things about other commenters.

~~~
jordigh
I refuse to accept the conclusion that the GPL is anti-commercial or that it
makes it more difficult to use code commercially. I think the companies,
indeed the vast majority, who think that the only way to commercialise
software is to hide the source code and keep things secret are overall wrong.

When none of Google, Microsoft, Apple, or Facebook actually originate code
under the GPL, all this does is further this conclusion that seems so
unquestionable to you that the GPL is anti-commercial or restrictive or just
something to hesitate about. What the big guys do the little guys believe. We
need more free software, and we need to defend the proliferation of free
software, and we need to protest against those who work against free software,
who control our search results and the ads shown to us and the information
collected about us behind the veil of secret source code.

I refuse to apologise for questioning Google's motives. Their GPL refusal,
except when forced to, is not a good thing. If they wanted free software they
would be doing things like pursuing the rampant GPL violations in Android
devices or defending their copyleft against VMWare's attempts to circumvent
Linux, not leave it to charities like SFC.

~~~
tptacek
Nobody's asking you to apologize for questioning _Google's_ motives.

~~~
jordigh
In this case, DannyBee has said he designed some of Google's motives, so the
distinction is difficult to make. I guess I am not understanding exactly what
offense I committed. This is a failing on my part, and if you can clearly
pinpoint what I said that requires an apology, I would appreciate it.

------
mks40
Big thanks to Derek for his Stack Overflow responses, which have saved me so much
time, especially considering how uninteresting that support work might be in
general compared to designing and implementing new systems.

------
backpropaganda
Tensorflow isn't really regarded as a very welcoming open source project in
the deep learning community. The deep learning academic community is already
moving to PyTorch, not just because of the imperative style programming, but
also to avoid Google lock-in and the pseudo-open ethos. PyTorch is used and
developed by the community, and backed by multiple companies, not just one.

~~~
laingc
I'd be interested to know how you formed this impression. My own impression is
formed by:

* The work I and my colleagues are doing
* Recently published literature and arxiv pre-prints
* Conference talks
* Industry meet ups
* Blog posts

Based on my anecdotal experience from these sources, I see absolutely no
evidence of Torch experiencing any kind of resurgence. If I had to make a call
about the direction of the community as a whole, I would say that it is very
clearly heading towards TensorFlow, with some holdouts using Theano and some
using mxnet. Torch is used by some groups, certainly, but I have the
impression that its use is decreasing, rather than increasing.

~~~
p1esk
Just to give you another anecdotal experience:

I'm a researcher who used Theano for 3 years to train convnets. A couple of
months ago I realized that Theano was becoming too much of a pain to work with (main
reasons being the lack of implementations for latest models, and difficulty of
using multiple GPUs), so I decided to switch to a more popular framework. I
looked at TF, and almost started porting my code to it, then someone suggested
I look at PyTorch. After 30 minutes playing with it, I was sold. Much more
intuitive. Major architectures from the last 12 months have been implemented.
Dynamic graphs are probably something I will need in the future. Community is
very active and helpful. The downside is that the software is not as mature as
TF, and the community is smaller, but that's changing fast.

p.s. What you wrote about Torch is correct though, but we are talking about
PyTorch, not Torch.

------
EternalData
It's really cool to peer under the hood of really massive open source
projects. I have to give kudos to the TensorFlow team for staying on top of
what must be a massive amount of work!

------
EvgeniyZh
The sad part is that the open-sourced TensorFlow isn't the same as the
TensorFlow that Google uses.

I've also heard Google is not really fond of cooperating with other
corporations (say, NVidia) on TensorFlow.

~~~
vrv
The codebase on GitHub is pretty much exactly the same as the internal one, the
main exceptions being things like having to rewrite include paths for files,
filesystem plugins for internal cluster filesystems, etc; and those things are
modularized so that we can have equivalent implementations in the OSS build to
support things like HDFS and GCS filesystems, RDMA network layer
communication, etc.

We daily sync the code between the two repositories using a suite of tools
we've built. I'm on sync rotation this week and you can see all of my commits
and activity on GitHub as proof; I merged something like 60 commits from the
community just this week. It wouldn't make any sense for anybody (Google or
the community) to maintain two different versions for something that is
actively being developed by so many contributors. I've also directly worked
with NVidia engineers on improvements they've made (and merged) to the system;
the ones I've dealt with are great, so that statement is also false.

I'll be giving a talk about all the work we do to make this possible at OSCON
next week, and if you are there feel free to catch me to ask me questions.

~~~
EvgeniyZh
Obviously I don't have any proof that I can provide. Just small talk with
people here and there. Some NVidia engineer told me that Google is (was?) very
reluctant to work together on some DL stuff, as opposed to, say, Facebook.

About the internal version: this is pure speculation, based on the idea that
TPUs are programmed with some framework, so it should be TF, so there is a
closed part and there should (could) be others.

~~~
vrv
I'm sorry to hear of that experience, and it's certainly not intentional. We
should (and I will) always try to be better (I know some will forever view
Google as an antagonistic behemoth but usually the engineers on both sides are
trying to do the right thing; it just takes time to come to consensus and
understanding sometimes).

Regarding internal version: we built TensorFlow to support devices as modular
plugins: the CPU and GPU devices are built this way (you can read the source
code to see how device registration works), and the same registration
mechanism is used for the TPU code, which can't be opensourced due to internal
dependencies. Internal customers just link in an additional library to get TPU
support, but it still uses the same core codebase that is available in the
opensource world. I know this because I wrote a lot of the device modularity
and TPU binding code :)
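
For readers who want a concrete picture of the registration pattern described above, here is a minimal sketch in Python. It is not TensorFlow's code or API (the real mechanism lives in the C++ source): every name below (`DEVICE_REGISTRY`, `register_device`, `create_device`, `load_private_backend`) is hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict


class Device:
    """Minimal device interface shared by every backend (hypothetical)."""
    name = "base"

    def run(self, op: str) -> str:
        return f"{op} on {self.name}"


# Hypothetical registry: maps a device type name to a factory for that device.
DEVICE_REGISTRY: Dict[str, Callable[[], Device]] = {}


def register_device(device_type: str):
    """Class decorator that adds a device class to the registry."""
    def decorator(cls):
        DEVICE_REGISTRY[device_type] = cls
        return cls
    return decorator


@register_device("CPU")
class CpuDevice(Device):
    name = "cpu"


@register_device("GPU")
class GpuDevice(Device):
    name = "gpu"


def create_device(device_type: str) -> Device:
    """Look up a registered device type and instantiate it."""
    if device_type not in DEVICE_REGISTRY:
        raise KeyError(f"no device registered for {device_type!r}")
    return DEVICE_REGISTRY[device_type]()


def load_private_backend() -> None:
    """Stands in for linking an extra library that registers one more device."""
    @register_device("TPU")
    class TpuDevice(Device):
        name = "tpu"
```

The point of the sketch is that the core (`Device`, the registry, `create_device`) never names a specific backend: a build without the extra library simply has no "TPU" entry, while a build that calls `load_private_backend()` gets one without any change to the shared code.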

~~~
EvgeniyZh
I see, maybe I was exaggerating a bit.

I'd really like to talk more about the birth of TensorFlow and the TPU, but
unfortunately I won't be at OSCON. Maybe some other time :)

------
linkmotif
This is a great post!! Thank you.

