
What is the best and right way to open-source packages from a company monorepo? - vaughan
There are a few tools to split commits from sub-dirs to a branch which you can
then push to a public repo/monorepo.

E.g. `git subtree`, https://github.com/facebook/fbshipit,
https://github.com/splitsh/lite, https://github.com/ingydotnet/git-subrepo.

A lot of these approaches though rely on the source-of-truth being the
internal company monorepo. PRs are synced internally, merged, and then pushed
out. It means that someone outside the organization cannot be a maintainer,
and the speed of PR merges is dictated by the available resources inside the
company. So I'd argue this is not the right OSS way of doing things.

Even if there are two public monorepos out in the open you can have similar
problems trying to collaborate, because to modify one line of a package, you
may need to pull a huge monorepo and its tooling down.

Does anyone have a solution or an example of an OSS-friendly approach to
monorepo open-sourcing?
======
raziel2p
why does the package you want to split out need to remain part of the
monorepo? in my mind, if you open-source packages, you should treat them just
as you would external packages that aren't maintained by you - either install
them with your language's package manager, use git submodules, or add init
scripts that git clone them into the correct path (git-subrepo seems like it's
facilitating this).

> Even if there are two public monorepos out in the open you can have similar
> problems trying to collaborate, because to modify one line of a package, you
> may need to pull a huge monorepo and its tooling down.

in my experience this has never really been a problem.

~~~
compsciphd
what he said. The question is what are you trying to accomplish. are you just
trying to make code dumps (i.e. no real development is done on the OSS side,
just you'll dump code there that others can use) or is development supposed to
take place on the OSS side.

if the former, it's not really a Q, just figure out how to dump something that
can be built independently of your mono repo and provide dumps.

If the latter, separate it out from your mono repo and treat it as any other
external dependency that you import into your mono-repo. this obviously takes
more work managing the new external repo, cutting releases and the like, but
is more valuable to the community at large.

~~~
vaughan
The goal is to allow development to happen on the OSS side. Unless you are a
Google or Facebook with the ability to properly resource OSS, the 1st approach
just leads to abandoned projects.

> this obviously takes more work managing the new external repo, cutting
> releases and the like, but is more valuable to the community at large.

I guess the gist of my question is how to keep the benefits of a monorepo
while allowing an OSS development workflow.

It feels like OSS works best when there is a single Github repo that does not
depend on a ton of other dependencies that change often.

~~~
ssivark
I’m very confused. Open sourcing makes it an independent project (might as
well be external), and that’s _necessary_ for the project to be of use to
anyone else (Imagine one of your dependencies being part of someone else’s
monorepo). In case that’s successful, you can handle it just like any other
library dependency. (How do you do that in your monorepo?)

To elaborate... How can the community participate in the project if it is not
a relatively independent project? If things are closely coupled to one
person’s monorepo, then presumably the code is not usable for another person.
So, for practical purposes, it might as well be just a code dump.

------
baslas
I have never used this tool but Google open-sourced Copybara[1]. If you look
at the commits of their other open-source repositories, it seems this tool is
often used.

[1] [https://github.com/google/copybara](https://github.com/google/copybara)

~~~
vaughan
Thanks, hadn't heard of this one.

Looks a bit heavy though for my liking but maybe they have some good ideas.

> Copybara requires you to choose one of the repositories to be the
> authoritative repository

Looks like it is more angled at mirroring private to public.

~~~
mbrukman
Disclosure: I work at Google, but not on Copybara.

I've seen Copybara used at Google in both directions: for some projects, the
internal repo is the authoritative one, and for others, the external repo is
authoritative.

Copybara is not prescriptive, you can go in either direction.

------
quicklime
It seems like the HN consensus from the existing comments is to make the
source of truth the public repo, and import that repo into the monorepo build
somehow. This works in a lot of cases, but it does come with some drawbacks.
Basically you will lose a lot of the benefits of a monorepo:

\- You can't make atomic commits across the open source repo and the internal
monorepo.

\- Changes to the open source project won't automatically trigger internal
integration tests.

\- Your coworkers can no longer just run `bazel build` or `bazel test` on your
project, so there's a relatively large amount of friction for them before they
can make changes.

I don't think there's a simple answer to this question yet, but a few things
to consider based on my experience with this:

\- If you expect contributions to mostly come from internal developers, then
maybe lean towards keeping it in the monorepo, but if you think contributions
will come from external developers, lean towards an external repo.

\- If it's going to be a mix of both, it's going to be difficult, so make sure
you regularly sync the two repos, otherwise you're going to have to spend a
lot of time resolving conflicts.

\- Think about what build system you're going to use (usually it'll be
something like bazel or buck inside the monorepo, but some people prefer
language-specific ones for open source repos, e.g. cmake, gradle, yarn/npm).
If you decide to use separate build systems internally and externally, make
sure both have CI systems in place that will catch build errors.

~~~
ssivark
How can the community participate in the project if it is not a _relatively
independent_ project? If things are closely coupled to one person’s monorepo,
then presumably the code is not usable for another person. So, for practical
purposes, it might as well be just a code dump.

------
temikus
Google’s internal open source releasing policy is actually public:
[https://opensource.google/](https://opensource.google/)

It’s written from a more process/legal viewpoint but you might be able to pick
up some ideas in there.

~~~
antoncohen
A couple specific quotes that are relevant:

> If you are planning to regularly mirror the source from Google internal
> repos to public ones (or vice versa) _Copybara provides workflows and tools
> for this_.

[https://opensource.google/docs/releasing/preparing/#tools](https://opensource.google/docs/releasing/preparing/#tools)

> Google-owned open source projects must be moved to third_party prior to
> being released under an open source license, even if Google owns 100% of the
> code, because the projects are expected to receive external contributions.

[https://opensource.google/docs/thirdparty/](https://opensource.google/docs/thirdparty/)

------
staktrace
We have this problem at Mozilla. The Firefox codebase is a monorepo and source
of truth for the WebRender project which lives in a subtree. The source of
truth used to be a separate github repo and it was synced into the monorepo
regularly but we flipped that around and now we sync back out to the github
repo. PRs do come in against the github repo and we have a bit of ad-hoc
tooling that imports the PR into our monorepo's code submission flow
(bugzilla+phabricator). It lands in the monorepo assuming it passes review and
tests, and then gets synced back to the github repo.

That being said, the tooling we have is not great - there are still some
manual steps in this flow, and it's not ideal.

For the webgpu project that's in a similar situation but gets a lot more
external contributions, we will eventually try some sort of two-way sync.

So sorry I don't have a good solution for you but you're not alone with
respect to this problem :)

~~~
vaughan
Thanks for the insight. Good to know I'm not alone.

It feels orders of magnitude easier to make the internal repo the source of
truth when developing and a huge sacrifice to productivity to split something
out.

If there was a good solution I think it could really help get more
company-sponsored open source projects out there in the wild.

------
cuspycode
I've had good results from using `git filter-branch --index-filter` to keep
everything related to the exported package and remove everything else. This
keeps commit history in such a way that `git log --follow` still shows older
commits in the new repo, which is nice. I could never figure out how to make
that work with `git subtree`.
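
For anyone who wants to try this, here is a rough sketch of what such a command can look like. It only builds the argument list and prints it, so nothing is rewritten until you run the result yourself in a throwaway clone. The function name and the `packages/widget` path are made up for illustration, and the exact filter is an assumption (a commonly used "empty the index, then restore the kept paths" idiom), not necessarily what the commenter ran.

```python
import shlex


def filter_branch_keep(paths):
    """Build a `git filter-branch` argv that keeps only `paths` in history.

    Hypothetical helper: it returns the command, it does not run it.
    """
    if not paths:
        raise ValueError("need at least one path to keep")
    kept = " ".join(shlex.quote(p) for p in paths)
    # For every rewritten commit: empty the index, then restore only the
    # kept paths from the original commit ($GIT_COMMIT is set by
    # filter-branch). The kept files are not moved to a new location.
    index_filter = (
        "git rm --cached -qr --ignore-unmatch -- . && "
        f"git reset -q $GIT_COMMIT -- {kept}"
    )
    return [
        "git", "filter-branch",
        "--prune-empty",  # drop commits left with no changes after filtering
        "--index-filter", index_filter,
        "--", "--all",    # rewrite every ref
    ]


# Example path is an assumption; substitute the package you want to export.
cmd = filter_branch_keep(["packages/widget"])
print(shlex.join(cmd))
```

Since `filter-branch` rewrites history in place, it is safest to run the printed command on a fresh clone made only for the export.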

------
zackbrown
`git subtree split` will extract a subdirectory and its (exclusive) history to
a new repo.

Publish that repo and maintain it separately.
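
A minimal sketch of those two steps, written as a small helper that returns the git commands rather than running them. The prefix, branch names, and remote URL are all placeholders I made up; `git subtree split --prefix=... -b ...` and `git push <remote> <src>:<dst>` are the standard forms.

```python
import shlex


def subtree_export_commands(prefix, export_branch, public_remote,
                            public_branch="main"):
    """Return the git commands for a one-off subtree export.

    Hypothetical helper; every argument value is supplied by the caller.
    """
    return [
        # Synthesize a history containing only commits that touched `prefix`,
        # with `prefix` as the new root, and store it on `export_branch`.
        ["git", "subtree", "split", f"--prefix={prefix}", "-b", export_branch],
        # Push that synthesized branch to the public repository.
        ["git", "push", public_remote, f"{export_branch}:{public_branch}"],
    ]


# All values below are illustrative assumptions.
for argv in subtree_export_commands(
    "packages/widget", "widget-export", "git@github.com:example/widget.git"
):
    print(shlex.join(argv))
```

Run the printed commands from the root of the monorepo checkout; the split only creates a new branch and leaves the existing branches untouched.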

------
wallstprog
We have the same issue, except in _both_ directions -- we use open-source
projects as part of our software, and have also open-sourced a component that
we developed, so we need to be able to go both ways.

The following seems to be working reasonably well so far:

\- Each component has its own internal repo (in our case, we happen to use
BitBucket). This repo contains any bits that are proprietary and/or specific
to our environment, like build scripts, integration with internal CI, etc.

\- The open-source part of the component is hosted in an external repo on
GitHub.

\- The internal repo includes the contents of the external repo using git
submodules
([https://git-scm.com/docs/gitsubmodules](https://git-scm.com/docs/gitsubmodules)).

We use a convention re: branch names:

\- For projects that we don't own, "staging" is where our changes go before
being submitted upstream as PRs.

\- For projects that we do own, "staging" represents the current development
"HEAD" -- our internal "master" branch includes the "staging" branch from the
external repo.

\- In either case, once a change has been approved for production use, it is
committed to a release branch that is used to drive production builds.

Git submodules have a couple of features that make this easier:

\- Each branch in the internal repo has its own .gitmodules file, which
identifies which branch to fetch from the external repo. The .gitmodules file
is an ordinary text file, and is managed just like any other file in the repo.

\- Updating the submodule code in the internal repo is done by recording the
hash of the specific commit from the external repo. This lets us control
precisely which code from the external repo is included in builds (either
development or production builds), and also provides an audit trail.
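
To make the two points above concrete, here is a small sketch, not our actual tooling. The first function renders the kind of `.gitmodules` entry that `git submodule add -b <branch>` writes; the second lists the commands that move the recorded commit forward. The submodule path, URL, and commit message are hypothetical.

```python
def render_gitmodules(name, path, url, branch):
    """Render one .gitmodules entry (plain text, versioned like any file)."""
    return (
        f'[submodule "{name}"]\n'
        f"\tpath = {path}\n"
        f"\turl = {url}\n"
        f"\tbranch = {branch}\n"
    )


def bump_commands(path, message):
    """Commands that advance the pinned commit of the submodule at `path`."""
    return [
        # Fetch the tip of the branch named in .gitmodules and check it out.
        ["git", "submodule", "update", "--remote", path],
        # Staging the submodule path records the new commit hash ...
        ["git", "add", path],
        # ... and this commit in the internal repo is the audit trail.
        ["git", "commit", "-m", message],
    ]


# Illustrative values only; none of these names come from a real project.
print(render_gitmodules(
    "vendor/widget", "vendor/widget",
    "https://github.com/example/widget.git", "staging",
))
for argv in bump_commands("vendor/widget", "Bump widget to latest staging"):
    print(" ".join(argv))
```

An internal release branch would carry the same entry with a different `branch` value, and builds only ever see the commit hash that was last committed for the submodule path.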

We used subtrees initially, and if there's not much traffic back and forth
between the internal and external repos that can work, but it breaks down
quickly as the repo becomes more active.

~~~
vaughan
The downside of submodules is you lose the ability to easily branch across all
packages/repos and then just commit.

You could branch the other project but then you have to coordinate this
yourself. There is a project called Meta that can help but to me feels like it
can quickly get out of control.

Do you find this a problem?

------
quantummkv
The best approach would be to extract the package into a separate repository
and use it as a regular package in your monorepo. This would solve all your
problems.

------
timini
+1 git subrepo

