Show HN: Vdm, a sane alternative to e.g. Git submodules (github.com/opensourcecorp)
100 points by ryapric 41 days ago | 60 comments
Hey folks! I've been working on something on & (mostly) off for a little over a year, and picked it back up recently because of yet another related frustration at work.

I've spent a lot of time ripping out git submodules from repos my teams use, but I've spent an equally large amount of time wondering why there doesn't seem to be a better option for managing arbitrary dependencies across repos in the Year of Our Lord 2024. So, I put together a really early version of such an arbitrary-dependency manager. It's called vdm, and you can find it in the linked URL above & below.

I'm sharing mostly because I'm curious if I'm blatantly missing some other tool that exists that isn't language-specific (like Bit for JS seems to be, for example), but also in case people have any hot-takes or feedback on the functionality as listed in the README.

Also of note is that I'm not sharing to potentially monetize or "generate customer interest" or anything -- I'm just another builder on the internet.

Thanks for looking, and let me know if you have any questions!

vdm: https://github.com/opensourcecorp/vdm




Nice!!

If you're looking for alternatives, here's something we've built (hope I'm not hijacking this): https://github.com/audiotool/pasta

It's called "pasta" for copy pasta. It was built with exactly the same motivation as yours, also has a YAML config file, and is also implemented in Go, kinda interesting. If yours takes off and we can drop ours, that'd be awesome!

Some feedback on features we have that we think we'd be missing:

- we have the ability to copy individual files and specific subdirectories of other repos, not the entire repos

- mechanics to "clear" the target directory, in case a file gets deleted upstream, to keep the directories in sync

- we've modelled it with a plugin API, so you can implement new "copiers" for bitbucket, google drive, subversion, ...

- the GitHub plugin we have uses the GitHub API for better performance, and you can add auth by setting an env var GITHUB_TOKEN

We also create a "result" file of every copy, noting the exact commit that was copied, which might or might not be useful... We were thinking of posting it here at some point but never got around to it. Again, if yours takes off, that'd be the best option :)

We're using it mostly to copy .proto definitions from one repo to another.
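To make the "copy a specific subdirectory and clear the target" idea from the list above concrete, here is a minimal Python sketch. It is not how pasta is implemented; `sync_subdir` and the directory names are made up for illustration, and a local directory stands in for the upstream repo.

```python
import shutil
import tempfile
from pathlib import Path


def sync_subdir(source_root: Path, subdir: str, target: Path) -> None:
    """Mirror source_root/subdir into target, dropping stale files."""
    src = source_root / subdir
    if not src.is_dir():
        raise FileNotFoundError(f"no such subdirectory: {src}")
    if target.exists():
        shutil.rmtree(target)  # clear first so upstream deletions propagate
    shutil.copytree(src, target)


def demo() -> list[str]:
    """Simulate an upstream deletion between two syncs."""
    with tempfile.TemporaryDirectory() as tmp:
        upstream = Path(tmp) / "upstream"  # stands in for a fetched repo
        target = Path(tmp) / "project" / "vendor" / "proto"
        (upstream / "proto").mkdir(parents=True)
        (upstream / "proto" / "a.proto").write_text("// a\n")
        (upstream / "proto" / "b.proto").write_text("// b\n")

        sync_subdir(upstream, "proto", target)
        (upstream / "proto" / "b.proto").unlink()  # file deleted upstream
        sync_subdir(upstream, "proto", target)
        return sorted(p.name for p in target.iterdir())


print(demo())
```

Without the clearing step, `b.proto` would linger in the target after it disappeared upstream, which is the drift the "clear" mechanic is meant to avoid.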


Probably unpopular opinion: git submodules are just fine. They're "just" lacking a consistent UI. They have improved over the years, but the default config sucks because the defaults emulate the original, awful, UX. With proper configuration, it's much better, although there are still pain points (like rebase conflicts in non-submodule parts messing things up if you don't git submodule update)


Can we see your `git config -l`? I sparingly use git submodules, and don't really suffer from any of the common issues as I have a very strict update routine, but I'd love to see where things could be improved.


I'm on mobile, so I don't have that, but search for submodule in the git-config manual page.


If they don't have a consistent UI and emulate the original awful UX, then in what aspect are they nice?

They have a ton of problems in my experience, a few off the top of my head:

- They force the specific repo url, e.g. ssh github even if you prefer to clone by http.

- Pulling from remote becomes difficult when submodules change, e.g. when a submodule is merged into main repo and becomes a proper subdir.

- git commands such as `git checkout -- .` don't work properly on them and I don't see how configs could change that.


For your first point, does ../../user/repo.git not work? I have a self hosted GitLab and that’s how I’ve specified all my submodules and it survived a top-level url change (with a new clone or changing the origin)
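For anyone unfamiliar with relative submodule URLs: Git resolves them against the superproject's remote URL as if that URL were a directory. Below is a simplified Python sketch of that resolution for plain https-style URLs. It is an approximation for illustration, not Git's actual code, and the host and paths are made up.

```python
def resolve_relative(parent_url: str, relative: str) -> str:
    """Resolve a ./ or ../ submodule URL against the parent repo's URL.

    Simplified: the parent URL is treated as a directory, so each "../"
    strips one trailing path component. No handling of scp-style URLs
    or of climbing past the host.
    """
    base = parent_url.rstrip("/")
    rest = relative
    while True:
        if rest.startswith("../"):
            base = base.rsplit("/", 1)[0]
            rest = rest[3:]
        elif rest.startswith("./"):
            rest = rest[2:]
        else:
            break
    return f"{base}/{rest}"


# Hypothetical self-hosted GitLab URLs
print(resolve_relative("https://gitlab.example.com/group/super.git",
                       "../../user/repo.git"))
```

Because only the relative part is committed, whoever clones the superproject over https gets https submodule URLs, and the same holds for ssh, as long as the submodules live on the same host.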


> Probably unpopular opinion: git submodules are just fine. They're "just" lacking a consistent UI.

I second the sentiment. Git submodules work just fine. The UX could use some work. It baffles me why bolting on convoluted tools is considered a preferable alternative.


What I don't like about submodules is that they are centralised: you can't easily migrate to another server without them still pointing to the old one, because the URLs are version controlled. I have since moved to packages.


If it just clones the repos and removes the .git directories, then I assume it doesn't keep their commit history? So if you use e.g. `git blame` or `git log` to look at file history, you will see when changes were introduced to the parent repo, but not when/why those changes were made in the first place.

In that respect, it resembles git-subtree with --squash, but differs from git-submodule or regular git-subtree.


Yep, you have it correct. I've got a note at the bottom of the README that I'm considering adding a config field to keep the .git directory, but I'm trying to keep pretty far away from git-in-git consequences/use cases. I said the same in another comment here, but I don't envision vdm becoming something that's git-specific or developmental -- it's really just intended to be a getter, not a writer, and the functionality reflects that.

Cool info though, thanks for sharing!


> I've got a note at the bottom of the README that I'm considering adding a config field to keep the .git directory, but I'm trying to keep pretty far away from git-in-git consequences/use cases.

Maybe a better approach would be to rename the `.git/` to (for example) `.vdm` in each submodule? Each `vdm` command would first rename it back to `.git`, execute the git commands needed, and then rename it back to `.vdm/`.

This gives you the ability to implement `vdm history` or similar command while still keeping the submodule invisible to the parent?
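A rough Python sketch of that rename-around-each-command idea, purely hypothetical (vdm does not do this today, and `unhidden` is a made-up name). A context manager guarantees the directory is hidden again even if the wrapped command fails:

```python
import contextlib
import tempfile
from pathlib import Path


@contextlib.contextmanager
def unhidden(dep_dir: Path, hidden_name: str = ".vdm"):
    """Temporarily rename dep_dir/.vdm to dep_dir/.git, then rename it back."""
    hidden = dep_dir / hidden_name
    git_dir = dep_dir / ".git"
    hidden.rename(git_dir)
    try:
        yield git_dir  # git commands for the dependency would run here
    finally:
        git_dir.rename(hidden)  # runs even if the wrapped command raises


def demo() -> tuple[bool, bool]:
    """Check the directory is visible inside the block and hidden after it."""
    with tempfile.TemporaryDirectory() as tmp:
        dep = Path(tmp) / "dep"
        (dep / ".vdm").mkdir(parents=True)  # stands in for a renamed .git
        with unhidden(dep) as git_dir:
            visible_inside = git_dir.is_dir()
        hidden_after = (dep / ".vdm").is_dir() and not (dep / ".git").exists()
        return visible_inside, hidden_after


print(demo())
```

A hard kill between the two renames would still leave a visible `.git` behind, so a real tool would need some recovery step on startup.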


Git would then version that .vdm/ directory - the repo would grow exponentially.


The vdm submodule directories would probably have to be entered in the gitignore anyway. You wouldn't want to commit the submodule source files either. Won't the .git directory get excluded that way?


you'd want vdm to be a superset of git commands and then alias git=vdm, or something to avoid that


No, I mean that the larger Git repo, of the main project, would version the .vdm directory.


To me, the biggest indicator that all the links being posted here about Git submodule systems come from people who don't know what they're doing is the fact that all of them (vdm, pasta, peru, git-aggregator, etc.) are using YAML as a config. Anyone who has worked at least a few years with Git and YAML knows that this type of file is not Git/diff friendly. I've seen too many disastrous merges, and the developers in the company have to keep using unityyamlmerge to resolve a foolish decision by Unity. Moreover, if anyone here has tried to parse YAML, they understand how unnecessary it is to use this format 99% of the time. In your case, the only advice I can give is to use a complete repo config per line, so it doesn't spread across different lines. This ensures the atomicity and integrity of your information.


I never thought of that before, but it's a good point.


What config format do you recommend?


For this specific scenario, a simple custom format: one line per config, properties separated by commas, and key separated from value by an equals sign. In C# you can populate a dictionary with ReadAllLines/LINQ in one line of code. Need a hierarchy/tree? Use dots in the property name. KISS.
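For illustration, here is a small Python sketch of such a one-line-per-entry format (Python rather than C#). The field names (`name`, `remote`, `ref`, `path`, `auth.env`) are invented for the example and are not vdm's schema.

```python
def parse_line(line: str) -> dict:
    """Parse 'key=value,key=value'; dots in a key create nested dicts."""
    result: dict = {}
    for pair in line.strip().split(","):
        if not pair.strip():
            continue
        key, _, value = pair.partition("=")  # split on the first '=' only
        *parents, leaf = key.strip().split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value.strip()
    return result


def parse_config(text: str) -> list[dict]:
    """One entry per non-empty line; lines starting with '#' are comments."""
    return [
        parse_line(line)
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]


example = """\
# hypothetical field names, for illustration only
name=core,remote=https://example.com/core.git,ref=v1.2.0,path=deps/core
name=protos,remote=https://example.com/protos.git,ref=main,auth.env=TOKEN
"""

for entry in parse_config(example):
    print(entry)
```

The trade-off is that values cannot contain commas without some escaping rule, which this sketch does not attempt.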


If you are looking for something very light and efficient, let me suggest giving this a try:

https://github.com/fviard/svn_xternals

Despite the README saying that it is a work in progress, the tool has been functional for a few years already. Also, again despite the name, it works with Git.

The idea is to be able to use the concept of "externals" from SVN transparently with SVN or Git. It does something similar to what Google's "gclient" was doing but in a more efficient way (i.e. a lot faster and consuming a lot less resources).

To use it, you just need to create a file ("externals.conf" in your project for example), in a format like that:

externals.conf

   git@github.com:user/myproject_core.git                   myproject/core
   git@github.com:user/myproject_plugins_onething.git       myproject/plugins/onething
   git@github.com:anotheruser/another_thing.git@mybranch    myproject/plugins/another_thing
   git@github.com:corpuser/random_library.git@release-tag-123           myproject/vendor/random_library

Then, you can simply run: python3 externalsup.py

It will automatically take care of the git clone, pull, or "switch" if you change a branch/tag indicator in the externals file.

That way, you can easily commit an externals.conf file in a root project folder, and individually manage the version of sub-components that can be hosted anywhere.

The "externals.conf" file is a plain text file, so it is easy to read and to diff when comparing different versions of your project.
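To show how little parsing a format like this needs, here is a short Python sketch that reads the example lines above. It is inferred only from that example (a URL ending in `.git`, an optional `@ref`, then a destination path) and is not the tool's actual parser.

```python
from typing import NamedTuple, Optional


class External(NamedTuple):
    url: str
    ref: Optional[str]  # branch or tag; None means the default branch
    dest: str


def parse_externals(text: str) -> list[External]:
    """Parse 'url[@ref]  dest' lines; assumes every URL ends in '.git'."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unrecognised lines
        source, dest = fields
        marker = source.rfind(".git@")
        if marker == -1:
            url, ref = source, None
        else:
            url, ref = source[: marker + 4], source[marker + 5:]
        entries.append(External(url, ref, dest))
    return entries


sample = """
   git@github.com:user/myproject_core.git                   myproject/core
   git@github.com:anotheruser/another_thing.git@mybranch    myproject/plugins/another_thing
"""

for ext in parse_externals(sample):
    print(ext)
```

Looking for `.git@` rather than any `@` matters here, because the scp-style URLs already contain an `@` in `git@github.com`.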


Git Subrepo is another alternative to submodules and subtree.

> This git command clones an external git repo into a subdirectory of your repo. Later on, upstream changes can be pulled in, and local changes can be pushed back. Simple.

https://github.com/ingydotnet/git-subrepo

After trying many similar solutions, it gets the closest to what I want to achieve, which is nested Git repositories. A project with subprojects, each of which can be an independent Git repo with its own history to push/pull, while the project itself has the entire codebase and history.

It's written in Bash, so fairly portable.

---

Edit: After skimming through the project vdm, I see the problems it aims to solve are different from what git-subrepo does. The latter is more about monorepos. Ah well, that's what I get for commenting before reading the post.

vdm does look useful for managing a project with external dependencies, which are Git repos owned by others or oneself. Maybe like a language-agnostic package manager.


I made a full dependency manager called Degasolv[1] capable of managing arbitrary code in zip files some years back. I wrote it in Clojure. It has features for hosting zip repositories, version comparison, transitive dependency resolution, the whole nine yards.

I poured my heart and soul into it[2] but it wasn't very popular. I guess there's not much need for a dependency manager that's not tailored to the needs of a particular community, like a platform or language.

1: https://github.com/djhaskin987/degasolv

2: https://degasolv.readthedocs.io


Looks cool! Seems functionally similar to AOSP’s git-repo, but already feels more approachable with that simple yaml remote list.

What collaborative tool would you recommend using with vdm? AOSP has gerrit which is sort of specifically designed for this multi-remote meta setup. GitHub/GitLab don’t play nice with this type of environment.


This tool looks like "submodules, but lighter", while repo is "submodules, but heavier". Looks to me like the motivation is for dependencies that are not hard enough to justify a submodule.

Repo seriously sucks to use, but I also can't imagine many tools living up to AOSP-type workloads without being specifically designed for it. My gripe with repo is that it's really hard to pin the entire repo state if you have a bunch of prototype patches across multiple subrepos. I usually end up having to modify the XML directly.


>This tool looks like "submodules, but lighter", while repo is "submodules, but heavier". Looks to me like the motivation is for dependencies that are not hard enough to justify a submodule.

I think you nailed how I was feeling in much fewer words!


Thanks! That AOSP `repo` tool is one I'd not heard of, so thanks for sharing!

I actually haven't really put much thought into collaborative/multirepo development work using vdm -- the original intent was for it to strictly be a retriever of what the user specifies. I think the majority of both my frustration and complexity of other tools is because they're trying to solve for a lot more than at least I personally usually want to use them for. It's like, I just want a `pip install/requirements.txt/go.{mod,sum}` for any kind of tool, not just the language that takes up the majority of my codebase.

One of the thoughts I had, though, was to maybe add a field for each remote section called e.g. `try_local` that takes a filesystem path, and if that doesn't exist then fetch the remote. That way, your development workflow is exactly the same as usual -- developing a dependency is just doing work in the actual repo directory nearby. I'm not married to the idea though. I just REALLY don't want to have it be in the business of managing git-repos-in-git-repos, because vdm isn't really intended to be a Git tool, if that makes sense.
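As a sketch of how that `try_local` fallback could behave (the field is only an idea at this point, and `resolve_source` is a made-up helper written in Python for brevity, not vdm code):

```python
from pathlib import Path
from typing import Optional


def resolve_source(remote: str, try_local: Optional[str]) -> tuple[str, str]:
    """Prefer an existing local directory; otherwise fall back to the remote.

    Returns a (kind, location) pair where kind is "local" or "remote".
    """
    if try_local:
        local = Path(try_local).expanduser()
        if local.is_dir():
            return ("local", str(local))
    return ("remote", remote)


# Hypothetical remote URL and paths
print(resolve_source("https://example.com/dep.git", "."))
print(resolve_source("https://example.com/dep.git", "/no/such/checkout"))
print(resolve_source("https://example.com/dep.git", None))
```

One design question this raises: a local checkout has no pinned version, so a build that silently picks it up may differ from what CI fetches.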


I've had great experiences with https://github.com/ingydotnet/git-subrepo


I think you’re going to find that, out there, somebody has already built this. I’ve built one, and worked on two others that somebody built. Usually they have names like workspace manager or repo manager or whatever. Most will probably have something to build a dag and code to do a topological sort for the recursive projects. The better ones will use the topological sort to pull repos and build in parallel.

In addition, other tools can also do this to varying degrees of success, like Bazel and cmake.
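For readers who haven't built one of these: the DAG-plus-topological-sort part is small. Here is a Python sketch using the standard library's `graphlib` (Python 3.9+); the repo names are invented, and each returned batch contains repos that could be fetched or built in parallel.

```python
from graphlib import TopologicalSorter


def fetch_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group repos into batches; each batch depends only on earlier ones."""
    sorter = TopologicalSorter(deps)  # maps repo -> repos it depends on
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)  # everything in `ready` is independent
        sorter.done(*ready)
    return batches


# Hypothetical dependency graph
graph = {
    "app": {"core", "plugins", "scripts"},
    "plugins": {"core"},
    "core": {"protos"},
    "protos": set(),
    "scripts": set(),
}

print(fetch_batches(graph))
```

A real workspace manager would hand each batch to a thread or process pool instead of just returning it.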


What problems are there with git submodules and how does this solve them? The readme isn't forthcoming in this respect.


I previously saw https://github.com/buildinspace/peru in use. Seems somewhat similar.


Nice! As an alternative backup tool, you can look at GitProtect Backup & Disaster Recovery for GitHub, Bitbucket, and GitLab. It allows you to pick up the storage (Cloud/local or both), automate backups by scheduling them at the most appropriate time, avoiding throttling, and restore data immediately from any point in time in case of failure, and many other features that meet pain points.


In our case we do not use submodules, because we need to apply some patch or PR to the dependency.

To solve it we use git-aggregator (I am not the author), which is language agnostic too. It seems to have the same features as vdm plus some extra ones (the possibility of a frozen file, the possibility of applying a patch/PR...)

Source : https://github.com/acsone/git-aggregator


I quite like https://github.com/ingydotnet/git-subrepo

This allows you to treat common code in a repo as just a normal part of the repo. However, the common code is also in a repo of its own. This tool then allows you to push / merge your changes back to the common repo.

Check the git page for a list of the benefits.


I think submodules make sense in a lot of use cases, but a gotcha I saw with a team introduced to them recently is that pulling down from a branch or switching branches doesn’t update the submodule and/or stop you from changing branches if it is modified without being committed in some way.

If I could have submodules that operated that way I think submodules would be a lot more straightforward to newcomers.



Note that this won't solve all cases. For example, you still have to watch out when merging branches with different submodule commit hashes that you run submodule update while merging.


Yup, submodules are actually ok. Like with most git issues, it's more of a tooling UX problem than an architecture deficiency.


How does this compare to git-subrepo?


And Git subtree and GIL?

https://github.com/chronoxor/gil


Does it do anything to help manage the .gitignore file(s)? Otherwise I'd think you have to specify the dependency in both places consistently, which sounds a bit tedious.


For projects where I can't trust that the people involved can deal with submodule bullshit correctly I just use these git aliases:

    box = !cd ${GIT_PREFIX:-.} && git config --get remote.origin.url > .gitboxinfo && git rev-parse --abbrev-ref HEAD >> .gitboxinfo && git rev-parse HEAD >> .gitboxinfo && mv .git .gitbox && git add -f .gitboxinfo && true
    unbox = !cd ${GIT_PREFIX:-.} && mv .gitbox .git && true

Then I add the .gitbox folder to gitignore. Whenever I need to interact with the "submodule" repo I unbox, otherwise I leave it boxed and as far as everyone else in the project is concerned, the dependency was just copied n pasted in the project.

If you ever need to regenerate the gitbox folder from scratch you can take a peek at the gitboxinfo file and git clone and reset the dependency repo in a temp folder, then move the git folder next to the gitboxinfo file.

Plus unlike submodules with this you can have local changes to the submodule files without having to fork the submodule itself.
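The regeneration step described above could be scripted roughly like this. A Python sketch, assuming `.gitboxinfo` holds exactly the three lines the `box` alias writes (remote URL, branch name, commit hash); the helper names and example values are made up, and the commands are only built, not executed.

```python
from typing import NamedTuple


class BoxInfo(NamedTuple):
    url: str
    branch: str
    commit: str


def parse_gitboxinfo(text: str) -> BoxInfo:
    """Read the three lines written by the `box` alias."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 3:
        raise ValueError(f"expected 3 lines, got {len(lines)}")
    return BoxInfo(*lines)


def regenerate_commands(info: BoxInfo, tmp_dir: str) -> list[list[str]]:
    """Commands to rebuild the boxed git directory from scratch."""
    return [
        ["git", "clone", "--branch", info.branch, info.url, tmp_dir],
        ["git", "-C", tmp_dir, "reset", "--hard", info.commit],
        ["mv", f"{tmp_dir}/.git", ".gitbox"],
    ]


# Made-up example contents of a .gitboxinfo file
example = """\
https://example.com/vendor/lib.git
main
0123456789abcdef0123456789abcdef01234567
"""

for cmd in regenerate_commands(parse_gitboxinfo(example), "/tmp/lib-regen"):
    print(" ".join(cmd))
```

If the dependency was boxed with a detached HEAD, the second line will read `HEAD` and the `--branch` clone would need to be replaced with a plain clone.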


This sounds like git-subtree, which has been part of git for quite a few years now.

https://www.atlassian.com/git/tutorials/git-subtree


For a Python project, what are the pros/cons of

1: A setup.py that installs dependencies like this:

    pip install git+https://github.com/dependency/repo

2: Git submodules

?


3. copy everything into vendor/lib folder.

version pinning, no extra install needed, works offline, zero deps headaches.

Example: requests.packages.*


Do you mean a completely manual workflow where you copy dependencies into the vendor dir by hand and then they are part of your project? If so, you back them up with your project backups and they also go into your repo's history?

Otherwise, I would be interested, how you "copy" a git repo that goes into your vendor dir. Where you put the list of repos that need to get copied. Which command you run to copy them all. How you handle it if they have sub-dependencies and how those get installed in your workflow.


> Do you mean a completely manual workflow

Coding is a manual process anyway, no? It's no different than writing code on your local machine and deciding to use some third-party modules.


I like to wrap it in a venv (pure python project) or nix flake (mixed languages)


That seems to be about isolation, not about dependency management, right? I use Docker containers for that.

But my question was about dependency management.


Not so much of a hot take as some confusion: what are the pain points of Git submodules that lead to this tool? You imply they're 'not sane', but don't mention any of the deficiencies that your tool overcomes.


The project looks interesting.

Regarding the name, I’m French, and VDM basically means FML in French.


nice! I've been using jsonnet-bundler for this, even for non-jsonnet projects.


Another solution that "nix" solved years ago.


This seems to be almost the same as Android's repo tool. https://android.googlesource.com/tools/repo

Personally I don't see the difference between this and submodules. Repo stores the information in xml files, vdm stores it in yaml files and git submodules in the git database. I don't really care.

The real headache for me is the trouble of traceability vs ease of use. You need to specify your dependencies with a sha1 to have traceable SLSA compliant builds, but that also means that you'll need to update all superrepos once a submodule is updated. Gerrit has support for this, but it's not atomic, and what about CI? What about CI that fails?
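On the pinning half of that trade-off, a check that every dependency is specified by a full sha1 rather than a branch or tag is easy to automate. A minimal Python sketch; the dependency names and refs are invented, and it only checks the shape of the ref, not whether the commit exists.

```python
import re

FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def is_pinned(ref: str) -> bool:
    """True if ref looks like a full 40-character sha1, not a branch or tag."""
    return FULL_SHA1.fullmatch(ref.lower()) is not None


def unpinned(deps: dict[str, str]) -> list[str]:
    """Names of dependencies whose ref is not a full sha1."""
    return sorted(name for name, ref in deps.items() if not is_pinned(ref))


# Hypothetical dependency -> ref mapping
deps = {
    "core": "0123456789abcdef0123456789abcdef01234567",
    "protos": "main",
    "scripts": "v1.4.0",
}

print(unpinned(deps))
```

This covers traceability only; the atomic update across superrepos and the failing-CI case are the harder part and are not addressed here.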


>I don’t really care

I care about the aesthetics and the convenience that the tool provides. git-repo at least has a simple command to get all the latest stuff (repo sync). Git submodules is a mess in this regard. Just look at this stack overflow thread:

https://stackoverflow.com/questions/1030169/pull-latest-chan...

People are confused at how to do THE most basic command that you’d have to do every single day with a multi-repo environment. There’s debating in the comments about what flags you should actually use. No thanks.

There’s a lot of room for improvement in this space. git-repo isn’t widely used outside of aosp. Lots of organizations are struggling with proper tooling for this type of setup.


You update your submodules every day?

Also, the discussions are there because it's been more than a decade and the options have evolved over time.

Submodules are a bit clunky but the problem they solve is itself clunky. Bringing in another tool doesn't really feel like it's going to reduce the burden.

I have yet to be in a situation where I blindly want to update all submodules. It is a conscious action, X has updated and I want to bring that change(s) in.

cd submodule, update, test, commit.

I haven't seen anything in this thread that really motivates me to learn another bespoke tool just for this. I'm sure it varies for different projects though.

Fast forward 15 years and see how the tooling in this thread has evolved and how many different tools people will have used, and compare that to the Stack Overflow post. I'm more inclined to invest time in git itself.


> I'm more inclined to invest time in git itself.

This is fine until you're working with hundreds of other developers. I believe the reason solutions like this exist is to abstract git away from most devs, because (in my experience) many enterprise devs have only rudimentary git knowledge.

Sure, the devs should "just learn git" - but the same argument applies to a lot of other tech nowadays. Ultimately most folks seem to want to close their ticket off and move to the next one.

Git submodules and git subtrees generally do not fit my org's needs - we have internal tooling similar to this. Happy to expand on that if you have questions.


The risk with that approach is that every other of the hundreds of developers will bring their own tool for X. So now you have hundreds of tools and everyone only knows a subset.

If there is a common operation that people get wrong or don't use often enough but still need to run regularly, a five-line bash script will not only do the job, it will actively help them learn the tool they are using.


I want to update 20+ submodules every day, ensuring I'm always at the tip of all submodules.


    git submodule foreach ………

Or is there something I missed?


[flagged]


Sure, but not all my code in a single repo is monolingual code, and not all of those languages have cooperative (or existing) package managers. Bash is a great example of something I wish I could stop copy-pasting between repos, and was actually the original motivator for vdm (along with protobuf files, for the same reasons).



