Maintaining forked dependencies with Git subtrees

Tom
4 min read · Nov 20, 2019
Photo by Noah Rosenfield on Unsplash

It’s all very simple in the beginning. Your new project is in one private repository and all the dependencies are managed by your favourite package manager. After a while, you publish a library to query your API and make it open source, so it must be in a public repository. For every new feature of your API, you will have to commit in that new repo as well.

Then, a colleague notices there is a bug in a dependency causing your application to crash sometimes. She quickly forks the dependency, patches the bug and sends a pull request upstream but, in the meantime, your project will have to pull the dependency from the forked repository.

You get the picture. It quickly becomes complicated to manage and you wish there was something that could help you get stuff done without calling git clone a thousand times.

Maybe you’ve heard of Git submodules, and surely that came with a warning about how easy it is to shoot yourself in the foot with them. Enter Git subtrees.

Unlike submodules, subtrees do not require a developer to know anything about them. Some directory of your repository can be a subtree and you wouldn’t know the difference. Well, that’s because there isn’t really one.

Git subtrees are not reified and Git has no concept of “subtrees”. For all intents and purposes, they are just subdirectories sitting in the repository and no difference is made whatsoever. git-subtree is the command to push, pull and merge changes in these subdirectories from or to other repositories.

Use Case 1: Maintaining a local fork

Let’s imagine that some dependency, say api-reader, does not quite suit your needs and requires some tweaks that can’t (all) be pushed upstream. A fork can be maintained directly within the project.

Let’s first add the upstream repository as a new remote, as it makes things a lot easier to manage. We also fetch its commits into the local repository. The same way there is origin/master, there is now api-reader/master.

git remote add api-reader https://github.com/api-reader/api-reader.git
git fetch api-reader

Commits from this repository can now be pulled into a subdirectory.

git subtree add --prefix=lib/api-reader api-reader master

The obvious first effect this command has is to create a new directory lib/api-reader with the latest version of master from the library. You can now proceed to tweak its code and commit your changes as usual.

If we were to look at git log, we would see something interesting. The commits of the library are now part of the history and a new commit has been added to add the subtree.

This is not always desirable, and the option --squash makes the command collapse the library’s history into a single squashed commit, which is then merged into your branch. Note that this option is also available for git subtree merge and git subtree pull.
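To make the difference concrete, here is a minimal sketch that runs git subtree add with and without --squash and counts the resulting commits. It assumes git-subtree is installed, and it uses throwaway local repositories in place of the real api-reader. The helper functions new_repo and commit_file, the file names and the directory layout are all hypothetical.

```shell
#!/bin/sh
# Sketch: compare history size after `git subtree add` with and without --squash.
# Everything lives in a temporary directory; no network access is needed.
set -eu
work=$(mktemp -d)

# Hypothetical helper: create a repository on branch master with a local identity.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    git -C "$1" config user.name "Example"
    git -C "$1" config user.email "example@example.com"
}

# Hypothetical helper: write a file and commit it (repo, path, content, message).
commit_file() {
    mkdir -p "$(dirname "$1/$2")"
    printf '%s\n' "$3" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$4"
}

# Stand-in for the upstream library, with three commits of history.
new_repo "$work/upstream"
commit_file "$work/upstream" reader.txt "v1" "upstream: commit 1"
commit_file "$work/upstream" reader.txt "v2" "upstream: commit 2"
commit_file "$work/upstream" reader.txt "v3" "upstream: commit 3"

# Two identical projects, each with a single commit of their own.
for name in full squashed; do
    new_repo "$work/$name"
    commit_file "$work/$name" app.txt "app" "project: initial commit"
    git -C "$work/$name" remote add api-reader "$work/upstream"
done

# Without --squash: the library's commits become part of the project history.
cd "$work/full"
git subtree add --prefix=lib/api-reader api-reader master
full_count=$(git rev-list --count HEAD)

# With --squash: the library's history is collapsed before being merged in.
cd "$work/squashed"
git subtree add --squash --prefix=lib/api-reader api-reader master
squashed_count=$(git rev-list --count HEAD)

echo "without --squash: $full_count commits"
echo "with --squash:    $squashed_count commits"
```

With this setup, the first project should end up with five commits (its own, the three from upstream and the merge) and the second with three (its own, the squashed commit and the merge).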

Now that you have the dependency’s source code inside your repository come two responsibilities: you need to pull changes from upstream and you might want to push some of your own.

Pulling changes from upstream

git fetch api-reader
git subtree pull --prefix=lib/api-reader api-reader master

This command pulls the changes made upstream since you added the subtree or last pulled, and creates a merge commit on top. Keep in mind that the pulled commits keep their original dates and may be older than the latest commits of your own code, so they might not appear at the top when you call git log.
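The following sketch shows a pull bringing in an upstream change while a local tweak inside the subtree survives. As before, the repositories are throwaway local stand-ins, and the helpers, file names and contents are hypothetical. GIT_MERGE_AUTOEDIT=no is set only so that the merge never opens an editor when the script runs unattended.

```shell
#!/bin/sh
# Sketch: pull upstream changes into a subtree that also carries a local tweak.
set -eu
export GIT_MERGE_AUTOEDIT=no   # never open an editor for the merge commit
work=$(mktemp -d)

# Hypothetical helper: create a repository on branch master with a local identity.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    git -C "$1" config user.name "Example"
    git -C "$1" config user.email "example@example.com"
}

# Hypothetical helper: write a file and commit it (repo, path, content, message).
commit_file() {
    mkdir -p "$(dirname "$1/$2")"
    printf '%s\n' "$3" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$4"
}

# Stand-in for the upstream library.
new_repo "$work/upstream"
commit_file "$work/upstream" reader.txt "v1" "upstream: initial commit"

# The main project, with the library added as a subtree.
new_repo "$work/project"
commit_file "$work/project" app.txt "app" "project: initial commit"
cd "$work/project"
git remote add api-reader "$work/upstream"
git subtree add --prefix=lib/api-reader api-reader master

# A local tweak inside the subtree, committed like any other change.
commit_file "$work/project" lib/api-reader/local.txt "tweak" "project: local tweak"

# Meanwhile, upstream moves on.
commit_file "$work/upstream" reader.txt "v2" "upstream: second commit"

# Merge the new upstream commits into the subdirectory.
git subtree pull --prefix=lib/api-reader api-reader master

cat lib/api-reader/reader.txt   # the upstream change has arrived
cat lib/api-reader/local.txt    # the local tweak is still there
git --no-pager log --oneline --graph
```

The graph printed at the end makes the merge commit visible, with the upstream commits sitting on their own line of history next to yours.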

Pushing your changes

This one is simply the counterpart of the previous command and doesn’t require much more explanation.

git subtree push --prefix=lib/api-reader api-reader master

In a real-life situation, you probably can’t push to the remote api-reader though. In that case, head over to GitHub and fork the repository. Then you can add your fork as another remote, say api-reader-fork, and push your changes to it. You can now create your pull request.
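Here is a rough sketch of that workflow. A local bare repository plays the role of the GitHub fork, and the branch name my-fix, the helpers and the file contents are hypothetical. The point is that the pushed branch contains only the library’s files, with your change sitting directly on top of upstream’s history, which is the shape a pull request needs.

```shell
#!/bin/sh
# Sketch: push a fix made inside the subtree to a fork of the library.
set -eu
work=$(mktemp -d)

# Hypothetical helper: create a repository on branch master with a local identity.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    git -C "$1" config user.name "Example"
    git -C "$1" config user.email "example@example.com"
}

# Hypothetical helper: write a file and commit it (repo, path, content, message).
commit_file() {
    mkdir -p "$(dirname "$1/$2")"
    printf '%s\n' "$3" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$4"
}

# Stand-ins for the upstream library and for your fork of it.
new_repo "$work/upstream"
commit_file "$work/upstream" reader.txt "v1" "upstream: initial commit"
git init -q --bare "$work/fork.git"

# The main project, with the library added as a subtree.
new_repo "$work/project"
commit_file "$work/project" app.txt "app" "project: initial commit"
cd "$work/project"
git remote add api-reader "$work/upstream"
git remote add api-reader-fork "$work/fork.git"
git subtree add --prefix=lib/api-reader api-reader master

# Fix the bug in place, inside the main repository.
commit_file "$work/project" lib/api-reader/reader.txt "v1 patched" "Fix crash in reader"

# Push only the subtree's history to a branch of the fork.
git subtree push --prefix=lib/api-reader api-reader-fork my-fix

# In the fork, the file sits at the root and the fix follows upstream's commit.
git -C "$work/fork.git" --no-pager log --oneline my-fix
git -C "$work/fork.git" show my-fix:reader.txt
```

Pushing to a dedicated branch rather than to master is a choice made for this sketch; it keeps the fork’s master free to track upstream.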

Use Case 2: Extracting a subproject

Extracting some part of your repository into its own repository is something you might want to do to open source a library, for instance. It’s a lot simpler than you may think. If you read the previous use case, you already know the answer. There is indeed no difference between this use case and pushing local changes to another repository.

git remote add awesome-forms git@github.com:corpinc/awesome-forms.git
git subtree push --prefix=lib/awesome-forms awesome-forms some-branch
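As an illustration of what ends up in the new repository, the sketch below extracts a directory that was never added as a subtree in the first place. A local bare repository stands in for the GitHub remote, and the helpers, file names and commit messages are hypothetical. Only the commits that touched lib/awesome-forms should be carried over, with the files moved to the root.

```shell
#!/bin/sh
# Sketch: extract a plain subdirectory into its own repository.
set -eu
work=$(mktemp -d)

# Hypothetical helper: create a repository on branch master with a local identity.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    git -C "$1" config user.name "Example"
    git -C "$1" config user.email "example@example.com"
}

# Hypothetical helper: write a file and commit it (repo, path, content, message).
commit_file() {
    mkdir -p "$(dirname "$1/$2")"
    printf '%s\n' "$3" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$4"
}

# A project whose history mixes application and library commits.
new_repo "$work/project"
commit_file "$work/project" app.txt "app v1" "Add app"
commit_file "$work/project" lib/awesome-forms/form.txt "form v1" "Add form"
commit_file "$work/project" app.txt "app v2" "Update app"
commit_file "$work/project" lib/awesome-forms/form.txt "form v2" "Update form"

# Stand-in for the new, empty remote repository.
git init -q --bare "$work/awesome-forms.git"

cd "$work/project"
git remote add awesome-forms "$work/awesome-forms.git"
git subtree push --prefix=lib/awesome-forms awesome-forms some-branch

# Inspect the result: library commits only, files at the repository root.
extracted_count=$(git -C "$work/awesome-forms.git" rev-list --count some-branch)
echo "extracted commits: $extracted_count"
git -C "$work/awesome-forms.git" --no-pager log --oneline some-branch
git -C "$work/awesome-forms.git" ls-tree --name-only some-branch
```

The main repository is left untouched by this operation, so lib/awesome-forms keeps living there until you decide to replace it with a regular dependency.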

However, there is another concern to think about. Your commit history might need to be cleaned up a little before pushing it. First, create a new repository locally.

mkdir ~/code/awesome-forms
cd ~/code/awesome-forms
git init
git remote add origin git@github.com:corpinc/awesome-forms.git

From your main repository, you can push the subtree to this local directory. This will create a new branch temp with the commits.

git subtree push --prefix=lib/awesome-forms ~/code/awesome-forms temp

Finally, head back to the new local repository and clean things up before pushing. This could look a bit like this:

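cd ~/code/awesome-forms
git checkout temp
# Reword, squash or drop commits as needed
git rebase --interactive --root
# The cleaned-up history lives on temp, so push it as the remote's master
git push origin temp:master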

This was just a quick overview of what’s possible with Git subtrees. You can find more information in the links below.

https://manpages.debian.org/jessie/git-man/git-subtree.1.en.html

https://git-memo.readthedocs.io/en/latest/subtree.html

https://www.atlassian.com/git/tutorials/git-subtree
