Update 2008-05-21: Tim Dysinger and Pat Maddox pointed out that git submodules are inherently not well-suited for frequently updated projects. Read the comments for more details, and please use submodules with caution on projects where you can't guarantee a shared repository has not changed between 'pull' and 'push' operation.
Today I'm releasing git.rake into the wild under an open-source license. It's a rakefile for managing multiple git submodules in a shared-server development environment.
We've been using it internally at my company, Pharos Enterprise Intelligence, for the last 5 months and it's been a huge timesaver for us. Read below for a detailed description of the features and its use.
The code is being released under the MIT license and the git repository is being hosted on github. Take a look:
What git.rake Is
A set of rake tasks that will:
Keep your superproject in synch with multiple submodules, and vice versa. This includes branching, merging, pushing and pulling to/from a shared server, and committing. (Biff!)
Keep a description of all changes made to submodules in the commit log of the superproject. (Bam!)
Display the status of each submodule and the superproject in an easily-scannable representation, suppressing what you don't want or need to see. (Pow!)
Execute arbitrary commands in each repository (submodule and superproject), terminating execution if something fails. (Whamm!)
Configure a rails project for use with git. (Although, you've seen that elsewhere and are justifiably unimpressed.)
If you're not sure how to add a submodule to your repo, or you're not sure what a submodule is, take a quick trip over to the Git Submodule Tutorial, and then come back. In fact, even if you ARE familiar with submodules, it's probably worth reviewing.
The Problem We're Trying to Solve Here
Let's start with stating our basic assumptions:
- you're using a shared repository (like github)
- you're actively developing in one or more submodules
This model of development can get very tedious very quickly if you don't have the right tools, because everytime you decide to "checkpoint" and commit your code (either locally or up to the shared server), you have to:
- iterate through your submodules, doing things like:
- making sure you're on the right branch,
- making sure you've pulled changes down from the server,
- making sure that you've committed your changes,
- and pushed all your commits
- and then making sure that your superproject's references to the submodules have also been committed and pushed.
If you do this a few times, you'll see that it's tedious and error-prone. You could mistakenly push a version of the superproject that refers to a local commit of a submodule. When people try to pull that down from the server, all hell will break loose because that commit won't exist for them.
Ugh! This is monkey work. Let's automate it.
OK, fixing this issue sounds easy. All we have to do is:
- develop some primitives for iterating over the submodules (and optionally the superproject),
- and then throw some actual functionality on top for sanity checking, pulling, pushing and committing.
git-rake presents a set of tasks for dealing with the submodules:
git:sub:commit # git commit for submodules git:sub:diff # git diff for submodules git:sub:for_each # Execute a command in the root directory of each submodule.\ Requires CMD='command' environment variable. git:sub:pull # git pull for submodules git:sub:push # git push for submodules git:sub:status # git status for submodules
And the corresponding tasks that run for the submodules PLUS the superproject:
git:commit # git commit for superproject and submodules git:diff # git diff for superproject and submodules git:for_each # Run command in all submodules and superproject. \ Requires CMD='command' environment variable. git:pull # git pull for superproject and submodules git:push # git push for superproject and submodules git:status # git status for superproject and submodules
It's worth noting here that most of these tasks do pretty much just what they advertise, in some cases less, and certainly nothing more (well, maybe a sanity check or two, but no destructive actions).
The exception is
git:commit, which depends on
git:update, and that has
some pixie dust in it. More on this below.
Leaving only the following specialty tasks to be explained:
git:configure # Configure Rails for git git:update # Update superproject with current submodules
The first is simple: configuration of a rails project for use with git.
git:update, does two powerful things:
(Only if on branch 'master') Submodules are pushed to the shared server. This guarantees that the superproject will not have any references to local-only submodule commits.
For each submodule, retrieve the git-log for all uncommitted (in the superproject) revisions, and jam them into a superproject commit message.
Here's an example of such a superproject commit message:
commit 17272d53c298bd6a8ccee6528e0bc0d62104c268 Author: Mike Dalessio <email@example.com> Date: Mon May 5 20:48:13 2008 -0400 updating to latest vendor/plugins/pharos_library > commit f4dbbce6177de4b561aa8388f3fa9f7bf015fa0b > Author: Mike Dalessio <firstname.lastname@example.org> > Date: Mon May 5 20:47:46 2008 -0400 > > git:for_each now exits if any of the subcommands fails. > > commit 6f15dee8c52ced20c98eef63b3f3fd1c29d91bbf > Author: Mike Dalessio <email@example.com> > Date: Fri May 2 13:58:17 2008 -0400 > > think i've got the tempfile handling correct now. awkward, but right. >
Excellent! Not only did
git:update automatically generate a useful log
message for me (indicating that we're updating to the latest submodule
version), but it's also embedding original commit logs for all the
changes included in that commit! That makes it much easier to find a
specific submodule commit in the superproject commit log.
A Note on Branching and Merging
Note that there are no tasks for handling branching and merging. This is intentional! It could be very dangerous to try to read your mind about actions on branches, and frankly, I'm just not up to it today.
For example, let's say I invented a task to copy the current branch
master to a new branch
foo (the equivalent of
git checkout -b foo
master) in all submodules, but one of the submodules already has a
Do we reduce this action to a simple
git checkout foo for that
submodule? That could yield unexpected results if we a) forgot we had
a branch named
foo and b) that branch is very different from the
master we expected to copy.
Well, then -- we can delete (or rename) the existing
foo branch and
follow that up by copying
foo. But then we're silently
renaming (or deleting) branches that a) could be upstream on the
shared server or b) we intended to keep around, but forgot to
In any case, my point is that it can get complicated, and so I'm
punting. If you want to copy branches or do simple checkouts, you
should use the
Everyday Use of git:rake
In my day job, I've taken the vendor-everything approach and refactored lots of common code (across clients) into plugins, which are each a git submodule. My current project has 14 submodules, of which I am actively coding in probably 5 to 7 at any one time. (Plenty of motivation for creating git:rake right there.)
Let's say I've hacked for an hour or two and am ready to commit to my local repository. Let's first take a look at what's changed:
$ rake git:status All repositories are on branch 'master' /home/mike/git-repos/demo1/vendor/plugins/core: master, changes need to be committed # modified: app/models/user_mailer.rb # public/images/mail_alert.png (may need to be 'git add'ed) WARNING: vendor/plugins/core needs to be pushed to remote origin /home/mike/git-repos/demo1/vendor/plugins/pharos_library: master, changes need to be committed # deleted: tasks/rake/git.rake
You'll notice first of all that, despite having 14 submodules, I'm only seeing output for the ones that need commits, and even that output is minimal, listing only the specific files and not all the cruft in the original message. It tells me that all submodules are on the same branch. It's smart enough to tell me that a file may need to be git-added. It will even alert me when a repo needs to be pushed to the origin.
I'll have to manually
cd to the submodule and git-add that one
file, but once that's done, I can commit my changes by running:
$ rake git:commit
which will run
git commit -a -v for each submodule, fire up the
editor for commit messages along the way, push each submodule to the
shared server, and then automagically create verbose commit logs for
To pull changes from the shared server:
$ rake git:pull
When you run this command, you'll notice that the output is filtered, so if no changes were pulled, you'll see no output. Silence is golden.
$ rake git:push
Not only will this be silent if there's nothing to push, but the rake task is smart enough to not even attempt to push to the server if master is no different from origin/master. So it's silent and fast.
Let's say I want to copy the current branch,
master, to a new
$ rake git:for_each CMD='git checkout -b working master'
If the command fails for any submodules, the rake task will terminate immediately.
Merging changes from 'working' back into 'master' for every submodule (and the superproject)?
$ rake git:for_each CMD='git checkout master' $ rake git:for_each CMD='git merge working'
What git.rake Doesn't Do
A couple of things that come quickly to mind that git.rake should probably do:
Push to the shared server for ANY branch that we're tracking from a remote branch.
Be more intelligent about when we push to the server. Right now, the code pushes submodules to the shared server every time we want to commit the superproject. We might be able to get away with only pushing the submodules when we push the superproject.
Parsing the output from various 'git' commands is prone to breakage if the git crew starts modifying some of the strings.
There should probably be some unit/functional tests. See previous item.
Anyway, the code is all up on github. Go hack it, and send back patches!