Git Blame: June 2013

Friday, June 28, 2013

Git 1.8.3.2

The second maintenance release for the 1.8.3.x series is now available at the usual places. It contains the following fixes, which have already been applied to the 'master' branch for 1.8.4.

Cloning with "git clone --depth N" while fetch.fsckobjects (or transfer.fsckobjects) is set to true did not tell the cut-off points of the shallow history to the process that validates the objects and the history received, causing the validation to fail.
"git checkout foo" DWIMs the intended "upstream" and turns it into "git checkout -t -b foo remotes/origin/foo". This codepath has been updated to correctly take existing remote definitions into account.
"git fetch" into a shallow repository from a repository that does not know about the shallow boundary commits (e.g. a different fork from the repository the current shallow repository was cloned from) did not work correctly.
"git subtree" (in contrib/) had one codepath with loose error checks to lose data at the remote side.
"git log --ancestry-path A...B" did not work as expected, as it did not pay attention to the fact that the merge base between A and B was the bottom of the range being specified.
"git diff -c -p" was not showing a deleted line from a hunk when another hunk immediately begins where the earlier one ends.
"git merge @{-1}~22" was rewritten to "git merge frotz@{1}~22" incorrectly when your previous branch was "frotz" (it should be rewritten to "git merge frotz~22" instead).
"git commit --allow-empty-message -m ''" should not start an editor.
"git push --[no-]verify" was not documented.
An entry for "file://" scheme in the enumeration of URL types Git can take in the HTML documentation was made into a clickable link by mistake.
zsh prompt script that borrowed from bash prompt script did not work due to slight differences in array variable notation between these two shells.
The bash prompt code (in contrib/) displayed the name of the branch being rebased when "rebase -i/-m/-p" modes are in use, but not the plain vanilla "rebase".
"git push $there HEAD:branch" did not resolve HEAD early enough, so it was easy to flip it around while the push was still going on and push out a branch that the user did not originally intend when the command was started.
"difftool --dir-diff" did not copy back changes made by the end-user in the diff tool backend to the working tree in some cases.

Friday, June 21, 2013

Fun with various workflows (2)

As I discussed in a separate post, even though Git is a distributed SCM, it supports the centralized workflow well, to help people migrating from traditional SCM systems. But of course, Git serves the distributed workflow well, too. This is the workflow used in Linux kernel development, where you base your work on Linus's or a subsystem maintainer's repository, and publish to your own repository so that others (including Linus, if your work is very good) can pull from it.

You would first start by cloning from your upstream:

$ git clone git://git.kernel.org/.../git/torvalds/linux.git

The only difference from the initial step in the centralized workflow is,... nothing. You will get a "linux" directory that becomes your working area, where you will have the standard configuration, perhaps not very different from this:

[remote "origin"]
url = git://git.kernel.org/.../torvalds/linux.git
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
remote = origin
merge = refs/heads/master

And your "master" branch, which was copied from the "master" branch of Linus's repository, is ready for you to build your work on it.

The only difference is that you would not "git push" back to Linus's repository. The "git://" protocol will not usually let you push, and even if it did, Linus would not let you write into his repository.

After working on your changes on "master", the way you would push out what you did is to say something like this:

$ git push git@github.com:me/linux.git master

This might get cumbersome to type every time, so you would add another remote, perhaps like this:

[remote "me"]
url = git@github.com:me/linux.git

By defining a short-hand for that URL, you can now say:

$ git push me master

and push out the work you did on your master branch as the master branch of your public repository, so that other people can pull from it.
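Instead of editing the configuration file by hand, the same short-hand can be defined from the command line. The following is a minimal sketch that runs in a throwaway repository (the temporary directory is an assumption made only so the example is safe to try); note that "git remote add" also sets up a default fetch refspec for the new remote, which the hand-written section above does not have.

```shell
# Sketch: work in a scratch repository so that nothing real is touched.
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"

# Roughly equivalent to writing the [remote "me"] section by hand.
git remote add me git@github.com:me/linux.git

# Show the URL that the short-hand "me" now stands for.
url=$(git config --get remote.me.url)
echo "$url"
```

Nothing is contacted over the network here; the URL is only recorded in the configuration.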

If you worked on a topic that was forked from Linus's master to enhance a specific feature or fix a specific bug, you may want to say:

$ git checkout -b fix-tty-bug origin/master

... work work work ...

$ git push me fix-tty-bug

to publish the result in your public repository as a branch.

By the way, do you recall the reason why upstream mode was appropriate when using the centralized workflow from the previous post?

While the purpose of Linus's master branch is to advance the overall state of the Linux kernel to prepare for the next release, the purpose of your topic branch fix-tty-bug is a lot narrower. And you are usually not integrating the work other people did into your work before you push it out. Indeed, you are encouraged to pick one stable point in the official (i.e. Linus's) history, and build on top of it without rebasing or merging things unrelated to what you are trying to achieve yourself.

In the centralized workflow, you tentatively play the role of integrator and change the purpose of your topic branch into "advance the overall project status" (which is compatible with the purpose of the "master" branch you will be updating with your work) immediately before you push it out. In the distributed workflow, by contrast, the purpose of your topic branch stays the same as the original purpose of the topic until and after you push it out.

If you started your topic branch, fix-tty-bug, to fix a bug in the tty subsystem and named it after the purpose of the topic branch, it can and should keep the name in your public repository. There is no reason to publish the result as your master branch. You control the branch names in your public repository, and pushing it out as master will only lose information. The branch name fix-tty-bug told what the branch was about. The name master sounds as if you are trying to make everything better, but that is not what you did.

So in general, you would be pushing out your topic branches to your public repository under the same name. You can use the 'current' mode when you push your work out, like this:

$ git config push.default current

And then, you can lose that branch name from the command line when you push your work out:

$ git push me

You run the above command while you have your fix-tty-bug branch checked out, and the current branch is pushed out to the destination repository (i.e. me) to update the branch of the same name.

Recently, we added a mechanism to help those who are too lazy to even type "me", i.e. it lets you say:

$ git push

To use this, you configure which remote to push to when you do not name one on the command line, with a configuration variable, like this:

$ git config remote.pushdefault me

This feature is available in Git 1.8.3 and later.
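To see the two settings work together, here is a small sketch that can be run without touching any real repository. The local bare repository "public.git" is only a stand-in for your public repository, and the "Demo" identity and empty commits are placeholders made up for the example.

```shell
# Sketch with throwaway repositories under a temporary directory.
work=$(mktemp -d)
git init -q --bare "$work/public.git"   # stand-in for your public repository
git init -q "$work/linux"
cd "$work/linux"
git remote add me "$work/public.git"

# Make a base commit, then fork a topic branch and commit on it.
git -c user.name=Demo -c user.email=demo@example.com \
	commit -q --allow-empty -m "base"
git checkout -q -b fix-tty-bug
git -c user.name=Demo -c user.email=demo@example.com \
	commit -q --allow-empty -m "fix tty bug"

# Push the current branch under its own name, to "me" unless told otherwise.
git config push.default current
git config remote.pushdefault me

# No remote and no branch name on the command line.
git push -q

# The public repository now has a branch named after the topic.
published=$(git --git-dir="$work/public.git" rev-parse refs/heads/fix-tty-bug)
echo "$published"
```

The name printed at the end is the commit at the tip of fix-tty-bug, now also present in the stand-in public repository under the same branch name.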

Thursday, June 20, 2013

Fun with various workflows (1)

Even though Git is distributed, you can still use it for projects that employ the centralized workflow, where there is a single central shared repository. Everybody pulls from it to obtain everybody else's work, and after integrating his own work with others' work, everybody pushes into it so that everybody else can enjoy the fruit of his work.

In the simplest workflow, you can start by cloning from the central repository:

$ git clone our.site.xz:/pub/repo/project.git myproject

and the myproject directory becomes your working area, where you will have the standard configuration, perhaps not very different from this:

[remote "origin"]
url = our.site.xz:/pub/repo/project.git
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
remote = origin
merge = refs/heads/master

and your "master" branch, which was copied from the "master" branch of the central shared repository, is ready for you to build your work on it.

If you run "git pull --rebase" (without any other argument), the configuration above left for you by "git clone" will tell Git that you would want to obtain the latest work from the central shared repository, and you would want to rebase your own work on top of their master branch.

If you say "git push" (without any other argument), the current default mode of pushing is to look at your local branches, and look at the branches the repository you are pushing to has, and update the matching branches. In this "simplest" case, you only have the 'master' branch, and the central repository does have its 'master' branch, so you will update its 'master' branch with the work you did on your 'master' branch.

In Git 2.0, this default mode will change to 'simple', which will push only the current branch to the branch at the central repository you integrate with, but only when they have the same name (so the example of working on 'master' and pushing it back to 'master' will still work).

If your project employs the centralized workflow, after learning Git enough to be comfortable with it, you may want to do

$ git config push.default upstream

to choose to always update the branch at the central repository you integrate with, even if the branch names are different. Note that you can do this (or use 'simple' instead of 'upstream'), and indeed you are encouraged to do so, without waiting for Git 2.0.

That will allow you to work on different things on different branches, e.g.

$ git checkout -b my-feature -t origin/master
$ git push

The first "checkout" will create a new "my-feature" branch, that is set to integrate with the master branch from your central repository. When using the upstream mode, you will push "my-feature" back to update the "master" branch over there.
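The effect of the upstream mode can be tried out with a sketch like the following. The local bare repository "central.git" stands in for the central shared repository, and the "Demo" identity and empty commits are placeholders invented for the example.

```shell
# Sketch with throwaway repositories under a temporary directory.
work=$(mktemp -d)
git init -q --bare "$work/central.git"  # stand-in for the central repository
git init -q "$work/myproject"
cd "$work/myproject"

# Seed the central repository with a "master" branch.
git -c user.name=Demo -c user.email=demo@example.com \
	commit -q --allow-empty -m "initial"
git branch -M master
git remote add origin "$work/central.git"
git push -q origin master
git fetch -q origin

git config push.default upstream

# Fork a local branch that integrates with the central "master".
git checkout -q -b my-feature -t origin/master
git -c user.name=Demo -c user.email=demo@example.com \
	commit -q --allow-empty -m "add neat feature"

# With the upstream mode, this updates "master" over there.
git push -q

central_master=$(git --git-dir="$work/central.git" rev-parse refs/heads/master)
echo "$central_master"
```

After the push, the central repository still has only its "master" branch, and that branch points at the commit made on "my-feature".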

An interesting thing to notice is that in the centralized workflow, because there is no central project maintainer (aka integrator), everybody is responsible for integrating his own work to advance the mainline of the project. The job of integration is indeed distributed when you use the centralized workflow. It is a bit funny when you think about it.

But that is exactly why the upstream mode makes sense. In order to fully appreciate it, you need to realize what it means to have forked the "my-feature" branch out of the "master" branch of the central shared repository.

The purpose of the master branch at the shared central repository is to advance the state of the project in general, but the purpose of your local branch, my-feature, is a lot more specialized. It may be to fix this small bug, or add that neat feature. You would only be working on a small part of the project while on that branch.

But because you are the one who plays the top-level integrator role when you run "git pull --rebase" just before you "git push", when that "git pull --rebase" finishes, the tip of your my-feature branch is no longer about your small fix or neat feature. It temporarily becomes about advancing the state of the overall project. And that is the reason you would "git push" it to update the master branch, not the "my-feature" branch, at the central repository. Of course, if you want to publish it as "my-feature", perhaps because you want to show it to others before really updating the shared master branch, you can explicitly say:

$ git push origin my-feature

Pushing my-feature that was forked from and still integrates with their master is not usually what you want to do every time in the centralized workflow, though. In fact, it is often the case that administrators of a project with a centralized workflow frown upon people making random branches at their shared central repository willy-nilly (exactly because the central shared repository is a common resource and a feature branch like "my-feature" is often not of general interest).

Common things require less typing, and uncommon things are possible but you need to explicitly tell Git to do so.

The Git core itself is very much agnostic to what workflow you use, and you can also use it for projects that use "I publish my work to my public repository, others interested in my work can pull my work from there, and there is an integrator who pulls and consolidates good work from others and publishes the aggregated whole" distributed workflow. That will be a topic for a separate post.

Monday, June 10, 2013

Git 1.8.3.1

The first maintenance release 1.8.3.1 is out.

This is primarily to push out fixes to two regressions that seem to have affected many people recently. Sorry about that.

With Git 1.8.3, an entry "!dir" in .gitignore to say "This directory's contents are not ignored, unless other more specific entries tell us otherwise" did not work correctly. This regression has been fixed.
With recent Git since 1.7.12.1 or so, "git daemon", when started by the root user and then switched to an unprivileged user, refused to run when ~root/.gitconfig (and XDG equivalent configuration files under ~root/.config/) could not be read by the unprivileged user. The right way to start the daemon might be to reset its $HOME (where these configuration files are read from) to somewhere the user the daemon runs as can read, but it is cumbersome to set up. With 1.8.3.1, failure to access these files with EPERM is treated as if these files do not exist, which is not an error.

The release tarballs are available at the usual places:

https://code.google.com/p/git-core/downloads/list
https://www.kernel.org/pub/software/scm/git/

Checking the current branch programmatically

The git branch Porcelain command, when run without any argument, lists the local branches, and shows the current branch prefixed with an asterisk, like this:

$ git branch
* master
  next
$ git checkout master^0
$ git branch
* (no branch)
  master
  next

The second one with (no branch) is shown when you are not on any branch at all. It often is used when you are sightseeing the tree of a tagged version, e.g. after running git checkout v1.8.3 or something like that.

To find out what the current branch is, casual/careless users may have scripted around git branch, which is wrong. We actively discourage the use of any Porcelain command, including git branch, in scripts, because the output from the command is subject to change to better serve the human consumption use case.

And in fact, since release 1.8.3, the output when you are not on any branch has become something like this:
$ git checkout v1.8.3
$ git branch
* (detached from v1.8.3)
  master
  next

in order to give you (as a human consumer) better information. If your script depended on the exact phrasing from git branch, e.g.

branch=$(git branch | sed -ne 's/^\* \(.*\)$/\1/p')
case "$branch" in
'('?*')') echo not on any branch ;;
*) echo on branch $branch ;;
esac

your script will break.

The right way to programmatically find the name of the current branch, if any, is not to use the Porcelain command git branch, which is meant for human consumption, but to use the plumbing command git symbolic-ref instead:

if branch=$(git symbolic-ref --short -q HEAD)
then
echo on branch $branch
else
echo not on any branch
fi
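The check above can be exercised in both states with a sketch like this one. It wraps the plumbing call in a small shell function; the name current_branch is made up for this illustration and is not a Git command, and the scratch repository and "Demo" identity are likewise assumptions of the example.

```shell
# current_branch is a hypothetical helper name, not part of Git.
# It prints the short branch name and succeeds when HEAD is on a branch;
# it prints nothing and fails when HEAD is detached.
current_branch () {
	git symbolic-ref --short -q HEAD
}

# Scratch repository with one commit on a branch called "topic".
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git -c user.name=Demo -c user.email=demo@example.com \
	commit -q --allow-empty -m "initial"
git checkout -q -b topic

if branch=$(current_branch)
then
	first="on branch $branch"
else
	first="not on any branch"
fi

# Detach HEAD, as in the "git checkout master^0" example earlier.
git checkout -q HEAD^0

if branch=$(current_branch)
then
	second="on branch $branch"
else
	second="not on any branch"
fi

echo "$first"
echo "$second"
```

Because the script relies on the exit status of git symbolic-ref rather than on any human-readable phrasing, a change in the wording of git branch output does not affect it.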