Git Blame: March 2012

Wednesday, March 28, 2012

Hopefully Final rc before Git 1.7.10

Git 1.7.10-rc3 has been tagged and users are strongly encouraged to download and test it before the final release happens. The tarballs are found at http://code.google.com/p/git-core/downloads/list and people fetching with Git can find the tag in my public repositories.

Among the many improvements in the upcoming release is an update to the git merge command. While this has been received very favorably, it may surprise you when you run it for the first time, and is worth mentioning.

Traditionally, when the git merge command attempted to merge two or more histories, it prepared a canned commit log message based on what is being merged, and recorded the result in a merge commit without any user intervention, if the automatic merge logic successfully computed the resulting tree without conflict. When the automatic logic did not manage to come up with a clean merge result, it gave up, leaving the conflicted state in the index and in the working tree files, and asked the user to resolve them and run the git commit command to record the outcome.

Most merges do cleanly resolve, and this behavior resulted in people making their merges too easily and lightly, even when the reason why the merge was made in the first place should be explained. Nobody explained in the merge commit why the merge was made, because doing so after git merge had already made the commit meant going back and running git commit --amend.

Recently in a discussion on the Git mailing list, Linus admitted (and I agreed) that this was one of the design mistakes we made early in the history of Git. And in 1.7.10 and later, the git merge command that is run in an interactive session (i.e. both its standard input and its standard output connected to a terminal) will open an editor before creating a commit to record the merge result, to give the user a chance to explain the merge, just like the git commit command the user runs after resolving a conflicted merge already does.
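The "interactive session" condition described above can be approximated in a script with the shell's own terminal test. This is only a sketch of the rule as stated in this post (standard input and standard output both attached to a terminal); the function name is made up for illustration, and it does not reproduce whatever check Git performs internally.

```sh
#!/bin/sh

# Hypothetical helper (not part of Git): succeeds only when both
# standard input (fd 0) and standard output (fd 1) are terminals.
session_is_interactive () {
    test -t 0 && test -t 1
}

if session_is_interactive
then
    echo "interactive session"
else
    echo "not an interactive session"
fi
```

A script that redirects either stream, for example by running with its input taken from /dev/null, falls into the second branch.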

A quick way to update your scripts that shouldn't let their users explain the merges they create is to set and export an environment variable, like this.

#!/bin/sh

GIT_MERGE_AUTOEDIT=no
export GIT_MERGE_AUTOEDIT

# whatever your script did originally

...

git merge foo

git merge bar

...

For a more detailed discussion, see these earlier posts.

Monday, March 26, 2012

Git 1.7.9.5

Well, I lied when I said 1.7.9.4 would be the last maintenance release before the 1.7.10 final.
Here is a small update that contains mostly documentation fixes with a couple of back-merged low impact fixes that have been cooking on the 'master' front to become part of the upcoming 1.7.10 release.

Have fun.

Friday, March 23, 2012

Git 1.7.10-rc2

Not much has changed since the previous rc, but you know the drill.
Please test to catch any last minute regressions.

The release tarballs are found at:

http://code.google.com/p/git-core/downloads/list

and their SHA-1 checksums are:

b32514ad69bb3100b6b5038bc88798d56ebe1e1d git-1.7.10.rc2.tar.gz

f66bc63ed3e98df6c7ad205e06878e9cfc9084fc git-htmldocs-1.7.10.rc2.tar.gz

130601149f97e9414467bb3d6a722aa37b8205af git-manpages-1.7.10.rc2.tar.gz
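One way to check a downloaded tarball against the checksums listed above is sketched below. It assumes the sha1sum utility from GNU coreutils is installed (other systems may ship a differently named tool), and verify_sha1 is a hypothetical helper name, not something that comes with Git.

```sh
#!/bin/sh

# Hypothetical helper: compare a file's SHA-1 with an expected hex digest.
# Assumes sha1sum (GNU coreutils) is available on PATH.
verify_sha1 () {
    file=$1
    expected=$2
    # sha1sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha1sum "$file" | cut -d ' ' -f 1)
    test "$actual" = "$expected"
}
```

With this in place, running verify_sha1 git-1.7.10.rc2.tar.gz b32514ad69bb3100b6b5038bc88798d56ebe1e1d exits with zero status only when the digest matches.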

Compatibility Notes

From this release on, the "git merge" command in an interactive session will start an editor when it automatically resolves the merge for the user to explain the resulting commit, just like the "git commit" command does when it wasn't given a commit message. If you have a script that runs "git merge" and keeps its standard input and output attached to the user's terminal, and if you do not want the user to explain the resulting merge commits, you can export GIT_MERGE_AUTOEDIT environment variable set to "no", like this:

#!/bin/sh

GIT_MERGE_AUTOEDIT=no

export GIT_MERGE_AUTOEDIT

to disable this behaviour (if you want your users to explain their merge commits, you do not have to do anything). Alternatively, you can give the "--no-edit" option to individual invocations of the "git merge" command if you know everybody who uses your script has Git v1.7.8 or newer.
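The per-invocation alternative mentioned above might look like the following in a script. This is a minimal sketch: merge_without_editor is a hypothetical wrapper name, and it assumes every user of the script has a Git new enough to understand the option.

```sh
#!/bin/sh

# Hypothetical wrapper: record a clean merge with the canned message,
# without giving the user an editor to explain it.
merge_without_editor () {
    git merge --no-edit "$@"
}
```

A script would then call merge_without_editor foo in place of git merge foo, leaving other merges made by the user unaffected.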

The "--binary/-b" options to "git am" have been a no-op for quite a while and were deprecated in mid 2008 (v1.6.0). When you give these options to "git am", it will now warn and ask you not to use them.

When you do not tell which branches and tags to push to the "git push" command in any way, the command used "matching refs" rule to update remote branches and tags with branches and tags with the same name you locally have. In future versions of Git, this will change to use the "upstream" rule to update the branch at the remote you would "pull" from into your current branch with your local current branch. The release after 1.7.10 will start issuing a warning about this change, to encourage you to tell the command what to push out, e.g. by setting push.default configuration.

Wednesday, March 21, 2012

Toilet Roll Effect

To put this in a more formal-sounding way, it can be observed when there are two independent cues that stop or start your behavior, and the behavior is something whose change tends to be one-way.

In order to wipe certain parts after you have finished doing certain things, you wind some length of paper around your hands, cut the paper from the roll, and use it.

Let's say you are used to using 1-ply toilet rolls, but you noticed that 2-ply rolls were on sale at the market, so that is what you bought recently.

What happens?

Because you are so used to the hand-motion to wind the 1-ply toilet rolls (say, you always wind 4 times), you still wind the same length of the stuff. Your backside starts to be comfy with the extra fluffiness your hands are giving it by using more paper. After all, you are winding the thicker 2-ply paper the same number of times. You may start winding the stuff a few times fewer, but the rate of such decrease is slower. Perhaps you learn to wind only 3 times, but it would take time for you to go down to 2, which is half of the original.

Next week, you notice that 1-ply rolls are on sale, so you switch again. By this time, your hands are used to winding 3 times, but your backside now complains because it no longer is rewarded by the same fluffiness. After all, you are giving it less paper. So you end up winding the stuff even more, until your backside gets comfy enough. Your hands may learn to wind 5 times by the time this happens.

Notice that there are two cues that make you stop winding when you do this: how many times you wind the paper around your hand, and how much fluffiness your backside feels by being wiped by it. And the change in the amount of paper you would use tends to be one-way (using more is easier).

Imagine if you switch to 2-ply rolls again. And then switch back to 1-ply rolls. The consumption of your toilet rolls tends to increase, and increase more if you switch between 1-ply and 2-ply rolls more often.

Exactly the same thing happens to cigarette usage, by the way. Two competing cues are how often you take a break, and how much chemical effect you get from a puff. Start from a weak brand, and your break schedule may settle at one break per 3 hours. Switch to a stronger brand, and your body gets used to the greater chemical effect with the same 1 break per 3 hours schedule. Switch back to the weaker brand, and now your body will complain and want more nicotine, and your break schedule ends up being more frequent. Switch back to the stronger one again, with the more frequent schedule, and your body gets trained to more nicotine. Rinse and repeat...

Please discuss: what "git push" should do when you do not say what to push?

[Update is here]

There is a proposal to change the default behaviour of git push on the Git mailing list. The goal of this message is to encourage you to discuss it before it happens (or the change is aborted, depending on the outcome of the discussion).

In the current setting (i.e. push.default=matching), git push without argument will push all branches that exist locally and remotely with the same name. This is usually appropriate when a developer pushes to his own public repository, but may be confusing if not dangerous when using a shared repository. The proposal is to change the default to 'upstream', i.e. push only the current branch, and push it to the branch git pull would pull from. Another candidate is 'current'; this pushes only the current branch to the remote branch of the same name.

For more details on the behavior of Git with these values, read the documentation about push.default in 'man git-config'.

You may be negatively affected when such a change happens if you do not see anything in the output from git config push.default and if you rely on the default that pushes all your matching branches. On the other hand, you may want to see the default behaviour change, especially if you are using shared repositories. In either case, please join the discussion to give us more data points and help us decide the future of Git. Also, if you think your friends and colleagues will be affected by this change, either positively or negatively, please tell them about this discussion.
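The check described above, whether git config push.default prints anything, can be wrapped in a few lines of shell. This is a sketch only; report_push_default is a hypothetical helper name and the messages it prints are made up for illustration.

```sh
#!/bin/sh

# Hypothetical helper: report whether push.default is configured.
# "git config <key>" prints the value and exits with zero status
# when the key is set, and exits with non-zero status otherwise.
report_push_default () {
    if value=$(git config push.default)
    then
        echo "push.default is set to '$value'"
    else
        echo "push.default is not set; you rely on the built-in default"
    fi
}
```

If the helper reports that nothing is set and you want to keep the traditional behaviour regardless of the outcome of the discussion, running git config push.default matching records that choice explicitly.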

What has been discussed so far can be seen in this thread:

http://thread.gmane.org/gmane.comp.version-control.git/192547/focus=192694

Previous relevant discussions include:

http://thread.gmane.org/gmane.comp.version-control.git/123350/focus=123541

http://thread.gmane.org/gmane.comp.version-control.git/166743

To join the discussion, send your messages to:

git@vger.kernel.org

The list accepts messages from non-subscribers, and you do not have to ask "please Cc me, I am not subscribed", as it's customary to Cc: posters when replying on this list.

Thanks.

Administrivia: Comments on this post are disabled; the discussion should happen on the mailing list.

Summary of discussion on "git push" default change

So far, the messages on the git mailing list show that the proposed change of the default away from the traditional 'matching' is received overwhelmingly positively.

We've known it from the beginning that matching is not for many people, and definitely not for new people (otherwise, we wouldn't have made noises about it in the first place). It was not even the primary objective of the discussion thread to decide if we are going to switch away from 'matching' by voting. Waking up those who are sleeping, so that they do not have to be surprised with "I didn't know that was happening!" was.

The new default we are switching to is not about how many people prefer it for their own use. It is about what default is the least confusing to the new users. A default whose behaviour is easy to explain, easy to follow and easy to understand is the goal. Once people understand what they want and realize the hardcoded default does not work well for them, it is easy for them to configure their push.default to something else, like 'matching'. And 'matching' was a bad default for that purpose; it was the hardest to explain and understand in the context of the workflows of many new people.

Many said that they like 'upstream' solely based on their personal preference, but a few people did justify their preference of 'upstream' over 'current' based on their experience in teaching new people and observing the sharp edges that hurt them. And they all sounded reasonable.

In order to show developers, testers and early adopters what the world after phase #2 of the transition [1] would look like, I am planning to merge Matthieu's mm/push-default-switch-warning topic [2] to the 'next' branch, together with Christopher's ct/advise-push-default topic [3]; hopefully these topics can be merged to the master branch soon after 1.7.10 final.

Thanks.

To people who helped spread the initial RFD message: please do feel free to distribute this message to the same channels, too.

[References]

1 http://article.gmane.org/gmane.comp.version-control.git/193308
2 https://github.com/gitster/git/commit/5293b54
3 https://github.com/gitster/git/commit/f25950f

Monday, March 12, 2012

Git 1.7.9.4

This is hopefully the final maintenance release before 1.7.10 happens, and merges four low impact fixes relative to the previous 1.7.9.3 release.

The code to synthesize the fake ancestor tree used by 3-way merge fallback in "git am" was not prepared to read a patch created with a non-standard -p<num> value.
"git bundle" did not record boundary commits correctly when there are many of them.
"git diff-index" and its friends at the plumbing level showed the "diff --git" header and nothing else for a path whose cached stat info is dirty without actual difference when asked to produce a patch. This was a longstanding bug that we could have fixed a long time ago.
"gitweb" did use quotemeta() to prepare the search string when asked to do a fixed-string project search, but by mistake did not use the prepared string and used the user-supplied string instead.

Have fun.

Thursday, March 8, 2012

Fun with --first-parent

Depending on the work style of their project, sometimes people wonder what story git log --first-parent output is trying to tell, and this article is about demystifying it.

The mechanical definition of "first parent" is very simple.

A merge is a commit with more than one parent.
When you run "merge", you are on one commit, HEAD, taking changes made by "other branches" you are merging into "your history" (whose definition is "the commit-DAG leading to your HEAD commit"), and record the resulting tree state as a new commit.
This new commit records all its parents, one of them being your old "HEAD" and the rest being "other branches" you merged into "your history". They are recorded in that order in the resulting commit (run git cat-file commit HEAD after a merge to see them).
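The parent order described in the list above can be read straight out of the commit object. The following is a minimal sketch built on the git cat-file commit command mentioned there; list_parents is a hypothetical helper name.

```sh
#!/bin/sh

# Hypothetical helper: print the parents of a commit (default: HEAD),
# one per line, in the order they are recorded in the commit object.
list_parents () {
    git cat-file commit "${1:-HEAD}" |
    # Stop at the blank line that ends the header so that message
    # text is never mistaken for a "parent" line.
    sed -n -e '/^$/q' -e 's/^parent //p'
}
```

Right after a merge, the first line of output is the commit you were on when you ran the merge, and the remaining lines are the tips of the branches you merged.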

Given the above definition, the first thing to realize is that "the first parent" is primarily a local concept. If you are looking at one commit on a run of "a single strand of pearls", it only has one parent (i.e. its first parent), and it is the state the committer was on when he made the commit. If you are looking at a merge, its first parent is the commit the person who made that merge was on when he made the merge.

Because of this local nature of first-parenthood, depending on the way your project works, following the first parent chain all the way down to the root, i.e. git log --first-parent, may or may not give an output that makes sense. git show HEAD~250 shares the same issue.

An extreme example is where Git is used merely as a better CVS, where everybody works on his own "master", e.g.

$ git clone $central mine && cd mine

... begin repeating from here ...

$ git pull ;# this may get "already up-to-date" or create a merge

$ work; work; work; git commit

$ git pull ;# this may create a merge or get "already up-to-date"

... optionally ...

$ git push

... go back and repeat ...

Imagine many people are doing the above simultaneously against the same shared central repository.

Because everybody can create a "merge" when he is on his latest commit that may not even yet be ready for others' use, the first parent of a merge has no significance in the history of the overall project. The first parents of merges in such a project are "points at which random members of the project happened to be immediately before they pulled from the shared central repository". When you want a birds-eye view of changes between two versions of a project, git log --first-parent v1.0..v2.0 gives no useful information in such a project. git log --no-merges or git shortlog over the same range would generally work much better and give more meaningful information.

Insisting on git pull --no-ff in such a workflow makes things even worse for the "first-parent summary". If everybody else were active while you were sleeping, and if you were up-to-date before going to bed, git pull --no-ff you do as the first thing in the morning from a habit will record a useless merge commit, and the only two things such a commit records are

where the tip of the shared central repository was before you went to bed, and
where the tip of the shared central repository was when you came back to work.

Neither is worth recording as part of the overall project history, obviously.

At the other extreme are projects where the first-parent summary works very well: those with a clear pecking order among project members, i.e. there is a central integrator who can say "My history is the official one, yours are forks of it and I may merge them back to my history from time to time". In such a setting, unless there is a fast-forward merge, git log --first-parent going all the way down to the root shows how the history grew from the integrator's point of view, and use of git pull --no-ff by the integrator is one way to make sure that all merges yield this consistent view. He may have made individual commits (i.e. a single strand of pearls) directly on the mainline of the history, and they are shown as individual commits. He may have merged from a branch of his own or of somebody else into the mainline of the history, and such a merge is shown as a single event that pulls all the commits from the side branch (and this is where git merge --log becomes useful).

It is no accident that we encourage users to focus the work made on a single branch (either his own or a remote) to a single topic—by doing so, it makes more sense to treat a merge of such a branch into the mainline as a single event that adds a feature (or fixes a bug, or whatever the topic of the branch wanted to achieve), relative to the state of the project immediately before the merge (i.e. "the first parent" of the merge). And git log --first-parent is a way to summarize the history by culling the details of "side branches" and letting only the merge commits talk about what these side branches did to the history.
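As a rough illustration of the summary described above, the sketch below prints one line per commit on the first-parent chain, so a merged topic appears as its merge commit only. mainline_summary is a hypothetical helper name; the options it passes are ordinary git log options.

```sh
#!/bin/sh

# Hypothetical helper: one line per commit on the first-parent chain.
# Commits that came in through a side branch are culled; only the
# merge that brought them in is shown.
mainline_summary () {
    git log --first-parent --oneline "$@"
}
```

Any extra arguments are handed to git log, so a range such as v1.0..v2.0 (assuming your project has such tags) limits the summary to what happened between two versions.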

It also is not an accident that git log --first-parent is a much later invention than git log and git shortlog. Only after people got used to working with Git did they discover the usefulness of the topic branch workflow, which is the key ingredient for any history of which the first parenthood can give a birds-eye view.

Even if your project is using a central shared repository, you can take advantage of "first parent" summary by making sure you merge your work into the shared history, not the other way around like the workflow illustrated earlier. You would work like this instead:

$ git clone $central mine && cd mine

$ git checkout -b mywork

... begin repeating from here ...

$ git checkout mywork ;# make sure you do not work on 'master'

$ work; work; work; git commit

... when you are done working on one logical topic ...

$ git checkout master

$ git pull --ff-only ;# the tip of shared history

$ git merge mywork ;# note the first parent is the shared history

$ test

$ git push

... go back and repeat ...

The resulting history may look like this:

---A---B---C---D---M
    \             /
     O---O---O---O   mywork

where A, B, C, and D are the commits others made and published to the central shared repository (i.e. the shared project history). A is where the "master" of the shared central repository was when you cloned it and forked your "mywork" branch. O are commits you made on your "side branch".

By checking out your "master" and running "git pull --ff-only", you would be checking out D to your working tree, and merging your work on the side branch to record M, which is what you publish and what becomes the tip of the shared history. By following the first parent chain starting at M, you can see that a single unit of your work (the goal you wanted to achieve on your "mywork" branch), which consists of 4 commits, was merged into the shared history at M, and before that, somebody else integrated his work into the shared history at D, and so on.

Note that the last git push may race with other people and fail due to non fast-forward. In such a case, you have two possibilities. An easier way that violates the "first parent" principle for a short while is to do this:

$ git pull ;# may conflict and need resolution

$ test

$ git push

The resulting history may look like this:

                 E---F
                /     \
---A---B---C---D---M---X
    \             /
     O---O---O---O   mywork

where E and F happened at the shared central repository while you were resolving merge M and testing the result. You pulled these changes and integrated them into your history to create X, so the first parent of X becomes M and you would end up treating E and F as if they happened on a side branch, even though they are on what people may consider the "mainline" of the shared project history.

If you are a purist, you could instead do this when the last git push is rejected in the original sequence:

$ git reset --hard HEAD^ ;# cancel the merge of mywork

$ git pull --ff-only ;# get the updated tip of shared history

$ git merge mywork

$ test

$ git push

which will result in a history like this:

                 E---F---M'
                /       /
---A---B---C---D---M   /
    \             /   /
     O---O---O---O---   mywork

You abandon merge M (git reset --hard HEAD^ would take you back to D), update from the central repository again to be at F, and then merge your work (the same O--O--O--O) into it to create a new merge M', and make that the tip of the shared history by pushing it. Again, following the first parent chain starting from M', the history can be summarized as a series of merges of completed topics into the shared history.

It is a judgement call if this "purist" approach to avoid the "last-minute" first-parent breakage is worth it. I personally do not think it is, but others may disagree.

Wednesday, March 7, 2012

Git 1.7.10-rc0

I just tagged the zero-th release candidate for the upcoming 1.7.10 release of Git. This is expected to be more or less feature complete, and we will see only regression fixes before the final.

Please test, especially if you are on a minority platform. It was reported that one of the tests in the recent 1.7.9.3 maintenance release was not passing on a particular platform, but the same test has not been passing on both the master and the next branches for quite some time (the breakage was already in 1.7.9, so it is not urgent to fix it right now). It could have been avoided if only somebody had bothered to run the tests on that platform.

Monday, March 5, 2012

Git 1.7.9.3

The latest maintenance release Git 1.7.9.3 is now available at the usual places. Time to upgrade.