Sunday, October 21, 2012

Git 1.8.0

Finally.

With help from 82 contributors since the 1.7.12 release, and great help from countless reviewers of these changes, a new feature release, Git 1.8.0, has been tagged with 497 new non-merge commits. Thanks, everybody in the Git development community, and thanks to everybody in the various Git user communities for feature requests and bug reports.

Tarballs of the new release have been pushed out to the usual place, and the public repositories have the histories leading to this release.

For highlights, please refer to the earlier entries. The final Release Notes describe everything in gory detail.

I'll be on vacation for a few weeks, and I do not plan to carry a computer with me. In the meantime, Jeff King will be acting as an interim maintainer.

Here is a fun fact.

The very first public appearance of Git consisted of 11 files, 1244 lines in total: v0.99~954 (Initial revision of "git", the information manager from hell, 2005-04-07).

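Among these 1244 lines, there were 1065 non-empty lines, of which 206 lines still survive to date (v1.8.0). Per file, the counts look like this (the survival rate is computed against all lines, empty ones included):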

   surviving     original path (survival%)
          14           40 Makefile (35.00%)
          16          168 README (9.52%)
          51           93 cache.h (54.83%)
           4           23 cat-file.c (17.39%)
           6          172 commit-tree.c (3.48%)
           4           51 init-db.c (7.84%)
          84          259 read-cache.c (32.43%)
           0           43 read-tree.c (0.00%)
           7           81 show-diff.c (8.64%)
          20          248 update-cache.c (8.06%)
           0           66 write-tree.c (0.00%)
         206         1244 Total (16.55%)
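Since percentages like these are easy to get wrong by hand, here is a small Python sketch that recomputes them from the two count columns. The counts are copied from the table; the helper names are illustrative, and truncating (rather than rounding) to two decimals is an assumption inferred from the figures: cache.h is 51/93, about 54.84% when rounded, yet the table lists 54.83%.

```python
# Line counts copied from the table above: path -> (surviving, original).
COUNTS = {
    "Makefile": (14, 40),
    "README": (16, 168),
    "cache.h": (51, 93),
    "cat-file.c": (4, 23),
    "commit-tree.c": (6, 172),
    "init-db.c": (4, 51),
    "read-cache.c": (84, 259),
    "read-tree.c": (0, 43),
    "show-diff.c": (7, 81),
    "update-cache.c": (20, 248),
    "write-tree.c": (0, 66),
}

def survival_pct(surviving, original):
    """Survival rate in percent, truncated (not rounded) to two decimals.

    Assumption: truncation is what the table uses (cache.h 54.838...% -> 54.83%).
    """
    hundredths = surviving * 10000 // original  # integer math, no float rounding
    return f"{hundredths // 100}.{hundredths % 100:02d}"

def report(counts):
    """Format one row per file plus a Total row, in the table's column layout."""
    total_s = sum(s for s, _ in counts.values())
    total_o = sum(o for _, o in counts.values())
    rows = [(s, o, path) for path, (s, o) in counts.items()]
    rows.append((total_s, total_o, "Total"))
    return [f"{s:12d} {o:12d} {path} ({survival_pct(s, o)}%)" for s, o, path in rows]

for line in report(COUNTS):
    print(line)
```

Running it reproduces every percentage in the table, including the 16.55% total, which is a quick way to confirm that the two count columns add up to 206 and 1244.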

Note that this means about one sixth of Linus's initial code survives; it does not mean that one sixth of the 1.8.0 release came from Linus.


Thursday, October 11, 2012

Git 1.8.0-rc2


Things seem to have calmed down, and hopefully we can have the final 1.8.0 release without regressions soon. Give it a final round of testing by grabbing the tarballs from the usual places.

For highlights, please see the earlier posts.


Monday, October 8, 2012

Git 1.8.0-rc1 and 1.7.12.3

A maintenance release 1.7.12.3 has been tagged with a handful of fixes that have been backported from the primary development branch.

On the primary development front, we now have the first release candidate 1.8.0-rc1. The final 1.8.0 is scheduled to be tagged on October 21st, 2012, and I'll disappear for a couple of weeks after that, so please help us find the last-minute regressions and fix them, if any, in the coming two weeks.

The upcoming release consists of a finely balanced mix of bug-fixes, usability enhancements and surprising new features.

Some highlights:

  • In the next major release (not the 1.8.0 release, but the one that follows it, perhaps 1.9 or 2.0), we will change the behavior of the "git push" command.  When "git push [$there]" does not say what to push, we have traditionally used the "matching" semantics (all your branches were sent to the remote as long as there already are branches of the same name over there).  We will use the "simple" semantics, which pushes the current branch to the branch with the same name, but only when the current branch is set to integrate with that remote branch.
    There is a user preference configuration variable "push.default" to change this, and in the 1.8.0 release, "git push" will start warning about the upcoming change until you set this variable.
  • When "git am" is fed an input that has multiple "Content-type: ..." headers, it did not grok the charset= attribute correctly. It also mishandled a patch attached as application/octet-stream (i.e. not text/*); Content-Transfer-Encoding (e.g. base64) was not honored correctly.
  • Even during a conflicted merge, "git blame $path" always meant to blame uncommitted changes to the working tree version. The command has been updated to show cleanly merged parts as coming from the other branch that is being merged.
  • "git branch --set-upstream" is deprecated and may be removed in a relatively distant future.  "git branch [-u|--set-upstream-to]" has been introduced with a saner order of arguments.
  • "git cherry-pick A C B" used to replay changes in A and then B and then C if these three commits had committer timestamps in that order; now it replays them in the order the user told it to, i.e. "A C B", which is what the user naturally expects. 
  • A repository created with "git clone --single" (short for "--single-branch") had its fetch refspecs set up just like a clone without "--single", leading the subsequent "git fetch" to slurp all the other branches, defeating the whole point of specifying "only this branch" in the first place; this has been fixed.
  • "git log --all-match --grep=A --grep=B" ought to show commits that mention both A and B, but when these three options were used with --author or --committer, it showed commits that mention either A or B (or both) instead; this has been fixed.
  • "git log -g" can be given "--grep-reflog=pattern" option to look for entries with strings that match the given pattern.
  • The "-Xours" (and "-Xtheirs") backend option to "git merge -s recursive" now takes effect even on binary files.
  • The sub-command in "git remote" to remove a defined remote was "rm", and the command did not take a fully-spelled "remove"; now it does.
  • The interactive prompt that "git send-email" gives was error-prone. It asked "What e-mail address do you want to use?" with the address it guessed (correctly) the user would want to use in its prompt, tempting the user to say "y". But the response was taken as "No, please use 'y' as my e-mail address instead", which is most certainly not what the user meant; the command now asks for confirmation in such a case.
Again, please help us find the last-minute regressions and fix them, if any, in the coming two weeks.
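The difference between the two push.default modes in the first item above can be modeled in a few lines. This is a simplified Python sketch, not Git's code: the function names and the upstream_of mapping (local branch name to the name of the remote branch it integrates with) are hypothetical, and it assumes a single remote and ignores explicit refspecs.

```python
# Simplified model of the "matching" and "simple" push.default modes.
# Assumption: one remote, branches are plain names; names here are hypothetical.

def matching_push(local_branches, remote_branches):
    """'matching': push every local branch whose name already exists remotely."""
    return sorted(b for b in local_branches if b in remote_branches)

def simple_push(current, upstream_of):
    """'simple': push only the current branch, and only if it is set to
    integrate with the remote branch of the same name. Returns None where
    Git would refuse the push instead."""
    if upstream_of.get(current) != current:
        return None
    return [current]

local = ["master", "topic", "wip"]
remote = {"master", "maint", "topic"}
upstream_of = {"master": "master", "topic": "topic"}  # "wip" has no upstream

print(matching_push(local, remote))       # ['master', 'topic']
print(simple_push("topic", upstream_of))  # ['topic']
print(simple_push("wip", upstream_of))    # None
```

To choose a behavior explicitly, which also silences the new warning, set the variable, e.g. "git config --global push.default simple", or "matching" to keep the traditional behavior.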

Thanks.

Friday, October 5, 2012

Fun with running textconv

Today I met somebody and signed his PGP key. Then I thought I should check my own key, so I did this:

  $ gpg --recv-key 713660A7

which reported that I got a few new signatures.

I have my GnuPG keys in a Git repository. Out of habit, I did

  $ git diff -U0

and was pleasantly surprised to see:

  diff --git i/gnupg/pubring.gpg w/gnupg/pubring.gpg
  index 22b29b8..8beac85 100644
  --- i/gnupg/pubring.gpg
  +++ w/gnupg/pubring.gpg
  @@ -22,0 +23,1 @@ pub  4096R/713660A7 2011-10-01 Junio C Hamano
  +sig        00411886 2012-07-20   Linus Torvalds
  @@ -42,0 +44,1 @@ uid                            Junio C Hamano
  ...

in the output.

The surprise is not that I got a signature by Linus (I gave him the key fingerprint when I met him in person during OSCON week). It is that I am seeing a textual diff at all, which I had completely forgotten I had arranged to happen.

In the directory, I have this in the .gitattributes file:

  *ring.gpg       diff=gpg

and the repository has this in its .git/config file:

  [diff "gpg"]
          textconv = gpg -v
          xfuncname = "^((pub|uid) .*)"

Taken together, these two tell Git that, when comparing any file whose name matches "*ring.gpg", it should pass the file's contents to the "gpg -v" command before running its comparison logic.  The "gpg -v" command, when fed a keyring, shows a textual report of what is in the keyring, and that is how I get the above output. And the xfuncname setting tells Git to show the key's name (the closest preceding "pub" or "uid" line) on the @@ lines.
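Here is a rough Python model of the two pieces. The attribute line selects the "gpg" diff driver by matching the file's basename against "*ring.gpg" (fnmatch is a stand-in for gitattributes matching, close enough for a simple slash-free pattern), and xfuncname makes Git look upward from a hunk for the nearest line matching the pattern and show its first group after the @@ markers. The helper names are hypothetical, the keyring text is made up to resemble the output above, and Git's real implementation (in C) handles more cases.

```python
import fnmatch
import re

# Pattern from the [diff "gpg"] section; group 1 is what appears after "@@ ... @@".
XFUNCNAME = re.compile(r"^((pub|uid) .*)")

def wants_gpg_diff(path):
    """Rough stand-in for '*ring.gpg diff=gpg': a slash-free pattern matches the basename."""
    return fnmatch.fnmatchcase(path.rsplit("/", 1)[-1], "*ring.gpg")

def hunk_header(lines, hunk_start):
    """Text shown after the @@ markers for a hunk starting at lines[hunk_start]
    (0-based): the nearest earlier line matching XFUNCNAME, else empty.
    Simplified model of Git's behavior."""
    for line in reversed(lines[:hunk_start]):
        m = XFUNCNAME.match(line)
        if m:
            return m.group(1)
    return ""

# Made-up text resembling "gpg -v" output for a keyring (not real output).
keyring_text = [
    "pub  4096R/713660A7 2011-10-01 Junio C Hamano",
    "uid                            Junio C Hamano",
    "sig        713660A7 2011-10-01   Junio C Hamano",
    "sig        00411886 2012-07-20   Linus Torvalds",
]

print(wants_gpg_diff("gnupg/pubring.gpg"))  # True
print(hunk_header(keyring_text, 1))         # the "pub ..." line
print(hunk_header(keyring_text, 3))         # the "uid ..." line
```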

Fun ;-)

Tuesday, October 2, 2012

Counting log messages

Kees Cook made an interesting observation on his Google+ post.
  $ git log --no-merges v3.5..v3.6 | egrep -i '(integer|counter|buffer|stack|fix) (over|under)flow' | wc -l
  31
It finds phrases like "integer overflow", "fix underflow", etc. in the log messages of commits between the v3.5 and v3.6 releases.  There are 31 lines with such phrases among 10k non-merge commits.

This however does not necessarily mean that there are 31 commits.  There are only 23 such commits, and you can count them like this (the "--grep=" option takes a basic regular expression (BRE), so the grouping and alternation metacharacters need to be quoted with backslashes):
  $ git log --oneline --no-merges --regexp-ignore-case --grep='\(integer\|counter\|buffer\|stack\|fix\) \(over\|under\)flow' v3.5..v3.6 | wc -l
  23
This is because some commits mention these phrases on more than one line.  For example, commit dd03e734 reads like this:

    mlx4_core: Fix integer overflows so 8TBs of memory registration works

    This patch adds on the fixes done in commits 89dd86db78e0 ("mlx4_core:Allow large mlx4_buddy bitmaps") and 3de819e6b642 ("mlx4_core: Fix integer overflow issues around MTT table") so that memory registration of up to 8TB (log_num_mtt=31) finally works.

    It fixes integer overflows in a few mlx4_table_yyy routines in icm.c by using a u64 intermediate variable, and int/uint issues that caused table indexes to become nagive by setting some variables to be u32 instead of int.  These problems cause crashes when a user attempted to register > 512GB of RAM.
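The gap between 31 and 23 is the difference between counting matching lines and counting matching commits. Here is a small Python model of the two pipelines, run over a few hypothetical commit messages (not real kernel history), with re.IGNORECASE standing in for "-i" and "--regexp-ignore-case". It treats each commit as its message only, which is a simplification of what "git log" prints.

```python
import re

# The ERE from the egrep pipeline above; re.IGNORECASE plays the role of "-i".
PHRASE = re.compile(r"(integer|counter|buffer|stack|fix) (over|under)flow", re.IGNORECASE)

# Hypothetical commit messages, one string per commit (not real kernel history).
COMMITS = [
    "Fix integer overflows in foo\n\nThe integer overflow happens when...\nAlso fix stack overflow in bar.",
    "Avoid buffer underflow in baz",
    "Rename a helper; no functional change",
]

def count_matching_lines(commits):
    """Like 'git log ... | egrep -i ... | wc -l': every matching line counts."""
    return sum(1 for msg in commits for line in msg.splitlines() if PHRASE.search(line))

def count_matching_commits(commits):
    """Like 'git log --oneline --grep=... | wc -l': each commit counts at most once."""
    return sum(1 for msg in commits if any(PHRASE.search(line) for line in msg.splitlines()))

print(count_matching_lines(COMMITS))    # 4
print(count_matching_commits(COMMITS))  # 2
```

The first hypothetical commit alone contributes three lines but only one commit, which is exactly the effect dd03e734 has on the real counts.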

Note that the regular expression used here does not catch commits whose log message does not spell the phrase exactly this way. For example, it misses this:

    mtdchar: fix offset overflow detection

    Sasha Levin has been running trinity in a KVM tools guest, and was able to trigger the BUG_ON() at arch/x86/mm/pat.c:279 (verifying the range of the memory type).  The call trace showed that it was mtdchar_mmap() that created an invalid remap_pfn_range().
To catch a commit with a log message like this, we can loosen the condition to say "the commit log must have one of these words (integer|counter|buffer|stack|fix), and also one of these words (overflow|underflow)".

The way to spell that is like this:
  $ git log --all-match --no-merges --regexp-ignore-case \
    --grep='\(integer\|counter\|buffer\|stack\|fix\)' \
    --grep='\(over\|under\)flow' v3.5..v3.6
Each --grep= pattern is matched individually, and --all-match tells git log to show only the commits for which all of the patterns match; that is why this finds "fix" and "overflow" even when they are not next to each other.  With --oneline, piped to "wc -l", this counts 53 commits.
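A simplified Python model of how several --grep patterns combine may make the difference concrete. It assumes a pattern "hits" a commit when it matches at least one line of the log message, ignores --author, --committer and every other option, and uses Python regex syntax in place of Git's BRE; the function names are hypothetical.

```python
import re

# Simplified model of "git log --grep=A --grep=B [--all-match]".
WORD = re.compile(r"integer|counter|buffer|stack|fix", re.IGNORECASE)
FLOW = re.compile(r"(over|under)flow", re.IGNORECASE)
ADJACENT = re.compile(r"(integer|counter|buffer|stack|fix) (over|under)flow", re.IGNORECASE)

def hits(pattern, message):
    """True if the pattern matches at least one line of the message."""
    return any(pattern.search(line) for line in message.splitlines())

def matches_any(message, patterns):
    """Several --grep options without --all-match: any one pattern is enough."""
    return any(hits(p, message) for p in patterns)

def matches_all(message, patterns):
    """With --all-match: every pattern must hit, possibly on different lines."""
    return all(hits(p, message) for p in patterns)

msg = "mtdchar: fix offset overflow detection"
print(matches_all(msg, [ADJACENT]))    # False: "fix" is not right before "overflow"
print(matches_all(msg, [WORD, FLOW]))  # True: each pattern matches on its own
```

Because these patterns are not anchored at word boundaries, "fix" also matches inside words such as "prefix", so the looser query can pick up some false positives worth eyeballing.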

By the way, I haven't written "Git guide" material on my blog for quite some time, and it shows. I need to practice writing a bit more.

Monday, October 1, 2012

Git 1.8.0-rc0

I just tagged Git 1.8.0-rc0. Tarballs are found at the usual place. The final 1.8.0 release is scheduled to appear on Oct 21, and I am planning to disappear for a few weeks on vacation after that.

Here are some highlights:
  • In the next major release, we will change the behavior of the "git push" command.  When "git push [$there]" does not say what to push, we have used the traditional "matching" semantics (all your branches were sent to the remote as long as there already are branches of the same name over there).  We will use the "simple" semantics, which pushes the current branch to the branch with the same name, but only when the current branch is set to integrate with that remote branch.  There is a user preference configuration variable "push.default" to change this, and "git push" will warn about the upcoming change until you set this variable.
  • "git branch --set-upstream" is deprecated and may be removed in a relatively distant future.  "git branch [-u|--set-upstream-to]" has been introduced with a saner order of arguments.
  • The "-Xours" (and "-Xtheirs") backend option to "git merge -s recursive" now takes effect even on binary files.
  • Even during a conflicted merge, "git blame $path" always meant to blame uncommitted changes to the working tree version. The command has been updated to show cleanly merged parts as coming from the other branch that is being merged.
  • "git cherry-pick A C B" used to replay changes in A and then B and then C if these three commits had committer timestamps in that order; now it replays them in the order the user told it to, i.e. "A C B", which is what the user naturally expects. 
  • "git log --all-match --grep=A --grep=B" ought to show commits that mention both A and B, but when these three options were used with --author or --committer, it showed commits that mention either A or B (or both) instead; this has been fixed.
  • The interactive prompt that "git send-email" gives was error-prone. It asked "What e-mail address do you want to use?" with the address it guessed (correctly) the user would want to use in its prompt, tempting the user to say "y". But the response was taken as "No, please use 'y' as my e-mail address instead", which is most certainly not what the user meant; the command now asks for confirmation in such a case.
I've been doing "-rc0" since the Git version 1.5.0 era (that's almost 6 years).

It is easy to understand what "-rc1" is.  It is the first "release candidate". Some last-minute bugs and regressions are expected to be found in it, reported, and fixed, resulting in "-rc2", "-rc3", etc. until the final release, which hopefully is as close to perfect as we can make it.

But what is "-rc0"?

I do not know about other projects, but in the context of Git, it is "a more or less feature-complete preview of the upcoming release". There may still be a couple to several topics cooking on the 'next' branch that are to be merged before the real feature freeze ("-rc1"), but they are minor enough that lacking them is not expected to change the overall end-user experience much from the final release.

And the 1.8.0-rc0 is such a preview. I am planning to merge two small features and two fixes later this week when I tag 1.8.0-rc1, but other than these four topics, what has been tagged today is more or less a good preview for the final release (modulo bugs and regressions that will be fixed, if any, before the real thing).

That means that it is very important for users to test this preview and find regressions in it. If you are using Git, please take some time and do your part to help make the upcoming release as polished as we can.

Thanks.