Monday, March 30, 2015

Fun with Non-Fast-Forward

Your push may fail due to “non fast-forward”. You start from a history that is identical to that of your upstream, commit your work on top of it, and then by the time you attempt to push it back, the upstream may have advanced because somebody else was also working on his own changes.


For example, between the upstream and your repositories, histories may diverge this way (the asterisk denotes the tip of the branch; the time flows from left to right as usual):


Upstream                                You


---A---B---C*      --- fetch -->        ---A---B---C*


                                                    D*
                                                   /
---A---B---C---E*                       ---A---B---C


            D?                                      D*
           /                                       /
---A---B---C---E?   <-- push ---        ---A---B---C


If the push moved the branch at the upstream to point at your commit, you will be discarding other people’s work. To avoid doing so, git push fails with “Non fast-forward”.


The standard recommendation when this happens is to “fetch, merge and then push back”. The histories will diverge and then converge like this:


Upstream                                You


                                                    D*
                                                   /
---A---B---C---E*  --- fetch -->        ---A---B---C---E


                                                       1
                                                    D---F*
                                                   /   /2
---A---B---C---E*                       ---A---B---C---E


               1                                       1
            D---F*                                  D---F*
           /   /2                                  /   /2
---A---B---C---E    <-- push ---        ---A---B---C---E


Now, the updated tip of the branch has the previous tip of the upstream (E) as its parent, so the overall history does not lose other people’s work.


The resulting history, however, is not what the majority of the project participants would appreciate. The merge result records D as its first parent (denoted with 1 on the edge to the parent), as if what happened on the upstream (E) were done as a side branch while F was being prepared and pushed back. In reality, E in the illustration may not be a single commit but can be many commits and many merges done by many people, and these many commits may have been observed as the tips of the upstream’s history by many people before F got pushed.

Even though Git treats all parents of a merge equally at the level of the underlying data model, the users have come to expect that the history they will see by following the first-parent chain tells the overall picture of the shared project history, while second and later parents of merges represent work done on side branches. From this point of view, what "fetch, merge and then push" is not quite a right suggestion to proceed from a failed push due to "non fast-forward".


It is tempting to recommend “fetch, merge backwards and then push back” as an alternative, and it almost works for a simple history:


Upstream                                You


                                                    D*
                                                   /
---A---B---C---E*  --- fetch -->        ---A---B---C---E


                                                       2
                                                    D---F*
                                                   /   /1
---A---B---C---E*                       ---A---B---C---E


               2                                       2
            D---F*                                  D---F*
           /   /1                                  /   /1
---A---B---C---E    <-- push ---        ---A---B---C---E


Then, if you follow the first-parent chain of the history, you will see how the tip of the overall project progressed. This is an improvement over the “fetch, merge and then push back”, but it has a few problems.


One reason why “merge backwards” is wrong becomes apparent when you consider what should happen when the push fails for the second time after the backward merge is made:


Upstream                                You


                                                    D*
                                                   /
---A---B---C---E*  --- fetch -->        ---A---B---C---E


                                                       2
                                                    D---F*
                                                   /   /1
---A---B---C---E*                       ---A---B---C---E


               2                                       2
            D---F?                                  D---F*
           /   /1                                  /   /1
---A---B---C---E---G    <-- push ---    ---A---B---C---E


               2                                       2   2
            D---F?                                  D---F---H*
           /   /1                                  /   /1  /1
---A---B---C---E---G    --- fetch -->   ---A---B---C---E---G


If the upstream side gained another commit G while F was being prepared, “fetch, merge backwards and then push” will end up creating a history like this, hiding D, the only real change you did in the repository, as the tip of the side branch of a side branch!

It also does not solve the problem if the work you did in D is not a single strand of pearls, but has merges from side branches. If D in the above series of illustrations were a few merges X, Y and Z from side branches of independent topics, the picture on your side, after fetching E from the updated upstream, may look like this:


    y---y---y   .
   /         \   .
  .   x---x   \   \
 .   /     \   \   \
.   /       X---Y---Z*
   /       /
---A---B---C---E


That is, hoping that the other people will stay quiet, starting from C, you merged three independent topic branches on top of it with merges X, Y and Z, and hoped that the overall project history would fast-forward to Z. From your perspective, you wanted to make A-B-C-X-Y-Z to be the main history of the project, while x, y, ... were implementation details of X, Y and Z that are hidden behind merges on side branches. And if there were no E, that would indeed have been the overall project history people would have seen after your push.

Merging backwards and pushing back would however make the history’s tip F, with its first parent E, and Z becomes a side branch. The fact that X, Y and Z (more precisely, X^2 and Y^2 and Z^2) were independent topics is lost by doing so:


    y---y---y   .
   /         \   .
  .   x---x   \   \
 .   /     \   \   \
.   /       X---Y---Z
   /       /         \2
---A---B---C---E-------F*
                     1



So "merge backwards" is not a right solution in general. It is only valid if you are building a topic directly on top of the shared integration branch, which is something you should not be doing in the first place. In the earlier illustration of creating a single D on top of C and pushing it, if there were no work from other people (i.e. E), the push would have fast-forwarded, making D as a normal commit directly on the first-parent chain. If there were work from other people like E, “merge in reverse” would instead have recorded D on a side branch. If D is a topic separate and independent from other work being done in parallel, you would consistently want to see such a change appear as a merge of a side branch.

A better recommendation might be to “fetch, rebuild the first-parent chain, and then push back”. That is, you would rebuild X, Y and Z (i.e. “git log --first-parent C..”) on top of the updated upstream E:


    y---y-------y   .
   /             \   .
  .   x-------x   \   \
 .   /         \   \   \
.   /           X’--Y’--Z’*
   /           /
---A---B---C---E


Note that this will work well naturally even when your first-parent chain has non-merge commits. For example, X and Y in the above illustration may be merges while Z is a regular commit that updates the release notes with descriptions of what was recently merged (i.e. X and Y). Rebuilding such a first-parent chain on top of E will make the resulting history very easy to understand when the reader follows the first-parent chain.

The reason why “rebuild the first-parent chain on the updated upstream” works the best is tautological. People do care about the first-parenthood when viewing the history, and you must have cared about the first-parent chain, too, when building your history leading to Z. That first-parenthood you and others care about is what is being preserved here. By definition, we cannot go wrong ;-)

And of course, this will work against a moving upstream that gained new commits while we were fixing things up on our end, because we won't be piling a new merges on top, but will be rebuilding X', Y' and Z' into X'', Y'', and Z'' instead.

To make this work on the pusher’s end, after seeing the initial “non fast-forward” refusal from “git push”, the pusher may need to do something like this:


$ git push ;# fails
$ git fetch
$ git rebase --first-parent @{upstream}


Note that “git rebase --first-parent” does not exist yet; it is one of the topics I would like to see resurrected from old discussions.

But before "rebase --first-parent" materialises, in the scenario illustrated above, the pusher can do these instead of that command:


$ git reset --hard @{upstream}
$ git merge X^2
$ git merge Y^2
$ git merge Z^2


And then, inspect the result thoroughly. As carefully as you checked your work before you attempted your first push that was rejected. After that, hopefully your history will fast-forward the upstream and everybody will be happy.


Friday, March 27, 2015

Stats from recent Git releases

Following up to the previous post, I computed a few numbers for each development cycle in the recent past.

In all the graphs in this article, the horizontal axis counts the number of days into the development cycle, and the vertical axis shows the number of non-merge commits made.

  • The bottom line in each graph shows the number of non-merge commits that went to the contemporary maintenance track.
  • The middle line shows the number of non-merge commits that went to the release but not to the maintenance track (i.e. shiny new toys, oops-fixes to them, and clean-ups that were too minor to be worth merging to the maintenance track), and
  • The top line shows the total number of non-merge commits in the release.

Even though I somehow have a fond memory of v1.5.3, the beginning of the modern Git was unarguably the v1.6.0 release. Its development cycle started in June 2008 and ended in August 2008. We can see that we were constantly adding a lot more new shiny toys (this cycle had the big "no more git-foo in user's $PATH" change) than we were applying fixes to the maintenance track during this period. This cycle lasted for 60 days, 731 commits in total among which 120 went to the maintenance track of its time.


During the development cycle that led to v1.8.0 (August 2012 to October 2012), the pattern is very different. We cook our topics longer in the 'next' branch and we can clearly see that the topics graduate to 'master' in batches, which appear as jumps in the graph.  This cycle lasted for 63 days, 497 commits in total among which 182 went to the maintenance track of its time.


The cycle led to v2.0.0 (February 2014 to June 2014) has a similar pattern, but as another "we now break backward compatibility for ancient UI wart" release, we can see that a large batch of changes were merged in early part of the cycle, hoping to give them better and longer exposure to the testing public; on the other hand, we did not do too many fixes to the maintenance track.  This cycle lasted for 103 days, 475 commits in total among which 90 went to the maintenance track of its time.



The numbers for the current cycle leading to v2.4 (February 2015 to April 2015) are not finalized yet, but we can clearly see that this cycle is more about fixing old bugs than introducing shiny new toys from this graph.  This cycle as of this writing is at its 50th day, 344 commits in total so far among which 115 went to the maintenance track.



Note that we should not be alarmed by the sharp rise at the end of the graph. We just entered the pre-release freeze period and the jump shows the final batch of topics graduating to the 'master' branch. We will have a few more weeks until the final, and during that period the graph will hopefully stay reasonably flat (any rise from this point on would mean we would be doing a last-minute "oops" fixes).

Thursday, March 26, 2015

Git 2.4 will hopefully be a "product quality" release

Earlier in the day, an early preview release for the next release of Git, 2.4-rc0, was tagged. Unlike many major releases in the past, this development cycle turned out to be relatively calm, fixing many usability warts and bugs, while introducing only a few new shiny toys.

In fact, the ratio of changes that are fixes and clean-ups in this release is unusually higher compared to recent releases. We keep a series of patches around each topic, whether it is a bugfix, a clean-up, or a new shiny toy, on its own topic branch, and each branch is merged to the 'master' branch after reviewing and testing, and then fixes and trivial clean-ups are also merged to the 'maint' branch. Because of this project structure, it is relatively easy to sift fixes and enhancement apart. Among new commits in release X since release (X-1), the ones that appear also in the last maintenance track for release (X-1) are fixes and clean-ups, while the remainder is enhancements.

Among the changes that went into v1.9.0 since v1.8.5, 23% of them were fixes that got merged to v1.8.5.6, for example, and this number has been more or less stable throughout the last year. Among the changes in v2.3.0 since v2.2.0, 18% of them were also in v2.2.2. Today's preview v2.4.0-rc0, however, has 333 changes since v2.3.0, among which 110 are in v2.3.4, which means that 33% of the changes are fixes and clean-ups.

These fixes came from 33 contributors in total, but changes from only a few usual suspects dominate and most other contributors have only one or two changes on the maintenance track. It is illuminating to compare the output between

$ git shortlog --no-merges -n -s ^maint v2.3.0..master
$ git shortlog --no-merges -n -s v2.3.0..maint

to see who prefers to work on new shiny toys and who works on product quality by fixing other people's bugs. The first command sorts the contributors by the number of commits since v2.3.0 that are only in the 'master', i.e. new shiny toys, and the second command sorts the contributors by the number of commits since v2.3.0 that are in the 'maint', i.e. fixes and clean-ups.

The output matches my perception (as the project maintainer, I at least look at, if not read carefully, all the changes) of each contributor's strength and weakness fairly well. Some are always looking for new and exciting things while being bad at tying loose ends, while others are more careful perfectionists.

Wednesday, March 25, 2015

Git Rev News

Christian Couder (who is known for his work enhancing the "git bisect" command several years ago) and Thomas Ferris Nicolaisen (who hosts a popular podcast GitMinutes) started producing a newsletter for Git development community and named it Git Rev News.

Here is what the newsletter is about in their words:

Our goal is to aggregate and communicate some of the activities on the Git mailing list in a format that the wider tech community can follow and understand. In addition, we'll link to some of the interesting Git-related articles, tools and projects we come across.

This edition covers what happened during the month of March 2015.

As one of the people who still remembers "Git Traffic", which was meant to be an ongoing summary of the Git mailing list traffic but disappeared after publishing its first and only issue, I find this a very welcome development. Because our mailing list is a fairly high-volume one, it is almost impossible to keep up with everything that happens there, unless you are actively involved in the development process.

I hope their effort will continue and benefit the wider Git ecosystem. You can help them out in various ways if you are interested.

  • They are not entirely happy with how the newsletter is formatted. If you are handy with HTML, CSS or some blog publishing platforms, they would appreciate help in this area.
  • They are not paid full-time editors but doing this as volunteers. They would appreciate editorial help as well.
  • You can contribute by writing your own articles that summarize the discussions you found interesting on the mailing list.