Wednesday, August 31, 2011

How to inject a malicious commit to a Git repository (or not)

[Note: there are follow-up articles here and there]

[Note: some site seems to have misreported that I outlined how one can forge a history stored in Git here, but the point of this article is how impractical and unrealistic it is for anybody to do so without letting other people take notice.]

Suppose if you momentarily gained write access to other people's public repositories at a large distribution point, such as kernel.org. What damage can you inflict on their projects if you wanted to?

You could create a malicious commit on top of the tip of "master" branch of linux.git repository of Linus Torvalds. Nobody prevents you from pretending that you are Linus:

$ GIT_AUTHOR_NAME="Linus Torvalds" \
  GIT_COMMITTER_NAME="Linus Torvalds" \
  GIT_AUTHOR_EMAIL=torvalds@linux-foundation.org \
  GIT_COMMITTER_EMAIL=torvalds@linux-foundation.org \
  git commit -s


Your English may be good enough to fool readers into believing that the log message may have come from Linus himself. Perhaps you may have done this around August 12th, when the tip of Linus's true "master" branch was commit M and X is the malicious commit you created on top of it. The resulting history may look like this:

--M
   \
    X

If an unsuspecting victim pulls regularly from Linus's repository, he may run a git pull before your malicious commit is discovered in security audit. And he may have already based his derivative product based on this malicious version of the kernel.

Is this a big "Oops"? We'll see what happens to this unsuspecting victim later.

When Linus tries to upload his updated work, however, the history on his development machine (which is not the distribution point you managed to add your malicious commit) does not have your commit X. In Git terms, the history you tweaked and the history Linus has now diverged:


--M---o---o---o---o---o---o---o---L
   \
    X

where M is the original tip of the "master" branch at the public repository, X is the malicious commit you created and updated the "master" branch to point at, and L is the tip of the history Linus is about to upload. We say "L does not fast-forward to X", as X is not part of L (time flows from left to right).

What happens now is that "git push" Linus runs to upload to his public repository notices that updating the "master" branch at the public repository with the tip of his history will lose commit X you created (it does not notice that the commit that is about to be lost is a malicious one, nor does it notice it was not made by Linus, but it does not have to notice either at all for this protection to work), and refuses to do so. Linus would definitely notice that something fishy is going on, because he needs to do something he usually never does to push his changes as his next step.

If this were a shared repository setting, Linus may say "Ah, somebody else beat me to it", then runs "git pull" to merge work by other people who share the same public repository (i.e. you) to his tree to create a merge commit Y, and then pushes the result again:


--M---o---o---o---o---o---o---o---L
   \                               \
    X-------------------------------Y


In the end, your malicious commit X could end up in the resulting history this way, provided if he does such a merge, and if he does not inspect the merge Y.

But Linus (or any kernel people with publishing repositories at kernel.org in general) does not work using a shared repository with other people to begin with. The repository at kernel.org is his publishing repository and his alone, so you cannot sneak your malicious commit into his history through this avenue.

Linus could choose to be careless and force his push, without bothering to investigate why his push does not fast-forward (in real life, this is not going to happen, but for the sake of mental exercise, imagine that he chose to be careless and let's see what happens). This will eliminate your malicious commit from his public repository. If he did so, the repository would look like this:


--M---o---o---o---o---o---o---o---L

Your malicious commit X would not have any effect to people who pulled from Linus's public repository after this happens, but what about the unsuspecting victim who pulled X before Linus forced this push? Is he contaminated with your malicious commit and will not notice it forever?

Remember, as far as he is concerned, Linus's history he pulled earlier, which is kept in his origin/master remote tracking branch, was X, and then it is being updated to L, which does not fast-forward. His "git pull" (actually it is "git fetch" that is invoked as part of "pull") will notice and would report:

From git://git.kernel.org/.../torvalds/linux.git/
 + 9d901d9...ad4d968 master     -> origin/master  (forced update)


Notice "forced update"? The unsuspecting victim can notice that the side branch lead to X is no longer part of Linus's history.

One security tip I would offer here is this. If you know that your upstream (in this illustration, Linus) never rewinds his history, you can tweak your .git/config file (open it with your favorite $EDITOR, it is a simple text file and is designed to be editable by hand) and drop the '+' sign from the "fetch" line. Find a line that looks like this:

[remote "origin"]
        fetch = +refs/heads/*:refs/remotes/origin/*


And edit it to make it look like this:

[remote "origin"]
        fetch = refs/heads/*:refs/remotes/origin/*


This will make your "git pull" (again, it is actually "git fetch" that is invoked from the command) to fail when the upstream rewound the history, like this. You will see that the command fails like so when you pull from Linus:

From git://git.kernel.org/.../torvalds/linux.git/
 ! [rejected]        master     -> origin/master  (non-fast-forward)


We might want to revisit the default settings "git clone" leaves in your new repository to make it harder for upstreams to rewind their branches by dropping the '+' (which means "allow non-fast-forward), but that will have to be discussed on the Git mailing list (git@vger.kernel.org), not in this blog post. There is a reason we didn't make it default to insist on fast-forwardness.

By the way, it does not make an iota of difference to the above story if you rewrote the commits that lead to M (i.e. the old tip of the "master" branch of Linus's history) using "rebase" or "commit --amend". The only difference is that such a change will move the fork point of the diverged histories from M (in the above story) further back to a different commit that is older than M in the ancestry chain. The history Linus will try to push to his public repository L will not fast-forward to the commit you place at the tip of the "master" branch that contains your malicious version, and that is the only thing that matters.


4 comments:

Jeff King said...

I think there is one other thing to note for the "victim" who accidentally fetches the malicious commit: they do not live or fetch in a vacuum. It only takes one pusher like Linus, or even one careful fetcher, to notice that the malicious commit has been installed and sound an alarm. And what happens then?

Unless the hack was very clever (e.g., a trojan git serving different but valid history to different clients), once the alarm is raised through out-of-band channels (email lists, notices on web sites, etc), then everyone will know the sha1 of the malicious commits. And git can do what it does best: manage versions of your code. It becomes trivial to find out if you have accidentally built on top of the bad code, and it can be excised via "git revert" or even a rebase. You can treat it just like a regular buggy commit.

I can only imagine after kernel.org's recent announcement that everyone is double-checking their repositories.

Anonymous said...

What if someone puts some changes into commit before it is pushed. Could the original author unknowingly introduce the malicious change to the history without raising any flags?

Corsac said...

The danger is not a bad commit in Linus tree but more in some random tree of a random developper or team. In this case the tree might be shared, and people might do a git pull before pushing, legitimating the commit. What whould happen then, if through merge request the commit ends in mainline?

geekaholic said...

What if Linus used multiple machines, wouldn't it be like sooner or later he will pull from upstream? Suppose did some coding on his main laptop and pushed it upstream then went on vacation in a hurry with his netbook and does some work offline. After coming back he pushes and it conflicts so naturally he pulls. So basically the security lies only in the fact you got one way communication which is inconvenient.