Friday, May 23, 2008

On some revision control systems

So I've long wondered about the advantages of those shiny, modern (distributed) revision control systems that have seemingly become quite fashionable these days. I started with CVS and liked it, but once I moved to Subversion I have started to feel a dissatisfaction with my version control system, the kind that would not go away if I went back to CVS. It's like driving a car. Once you drive a faster car, you realize the faster car is not fast enough. Obviously, going back to the original car is a step in the wrong direction.

CVS

The first revision control system I used was CVS. I liked it when I got used to it. It let me keep track of my files, was easy to back up and was fast enough for my little projects. There were good workarounds for CVS' limitations such as "renaming" files by performing an explicit add and then remove operation or by doing a "repo-copy" (i.e. copying the files in the repository itself). Empty directories could be pruned on updates. What else would one want?

Subversion

Well, I have long wondered why I should be using Subversion instead of CVS. After all, CVS has worked well for me in the past and simply "because it can rename files" hasn't been too convincing. In fact, I have heard that argument so often and from so many people, it started to become a reason not to switch to Subversion. Well, then I gave Subversion a shot and I have to say, I like it - with some limitations.

But let me first say what I think is good about Subversion. I like Subversion's speed advantage over CVS. The reason I started using Subversion was that I wanted a way to track my own changes to a rather large source tree that is kept in CVS. I wanted to make periodic imports of snapshots and merge my local changes - similar to how vendor branches work in CVS. When trying to accomplish this with CVS, it became apparent that it would be very time consuming: An import and merge session could take several hours. Doing the same thing with Subversion accellerated the process quite a bit - an update session would still take about an hour, but mostly because I had to download the updated source tree from the net.

Ok, that's about it when it comes to subversion's advantages. What I don't like is subversion's poor handling of branches. I don't think a branch is another directory, I think a branch is a branch. The same holds true for a tag. Also, merging branches is a major pain - while simple at first, it will get to the point where keeping track of what has been merged and what needs merging is a complex task . Granted, CVS isn't a whole lot better at that.

To set things straight: I'm not saying Subversion is bad. All I'm saying is that it isn't a lot better than CVS for my purposes.

Mercurial

So now on to the reason I started this post. I realize, it has become a lot longer than anticipated, but who cares?! I've read about the advantages that distributed revision controls offer some time ago. Having all the history available in with every working copy is one of them. The ability to keep track of merges between branches is another one. It's the latter that got me interested in Mercurial. While I realize that upcoming versions of Subversion will support merge tracking as well, the version(s) installed on my various computers don't support it - and I don't want to compile some development branch of a critical (to me) piece of software.

So I looked at other choices, e.g. git and Mercurial. To be honest, I haven't looked at git because I heard it is supposed to be hard to use with more than 100 commands. So I started to look at Mercurial and played around with it. I like it, so I don't think I'll look at git anytime soon.

Mercurial has (almost) everythig I need: It's fast, it's easy to use and it handles merges between branches well. I'm the "almost" can be erased as soon as I dig further. What's still missing is an easy setup for a permanent, central "master" repository (short of the "hg serve" command which starts a minimal HTTP server). I'm also missing a web frontent similar to CVSWeb and SVNWeb - I'm sure such thing does exist, but I haven't found it yet. The third thing I haven't figured out yet is how to do backups.

I'd like to write a bit about the differences between Mercurial and the other systems I've used. First, a revision of a file has up to two parents. Actually, it always has two parents, but one may be the null parent, and that doesn't count. You start out with a file that has two null parents. Once you commit a change to the file, the file's parent becomes the previous revision, and so on. If a revision has only one parent, and no other revision uses it as it's parent, then that revision is called a head.

The second parent comes in the play, when you have multiple branches. Creating a branch is also different from other systems. You first switch a working copy to a new branch by issuing hg branch <name<. The default branch, i.e. what's called "HEAD" in CVS and "trunk" in Subversion is called "default" in Mercurial. The default branch always exists. Switching a working copy to a new branch does not create the branch, yet. Only commiting a first change in that working copy does. Note that the first revision's parent will be the branchpoint revision.

So what happens when you merge between branches? When you merge and commit, the current branch will contain all changes of the original branch since the last merge (or the branchpoint). You don't need to remember which changes you need to merge into the current branch - Mercurial does that automatically for you. This is possible because when merging, a file's second parent is filled in with the head revision of the originating branch. That also means, that when you merge changes from branch A into branch B, the head revision of branch A is no longer a head. Don't worry, though, once you commit something on branch A again, it will have a head again.

Now on to the distributed part. The concept of branches is taken further in a distributed revision control system. Multiple instances of the repository can have dependencies between each other. A new copy of a repository is created by running the "hg clone" command. Then, changes to that repository copy can be made as if the original never existed. But, if you want to, you can also pull any changes the original repository has incorporated. Do this by running "hg pull" - it's really just merging all changes from your master repository into your copy. It also works the other way around: You can push your changes upstream by running "hg push" (if the upstream repository allows it).

All in all, I find Mercurial very easy to use once the basic concepts have been understood. I'm not sure yet whether I'll convert any of my Subversion repositories to Mercurial or if I'll use seriously at all. But for future reference, here's a link to a free book on Mercurial.

No comments: