Robert Cowham's Weblog

Microsoft, SCM and "Large" Projects   13 Mar 07

There have been some interesting posts recently on the challenges Microsoft faced in developing Vista. It has, of course, fairly recently made it out of the door, which is in itself a major achievement. However, it has become clear that many features originally scheduled for inclusion were dropped (WinFS, for example). The challenge of developing such a vast system is immense, and a key part of any such effort is change, configuration and release management.

Some fairly recent blog postings have highlighted some of the issues involved and, seeing as SCM is right in the mix, I thought it worth a review.

Overview of Vista - Paul Thurrott's Supersite for Windows

The sheer size and scope of Windows Vista makes it difficult to review, to digest, and to understand. If you step back too far, it doesn't look very impressive at all: It's like XP with a spit-shine. But if you get too close, it's easy to get lost in the seemingly never-ending lists of new features.

A previous article by Paul bemoaned various lost features.

Joel Spolsky wrote an article How many Microsofties does it take to implement the Off menu? reflecting on UI choices and how less is usually more.

This attracted a response (or was it coincidental?), Moishe Lettvin's Windows Shutdown Crapfest posting:

The most frustrating year of those seven was the year I spent working on Windows Vista, which was called Longhorn at the time. I spent a full year working on a feature which should've been designed, implemented and tested in a week.

The key SCM-related part of Moishe's post is:

In small programming projects, there's a central repository of code. Builds are produced, generally daily, from this central repository. Programmers add their changes to this central repository as they go, so the daily build is a pretty good snapshot of the current state of the product.

In Windows, this model breaks down simply because there are far too many developers to access one central repository. So Windows has a tree of repositories: developers check in to the nodes, and periodically the changes in the nodes are integrated up one level in the hierarchy. At a different periodicity, changes are integrated down the tree from the root to the nodes. In Windows, the node I was working on was 4 levels removed from the root. The periodicity of integration decayed exponentially and unpredictably as you approached the root so it ended up that it took between 1 and 3 months for my code to get to the root node, and some multiple of that for it to reach the other nodes. It should be noted too that the only common ancestor that my team, the shell team, and the kernel team shared was the root.

Joel Spolsky then chimed in:

Of all the things broken at Microsoft, the way they use source control on the Windows team is not one of them...

When you're working with source control on a huge team, the best way to organize things is to create branches and sub-branches that correspond to your individual feature teams, down to a high level of granularity. If your tools support it, you can even have private branches for every developer. So they can check in as often as they want, only merging up when they feel that their code is stable. Your QA department owns the "junction points" above each merge. That is, as soon as a developer merges their private branch with their team branch, QA gets to look at it and they only merge it up if it meets their quality bar.

I think there are various interesting things about this discussion.

Managing Complexity and Reducing Dependencies

Ever since The Mythical Man-Month and its observation that "Adding manpower to a late software project makes it later", it has been clear that managing large-scale systems poses many challenges. The whole book is a joy to read, and the Wikipedia articles referenced give a good summary.

I particularly like the idea of Conceptual Integrity to help keep things simpler both in development and control.

Microsoft's products would appear, almost by design, to have a very high level of interdependency between them, although this is changing. As David Treadwell, corporate vice president of the .Net Developer Platform group, put it: "And what happened is as the projects got larger and larger, we introduced too many complex interdependencies on early software, more so than we could really digest throughout the system."

One advantage of the dependencies, from Microsoft's point of view, is that a customer who buys one product pretty much has to take a slew of accompanying products too.

Branches and Sub-Branches - Your SCM Tool

Back to Joel's comment that the best way to organise things is with branches and sub-branches: this is a fairly classical SCM approach to the problem, and it has many benefits.
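To make the "junction point" idea concrete, here is a minimal Python sketch of gated promotion, where changes only move up one level when a quality check accepts them. It is a toy model, not how SourceDepot or Perforce work: the Branch class, the promote function, the branch names and the "no WIP" gate are all hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Branch:
    name: str
    parent: Optional["Branch"] = None
    changes: list = field(default_factory=list)  # change descriptions, oldest first


def promote(branch: Branch, passes_quality_bar: Callable[[list], bool]) -> bool:
    """Merge a branch's pending changes into its parent if the gate allows it."""
    if branch.parent is None:
        return False  # the root has nowhere to promote to
    pending = [c for c in branch.changes if c not in branch.parent.changes]
    if not pending or not passes_quality_bar(pending):
        return False
    branch.parent.changes.extend(pending)
    return True


# Hypothetical three-level hierarchy: main <- team branch <- private branch.
main = Branch("main")
shell = Branch("shell", parent=main)
dev = Branch("dev-private", parent=shell)

dev.changes += ["fix menu layout", "WIP: experimental shutdown dialog"]


def no_wip(pending: list) -> bool:
    """Hypothetical gate: refuse anything still marked work-in-progress."""
    return not any(c.startswith("WIP") for c in pending)


first_try = promote(dev, no_wip)    # blocked by the WIP change
dev.changes[1] = "shutdown dialog"  # the developer finishes the work
second_try = promote(dev, no_wip)   # now accepted into the team branch
```

Note that after the second attempt the changes have reached the team branch only; getting them into main needs a further promotion through the next gate, which is exactly where the latency Moishe describes comes from.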

It is interesting to note that Joel mentions SourceDepot, which Microsoft uses internally and which is a re-badged version of Perforce circa 1999 (when Win2k was in development). It has some Windows-specific improvements, such as to memory usage, but from what I can gather no fundamental algorithm improvements. Perforce of that era could only easily propagate changes between directly related parent and child branches. Things like grand-parent <-> grand-child propagation, skipping over the intervening parent, were possible but tricky: the tool didn't handle them by default. (Note that I am interested in the original comment about there being a tree of repositories, as opposed to a tree of branches. If the former, then improving merging becomes much more difficult, since the changes have to cross separate repositories.)

Microsoft's new Team Foundation Server tool, which uses a similar branching model and terminology but would appear to be totally re-architected, does not yet support good common ancestor detection either.
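As a rough illustration of what parent-child-only propagation implies, the following Python sketch computes the chain of integrations a change needs to get from one branch to another when every hop must be between a parent and a child. The branch names and the shape of the tree are hypothetical; I have simply placed a feature team four levels below the root, as in Moishe's description, with the kernel branch directly under the root.

```python
def path_to_root(parents, branch):
    """Return the list of branches from `branch` up to the root, inclusive."""
    path = [branch]
    while parents[branch] is not None:
        branch = parents[branch]
        path.append(branch)
    return path


def integration_route(parents, source, target):
    """Branches a change passes through if only parent<->child hops are allowed."""
    up = path_to_root(parents, source)
    down = path_to_root(parents, target)
    common = next(b for b in up if b in down)  # nearest common ancestor
    return up[: up.index(common) + 1] + list(reversed(down[: down.index(common)]))


# Hypothetical hierarchy: each key is a branch, each value its parent.
parents = {
    "root": None,
    "level1": "root",
    "level2": "level1",
    "level3": "level2",
    "feature-team": "level3",
    "kernel": "root",
}

to_kernel = integration_route(parents, "feature-team", "kernel")
hops_to_kernel = len(to_kernel) - 1

# Grand-child to grand-parent still has to pass through the parent.
to_grandparent = integration_route(parents, "level3", "level1")
```

With this tree a change needs five separate integrations to travel from the feature team to the kernel branch, each of which may be waiting on its own schedule.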

Over the years Perforce has addressed this in various ways, and since 2005 it has been working well, with reasonable performance at large sites. Thus in Perforce you can pull changes from one sub-branch to another without going via the parent, and changes propagated back to the common parent will not cause major problems for either sub-branch. There still remains a very large question as to whether "uncontrolled" propagation of changes is a good idea (as Laura Wingerd puts it - "why we don't allow driving through hedges" - Ch7), and my advice is certainly to think carefully and plan your normal strategy (with tightly locked down exceptions maybe permissible). It still comes back to managing complexity: the more you allow changes to be propagated all over the place in an ad-hoc fashion, the harder it is to track what has gone on.

As mentioned by Joel, Accurev is a very interesting looking tool, and their stream browser certainly looks attractive and has raised the bar in terms of features for the SCM vendors. My impression is that Perforce still has the advantage in terms of scalability (see Google paper) and performance, but I would certainly put Accurev on my evaluation list if advising a client in this area.
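The reason a direct sub-branch to sub-branch pull need not cause trouble later is that the tool remembers what has already been integrated where. The Python sketch below is a deliberately simplified toy model of that idea and makes no claim to reflect Perforce's actual implementation, which tracks integration history per file revision and has to merge content as well. The Branch class, the integrate function and the change numbers are all hypothetical.

```python
class Branch:
    def __init__(self, name):
        self.name = name
        self.changes = []  # change IDs in the order they arrived


def integrate(source, target):
    """Copy over the changes the target lacks; return the ones actually applied."""
    new = [c for c in source.changes if c not in target.changes]
    target.changes.extend(new)
    return new


parent = Branch("parent")
sub_a = Branch("sub-a")
sub_b = Branch("sub-b")

sub_a.changes += [101, 102]

direct = integrate(sub_a, sub_b)   # sibling to sibling, bypassing the parent
up = integrate(sub_a, parent)      # the same work later goes up the normal route
down = integrate(parent, sub_b)    # and comes back down: nothing left to apply
```

Because sub-b already records changes 101 and 102, the final integration from the parent applies nothing. Without that record the same changes would arrive twice, which is where the merge pain comes from.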

 

Copyright © 2008 Robert Cowham