There has always been a need to quickly determine the size and complexity of a program, especially for development managers as they consider scheduling, cost estimates, staff assignments and projected completion dates. The article “When Size Does Matter: The Evolution of Techniques to Gauge Program Size,” available at http://esmpubs.com/uuazl, discussed attempts to do this using lines of code and the Halstead Metrics. The Halstead Metrics give developers more insight into the complexity of a program by looking at the actual verbs and variables used in the program instead of just the lines of code.

That article examined Unique Operators and Operands, Total Operators and Operands, Vocabulary and Length. Vocabulary and Length, in particular, get to the heart of what’s important in a program. But there are other calculations based on the counts of Operators and Operands, a program’s verbs and variables, respectively. We’ll dig into those calculations, but first, let’s review the Halstead Metrics that will be examined here, along with their commonly used descriptions:

• Computed Length (N^) is a prediction of program length, calculated as n1 log2 n1 + n2 log2 n2, where n1 and n2 are the counts of unique Operators and Operands.
• Volume is a measure of the size of a piece of code, computed as N log2 n (Length times the log of the Vocabulary), and can be compared to the potential volume, n* log2 n*, where n* is the size of the potential vocabulary.
• Level is the ratio of potential volume to actual volume, a measure of how abstractly the program is written.
• Intelligence Content is the total content of your program and is computed by multiplying Level and Volume (I = L * V).
• Difficulty is computed as D = (n1/2) * (N2/n2), where N2 is the total count of Operands. The Difficulty measure relates to how hard the program is to write or understand, such as when doing a code review.
• Time Required to Program is computed as T = E/18, where E is the Effort (Difficulty times Volume). This is an estimate of how long it would take to code the program, in seconds.
• Number of Delivered Bugs is calculated with the commonly used formula B = Volume/3000.
• Maintenance Effort is computed as E = D (Difficulty) * V (Volume), which is equivalent to the ratio of Volume to Level. It is a measure of the effort required to maintain the program, based on program clarity. The lower the number, the easier the program will be to maintain. This is a relative number that can be compared with other programs to help you determine which would require more effort to maintain.
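
To see how these formulas hang together, here is a minimal Python sketch that derives each metric from the four base counts. The function name and sample counts are invented for illustration, and Level is estimated with the usual 1/Difficulty approximation because the potential volume can’t be measured directly.

```python
import math

def halstead_metrics(n1, n2, N1, N2):
    """Derive the metrics above from the four base counts:
    n1/n2 = unique operators/operands, N1/N2 = total operators/operands."""
    vocabulary = n1 + n2                                        # n
    length = N1 + N2                                            # N
    computed_length = n1 * math.log2(n1) + n2 * math.log2(n2)   # N^
    volume = length * math.log2(vocabulary)                     # V = N log2 n
    difficulty = (n1 / 2) * (N2 / n2)                           # D
    level = 1 / difficulty      # L, estimated as 1/D since the potential
                                # volume is not directly measurable
    intelligence = level * volume       # I = L * V
    effort = difficulty * volume        # E = D * V (Maintenance Effort)
    time_seconds = effort / 18          # T = E / 18
    delivered_bugs = volume / 3000      # B = V / 3000
    return {
        "Vocabulary": vocabulary, "Length": length,
        "Computed Length": computed_length, "Volume": volume,
        "Difficulty": difficulty, "Level": level,
        "Intelligence Content": intelligence, "Effort": effort,
        "Time (seconds)": time_seconds, "Delivered Bugs": delivered_bugs,
    }

# Hypothetical counts for a small program, purely for illustration.
print(halstead_metrics(n1=20, n2=35, N1=150, N2=110))
```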

The first five metrics are for studying programs in the abstract. They aren’t as practical as Vocabulary and Length, but they’re clear measures of size and they also attempt to go a little further into the nature of the program. To use a baseball analogy, they’re like earned run average (ERA) or runs batted in (RBI), while Vocabulary and Length are like hits and runs. Since they’re further from the actual building blocks of Operands and Operators, they’re more difficult to understand, explain and, therefore, use.

The Difficulty, Time Required to Program and Number of Delivered Bugs metrics are all appealing, especially the latter. But are they reliable? The underlying concept is that any code will contain bugs, and the formula assumes a certain number of them for a given Volume. So, as the size of the program grows, so does the number of potential bugs. But that’s just it: it’s an assumption about potential bugs in new code. The Number of Delivered Bugs makes no allowance for the skill of the developer or the type of logic involved. Even assuming the number is correct, how would you use it? Would you compare it against the number of bugs you found and keep testing until they matched, i.e., until you found them all? But if you did that, then fixed the bugs while keeping the Volume similar, the predicted number of delivered bugs would remain unchanged.

The Difficulty, Time Required to Program and Number of Delivered Bugs metrics are best used for new code, which may help guide you in test plans. But, since they’re all related to size, other metrics would also serve that purpose without promising something like a definitive number of bugs. Before you pursue any metric, study it and determine how you would really use it. Ask yourself, does the metric really provide what I need? Will everyone understand what it really shows so it isn’t misinterpreted? If you can’t clearly and easily explain how a metric is calculated, and the basis for that calculation, then it will be difficult to get acceptance.

The Most Useful Metric
While there’s a strong correlation between the Halstead Metrics and software lines of code, there are some nuances captured in the Vocabulary and Length that can help you understand programs better. The same can be said for the Maintenance Effort.

The Maintenance Effort proves to be the most useful of these other Halstead Metrics because it provides a level of granularity that helps you spread out the programs in your portfolio based on how difficult or easy they will be to maintain. The Maintenance Effort will produce a very large number. We’ve found that a value of more than 4.5 million is the cutoff for very large programs, and the range of 1.5 to 4.5 million is where most programs fall. Smaller programs fall below 1.5 million. We’ve worked with QA teams that used Maintenance Effort to guide testing efforts; programs in the highest range are targeted for more testing than those in the lowest.
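
Here is one way those cutoffs might be applied in practice. This is a rough sketch; the band labels and sample program names are assumptions for illustration, not part of the Halstead definitions.

```python
def effort_band(maintenance_effort):
    """Classify a program by the Maintenance Effort ranges described above."""
    if maintenance_effort > 4_500_000:
        return "very large: target for the heaviest testing"
    if maintenance_effort >= 1_500_000:
        return "typical: where most programs fall"
    return "small: lightest testing effort"

# Hypothetical portfolio values, for illustration only.
for name, effort in [("ORDERENTRY", 6_200_000),
                     ("BILLING", 2_100_000),
                     ("DATEUTIL", 400_000)]:
    print(f"{name}: {effort_band(effort)}")
```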

The Maintenance Effort is useful in reviewing application portfolios. For this, we typically calculate the average Maintenance Effort and the Standard Deviation. The Standard Deviation provides insight into the distribution so you can gauge the consistency of the programs. Is this an application with one main program and many small subprograms, or are they evenly distributed? More important, it provides a way to rank the programs by how many deviations they are from the mean, allowing you to understand the comparative size of any one program. The Standard Deviation also provides some guidelines for relative size within a portfolio so you can classify programs by their rank for estimation or testing.

Your focus should be on those programs that are two deviations above the mean. These are the programs that everyone fears—the ones everyone dreads having to touch. This will be fairly obvious when reviewing familiar applications, but what if you have a new application to support? How will you quickly classify the complex programs and identify the troublesome ones? The Maintenance Effort method, coupled with an automated tool to calculate the metrics, will provide you with a quick estimation so you can compare these metrics with the metrics for familiar programs.
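
As a minimal sketch of that ranking, assuming the Maintenance Effort has already been computed for each program (the program names and values below are invented for illustration):

```python
from statistics import mean, stdev

def rank_by_effort(portfolio):
    """Rank programs by how many standard deviations their Maintenance Effort
    sits from the portfolio mean, flagging anything two or more above it."""
    mu = mean(portfolio.values())
    sigma = stdev(portfolio.values())
    ranked = sorted(portfolio.items(), key=lambda item: item[1], reverse=True)
    for name, effort in ranked:
        z = (effort - mu) / sigma
        flag = "  <-- focus testing here" if z >= 2 else ""
        print(f"{name:12s} {z:+5.2f} deviations from the mean{flag}")

# Hypothetical Maintenance Effort values for an application portfolio.
rank_by_effort({
    "ORDERENTRY": 9_800_000, "BILLING": 2_100_000, "INVOICE": 1_900_000,
    "DATEUTIL": 400_000, "REPORTS": 2_600_000, "AUDIT": 1_200_000,
    "PRICING": 1_600_000, "SHIPPING": 2_300_000,
})
```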

The Maintenance Effort can also supplement Vocabulary in helping you rank the programs in your portfolio for change estimation and testing efforts. They’re great for comparing two programs. But remember, these metrics are based on size with the assumption that with an increase in size comes an increase in complexity. They don’t account for how the logic is actually coded. For example, they have no insight into the number of code paths. You could find two programs with very close Maintenance Effort numbers, yet inside, the structures are very different. To gain that insight, you will need different metrics. What’s needed is a way to break up the program and look at the actual sections you need to touch. For example, how many decision points are there in the section where you must make a change? If there are a few, then it should be easy to work with and test, but what if there are hundreds? The metrics mentioned here provide an accurate view of the entire program, but no insight into the various sections. For accurate assessment, you will need to look deeper into the program and other metrics can guide the way.
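
For example, one very crude way to peek at a single section, rather than the whole program, is simply to count branching keywords in the code you plan to change. The keyword list and sample snippet below are assumptions for illustration, not a standard metric.

```python
import re

# Branching keywords to treat as decision points; adjust for your language.
DECISION_KEYWORDS = re.compile(r"\b(if|elif|while|for|case|when)\b",
                               re.IGNORECASE)

def decision_points(section_source: str) -> int:
    """Count decision points in one section of source code."""
    return len(DECISION_KEYWORDS.findall(section_source))

section = """
if order_total > credit_limit:
    hold_order()
else:
    for line in order_lines:
        if line.backordered:
            flag(line)
"""
print(decision_points(section))  # 3 decision points in this section
```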

For More Information
Elements of Software Science by Maurice H. Halstead (1977). Amsterdam: Elsevier North-Holland, Inc. ISBN 0-444-00205-7.