Google “software metrics” and you will find an array of articles—and opinions—about the real usefulness of such metrics, particularly as they relate to determining the size and complexity of a program or application.

Older developers will remember punch cards and how they were used to determine the size of a program. Each line of code was punched onto its own card, and when stacked, the cards provided a simple visual gauge of the program’s size. Knowing the size, a developer could estimate how much time it would take to modify the program or write a similar one. These were important considerations for scheduling, making cost estimates, assigning projects to staff and estimating expected completion dates.

However, knowing how many lines of code a program had, while helpful, wasn’t enough to really gauge its complexity. Often, development managers would rely on their feel for the program itself, the nature of the change, and who made the change and did the testing. But with the increased workloads teams face these days, and the outsourcing of application development projects, managers don’t necessarily have that same connection with the application or with the person who wrote and tested the code.

With managers increasingly faced with unfamiliar applications, how could good decisions be made, and made quickly? An impartial and automated way to assess the nature of the application, program, change and testing was needed. “Impartial” here means something that elevates the decision-making process above an individual’s gut instinct and provides consistent results across the board. Automation was also key, because developers didn’t need yet another step that could be seen as a roadblock.

SLOC or Slack?

One gauge often used to assess the size of a program is Source Lines of Code (SLOC). Useful for comparing programs written in the same language and to the same coding standards, SLOC measures the size of a program by counting its lines of code, which in turn helps a developer predict how many hours it will take to develop and later maintain a similar program. A program with a high SLOC value will take longer to develop.

SLOC can also be used to gauge a developer’s productivity; the more lines of code written, the more productive, or so the thinking goes. Many argue, however, that the metric doesn’t speak to functionality or quality: whether the code is effective, or whether it had to be repeatedly debugged before it worked. In other words, a developer can write fewer lines of code and yet deliver more functionality than someone who writes copious amounts of code that doesn’t do what it’s supposed to do.

There are two kinds of SLOC measures: straight, or physical, SLOC and logical SLOC. Physical SLOC is a straight count of lines, including all comments, blank lines and definitions. But because these elements vary from program to program, it doesn’t provide a real comparison with other programs and applications. Physical SLOC is still used today because of its simplicity, but it doesn’t really say much about the program or the application. It’s the modern-day equivalent of the old card deck.
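To see just how little the physical measure involves, consider a short Python sketch (a hypothetical illustration, not a tool referenced in this article) that counts every line of a source string along with the blank and comment lines buried in that total:

def physical_sloc(source: str) -> dict:
    """Count physical SLOC: every line, including comments and blanks."""
    lines = source.splitlines()
    blanks = sum(1 for line in lines if not line.strip())
    comments = sum(1 for line in lines if line.strip().startswith("#"))
    return {"physical": len(lines), "blank": blanks, "comment": comments}

sample = "# add the first ten integers\ntotal = 0\n\nfor i in range(10):\n    total += i\n"
print(physical_sloc(sample))  # {'physical': 5, 'blank': 1, 'comment': 1}

Two programs that do very different things can produce identical counts here, which is exactly the weakness described above.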

Two things must be known about a program for a better comparison: how many statements it has and how many variables it has. Logical SLOC measures the number of executable statements, and thus is a better metric for program comparisons. By understanding how much of the program is involved with logic, a developer can get closer to understanding the true size of a program. Likewise, a manager can gauge workload by using this metric. By knowing how many statements the developer will have to review to make a change, the manager can more accurately estimate the time it will take to evaluate and carry out the task.
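As a contrast to the physical count, here is a companion Python sketch (again a hypothetical illustration, assuming Python source rather than the mainframe languages discussed later) that counts logical SLOC as executable statements by walking the parsed syntax tree, so a statement split across several physical lines, or crowded onto one, still counts exactly once:

import ast

def logical_sloc(source: str) -> int:
    """Count logical SLOC: one per executable statement in the parsed AST."""
    tree = ast.parse(source)
    # Assignments, loops, calls and definitions all derive from ast.stmt;
    # comments and blank lines never appear in the AST at all.
    return sum(isinstance(node, ast.stmt) for node in ast.walk(tree))

sample = "# add the first ten integers\ntotal = 0\nfor i in range(10):\n    total += i\nprint(total)\n"
print(logical_sloc(sample))  # 4: the assignment, the loop, the increment, the print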

While logical SLOC may still be the most commonly used program metric, mainly because it’s easy to produce, it’s not without problems. Its main drawback is that it offers little real insight into a program. When programs were on card decks, it was an easy measurement, but not all decks were created equal. Two decks could be the same size, and thus have the same number of lines of code, yet the programs could do two totally different things, so the logic and the number of variables could vary greatly. In addition, complex applications can use many programming languages, and some languages require fewer lines of code to accomplish the same task. Further, depending on the developer and/or the coding standards, one line of code in one program could be written as many separate lines in a similar program. For these reasons, a more precise metric was needed to assist in estimating how long it might take to implement a change.

A Better Metric

As part of his treatise on establishing an empirical science of software development, Maurice Halstead introduced complexity measures in 1977. These metrics, among other things, enabled developers to glean more insight into a program by looking at the actual verbs and variables used in the program instead of just the lines of code.

The calculation begins with a count of the Unique Operators and Operands, a program’s unique verbs and variables, respectively. These numbers are added together to arrive at the Vocabulary, a key measurement. Next, the total number of verbs and variables is counted, providing the Total Operators and Total Operands counts. These totals are combined to produce the Length.

Following are the Halstead Metrics values:

Unique Operators (n1) are the unique or distinct number of verbs and elements other than data elements occurring in your program. Operators are syntactic elements such as +, -, <, >.
Unique Operands (n2) are the unique or distinct number of data elements occurring in your program. Operands consist of literal expressions, constants and variables.
Total Operators (N1) are the total number of verbs and elements other than data elements occurring in your program. Paired operators such as BEGIN .. END, DO .. UNTIL, FOR .. NEXT are treated as a single operator.
Total Operands (N2) are the total number of data elements occurring in your program.
Vocabulary (n) is the number of unique operators and operands in your program, computed as n1+n2. This is an estimation of the size of the program’s vocabulary (the number of things that must be known to understand the program).
Length (N) is the length of your program, computed as N1+N2.
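The arithmetic is simple once the counts exist; the harder part is classifying tokens as operators or operands. The following Python sketch applies one common convention (keywords and punctuation as operators; identifiers, numbers and strings as operands). It is an illustration only; published Halstead tools differ in exactly where they draw this line.

import io
import keyword
import tokenize

def halstead_basic(source: str) -> dict:
    """Compute the Halstead base counts, Vocabulary and Length for Python source."""
    operators, operands = [], []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP or (tok.type == tokenize.NAME and keyword.iskeyword(tok.string)):
            operators.append(tok.string)          # punctuation and keywords
        elif tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.STRING):
            operands.append(tok.string)           # identifiers, literals, constants
    n1, n2 = len(set(operators)), len(set(operands))   # unique operators, operands
    N1, N2 = len(operators), len(operands)             # total operators, operands
    return {"n1": n1, "n2": n2, "N1": N1, "N2": N2,
            "vocabulary": n1 + n2,                     # n = n1 + n2
            "length": N1 + N2}                         # N = N1 + N2

sample = "total = 0\nfor i in range(10):\n    total += i\nprint(total)\n"
print(halstead_basic(sample))
# {'n1': 7, 'n2': 6, 'N1': 9, 'N2': 9, 'vocabulary': 13, 'length': 18}

The absolute numbers matter less than applying the same classification consistently across every program being compared; that consistency is what makes the metric impartial.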

Determining the Complexity of a Program: A Comparison of Measures

The four programs noted in the chart in Figure 1 progress in size from very small to relatively large. Straight Lines of Code shows PDA008 as the largest. The next metric is Comment Lines, which illustrates how well documented the programs are. No surprises here; the larger the program, the more comments. (A developer will want to look at the ratio of comments to statements to get a real feel for how well a program is documented.) The Statements count provides a better idea of size, and again PDA008 is the largest.

Moving on to the Halstead Metrics, rows four through nine, the Unique Operators counts don’t differ much among the first three programs, but the count is much larger for the last. When we examine Unique Operands, which counts all the variables actually used in the program, not merely those defined, PP110 has the most variables to understand. The unique counts provide a base of what’s included in the program, but not its actual size. To see how often the operators and operands are used, we look at the total counts. Here we can start to see the growth in complexity.

The most important metrics come next: Vocabulary and Length. Vocabulary is often described as the number of elements a developer must know to truly understand the program. For TRIMAIN, that’s 33 things, manageable for most developers. CWXTCOB has 218, so it would be more challenging to handle. PP110 and PDA008 are interesting: because PP110 uses more unique variables, there are more things a developer will need to follow to understand it, while PDA008 has slightly fewer variables, so slightly less time would be required to grasp its complexity. The level of insight Vocabulary provides simply isn’t revealed by LOC or Statements, making it a useful gauge. Finally, there’s Length, where we can easily see the size progression across the programs.

Conclusion

From the beginning of computing, program size has mattered and will continue to do so. Techniques for gauging size have evolved from card decks to more sophisticated means. But there’s more to understanding and comparing programs than their apparent size. Halstead Vocabulary and Length provide more reliable metrics that help a development manager decide which team member to assign to a project and estimate how long it might take to implement a change.