Each day, your development team likely performs many check-in and check-out operations on your ClearCase versioned object bases (VOBs). Likewise, if a project has adopted a continuous integration process, many software builds may be queued and executed each day. Consider how many ClearCase operations your team performs over the lifetime of a project, and it is easy to see how even a small improvement in the speed of these operations can save a significant amount of time.
Over the years, I have worked with development teams of all sizes and geographic distributions, helping them use ClearCase more effectively and efficiently for software configuration management (SCM). I think it is fair to say that all of them appreciated any efforts that would enable them to get more work accomplished in a day, and ultimately complete projects faster. Whether you are a ClearCase administrator facing a performance problem, or you simply desire to improve performance to give your team’s productivity a boost, it helps to have a plan.
This article, on principles and techniques for improving ClearCase performance, provides an overview of the principles of performance assessment and advice on how to apply them in a ClearCase environment. It presents an approach that I have found useful in diagnosing performance issues and arriving at a solution,1 and uses a case study to illustrate this approach.
When I address a performance problem, I start by gathering some general information about the ClearCase environment. I try to identify characteristics of the problem and determine how the problem manifests itself. Performance issues can be broadly classified into two categories: issues that suddenly appeared, and those that gradually worsened over time. Slowdowns that have a sudden onset are usually easier to diagnose and fix, as they are often related to a recent change in the ClearCase operating environment. Performance issues that evolve over a long period of time – such as a year or more – are more difficult to solve.
In many ways, the questions you ask to diagnose a performance problem are similar to those for tracking down a bug in an application. Is the problem repeatable or transient? Is it periodic? Does it happen at certain times of day? Is it associated with a specific command or action such as a checkout or checkin operation? For example, with ClearCase, does the problem only happen when a build is performed using clearmake or some other tool? And, as with programming bugs, the performance issues that you can reproduce easily -- such as those associated with specific commands -- are easier to deal with. Intermittent problems are, by nature, more challenging.
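One way to answer the "is it repeatable?" question is to time the suspect operation several times and compare the runs. The following is a minimal sketch, not a ClearCase tool: the helper names `time_command` and `summarize` are hypothetical, and the command shown is only a placeholder that you would replace with the operation under suspicion.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times; return each run's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time is of interest here.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return the median, the slowest run, and the slowest-to-median ratio."""
    median = statistics.median(durations)
    worst = max(durations)
    ratio = worst / median if median > 0 else float("inf")
    return {"median": median, "worst": worst, "worst_to_median": ratio}


if __name__ == "__main__":
    # Placeholder command (an assumption for illustration): substitute the
    # checkout, checkin, or build step that users report as slow.
    command = [sys.executable, "-c", "pass"]
    print(summarize(time_command(command, runs=5)))
```

If the runs cluster tightly around the median, the slowdown is probably repeatable and tied to the command itself. A slowest run far above the median hints at an intermittent problem, in which case repeating the measurement at different times of day may show whether it is periodic.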
Once you have a better understanding of how the problem manifests itself, you can start digging deeper to determine what exactly is happening in the various systems that ClearCase relies on.
The first principle of performance analysis and monitoring
Systems are a loose hierarchy of interdependent resources2: memory, disk, processor, and network.
The first principle of performance analysis is that, in most cases, poor performance results from the exhaustion of one or more of these resources. As I investigate the usage of these resources in a ClearCase environment, I look first for obvious pathological symptoms and configurations -- that is, things that just don’t belong. As an example, I recently was looking into a performance problem at a customer site. A quick check of the VOB server revealed that it was running 192 Oracle processes in addition to its ClearCase server duties. Whether that was the cause of the performance problem was not immediately obvious, but clearly, running that many memory-intensive processes on a VOB server is not a best practice.
In fact, that leads to another principle of performance analysis: Beware of jumping to conclusions. Often one problem will mask a less obvious issue that is the real cause. Similarly, be careful not to let someone with a preconceived notion about the cause lead you to a conclusion. That notion may not be the real explanation for the problem.
In performance analysis, I often think of a quote by physicist Richard Feynman, “The first principle is that you must not fool yourself, and you are the easiest person to fool.” One must remember not to fall into the trap of believing that the first thing that looks wrong is really the primary problem.
The Performance Stack
Tackling a ClearCase performance problem can be a complex task. I find it a great help to partition the problem into three levels that make up a “Performance Stack,” shown in Figure 1. At the lowest level are the operating system and hardware, such as memory, processors, and disks. Above that are ClearCase tunable parameters, such as cache size. At the highest level are applications. In ClearCase, the application space includes scripts that perform ClearCase operations, and ClearCase triggers that execute automatically before or after a ClearCase operation.
As a general rule -- and barring any pathological situation -- as you move up each level in the performance stack, you can expect the performance payback from your efforts to increase by an order of magnitude. If you spend a week tweaking and honing parameters in the operating system kernel, you might see some performance gains. But if you spend some time adjusting the ClearCase database caching parameters, you’ll see about a tenfold performance gain compared to the kernel tweaks. When you move further up and make improvements at the application layer, your performance gains will be on the order of 100 times greater than those garnered from your low-level efforts. If you can optimize scripts and triggers, or eliminate them altogether, you will see huge paybacks.
With that in mind, you may be tempted to look first at the application layer. But as a matter of principle, when I do a performance analysis, I start at the bottom of the stack. I instrument and measure first at the OS and hardware level, and I look for pathological situations. Then I move up into the tunable database parameters, and I look at the application level last.
There are a number of reasons for this order of investigation. First, it is easy to look at the OS and hardware to see if there is something out of place going on. There are very basic and well-understood tools you can use that are easy and very quick to run, and anything out of the ordinary tends to jump right out at you -- like the 192 Oracle processes for example. Similarly, ClearCase provides utilities that will show you its cache hit rates and let you tune the caches. These utilities are also very simple to use.
I look at the application layer last because it is usually far more complex to deal with. It is more complex technically because it has multiple intertwined pieces. It also tends to be more complex politically, because scripts and triggers usually have owners who created them for a reason. Those owners might not approach problem-solving the same way you do, and they may not appreciate having their work identified as a possible source of performance problems.
Another reason for starting at the lowest level is simply due diligence. You need to verify the fundamental operations of the system. I start there, but I don’t necessarily spend a lot of time there -- it’s not where you get the most bang for your buck. I don’t spend a lot of time with the ClearCase tunable parameters, either. It is usually a very quick exercise to examine the caches, adjust the parameters, and move on.
If the system is out of memory, then that is issue number one. You should add more -- it is an inexpensive, fast, and easy fix. If you were to start at the top, you might tweak triggers and scripts for a month and never discover that you are out of memory. Getting the lower two layers out of the way first gives you time to deal with the application layer. If you have enough time to optimize -- or even eliminate -- the application layer, then that’s where you will have the greatest impact on improving performance. Iterate, iterate, iterate and keep going.
Performance tuning is an iterative process.
You can keep following this cycle indefinitely, but eventually you’ll come to a point of diminishing returns. Once you find yourself tweaking the OS kernel or looking up esoteric registry settings in the Microsoft knowledge base, you are probably at a good place to stop, because you are not likely to get a big return on your investment of time.
As you iterate, keep in mind the hierarchical nature of performance tuning. Remember that memory rules all. Symptoms of a memory shortage include a disk, processor, or network that appears to be overloaded. When a system doesn’t have enough memory, it will start paging data out to disk frequently. Once it starts doing that, the processor is burdened because it controls that paging, and the disk is working overtime to store and retrieve all those pages of memory. Adding more processing power or faster disks may help a little, but it will not address the root cause of the problem. Check for and fix memory shortages first, and then look at the other things.
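Since paging is the telltale sign described above, a quick check is to sample the kernel's swap counters twice and see whether they are climbing. The sketch below is one possible way to do that on a Linux host only, using the `pswpin` and `pswpout` counters in `/proc/vmstat`. The function names are hypothetical, and other platforms would need their own tools (for example, `vmstat` on UNIX systems or Performance Monitor on Windows).

```python
import time


def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before, after, seconds):
    """Return (pages swapped in, pages swapped out) per second between snapshots."""
    swapped_in = after.get("pswpin", 0) - before.get("pswpin", 0)
    swapped_out = after.get("pswpout", 0) - before.get("pswpout", 0)
    return swapped_in / seconds, swapped_out / seconds


def sample_swap_rate(interval=5.0, path="/proc/vmstat"):
    """Read the counters twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_vmstat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_vmstat(f.read())
    return swap_rate(before, after, interval)


if __name__ == "__main__":
    rate_in, rate_out = sample_swap_rate()
    print(f"swap-in: {rate_in:.1f} pages/s, swap-out: {rate_out:.1f} pages/s")
```

Rates that stay at or near zero suggest the host is not swapping. Sustained nonzero rates while users report slowness point toward a memory shortage, which is worth fixing before looking at the processor, disk, or network. What counts as "too high" is a judgment call for your environment; the sketch deliberately does not pick a threshold.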
Where to look
ClearCase is a distributed application. Its operations involve multiple host computers as well as several common network resources. For the purposes of solving a performance issue, I like to think of the ClearCase world as a triangle whose vertices are the VOB server, the view server, and the client3 (see Figure 2). When I undertake a performance analysis, I inspect each vertex on the triangle. I check the performance stack on each of those hosts, make sure that each has enough memory, and look for abnormal situations.