In a ClearCase community, the permanent repository of software artifacts consists of one or more VOBs, which are located on one or more VOB servers.
VOB servers are especially sensitive to memory because of the performance benefits of caching the VOB database. With more memory, the VOB server can hold more of the database in memory. As a result, it has to read data from disk less often, and disk access is thousands of times slower than memory access. For the VOB host, the ClearCase Administrator’s Guide recommends a minimum of 128 MB of memory, or half the size of all the VOB databases the host will support, whichever is greater. Heed the advice of the Administrator’s Guide: “Adequate physical memory is the most important factor in VOB performance; increasing the size of a VOB host’s main memory is the easiest (and most cost-effective) way to make VOB access faster and to increase the number of concurrent users without degrading performance.”
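The Administrator’s Guide sizing rule reduces to a one-line calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes you already know the size of each VOB database on the host in megabytes.

```python
def recommended_vob_host_memory_mb(vob_db_sizes_mb):
    """Minimum memory for a VOB host, per the rule quoted above:
    128 MB or half the combined size of all VOB databases, whichever is greater.
    (Hypothetical helper; all sizes are assumed to be in MB.)"""
    half_of_databases = sum(vob_db_sizes_mb) / 2
    return max(128.0, half_of_databases)


# Three hypothetical VOB databases totalling 4 GB.
print(recommended_vob_host_memory_mb([2048, 1024, 1024]))  # prints 2048.0
```

For a host with only a few small VOBs, the 128 MB floor takes over; as the databases grow, the half-the-databases term dominates.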
Typically, there aren’t many ClearCase tunable parameters on the VOB server. There are settings that control the number of server processes, but adjusting them is rarely necessary.
View server (a machine running the view_server process)
A view server manages activity in a particular ClearCase view. The view database resides on the view server, which in practice should not be the same physical machine as a VOB server. In some cases, the view server and the client can run on the same box, depending on the configuration.
As with the VOB server, the first areas to check are the fundamentals -- memory, other processes running, and so on. But on a view server there are more ClearCase parameters that can be adjusted. Views have caches associated with them, and you can increase the size of those caches to improve performance.
I’ve been to some customer sites where the VOB servers and view servers were performing well, but the client machines were woefully low on memory. The users complained about slow builds because the compiler they were using was consuming all the available resources on the client. So if check-ins and check-outs are fast but builds are slow, the client machines are a good place to look. As usual, check the OS and hardware level first. The client may also have MVFS (multiversion file system) caches that you can enlarge to improve performance.
I’ll talk in more detail about how to check resources and tune ClearCase in Part II of this series.
Shared network resources
Figure 2 shows a cloud of shared network resources that are also very important to ClearCase performance. These resources include domain controllers, NIS servers, name servers, registry servers, and license servers. ClearCase must authenticate users before it allows operations. If the connection to the shared resources that are required for this authentication is slow, then user authentication in ClearCase will be slow. The registry server and license server are fairly lightweight and are often run on the VOB server, so connectivity to these resources is usually not an issue.
Latency is more important than bandwidth
The edges of the triangle in Figure 2 are important as well. They represent the connectivity between the VOB server, view server, and client. In a ClearCase environment, not all network performance metrics are created equal. Network latency -- the time it takes data to arrive at its destination -- has a much greater impact on ClearCase performance than network throughput, the amount of data that can be sent across the network within a given timeframe. That is because, in most cases, ClearCase is not moving enormous files around. Instead, it is making a large number of remote procedure calls, or RPCs.
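A rough model makes the latency-versus-bandwidth point concrete. This is a simplified sketch, not a description of ClearCase internals: it ignores protocol overhead, congestion, and server processing, and the 1 KB payload is a hypothetical figure, not a measured ClearCase RPC size.

```python
def rpc_round_trip_ms(payload_bytes, one_way_latency_ms, bandwidth_mbps):
    """Approximate time for one synchronous RPC.

    Two one-way trips (call and reply) plus the time to push the payload
    onto the wire. payload_bytes is the combined size of request and reply;
    bandwidth_mbps is in megabits per second."""
    transfer_ms = payload_bytes * 8 / (bandwidth_mbps * 1_000_000) * 1000
    return 2 * one_way_latency_ms + transfer_ms


# A hypothetical 1 KB RPC over a 10 ms link: a tenfold bandwidth upgrade
# saves only a fraction of a millisecond per call.
for mbps in (100, 1000):
    print(mbps, "Mbit/s:", round(rpc_round_trip_ms(1000, 10, mbps), 3), "ms")

# The same RPC over a 1 ms link at the slower bandwidth is far faster.
print("1 ms latency:", round(rpc_round_trip_ms(1000, 1, 100), 3), "ms")
```

For small messages the transfer term is tiny, so the two latency terms dominate the total no matter how fast the link is.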
As a quick review, an RPC is a particular type of message between two processes that can be running on different machines. When a client process calls a subroutine on a server, the RPC data, including the arguments to the subroutine, is sent over a lower-level protocol such as TCP or UDP. The server receives the RPC, executes the appropriate code, and responds to the client. The client then receives the response and continues processing. RPCs are synchronous; that is, the client does not continue processing until it receives the response. It is important to note that there is a call and a return -- every RPC is a two-way street. If it takes 10 ms (milliseconds) for an RPC to travel from the client to the server, and the same for the reply to travel back, then the total RPC “travel-time” is 20 ms, plus processing time.
In a typical ClearCase transaction, either the MVFS or a client sends an RPC to the view server. The view server, in turn, makes an RPC to the VOB server. The VOB server’s response must first come back to the view server, and then the view server sends a second response back to the client.
This process has two layers of RPCs, each with a call and a response. If you have network latency of 10 ms, then this particular transaction spends 40 ms in transit on the network. Although that may not seem like much time, it quickly adds up. A check-out operation may involve more than 200 RPCs, as ClearCase authenticates the user, verifies the license, locates the VOB, locates the view, and so on. So even with a relatively good latency of 10 ms, ClearCase can spend more than a second of the entire operation waiting for data to arrive through the network.
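The arithmetic in the last two paragraphs can be sketched as follows. The function names are hypothetical; the sketch assumes, as the text does, a fixed 10 ms one-way latency, ignores server processing time, and treats every RPC as waiting for the previous one to finish. A real check-out mixes single-layer and two-layer calls, so the output is a rough range rather than a prediction.

```python
def transaction_wait_ms(one_way_latency_ms, layers=1):
    """Network wait for one synchronous transaction.

    Each layer (client -> view server, view server -> VOB server) is a call
    plus a reply, so it crosses the network twice."""
    return layers * 2 * one_way_latency_ms


def operation_wait_s(rpc_count, one_way_latency_ms, layers=1):
    """Seconds spent waiting on the network when rpc_count synchronous RPCs
    run one after another (each must finish before the next starts)."""
    return rpc_count * transaction_wait_ms(one_way_latency_ms, layers) / 1000


print(transaction_wait_ms(10))            # one RPC, one layer: 20
print(transaction_wait_ms(10, layers=2))  # MVFS -> view server -> VOB server: 40
print(operation_wait_s(200, 10))          # 200 single-layer RPCs: 4.0
```

Even if every one of the 200 RPCs made only a single-layer round trip, the network wait would be well over the one second the text mentions.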
Latency increases with every “hop” -- or router -- that data must traverse en route from its source to its destination. The fewer hops, the better. So remember: in ClearCase performance tuning, it is latency, not bandwidth, that matters most. You might have a network with gigabit throughput, but if each RPC has to travel through a dozen routers, you will pay a significant performance penalty.
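To illustrate why hop count matters, here is a sketch under a deliberately simple assumption: each router adds a fixed delay. The 0.5 ms per-hop figure and the function names are hypothetical; real per-hop delay varies with router load and link distance. Note that bandwidth does not appear anywhere in the calculation.

```python
def one_way_latency_ms(hops, per_hop_ms, base_ms=0.0):
    """Hypothetical linear model: every router on the path adds per_hop_ms."""
    return base_ms + hops * per_hop_ms


def checkout_wait_s(rpc_count, hops, per_hop_ms):
    """Network wait for rpc_count back-to-back single-layer RPCs."""
    round_trip_ms = 2 * one_way_latency_ms(hops, per_hop_ms)
    return rpc_count * round_trip_ms / 1000


# Illustrative values only: 0.5 ms per hop, 200 RPCs per check-out.
for hops in (2, 12):
    print(hops, "hops:", checkout_wait_s(200, hops, 0.5), "s")
```

Under these assumptions, going from two hops to a dozen multiplies the network wait sixfold, regardless of link speed.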
A subsequent installment of this article series will provide details on how to assess network latency and other network issues.
A case study
To illustrate the principles of ClearCase performance analysis and tuning we have just discussed, let’s look at a real-life case study. I was working with a customer that had been using ClearCase for about a year. They had implemented their own process, which included additional tracking and authorization -- they were not using UCM (Unified Change Management). The VOBs were all located on a single Solaris server, which had four processors and 4 GB of memory. The view server -- which they also used to perform builds -- was on a separate, but essentially identical, machine. Even with these fairly high-powered machines, the customer was complaining of poor performance during check-in and check-out operations.
Level 1: OS / Hardware
When we went in and talked to the system administrators, they were emphatic that the VOB and view servers were running just fine and were convinced that ClearCase was the problem. So we worked through the performance stack from the bottom up. We did our initial analysis at the bottom layer, looking for pathological conditions as well as doing the standard sweep of resource metrics -- memory, processor, disk, and so on. We determined that the VOB server was fine but the view server was not.
As it turned out, this was the customer that had 192 Oracle processes running on the view server! These processes were consuming 12 GB of memory on a system with only 4 GB of physical memory. Our analysis quickly revealed that the system was out of memory and that processor utilization was very high -- the processor had zero idle time. But the core issue wasn’t processing power; it was memory.
We recommended that the customer remove the Oracle processes from the view server machine. After that, we suggested adding memory if it was still needed, and changing their user interaction model, so that they were not compiling on the view server. The customer rejected these recommendations and still insisted that ClearCase, not their systems, was the cause of the problem.
Level 2: ClearCase tunable parameters
We responded by moving up the performance stack, looking at ways to tune ClearCase to improve performance. We determined that the MVFS and view caches were undersized. Our second recommendation was to increase the size of these caches, but we warned the customer of the inherent danger in this step: allocating larger caches would make the memory shortfall worse, because we were essentially setting aside memory that the system already lacked. We went ahead knowing that we were not addressing the underlying memory issue. Performance did improve, but not substantially.
Level 3: The application space
Our next step was to examine the application layer. The customer had implemented process scripts that they wrapped around check-out and check-in operations. We instrumented those scripts to find out where the time was being spent, and then we ran them periodically throughout the day. The measurements revealed that the actual ClearCase check-out and check-in times averaged 0.5 seconds, even on a view server that was completely out of memory. The rest of the scripts’ processing time clocked in at 17.4 seconds. The logging and other functions being performed in the application layer were taking roughly thirty-five times longer than the ClearCase functions. And this was a fairly consistent ratio. At different times of the day, the ClearCase check-in and check-out times would be as much as 0.7 seconds, but the script execution times were then close to 25 seconds. And that’s why people were complaining.
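Instrumenting a wrapper script amounts to timing each stage separately and comparing the ClearCase step against everything else. The customer’s scripts are not shown here, so the sketch below is hypothetical: the stage names are made up, `time.sleep` stands in for real work, and the `cleartool checkout -nc` call appears only as a comment showing where the real ClearCase step would go.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Add the wall-clock seconds spent inside the with-block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start


def overhead_ratio(timings, clearcase_label="clearcase"):
    """How many times longer the non-ClearCase steps took than the ClearCase step."""
    clearcase_s = timings[clearcase_label]
    other_s = sum(s for label, s in timings.items() if label != clearcase_label)
    return other_s / clearcase_s


def wrapped_checkout(timings):
    # Hypothetical wrapper stages; time.sleep stands in for real work.
    with timed("logging", timings):
        time.sleep(0.05)  # e.g. writing an audit or tracking record
    with timed("clearcase", timings):
        time.sleep(0.01)  # e.g. subprocess.run(["cleartool", "checkout", "-nc", path])


timings = {}
wrapped_checkout(timings)
print({label: round(s, 3) for label, s in timings.items()})
print("overhead ratio:", round(overhead_ratio(timings), 1))
```

Fed the case-study figures, `overhead_ratio({"clearcase": 0.5, "logging": 17.4})` gives 34.8, the roughly thirty-five-to-one ratio described above.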
To summarize, we started at the bottom of the performance stack. At the hardware level, you don’t often get a lot of payback, but looking for pathological indicators is something you need to do. We quickly saw the Oracle processes and determined that the view server was very low on memory. Next, we looked at the ClearCase tunable parameters, and we produced a noticeable -- but not huge -- improvement by adjusting them. The real impact was in the application layer. Because we examined the first two layers quickly, we had enough time to fully analyze the application space, where we found a lot of room for improvement.
What. Where. How?
So far I’ve talked about what to look for when analyzing and tuning ClearCase performance, and I’ve talked about where to look. In a subsequent blog post, I’ll discuss how to improve ClearCase performance using tools and utilities you probably already have.
Senior Area Sales Director, Software Change and Configuration Management