A Source Code Management (SCM) tool is one of the most critical pieces of any software development organization’s tool suite. Selection, use, and potential migration decisions can be “make or break” for some organizations. In some cases, a tool’s users may seem almost religious in their zeal for or against a given toolset.
Git is a currently very popular SCM solution created by the Linux kernel team for their own use, and it has seen wide adoption. “Why aren’t we using Git?” is an increasingly common question in organizations using proprietary SCM solutions such as ClearCase, particularly when those organizations are looking to reduce costs.
Git and ClearCase have radically differing architectures based on their design requirements and anticipated usage. Git is a radically decentralized, radically open tool. ClearCase, when paired with ClearCase MultiSite, is a hybrid central/distributed SCM tool. Each of these architectures brings its own strengths and potential weaknesses, depending on the organization, or part of the organization, that uses the tools. It is not necessarily true that any one tool will serve all of an organization’s needs, and it is not uncommon for a single organization to have multiple SCM systems in use. The challenge is selecting the right tools for the right use cases. This document compares Git and ClearCase to assist in evaluating which tool is best for a given part of an organization.
In short: Git’s openness makes it ideal for completely open development by small teams or small sub-teams. This very much describes the Linux kernel project: each sub-team maintains its own “master” Git repository, managed by a “maintainer” who pulls changes from users, or approves user pushes to, this repository. The maintainer then notifies the main kernel maintainer (originally Linus Torvalds) that there are approved changes to bring into the main kernel release.
The maintainer is also expected to ensure that updates from the “upstream” repository are applied to the sub-team’s master repository. All development work is completely transparent to all other users, and no access control needs to be applied.
In contrast, ClearCase is significantly less “open” by design. From its inception, it was designed around the needs of enterprise-level software development organizations that had to manage large, complex codebases and restrict access to those codebases to specific developer communities. Every ClearCase repository operation has multiple access checks: every attempt to access a given version of a file is checked at both the application and the operating system levels. These versions are maintained in one or more shared repositories, with central access and policy control.
Git’s strengths:
ClearCase’s Strengths:
minutes as no data needs to be copied locally. Non-dynamic view types can be configured to load as little or as much of the project files as needed instead of Git’s “all or nothing” approach.
- Using multiple component repositories is dramatically simpler than doing so with Git.
- Parallel development of multiple releases of the same product requires creation of multiple “views” instead of multiple copies of the repository or use of “stashing”.
ClearCase’s weaknesses:
Architecture
Open vs. Closed
Git is a completely open environment, designed by and for open source developers. This developer focus has positive and negative aspects. The chief positive aspect is that, due to the fame of the initial author and original user, Linus Torvalds, the tool has wide adoption and a large ecosystem has grown around it. This means that any developer with any experience has at least encountered Git in some flavor, even if it was just downloading files from GitHub.
Unfortunately, while a newly hired developer may “know Git,” the plethora of client options may mean that the developer knows only a certain “flavor,” or even only a given version, of Git. They may also be fully versed in only one of the 16+ GUI options and not in the command line. Additionally, as Git is still rapidly evolving, key behaviors may change between Git versions.
Git’s open source nature also points to another potential issue: abandonment. The primary driver of Git’s ubiquity is the Linux kernel. While this means that Git will likely be around for the foreseeable future, other open source change management tools also had large followings and are now marginalized at best. Those include the Concurrent Versions System (CVS) and Subversion. Both were hailed as advances and replacements for proprietary SCM systems, and both had large followings in developer circles, yet neither has a significant following today. FOSS developers are at least as prone as anyone else to chase “the next big thing,” which can lead to a project going stale. Using CVS as an example, its last production release occurred in May 2008.
Repository Model
Git is a distributed version control system. Each developer using Git has a complete copy of the Git repository on their local workstation, which by default includes all project history from the “remote” it was created from. All common Git operations except “push,” “pull,” and “fetch” operate on this local copy, allowing for completely disconnected development.
This completely decentralized architecture has costs related to developer endpoints as well. Particularly:
1. As the project or product grows, disk space needs can grow more than linearly. Each Git working area contains a full copy of the repository, and potentially 2 additional copies of each working file.
3. If the developer endpoints are laptops or other portable devices, those devices, and their backups, require encryption to safeguard corporate IP.
ClearCase’s shared-repository model significantly reduces storage consumption for developer workspaces and safeguards developer changes by committing “checked in” work to the shared repository and not a local copy. Use of dynamic views also significantly reduces the time needed to create developer workspaces since there is no need to copy entire repositories locally.
Change Flow Differences
Git’s focus on radically distributed user communities gives it some unique capabilities. Key among those are:
These allow very selective propagation of developer changes, which is a great benefit when the development team is widely dispersed. However, their use requires careful process planning.
#3 above essentially means that each developer could need a minimum of 2 Git repositories in an “all pull” model. The first would be their local copy, and the second would be an on-premises or cloud-based Git instance (using GitHub, GitLab, etc.). The change flow would be:
Additionally, this selective change flow between repositories can impact build reproducibility. A recent example of this is the conversation thread at: https://forum.f-droid.org/t/where-is-the-vlc-app/108/25.
Background: F-Droid is an alternative, open-source-only Android app store. For various reasons, the F-Droid team builds the applications in its repositories from the published source code. In this thread, VLC, a common open source video player, had to be removed from the “active” application list because of build failures and application crashes. The investigation produced the following.
Problem quote: For example, VLC 2.5.2 requires libvlc to be built from commit 1c02164. But the version they distribute was built with libvlc from ef7c26f5a7 (you can get this from logcat, grep “core libvlc: revision”). BTW, this mismatch can be the reason why F-Droid’s VLC 2.5.2 crashes. Official 2.5.3 uses libvlc built from cbfa98bd98, which has not been published at all.
Analysis: We see 3 different Git commit IDs, but because the IDs are SHA hashes, the IDs alone do not tell us which commit is where in the timeline. This leads one team to believe that the mismatch is a potential cause of the application crashes. Additionally, an apparently unavailable commit is referenced in the most recent officially released software. Given Git’s ability to “rewrite history,” you cannot categorically prove, without having a direct copy of the Git repository used to perform the “official” build, that your copy of the repository has the same changes used to build the official release.
These are easily as much “process” as “tool” issues. Git provides a labeling mechanism (tags) similar in purpose to ClearCase’s labels and baselines; it just happens that in this case at least one set of developers elected not to use it, leaving other developers to wonder whether they will in fact be able to build a working copy of the open source project.
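To illustrate why commit IDs such as 1c02164 carry no timeline information, here is a minimal Python sketch of how Git derives an object ID, assuming the default SHA-1 object format. It hashes a short header plus the raw content, shown here for a "blob" (file content) object. The helper name `git_object_id` is hypothetical and not part of any Git tooling; commit objects are hashed the same way over their own text, so an ID identifies content but says nothing about ordering.

```python
import hashlib


def git_object_id(content: bytes, obj_type: str = "blob") -> str:
    """Return the SHA-1 object ID Git would assign to `content` (hypothetical helper)."""
    # Git hashes the header "<type> <size in bytes>\0" followed by the raw content.
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    older = git_object_id(b"version 1\n")
    newer = git_object_id(b"version 2\n")
    # The two IDs identify content only; comparing them does not reveal
    # which revision came first.
    print(older)
    print(newer)
```

Because the ID is derived purely from content, a human-orderable name has to come from somewhere else, such as a tag applied at release time.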
SCM toolsets and processes are tightly intertwined, and the tools should facilitate or encourage SCM best practices.
In contrast, ClearCase’s “change flow” in a distributed environment is an “all or nothing” model. All changes are by default replicated to all remote replicas of a given repository. This can, and does, change when security or other concerns dictate “one way” replication from one site to another. However, in general, if the replicas of the repositories are in sync, then all changes are at all locations. This allows managers, or external auditors, to see the complete history of all changes made to the source files in any desired range of dates.
Unlike Git, ClearCase’s ability to selectively integrate/merge changes operates on the file level within the repository. Additionally, it is possible to add specific changes from one version of a file to another using an “insert” merge, and to remove them with a “delete” merge. As the source and destination versions are assumed to already be present in the repository, the developer does not need to run tools to process specially formatted text files to integrate a set of changes, and the files in question do not need to be text files.
Workspaces and Multiple Repositories
Workspace Management
Another area where ClearCase’s and Git’s differing philosophies stand out is developer workspace management and complex projects using multiple repositories. Git simply does not have a concept of a developer “workspace” separate from the repository. This can lead to issues in an environment where current and previous versions of a project are maintained and delivered separately to customers. By far the simplest way to perform this task would be to create multiple copies of the “upstream” repository and work separately in those repositories, and for a relatively small project it will certainly work.
However, when the project is large or complex, with, for example, a 20 GB repository (not including the “working copies” of files), this can become very unwieldy very fast. The way to avoid this is to use “git stash.” The “stash” command is an area where the lack of user-friendly references can cause issues if multiple stashes are made in your local copy of the repository. The git stash command uses references in the style “stash@{0}” to identify stashes. It becomes far too likely that, at one point or another, a developer will apply stashed changes to the wrong product version.
ClearCase, on the other hand, currently has 4 separate developer workspace types, referred to as views, and multiple ways to configure those workspaces. The current types are:
LAN-Based views
Web-based views
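Returning briefly to the “stash@{0}” style references discussed above: their positional nature can be sketched in a few lines of Python. This is a toy model and not Git itself. The `StashStack` class and the branch names in the descriptions are hypothetical, and the model only mimics the behavior that the newest stash is always `stash@{0}` while older entries shift up by one.

```python
class StashStack:
    """Toy model of Git's stash list: the newest entry is always stash@{0}."""

    def __init__(self):
        self._entries = []

    def push(self, description):
        # A new stash takes index 0; every existing stash shifts up by one.
        self._entries.insert(0, description)

    def ref(self, index):
        # Resolve a positional reference such as stash@{1}.
        return self._entries[index]

    def listing(self):
        return [f"stash@{{{i}}}: {d}" for i, d in enumerate(self._entries)]


if __name__ == "__main__":
    stashes = StashStack()
    stashes.push("WIP on release-1.0: fix for customer A")
    # At this point stash@{0} refers to the release-1.0 work.
    stashes.push("WIP on release-2.0: new feature")
    # The same reference now resolves to the release-2.0 work instead.
    for line in stashes.listing():
        print(line)
```

The same reference resolving to different work over time is what makes it easy to apply a stash to the wrong product version.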
ClearCase’s wealth of workspace types would be of little use without some mechanism to manage the configuration of those workspaces. ClearCase provides 2 “out of the box” mechanisms to manage workspace configurations, and tools to automate custom workspace configurations. Out of the box:
Multiple Repository Support
Both Git and ClearCase support the ability to reference multiple repositories in a single project. However, the 2 products do so in radically different ways. Git’s mechanism for referencing multiple repositories is Git submodules, and this mechanism is not particularly mature or transparent. Creating a full “image” of a project using Git submodules is a multistep process:
As mentioned above, an all-pull usage model may not be possible using submodules because the location of the submodule is fixed at the time of creation. This means that it MUST point to a central repository. You can use the submodule as a read-only view of the second repository, but that adds unnecessary complexity to the project. In fact, one of the recommendations for using submodules is to limit their use if the developers are not going to customize or modify code in the submodules, instead using a dependency manager like Maven to download the needed libraries, etc., for read-only use. ClearCase, on the other hand, has mature support for multiple repositories:
Workspace Isolation
Developers working on complex projects often need to work with multiple releases of their projects, developing future releases while supporting the current and previous releases. It often becomes necessary to have simultaneous workspaces, and for those workspaces to be isolated from one another. Sometimes this is due to legacy architectural decisions or environmental concerns. Examples include:
ClearCase dynamic and automatic views provide this capability “out of the box.” On Unix, ClearCase repositories can be “overlaid” on arbitrary directory locations (except for system directories), allowing for a very flexible development architecture.
Security and Policy Enforcement
Git’s open and highly decentralized design creates some unique administration concerns. Security is only partially “baked in” to the product. Some of those concerns are:
Traceability/accountability of code changes.
Git:
Build Auditing
Git does not provide any mechanisms to ensure that a given build is based on specific versions of files or specific commits. While it is possible to embed the most recent Git commit ID into build output, as evidenced by the Git information pulled out of VLC above, there is no way to prove or disprove that the build used that commit short of building a second copy yourself and doing a binary compare. And this may still fail if the build used different versions of other build tools (IBM vs. Sun Java compilers, differing Visual Studio versions or service pack levels, etc.).
As seen above, building in a Git-based environment presents challenges unless steps are taken to ensure that the repository state at the time of the build is adequately recorded, and that all potential build hosts have access to that repository state. ClearCase currently provides 4 separate build auditing tools that, when combined with dynamic views, allow for detailed auditing of all input files for a given build.
This dramatically simplifies third-party audits because you can create a view that shows explicitly the files used for any given build, as opposed to making guesses that depend on timestamps, labels, or commit IDs. If the build tools are also under source control, ClearCase can also provide explicit information about the exact versions of the build tools used. This can be a critical use case for environments that require third-party code audits or have other heavy regulatory requirements.
Cloud Support
Git’s highly distributed and open source model has allowed various third parties to create cloud-hosted Git offerings, with GitHub and GitLab being two key examples. It is also possible to host Git in a “hybrid cloud” arrangement where there may be Git instances using “on-premises” GitHub servers, private AWS servers, and semi-public locations like GitHub at the same time. Unfortunately, this may also pose challenges in locating the repositories if they are not created from a central location.
ClearCase, at the time of this writing, supports running ClearCase server processes on Amazon Web Services (AWS), provided any LAN clients are in the same AWS instance. Hosts not within that instance should use ClearTeam Explorer with Web or Automatic views. Like Git, you can have multiple instances of a given repository in multiple locations, but unlike Git, if the repositories are synchronized, all repositories will be known to all other repositories.
Conclusion
Selection of, or migration between, Software Configuration Management systems is a process that requires a thorough review of organizational and individual project needs, as well as the strengths and weaknesses of the alternative tools. ClearCase and Git are capable tools with differing strengths, weaknesses, and design philosophies. Git excels in environments where:
Author: Brian Cowan Senior Technical Specialist at HCL
10 Comments
Hello Brian,
brian.cowan
3/21/2019 01:43:14 pm
As I work in support, please excuse the support-like answer. The answer to "which is better" is "that depends..."
Neil Lindberg
5/21/2020 08:51:59 am
Why couldn't you just use "git log" to view history? And, branch names are whatever you make them.
Jon Velapoldi
3/5/2021 10:28:44 am
I feel like the Version ID is a little bit of a red herring. True, git exposes its internals a bit with the commit hashes, but tags are cheap and free to use, and those provide much more readable names. As for whether to roll up changes into a changeset rather than individual commits, I'm mostly in the changeset camp. While it's potentially interesting to see "version /main/branch/4" of a specific file, I feel like the context of what else was changed with that changeset is much more useful. Collecting those together in a discrete changeset is more "interesting" to me as a dev, and as a product owner. As for whether you should be breaking up changesets, I feel like that might be more of a failure of the software dev process: each changeset SHOULD be a discrete (small) collection of changes that can't reliably be picked apart without breaking the software.
Lee K
4/23/2019 09:10:27 pm
Nice comparison.
Marc Towersap
1/29/2020 11:40:18 am
Generally, at most of the companies I worked at that used clearcase, it worked pretty well. Had clearcase vob servers with uptimes over a year. If you had a flaky connection, you could always use snapshot views.
Jon Velapoldi
10/27/2020 02:57:49 pm
Pretty good writeup. And I remember Brian and Marc back when I followed CCIUG - two excellent contributors to the User Group. I used to be the ClearCase admin for a 200+ member team. And for the most part (after a LOT of work), it more or less ran itself (after initial setup). However, after having used git for a while, I've not found many of git's "problems" (other than the admittedly correct decentralized process issue) listed to really be any problems at all. I find that most of the points that you state are "in some configuration, yes, but if you set up some process then ..." (which incidentally, goes both directions), or "if you try to shoehorn that understanding and process into git, it will be bad, but ...".
Steve
3/5/2021 04:17:02 am
You get what you pay for. GIT does not even come close to what clearcase provides.
Jon Velapoldi
3/5/2021 10:35:50 am
I feel that this completely misses the point, however. Yes, git has fewer features included in it, but git has some features that ClearCase can't ever have (like performance, and the ability to see the entire source code base locally in fractions of a second when you're working from home). The real question you should be continuing to ask yourself is "do you really need that feature"? Having administered both products, I'm certain that git makes much more sense for the vast majority of software projects, mostly because it gets out of your way better than ClearCase does. As a CM administrator, I eventually learned that I should be constantly re-evaluating the efficacy and value of any software development process we put in place in Software Configuration Management.
Steve
3/5/2021 04:20:19 am
You get what you pay for. GIT is not even in the same ball park when it comes to features, source control management and the ability to customize your scm processes.