Distributed Version Control Guide
This guide explains Plastic SCM's capabilities when it comes to working with distributed systems. It contains a general description of the different distributed scenarios supported, followed by detailed explanations. Replication scenarios are also covered in depth.
Plastic SCM's distributed capabilities allow you to set up different servers for multi-site development support, which are able to both replicate and reconcile changes made on replicated branches.
Plastic SCM has the ability to create different multi-site scenarios ranging from single-server to fully distributed deployments.
As the figure above shows, Plastic SCM can be configured to work in a single server mode, which is the default mode on installation and the conventional mode available on all SCM products.
The next step has been called classic multisite in which several servers exist - one for each development location - and contents are replicated among them. The basic rule at this distribution stage is that branch mastership is kept by only one site at a time. If a branch is modified at one site, the other sites won't modify it until the branch is replicated again. In many systems, this behavior is encouraged by the software itself, preventing simultaneous changes in a master/slave relationship. Plastic SCM does not restrict you to working in this mode, though you can use both permissions and a clear replication policy to simulate it, should you find this configuration useful.
Full multisite support is almost identical to the previous distribution stage with only one difference: All the SCM servers can modify their branches at any time. Changes can be reconciled back later on if the same branch is modified more than once at different locations.
Full distribution is exactly the same as full multisite, but in this deployment scenario each developer has his own SCM server. There's only one restriction imposed on systems working in this mode: Servers must be light enough to run on non-dedicated workstations and even laptops. Plastic SCM servers can easily be configured to work in this mode, introducing full disconnected support. A developer can take his laptop home, continue working as if he were at the office, and reconcile his work when he's back.
The main operation in distributed systems is replication. By means of this operation, repositories can be distributed on several machines.
The replication unit in Plastic SCM is the branch. The users specify which branch they want to work with and replicate it from a source repository to a destination repository. All revisions, labels, links, attributes, and changesets will be replicated to the destination.
In Plastic SCM, replicating a branch replicates only the branch you selected, so every replica is partial unless you select all the branches in the original repo.
The example below shows two repository servers at two different locations: we'll be replicating from the London server to the Stockholm one. The original server holds 3 branches, but in Stockholm today I'm only interested in branch2. So I can simply replicate branch2 from the London server to the newly created repository at the Stockholm server. This way I'll get a working repo with only 2 changesets but with all the files required to download the source tree.
The branch is perfectly functional, so a developer can do more checkins (changeset 9) or even branch from it (branch3, which can be replicated back, or pushed, into the London server later on).
As you can see from the figure, distributed repositories don't have to be exact clones. They share replicated branches and their contents but the entire repositories don't have to be identical. Instead, they can evolve separately, sharing only some branches.
Replicating a single changeset
Suppose you need to replicate just from changeset 6 but you don't need to pull all the previous changesets on the main branch. If that is the case there is a small trick you can use today: create a branch from changeset 6 and replicate the new empty branch. Look at the figure below:
The branch4 will be perfectly functional on the pablolaptop machine (it will replicate the entire tree loaded by changeset 6 @ "London"), and you can check in new changes, branch from it, and so on.
Note: You have to use the "empty branch" trick because we haven't implemented the "replicate from cset" feature so far. But we know it's something to do!
Beware of the merge history
If you replicate only part of the history from a repository, you have to be careful when running merges: you do not have the entire merge history, so merge results can differ from those computed on the full repository. If you need to be sure you can merge safely, replicate all the branches.
Look at the scenario below: if you replicate main and branch2 only, then the merge between cset 10 and 11 won't detect 7 as ancestor but 3, turning the merge into something more complex or even wrong.
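The effect of missing merge history can be reproduced with a small sketch. The changeset graph below is hypothetical (it is not the exact graph from the figure): changeset 7 lives on a branch that is not replicated and has been merged into both 10 and 11. This is not how Plastic SCM computes ancestors internally; it only illustrates why dropping merge links changes the ancestor that a merge finds.

```python
def ancestors(parents, cset):
    """Return every changeset reachable from cset through parent or merge links."""
    seen, stack = set(), [cset]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def merge_ancestors(parents, a, b):
    """Nearest common ancestors of a and b (a simplified merge-base search)."""
    common = ancestors(parents, a) & ancestors(parents, b)
    # Discard any common ancestor that is itself an ancestor of another one.
    return {c for c in common
            if not any(c in ancestors(parents, d) for d in common)}


def partial_replica(parents, present):
    """Keep only the replicated changesets and the links between them."""
    return {c: [p for p in ps if p in present]
            for c, ps in parents.items() if c in present}


# Hypothetical history: changeset 7 sits on a branch that is NOT replicated.
full = {3: [], 7: [3], 9: [3], 10: [3, 7], 11: [9, 7]}
partial = partial_replica(full, present={3, 9, 10, 11})

print(merge_ancestors(full, 10, 11))     # {7}
print(merge_ancestors(partial, 10, 11))  # {3}
```

With the full history the merge starts from changeset 7; in the partial replica the only common ancestor left is 3, so changes already merged through 7 would be considered again.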
Of course, it is perfectly safe to run merges when you know it is a simple branch hierarchy, like the following:
And the same holds true for any branch where the entire merge hierarchy is present on the repo.
We're considering a change here because while it is perfectly fine for very advanced users, sometimes it is confusing for the rest of us ;)
We're considering two options:
- Whether we ask the original repo for the missing merge tree information (when available), which sounds pretty cool in certain cases (especially for collocated teams with developers working in DVCS mode but with full access to the central repo),
- Or we always replicate the entire changeset hierarchy (aka merge tree info) but not the associated data (so it would still achieve the goal of a partial replica while also detecting when intermediate csets are missing for a merge).
There are several possible distributed scenarios with Plastic SCM. They will be explained in detail in this chapter.
Multi-site replication with mastership policy
In this scenario, two or more servers are used in replication. Servers will normally run at different locations to enable geographically distributed teams to work together on the same project. A server at each location will solve the problem of slow or unreliable internet connections between sites.
The figure above shows both a deployment diagram and a detailed view of the branching strategy. This setup resembles classic multi-site replication as implemented by many master/slave based products. In Plastic SCM, this scenario is just one possibility and it will be used to explain replication.
The two sites, Location 1 and Location 2, will have their own servers. Both sites will be working on the same code-base, so developers will need to be able to check in changes at any time. The chosen strategy would be this:
- Both servers will have an exact replicated copy of the main branch, containing the latest baseline.
- The new baselines will be generated only in one server at a time, so they will be implementing a sort of "mastership" behavior. Let's assume new baselines will be created on server 01.
- Developers at both locations will follow the branch-per-task pattern. Branches will be created using the latest baseline as a starting point. The two teams agree that the main branch won't be modified in parallel at the two sites.
- Periodically (maybe twice a week, depending on the amount of work completed) all the task-branches created at Location 2 will be replicated into Location 1. All branches will be integrated on the main branch, tested, and a new baseline will be created. Alternative integration branches could exist at each site in order to ease the integration process.
- Once the release is finished, the main branch will be replicated from Location 1 to Location 2 and a new development iteration will begin.
The figure above shows how branches are replicated from Location 2 to Location 1 in its lower area.
The next figures describe the previous scenario step by step. They show how the main branch is first replicated from Location 1 into Location 2 and how the newly created release 58 is then available to the two development groups.
Then the groups start working on and creating task branches independently from each other, but starting at a well-known point: release 58.
Once the iteration is finished, branches task1012, task1013 and task1030 created at Location 2 are replicated to Location 1 to be integrated.
Once the integration is finished, the branch /main will be replicated again to Location 2, so that the development group there can continue working with the latest approved baseline.
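The iteration just described can be written down as a replication plan. The sketch below is only a planning aid under stated assumptions: the function name is hypothetical, sites are identified by their labels from the figures, and the result is a list of (branch, source, destination) steps rather than actual Plastic SCM commands.

```python
def iteration_plan(task_branches, integration_site, remote_site, main="/main"):
    """Return (branch, source, destination) replication steps for one iteration."""
    # Step 1: bring every task branch created remotely to the integration site.
    steps = [(branch, remote_site, integration_site) for branch in task_branches]
    # Step 2: after integration and baselining, send main back to the remote site.
    steps.append((main, integration_site, remote_site))
    return steps


plan = iteration_plan(["task1012", "task1013", "task1030"],
                      integration_site="Location 1", remote_site="Location 2")
for branch, source, destination in plan:
    print(f"replicate {branch}: {source} -> {destination}")
```

Keeping main as the last step reflects the mastership rule: the main branch travels in one direction only, after the new baseline exists.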
Note that the two repositories are not identical after the development iteration finishes, but the content of the main branches, which are modified at only one site, is exactly the same.
Multi-site replication with shared mastership
The deployment required for this scenario is exactly the same as in the mastership case. The difference will be the way in which the replicated branch evolves. Now developers will make simultaneous changes to the replicated branch and Plastic SCM will have to help reconcile these changes together.
The figure below depicts the situation: Developers at two different sites are working against the same branch which has been replicated from Location 1 to Location 2. Then both groups perform changes directly on their replica of the main branch.
In the previous diagram, all changesets are on the main branch. We show it this way for visual simplification. In the real world, however, development teams should use the branch-per-task pattern and integrate their changes to the main branch on all sites.
When Location 1 requests the changes made at Location 2 using replication (or vice versa), the newly created changesets on branch main at Location 2 will be linked to the right changesets at Location 1, keeping the correct changeset linking. If a subbranch is created (that is, a given branch ends up with more than one head or last changeset, which can happen after pulling changes from a remote repository), it will have to be merged in order to reconcile the changes created remotely.
Pure distributed scenario
In a purely distributed scenario, there isn't a central server. Each developer instead runs his own server containing his own repositories.
This strategy can be fully implemented with Plastic SCM by configuring a server on each developer's workstation.
This fully distributed scenario can be adopted by any company, even if they would normally prefer to count on a central copy. With distributed development there will always be a master server, not necessarily due to software restrictions, but to some sort of meritocracy, as happens with open source projects. It is usually best to explicitly decide which computer will be the one containing the well-known stable releases. Obviously there will be more than one satisfying this requisite, but it is better for simplicity's sake to exactly determine which will be the master one at any time.
In corporate scenarios this purely distributed ability can be tuned to support a mixed scenario:
- Onsite developers continue using a regular client/server configuration when working against a central server.
- The central server plays the role of master copy.
- Developers working at a different location (at home, for instance) have their own repository server which they can keep in sync regularly.
- Developers working on laptops can also run their own servers and then implement fully disconnected support.
Alternatively, all developers' workstations could run Plastic SCM servers. This is fully supported by the system. Whether to use this capability will depend on the organization itself, the developers' skills, and the amount of administrative burden required.
The figure below depicts the concepts described above.
How replication works
So far, the behavior of general distributed systems has been introduced. This topic will explain in detail how Plastic SCM replicates changesets between branches on different repositories and how to reconcile conflicting changesets created in parallel in the same replicated branch on two different repositories.
Replication in detail
The diagrams and samples introduced in the previous chapters focused on overall branch behavior. The figure below details a replication sample that studies what happens at the changeset level.
The sample focuses on a file named /src/main.cpp at the branch main/fix. The branch is replicated from repository A at Location A to repository B at Location B. Note that the figure specifies the Plastic SCM command needed in order to run a replication.
- At step 1, there is only one changeset on the two replicated repositories, containing the first change on /src/main.cpp.
- Step 2 shows how the file is modified at rep A: two new revisions are created.
- At step 3, the developer at location B runs the same replication command once more. The two new revisions created at rep A are now copied into rep B.
During replication, Plastic SCM first determines which changesets on the branch specified by the user need to be transferred (starting at the last previously replicated changeset, if any). Then it pulls those changesets from the source repository. To do so, Plastic SCM finds the parent changeset of the new changesets being pulled and links them accordingly.
- At step 4, the developer at rep B makes a new change starting from the latest replicated changeset and modifying again main.cpp.
- At step 5, the developer at rep A replicates /main/fix from rep B. The newly created changeset 3 gets replicated and correctly placed in his repository.
Note that the example from the previous figure shows only one change at a time on the branch, so no conflicts can happen. While following this strategy, the two replicated branches will continue being exact clones on replication.
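The incremental behavior in steps 1 to 5 can be sketched as follows. This is an illustration only, not Plastic SCM's implementation: a repository is modeled as a dictionary mapping a changeset identifier to its parent identifier, and the identifiers `c0` to `c3` are made up.

```python
def replicate(source, destination):
    """Copy changesets missing from destination, keeping their parent links.

    Both repositories map changeset id -> parent id (None for the first one).
    Returns the ids copied, parents before children.
    """
    copied = []
    pending = [cid for cid in source if cid not in destination]
    while pending:
        progressed = False
        for cid in list(pending):
            parent = source[cid]
            # A changeset can be linked only once its parent is present.
            if parent is None or parent in destination:
                destination[cid] = parent
                copied.append(cid)
                pending.remove(cid)
                progressed = True
        if not progressed:
            raise ValueError(f"parent changesets missing for {pending}")
    return copied


rep_a = {"c0": None}                     # step 1: one changeset in both repositories
rep_b = {"c0": None}
rep_a.update({"c1": "c0", "c2": "c1"})   # step 2: two new changesets at rep A
print(replicate(rep_a, rep_b))           # step 3: ['c1', 'c2']
rep_b["c3"] = "c2"                       # step 4: new change at rep B
print(replicate(rep_b, rep_a))           # step 5: ['c3']
print(rep_a == rep_b)                    # True
```

Because only one side changes between replications, every incoming changeset attaches to the current last changeset and the two branches stay identical.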
The next figure shows a more complex scenario. Both locations start with the same configuration: three changesets at branch /main.
- At step 2, the two repositories evolve in parallel when the developers introduce new changes on main.cpp.
- At step 3, the user at rep A tries to replicate changes from rep B. Now Plastic SCM can't directly "link" revisions 3 and 4, created at rep B, to revision 2, because a new revision 3 has also been created on the branch at rep A.
Internally Plastic SCM identifies each object by a GUID (Globally Unique Identifier), so don't get confused by the "changeset numbers" shown in the sample.
If changeset 4 at rep A didn't exist, then Plastic SCM would have placed revisions 5 and 6 from rep B just to the right of the existing changeset 3. In this situation, though, it can't do that. So what Plastic SCM actually does is create a subbranch to hold the replicated changesets.
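Whether a pull leaves a branch with more than one head can be checked with a few lines. The sketch assumes a simplified model in which a branch is a dictionary mapping each changeset identifier to its parent identifier; the short identifiers stand in for GUIDs and are invented, and the code is not the server's actual algorithm.

```python
def heads(branch):
    """Changesets that are nobody's parent, i.e. the last changeset of each line."""
    parents = set(branch.values())
    return sorted(cid for cid in branch if cid not in parents)


def pull(local, remote):
    """Merge remote changesets into local by identifier; return the resulting heads."""
    merged = {**local, **remote}
    return merged, heads(merged)


# Both repositories share g1 -> g2 -> g3, then evolve in parallel.
shared = {"g1": None, "g2": "g1", "g3": "g2"}
rep_a = {**shared, "a4": "g3"}
rep_b = {**shared, "b4": "g3", "b5": "b4"}

merged, branch_heads = pull(rep_a, rep_b)
print(branch_heads)            # ['a4', 'b5']
print(len(branch_heads) > 1)   # True: a subbranch exists and a merge is needed
```

Matching by identifier rather than by changeset number is what lets the shared changesets line up even though each repository numbers its changesets independently.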
There are two replication modes available:
- Direct server-to-server replication - A Plastic SCM client will tell the destination server to replicate a branch from a source server. Servers will communicate through internet or intranet connections to replicate data.
- Package based replication - A Plastic SCM client connects with the source server and creates a replication package. The package will be delivered in person (via USB drive, for example) and imported later on by the destination server.
The next figure depicts the two available replication modes.
Package based replication makes it possible to keep in sync servers that are not allowed to connect directly due to security restrictions.
Replication from the command line
All the replication scenarios and possibilities described can be set up with a single Plastic SCM command: replicate.
cm replicate srcbranch destinationrepos
Where srcbranch is a branch spec identifying the branch to be replicated and its repository, and destinationrepos is the repository where the branch is going to be replicated.
Direct server replication
Suppose you want to replicate the branch main at repository code at server london:8084 to repository code_clone at bangalore:7070. The command would be:
cm replicate main@code@london:8084 rep:code_clone@bangalore:7070
To replicate branches using packages, the first step will be creating a replication package, and then importing the package into another server.
Suppose you have to create a replication package for the main branch at repository code at server box:8084.
cm replicate br:/main@code@box:8084 --package=box.pk
The previous command will generate a package named box.pk with all the content of the main branch.
Later on, the package will be imported at the repository server berlin:7070.
cm replicate rep:code@berlin:7070 --import=box.pk
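When these commands are scripted, it helps to build the argument lists in one place. The helper functions below are a hypothetical wrapper, not part of Plastic SCM; they only assemble the three command forms shown in this section and do not run them.

```python
def direct_replication(src_branch, dst_repo):
    """Build: cm replicate <srcbranch> <destinationrepos>"""
    return ["cm", "replicate", src_branch, dst_repo]


def create_package(src_branch, package_file):
    """Build: cm replicate <srcbranch> --package=<file>"""
    return ["cm", "replicate", src_branch, f"--package={package_file}"]


def import_package(dst_repo, package_file):
    """Build: cm replicate <destinationrepos> --import=<file>"""
    return ["cm", "replicate", dst_repo, f"--import={package_file}"]


# Values taken from the examples in this section.
export_cmd = create_package("br:/main@code@box:8084", "box.pk")
import_cmd = import_package("rep:code@berlin:7070", "box.pk")
print(" ".join(export_cmd))
print(" ".join(import_cmd))
```

On a machine where the cm client is installed, each list could be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems.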
During replication, different servers have to communicate with each other. This means that servers running different authentication modes will have to exchange data.
To do so, the replication system is able to set up different authentication options.
Setting up authentication modes
The next figure shows a typical scenario with a client and two servers. All the involved Plastic SCM components are configured to work in LDAP and they share the same LDAP credential, so no translation is required.
Note that authentication happens at two levels:
- The client needs to be authenticated in order to connect to the destination server. In the figure, the destination server is berlin.
- Then berlin will need to connect to the server london to retrieve information about the branch to be replicated (main in the sample).
If both servers were not using the same authentication mechanism or not authenticating against the same LDAP authority, step 2 would fail.
The figure below shows a scenario in which the server london is configured to use user/password authentication. In this case, a command like the one specified at the top of the figure will fail because authentication between servers won't work at step 2.
To solve this problem, the replication system has the ability to specify authentication credentials to be used between servers. In the example, the client can specify to the server berlin a user and password to communicate with server london.
The next figure shows two different ways to specify authentication credentials when using user/password at the source server.
The first option specifies the mode plus the user and password (for UP) directly on the command line.
The second one uses an authentication file, which is useful when authentication credentials are going to be used repeatedly. As the figure shows, an authentication file is a simple text file containing two lines:
- Authentication working mode - UPWorkingMode, LDAPWorkingMode, NameWorkingMode, ADWorkingMode, or NameIDWorkingMode
- Specific authentication data for the authentication mode - The data specified in the second line is exactly the same data in the SecurityConfig section of the configuration file
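A script that prepares authentication files can validate them before use. The sketch below assumes only what is stated above: two lines, the first being one of the listed working modes. The second line is treated as opaque text, and the sample value `user:password-data` is a placeholder, not a real SecurityConfig value.

```python
VALID_MODES = {
    "UPWorkingMode", "LDAPWorkingMode", "NameWorkingMode",
    "ADWorkingMode", "NameIDWorkingMode",
}


def parse_auth_file(text):
    """Split an authentication file into (working mode, authentication data)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 2:
        raise ValueError(f"expected 2 non-empty lines, found {len(lines)}")
    mode, data = lines
    if mode not in VALID_MODES:
        raise ValueError(f"unknown authentication mode: {mode}")
    return mode, data


# Placeholder content; the real second line comes from the server's SecurityConfig.
sample = "UPWorkingMode\nuser:password-data\n"
print(parse_auth_file(sample))
```

Rejecting an unknown mode early gives a clearer error than a failed server-to-server connection in the middle of a replication.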
Suppose now that replication must happen in the opposite direction, from berlin to london as the next figure shows. The parameters to connect to an LDAP server (in this case an Active Directory accessed through LDAP) are specified. Normally in LDAP an authentication file will be used to ease the process.
Note: If replication is performed through replication packages, the client needs to be able to connect to the source or destination servers, depending on whether it is performing an export or import operation.
Translating users and groups
When replication is performed between servers with different security modes, authentication is not the only issue. User and group identifications have to be translated between the different security modes.
The sample at the next figure tries to replicate from a user/password authentication mode to an LDAP based one. The user list at the UP node stores plain names but the user list at the LDAP server stores SIDs. When the owner of a certain revision being replicated needs to be copied from repA to repB, a user or group will be taken from the user list at repA and introduced into the list at repB. If a name coming from repA is directly inserted into the list at repB, there will be a problem later on when the server at berlin tries to resolve the LDAP identifier because it will find an invalid one: The user identifiers in user/password mode won't match those of the LDAP directory and the user names will be wrong in the replicated repository.
So in order to solve the problem, translation will be needed.
The Plastic replication system supports three different translation modes:
- Copy mode - This is the default behavior. The security IDs are just copied between repositories on replication. This only works when the servers hosting the different repositories work in the same authentication mode.
- Name mode - Translation between security identifiers is done based on name. In the sample at the previous figure suppose user daniel has to be translated by name from repA to repB. At repB, the Plastic SCM server will try to locate a user with name daniel and will introduce its LDAP SID into the table if required.
- Translation table - This also performs a translation based on name, but is instead driven by a table. The table, specified by the user, tells the destination server how to match names. It tells how a source user or group name has to be converted into a destination name. The next figure explains how a translation table is built and how it can translate between different authentication modes.
Note: A translation table is just a plain text file with two names per line separated by a semi-colon (";"). The first name indicates the user or group to be translated (source) and the second one indicates the destination.
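The three translation modes can be compared with a short sketch. It is illustrative only: the function names are hypothetical, the destination directory is modeled as a plain dictionary from user name to identifier, the sample names and SIDs are invented, and falling back to the original name when the table has no entry is an assumption.

```python
def parse_translation_table(text):
    """Parse 'source;destination' lines into a dictionary."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        source, destination = (part.strip() for part in line.split(";", 1))
        table[source] = destination
    return table


def translate(owner, mode, directory=None, table=None):
    """Return the identifier to store at the destination repository."""
    if mode == "copy":
        return owner                      # identifier copied unchanged
    if mode == "table":
        # Assumption: names missing from the table keep their original value.
        owner = table.get(owner, owner)
    # "name" and "table" both finish with a lookup by name at the destination.
    return directory[owner]


table = parse_translation_table("daniel;dmartin\nbuild;build-agents\n")
directory = {"dmartin": "S-1-5-21-1001", "daniel": "S-1-5-21-1002"}

print(translate("daniel", "copy"))
print(translate("daniel", "name", directory=directory))
print(translate("daniel", "table", directory=directory, table=table))
```

The copy mode returns the name untouched, which is why it only works between servers in the same authentication mode; the other two modes always end with an identifier that the destination server can resolve.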
Replication from the graphical interface
Replication can be done from both the command line interface (CLI) and the Plastic Graphical User Interface (GUI) tool. All the possible actions are located in a submenu under the branch options, because replication is primarily related to branches. This topic will describe how to perform the most common replication actions from the GUI.
In the GUI, replication and distributed collaboration have been organized into the following actions:
- Branch actions:
  - Push the selected branch
  - Pull the selected branch
  - Pull a remote branch
- Package actions:
  - Create a replication package from the current branch
  - Create a replication package from a branch
  - Import a replication package
The next figure depicts the different available operations. From the command line, all the operations are issued from a single command, but the GUI makes a distinction between push (move changes from your server to a destination) and pull (bring changes from a remote repository to yours) actions.
As was mentioned before, all replication actions can be accessed from the branch menu (check the figure below).
The options Push this branch, Pull this branch, and Create replication package from this branch are related to the branch currently selected in the branch view.
The other options (Pull remote branch, Create replication package, and Import replication package) are generic replication actions that are not constrained to the current branch; they are located under the branch menu to keep all the replication options together.
Pushing your changes to a remote repository
Whenever you want to push your changes to a remote repository, select Push this branch on the branch menu. Pushing your changes means sending the changes made on the selected branch to a remote repository.
- Server - If you have never replicated from the destination repository before, you'll have to type the destination server name or IP address (plus the port number).
- Browse server - Lists previous replications on your repository and the servers available in your configured profiles (check Advanced options).
When you pull a branch, a record is created on your repository to track where (server and repository) the branch comes from. This server will be offered as a possible server on later push and pull operations.
- Repository in the Replication destination - If the branch already exists in the destination repository, the changes will be synchronized. A warning message will show up if there are conflicting changesets in the destination. Then, the developer will have to reconcile changes by first pulling the branch to the local server and then pushing it, once the merge conflicts have been resolved.
If the branch doesn't exist in the destination repository, a new branch will be created (identified by the same GUID used on the source repository).
- Browse repository - Browse repositories on the destination server. If you don't have permissions to access the server you'll have to select or create a profile on Advanced options.
Synchronizing your branch with remote changes
Once you've pushed your branch to a different repository, the branch can be modified remotely. At some point in time, you'll be interested in retrieving the changes made remotely to your branch. In order to do that, you have to use the Pull this branch action from the replication branch menu.
The dialog box depicted in the next figure is very similar to the one used to push changes, but this time, your server is located on the right as destination of the operation.
When you pull changes from a remote branch, a subbranch can be potentially created if there are conflicting changesets on the two locations.
Importing a remote branch
Another common scenario during replication is importing a branch from a remote repository into yours in order to start making changes or create child branches from it.
In order to perform the import, use the Pull remote branch option. The dialog box shown in the next figure will be displayed. Notice that this time you can choose the source server, repository, branch, and destination repository on your server.
Managing remote authentication
As described in the Authentication chapter, different Plastic SCM servers can use different authentication modes. By default, when you try to connect to a remote server, you'll be using your current profile (the configuration used to connect to your server). Sometimes, though, the default profile won't be valid on the remote server.
In order to configure Plastic SCM to connect to a remote server with a different authentication mode, use the Advanced options button on the replication dialog. It will pull up a dialog like the one in the next figure.
The dialog box shows the profile currently selected (the default one on the screenshot) and also the translation mode (refer to Authentication chapter for more information) and the optional translation table.
You can have different authentication profiles created from previous replication operations, and you can list them or create new ones by pressing the Browse button located on the right of the Remote server configuration profile edit box.
It will display a dialog box like the one in the figure below which will allow you to select, edit, create, or remove a profile.
Note: The Replication dialog box will try to choose a profile automatically each time you change the server. It will look for the most suitable profile based on the server information provided.
Running the replication process
So far, all the steps have been focused on setting up the replication process. Once the operation is correctly configured, press the Replicate button to open the replication progress dialog box shown in the figure below.
The replication operation is divided into three main stages:
- fetch metadata - Happens on the source server.
- push metadata - Happens on the destination server.
- transfer revision data - Involves the two servers as data is transferred from the source to the destination.
At any point in time, the operation can be canceled by pressing the Cancel button.
When the replication operation finishes, a summary is displayed, containing detailed information about the number of objects created.
Creating a package
A replication package can be created from a branch on your repository or from any branch on any server you can connect to. In order to create a package from the selected branch in the branch view, click on Create replication package from this branch.
If you want to create a package from any remote branch, click on Create replication package on the replication menu.
The figure above shows the package creation dialog. It will generate a replication package from the selected branch which will contain all data and metadata from the branch. It can be used to replicate between servers when no direct connection is available.
Importing a replication package
From the replication menu select Import replication package and select a package file to be imported. The dialog box is shown in the next figure.
Distributed Branch Explorer
The Branch Explorer is one of the core features in the Plastic SCM GUI and it has been greatly improved in recent releases to be able to deal with distributed scenarios. That's the reason why it now receives the distributed Branch Explorer name. Its short-hand name is DBrEx.
How the DBrEx works
Consider two replicated Plastic SCM servers, one running on a central server, the other one on a laptop, as depicted below.
The server running on the laptop first replicated the main branch from the central server. Later, task002 was created and the developer worked on it. At a certain point in time the scenario is as follows:
The DBrEx collects data from the different sources and renders their changesets and branches on a single interactive diagram, as the next figure shows.
Rendering multiple repository sources on the DBrEx
There are several options in order to combine more than one replicated repository into the same DBrEx diagram. The first one is used to create a combined render including all the changesets and branches coming from the selected replication sources. The next figure shows you how to start configuring the diagram.
The Replication sources tab shows the repositories that have been used to pull changes from the one that is being rendered on the DBrEx (or alternatively repositories that pushed changes to the active one).
Once you select one or more replication sources (by checking the Show remote data checkbox), the distributed diagram will be rendered as depicted in the figure below. It is expanded to include the information from the remote repository.
This way, the Distributed Branch Explorer introduces a new way to understand how the project and branches evolve across different replicas.
It is also possible to run the replication operations from the DBrEx, so pulling a remote branch is now as easy as selecting the remote branch rendered in the DBrEx and clicking on Pull this branch. Remote branches and changesets are available for "diffing" too, which greatly enhances your work with distributed changes.
Diffing remote branches and changesets
It is possible to right-click a remote branch or changeset on the DBrEx to explore and understand what was modified remotely. This way developers or integrators can better understand what changes are going to be pulled from the remote sources prior to completing the operation. The following figure shows the options enabled on a remote changeset.
Enabling the DBrEx for a single branch
Sometimes it is not necessary to render the whole distributed diagram because the SCM manager or developer needs to focus on a specific branch only.
The figure above shows the Branch Explorer / Show remote changesets in current branch from menu option, which allows you to select a remote source and decorate a branch with remote data, so you can understand what needs to be pulled, explore differences, and trigger replication commands.
Plastic SCM is all about helping teams to embrace distributed development. To do so, we enhanced the DBrEx, but in order to deal with hundreds of distributed changesets, a new perspective has been created: the Synchronization View.
The Synchronization View (Sync View for short) enables you to synchronize any pair of repositories easily, browsing and diffing the pending changes to push or pull.
To get all the information about how to work with Synchronization refer to The Synchronization View section in the GUI guide.
October 15, 2015
Identifying the branch replication as Partial replica.