Infrastructure/VersionControl/ArchitectureDraft

= Overview =

Features
These are the features this implementation aims to achieve:
 * Authentication and transport via ssh
 * Accesses the common module
 * Checkout partial repositories
 * Tagging a checkout and sending that tag to buildsys to be built
 * ACLs. Easily add owners, co-owners, and contributors to a package or to a branch of a package.
 * For any contributor, a package can be read-only, read-write, or invisible.
 * Certain groups will need broad access to all packages (for example, the Security Team should be able to update any package if the need arises).
 * Sending email from the server when commits are made.
 * Checkout "views". Examples: 1) all packages in FC5, 2) all branches of qa-assistant.

Workflow
In the common workflow, work occurs on the trunk (devel). Once complete, changes are merged into previously released branches. When changes are completed on any branch, they are committed and tagged. Then a script is run that sends the tag to the builder to create the package.

Other workflows are possible. Some people like to work in a released branch and then merge forwards and backwards. We also want to support simultaneous work by two contributors on separate release branches (so one user can own the current branches and someone else can own legacy branches, for instance).

= Implementation =

This implementation was written for bazaar-ng (aka bzr). Some of the architecture is relevant to all VCSs. Other pieces are relevant to any VCS that does not use a Smart Server, and still others are bzr specific.

SSH for Authentication
Since bzr does not define its own transport mechanism (yet) we use the authentication scheme created by our transport layer. I'm going to outline using ssh and sftp for this. Since sftp is essentially logging the user into the remote machine in order to copy files, we will need to create accounts for the user on those machines. The nice part of this is that the accounts system has a record of a user's ssh keys. So we can tie into the accountsdb to create a new account on the server whenever the account is updated to cvsextras and use the ssh key provided there. We'd want to limit what the user is allowed to run on the servers, though, so rather than a full login shell we could use the scponly shell to limit the user's access to sending and receiving files for the repository.

Filesystem ACLs for Authorization
SSH takes care of authenticating the user, but we still need to authorize whether they have permission to read, modify, or even see the files they request. I think this can be conveniently done with filesystem ACLs. ACLs are more flexible than UNIX ownership permissions because you can directly grant individual users the ability to read and write files. Without ACLs, we'd have to create new groups for every branch of every package in the repository. With ACLs, we can place users into logical groups (like security) and give that group access to a wide array of packages.
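
As a rough illustration of the three access levels listed under Features (read-only, read-write, invisible), the following Python sketch builds setfacl command lines for a branch directory. The function name, the level names, the user names, and the example path are hypothetical. Only the setfacl -m entry syntax is standard, and whether an empty permission set is enough to make a branch "invisible" depends on how the parent directories are protected.

```python
# Sketch only: maps the draft's three access levels onto setfacl arguments.
# acl_command, LEVELS, the user names and the example path are assumptions
# made for illustration; they are not part of any existing script.

LEVELS = {
    "read-write": "rwX",  # X: execute on directories (and already-executable files)
    "read-only": "r-X",
    "invisible": "---",   # no access at all to the branch directory
}


def acl_command(kind, name, level, path):
    """Return the setfacl argument list granting `level` on `path`.

    kind is "u" for a user or "g" for a group such as security.
    """
    if kind not in ("u", "g"):
        raise ValueError("kind must be 'u' or 'g'")
    perms = LEVELS[level]
    return ["setfacl", "-R", "-m", "%s:%s:%s" % (kind, name, perms), path]


if __name__ == "__main__":
    branch = "/repos/packages/qa-assistant/FC5"  # hypothetical location
    for args in (
        acl_command("u", "alice", "read-write", branch),
        acl_command("g", "security", "read-write", branch),
        acl_command("u", "bob", "invisible", branch),
    ):
        # Print rather than execute, so the sketch is safe to run anywhere.
        print(" ".join(args))
```

Returning the argument list instead of running setfacl keeps the mapping easy to review before it is wired into anything that touches the real repository.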

Managed by the Package DB
Rather than giving the users access to the getfacl and setfacl commands to change permissions on the packages, this job is better done by the package database. The package database can give us a central point to organize data about the packages and make changes to their administrative state. We can define the following as effects that are triggered from changes to the package database:


 * Assigning packagers as the owners of a package.
 * Assigning contributors that help the owners on the full package or on a branch of the package.
 * Branch requests can be made (automatically fulfilled for non-legacy branches; sent to FESCo for confirmation if the branch is legacy).
 * Allow owners and the security team to create embargo branches and add other users to them.
 * Allow placing contributors into groups (KDE, games, security, other SIGs, etc.) and giving those groups access to packages.

Repository Format
In bzr and the arch family of distributed revision control systems (DRCS), the emphasis is on removing the need for a central repository for revisions. Anyone can branch from a source code repository and publish that branch somewhere else. New contributors are free to pull from the original repository, pull from the branch, or merge from both. bzr provides commands and features for operating under a more centralised model, though, and we'll want to utilise these features in our architecture.

Chroot layout
The initial prototype will implement a minimal chroot jail that the contributors have access to via sftp. We'll use bzr's sftp methods to push and pull data. The chroot will be enabled via scponly, a replacement shell that limits the commands available to the user. The object of the chroot is to restrict the contributor to sftp only. (scp may be a better choice of transport, but we still need to evaluate whether that is an out-of-the-box solution or will need hacking in either scponly or bzr.)

Chroot layout:
  [chroot /]
    [dev]
      null
    [etc]
      group
    [lib]
      ld-linux-x86_64.so.2
    [repos]
      [BZR repository structure]
    [usr]
      [lib]
        libc.so.6
        libcom_err.so.2
        libcrypt.so.1
        libcrypto.so.6
        libdl.so.2
        libgssapi_krb5.so.2
        libk5crypto.so.3
        libkrb5.so.3
        libkrb5support.so.0
        libnsl.so.1
        libresolv.so.2
        libutil.so.1
        libz.so.1
      [libexec]
        [openssh]
          sftp-server

Quick notes:
 * lib/* and usr/lib/* contain just the libraries to get sftp-server to work. This is generated by a script.  The script needs to be enhanced to decide if newer versions of the programs have been installed on the base system and update the chroot.  We can then run it from cron or as a yum plugin to update the chroot if the libraries in the base OS have changed.
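
One way the library list could be derived, sketched in Python under the assumption that the setup script relies on ldd output for sftp-server (the draft does not say how the existing script works). The function name and the sample text are hypothetical; the sample is hand-written to resemble ldd output and was not captured from a real system.

```python
# Sketch only: extract the library paths that would need copying into the
# chroot from text in the usual `ldd` output format.
# libraries_from_ldd and SAMPLE are illustrative assumptions.

SAMPLE = (
    "\tlinux-vdso.so.1 =>  (0x00007fff5a1fe000)\n"
    "\tlibcrypto.so.6 => /lib64/libcrypto.so.6 (0x0000003a1d200000)\n"
    "\tlibz.so.1 => /usr/lib64/libz.so.1 (0x0000003a1c600000)\n"
    "\t/lib64/ld-linux-x86-64.so.2 (0x0000003a1b200000)\n"
)


def libraries_from_ldd(ldd_output):
    """Return absolute library paths mentioned in ldd-style output."""
    paths = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if "=>" in fields:
            idx = fields.index("=>")
            # "name => /path (addr)"; virtual objects and "not found"
            # entries have no absolute path after the arrow.
            if idx + 1 < len(fields) and fields[idx + 1].startswith("/"):
                paths.append(fields[idx + 1])
        elif fields and fields[0].startswith("/"):
            # The dynamic loader is listed by absolute path only.
            paths.append(fields[0])
    return paths


if __name__ == "__main__":
    for path in libraries_from_ldd(SAMPLE):
        print(path)
```

Comparing this list against the files already in the chroot would give the cron job or yum plugin a simple test for whether the chroot is out of date.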

BZR Repository Layout
The central unit of controlled revisions in bzr seems to be the branch. Branches can be used naked or as part of a repository. The advantage of using a repository is that the repository can optimize space usage when branches have files in common. However, this sharing of source may cause problems when attempting to use filesystem based ACLs. Since the files actually live within the shared space, anyone with commit access on a branch within the shared repository may have access to everything.

If the permissions issues with shared source repositories are acceptable, we could set up a structure like so:
  [repos] (directory)
    [embargo] (directory)
      [PKGNAME-RAND] (bzr branch)
    [packages] (bzr repository)
      [common] (bzr repository)
        [trunk] (toplevel bzr branch)
          Makefile
          [common]
      [PKGNAME] (bzr repository)
        [FC5] (bzr branch of trunk)
        [trunk] (bzr branch of common)

 * Nesting toplevel repositories creates new storage areas for the package data. The innermost repository is where the data is stored. This allows us to use filesystem ACLs to manage who can read, write, and see different branches of the repository. If we go to a smart server that manages ACLs, we would switch to having one repository around everything. All the inner repositories could become directories. Then storage would be shared for all branches but the data would be secure.
 * A repository is created with bzr init-repo.
 * The common branch would be the equivalent of our present CVS common module. It's roughly equivalent to the following, with the caveat that the Makefile and common files would need heavy modification to work with bzr instead of cvs:
   bzr init common
   cp -pr /cvs/common/ common/
   cp -pr /cvs/PKGNAME/devel/Makefile common/
 * Branches, whether within the shared source repository or outside of it, are created with the bzr branch command.

If shared source repositories provide too much access, I think we should use the same directory structure but not make [packages] a bzr repository. This is because the problem may go away in the future, and we would want to be able to add a shared repository if that were the case. If the level of sharing were deemed inappropriate for the whole package tree but acceptable at the per-package level, we would still get most of the shared source benefits from making each [PKGNAME] directory into a repository.

Embargo Branches
The embargo branches have packages within the shared repository as their source but share no storage with them. The [embargo] directory itself should not be readable by any contributor. The branches inside contain a random string as part of their name to discourage outsiders from probing the embargoed area with blind requests (otherwise, they'd get "No such file" when they tried to access something that wasn't in the embargoed tree and "Permission denied" when it was present). The branches themselves should be restricted in all permissions to only the users and groups that are allowed to work on them. As an example, this might consist of the package owner, the security group, and an upstream author. The ACLs set on the branch directory would allow those three entities to access it, but everyone else would be denied permission.
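
A minimal sketch of how the PKGNAME-RAND name could be generated, assuming the random part only needs to be unguessable and filesystem-safe. The function name and the default length are hypothetical choices, not something the draft specifies.

```python
# Sketch only: build an embargo branch name of the form PKGNAME-RAND.
# embargo_branch_name and the 8-byte default are illustrative assumptions.
import secrets


def embargo_branch_name(pkgname, nbytes=8):
    """Return PKGNAME-RAND, where RAND is 2 * nbytes random hex digits."""
    # secrets draws from the operating system's CSPRNG, so the suffix
    # cannot practically be guessed by someone probing the directory.
    return "%s-%s" % (pkgname, secrets.token_hex(nbytes))


if __name__ == "__main__":
    print(embargo_branch_name("qa-assistant"))
```

Hex digits keep the name safe to use in paths and URLs without quoting.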

Administration (not finished)
When common is updated, the system will need to do a merge or pull into all the packages. An alternative would be a client-side plugin that performs a sync of the common module.


 * Branching
 * package creation
 * updating common

Scripts for Setup
Some initial scripts to set up the repository:
 * [[Image:Infrastructure_VersionControl_ArchitectureDraft_scponly-setup.sh]] Sets up the chroot environment in which scponly will run. This needs to be enhanced so it can be run from cron to keep the chroot up to date with the latest changes to ssh and its dependencies.
 * [[Image:Infrastructure_VersionControl_ArchitectureDraft_setup-repo.sh]] Sets up the repository structure using bzr. Note that this structure is slightly different from the one documented above. I need to reconcile the changes.
 * [[Image:Infrastructure_VersionControl_ArchitectureDraft_user.sh]] Script to set up a new user. This adds the user to the system passwd file. Needs to be enhanced to add users to the chroot passwd file as well.

Working With the Repository
These are examples of commands that would be used to perform operations on the packages similar to our present operations on the CVS repository.

Common bzr Commands
This section is still in its infancy. It has several useful pieces of information but needs a lot of additional information, tips, and examples. If we go with a bzr implementation, we'll want to pull this section out as the basis of a developer's guide for using the VCS.

Branches and Repositories
Where cvs has both a cvs repository and cvs modules, bzr has one piece, the bzr branch, which is the container for changes to a project. A branch traces the complete revision history of a project from the initial checkin to the present. There can be several parallel sibling branches which diverged at different times from their parent branch. These branches will share the same files prior to the divergence and then have different files thereafter. The bzr merge command will merge between sibling branches, attempting to detect which changes have already been merged between them. The parent branch can likewise be merged into the current branch.

A bzr repository is a format for optimizing the space requirements of branches that stem from the same initial branch. It is only an optimization. For Fedora, we might have several branches of a project within one repository on a server. A developer that creates a new branch from our project need not know that the repository exists. They can create their branch within a repository on their local machine or outside of any repositories with no ill effects.

Your own Local Repository
Although it is optional, having your own local repository is recommended. This can save a tremendous amount of space if you are working with several branches that have common ancestors (such as the FC-4, FC-5, and trunk versions of your package, or the trunk and several local branches in which you are working on new features). To set up your repository, simply do this:
  bzr init-repo ~/repo
  cd ~/repo
  bzr branch [...]
Every branch that you create within the ~/repo directory will now be a part of your local repository. If bazaar finds that you are checking out or creating branches with common ancestors, the files that are the same between the branches will be stored in the repository once rather than in each branch.


 * Note: The working copies will of course exist separately for each branch you create. The repository only combines the version control data (the other revisions of the files under revision control). If you want, you can ameliorate this by choosing to store branches in your repository without working copies. When you start working on a project you can check out a working copy from your local repository. When you are finished and have committed your changes, you can remove the working copy to reclaim the space.

Branches and Checkouts
bzr branch makes a new branch in the current working directory. The branch contains a working tree and all the files necessary for looking up historical changes and committing new ones. A common invocation would branch the current development branch of the qa-assistant package and save it in a directory named qa-assistant.local.

bzr checkout --lightweight is most similar to the cvs model of changes. A lightweight checkout retrieves only a working tree from the server. Any VCS related operations (diff, status, commit, etc.) must be submitted to the server.

bzr checkout combines the bzr branch command with another command, bzr bind. It creates a branch just as bzr branch does and then binds it to the remote location. Once bound, committing changes to the branch will first send the changes to the server. This is similar to the cvs checkout, cvs commit style except that you have a full branch of the repository on your machine, so diffs, logs, etc. can be served by the local machine instead of the server. In addition, if you are unable to connect to the server for a while and want to use bzr to check in your changes, you can use the --local switch to commit to your local branch. When you next do a normal commit, all the revisions you have committed locally will be submitted to the server.

In addition to the summaries above, the current bzr implementation has one notable behaviour. bzr branch is able to operate against a read-only copy of the original branch (since it is creating a whole new branch). bzr checkout, because it stores information in the remote repository, needs to acquire a lock on the remote branch in order to check out data.

Transports
bzr has two builtin transports: sftp and http. http is read-only, which makes it unsuitable for developers who need to commit; however, it is faster than sftp. If you just want to look at the files in the repository, use http. When you have to write to them, use the sftp transport.

If you need to convert a read-only copy you downloaded via http, or you want to retrieve the branch you're working on faster and don't mind remembering several commands, you can do the following:
  bzr branch http://bzr.fedoraproject.org/repos/packages/qa-assistant/trunk qa-assistant.devel
  cd qa-assistant.devel
  bzr bind sftp://bzr.fedoraproject.org/repos/packages/qa-assistant/trunk
This first retrieves the branch using the http protocol. Then it binds your branch to the sftp server, so when you commit to the local branch, the changes are first sent to the remote sftp server.

Note that the speed difference between the transports is a current limitation. The bzr http and sftp transports are currently receiving optimisations. This has recently sped up http by as much as 400% and sftp by as much as 50%. In the future, these transports may achieve parity, or we may switch to using bzr behind an http or bzr-serve Smart Server that speaks a bzr native protocol.

Branching to get work Done
If you plan to work disconnected from the repository for a long period (say that you are branching to track the development version of the upstream project while the official fedoraproject repository continues to track the stable releases), you can create your own branch (qa-assistant.unstable in this example) and do all your work there. When you are ready to commit the changes back to the stable branch, you can do the following:
  $ ls
  qa-assistant.unstable
  $ bzr checkout --lightweight sftp://bzr.fedoraproject.org/repos/packages/qa-assistant/trunk qa-assistant.devel
  $ ls
  qa-assistant.unstable   qa-assistant.devel
  $ cd qa-assistant.devel
  $ bzr merge ../qa-assistant.unstable
  $ bzr commit
This procedure gets the latest version of the qa-assistant trunk. You then merge the changes in your branch into the trunk checkout, and finally you commit the new working copy back to the fedoraproject.org server. Note that there are many variations on this, depending on whether you already have a checkout of the trunk and whether you plan to do future work directly on the trunk after committing your changes or to continue with a local branch.

External Commands (notes only)
In our present CVS tree, commands that are not directly handled by CVS are scripted in a Makefile. In bzr we have the option of using an external script (like a Makefile) or writing the commands into a custom bzr plugin, merged into bzr through its plugin interface, that we can invoke through the bzr command line.

Pros of a plugin:
 * Commands are in one place.
 * We no longer have to maintain the common module.

Cons:
 * Have to install the plugins in order to perform the operation.

Commands we need to replace/rewrite:
 * make new-sources FILES=""
 * make [TARGETARCH]
 * make clog
 * make mockbuild
 * make tag
 * make build
 * merging between branches
 * checking out views (all of FC-5 or every branch of PKGNAME)
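
The last item above, checking out views, could be sketched as a filter over branch paths, assuming the packages/PKGNAME/BRANCH layout described under BZR Repository Layout. The function name and the example branch list are hypothetical, and how the list of branches would be obtained from the server is left open.

```python
# Sketch only: select the branches that make up a "view".
# select_view and the example paths are illustrative assumptions; the
# packages/PKGNAME/BRANCH layout follows the draft's repository layout.


def select_view(branches, release=None, package=None):
    """Return the branch paths matching a release, a package, or both."""
    selected = []
    for path in branches:
        parts = path.split("/")
        if len(parts) != 3 or parts[0] != "packages":
            continue  # not a packages/PKGNAME/BRANCH path
        _, pkg, branch = parts
        if package is not None and pkg != package:
            continue
        if release is not None and branch != release:
            continue
        selected.append(path)
    return selected


if __name__ == "__main__":
    branches = [
        "packages/qa-assistant/trunk",
        "packages/qa-assistant/FC5",
        "packages/bzr/FC5",
    ]
    # View 1: all packages in FC5.
    print(select_view(branches, release="FC5"))
    # View 2: all branches of qa-assistant.
    print(select_view(branches, package="qa-assistant"))
```

Each selected path could then be handed to bzr branch or bzr checkout, one invocation per branch.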