Stateless Linux - Cached Client
The original StatelessLinux prototype contained support for what we called a "cached client" - similar in concept to a diskless client using a network mounted root filesystem, except that the root filesystem is cached on disk. The cached root is never modified directly, but instead updated from the master copy on the server.
This was implemented very simply:
- The client had two copies of the root filesystem, each in its own separate partition
- One copy would be the active copy - i.e. the copy the client boots from. The other copy would be a shadow copy used only for updating the client
- Periodically the client would use rsync to update the shadow copy. Once the update was complete, the shadow copy would be marked as the active copy for the next boot
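As a rough illustration of this two-copy scheme, the following Python sketch models each root partition as a list of chunks. The class `TwoSlotClient` and its method names are hypothetical, and the list copy in `update` merely stands in for the rsync run; none of this is taken from the prototype's code.

```python
class TwoSlotClient:
    """Toy model of a client with an active and a shadow root partition."""

    def __init__(self, image):
        # Two full copies of the root filesystem, one per partition.
        self.slots = [list(image), list(image)]
        self.booted = 0      # partition the client is currently running from
        self.next_boot = 0   # partition marked active for the next boot

    def update(self, master):
        # Refresh the shadow copy from the master (stand-in for rsync),
        # then mark it as the active copy for the next boot.
        shadow = 1 - self.booted
        self.slots[shadow] = list(master)
        self.next_boot = shadow

    def reboot(self):
        self.booted = self.next_boot

    def root(self):
        # The root filesystem the running system sees.
        return self.slots[self.booted]
```

Note that `update` copies the whole image every time, which mirrors the cost of keeping two complete copies in sync.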
The main problem we saw with this mechanism was that the rsync operation was resource-intensive on both the client and the server, severely limiting the scalability of the system. Another, lesser concern was that maintaining two complete copies of the root filesystem was wasteful.
In FC6 we are implementing cached clients in a different way:
- When any changes are to be made to the master copy of the OS image on the server, an LVM snapshot of the image is first created and the changes are made there. These block-level changes are recorded in a copy-on-write (COW) image.
- On boot, the client creates an LVM snapshot of the root volume and boots from the snapshot itself.
- Periodically the client polls the server for changes to the master copy, downloads these changes and merges them into the original root volume.
- Because the client is running from a snapshot of the root volume, these changes are not visible until the next time the client boots, at which time it discards the old snapshot and creates a new snapshot
In More Detail
LVM's "snapshot" feature allows you to take a logical volume and create two logical, independent forks of that volume. One fork is known as the "snapshot" fork and the other is known as the "origin" fork.
This is implemented using a copy-on-write (COW) area where the block-level deltas between these two forks are maintained as follows:
- When you write to the snapshot fork, the affected chunk is first copied from the original volume to the COW area and then modified.
- When you read from the snapshot fork, the kernel checks to see if the chunk in question is available from the COW area. If it is, that version of the chunk is returned. If it isn't, the chunk from the original volume is returned.
- When you write to the origin fork, the affected chunk is first copied to the COW area, and then the chunk on the original volume is modified.
- When you read from the origin fork, the chunk from the original volume is always returned, ignoring the contents of the COW area.
All this logic is implemented in the device-mapper dm-snapshot kernel module.
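The four rules above can be modeled in a few lines of Python. This is only a toy in-memory sketch, not how dm-snapshot is implemented: the class name `SnapshotPair` is hypothetical, the volume is assumed to be a list of chunks, the COW area a dict keyed by chunk index, and every write is assumed to replace a whole chunk.

```python
class SnapshotPair:
    """Toy model of an origin fork and a snapshot fork sharing a COW area."""

    def __init__(self, chunks):
        self.original = list(chunks)  # chunks of the original volume
        self.cow = {}                 # chunk index -> chunk held in the COW area

    def write_snapshot(self, index, data):
        # Copy the chunk to the COW area first (if not already there),
        # then modify the copy. The original volume is untouched.
        if index not in self.cow:
            self.cow[index] = self.original[index]
        self.cow[index] = data

    def read_snapshot(self, index):
        # Prefer the COW area; fall back to the original volume.
        if index in self.cow:
            return self.cow[index]
        return self.original[index]

    def write_origin(self, index, data):
        # Preserve the old chunk in the COW area so the snapshot fork
        # keeps seeing it, then modify the original volume.
        if index not in self.cow:
            self.cow[index] = self.original[index]
        self.original[index] = data

    def read_origin(self, index):
        # The origin fork ignores the COW area entirely.
        return self.original[index]
```

For example, after `write_origin(1, "B")` on a volume `["a", "b", "c"]`, `read_origin(1)` gives `"B"` while `read_snapshot(1)` still gives `"b"`.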
A new feature we're adding to LVM and device-mapper is the ability to merge the snapshot fork back into the origin fork. Conceptually, this is a pretty simple operation. You take the contents of the COW area and for each chunk, copy it to the original volume overwriting the original version of the chunk in the process.
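A minimal sketch of that merge, assuming the original volume is modeled as a Python list of chunks and the COW area as a dict keyed by chunk index. The function name `merge_snapshot` is hypothetical and this is not the code from the patches.

```python
def merge_snapshot(original, cow):
    """Copy every chunk in the COW area over the matching chunk of the original volume."""
    for index, data in cow.items():
        original[index] = data  # overwrites the original version of the chunk
    return original
```

With `original = ["a", "b", "c"]` and `cow = {1: "B"}`, the call leaves the original volume as `["a", "B", "c"]`, which is exactly what the snapshot fork showed before the merge.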
LVM and dm-snapshot patches implementing this can be found here.
In order to capture changes made to the master OS image on the server, we will be implementing tools to create checkpoints of the image. Basically, at each checkpoint we will merge the previous snapshot of the image into the original image and then create a new snapshot. By doing this atomically we end up with one COW image per checkpoint, each representing the block-level changes made to the image since the previous checkpoint.
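The checkpoint cycle might look roughly like the sketch below. Since the server-side tools are not written yet, everything here is an assumption for illustration: the class `MasterImage`, its methods, and the representation of a COW image as a dict of chunk index to new contents.

```python
class MasterImage:
    """Toy model of the master OS image, edited through a snapshot."""

    def __init__(self, chunks):
        self.original = list(chunks)  # the original image
        self.cow = {}                 # changes made since the last checkpoint

    def write(self, index, data):
        # All changes go to the current snapshot, i.e. into its COW area.
        self.cow[index] = data

    def read(self, index):
        # The snapshot view: changed chunks first, original otherwise.
        return self.cow.get(index, self.original[index])

    def checkpoint(self):
        # Keep the COW image as the delta for this checkpoint, merge it
        # into the original image, and start a fresh, empty snapshot.
        delta = dict(self.cow)
        for index, data in delta.items():
            self.original[index] = data
        self.cow = {}
        return delta
```

Each call to `checkpoint` returns only the chunks changed since the previous call, which is the piece a client would need to download.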
In order to allow each client to update its cache of the OS image, the client will create a snapshot of its cache at boot time and mount the snapshot read-only. A daemon will periodically poll the server and, if there are any changes available, download those changes (in the form of a COW image) and merge them into the cache. Because the client is running from a snapshot of the cache, it doesn't see those changes until it next boots, at which time it discards the current snapshot and creates a new one.
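One polling pass of such a daemon could be sketched as follows. The daemon does not exist yet, so the function `poll_once` and both callables it takes are hypothetical: `fetch_deltas(version)` is assumed to return `(checkpoint_number, cow_image)` pairs newer than `version`, oldest first, and `merge_delta(cow_image)` is assumed to merge one COW image into the cache.

```python
def poll_once(version, fetch_deltas, merge_delta):
    """Fetch and merge every delta newer than `version`; return the new version."""
    for new_version, delta in fetch_deltas(version):
        merge_delta(delta)       # apply deltas strictly in checkpoint order
        version = new_version    # remember how far the cache has caught up
    return version
```

A real daemon would call this on a timer and persist the returned version; passing the transport and the merge step in as callables keeps the sketch independent of how either ends up being implemented.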
The way we merge changes into the cache is a little funky. We first create yet another snapshot of the origin fork of the snapshot which was created at boot time. However, we create this new snapshot using the COW image we downloaded from the server so that the snapshot appears identical to the new version of the image. We merge this new snapshot fork into its origin, causing the original version of the affected chunks to be copied to the COW area of the first snapshot. This way, the snapshot volume which the client booted is unaffected.
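The effect of this merge on the running system can be sketched as below. It is a toy model under stated assumptions, not the planned implementation: the class `CachedClient` and its methods are hypothetical, the cache is a list of chunks, and both the boot-time COW area and the downloaded COW image are dicts keyed by chunk index.

```python
class CachedClient:
    """Toy model of a client running from a boot-time snapshot of its cache."""

    def __init__(self, chunks):
        self.cache = list(chunks)  # origin fork: the local cache of the image
        self.boot_cow = {}         # COW area of the snapshot created at boot

    def read_root(self, index):
        # What the running system sees: the boot-time snapshot.
        return self.boot_cow.get(index, self.cache[index])

    def apply_update(self, delta):
        # Merging the downloaded COW image writes to the origin fork, so
        # the old version of each affected chunk is first preserved in
        # the COW area of the boot-time snapshot.
        for index, data in delta.items():
            if index not in self.boot_cow:
                self.boot_cow[index] = self.cache[index]
            self.cache[index] = data

    def reboot(self):
        # Discard the old snapshot and create a new, empty one.
        self.boot_cow = {}
```

After `apply_update({1: "B"})` on a cache of `["a", "b", "c"]`, `read_root(1)` still gives `"b"`; only after `reboot()` does it give `"B"`.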
- The LVM and device-mapper code to allow merging is awaiting upstream review.
- Work is well underway on some code to provision cached clients by partitioning the local disk, installing a bootloader and copying the image from the server to the client
- The following still needs to be implemented:
  - Support in mkinitrd to create a snapshot of the root filesystem at boot.
  - A daemon to periodically poll for updates to the image and merge them into the local cache.
  - Server side tools for managing the OS images and tracking changes to those images.