Coda File System


10. System Administration: Volumes

10.1 Description

Although users view the Coda file system as a hierarchy of directories and files, system administrators view it as a hierarchy of volumes. Each volume contains a set of related directories and files that form a subtree of the overall file system. Volumes thus parallel traditional Unix file systems. Like a Unix file system, a volume can be mounted: the root of a volume can be named within another volume at a mount point. The Coda file system hierarchy is built in this manner and is then mounted by each client at a conventional Unix mount point within the local file system. Since all internal mount points are invisible, the user sees only a single mount point for the entire Coda file system.

All system administration tasks are performed relative to a volume or a set of volumes. Adding new users requires creating new volumes for their files and directories; quotas are enforced on volumes; and backups are performed on a per-volume basis. The volume abstraction greatly simplifies the administration of large systems. NOTE: Quotas have not been implemented yet.

The Coda file system provides four different types of volumes. The simplest of these is the non-replicated volume . Non-replicated volumes reside on a single server and are in the custody of the Coda file server they reside on. The Coda servers work with the venus processes on client workstations to provide a single, seamless view of the file system. However, if a custodian crashes or is otherwise inaccessible, its non-replicated volumes are inaccessible as well.

To partially solve this availability problem, Coda provides replicated, read-only volumes . This type of volume has exactly one read-write copy, but may have any number of read-only copies controlled by other servers. Changes to such a volume are made on the custodian's read-write copy and then distributed to all servers with read-only copies. Read-only replication provides higher availability for volumes containing frequently-requested but infrequently-updated objects, like system binaries. In addition, read-only replication is used in performing backups of volumes.

Unfortunately, read-only replicas cannot provide high availability for all types of volumes, e.g. user volumes. Thus, Coda also provides read-write, replicated volumes . Read-write, replicated volumes are logical volumes which group together multiple read-write, non-replicated volumes. Coda provides protocols which allow read-write, replicated volumes to reside on a number of servers and to be accessed even when some servers are inaccessible. Although read-write replication provides everything read-only replication provides, its protocols are more expensive. Thus, read-only replication, rather than read-write replication, should be used for volumes which change slowly but are accessed frequently. Table XXX illustrates the differences between the volume types.

Coda Volume Types

Volume Type              Where Reads        Where Writes     Conflicts
                         Performed          Performed        Possible?
-----------------------  -----------------  ---------------  ---------
Non-replicated           Only Custodian     Only Custodian   No
Read-only Replicated     Any Server with    Only Custodian   No
                         a Replica
Read-Write Replicated    Any VSG Member     Any VSG Member   Yes
Backup                   Only Custodian     Nowhere          No

10.2 Creating a Volume

Typically, volumes consist of a single user's data objects or other logically connected groups of data objects. Four factors should be considered when dividing the file system tree into volumes.

  1. The volume is the unit of quota enforcement and, potentially, accounting.
  2. The rename operation is prohibited across volume boundaries. The tree structure can therefore be manipulated cheaply only at the granularity of a whole volume (e.g. moving a mount point) or within a single volume (e.g. moving directories or files inside that volume). Moving a subtree of one volume into another volume requires copying every byte of data in the subtree.
  3. The size of the volume should be small enough that moving volumes between partitions is a viable approach to balancing server disk utilization and server load. Thus, volumes should be small relative to the partition size.
  4. Finally, the size of a volume must not exceed the capability of the backup media.

A volume naming convention should also be used by those administrators who create volumes. Volume names are restricted to 32 characters and should be chosen so that, given a volume name, a system administrator (who knows the naming conventions) can determine its correct location in the file system hierarchy. The convention used by the Coda project is to name volumes by their function and location. Thus, a replicated volume named "u.hbovik" is mounted in /coda/usr/hbovik and contains hbovik's data. A project volume is prefixed by "p." and a system volume is prefixed by "s.". Similarly, volumes containing machine-specific object files are prefixed by the machine type. For instance, "p.c.alpha.pmax.bin" contains project coda binaries for our current alpha release and is mounted on /coda/project/coda/alpha/pmax_mach/bin .

Use the commands createvol (8) and createvol_rep (8) to create non-replicated and read-write replicated volumes respectively. (Read-only replication is discussed in Section XXX below). These commands are actually scripts which ultimately invoke the volutil (8) command with the create option at the appropriate server. The volume will contain an access list initialized to System:AnyUser rlidwka . Creating the volume does not mount the volume within the file system hierarchy. Mounting the volume as well as changing the access list or the quota must be done using the cfs (1) command from a client. A new volume may not be visible at client workstations for some time (see Section XXX below).

A few concrete examples should clarify the use of some of these commands. On the SCM, the command

* createvol u.hbovik mahler /vicepa

will create a non-replicated volume named "u.hbovik" on the /vicepa partition of server "mahler". Similarly, the command

* createvol_rep u.hbovik E0000107 /vicepa

will create a replicated volume named "u.hbovik" on each server in the Volume Storage Group (VSG) identified by "E0000107". The file /vice/db/VSGDB contains the mapping between VSGs and their identifiers. The names of the underlying replicas will be "u.hbovik.n", where n is a number between 0 and |VSG| - 1.

10.3 Mounting a Volume

In order to use a volume which you have created and added to the appropriate databases, you must mount it. Unlike Unix file systems, which must be re-mounted after every reboot, Coda volumes are mounted only once. To mount a Coda volume, you must be using a Coda client and be authenticated (use the clog command) as a user who has write access to the directory in which the mount point will be created.

Mount the volumes using the command

* cfs mkmount <filename> <volname>

Note that cfs creates <filename> automatically. For example,

* cfs mkmount /coda/usr/hbovik u.hbovik

will create /coda/usr/hbovik and then mount the u.hbovik volume created in the example in Section XXX. The volume is now visible to all users of the Coda file system. When mounting a volume, avoid creating multiple mount points for it; Coda cannot check for this. More information about the cfs command can be found in Chapter XXX as well as in Appendix XXX.
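
As noted earlier, a newly created volume has its access list initialized to System:AnyUser rlidwka; once the mount point exists, the access list can be adjusted from the client with cfs. The following is only a sketch, using the hypothetical directory and user from the example above:

* cfs setacl /coda/usr/hbovik hbovik all
* cfs listacl /coda/usr/hbovik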

10.4 Deleting a Volume

When a volume is no longer needed, it may be purged by running the purgevol or purgevol_rep scripts on the SCM. Before removing a volume, you should probably create a backup for offline storage (see Section XXX up to the restore step). The volume's mount point should be removed with the cfs (1) command (see the rmmount option) before purging the volume (if possible). Note that purging the volume will not purge related backup volumes. Backup and ReadOnly volumes should be purged with the purgevol script.
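
As an illustrative sketch for the u.hbovik volume from the earlier examples (assuming it is a read-write replicated volume mounted at /coda/usr/hbovik), one would first remove the mount point from a client and then purge the volume on the SCM:

* cfs rmmount /coda/usr/hbovik
* purgevol_rep u.hbovik

purgevol_rep is assumed here to take the replicated volume name as its only argument; check purgevol_rep (8) for the exact syntax.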

10.5 Restoring a Volume

For complete details on the backup/restore process, see Chapter XXX . In short, one first needs to get the correct dumpfile, possibly merging incremental dumps to an older full dump to get the desired state to be restored. Once this file is obtained, use the volutil (8) restore facility.

* volutil restore <filename> <partition> [<volname> [<volid>]]

The <partition> should be one of the /vicep? partitions on the server. You may optionally specify the volume name and volume id for the restored volume; this is useful when creating read-only replicated volumes. Note that dump files are currently not independent of byte ordering, so volumes cannot be dumped and restored across architectures that differ in this respect.
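
For instance, to restore a dump of the u.hbovik volume from the earlier examples onto the /vicepa partition of the server on which volutil is run (the dump file name is purely illustrative):

* volutil restore /tmp/u.hbovik.dump /vicepa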

10.6 Read-only Replication of a Volume

Read-only replication of a volume requires more effort on the part of the system administrator. However, it greatly increases the availability of volumes which cannot be read-write replicated. The most important example of such a volume is the root of the Coda file system. Conflicting updates cannot be allowed to occur at the root volume since this would make the entire Coda file system inaccessible. However, if the root volume is not replicated, the availability of the entire Coda file system depends upon the availability of the server acting as the custodian for this one volume. For these reasons, we highly recommend making the root volume of the Coda file system read-only replicated. We provide an extended example here to show you exactly how to go about the replication and distribution process. Note that the example shows how to make the root read-only replicated. Details pertaining to the coda root volume can be ignored when making other read-only replicated volumes (such as subtrees containing standard binaries).

We assume that you have a non-replicated root volume, called coda.root . If you are installing a new system, you can access this volume as /coda by using a Coda client. This volume should look exactly the way you want the read-only replica to look. If it doesn't, make any changes now. Also note that any volumes that you want mounted within the read-only replicas should be mounted before continuing. Our root volume has three read-write replicated volumes mounted. The usr volume contains the home directories of our users, the project volume contains project directories, and tmp contains temporary files. In addition, we have one subdirectory called nonrep which has non-replicated volumes (one per server) mounted within it. The purpose of these non-replicated volumes is to provide users with a location to perform repairs of conflicting objects. (Although the need for such a directory may not be clear at this point, we highly recommend providing such a directory.)

If you already have a read-only replicated root volume but want to update it, you should mount the read-write version of the root volume elsewhere and make your changes to this volume. Once you have made your changes, you will need to purge the old read-only replicas of your root volume using the volutil(8) command. Be sure that you purge the replica on each server. Then, you will need to edit the VolumeList file in /vice/vol and remove the entry for the read-only replicated root volume. (The name of the read-only replica will probably be coda.root.readonly.)
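
As a rough sketch, assuming the purge option of volutil (8) takes the replica's volume id and name (check the manual page for the exact syntax on your release), the purge on each server would look like:

* volutil purge <replica-volid> coda.root.readonly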

On the SCM, you need to clone the read-write copy of the root volume. You can use the command

* volutil clone <VolumeId>

This command will create a read-only volume with the name coda.root.readonly (assuming that your root volume is called coda.root). Next, you will need to dump this cloned volume to a file with the command

* volutil dump <VolumeId> <filename>

Now, copy this file to each of the servers which will have read-only replicas of the root volume and execute the command

* volutil restore <filename> <partition> [<volid> [<volname>]]

Note that the root volume currently must reside on /vicepa . Read-only replicated volumes must share the same volid and name, so take care to specify these correctly when restoring to more than one server. The final step is to build the VLDB by running the command

* bldvldb.sh

on the SCM and to make sure that the file /vice/ROOTVOLUME contains the name or volume id of the root volume (coda.root.readonly). (Also, it may be necessary to restart the venus on the clients.)
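
As a compact recap of the whole procedure (the volume ids and the dump file name below are placeholders; substitute your own), the sequence is roughly:

* volutil clone <rw-root-VolumeId>
* volutil dump <clone-VolumeId> /tmp/root.readonly.dump
  (copy /tmp/root.readonly.dump to each server that will hold a read-only replica)
* volutil restore /tmp/root.readonly.dump /vicepa <volid> coda.root.readonly
  (on each such server, using the same volid and name everywhere)
* bldvldb.sh
  (on the SCM; then check /vice/ROOTVOLUME)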

10.7 Building the VLDB

The volume location data base, the VLDB, is used to provide volume addressing information to client workstations. Copies of the VLDB reside on the servers and are updated periodically. The VLDB lists the latest known location or locations of all on-line volumes in the system.

The VLDB is maintained on the SCM. When you wish to update it, run the /vice/bin/bldvldb.sh (8) script on the SCM. The script gathers a copy of the /vice/vol/VolumeList file from each server, merges them into a single list, and builds a new VLDB. The UpdateMon program then propagates the new VLDB to all the servers. Note that the createvol and purgevol scripts automatically invoke bldvldb.sh .

10.8 Building the VRDB

The volume replication data base, the VRDB, is used to provide information about replicated volumes to client workstations. Copies of the VRDB reside on all servers and are updated periodically. The VRDB maps each logical volume to its corresponding set of physical volumes.

A human readable version of the VRDB is maintained on the SCM in the file /vice/vol/VRList . The makevrdb option to the volutil(8) command will create a new VRDB which will automatically be distributed to the other servers.
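
For example, after editing /vice/vol/VRList on the SCM, a new VRDB can be built with the command below (this assumes makevrdb takes the VRList file as its argument):

* volutil makevrdb /vice/vol/VRList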

10.9 Building the VSGDB

The Volume Storage Group Data Base, VSGDB, is currently maintained by hand. Each valid volume storage group has an entry in this data base containing an identification number and the names of the servers in the group.
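
Each entry is a single line containing the group identifier followed by the names of the member servers. As an illustrative sketch (the server names are hypothetical), /vice/db/VSGDB might contain:

E0000100 mahler
E0000107 mahler vivaldi brahms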

10.10 Ensuring Volume Consistency after Server Crashes

Coda servers ensure file system consistency after a crash by running fsck (8) , recovering RVM, and running the Coda salvager. The fsck used here at CMU has been modified so that it does not require every inode to be referenced by a directory entry, since Coda accesses its inodes directly.

Warning : the vanilla fsck must not be used on a Coda file system partition as the Coda files will be thrown away.

After the server machine is booted, the codasrv process starts and RVM recovers the server's committed state. The Coda salvager then reconciles the recovered RVM state with the results of fsck.

10.11 Getting Volume Information

The cfs (1) command provides information on volumes. cfs can only be used on a machine which has a running venus (such as a client workstation). cfs is described in Chapter XXX as well as in the manual page contained in Appendix XXX.
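
For example, to examine the status of the volume containing a particular directory, run the listvol option of cfs from a client (the path is the hypothetical one used earlier):

* cfs listvol /coda/usr/hbovik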

