Although users view the Coda file system as a hierarchy of directories and files, system administrators view it as a hierarchy of volumes. Each volume contains a group of related directories and files and forms a subtree of the entire file system. Volumes thus parallel traditional Unix file systems. Like a Unix file system, a volume can be mounted: the root of a volume can be named within another volume at a mount point. The Coda file system hierarchy is built in this manner and is then mounted by each client using a conventional Unix mount point within the local file system. Since all internal mount points are invisible, the user sees only a single mount point for the entire Coda file system.
All system administration tasks are performed relative to a volume or a set of volumes. Adding new users requires creating new volumes for their files and directories; quotas are enforced on volumes; and backups are performed on a per-volume basis. The volume abstraction greatly simplifies the administration of large systems. NOTE: Quotas have not been implemented yet.
The Coda file system provides four different types of volumes. The simplest of these is the non-replicated volume. Non-replicated volumes reside on a single server and are in the custody of the Coda file server on which they reside. The Coda servers work with the venus processes on client workstations to provide a single, seamless view of the file system. However, if a custodian crashes or is otherwise inaccessible, its non-replicated volumes are inaccessible as well.
To partially solve this availability problem, Coda provides replicated, read-only volumes. This type of volume has exactly one read-write copy, but may have any number of read-only copies controlled by other servers. Changes to such a volume are made on the custodian's read-write copy and then distributed to all servers with read-only copies. Read-only replication provides higher availability for volumes containing frequently-requested but infrequently-updated objects, such as system binaries. In addition, read-only replication is used in performing backups on volumes.
Unfortunately, read-only replicas cannot provide high availability for all types of volumes, e.g. user volumes. Thus, Coda also provides read-write, replicated volumes. Read-write, replicated volumes are logical volumes which group together multiple read-write, non-replicated volumes. Coda provides protocols which allow read-write, replicated volumes to reside on a number of servers and to be accessed even when some of those servers are inaccessible. Although read-write replication provides everything read-only replication provides, its protocols are more expensive. Thus, read-only replication, rather than read-write replication, should be used for volumes which change slowly but are accessed frequently. Table XXX illustrates the differences between the volume types.
Volume Type            Where Reads Performed     Where Writes Performed   Conflicts Possible?
---------------------  ------------------------  -----------------------  -------------------
Non-replicated         Only Custodian            Only Custodian           No
Read-only Replicated   Any Server with Replica   Only Custodian           No
Read-Write Replicated  Any VSG Member            Any VSG Member           Yes
Backup                 Only Custodian            Nowhere                  No
Typically, volumes consist of a single user's data objects or other logically connected groups of data objects. Four factors should be considered when dividing the file system tree into volumes.
A volume naming convention should also be used by those administrators who create volumes. Volume names are restricted to 32 characters and should be chosen so that, given a volume name, a system administrator (who knows the naming conventions) can determine its correct location in the file system hierarchy. The convention used by the Coda project is to name volumes by their function and location. Thus, a replicated volume named "u.hbovik" is mounted in /coda/usr/hbovik and contains hbovik's data. A project volume is prefixed by "p." and a system volume is prefixed by "s.". Similarly, volumes containing machine-specific object files are prefixed by the machine type. For instance, "p.c.alpha.pmax.bin" contains project coda binaries for our current alpha release and is mounted on /coda/project/coda/alpha/pmax_mach/bin.
Use the commands createvol(8) and createvol_rep(8) to create non-replicated and read-write replicated volumes respectively. (Read-only replication is discussed in Section XXX below.) These commands are actually scripts which ultimately invoke the volutil(8) command with the create option at the appropriate server. The volume will contain an access list initialized to System:AnyUser rlidwka. Creating the volume does not mount it within the file system hierarchy. Mounting the volume, as well as changing the access list or the quota, must be done using the cfs(1) command from a client. A new volume may not be visible at client workstations for some time (see Section XXX below).
A few concrete examples should clarify the use of some of these commands. On the SCM, the command
* createvol u.hbovik mahler /vicepa
will create a non-replicated volume named "u.hbovik" on the /vicepa partition of server "mahler". Similarly, the command
* createvol_rep u.hbovik E0000107 /vicepa
will create a replicated volume named "u.hbovik" on each server in the Volume Storage Group (VSG) identified by "E0000107". The file /vice/db/VSGDB contains the mapping between VSGs and their identifications. The names of the replicas will be "u.hbovik.n", where n is a number between 0 and |VSG| - 1.
In order to use a volume which you have created and added to the appropriate databases, you must mount it. Whereas Unix file systems must be remounted after every reboot, Coda volumes are mounted only once. To mount a Coda volume, you must be using a Coda client and be authenticated (use the clog(1) command) as a user who has write access to the directory in which the mount point will be created.
Mount the volume using the command
* cfs mkmount <filename> <volname>
Note that cfs creates <filename> automatically. For example,
* cfs mkmount /coda/usr/hbovik u.hbovik
will create /coda/usr/hbovik and then mount the u.hbovik volume created in the example in Section XXX. The volume is now visible to all users of the Coda file system. When mounting a volume, avoid creating multiple mount points for it; Coda cannot check for this. More information about the cfs command can be found in Chapter XXX as well as in Appendix XXX.
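Since a new volume's access list is initialized to System:AnyUser rlidwka, you will usually want to tighten it once the volume is mounted. The following is a sketch using the setacl (sa) option of cfs(1); the user name and rights strings are illustrative only:
* cfs sa /coda/usr/hbovik hbovik rlidwka
* cfs sa /coda/usr/hbovik System:AnyUser rl
The first command grants hbovik full rights on the mount point directory; the second reduces System:AnyUser to read and lookup rights.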
When a volume is no longer needed, it may be purged by running the purgevol or purgevol_rep scripts on the SCM. Before removing a volume, you should probably create a backup for offline storage (see Section XXX, up to the restore step). The volume's mount point should be removed with the cfs(1) command (see the rmmount option) before purging the volume, if possible. Note that purging a volume will not purge related backup volumes; backup and read-only volumes should be purged with the purgevol script.
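For example, to retire the replicated volume created earlier, remove its mount point from a client and then purge the volume on the SCM. This sketch assumes that purgevol_rep takes the volume name as its argument:
* cfs rmmount /coda/usr/hbovik
* purgevol_rep u.hbovik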
For complete details on the backup/restore process, see Chapter XXX. In short, one first needs to obtain the correct dump file, possibly merging incremental dumps with an older full dump to reach the desired state. Once this file is obtained, use the volutil(8) restore facility:
* volutil restore <filename> <partition> [<volname> [<volid>]]
The <partition> should be one of the /vicep? partitions on the server. You may optionally specify the volume name and the volume id for the restored volume. This is useful when creating read-only replicated volumes. Note that currently dump files are not independent of byte ordering, so volumes cannot be dumped across architectures that differ in this respect.
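As an illustration, restoring the u.hbovik volume onto a server's /vicepa partition might look like the following (the dump file name is hypothetical):
* volutil restore /tmp/u.hbovik.dump /vicepa u.hbovik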
Read-only replication of a volume requires more effort on the part of the system administrator. However, it greatly increases the availability of volumes which cannot be read-write replicated. The most important example of such a volume is the root of the Coda file system. Conflicting updates cannot be allowed to occur at the root volume, since this would make the entire Coda file system inaccessible. However, if the root volume is not replicated, the availability of the entire Coda file system depends upon the availability of the server acting as the custodian for this one volume. For these reasons, we highly recommend making the root volume of the Coda file system read-only replicated. We provide an extended example here to show you exactly how to go about the replication and distribution process. Note that the example shows how to make the root read-only replicated; details pertaining to the Coda root volume can be ignored when making other read-only replicated volumes (such as subtrees containing standard binaries).
We assume that you have a non-replicated root volume called coda.root. If you are installing a new system, you can access this volume as /coda by using a Coda client. This volume should look exactly how you want the read-only replica to look; if it doesn't, make any changes now. Also note that any volumes that you want mounted within the read-only replicas should be mounted before continuing. Our root volume has three read-write replicated volumes mounted: the usr volume contains the home directories of our users, the project volume contains project directories, and tmp contains temporary files. In addition, we have one subdirectory called nonrep which has non-replicated volumes (one per server) mounted within it. The purpose of these non-replicated volumes is to provide users with a location to perform repairs of conflicting objects. (Although the need for such a directory may not be clear at this point, we highly recommend providing one.)
If you already have a read-only replicated root volume but want to update it, you should mount the read-write version of the root volume elsewhere and make your changes to that volume. Once you have made your changes, you will need to purge the old read-only replicas of your root volume using the volutil(8) command. Be sure to purge the replica on each server. Then edit the VolumeList file in /vice/vol and remove the entry for the read-only replicated root volume. (The name of the read-only replica will probably be coda.root.readonly.)
On the SCM, you need to clone the read-write copy of the root volume. You can use the command
* volutil clone <VolumeId>
This command will create a read-only volume with the name coda.root.readonly (assuming that your root volume is called coda.root). Next, you will need to dump this cloned volume to a file with the command
* volutil dump <VolumeId> <filename>
Now, copy this file to each of the servers which will have read-only replicas of the root volume and execute the command
* volutil restore <filename> <partition> [<volname> [<volid>]]
Note that the root volume currently must reside on /vicepa. Read-only replicated volumes must share the same volume id and name, so take care to specify these correctly when restoring to more than one server. The final step is to build the VLDB by running the command
* bldvldb.sh
on the SCM and to make sure that the file /vice/ROOTVOLUME contains the name or volume id of the root volume (coda.root.readonly). (It may also be necessary to restart venus on the clients.)
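Putting the steps together, a session replicating the root volume might look like the following sketch. The volume ids and dump file name are invented for this example; substitute the values reported by volutil on your system:
* volutil clone 0x1000001
* volutil dump 0x1000002 /tmp/coda.root.ro.dump
(copy the dump file to each server that will hold a replica, then on each such server:)
* volutil restore /tmp/coda.root.ro.dump /vicepa coda.root.readonly 0x1000002
* bldvldb.sh
Here 0x1000001 stands for the id of the read-write root volume and 0x1000002 for the id assigned to the coda.root.readonly clone; the final bldvldb.sh is run on the SCM.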
The volume location data base, or VLDB, is used to provide volume addressing information to workstations. Copies of the VLDB reside on the servers and are updated periodically. The VLDB lists the latest known location or locations of all on-line volumes in the system.
The VLDB is maintained on the SCM. When you wish to update it, run the /vice/bin/bldvldb.sh(8) script on the SCM. The script gathers a copy of the /vice/vol/VolumeList file from each server, merges them into a single list, and builds a new VLDB. The UpdateMon program then propagates the new VLDB to all the servers. Note that the createvol and purgevol scripts automatically invoke bldvldb.sh.
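For example, after manually editing /vice/vol/VolumeList (as in the read-only root volume procedure above), the VLDB can be rebuilt and propagated by hand:
* /vice/bin/bldvldb.sh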
The volume replication data base, the VRDB, is used to provide information about replicated volumes to client workstations. Copies of the VRDB reside on all servers and are updated periodically. The VRDB maps each logical volume to its corresponding set of physical volumes.
A human readable version of the VRDB is maintained on the SCM in the file /vice/vol/VRList. The makevrdb option to the volutil(8) command will create a new VRDB, which will automatically be distributed to the other servers.
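A sketch of the invocation, assuming that makevrdb takes the VRList file as its argument:
* volutil makevrdb /vice/vol/VRList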
The Volume Storage Group Data Base, VSGDB, is currently maintained by hand. Each valid volume storage group has an entry in this data base containing an identification number and the names of the servers in the group.
Coda servers ensure file system consistency after a crash by running fsck(8), recovering RVM, and running the Coda salvager. The fsck used here at CMU has been modified so that it does not require every inode to be referenced by a directory, since Coda accesses inodes directly. Warning: the vanilla fsck must not be used on a Coda file system partition, as the Coda files will be thrown away. After the server machine is booted, the codasrv process starts and RVM recovers the server's committed state. The Coda salvager then reconciles the results of fsck and RVM recovery.
The cfs(1) command provides information on volumes. cfs can only be used on a machine which has a running venus (such as a client workstation). cfs is described in Chapter XXX as well as in the manual page contained in Appendix XXX.
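For instance, the listvol (lv) option of cfs reports the status of the volume containing a given path. A sketch, using the mount point from the earlier examples:
* cfs lv /coda/usr/hbovik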