Coda File System


1. Coda's ingredients

1.1 What is Coda?

Coda is a distributed file system: it makes files available to a collection of client computers as part of their directory tree, while the authoritative copy of the file data is maintained on servers. Several features make Coda stand out. It supports disconnected operation, i.e. full access to a cached section of the file space during voluntary or involuntary network or server outages, and it automatically reintegrates the changes made on disconnected clients when they reconnect. Coda also has read/write, failover server replication: data is stored on and fetched from any of a group of servers, and Coda continues to operate when only a subset of the servers is available. If server differences arise due to network partitions, Coda resolves them automatically to the maximum extent possible and aids users in repairing what cannot be resolved automatically. Coda is organized very differently from NFS and Windows/Samba shares, but it has many similarities to AFS and DCE/DFS.
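The replication behaviour described above can be pictured with a small toy model. This is only a sketch under simplifying assumptions: the Server class and the store and fetch functions are hypothetical names invented for illustration, and nothing here reflects Coda's actual protocol or data structures.

```python
class Server:
    """Hypothetical stand-in for one replica in a group of servers."""

    def __init__(self, name):
        self.name = name
        self.up = True      # whether the server is currently reachable
        self.files = {}     # path -> data held by this replica


def store(servers, path, data):
    """Write to every reachable replica; fail only if none is reachable."""
    reached = [s for s in servers if s.up]
    if not reached:
        raise ConnectionError("no server in the group is reachable")
    for s in reached:
        s.files[path] = data
    return [s.name for s in reached]


def fetch(servers, path):
    """Read from the first reachable replica that holds the file."""
    for s in servers:
        if s.up and path in s.files:
            return s.files[path]
    raise LookupError("no reachable server holds " + path)


a, b = Server("a"), Server("b")
store([a, b], "/coda/doc.txt", "v1")
a.up = False                           # simulate an outage of one replica
print(fetch([a, b], "/coda/doc.txt"))  # still served by the other replica
```

If a write happens while one replica is down, the replicas diverge; in this toy model nothing repairs that, whereas the text above says Coda resolves such differences itself where it can.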

1.2 Getting clued in with the Coda terminology

A single name space

All of Coda appears under a single directory /coda on the client (or under a single drive under Windows). Unlike NFS and Samba, Coda does not have separate exports or shares that are mounted individually. Under /coda, the volumes (also known as file sets) exported by all the servers in your Coda cell are visible. Coda finds servers automatically: all a client needs to know is the name of one bootstrap server, which tells it how to find the root volume of Coda.

A Coda cell

is a group of servers sharing one set of configuration databases. A cell can consist of a single server or up to hundreds of servers. One server is designated as the SCM, the System Control Machine. It is distinguished by being the only server that modifies the configuration databases shared by all servers and propagates such changes to the other servers. At present a Coda client can belong to only a single cell. We hope to add a cell mechanism to Coda whereby a client can see files in multiple cells.

Coda volumes:

File servers group files into volumes. A volume is typically much smaller than a partition and much larger than a directory. Volumes have a root and contain a directory tree with files. Each volume is "Coda mounted" somewhere under /coda and forms a subtree of /coda. Volumes can contain mountpoints of other volumes. A volume mountpoint is not a Unix mountpoint or Windows drive: there is only one drive or Unix mountpoint for all of Coda. A Coda mountpoint contains enough information for the client to find the server(s) that store the files in the volume. The group of servers serving a volume is called the Volume Storage Group of the volume.

Volume Mountpoints

One volume is special: the root volume, which Coda mounts on /coda. Other volumes are grafted into the /coda tree using cfs mkmount. This command installs a volume mountpoint in the Coda directory tree, and in effect its result is similar to mkdir mountpoint ; mount device mountpoint under Unix. When invoking cfs mkmount, the two arguments given are the name of the mountpoint and the name of the volume to be mounted. Coda mountpoints are persistent objects, unlike Unix mountpoints, which need reinstating after a reboot.
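To make the idea of volumes grafted into a single tree concrete, here is a minimal sketch of how a path under /coda could be mapped to a volume by following mountpoints. The volume names (coda.root, users, u.alice), the MOUNTPOINTS table and the resolve function are all hypothetical and chosen for illustration; the real client keeps this information in its own structures.

```python
# Toy model only: names and layout are hypothetical, not Coda's real structures.
ROOT_VOLUME = "coda.root"

# (volume, path inside that volume) -> name of the volume mounted there
MOUNTPOINTS = {
    ("coda.root", "usr"): "users",
    ("users", "alice"): "u.alice",
}


def resolve(path):
    """Return (volume, path inside that volume) for a path under /coda."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "coda":
        raise ValueError("not a Coda path: " + path)
    volume, inside = ROOT_VOLUME, []
    for name in parts[1:]:
        inside.append(name)
        mounted = MOUNTPOINTS.get((volume, "/".join(inside)))
        if mounted is not None:
            # Crossed a volume mountpoint: continue at the root of that volume.
            volume, inside = mounted, []
    return volume, "/".join(inside)


print(resolve("/coda/usr/alice/notes.txt"))
```

In this model a mountpoint is just an entry in a persistent table, which mirrors the point that Coda mountpoints survive reboots, unlike Unix mounts.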

Data storage

Unlike NFS and Samba, Coda servers do not store and export volumes as directories in the local disk filesystem. Coda needs much more meta data to support server replication and disconnected operation, and it has complex recovery which is hard to do within a local disk filesystem. Coda servers store files identified by a number, typically all under a directory /vicepa. The meta data (owners, access control lists, version vectors) and directory contents are stored in an RVM data file, which would often be a raw disk partition.

RVM

stands for Recoverable Virtual Memory. RVM is a transaction-based library that makes part of the virtual address space of a process persistent on disk and commits changes to this memory atomically to persistent storage. Coda uses RVM to manage its metadata. This data is stored in an RVM data file which is mapped into memory upon startup. Modifications are made in VM and are also written to the RVM LOG file when a transaction is committed. The LOG file contains committed data that has not yet been incorporated into the data file on disk.
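The relationship between the in-memory image, the LOG file and the data file can be illustrated with a toy model. This is a sketch, not the RVM library's API: the ToyRVM class and its method names are hypothetical, everything lives in Python dictionaries rather than on disk, and as a simplification changes are buffered until commit instead of being made in memory first.

```python
class ToyRVM:
    """Hypothetical in-memory model of a data file plus a commit log."""

    def __init__(self):
        self.data_file = {}   # stands in for the RVM data file on disk
        self.log = []         # stands in for the RVM LOG file
        self.memory = {}      # image mapped from the data file at startup
        self._pending = None  # changes of the transaction in progress

    def begin(self):
        self._pending = {}

    def set(self, key, value):
        if self._pending is None:
            raise RuntimeError("no transaction in progress")
        self._pending[key] = value

    def commit(self):
        # All changes of the transaction become one log record, atomically.
        self.log.append(dict(self._pending))
        self.memory.update(self._pending)
        self._pending = None

    def abort(self):
        self._pending = None

    def truncate_log(self):
        # Incorporate committed records into the data file, then empty the log.
        for record in self.log:
            self.data_file.update(record)
        self.log.clear()

    def recover(self):
        # After a crash: reload the data file and replay committed records.
        self.memory = dict(self.data_file)
        for record in self.log:
            self.memory.update(record)


rvm = ToyRVM()
rvm.begin()
rvm.set("owner", "alice")
rvm.commit()
print(rvm.log, rvm.data_file)  # committed data is in the log, not yet in the data file
```

The point of the sketch is the last sentence of the paragraph above: between a commit and the next log truncation, the committed state exists only as the data file plus the log.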

Client data

is stored somewhat similarly: meta data in RVM (typically in /usr/coda/DATA ) and cached files are stored by number under /usr/coda/venus.cache . The cache on a client is persistent. This cache contains copies of files on the server. The cache allows for quicker access to data for the client and allows for access to files when the client is not connected to the server.

Validation

When Coda detects that a server is reachable again it will validate cached data before using it to make sure the cached data is the latest version of the file. Coda compares cached version stamps associated with each object, with version stamps held by the server.
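The comparison of version stamps can be sketched as follows. The validate_cache function and the use of plain integers as stamps are assumptions made for illustration; the text above does not specify the format of the stamps Coda actually uses.

```python
def validate_cache(cached_stamps, server_stamps):
    """Split cached objects into those still valid and those that are stale.

    Both arguments map an object identifier to a version stamp. An object
    missing on the server is treated as stale.
    """
    valid, stale = [], []
    for obj_id, stamp in cached_stamps.items():
        if server_stamps.get(obj_id) == stamp:
            valid.append(obj_id)
        else:
            stale.append(obj_id)
    return sorted(valid), sorted(stale)


cached = {"file1": 3, "file2": 7}
server = {"file1": 3, "file2": 8}
print(validate_cache(cached, server))
```

Only the objects reported as stale would need to be fetched again; the rest of the cache can be used as it is.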

Authentication

Coda manages authentication and authorization through a token. As with a Windows share (although the details are very different), Coda requires users to log in. During the login process, the client acquires a session key, or token, in exchange for a correct password. The token is associated with a user identity; at present this Coda identity is the uid of the user performing the login.

Protection

To grant permissions, the cache manager and servers take the token with its associated identity and match it against the privileges granted to this identity in access control lists (ACLs). If no token is present, anonymous access is assumed, for which permissions are again granted through the access control lists, using the System:AnyUser identity.
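A minimal sketch of this lookup, assuming an ACL is a mapping from identity to a set of rights. The rights_for function, the token dictionary and the right names "read" and "write" are hypothetical placeholders; the sketch models only the rule stated above and ignores group membership.

```python
ANY_USER = "System:AnyUser"


def rights_for(token, acl):
    """Return the rights an ACL grants to the holder of a token.

    token is None for anonymous access, otherwise a dict with an
    "identity" entry. acl maps identities to sets of rights.
    """
    identity = token["identity"] if token is not None else ANY_USER
    return acl.get(identity, set())


acl = {"alice": {"read", "write"}, ANY_USER: {"read"}}
print(rights_for({"identity": "alice"}, acl))
print(rights_for(None, acl))
```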

1.3 Organization of the client

The kernel module and the cache manager

As with every filesystem, a computer that uses the Coda filesystem needs kernel support to access Coda files. Coda's kernel support is minimal and works in conjunction with the userspace cache manager Venus. User requests enter the kernel, which will either reply directly or ask the cache manager Venus to assist in servicing the request.

Typically the kernel code is in a kernel module, which is either loaded at boot time or dynamically loaded when Venus is started. Venus will even mount the Coda filesystem on /coda .

Utilities

To manipulate ACLs, the cache, volume mountpoints and possibly the network behaviour of a Coda client, a variety of small utilities is provided. The most important one is the cfs command.

There is also a clog program to authenticate to the Coda authentication server. The codacon program allows one to monitor the operation of the cache manager, and the cmon program gives summary information about a list of servers.

1.4 The server organization

The main program is the Coda fileserver codasrv . It is responsible for doing all file operations, as well as volume location service.

The Coda authentication server auth2 handles requests from clog for tokens, and changes of password from au and cpasswd. Only the auth2 process on the SCM will modify the password database.

All servers in a Coda cell share the configuration databases in /vice/db and retrieve them from the SCM when changes have occurred. The updateclnt program is responsible for retrieving such changes, and it polls the updatesrv on the SCM to see if anything has changed. Sometimes the SCM needs a (non-shared) database from another server to update a shared database. It fetches this through an updatesrv process on that server using updatefetch .
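The pull-style propagation described above can be sketched as a single polling step. This is not the updateclnt implementation: the poll_once function, the version numbers and the database names db_a and db_b are hypothetical, and the real programs talk over the network rather than sharing dictionaries.

```python
def poll_once(scm_db, local_db):
    """Copy every database whose version on the SCM differs from the local one.

    Both arguments map a database name to a (version, content) pair.
    Returns the names of the databases that were fetched.
    """
    fetched = []
    for name, (version, content) in scm_db.items():
        if local_db.get(name, (None, None))[0] != version:
            local_db[name] = (version, content)
            fetched.append(name)
    return sorted(fetched)


scm = {"db_a": (2, "new contents"), "db_b": (1, "unchanged")}
local = {"db_a": (1, "old contents"), "db_b": (1, "unchanged")}
print(poll_once(scm, local))   # only the changed database is fetched
```

A second call right after the first finds nothing to do, which is the steady state when the SCM has made no further changes.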

Utilities

On the server there are utilities for volume creation and management. These utilities consist of shell scripts and the volutil command. There is also a tool to manipulate the protection databases.

