Coda File System

multilevel backups (and an introduction to moving volumes)

From: Peter J. Braam <braam_at_cs.cmu.edu>
Date: Tue, 17 Feb 1998 01:16:15 -0500 (EST)
Eli,

I have given quite a detailed reply attached below.  I have cc'd other
interested people and the Coda list.  You can expect some mail, but on the
whole I think there is enough clearly needed work here to get going.

Peter

> >One excellent project that crosses my mind, and which is quite doable in
> >the time allocated is to modify the backup system to do multilevel
> >backups.  At the moment it can do full backups or incrementals from the
> >full backup; we would want a multilevel backup scheme.
> >
> >Dumping on Coda is done in two stages.  First a copy on write clone of the
> >volume is made.  The clone is read only, has a fresh copy of metadata, but
> >diskfiles are only cloned when the originals are modified.  Then the clone
> >is dumped.  Currently, the dump routines take a parameter to state if it
> >is incremental.  If it is incremental the versions of the files are
> >compared against a list of version vectors stored after the previous dump
> >was made.  The modified files and their metadata are dumped and stored on
> >tape.  
> >
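As a toy illustration of the copy-on-write clone (invented names such as
ToyVnode and ToyVolume, not Coda's actual data structures): the clone gets
its own copy of the metadata, while the file data stays shared until the
live volume writes to it.

    #include <map>
    #include <memory>
    #include <string>

    // Toy model: a "vnode" carries metadata (a single version number here,
    // a version vector in real Coda) plus a pointer to its data container.
    struct ToyVnode {
        long version = 0;
        std::shared_ptr<std::string> data;   // shared => copy-on-write
    };

    struct ToyVolume {
        std::map<std::string, ToyVnode> vnodes;   // name -> vnode
    };

    // Cloning copies the metadata but shares the data containers, so it is
    // cheap even for a large volume.
    ToyVolume clone_volume(const ToyVolume& v) {
        return v;   // the map copy duplicates metadata; the shared_ptrs
                    // still point at the same data, so no file is copied
    }

    // A write to the live volume installs a private copy of the data, so
    // the (read-only) clone keeps seeing yesterday's contents.
    void write_file(ToyVolume& live, const std::string& name,
                    const std::string& contents) {
        ToyVnode& vn = live.vnodes[name];
        vn.data = std::make_shared<std::string>(contents);
        ++vn.version;
    }

The dump then walks the clone at its leisure while the live volume keeps
changing.
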
> >The clone, by the way, is incredibly nice: we mount it (it's read-only)
> >under users' home directories as "old files". So a user always has a
> >copy of yesterday's files ready.  The dumps can be merged, and then
> >restored into a read-only volume, which is then mounted -- this would
> >happen if yesterday's clone no longer contained the lost files. Users
> >fish up their own files when lost; the sysadmin just merges, restores and
> >mounts. 
> >
> >We would want levels for the incrementals (like in Unix dump), and more
> >lists of version vectors to be maintained. Similarly the merging of
> >incremental dumps into a full dump needs a bit of modification.
> >
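For the multilevel scheme the natural rule is presumably the Unix dump one:
a level-N dump records everything changed since the most recent dump at a
level strictly below N, which is why more saved lists of version vectors
(roughly one per level) would need to be kept.  A sketch of picking the
reference dump, with made-up names (DumpRecord, reference_for):

    #include <ctime>
    #include <optional>
    #include <vector>

    // Hypothetical record of a past dump: its level and when it was taken.
    // In Coda the "when" would really be the list of version vectors
    // written out after that dump.
    struct DumpRecord {
        int         level;
        std::time_t taken_at;
    };

    // Unix-dump-style rule: a level-N dump is taken relative to the most
    // recent earlier dump whose level is strictly lower than N.  A level-0
    // dump has no reference and is therefore full.
    std::optional<DumpRecord>
    reference_for(const std::vector<DumpRecord>& history, int level) {
        std::optional<DumpRecord> ref;
        for (const DumpRecord& d : history) {
            if (d.level < level && (!ref || d.taken_at > ref->taken_at))
                ref = d;
        }
        return ref;
    }
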
> >This project would really have enormous practical value and put you in
> >touch with the core of the fileserver vnode data structures -- at the same
> >time, one can probably get away with fairly little coding after really
> >understanding the issue.
> >
> >There is an important aspect to this project which makes it very suited
> >for an MSc project: we need a detailed argument (a couple of pages) to
> >'prove' that the restoration of files is always possible.  Backups are
> >confusing with files being removed and identically named files being
> >recreated -- and I just want to make sure that I understand exactly what
> >state can be restored by merging a few incremental backups onto a full
> >one. 
> >
> >If this sounds interesting I will get you more details.  Do you have
> >(root) access to a fast Pentium machine (P90 or higher with at least 32M
> >of RAM)?  Otherwise I'll get you accounts here; you need a "private"
> >server to play with, and a fast machine, since you'll be recompiling the
> >server quite a few times. 
> >
> >- Peter J. Braam -
> >Coda Project, SCS, CMU
> >
> >ps: Bradley: this cannot be done on your server; it needs to be another
> >machine -- we have plenty, but obviously it would be easier for Eli to
> >work on a local machine.
> >
> 
> Peter, 
> 
> This does sound interesting.  Can you tell me some more about the issue of
> "proving" that restoration is possible?  I'm not sure I followed your brief
> explanation above.
> 
> I do have my own machine that I could use.  It's a 6x86 PR-166+ with 48 MB
> of RAM... the only issue is disk space.  My Linux partition currently has
> only about 20 or 30 MB left (after compiling the Coda client and server);
> I've got quite a lot of space left on a Windows partition, so I could
> resize things to get more space if I needed to.
> 
> Thanks,
> 
> -Eli Daniel
> --
>        o
>          o
> \  /\ o
>  \/ O\ o
>  /\  /
> /  \/
> 


Space will be too tight I think.  Ask Bradley if he can NFS export a large
home directory for you from his server.  Use that to build Coda (as user
prophet or something) and install and run Coda on your own disk, since you
need root privileges.  A total of 100MB available for Coda binaries, some
small volumes, and the client cache (all on one machine) will just do.

After moving your build environment out of the way, I would install a Coda
server and client -- using the rpms (manual setup is pretty cumbersome). 
This is absolutely step one, since you need a server to do the dumps, and
test your new code. Let me know when you have done this successfully; then
I'll give you a run-through of the "volutil dump" command.

What I meant by the theoretical aspect is the following.  You are doing
a "dump", which is a snapshot of a volume at a certain time.  (I believe
that Coda takes an _exact_ snapshot by locking the volume -- most backup
utilities allow file system modifications during backup.  The reason Coda
can do this is that it uses copy on write files and directories, so the
lock doesn't need to be held for very long.)

Then you take an incremental dump, and continue to increase the level of
the increments up to 3 or so.  I would like to see the design and
implementation of "dump" and "merge" explained/done in such a way that a
statement of the following kind is true:

- take a full dump on date A
- take an incremental dump relative to the date A dump, on date B
- take an incremental dump relative to the date B dump, on date C
(clearly C > B > A)

- merge the three dumps
-----> desirable result: the _exact_ state of the filesystem on date C.  Exact
means that both the file data and the metadata are identical.

This means dump must keep precise track of creations/deletions and
renames.
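
To see why the dump format has to record deletions explicitly, here is a
toy merge (invented types again): if an incremental only carried the vnodes
that changed, replaying it over the full dump could never tell "unchanged
since the reference dump" apart from "removed since the reference dump",
and a deleted file would silently reappear.

    #include <map>
    #include <set>
    #include <string>

    // Toy dump: changed vnodes (id -> contents) plus the ids removed since
    // the reference dump.
    struct ToyDump {
        std::map<int, std::string> changed;   // created or modified vnodes
        std::set<int>              removed;   // deleted since the reference
    };

    // Merging: start from the older state, apply the changes, then drop the
    // vnodes recorded as removed.  Applied in order (full, incremental B,
    // incremental C) this reproduces the exact state at date C.
    std::map<int, std::string>
    merge(std::map<int, std::string> state, const ToyDump& incr) {
        for (const auto& entry : incr.changed)
            state[entry.first] = entry.second;
        for (int id : incr.removed)
            state.erase(id);
        return state;
    }

A rename is a similar trap: if it were dumped as a delete plus a create,
the bytes would survive but the vnode identity presumably would not, which
is exactly the sort of difference that matters for the replication case
below.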

Perhaps this is obvious for Coda dumps, which would be great.  Unix dump,
as well as almost any other backup system, fails this criterion miserably --
unless the backup is done on a quiet system. 

The reason it is so important to me to get TOTAL detail about the dumps is
the following.  We would want to use the dumps to move a volume from one
server to another.  If this volume has only one replica, then it wouldn't
matter very much if the dump and restore changed a few things a little bit
(for example, if a rename were treated as a create/delete).

However, Coda has replication, and in order for dumps to become moveable
from one server to another, and continue to function as replicas of a
replicated volume, it is very important that EVERYTHING is restored
exactly right -- otherwise any access to a difference would trigger a
resolution to bring the replicas back in sync, and it is not clear that
this resolution could succeed. 

(There is another way in which we could move replicas "safely": by
convincing the resolution system to take care of it.  This might be more
natural, but it would also require great care to get it right.)

Moving volumes works as follows: lock the volume - make a copy on write
clone. Unlock the volume (i.e. it can be used again).  Dump and move the
clone to a new server. Note that the volume can be large, so this can take
a while - but the volume is available again. Now lock the volume again,
and make an incremental dump; keep the volume locked. Move the
incremental, and merge.  Note that this is supposed to be quick, since
even for a large volume probably not much changes. Bring the new volume
online and disable the old volume.
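
In outline, and with placeholder function names rather than the real
volutil/server entry points, the procedure looks like this:

    #include <cstdio>

    // Stubs standing in for the real operations; they only log, so that
    // the sketch compiles and shows the ordering of the steps.
    static void lock_volume(int v)      { std::printf("lock %d\n", v); }
    static void unlock_volume(int v)    { std::printf("unlock %d\n", v); }
    static int  clone_volume(int v)     { std::printf("clone %d\n", v);
                                          return v + 1000; }
    static void dump_and_ship(int v, bool incremental)
                                        { std::printf("dump %d incr=%d\n",
                                                      v, incremental); }
    static void merge_on_target(int v)  { std::printf("merge %d\n", v); }
    static void bring_online_on_target(int v)
                                        { std::printf("online %d\n", v); }
    static void retire_old_replica(int v)
                                        { std::printf("retire %d\n", v); }

    void move_volume(int volid) {
        // Phase 1: cheap snapshot, then bulk transfer while the volume
        // stays in service.
        lock_volume(volid);
        int clone = clone_volume(volid);       // copy-on-write, fast
        unlock_volume(volid);                  // users see no long outage
        dump_and_ship(clone, false);           // full dump; can take a while

        // Phase 2: short outage to catch up and switch over.
        lock_volume(volid);                    // held until the switch
        dump_and_ship(volid, true);            // incremental; little changed
        merge_on_target(volid);
        bring_online_on_target(volid);
        retire_old_replica(volid);             // old copy disabled
    }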

If you get hooked on Coda, you could do the moving of volumes after you
have finished the backup work. 

The key piece of source code in Coda to start studying is 
coda-src/volutil/vol-dump.cc, vol-backup.cc, backup.cc, etc.
The routine DumpVnodeIndex contains the crux.  It compares Coda version
vectors to determine if an incremental is needed.  It uses lists of
version vectors dumped by the server in /vice/backup.
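
Per vnode the decision is roughly the comparison below (simplified,
invented types -- the real code walks the server's vnode arrays and
compares its own version-vector type):

    #include <array>
    #include <map>

    // Simplified stand-ins: a version vector is one counter per replica,
    // and the list kept in /vice/backup maps vnode number -> the vector it
    // had when the reference dump was taken.
    using VersionVector = std::array<long, 8>;
    using AncientList   = std::map<long, VersionVector>;

    // A vnode goes into the incremental dump if it did not exist at the
    // time of the reference dump, or if its version vector has moved since.
    bool needs_dump(long vnodeNumber,
                    const VersionVector& current,
                    const AncientList& previous) {
        auto it = previous.find(vnodeNumber);
        if (it == previous.end()) return true;   // newly created vnode
        return it->second != current;            // modified since that dump
    }

A test like this cannot by itself notice deletions, which is the same issue
as in the merge sketch above.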

If you send me a mail address I can send you some documentation I wrote on
the backup internals, which you may find helpful. If I remember correctly
I did describe the format of the dump file of version vectors.

When are you starting?

- Peter -
Received on 1998-02-17 01:22:15