> There are a number of bugs in Coda that can lead to corrupt data in
> Coda. They are unlikely to crop up in strongly connected,
> non-replicated server setups, but when using re-integration,
> resolution and repair there is a bug in the transaction handling
> that can write stale buffer cache data to the disk.

I've never much trusted fault-tolerant implementations designed as an "everything working" code path and a "during failure" code path, because the "during failure" code doesn't get tested enough. The canonical example is a fault-tolerant NFS server implemented as two workstations connected by a private network and some specialized identity-takeover code. If the vendor can't keep a workstation up long enough to be a reliable NFS server, why should I trust them to get the failover code right? Failover is a harder problem.

Is there some way that the re-integration code in Coda could be exercised in normal use? My favorite example of this is the reliable multicast protocol Isis, where the complex parts of the failover checks are exercised during normal reception. When something really does fail, not much changes in the execution profile. (See the first sketch below.)

Also, could a 'crashme' suite be written for Coda? It could generate ill-formed Coda traffic and insert randomly changing delays into the code to exercise timing relationships and race conditions. (See the second sketch below.)

Another member of the League for Programming Freedom (LPF) www.lpf.org
-------------------------------------------------------------------------------
Brian Bartholomew - bb_at_wv.com - www.wv.com - Working Version, Cambridge, MA

Received on 1997-12-10 17:22:21
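A minimal sketch of the first idea, in C: treat the connected case as a zero-length disconnection, so every write goes through the client modify log and the reintegration routine runs after every operation rather than only after a failure. All of the names here (do_write, reintegrate, log_buf) are hypothetical stand-ins for illustration, not real Coda interfaces.

/*
 * Sketch: one code path for both normal and failure operation.
 * Every write is appended to a local log and then reintegrated,
 * even while the "server" is reachable, so the reintegration
 * logic is exercised constantly instead of only after an outage.
 */
#include <stdio.h>
#include <string.h>

#define LOG_MAX 16

struct update { char data[32]; };

static struct update log_buf[LOG_MAX];  /* client-side modify log */
static int log_len = 0;
static char server_state[LOG_MAX][32];  /* stand-in for the server copy */
static int server_len = 0;
static int connected = 1;               /* toggled by simulated failures */

/* Always append to the local log first, connected or not. */
static void do_write(const char *data)
{
    if (log_len >= LOG_MAX)
        return;                         /* sketch only: drop when full */
    snprintf(log_buf[log_len].data, sizeof log_buf[log_len].data, "%s", data);
    log_len++;
}

/* Replay the log against the server. This runs after EVERY
 * operation when connected, so it is also the recovery path. */
static void reintegrate(void)
{
    if (!connected)
        return;                         /* log keeps growing while down */
    while (log_len > 0) {
        memcpy(server_state[server_len], log_buf[0].data,
               sizeof server_state[server_len]);
        server_len++;
        log_len--;
        memmove(&log_buf[0], &log_buf[1], log_len * sizeof log_buf[0]);
    }
}

int main(void)
{
    do_write("a"); reintegrate();       /* normal path == recovery path */
    connected = 0;                      /* simulated failure */
    do_write("b"); reintegrate();       /* queues, does not apply */
    do_write("c"); reintegrate();
    connected = 1;                      /* reconnect */
    reintegrate();                      /* same code replays the backlog */
    for (int i = 0; i < server_len; i++)
        printf("server[%d] = %s\n", i, server_state[i]);
    return 0;
}

The point of the design is that the replay loop in reintegrate() is not cold "during failure" code: it has already run thousands of times, with a log of length one, before the first real disconnection ever happens.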
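And a minimal sketch of a 'crashme' harness, again with a hypothetical handle_packet() standing in for the real RPC dispatch entry point: it throws randomly sized, randomly filled packets at the handler, with randomly varying delays between them to shake out timing assumptions.

/*
 * Sketch of a 'crashme' harness: feed randomly corrupted packets
 * to the dispatch routine and vary the pacing between requests.
 * handle_packet() is a hypothetical stand-in; a real harness would
 * call into the server's actual packet handler.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define PKT_MAX 512

static void handle_packet(const unsigned char *pkt, size_t len)
{
    (void)pkt; (void)len;   /* parsing and dispatch would happen here */
}

int main(void)
{
    srandom((unsigned)time(NULL));

    for (int i = 0; i < 10000; i++) {
        unsigned char pkt[PKT_MAX];
        size_t len = (size_t)(random() % PKT_MAX) + 1;

        /* Ill-formed traffic: random length, random contents. */
        for (size_t j = 0; j < len; j++)
            pkt[j] = (unsigned char)(random() & 0xff);

        /* Randomly changing delay (0-10 ms) to perturb the timing
         * relationships between requests and expose races. */
        usleep((useconds_t)(random() % 10000));

        handle_packet(pkt, len);   /* a crash here is a bug found */
    }
    puts("survived 10000 ill-formed packets");
    return 0;
}

Any crash, hang, or corruption the harness provokes is a bug the "during failure" path would otherwise have met for the first time in production.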