> There are a number of bugs in Coda that can lead to corrupt data in
> Coda. They are unlikely to crop up in strongly connected,
> non-replicated server setups, but when using re-integration,
> resolution and repair there is a bug in the transaction handling
> that can write stale buffer cache data to the disk.

I've never much trusted fault-tolerant implementations designed as an "everything working" code path and a "during failure" code path, because the "during failure" code doesn't get tested enough. The canonical example is a fault-tolerant NFS server implemented as two workstations connected by a private network and some specialized identity-takeover code. If the vendor can't keep a workstation up long enough to be a reliable NFS server, why should I trust them to get the failover code right? Failover is a harder problem.

Is there some way that the re-integration code in Coda could be exercised in normal use? My favorite example of this is the reliable multicast protocol Isis, where the complex parts of the failover checks are exercised during normal reception. When something really does fail, not much changes in the execution profile. (See the first sketch below.)

Also, could a 'crashme' suite be written for Coda? It could generate ill-formed Coda traffic and insert randomly changing delays into the code to exercise timing relationships and race conditions. (See the second sketch below.)

Another member of the League for Programming Freedom (LPF) www.lpf.org
-------------------------------------------------------------------------------
Brian Bartholomew - bb_at_wv.com - www.wv.com - Working Version, Cambridge, MA

Received on 1997-12-10 17:22:21
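A minimal sketch of the first idea, in C: treat the connected case as a zero-length disconnection, so every write goes through the client modify log and the reintegration routine runs after every operation rather than only after a failure. All of the names here (do_write, reintegrate, log_buf) are hypothetical stand-ins for illustration, not real Coda interfaces.

/*
 * Sketch: one code path for both normal and failure operation.
 * Every write is appended to a local log and then reintegrated,
 * even while the "server" is reachable, so the reintegration
 * logic is exercised constantly instead of only after an outage.
 */
#include <stdio.h>
#include <string.h>

#define LOG_MAX 16

struct update { char data[32]; };

static struct update log_buf[LOG_MAX];  /* client-side modify log */
static int log_len = 0;
static char server_state[LOG_MAX][32];  /* stand-in for the server copy */
static int server_len = 0;
static int connected = 1;               /* toggled by simulated failures */

/* Always append to the local log first, connected or not. */
static void do_write(const char *data)
{
    if (log_len >= LOG_MAX)
        return;                         /* sketch only: drop when full */
    snprintf(log_buf[log_len].data, sizeof log_buf[log_len].data, "%s", data);
    log_len++;
}

/* Replay the log against the server. This runs after EVERY
 * operation when connected, so it is also the recovery path. */
static void reintegrate(void)
{
    if (!connected)
        return;                         /* log keeps growing while down */
    while (log_len > 0) {
        memcpy(server_state[server_len], log_buf[0].data,
               sizeof server_state[server_len]);
        server_len++;
        log_len--;
        memmove(&log_buf[0], &log_buf[1], log_len * sizeof log_buf[0]);
    }
}

int main(void)
{
    do_write("a"); reintegrate();       /* normal path == recovery path */
    connected = 0;                      /* simulated failure */
    do_write("b"); reintegrate();       /* queues, does not apply */
    do_write("c"); reintegrate();
    connected = 1;                      /* reconnect */
    reintegrate();                      /* same code replays the backlog */
    for (int i = 0; i < server_len; i++)
        printf("server[%d] = %s\n", i, server_state[i]);
    return 0;
}

The point of the design is that the replay loop in reintegrate() is not cold "during failure" code: it has already run thousands of times, with a log of length one, before the first real disconnection ever happens.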
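And a minimal sketch of a 'crashme' harness, again with a hypothetical handle_packet() standing in for the real RPC dispatch entry point: it throws randomly sized, randomly filled packets at the handler, with randomly varying delays between them to shake out timing assumptions.

/*
 * Sketch of a 'crashme' harness: feed randomly corrupted packets
 * to the dispatch routine and vary the pacing between requests.
 * handle_packet() is a hypothetical stand-in; a real harness would
 * call into the server's actual packet handler.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define PKT_MAX 512

static void handle_packet(const unsigned char *pkt, size_t len)
{
    (void)pkt; (void)len;   /* parsing and dispatch would happen here */
}

int main(void)
{
    srandom((unsigned)time(NULL));

    for (int i = 0; i < 10000; i++) {
        unsigned char pkt[PKT_MAX];
        size_t len = (size_t)(random() % PKT_MAX) + 1;

        /* Ill-formed traffic: random length, random contents. */
        for (size_t j = 0; j < len; j++)
            pkt[j] = (unsigned char)(random() & 0xff);

        /* Randomly changing delay (0-10 ms) to perturb the timing
         * relationships between requests and expose races. */
        usleep((useconds_t)(random() % 10000));

        handle_packet(pkt, len);   /* a crash here is a bug found */
    }
    puts("survived 10000 ill-formed packets");
    return 0;
}

Any crash, hang, or corruption the harness provokes is a bug the "during failure" path would otherwise have met for the first time in production.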