The bug in question is a simple evolutionary design error in the RVM transaction handling as applied to Coda. It is basically unrelated to resolution and reintegration. The only connection is that the transactions in these two cases are large, so they are more likely to fail than those for small file system operations. The evolutionary aspect is that we don't need the offending buffer cache at all, so I am ripping it out. No big deal.

The reintegration path will indeed become the canonical path. Lily Mummert and I designed a write-back caching mechanism for Coda which effectively "kicks" the client into a "write disconnected" state. It uses logging to record the operations, which are then reintegrated on the servers. Indeed, things like this should be part of the standard code path, and they will be.

- Peter

Brian Bartholomew writes:
> > There are a number of bugs in Coda that can lead to corrupt data in
> > Coda. They are unlikely to crop up in strongly connected non
> > replicated server setups, but when using re-integration, resolution
> > and repair there is a bug in the transaction handling that can write
> > stale buffer cache data to the disk.
>
> I've never much trusted fault-tolerant implementations designed as an
> "everything working" code path and a "during failure" code path,
> because the "during failure" code doesn't get tested enough. The
> canonical example is a fault-tolerant NFS server implemented as two
> workstations connected by a private network and some specialized
> identity-takeover code. If the vendor can't keep a workstation up
> long enough to be a reliable NFS server, why do I trust them to get
> the failover code right? Failover is a harder problem.
>
> Is there some way that the re-integration code in Coda could be
> exercised in normal use? My favorite example of this is the reliable
> multicast protocol Isis, where the complex parts of the failover
> checks are exercised for normal reception.
> When something really does fail, not much changes in the execution
> profile.
>
> Also, could a 'crashme' suite be written for Coda? This could
> generate ill-formed Coda traffic, and insert randomly changing delays
> into the code to exercise timing relationships and race conditions.
>
> Another member of the League for Programming Freedom (LPF) www.lpf.org
> -------------------------------------------------------------------------------
> Brian Bartholomew - bb_at_wv.com - www.wv.com - Working Version, Cambridge, MA

Received on 1997-12-10 18:27:03