I just thought we should feel a little proud today: we lost a server due to an unknown bug. All volumes on it were read-write replicated volumes, and Coda's replication/resolution features can be used to "repair" such a lost server. This happens through resolution, which kicks in automatically when you run "ls -lR".

It is an absolute first (I believe) that we got 1.1 GB of data resolved without any server crashes or hangs, in sharp contrast with our experiences only 10 months ago.

Puneet Kumar was responsible for the design and implementation of resolution, and much to his credit we reached the current state of affairs with me fixing only between half a dozen and a dozen bugs. I must say that some of them were really hard to find, involving distributed locks, subtle errors in the directory code, and references to unallocated memory here and there.

Last year we felt that fixing the resolution bugs was a priority. We will now be shifting our attention to low-bandwidth connectivity, hoarding, repairing and reintegration.

For more information about Coda, see http://www.coda.cs.cmu.edu/.

- Peter Braam - Carnegie Mellon University

Received on 1998-03-05 21:30:13