> This is a Darwin issue, that I need help with again.
>
> Quite frequently, we get a local/global conflict while writing data to
> coda. This happens when there is only one client involved, so there is
> no real cause for a conflict.

Even a single client can get a conflict. One simple situation is when a
store completes, but the client doesn't see the final ACK message. The
client assumes disconnection, logs the store operation in the
reintegration log, and retries the operation when the server is heard
from again. However, the store in the reintegration log has a different
'store id' than the one that was already committed, and it still carries
the locally cached version vector of the original object. As a result
the server believes the retried store is a conflicting update and
declares a conflict.

> It seems to happen if a write coincides with venus making a checkpoint.
>
> Is there any way I can turn off automatic checkpointing? In that case,
> I'd like to do that and run some stress testing operations to see if
> turning it off helps.

Hmm, checkpointing shouldn't interfere; I thought the same thread that
does the reintegration is also responsible for the checkpointing (the
'voldaemon' thread).

When reintegration returns an error we always automatically create a
checkpoint file. Originally, local/global repair involved replaying the
operations in the checkpoint file one by one, instead of using an ioctl
to ask for the exact CML records involved in the conflict. Because of
this automatic checkpointing on failure, it might look like the
checkpointing caused reintegration to fail, when in fact we only
checkpointed as a result of the failure.

In any case, there is a 'VolCheckPointInterval' variable in
coda-src/venus/vol_daemon.cc which controls the frequency of the
checkpointing. You could either set that to a large value, or comment
out the check later in the file that triggers the periodic
checkpointing.
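To make the lost-ACK case above a bit more concrete, here is a rough toy
model of it. This is only a sketch: the `Server` class, the single
integer standing in for the version vector, and the duplicate check on
store ids are all assumptions made for illustration, and none of it
mirrors the actual venus or server code.

```python
from dataclasses import dataclass, field


# Toy model only. All names here are hypothetical and do not correspond
# to real Coda data structures or RPCs.
@dataclass
class Server:
    version: int = 0  # stand-in for the object's version vector
    committed_ids: set = field(default_factory=set)  # store ids already applied

    def store(self, store_id, base_version):
        """Apply a store if it was made against the current version."""
        if store_id in self.committed_ids:
            # Assumption of this model: an exact retry with the same id
            # is recognised as already applied.
            return "ok"
        if base_version != self.version:
            # The client updated a version the server no longer has.
            return "conflict"
        self.version += 1
        self.committed_ids.add(store_id)
        return "ok"


def lost_ack_scenario():
    """Single client, one store, final ACK lost on the way back."""
    server = Server()
    cached_version = 0  # version the client has cached locally

    # The store is committed on the server, but the reply never arrives,
    # so the client neither sees the result nor updates cached_version.
    first = server.store("store-1", cached_version)

    # The client logged the operation for reintegration. The logged
    # record gets a new store id and still carries the old version.
    retried = server.store("store-2", cached_version)
    return first, retried
```

In this model `lost_ack_scenario()` returns `("ok", "conflict")`: the
first store was applied, and the retry is rejected because it names a
different store id and an out-of-date version, even though only one
client was ever involved.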
> Then, if turning it off helps, should there be any kind of mutex in the
> kernel module to serialize these operations ?

There is a mutex; it is tested in vdb::CheckPoint (vol_daemon.cc). If
the CML_lock is taken we print 'volume foo CML is busy, skip
checkpoint!' to the error log (/usr/coda/etc/console?).

Jan

Received on 2005-02-24 14:03:30