(Illustration by Gaich Muramatsu)
On Tue, Apr 15, 2003 at 03:51:35PM -0400, Samir Patel wrote:
> 2) On client1, I create a file with the 'cat > blah' command... and
> just enter garbage into the file.
> 3) On client2, I delete the file that is still in the process of being
> created by client1.
> 4) I close the created file on client1. It now shows up as something
> like:

When the file was removed on client2, the callback was ignored by
client1 because the file was still open for writing. When client1
finally closed the file and tried to store the data, the server
returned an error. To avoid 'losing data', client1 logged the store
operation in the CML and went into disconnected mode (and declared a
conflict).

> 5) I attempt a repair... but get messages like this:
>
> # repair
> repair > beginrepair blah
> Too few directory entries
> Could not allocate replica list
> beginrepair failed.

Here beginrepair simply tells venus to expand the conflict. Once the
conflict is expanded, it checks the contents of the 'fake directory' to
see what type of conflict this is. You can suspend repair (^Z) and run
'ls -l blah' to check for yourself. In this case it should have expanded
to a local and a global directory. However, the enumeration step failed;
maybe we got completely disconnected from the servers and global is a
dangling link instead of a proper directory (as I just noticed is the
case further down in your email).

> repair > beginrepair blah
> Could not allocate new repvol: Object not in conflict
> beginrepair failed.

The conflict is still expanded, so the first test before expanding (the
check for the symlink) fails and repair claims that there is no
conflict. This version of repair actually forgets to call 'cfs
endrepair' in many cases; I corrected most error paths to collapse the
conflict in the CVS version.

> If I do an 'ls -l blah' now, I get:
>
> lrw-r--r--  1 root  nfsnobod 27 Apr 15 13:35 global -> @7f000000.000018f2.000007ef
> -rw-rw-r--  1 samir nfsnobod 14 Apr 15 12:19 local

This is easily explained.
The object does not exist on the servers, so of course we cannot show
the global object. I believe this conflict 'should' be propagated to a
directory conflict in the containing directory. Perhaps it fails to do
so because your shell process has 'pinned' down the directory and we
therefore fail to turn the directory into a conflict. Or maybe there is
something wrong in the local-global repair handling of store conflicts;
this is an update/remove type conflict that hasn't been tested as
frequently as update/update conflicts (where both clients concurrently
try to update the file). So running 'cfs er blah ; cd .. ; cfs er
dirname ; ls -l' could show that the parent directory is the actual
location of the conflict.

> Also, at this point, it seems that I can make changes to directories
> under /coda w/o even having authentication tokens (none of these ...
> Connection State is WriteDisconnected

That's the reason: the client is logging updates, but you need to
resolve the conflict (and obtain tokens) before the logged updates are
propagated back to the server.

> but I can't ever get the conflicts resolved. The only way to get
> things back to "normal" is to shut Coda down, and re-run venus-setup.

'cfs purgeml' will throw away all logged updates that are still waiting
for reintegration. Once the logged updates are gone, the conflict should
be gone as well and the volume can become 'Connected' again.

> Occasionally, I can't even cleanly shutdown Coda (get timeout errors
> trying to access /coda partition), in which case I just reboot the
> client.

When any process holds a reference to a file in the /coda subtree (e.g.
the working directory of a bash shell), the Linux kernel rejects the
unmount. The timeout and EIO errors occur because there is no venus
process listening for upcalls anymore. The only way out is to find and
kill any processes that have references to objects in /coda, then
umount /coda, and then the client can be restarted.

Jan

Received on 2003-04-15 22:04:56
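As a postscript, the two escape hatches described in the reply (purging the stuck CML, and killing whatever pins /coda before unmounting) can be sketched as shell helpers. This is a sketch under stated assumptions, not a definitive procedure: 'cfs', 'clog', and 'cfs checkservers' come with the coda-client userland; the function names, the example volume path, and the use of 'fuser' are my own additions, not from the original mail. Both helpers deliberately bail out (successfully) on a host without a Coda client.

```shell
# Sketch: recovery helpers for a wedged Coda client (assumptions noted above).

# Drop the logged, unreintegratable updates so the volume can reconnect.
coda_purge_cml() {
    vol="$1"                          # any path inside the affected volume
    if ! command -v cfs >/dev/null 2>&1; then
        echo "cfs not found; no Coda client on this host, skipping"
        return 0
    fi
    clog "$USER"       || return 1    # obtain fresh authentication tokens
    cfs purgeml "$vol" || return 1    # discard the pending CML entries
    cfs checkservers                  # probe the servers; volume should go Connected
}

# Find and kill anything pinning /coda so it can be unmounted (run as root).
coda_force_umount() {
    if [ ! -d /coda ] || ! command -v fuser >/dev/null 2>&1; then
        echo "no /coda mount or no fuser on this host, skipping"
        return 0
    fi
    fuser -vm /coda                   # list PIDs with open files or cwd under /coda
    fuser -km /coda                   # SIGKILL those processes
    umount /coda                      # should now succeed; then restart venus
}
```

After coda_purge_cml the volume has lost the conflicting store, so this is only appropriate when, as above, the client's logged updates are not worth keeping.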