On Wed, Mar 21, 2007 at 03:24:36PM +0100, Davor Ocelic wrote:
> 1) I set up a coda Client (CVS) and cfs lv /coda/realm/ . It finds the
> LAN server and reports WriteDisconnected state.
>
> I cd into the realm, clog as admin and mkdir test. The process hangs forever,
> and cfs lv reports that there is one CML entry (about 200 bytes) waiting for
> re-integration, and nothing more happens.

Was your local userid root?

I can only think of the following fix that went in after 6.9.0, but that
went into CVS a while ago and you are already running a CVS version.

http://www.coda.cs.cmu.edu/trac/changeset/2f783b94a9852f1d36caaa937055f76b20ecd1ce

> it is created. I try copying two files to it, and the task finishes
> quickly. One does get copied, while for the other it prints
> 'Connection timed out'.
>
> Immediately after I try the copy again, both files work, and all subsequent
> operations (creating, reading, deleting a thousand files) had no more
> problems.
>
> What debugging options could I enable to exactly find out what happened
> in the beginning?

The first thing I try is to write a (shell) script that reliably
reproduces the problem: something where I set up some state on the
servers, reinit the client, and run a couple of operations. That's
probably 90% of finding the bug, because from that point on I can
increase the venus or codasrv debug level, set various breakpoints with
gdb, etc.

> 2) After using 'vi' to edit a file yesterday, I see two Vim swap files somehow
> got created (.file1.swp and .file1.swo). When I do 'ls' and 'rm' on them:
>
> # ls -al | less
> ?--------- ? ? ? ? ? .file1.swo
> ?--------- ? ? ? ? ? .file1.swp
> ...... (and the rest of OK files)...

Actually, 'ls' prints this when stat(2) fails. Strace will probably tell
you that it is returning either ETIMEDOUT or ENOENT.

> # ls .file1*sw*
> ls: .file1.swp: No such file or directory
> -rw------- 1 root nogroup 0 Mar 21 15:10 .file1.swo

Interesting: ENOENT for the .swp file, but the other one suddenly worked.
> # rm .file1*sw*
> rm: cannot remove .file1.swo: No such file or directory
> rm: cannot remove .file1.swp: No such file or directory

And now we get ENOENT for both, which is unusual since we just got the
attributes for the .swo file on the previous ls.

In any case, the directory contents are cached and contain entries for
both .file1.swo and .file1.swp, but maybe our cached copy is stale and
the files no longer exist on the server, so our getattr (stat) requests
fail with ENOENT.

In some cases a reference count goes wrong, there is a conflict, or we
have pending changes that have not yet been reintegrated, and the client
is unable to fetch the correct directory contents. Sometimes it is a
conflict on the directory that we are in, and the kernel won't let venus
turn an active directory into a symlink; in that case, just doing
"cd .. ; ls" will reveal the conflict.

Jan

Received on 2007-03-21 15:23:59