I am having similar problems. It has seemed to me that coda reliability, specifically repair and conflicts arising when they shouldn't, has been worse recently (than, say, a year ago). It could also be that I'm just stressing it more.

I just mv'd a file from my non-coda homedir (NetBSD/i386 1.6) into /coda/home/gdt, and ended up with a conflict shortly thereafter. I saw the dreaded 'local inconsistent object' message, and tried to do a repair. I got a failure to allocate (something - fuzzy now), and found the normal directory-instead-of-object. local had the right contents, and global was a symlink to a volume id. I was able to do 'cfs er' and 'cfs br' ok, but not to actually repair.

I then shut down venus and tried to run norton, but it seems to be only a server thing. I then tried 'cfs fl .' in the directory, hoping to just remove the CML entry, since I can easily regenerate the file. Now the file 'wi0.dump' is present in the directory, but everything says connection timed out. I even tried to flush the whole directory, but I get "can't flush active file".

  poblano gdt 239 ~/%co/HARDWARE/POBLANO > l
  ls: wi0.dump: Connection timed out
  total 18
  -rw-r--r-- 1 gdt 65534 1316 Dec 4 13:56 bios
  -rw-r--r-- 1 gdt 65534 44 Dec 2 15:05 disk
  -rw-r--r-- 1 gdt 65534 5228 Dec 9 14:50 dmesg.new-memory
  -rw-r--r-- 1 gdt 65534 5628 Dec 6 20:21 dmesg.sound,bluetooth
  -rw-r--r-- 1 gdt 65534 1316 Nov 8 20:36 shopping
  -rw-r--r-- 1 gdt 65534 263 Dec 4 09:52 wi0
  poblano gdt 247 ~/%co/HARDWARE/POBLANO > cfs gf wi0.dump
  VIOC_GETFID: Connection timed out

  [ W(19) : 0000 : 09:15:58 ] fsdb::Get: Locally created fid (0x7f000001.0xffffffff.0x80001) not found!
  [ W(19) : 0000 : 09:16:00 ] fsdb::Get: Locally created fid (0x7f000001.0xffffffff.0x80001) not found!
  [ W(19) : 0000 : 09:16:01 ] fsdb::Get: Locally created fid (0x7f000001.0xffffffff.0x80001) not found!
  [ L(18) : 0000 : 09:16:02 ] LocalInconsistentObj: objFid=7f000001.4830.988c

The cml file from 'cfs ck':

  Create ???/local
  Chown ???/local (owner = 0)
  Store ???/local (length = 2836)

and the tar file is empty. The cmd.old file:

  Create ???/wi0.dump
  Chown ???/wi0.dump (owner = 0)
  Store ???/wi0.dump (length = 2836)

and the tar file has

  -rw-r--r-- [uid/gid redacted] 2836 Dec 11 08:48 2002 ???/wi0.dump

in server log:

  08:56:56 Callback failed RPC2_DEAD (F) for ws [client-in-question]:65516
  09:01:19 Callback failed RPC2_DEAD (F) for ws [client-in-question]:64967

So, it may be that I can wait to reinit until tomorrow, when my client will be on the same ethernet as the server. [pause] Nope, I get EROFS, due to pending conflicts.

So, I hate to sound cranky, but coda has become less usable for me. Having to reinit venus is a real problem when I don't have a fast link (28.8kb/s right now). And that's when it breaks, usually. I realize my connectivity is lame by 2002 standards, but it seems supporting this situation is one of the design goals for coda.

I know Jan has been working towards a totally new repair scheme in terms of the local representation of in-process repairs. The usual caution towards putting such drastic changes in the mainline may not be in order now, given that from my point of view, repair essentially does not work at the moment. (I do not have the usual bona fide conflicts due to my work style; I only get pseudo-conflicts, I think due to a reintegration failing on the client but succeeding on the server. And the latter kind can never be fixed, in my experience.)

In my view there are two big problems, and perhaps more lurking:

* repair doesn't work in some situations, and there is apparently no way to recover. If there were a tool to just remove the problematic LocalInconsistentObj entries, that would help a lot.

* I am not 100% certain, but having timeouts on reintegration operations seems to lead to declared conflicts due to the server completing an operation and the client not getting the ack. Such operations should be idempotent even across disconnections and long time intervals (days). This requires keeping state on the server, I think, since the underlying operations are not idempotent.

But, if I could just say 'repair, discardlocal, discardlocal, end' and not have to reinit, I wouldn't mind as much.

I am running code from 2002-10-14; I'll upgrade to -current CVS and see if things are better.

Greg Troxel <gdt_at_ir.bbn.com>
Received on 2002-12-11 09:37:34