Hi Marc,

On Thu, Apr 23, 2009 at 11:42:22AM +0200, Marc SCHLINGER wrote:
> It becomes complicated when on the scm, I block all traffic using iptables.
> I see the client starting sending messages to the replica(via tcpdump).
> But when I unblock the traffic on the scm I always get the same error.
> On the scm:
> 18:25:10 GetVolObj: Volume (1000002) already write locked
> 18:25:10 RS_LockAndFetch: Error 11 during GetVolObj for 1000002.1.1
> 18:25:46 LockQueue Manager: found entry for volume 0x1000002

There are certainly some locking issues hiding there. I have been hit by
"Volume (XXXXXXX) already write locked" as well.

This problem almost certainly stems from one of the original assumptions
of Coda's design - the servers are treated as well-connected to each
other, in contrast to the clients, which may have unreliable connections.

> On the client I got a dangling symlink for volume test.
>
> My question is: Isn't coda fail tolerant? Or do I miss something in my
> installation/configuration ?

No, I don't think you do.

Coda is quite fault tolerant; it copes pretty well with
 - clients losing connection to the net
 - a server going down once in a while

It does not cope well with servers intermittently losing contact with
each other. I guess this would be relatively hard to fix, given the
original assumption named above. AFAIK there are no current plans to
do so.

It is nice that you are testing Coda so consistently; this might
certainly help to discover some hidden bugs and possibly even convince
the developers of the need for server-side fault tolerance.

There are certainly many potential users who would appreciate support
for weakly connected servers, but this may present some fundamental
problems besides the implementation ones.

On the other hand, Coda is very useful as it is, and there are also
issues of more immediate interest to fix. The developers' resources are
limited, so your best bet would be to join the development.
Unfortunately the "entry threshold" is quite high because the code is
complex and still reflects years of research-oriented programming.

Regards,
Rune

Received on 2009-04-23 07:31:23