Hello Jan,

On Thu, Apr 23, 2009 at 04:52:48PM -0400, Jan Harkes wrote:
> > 18:25:10 GetVolObj: Volume (1000002) already write locked
> > 18:25:10 RS_LockAndFetch: Error 11 during GetVolObj for 1000002.1.1
> > 18:25:46 LockQueue Manager: found entry for volume 0x1000002
>
> The volume xxx already write locked sounds very ominous, but it is
> really just a debugging message added to help debug Rune's issues.
> He is running a non-replicated server, so his testing never hit the
> resolution case, and either way it doesn't seem to have solved his
> issues, so I'll probably revert this change. Especially as now there is
> no queueing on these locks so readers are in some cases not able to
> obtain the lock.

Trying to minimize confusion:

I get this message in a replicated scenario quite regularly, "forever";
the only way to get rid of this situation is to restart the server
with the runaway lock. It is not the same as the harmless messages on
the single server.

I did not complain loudly, as I see this on servers where one of them has
a slow connection which can potentially be flooded (say, by sftp's
inflexible resend policy) and become unreliable. You said it is an
unsupported configuration :)

The error I observe on the clients, though, is "Resource temporarily
unavailable", not a dangling link.

I have a realm with a volume in this state right now. It is not going to
recover on its own, nor could I use repair.

[So this has nothing to do with our non-replicated server, which
apparently does not deadlock, fully conforming to your expectations,
but it still has the "unexpected delays" issue, which may look like
a deadlock.]

Regards,
Rune

Received on 2009-04-24 05:29:23