Coda File System

Re: CVS updates take down the client

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Tue, 19 Aug 2003 11:03:10 -0400
On Tue, Aug 19, 2003 at 07:39:56PM +0900, Stephen J. Turnbull wrote:
> >>>>> "Jan" == Jan Harkes <jaharkes_at_cs.cmu.edu> writes:
> FWIW, I just saw something I never saw before: an "Operation already
> in progress" error followed by a recovery and (all of 60 seconds so
> far ;-) normal operation.  Starting from the tail of the mount
> sequence (I'd already cycled the restart-reintegrate-crash sequence
> three times):

That is not too unusual. When a client is disconnected before it
receives the reply to its last reintegration attempt and then tries
again, the server returns an EALREADY error along with the storeid of
the last successfully reintegrated operation.

There are some problems in this area. First of all, the operation might
have been optimized away, and the client ends up throwing away all
logged operations while trying to find the nonexistent storeid.
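
To make that failure mode concrete, here is a toy Python model. It is
not Coda's actual code. It assumes the client log (CML) is a plain list
of (storeid, operation) records and that, on EALREADY, the client drops
records from the head of the log until it has consumed the storeid the
server reported. The name trim_after_ealready, the record layout, and
the storeid values are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical record layout: (storeid, description of the logged operation).
LogRecord = Tuple[int, str]


def trim_after_ealready(cml: List[LogRecord], last_storeid: int) -> List[LogRecord]:
    """Drop records from the head of the log until last_storeid is consumed.

    Assumed behaviour, for illustration only: the scan never checks whether
    last_storeid is actually present, so a missing storeid empties the log.
    """
    remaining = list(cml)
    while remaining:
        storeid, _ = remaining.pop(0)
        if storeid == last_storeid:
            break
    return remaining


cml = [(101, "store a.txt"), (102, "mkdir d"), (103, "store d/b.txt")]

# Normal case: the server already applied everything up to storeid 102,
# so only the record after it is left to reintegrate.
after_normal = trim_after_ealready(cml, 102)

# Problem case: record 102 was optimized away from the log before the
# retry, so the scan never finds it and every record is discarded.
optimized = [(101, "store a.txt"), (103, "store d/b.txt")]
after_missing = trim_after_ealready(optimized, 102)
```

In this sketch a safer scan would first check that the reported storeid
is present before discarding anything, but whether that is the right fix
for the real client is a separate question.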

The other problem is when some connected operations time out. The code
then retries the operation, which is logged in the CML, but with a
different storeid from the first attempt. If, for instance, the connected
store already completed but we didn't get the reply, we always get a
reintegration conflict when the logged store operation is sent to the
servers.
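
A minimal sketch of why the changed storeid matters, again as a toy
model rather than the real server logic. It assumes the server
recognizes a duplicate only by its storeid, and it stands in for Coda's
version checks with a single integer version counter on one file. The
ToyServer class and its return strings are hypothetical.

```python
class ToyServer:
    """Toy stand-in for a server holding a single file."""

    def __init__(self) -> None:
        self.version = 0                # assumed: simple version counter
        self.applied: set[int] = set()  # storeids already applied

    def store(self, storeid: int, base_version: int) -> str:
        # A storeid the server has seen before is recognized as a duplicate.
        if storeid in self.applied:
            return "already"
        # Otherwise the update must be based on the current version.
        if base_version != self.version:
            return "conflict"
        self.version += 1
        self.applied.add(storeid)
        return "ok"


server = ToyServer()

# Connected store completes on the server, but the reply is lost.
first = server.store(storeid=1, base_version=0)

# The logged retry carries a new storeid and the old base version,
# so the server cannot tell it is the same operation.
retry_new_id = server.store(storeid=2, base_version=0)

# Had the retry reused the original storeid, it would have been
# recognized as a duplicate instead.
retry_same_id = server.store(storeid=1, base_version=0)
```

Under these assumptions the first call succeeds, the retry with a fresh
storeid is reported as a conflict, and a retry with the original storeid
is reported as already applied.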

These problems have been there for ages and are not all that easy to
fix. Along with the problematic server-server resolution of
cross-directory renames, they are the main reasons why I don't run
write-disconnected 24/7.

> because I've gotten the same crash on my roadwarrior client as well as
> on the local client, and because reinitializing the client doesn't
> prevent it from happening again on a regular basis.)
> 
> Details on request, of course, but I get the impression you're not
> putting high priority on this right now.

I don't like crashing clients, no matter what. Anything that crashes a
client or a server is definitely among my high priority issues. Right
now I'm trying to figure out why servers sometimes crash during backups,
but most of the time have no problems at all.

I just have to try to reproduce your problem here to see where exactly
it crashes and more importantly how it got in that state in the first
place.

Jan
Received on 2003-08-19 11:05:06