Coda File System

Re: Unresolvable Conflicts

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 30 Jul 2004 12:02:53 -0400
On Wed, Jul 28, 2004 at 04:58:13PM -0500, Troy Benjegerdes wrote:
> > Hrrm, I seem to have hit a somewhat serious bug..
> > 
> > After a couple combinations of 'cfs er', 'cfs fl', 'cfs flushvolume',
> > etc, and a venus restart, I'm getting
> > 
> > 16:39:16 Local inconsistent object at
> > /coda/hozed.org/user/hozer/.gnupg/gpg.confed.org:2,S, please check!
> > 
> > messages from venus.
> > 
> > There was never a 'gpg.confed.org:2,S' file.. this looks like part of a
> > maildir filename got appended onto the filename of the bogus conflicting
> > object.
> > 
> > Do we have any testcases for resolution and conflicts that can exercise
> > all the code paths? Are there any coda testcases I can run at all?
> > 
> 
> I just tried a "cfs purgeml", and killed venus...

The problem lies in how reintegration conflicts are handled.

When reintegration fails, the locally cached objects are all copied into
a 'local fake volume'. This is done so that the object 'foo/local'
doesn't collide with the object 'foo/global'. However, the cleanup step
was never worked out correctly, or possibly got lost over time.
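A toy model may make the collision concrete. It assumes the clash is between a local and a global copy that carry the same file identifier (fid), keyed here by a volume/vnode/uniquifier triple loosely modeled on Coda's fids. `Cache`, `move_to_fake_volume` and the `LOCAL_FAKE_VOLUME` id are hypothetical names for illustration, not venus code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fid:
    # Loosely modeled on Coda's volume/vnode/uniquifier file identifier.
    volume: int
    vnode: int
    unique: int


# Hypothetical volume id reserved for the local fake volume.
LOCAL_FAKE_VOLUME = 0xFFFFFFFF


class Cache:
    """Toy client cache: at most one object per fid."""

    def __init__(self):
        self.objects = {}  # Fid -> (name, data)

    def insert(self, fid, name, data):
        if fid in self.objects:
            raise KeyError(f"fid collision: {fid}")
        self.objects[fid] = (name, data)


def move_to_fake_volume(cache, fid, next_vnode):
    """Re-home a locally modified object under a fresh fid in the fake volume."""
    name, data = cache.objects.pop(fid)
    new_fid = Fid(LOCAL_FAKE_VOLUME, next_vnode, fid.unique)
    cache.insert(new_fid, name, data)
    return new_fid
```

Once the local copy has been re-homed, the server copy can be cached under the original fid without clashing:

```python
cache = Cache()
fid = Fid(0x7F000001, 2, 42)
cache.insert(fid, "foo", b"local edits")
local_fid = move_to_fake_volume(cache, fid, next_vnode=1)
cache.insert(fid, "foo", b"server copy")  # no longer collides
```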

When repair succeeds, all the objects in the local fake volume are simply
discarded and we refetch the correct data from the servers. When repair
fails, they stay in the fake volume in the hope that the next repair
session will fix things up so that we can discard them.

However, when venus is restarted, the salvager tries to move the objects
back into their original place so that we can retry the reintegration.
If the reintegration succeeds we're done, and if it fails we
automatically end up back in the previous repair state. However,
sometimes the relinking doesn't work out right and we end up messing up
the CML (client modify log), the volume, or the objects that had a
conflict.
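The lifecycle described in the last two paragraphs can be sketched as a small state machine. This is a hypothetical toy model (`ToyVolume`, `State`, and the method names are invented for illustration). It shows the intended transitions only. It does not reproduce the relinking bug, and real venus tracks far more state than a pair of dictionaries.

```python
from enum import Enum, auto


class State(Enum):
    CLEAN = auto()
    IN_CONFLICT = auto()  # local copies parked in the fake volume


class ToyVolume:
    """Hypothetical model of one volume's conflict lifecycle (not venus code)."""

    def __init__(self, tree):
        self.tree = dict(tree)  # name -> data visible in the /coda tree
        self.parked = {}        # name -> local copy held in the fake volume
        self.state = State.CLEAN

    def reintegration_failed(self, local_objects):
        # Park the local copies and take them out of the visible tree.
        for name, data in local_objects.items():
            self.parked[name] = data
            self.tree.pop(name, None)
        self.state = State.IN_CONFLICT

    def repair(self, succeeded, server_objects):
        if succeeded:
            # Discard the parked copies and refetch from the servers.
            self.parked.clear()
            self.tree.update(server_objects)
            self.state = State.CLEAN
        # On failure the parked copies stay for the next repair session.

    def restart(self, try_reintegrate):
        # Salvage step: move parked copies back, then retry reintegration.
        if self.state is not State.IN_CONFLICT:
            return
        local_objects, self.parked = self.parked, {}
        self.tree.update(local_objects)
        if try_reintegrate(local_objects):
            self.state = State.CLEAN
        else:
            # Failed again: we end up back in the repair state.
            self.reintegration_failed(local_objects)
```

For example, a failed repair followed by a restart whose retried reintegration succeeds leaves the volume clean:

```python
vol = ToyVolume({"foo": b"v1"})
vol.reintegration_failed({"foo": b"local edit"})
vol.repair(False, {})
vol.restart(lambda objs: True)
```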

Whenever I try to fix something in the local-repair expansion code,
something else seems to break, probably because I don't really
understand everything it is trying to do.

I did start on a redesign/rewrite that avoids moving the objects into a
local fake volume. Ideally we want something similar to server-server
repair expansion, but with an added view of the locally cached copy. We
cannot use the existing server-server expansion code because the
expanded object/directory has the same file identifier as the local
copy, which gives the same collision problem we have with local vs.
global. So instead I create a single conflict object that is patched
into the tree where the real conflict appeared. That way I managed to
make the expansion work, but the collapsing (taking the special
conflict marker out) isn't really working yet.
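A minimal sketch of the intended expand/collapse behaviour for such a single conflict object follows. The tree is assumed to be a plain dictionary of names, and `ConflictMarker`, `expand` and `collapse` are hypothetical names. Because both versions live inside the one marker object rather than as two cached objects, nothing needs a second copy of the same fid. The collapse step is shown as intended; per the text above, the real one does not work yet.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ConflictMarker:
    """Hypothetical single conflict object patched in where the conflict is."""
    local: Any
    server: Any

    def listing(self):
        # What a user would see when looking inside the expanded conflict.
        return {"local": self.local, "global": self.server}


def expand(parent: dict, name: str, server_copy):
    # Replace the conflicting entry with one marker holding both versions.
    parent[name] = ConflictMarker(parent[name], server_copy)


def collapse(parent: dict, name: str, keep: str):
    # Take the marker out again, keeping the chosen version ("local"/"global").
    marker = parent[name]
    if not isinstance(marker, ConflictMarker):
        raise ValueError(f"{name} is not in conflict")
    parent[name] = marker.listing()[keep]
```

For instance, expanding `foo`, then collapsing it in favour of the server version:

```python
tree = {"foo": b"local edits"}
expand(tree, "foo", b"server data")
collapse(tree, "foo", keep="global")
```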

Hopefully at some point we'll be able to remove most of the current
local-repair special cases and have a single well tested code path that
is used by all types of (conflict) expansion.

Jan
Received on 2004-07-30 12:05:51