Coda File System

Re: Repair error 95 and VGetVnode: vnode 27000009.37 is not allocated

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Wed, 29 Sep 2004 08:56:10 -0400
On Mon, Sep 27, 2004 at 08:41:48AM -0500, Troy Benjegerdes wrote:
> I was copying a bunch of digital camera photos (around 3.7GB worth of
> 1-2MB files) into a couple of coda volumes, and managed to trigger one
> of the cases we can get what appear to be bogus conflicts, and now I
> can't seem to repair it.
> 
> This is with the latest CVS build with the patches I posted earlier this
> week.

Those patches look good; they shouldn't trigger this problem. I'm going
to apply them as soon as I can reach the CVS repository again. We're
reorganizing the lab, so some internal machines are offline at the
moment.

> hozer@narn 2004$ cfs br 121canon
> hozer@narn 2004$ ls -l 121canon/
> total 3
> lrw-r--r--    1 root     nogroup        38 Sep 27 08:18 global -> @7f00000b.00000037.0000000f@hozed.org
> drwxr-xr-x    2 maur     nogroup      2048 May 21 12:10 local

Ok, it looks like there is a server-server conflict, which caused the
reintegration to fail. As a result, the local-global conflict is 'hiding'
the real problem, and because of the way conflict expansion works we
can't repair this from a single client. You would have to repair the
server-server conflict first from another client. I have been working on
a method that combines local-global and server-server expansion on the
client in one operation. Instead of showing only the local copy and the
replicated version, it would expand into all available versions, i.e.

    $ ls -lF 121canon/
    total 3
    drwxr-xr-x    2 maur     nogroup      2048 May 21 12:10 localhost/
    drwxr-xr-x    2 maur     nogroup      2048 May 21 12:10 server1/
    drwxr-xr-x    2 maur     nogroup      2048 May 21 12:10 server2/

The expansion works now, but collapsing it back is still a bit iffy,
especially the parts where we have to correctly clean up the local state
and deal with possible client restarts. Also, the repair tool needs to
be taught how to deal with combined reintegration and resolution
conflicts, which are handled by entirely different codepaths at the
moment.

> server1 log entries:
> 08:26:09 VGetVnode: vnode 27000009.37 is not allocated
> 
> server2 log entries:
> 08:26:09 VGetVnode: vnode 29000008.37 is not allocated

It is curious that both servers claim the object doesn't exist; I would
expect at least one to have the object while the other fails. For now,
you'd probably want to check whether a second client sees a
server-server conflict.
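
To confirm that both server log entries refer to the same object as the
conflict symlink, the numbers can be compared mechanically. The sketch
below is only an illustration, not part of any Coda tool. It assumes the
symlink target has the form @volume.vnode.unique@realm, that the log line
prints volume.vnode, and that all numeric fields are hexadecimal. The
helper names parse_fid and parse_log_line are made up for this example.

```python
import re

# Assumed symlink target format: @<volume>.<vnode>.<unique>@<realm> (hex fields).
FID_RE = re.compile(r"^@([0-9a-fA-F]+)\.([0-9a-fA-F]+)\.([0-9a-fA-F]+)@(\S+)$")

# Assumed log format: "<time> VGetVnode: vnode <volume>.<vnode> is not allocated".
LOG_RE = re.compile(
    r"VGetVnode: vnode ([0-9a-fA-F]+)\.([0-9a-fA-F]+) is not allocated"
)


def parse_fid(target):
    """Split a conflict symlink target into its numeric parts and realm."""
    m = FID_RE.match(target)
    if m is None:
        raise ValueError(f"not a fid-style symlink target: {target!r}")
    volume, vnode, unique, realm = m.groups()
    return {
        "volume": int(volume, 16),
        "vnode": int(vnode, 16),
        "unique": int(unique, 16),
        "realm": realm,
    }


def parse_log_line(line):
    """Return the volume and vnode from a VGetVnode log line, or None."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return {"volume": int(m.group(1), 16), "vnode": int(m.group(2), 16)}


# Values taken from the listing and the log excerpts quoted in this message.
fid = parse_fid("@7f00000b.00000037.0000000f@hozed.org")
logs = [
    "08:26:09 VGetVnode: vnode 27000009.37 is not allocated",
    "08:26:09 VGetVnode: vnode 29000008.37 is not allocated",
]

for line in logs:
    entry = parse_log_line(line)
    same_vnode = entry is not None and entry["vnode"] == fid["vnode"]
    print(f"{line!r}: same vnode as conflict fid: {same_vnode}")
```

Under those assumptions, both log lines carry vnode 0x37, the same vnode
number as in the symlink target. The volume numbers differ because the
symlink names 7f00000b while each server logs its own number (27000009
and 29000008).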

Jan
Received on 2004-09-29 08:57:52