Coda File System

Re: bad parent pointer - codasrv won't run

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 10 Nov 2000 12:43:15 -0500

On Fri, Nov 10, 2000 at 08:46:53AM -0500, Greg Troxel wrote:
> 08:23:34 Entering DCC(0x1000002)
> 08:23:34 JE: parent = 0x1000002.451.4d46 ; child thinks parent is 0x45d.4d4c; Shouldnt Happen
> 
> Assertion failed: 0, file "vol-salvage.cc", line 830
> 
> Looking at the data, it seems that the .. entry in the directory does
> not match the fid of the entry for the parent.

Ouch, I'm trying to figure out how this could have happened.

Possibly after a rename,

  'mv (0x47)/demo-report (0x47)/DOCUMENTS/demo-report'

done while weakly connected or disconnected, where only the (0x47)
directory got resolved successfully.

> Any clues on how to fix this?
> 
> 1) I am tempted to comment out the ASSERT (or add to skipsalvage),
> move all the stuff from the problem dir, and then rmdir it.
> 
> 2) use norton to delete the .. entry in the vnode and recreate it.
> 
> So I tried 2 and 
> 
> norton> delete name 0x1000002 0x451 0x4d46 .. 1
> norton> create name 0x1000002 0x451 0x4d46  .. 0x449 0x4d42

Which is what I would have tried as well.

> 08:45:29 JE: parent = 0x1000002.47.380a ; child thinks parent is 0x451.4d46; Shouldnt Happen

Is there still an entry for 0x451 in 0x47 as well?

> So I'm not sure if something else bad happened.  I'll keep poking.
> I'm partially posting because I think this indicates some sort of
> server bug.  It would be cool if the salvage could fix these
> backpointers a la fsck.

Yes, although RVM transactions should have given us some guarantees that
these things don't occur, except of course in the nasty case of partly
resolved cross-directory rename operations.
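
For illustration only, here is a rough sketch of what an fsck-style fix-up pass for backpointers could look like. It works on a toy in-memory table, not on real RVM data, and it is not how the Coda salvager works. The `dirs` layout and the `repair_backpointers` name are made up for this example, and the vnode numbers are the ones discussed in this thread. It also assumes that the name entries in the parent directory are the ones to trust, which a real salvager would have to decide case by case.

```python
# Toy table: vnode -> {name: vnode}. Hypothetical layout, uniquifiers omitted.
dirs = {
    0x47: {"..": 0x1, ".": 0x47, "DOCUMENTS": 0x449,
           "FLIGHT_DEMO_STUFF": 0x45d},
    0x449: {"..": 0x47, ".": 0x449, "demo-report": 0x451},
    0x451: {"..": 0x47, ".": 0x45d},  # the damaged directory
}


def repair_backpointers(table):
    """Make '.' and '..' of each known child agree with its parent's entry.

    Assumption: the parent's name entries are authoritative.
    Returns a list of (vnode, name, old, new) for every entry rewritten.
    """
    fixes = []
    for parent, entries in table.items():
        # Copy the items so rewriting a child never disturbs this iteration.
        for name, child in list(entries.items()):
            if name in (".", "..") or child not in table:
                continue  # skip self/parent links and objects not in the table
            for link, want in ((".", child), ("..", parent)):
                have = table[child].get(link)
                if have != want:
                    table[child][link] = want
                    fixes.append((child, link, have, want))
    return fixes


fixes = repair_backpointers(dirs)
for vnode, name, old, new in fixes:
    old_text = "missing" if old is None else hex(old)
    print(f"{vnode:#x}: '{name}' {old_text} -> {new:#x}")
```

On this table the pass only touches 0x451, rewriting both its `.` and `..` entries. 0x45d is left alone because its contents are not in the table.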


Hold on a second....

08:23:34 JE: parent = 0x1000002.451.4d46 ; child thinks parent is 0x45d.4d4c;

It never printed the vnode of the object that actually failed.

 norton> show directory 0x1000002 0x451 0x4d46
                                  ^^^^^^^^^^^^
     (0x461 0x4d4e)      test6
     (0x45d 0x4d4c)      .
     ^^^^^^^^^^^^^^
     (0x47 0x380a)       ..
     (0x160e 0x4d83)     m1.tgz

This looks very wrong. Why is the `.' of 0x451 pointing to 0x45d, which
shows up as the entry FLIGHT_DEMO_STUFF in directory 0x47?

 0x47
   0x1   (parent)
   0x1   ..
   0x47  .
   0x449 DOCUMENTS
   0x45d FLIGHT_DEMO_STUFF

 0x449
   0x47  (parent)
   0x47  ..
   0x449 .
   0x451 demo-report

 0x451
   0x449 (parent)
   0x47  ..        !!BAD
   0x45d .         !!BAD

 0x45d
   unknown contents, hopefully:
   0x47  (parent)
   0x47  ..
   0x45d .

It looks like 0x451 for some reason has the . and .. entries of 0x45d.
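
The comparison in the listings above can be written down as a small check: in each directory, `.` should be the directory's own vnode and `..` should match the parent pointer. The sketch below is a toy model only. The `dirs` layout and the function names are made up for this example, the uniquifiers are left out, and 0x45d is omitted because its contents are unknown.

```python
# Toy model of the listings above, vnode numbers only.
# "parent" is the parent pointer kept in the vnode; "entries" is what the
# directory itself contains. The dict layout is hypothetical.
dirs = {
    0x47: {"parent": 0x1,
           "entries": {"..": 0x1, ".": 0x47, "DOCUMENTS": 0x449,
                       "FLIGHT_DEMO_STUFF": 0x45d}},
    0x449: {"parent": 0x47,
            "entries": {"..": 0x47, ".": 0x449, "demo-report": 0x451}},
    0x451: {"parent": 0x449,
            "entries": {"..": 0x47, ".": 0x45d}},
}


def check_directory(vnode, d):
    """Return (name, expected, found) for each bad '.' or '..' entry."""
    expected = {".": vnode, "..": d["parent"]}
    problems = []
    for name, want in expected.items():
        found = d["entries"].get(name)
        if found != want:
            problems.append((name, want, found))
    return problems


def check_all(table):
    """Map vnode -> list of problems, leaving out clean directories."""
    report = {}
    for vnode, d in table.items():
        problems = check_directory(vnode, d)
        if problems:
            report[vnode] = problems
    return report


def show(value):
    """Format a vnode number, or say so if the entry is absent."""
    return "missing" if value is None else hex(value)


report = check_all(dirs)
for vnode, problems in report.items():
    for name, want, found in problems:
        print(f"{vnode:#x}: '{name}' is {show(found)}, expected {show(want)}")
```

With the numbers from the listings, only 0x451 is reported, once for `.` and once for `..`.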

Jan
Received on 2000-11-10 12:44:44