Coda File System

Re: okay, what am I doing wrong?

From: Stephen J. Turnbull <stephen_at_xemacs.org>
Date: Mon, 13 Jan 2003 21:01:21 +0900
>>>>> "Jan" == Jan Harkes <jaharkes_at_cs.cmu.edu> writes:

    Jan> In any case 'errorcode 198' is EINCOMPATIBLE. It is returned
    Jan> when we're trying to write (store) data to a file that has
    Jan> been modified, i.e. the store-id of the original copy on the
    Jan> client doesn't match the store-id of the file on the server.

    Jan> Your server log should show a message similar to,
    Jan> CheckStoreSemantics: (0x7f00002a.0x76a.0x44d0), VCP error
    Jan> (198)

Not quite; I see

CheckStoreSemantics: (1000015.76a.44d0), VCP error (198)

where 1000015 is the volume ID that corresponds to the 7f00002a
replica ID (or do I have that backwards?).
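
Since the two forms of this message differ only in the 0x prefixes and
the first identifier, here is a minimal Python sketch for pulling the
fields out of such a line so they can be compared.  The function name,
the regular expression, and the reading of the three dotted fields as
volume, vnode and uniquifier are my own assumptions; none of this is
taken from the Coda sources.

```python
import re

# Matches lines such as
#   CheckStoreSemantics: (1000015.76a.44d0), VCP error (198)
# with or without a 0x prefix on each hexadecimal field.
HEX = r"(?:0x)?([0-9a-fA-F]+)"
LINE_RE = re.compile(
    r"CheckStoreSemantics:\s*\(" + HEX + r"\." + HEX + r"\." + HEX + r"\),"
    r"\s*VCP error\s*\((\d+)\)"
)

def parse_store_error(line):
    """Return (volume, vnode, uniquifier, errorcode) or None if no match.

    The field names are assumed; the log line itself does not label them.
    """
    m = LINE_RE.search(line)
    if m is None:
        return None
    volume, vnode, unique = (int(g, 16) for g in m.groups()[:3])
    return volume, vnode, unique, int(m.group(4))
```

Both the line I see and the one Jan expected parse to the same vnode,
uniquifier and error code; only the first field differs.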

    Jan> It could be that this has made reintegration more susceptible
    Jan> to failures. This is just one theory, but

    Jan>     'volutil setlogparms <volume replica id> reson 4'

    Jan> will turn resolution back on.

Urk, I'll try again, but with

volutil setlogparms 0x7f00002a reson 4

I got disconnected and the server crashed.  :-(

The log shows lots of stats from the server followed by

19:46:40 done

19:46:50 VAllocFid: volume disk uniquifier being extended
19:46:50 ****** FILE SERVER INTERRUPTED BY SIGNAL 11 ******
19:46:50 ****** Aborting outstanding transactions, stand by...
19:46:50 Uncommitted transactions: 0
19:46:50 Uncommitted transactions: 0
19:46:50 Becoming a zombie now ........
19:46:50 You may use gdb to attach to 389

Restarting codasrv shows 47 "unreachable" log entries, and the log
ends with

20:30:52 Entering DCC(0x1000013)
20:30:52 done:  3378 files/dirs,        101945 blocks
20:30:52 SalvageIndex:  Vnode 0x9e4 has no inodeNumber
20:30:52 SalvageIndex: Creating an empty object for it
20:30:52 Entering DCC(0x1000015)
20:30:52 MarkLogEntries: loglist was NULL ... Not good

Uh-oh ... the server crashed right there.  Now what do I do?  It
doesn't look like I can get it to start at all!

bash-2.05b$ cat /vice/srv/SrvErr
Assertion failed: 0, file "vol-salvage.cc", line 851
EXITING! Bye!

Why is inconsistent data for a single volume a fatal error?  Couldn't
we just take that volume offline?

    Jan> Another interesting fact is that the first entry in your CML
    Jan> is a store.  Perhaps the client got disconnected during the
    Jan> connected store attempt, and this is essentially a replay of
    Jan> an already committed operation.

I get that complaint from venus a lot on a (non-init) start after one
of these volume-specific write disconnects.  This used to cause venus
to refuse to do anything with the volume involved, until the kernel
upgrade to 2.4.20.  Now I can generally run repair (begin INCOBJ,
discardalllocal, end, quit).  However, that volume is rather unstable
thereafter.  I don't think I've ever successfully repaired one of
these conflicts.

And of course sometimes venus just refuses to admit that the conflict
exists when I ask, although the console shows it refusing to
reintegrate because of a conflict.  Not even cvs ck always helps.

    Jan> So we just 'invented' fake identifiers in the range from 0xea
    Jan> to 0x27b.  But the kernel is asking for fake objects that are
    Jan> clearly out of this range. Which would indicate that the
    Jan> directory data that the kernel is using is
    Jan> incorrect/outdated.  Possibly caused by a process that is
    Jan> blocking a re-open by having its cwd in the offending
    Jan> location.

That wouldn't be surprising, because cvs very likely cd's into the
directory it's working on.
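
The range check Jan describes can be sketched in a few lines of
Python.  The bounds come from his message; treating the range as
inclusive at both ends, and the names used here, are my assumptions.

```python
# Bounds of the fake identifier range quoted above (assumed inclusive).
FAKE_ID_MIN = 0xEA
FAKE_ID_MAX = 0x27B

def in_fake_range(ident):
    """True if ident lies within the range of invented fake identifiers."""
    return FAKE_ID_MIN <= ident <= FAKE_ID_MAX
```

A request for an identifier where this returns False would be one of
the out-of-range lookups that point at stale directory data.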


-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
Received on 2003-01-13 07:07:58