>>>>> "Jan" == Jan Harkes <jaharkes_at_cs.cmu.edu> writes:

    Jan> In any case 'errorcode 198' is EINCOMPATIBLE. It is returned
    Jan> when we're trying to write (store) data to a file that has
    Jan> been modified, i.e. the store-id of the original copy on the
    Jan> client doesn't match the store-id of the file on the server.
    Jan> Your server log should show a message similar to,
    Jan> CheckStoreSemantics: (0x7f00002a.0x76a.0x44d0), VCP error
    Jan> (198)

Not quite; I see

    CheckStoreSemantics: (1000015.76a.44d0), VCP error (198)

where 1000015 is the volume ID that corresponds to the 7f00002a
replica ID (or do I have that backwards?).

    Jan> It could be that this has made reintegration more susceptible
    Jan> to failures. This is just one theory, but
    Jan> 'volutil setlogparms <volume replica id> reson 4'
    Jan> will turn resolution back on.

Urk, I'll try again, but with

    volutil setlogparms 0x7f00002a reson 4

I got disconnected and the server crashed. :-( The log shows lots of
stats from the server followed by

    19:46:40 done
    19:46:50 VAllocFid: volume disk uniquifier being extended
    19:46:50 ****** FILE SERVER INTERRUPTED BY SIGNAL 11 ******
    19:46:50 ****** Aborting outstanding transactions, stand by...
    19:46:50 Uncommitted transactions: 0
    19:46:50 Uncommitted transactions: 0
    19:46:50 Becoming a zombie now ........
    19:46:50 You may use gdb to attach to 389

Restarting the codasrv shows 47 "unreachable" log entries, and the log
ends

    20:30:52 Entering DCC(0x1000013)
    20:30:52 done: 3378 files/dirs, 101945 blocks
    20:30:52 SalvageIndex: Vnode 0x9e4 has no inodeNumber
    20:30:52 SalvageIndex: Creating an empty object for it
    20:30:52 Entering DCC(0x1000015)
    20:30:52 MarkLogEntries: loglist was NULL ... Not good

Uh-oh ... the server crashed right there. Now what do I do? It
doesn't look like I can get it to start at all!

    bash-2.05b$ cat /vice/srv/SrvErr
    Assertion failed: 0, file "vol-salvage.cc", line 851
    EXITING! Bye!

Why is inconsistent data for a single volume a fatal error?
Couldn't we just take that volume offline?

    Jan> Another interesting fact is that the first entry in your CML
    Jan> is a store. Perhaps the client got disconnected during the
    Jan> connected store attempt, and this is essentially a replay of
    Jan> an already committed operation.

I get that complaint from venus a lot on a (non-init) start after one
of these volume-specific write disconnects. This used to cause venus
to refuse to do anything with the volume involved, until the kernel
upgrade to 2.4.20. Now I can generally repair, begin INCOBJ,
discardalllocal, end, quit. However that volume is rather unstable
thereafter.

I don't think I've ever successfully repaired one of these conflicts.
And of course sometimes venus just refuses to admit that the conflict
exists when I ask, although the console shows it refusing to
reintegrate because of a conflict. Not even cvs ck always helps.

    Jan> So we just 'invented' fake identifiers in the range from 0xea
    Jan> to 0x27b. But the kernel is asking for fake objects that are
    Jan> clearly out of this range. Which would indicate that the
    Jan> directory data that the kernel is using is
    Jan> incorrect/outdated. Possibly caused by a process that is
    Jan> blocking a re-open by having it's cwd in the offending
    Jan> location.

That wouldn't be surprising, because cvs very likely cd's into the
directory it's working on.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
 Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

Received on 2003-01-13 07:07:58
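
For anyone trying to pick these store failures out of a long server log, here is a minimal Python sketch that pulls the identifier triplet and the error code out of a CheckStoreSemantics line. It is an illustration only, not part of Coda. The names parse_store_failure and LINE_RE are made up for this example, and it assumes that all three fields are hexadecimal with or without a leading "0x" (the later "Entering DCC(0x1000015)" line suggests the bare 1000015 is hex) and that they are volume, vnode and uniquifier in that order.

```python
import re

EINCOMPATIBLE = 198  # error code named in Jan's message

# Accepts both forms quoted in this thread:
#   CheckStoreSemantics: (0x7f00002a.0x76a.0x44d0), VCP error (198)
#   CheckStoreSemantics: (1000015.76a.44d0), VCP error (198)
# Assumption: every field is hexadecimal, "0x" prefix optional.
LINE_RE = re.compile(
    r"CheckStoreSemantics: "
    r"\((?:0x)?([0-9a-fA-F]+)\.(?:0x)?([0-9a-fA-F]+)\.(?:0x)?([0-9a-fA-F]+)\),"
    r"\s*VCP error\s*\((\d+)\)"
)


def parse_store_failure(line):
    """Return a dict describing the failed store, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # Field order (volume, vnode, uniquifier) is an assumption.
    volume, vnode, unique = (int(g, 16) for g in m.groups()[:3])
    return {
        "volume": volume,
        "vnode": vnode,
        "unique": unique,
        "error": int(m.group(4)),
    }


if __name__ == "__main__":
    seen = parse_store_failure(
        "CheckStoreSemantics: (1000015.76a.44d0), VCP error (198)"
    )
    print(hex(seen["volume"]), seen["error"] == EINCOMPATIBLE)
```

Running both quoted lines through it gives the same vnode and uniquifier and differs only in the first field, which is the volume ID versus replica ID difference asked about above.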