I'm currently running a krb5-built Coda 5.3.8 on RedHat 6.2, with no other source modifications except the changes suggested by README.kerberos. While I had a read-only volume restored and mounted from a previous dump, and directly after dumping some backup volumes to disk, my codasrv crashed and now won't come back up. It seems to be looking for the restored volume but can't find it, I assume because it was only temporarily restored from a file. The previously restored volume was given id 1000004, as you can see in the log below, where the server tries to start up and recover. Is there any way to recover from this, or will I just need to rebuild the server again? I'd appreciate any help, as this has happened twice now after a server crash while a restored volume was still up and mounted. The server seems to be stable most of the time; it just seems to go flaky with restored volumes. Thanks.

-Stephan

16:22:25 New SrvLog started at Tue Aug 15 16:22:25 2000
16:22:25 Resource limit on data size are set to 2147483647
16:22:25 Server etext 0x80c44ba, edata 0x80fa5a0
16:22:25 RvmType is Rvm
16:22:25 Main process doing a LWP_Init()
16:22:25 Main thread just did a RVM_SET_THREAD_DATA
16:22:25 Setting Rvm Truncate threshhold to 5.
Partition /vicepa: inodes in use: 23230, total: 16777216.
16:22:42 Partition /vicepa: 5148943K available (minfree=5%), 4960987K free.
16:22:42 The server (pid 4941) can be controlled using volutil commands
16:22:42 "volutil -help" will give you a list of these commands
16:22:42 If desperate, "kill -SIGWINCH 4941" will increase debugging level
16:22:42 "kill -SIGUSR2 4941" will set debugging level to zero
16:22:42 "kill -9 4941" will kill a runaway server
16:22:42 Vice file system salvager, version 3.0.
16:22:42 SanityCheckFreeLists: Checking RVM Vnode Free lists.
16:22:42 DestroyBadVolumes: Checking for destroyed volumes.
16:22:42 Salvaging file system partition /vicepa
16:22:42 Force salvage of all volumes on this partition
16:22:42 Scanning inodes in directory /vicepa...
16:22:46 SFS: There are some volumes without any inodes in them
16:22:46 Entering DCC(0x1000001)
16:22:46 DCC: Salvaging Logs for volume 0x1000001
16:22:46 done: 10 files/dirs, 13 blocks
16:22:46 SFS:No Inode summary for volume 0x1000002; skipping full salvage
16:22:46 SalvageFileSys: Therefore only resetting inUse flag
16:22:46 Entering DCC(0x1000004)
Magic wrong in Page i
16:22:46 DCC: Bad Dir(0x1000004.6d.68e9) in rvm...Aborting
16:22:46 JE: directory vnode 0x1000004.6d.68e9: invalid entry ;
16:22:46 JE: child vnode not allocated or uniqfiers dont match; cannot happen
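**** For context, the read-only volume (the one that ended up as id 1000004) had been restored and mounted along these lines; the dump file name, volume name, and mount point below are illustrative stand-ins rather than the exact ones I used:

    # restore the volume from an old dump file into /vicepa as a
    # temporary read-only restored volume (names here are made up)
    volutil restore /vice/backup/old-volume.dump /vicepa restored.oldvol

    # then, from a client, mount it somewhere under /coda
    cfs mkmount /coda/restored restored.oldvol

It was still mounted like that when the server went down.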
**** Here's some of the final SrvLog data from before the crash. My SrvLog seems to have been much busier than normal, but perhaps I just turned on more detailed logging somehow? You can spot the crash at the end of this segment pretty easily, but there don't really seem to be any clues as to what caused it, since all the previous actions finished properly.

16:08:53 --DC: (0x100000c.0x1.0x1) ct: 1
16:08:53 VN_PutDirHandle: Vn 1 Uniq 1: cnt 0, vn_cnt 0
16:08:53 VN_GetDirHandle for Vnode 0x1 Uniq 0x1 cnt 1, vn_cnt 1
16:08:53 VN_GetDirHandle for Vnode 0x1 Uniq 0x1 cnt 2, vn_cnt 2
16:08:53 VN_PutDirHandle: Vn 1 Uniq 1: cnt 1, vn_cnt 1
16:08:53 VN_GetDirHandle for Vnode 0x1 Uniq 0x1 cnt 2, vn_cnt 2
16:08:53 VN_PutDirHandle: Vn 1 Uniq 1: cnt 1, vn_cnt 1
16:08:53 --DC: (0x100000c.0x1.0x1) ct: 1
16:08:53 VN_PutDirHandle: Vn 1 Uniq 1: cnt 0, vn_cnt 0
16:10:58 VAttachVolumeById: vol 1000007 (h.skoledin.backup) attached and online
16:10:58 S_VolMakeBackups: backup (1000007) made of volume 1000006
16:10:58 NewDump: file /vice/backup/7f000004.1000006.newlist volnum 7f000004 id 1000007 parent 1000006
16:11:06 S_VolNewDump: volume dump succeeded
16:11:06 VAttachVolumeById: vol 100000d (pub.install.0.backup) attached and online
16:11:06 S_VolMakeBackups: backup (100000d) made of volume 1000009
16:11:06 NewDump: file /vice/backup/7f000006.1000009.newlist volnum 7f000006 id 100000d parent 1000009
16:11:08 S_VolNewDump: volume dump succeeded
16:11:08 VAttachVolumeById: vol 100000e (pub.jabber.0.backup) attached and online
16:11:08 S_VolMakeBackups: backup (100000e) made of volume 100000a
16:11:08 NewDump: file /vice/backup/7f000007.100000a.newlist volnum 7f000007 id 100000e parent 100000a
16:11:17 S_VolNewDump: volume dump succeeded
16:11:17 VAttachVolumeById: vol 100000f (pub.krb5.0.backup) attached and online
16:11:17 S_VolMakeBackups: backup (100000f) made of volume 100000b
16:11:17 NewDump: file /vice/backup/7f000008.100000b.newlist volnum 7f000008 id 100000f parent 100000b
16:11:18 S_VolNewDump: volume dump succeeded
16:11:20 VAttachVolumeById: vol 1000010 (pub.coda.0.backup) attached and online
16:11:20 S_VolMakeBackups: backup (1000010) made of volume 100000c
16:11:20 NewDump: file /vice/backup/7f000009.100000c.newlist volnum 7f000009 id 1000010 parent 100000c
16:12:13 S_VolNewDump: volume dump succeeded
16:12:22 ****** FILE SERVER INTERRUPTED BY SIGNAL 11 ******
16:12:22 ****** Aborting outstanding transactions, stand by...
16:12:22 Uncommitted transactions: 0
16:12:22 Uncommitted transactions: 0
16:12:22 Becoming a zombie now ........
16:12:22 You may use gdb to attach to 1853
Date: Tue 08/15/2000
16:18:20 Starting new SrvLog file

**** I couldn't seem to attach a debugger to it either; it looked like it just crashed and didn't even become a zombie like it said it would...

Thanks again for any help with this.

Stephan B. Koledin
The Motley Fool Systems DORC
skoledin_at_fool.com
http://www.fool.com
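P.S. For what it's worth, this is roughly what I tried when the SrvLog said I could use gdb to attach to pid 1853; the codasrv path below is just where my binary happens to live, so adjust as needed:

    # attach to the zombied codasrv using the pid printed in the SrvLog,
    # then take a backtrace and detach without killing the process
    gdb /usr/sbin/codasrv 1853
    (gdb) bt
    (gdb) detach

By the time I got there the process was already gone, so there was nothing left to attach to.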