Update on this issue (still not resolved): since I seriously cut back the number of writes from two of the machines, those machines have not crashed again. However, the one machine that I forgot to install gdb on (and rpm hangs trying to install something when venus is zombied) got another signal 11 last night. Scratch that: if you use the --ignoresize option, rpm won't hang when venus is zombied. So, below are the logs, followed by whatever I could get out of gdb.

This time I had a lot of monitoring all being sent to the console so that we could see the log messages in collated form, though it doesn't appear to be much help. Here's what was being monitored:

    [1]   Running    codacon &
    [2]   Running    tail -f console &  (wd: /usr/coda/etc)
    [3]-  Running    tail -f venus.log &  (wd: /usr/coda/etc)
    [5]+  Running    ( while /bin/true; do date; sleep 60; done ) &

and here's the output. Please help!

    Tue May 24 21:29:15 MDT 2005
    Callback dir129 (5086c448.7f000002.0.0) ( 21:30:00 )
    Callback dir130 (5086c448.7f000002.0.0) ( 21:30:00 )
    Callback dir129 (5086c448.7f000002.18.27) ( 21:30:00 )
    Callback dir130 (5086c448.7f000002.18.27) ( 21:30:00 )
    Probe ( 21:30:00 )
    BackProbe dir130 ( 21:30:00 )
    BackProbe dir129 ( 21:30:00 )
    NewConnectFS dir129 ( 21:30:00 )
    BackProbe dir129 ( 21:30:00 )
    Probe ( 21:30:00 )
    BackProbe dir129 ( 21:30:00 )
    BackProbe dir130 ( 21:30:00 )
    NewConnectFS dir129 ( 21:30:00 )
    BackProbe dir129 ( 21:30:00 )
    Probe ( 21:30:00 )
    BackProbe dir129 ( 21:30:00 )
    BackProbe dir130 ( 21:30:00 )
    DisconnectFS dir129 ( 21:30:01 )
    DisconnectFS dir130 ( 21:30:01 )
    DisconnectFS dir129 ( 21:30:01 )
    DisconnectFS ( 21:30:01 )
    [ D(2577) : 0000 : 21:30:00 ] WAITING(SRVRQ):
    [ D(2576) : 0000 : 21:30:00 ] userent::Connect: ViceGetAttrPlusSHA (dir129)
    [ D(2576) : 0000 : 21:30:00 ] userent::Connect: ViceGetAttrPlusSHA() -> 22
    [ D(2576) : 0000 : 21:30:00 ] userent::Connect: VGAPlusSHA_Supported -> 1
    [ D(2577) : 0000 : 21:30:00 ] WAIT OVER, elapsed = 18.1
    [ D(2577) : 0000 : 21:30:00 ] userent::Connect: ViceGetAttrPlusSHA (dir129)
    [ D(2577) : 0000 : 21:30:00 ] userent::Connect: ViceGetAttrPlusSHA() -> 22
    [ D(2577) : 0000 : 21:30:00 ] userent::Connect: VGAPlusSHA_Supported -> 1
    21:30:01 root acquiring Coda tokens!
    21:30:01 root acquiring Coda tokens!
    21:30:01 root acquiring Coda tokens!
    21:30:01 Fatal Signal (11); pid 17858 becoming a zombie...
    21:30:01 You may use gdb to attach to 17858
    [ W(70) : 0000 : 21:30:01 ] ***** FATAL SIGNAL (11) *****
    Tue May 24 21:30:15 MDT 2005

And here's the output from gdb (with the coda source rpm installed, but not the kernel source or headers):

    0xb73f99d6 in __sigsuspend (set=0x1532f0bc) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
    45      ../sysdeps/unix/sysv/linux/sigsuspend.c: No such file or directory.
    ---Type <return> to continue, or q <return> to quit---
            in ../sysdeps/unix/sysv/linux/sigsuspend.c
    (gdb) bt
    #0  0xb73f99d6 in __sigsuspend (set=0x1532f0bc) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
    #1  0x080ab4f5 in strcpy () at ../sysdeps/generic/strcpy.c:31
    #2  <signal handler called>
    #3  0x080c99a2 in strcpy () at ../sysdeps/generic/strcpy.c:31
    #4  0x1532f480 in ?? ()
    #5  0x0804e559 in strcpy () at ../sysdeps/generic/strcpy.c:31
    #6  0x080ac46c in strcpy () at ../sysdeps/generic/strcpy.c:31
    #7  0x080ac14c in strcpy () at ../sysdeps/generic/strcpy.c:31
    #8  0x080abdaa in strcpy () at ../sysdeps/generic/strcpy.c:31
    #9  0x080a4b33 in strcpy () at ../sysdeps/generic/strcpy.c:31
    #10 0x080a5d15 in strcpy () at ../sysdeps/generic/strcpy.c:31
    #11 0x080aa3a2 in strcpy () at ../sysdeps/generic/strcpy.c:31
    #12 0x080a1396 in strcpy () at ../sysdeps/generic/strcpy.c:31
    #13 0xb757cd64 in Create_Process_Part2 () from /usr/lib/liblwp.so.2
    (gdb) frame 13
    #13 0xb757cd64 in Create_Process_Part2 () from /usr/lib/liblwp.so.2
    (gdb) list
    40      in ../sysdeps/unix/sysv/linux/sigsuspend.c
    (gdb) frame 12
    #12 0x080a1396 in strcpy () at ../sysdeps/generic/strcpy.c:31
    31      ../sysdeps/generic/strcpy.c: No such file or directory.
            in ../sysdeps/generic/strcpy.c
    (gdb) list
    26      in ../sysdeps/generic/strcpy.c

I hope this makes more sense to somebody else, since Create_Process_Part2 never calls strcpy(). Here's the code for that function:

    static void Create_Process_Part2()
    {
        PROCESS temp;

        lwpdebug(0, "Entered Create_Process_Part2");
        temp = lwp_cpptr;    /* Get current process id */
        savecontext(Dispatcher, &temp->context, NULL);
        (*temp->ep)(temp->parm);
        LWP_DestroyProcess(temp);
    }

-- 
Patrick Walsh
eSoft Incorporated
303.444.1600 x3350
http://www.esoft.com/

Received on 2005-05-25 12:29:13