Coda File System

Re: coda client hangs

From: Patrick Walsh <pwalsh_at_esoft.com>
Date: Wed, 25 May 2005 10:27:16 -0600
	Update on this issue (still not resolved): since we seriously cut back
the number of writes from two of the machines, those two have not crashed
again yet.  However, the one machine I forgot to install gdb on (rpm hangs
when trying to install anything while venus is zombied) got another
signal 11 last night.

	Scratch that: if you use rpm's --ignoresize option, it won't hang when
venus is zombied.  So below I'll give the logs and then whatever I can get
out of gdb.

	This time I had a lot of monitoring output going to the console so that
we could see the log messages interleaved, though it doesn't appear to help
much.  Here's what was being monitored:

[1]   Running                 codacon &
[2]   Running                 tail -f console &  (wd: /usr/coda/etc)
[3]-  Running                 tail -f venus.log &  (wd: /usr/coda/etc)
[5]+  Running                 ( while /bin/true; do
    date; sleep 60;
done ) &

And here's the output.  Please help!

Tue May 24 21:29:15 MDT 2005
Callback dir129 (5086c448.7f000002.0.0) ( 21:30:00 )
Callback dir130 (5086c448.7f000002.0.0) ( 21:30:00 )
Callback dir129 (5086c448.7f000002.18.27) ( 21:30:00 )
Callback dir130 (5086c448.7f000002.18.27) ( 21:30:00 )
Probe ( 21:30:00 )
BackProbe dir130 ( 21:30:00 )
BackProbe dir129 ( 21:30:00 )
NewConnectFS dir129 ( 21:30:00 )
BackProbe dir129 ( 21:30:00 )
Probe ( 21:30:00 )
BackProbe dir129 ( 21:30:00 )
BackProbe dir130 ( 21:30:00 )
NewConnectFS dir129 ( 21:30:00 )
BackProbe dir129 ( 21:30:00 )
Probe ( 21:30:00 )
BackProbe dir129 ( 21:30:00 )
BackProbe dir130 ( 21:30:00 )
DisconnectFS dir129 ( 21:30:01 )
DisconnectFS dir130 ( 21:30:01 )
DisconnectFS dir129 ( 21:30:01 )
DisconnectFS ( 21:30:01 )

[ D(2577) : 0000 : 21:30:00 ] WAITING(SRVRQ):

[ D(2576) : 0000 : 21:30:00 ] userent::Connect: ViceGetAttrPlusSHA (dir129)
[ D(2576) : 0000 : 21:30:00 ] userent::Connect: ViceGetAttrPlusSHA() -> 22
[ D(2576) : 0000 : 21:30:00 ] userent::Connect: VGAPlusSHA_Supported -> 1

[ D(2577) : 0000 : 21:30:00 ] WAIT OVER, elapsed = 18.1
[ D(2577) : 0000 : 21:30:00 ] userent::Connect: ViceGetAttrPlusSHA (dir129)
[ D(2577) : 0000 : 21:30:00 ] userent::Connect: ViceGetAttrPlusSHA() -> 22
[ D(2577) : 0000 : 21:30:00 ] userent::Connect: VGAPlusSHA_Supported -> 1
21:30:01 root acquiring Coda tokens!
21:30:01 root acquiring Coda tokens!
21:30:01 root acquiring Coda tokens!
21:30:01 Fatal Signal (11); pid 17858 becoming a zombie...
21:30:01 You may use gdb to attach to 17858

[ W(70) : 0000 : 21:30:01 ] *****  FATAL SIGNAL (11) *****
Tue May 24 21:30:15 MDT 2005

	And here's the output from gdb (with the coda source rpm installed, but
not the kernel source or headers):

0xb73f99d6 in __sigsuspend (set=0x1532f0bc)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
45      ../sysdeps/unix/sysv/linux/sigsuspend.c: No such file or directory.
---Type <return> to continue, or q <return> to quit---
        in ../sysdeps/unix/sysv/linux/sigsuspend.c
(gdb) bt
#0  0xb73f99d6 in __sigsuspend (set=0x1532f0bc)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x080ab4f5 in strcpy () at ../sysdeps/generic/strcpy.c:31
#2  <signal handler called>
#3  0x080c99a2 in strcpy () at ../sysdeps/generic/strcpy.c:31
#4  0x1532f480 in ?? ()
#5  0x0804e559 in strcpy () at ../sysdeps/generic/strcpy.c:31
#6  0x080ac46c in strcpy () at ../sysdeps/generic/strcpy.c:31
#7  0x080ac14c in strcpy () at ../sysdeps/generic/strcpy.c:31
#8  0x080abdaa in strcpy () at ../sysdeps/generic/strcpy.c:31
#9  0x080a4b33 in strcpy () at ../sysdeps/generic/strcpy.c:31
#10 0x080a5d15 in strcpy () at ../sysdeps/generic/strcpy.c:31
#11 0x080aa3a2 in strcpy () at ../sysdeps/generic/strcpy.c:31
#12 0x080a1396 in strcpy () at ../sysdeps/generic/strcpy.c:31
#13 0xb757cd64 in Create_Process_Part2 () from /usr/lib/liblwp.so.2
(gdb) frame 13
#13 0xb757cd64 in Create_Process_Part2 () from /usr/lib/liblwp.so.2
(gdb) list
40      in ../sysdeps/unix/sysv/linux/sigsuspend.c
(gdb) frame 12
#12 0x080a1396 in strcpy () at ../sysdeps/generic/strcpy.c:31
31      ../sysdeps/generic/strcpy.c: No such file or directory.
        in ../sysdeps/generic/strcpy.c
(gdb) list
26      in ../sysdeps/generic/strcpy.c
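	For what it's worth, frames #0 to #3 fit a common pattern for a crash
handler that parks the process: the fault happens in #3, the kernel's
signal trampoline shows up as "<signal handler called>" (#2), the handler
runs (#1) and then sleeps in sigsuspend() (#0) so a debugger can attach.
The repeated strcpy() labels are most likely gdb naming addresses in a
binary without symbols after the nearest symbol it knows, so they probably
say little about the real callers.  Here's a minimal sketch of such a
handler, assuming POSIX sigaction(); it is NOT venus's actual code, and
fatal_handler and install_fatal_handler are made-up names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical crash handler (not venus's real one): report the signal,
 * then park the process in sigsuspend() so a debugger can attach. */
static void fatal_handler(int sig)
{
    static const char msg[] = "Fatal signal; process parked, attach with gdb\n";
    sigset_t mask;

    (void)sig;
    /* write() is async-signal-safe; printf() is not. */
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);

    /* Sleep until a signal arrives; loop so that a stray signal handled
     * elsewhere does not let us return to the faulting instruction. */
    sigemptyset(&mask);
    for (;;)
        sigsuspend(&mask);
}

/* Install fatal_handler for SIGSEGV; returns 0 on success, -1 on error. */
static int install_fatal_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fatal_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Call install_fatal_handler() early in main(); after a crash, "gdb -p PID"
followed by "bt" should show the same sigsuspend / handler /
<signal handler called> / faulting-function sequence at the top.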

	I hope this makes more sense to somebody else, since Create_Process_Part2
never calls strcpy().  Here's the code for that function:

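static void Create_Process_Part2()
{
    PROCESS temp;
    lwpdebug(0, "Entered Create_Process_Part2");
    temp = lwp_cpptr;           /* Get current process id */
    /* Save this LWP's context and hand control to the scheduler; when the
     * LWP is later dispatched, execution resumes after this call. */
    savecontext(Dispatcher, &temp->context, NULL);
    /* Indirect call through the LWP's entry-point pointer. */
    (*temp->ep)(temp->parm);
    /* Entry point returned: tear down the LWP. */
    LWP_DestroyProcess(temp);
}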
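	Note that "(*temp->ep)(temp->parm)" is an indirect call: the function
it runs is whatever entry point the LWP was created with, so frame #12 is
that entry function, not a call you can see in this source.  A toy sketch
of the same dispatch pattern in isolation (struct toy_proc, run_entry and
set_flag are made-up names; LWP's stack switching and scheduling are left
out entirely):

```c
#include <assert.h>

/* Hypothetical stand-in for an LWP process record: an entry point
 * and the argument to pass it, both fixed when the process is created. */
struct toy_proc {
    void (*ep)(void *);
    void *parm;
};

/* Mirrors the indirect call in Create_Process_Part2: the callee is
 * chosen at run time from p->ep, so no direct call appears here. */
static void run_entry(struct toy_proc *p)
{
    (*p->ep)(p->parm);
}

/* Example entry point: sets the int that parm points to. */
static void set_flag(void *parm)
{
    *(int *)parm = 1;
}
```

In an unoptimized build, a backtrace taken inside set_flag shows run_entry
as its caller, which is the role Create_Process_Part2 (#13) plays for
frame #12.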

	
-- 
Patrick Walsh
eSoft Incorporated
303.444.1600 x3350
http://www.esoft.com/

Received on 2005-05-25 12:29:13