today, venus died on both of my coda clients, within minutes of each other. i was able to get backtraces out of gdb, along with the output from the console. also, on each host, venus.log was filled with hundreds of lines of "WAIT OVER" / "WAITING" pairs, until it ended with a fatal error.

my current setup consists of two venus clients and two replicated servers, but the venus clients only know about one of the two servers. i'm not sure if that's related. (fwiw, once i get backups working, i'll likely stop running the second replicated server. one server is enough to handle our traffic, so it's not worth the tradeoff of dealing with server conflicts.)

** host 1 **

17:35:09 Reintegrate: sg.media.members, 2/2 records, result = SUCCESS
venus: multi1.c:144: RPC2_MultiRPC: Assertion `(context = get_multi_con(HowMany)) != ((void *)0)' failed.
17:36:08 Fatal Signal (6); pid 19237 becoming a zombie...
17:36:08 You may use gdb to attach to 19237

(gdb) where
#0 0x420292e5 in sigsuspend () from /lib/i686/libc.so.6
#1 0x080aa9cd in strcpy ()
#2 <signal handler called>
#3 0x42029241 in kill () from /lib/i686/libc.so.6
#4 0x4202902a in raise () from /lib/i686/libc.so.6
#5 0x4202a7d2 in abort () from /lib/i686/libc.so.6
#6 0x42022ddb in __assert_fail () from /lib/i686/libc.so.6
#7 0x40069864 in RPC2_MultiRPC (HowMany=8, ConnHandleList=0x81590b4, RCList=0x81590f4, MCast=0x0, Request=0x813ae60, SDescList=0x0, UnpackMulti=0x4006c410 <MRPC_UnpackMulti>, ArgInfo=0x15545458, BreathOfLife=0x0) at multi1.c:144
#8 0x4006bbcc in MRPC_MakeMulti (ServerOp=41, ArgTypes=0x80f6dc0, HowMany=8, CIDList=0x81590b4, RCList=0x81590f4, MCast=0x0, HandleResult=0, Timeout=0x0) at multi2.c:327
#9 0x0809e98b in strcpy ()
#10 0x0808a3dc in strcpy ()
#11 0x080a154d in strcpy ()
#12 0x080a521d in strcpy ()
#13 0x080a9c22 in strcpy ()
#14 0x080a0b46 in strcpy ()
#15 0x40098f56 in Create_Process_Part2 () at lwp.c:796

** host 2 **

16:31:09 Reintegrate: sg.media.members, 2/2 records, result = SUCCESS
venus: multi1.c:144: RPC2_MultiRPC: Assertion `(context = get_multi_con(HowMany)) != ((void *)0)' failed.
16:31:09 Fatal Signal (6); pid 7215 becoming a zombie...
16:31:09 You may use gdb to attach to 7215

(gdb) where
#0 0x420292e5 in sigsuspend () from /lib/i686/libc.so.6
#1 0x080aa9cd in strcpy ()
#2 <signal handler called>
#3 0x42029241 in kill () from /lib/i686/libc.so.6
#4 0x4202902a in raise () from /lib/i686/libc.so.6
#5 0x4202a7d2 in abort () from /lib/i686/libc.so.6
#6 0x42022ddb in __assert_fail () from /lib/i686/libc.so.6
#7 0x4005a864 in RPC2_MultiRPC (HowMany=8, ConnHandleList=0x819cbb4, RCList=0x819cbf4, MCast=0x0, Request=0x811ad48, SDescList=0x0, UnpackMulti=0x4005d410 <MRPC_UnpackMulti>, ArgInfo=0x151d36d8, BreathOfLife=0x0) at multi1.c:144
#8 0x4005cbcc in MRPC_MakeMulti (ServerOp=38, ArgTypes=0x80f6c00, HowMany=8, CIDList=0x819cbb4, RCList=0x819cbf4, MCast=0x0, HandleResult=0, Timeout=0x0) at multi2.c:327
#9 0x0805cf21 in strcpy ()
#10 0x0806c9c5 in strcpy ()
#11 0x080a59b8 in strcpy ()
#12 0x080a9479 in strcpy ()
#13 0x080a0b46 in strcpy ()
#14 0x40089f56 in Create_Process_Part2 () at lwp.c:796
(gdb)

--
steve simitzis : /sim' - i - jees/
pala : saturn5 productions
www.steve.org : 415.282.9979
hath the daemon spawn no fire?

Received on 2003-05-21 21:29:48