On Tue, Apr 25, 2006 at 10:43:49AM -0400, Sean Caron wrote:
> On 4/25/06, Greg Troxel <gdt_at_ir.bbn.com> wrote:
> > In gdb, after attaching, do "bt" to get a stack backtrace. Then do
> > "up" to move to where the signal was, and there "i frame" and "list".
>
> Done! Here's the data:
>
> (gdb) bt
> #0  0x403bc3a0 in sleep () from /usr/local/lib/libc.so.12
> #1  0x0008f2a8 in coda_assert (
>     pred=0xf9e <Error reading address 0xf9e: Invalid argument>,
>     file=0x94518 "srv.cc", line=302) at coda_assert.c:46
> #2  0x00013c64 in zombie(int) (sig=3998) at srv.cc:302
> #3  <signal handler called>
>
> (gdb) up

Annoyingly, it doesn't actually show where the signal occurred, only
what happened after the signal was caught.

Another way to catch this is to run codasrv under GDB. Something like,

    # gdb codasrv
    gdb> run -d 1

(the -d 1 bumps the debug level slightly and should also prevent the
server from detaching from the console). Then any signals will be
trapped first by GDB.

Because signals might also be used to set up thread stacks, it could be
that the signal we're looking for doesn't happen until later, so you
might have to enter 'continue' a couple of times.

You seem to be getting a sig10 (sigbus?), which commonly indicates an
unaligned memory access. A null-ptr dereference would have been sig11
(sigsegv), and an assertion typically generates a sig6 (sigabrt).

However, we used to have both clients and servers running on ARM with a
kernel that did no unaligned access fixups, so I thought we had already
dealt with most of those.

One other thing I am interested in: what is the output of LWP's
configure? I wonder which type of thread switching it picked. I think
NetBSD deprecated makecontext and friends, so it might be using tricks
with signal handlers (sigaltstack) to kickstart new threads, or it could
be falling back on the old assembly code, which might not realize that
this is a 32-bit kernel/userspace.
So it could be that the first thread switch is trying to perform a
64-bit read which triggers the bus error.

Jan

Received on 2006-04-25 11:55:27
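To illustrate the unaligned-access theory from the message above, here is a minimal C sketch. It is not taken from Coda or LWP; the helper names is_aligned_u64 and load_u64 are hypothetical, chosen only for this example. It assumes a CPU that requires natural (8-byte) alignment for 64-bit loads, which is the kind of machine where a misaligned read may raise SIGBUS instead of being fixed up by the kernel.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if p sits on an 8-byte boundary, i.e.
 * the natural alignment of a uint64_t. On strict-alignment machines a
 * direct 64-bit load from any other address may fault. */
static int is_aligned_u64(const void *p)
{
    return ((uintptr_t)p & (sizeof(uint64_t) - 1)) == 0;
}

/* Hypothetical helper: reads 64 bits from p without assuming anything
 * about alignment. memcpy copies the bytes into a properly aligned
 * local, so the compiler never emits a misaligned 64-bit load for p. */
static uint64_t load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

A cast such as `*(const uint64_t *)p` is the pattern that would fault when is_aligned_u64(p) returns 0 on such a machine; routing the read through load_u64 sidesteps that. Whether this is what actually happens in the LWP thread switch is only a guess until the faulting instruction is seen under GDB.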