Coda File System

Re: Codasrv keeps crashing

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 30 Dec 2005 16:07:22 -0500
On Fri, Dec 23, 2005 at 02:55:40PM +0100, Markus Wiesecke wrote:
> I am running the Coda server for my workgroup. We had some hardware
> problems with the server, which required copying the volumes to a new
> disk (the vicep? were copied with cp -a, the RVM data with dd, the RVM
> log remained where it was, and server.conf was of course updated).
> Since then the server keeps crashing whenever a user tries to write to
> one particular volume (the 7f000044 volume, see below). Other volumes
> seem to be fine. I also used the server downtime to update the server
> from version 6.0.5 to 6.0.14.
> 
> When it crashes, SrvErr looks like:
> Assertion failed: 0, file "srv.cc", line 302
> EXITING! Bye!
> 
> SrvLog looks like (the tail, I guess the rest is not of interest):
> 14:20:32 VAllocFid: volume disk uniquifier being extended
> 14:20:33 PutReintegrateObjects: stale directory fid 7f000044.9b.8fe5, num 0,
> 14:20:33 PutReintegrateObjects: stale directory fid 7f000044.a3.8fe9, num 1,
> 14:20:33 PutReintegrateObjects: stale directory fid 7f000044.ab.8fed, num 2,
> 14:20:34 ****** FILE SERVER INTERRUPTED BY SIGNAL 11 ******
> 14:20:34 ****** Aborting outstanding transactions, stand by...
> 14:20:34 Uncommitted transactions: 1
> 14:20:34 Uncommitted transactions: 1
> 14:20:34 Committing suicide now ........
> 
> So, I need some hints where to look. Google did not help much so far :(

Signal 11 (SIGSEGV) is typically caused by a bad pointer dereference.

It could be caused by the 7f000044 volume, but it could be something
completely different. PutReintegrateObjects is logged at the end of a
reintegration, so that reintegration might have finished at 14:20:33,
with the segfault caused by some other operation that started later.
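
To see what was logged just before the crash, it can help to print the
lines leading up to the signal message. A minimal sketch, assuming GNU or
BSD grep (for the -B option); the function name is made up for
illustration, and the log path in the usage line is an assumption about
your setup:

```shell
# Print the N lines that precede each "INTERRUPTED BY SIGNAL" entry,
# plus the entry itself. N defaults to 20.
# Usage: lines_before_crash LOGFILE [N]
lines_before_crash() {
    log=$1
    n=${2:-20}
    grep -B "$n" 'FILE SERVER INTERRUPTED BY SIGNAL' "$log"
}
```

For example, `lines_before_crash /vice/srv/SrvLog 50` (adjust the path to
wherever your SrvLog lives). If the last entries before the signal belong
to a different operation than the reintegration, that is a hint the
7f000044 volume may not be the culprit.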

Logging often isn't very helpful in these cases, but you could increase
the server debug level (volutil setdebug 100), which might help to narrow
it down further. Or, if your server has debugging symbols, you can attach
gdb, which will then trap the segfault and allow you to grab a backtrace
that pinpoints where the fault occurred.

    gdb codasrv `pidof codasrv`
    (gdb) c
    ** gdb catches segfault and drops you back at the prompt **
    (gdb) bt
    ...
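
If nobody is around to type at the prompt when the crash happens, the
same steps can probably be run unattended. This is only a sketch: -p,
-batch and -ex are standard gdb options, but the function name and the
GDB override variable are made up for illustration.

```shell
# Attach to a running process, let it continue, and print a backtrace
# once gdb stops on the segfault. In batch mode gdb exits after the
# last command instead of waiting at the prompt.
# GDB is a hypothetical override for picking a different gdb binary.
# Usage: backtrace_on_crash PID
backtrace_on_crash() {
    "${GDB:-gdb}" -p "$1" -batch -ex continue -ex bt
}
```

Something like `backtrace_on_crash "$(pidof codasrv)" > /tmp/codasrv-bt.txt 2>&1`
would then leave the backtrace in a file (the output path is just an
example). As with the interactive session, the backtrace is only readable
if the server was built with debugging symbols.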

> Greetings and merry christmas,

Thanks, it was, happy new year :)

Jan
Received on 2005-12-30 16:07:57