Coda File System

venus randomly dies

From: Steve Simitzis <steve_at_saturn5.com>
Date: Thu, 1 May 2003 04:26:59 -0700
lately, i've watched venus randomly die on one of my clients. it seems
to take place in the middle of the night, when it's getting used the
least. i'll restart venus, and it will continue to run along without
any problems. i'm running venus with maxclients set to 100, fwiw.

just before it dies, it spews about 30,000 lines of "WAITING" and
"WAIT OVER" in a matter of a minute or two.

any suggestions about what the problem could be? it seems to happen
roughly once a week. packet loss has been suggested as a possible
cause of some of my earlier problems, but since the client and the
server share an ethernet switch, and the crash hits in the middle of
the night when traffic is otherwise minimal, i'm inclined to suspect
otherwise.

from venus.log:


[ H(06) : 0657 : 04:03:10 ] HDBDaemon just woke up
[ H(06) : 0657 : 04:03:11 ] DataWalk:  Restarting Iterator!!!!  Reset availability status information.
[ H(06) : 0657 : 04:03:11 ] Tally for vuid=0:
[ H(06) : 0657 : 04:03:11 ] BeginRvmFlush (1, 292, F)
[ H(06) : 0657 : 04:03:11 ] EndRvmFlush
[ H(06) : 0657 : 04:03:11 ] Tally for vuid=0:

[ H(06) : 0658 : 04:03:11 ] HDBDaemon about to sleep on hdbdaemon_sync

[ W(42) : 0000 : 04:03:32 ] *** Long Running (Multi)ValidateAttrs: code = -2001, elapsed = 17279.5 ***
[ W(42) : 0000 : 04:04:20 ] FidToNodeid: called for volume root (7f000000)!!!
[ W(42) : 0000 : 04:04:20 ] Cachefile::SetLength 53248
[ W(42) : 0000 : 04:04:21 ] Cachefile::SetLength 6656
[ W(42) : 0000 : 04:04:23 ] WAITING(SRVRQ):

[ W(41) : 0000 : 04:04:27 ] WAITING(SRVRQ):

[ W(40) : 0000 : 04:04:27 ] WAITING(SRVRQ):

[ D(44) : 0000 : 04:04:27 ] *** Long Running NewConnectFS: code = 0, elapsed = 4406.2 ***

...

[ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 11.5
[ W(42) : 0000 : 04:04:27 ] WAITING(SRVRQ):

[ W(41) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 11.0
[ W(41) : 0000 : 04:04:27 ] WAITING(SRVRQ):
[ W(38) : 0000 : 04:04:27 ] WAITING(SRVRQ):

[ W(37) : 0000 : 04:04:27 ] WAITING(SRVRQ):

[ W(36) : 0000 : 04:04:27 ] WAITING(SRVRQ):

[ W(35) : 0000 : 04:04:27 ] WAITING(SRVRQ):

[ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 50.0
[ W(42) : 0000 : 04:04:27 ] WAITING(SRVRQ):

[ W(41) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 50.0
[ W(41) : 0000 : 04:04:27 ] WAITING(SRVRQ):

[ W(40) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 50.1
[ W(40) : 0000 : 04:04:27 ] WAITING(SRVRQ):

[ W(42) : 0000 : 04:04:27 ] WAITING((0x7f000001.0x10802.0xf2c2)): level = RD, readers = 0, writers = 1
[ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 1.5
[ W(42) : 0000 : 04:04:27 ] WAITING((0x7f000001.0x10804.0xf2c3)): level = RD, readers = 0, writers = 1
[ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 12.1
[ W(42) : 0000 : 04:04:27 ] WAITING((0x7f000001.0x16004.0x11fe3)): level = RD, readers = 0, writers = 1
[ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 10.3
[ W(42) : 0000 : 04:04:27 ] WAITING((0x7f000001.0x11806.0xfb24)): level = RD, readers = 0, writers = 1
[ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 2.6
[ W(42) : 0000 : 04:04:27 ] WAITING((0x7f000001.0x10806.0xf2c4)): level = RD, readers = 0, writers = 1
[ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 3.0
[ W(42) : 0000 : 04:04:27 ] WAITING((0x7f000001.0xd808.0xda25)): level = RD, readers = 0, writers = 1


** about 30,000 lines of the above **


[ W(33) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 39.8
[ W(33) : 0000 : 04:05:28 ] WAITING((0x7f000001.0x1de28.0x1680f)): level = RD, readers = 0, writers = 1

[ W(41) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.9

[ W(42) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.9

[ W(40) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.9

[ W(38) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.9

[ W(37) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.8

[ W(36) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.9

[ W(35) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.8

[ W(33) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.8
[ W(33) : 0000 : 04:05:28 ] *****  FATAL SIGNAL (6) *****
[ W(33) : 0000 : 04:11:30 ] TERM: About to terminate venus
[ W(33) : 0000 : 04:11:30 ] BeginRvmFlush (1, 1480, F)
[ W(33) : 0000 : 04:11:30 ] EndRvmFlush
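
with roughly 30,000 WAITING / WAIT OVER lines, a tally per thread and per minute is easier to read than the raw log. the following is only a rough sketch, not a coda tool: it assumes every entry follows the "[ X(nn) : seq : hh:mm:ss ] message" layout seen in the excerpt above, and the names LINE_RE and summarize are made up for illustration. any line that does not match that layout (blank lines, "..." markers, wrapped continuations) is simply skipped.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above, e.g.
# [ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 11.5
LINE_RE = re.compile(
    r"^\[ (?P<kind>[A-Z])\((?P<num>\d+)\) : (?P<seq>\d+) : "
    r"(?P<time>\d\d:\d\d:\d\d) \] (?P<msg>.*)$"
)


def summarize(lines):
    """Count WAITING and WAIT OVER messages per thread and per minute."""
    per_thread = Counter()
    per_minute = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # not a log entry in the assumed layout
        msg = m.group("msg")
        if msg.startswith("WAITING"):
            event = "WAITING"
        elif msg.startswith("WAIT OVER"):
            event = "WAIT OVER"
        else:
            continue  # some other message, e.g. FATAL SIGNAL
        thread = f"{m.group('kind')}({m.group('num')})"
        per_thread[(thread, event)] += 1
        per_minute[(m.group("time")[:5], event)] += 1  # hh:mm bucket
    return per_thread, per_minute


# A few lines copied from the excerpt, used as a smoke test.
sample = [
    "[ W(42) : 0000 : 04:04:23 ] WAITING(SRVRQ):",
    "",
    "[ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 11.5",
    "[ W(41) : 0000 : 04:04:27 ] WAITING(SRVRQ):",
    "[ W(33) : 0000 : 04:05:28 ] *****  FATAL SIGNAL (6) *****",
]
per_thread, per_minute = summarize(sample)
for key, count in sorted(per_minute.items()):
    print(key, count)
```

to run it over a real log, pass an open file instead of the sample list, e.g. summarize(open("venus.log")), where the path is whatever your venus is configured to log to. a thread with many more WAITING than WAIT OVER entries, or a single minute holding most of the entries, would be the first thing to look at.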


-- 

steve simitzis : /sim' - i - jees/
          pala : saturn5 productions
 www.steve.org : 415.282.9979
  hath the daemon spawn no fire?
Received on 2003-05-01 07:29:41