(Illustration by Gaich Muramatsu)
lately, i've watched venus randomly die on one of my clients. it seems to take place in the middle of the night, when it's getting used the least. i'll restart venus, and it will continue to run along without any problems. i'm running venus with maxclients set to 100, fwiw.

just before it dies, it spews about 30,000 lines of "WAITING" and "WAIT OVER" in a matter of a minute or two. any suggestions about what the problem could be? it seems to happen roughly once each week or so.

packet loss has been suggested as a possible cause of some of my earlier problems, but given the fact that the client and the server share an ethernet switch, i'm inclined to suspect otherwise. also, this apparent suicide seems to take place in the middle of the night, when the traffic is otherwise minimal.

from venus.log:

[ H(06) : 0657 : 04:03:10 ] HDBDaemon just woke up
[ H(06) : 0657 : 04:03:11 ] DataWalk: Restarting Iterator!!!! Reset availability status information.
[ H(06) : 0657 : 04:03:11 ] Tally for vuid=0:
[ H(06) : 0657 : 04:03:11 ] BeginRvmFlush (1, 292, F)
[ H(06) : 0657 : 04:03:11 ] EndRvmFlush
[ H(06) : 0657 : 04:03:11 ] Tally for vuid=0:
[ H(06) : 0658 : 04:03:11 ] HDBDaemon about to sleep on hdbdaemon_sync
[ W(42) : 0000 : 04:03:32 ] *** Long Running (Multi)ValidateAttrs: code = -2001, elapsed = 17279.5 ***
[ W(42) : 0000 : 04:04:20 ] FidToNodeid: called for volume root (7f000000)!!!
[ W(42) : 0000 : 04:04:20 ] Cachefile::SetLength 53248
[ W(42) : 0000 : 04:04:21 ] Cachefile::SetLength 6656
[ W(42) : 0000 : 04:04:23 ] WAITING(SRVRQ):
[ W(41) : 0000 : 04:04:27 ] WAITING(SRVRQ):
[ W(40) : 0000 : 04:04:27 ] WAITING(SRVRQ):
[ D(44) : 0000 : 04:04:27 ] *** Long Running NewConnectFS: code = 0, elapsed = 4406.2 ***
...
[ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 11.5
[ W(42) : 0000 : 04:04:27 ] WAITING(SRVRQ):
[ W(41) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 11.0
[ W(41) : 0000 : 04:04:27 ] WAITING(SRVRQ):
[ W(38) : 0000 : 04:04:27 ] WAITING(SRVRQ):
[ W(37) : 0000 : 04:04:27 ] WAITING(SRVRQ):
[ W(36) : 0000 : 04:04:27 ] WAITING(SRVRQ):
[ W(35) : 0000 : 04:04:27 ] WAITING(SRVRQ):
[ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 50.0
[ W(42) : 0000 : 04:04:27 ] WAITING(SRVRQ):
[ W(41) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 50.0
[ W(41) : 0000 : 04:04:27 ] WAITING(SRVRQ):
[ W(40) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 50.1
[ W(40) : 0000 : 04:04:27 ] WAITING(SRVRQ):
[ W(42) : 0000 : 04:04:27 ] WAITING((0x7f000001.0x10802.0xf2c2)): level = RD, readers = 0, writers = 1
[ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 1.5
[ W(42) : 0000 : 04:04:27 ] WAITING((0x7f000001.0x10804.0xf2c3)): level = RD, readers = 0, writers = 1
[ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 12.1
[ W(42) : 0000 : 04:04:27 ] WAITING((0x7f000001.0x16004.0x11fe3)): level = RD, readers = 0, writers = 1
[ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 10.3
[ W(42) : 0000 : 04:04:27 ] WAITING((0x7f000001.0x11806.0xfb24)): level = RD, readers = 0, writers = 1
[ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 2.6
[ W(42) : 0000 : 04:04:27 ] WAITING((0x7f000001.0x10806.0xf2c4)): level = RD, readers = 0, writers = 1
[ W(42) : 0000 : 04:04:27 ] WAIT OVER, elapsed = 3.0
[ W(42) : 0000 : 04:04:27 ] WAITING((0x7f000001.0xd808.0xda25)): level = RD, readers = 0, writers = 1

** about 30,000 lines of the above **

[ W(33) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 39.8
[ W(33) : 0000 : 04:05:28 ] WAITING((0x7f000001.0x1de28.0x1680f)): level = RD, readers = 0, writers = 1
[ W(41) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.9
[ W(42) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.9
[ W(40) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.9
[ W(38) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.9
[ W(37) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.8
[ W(36) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.9
[ W(35) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.8
[ W(33) : 0000 : 04:05:28 ] WAIT OVER, elapsed = 1.8
[ W(33) : 0000 : 04:05:28 ] ***** FATAL SIGNAL (6) *****
[ W(33) : 0000 : 04:11:30 ] TERM: About to terminate venus
[ W(33) : 0000 : 04:11:30 ] BeginRvmFlush (1, 1480, F)
[ W(33) : 0000 : 04:11:30 ] EndRvmFlush

--
steve simitzis : /sim' - i - jees/
pala : saturn5 productions
www.steve.org : 415.282.9979
hath the daemon spawn no fire?

Received on 2003-05-01 07:29:41
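
with around 30,000 lines of WAITING / WAIT OVER, a per-thread tally may be easier to reason about than the raw log. the sketch below assumes every line follows the layout seen in the venus.log excerpt (`[ W(42) : 0000 : 04:04:27 ] message`). the regex and the function name `summarize` are illustrative only and are not part of coda.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt: "[ W(42) : 0000 : 04:04:27 ] message"
LINE_RE = re.compile(r"\[ (\w)\((\d+)\) : (\d+) : (\d\d:\d\d:\d\d) \] (.*)")


def summarize(lines):
    """Tally WAITING / WAIT OVER messages per thread and collect fatal signals."""
    waiting = Counter()
    wait_over = Counter()
    fatal = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not fit the assumed layout
        kind, num, _seq, stamp, msg = m.groups()
        thread = f"{kind}({num})"
        if msg.startswith("WAITING"):
            waiting[thread] += 1
        elif msg.startswith("WAIT OVER"):
            wait_over[thread] += 1
        elif "FATAL SIGNAL" in msg:
            fatal.append((stamp, thread, msg))
    return waiting, wait_over, fatal
```

calling `summarize(open("venus.log"))` returns two counters keyed by thread (for example `W(42)`) and a list of `(time, thread, message)` tuples for any fatal signal. a thread whose WAITING count is far above its WAIT OVER count would be one place to start looking.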