hello. i'm running venus on two high traffic web servers. both machines have been running fine with venus for a couple of weeks, with no problems. for reasons i haven't been able to diagnose, one of the web servers had to be rebooted. since then, i've had numerous problems with venus on that machine. it started at 6am, when venus died randomly:

  Apr 18 06:40:10 sg2 kernel: No pseudo device in upcall comms at f89cbdc0
  Apr 18 06:40:10 sg2 last message repeated 25 times
  Apr 18 06:40:10 sg2 kernel: coda_upcall: Venus dead on (op,un) (7.1564091) flags 8
  Apr 18 06:40:10 sg2 kernel: No pseudo device in upcall comms at f89cbdc0
  Apr 18 06:40:10 sg2 kernel: No pseudo device in upcall comms at f89cbdc0

(sorry, i don't have the venus log from that time, as this is a production machine, and i had to act hastily to bring it back up.)

since it died, i haven't been able to keep venus running for very long without my volumes going disconnected - a problem that leaves visitors to our website greeted by broken images. out of desperation, i tried running venus-setup again to get everything into a known state. but it seems that every time the volumes go disconnected, i get something like this in the logs:

  [ W(27) : 0000 : 15:46:10 ] Cachefile::SetValidData 53987
  [ W(27) : 0000 : 15:46:10 ] Cachefile::SetValidData 61650
  [ W(23) : 0000 : 15:46:11 ] *** Long Running (Multi)Fetch: code = -2001, elapsed = 30761.0 ***
  [ W(23) : 0000 : 15:46:11 ] Cachefile::SetValidData 52224
  [ W(21) : 0000 : 15:46:11 ] WAIT OVER, elapsed = 14896.6
  [ W(27) : 0000 : 15:46:11 ] volent::Enter: observe with proc_key = 1463
  [ W(27) : 0000 : 15:46:11 ] WAITING(VOL): sg.media, state = Hoarding, [1, 0], counts = [2 0 1 0]
  [ W(27) : 0000 : 15:46:11 ] CML= [0, -666], Res = 0
  [ W(27) : 0000 : 15:46:11 ] WAITING(VOL): shrd_count = 2, excl_count = 0, excl_pgid = 0
  [ W(25) : 0000 : 15:46:11 ] volent::Enter: observe with proc_key = 1463
  [ W(25) : 0000 : 15:46:11 ] WAITING(VOL): sg.media, state = Hoarding, [1, 0], counts = [2 0 2 0]
  [ W(25) : 0000 : 15:46:11 ] CML= [0, -666], Res = 0
  [ W(25) : 0000 : 15:46:11 ] WAITING(VOL): shrd_count = 2, excl_count = 0, excl_pgid = 0
  [ X(00) : 0000 : 15:46:15 ] DispatchWorker: out of workers (max 20), queueing message
  [ X(00) : 0000 : 15:46:15 ] DispatchWorker: out of workers (max 20), queueing message
  [ X(00) : 0000 : 15:46:15 ] DispatchWorker: out of workers (max 20), queueing message
  [ X(00) : 0000 : 15:46:16 ] DispatchWorker: out of workers (max 20), queueing message
  [ W(22) : 0000 : 15:46:27 ] *** Long Running (Multi)Fetch: code = -2001, elapsed = 30232.0 ***
  [ W(22) : 0000 : 15:46:27 ] Cachefile::SetValidData 2270
  [ W(27) : 0000 : 15:46:27 ] WAIT OVER, elapsed = 15025.2
  [ W(25) : 0000 : 15:46:27 ] WAIT OVER, elapsed = 15026.8
  [ W(24) : 0000 : 15:46:27 ] WAIT OVER, elapsed = 15026.6

now, bear in mind that the coda client and server are sitting on the same 100 Mbps switched ethernet. i'm even using cfs strong to force a strong connection. what i've assumed is happening is that venus is trying to rebuild a rather large cache of files, but is tripping over itself in the process. i'm also unclear why it's going into "hoarding" mode. i have never seen this behavior before, and i'm not sure why it would start happening all of a sudden. i've never had a problem with venus caching files after a restart, so i'm a bit perplexed. if anyone has any suggestions of what else to look for, that would be appreciated.

btw, i am running linux 2.4.20, coda-debug-client 5.3.20, rvm 1.7, and rpc2 1.15.
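for reference, the rough sequence i've been using to reinitialize the client and check connectivity looks something like the following. the server name, cache size, and /coda path below are placeholders rather than my exact values, so treat this as approximate:

  # re-run client setup for the coda server (cache size argument in kB),
  # then restart venus with a freshly initialized local cache
  venus-setup server.example.com 200000
  venus -init

  # once /coda is mounted, check that the server is reachable,
  # force strong (fully connected) mode, and look at the volume state
  cfs checkservers
  cfs strong
  cfs lv /coda/path/to/volume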
thanks :)

-- 
steve simitzis : /sim' - i - jees/
pala : saturn5 productions
www.steve.org : 415.282.9979
hath the daemon spawn no fire?