From: Jan Harkes <jaharkes_at_cs.cmu.edu>

> Yeah, the venus-at-100%-of-cpu thing is pretty common right after
> I get back on the net; it usually lasts for about 10-15 minutes.
> During this time, by the way, codacon is pretty calm -- it's not
> blasting out "validate" messages or anything.

Interesting, in that case the 100% cpu usage probably doesn't have
anything to do with reintegration. I guess it is the demotion of all
cached objects as a result of the server/volume state change(s).

I guess that code path may be missing a yield in the outer loop. This
wouldn't fix the CPU usage, but would make the system a little more
responsive again. A better fix may be to use some sort of an epoch/event
counter when the volume state changes and use that to detect which
objects need to be revalidated. Not sure if such a solution would merge
well into the existing revalidation mechanism.

It is spinning right now -- it is morning and I opened up my laptop after
a night of suspension. Here's codacon's current output:

    Probe ( 23:39:29 )
    BackProbe lambda.csail.mit.edu ( 23:39:29 )
    Probe ( 23:42:02 )
    BackProbe lambda.csail.mit.edu ( 23:42:02 )
    Probe ( 08:20:30 )
    BeginStatusWalk [27693] ( 08:20:30 )
    [28366, 0, 0, 0] [28365] ( 08:20:30 )
    EndStatusWalk [27693] ( 08:20:30 )
    [28366, 0, 0, 0] [28365, 0, 0] [1, 0, 0.1] ( 08:20:30 )
    BeginDataWalk [2585437] ( 08:20:30 )
    EndDataWalk [2585437] ( 08:20:30 )
    [1, 0, 0.1] [0, 0, 0, 0] ( 08:20:30 )
    unreachable lambda.csail.mit.edu ( 08:21:56 )
    NewConnectFS lambda.csail.mit.edu ( 08:23:02 )
    NewConnection lambda.csail.mit.edu ( 08:23:02 )
    up lambda.csail.mit.edu ( 08:23:02 )
    BackProbe lambda.csail.mit.edu ( 08:23:02 )
    Probe ( 08:23:03 )
    BackProbe lambda.csail.mit.edu ( 08:23:03 )
    bandwidth lambda.csail.mit.edu 31747 54558 77370 ( 08:23:03 )
    NewConnectFS lambda.csail.mit.edu ( 08:23:08 )
    BackProbe lambda.csail.mit.edu ( 08:23:08 )
    ValidateVols / [1] ( 08:23:08 )
    Probe ( 08:25:41 )
    BackProbe lambda.csail.mit.edu ( 08:25:41 )
    Probe ( 08:28:15 )
    BackProbe lambda.csail.mit.edu ( 08:28:15 )
    Probe ( 08:30:48 )
    BackProbe lambda.csail.mit.edu ( 08:30:48 )
    Probe ( 08:33:21 )
    BackProbe lambda.csail.mit.edu ( 08:33:21 )
    BeginStatusWalk [27693] ( 08:35:28 )
    [0, 28366, 0, 0] [28365] ( 08:35:28 )
    Probe ( 08:39:52 )
    BackProbe lambda.csail.mit.edu ( 08:39:52 )
    Probe ( 08:42:22 )
    BackProbe lambda.csail.mit.edu ( 08:42:22 )

Ahh... it just stopped spinning, and codacon simultaneously output

    EndStatusWalk [27693] ( 08:43:23 )
    [28366, 0, 0, 0] [28365, 28369, 28369] [1, 28370, 475.4] ( 08:43:23 )
    BeginDataWalk [2585437] ( 08:43:23 )
    EndDataWalk [2585437] ( 08:43:23 )
    [1, 0, 0.0] [0, 0, 0, 0] ( 08:43:23 )

Does that help?
    -Olin

Received on 2007-05-15 08:47:10
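
The epoch/event counter idea from the quoted reply could look roughly like the sketch below. This is not Venus code: `Volume`, `CachedObject`, `stale_objects` and all field names are hypothetical, and the reply itself is unsure whether such a scheme would fit the existing revalidation mechanism. The sketch only shows why a volume state change becomes a constant-time operation instead of a walk that demotes every cached object.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical volume record; `epoch` is bumped on every state change."""
    name: str
    epoch: int = 0

    def state_changed(self) -> None:
        # O(1): no walk over the cached objects is needed at this point.
        self.epoch += 1


@dataclass
class CachedObject:
    """Hypothetical cache entry that remembers the epoch it was validated in."""
    fid: int
    volume: Volume
    validated_epoch: int = 0

    def needs_revalidation(self) -> bool:
        # An object is stale if its volume changed state since the last check.
        return self.validated_epoch != self.volume.epoch

    def mark_validated(self) -> None:
        self.validated_epoch = self.volume.epoch


def stale_objects(cache):
    """Return the fids of cached objects that still need revalidation."""
    return [obj.fid for obj in cache if obj.needs_revalidation()]


if __name__ == "__main__":
    vol = Volume("/")
    cache = [CachedObject(fid, vol) for fid in range(4)]
    vol.state_changed()          # e.g. the server comes back up
    cache[1].mark_validated()    # revalidated lazily, on first access
    print(stale_objects(cache))
```

With this layout an object is checked only when it is next accessed, so the cost of a reconnect is spread over later accesses rather than paid in one long loop.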