On Wed, May 28, 2003 at 10:09:56AM +0200, Steffen Neumann wrote:
> [ W(21) : 0000 : 09:47:25 ] vproc::Begin_VFS(Remove): vid = 7f00002a, u.u_vol = 0, mode = -1
> [ W(21) : 0000 : 09:47:25 ] vproc::End_VFS(Remove): code = 157
> [ W(21) : 0000 : 09:47:25 ] Remove : returns Resource temporarily unavailable, elapsed = 23.9 msec

Hmm, we're clearly not even trying to talk to the server. In fact, it looks like we're failing in the Begin_VFS call, because at loglevel 100 we should at least see the log message from fsdb::Get.

Hmm, looking at the code, it seems to think that an application-specific resolver is currently active for that volume. It will kick out any process that is not in the 'correct' process group.

> [ W(21) : 0000 : 09:47:25 ] vproc::lookup: fid = (0x7f00002a.0xd1.0xd54), name = flexibility.tex, nc = 0
> [ W(21) : 0000 : 09:47:25 ] vproc::Begin_VFS(Lookup): vid = 7f00002a, u.u_vol = 0, mode = -1
> [ W(21) : 0000 : 09:47:25 ] vproc::End_VFS(Lookup): code = 0

Here we're clearly not doing anything, so we must still be failing in Begin_VFS. I'm not sure why it doesn't show an error code.

> Another thing I see is that venus.log claims
> the volume 0x7f00002a would be under repair,
> which it isn't:
...
> [ W(21) : 0000 : 09:45:44 ] mgrpent::PutHostSet: 0x8145500
> [ W(21) : 0000 : 09:45:44 ] fsdb::Get-mre: key = (7f00002a.23b6.47b3), uid = 17154, rights = 3, comp = .#flexibility.tex
> [ W(21) : 0000 : 09:45:44 ] repvol::IsUnderRepair: vol = 7f00002a, vuid = -1

That's just the call to the test that checks whether the volume is under repair; the log doesn't show whether the result was true or false. Interestingly, we did manage to 'enter' the problematic volume here and are attempting to get objects in the volume.

Ok, it looks like the client believes objects still exist and is trying to remove them on the server. However, they are already gone on the server.
When the server returns an error, the client tries to start an application-specific resolver (ASR), and while the RPC to trigger the resolver is timing out, the volume is locked, probably to avoid repeatedly trying to invoke the ASR.

So there is some process trying to remove an object (or maybe even a whole subtree, recursively) that was already removed on the server. Because that process is holding a reference on a (possibly removed) directory, it will not, or cannot, refetch the updated information from the server showing that the directory in fact no longer exists.

Could you try killing all current processes with references in /coda?

Jan

Received on 2003-05-28 11:47:26
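
For finding the processes that still hold references in /coda, here is a rough sketch, not part of the original message. It assumes a Linux-style /proc where each process's `cwd` and `fd/*` entries are symlinks; the function name `pids_referencing` and its parameters are made up for illustration. It only reports PIDs and leaves the decision to kill anything to you. Where they are installed, tools such as lsof or fuser can give similar information.

```python
import os


def pids_referencing(mount="/coda", proc_root="/proc"):
    """Return sorted PIDs whose cwd or open fds point at or below `mount`.

    Assumes a Linux-style procfs layout: <proc_root>/<pid>/cwd and
    <proc_root>/<pid>/fd/<n> are symlinks to the referenced paths.
    """
    mount = mount.rstrip("/") or "/"
    prefix = mount if mount == "/" else mount + "/"
    hits = set()
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        base = os.path.join(proc_root, entry)
        links = [os.path.join(base, "cwd")]
        fd_dir = os.path.join(base, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited, or we lack permission to list its fds
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue  # not readable, or not a symlink
            # A removed directory typically shows up with a " (deleted)"
            # suffix, which still matches the prefix test below.
            if target == mount or target.startswith(prefix):
                hits.add(int(entry))
                break
    return sorted(hits)
```

Calling `pids_referencing()` as root should list every candidate; run as an ordinary user it will generally only see that user's own processes, since other processes' links are not readable.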