Hi Jan,

Jan Harkes wrote:
> What happens is that the client detects whether a server is up or down
> based on the existence of a callback connection. So when the client
> sends a probe, the server pings back on the callback connection.

Yes, I have seen that in the log.

> However backfetches are using the same connection, and your backfetch is
> taking very long. So the server is unable to send the ping back to the
> client. This shouldn't be a problem because the server should be
> responding with RPC2_BUSY which will make the client wait an extra 15
> seconds or so. I guess at some point the client did give up, returned
> ETIMEDOUT and disconnected.

Hmm. How is this RPC2_BUSY supposed to work? Should this ObtainWriteLock
somehow time out? Which thread should send the BUSY reply?
As far as I can see, the client sends the probe several times at the RPC2
level without getting *any* reply. The server itself also 'loses faith' in
the client a few seconds after it starts hanging in this lock, and then
drops the callback connection.

> long, it should have been broken up by the client.
> This is actually a
> known bug (introduced somewhere between 6.0.9 and 6.0.12), the fix is
> fairly simple, just removing an unnecessary test. I've attached the
> patch.

OK, I will try that.

> And arguably, the client shouldn't even have to probe the server because
> clearly there is still traffic between the two. But that is more of an
> optimization and not really a correctness issue.

Hmm, I am not so sure. In my tests I explicitly triggered the probe by
running 'cfs cs'. But Venus also probes automatically after a certain time
(150s), and that will take down the connection as well.
So it seems logical to monitor the client-server connection for any kind
of activity and to skip explicit probes whenever Venus has received other
traffic.

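
The idea of skipping probes while other traffic is arriving could be sketched roughly as follows. This is only an illustration in Python, not Venus code: the class and method names are hypothetical, and the only value taken from this thread is the 150s probe period.

```python
from dataclasses import dataclass

# Automatic probe period mentioned in this thread (seconds).
PROBE_INTERVAL = 150.0


@dataclass
class ServerActivity:
    """Hypothetical per-server record; not an actual Venus data structure."""

    last_heard: float  # time at which any packet last arrived from the server

    def note_traffic(self, now: float) -> None:
        # Call this for every packet received from the server,
        # e.g. a reply or a piece of a bulk transfer, not only probe replies.
        self.last_heard = now

    def needs_probe(self, now: float) -> bool:
        # Probe only if the server has been silent for a full interval.
        return now - self.last_heard >= PROBE_INTERVAL
```

With such a check, a long-running transfer would keep refreshing last_heard, so no explicit probe would be sent while data is still flowing.
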
> This is actually not a reintegration write lock, this is caused by the
> fact that there is only a single RPC2 connection from the server to the
> client, so it can only do one thing at a time. Fetch a file, or send a
> callback probe.

You mentioned that clients should break up their store operations. Can
they split a transfer into pieces smaller than a single file, or do they
have to transmit at least one file completely? I think this matters for
the ISO images.

Thanks for your help,
Martin

-- 
+-[Martin Ginkel]-------------[mailto:mginkel(at)mpi-magdeburg.mpg.de]-+
| MPI Magdeburg, Zi S2.09 Sandtorstr. 1, D-39106 Magdeburg, Germany |
| What is this talk of 'release'? We are Klingons. Our software |
| 'escapes' leaving a bloody trail of designers and quality assurance |
+-[tel/fax: +49 391 6110 482/529]----[http://www.mpi-magdeburg.mpg.de]-+

Received on 2006-03-06 12:05:48