On Fri, Apr 06, 2001 at 06:56:07PM +0200, Steffen Schaefer wrote:
> Hi,
> my question: If the system gets disconnected, then a lot of time passes by,
> till venus recognized that the network was down. Is it possible to change
> the timeout value, or is this a RPC2 specific parameter?

Yes, it is possible to change the parameter. No, it is not advisable to
do so, because of the way RPC2 works.

Whenever the client sends an rpc2-request, it might get lost (UDP is
unreliable). Most requests can be serviced reasonably quickly, so instead
of sending an ACK when the request is received, the server just starts
processing the request and the rpc2-reply is the implicit ack.

So when a request (or reply) is lost, the client times out and
retransmits the request. If the server has already sent a reply (i.e.
the reply was lost), it will simply repeat the reply. If the server
never saw this request before, the request will be processed.

Now there is a third case, which is tricky. Not all requests can be dealt
with quickly, so the server might still be working on the request. In
this case the server reports RPC2_BUSY, which for the client is an
intermediate ACK, and it will wait for a full timeout period before
retrying (as the reply could have been lost).

Handling the incoming messages and sending back RPC2_BUSY is in itself
a fairly simple operation. However, Coda uses cooperative threads with
no preemption, so the thread that is doing the long computation has to
yield periodically so that the rpc2_socketlistener can grab the incoming
messages and reply with BUSY's. Sometimes the server doesn't yield often
enough to avoid network timeouts. In some cases this is due to excessive
looping, or to blocking library/system calls. It is also related to
network latency and to how soon the client gives up while waiting for
the RPC2_BUSY reply.

So when you change something like the client's rpc2 disconnect timer,
it will actually introduce problems of clients switching to disconnected
operation for no apparent reason, for instance during server-server
resolution or client-server reintegration, while breaking callbacks, or
possibly even during the setup of a new incoming rpc2 connection.

In general a 30 second timeout seemed to give the best balance between
giving up soon enough and not giving up too soon. Have you ever checked
how long it takes for an idle TCP connection to die? Especially when
there are no ICMP errors, which is common when a network cable is
pulled or breaks, or in wireless networks when you walk out of radio
range.

Jan

Received on 2001-04-06 16:04:21
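
The interaction between the client timeout and the server's BUSY replies
described above can be illustrated with a small toy model. This is only a
sketch under simplifying assumptions, not RPC2's actual retry schedule:
the function simulate_call and its parameters are hypothetical, the
client uses one fixed per-attempt timeout, and it gives up after a fixed
number of attempts that saw no packet at all.

```python
def simulate_call(work_time, busy_delay, timeout, max_silent=3):
    """Toy model of one request against a server that is busy.

    work_time:  seconds until the server sends the real reply
    busy_delay: seconds until a (re)transmitted request is answered with
                BUSY (network latency plus the time until the worker
                thread yields to the listener)
    timeout:    seconds the client waits for any packet per attempt
    max_silent: consecutive silent attempts before the client gives up
                (hypothetical; real RPC2 uses its own retry schedule)
    """
    now = 0.0      # time of the current (re)transmission
    silent = 0     # consecutive attempts that saw no packet at all
    while silent < max_silent:
        deadline = now + timeout
        if work_time <= deadline:
            # The real reply arrives while the client is still waiting.
            return ("reply", work_time)
        if busy_delay <= timeout:
            # BUSY arrived in time: treat it as an intermediate ACK and
            # wait a full timeout period before retransmitting.
            silent = 0
            now = now + busy_delay + timeout
            if work_time <= now:
                return ("reply", work_time)
        else:
            # Nothing arrived within the timeout: count it and retry.
            silent += 1
            now = deadline
    return ("disconnected", now)


if __name__ == "__main__":
    # Server needs 60s and takes 8s to get around to sending BUSY.
    print(simulate_call(work_time=60, busy_delay=8, timeout=15))
    print(simulate_call(work_time=60, busy_delay=8, timeout=5))
```

In this model the server is perfectly healthy in both runs. With the
longer timeout the BUSY replies keep the call alive until the real reply
arrives; with the shorter one the client never sees a BUSY in time and
declares the server unreachable.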