On Wed, Jun 30, 2004 at 12:52:53AM -0400, Chet Murthy wrote:
> (1) I'm untarring the Linux kernel into my coda directory on my
> client, which is connected to the server via DSL.
...
> (3) Basically, at some point, I see in codacon:
>
>     connection::unreachable lambda.csail.mit.edu
>
> (4) I use the timestamp to look in the venus.log, and see:
>
> [ W(24) : 0000 : 00:36:05 ] *** Long Running (Multi)Store: code = -2001, elapsed = 17576.1 ***
>
> where the operation might be Reintegrate, one of the SHA operations,
> etc. But there are lots of operations that seem to hit this timeout.
> And it doesn't seem like this timeout ALWAYS causes a problem. Often,
> even for long periods, such operations will get such return-codes, and
> things will continue swimmingly.

Correct. The message you pulled out of the log simply tells us that we made a ViceStore MultiRPC call which returned after 17.6 seconds; it doesn't really tell us what the return code was. There are operations that easily take long enough to trigger the message: a reintegration, a getattr on a large file when SHA checksums are enabled, or a store of any large file. Most of the time these return successfully, but sometimes they return ETIMEDOUT because the code assumes we lost connectivity to the server.

The problem is in the RPC2 library and depends a bit on the fact that connectivity is checked by the RPC2 part (librpc2), while file transfers are handled by SFTP (libse). Whenever a remote procedure call with a bulk-transfer side-effect is executed, we send the RPC2 request to the server, which then starts up the side-effect transfer; this is handled by a separate thread on the client. At some point the original thread that sent the RPC2 request becomes impatient and retransmits the request. The server notices that it got a duplicate of a request it is still processing and responds with RPC2_BUSY. Now if we don't see this BUSY response in time, we resend the request up to 5 times and then give up. I think there is also a 15-second upper limit: if we haven't had any response in that time, the server is considered unreachable. The BUSY could have been dropped or delayed by overall network congestion, or the client might be too optimistic with its round-trip time estimate and send its 5 retries off way too fast, only adding to the congestion on the network.

The second problem is that each logical RPC2 connection independently handles its own retransmissions. This works fine when there is only a single request at a time, but when several processes hit the system simultaneously with concurrent requests, each possibly carrying a data transfer, the effective bandwidth per connection is much lower than the estimated bandwidth, and as a result the RTT estimate should be considerably higher.

Finally, the whole RTT estimate is a bit fragile during a client->server data transfer. When we're pulling data we get a bunch of data packets, send a small ack, and then receive the next batch; the timestamps in those packets are almost all useful for estimating the available bandwidth. When sending data, however, we end up queueing a lot in the local network buffers, so the timestamps on sent packets don't really say much by the time they are echoed back in the ack. It is also here that we suffer from the asymmetry of the DSL link, which seems better suited to pulling down bulk data (webpages) than to sending it.
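In rough pseudo-C, the retransmission logic described above looks something like the following. This is an illustrative sketch only, not the actual librpc2 code; send_packet(), await_reply(), rto_ms() and the constants are all made up:

    #include <errno.h>

    #define MAX_RETRIES 5        /* retransmissions before giving up */
    #define DEADLINE_MS 15000    /* upper limit with no response at all */

    enum reply { REPLY_NONE, REPLY_BUSY, REPLY_DONE };

    extern void       send_packet(const void *req);
    extern enum reply await_reply(int timeout_ms);
    extern int        rto_ms(void);  /* retry interval from the RTT estimate */

    int rpc2_call(const void *req)
    {
        int retries = 0, waited = 0;

        send_packet(req);
        for (;;) {
            /* An overly optimistic RTT estimate makes this timeout too
             * short, so the retries fire too fast and only add to the
             * congestion on the network. */
            int timeout = rto_ms();

            switch (await_reply(timeout)) {
            case REPLY_DONE:
                return 0;               /* got the real response */
            case REPLY_BUSY:
                /* The server saw our duplicate request while the original
                 * is still being processed: it is alive, so reset our
                 * patience instead of counting a failure. */
                retries = 0;
                waited = 0;
                break;
            case REPLY_NONE:
                waited += timeout;
                if (++retries > MAX_RETRIES || waited >= DEADLINE_MS)
                    return -ETIMEDOUT;  /* server considered unreachable */
                send_packet(req);       /* retransmit; a server still working
                                         * on the request answers RPC2_BUSY */
                break;
            }
        }
    }

The sketch also shows why a dropped or delayed BUSY hurts: it is the only thing that stops the retry counter from running out while the side-effect transfer is still in flight.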
TCP can deal with this for two reasons. First, a TCP connection is typically used either to pull a lot or to push a lot, but rarely alternates between the two; it effectively sees a link with one stable speed (fast when downloading, slow when uploading) and doesn't have to switch behaviour once it stabilizes. Second, TCP is far more relaxed with respect to timeouts: it doesn't matter if it hasn't heard from the other side for a minute or so, the transfer simply stalls for a while. Coda, on the other hand, knows that some application is seemingly 'frozen' for as long as we're stuck in the RPC2 layer, and so it gives up on an uncooperative network much more quickly.
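To make the contrast concrete, here is a hypothetical sketch of the two timeout policies; all names and numbers are invented for illustration and come from neither codebase:

    #include <errno.h>
    #include <stdbool.h>

    /* Invented helper: wait up to timeout_ms, true if the peer spoke up. */
    extern bool peer_answered(int timeout_ms);

    /* TCP-ish policy: no process is blocked synchronously on the
     * transfer, so back off exponentially and tolerate minutes of
     * silence; the transfer simply stalls until the peer returns. */
    int tcp_style_wait(int rto_ms)
    {
        for (int elapsed = 0; elapsed < 120000; elapsed += rto_ms, rto_ms *= 2)
            if (peer_answered(rto_ms))
                return 0;
        return -1;                  /* only now reset the connection */
    }

    /* RPC2-ish policy: an application is 'frozen' inside this call,
     * so give up after ~15 seconds and report the server unreachable. */
    int rpc2_style_wait(int rto_ms)
    {
        for (int elapsed = 0; elapsed < 15000; elapsed += rto_ms)
            if (peer_answered(rto_ms))
                return 0;
        return -ETIMEDOUT;
    }

Jan
Received on 2004-06-30 17:55:56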