> Ahh, cable modem, asymmetric network... I don't have DSL or cable
> myself and Coda used to only work reliably on networks that had
> identical up and download speeds.
>
> What happens is that during the fetches we see an amazingly fast
> network, but we time out and get disconnected as soon as we try to write
> even a little bit of data because the acks are taking far too long. RPC2
> 'thinks' we have a 3MB/s symmetric network, so when sending several KB
> and not seeing the ack within a couple of milliseconds it believes the
> packet got lost and retransmits. This only makes the congestion on the
> uplink even worse. Once we hit about 5 retransmissions and haven't yet
> seen the ACK message, the client gives up and disconnects from the
> server.
>
> > When this bottleneck causes enough reintegration data to build up, blammo.
> > The lossage is as I described in my last message: cfs lv shows the system
> > in some kind of disconnected state, and cfs wr won't make it reconnect.
> >
> > So the message seems to be that if I don't press the system hard, it works.
> > Under pressure, it falls over. For me, that's progress. Now I want to
> > understand the current hosage. Can anyone help?
>
> Well, one thing is that your connection really is 'weak' in Coda's
> terms. The uplink speed is probably in the order of 64 or 128Kb/s, so it
> prefers to work write-disconnected. You can tell it not to adapt to
> the estimated network bandwidth by using 'cfs strong'. This should
> prevent the (connected -> write-disconnected) transition. However you
> can still become write-disconnected because of the (connected ->
> disconnected -> write-disconnected) transition, in other words if RPC2
> misses the bat and times out you end up logging the change and won't
> automatically return to connected state when we notice that the server
> hasn't really gone.
>
> One reason your client isn't reintegrating could be that the pending
> changes haven't 'aged' long enough. Statistically, any file that hasn't
> been removed within 5 or 10 minutes after creation is likely going
> to be around for several months. So a lot of bandwidth is saved by
> delaying reintegration long enough so that short lived (temporary) files
> can be optimized away locally.
>
> The other reason could be that the estimated bandwidth is so incredibly
> low that the client thinks it can't even reintegrate a single record
> without blocking the user for a significant amount of time. I believe
> the formula was something like, size of reintegration / bandwidth has to
> be less than 15 seconds. The low bandwidth estimate would be caused by
> RPC2's own insistence on retransmitting 'lost' packets: if every packet
> is sent 4 or 5 times, these all eat up the available link bandwidth.
> 128Kb/s would end up looking more like 32Kb/s (4KB/s) which really is a
> trickle.

Waittaminit.
  - I have a real live, continuous, non-telephone-modem net connection.
  - The server's up.
So how come I am unable to convince the client that it is *not* disconnected?
That seems like a bug to me.

Now that I'm disconnected, and unable to get the system to reconnect, it
seems like I'm hosed. I am trying to write a mess of data into /coda. But if
the client is disconnected, then the writes just pile up in the venus cache
until it fills up, then boom, trouble. And this is for a stationary box in
continuous connection to the net, no less.

I tried "cfs fr" as per your suggestion, but it just bombs out with a
mysterious error message that tells me nothing:

    $ cfs fr /coda/myserver/shivers
    VIOC_SYNCCACHE: Invalid argument
    VIOC_SYNCCACHE returns -1
    $

The server is fine, btw -- other clients on other systems are connected &
happy.

What is the distinction between "write-disconnected" and "disconnected"?
    -Olin

Received on 2004-05-17 15:25:20
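
For reference, the arithmetic in the quoted explanation can be sketched roughly as below. This only illustrates the rule of thumb as it was quoted ("size of reintegration / bandwidth has to be less than 15 seconds") and the effect of sending every packet several times. The function names are hypothetical, 1 Kb is assumed to mean 1000 bits, and none of this is taken from the Venus or RPC2 sources.

```python
# Illustrative sketch only; names and unit conventions are assumptions,
# not Coda code. 1 Kb/s is taken to mean 1000 bits per second.

def effective_bandwidth_bps(link_bps, sends_per_packet):
    """Useful throughput when every packet is sent `sends_per_packet` times."""
    return link_bps / sends_per_packet


def can_reintegrate(size_bytes, bandwidth_bps, limit_seconds=15.0):
    """Quoted rule of thumb: size / bandwidth must stay under about 15 seconds."""
    transfer_seconds = size_bytes * 8 / bandwidth_bps
    return transfer_seconds < limit_seconds


if __name__ == "__main__":
    uplink_bps = 128_000                                 # 128 Kb/s uplink
    estimate = effective_bandwidth_bps(uplink_bps, 4)    # each packet sent 4 times
    print(estimate / 1000, "Kb/s")                       # 32.0 Kb/s
    print(estimate / 8 / 1000, "KB/s")                   # 4.0 KB/s
    # Largest reintegration that still fits under the 15 second limit
    print(estimate / 8 * 15, "bytes")                    # 60000.0 bytes
```

Under these assumptions, anything much larger than about 60 KB of pending changes would fail the 15 second check on a link whose estimate has dropped to 32 Kb/s, which matches the "trickle" described in the quoted message.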