With recent Coda CVS on fairly recent FreeBSD RELENG_4: notebook at home on Ethernet, with a 28.8 PPP link (both ends FreeBSD) to the Ethernet with the Coda server.

Things generally have been working ok. I put a 100K-ish tgz in Coda and untarred it. I think I was write-disconnected at the time. I noticed that I was going disconnected during reintegration. 'cfs cs' would put it back in write-disconnected (wd) mode, and it would try to reintegrate again. The bandwidth estimates got way bigger than the 2500 that might be justifiable, and the reintegration timed out. This happened repeatedly.

I then munged the bw estimation code in venus to clamp at 2000, since I didn't want to stay disconnected all weekend. venus then reintegrated fine. I didn't try stopping/starting venus; it seemed like my clamp didn't get hit, though.

tcpdump showed huge blasts of packets (10 in a row) going out to the server. So, it seems like RPC2 needs selective acks and TCP-friendly congestion control :-) (<duck>Either that or actual TCP.</duck>)

I also saw some 2980-byte packets while doing 'ls -l .'. These got fragmented, but it worked anyway.

While I'm rambling, it would be really nice if hoarding could be configured to run very slowly. I wonder how well the bw estimation code really works, and whether being able to configure 'if < 10000, assume 2000' might be helpful. Then have hoarding only use 10% of the available bw.

Another thought is to have venus record the bw estimate at failure, divide it by two, and refuse to go above that for 10 minutes or until one CML is handled, whichever comes later.

Another thought is to limit rpc2 to one outstanding packet instead of going disconnected, and stay in that mode for a while.

Received on 2001-06-23 12:24:58
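To make the back-off ideas in the message concrete, here is a minimal sketch in Python. It is not venus or RPC2 code. Every name in it (`BandwidthCap`, `effective`, `hoard_budget`, and the constants) is hypothetical, and the numbers are simply the ones quoted in the message, in whatever units the bw estimate uses. It combines three of the suggestions: 'if < 10000, assume 2000', halving the estimate recorded at failure and holding that ceiling until both 10 minutes have passed and one CML has been handled, and giving hoarding 10% of the result.

```python
from dataclasses import dataclass
from typing import Optional

# All values below come from the message; the names are made up for this sketch.
HOLD_SECONDS = 600.0       # "10 minutes"
LOW_BW_THRESHOLD = 10000   # "if < 10000 ..."
LOW_BW_ASSUMED = 2000      # "... assume 2000"
HOARD_FRACTION = 0.10      # "hoarding only use 10% of the available bw"


@dataclass
class BandwidthCap:
    """Hypothetical ceiling on the bandwidth estimate after a failure."""
    ceiling: Optional[float] = None  # None means no cap is active
    failed_at: float = 0.0
    cml_handled: bool = False

    def on_reintegration_failure(self, estimate: float, now: float) -> None:
        # Record the estimate at failure and divide by two.
        self.ceiling = estimate / 2
        self.failed_at = now
        self.cml_handled = False

    def on_cml_handled(self) -> None:
        # Note that one CML has been handled since the last failure.
        self.cml_handled = True

    def effective(self, estimate: float, now: float) -> float:
        # Distrust low estimates: below the threshold, assume a fixed value.
        if estimate < LOW_BW_THRESHOLD:
            estimate = LOW_BW_ASSUMED
        if self.ceiling is not None:
            hold_over = (now - self.failed_at) >= HOLD_SECONDS
            # "Whichever comes later": both conditions must be met.
            if hold_over and self.cml_handled:
                self.ceiling = None
            else:
                estimate = min(estimate, self.ceiling)
        return estimate


def hoard_budget(effective_bw: float) -> float:
    """Share of the effective bandwidth that hoarding may use."""
    return effective_bw * HOARD_FRACTION
```

Usage would look something like: call `on_reintegration_failure(estimate, now)` when reintegration times out, `on_cml_handled()` when a CML is handled, and pass every raw estimate through `effective()` before using it. Whether a real clamp belongs in venus or in RPC2 is left open here, as it is in the message.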