interesting. do you have any suggestions for what i might do to get
around the problem? this seems to be happening to me with increasing
regularity.

On 12/05/03, Jan Harkes <jaharkes_at_cs.cmu.edu> wrote:
> On Tue, Dec 02, 2003 at 08:25:05AM -0800, Steve Simitzis wrote:
> > the problem is that codasrv will freeze, apparently unbind all its
> > connections, and refuse to do much of anything. the only way to get it
> > running again is to kill -9 codasrv, and restart everything.
>
> I've seen similar freezes on our testserver and attributed those to
> clients that are connecting from behind a masquerading firewall without
> lowering the server-probe timeout.
>
> The problem is that the netfilter/iptables UDP connection tracking
> forgets about forwarded ports within 3 minutes, but the normal server
> probe is only about once every 5 minutes. So each probe sets up a bunch
> of new connections from a new port when it revalidates the local cache.
>
> The server isn't very smart yet, and tracks a client based on the
> ip-address. So over time it builds up more and more RPC2 connection
> endpoints, but because some of these connections have always recently
> been used it never expires them. After a couple of days (weeks) it
> spends so much time looking for a matching connection endpoint for each
> incoming packet that the server seems to freeze. This disconnects any
> clients with pending operations, and they reconnect, only making the
> problem worse.
>
> This is my current 'theory' about what is causing this. A server
> restart clearly fixes it for a while because that way we get rid of all
> those 'dead' endpoints. Another solution is to pull the network wire for
> about 10 minutes :)
>
> I'm not yet sure where to 'attack' this problem. For one, the server
> should become a little smarter about tracking clients and which
> connections belong to them/are still active. But maybe rpc2 has an
> exponential growth problem in the lookup path where it is matching
> incoming packets.
>
> Jan

-- 
steve simitzis : /sim' - i - jees/
pala : saturn5 productions
www.steve.org : 415.282.9979
hath the daemon spawn no fire?

Received on 2003-12-13 07:20:57
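
The theory in Jan's quoted message can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Coda or RPC2 code: the function name, the client address, and the starting port are hypothetical. The two timing constants are the approximate figures from the message, and the model assumes the server never expires an endpoint while the client's IP address keeps showing up.

```python
# Toy model of the theory above; nothing here is taken from Coda or RPC2.
NAT_UDP_TIMEOUT = 180         # seconds; "within 3 minutes" in the message
DEFAULT_PROBE_INTERVAL = 300  # seconds; "about once every 5 minutes"


def simulate_endpoints(probe_interval, duration, nat_timeout=NAT_UDP_TIMEOUT):
    """Count the (ip, port) endpoints a server accumulates for one NATed client.

    Assumption: the server keeps every endpoint it has ever seen for a
    client IP that is still active, as the message describes.
    """
    client_ip = "192.0.2.10"  # hypothetical address (documentation range)
    next_port = 40000         # hypothetical first port handed out by the NAT
    current_port = None
    last_packet = None
    endpoints = set()

    t = 0
    while t < duration:
        # The NAT forgets the mapping once it has been idle for nat_timeout,
        # so the next probe leaves from a freshly allocated source port.
        if last_packet is None or t - last_packet >= nat_timeout:
            current_port = next_port
            next_port += 1
        last_packet = t
        endpoints.add((client_ip, current_port))
        t += probe_interval

    return len(endpoints)


if __name__ == "__main__":
    one_day = 24 * 60 * 60
    print(simulate_endpoints(DEFAULT_PROBE_INTERVAL, one_day))  # 288
    print(simulate_endpoints(120, one_day))                     # 1
```

In this model a client probing every 5 minutes leaves 288 endpoints behind per day, while a probe interval shorter than the NAT timeout keeps the same port mapping alive and leaves one. That matches the message's point about clients behind a masquerading firewall that do not lower the server-probe timeout, but it says nothing about how expensive the server's lookup actually is.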