Coda File System

Re: codasrv gets stuck

From: Steve Simitzis <steve_at_saturn5.com>
Date: Sat, 13 Dec 2003 04:18:45 -0800
interesting. do you have any suggestions for what i might do to
get around the problem? this seems to be happening to me with
increasing regularity.

On 12/05/03, Jan Harkes <jaharkes_at_cs.cmu.edu> wrote: 

> On Tue, Dec 02, 2003 at 08:25:05AM -0800, Steve Simitzis wrote:
> > the problem is that codasrv will freeze, apparently unbind all its
> > connections, and refuse to do much of anything. the only way to get it
> > running again is to kill -9 codasrv, and restart everything.
> 
> I've seen similar freezes on our testserver and attributed those to
> clients that are connecting from behind a masquerading firewall without
> lowering the server-probe timeout.
> 
> The problem is that the netfilter/iptables UDP connection tracking
> forgets about forwarded ports within 3 minutes, but the normal server
> probe only runs about once every 5 minutes. So each probe sets up a bunch
> of new connections from a new source port when it revalidates the local cache.
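A rough simulation of the timing mismatch described above may help. It is a sketch under stated assumptions, not Coda code: the 180 s and 300 s values approximate the "3 minutes" and "5 minutes" from the message, and the sequential port allocation and the function name `source_ports_seen` are hypothetical. Whenever the gap between probes exceeds the conntrack timeout, the NAT picks a new source port, so the server sees a different endpoint on every probe.

```python
# Hypothetical simulation (not Coda/RPC2 code): a client behind a NAT probes
# the server every probe_interval seconds. The NAT forgets an idle UDP
# mapping after `timeout` seconds and then allocates a fresh source port.
import itertools

CONNTRACK_TIMEOUT = 180  # ~3 minutes, approximating the message (assumed value)
PROBE_INTERVAL = 300     # ~5 minutes, approximating the message (assumed value)


def source_ports_seen(n_probes, probe_interval=PROBE_INTERVAL,
                      timeout=CONNTRACK_TIMEOUT, first_port=40000):
    """Return the NAT source port the server sees for each probe."""
    ports = []
    next_port = itertools.count(first_port)  # assumed sequential allocation
    current = None
    last_activity = None
    for i in range(n_probes):
        now = i * probe_interval
        # No mapping yet, or the old one expired: the NAT picks a new port.
        if current is None or now - last_activity > timeout:
            current = next(next_port)
        last_activity = now
        ports.append(current)
    return ports
```

With the default values every probe arrives from a new port, while a probe interval below the timeout (for example `probe_interval=60`) keeps reusing a single port. That is why lowering the server-probe timeout on masqueraded clients avoids the buildup.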
> 
> The server isn't very smart yet, and tracks a client based on its
> IP address. So over time it builds up more and more RPC2 connection
> endpoints, but because some of these connections have always recently
> been used, it never expires them. After a couple of days (weeks) it
> spends so much time looking for a matching connection endpoint for each
> incoming packet that the server seems to freeze. This disconnects any
> clients with pending operations, and when they reconnect they only make
> the problem worse.
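To see why the server appears to freeze rather than merely slow down, here is a toy model. It assumes, purely for illustration, that endpoints live in a flat list that is searched linearly and never pruned; how RPC2 actually stores and matches endpoints is not described in this thread. The names `comparisons_for_lookup` and `simulate`, and the figure of 288 probes per day (one every 5 minutes), are hypothetical.

```python
# Hypothetical model (not the real RPC2 lookup): every connection endpoint
# is kept in a flat list and matched against incoming packets by linear
# scan. Endpoints are never expired, so each new NAT port adds one entry.


def comparisons_for_lookup(endpoints, key):
    """Linear scan; return how many entries were compared to find `key`."""
    for i, ep in enumerate(endpoints):
        if ep == key:
            return i + 1
    return len(endpoints)  # no match: every entry was compared


def simulate(days, probes_per_day=288, packets_per_probe=10):
    """Return (endpoint count, total comparisons) after `days` of probes."""
    endpoints = []
    total = 0
    port = 40000
    for _ in range(days * probes_per_day):
        key = ("10.0.0.1", port)  # same client IP, fresh NAT port per probe
        endpoints.append(key)     # new endpoint; stale ones are never dropped
        for _ in range(packets_per_probe):
            total += comparisons_for_lookup(endpoints, key)
        port += 1
    return len(endpoints), total
```

`simulate(1)` returns `(288, 416160)`: after one day each packet on the newest connection already costs 288 comparisons, and the cumulative work grows quadratically with uptime. Under this model, a server left running for weeks spends ever more of its time just matching packets.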
> 
> This is my current 'theory' about what is causing this. A server
> restart clearly fixes it for a while, because that way we get rid of all
> those 'dead' endpoints. Another solution is to pull the network wire for
> about 10 minutes :)
> 
> I'm not yet sure where to 'attack' this problem. For one, the server
> should become a little smarter about tracking clients and which
> connections belong to them/are still active. But maybe rpc2 has an
> exponential growth problem in the lookup path where it is matching
> incoming packets.
> 
> Jan
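One direction along the lines Jan suggests is to key connections by (IP address, port) instead of by client IP alone, and to expire each one independently when it goes idle. The sketch below is hypothetical and does not reflect how codasrv or RPC2 work; the class name `EndpointTable`, the dictionary index, and the 600 s idle timeout are assumptions. The timeout is chosen longer than the probe interval so that a connection still in use is not dropped between probes.

```python
# Hypothetical sketch (not codasrv/RPC2 code): index connections by
# (ip, port) so lookups are O(1) on average, and expire each connection
# on its own idle time rather than on the client's overall activity.


class EndpointTable:
    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self._last_used = {}  # (ip, port) -> timestamp of last activity

    def touch(self, ip, port, now):
        """Record traffic on a connection (creates it if it is new)."""
        self._last_used[(ip, port)] = now

    def expire(self, now):
        """Drop connections idle longer than idle_timeout; return the count."""
        stale = [k for k, t in self._last_used.items()
                 if now - t > self.idle_timeout]
        for k in stale:
            del self._last_used[k]
        return len(stale)

    def __len__(self):
        return len(self._last_used)
```

Replaying one day of probes from fresh NAT ports (one every 300 s) with `expire()` called after each probe leaves only the three most recent endpoints in the table instead of 288.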

-- 

steve simitzis : /sim' - i - jees/
          pala : saturn5 productions
 www.steve.org : 415.282.9979
  hath the daemon spawn no fire?
Received on 2003-12-13 07:20:57