Coda File System

Re: Coda over TCP

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Tue, 23 Sep 2003 10:01:31 -0400
On Tue, Sep 23, 2003 at 03:36:49AM -0400, solomon_weldeyesus wrote:
> any source on how to get Coda to work over TCP?

There isn't any. TCP is not that useful because there is no real upper
bound on the keepalive; a Coda client (and its user) could simply be
left hanging for a long time before it realizes that it was disconnected
20 minutes ago.
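To put a number on "no real upper bound": TCP keepalive is optional, and RFC 1122 requires its default idle interval to be no less than two hours. The sketch below computes the worst-case time before a keepalive-enabled connection to a dead peer is torn down, assuming the usual Linux defaults (`tcp_keepalive_time` = 7200 s, `tcp_keepalive_intvl` = 75 s, `tcp_keepalive_probes` = 9). The function name is illustrative only.

```python
def keepalive_detection_time(idle_s: int, interval_s: int, probes: int) -> int:
    """Worst-case seconds between the last packet heard from a dead peer
    and the kernel giving up: the idle time before the first keepalive
    probe, plus one probe interval for each unanswered probe."""
    return idle_s + interval_s * probes


# Assumed Linux defaults: net.ipv4.tcp_keepalive_time,
# tcp_keepalive_intvl and tcp_keepalive_probes.
LINUX_DEFAULTS = (7200, 75, 9)

if __name__ == "__main__":
    secs = keepalive_detection_time(*LINUX_DEFAULTS)
    minutes, rest = divmod(secs, 60)
    print(f"dead peer noticed after up to {minutes} min {rest} s")
```

On Linux these values can be lowered per socket with the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options, but keepalive stays off entirely unless the application enables `SO_KEEPALIVE`.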

So a Coda client would have to actively send its own probes across the
TCP connection to see whether we are still connected. However, this
adds to the amount of data that is sent, and when the connection is
backed up or in slow start a probe could be delayed long enough that we
end up killing off a perfectly good connection. Some of that might be
mitigated by sending keepalives as special OOB (out-of-band) messages.
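The false-disconnect risk can be made concrete with a back-of-the-envelope model: an application-level probe written to the same TCP stream cannot overtake bytes already queued ahead of it, so its reply takes at least as long as draining that queue plus a round trip. The sketch assumes a fixed probe timeout and a constant bottleneck bandwidth, and ignores slow start (which would only make things worse); all names and numbers are hypothetical, not Coda's actual values.

```python
def probe_delay_s(queued_bytes: int, bandwidth_bps: float, rtt_s: float) -> float:
    """Lower bound on when a probe reply arrives: the probe waits for every
    byte queued ahead of it on the stream, then needs one round trip."""
    return queued_bytes * 8 / bandwidth_bps + rtt_s


def falsely_disconnects(queued_bytes: int, bandwidth_bps: float,
                        rtt_s: float, timeout_s: float) -> bool:
    """True if a healthy but backed-up connection would be declared dead
    because the probe reply cannot arrive before the timeout."""
    return probe_delay_s(queued_bytes, bandwidth_bps, rtt_s) > timeout_s


if __name__ == "__main__":
    # Hypothetical: a 1 MB store in progress over a 64 kbit/s link,
    # 200 ms round trip, and a 60 s probe timeout.
    print("probe reply after", probe_delay_s(1_000_000, 64_000, 0.2), "s")
    print("connection killed:", falsely_disconnects(1_000_000, 64_000, 0.2, 60.0))
```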

The other thing is that we have many 'logical' RPC2/SFTP connections
between a client and a server. The lower bound is about 20 per client,
but this increases rapidly as clients can have multiple users and
connections may linger a bit. Right now our testserver has about 21
clients that 'pinged' it in the past 5 minutes, about 34 clients that
it knows about and close to 2900 logical RPC2 connections [*].

Now each of these logical connections should be queued independently
from all others. If we shove them all in a single TCP session, a quick 
getattr call could be delayed by a file fetch or store.
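A toy model of the difference: on one shared byte stream, messages leave in strict order, so a tiny getattr queued behind a large fetch waits for the whole fetch. With independently queued logical connections, a sender can interleave them, here round-robin in fixed-size chunks. The sketch assumes a single link of constant bandwidth, a hypothetical 1 KB chunk size and made-up message sizes; it is not how RPC2/SFTP actually schedule packets.

```python
from collections import deque


def fifo_finish_times(msgs, bw):
    """Completion time (seconds) of each message when all of them share
    ONE byte stream and leave in strict send order.
    msgs: list of (name, size_bytes); bw: link bandwidth in bytes/s."""
    t, done = 0.0, {}
    for name, size in msgs:
        t += size / bw
        done[name] = t
    return done


def round_robin_finish_times(msgs, bw, chunk):
    """Same messages, each on its own logical connection, with the sender
    interleaving the connections one chunk of `chunk` bytes at a time."""
    queue = deque([name, size] for name, size in msgs)
    t, done = 0.0, {}
    while queue:
        item = queue.popleft()
        sent = min(chunk, item[1])
        t += sent / bw
        item[1] -= sent
        if item[1] > 0:
            queue.append(item)  # more to send: go to the back of the line
        else:
            done[item[0]] = t
    return done


if __name__ == "__main__":
    # Hypothetical workload: a 10 MB fetch issued just before a tiny getattr,
    # over a link that moves 1,000,000 bytes per second.
    msgs = [("fetch", 10_000_000), ("getattr", 200)]
    fifo = fifo_finish_times(msgs, bw=1_000_000)
    rr = round_robin_finish_times(msgs, bw=1_000_000, chunk=1024)
    print(f"getattr: shared stream {fifo['getattr']:.4f} s, "
          f"round-robin {rr['getattr']:.4f} s")
```

An application can claw some of this back by chunking its own writes on a single TCP connection, but anything already sitting in the socket buffer, and any retransmission after a loss, still holds up everything queued behind it.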

But if we use a separate TCP connection for each logical RPC2 connection,
we use up far too many file descriptors for most systems. And because
most connections only do an occasional 'request/reply', TCP will never
really get out of slow start or ramp up its window, and will be very
unresponsive when a packet is dropped because of the delayed acks.
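The file-descriptor argument is simple arithmetic against the per-process limit. The sketch below assumes a Unix system with Python's `resource` module and compares a connection count with the soft `RLIMIT_NOFILE` limit (often 1024 by default on Linux). The 2900 figure is the testserver's logical connection count mentioned earlier; the reserve of descriptors for the server's own files is a hypothetical allowance.

```python
import resource


def fits_in_fd_limit(n_connections: int, soft_limit: int, reserved: int = 64) -> bool:
    """True if one socket per logical connection, plus `reserved`
    descriptors kept back for the server's own files and listening
    sockets (a hypothetical allowance), stays within the soft limit."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return n_connections + reserved <= soft_limit


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"RLIMIT_NOFILE: soft={soft} hard={hard}")
    print("2900 TCP connections fit:", fits_in_fd_limit(2900, soft))
```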

Just leave an ssh session doing nothing for a couple of hours, hit a
single key and see how long it can take to get the character echoed
back.

TCP just isn't a good alternative for the RPC2 over UDP communication.
An interesting protocol that probably would work well for our situation
is SCTP.

Jan


[*] The number of RPC2 connections is unusually high for only 34 clients
because several of these clients are behind masquerading firewalls that
forget the clients are there and change their translated port number
about once every 5 minutes. Because the Coda server currently keeps track
of clients by IP address, and the IP address never changes, each of these
masqueraded clients is probably responsible for several hundred RPC2
connections. The way to fix this is at the client, by setting the
serverprobe timeout to less than the timeout of the firewall
redirections.
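Why the probe interval matters: a masquerading firewall typically forgets an idle UDP mapping after some timeout, and the client's next outgoing packet leaves from a fresh external port, which the server sees as a new peer. The simulation below assumes the client's only traffic is its periodic server probe and that the firewall assigns a new port whenever the gap since the last packet exceeds its timeout; `probe_interval_s` and `nat_timeout_s` are illustrative names, not Coda configuration keys.

```python
def count_port_changes(probe_interval_s: int, nat_timeout_s: int,
                       duration_s: int) -> int:
    """How many times a masquerading firewall hands the client a new
    external port during `duration_s`, assuming the client's only traffic
    is one probe every `probe_interval_s` (first packet at t=0) and the
    firewall forgets any mapping idle for longer than `nat_timeout_s`."""
    changes = 0
    last_packet = 0
    for t in range(probe_interval_s, duration_s + 1, probe_interval_s):
        if t - last_packet > nat_timeout_s:
            changes += 1  # mapping expired: this probe leaves from a new port
        last_packet = t
    return changes


if __name__ == "__main__":
    day = 24 * 60 * 60
    # Hypothetical firewall that forgets idle UDP mappings after 5 minutes.
    print("probe every 10 min:", count_port_changes(600, 300, day))
    print("probe every 4 min: ", count_port_changes(240, 300, day))
```

In practice any other RPC2 traffic also refreshes the mapping, so this models the worst case of an otherwise idle client.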

I can't forcibly disconnect the old connections when a new port number
is used, because there could be more than a single client behind the
firewall. In that case the new port is normal, we're just miscounting
the number of clients, and we would have close to 150 clients connected
to our testserver (yeah right ;).
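The ambiguity shows up when comparing what the server observes in two scenarios: one client whose NAT mapping was renewed several times, versus several distinct clients behind the same firewall. Either way the server only sees source (IP address, port) pairs. The sketch uses made-up addresses from the 192.0.2.0/24 documentation range and ignores timing, which a real server could at best use as a weak hint.

```python
def observed_peers(packets):
    """What the server can see: the distinct (ip, port) sources.
    The first field of each packet, the real sender, is invisible to it."""
    return {(ip, port) for _sender, ip, port in packets}


# Scenario A (hypothetical): one laptop whose NAT mapping was renewed three times.
one_client = [("laptop", "192.0.2.10", port) for port in (40001, 40002, 40003, 40004)]

# Scenario B (hypothetical): four machines behind the same masquerading firewall.
four_clients = [(f"host{i}", "192.0.2.10", 40001 + i) for i in range(4)]

if __name__ == "__main__":
    a, b = observed_peers(one_client), observed_peers(four_clients)
    print("server sees identical traffic:", a == b)
    print("clients counted by IP:", len({ip for ip, _ in a}))
    print("clients counted by (IP, port):", len(a))
```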

Jan
Received on 2003-09-23 10:05:57