On Tue, Sep 23, 2003 at 03:36:49AM -0400, solomon_weldeyesus wrote:
> any source on how to get Coda work over TCP .

There isn't any.

TCP is not that useful because there is no real upper bound on the keepalive; a Coda client (and the user) could simply be left hanging for a long time before it realizes that it got disconnected 20 minutes ago. So a Coda client would have to actively send its own probes across the TCP connection to see whether it is still connected. However, this adds to the amount of data that is sent, and when the connection is backed up or in slow start the probes could be delayed enough that we end up killing off a perfectly good connection. Some of that might be mitigated by sending keepalives as special OOB messages.

The other thing is that we have many 'logical' RPC2/SFTP connections between a client and a server. The lower bound is about 20 per client, but this increases rapidly as clients can have multiple users and connections may linger a bit. Right now our testserver has about 21 clients that 'pinged' it in the past 5 minutes, about 34 clients that it knows about, and close to 2900 logical RPC2 connections [*].

Now each of these logical connections should be queued independently from all others. If we shove them all into a single TCP session, a quick getattr call could be delayed by a file fetch or store. But if we use a separate TCP connection for each logical RPC2 connection, we use up way too many file descriptors for most systems. And because most connections only do an occasional 'request/reply', TCP will never really get out of slow start or ramp up its window, and will be very unresponsive when a packet is dropped because of the delayed acks. Just leave an ssh session doing nothing for a couple of hours, hit a single key, and see how long it can take to get the character echo'd back.

TCP just isn't a good alternative for the RPC2 over UDP communication. An interesting protocol that probably would work well for our situation is SCTP.
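To make the queueing argument concrete, here is a rough sketch (not Coda or RPC2 code) of a quick getattr stuck behind a file fetch on one shared, in-order stream, compared with independently queued logical connections. The request names, the millisecond figures, and both helper functions are made up for illustration, and the "independent" case deliberately ignores that the connections still share the same link bandwidth.

```python
# Toy model of head-of-line blocking. All numbers are hypothetical,
# in milliseconds; this does not model real TCP or RPC2 behaviour.

def finish_times_shared(requests):
    """One in-order stream: a request cannot finish before everything
    queued ahead of it has finished."""
    clock = 0
    done = {}
    for name, arrival_ms, service_ms in requests:
        clock = max(clock, arrival_ms) + service_ms
        done[name] = clock
    return done

def finish_times_independent(requests):
    """Each logical connection is queued on its own (simplification:
    bandwidth sharing between connections is ignored)."""
    return {name: arrival_ms + service_ms
            for name, arrival_ms, service_ms in requests}

# (name, arrival time, time needed to transfer), in queue order
requests = [
    ("fetch", 0, 10_000),   # large file fetch, assumed 10 s
    ("getattr", 100, 10),   # small call issued 100 ms later
]

shared = finish_times_shared(requests)
independent = finish_times_independent(requests)

print("getattr done, shared stream:      ", shared["getattr"], "ms")
print("getattr done, independent queues: ", independent["getattr"], "ms")
```

In this toy model the getattr finishes only after the whole fetch has gone through the shared stream, while with independent queueing it finishes almost immediately.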
Jan

[*] The number of RPC2 connections is unusually high for only 34 clients because several of these clients are behind masquerading firewalls that forget the clients are there and change their local port number about once every 5 minutes. Because the Coda server currently keeps track of clients by IP address, and the IP address never changes, each of these masqueraded clients is probably responsible for several hundred RPC2 connections. The way to fix this is at the client, by setting the serverprobe timeout to less than the timeout of the firewall redirections. I can't forcibly disconnect connections when a new port number is used because there could be more than a single client behind the firewall. In that case it is normal and we're just incorrectly counting the number of clients, and we would have close to 150 clients connected to our testserver (yeah right ;).

Jan

Received on 2003-09-23 10:05:57
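The "close to 150 clients" figure in the footnote appears to follow from the numbers quoted in the message. A quick check, assuming the "about 20 per client" lower bound is used as the divisor (the variable names here are only for illustration):

```python
# Figures taken from the message; the division is an assumed reading of
# how "close to 150 clients" was arrived at.
logical_connections = 2900    # "close to 2900 logical RPC2 connections"
connections_per_client = 20   # "about 20 per client" (lower bound)
known_clients = 34            # clients the server knows about

# Clients implied if every client held only the usual ~20 connections.
implied_clients = logical_connections / connections_per_client

# Average connections per known client, for comparison.
average_per_known_client = logical_connections / known_clients

print(implied_clients)                   # 145.0
print(round(average_per_known_client))   # 85
```

An average of roughly 85 connections per known client, against an expected 20 or so, is consistent with a few masqueraded clients each holding several hundred stale connections.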