Coda File System

Re: Behaviour of coda with large files

From: Martin Ginkel <ginkel_at_mpi-magdeburg.mpg.de>
Date: Mon, 06 Mar 2006 12:00:51 -0500
Hi Jan,

Jan Harkes wrote:
> What happens is that the client detects whether a server is up or down
> based on the existence of a callback connection. So when the client
> sends a probe, the server pings back on the callback connection.

Yep, I saw that in the log.

> However backfetches are using the same connection, and your backfetch is
> taking very long. So the server is unable to send the ping back to the
> client. This shouldn't be a problem because the server should be
> responding with RPC2_BUSY which will make the client wait an extra 15
> seconds or so. I guess at some point the client did give up, returned
> ETIMEDOUT and disconnected.

Hmm. How is this RPC2_BUSY supposed to work?
Should this ObtainWriteLock somehow time out? Which thread should
send the BUSY reply?
As far as I can see, the client sends the probe several times at
the RPC2 level without getting *any* reply.
And the server itself also 'loses faith' in the client some
seconds after hanging in this lock.
It then drops the callback connection.

> long, it should have been broken up by the client. This is actually a
> known bug (introduced somewhere between 6.0.9 and 6.0.12), the fix is
> fairly simple, just removing an unnecessary test. I've attached the
> patch.

OK, will try that.

> And arguably, the client shouldn't even have to probe the server because
> clearly there is still traffic between the two. But that is more of an
> optimization and not really a correctness issue.

Hmm, I don't know. In my tests I explicitly triggered the
probe by running
cfs cs
But Venus itself also pings automatically after a certain time (150s),
and this will take down the connection as well. Therefore monitoring
all activity on the client-server connection and skipping explicit
probes when Venus has received other traffic seems logical.
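
To make the idea concrete, here is a rough sketch in Python (not Venus code). The class and method names are made up for illustration; only the 150s interval comes from what I observed. It assumes every packet received from the server counts as a sign of life:

```python
import time

PROBE_INTERVAL = 150.0  # seconds; the automatic probe period observed above


class ProbeScheduler:
    """Hypothetical helper, not part of Venus: decides whether a probe is due."""

    def __init__(self, interval=PROBE_INTERVAL, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last_activity = clock()

    def note_traffic(self):
        # Call whenever any packet from the server arrives (data, ack, reply).
        self.last_activity = self.clock()

    def probe_needed(self):
        # Probe only if the server has been silent for a full interval.
        return self.clock() - self.last_activity >= self.interval
```

With something like this, a long-running backfetch would keep resetting the timer, so no probe would be sent while file data is still flowing.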

> This is actually not a reintegration write lock, this is caused by the
> fact that there is only a single RPC2 connection from the server to the
> client, so it can only do one thing at a time. Fetch a file, or send a
> callback probe.

You mentioned that clients should break up their store operations.
Can they break a transfer up into pieces smaller than a single file?
Or do they have to transmit at least one file completely?
I think this matters for the ISO images.

	Thanks for your help
	Martin


-- 
+-[Martin Ginkel]-------------[mailto:mginkel(at)mpi-magdeburg.mpg.de]-+
| MPI Magdeburg, Zi S2.09    Sandtorstr. 1, D-39106 Magdeburg, Germany |
| What is this talk of 'release'?  We are Klingons. Our software       |
| 'escapes' leaving a bloody trail of designers and quality assurance  |
+-[tel/fax: +49 391 6110 482/529]----[http://www.mpi-magdeburg.mpg.de]-+
Received on 2006-03-06 12:05:48