Coda File System

Re: Coda lossage

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 30 Jul 2004 12:52:00 -0400
On Sun, Jul 25, 2004 at 02:14:54PM -0400, shivers_at_cc.gatech.edu wrote:
>    From: Ivan Popov <pin_at_medic.chalmers.se>
> 
>    (yet I do not seem to find how big your client caches were)
> 
> Pretty big, varying from 100Mb to 10Gb.

I use a 200MB cache which works pretty well. It translates to about 8000
locally cached objects.

But without tweaks, a 10GB Coda client will try to cache up to 420000
files. This in itself shouldn't be a problem except for the fact that
there are a couple of places where every object is compared to every
other object. So with 8K objects there are about 64 million
comparisons, while with 420K objects there are more than 176 billion.

If each of these comparisons takes a tenth of a microsecond (pulling a
random number out of the air here), my venus could run through this
loop in about 6 seconds. But your client will need about 17640 seconds
(roughly 5 hours) for the same comparison loop. In addition, with 8000
objects I use about 21MB of RVM on the client, but when I extrapolate
to 420000 objects there would be 1.1GB of RVM needed. And the
any-to-any comparison will thrash through that memory; if there isn't
enough RAM, the machine will probably end up swapping itself to death.
Even what used to be a simple linear lookup, such as a 'find foo', will
become a very expensive, swap-heavy exercise.
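
To make the back-of-the-envelope numbers explicit, here is a throwaway
calculator that reproduces them; the 0.1 microsecond per comparison and
the roughly 2.6KB of RVM per object are still just estimates taken from
my own client, not measured constants.

    #include <stdio.h>

    int main(void)
    {
        /* cache sizes in objects; 0.1us per comparison is a guess,
         * 21MB of RVM for 8000 objects is what my client uses */
        double objs[] = { 8000.0, 420000.0 };
        double us_per_cmp = 0.1;
        double rvm_mb_per_obj = 21.0 / 8000.0;   /* ~2.6KB per object */

        for (int i = 0; i < 2; i++) {
            double cmps = objs[i] * objs[i];     /* any-to-any comparisons */
            printf("%7.0f objects: %14.0f comparisons, ~%6.0f s, ~%5.0f MB RVM\n",
                   objs[i], cmps, cmps * us_per_cmp / 1e6,
                   objs[i] * rvm_mb_per_obj);
        }
        return 0;
    }

That prints about 6 seconds and 21MB for my 200MB cache, versus 17640
seconds and about 1.1GB of RVM for the 10GB cache.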

Now the RPC2 layer gives us about 15 seconds for a reply on either the
client or the server side before we give up and disconnect. Which
client do you think is more likely to have unexpected connectivity
problems?

We have been adding special yield operations in such loops to give the
RPC2 layer at least a chance to deal with incoming requests and send a
busy reply. But these yields don't take time lost due to swapping into
account, and since I've never run a client with a 10GB cache there
might be several other places where additional yields are necessary. At
some point the original algorithms need to be rethought. A large cache
simply cannot work with O(n^2) operations, and even O(n) should be
avoided as much as possible.
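
Just to illustrate what I mean by yields, here is a sketch, not the
actual venus code; obj_t, compare() and yield_to_rpc2() are
placeholders for whatever the real client uses:

    struct obj_t;                                  /* placeholder type */
    void compare(struct obj_t *, struct obj_t *);  /* placeholder */
    void yield_to_rpc2(void);      /* stand-in for the real LWP yield */

    /* any-to-any comparison with a periodic yield */
    void compare_all(struct obj_t **objs, int n)
    {
        int done = 0;

        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                compare(objs[i], objs[j]);

                /* every few thousand comparisons, give the RPC2 layer
                 * a chance to answer incoming requests with a busy
                 * reply before the 15 second timeout hits */
                if (++done % 4096 == 0)
                    yield_to_rpc2();
            }
        }
    }

With 420K objects even the yield itself runs tens of millions of times,
which is why the real fix is to get rid of the O(n^2) loops rather than
to sprinkle more yields around.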

> I note that it has been > 10 years. And it appears to be, in some sense,
> a deep part of coda's design philosophy.

Not really part of the design philosophy. It is just a result of the
implementation, which did of course start over 10 years ago, probably
around the time when it was considered great if your computer had a
20MB hard drive.

We've been working on fixing such implementation problems within the
existing framework, which can be quite difficult at times. Intermezzo
was a 'start from scratch' attempt, but I don't think it really took
off.

Jan
Received on 2004-07-30 12:54:24