Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu> Date: Sun, 3 Jul 2005 17:49:19 -0400

On Sun, Jul 03, 2005 at 01:47:51PM +0200, Florian Schaefer wrote:
> The symptom is that venus will become a zombie at some point and cannot
> be gotten to restart other than with a -init. This happens on both
> clients. Therefore I thinks that my volume may be corrupted in some way.

It doesn't look like you have a corrupted volume.

> I'm sorry, that I cannot serve some back-trace information, but the
> binaries don't contain debug information.

Yeah, rpm helpfully strips all unstripped binaries during packaging.
Does installing coda-debug-debuginfo-6.0.11-1-i386.rpm help?

> Is there a cure for this illness?

Not sure. This is what I think might be happening.

The crash happens on a lookup of your realm name on the fake root
volume, where the fake root doesn't have any contents.

You have a fairly small cache (50MB), and are writing one or more 26MB
files in write-disconnected mode. During this your client runs out of
available cache space. The 'cache overflow' messages are shown whenever
venus has dropped everything it could.

One of the dropped objects must have been the data for the fake root
volume that is mounted on /coda. This data is generated as realms
are discovered or expired, and we can't actually refetch it from the
servers. I am setting a couple of flags on these objects that I thought
would prevent the data purge, but clearly they are only (successfully)
preventing the fetch attempt. In any case, we're not crashing yet at
this point, because the kernel caches directory lookups.

Then the reintegration completes and at this point something must
trigger a kernel cache invalidation/refresh, and a couple of seconds
later we hit the lookup that kills the client.

I have several options to fix the problem: avoid purging the fake root
directory contents, or automatically recreate them when we are trying
to refetch the data.

What you could do to mitigate some of the problems is increase the size
of your venus cache to at least double the size of the largest object
you might be working with, but preferably more. Something like 100-500MB
(100000-500000 blocks) will significantly reduce the relative number of
objects that have to be discarded to make room for a new file. Instead
of having to throw out 50% or more of all cached data you'd only need to
discard 5%, which gives venus a better chance to make a good pick.
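
To make the numbers above concrete, here is a rough back-of-the-envelope
sketch. It is not anything venus itself runs. It assumes the cache is
already full when the new file arrives, and that one cache block is 1KB
(which is what the 100MB = 100000 blocks figure implies). The function
names are made up for illustration.

```python
# Rough arithmetic only. Assumption: one venus cache block is 1 KB,
# as implied by "100-500MB (100000-500000 blocks)".
BLOCKS_PER_MB = 1000


def eviction_fraction(cache_mb, file_mb):
    """Smallest share of an already full cache that has to be
    discarded to make room for one new file (capped at 100%)."""
    return min(1.0, file_mb / cache_mb)


def min_cache_blocks(largest_file_mb, factor=2):
    """Lower bound on the cache size in blocks: 'factor' times the
    largest object you expect to work with."""
    return int(largest_file_mb * factor * BLOCKS_PER_MB)


if __name__ == "__main__":
    for cache_mb in (50, 100, 500):
        share = eviction_fraction(cache_mb, 26)
        print(f"{cache_mb:>3} MB cache: discard at least {share:.0%} "
              f"to fit a 26 MB file")
    print("minimum cache for 26 MB files:", min_cache_blocks(26), "blocks")
```

With a 50MB cache a single 26MB file forces out about half of everything
cached, while a 500MB cache only has to give up about a twentieth. The
"double the largest object" rule comes out at 52000 blocks for 26MB
files, so the 100000-500000 range leaves comfortable headroom.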

Jan

Re: Stability issues