Coda File System

Re: Cache Overflow

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 30 Apr 2004 13:06:17 -0400
On Fri, Apr 30, 2004 at 04:50:36PM +0200, Johannes Martin wrote:
> Hi,
> 
> I just tried to copy a rather large file (bigger than my local cache) to a
> coda volume and of course ran out of cache space.
> 15:05:58 Cache Overflow: (4119, -51509)

That is because we don't intercept write calls in the kernel module.
When you copy a file into /coda, the only thing venus gets to see is,

    CREATE file
	(creates a 0-length file, no problem here)

    OPEN file for writing
	(sure, we hand a reference to the file to the kernel module)

    cp program writes, and writes, and writes

    CLOSE file
	(OMG we exceeded the available cache space)

At this point venus can't discard the file, because it is marked dirty.
So it probably tries to throw as much as possible out of the cache and
tries to write the file back to the server as soon as possible. Then it
can throw the huge file out of the cache and from that point on will
refuse to refetch it from the servers because it is too big.
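
To make the sequence above concrete, here is a toy model in Python. It is
not Venus code; the class and method names are made up for illustration,
and the block counts are arbitrary. It only shows why the overflow cannot
be noticed before the CLOSE: the writes never pass through the cache
manager.

```python
# Toy model only, not Venus code. All names here are hypothetical.
class ToyCache:
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.used = 0
        self.open_files = {}

    def create(self, name):
        # CREATE: a 0-length file, nothing to account for yet.
        self.open_files[name] = 0

    def write(self, name, blocks):
        # Writes go straight to the local container file; the cache
        # manager is not consulted, so no limit is checked here.
        self.open_files[name] += blocks

    def close(self, name):
        # CLOSE: only now is the final size known and charged to the cache.
        size = self.open_files.pop(name)
        self.used += size
        return self.capacity - self.used  # negative means overflow


cache = ToyCache(capacity_blocks=100)
cache.create("bigfile")
for _ in range(3):
    cache.write("bigfile", 60)   # cp writes, and writes, and writes
free_after_close = cache.close("bigfile")
print(free_after_close)
```

A negative result at close time corresponds to the situation described
above: the file is already in the cache and dirty, so it cannot simply be
discarded.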

> error message (and it looks like the file is actually there). When I tried
> an ls on the volume, I got a:
> 15:43:47 Fatal Signal (11); pid 23240 becoming a zombie...

Hmm, never seen that before. I guess the ls tried to fetch a directory
that had been purged and needed to free up some more space for it, but
something must be wrong with the volume, so it crashed.

> I ran venus -init to reinitialize things, and it works again. Is this the
> expected behaviour or should venus be able to recover from a cache
> overflow?

It should normally 'recover' as soon as it manages to write the data
back to the server.

> Then, more trouble: another client got write-disconnected from the server:
>   Status of volume 0x7f000001 (2130706433) named "coda:home:jmartin"
>   Volume type is ReadWrite
>   Connection State is WriteDisconnected
>   Minimum quota is 0, maximum quota is unlimited
>   Current blocks used are 915870
>   The partition has 34709316 blocks available out of 35737652
>   Write-back is disabled
>   There are 320 CML entries pending for reintegration
> but cfs writereconnect doesn't seem to work, maybe because of
>   16:36:50 volume coda:home:jmartin has unrepaired local subtree(s), skip checkpointing CML!
> 
> I can't see any unrepaired subtrees, is there a way to find them?

I typically use,

    find /coda/path/to/volume -lname '@*'

Or you could look at the first entry of the CML checkpoint file in,
    /usr/coda/spool/<uid>/<volumename>.cml
    (/var/lib/coda/spool on Debian)
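
If you would rather script the search, the find command above can be
approximated in Python. This is a minimal sketch: the function name is
made up, and it assumes only what the find pattern implies, namely that
the objects of interest are symbolic links whose target starts with '@'.

```python
import os


def find_at_links(root):
    """Return paths under root that are symlinks with a target starting with '@'.

    Roughly equivalent to: find root -lname '@*'
    (hypothetical helper, not part of any Coda tool)
    """
    hits = []
    # os.walk does not follow symlinks by default, matching find's behaviour.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and os.readlink(path).startswith("@"):
                hits.append(path)
    return sorted(hits)
```

Call it as find_at_links("/coda/path/to/volume") with the real path to
your volume substituted in.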

Jan
Received on 2004-04-30 13:10:09