Coda File System

Re: coda server crash

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 16 May 2003 17:15:15 -0400
On Fri, May 16, 2003 at 01:31:17PM -0700, Steve Simitzis wrote:
> here's what i woke up to this morning:
> 
> (i love how things only crash when i'm sleeping - no fault of coda
> of course. :)

I've been trying to find who added the code that can tell whether I'm
asleep. You are right that some crashes only seem to happen at the most
unfortunate moments.

There is in fact still some problem with my webserver that seems to hit
almost every morning between 7 and 8. It doesn't kill the machine, but
apache stops responding to requests for about an hour.

Very very strange.

> 11:07:14 ViceValidateAttrs: (1000002.1d0b.198ac) failed ()!
> 11:07:14 AllocRecord: No space left in volume log.

Ouch, one server or more? This is an indication that a client has been
talking to a single server for an extended period of time. The
resolution log has a 'limited' size of about 8000 operations.
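
To see how often a server has run into this, one option is to count the
message in its log file. This is only a rough sketch: the function name
count_log_overflows is made up for illustration, and the log path is
whatever you pass in (it depends on your installation). The only thing
taken from the report above is the message text itself.

```shell
#!/bin/sh
# Count how many lines in a server log report a full resolution log.
# count_log_overflows is a hypothetical helper; the log path is an
# argument because its location is an assumption about your setup.

count_log_overflows() {
    # grep -c prints the number of matching lines (0 if there are none)
    grep -c 'AllocRecord: No space left in volume log' "$1"
}

# Example: count_log_overflows /path/to/your/server/log
```

A count that keeps growing would suggest the log is still full and has
not been truncated by a successful resolution.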

This should never happen to single replicas, as we work around it by
turning off log-based resolution.

With volumes replicated across multiple servers, this can happen if one
server is down for an extended period of time. But I've also had it
happen when the switch to which all our servers are connected got
confused, and although I could ping all servers on that switch, the
servers could not ping each other until the switch got rebooted.

> i don't know if this is a known problem or not. i'm using 1 GB RVM,
> 100MB log, and 25 GB data. (the data partition is only 18% full.)
> 
> should i consider increasing my log? i had assumed that any size would
> do, for the most part, as codasrv was good about reusing it.

Yes, the resolution log has to be increased, because we need at least
one new entry during resolution before we can truncate the log and
reclaim all entries.

    volutil setlogparms <volume-replica-id> reson 4 logsize 16384

reson 4       - log based resolution is enabled
logsize 16384 - doubles the number of resolution log entries; I don't
                remember ever overflowing after growing it to this.

You need to do this for every volume replica that makes up the
replicated volume, not just once for the volume itself (hope that makes
sense).
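
Repeating the command per replica can be sketched as a small loop. The
helper name grow_resolution_logs, the DRY_RUN switch and the two replica
ids below are all made up for illustration; substitute the real replica
ids of your volume. By default it only prints the commands so they can
be checked first. It does not deal with which server each command has to
be run against.

```shell
#!/bin/sh
# Print (or run) the setlogparms command for each replica id given.
# grow_resolution_logs and DRY_RUN are hypothetical names for this sketch.
# With DRY_RUN=1 (the default) the commands are only printed.

grow_resolution_logs() {
    logsize=$1
    shift
    for replica in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "volutil setlogparms $replica reson 4 logsize $logsize"
        else
            volutil setlogparms "$replica" reson 4 logsize "$logsize"
        fi
    done
}

# Example with two placeholder replica ids (not real ones)
grow_resolution_logs 16384 c9000001 ca000001
```

Setting DRY_RUN=0 makes the loop call volutil instead of printing.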

Jan
Received on 2003-05-16 17:16:54