Coda File System

Re: maximum cache size in venus

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Mon, 17 May 2004 15:12:48 -0400
On Sat, May 15, 2004 at 09:59:08PM -0700, Steve Simitzis wrote:
> On 05/11/04, Jan Harkes <jaharkes_at_cs.cmu.edu> wrote: 
> > This is strange, the reference counts (refs) are tracking the internal
> > references [readers writers refcnt] and the open counts (openers) are
> > trying to follow the kernel references [reading writing executing]. It
> > seems strange that we have a readers refcount, but no filehandles open
> > for reading. I'll have to check the source to see where these counters
> > are manipulated.
> 
> strange indeed.

It actually looks like there was no problem with those reference counts.

> i think i may have finally tracked down what may be causing the problem.
> 
> i noticed that i hadn't seen this type of crash until around the same time
> that we launched a new feature that relied on imagemagick. (imagemagick
> is a commonly used set of command line tools that converts and manipulates
> images to different formats and sizes.)
> 
> the crashes seem to take place after convert (part of imagemagick) does
> its work, and venus completes a reintegration of the files. unfortunately,
> it's not consistent. perhaps convert is writing the file out in an
> unpleasant way, and venus is reintegrating before the file is actually
> ready. (?)

I've straced convert and there are only a couple of noticeable things:

- The output file is opened with the O_LARGEFILE flag. This really
  shouldn't be a problem, as far as I can remember the kernel even
  strips off this flag for us because we don't have the large file
  compatibility flag set in our superblock descriptor.

- There are 2 stat calls (which translate to getattr operations in venus).
  One is right after the file is created/opened. The other is before the
  final (partial?) write is done. Imagemagick seems to write the file in
  8KB blocks until there is less than 8KB left to write, then calls stat
  and writes the last bits to disk.

That last one is unusual, although I don't immediately see how it could
affect anything. I'd like to know more about the reintegrations. For
instance, is convert running for a long time, and is the initial CREATE
operation already reintegrated before we're done writing the file?
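
The write pattern seen in the strace can be approximated with a short
script. This is only a rough sketch, not what convert actually does
internally: the function names, the use of stat on the path (rather than
fstat on the descriptor), and the choice to skip an empty final write are
all assumptions made for illustration.

```python
import os

BLOCK = 8 * 1024  # 8KB blocks, as reported in the strace


def write_all(fd, buf):
    """Write the whole buffer, retrying on short writes."""
    view = memoryview(buf)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def write_like_convert(path, data):
    """Write data to path using the observed pattern; return the call trace.

    Hypothetical helper: mimics open, stat, 8KB writes, stat, final write.
    """
    # O_LARGEFILE is not defined on every platform, so fall back to 0.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_LARGEFILE", 0)
    trace = []
    fd = os.open(path, flags, 0o644)
    try:
        os.stat(path)  # first stat, right after the file is created/opened
        trace.append("stat")
        offset = 0
        # Full 8KB blocks until less than 8KB is left.
        while len(data) - offset >= BLOCK:
            write_all(fd, data[offset:offset + BLOCK])
            trace.append("write %d" % BLOCK)
            offset += BLOCK
        os.stat(path)  # second stat, before the final (partial) write
        trace.append("stat")
        if offset < len(data):  # assumption: no write when nothing is left
            write_all(fd, data[offset:])
            trace.append("write %d" % (len(data) - offset))
    finally:
        os.close(fd)
    return trace
```

Pointing the path at a file inside a Coda volume and varying the data size
(and the delay between writes) might help show whether the CREATE gets
reintegrated while the file is still being written.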

> so now i have convert writing the file to /tmp, and then i'm renaming
> the file into coda. since making this change to our web application a
> few days ago, i haven't had a single crash. (still crossing my
> fingers!)

Well, if that works, definitely keep it like that for now. I'll try to
simulate the convert behaviour with a small test program and see if I
can trigger the problem.
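
For reference, the workaround described above (write to /tmp first, then
move the finished file into coda) could look roughly like this. It is a
sketch under assumptions: stage_and_move and produce are made-up names,
and produce stands in for whatever actually generates the file, such as a
call to convert. Since /tmp and /coda are different file systems, a plain
rename(2) would fail with EXDEV; shutil.move handles that by copying the
file and then removing the original.

```python
import os
import shutil
import tempfile


def stage_and_move(produce, dest_path, staging_dir=None, suffix=""):
    """Let produce(tmp_path) write a complete file locally, then move it.

    Hypothetical helper: venus only ever sees the finished file arrive.
    """
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir, suffix=suffix)
    os.close(fd)  # produce reopens the staging file by name
    try:
        produce(tmp_path)
        # Falls back to copy + delete when source and destination are on
        # different file systems.
        shutil.move(tmp_path, dest_path)
    finally:
        # Clean up the staging file if anything failed before the move.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
    return dest_path
```

Note that mkstemp creates the staging file with mode 0600, so the moved
file may need a chmod afterwards. If produce runs convert, the suffix
should match the wanted image format, because convert picks the output
format from the file extension.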

Jan
Received on 2004-05-17 15:19:00