Coda File System

Re: volume connection timeouts

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Tue, 3 May 2005 21:23:59 -0400
On Tue, May 03, 2005 at 08:38:36AM -0600, Patrick Walsh wrote:
> 	What would you suggest would be the best way to detect when a conflict
> occurs so that an administrator can be notified?  Is there a particular
> message we can monitor one of the logs for?  Or perhaps a cron job with
> a find command similar to this:
> 
> find . -type l -a -not \( -xtype f -o -xtype d \)

I typically use
    find . -lname '@*'

The problem is really that local-global conflicts only appear on the
client that failed to reintegrate. Server-server conflicts are seen first
by the client that noticed the version-vector differences when it called
getattr.
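A minimal sketch of the cron-job approach Patrick asked about, built on the `find -lname '@*'` test: it relies only on conflicted objects showing up as symlinks whose target starts with '@'. The function names, the directory you point it at, and the use of mail(1) for notification are assumptions for illustration, not something Coda provides.

```shell
#!/bin/sh
# Hypothetical conflict check meant to be run periodically (e.g. from cron).
# It only relies on conflicted objects appearing as symlinks whose
# target starts with '@', which is what find's -lname '@*' test matches.

# list_conflicts DIR: print every path under DIR that looks like a conflict.
list_conflicts() {
    find "$1" -lname '@*' -print
}

# report_conflicts DIR ADDRESS: mail the list to ADDRESS, but only when
# something was found. Assumes a mail(1) that accepts -s SUBJECT.
report_conflicts() {
    conflicts=$(list_conflicts "$1")
    if [ -n "$conflicts" ]; then
        printf '%s\n' "$conflicts" | mail -s "Coda conflicts under $1" "$2"
    fi
}
```

A small wrapper script run from cron could source this and call `report_conflicts` on a hypothetical volume such as /coda/example.org. Since local-global conflicts only show up on the client that failed to reintegrate, the check would have to run on every client that writes to Coda, not on a single monitoring machine.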

> Or perhaps monitoring the /usr/coda/spool directory?  How is this
> managed in other places?

Not sure how others are doing it, but for most conflicts it is typically
a user who alerts me. I don't get all that many conflicts on our
'backend' servers because things like the hypermail mailing list archives
are actually built on the local disk and rsync'd over to /coda.

If the rsync gets stuck it only affects the client that is writing the
update, so users just don't see the new mails.
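A sketch of that "build locally, then rsync into /coda" pattern: the expensive generation step writes only to local disk, and the Coda client only ever sees a copy. The function name and paths are hypothetical, and the rsync options are one reasonable choice rather than what the archive job actually uses; ownership and permission bits are deliberately not copied, on the assumption that they are not needed for the Coda copy.

```shell
#!/bin/sh
# Hypothetical helper for the "build locally, rsync into Coda" pattern.

# publish STAGING TARGET: mirror the local STAGING tree into TARGET.
#   -r       recurse into directories
#   -l       copy symlinks as symlinks
#   -t       preserve modification times, so unchanged files are skipped
#   --delete remove files from TARGET that are gone from STAGING
publish() {
    rsync -rlt --delete "$1"/ "$2"/
}
```

For example, a nightly job might run the archive generator into /var/tmp/archive-staging and then call `publish /var/tmp/archive-staging /coda/example.org/archive` (both paths made up for illustration).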

> 	Also, the repair utility doesn't seem to have a way to list what
> objects are in conflict -- you have to already know the full path to
> them.  Are there any undocumented commands or shortcuts for using this
> utility?

One shortcut, in combination with the previous find, is:

    find . -lname '@*' -exec repair {} /tmp/fix -owner 7768 -mode 755 \;

However, this only works reliably for directory conflicts. If there are
any file conflicts, this would overwrite them with the contents of the
fix-file that was written by the previous directory repair.
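Because that shortcut can clobber file conflicts, one cautious variation is to have find print the repair commands instead of running them, so the list can be reviewed (and file conflicts weeded out by hand) before anything is executed. This is a sketch: the function name is hypothetical, it reuses only the repair arguments from the command above, and the fix file, owner and mode are passed in rather than hard-coded.

```shell
#!/bin/sh
# Hypothetical dry run of the find/repair shortcut: prints one repair
# command per conflict instead of executing it.

# print_repairs DIR FIXFILE OWNER MODE
print_repairs() {
    find "$1" -lname '@*' \
        -exec echo repair {} "$2" -owner "$3" -mode "$4" \;
}
```

`print_repairs . /tmp/fix 7768 755 > repairs.sh` would write the list to a file; after deleting the lines that refer to file conflicts, the rest could be run with `sh repairs.sh`. This assumes paths without spaces or shell metacharacters, since echo does not quote them.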

> 	Would it help my situation if there was a minimum for the RTT estimate
> in the case where the estimate is near zero?  That would make it so the
> server can take a moment to flush a file without the client write
> disconnecting.  

There is a minimum RTT value which I think is 300ms. That should be
pretty conservative, especially since even a 10baseT network tends to
have <10ms RTTs. I think this value was picked because it was 50% more
than a typical roundtrip on a ppp link.
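Taking the recalled 300 ms floor and the 50% margin at face value, the implied PPP round trip and the headroom over a 10baseT LAN work out as follows (a back-of-the-envelope check, not a figure taken from the Coda sources):

```latex
\mathrm{RTT}_{\min} = 1.5 \times \mathrm{RTT}_{\mathrm{ppp}} = 300\,\mathrm{ms}
\;\Rightarrow\;
\mathrm{RTT}_{\mathrm{ppp}} = \frac{300\,\mathrm{ms}}{1.5} = 200\,\mathrm{ms}

\mathrm{RTT}_{\mathrm{10baseT}} < 10\,\mathrm{ms}
\;\Rightarrow\;
\frac{\mathrm{RTT}_{\min}}{\mathrm{RTT}_{\mathrm{10baseT}}} > \frac{300\,\mathrm{ms}}{10\,\mathrm{ms}} = 30
```

So the floor sits at more than 30 times a typical LAN round trip.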

> > At the same time, the poor server is still
> > stuck waiting for the disk, and can't even dash off a quick ack telling
> > the client that it did get the request and is working on it.
> 
> 	Are there any plans to make the server multi-threaded to avoid these
> sorts of bottle-necks?

There is a version of LWP that runs on top of pthreads. If that is used
when building RVM, the flush/truncate daemon thread runs fully
concurrently. But the RPC2 socket listener still runs as a non-concurrent
thread. I have used a venus built this way for a bit when I was trying
to catch some memory leaks with valgrind. But overall it isn't totally
reliable, and it is a bit slower. It is also not really possible to go
completely multi-threaded; a lot of the code expects threads to be
cooperative, with concurrency limited to the places where we
explicitly yield control.

Jan
Received on 2005-05-03 21:24:49