Coda File System

Re: volume connection timeouts

From: Patrick Walsh <pwalsh_at_esoft.com>
Date: Mon, 02 May 2005 15:22:13 -0600

Jan,

	Thanks again for a helpful explanation.  We've just now pinned down a
solution to some of our problems.  I'll explain below.

> them to flush the data back to the server. Basically you won't see a log
> on the server until it got rotated.

	That's fine.

> You either lost tokens. With 6.0.8 losing tokens means losing access
> even to parts that are accessible to anonymous users, this is 'fixed' in
> 6.0.9 by dropping the established user identity when tokens expire. It
> could also be that you have a reintegration conflict, which makes the
> volume switch to disconnected state. 

	We have cron jobs that run as local user root and coda uid 502.  Apache
runs as user www-data with coda uid 501.  But there's a catch.  We
forgot that apache needs to start as user root in order to listen on
ports 80 and 443.  It also does its logging as user root.  Only its
child listener processes run as user www-data.  So apache was
starting with proper coda permissions, then a cron job was essentially
logging user root in to coda with new tokens, thus disconnecting apache
from its log directory and causing conflicts.
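	The collision described above can be illustrated with a toy model.  This is only a sketch: it assumes the client keeps one token slot per local Unix uid, and that apache's root parent process had originally been given tokens for coda uid 501.  The names `acquire_token` and `coda_identity` are hypothetical and are not part of any Coda tool.

```python
# Toy model (assumption): one Coda token slot per local Unix uid.
tokens = {}  # local uid -> coda uid currently authenticated


def acquire_token(local_uid, coda_uid):
    """Model a login: any token the local uid already held is replaced."""
    tokens[local_uid] = coda_uid


def coda_identity(local_uid):
    """Return the coda uid a local uid acts as, or None if it has no token."""
    return tokens.get(local_uid)


ROOT = 0

# Apache's parent process starts as local root with tokens for coda uid 501.
acquire_token(ROOT, 501)
apache_before = coda_identity(ROOT)

# A cron job, also running as local root, logs in as coda uid 502.
acquire_token(ROOT, 502)
apache_after = coda_identity(ROOT)

# The long-running apache parent now writes its logs under a different
# coda identity than the one it opened them with.
print(apache_before, apache_after)
```

	In this model the only ways out are to stop the two jobs from sharing a local uid, or to have both use the same coda uid.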

	Unfortunately, the conflict resolution process was frustrating because
it would show identical files in "local" and "global" down to the file
size and time stamp.  The checklocal command inside the repair utility
was saying something about the parent dir having changed permissions,
which it hadn't.  I suspect that this error really means that tokens
were changed while a file was open so that locally it was "owned" by one
coda user and globally by another.  Since there's no way to see this
information, it appears that the files are the same.  Or am I wrong
about this?

> > 	8 volume replicas
> > 	6 replicated volumes  (is this a strange discrepancy?  there should
> > only be 6 volumes)
> 
> Not necessarily, there are the local root (/coda) and repair volumes.

	Interesting.

> Well, considering that you're running both a client and a server on the
> same machine, it is to be expected that they end up either weakly
> connected (write-disconnected), or completely disconnected at times.
> Running the both of them in a vmware session probably only makes it do
> that faster. 

	This is disappointing because it makes me worry that we could get
occasional hard-to-reproduce conflicts in coda on our backend.  And in a
server that is basically automated, the problem could go unnoticed for
some time.  I wonder if venus could detect when the server is on the
localhost and then adjust itself to be more patient?  I'm sure it's
common to want the server to be able to mount the coda files.  Intuition
says that this should be the most reliable setup, not the least
reliable.  It might be slower, but it shouldn't be less reliable.
We're putting `cfs strong` calls all over the place now to try to keep
things from getting write-disconnected.  Is there anything else we can
do to enforce this?  Slow writes are not a problem so much as conflicts
are.
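	A minimal sketch of the "call `cfs strong` first" pattern, wrapped so that every automated job goes through the same helper.  The helper name `run_strongly_connected` and its `run` parameter are hypothetical; the only Coda command used is the `cfs strong` mentioned above.  Whether this is enough to keep the client from later dropping back to write-disconnected mode is not something the sketch can guarantee.

```python
import subprocess


def run_strongly_connected(job, run=subprocess.run):
    """Request strong connectivity, then run the job.

    job: the command to execute, as an argument list.
    run: callable used to execute commands (injectable for testing).
    """
    # Ask the client for strong connectivity before touching /coda.
    run(["cfs", "strong"], check=True)
    # check=True makes a failing job raise instead of passing silently,
    # so an unattended server does not hide the problem.
    return run(job, check=True)
```

	A cron entry would then invoke the job through this helper rather than calling `cfs strong` by hand in each script.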

	Thanks for examining this issue with me.  As usual, I'll be sure to
write up my notes in the wiki and share them with others to stave off
similar problems in the future.

-- 
Patrick Walsh
eSoft Incorporated
303.444.1600 x3350
http://www.esoft.com/

Received on 2005-05-02 17:23:05