On Fri, Nov 12, 2004 at 02:52:43PM -0800, redirecting decoy wrote:
> Still having a few problems though, and seemingly
> random errors. My biggest problem is constant client
> disconnection. This seems to really only happen on my

Such disconnections are not normal and probably indicate that something
is seriously wrong (i.e. servers can't connect back to themselves). So
these clients don't get callbacks and will keep showing stale data.
Disconnections also lead to reintegration (and possibly conflicts, if we
only send updates to one of the 2 replicas), which could explain why you
aren't seeing changes propagate to the other client.

> 1) Is it possible to start venus as a normal user,
> instead of root ? The reason I ask is because, when

No, venus needs root permissions to open /dev/cfs0, to use the mount(2)
system call, and to bump the VM limits so that we're actually allowed to
mmap the large RVM data file.

> my script finds a broken client, it ssh's into that
> machine and attempts to run the commands below. It
> seems that only root can do this, and I want to avoid
> having to ssh into the machine as root.

Maybe use 'super' or 'sudo'.

> echo -n password | clog -pipe user

This isn't really secure, it exposes the password to any user who runs
ps auxwww while venus is restarted. Also, restarting venus (which
involves unmounting /coda) doesn't work if any process has an active
reference to any file or directory in /coda. This can be as simple as a
shell that happens to have its cwd somewhere in the tree.

> 2) What is the preferred method of shutting down
> venus?

Definitely not kill -9, unless venus is already dead; kill -9 doesn't
give the program a chance to flush dirty data and such. I use vutil
shutdown, which sends a more benign SIGTERM. You also need to unmount
/coda, which can be a problem if anything has a reference; the Linux
kernel simply doesn't allow unmounting when there are busy inodes.
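The restart sequence above could be sketched as a small shell helper.
This is only an illustration, not a tested procedure: /etc/coda/venus.pass
is an invented path for a root-only (chmod 600) password file, and the
sleep is an arbitrary grace period.

```shell
#!/bin/sh
# Hypothetical venus restart helper, following the advice above.
# Assumes the Coda client tools (vutil, venus, clog) and umount
# are on PATH; /etc/coda/venus.pass is an invented example path.

restart_venus() {
    vutil shutdown            # benign SIGTERM, lets venus flush its state
    sleep 2                   # give venus a moment to exit cleanly
    # unmount fails if any process still references /coda,
    # e.g. a shell whose cwd is somewhere inside the tree
    if ! umount /coda; then
        echo "restart_venus: /coda is busy, aborting" >&2
        return 1
    fi
    venus &                   # restart without -init; RVM state is kept
}

get_tokens() {
    # avoids putting the password on a command line visible in
    # 'ps auxwww': clog reads it from a root-only file on stdin
    clog -pipe "$1" < /etc/coda/venus.pass
}
```

The point of get_tokens is only that the password never appears as a
command-line argument, unlike the echo-pipe variant quoted above.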
> As it is now, when I kill venus on either of my servers, I have to
> venus -init to restart it, otherwise it turns into a zombie. This does
> not seem to be as big a problem on my client machines though. I can't
> find anything that seems relevant in the log files.

Again, this is not normal behaviour. I've got 2 replicated servers
installed from scratch (6.0.7) on two machines with a fresh
Debian-testing installation, including local Coda clients on both
machines, and am not seeing any of the disconnection, stale data or
restart problems you are describing. I also haven't seen any logs or gdb
backtraces from you that might help me figure out what is going on.

> 3) Is there a list of "Connection States", such that
> are printed with the command "cfs lv
> /coda/storage/..."? If there are, is there a
> preferred method of reconnecting a client, assuming
> that the client is anything other than "Connected"?

Not really, it is a bit like quantum theory. Even when cfs lv reports
that a volume is connected, we don't really know whether it is until we
try to perform an operation that requires communication with the
servers. At that point we figure out if the server responds or not. We
run a serverprobe (a simple rpc2 ping) every 2-3 minutes, which can
detect disconnection so that the user hopefully doesn't have to suffer
the 60 second timeout.

> 4) I find that I often have to clog to reconnect a
> client to the servers, if I made an update from

If your client is disconnected from the servers, then 'cfs cs' should
reconnect. If the client has pending reintegrations then 'cfs wr' should
force reintegration. If the reintegration can't proceed because of
server-server conflicts then the only solutions are 'cfs purgeml' (drop
all local pending modifications) or a user using 'repair' to fix the
problem. The only difference between 'cfs cs' and 'clog' is that clog
drops existing connections and rebinds to the server.
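That reconnect sequence could be scripted roughly like this. It is a
sketch under assumptions: the grep pattern for the volume state is
invented, since the exact text 'cfs lv' prints isn't enumerated here,
and the purgeml/repair fallbacks are deliberately left manual.

```shell
# Hypothetical reconnect helper following the sequence above.
# The "write-disconnected" pattern is an assumption; adjust it
# to whatever your client's 'cfs lv' output actually says.

reconnect() {
    vol="$1"                  # e.g. a path under /coda
    cfs cs                    # test connections, rebind dead ones
    if cfs lv "$vol" | grep -qi "write-disconnected"; then
        cfs wr                # force reintegration of pending updates
    fi
    # if reintegration still fails due to conflicts, the remaining
    # options are manual: 'cfs purgeml' (drops local changes) or 'repair'
}
```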
cfs cs simply tests the existing connections and only rebinds if they
were unable to get a reply from the server.

> another client. Even if "ctokens" says that my token
> is still valid, I can't see the updated files until I
> clog again. First of all, is this the correct
> behaviour ?

That depends on the connection state. If the client on which the updates
were performed is in write-disconnected or disconnected state, then it
is completely normal. If the updating client only talks to server A,
while the other client only sees the replica on server B, then it is
also completely normal. But if the clients are local and fully
connected, this is highly unusual.

> 6) How much can coda tolerate ? I mean, can I use it
> to do something like run www/mail/ftp services off the
> Filesystem "fairly" reliably? Or am I asking too
> much...

Most of the webpages and mailing list archives for www.coda.cs.cmu.edu
are served straight from /coda/coda.cs.cmu.edu/project/www/html. Of
course this is all static pages, and mostly read-only. In fact the
webserver only updates the mailing list archives; everything else is
typically updated from a client on my desktop or laptop.

ftp.coda is running from the local disk, mostly because our servers used
to be pretty old, slow and small: /vicepX partitions were 2GB each, 3
per server, and due to replication the total available space exported by
6 servers was < 18GB. Recently we upgraded the hardware, so now we have
significantly more space.

My email is delivered straight into Coda, but the delivery agent does
run on my desktop so that I can locally repair any conflicts that might
occur. A conflict or loss of tokens makes the volume read-only, procmail
fails, and the mail falls through to a 'failed-delivery' mailbox on the
local disk; an xbiff icon on my desktop alerts me and I can get tokens
or repair any conflicts.

Jan

Received on 2004-11-13 00:50:33