Coda File System

Re: best practices for 24/7 operation

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 18 Mar 2005 15:01:47 -0500
On Fri, Mar 18, 2005 at 11:02:48AM -0300, Gabriel B. wrote:
> Do any of you use Coda for 24/7 operations? As I said earlier, I'm
> replacing an NFS setup for our webservers with Coda.

Sure, all of the Coda web, ftp and public cvs data is read from /coda.

> I first tried to set the tokens without an expiration limit, then I
> switched to "cat /etc/coda/pass | clog -pipe user" every 6 minutes in
> cron, and a "cfs cs" every 2 minutes in cron (a paranoid check to
> see if the server is back up after a possible disconnect; we have a
> mayhemnet(tm) here). In the near future I plan to run it every
> 20 seconds, but only after a disconnect.

I have a directory /etc/coda/auth, which is chown root.root / chmod 600.
A shell script (/root/bin/gettokens.sh) containing,

    #!/bin/sh
    /usr/bin/clog -as www-data websrv@coda.cs.cmu.edu < /etc/coda/auth/www-data

A crontab entry,

    55 */3 * * *	root	/root/bin/gettokens.sh

And I call the same gettokens.sh script from /etc/init.d/apache2 before
the webserver starts.

(and of course /etc/coda/auth/www-data contains the password needed to
get the token).
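
As a hedged illustration of the setup above, the following sketch wraps
the token script in a small retry loop, which may help on a flaky
network. The wrapper name gettokens-retry.sh, the function
retry_gettokens, and the TRIES and DELAY settings are assumptions made
for this example; only /root/bin/gettokens.sh comes from the setup
described above.

```sh
#!/bin/sh
# Hypothetical wrapper (gettokens-retry.sh): retry the token script a few
# times before giving up. Only /root/bin/gettokens.sh is from the original
# setup; TRIES and DELAY are illustrative defaults.
GETTOKENS="${GETTOKENS:-/root/bin/gettokens.sh}"
TRIES="${TRIES:-3}"     # total number of attempts
DELAY="${DELAY:-20}"    # seconds to wait between attempts

retry_gettokens() {
    n=1
    while [ "$n" -le "$TRIES" ]; do
        # Success: a token was obtained, nothing more to do.
        if "$GETTOKENS"; then
            return 0
        fi
        echo "gettokens: attempt $n of $TRIES failed" >&2
        n=$((n + 1))
        # Sleep only if another attempt follows.
        if [ "$n" -le "$TRIES" ]; then
            sleep "$DELAY"
        fi
    done
    return 1
}

# Run the loop only when executed under the wrapper's own name, so the
# function can also be sourced without side effects.
case "$0" in
    *gettokens-retry.sh) retry_gettokens || exit 1 ;;
esac
```

If something like this were used, the crontab entry and the init script
would call the wrapper instead of gettokens.sh directly; a non-zero exit
status means no attempt succeeded.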

> Well, authentication is my lesser problem. What frightens me is
> that in the case of a disconnection, I will have new files written to
> both machines, mostly images uploaded by visitors. I will not have
> changes to the images, simply new ones or deleted ones.
> 
> I was wondering how I could deal with that automatically, with no human
> interaction. Is there a way to set a simple rule like: "copy the file
> there no matter what"?

In general there isn't anything that can totally prevent human
intervention. However, a certain 'subset' of operations should not result
in conflicts.

i.e. if you never actually modify an existing image file, only add or
delete files, avoid cross-directory renames, and the names are guaranteed
to be unique, then you should not see any conflicts, even if you are
using replicated servers. Simple insert/delete operations with unique
names should always be reintegratable after a disconnection or
resolvable if the servers got out of sync. If that were not true I
would be continually repairing conflicts as my email is delivered into
/coda.
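
One way to get names that are unique in practice is to build them from
the host name, the time, and the process ID, much as maildir does, and
to create each file directly in its final directory so that no
cross-directory rename is needed. The sketch below is only an
illustration under those assumptions: store_upload and its naming scheme
are hypothetical, not part of Coda, and uniqueness across machines
relies on every client having a distinct host name.

```sh
#!/bin/sh
# store_upload SRC DESTDIR
# Copy SRC into DESTDIR under a new, never-reused name and print that name.
store_upload() {
    src=$1
    destdir=$2
    base=$(basename "$src")
    # Host name keeps names from different clients apart; time and PID
    # keep names from different processes on one host apart.
    prefix="$(uname -n).$(date +%s).$$"
    seq=0
    # Bump the counter if this process already stored a file this second.
    while [ -e "$destdir/$prefix.$seq.$base" ]; do
        seq=$((seq + 1))
    done
    dest="$destdir/$prefix.$seq.$base"
    # Create the file in place: no temp directory, no rename, no overwrite.
    cp "$src" "$dest" || return 1
    printf '%s\n' "$dest"
}
```

The upload handler would call store_upload with the uploaded file and
the target directory under /coda, and record the printed path. Existing
files are never rewritten, so only inserts and deletes reach the servers.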

The only problem that I have is that maildir creates the new email file
in mdir/tmp, then moves it to mdir/new. If I look at mdir/new first, I
could end up getting a conflict, since the file might not yet exist at all
servers until mdir/tmp is resolved.

> The second thing: backup. What option do you use? how long does it
> take to roll back?

We've been using Amanda + some changes. Essentially the Amanda server is
mostly standard except that it doesn't complain loudly whenever it sees
a CODA or CODATAR dumptype in the config file. The Amanda clients are a
little bit more hacked up so that they know how to use Coda's volutil
dumpestimate/volutil backup/volutil dump to get the Coda server to dump
a snapshot of the volume.

(Note: backing up all volumes doubles a server's RVM usage, since
we have to keep a copy of all vnodes at the time of the snapshot.)

Then if it happens to be a CODATAR dumptype, the client passes the dump
through codadump2tar, which makes it into a normal tar archive. After that
it optionally feeds it through gzip and finally sends it over a TCP
connection back to the Amanda server.

Rolling back depends. If it is just a couple of files, it means pulling
them from the appropriate tarball and placing them back. If it is a
catastrophic server failure... Well, maybe that is why we have all our
volumes either doubly or triply replicated; I've never had to go back to
tape for that. We have had servers die horribly, but putting in a new
hard drive, reinitializing RVM and creating empty volume replicas to
replace the lost ones is most of the work. After that a recursive ls on
/coda will trigger resolution, which at first can be pretty sluggish
since all the clients are trying to reach files somewhere deep down in
the unresolved parts of the tree. I often concentrate on getting the
more important subtrees resolved first and finally run a full recursive
ls across the tree to mop up any leftovers.

    cfs strong ; cfs wr
    ls -l  /coda/.../www
    ls -lR /coda/.../usr/satya
    ls -lR /coda/...

Of course, I keep an eye on 'codacon', since 'cfs strong; cfs wr' doesn't
protect me from disconnections.
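
The staged approach above can be sketched as a small loop that walks the
most important subtrees first. This is only a sketch: resolve_subtrees
is a hypothetical helper, the paths are whatever subtrees matter at your
site, and it assumes 'cfs strong ; cfs wr' has already been run as
shown.

```sh
#!/bin/sh
# resolve_subtrees DIR...
# Walk each subtree in the order given; on a Coda client the recursive ls
# touches every object and so triggers resolution. Returns non-zero if ls
# reported errors for any subtree.
resolve_subtrees() {
    failed=0
    for dir in "$@"; do
        if ! ls -lR "$dir" > /dev/null 2>&1; then
            echo "resolve_subtrees: errors under $dir" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Call it with the high-priority subtrees first and the top of the tree
last. A non-zero return only says that ls hit errors somewhere under the
named subtree, which is a hint to take another look at it.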

Jan
Received on 2005-03-18 15:02:38