Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu> Date: Fri, 5 Dec 2003 16:51:46 -0500

I noticed that some of the raised questions weren't answered yet.

On Tue, Nov 25, 2003 at 10:28:18AM +1000, Daniel Andersen wrote:
> Am i correct in assuming that, if one server goes down for whatever reason, 
> the other will continue to function without a problem until the other comes 
> back up, and then pass back any changes that occurred while it was gone? 

Yes, but only for a limited time. How long mostly depends on how quickly
your clients are making changes, because the server keeps a finite log
of recent operations to make server-server resolution (more) reliable
when the 'lost' server returns. Once the log runs out, the running
server becomes quite unusable.

> Also, if the SCM was removed to install a larger hard drive, is moving the 
> settings over as simple as copying the /vicepa and /vice directories over, or 
> would there be more to it?

A little bit more (or less),

    /vice/db contains shared state between all servers.
    /vice/* otherwise should not contain any really important data.
    /vicepa contains file data.
    RVM data contains directory data and file metadata.
    RVM log contains changes to the RVM data that have been acknowledged
    to the client(s), but still have to be committed.

You would want to first truncate any pending operations in the RVM log.
(http://www.coda.cs.cmu.edu/doc/html/rvm_manual-8.html)

    # rvmutl
    * open_log /path/to/logfile
    * recover

Now the only important data that is really needed to rebuild the
server is whatever is in the RVM data file or partition, and the
contents of /vicepa. /vice/db/* can be copied from any other server
within your server group. If you are upgrading any machine that is not
currently 'nominated' as SCM, you only need the update.tk file and
updateclnt will copy the rest of /vice/db for you.

If you want to resize the RVM data partition at the same time, see
    http://www.coda.cs.cmu.edu/misc/resize-rvm.html
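
As a rough sketch of the copy step, assuming the server is shut down and
the RVM log has already been truncated as above, something like the
following could carry a file-data directory such as /vicepa over to the
new disk. The copy_tree helper and the /mnt/newdisk mount point are
hypothetical names, not part of Coda, and RVM data kept on a raw
partition would have to be copied separately.

```shell
#!/bin/sh
# Sketch only: copy a directory tree (e.g. /vicepa) to a new location,
# keeping permissions and timestamps. Run it with the Coda server stopped.
# copy_tree is a hypothetical helper, not a Coda tool.
copy_tree() {
    src=$1
    dst=$2
    [ -d "$src" ] || { echo "copy_tree: $src is not a directory" >&2; return 1; }
    mkdir -p "$dst" || return 1
    # A tar-to-tar pipe copies the whole tree; -p restores the saved modes.
    (cd "$src" && tar -cf - .) | (cd "$dst" && tar -xpf -)
}

# Example (assumed mount point, not executed here):
#   copy_tree /vicepa /mnt/newdisk/vicepa
```

Comparing the two trees (for example with diff -r) before restarting
the server is a cheap sanity check.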

> Two more questions and then I'm done. Is this system actually production-ready

I don't know. Although Coda has been running in 'production' here at CMU
throughout its development (for instance, all the web pages are served
directly out of /coda), we only have a handful of active users.

There are still plenty of little problems that make Coda not
production-ready in my eyes. There is a 256KB limit on directory size,
which is pretty bad because you can't store more than about 2000-4000
files in any one directory. I believe that at the moment a volume can
only hold the metadata for approximately 160000 files. A fix for the
volume limit is pretty much 'working' and we have a server that has
(close to?) 2 million files in a volume, but it needs some more work and
testing, and I believe it breaks backups.
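
To get a feel for whether existing data would run into the directory
limit, a check along the following lines could be run over the tree
before copying it into Coda. This is only a sketch: big_dirs is a
hypothetical helper, the default of 2000 entries is simply the low end
of the estimate above, and since the real limit is on directory size in
bytes (256KB over 2000-4000 entries works out to roughly 64-128 bytes
per entry) the entry count is just a proxy. It assumes a find that
supports -mindepth and -maxdepth, and file names without newlines.

```shell
#!/bin/sh
# Sketch: list directories below ROOT that hold more than LIMIT entries.
# big_dirs and the default of 2000 are illustrative, not Coda tooling.
big_dirs() {
    root=$1
    limit=${2:-2000}
    find "$root" -type d | while IFS= read -r dir; do
        # Count direct children only (files and subdirectories).
        count=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')
        if [ "$count" -gt "$limit" ]; then
            echo "$count $dir"
        fi
    done
}

# Example (hypothetical path): big_dirs ~/Mail 2000
```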

Then there are deliberate design decisions. When a server responds
slowly, a client switches to write-disconnected operation (writeback
caching). Changes are not sent back immediately, but as a batch at some
later point in time. This of course provides a weaker consistency model
and increases the chance of inconsistencies when multiple clients are
trying to update the same files. Any inconsistency will be flagged as a
conflict and requires user intervention to 'repair the conflict'. So
actual write/write sharing is not really something to be advised, and
automated 24/7 processes need to be monitored or changed to avoid
problems or data loss.

For instance, my procmail script that drops email into the maildir
folders in Coda has the following,

    :0 W
    * ^X-Mailing-List:.*codalist@
    /coda/.../codalist/

    # If any deliveries failed save it in the retry-delivery folder (local-disk)
    :0:
    * ^X-Mailing-List:.*codalist@
    retry-delivery

Now if any mail is dropped in retry-delivery, I fix the problem (repair
conflict, or revalidate my Coda tokens) and run a 'retry-delivery' shell
script.

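    #!/bin/sh
    # Move the failed messages aside and start a fresh, empty retry folder.
    mv ~/Mail/retry-delivery ~/Mail/retrying
    touch ~/Mail/retry-delivery
    # Split the mbox into messages and feed each one back through procmail.
    formail -s procmail < ~/Mail/retrying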

So in a way I'm working around the problem. I could have the mail daemon
keep things queued, but then I would never get any indication that
something is wrong with the mail delivery. Right now I use an xbiff-like
desktop thingy that can quickly tell me that email is not being delivered
correctly.
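
For anyone without such a desktop tool, a cron job or login script could
do a similar check. The following is only a sketch of the idea, not what
I actually run: check_retry is a hypothetical helper, and the default
path simply mirrors the retry-delivery mbox used above.

```shell
#!/bin/sh
# Sketch: report whether the local retry mbox holds undelivered mail.
# check_retry is a hypothetical helper; the default path is an assumption.
check_retry() {
    mbox=${1:-$HOME/Mail/retry-delivery}
    if [ -s "$mbox" ]; then
        # Each message in an mbox starts with a "From " separator line.
        n=$(grep -c '^From ' "$mbox" || true)
        echo "WARNING: $n message(s) waiting in $mbox"
        return 1
    fi
    echo "OK: no failed deliveries in $mbox"
}

# Example: check_retry || echo "fix Coda, then run retry-delivery"
```

The non-zero return status makes it easy to hook into whatever
notification mechanism is at hand.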

Jan

Re: New user queries :)