Coda File System

My experiences with Coda and why I went back to NFS

From: Douglas C. MacKenzie <doug_at_mobile-intelligence.com>
Date: Fri, 26 Jan 2001 13:38:09 -0500
As a thank you to this mailing list and the Coda developers
I would like to pass on my thoughts and experiences with Coda.

I ran Coda for about 8 months on a small office cluster of 
4 workstations and one server.  I really liked the promise
of disconnected operations, and looked forward to running
a Coda client on my Win98 laptop, but it never sounded 
stable enough to bother with loading it up.

I had three major Coda problems over the 8 months.  The first
was due to clock skew.  Coda needs to have the clocks set pretty
closely and we kept getting reconnect conflicts until we 
started running ntpd.  The daylight saving time rollover was
really a pain.

The second problem was ongoing: the clients would continually
disconnect and reconnect, even when on a fast network connection.
This caused no end of random clients running disconnected.
A major problem with Coda is that there is no way for a casual
user (the developers on our network) to quickly decide if they
are running connected or disconnected.  There should be some
obvious alarm given when a client disconnects.  Something like
the popup dialogs that UPS software provides when the power fails.
Losing your network connection is easily as critical.  Anyway,
I spent a lot of time helping people get their clients
reconnected.  (I saw an e-mail on the list which suggested
that this problem was fixed in the latest version, but I gave
up before I got to try it.)
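
The kind of alert described above could be sketched roughly as
follows.  This is only an illustration, not part of Coda: the
ConnectionWatcher class and its notify callback are hypothetical
names, and how the connected/disconnected samples would actually be
obtained from a Coda client is left open.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConnectionWatcher:
    """Hypothetical helper: turns connection-state samples into alerts."""
    notify: Callable[[str], None]
    connected: bool = True  # assumption: the client starts out connected

    def update(self, now_connected: bool) -> None:
        # Alert only on a state change, so a flapping link produces one
        # message per transition rather than one per poll.
        if now_connected == self.connected:
            return
        self.connected = now_connected
        if now_connected:
            self.notify("Coda client reconnected to the server")
        else:
            self.notify("WARNING: Coda client is running DISCONNECTED")


if __name__ == "__main__":
    watcher = ConnectionWatcher(notify=print)
    for sample in [True, False, False, True]:
        watcher.update(sample)
```

In practice the notify callback would be something much louder than
print, such as a popup dialog.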

The final problem was the killer.  One day the Coda server 
core dumped with an assert and wouldn't restart.  I fooled
around with it for a day and got the server running and
found that read operations from clients would work OK but
that the server core dumped again on the first write operation.
Anyway, I gave up, copied my files out to an NFS partition and
unloaded Coda from our site.  This was the third such massive
failure I had had in about 6 weeks.

My basic conclusion is that Coda is not usable by anyone 
other than very dedicated researchers until you get rid
of all the asserts in the software and replace them with
meaningful error messages.  My biggest frustration was
trying to track down what a particular assert really meant.
One example was trying to install a Coda server on a machine
and specifying an RVM data partition that was larger than the
available memory on the machine (I had X and some programs running).
Instead of saying "out of memory" or any such error message,
I got an assert message and spent a couple of hours figuring
out what that meant.  It is really scary to have your users
asking you when the server will be back up and all you have
to look at is an assert statement that you have to wait for
Jan Harkes (who was always very helpful) to interpret.
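
As a rough sketch of the difference, here is what an explicit check
with a meaningful message might look like for the RVM example above.
It is written in Python for brevity and is not taken from the Coda
sources: ServerSetupError and check_rvm_data_size are hypothetical
names, and the caller is assumed to already know how much memory is
available.

```python
class ServerSetupError(Exception):
    """Raised with a message an administrator can act on."""


def check_rvm_data_size(requested_bytes: int, available_bytes: int) -> None:
    # A bare assert would only report a file name and line number;
    # an explicit check can say what was wrong and by how much.
    if requested_bytes <= 0:
        raise ServerSetupError(
            f"RVM data size must be positive, got {requested_bytes} bytes"
        )
    if requested_bytes > available_bytes:
        raise ServerSetupError(
            f"out of memory: RVM data segment needs {requested_bytes} bytes "
            f"but only {available_bytes} bytes are available"
        )
```

An administrator who sees the second message knows immediately to
shrink the RVM data size or free up memory, without waiting for
someone to interpret an assert.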

Dumping the asserts, adding an alert mechanism to report
when clients disconnect, and modernizing the conflict 
repair mechanism are the three shortcomings that I would
suggest working on first to make Coda ready for the 
real world.  As it was, I (Ph.D. in Computer Science
and used to advanced system administration problems) 
was just spending way too much time keeping it going
and didn't see how any of my users would ever be able
to take over any Coda system admin.  When those shortcomings
are addressed, I'm ready to give it another try.

Thanks for all your help,

    Doug

-- 
Douglas C. MacKenzie, Ph.D.
Mobile Intelligence Corporation
33150 Schoolcraft Road, Suite 108
Livonia, MI  48150-1646
   Voice: +1 734 367-0430
   Fax:   +1 734 367-0431
   Cell:  +1 248 225-0288
mailto:doug_at_mobile-intelligence.com
http://www.mobile-intelligence.com
Received on 2001-01-26 16:08:03