Coda File System

RE: Venus dying on file create by xemacs or Star Office 5.0 (new problems)

From: <mattwell_at_us.ibm.com>
Date: Wed, 23 Jun 1999 15:58:21 -0400
Peter,

I did as you suggested and mailed off the log to you but it never
left the machine I mailed it from due to a misconfiguration. I just
noticed this yesterday. If you are still interested I still have it.
However in the meantime I have tried a couple of things.

I obtained the latest source and managed to get it to compile
far enough to produce a venus binary. I installed that and it
worked much better. I was able to (for the first time) save a file
from StarOffice. Great! Unfortunately I started to have some
other problems. For no good reason I can see I began to get
conflicts. The repair utility refuses to operate on the conflicts.
I have a token and I am the coda "superuser." If I use the
cfs beginrepair filename command I get the directory and
am able to view the local and global files. I don't know how
to proceed at that point. A cfs endrepair filename causes the
directory to revert to a "broken file". So I re-installed coda and
started clean.

So far so good. Then I went to put all the files back into
their volumes. I am using tar remotely from a client. This
started fine but then coda started spitting out cache overflow
messages. Note that there was plenty of space left and
a df was reporting negative numbers on the coda
filesystem - a very nice feature BTW. Then tar started
complaining about "unable to create file - filesystem full"
or something like that. So it would create a few files and
then skip a few. I think that, if possible, venus should at
this point block on serving the files until the server catches
up with its requests. By writing a little script that sends the
tar process a STOP signal, waits a second, then sends
a CONT signal, waits two seconds, etc., I was able to
successfully untar my tape!
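
A minimal sketch of such a throttling script, for anyone who hits the same cache overflow. The original script is not shown in this message, so this is a reconstruction under assumptions: the function name throttle is hypothetical, and the default intervals (1 second stopped, 2 seconds running) are simply the ones described above.

```shell
#!/bin/sh
# throttle PID [PAUSE] [RUN]
# Repeatedly suspend and resume a process until it exits, so that a
# slow consumer (here: venus and the server) gets time to catch up.
# PAUSE = seconds the process stays stopped (default 1, an assumption)
# RUN   = seconds the process is allowed to run (default 2, an assumption)
throttle() {
    pid=$1
    pause=${2:-1}
    run=${3:-2}
    # "kill -0" delivers no signal; it only checks that the process exists.
    while kill -0 "$pid" 2>/dev/null; do
        kill -STOP "$pid" 2>/dev/null || break   # suspend the process
        sleep "$pause"
        kill -CONT "$pid" 2>/dev/null || break   # let it continue
        sleep "$run"
    done
}
```

It could be used by starting tar in the background and passing its process id, for example `tar xf archive.tar & throttle "$!" 1 2`, where archive.tar is a placeholder for the real tape device or archive. This only slows the writer down by a fixed duty cycle; it does not know whether venus has actually caught up.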

Here are the possible sources of problems I can see:

1. weak server machine (P100, 8meg ram)
2. venus is of a younger vintage than the server binaries
3. I used 2Meg for the log value - the script suggested
    that amount but the default in the square brackets is 12M.
    Is the 2Meg value a typo in the script or is 2M o.k. for
    serving 3gig with a 130M DATA partition? All partitions
    are bigger than the values given to the setup
    script, e.g. 140M for DATA, 3M for LOG.

Lastly: is there any way for tar to preserve the ACLs
in backing up and restoring? I know you have the
coda backup mechanism but I'd rather not have to
use it.

By the way, when I actually had coda working well for
a short while I was very, very impressed. Coda looks
like it will be a wonderful solution to keeping data on
several machines and shared among different users.

Thanks,

Matt
--

Hi Matt,

Do you actually get an Oops, i.e. a kernel-level segfault, or do you get a
user-level one? You can find this out by looking at /var/log/messages.

If it is _that_ reproducible, could you do
echo 4095 > /proc/sys/coda/debug
echo 1 > /proc/sys/coda/trace
and get us the content of /var/log/messages.

- Peter -


I updated to 2.2.10 and built the new coda module but am still experiencing
the same problems. I think the freezing I am seeing is related to not being
able to unmount coda. If I do a ctrl-alt-del, shutdown starts but doesn't
complete. The only way out I have found at that time is to power cycle.

Is it worthwhile to try compiling a bleeding edge venus? I'm not familiar
with the mechanism being used to make venus available for debugging when it
dies. Is there a way I can detect that venus died in a script and execute a
clean up and restart venus or do a clean reboot?

Matt
--


On Tue, Jun 15, 1999 at 03:42:52PM -0400, mattwell_at_us.ibm.com wrote:
>
> I am having some problems with venus. Creating files from
> some applications causes venus to die (go into the mode
> waiting for debugging). I can copy files, touch files, and create
> them with vi but if I try to use xemacs or Star Office venus dies
> and I have to reboot (can't login at another VT but the mouse
> still works - usually). Often I can't even reboot and I have to
> kill the power.

Hi Matt,

Venus 5.2.0 has a known null-pointer dereference problem, which is
triggered by all operations that attempt to create a new directory
entry. This is annoying, but cannot cause a complete system lockup.

A lockup like this would indicate that something is wrong with the
kernel code. Could you try to update your kernel to linux-2.2.10, and
combine that with a new Coda kernel module built from:
  ftp.coda.cs.cmu.edu:/pub/coda/src/linux-coda-5.2.3-linux2.2.9.tgz

There are a couple of fixes in our version relative to the one in the
official kernel tree; those seem to have solved at least one
(infrequent) lockup we noticed here on an SMP machine.

Jan
Received on 1999-06-23 15:59:17