Coda File System

Re: 4.6.0 trials and tribulations

From: <jaharkes_at_cs.cmu.edu>
Date: Fri, 03 Jul 1998 17:21:38 -0300
> I have been lurking on this list pretty much since its inception but I
> haven't tried coda until a few days ago. I spent two days working
> through the problems I had with the documentation and getting a coda
> server working. I have to admit it was quite a learning process. 
>
> If you are interested, I took quite a few notes along the way and
> would be willing to update your release notes so that they are more
> accurate.

That's great. We work with coda on a day-to-day basis, so we can easily
overlook things that are difficult for new users.
 
> Anyway, now I am having some trouble with venus. The first time that I
> got coda working, I decided to copy some stuff over to my vice server
> and so I kicked off a couple of "find . | cpio -p /coda/usr/local/..."
> This seemed to hang venus after a couple minutes. I tried some of the
> cfs commands and they just hung. Eventually, I tried to restart
> venus. Normal kills didn't work and so eventually I had to kill -9
> it.

Ok, I know what happened here. The server became unreachable when the network
link became saturated. The client then started logging all local changes
(disconnected mode) and ran out of local modification records (cml-entries).
Venus then `asserts', and your cpio process hangs. We are working on fixing
this behaviour.

> After that I couldn't get it to run again. It couldn't seem to
> open /dev/cfs0 it said the resource was busy and so I figured that my
> only way out of this one was reboot.

When venus has crashed, it waits for someone to attach a debugger to it,
so it needs to be killed (ps aux | grep venus will show the pid):

 1. killall -9 venus

After that, /coda is not automatically unmounted, so unmount it by hand:

 2. umount /coda

However, when any process or shell still has a file open on coda, the kernel
refuses to unmount the device (and tells you so). This always leads to some
hunting for which xterms/shells/programs are still using anything in /coda
(fuser and lsof are our friends).

Now venus can be restarted (although we often rmmod coda first, to make sure
the kernel module doesn't have any bad `state').
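
The steps above can be collected into a small shell function. This is only a
sketch: the `run' helper, the `coda_recover' name and the DRY_RUN variable are
made up for illustration, and `fuser -vm' assumes the psmisc version of fuser.
By default it only prints the commands, so nothing is killed or unmounted
until DRY_RUN=0 is set (as root).

```shell
#!/bin/sh
# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Recovery sequence after a venus crash, in the order described above.
coda_recover() {
    # 1. Kill the crashed venus, which is waiting for a debugger.
    run killall -9 venus
    # 2. Show which processes still have files open under /coda.
    run fuser -vm /coda
    # 3. Unmount /coda; stop here if the kernel says it is still busy.
    run umount /coda || return 1
    # 4. Unload the kernel module so no stale state survives the restart.
    run rmmod coda
}
```

If the umount step fails, close whatever fuser listed and call coda_recover
again before restarting venus.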

> It worked and the server
> reintegrated with the coda server. However many of the files hadn't
> been copied yet and so I figured that the problem was that coda had
> hung in the middle. I kicked off one of the "cpio -p"'s and that
> seemed to work. Venus reintegrated and all was well. So I kicked off
> the second "cpio -" and then venus hung again. Same thing, I had to
> reboot to free up /dev/cfs0 but when it came back, I found that coda
> was still hanging and the log which was really long ended with this.

`cfs checkservers' can be used to force venus to recheck whether the servers
are reachable. Also keep an eye on `codacon'; it tells you when a server
becomes unreachable. It is sometimes useful to pause (^Z) a big tar/cpio
operation, force a check of whether the servers are reachable, wait for all
reintegration to complete, and then continue. But with a 40MB venus.cache
it is not so critical anymore.
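
For a copy running in the background, the same pause/check/continue cycle can
be scripted roughly as follows. This is a sketch, not something we ship: the
`pause_and_check' name and the settle time are made up, and the sleep is only
a crude stand-in for watching codacon until reintegration has finished.

```shell
#!/bin/sh
# pause_and_check PID [SETTLE_SECONDS]
# Stop a running copy, make venus re-probe the servers, wait, then resume.
pause_and_check() {
    pid="$1"
    settle="${2:-60}"            # hypothetical default; tune by watching codacon
    # Suspend the copy (similar in effect to ^Z on a foreground job).
    kill -s STOP "$pid" || return 1
    # Force venus to recheck whether the servers are reachable.
    cfs checkservers
    # Give reintegration time to drain before more changes are made.
    sleep "$settle"
    # Let the copy continue where it left off.
    kill -s CONT "$pid"
}
```

For example, after starting `find . | cpio -p ...' in the background, calling
pause_and_check with the pid of cpio every so often keeps the number of
pending changes down.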

> -ben

Jan Harkes
Received on 1998-07-03 17:23:03