Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu> Date: Tue, 15 Apr 2003 22:00:32 -0400

On Tue, Apr 15, 2003 at 03:51:35PM -0400, Samir Patel wrote:
> 2) On client1, I create a file with the 'cat > blah' command... and
> just enter garbage into the file.
> 3) On client2, I delete the file that is still in the process of being
> created by client1.
> 4) I close the created file on client1.  It now shows up as something
> like:

When the file was removed on client 2, the callback was ignored by
client 1 because the file was still open for write. When client 1
finally closed the file and tried to store the data, the server returned
an error. To avoid 'losing data', client 1 logged the store operation in
the CML and went into disconnected mode (and declared a conflict).

> 5) I attempt a repair... but get messages like this:
> 
> # repair
> repair > beginrepair blah
> Too few directory entries
> Could not allocate replica list
> beginrepair failed.

Here beginrepair simply tells venus to expand the conflict. Once the
conflict is expanded, it checks the contents of the 'fake directory' to
see what type of conflict this is. You can suspend repair (^Z) and do
'ls -l blah' to check for yourself. In this case it should have expanded
to a local and a global directory. However, the enumeration step failed.
Maybe we got completely disconnected from the servers and global is a
dangling link instead of a proper directory (as I just noticed is the
case further down in your email).

> repair > beginrepair blah
> Could not allocate new repvol: Object not in conflict
> beginrepair failed.

The conflict is still expanded, so the first test before expanding
(check for the symlink) fails and repair claims that there is no
conflict. This version of repair actually forgets to call 'cfs
endrepair' in many cases. In the CVS version I corrected most error
paths so that they collapse the conflict.

> If I do an 'ls -l blah' now, I get:
> 
> lrw-r--r--    1 root     nfsnobod       27 Apr 15 13:35 global ->
> @7f000000.000018f2.000007ef
> -rw-rw-r--    1 samir    nfsnobod       14 Apr 15 12:19 local

This is easily explained. The object does not exist on the servers, so
of course we cannot show the global object. I believe this conflict
'should' be propagated to a directory conflict in the containing
directory. Perhaps it fails to do so because your shell process has
'pinned' down the directory and therefore we fail to turn the directory
into a conflict. Or maybe there is something wrong in the local-global
repair handling of store conflicts. This is an update/remove type
conflict, which hasn't been tested as frequently as update/update
conflicts (when both clients concurrently try to update the file).

So doing 'cfs er blah ; cd .. ; cfs er dirname ; ls -l' could show that
the parent directory is the actual location of the conflict.

> Also, at this point, it seems that I can make changes to directories
> under /coda w/o even having authentication tokens (none of these
...
>   Connection State is WriteDisconnected

That's the reason: the client is logging updates, but you need to
resolve the conflict (and obtain tokens) before the logged updates are
propagated back to the server.

> but I can't ever get the conflicts resolved.  The only way to get
> things back to "normal" is to shut Coda down, and re-run venus-setup.

'cfs purgeml' will throw away all logged updates that are still waiting
for reintegration. Once the logged updates are gone the conflict should
be gone as well and the volume can become 'Connected' again.
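
As a rough sketch, the recovery could be wrapped in a small shell
function like the one below. The function name, the RUN switch and the
idea of passing the containing directory are my own assumptions for
illustration; only 'cfs er' and 'cfs purgeml' come from the text above,
and 'cfs lv' (listvol) is added to check the Connection State
afterwards. It only prints the commands unless RUN=1 is set, since
purging the log cannot be undone.

```shell
#!/bin/sh
# Sketch only: discard the logged updates for the volume containing "$1".
# By default the commands are printed, not executed; set RUN=1 to run them.
# Copy anything you still need out of the 'local' replica beforehand.
coda_discard_cml() {
    dir=$1
    if [ "${RUN:-0}" = 1 ]; then run=""; else run="echo"; fi
    $run cfs er "$dir"        # collapse a conflict that is still expanded
    $run cfs purgeml "$dir"   # throw away updates waiting for reintegration
    $run cfs lv "$dir"        # inspect the reported Connection State
}
```

Calling 'coda_discard_cml /coda/path/to/dirname' (a placeholder path)
prints the three commands, so they can be reviewed before running them
for real with RUN=1.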

> Occasionally, I can't even cleanly shutdown Coda (get timeout error's
> trying to access /coda partition), in which case I just reboot the
> client.

When there is any process with a reference to a file in the /coda
subtree (for instance the working directory of a bash shell), the Linux
kernel rejects the unmount. The timeout and EIO errors occur because
there is no venus process listening for upcalls anymore. The only way
out is to find and kill any processes that have references to objects in
/coda, then umount /coda, and then the client can be restarted.
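
One way to find such processes, sketched here for the most common case
of a shell sitting in a directory under /coda, is to look at the
/proc/<pid>/cwd symlinks. The function name is made up for this example,
and it only catches working directories; processes that merely hold an
open file under /coda would need something like 'fuser -vm /coda' or
lsof instead.

```shell
#!/bin/sh
# Print the PIDs whose current working directory is at or below "$1".
# Linux-only: relies on the /proc/<pid>/cwd symlinks. Processes owned by
# other users are only visible when this is run as root.
pids_with_cwd_under() {
    top=${1%/}                      # drop a trailing slash, if any
    for link in /proc/[0-9]*/cwd; do
        # Skip processes that exited or whose link we may not read.
        target=$(readlink "$link" 2>/dev/null) || continue
        case $target in
            "$top"|"$top"/*)
                pid=${link#/proc/}
                echo "${pid%/cwd}"
                ;;
        esac
    done
}
```

Running 'pids_with_cwd_under /coda' lists the candidates. After leaving
/coda in (or killing) those processes, 'umount /coda' should succeed and
the client can be started again.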

Jan

Re: Simple problem... I think