Coda File System

Re: new coda issue: touch a file and coda dies

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Thu, 7 Jul 2005 14:22:23 -0400
On Thu, Jul 07, 2005 at 09:41:03AM -0600, Patrick Walsh wrote:
> # ls /root/pool_scm/r
> readline2.2.1-2.2.1-4.i386.rpm
> rpm-4.0.4-7x.20.i386.rpm
> readline-4.2-2.i386.rpm
> rsh-0.17-18.AS21.2.i386.rpm
> restore-default-system-1.0-20031001.i386.rpm
> rsh-0.17-18.AS21.4.i686.rpm
> rootfiles-7.2-1.noarch.rpm
> # du -s -h /root/pool_scm/r
> 2.6M    /root/pool_scm/r
> # ls r
> ls: r/readline-4.2-2.i386.rpm: No such device

Ok, 7 directory entries wouldn't be enough to fill a directory.

> 	At this point, venus has crashed.  The console.log file has the
> erroneous seeming errors that I pasted before, but to show again:
> 
> ***LWP (0x810ec50): Select returns error: 4
> 
> 09:28:28 worker::main Got a bogus opcode 36
> 09:29:30 readline-4.2-2.i386.rpm (606e1fc8.7f000003.1018.4de)
> inconsistent!
> 09:29:30 fatal error -- fsobj::dir_Create: (dir225,
> 606e1fc8.7f000003.fffffffc.80002) Create failed!

This is very strange. I looked at the source: we are trying to add a
directory entry to some unknown directory (neither the name nor the fid
of the parent in which we are trying to create is logged). We do know
that the new entry has the name "dir225" and that it points at a fake
object in the same volume as the inconsistent rpm file.
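
The "same volume" observation can be checked mechanically. The sketch
below is only an illustration, not anything from the venus source: it
pulls the fid-shaped tokens out of the two log lines quoted above and
compares their leading fields. It assumes that the first two
dot-separated fields of a fid identify the volume; the helper names
(fids_in, volume_prefix) are made up for this example, and the log lines
are the quoted ones with the mail line-wrapping undone.

```python
import re

# Four dot-separated lowercase hex fields, as printed in the quoted console.log.
FID_RE = re.compile(r"\b[0-9a-f]+\.[0-9a-f]+\.[0-9a-f]+\.[0-9a-f]+\b")


def fids_in(text):
    """Return every fid-shaped token found in text, in order of appearance."""
    return FID_RE.findall(text)


def volume_prefix(fid):
    """Return the first two fields of a fid.

    Assumption: these two fields together identify the volume.
    """
    return ".".join(fid.split(".")[:2])


# The two log lines from the report, rejoined onto single lines.
log_excerpt = (
    "09:29:30 readline-4.2-2.i386.rpm (606e1fc8.7f000003.1018.4de) inconsistent!\n"
    "09:29:30 fatal error -- fsobj::dir_Create: "
    "(dir225, 606e1fc8.7f000003.fffffffc.80002) Create failed!\n"
)

fids = fids_in(log_excerpt)
same_volume = len({volume_prefix(f) for f in fids}) == 1

for fid in fids:
    print(fid, "->", volume_prefix(fid))
print("same volume:", same_volume)
```

Under that assumption both fids share the same leading fields, which is
all that "same volume" means here.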

However, server-server conflicts do not in any way try to create names
or anything. The lookup or getattr operation returns EINCONS and this is
mapped to faked stat data right before we send the reply back to the
kernel. As far as I know there isn't even an actual filesystem object
associated with the inconsistent object, since the servers disagree
about its contents. Only reintegration-related expansion changes
directory contents, since in that case we do have a locally cached copy
of the object and it has to be modified before we can show the global
version.

I also don't see how anything in that volume would even have a name like
'dir225'; there are only the [a-z] directories and a bunch of *.rpm
files.

But somehow these two must be related, since they seem to happen so
reliably right after each other.

> 	I should have mentioned that I already tried this.  And as you can see
> from the above terminal transcript, it had little effect.
> 
> 	Any other thoughts?

No idea, it just doesn't make sense. I don't see how a server-server
conflict could possibly get into the expansion code that is used when a
reintegration fails, if you are simply doing an 'ls'. I also don't
understand why it is trying to create a directory named 'dir225' when
all the names in the volume are either a single character 'a-z' or
'*.rpm'.

Maybe start venus with loglevel 100 (venus -init -d 100) and repeat the
same thing. The log might then show how we are getting to this point
and whether those two events (the inconsistency and the crash) are
really related or not.
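
Once such a log exists, a first rough check is how close together the
two events are. The following is a minimal sketch under stated
assumptions: it assumes each relevant log line starts with an HH:MM:SS
timestamp, as in the excerpt quoted above (a loglevel 100 log may look
different), and the function names and marker strings are chosen just
for this example.

```python
from datetime import datetime


def event_time(lines, marker):
    """Return the timestamp of the first line containing marker, or None.

    Assumption: the line begins with an HH:MM:SS timestamp.
    """
    for line in lines:
        parts = line.split()
        if marker not in line or not parts:
            continue
        try:
            return datetime.strptime(parts[0], "%H:%M:%S")
        except ValueError:
            continue  # line has the marker but no leading timestamp
    return None


def gap_seconds(lines, first_marker, second_marker):
    """Seconds between the first occurrences of two markers, or None."""
    first = event_time(lines, first_marker)
    second = event_time(lines, second_marker)
    if first is None or second is None:
        return None
    return (second - first).total_seconds()


# Lines from the quoted excerpt, with the mail line-wrapping undone.
sample = [
    "09:28:28 worker::main Got a bogus opcode 36",
    "09:29:30 readline-4.2-2.i386.rpm (606e1fc8.7f000003.1018.4de) inconsistent!",
    "09:29:30 fatal error -- fsobj::dir_Create: "
    "(dir225, 606e1fc8.7f000003.fffffffc.80002) Create failed!",
]

print(gap_seconds(sample, "inconsistent", "Create failed"))
```

A small gap only shows that the events are adjacent in time; whether one
causes the other still has to come from reading the log entries between
them.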

Jan
Received on 2005-07-07 14:23:33