Coda File System

Re: Problems after 5.3.8 upgrade

From: Dr A V Le Blanc <LeBlanc_at_mcc.ac.uk>
Date: Mon, 24 Jul 2000 11:24:18 +0100
I've still received no advice about this; did it get through
to the list?  I can still not write any files in coda, though
I can delete files and create or delete directories.


On Wed, Jul 19, 2000 at 10:44:28PM +0100, Dr A V Le Blanc wrote:
> I've just upgraded to coda 5.3.8 (lwp 1.4, rvm 1.1, rpc2 1.5)
> on a group of Linux 2.2.16 systems with libc 2.1.3, gcc 2.95.2.
...
> $ ll /tmp/wwwjob/Makefile
> -rw-------   1 zlsiial  nogroup       518 Jul 19 07:42 /tmp/wwwjob/Makefile
> $ cp -p /tmp/wwwjob/Makefile .
> cp: ./Makefile: Invalid argument
> $ ll Makefile
> -rw-------   1 zlsiial  nogroup         0 Jul 19 22:36 Makefile
> $ rm Makefile
> $ (cd /tmp/wwwjob;tar cf - Makefile)|tar xif -
> tar: Makefile: Cannot write
> tar: Error exit delayed from previous errors

On Thu, Jul 20, 2000 at 11:40:38AM -0400, Jan Harkes wrote:
> Now that is very strange. I am curious what is rejecting the operation.
> Does it only affect the Makefile file in that directory, or simply the
> creation of any file anywhere?

It seems to affect creation of any file anywhere.

> Does the kernel print any errors to the console (or use dmesg to see the
> last kernel error messages).

The last thing in dmesg is this:

     coda_pioctl: Venus returns: -107 for (0x7f000000,0x1,0x1)

but I don't know if it happened when I tried to write a file.

> Is any activity in codacon indicating that venus tries to send a
> Create/Store to the server.

Here is 'cp /etc/passwd .;rm passwd':

     Create passwd ( 23:10:23 )
     Store passwd [0] ( 23:10:23 )
     connection::bandwidth avl.mcc.ac.uk 327225 351741 380228 ( 23:10:28 )
     Remove passwd ( 23:10:29 )
     connection::bandwidth avl.mcc.ac.uk 329055 356252 388349 ( 23:10:34 )

> It looks like the create succeeds, but the store operation is completely
> failing. On what operation does it return EINVAL,
>  `strace -e trace=file cp -p /tmp/wwwjob/Makefile' should show that.

     $ strace -e trace=file cp -p /etc/passwd .
     execve("/bin/cp", ["cp", "-p", "/etc/passwd", "."], [/* 20 vars */]) = 0
     open("/etc/ld.so.preload", O_RDONLY)    = -1 ENOENT (No such file or directory)
     open("/etc/ld.so.cache", O_RDONLY)      = 3
     open("/lib/libc.so.6", O_RDONLY)        = 3
     lstat(".", {st_mode=S_IFDIR|0711, st_size=2048, ...}) = 0
     stat(".", {st_mode=S_IFDIR|0711, st_size=2048, ...}) = 0
     stat("/etc/passwd", {st_mode=S_IFREG|0644, st_size=1104, ...}) = 0
     stat("./passwd", 0xbffff7dc)            = -1 ENOENT (No such file or directory)
     open("/etc/passwd", O_RDONLY|0x8000)    = 3
     open("./passwd", O_WRONLY|O_CREAT|O_TRUNC|0x8000, 0600) = 4
     utime("./passwd", [2000/07/20-23:10:23, 2000/07/19-00:01:34]) = 0
     chown("./passwd", 102, 0)               = 0
     chmod("./passwd", 0100644)              = 0
     $

> Is the disk containing /usr/coda/venus.cache full?

     Filesystem           1k-blocks      Used Available Use% Mounted on
     /dev/hda2              3044060    713412   2330648  23% /
     Coda                    300000       304    106404   0% /coda

This is affecting all clients everywhere, so that was unlikely to be
the problem.  Here's SrvLog from this period:

     23:12:25 VN_PutDirHandle: Vn 1 Uniq 1: cnt 1, vn_cnt 1
     23:12:25 VN_GetDirHandle for Vnode 0x1 Uniq 0x1 cnt 2, vn_cnt 2
     23:12:25 VN_PutDirHandle: Vn 1 Uniq 1: cnt 1, vn_cnt 1
     23:12:25 --DC: (0x80000019.0x1.0x1) ct: 1
     23:12:25 VN_PutDirHandle: Vn 1 Uniq 1: cnt 0, vn_cnt 0
     23:12:35 VN_GetDirHandle for Vnode 0x1 Uniq 0x1 cnt 1, vn_cnt 1
     23:12:35 VN_GetDirHandle for Vnode 0x1 Uniq 0x1 cnt 2, vn_cnt 2
     23:12:35 VN_PutDirHandle: Vn 1 Uniq 1: cnt 1, vn_cnt 1
     23:12:35 VN_GetDirHandle for Vnode 0x1 Uniq 0x1 cnt 2, vn_cnt 2
     23:12:35 VN_PutDirHandle: Vn 1 Uniq 1: cnt 1, vn_cnt 1
     23:12:35 --DC: (0x80000019.0x1.0x1) ct: 1
     23:12:35 VN_PutDirHandle: Vn 1 Uniq 1: cnt 0, vn_cnt 0

In the venus.log on the client, I'm getting thousands of these:

     [ H(07) : 0005 : 23:10:13 ] binding::~binding:  somebody forgot to decrement before delete

and I guess these are from about the time of the problem:

     [ D(02) : 0197 : 23:10:34 ] NotifyUserOfServerBandwidthEvent
     [ D(02) : 0199 : 23:10:49 ] NotifyUserOfServerBandwidthEvent
     [ T(01) : 0203 : 23:11:29 ] BeginRvmTruncate (173, 4116, I)
     [ T(01) : 0203 : 23:11:30 ] cache::EndRvmTruncate
     [ W(20) : 0000 : 23:12:25 ] Cachefile::SetLength 0
     [ D(02) : 0210 : 23:12:31 ] NotifyUserOfServerBandwidthEvent
     [ D(02) : 0211 : 23:12:40 ] NotifyUserOfServerBandwidthEvent
     [ D(02) : 0213 : 23:13:00 ] NotifyUserOfServerBandwidthEvent
     [ T(01) : 0217 : 23:13:40 ] BeginRvmFlush (1, 53636, I)
     [ T(01) : 0217 : 23:13:40 ] EndRvmFlush
     [ T(01) : 0218 : 23:13:50 ] BeginRvmTruncate (28, 53796, I)

in the middle of the above entries.  (The venus.log has 8232 lines,
of which 8024 are the above message about forgetting to decrement;
this is after about 15 minutes uptime.)

     -- Owen
     LeBlanc_at_mcc.ac.uk
Received on 2000-07-24 06:25:32