I think I found the problem - see below. But there is still the problem of cfs and coda ops failing sometimes. I think there are multiple outstanding issues, really. Any clues appreciated.

From: Greg Troxel <gdt_at_ir.bbn.com>
To: tech-kern_at_netbsd.org
Subject: vnode refcount panic, perhaps due to kern/vfs_lookup.c:lookup()
Date: Fri, 14 Mar 2003 16:01:42 -0500
Sender: gdt_at_fnord.ir.bbn.com

I am running recent netbsd-1-6-1, but the code in question is the same in -current.

My environment:

  1.6.1_RC1ish on i386
  Coda, approximately 5.3.20
  cfs 1.4.1
  emacs

I have ciphertext in coda, in /coda/home/gdt/secret-foo. This is attached as gdt-foo, and shows up in /crypt/gdt-foo, which is the NFS mount to cfsd.

Running emacs and editing a file causes symlinks to be created to indicate that the file is being edited (the filename is .#<file-being-edited>, and the target is a bogus user/host/pid string).

When I use emacs in cfs (no coda - ciphertext on plain ffs on local disk), I get occasional failures to write autosave files. Later, saving works, and the system stays up. I really don't know what's up here.

Emacs in coda, without cfs, ends up stuck in state R. I turned off SIGIO and it then worked normally, or at least pretty much ok. There is almost certainly something wrong with signal handling in the coda kernel code.

When editing files in cfs-in-coda, creating a symlink causes cfs to create a symlink in coda, as well as the symlink used to store the IV. This sometimes works, but I can reliably panic the system by typing, saving, and repeating.

I set up kgdb, and got a panic in the symlink system call, trying to link .pvect_4be81305e7f14704 to 6b490fd3.

After reading man pages and code, I think that the problem is at the end of lookup(), where the code at bad2: releases ni_dvp and then falls through to releasing dp. In my case (likely odd due to coda weirdness), these are the *same* vnode, and the second call (to vput) panics since the reference has already been released.
bad2:
	if ((cnp->cn_flags & LOCKPARENT) && (cnp->cn_flags & ISLASTCN) &&
	    ((cnp->cn_flags & PDIRUNLOCK) == 0))
		VOP_UNLOCK(ndp->ni_dvp, 0);
	vrele(ndp->ni_dvp);
bad:
	if (dpunlocked)
		vrele(dp);
	else
		vput(dp);
	ndp->ni_vp = NULL;
	return (error);
}

Note that the coda symlink call is a bit odd: it calls venus to make the symlink and then does a lookup to get the new symlink's vnode to return. This means that if the lookup fails for some reason, symlink can return failure even though the symlink was made. However, that beats a panic in lookup(); if a failing lookup is supposed to be fatal, coda_symlink should panic explicitly.

So perhaps the code at bad: can decline to release if ndp->ni_dvp == dp, but that seems incomplete - I have not grokked the rules about which vnodes are locked at which points in these call paths. Perhaps these variables being equal is a sign of a larger problem.

My backtrace:

(gdb) bt
#0  kgdb_connect (verbose=0) at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../arch/i386/i386/kgdb_machdep.c:258
#1  0xc026d638 in kgdb_panic () at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../arch/i386/i386/kgdb_machdep.c:273
#2  0xc01bc1a4 in panic (fmt=0xc0319ca1 "vput: ref cnt") at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../kern/subr_prf.c:229
#3  0xc01d6cdd in vput (vp=0xcf55ed34) at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../kern/vfs_subr.c:1213
#4  0xc01d57cd in lookup (ndp=0xcf511dc8) at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../kern/vfs_lookup.c:650
#5  0xc013524d in coda_symlink (v=0xcf511e38) at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../coda/coda_vnops.c:1656
#6  0xc01dda3e in VOP_SYMLINK (dvp=0xcf55ed34, vpp=0xcf511ea4, cnp=0xcf511eb8, vap=0xcf511edc, target=0xcf293400 "6b490fd3") at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../kern/vnode_if.c:899
#7  0xc01da8a9 in sys_symlink (p=0xcf2bead0, v=0xcf511f80, retval=0xcf511f78) at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../kern/vfs_syscalls.c:1521
#8  0xc0275a97 in syscall_plain (frame={tf_gs = 31, tf_fs = -1078001633, tf_es = -1078001633, tf_ds = 1208942623, tf_edi = -1077954368, tf_esi = -1077951284, tf_ebp = -1077950256, tf_ebx = -1077954380, tf_edx = 0, tf_ecx = 1208925264, tf_eax = 57, tf_trapno = 3, tf_err = 2, tf_eip = 1208484243, tf_cs = 23, tf_eflags = 663, tf_esp = -1077954428, tf_ss = 31, tf_vm86_es = 0, tf_vm86_ds = 0, tf_vm86_fs = 0, tf_vm86_gs = 0}) at /home/gdt/FOO-current/netbsd/src/sys/arch/i386/compile/BAR/../../../../arch/i386/i386/syscall.c:140
#9  0xc0100d42 in syscall1 ()
#10 0x804d013 in ?? ()
#11 0x804b03e in ?? ()
#12 0x480b7751 in ?? ()
#13 0x480b75d4 in ?? ()
#14 0x4808742c in ?? ()
#15 0x8049c50 in ?? ()
#16 0x80497d0 in ?? ()
(gdb)

Received on 2003-03-14 16:08:11