So, is the value of VBAD perhaps being passed in to the kernel by venus somehow? sys/coda/coda_vnops.c:make_coda_node seems to assign v_type, and this is obtained from venus. These values are declared in libs-src/kernel-includes/coda.h, which is scary, since this isn't the kernel include, but they seem to match.

------- Forwarded Message

To: tech-kern_at_netbsd.org
Cc: gdt_at_ir.bbn.com
Subject: Re: vnode refcount panic, perhaps due to kern/vfs_lookup.c:lookup()
References: <20030314210142.E59E07BD_at_fnord.ir.bbn.com>
From: Greg Troxel <gdt_at_ir.bbn.com>
Date: 16 Mar 2003 15:23:05 -0500
In-Reply-To: <20030314210142.E59E07BD_at_fnord.ir.bbn.com>
Message-ID: <rmiisujvz6e.fsf_at_fnord.ir.bbn.com>

I found that the double-vput problem in vfs_lookup was due to a vnode with type VBAD. This is passed to vfs_lookup from coda_symlink. Most of the time, the coda_call to symlink in coda_symlink works, but occasionally the call returns without error and the vnode is marked VBAD. I checked for VBAD and returned -1, but promptly got a panic in nfs_symlink, I think because an mbuf that was free()'d was trashed, or because of a bad pointer.

So, I'm guessing that the coda kernel code occasionally messes up, or there is some locking problem where the vnode gets modified/marked bad by something else.

This is all on a 192 MB i386 running cfsd/rpcbind/mountd, venus, bash, emacs, sshd/ntpd/etc. and 3 more gettys.
There is basically nothing else going on, and the machine was freshly booted.

I am just beginning to grasp the locking rules, and I'd appreciate being set straight if I am confused (and thanks to those who already responded):

The interlock in the vnode protects the vnode ref counts and a few other fields in the struct vnode. It is held for short periods only and is not about locking the vnode itself.

Having a reference, expressed via the ref count field, protects you against the vnode going away or turning into something completely different. But it does not guarantee anything about operations on the vnode; to serialize those, the vn_lock is used.

struct lock v_lock in the vnode protects the vnode in the larger context in terms of fs operations. When the comments say 'the locked vnode', they always mean the struct lock in the vnode (or rather v->v_vnlock, which in the coda case always points to v->v_lock since there is no stackable fs stuff going on).

Little mention is made of the interlock in locking discussions, other than in vnode(9), because that's too obvious. vput, for example, expects that the interlock is not held. It unlocks *v->v_vnlock, and then decrements usecount. To do the latter, it has to acquire the interlock, but that's not mentioned.

One should in general not hold the interlock when calling VOP_LOCK and VOP_UNLOCK or other vnops. But some operations take the LK_INTERLOCK flag to indicate that the interlock is already held.

So, is it reasonable for an unlocked vnode to change to VBAD? Does holding the vn_lock mean that vgone should not be called? Is there any place else I should suspect that is changing the type to VBAD?

        Greg Troxel <gdt_at_ir.bbn.com>

------- End of Forwarded Message

Received on 2003-03-16 15:33:05
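
As a reading aid for the locking rules described in the forwarded message, here is a minimal single-threaded userspace sketch in C. It is not NetBSD or Coda kernel code. Every name in it (struct model_vnode, model_vref, model_vn_lock, model_vput, model_check_type, the M_* constants) is hypothetical, and the two locks are modeled as boolean flags with assertions rather than real locks. It only illustrates the protocol as the message understands it: the interlock is held briefly around ref count changes, the vnode lock is separate, and a vput-like routine expects the interlock not to be held, releases the vnode lock, and then takes the interlock to drop the reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for vnode types; not the kernel's enum vtype. */
enum model_vtype { M_VNON, M_VREG, M_VLNK, M_VBAD };

/* Toy vnode: the two "locks" are flags so misuse trips an assert. */
struct model_vnode {
    bool interlock_held;    /* models the short-term interlock */
    bool vnlock_held;       /* models the struct lock (v_lock) */
    int usecount;           /* reference count, guarded by the interlock */
    enum model_vtype type;  /* models v_type */
};

void interlock_acquire(struct model_vnode *vp)
{
    assert(!vp->interlock_held);
    vp->interlock_held = true;
}

void interlock_release(struct model_vnode *vp)
{
    assert(vp->interlock_held);
    vp->interlock_held = false;
}

/* Take a reference: the interlock is held only around the count update. */
void model_vref(struct model_vnode *vp)
{
    interlock_acquire(vp);
    vp->usecount++;
    interlock_release(vp);
}

/* Lock the vnode for an fs operation; caller must not hold the interlock. */
void model_vn_lock(struct model_vnode *vp)
{
    assert(!vp->interlock_held);
    assert(!vp->vnlock_held);
    vp->vnlock_held = true;
}

/* vput-like: release the vnode lock, then drop the reference under
 * the interlock. Expects the interlock NOT to be held on entry. */
void model_vput(struct model_vnode *vp)
{
    assert(!vp->interlock_held);
    assert(vp->vnlock_held);
    vp->vnlock_held = false;

    interlock_acquire(vp);
    assert(vp->usecount > 0);   /* a double vput would trip this */
    vp->usecount--;
    interlock_release(vp);
}

/* Defensive check of the kind described in the message:
 * returns -1 if the vnode has been marked bad, 0 otherwise. */
int model_check_type(const struct model_vnode *vp)
{
    return (vp->type == M_VBAD) ? -1 : 0;
}
```

In this model, calling model_vput twice on a vnode that holds a single reference fails an assertion, which is the toy analogue of the double-vput panic. The model says nothing about whether a real vnode may legitimately become VBAD while unlocked; that remains the open question above.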