Coda File System

Greg Troxel: Re: vnode refcount panic, perhaps due to kern/vfs_lookup.c:lookup()

From: Greg Troxel <gdt_at_ir.bbn.com>
Date: Sun, 16 Mar 2003 15:30:41 -0500
So, is the value of VBAD perhaps being passed in to the kernel by
venus somehow?  sys/coda/coda_vnops.c:make_coda_node seems to assign
v_type, and this is obtained from venus.
These values are declared in libs-src/kernel-includes/coda.h, which is
scary, since this isn't the kernel include, but they seem to match.
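
One way to make "they seem to match" checkable is a compile-time
comparison of the two enumerations.  The sketch below is self-contained,
so it carries its own copies of both lists; those copies are written
from memory and are assumptions, not the authoritative contents of the
kernel's vnode header or of coda.h.  The helper vtype_values_match() is
hypothetical and exists only for illustration.

```c
#include <assert.h>

/* ASSUMED copy of the kernel's vnode type enumeration. */
enum vtype { VNON, VREG, VDIR, VBLK, VCHR, VLNK, VSOCK, VFIFO, VBAD };

/* ASSUMED copy of the enumeration in kernel-includes/coda.h. */
enum coda_vtype { C_VNON, C_VREG, C_VDIR, C_VBLK, C_VCHR,
                  C_VLNK, C_VSOCK, C_VFIFO, C_VBAD };

/* Fail the build if any value venus would send differs from the kernel's. */
_Static_assert((int)C_VNON  == (int)VNON,  "VNON mismatch");
_Static_assert((int)C_VREG  == (int)VREG,  "VREG mismatch");
_Static_assert((int)C_VDIR  == (int)VDIR,  "VDIR mismatch");
_Static_assert((int)C_VBLK  == (int)VBLK,  "VBLK mismatch");
_Static_assert((int)C_VCHR  == (int)VCHR,  "VCHR mismatch");
_Static_assert((int)C_VLNK  == (int)VLNK,  "VLNK mismatch");
_Static_assert((int)C_VSOCK == (int)VSOCK, "VSOCK mismatch");
_Static_assert((int)C_VFIFO == (int)VFIFO, "VFIFO mismatch");
_Static_assert((int)C_VBAD  == (int)VBAD,  "VBAD mismatch");

/* Hypothetical run-time helper: returns 1 when the two end markers agree. */
static int vtype_values_match(void)
{
    return (int)C_VBAD == (int)VBAD;
}
```

In a real tree the two local enum definitions would be replaced by
including both headers, so the assertions would track any future drift
between them.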

------- Forwarded Message

Return-Path: gdt_at_ir.bbn.com
Delivery-Date: Sun Mar 16 15:23:05 2003
Return-Path: <gdt_at_ir.bbn.com>
Delivered-To: gdt_at_fnord.ir.bbn.com
Received: by fnord.ir.bbn.com (Postfix, from userid 10853)
	id 81568864; Sun, 16 Mar 2003 15:23:05 -0500 (EST)
To: tech-kern_at_netbsd.org
Cc: gdt_at_ir.bbn.com
Subject: Re: vnode refcount panic, perhaps due to kern/vfs_lookup.c:lookup()
References: <20030314210142.E59E07BD_at_fnord.ir.bbn.com>
From: Greg Troxel <gdt_at_ir.bbn.com>
Date: 16 Mar 2003 15:23:05 -0500
In-Reply-To: <20030314210142.E59E07BD_at_fnord.ir.bbn.com>
Message-ID: <rmiisujvz6e.fsf_at_fnord.ir.bbn.com>
Lines: 57
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

I found that the double-vput problem in vfs_lookup was due to a vnode
with type VBAD.  This is passed to vfs_lookup from coda_symlink.
Most of the time, the coda_call to symlink in coda_symlink works, but
occasionally the call returns without error and the vnode is marked
VBAD.

I checked for VBAD and returned -1, but promptly got a panic in
nfs_symlink, I think because an mbuf that had been free()'d was
trashed, or because of a bad pointer.

So, I'm guessing that the coda kernel code occasionally messes up, or
there is some locking problem where the vnode gets modified/marked bad
by something else.  This is all on a 192 MB i386 running
cfsd/rpcbind/mountd, venus, bash, emacs, sshd/ntpd/etc.  and 3 more
gettys.  There is basically nothing else going on, and the machine was
freshly booted.

I am just beginning to grasp the locking rules, and I'd appreciate
being set straight if I am confused (and thanks to those who already
responded):

  the interlock in the vnode protects the vnode ref counts and a few
  other fields in the struct vnode.  It is held for short periods only
  and is not about locking the vnode itself.

  Having a reference, expressed via the ref count field, protects you
  against the vnode going away or turning into something completely
  different.  But it does not guarantee anything about operations on
  the vnode; to serialize those, the vn_lock is used.

  struct lock v_lock in the vnode protects the vnode in the larger
  context in terms of fs operations.

  When the comments say 'the locked vnode', they always mean the
  struct lock in the vnode (or rather v->v_vnlock, which in the coda
  case always points to v->v_lock since there is no stackable fs stuff
  going on).

  The interlock gets little mention in locking discussions, other
  than in vnode(9), presumably because it is considered too obvious.

  vput, for example, expects that the interlock is not held.  It
  unlocks *v->v_vnlock, and then decrements usecount.  To do the
  latter, it has to acquire the interlock, but that's not mentioned.

  One should in general not hold the interlock when calling VOP_LOCK
  and VOP_UNLOCK or other vnops.  But some operations take the
  LK_INTERLOCK flag to indicate that the interlock is already held.
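
The rules above can be sketched as a toy userspace model.  This is not
the NetBSD kernel API: struct toy_vnode and every toy_* function are
hypothetical names, and pthread mutexes stand in for both the interlock
and the struct lock.  It only illustrates which lock guards what, and
the order of operations described for vput.

```c
#include <assert.h>
#include <pthread.h>

/* Toy model only; all names here are hypothetical. */
struct toy_vnode {
    pthread_mutex_t interlock; /* held briefly; guards usecount */
    pthread_mutex_t v_lock;    /* held across fs operations */
    int usecount;              /* reference count, under interlock */
    int locked;                /* 1 while v_lock is held (for checking) */
};

static void toy_vinit(struct toy_vnode *vp)
{
    pthread_mutex_init(&vp->interlock, NULL);
    pthread_mutex_init(&vp->v_lock, NULL);
    vp->usecount = 0;
    vp->locked = 0;
}

/* Take a reference: keeps the vnode from going away, nothing more. */
static void toy_vref(struct toy_vnode *vp)
{
    pthread_mutex_lock(&vp->interlock);
    vp->usecount++;
    pthread_mutex_unlock(&vp->interlock);
}

/* Drop a reference without touching v_lock. */
static void toy_vrele(struct toy_vnode *vp)
{
    pthread_mutex_lock(&vp->interlock);
    vp->usecount--;
    pthread_mutex_unlock(&vp->interlock);
}

/* Serialize operations on the vnode; the interlock is not held here. */
static void toy_vlock(struct toy_vnode *vp)
{
    pthread_mutex_lock(&vp->v_lock);
    vp->locked = 1;
}

static void toy_vunlock(struct toy_vnode *vp)
{
    vp->locked = 0;
    pthread_mutex_unlock(&vp->v_lock);
}

/* Like the vput described above: the caller holds v_lock and a
 * reference but not the interlock.  Unlock first, then take the
 * interlock internally to decrement usecount. */
static void toy_vput(struct toy_vnode *vp)
{
    toy_vunlock(vp);
    toy_vrele(vp);
}

/* Read usecount under the interlock. */
static int toy_usecount(struct toy_vnode *vp)
{
    int n;

    pthread_mutex_lock(&vp->interlock);
    n = vp->usecount;
    pthread_mutex_unlock(&vp->interlock);
    return n;
}
```

A caller would pair toy_vref and toy_vlock with exactly one toy_vput.
In this model a second toy_vput for the same reference would unlock a
lock that is not held and push usecount below zero, which is the
general shape of a double-vput bug.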

So, is it reasonable for an unlocked vnode to change to VBAD?

Does holding the vn_lock mean that vgone should not be called?

Is there anywhere else I should suspect of changing the type to
VBAD?

        Greg Troxel <gdt_at_ir.bbn.com>

------- End of Forwarded Message
Received on 2003-03-16 15:33:05