Coda File System

Re: a few more build buglets in 6.1.2 on freebsd 6.1-release

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Mon, 6 Nov 2006 10:38:50 -0500
On Fri, Nov 03, 2006 at 06:50:41PM -0500, Jan Harkes wrote:
> Not really, I guess I can see if Shafeeq can have a look at it, or I'll
> just have to learn freebsd kernel debugging. Probably not that different
> from working on linux stuff, and on the plus side they have an in-kernel
> debugger which may make life easier.
> 
> On Linux I'd probably find some old version that does work and then use
> a binary search to pin the problem down to the commit that introduced
> the breakage. The Coda kernel code probably hasn't changed much so the
> problem is most likely caused by some change in the vfs or mm code.

I got freebsd-6.1 up and running in a qemu VM. Reproduced the problem
perfectly, then use cvsup to go back to march 2006, problem still there,
january 2006, problem still there.

Then I noticed that the backtrace from the kernel actually looked like
it made more sense. The path lead through devfs_open -> vc_nb_open,
where vc_nb_open is a function in the Coda kernel module which is called
when the /dev/cfs0 device is first opened. And it looked like we were trying to dereference a pointer to a bad address somewhere early on.

Digging through the history of coda_psdev.c, I found that at some point
(September 1st 2004) there was a rewrite to 'modernize coda' where a
static array was replaced by a linked list. Most of the related code is
in coda_fbsd.c. We find the coda_mntinfo structure by walking the list,

    struct coda_mntinfo *
    dev2coda_mntinfo(struct cdev *dev)
    {
	    struct coda_mntinfo	*mnt;
	    LIST_FOREACH(mnt, &coda_mnttbl, mi_list) {
		    if (mnt->dev == dev)
			    break;
	    }
	    return mnt;
    }

I'm not sure if this list is circular or not, but if we don't find a
coda_mntinfo struct with the correct dev in the list we return some
random pointer. We allocate the coda_mntinfo structure in the same file
in coda_fbsd_clone and in the old code we clearly initialize mnt->dev,
but the new code does not.

@@ -192,31 +187,19 @@ static void coda_fbsd_clone(arg, name, n
    return;
 
     *dev = make_dev(&codadevsw,unit2minor(u),UID_ROOT,GID_WHEEL,0600,"cfs%d",u);
-    coda_mnttbl[unit2minor(u)].dev = *dev;
-  
+    mnt = malloc(sizeof(struct coda_mntinfo), M_CODA, M_WAITOK|M_ZERO);
+    LIST_INSERT_HEAD(&coda_mnttbl, mnt, mi_list);
 }

My guess is that the end of coda_fbsd_clone needs an additional,

    mnt->dev = *dev;

I'm building a kernel with this change, but since I am running in a VM it
is taking a while. It does look like this is the most likely cause of
the problem.

Jan
Received on 2006-11-06 10:43:09