(Illustration by Gaich Muramatsu)
On Fri, Nov 03, 2006 at 06:50:41PM -0500, Jan Harkes wrote: > Not really, I guess I can see if Shafeeq can have a look at it, or I'll > just have to learn freebsd kernel debugging. Probably not that different > from working on linux stuff, and on the plus side they have an in-kernel > debugger which may make life easier. > > On Linux I'd probably find some old version that does work and then use > a binary search to pin the problem down to the commit that introduced > the breakage. The Coda kernel code probably hasn't changed much so the > problem is most likely caused by some change in the vfs or mm code. I got freebsd-6.1 up and running in a qemu VM. Reproduced the problem perfectly, then use cvsup to go back to march 2006, problem still there, january 2006, problem still there. Then I noticed that the backtrace from the kernel actually looked like it made more sense. The path lead through devfs_open -> vc_nb_open, where vc_nb_open is a function in the Coda kernel module which is called when the /dev/cfs0 device is first opened. And it looked like we were trying to dereference a pointer to a bad address somewhere early on. Digging through the history of coda_psdev.c, I found that at some point (September 1st 2004) there was a rewrite to 'modernize coda' where a static array was replaced by a linked list. Most of the related code is in coda_fbsd.c. We find the coda_mntinfo structure by walking the list, struct coda_mntinfo * dev2coda_mntinfo(struct cdev *dev) { struct coda_mntinfo *mnt; LIST_FOREACH(mnt, &coda_mnttbl, mi_list) { if (mnt->dev == dev) break; } return mnt; } I'm not sure if this list is circular or not, but if we don't find a coda_mntinfo struct with the correct dev in the list we return some random pointer. We allocate the coda_mntinfo structure in the same file in coda_fbsd_clone and in the old code we clearly initialize mnt->dev, but the new code does not. @@ -192,31 +187,19 @@ static void coda_fbsd_clone(arg, name, n return; *dev = make_dev(&codadevsw,unit2minor(u),UID_ROOT,GID_WHEEL,0600,"cfs%d",u); - coda_mnttbl[unit2minor(u)].dev = *dev; - + mnt = malloc(sizeof(struct coda_mntinfo), M_CODA, M_WAITOK|M_ZERO); + LIST_INSERT_HEAD(&coda_mnttbl, mnt, mi_list); } My guess is that the end of coda_fbsd_clone needs an additional, mnt->dev = *dev; I'm building a kernel with this change, but since I am running in a VM it is taking a while. It does look like this is the most likely cause of the problem. JanReceived on 2006-11-06 10:43:09