Coda File System

Re: Venus mount problems

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 11 Mar 2011 16:31:41 -0500
On 02/22/2011 08:26 AM, Dennis Nasarov wrote:
> I'm running Coda 6.9.4 on FreeBSD 8.1 boxes. After reboot, venus does not work, and is showing this:
> 
> [root_at_n2 /coda]# ls -al
> total 5
> dr-xr-xr-x   1 root  nobody  2048 Feb 22 21:03 .
> drwxr-xr-x  24 root  wheel    512 Feb 20 21:51 ..
> 
> ls: ./cloud.kg: Invalid argument
> lrw-r--r--   1 root  nobody    29 Feb 22 21:10 cloud.kg
> 
> I've tried venus -init from different clients, and still have the same problem.

Odd, I hadn't noticed the codalist traffic before.

I don't think the reboot had anything to do with this. From the log it
looks like there is a server-server conflict on the root directory of
the root volume. The client is clearly trying to turn the mountpoint
into a conflict link,

> [ W(13) : 0000 : 21:17:55 ] fsdb::Get: transforming 1.ff000001.1.1 (1.ff000001.1.1) into fake mtpt with Fakeify()

Conflicts on volume roots are tricky to begin with, and a conflict on the
root of a realm's tree introduces an extra level of pain. There is
probably a bad interaction between the FreeBSD kernel module and venus.
I know I had issues with conflicts on volume roots on Linux, but its
directory cache also sometimes successfully hides problems when the
parent-child linkage between directories isn't completely kosher. Still,
it could be that this is a problem on Linux as well.

So it isn't totally surprising that a conflict on a realm root is
problematic. Our realm actually has a top-level volume that almost never
gets updated; it just contains mountpoints for 2nd level volumes
(project/home/tmp/etc.) that in turn hold the mounts for individual
project and user volumes. This way, when a new user or project volume is
added it only affects a 2nd level volume, and a conflict there won't
cause the whole realm to become inaccessible.
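
As a rough illustration (the volume and server names below are made up,
just reusing n1.cloud.kg from your setup), such a layout could be set up
with something like,

	# on the SCM: a top-level root volume and a couple of 2nd level volumes
	createvol_rep rootvol n1.cloud.kg
	createvol_rep vol.projects n1.cloud.kg
	createvol_rep vol.homes n1.cloud.kg
	echo rootvol > /vice/db/ROOTVOLUME

	# on a client, authenticated as an administrator
	cfs mkm /coda/cloud.kg/project vol.projects
	cfs mkm /coda/cloud.kg/home vol.homes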

That said, in this case we may only be failing to change a realm root
into a dangling symlink, while a normal mountlink/volume root would
still work. In that case we could create a temporary root volume, mount
the real root under it, repair it, and then get rid of the temporary
volume.

On the SCM Coda server,

	# create a temporary root volume (single replica only)
	createvol_rep tmproot n1.cloud.kg
	echo tmproot > /vice/db/ROOTVOLUME
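
To double-check, the newly created replicated volume should show up in
the VRList on the SCM,

	grep tmproot /vice/db/VRList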

On a client,

	venus -init
	clog admin@n1.cloud.kg
	cd /coda/n1.cloud.kg/
	cfs mkm root /
	cfs fr .
	ls -l root

At this point root should be a dangling symlink. The target of the link
should either be '#/', in which case the mount didn't actually work
(yet), or it will look something like '@7f000000.1.1', in which case the
client successfully recognised that there is a server-server conflict.
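
For example, reusing the fid from above (size and date made up), a
successfully fakeified root would look something like,

	lrw-r--r--  1 root  nobody  13 Mar 11 16:31 root -> @7f000000.1.1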

Then the conflict can be examined and repaired; for server-server
directory conflicts the easiest way is often the following,

	repair root /tmp/fix -mode 755

Repair may or may not ask some questions about recreating or removing
conflicting entries; once everything goes through it should return
successfully and root will be a normal directory again. (/tmp/fix is a
temporary file which is used to record the operations that should make
all replicas identical again.)

If repair fails, you need to expand the conflict, flush any cached
state, collapse the conflict, and retry the repair, maybe this time
removing an entry instead of trying to recreate it. Normally recreating
is the right solution, but when a file was moved into a subdirectory on
one server you actually cannot recreate it because it already exists,
and the right answer is to remove it on the server that didn't see the
rename.

	cfs expand root
	cfs fl root/*
	cfs collapse root
	repair root /tmp/fix -mode 755
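
While the conflict is expanded, root temporarily behaves like a
directory with (roughly) one entry per server replica, which is what the
cfs fl root/* wildcard is flushing. You can peek at it in between the
expand and collapse steps with,

	ls -l root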

Once the conflict has been repaired, you can search for any other
conflicts; they can be recognized by the symlink target,

	find root -lname '@*'
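
Anything that turns up can be repaired the same way as before, e.g. for
a hypothetical path,

	repair root/project/src /tmp/fix -mode 755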

If we're happy we can switch back to the normal root volume by updating
the ROOTVOLUME file on the server and reinitializing the client.

On the server,
	echo / > /vice/db/ROOTVOLUME

On the client,
	umount /coda
	killall venus
	venus -init
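
After that the realm root should be reachable again, which can be
checked with the same ls that was failing before,

	ls -al /coda/cloud.kg/

Once everything looks good, the temporary root volume can be removed
again on the SCM (assuming the name used above),

	purgevol_rep tmproot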

Jan
Received on 2011-03-11 16:31:48