On 02/22/2011 08:26 AM, Dennis Nasarov wrote:
> I'm running Coda 6.9.4 on FreeBSD 8.1 boxes. After reboot, venus does not work, and is showing this:
>
> [root@n2 /coda]# ls -al
> total 5
> dr-xr-xr-x 1 root nobody 2048 Feb 22 21:03 .
> drwxr-xr-x 24 root wheel 512 Feb 20 21:51 ..
>
> ls: ./cloud.kg: Invalid argument
> lrw-r--r-- 1 root nobody 29 Feb 22 21:10 cloud.kg
>
> I've tried venus -init from different clients, and still have same problem.

Odd, I hadn't noticed the codalist traffic before.

I don't think the reboot had anything to do with this. From the log it looks like there is a server-server conflict on the root directory of the root volume. The client is clearly trying to turn the mountpoint into a conflict link,

> [ W(13) : 0000 : 21:17:55 ] fsdb::Get: transforming 1.ff000001.1.1 (1.ff000001.1.1) into fake mtpt with Fakeify()

Conflicts on volume roots are tricky to begin with; a conflict on the root of a realm's tree introduces an extra level of pain. There is probably a bad interaction between the FreeBSD kernel module and venus. I know I had issues with conflicts on volume roots on Linux, but the Linux directory cache also sometimes successfully hides problems when the parent-child linkage between directories isn't completely kosher. Still, this could be a problem on Linux as well. So it isn't totally surprising that a conflict on a realm root is problematic.

Our realm actually has a top-level volume that almost never gets updated. It just contains mountpoints for 2nd level volumes (project/home/tmp/etc.), which in turn hold the mountpoints for individual project and user volumes. This way, when a new user or project volume is added, it only affects a 2nd level volume, and a conflict there won't cause the whole realm to become inaccessible.
That said, in this case we may only be failing to turn a realm root into a dangling symlink. If a normal mountlink/volume root works, we could create a temporary root volume, mount the real root volume under it, repair it, and then get rid of the temporary volume.

On the SCM Coda server,

    # create a temporary root volume (single replica only)
    createvol_rep tmproot n1.cloud.kg
    echo tmproot > /vice/db/ROOTVOLUME

On a client,

    venus -init
    clog admin@n1.cloud.kg
    cd /coda/n1.cloud.kg/
    cfs mkm root /
    cfs fr .
    ls -l root

At this point root should be a dangling symlink. The target of the link should either be '#/', in which case the mount didn't actually work (yet), or something like '@7f000000.1.1', in which case the client successfully recognised that there is a server-server conflict.

Then the conflict can be examined and repaired. For server-server directory conflicts, the easiest way is often the following,

    repair root /tmp/fix -mode 755

Repair may or may not ask some questions about recreating or removing conflicting entries. Once everything goes through, it should return successfully and root will be a normal directory again. (/tmp/fix is a temporary file that is used to record the operations that should make all replicas identical again.)

If repair fails, you need to expand the conflict, flush any cached state, collapse the conflict and retry repair, maybe this time removing an entry instead of trying to recreate it. Normally recreating is the right solution, but when a file was moved into a subdirectory on one server, you cannot recreate it because it already exists, and the right answer is to remove it on the server that didn't see the rename.
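Before and after each repair attempt it can help to check what state root is in, using the symlink targets described above: '#/' means the mount has not worked (yet), a target starting with '@' means a server-server conflict, and a normal directory means the repair went through. The helper below is a minimal, hypothetical sketch (root_link_state is not part of Coda); it only relies on the standard readlink command and on the link-target conventions mentioned in this mail.

```shell
# root_link_state PATH
# Hypothetical helper, not part of Coda: classify the temporary 'root'
# mountlink by its symlink target.
#   '#/'   -> mount of the real root volume has not worked (yet)
#   '@...' -> venus shows a server-server conflict
#   not a symlink -> e.g. a normal directory after a successful repair
root_link_state() {
    local link="$1"
    local target
    if [ ! -L "$link" ]; then
        echo "not-a-link"
        return 0
    fi
    target=$(readlink "$link")
    case "$target" in
        '#/')  echo "not-mounted" ;;
        '@'*)  echo "conflict" ;;
        *)     echo "other:$target" ;;
    esac
}
```

For example, root_link_state /coda/n1.cloud.kg/root should print "conflict" while the server-server conflict is present and "not-a-link" once root is a normal directory again. While it still reports "conflict" after a failed attempt, the sequence to retry is: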
    cfs expand root
    cfs fl root/*
    cfs collapse root
    repair root /tmp/fix -mode 755

Once the conflict has been repaired, you can search for any other conflicts. They can be recognized by their symlink target,

    find root -lname '@*'

If we're happy, we can switch back to the normal root volume by updating the ROOTVOLUME file on the server and reinitializing the client.

On the server,

    echo / > /vice/db/ROOTVOLUME

On the client,

    umount /coda
    killall venus
    venus -init

Jan

Received on 2011-03-11 16:31:48