On Wed, Jul 21, 2004 at 02:12:08PM -0400, Martin Emrich wrote:
> I have 2 Problems on my notebook:
>
> a)
> I have a conflicting subdirectory ("DEBIAN") which should not conflict,
> because it never existed on the server or any client except my notebook. So,
> if I try to repair the conflict, repair says
>
> repair > beginrepair DEBIAN
> Too few directory entries
> Could not allocate replica list
> beginrepair failed.
> repair > quit
> martin_at_gwaihir:/coda/darkzone/packaging/baghira/baghira-engine-0.4b/debian/baghira-engine$ ls
> DEBIAN usr
>
> with "DEBIAN" being a directory containing the local directory and a dead
> symlink to the global copy, which never existed. Another repair on this
> fails.

Well, it is a reintegration conflict, but since the file doesn't exist on the server we can't expand it correctly. So repair fails and even forgets to re-collapse the expanded tree.

This is a combination of several problems. First of all, the conflict probably should have been on the parent directory. I don't know why it didn't mark that one instead; maybe it still had active references. Those can be caused by things like an application or shell keeping the directory cache entry pinned, in which case we can't turn the thing into a dangling link.

The second problem is that repair is very paranoid and refuses to do anything if it can't reach all copies. That really shouldn't be necessary in all cases. If a server is dead or unreachable it would still be useful to perform a partial repair, even though the conflict would come back as soon as the missing server returns.

Finally, the repair tool forgot to collapse the expanded tree when it failed. That can be done by hand with 'cfs endrepair'.

> Unless I resolve this, this volume won't be reintegrated (I already have
> accumulated 957 CMLs ;-)

cd out of the parent directory, then do a 'cfs er baghira-engine' to collapse the tree and hopefully flush the cached data from the kernel.
Then 'ls -l' and hopefully the parent will show up as a conflict. But removals are an area where repair is probably pretty weak in general.

> On another volume, I had this strange behaviour today:
>
> martin_at_gwaihir:/coda/darkzone/organizer$ ls -l
> insgesamt 4
> drwxr-xr-x 2 martin nogroup 2048 2004-06-09 07:55 adressen
> drwxr-xr-x 2 martin nogroup 2048 2004-07-01 19:52 kalender
>
> (German for "Connection timed out"). I already restarted the server components
> and the client, nothing happens. This volume stays disconnected, too.

Well, that could be the other reason why the global replica isn't accessible. Maybe the server is unreachable for some reason. Do you have multiple network addresses for the server? Are there any firewalls between the client and the server? Does 'cfs cs' (checkservers) help?

> What can I do (except backing up everything from another client, removing the
> two volumes and making new ones) ?

You can copy the tarball from /usr/coda/spool/<userid>/volumename.tar, which should contain all the changed files that are in the CML; it doesn't have symlink, rename or remove operations though. Then a 'cfs purgeml' will flush all the pending operations from the local cache, at which point it should be possible to bring the volume back into connected state.

Jan

Received on 2004-07-21 15:08:25
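
For reference, the order of the salvage steps (save the tarball first, purge the CML second) could be written down roughly as below. This is only a dry-run sketch: `salvage_plan` is a hypothetical helper that prints the commands rather than running them, all three paths are placeholders supplied by the caller, and passing a path argument to 'cfs purgeml' is an assumption to check against the cfs manual page.

```shell
#!/bin/sh
# Hypothetical helper: prints the salvage steps in order without running them.
# All three arguments come from the caller; nothing here guesses real paths.
salvage_plan() {
    if [ "$#" -ne 3 ]; then
        echo "usage: salvage_plan TARBALL BACKUP_DIR VOLUME_PATH" >&2
        return 1
    fi
    tarball=$1      # the volume's .tar file from the client's spool directory
    backup_dir=$2   # a directory outside /coda
    volume_path=$3  # a path inside the stuck volume

    # Step 1: save the changed files before discarding anything.
    printf 'cp %s %s/\n' "$tarball" "$backup_dir"
    # Step 2: only once the copy is safe, drop the pending CML entries.
    # (Passing a path to purgeml is an assumption, see the cfs manual page.)
    printf 'cfs purgeml %s\n' "$volume_path"
}

# Example with placeholder paths; prints the two commands, runs neither.
salvage_plan /tmp/volumename.tar /tmp/backup /coda/example/volume
```

Printing instead of executing is deliberate: purging the CML throws away the pending operations, so it seems worth reading the plan over before typing the commands by hand. Keep in mind that the tarball does not record symlink, rename or remove operations, so those have to be redone manually.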