On Tue, Jan 22, 2002 at 11:19:50AM +0100, Ivan Popov wrote:
> On Mon, 21 Jan 2002, Tom Carroll wrote:
>
> > Upon a beginrepair, the following is stated
> >
> > Could not allocate new repvol: Object not in conflict
> > beginrepair failed
> >
> > Any ideas how to resolve the conflict?

The "allocate new repvol" error sounds familiar. It is the result of one
of the replicas being inaccessible: either one of the servers is down, or
there is a server/server conflict hidden underneath the local/global
conflict. The current conflict expansion code in the clients handles the
two situations very differently, and as a result we cannot repair these
kinds of conflicts from a single client.

Run 'cfs br' (beginrepair) on the object in conflict; it should then
expand into a directory. Do an 'ls -l' on that directory and it should
show

    global -> #volumename
    local/

if the client is disconnected from the servers. In this case, check the
servers and network connections and run 'cfs cs' to force a server probe,
which should typically bring the global volume back.

    global -> @7fxxxxx.xxxxx.xxxxx
    local/

if there is an underlying server-server conflict. The only way to fix
this is to go to the same location on another client and fix the
server-server conflict first. After that the reintegration should just
start again, although you might get a local-global conflict if the
server-server repair had to make any changes to the replicas.

'cfs er' (endrepair) will collapse the directory back into the conflict
link. You have to collapse before repair can detect that there actually
is a conflict.

> I have got a similar effect by:
>
> 1. setting tight quota on a volume
> 2. updating a file (becomes a conflict, file inaccessible)
> 3. beginrepair, preservelocal (well I ignore the quota restriction here
>    once more...), endrepair, commit -> hangs forever
>    interrupt causes Segmentation violation death
> 4. File becomes visible again but not modifiable or sometimes
>    stays a dangling link or (broken?) dir - do not recall exactly.
> 5. repair is impossible as "Object not in conflict" ...
>
> Is there any general way to solve "not existing" conflicts?
> I could not get rid of such objects without sacrificing the whole
> client cache.

Interesting. We don't use quotas, and I believe they are still left over
from the AFS2 days. They are definitely not well tested. It could very
well be that the repair marked both objects as not-in-conflict and synced
the versionvectors so that clients don't (really) notice there is
something wrong.

There is a magic undocumented cfs call that might help here. It should
not be used lightly, but it can get you out of a situation where a
conflict was marked as cleaned up. It's 'cfs markincon' and it fiddles
with the versionvector of an object, so it is really dangerous stuff.

Jan

Received on 2002-01-22 11:51:07
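
The 'ls -l' check on the expanded conflict directory described above can be sketched in a few lines of Python. This is only an illustration and not part of the Coda tools: the function names and return strings are invented for the example, and it assumes that the 'global' entry shown by 'ls -l' can be read as an ordinary symlink with os.readlink, which may not hold on every client.

```python
import os


def classify_global_target(target: str) -> str:
    """Map the target of the 'global' link to the two cases in the message."""
    if target.startswith("#"):
        # e.g. "#volumename": the client is disconnected from the servers
        return "disconnected"
    if target.startswith("@"):
        # e.g. "@7fxxxxx.xxxxx.xxxxx": underlying server-server conflict
        return "server-server conflict"
    return "unknown"


def suggested_action(kind: str) -> str:
    """Return the next step the message recommends for each case."""
    actions = {
        "disconnected": "check servers and network, then run 'cfs cs'",
        "server-server conflict": (
            "repair the server-server conflict from another client first"
        ),
    }
    return actions.get(kind, "no advice; inspect the directory by hand")


def inspect_expanded_conflict(conflict_dir: str) -> str:
    """Read the 'global' link of a directory already expanded by 'cfs br'.

    Assumption: the entry is a symlink that os.readlink can read.
    """
    target = os.readlink(os.path.join(conflict_dir, "global"))
    return suggested_action(classify_global_target(target))
```

A call such as inspect_expanded_conflict("/coda/path/to/object") (a hypothetical path) would be made after 'cfs br' and before 'cfs er', since endrepair collapses the directory back into the conflict link.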