On Fri, Jul 11, 2003 at 11:15:33AM +0200, Steffen Neumann wrote:
> I have a couple of (non-vital) files that seem to have
> propagated (partially) to the server, but are not
> available for access or removal. There is no
> conflict in that volume, the entries appear on all
> clients, venus -init has no effect. The rest of the directory
> is fine.
>
> aipc1(sneumann):laptop_liste>ls -la
> ls: mondrian: Input/output error
> ls: monet: Input/output error
> ls: miro: Input/output error
> ls: kandinsky: Input/output error
> ls: aipc4: Input/output error

Interesting. As this happens on all clients, my feeling is that these
are directory entries that point to non-existent vnodes.

> Question: where do they come from,

I'm not sure. If this is a singly replicated volume then we can at
least rule out a resolution-related bug. So it either occurred when the
client was fully connected, or during reintegration. The more likely of
these two would be reintegration.

It might be related to a rename bug where the object we renamed got
removed instead of the one that we renamed over, something like,

    touch foo ; touch bar
    mv bar foo
    # foo 'object' should be removed, and the name foo should point at
    # the bar 'object'

But that is a fairly common operation, so maybe it is a corner case
where the rename has to occur before the filehandle is closed (store).

It could also be related to losing CML entries when stores are
optimized.

    http://www.coda.cs.cmu.edu/rt2/Ticket/Display.html?id=690

i.e. we reintegrated the first record in the following log,

    op#000 create X
    op#001 store X
    op#002 store mondrian
    op#003 store monet
    ...

But the client got disconnected, and while it was disconnected file X
was updated again. The store optimization removed op#001 and added
op#007,

    ...
    op#006 store aipc4
    op#007 store X

Now we retry the reintegration and the server says 'hey I already
reintegrated everything up to op#001'.
As a result, the client starts cancelling CML entries until it finds
op#001, which doesn't exist, and we end up discarding the whole CML,
losing the actual store operations for these files. But as far as I can
see the vnodes should still exist, because those were created at the
time we added the names to the directory.

> and how to get rid of 'em ?

This is about as complicated as the previous question. I'm not sure,
maybe the vnodes actually do exist, but are in some 'virgin state'. In
that case restarting the server should create empty container files.

It should be possible to check with volutil whether we have the vnodes,

> /coda/vol/ai/share/laptop_liste/ FID = 0x7f000004.5.3 VV = [329959 0 0 0 0 0 0 0] STOREID = 0x81468b62.3f0d06fb FLAGS = 0x8

According to the above getfid output, this should dump the raw contents
of the laptop_liste directory,

    volutil showvnode 7f000004 3ae 3fddf

This output will contain entries like:

    thisblob: 16 next: 0, flag 1 fid: (42.22) playground
                                  vnode^  ^unique ^name

So now I can use that information to do a lookup for the actual vnode,

    $ volutil showvnode 7f000004 42 22
    42.22(1), symlink, cloned=0, mode=644, links=1, length=15
    inode=0x3, parent=1.1, serverTime=Mon Aug 24 10:07:54 1998
    author=7456, owner=7456, modifyTime=Mon Aug 24 10:07:41 1998
    , volumeindex = 0{[ 1 0 0 0 0 0 0 0 ] [ 41997777 903450466 ] [ 0 ]}

If the vnodes really do not exist, it should be possible to remove the
name entries that point to nothing with norton.

> I have a good few
>
> grep 0x7f000004 SrvLog
> 09:20:22 ValidateVolumes: 0x7f000004 failed!
> 09:46:41 PutReintegrateObjects: stale directory fid 0x7f000004.5.3, num 0, max 50
> 10:01:38 PutReintegrateObjects: stale directory fid 0x7f000004.5.3, num 0, max 50

The stale directory fid stuff really shouldn't matter. It just
indicates that the server should tell the client to refetch the
directory contents: the server has just reintegrated some operation and
knows from the directory version vector that this client's view of the
directory is outdated.

Jan

Received on 2003-07-11 15:30:37
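The lost-CML theory from this message (a store optimization drops
op#001, and the server then acknowledges everything up to op#001) can
be played through with a small toy model. This is only a sketch of the
hypothesis, not Coda's actual client code: the Op record and the
functions optimize_store, cancel_up_to and cancel_up_to_safe are
hypothetical names made up for illustration, and the "safe" variant is
just one conceivable alternative, not a known fix.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """Hypothetical CML record: sequence number, operation, object name."""
    seq: int
    kind: str
    name: str


def optimize_store(cml, new_op):
    """Append new_op, dropping any older pending store of the same object."""
    kept = [op for op in cml
            if not (op.kind == "store" and op.name == new_op.name)]
    return kept + [new_op]


def cancel_up_to(cml, acked_seq):
    """Naive cancellation: discard records until the acked one is found.

    If the acked record is no longer in the log, nothing stops the loop
    and every record is discarded.
    """
    remaining = list(cml)
    while remaining:
        op = remaining.pop(0)
        if op.seq == acked_seq:
            break
    return remaining


def cancel_up_to_safe(cml, acked_seq):
    """Assumed alternative: drop only records numbered <= acked_seq."""
    return [op for op in cml if op.seq > acked_seq]


# Log as in the message; the elided records are simply left out.
original = [
    Op(0, "create", "X"),
    Op(1, "store", "X"),
    Op(2, "store", "mondrian"),
    Op(3, "store", "monet"),
    Op(6, "store", "aipc4"),
]

# While disconnected, X is stored again: op#001 is replaced by op#007.
optimized = optimize_store(original, Op(7, "store", "X"))

# The server reports that everything up to op#001 was reintegrated.
lost = cancel_up_to(optimized, 1)
kept = cancel_up_to_safe(optimized, 1)

print([op.seq for op in optimized])
print([op.seq for op in lost])
print([op.seq for op in kept])
```

In this model the naive cancellation never finds op#001 and ends up
with an empty log, which matches the "discarding the whole CML" outcome
described above, while a comparison on sequence numbers keeps the
pending stores.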