On Wed, Apr 20, 2005 at 05:35:59PM -0600, Patrick Walsh wrote:
> # volutil info /.0
> Recoverable volume log
> version: 1 malloced
> ...

# volutil info vmm:root
V_BindToServer: binding to host verdi
Recoverable volume log
version: 1 malloced
adm_limit 4096 size 32 used 2
rec_max_seqno 1400 current_seq_no 1400
index contents

I'm wondering if the resolution log is enabled on your 1000001 replica. Normally we turn it off for singly replicated volumes. However, this will be a problem once you hit a conflict, now that there are multiple replicas.

Essentially this log contains all operations that were applied to a server but not yet confirmed as completed everywhere by a phase 2 commit message (COP2). Normally a client sends an operation to all servers and collects the responses; on the next operation it piggybacks the COP2 associated with the previous operation. If there isn't a next operation within a certain time window, it flushes any pending COP2s with a separate RPC call.

However, if the client disconnects before the COP2 is sent, the logged operation will not get cancelled. Another place where this happens is when a client is performing weak reintegration, where it sends updates to only a single replica and then triggers resolution to propagate the changes to the other replicas.

However, since there is only a single replica, a client never notices that there might be unconfirmed operations; we normally detect them by looking at version vector differences between replicas. As a result we never force the replica to 'resolve', and the log keeps slowly growing until it fills up and the server dies. Search for AllocViaWrapAround in the Coda mailing list archives.

Since 5.3.17 we turn the resolution log off when we create a replicated volume with only a single replica; this way we never add new entries to the log and avoid the crash. I seem to recall that the last time someone tried to re-enable the resolution log the server didn't really like it...
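The COP2 bookkeeping described above can be illustrated with a small toy model. This is only a sketch under assumptions, not the actual Coda client or server code: the names ToyServer, ToyClient, log_capacity and demo are all hypothetical, and a full log simply raises RuntimeError here as a stand-in for the real server failure.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical stand-in for one replica; not the real Coda server."""
    log_capacity: int
    log: dict = field(default_factory=dict)  # op id -> description

    def apply(self, op_id, description, cop2_for=None):
        # A piggybacked COP2 confirms an earlier operation: drop its entry.
        if cop2_for is not None:
            self.cop2(cop2_for)
        if len(self.log) >= self.log_capacity:
            # Assumption: the toy just raises when the log is full.
            raise RuntimeError("resolution log full")
        self.log[op_id] = description

    def cop2(self, op_id):
        self.log.pop(op_id, None)


class ToyClient:
    """Hypothetical client that piggybacks COP2 on its next operation."""

    def __init__(self, servers):
        self.servers = servers
        self.next_id = 0
        self.pending = None  # op id whose COP2 has not been sent yet

    def operate(self, description):
        op_id = self.next_id
        self.next_id += 1
        for server in self.servers:
            server.apply(op_id, description, cop2_for=self.pending)
        self.pending = op_id
        return op_id

    def flush(self):
        # The "no next operation within the time window" case: separate call.
        if self.pending is not None:
            for server in self.servers:
                server.cop2(self.pending)
            self.pending = None

    def disconnect(self):
        # The pending COP2 is lost, so the server keeps the log entry.
        self.pending = None


def demo():
    server = ToyServer(log_capacity=3)
    client = ToyClient([server])
    client.operate("mkdir a")
    client.operate("create a/f")       # piggybacks the COP2 for "mkdir a"
    after_piggyback = len(server.log)
    client.flush()
    after_flush = len(server.log)
    leaked = 0
    try:
        while True:                    # every disconnect leaks one entry
            client.operate("store a/f")
            client.disconnect()
            leaked += 1
    except RuntimeError:
        pass
    return after_piggyback, after_flush, leaked
```

In this toy, entries only pile up when the COP2 is lost. With a single replica there are no version vector differences to trigger resolution, so nothing would ever clean those leftover entries out.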
Just tried it. I created a single replica, filled it with data, turned resolution back on, created a second replica, and extended the volume in the VRDB. That all seemed to work. However, my client doesn't seem to pick up the volume change just yet. I already tried

    cfs checkvolumes
    cfs disconnect; cfs cs; cfs reconnect; cfs cs

Ok, looking at the source, clearly this will never work, since the client only picks up changes when the volume name <> volume id mapping changes. However, the create does look smart enough to simply update an existing volume; I'll see if I can get it to follow that code path.

The only solution right now is either to flush every object that exists in the resized volume, which purges the stale volume information from the client cache, or to reinitialize your clients. So for me, with my single test volume, the following worked:

    $ cd ..   # making sure nothing has a reference to any object in the volume.
    $ cfs fl testvolume
    $ cfs whereis testvolume
      VIVALDI.CODA.CS.CMU.EDU  MAHLER.CODA.CS.CMU.EDU

And now I see Resolve messages showing up in codacon when I run 'ls'.

Jan

Received on 2005-04-21 14:31:11