On Thu, Jul 07, 2005 at 01:47:09PM -0600, Patrick Walsh wrote:
> And I've hit upon what I think must be the problem:
>
> # cfs whereis /coda/director/snapin/pool_scm
> dir224 dir225 dir225
>
> A quick look at VRList on the server shows:
>
> /snapin 7f000003 3 1000004 2000004 200000a 0 0 0 0 0 0

Yes, that would be a problem. When the client sends a MultiRPC call to the
servers, the call doesn't contain the replica id, but the replicated volume
id (i.e. 7f000003). The server then internally maps this to the local
replica id by iterating over the list of replicas until it finds a volume
that has the correct server identifier (the first 8 bits of the volume id).
So it finds replica id 02000004 and performs the operation.

In this case the server will actually receive 2 of the 3 MultiRPC calls, but
it performs both operations on only the first replica it finds. So a create
should fail with EEXIST, and you would see similar strange errors.

> So it appears we are triply replicating a volume to two servers. I
> have no idea how this happened -- we've automated the setup of coda and
> that code hasn't changed for some time. So I'll look into this and try

Maybe 2 servers have the same server-id, or you initially created a doubly
replicated volume and then added a replica where the new replica is on one
of the existing servers. (Or it was as simple as
"createvol_rep volume server_a server_b server_b".)

> to figure out what's going on. Sorry to waste your time with a bad
> setup. I just can't figure out how it got setup wrong.

I can't figure out how it even got this far in the copy. I guess the client
might have disconnected from the 2 replicas on the same server, committed on
the remaining server, and used resolution to propagate the updates.

Jan

Received on 2005-07-07 16:04:51
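
To make the lookup described above concrete, here is a rough Python sketch. It is not the Coda server code; the function names are made up for illustration, and the VRList field layout (name, replicated volume id, replica count, then replica ids in hex) is an assumption read off the single line quoted above.

```python
# Illustrative sketch only: names and the assumed VRList layout are hypothetical.

def parse_vrlist_line(line):
    """Split a VRList line into (name, replicated volume id, replica ids)."""
    fields = line.split()
    name = fields[0]
    repvol_id = int(fields[1], 16)
    count = int(fields[2])
    # Only the first `count` slots after the count hold real replica ids.
    replicas = [int(f, 16) for f in fields[3:3 + count]]
    return name, repvol_id, replicas

def server_id(volume_id):
    """The server identifier is the first (top) 8 bits of a 32-bit volume id."""
    return (volume_id >> 24) & 0xFF

def local_replica(replicas, my_server_id):
    """Return the first replica whose server id matches, as the server would."""
    for rid in replicas:
        if server_id(rid) == my_server_id:
            return rid
    return None

def duplicated_servers(replicas):
    """Map each server id that hosts more than one replica to those replicas."""
    by_server = {}
    for rid in replicas:
        by_server.setdefault(server_id(rid), []).append(rid)
    return {sid: rids for sid, rids in by_server.items() if len(rids) > 1}

line = "/snapin 7f000003 3 1000004 2000004 200000a 0 0 0 0 0 0"
name, repvol_id, replicas = parse_vrlist_line(line)

# Both calls addressed to server 2 resolve to the same replica, 02000004;
# replica 0200000a is never reached by this lookup.
print("%08x" % local_replica(replicas, 2))
print({sid: ["%08x" % r for r in rids]
       for sid, rids in duplicated_servers(replicas).items()})
```

A check like duplicated_servers() could be run over VRList after an automated setup to catch a volume that lists the same server twice.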