On Tue, May 18, 2004 at 09:58:21PM +0530, Sundaram, Arunkumar wrote:
> In one of the documents I read that CODA uses a read-one, write-all
> approach for server replication. I am a little bit concerned here.
>
> Let's assume that there are three replicated servers A, B and C.
> And let's assume that client Ac is connected to A, client Bc is
> connected to B and client Cc is connected to C.

First problem: clients Ac, Bc, and Cc are all connected to servers A, B,
and C. The only way each of them could be connected to a single server is
when there is a network split. This doesn't really affect your scenario
all that much, though.

> In this scenario let's say there is a file named "CommonFile" under
> the root directory of the Coda share. Consider the following sequence
> of operations on this file:
>
>   Ac opens "CommonFile" for write
>   Bc opens "CommonFile" for write
>   Ac appends some content ΔA to "CommonFile"
>   Bc appends some content ΔB to "CommonFile"
>   Ac propagates ΔA to the other replicated servers
>   Bc propagates ΔB to the other replicated servers
>
> Let's say that at this time ΔA is propagated to the servers in the
> following order: first to A, then to C, and then to B.

There is no ordering; a client writes simultaneously to all available
servers. It sends a STORE RPC to all available servers, which (try to)
get a write lock on the volume or object. Once the object or volume is
locked, the servers all fetch the data associated with the file from the
client. Once all fetches have completed, each server bumps its local
slot in the version vector. After that the STORE RPC returns, and the
return code indicates whether the transfer succeeded and which of the
available servers committed the update. The client then piggybacks a
COP2 (second-phase commit) message on the next operation, which informs
the servers who else committed the operation; this brings the version
vectors on the replicas back in sync.
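To make the two phases concrete, here is a minimal sketch in Python. The names (store, cop2, the vector layout) are mine for illustration, not Coda's actual code; it only shows how each server bumps its own version-vector slot during STORE and how the piggybacked COP2 message brings the replicas back in sync.

```python
SERVERS = ["A", "B", "C"]

def store(version_vectors, available):
    """Phase 1 (STORE): each available server bumps only its OWN slot
    after fetching the file data from the client."""
    for server in available:
        i = SERVERS.index(server)
        version_vectors[server][i] += 1
    return available  # servers that committed the update

def cop2(version_vectors, committed):
    """Phase 2 (COP2, piggybacked on the next RPC): tell each committer
    who else committed, so every replica reflects all updates."""
    merged = [max(version_vectors[s][i] for s in committed)
              for i in range(len(SERVERS))]
    for server in committed:
        version_vectors[server] = list(merged)

# The object starts in sync on all three replicas.
vvs = {s: [1, 1, 1] for s in SERVERS}
committed = store(vvs, ["A", "B", "C"])   # all servers reachable
# After STORE each server has only bumped its own slot:
#   A: [2,1,1]  B: [1,2,1]  C: [1,1,2]
cop2(vvs, committed)
# After COP2 all replicas agree on [2,2,2].
```

If some servers were unreachable during STORE, they simply never get the bump, which is what later shows up as a "missed update" during resolution.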
If either Ac or Bc gets there first, the modified version vector will
cause the other update to be declared 'in conflict', and the second
writer will have to decide which version to keep (manual repair by the
user).

When sending out operations that grab locks, we order the outgoing
messages by IP address, so all clients try to lock the volume on servers
A, B, and C in the same order. For the few situations where this doesn't
help (Ac and Bc see/use different sets of addresses to talk to the
servers, or a network routing problem has separated client Ac and server
A from Bc and the other servers), we rely on the version vector.

Remember that a server that commits an operation bumps its own local
version vector slot. Suppose Ac commits its write operation at server A,
and Bc commits its version at server B. If the original version of the
file was (1,1), the version of Ac's update on server A becomes (2,1),
while the version of Bc's update on server B becomes (1,2). If Bc were
somehow still able to talk to server A and its update came in later, it
would still be detected as a local-global (client/server) conflict.

But even if we can't detect the client/server conflict at that time and
both operations are committed, when network connectivity is restored and
any client accesses the object, it will receive both version vectors
from the servers and notice the discrepancy. At this point the client
informs the servers of the detected server/server conflict.

When servers are told there is a conflict, they attempt an automatic
resolution process. Some cases are quite simple: it could be that server
A was down for maintenance and simply missed several updates (all slots
in its version vector are lower). Or it missed the piggybacked
second-phase commit message, which looks like a conflict but shows an
identical client identifier/operation sequence number on all servers.
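The version-vector comparison behind this detection can be sketched as follows; again the function names are mine, not Coda's, but the logic is the standard one: if neither vector dominates the other, the replicas received concurrent updates and are in conflict.

```python
def compare(vv1, vv2):
    """Compare two version vectors elementwise.
    Returns 'equal', 'dominates', 'dominated', or 'conflict'."""
    ge = all(a >= b for a, b in zip(vv1, vv2))
    le = all(a <= b for a, b in zip(vv1, vv2))
    if ge and le:
        return "equal"
    if ge:
        return "dominates"   # vv1 has seen every update vv2 has
    if le:
        return "dominated"   # vv1 simply missed some updates
    return "conflict"        # concurrent updates on different replicas

# Ac's and Bc's disconnected writes from the example:
print(compare([2, 1], [1, 2]))  # conflict
# Server A was down and missed an update; easy to resolve by
# copying from the most up-to-date replica:
print(compare([1, 1], [2, 1]))  # dominated
```

The "dominated" case corresponds to the simple maintenance-downtime scenario: every slot is lower, so the stale replica can just be refreshed.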
These cases are quickly recognised, and the missed update is simply
copied from the most up-to-date replica.

Worst case, we have the given (1,2) and (2,1) conflicting version
vectors with different operation identifiers. Even here we can
automatically resolve most directory conflicts, because all servers keep
a log of recent operations and try to merge these logs and apply the
missed individual updates. Only once all resolution attempts have failed
is the object marked as a conflict on the servers and the client
notified. At that point a user will have to expand the conflict into the
various versions with the manual repair tool and decide which replica he
wants to keep, or create a merged copy of the file to replace the
existing versions.

> Or is it a limitation that files cannot be used simultaneously at two
> clients connected to two different duplicated servers.

I believe there is an implicit assumption that write-write sharing is
uncommon. But multiple readers and a single writer sharing the same
files is definitely not a problem and should not lead to conflicts,
except if the implementation has a bug somewhere.

There is some interesting reading about optimistic replication and
server resolution in Puneet's thesis:

    Mitigating the Effects of Optimistic Replication in a Distributed
    File System - Kumar, P.
    School of Computer Science, Carnegie Mellon University,
    Dec. 1994, CMU-CS-94-215

It can be found among the other PhD theses on this page,
http://www-2.cs.cmu.edu/afs/cs/project/coda-www/ResearchWebPages/docs-coda.html

Jan

Received on 2004-05-18 13:32:10