On Wed, Jan 21, 2004 at 09:04:44AM +0100, Ivan Popov wrote:
> > You have one (or more) clients that are not talking to both servers.
> > Possibly weak reintegration, or a server was down for a while. As a
> > result one server has a stale copy of the volume. Luckily the other
> > server kept meticulous track of what operations the missing server
> > hasn't seen yet.
>
> The problem is that directory update operations on the old volume fail
> as both servers at the same time complain about
> "no space in the volume log".
> Why at all would they use the volume log while fully connected?

Two-phase commit: we don't know whether the operation will succeed at
all servers until it has been processed. So the code assumes the worst
and logs the operation. The client collects the success results and
sends out a COP2 message which tells everyone who actually received the
update. At that point the logged operation is removed.

If a server is unavailable, the COP2 message indicates that some
server(s) have missed an update, and the logs are kept around. At the
same time there is a version-vector skew, so when the missing server
returns, a client will notice it and trigger resolution.

The same happens when a client disconnects before it has sent the COP2,
even when all servers committed the update. But in that case the
resolution notices that it really has nothing to do besides truncating
the resolution logs and syncing the version vectors.

The end result is that if the log is already full, you can't even make
any connected-mode modifications, because those need at least one log
entry.

Jan

Received on 2004-01-21 11:16:08
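
The two-phase exchange described in the reply can be illustrated with a small toy model. This is only a sketch, not Coda's code: the `Server` class, the `cop1`/`cop2` methods, `client_update`, and the log capacity of 2 are all hypothetical names and values chosen for illustration, and version vectors and resolution are left out entirely. It shows why even a fully connected update needs a free log entry, and how a full log then blocks further modifications.

```python
class LogFull(Exception):
    """Raised when a server has no free log entry (hypothetical name)."""


class Server:
    """Toy server holding a bounded log of not-yet-confirmed operations."""

    def __init__(self, name, log_capacity):
        self.name = name
        self.log_capacity = log_capacity
        self.log = {}   # op_id -> description of the logged operation
        self.up = True

    def cop1(self, op_id, desc):
        # First phase: assume the worst and log the operation,
        # because we cannot yet know whether every server will get it.
        if len(self.log) >= self.log_capacity:
            raise LogFull(f"{self.name}: no space in the volume log")
        self.log[op_id] = desc

    def cop2(self, op_id, received_by, all_servers):
        # Second phase: the client reports who received the update.
        # The log entry is dropped only if nobody missed it.
        if set(received_by) == set(all_servers):
            self.log.pop(op_id, None)


def client_update(servers, op_id, desc):
    """Send one update to all reachable servers, then send the COP2."""
    received = []
    for s in servers:
        if s.up:
            s.cop1(op_id, desc)
            received.append(s.name)
    names = [s.name for s in servers]
    for s in servers:
        if s.up:
            s.cop2(op_id, received, names)
    return received


if __name__ == "__main__":
    a, b = Server("A", 2), Server("B", 2)

    client_update([a, b], 1, "mkdir x")   # both up: entry logged, then removed
    print(len(a.log), len(b.log))

    b.up = False                          # B misses the next two updates
    client_update([a, b], 2, "mkdir y")
    client_update([a, b], 3, "mkdir z")
    print(len(a.log))                     # A keeps both entries for resolution

    b.up = True                           # B is back, but nothing was resolved
    try:
        client_update([a, b], 4, "mkdir w")
    except LogFull as e:
        print(e)                          # connected-mode update still fails
```

In this model only server A fills its log; in the situation from the thread both servers reported the error, which the sketch does not attempt to reproduce.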