On Mon, 2005-05-02 at 16:32 -0400, Jan Harkes wrote:
> could also be that you have a reintegration conflict, which makes the
> volume switch to disconnected state. This is because a reintegration
> (local-global) conflict actually moves everything in the volume to a
> temporary local repair volume so that we can show both the local and the
> global versions later on when the conflict is expanded. The problem is

What would you suggest as the best way to detect when a conflict occurs,
so that an administrator can be notified? Is there a particular message
we can watch for in one of the logs? Or perhaps a cron job with a find
command similar to this:

  find . -type l -a -not \( -xtype f -o -xtype d \)

Or perhaps monitoring the /usr/coda/spool directory? How is this managed
in other places?

Also, the repair utility doesn't seem to have a way to list which objects
are in conflict -- you have to already know the full path to them. Are
there any undocumented commands or shortcuts for using this utility?

> So the client ends up dirtying a lot of memory, both rvm and the data
> associated with the container file, but doesn't flush anything to disk.
> Then it sends an RPC to the server who fetches the data, which is most
> likely for the most part still in memory since the client didn't flush.
> But now the server is hit with a double whammy, not only is he writing
> his own dirtied data, but also the client's dirty stuff.
>
> And because we're single threaded (userspace threading), the server
> process is blocked until all the data has hit the disk. In the meantime
> the client is eager for a response, earlier when it was fetching data
> the server was real quick to respond, the RTT estimate probably ended up
> being near zero. So it gets impatient and assumes the request got
> dropped and retransmits.

Would it help my situation if there were a minimum for the RTT estimate
in the case where the estimate is near zero? That would let the server
take a moment to flush a file without the client going write-disconnected.

> At the same time, the poor server is still
> stuck waiting for the disk, and can't even dash off a quick ack telling
> the client that it did get the request and is working on it.

Are there any plans to make the server multi-threaded to avoid these
sorts of bottlenecks?

> but in this case the server probably ended up completing the operation,
> we just missed out on the final reply. And when we reintegrate we
> automatically hit a conflict because the locally cached version vector
> is different from the server's version _and_ there is a different store
> identifier associated with the operation. So we assume we got an
> update/update conflict, i.e. another client wrote to the file while we
> were disconnected.

Is there any way to compare the files when this happens? In our case this
is usually what is going on: the server did write the file, and the local
and global versions are identical in content, timestamp, and size. I
thought Coda used some checksumming to detect this sort of thing. Is that
something else? Could it be applied here to reduce false conflicts?

--
Patrick Walsh
eSoft Incorporated
303.444.1600 x3350
http://www.esoft.com/

Received on 2005-05-03 10:40:08
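The cron-job idea for detecting conflicts could look roughly like the script below. It is a sketch under assumptions: that unresolved Coda conflicts show up as dangling symbolic links (which is what the find expression in the message targets), that GNU find is installed (for -xtype), and that mail(1) is available. CODA_ROOT, ADMIN and the list_conflicts function are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical conflict checker, intended to be run from cron.
# Assumption: unresolved Coda conflicts appear as dangling symlinks.
# CODA_ROOT and ADMIN are placeholder values; adjust for your site.
CODA_ROOT="${CODA_ROOT:-/coda/example.com}"
ADMIN="${ADMIN:-admin@example.com}"

# list_conflicts DIR: print every symlink under DIR whose target is
# neither a regular file nor a directory (the expression from the post).
# Requires GNU find for -xtype; errors (e.g. missing DIR) are discarded.
list_conflicts() {
    find "$1" -type l -a -not \( -xtype f -o -xtype d \) -print 2>/dev/null
}

conflicts=$(list_conflicts "$CODA_ROOT")
if [ -n "$conflicts" ]; then
    # Notify the administrator; swap in another mechanism if preferred.
    printf '%s\n' "$conflicts" | mail -s "Coda conflicts on $(hostname)" "$ADMIN"
fi
```

Walking a large Coda tree this way may itself pull many directories into the client cache, so limiting CODA_ROOT to the volumes that actually see writes is probably wise.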
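To illustrate the question about a minimum RTT estimate: the sketch below uses the classic SRTT/RTTVAR smoothing described in RFC 6298, in integer milliseconds, with a hypothetical MIN_RTO_MS floor. It is not RPC2's actual retransmission code. It shows that a run of near-zero samples drives the timeout toward zero unless a floor is applied, at which point a single slow, disk-bound reply is enough to trigger a retransmit.

```shell
#!/bin/sh
# Illustrative retransmission-timeout (RTO) estimator with a lower bound.
# Uses the SRTT/RTTVAR smoothing of RFC 6298 in integer milliseconds.
# This is NOT RPC2's code; MIN_RTO_MS is a hypothetical tuning knob.
MIN_RTO_MS=${MIN_RTO_MS:-1000}

srtt=-1     # smoothed RTT in ms; -1 means "no sample yet"
rttvar=0    # smoothed RTT variation in ms

# update_rtt SAMPLE_MS: fold one round-trip measurement into the estimate.
update_rtt() {
    sample=$1
    if [ "$srtt" -lt 0 ]; then
        # First sample: SRTT = R, RTTVAR = R/2.
        srtt=$sample
        rttvar=$((sample / 2))
    else
        # RTTVAR must use the old SRTT, so it is updated first.
        diff=$((srtt - sample))
        [ "$diff" -lt 0 ] && diff=$((-diff))
        rttvar=$(( (3 * rttvar + diff) / 4 ))   # beta = 1/4
        srtt=$(( (7 * srtt + sample) / 8 ))     # alpha = 1/8
    fi
}

# current_rto: print SRTT + 4 * RTTVAR, clamped to at least MIN_RTO_MS.
current_rto() {
    rto=$((srtt + 4 * rttvar))
    [ "$rto" -lt "$MIN_RTO_MS" ] && rto=$MIN_RTO_MS
    echo "$rto"
}
```

After five 1 ms samples, the unclamped timeout would be 1 ms, while the floor keeps it at MIN_RTO_MS. RFC 6298 applies the same idea to TCP: a computed RTO below one second SHOULD be rounded up to one second. The trade-off is slower recovery when a packet really is lost.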
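For the false-conflict case, a script could compare the two versions after the conflict has been expanded. This is a sketch assuming both versions are reachable as ordinary file paths (for example, the local and global entries that an expanded reintegration conflict is expected to show; check this on your release) and that GNU stat is installed. same_version is a hypothetical helper that applies the three criteria from the message: size, modification time, and content.

```shell
#!/bin/sh
# same_version LOCAL GLOBAL: exit 0 only when both files have the same
# size, the same modification time, and byte-identical contents.
# Exit 1 if they differ, 2 if either path is not a regular file.
# Assumes GNU stat(1); the paths are supplied by the caller.
same_version() {
    [ -f "$1" ] && [ -f "$2" ] || return 2
    # Cheap metadata check first: size in bytes and mtime in epoch seconds.
    [ "$(stat -c '%s %Y' "$1")" = "$(stat -c '%s %Y' "$2")" ] || return 1
    # Full byte-for-byte comparison; -s suppresses cmp's output.
    cmp -s "$1" "$2"
}
```

With both copies readable locally, cmp gives a direct answer without checksums; a checksum would only pay off if the comparison had to happen without transferring one of the copies.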