On Fri, Aug 13, 1999 at 04:01:52PM -0500, Bill Gribble wrote:
> > On Fri, Aug 13, 1999 at 02:37:16PM -0500, Bill Gribble wrote:
> > > There's a process on the slow client creating log files (in a Coda
> > > volume served by a third machine on the fast network) that are to be
> > > analyzed by a process on the fast client.
> > >
> > > Everybody's clocks are synced by xntp.
> > >
> > > I ran my codes and found that the log files were never appearing on
> > > the fast client (or, more precisely, appeared but were empty), even
> > > though the slow client issued a 'cfs strong'.
> >
> > Hi Bill,
> >
> > Yup, that is a sometimes unexpected `feature' of the way Coda guarantees
> > consistency of the filedata. As long as the (syslog?) daemon keeps the
> > files open for writing, Coda assumes that the file is still in a state
> > of inconsistency (being updated). The file will be sent as soon as the
> > last writer closes it, except if you are weakly connected, then it will
> > take about 5 minutes longer. You can rotate the logs every once in a
> > while to `flush' them to the servers.
>
> Hmm... I understand how that might happen, but why is it that even now
> (after I have read your reply, almost 2 hours after the last writer
> process quit and "fuser" shows nobody having the file open for read or
> write) the same thing is showing? i.e. full files on one client,
> empty files on others, and specific 'cfs strong' directives to all
> clients so everybody is strongly connected.
>
> > Also you will not be able to easily `outsmart' the logic in venus by
> > closing and opening the file between each write call. That stops working
> > as soon as venus starts logging, then it uses the open-for-writing state
> > of the file as an indication to optimize the pending stores out of the
> > log.
>
> I don't understand. If the file has been closed and that's not enough
> to get it to sync, what other operation IS enough?

Consider _that_ a bug.
If a `dirty' file is not held open by any user-space process, then it
should either be sent to the server immediately (connected mode) or be
trickle-reintegrated within some bounded time. The exceptions are when
there is a conflict, or when the reintegration is waiting for the user
to authenticate.

'cfs wr' should force the volume back into connected mode and trigger a
full reintegration. If that doesn't work, re-authenticate with clog;
this disconnects and reconnects the client. Or do a couple of `cfs cs'
calls, which cause venus to probe the network and adjust its bandwidth
estimate.

> In any case, no one has opened these files for writing in hours and
> they are inconsistent across clients. That seems broken to
> me. Somehow venus has forgotten about these particular files. What
> can I look for in the venus logs to show me what's going on?

/usr/coda/etc/console should show messages if there are inconsistent
objects or if the reintegration has failed.

codacon should print bandwidth estimates once in a while. There may be a
problem when they are very low: I suspect an overflow when the bandwidth
estimate falls under 100 B/s, which keeps the estimate from ever
recovering without restarting venus.

/usr/coda/venus.cache/venus.log should have lots of information about
`GetReintegratable', the function that evaluates whether reintegration
log entries are ready for reintegration.

> If it matters, the current sequence of events with approximate timings is:
>
> Time/range   Client   Action
> 0s           slow     fopen("filename.log")
> 0+s-20s      slow     fprintf data
> 20+s         slow     fflush, fclose
> 20s-30s      fast     sleep, hoping for Coda to catch up
> 30+s-35s     fast     analyze data
> 35+s         slow     rename("filename.log", "filename.log.{n}")
>
> (loop through this cycle indefinitely, incrementing {n}.)
>
> Currently, on the first iteration of this loop, the data is available
> for analysis on the fast client. On second and subsequent iterations,
> (i.e. after a rename() call) new data never shows up on the fast
> client.
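For reference, the slow-client half of the cycle quoted above could be
sketched in C roughly as follows. This is only an illustration under
assumptions: the function names write_batch and rotate_log, the
"sample" payload, and the buffer size are hypothetical and are not
taken from the original program. The one point it is meant to show is
that the file is fully closed (so no writer holds it open) before the
rename is issued.

```c
#include <stdio.h>

/* Hypothetical helper: write nlines records to path, then flush and
 * close the file so that no process holds it open for writing.
 * Returns 0 on success, -1 on any failure. */
int write_batch(const char *path, int nlines)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    for (int i = 0; i < nlines; i++) {
        if (fprintf(fp, "sample %d\n", i) < 0) {
            fclose(fp);
            return -1;
        }
    }

    if (fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }

    /* fclose() can still report a deferred write error, so check it. */
    return fclose(fp) == 0 ? 0 : -1;
}

/* Hypothetical helper: rename path to "path.n" once the batch has
 * been analyzed. Returns 0 on success, -1 on failure. */
int rotate_log(const char *path, int n)
{
    char rotated[4096]; /* assumed upper bound on the path length */
    int len = snprintf(rotated, sizeof rotated, "%s.%d", path, n);

    if (len < 0 || (size_t)len >= sizeof rotated)
        return -1;

    return rename(path, rotated) == 0 ? 0 : -1;
}
```

A caller would invoke write_batch("filename.log", count), wait for the
fast client to finish its analysis, and then call
rotate_log("filename.log", n) with an increasing n. Checking the return
value of fclose() matters here, since a failed close is the first place
the writer could notice that the data did not make it out.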
> Could it be the rename() call that's causing problems?

Possibly. I know I have fixed a problem where cross-directory rename
operations weren't resolved; maybe there are more problems in that
area.

Also, when weakly connected, all reintegrations are sent to only one
server, and resolution is then used to update the others. If you are
writing to a replicated volume, starting venus with the -noRoundRobin
flag might help; this avoids switching the primary server when multiple
servers are available. Then the weak reintegrations will not get a
conflict when the resolution has to be retried.

Jan

Received on 1999-08-13 17:50:47