Jan,

Thanks again for a helpful explanation. We've just now pinned down a solution to some of our problems. I'll explain below.

> them to flush the data back to the server. Basically you won't see a log
> on the server until it got rotated.

That's fine.

> You either lost tokens. With 6.0.8 losing tokens means losing access
> even to parts that are accessible to anonymous users, this is 'fixed' in
> 6.0.9 by dropping the established user identity when tokens expire. It
> could also be that you have a reintegration conflict, which makes the
> volume switch to disconnected state.

We have cron jobs that run as local user root with coda uid 502. Apache runs as user www-data with coda uid 501. But there's a catch: we forgot that apache needs to start as user root in order to listen on ports 80 and 443, and it also does its logging as user root. Only its child listener processes run as user www-data. So apache was starting with the proper coda permissions, and then a cron job was essentially logging user root in to coda with new tokens, thus disconnecting apache from its log directory and causing conflicts.

Unfortunately, the conflict resolution process was frustrating because it would show identical files in "local" and "global", down to the file size and timestamp. The checklocal command inside the repair utility said something about the parent directory having changed permissions, which it hadn't. I suspect this error really means that tokens were changed while a file was open, so that locally the file was "owned" by one coda user while globally it was owned by another. Since there's no way to see this information, the files appear to be the same. Or am I wrong about this?

> > 8 volume replicas
> > 6 replicated volumes (is this a strange discrepancy? there should
> > only be 6 volumes)
>
> Not necessarily, there are the local root (/coda) and repair volumes.

Interesting.
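A toy model may make the token clash described above easier to follow. This is a minimal Python sketch, not Coda code: `TokenTable` and `clog_as` are hypothetical names, and it assumes that venus keeps one set of tokens per local uid, that apache's startup logged local root in as coda uid 501, and that www-data is local uid 33 (as on Debian).

```python
# Hypothetical model, not Coda source: the client keeps one token per local uid.
# TokenTable and clog_as are illustrative names only.

class TokenTable:
    def __init__(self):
        self._tokens = {}  # local uid -> coda uid

    def clog_as(self, local_uid, coda_uid):
        # A new login for a local uid replaces whatever identity it held before.
        self._tokens[local_uid] = coda_uid

    def identity(self, local_uid):
        # No tokens for this local uid: treated as anonymous, modeled as None.
        return self._tokens.get(local_uid)


ROOT, WWW_DATA = 0, 33  # local uids; 33 for www-data is an assumption (Debian)

venus = TokenTable()
venus.clog_as(ROOT, 501)      # assumed: apache starts as root with coda uid 501
venus.clog_as(WWW_DATA, 501)  # its www-data children also act as coda uid 501
venus.clog_as(ROOT, 502)      # later, a root cron job logs in as coda uid 502

print(venus.identity(WWW_DATA))  # children still act as 501
print(venus.identity(ROOT))      # apache's root parent (the logger) now acts as 502
```

Because the table is keyed by local uid rather than by process, the cron job's login silently changes the identity of apache's root-owned parent, which is the process writing the logs, while the www-data children keep working as before.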
> Well, considering that you're running both a client and a server on the
> same machine, it is to be expected that they end up either weakly
> connected (write-disconnected), or completely disconnected at times.
> Running the both of them in a vmware session probably only makes it do
> that faster.

This is disappointing because it makes me worry that we could get hard-to-reproduce, occasional conflicts in coda on our backend. And on a server that is basically automated, the problem could go unnoticed for some time. I wonder whether venus could detect when the server is on localhost and adjust itself to be more patient. I'm sure it's common to want the server to be able to mount the coda files. Intuition says that should be the most reliable setup, not the least reliable one. It might be slower, but it shouldn't be less reliable.

We're putting `cfs strong` calls all over the place now to try to keep things from getting write-disconnected. Is there anything else we can do to enforce this? Slow writes are not as much of a problem as conflicts are.

Thanks for examining this issue with me. As usual, I'll be sure to write up my notes in the wiki and share them with others to stave off similar problems in the future.

--
Patrick Walsh
eSoft Incorporated
303.444.1600 x3350
http://www.esoft.com/

Received on 2005-05-02 17:23:05