On Mon, May 02, 2005 at 10:47:32AM -0600, Patrick Walsh wrote:
> The apache user, www-data, gets tokens that give it read-only access to
> the httpd directory and write access to the logs directory. There's

Logs won't get written back until they are closed. Also, if the client is
running (write-)disconnected, the pending store is optimized away when the
logfile is reopened, which makes it pointless to periodically close the
logs just to flush the data back to the server. Basically you won't see a
log on the server until it has been rotated.
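Just to illustrate that "optimized away" part, here is a toy model of what
the client's modify log (CML) does with repeated stores of the same file.
This is not the actual venus code; the structures and names are invented
for illustration, and the toy cancels the old record when the next store
arrives, whereas venus drops it when the file is reopened for writing, but
the net effect is the same: only the last version ever reaches the server.

    /* Toy model of CML store-record cancellation -- not the real venus
     * code, all names here are made up. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct cml_rec {
        char fid[32];                    /* which file this STORE refers to */
        struct cml_rec *next;
    };

    static struct cml_rec *cml = NULL;   /* pending STORE records */

    /* Called (conceptually) when a file is closed after writing while
     * write-disconnected: any earlier pending STORE for the same file is
     * dropped, because only the final contents are sent at reintegration. */
    static void log_store(const char *fid)
    {
        struct cml_rec **p = &cml;
        while (*p) {
            if (strcmp((*p)->fid, fid) == 0) {
                struct cml_rec *old = *p;
                *p = old->next;          /* earlier store optimized away */
                free(old);
            } else {
                p = &(*p)->next;
            }
        }
        struct cml_rec *r = calloc(1, sizeof(*r));
        strncpy(r->fid, fid, sizeof(r->fid) - 1);
        r->next = cml;
        cml = r;
    }

    int main(void)
    {
        log_store("logs/access_log");    /* first close (illustrative name) */
        log_store("logs/access_log");    /* closed again: replaces, doesn't add */

        int n = 0;
        for (struct cml_rec *r = cml; r; r = r->next)
            n++;
        printf("pending stores: %d\n", n);   /* prints 1 */
        return 0;
    }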
> nothing fancy in the setup. For testing purposes, the coda client and
> server are on the same vmware machine. There's only a single server.
> We're strongly connected and checkservers shows all up. This is Coda
> 6.0.8.
>
> Try to `ls httpd` and get "permission denied". Try to `cfs la httpd`
> and get "httpd: Connection timed out". Try `cfs flushobject httpd` and
> get "can't flush active file". Stopping apache and then checking with

You may have lost tokens. With 6.0.8, losing tokens means losing access even
to parts that are accessible to anonymous users; this is 'fixed' in 6.0.9 by
dropping the established user identity when tokens expire.

It could also be that you have a reintegration conflict, which makes the
volume switch to disconnected state. This is because a reintegration
(local-global) conflict actually moves everything in the volume to a
temporary local repair volume, so that we can show both the local and the
global versions later on when the conflict is expanded. The problem is that
the temporary volume doesn't actually exist on the server, so any uncached
object cannot be fetched.

The "can't flush active file" error could very well mean that that object
is considered important for repair purposes; if it were thrown out we
wouldn't be able to reach the location of the conflict. Or maybe there is a
not-yet-reintegrated CML entry associated with it.

> venus.log:
> CacheFile::ValidContainer: 00/00/00/2B invalid

This was probably an open-for-write file; during recovery its size doesn't
match the size recorded in RVM. We only get to see the new size when the
file is closed. I guess it got filed away in /usr/coda/spool (or
/var/lib/coda/spool).

> console:
> 8 volume replicas
> 6 replicated volumes (is this a strange discrepancy? there should
> only be 6 volumes)

Not necessarily; there are also the local root (/coda) and repair volumes.

> Local inconsistent object at ???/html/blah/blah, please check (this is
> on the httpd volume)
> worker::main got bogus opcode 35

That's probably why httpd couldn't be flushed. (Mountpoints are by default
not covered, so during startup we don't yet know where the volume is
mounted and show ???.)

> We can reproduce this reliably on a clean install and a reboot doesn't
> fix it. The files in httpd are put there using a standard `cp -dR`

Well, considering that you're running both a client and a server on the
same machine, it is to be expected that they end up either weakly connected
(write-disconnected) or completely disconnected at times. Running both of
them in a vmware session probably only makes that happen faster.

On Mon, 02 May 2005 13:33:02 -0400, Patrick Walsh wrote:
> with only one client and one server and both strongly connected, how
> does a conflict happen and how can it be avoided?

We can't guarantee strong connectivity, not when there is a network between
the machines, and not even when both processes are running on the same
machine.

The whole strong/weak connectivity distinction is an estimate, based on the
delays and losses observed when sending RPC2 and SFTP packets. Since we use
UDP, losses are pretty normal, even when both processes are on the same
machine. In fact the kernel is perfectly allowed to drop a packet without
even passing it to the receiving process, simply because it can't
immediately free up memory for a packet buffer, or because the
transmit/receive buffer is full. So packet loss is unavoidable, which is
why we have retransmission.

Once we get a successful reply to a request, we divide the measured delay
into several components: server processing time (which is already
subtracted on the server side), outgoing latency and bandwidth, and
incoming latency and bandwidth. The latency and bandwidth components are
derived from the previous estimates: given the size of the request and
reply packets and the expected latency and bandwidth in either direction,
new estimates are made. But really we're taking one moderately reliable
number (the time between sending a request and receiving a reply) and
attempting to derive five different values from it based on historical
measurements.

This works most of the time when the client and server are on different
machines, because we can assume that almost all of the measured delay can
be attributed to the network, as the server already subtracted its internal
processing time. However, when both the client and the server are on the
same machine, we get additional scheduling/context switching and resource
contention.

Both the client and the server have a big chunk of mmapped data that they
need to access (and modify): the RVM data. The client tends to work
optimistically and flushes dirty data as little as possible, while the
server is very conservative and flushes pretty much as soon as anything
important happens. The way fsync/sync is implemented in the Linux kernel,
the process that calls it is blocked until all dirty data for the same
device has been flushed to disk. So the client ends up dirtying a lot of
memory, both RVM and the data associated with the container files, but
doesn't flush anything to disk. Then it sends an RPC to the server, which
fetches the data, most of which is probably still in memory since the
client didn't flush. But now the server is hit with a double whammy: not
only is it writing its own dirtied data, but also the client's dirty data.
And because we're single threaded (userspace threading), the server process
is blocked until all the data has hit the disk.

In the meantime the client is eager for a response. Earlier, when it was
fetching data, the server was real quick to respond, so the RTT estimate
probably ended up being near zero. So the client gets impatient, assumes
the request got dropped, and retransmits. At the same time the poor server
is still stuck waiting for the disk and can't even dash off a quick ack
telling the client that it did get the request and is working on it. So the
client tries another couple of times and either gives up and disconnects
(not so likely with recent rpc2), or, by the time it finally gets an answer
from the server, decides that the previous latency and bandwidth estimates
were far too optimistic and switches to write-disconnected mode.
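To give a feel for how shaky that estimate is, here is a toy version of the
calculation. It is not the actual rpc2/venus code; the names, constants and
the weak-connectivity threshold below are all invented for illustration.
The point is that the only hard number is the round trip time minus the
server's reported processing time; the split into latency and bandwidth
comes from the previous estimates, so a stretch of replies delayed by a
disk-bound server looks exactly like a slow network.

    /* Toy model of the bandwidth estimate -- not the actual rpc2 code,
     * every name and constant here is invented.  The real code keeps
     * separate latency and bandwidth figures for each direction. */
    #include <stdio.h>

    #define WEAK_THRESHOLD 50000.0       /* bytes/sec, placeholder value */

    struct estimate {
        double latency;                  /* one-way latency estimate, seconds */
        double bandwidth;                /* bandwidth estimate, bytes/second  */
    };

    /* One observed round trip: the total time minus the server's reported
     * processing time is the only solid number; the rest is guesswork based
     * on the previous estimates and the packet sizes. */
    static void observe(struct estimate *e, double rtt, double server_time,
                        double req_bytes, double rep_bytes)
    {
        double net  = rtt - server_time;         /* delay blamed on the network */
        double xfer = net - 2.0 * e->latency;    /* trust the old latency figure */
        if (xfer < 1e-6)
            xfer = 1e-6;
        double sample = (req_bytes + rep_bytes) / xfer;    /* implied bandwidth */
        e->bandwidth = 0.7 * e->bandwidth + 0.3 * sample;  /* running average */
    }

    int main(void)
    {
        struct estimate e = { 0.0002, 10e6 };    /* looks like a fast local link */

        observe(&e, 0.002, 0.001, 200, 8192);    /* one quick reply */
        printf("after a fast reply : %10.0f bytes/s\n", e.bandwidth);

        /* a burst of replies each held up ~2 seconds because the server was
         * stuck in fsync, writing back the client's own dirty data */
        for (int i = 0; i < 20; i++)
            observe(&e, 2.0, 0.001, 200, 8192);
        printf("after slow replies : %10.0f bytes/s\n", e.bandwidth);

        if (e.bandwidth < WEAK_THRESHOLD)
            printf("-> the volume switches to write-disconnected mode\n");
        return 0;
    }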
Now all of this is most likely to happen when we're performing a STORE
operation, and the fallback for a store during disconnection is a very
special case. Most other operations actually return ETIMEDOUT to userspace,
but since stores happen on close, and most applications don't retry when
close returns a failure, the store is logged in the CML instead. This is
done by internally retrying the operation, but we end up logging it with a
different operation identifier than the store that timed out. That is not
really a problem if we really did time out, but in this case the server
probably ended up completing the operation and we just missed the final
reply. When we reintegrate we then automatically hit a conflict, because
the locally cached version vector is different from the server's version
_and_ there is a different store identifier associated with the operation.
So we assume we got an update/update conflict, i.e. another client wrote to
the file while we were disconnected.

Jan
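PS: here is a toy version of the check that trips in that last step, in
case it makes the "same version vector or same store identifier" rule
easier to see. This is not the real codasrv logic; the types, field names
and numbers are invented purely for illustration.

    /* Toy version of the server-side check for a reintegrated STORE.
     * Not the real codasrv code; everything here is made up to show the
     * shape of the decision. */
    #include <stdio.h>
    #include <string.h>

    #define NREPLICAS 8

    struct storeid { unsigned host, uniquifier; };

    struct version {
        unsigned vv[NREPLICAS];          /* version vector */
        struct storeid sid;              /* identifier of the last store */
    };

    enum outcome { STORE_OK, ALREADY_DONE, UPDATE_UPDATE_CONFLICT };

    static enum outcome check_store(const struct version *client,
                                    const struct version *server)
    {
        /* version vectors match: nobody else touched it, apply the store */
        if (memcmp(client->vv, server->vv, sizeof(client->vv)) == 0)
            return STORE_OK;

        /* versions differ, but the server's last store carries the same
         * identifier the client recorded: the earlier, timed-out request
         * simply succeeded and this is just a retry */
        if (client->sid.host == server->sid.host &&
            client->sid.uniquifier == server->sid.uniquifier)
            return ALREADY_DONE;

        /* different version _and_ a different store identifier: as far as
         * anyone can tell, another client modified the file */
        return UPDATE_UPDATE_CONFLICT;
    }

    int main(void)
    {
        /* client: pre-store version vector, plus a CML record that was
         * logged under a fresh identifier after the original STORE timed out */
        struct version client = { {1, 1}, {42, 9} };
        /* server: it did apply the original store, under the original id */
        struct version server = { {2, 1}, {42, 7} };

        if (check_store(&client, &server) == UPDATE_UPDATE_CONFLICT)
            printf("flagged as an update/update conflict\n");
        return 0;
    }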