Hello,

I have been using Coda server replication for many years, and it has seemed to work as expected. Nevertheless, I recently encountered a situation where the behaviour looks odd.

I am running Coda 6.9.5 on ia32. All volumes in the realm are replicated across the realm's two servers. All the computers concerned have reliable hardware and are on the same LAN. The realm seems to be "properly setup in every way", including DNS SRV records pointing to the two servers.

When I take down one of the servers, some parts of the data become unavailable on clients. An attempt to access the data yields either an I/O error (observed on directories) or 'Connection timed out'. I expect the client to be able to fetch the data from the remaining server, but this does not happen. A "cfs flush" or "cfs flushvolume" also makes the flushed parts of the data unavailable until the missing server is brought back online.

Remarkably, a computer with a freshly initialized Coda client (memory cache, no rvm), started when one of the servers is already down, can access the data, fetching from the remaining server just fine. Its twin, which was online and ran find over the data while talking to both servers, gets 'Input/output error' on many directories when trying the same find command with a single server, until the missing server is back.

From my observations, it looks like it is the data that was "stat()-ed" while both servers were available, but is not present in the cache, that suffers when one of the servers goes away. Does the cache/rvm become "poisoned" by references to the exact server the getattr data had been fetched from?

Jan, would you comment on this issue and suggest what might be wrong in my expectations or in my setup?

Regards,
Rune

Received on 2014-05-29 08:27:23