On Fri, Aug 10, 2001 at 09:20:51PM +0000, irbis_at_orcero.org wrote:
> Hello, coda hackers!
>
> I am working with a replicated coda server, and it works fine. Anyway,
> from time to time reading and writing on a client "hangs": I can not
> read, I can not write, the applications stay waiting forever, and
> even an ls -l stays waiting forever.
>
> There is no collision (I also tested with only one venus client), and
> venus says to me:
>
> 20:57:22 DispatchWorker: signal received (seq = 134401)
> 20:58:38 DispatchWorker: signal received (seq = 135274)
> 20:59:54 DispatchWorker: signal received (seq = 136919)
> 21:00:06 DispatchWorker: signal received (seq = 136964)
> 21:00:22 DispatchWorker: signal received (seq = 137024)
> 21:01:54 DispatchWorker: signal received (seq = 137126)

Ok, these are the result of pressing ^C to interrupt the hanging process. However, in most cases the worker threads in venus are not aborted, because they might hold locks on volumes or be sitting in the RPC2 layer waiting for a server reply. There are only about 25 worker threads, so it's relatively easy to run out once something in the system locks up.

And I think I know who/what locked up. There is probably some volume with a CML (client modification log) that is owned by an unusual user (such as root). Any other user that tries to access the volume is put on hold until the CML has been reintegrated, which blocks the worker thread handling that request. However, the CML owner isn't going to get tokens anytime soon, and before you know it all worker threads are blocked waiting for reintegration to start. Probably around this point even simple cfs operations begin to fail (they need a worker thread as well), and eventually it even becomes impossible to pass the token that starts the reintegration up to venus, because that also requires an available thread.

You might have to restart venus to be able to see which volume has the CML and who the CML owner is (and then authenticate and reintegrate, or purge the CML).
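The exhaustion scenario described above can be sketched with Python's standard `ThreadPoolExecutor`. This is a hypothetical model, not venus code: the pool size of 25 only mirrors the "about 25" worker threads mentioned above, a `threading.Event` stands in for "the CML has been reintegrated", and the function name `token_starves` is made up for illustration. Each blocked submission plays another user's request on the volume; the final submission plays the CML owner passing a token to venus, which needs a free worker of its own.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def token_starves(num_workers=25, num_blocked=25, timeout=0.5):
    """Return True if the token-passing request never got a worker.

    Hypothetical model: `reintegrated` stands in for "the CML has been
    reintegrated"; each blocked task is another user's operation on the
    volume; the last task is the CML owner handing a token to venus.
    """
    reintegrated = threading.Event()
    pool = ThreadPoolExecutor(max_workers=num_workers)

    # Each of these tasks occupies a worker until reintegration happens.
    for _ in range(num_blocked):
        pool.submit(reintegrated.wait)

    # Passing the token (which would start reintegration) also needs a worker.
    token = pool.submit(reintegrated.set)
    try:
        token.result(timeout=timeout)
        starved = False
    except TimeoutError:
        starved = True

    # Break the cycle from outside (roughly what restarting venus does)
    # so the parked threads can finish and the pool can shut down.
    reintegrated.set()
    pool.shutdown(wait=True)
    return starved


if __name__ == "__main__":
    print(token_starves(25, 25))  # every worker already parked
    print(token_starves(25, 24))  # one worker still free for the token
```

Run as a script, the first call should report starvation and the second should not: a single free worker is enough for the token to get through and release everyone else. The timeout exists only so the demo terminates; in the scenario above nothing breaks the cycle from outside, which is why restarting venus may be necessary.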
Jan

Received on 2001-08-10 16:20:35