On Fri, Mar 11, 2005 at 09:59:00AM -0300, Gabriel B. wrote:
> yesterday i let the client "uploading" some data to the server. today,
> it was hanging there. I canceled the cp command, checked the tokens.
> everything seemed fine

It sounds like your client got disconnected from the server
(reintegration conflict maybe?) and it ran out of log space to keep
track of the changes that still need to be reintegrated with the server.
When that happens it blocks all write operations; reads can still go
through as long as there are worker threads available. There are only 20
worker threads and every blocked write 'consumes' one, so things can
grind to a halt pretty quickly.

> then, i restarted venus thinking it's some kind of cache. and started
> it again.

Venus has a persistent cache that survives restarts (and reboots). The
only way to clear the venus cache is to run 'venus -init' or to create a
file named /usr/coda/venus.cache/INIT. On Debian that would be
/var/lib/coda/cache/INIT, since the various paths there are set to comply
with the Linux Filesystem Hierarchy Standard.

> Well, i have only realm. one SCM, one line in /etc/coda/realms.
> But venus said it found 3. And i recall it saying 2 the startup before this.

There are 2 realms that are used internally. One is the 'localhost'
realm, which holds the various realm mountpoints that are visible in the
/coda directory. The other realm is Repair related: for local/global
repair we have to move locally modified files out of the way so that we
can show the version that is currently on the server. So the locally
modified files are relocated to the 'Repair' realm until the conflict is
resolved.

> I was copying about 7Gb of small files to a coda server that has about
> 20Gb of free space in /vicepa, 20M of rvm log, and 130M of rvm data.
> all of them are files since i can't repartition the server without
> knowing that we will use coda for sure.

I'm pretty sure that files are better nowadays.
I've been using them exclusively now that we moved all our servers to
new hardware.

> btw, right after starting the server i send the SIGWINCH signal. but

SIGWINCH? Ah, I see, never actually used that myself. I just run
'volutil setdebug 10'.

> it generate nothing in the logs. On the other hand, my venus.log is
> more than 90M!
>
> the lines right before the hand in venus.err are
> 05:01:30 Checkpointing p.www
> 05:01:30 to /usr/coda/spool/0/camboinha.servers_p.www.tar
> 05:01:30 and /usr/coda/spool/0/camboinha.servers_p.www.cml

That's a pretty good indication that your client is disconnected from
the server for some reason, as it is checkpointing a copy of the pending
changes (a safety feature: if your client dies, any lost changes should
be in that tarball).

> 09:07:58 DispatchWorker: signal received (seq = 2417081)
> 09:21:46 DispatchWorker: signal received (seq = 2417109)

And this is a user hitting '^C'. It doesn't do much for venus though,
since there is no code that can unwind the worker thread that is
handling the 'aborted' request, as it might have grabbed a couple of
locks. So venus ends up (eventually) completing the aborted operation,
at which point the kernel will spit out some error message about an
unexpected reply.

Jan

Received on 2005-03-11 10:26:46
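
The worker-thread exhaustion described in the reply (20 workers, each blocked write holding one until log space frees up) can be illustrated with a small toy model. This is only a sketch and not Venus code: the `WorkerPool` class, its fields, and the result strings 'ok', 'blocked', and 'stalled' are hypothetical names made up for the illustration. Only the pool size of 20 comes from the message.

```python
from dataclasses import dataclass


@dataclass
class WorkerPool:
    """Toy model of a fixed pool of worker threads (hypothetical, not Venus code)."""
    size: int = 20          # the reply mentions 20 worker threads
    blocked: int = 0        # workers stuck on writes that cannot proceed
    log_full: bool = False  # True while no log space is left for new changes

    def free_workers(self) -> int:
        return self.size - self.blocked

    def submit(self, op: str) -> str:
        """Return 'ok', 'blocked', or 'stalled' for a 'read' or 'write' request."""
        if op not in ("read", "write"):
            raise ValueError(f"unknown operation: {op}")
        if self.free_workers() == 0:
            return "stalled"   # no worker is left to pick up the request
        if op == "write" and self.log_full:
            self.blocked += 1  # this worker is held until log space frees up
            return "blocked"
        return "ok"            # request completes and the worker is released


pool = WorkerPool(log_full=True)
print(pool.submit("read"))                           # a read still goes through
results = [pool.submit("write") for _ in range(20)]  # 20 writes pin all 20 workers
print(results.count("blocked"), pool.free_workers())
print(pool.submit("read"))                           # now even a read cannot be served
```

In this model a single interrupted copy of many small files is enough to use up the whole pool, which matches the symptom of the client appearing to hang for reads as well as writes.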