On Wed, Jun 02, 2004 at 06:43:42PM +0200, Jim Page - emailsystems.com wrote:
> > But I would be interested in your current log, and if it is reproducible
> > a log at level 10 (volutil setdebug 10), and level 30. Just bump that
> > loglevel somewhere after the onslaught has started and hopefully before
> > the lockup.
>
> Unfortunately loglevel 10 changes the dynamics and the error doesn't happen,
> as you thought. I have attached some sample SrvLogs anyway.
>
> I will send more stuff when I can generate something useful - I'm under a
> bit of pressure at the mo but I'll get to it.

I noticed a couple of things.

First of all, all accesses seem to only go to the root volume (coda.root),
there are never any files on the mta.pending volume. Are you sure it's
mounted in the right place? (Not that it should have an effect on this bad
behaviour.)

Second, since your application is unlikely to benefit from lookaside, it
isn't necessary to do the SHA checksum calculations. Disabling that should
speed up your servers a lot. In the file /etc/coda/server.conf set the
option allow_sha=0.

For the rest, there are a couple of places where it dumps an error because
the volume is already locked, but none of those seem to be fatal. There must
be some operation that tries to lock objects in the wrong order, which
triggers a deadlock. I wonder if I can reproduce this.

I know that Steve Simitz is running mostly readonly webservers on top of his
Coda clients. He had to give his clients more worker threads than the
default to avoid client-side problems. How many concurrent delivery
processes do you think might be running at any given time? I know that venus
only has about 20 worker threads. Any additional requests should just get
queued and handled a bit later, but this could also be a client-side only
lockup. One way to test that would be to have a third client that doesn't
really do anything and see if it can still access the servers when the other
clients are locked up.
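
To make the lock-ordering suspicion above concrete, here is a minimal sketch of how a fixed locking order avoids that kind of deadlock. It is only an illustration and is not taken from the Coda server sources: the Object type, update_pair and run_demo are hypothetical names, and ordering by a numeric id is an assumption about how objects could be ranked.

```cpp
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a lockable server object; not a Coda type.
struct Object {
    explicit Object(int i) : id(i) {}
    int id;
    std::mutex lock;
    int updates = 0;
};

// Always take the lock of the object with the lower id first, so two
// threads that need the same pair can never wait on each other in a cycle.
// Assumes a and b are distinct objects with distinct ids.
void update_pair(Object &a, Object &b) {
    Object *first = &a;
    Object *second = &b;
    if (second->id < first->id) std::swap(first, second);

    std::lock_guard<std::mutex> g1(first->lock);
    std::lock_guard<std::mutex> g2(second->lock);
    ++a.updates;
    ++b.updates;
}

// Runs two threads that ask for the same pair in opposite argument order.
// Returns the total number of updates recorded on both objects.
int run_demo(int iterations) {
    Object dir(1);
    Object file(2);

    std::thread t1([&] {
        for (int i = 0; i < iterations; ++i) update_pair(dir, file);
    });
    std::thread t2([&] {
        for (int i = 0; i < iterations; ++i) update_pair(file, dir);
    });
    t1.join();
    t2.join();
    return dir.updates + file.updates;
}
```

If update_pair instead locked its arguments in the order given, the two threads could each hold one lock while waiting for the other, which is the wrong-order pattern suspected above.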
I noticed that volutil still works, which would indicate that the server is
at least still processing new requests.

> I have also spotted what I think is a mistake in the code that is preventing
> things like 'cfs fr <dir>' from working properly. Source release coda 6.0.6:
> file vproc_pioctl.cc, line 1002: looks like it's missing a 'break' to me.

Good find, absolutely right, and it looks like a couple of the writeback
ioctls that were added around the same time have a similar missing break
statement as well.

Jan

Received on 2004-06-02 15:35:37
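
For anyone reading this in the archive, the effect of a missing 'break' like the one reported above can be sketched as follows. This does not reproduce the code in vproc_pioctl.cc: the Op values and both dispatch functions are hypothetical and only show the general fall-through behaviour of a C++ switch.

```cpp
#include <string>

// Hypothetical opcodes; these are not the constants used in vproc_pioctl.cc.
enum class Op { A, B, Other };

// Buggy shape: case A has no 'break', so handling continues into case B.
std::string dispatch_buggy(Op op) {
    std::string trace;
    switch (op) {
    case Op::A:
        trace += "A;";
        // missing break: execution continues into the next case
    case Op::B:
        trace += "B;";
        break;
    case Op::Other:
        trace += "other;";
        break;
    }
    return trace;
}

// Fixed shape: every case ends with its own 'break'.
std::string dispatch_fixed(Op op) {
    std::string trace;
    switch (op) {
    case Op::A:
        trace += "A;";
        break;
    case Op::B:
        trace += "B;";
        break;
    case Op::Other:
        trace += "other;";
        break;
    }
    return trace;
}
```

In the buggy version a request for A also runs the handler for B, so whatever A set up can be overwritten or undone by the following case.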