On Fri, Oct 29, 2004 at 01:58:52PM -0700, redirecting decoy wrote:
> problems I'm still having, I am now having a problem
> with disconnected volumes. I can see the volume from
> all the clients, and it works for a little while, but
> then when I do a "cfs lv /coda/machines/whatever", it
> tells me it's disconnected. One minute I can see a
> directory listing, then next minute I get a
> "Connection Timed out". Oddly enough, once in a while
> if I "clog user" again, then I can see the directory
> again, this isn't consistent though. Seems pretty
> random, sometimes it works on some machines, some
> times it don't.

I would guess there is a network problem. Maybe the servers are sending
responses back from a different IP address, and the client is unable to
match the server it connected to with the incoming callback connection.

> I get 1 error in my SrvErr log:
> could not open key 2 file: No such file or directory

That is normal. It is possible to rotate the auth2.tk file on a daily
basis, so the server tries to use both the old and the new token file to
validate the user authentication tokens on incoming connections.

> Also have a question about server failover:
> If I have 2 servers, one scm and one not, and I pull
> the plug on either one, then all the data should go to
> the replicated volume on the other server, correct?

Yes, as soon as the client realizes that the server has become
unreachable. Because of all the connectivity problems that people have
been seeing, the timeouts are pretty generous: we try to resend the
packet about 5 times over a 60 second period before we really give up on
a server. After that we retry binding about once every 5 minutes to see
if the server came back.

> there a way to change this behavior, so that I can
> speed things up? And if m1 comes back up, then will

Yes, you can speed things up, but on the other hand it will also cause
the clients to switch to disconnected mode far more often.
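To make the timing concrete, here is a rough sketch of the retry window
described above. It assumes the retransmissions are evenly spaced, which
is only an assumption for illustration; the real client may space them
differently. The function name and the 15 second comparison value are
made up for this example and are not Coda settings.

```python
# Rough model of the retry window: N sends spread over a total timeout.
# Even spacing is an ASSUMPTION for illustration, not the actual client logic.

REBIND_INTERVAL = 5 * 60  # seconds between bind attempts after giving up


def retry_schedule(total_timeout, attempts):
    """Return the time (in seconds) at which each attempt is sent."""
    interval = total_timeout / attempts
    return [round(i * interval, 1) for i in range(attempts)]


generous = retry_schedule(60, 5)  # the values quoted in this message
shorter = retry_schedule(15, 5)   # hypothetical shorter window

print("60s window, sends at:", generous)
print("15s window, sends at:", shorter)
print("after giving up, rebind every", REBIND_INTERVAL, "seconds")
```

With the same number of attempts, a shorter window leaves less time
between sends, so a brief network hiccup is more likely to use up every
attempt.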
We used to wait only about 15 seconds before giving up, and that was
often not enough. Reducing the timeout also pushes the retries closer
together, and if we lose packets due to network congestion on an
intermediate switch or router, having all clients try harder only makes
the congestion worse.

> Any ideas on improving coda performance aswell? Right
> now codasrv process takes up between 50% - 90% Memory

Did you enable the mapprivate flag in /etc/coda/server.conf? It doesn't
reduce the total VM size of the process, but it typically reduces the
RSS (resident set size) and the amount of data that needs to be written
to swap, so it improves performance quite a bit.

Jan

Received on 2004-10-31 23:29:24