Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu> Date: Sun, 31 Oct 2004 23:22:35 -0500

On Fri, Oct 29, 2004 at 01:58:52PM -0700, redirecting decoy wrote:
> problems I'm  still having, I am now having a problem
> with disconnected volumes.  I can see the volume from
> all the clients, and it works for a little while, but
> then when I do a "cfs lv /coda/machines/whatever", it
> tells me it's disconnected.  One minute I can see a
> directory listing, then next minute I get a
> "Connection Timed out".  Oddly enough, once in a while
> if I "clog user" again, then I can see the directory
> again, this isn't consistent though.  Seems pretty
> random, sometimes it works on some machines, some
> times it don't.  

I would guess there is a network problem. Maybe the servers are sending
responses back from a different IP address and the client is unable to
match the server it connected to with the incoming callback connection.

> I get 1 error in my SrvErr log:
> could not open key 2 file: No such file or directory

That is normal. The auth2.tk file can be rotated on a daily basis, so the
server tries both the old and the new token file when validating the
user authentication tokens on incoming connections.

> Also have a question about server failover:
> If I have 2 servers, one scm and one not, and I pull
> the plug on either one, then all the data should go to
> the replicated volume on the other server, correct? 

Well, as soon as the client realizes that the server has become
unreachable. Because of all the connectivity problems that people have
been seeing, the timeouts are pretty generous: we resend the packet
about 5 times over a 60 second period before we really give up on a
server. After that we retry binding about once every 5 minutes to see
if the server has come back.

> there a way to change this behavior, so that I can
> speed things up?  And if m1 comes back up, then will

Yes, you can speed things up, but it will also cause the clients to
switch to disconnected mode far more often. We used to wait only about
15 seconds before giving up, and that was often not enough. Reducing the
timeout also pushes the retries closer together, and if we are losing
packets to network congestion on an intermediate switch or router,
having all clients try harder only makes the congestion worse.
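
To make the retry timing discussed above more concrete, here is a rough
sketch in Python. It is not the actual client code: the function name
retry_schedule and the doubling backoff factor are assumptions made for
illustration, and only the "about 5 resends over 60 seconds" and "15
seconds" figures come from this mail.

```python
def retry_schedule(total_timeout, retries, factor=2.0):
    """Return the waits (in seconds) between transmissions.

    Hypothetical model, not Coda's real algorithm: the packet is sent
    once, resent `retries` times, and the client gives up after
    `total_timeout` seconds. Each wait is `factor` times the previous
    one, and all waits together add up to `total_timeout`.
    """
    intervals = retries + 1  # waits before each resend, plus the final wait
    weights = [factor ** i for i in range(intervals)]
    scale = total_timeout / sum(weights)
    return [w * scale for w in weights]


# Compare the generous 60 second window with the older 15 second one.
for total in (60, 15):
    waits = retry_schedule(total, retries=5)
    print(total, [round(w, 2) for w in waits])
```

Under this model every wait in the 15 second schedule is a quarter of
the matching wait in the 60 second schedule, so the resends hit an
already congested link four times as quickly.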

> Any ideas on improving coda performance as well?  Right
> now codasrv process takes up between 50% - 90% Memory

Did you enable the mapprivate flag in /etc/coda/server.conf? It doesn't
reduce the total VM size of the process, but it typically reduces the
RSS (resident set size) and the amount of data that needs to be written
to swap, so it improves performance quite a bit.
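
If you want to see the difference between total VM size and RSS for
yourself, something like the following sketch should work on Linux. It
reads the standard VmSize and VmRSS fields from /proc/<pid>/status. The
helper names and the sample numbers are made up for illustration and
are not measurements of a real codasrv.

```python
def parse_status_kb(status_text):
    """Extract VmSize and VmRSS (in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # The value looks like "  524288 kB"; keep only the number.
            fields[key] = int(rest.split()[0])
    return fields


def read_status_kb(pid):
    """Read the same two fields for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status_kb(f.read())


# Made-up sample text, only to show the format being parsed.
sample = "Name:\tcodasrv\nVmSize:\t  524288 kB\nVmRSS:\t  131072 kB\n"
usage = parse_status_kb(sample)
print(usage)
```

Calling read_status_kb with the pid of the codasrv process before and
after changing the flag would show whether VmRSS drops while VmSize
stays about the same.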

Jan


Re: Coda 6.07 | Replication Problems