On Fri, Nov 12, 2004 at 02:52:43PM -0800, redirecting decoy wrote:
> Still having a few problems though, and seemingly
> random errors. My biggest problem is constant client
> disconnection. This seems to really only happen on my

Such disconnections are not normal and probably indicate that something
is seriously wrong (i.e. servers can't connect back to themselves). So
these clients don't get callbacks and will keep showing stale data.
Disconnections also lead to reintegration (and possibly conflicts, if we
only send updates to one of the 2 replicas), which could explain why you
aren't seeing changes propagate to the other client.

> 1) Is it possible to start venus as a normal user,
> instead of root ? The reason I ask is because, when

No, venus needs root permissions to open /dev/cfs0, to use the mount(2)
system call, and to bump the VM limits so that we're actually allowed to
mmap the large RVM data file.

> my script finds a broken client, it ssh's into that
> machine and attempts to run the commands below. It
> seems that only root can do this, and I want to avoid
> having to ssh into the machine as root.

Maybe use 'super' or 'sudo'.

> echo -n password | clog -pipe user

This isn't really secure, it exposes the password to any user who runs
ps auxwww while venus is restarted. Also, restarting venus (which
involves unmounting /coda) doesn't work if any process has an active
reference to any file or directory in /coda. This can be as simple as a
shell that happens to have its cwd somewhere in the tree.

> 2) What is the preferred method of shutting down
> venus?

Definitely not kill -9, unless venus is already dead; kill -9 doesn't
give the program a chance to flush dirty data and such. I use vutil
shutdown, which sends a more benign SIGTERM. You also need to unmount
/coda, which can be a problem if anything has a reference; the Linux
kernel simply doesn't allow unmounting when there are busy inodes.
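The restart sequence above could be sketched as a small shell helper.
This is only an illustration, not a tested procedure: /etc/coda/venus.pass
is an invented path for a root-only (chmod 600) password file, and the
sleep is an arbitrary grace period.

```shell
#!/bin/sh
# Hypothetical venus restart helper, following the advice above.
# Assumes the Coda client tools (vutil, venus, clog) and umount
# are on PATH; /etc/coda/venus.pass is an invented example path.

restart_venus() {
    vutil shutdown            # benign SIGTERM, lets venus flush its state
    sleep 2                   # give venus a moment to exit cleanly
    # unmount fails if any process still references /coda,
    # e.g. a shell whose cwd is somewhere inside the tree
    if ! umount /coda; then
        echo "restart_venus: /coda is busy, aborting" >&2
        return 1
    fi
    venus &                   # restart without -init; RVM state is kept
}

get_tokens() {
    # avoids putting the password on a command line visible in
    # 'ps auxwww': clog reads it from a root-only file on stdin
    clog -pipe "$1" < /etc/coda/venus.pass
}
```

The point of get_tokens is only that the password never appears as a
command-line argument, unlike the echo-pipe variant quoted above.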
> As it is now, when I kill venus on either of my servers, I have to
> venus -init to restart it, otherwise it turns into a zombie. This does
> not seem to be as big a problem on my client machines though. I can't
> find anything that seems relevant in the log files.

Again, this is not normal behaviour. I've got 2 replicated servers
installed from scratch (6.0.7) on two machines with a fresh
Debian-testing installation, including local Coda clients on both
machines, and am not seeing any of the disconnection, stale data or
restart problems you are describing. I also haven't seen any logs or gdb
backtraces from you that might help me figure out what is going on.

> 3) Is there a list of "Connection States", such that
> are printed with the command "cfs lv
> /coda/storage/..."? If there are, is there a
> preferred method of reconnecting a client, assuming
> that the client is anything other than "Connected"?

Not really, it is a bit like quantum theory. Even when cfs lv reports
that a volume is connected, we don't really know whether it is until we
try to perform an operation that requires communication with the
servers. At that point we figure out if the server responds or not. We
run a serverprobe (a simple rpc2 ping) every 2-3 minutes, which can
detect disconnection so that the user hopefully doesn't have to suffer
the 60 second timeout.

> 4) I find that I often have to clog to reconnect a
> client to the servers, if I made an update from

If your client is disconnected from the servers, then 'cfs cs' should
reconnect. If the client has pending reintegrations then 'cfs wr' should
force reintegration. If the reintegration can't proceed because of
server-server conflicts then the only solutions are 'cfs purgeml' (drop
all local pending modifications) or a user using 'repair' to fix the
problem. The only difference between 'cfs cs' and 'clog' is that clog
drops existing connections and rebinds to the server.
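That reconnect sequence could be scripted roughly like this. It is a
sketch under assumptions: the grep pattern for the volume state is
invented, since the exact text 'cfs lv' prints isn't enumerated here,
and the purgeml/repair fallbacks are deliberately left manual.

```shell
# Hypothetical reconnect helper following the sequence above.
# The "write-disconnected" pattern is an assumption; adjust it
# to whatever your client's 'cfs lv' output actually says.

reconnect() {
    vol="$1"                  # e.g. a path under /coda
    cfs cs                    # test connections, rebind dead ones
    if cfs lv "$vol" | grep -qi "write-disconnected"; then
        cfs wr                # force reintegration of pending updates
    fi
    # if reintegration still fails due to conflicts, the remaining
    # options are manual: 'cfs purgeml' (drops local changes) or 'repair'
}
```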
cfs cs simply tests the existing connections and only rebinds if they
were unable to get a reply from the server.

> another client. Even if "ctokens" says that my token
> is still valid, I can't see the updated files until I
> clog again. First of all, is this the correct
> behaviour ?

That depends on the connection state. If the client on which the updates
were performed is in write-disconnected or disconnected state, then it
is completely normal. If the updating client only talks to server A,
while the other client only sees the replica on server B, then it is
also completely normal. But if the clients are local and fully
connected, this is highly unusual.

> 6) How much can coda tolerate ? I mean, can I use it
> to do something like run www/mail/ftp services off the
> Filesystem "fairly" reliably? Or am I asking too
> much...

Most of the webpages and mailing list archives for www.coda.cs.cmu.edu
are served straight from /coda/coda.cs.cmu.edu/project/www/html. Of
course this is all static pages, and mostly read-only. In fact the
webserver only updates the mailing list archives; everything else is
typically updated from a client on my desktop or laptop.

ftp.coda is running from the local disk, mostly because our servers used
to be pretty old, slow and small: /vicepX partitions were 2GB each, 3
per server, and due to replication the total available space exported by
6 servers was < 18GB. Recently we upgraded the hardware, so now we have
significantly more space.

My email is delivered straight into Coda, but the delivery agent does
run on my desktop so that I can locally repair any conflicts that might
occur. A conflict or loss of tokens makes the volume read-only, procmail
fails, and the mail falls through to a 'failed-delivery' mailbox on the
local disk; an xbiff icon on my desktop alerts me and I can get tokens
or repair any conflicts.

Jan

Received on 2004-11-13 00:50:33