I noticed that some of the raised questions weren't answered yet.

On Tue, Nov 25, 2003 at 10:28:18AM +1000, Daniel Andersen wrote:
> Am i correct in assuming that, if one server goes down for whatever reason,
> the other will continue to function without a problem until the other comes
> back up, and then pass back any changes that occurred while it was gone?

Yes, but only for a short time. How long mostly depends on how quickly your clients are making changes, because the server keeps a finite log of recent operations to make the server-server resolution (more) reliable when the 'lost' server returns. Once the log runs out, the running server becomes quite unusable.

> Also, if the SCM was removed to install a larger hard drive, is moving the
> settings over as simple as copying the /vicepa and /vice directories over, or
> would there be more to it?

A little bit more (or less). The server state lives in several places:

  /vice/db   shared state between all servers
  /vice/*    otherwise nothing really important
  /vicepa    file data
  RVM data   directory data and file metadata
  RVM log    changes to the RVM data that have been acknowledged to the
             client(s), but still have to be committed

You would want to first truncate any pending operations in the RVM log (http://www.coda.cs.cmu.edu/doc/html/rvm_manual-8.html):

    # rvmutl
    * open_log /path/to/logfile
    * recover

Now the only important data that is really needed to rebuild the server is whatever is in the RVM data file or partition, and the contents of /vicepa.

/vice/db/* can be copied from any other server within your server group. If you are upgrading a machine that is not currently 'nominated' as SCM, you only need the update.tk file and updateclnt will do this for you.

If you want to resize the RVM data partition at the same time, see http://www.coda.cs.cmu.edu/misc/resize-rvm.html

> Two more questions and then I'm done.
> Is this system actually production-ready

Don't know. Although Coda has been running in 'production' here at CMU all throughout its development (for instance, all the webpages are served directly out of /coda), we only have a handful of active users.

There are still plenty of little problems that make Coda a not production-ready system in my eyes. There is a 256KB directory size limitation, which is pretty bad because you can't store more than about 2000-4000 files in any directory. I believe that at the moment a volume can only hold the metadata for approximately 160000 files. A fix for this is pretty much 'working' and we have a server that has (close to?) 2 million files in a volume, but it needs some more work and testing; I believe it breaks backups.

Then there are deliberate design decisions. When a server responds slowly, a client switches to write-disconnected operation (writeback caching). Changes are not sent back immediately, but as a batch at some later point in time. This of course provides a weaker consistency model and increases the chance of inconsistencies when multiple clients are trying to update the same files. Any inconsistency will be flagged as a conflict and requires user intervention to 'repair the conflict'.

So actual write/write sharing is not really something to be advised, and automated 24/7 processes need to be monitored or changed to avoid problems or data loss. For instance, my procmail script that drops email into the maildir folders in Coda has the following:

    :0 W
    * ^X-Mailing-List:.*codalist@
    /coda/.../codalist/

    :0:
    # If any deliveries failed save it in the retry-delivery folder (local-disk)
    * ^X-Mailing-List:.*codalist@
    retry-delivery

Now if any mail is dropped in retry-delivery, I fix the problem (repair the conflict, or revalidate my Coda tokens) and run a 'retry-delivery' shell script:

    #!/bin/sh
    mv ~/Mail/retry-delivery ~/Mail/retrying
    touch ~/Mail/retry-delivery
    formail -s procmail < ~/Mail/retrying

So in a way I'm working around the problem.
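
A slightly more defensive variant of that retry script could look like the sketch below. It is not the script used here, only an illustration under a few assumptions: the names `MAILDIR_ROOT`, `DELIVER_CMD` and `retry_delivery` are hypothetical, and removing the working copy after a successful run is a choice made for this sketch. Messages that still fail are expected to land back in retry-delivery through the second procmail recipe.

```shell
#!/bin/sh
# Sketch only. MAILDIR_ROOT and DELIVER_CMD are hypothetical knobs that
# are not part of the original two-line script.
MAILDIR_ROOT="${MAILDIR_ROOT:-$HOME/Mail}"
DELIVER_CMD="${DELIVER_CMD:-formail -s procmail}"

retry_delivery() {
    spool="$MAILDIR_ROOT/retry-delivery"
    work="$MAILDIR_ROOT/retrying"

    # Refuse to overwrite mail left behind by an earlier, interrupted retry.
    if [ -e "$work" ]; then
        echo "leftover $work found; handle it first" >&2
        return 1
    fi

    # Nothing queued: leave everything untouched.
    [ -s "$spool" ] || return 0

    # Same steps as the original script: set the failed mail aside and
    # recreate an empty spool for new failures.
    mv "$spool" "$work" || return 1
    touch "$spool"

    # Feed the saved messages back through the delivery command.
    # DELIVER_CMD is deliberately unquoted so it splits into words.
    if $DELIVER_CMD < "$work"; then
        rm -f "$work"
    else
        echo "redelivery failed; messages kept in $work" >&2
        return 1
    fi
}
```

After repairing the conflict or refreshing tokens, calling `retry_delivery` once would replay the queued mail. The guard on the leftover `retrying` file is there because a plain `mv` would silently replace it.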
I could have the mail daemon keep things queued, but then I would never get any indication that something is wrong with the mail delivery. Right now I use an xbiff-like desktop thingy that can quickly tell me that email is not being delivered correctly.

Jan

Received on 2003-12-05 16:59:22