I am trying out the new coda release, not with success. Here's a
description of my failed effort.

- create a vice setup on a debian x86 box. The box,
  lambda.csail.mit.edu, lives in a real machine room with a fat pipe
  to the net.

- create a venus setup on an ubuntu x86_64 box with
  + 10Gb of cache
  + big DATA & LOG:
      % ls -l DATA LOG
      -rw------- 1 root root 1201865464 2007-05-01 12:07 DATA
      -rw------- 1 root root  300468736 2007-05-01 12:09 LOG
  This box lives in my office at Northeastern, about 2 miles away from
  the server over at MIT. It also has a university-grade net
  connection. It has 4Gb of ram and 10Gb of (encrypted) swap space on
  3 spindles.

  Here is the /etc/venus.conf:
      realm="lambda.csail.mit.edu"
      # 10 Gb of local file caching
      cacheblocks=10000000
      errorlog="/var/log/coda/venus.err"
      logfile="/var/log/coda/venus.log"
      rvm_log="/var/lib/coda/LOG"
      rvm_data="/var/lib/coda/DATA"
      cachedir="/var/lib/coda/cache"
      checkpointdir="/var/lib/coda/spool"
      pid_file="/var/run/coda-client.pid"
      run_control_file="/var/run/coda-client.ctrl"
      marinersocket="/var/run/coda-client.mariner"
      mapprivate=1

  DATA, LOG and the cache tree all live in my ext3 /var file system --
  no raw partitions of any kind being used here.

- copy 2.6 Gb / 26k files of my home dir into my coda filesys, using
      cp -pr <stuff> /coda/lambda.csail.mit.edu/user/shivers/.
  This runs for a while, then completes w/no problem. I can poke
  around in the coda dir from the shell using cd, ls & more, no
  problem. Meanwhile, codacon is scrolling writeback messages like
  crazy. Eventually, everything is copied back to the server, cfs lv
  shows no pending CML entries, and a du -sk of the vice directory
  shows that it's got all 2.6Gb of the bits.

- hoard it all with
      hoard add /coda/lambda.csail.mit.edu d+
  Note that my 2.6Gb hoard should fit entirely within my 10Gb cache.
  This runs for a while, then terminates successfully.

- Test it by walking the whole tree & reading every file, using
  find(1):
      find /coda/lambda.csail.mit.edu/user/shivers -type f -exec /tmp/eat {} \; -print
  where /tmp/eat is a simple shell script that cats its args to
  /dev/null:
      #!/bin/sh -
      exec cat "$@" > /dev/null
  This runs & completes successfully. Codacon shows no server->client
  file transfer during this.

So far, so good! Now for the trouble.

- Test it again by saying "cfs disconnect" and then redoing the find
  tree-walk to read the whole subdir a second time. This runs along
  fine for a while, with a silent codacon, then codacon suddenly
  outputs
      ValidateAttrsPlusSHA CVS(4.7f000000.baf.28f8) [0] ( 11:56:43 )
      Probe ( 11:57:21 )
  and the find tree walk hangs. After a minute or two, codacon says
      unreachable lambda.csail.mit.edu ( 11:58:10 )
  and the find walk resumes with the following output:
      find: ./research/mrlc/mrlc/spim/CVS: Permission denied
      ./research/mrlc/mrlc/spim/mips-syscall.h
      ./research/mrlc/mrlc/spim/endian.c
      ./research/mrlc/mrlc/spim/buttons.h
      ./research/mrlc/mrlc/spim/mips-syscall.o
      ./research/mrlc/mrlc/spim/display-utils.c
      find: ./research/mrlc/mrlc/confpaper: Permission denied
      find: ./research/mrlc/mrlc/paper: Permission denied
      find: ./research/mrlc/mrlc/CVS: Permission denied
      find: ./research/mrlc/mrlc/source: Permission denied
      find: ./research/mrlc/mrlc/harness: Permission denied
      find: ./research/mrlc/mrlc/paper1: Permission denied
      ./research/mrlc/okasaki-msg
      .
      .
      .
  and the rest of the find walk has these "Permission denied"
  messages scattered throughout the transcript.

- Then I go poke around in the file system. I now have trouble
  accessing the problem directories. For example:
      % ls -ld research/mrlc/mrlc/spim/CVS
      drwxr-xr-x 1 shivers nogroup 2048 2007-04-30 16:04 research/mrlc/mrlc/spim/CVS
      % ls research/mrlc/mrlc/spim/CVS
      ls: research/mrlc/mrlc/spim/CVS: Permission denied
      % cfs la research/mrlc/mrlc/spim/CVS
      research/mrlc/mrlc/spim/CVS: Connection timed out
      %
  Timed out?
  Hey, I hoarded the file and *disconnected*. Why is venus even
  trying to connect at all?

  Here is what cfs lv says while I'm in the disconnected state:
      % cfs lv /coda/lambda.csail.mit.edu/
      Status of volume 7f000000 (2130706432) named "/"
      Volume type is Replicated
      Connection State is Unreachable
      Reintegration age: 0 sec, time 5.000 sec

- Then I reconnect with
      cfs reconnect
  Now I can see the problem directories with no trouble. I redo the
  find tree-walk a third time and it completes with no problems.
  + codacon shows *no* server->client file motion.
  + my network load meter shows only minor traffic.
  So I have reason to believe the whole tree walk ran entirely out of
  the cache.

I'm mystified.

By the way, I don't think it's because I'm running on an x86_64
client. I have gotten similar problems when running with my simple x86
notebook as the client this week.

    -Olin

Received on 2007-05-01 13:56:58