I am trying to set up a coda filesys. My experience is that when the system works, it's very nice. This is, however, rare. Mostly the system acts in flaky & sensitive ways that require constant intervention. Can anyone advise me?

I am running all the latest stuff: release 6.0.6-1 on both client & server. Server & clients all run on linux boxes. The server has 1Gb of RVM, in a file, not a partition, due to hints I've seen in the docs & on the mailing list about paging, mmapping, etc. (1Gb of RVM is an undocumented option handled by the vice-setup scripts.) The RVM log is 25Mb, on a raw partition. The files live in a 400Gb ext3 filesys on /vicepa.

For the initial trial, I started with my personal music collection -- 9Gb of mp3 & flac files. The mp3 files are roughly 1-5Mb each; the flac files are 5-30Mb each. So: a small number of big files.

The server is sitting in a real machine room, with real network connectivity: gigabit ethernet to a lan, and a real pipe to the internet from there.

I set up several clients:

  CM:    Client "CM" is a linux box sitting behind a standard home cable modem. It has constant connectivity ranging up to 1Mb/s.
  LOCAL: Client LOCAL is on the server; no network in the picture.
  WAN:   Client WAN is a linux box sitting on an ethernet in my office at Georgia Tech -- no cable modem between it & the Net.

I mention the cable-modemness of the client connection because Jan has posted earlier saying that the asymmetry of cable-modem bandwidth confuses coda's bandwidth measurements -- it assumes incoming bandwidth equals outgoing.

-------------------------------------------------------------------------------
Failure 1:

The first thing I did was copy 9Gb of music files into my coda fs on the LOCAL client -- that is, the files were copied from a local ext3 filesys into a coda filesys *on the machine where the coda server runs*. This worked, though it was a little weird to see "red zone -- stalling blah blah" messages on such net-less operation.
I was able to access these files from clients CM & WAN in onesies & twosies with no problem. Then I tried, on client WAN (that is, the client that communicates with the server over a long-distance Internet connection, but doesn't have a wimpy cable modem connecting it to the Net):

    find . -type f -exec md5sum {} \;

At first, it ran like a champ. Then it didn't. Here's the tail of the recursive md5sum walk:

    4c507b84f2191ad0c9e8921e0f543ac7  ./affection/cd.db
    152c03eacd67ba3f28462abcacd85453  ./affection/track08.cdda.flac
    8cd0cf505aa05c7bebbac9fa94560289  ./affection/track08.cdda.mp3
    841ab15fa7bde3e019c06f7b0394351d  ./affection/audio_02.inf
    9e6b48f2d7fe648f83bec2a221a8e5d8  ./affection/track11.cdda.flac
    ae7ca0a4a359a8f92d9a079a7cc8e364  ./affection/audio_12.inf
    0ccaf063cbf89f7345955257d96134ad  ./affection/cdp-q
    4c1d9cc34e790c8fb52975a9973ce10d  ./affection/track13.cdda.mp3
    4f9377b72053579bcb80e59bb5ad610e  ./affection/audio_10.inf
    44e6c8a3596b59f589d934c624141e8d  ./affection/track07.cdda.flac
    9bea35fbbb5f679a8de559bdfd37bf6c  ./affection/track10.cdda.mp3
    4b0fb02ca5944804cc403b6ff1f3797a  ./affection/audio_01.inf
    md5sum: ./affection/track05.cdda.flac: Connection timed out
    find: ./affection/audio_08.inf: Connection timed out
    find: ./affection/track05.cdda.mp3: Connection timed out
    find: ./affection/audio_11.inf: Connection timed out
    find: ./rampal1: Connection timed out
    find: ./rampal2: Connection timed out
    find: ./sleepbeauty+toyshop: Connection timed out
    find: ./porter-on-mind: Connection timed out
    find: ./th-md5s: Connection timed out
    find: ./mozart-horn-concerti3: Connection timed out
    find: ./th: Connection timed out
    find: ./algreen: Connection timed out
    find: ./bush-story: Connection timed out
    find: ./mozart-wind-concerti: Connection timed out
    find: ./oconor-piano: Connection timed out
    find: ./eagles-hits: Connection timed out
    find: ./anything-goes-yoyo: Connection timed out
    find: ./beeth-piano1: Connection timed out

Again, note that this lossage occurred on a system with no cable modem, and presumably symmetric bandwidth to the server.

-------------------------------------------------------------------------------
Failure 2:

On client CM -- the system connected to the Net via a home cable modem -- I attempted to copy a really large dir of media (about 200Gb) into my coda filesys:

    cp -prv . /coda/lambda.csail.mit.edu/shivers/music-npx

It managed to copy about 15 files, then codacon began blatting out

    Red zone, stalling writer ( 00:33:35 )

messages, and then the client went write-disconnected:

    % cfs lv ~/c
    Status of volume 0x7f000000 (2130706432) named "coda:root"
    Volume type is ReadWrite
    Connection State is WriteDisconnected
    Minimum quota is 0, maximum quota is unlimited
    Current blocks used are 10324696
    The partition has 371998976 blocks available out of 382693232
    Write-back is VIOC_STATUSWB: Invalid argument

Note the weirdo final line -- "VIOC_STATUSWB: Invalid argument"? What's that?

During this time, the net connection was completely solid. I mean, I might have gotten less than my nominal 1Mb/sec, but the connection was always there.

So the real-world operation of coda here is that if you start writing a lot of data, you disconnect, and then your writes just fail. So you can't ever count on some operation actually working; it could very easily fail mid-stream.

-------------------------------------------------------------------------------
Failure 3:

On client CM -- the one connected via cable modem -- I also did a

    find . -type f -exec md5sum {} \;

in the coda dir holding the 9Gb of music. It won for a couple of files, then began to barf out msgs like this:

    md5sum: ./thelonious/track03.cdda.flac: Connection timed out
    find: ./thelonious/track03.cdda.mp3: Connection timed out
    find: ./thelonious/audio_17.inf: Connection timed out
    find: ./thelonious/track07.cdda.mp3: Connection timed out
    .
    .
    .
-------------------------------------------------------------------------------
What I find, in general, is that I cannot rely on file ops completing. Apps that access my coda files sometimes win and sometimes seem to drive the system into disconnected state, and then I must go through a

    cfs wr
    cfs cs
    cfs lv .

dance to reconnect. This happens when I am on a client with a completely stable connection to the ethernet. We are not talking phone lines here. This essentially renders coda unusable.

I tried jacking up the timeout & retry values on the client and server to see if that would help. Maybe it did, some. But I am still definitely losing.

I also tried doing a

    cfs strong

I don't have a super-clear idea of how this would affect my operation -- the one-line description in the cfs doc is that it prevents the system from ever going into weak-connectivity mode, but that doesn't mean it would prevent the system from going write-disconnected. In any event, when I do this, my client becomes more or less coda-catatonic.

Some questions:

  1. Am I doing something wrong?

  2. Do other people lose in this way? / Are other people winning? I do not see similar reports on this mailing list. Is it that no one is hammering on their servers with big files? Is it that no one is connecting via cable modems? I don't have a good feeling for how many people are really using coda and in what configurations.

  3. Is coda not ready for really big repositories (800Gb filesys, 1Gb rvm metadata)?

  4. Any advice at all?

I'm a little dismayed to be losing at such a simple stage of usage. I'm not having problems with reintegration conflicts or any of the real voodoo. I'm getting hosed just reading & writing files *while connected*.

BTW, I'm also surprised that coda is having problems with asymmetric network connections like cable modems in 2004. The lion's share of mobile connections these days at private residences is through connections of this sort.

-Olin

Received on 2004-07-20 15:48:17