Coda File System

RE: Coda connectivity lossage

From: <coda_at_allenport.com>
Date: Tue, 20 Jul 2004 16:07:22 -0400
I've noticed behavior similar to what you're describing on a Red Hat box
with the default firewall running.  Check the "validateattrs" and
"serverprobe" values in venus.conf; lowering them may help.  Also, how big
is your client-side cache?  Having an oversized cache eliminates lots of
problems.  I've noticed Venus does have a habit of getting disconnected for
no apparent reason, but I haven't had much chance to explore the cause.
Hopefully someone more knowledgeable can shed more light on this issue.

Samir
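
To see what the client is currently using before lowering anything, here is a minimal sketch. The default path /etc/coda/venus.conf, the key=value line format, and the cache-size key names (cacheblocks, cachesize) are assumptions, not taken from this thread; only validateattrs and serverprobe are named above. Adjust for your install.

```shell
#!/bin/sh
# Print selected settings from a venus.conf-style file.
# Assumptions (not from the thread): key=value lines, the default path
# below, and the cache-size key names cacheblocks/cachesize.
show_venus_settings() {
    conf="${1:-/etc/coda/venus.conf}"
    # Match only uncommented lines that set one of the keys of interest.
    grep -E '^[[:space:]]*(validateattrs|serverprobe|cacheblocks|cachesize)[[:space:]]*=' "$conf"
}
```

Call it as `show_venus_settings` for the assumed default path, or `show_venus_settings /path/to/venus.conf`. Keys left at their built-in defaults may simply not appear in the file, in which case nothing is printed for them.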


> -----Original Message-----
> From: shivers_at_cc.gatech.edu [mailto:shivers_at_cc.gatech.edu] 
> Sent: Tuesday, July 20, 2004 3:15 PM
> To: codalist_at_TELEMANN.coda.cs.cmu.edu
> Subject: Coda connectivity lossage
> 
> 
> I am trying to set up a coda filesys. My experience is that 
> when the system works, it's very nice. This is, however, 
> rare. Mostly the system acts in flaky & sensitive ways that 
> require constant intervention. Can anyone advise me?
> 
> I am running all the latest stuff: release 6.0.6-1 on both 
> client & server. Server & clients all run on linux boxes. The 
> server has 1Gb of RVM, in a file, not a partition, due to 
> hints I've seen in the docs & on the mailing list about 
> paging, mmaping, etc. (1Gb of RVM is an undocumented option 
> handled by the vice-setup scripts.) The RVM log is 25Mb, on a 
> raw partition. The files live in a 400Gb ext3 filesys on 
> /vicepa. For the initial trial, I started with my personal 
> music collection -- 9Gb of mp3 & flac files. The mp3 files 
> are roughly 1-5 Mb each; the flac files are 5-30Mb each. So: 
> a small number of big files.
> 
> The server is sitting in a real machine room, with real 
> network connectivity: gigabit ethernet to a lan, and a real 
> pipe to the internet from there.
> 
> I set up several clients:
>   CM: Client "CM" is a linux box sitting behind a standard home cable
>   modem. It has constant connectivity ranging up to 1Mb/s.
> 
>   LOCAL: Client LOCAL is on the server; no network in the picture.
> 
>   WAN: Client WAN is a linux box sitting on an ethernet in my office at
>   Georgia Tech -- no cable-modem between it & the Net.
> 
> I mention the cable-modemness of the client connection, 
> because Jan has posted earlier saying that the asymmetry of 
> cable-modem bandwidth confuses coda's bandwidth measurements 
> -- it assumes incoming bandwidth equals outgoing.
> 
> -----------------------------------------------------------------------------
> Failure 1:
> 
> The first thing I did was copy 9Gb of music files into my 
> coda fs on the LOCAL client -- that is, the files were copied 
> from a local ext3 filesys into a coda filesys *on the machine 
> where the coda server runs*. This worked, though it was a 
> little weird to see "red zone -- stalling blah blah" messages 
> on such net-less operation.
> 
> I was able to access these files from client CM & WAN in 
> onesies & twosies with no problem. Then I tried, on client 
> WAN (that is, the client that communicates with the server 
> over a long-distance Internet connection, but doesn't have a 
> wimpy cable modem connecting it to the Net):
> 
>     find . -type f -exec md5sum {} \;
> 
> At first, it ran like a champ. Then it didn't. Here's the 
> tail of the recursive md5sum walk:
>     4c507b84f2191ad0c9e8921e0f543ac7  ./affection/cd.db
>     152c03eacd67ba3f28462abcacd85453  ./affection/track08.cdda.flac
>     8cd0cf505aa05c7bebbac9fa94560289  ./affection/track08.cdda.mp3
>     841ab15fa7bde3e019c06f7b0394351d  ./affection/audio_02.inf
>     9e6b48f2d7fe648f83bec2a221a8e5d8  ./affection/track11.cdda.flac
>     ae7ca0a4a359a8f92d9a079a7cc8e364  ./affection/audio_12.inf
>     0ccaf063cbf89f7345955257d96134ad  ./affection/cdp-q
>     4c1d9cc34e790c8fb52975a9973ce10d  ./affection/track13.cdda.mp3
>     4f9377b72053579bcb80e59bb5ad610e  ./affection/audio_10.inf
>     44e6c8a3596b59f589d934c624141e8d  ./affection/track07.cdda.flac
>     9bea35fbbb5f679a8de559bdfd37bf6c  ./affection/track10.cdda.mp3
>     4b0fb02ca5944804cc403b6ff1f3797a  ./affection/audio_01.inf
>     md5sum: ./affection/track05.cdda.flac: Connection timed out
>     find: ./affection/audio_08.inf: Connection timed out
>     find: ./affection/track05.cdda.mp3: Connection timed out
>     find: ./affection/audio_11.inf: Connection timed out
>     find: ./rampal1: Connection timed out
>     find: ./rampal2: Connection timed out
>     find: ./sleepbeauty+toyshop: Connection timed out
>     find: ./porter-on-mind: Connection timed out
>     find: ./th-md5s: Connection timed out
>     find: ./mozart-horn-concerti3: Connection timed out
>     find: ./th: Connection timed out
>     find: ./algreen: Connection timed out
>     find: ./bush-story: Connection timed out
>     find: ./mozart-wind-concerti: Connection timed out
>     find: ./oconor-piano: Connection timed out
>     find: ./eagles-hits: Connection timed out
>     find: ./anything-goes-yoyo: Connection timed out
>     find: ./beeth-piano1: Connection timed out
> 
> Again, note that this lossage occurred on a system with no 
> cable modem, and presumably symmetric bandwidth to the server.
> 
> -----------------------------------------------------------------------------
> Failure 2:
> 
> On client CM -- the system connected to the Net via a home 
> cable-modem -- I attempted to copy a really large dir of 
> media (about 200Gb) into my coda filesys:
> 
>     cp -prv . /coda/lambda.csail.mit.edu/shivers/music-npx
> 
> It managed to copy about 15 files, then codacon began blatting out 
> 
>     Red zone, stalling writer ( 00:33:35 )
> 
> messages, and then the client went write-disconnected.
> 
>   % cfs lv ~/c
>   Status of volume 0x7f000000 (2130706432) named "coda:root"
>   Volume type is ReadWrite
>   Connection State is WriteDisconnected
>   Minimum quota is 0, maximum quota is unlimited
>   Current blocks used are 10324696
>   The partition has 371998976 blocks available out of 382693232
>   Write-back is VIOC_STATUSWB: Invalid argument
> 
> Note the weirdo final line -- "VIOC_STATUSWB: Invalid 
> argument"? What's that? 
> 
> During this time, the net connection was completely solid. I 
> mean, I might have gotten less than my nominal 1Mb/sec, but 
> the connection was always there.
> 
> So the real-world operation of coda here is that if you start 
> writing a lot of data, you disconnect, and then your writes 
> just fail. So you can't ever count on some operation actually 
> working; it could very easily fail mid-stream.
> 
> -----------------------------------------------------------------------------
> Failure 3:
> 
> On client CM -- the one connected via cable-modem -- I also did a
> 
>     find . -type f -exec md5sum {} \;
> 
> in the coda dir holding the 9Gb of music. It won for a couple 
> of files, then began to barf out msgs like this
> 
>     md5sum: ./thelonious/track03.cdda.flac: Connection timed out
>     find: ./thelonious/track03.cdda.mp3: Connection timed out
>     find: ./thelonious/audio_17.inf: Connection timed out
>     find: ./thelonious/track07.cdda.mp3: Connection timed out
>     .
>     .
>     .
> 
> -----------------------------------------------------------------------------
> 
> What I find, in general, is that I cannot rely on file ops 
> completing. Apps that access my coda files sometimes win and 
> sometimes seem to drive the system into disconnected state, 
> and then I must go through a
>     cfs wr
>     cfs cs
>     cfs lv .
> dance to reconnect. This happens when I am on a client with a 
> completely stable connection to the ethernet. We are not 
> talking phone lines here. This essentially renders coda unusable.
> 
> I tried jacking up the timeout & retry values on the client 
> and server to see if that would help. Maybe it did, some. But 
> I am still definitely losing.
> 
> I also tried doing a
>     cfs strong
> I don't have a super-clear idea of how this would affect my 
> operation -- the one-line description in the cfs doc is 
> that it prevents the system from ever going into 
> weak-connectivity mode, but that doesn't mean it would 
> prevent the system from going write-disconnected. In any 
> event, when I do this, my client becomes more or less coda-catatonic.
> 
> Some questions:
> 1. Am I doing something wrong?
> 
> 2. Do other people lose in this way? / Are other people winning?
>    I do not see similar reports on this mailing list. Is it that no one
>    is hammering on their servers with big files? Is it that no one is
>    connecting via cable modems? I don't have a good feeling for how many
>    people are really using coda and in what configurations.
> 
> 3. Is coda not ready for really big repositories (800Gb filesys, 1Gb rvm
>    metadata)?
> 
> 4. Any advice at all?
> 
> I'm a little dismayed to be losing at such a simple stage of 
> usage. I'm not having problems with reintegration conflicts 
> or any of the real voodoo. I'm getting hosed just reading & 
> writing files *while connected*.
> 
> BTW, I'm also surprised that coda is having problems with 
> asymmetric network connections like cable modems in 2004. The 
> lion's share of mobile connections these days at private 
> residences is through connections of this sort.
>     -Olin
> 
> 
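
The reconnect sequence quoted above (cfs wr, cfs cs, cfs lv .) can be wrapped in a small helper so it is not retyped each time. This is a sketch only, not a fix for the underlying disconnects. The function names and the CFS override variable are hypothetical; the string matched by is_write_disconnected is copied from the "cfs lv" listing in the quoted message.

```shell
#!/bin/sh
# Succeeds if `cfs lv` output on stdin reports the write-disconnected
# state. The matched text is taken from the listing in the message above.
is_write_disconnected() {
    grep -q 'Connection State is WriteDisconnected'
}

# Run the three commands from the message, in the same order.
# CFS is a hypothetical override so the sequence can be dry-run,
# for example with CFS=echo.
coda_reconnect() {
    dir="${1:-.}"
    cfs_cmd="${CFS:-cfs}"
    "$cfs_cmd" wr || return 1
    "$cfs_cmd" cs || return 1
    "$cfs_cmd" lv "$dir"
}
```

A possible use: `cfs lv . | is_write_disconnected && coda_reconnect .`, which runs the sequence only when the volume reports WriteDisconnected. `CFS=echo coda_reconnect .` prints the commands' arguments instead of running cfs.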
Received on 2004-07-20 16:14:26