Coda File System

Re: Coda connectivity lossage

From: Steve Simitzis <steve_at_saturn5.com>
Date: Tue, 20 Jul 2004 22:51:47 -0700

i have the same problem whenever i do something like tar'ing up /coda.
this is on a regular ol' 100 meg ethernet network, with no firewalling
or anything else suspect going on.

i'm also not really sure what to do.

On 07/20/04, shivers_at_cc.gatech.edu wrote: 

> I am trying to set up a coda filesys. My experience is that when the
> system works, it's very nice. This is, however, rare. Mostly the system
> acts in flaky & sensitive ways that require constant intervention. Can
> anyone advise me?
> 
> I am running all the latest stuff: release 6.0.6-1 on both client & server.
> Server & clients all run on linux boxes. The server has 1Gb of RVM, in a file,
> not a partition, due to hints I've seen in the docs & on the mailing list
> about paging, mmaping, etc. (1Gb of RVM is an undocumented option handled by
> the vice-setup scripts.) The RVM log is 25Mb, on a raw partition. The files
> live in a 400Gb ext3 filesys on /vicepa. For the initial trial, I started with
> my personal music collection -- 9Gb of mp3 & flac files. The mp3 files are
> roughly 1-5 Mb each; the flac files are 5-30Mb each. So: a small number of big
> files.
> 
> The server is sitting in a real machine room, with real network connectivity:
> gigabit ethernet to a lan, and a real pipe to the internet from there.
> 
> I set up several clients:
>   CM: Client "CM" is a linux box sitting behind a standard home cable modem.
>   It has constant connectivity ranging up to 1Mb/s.
> 
>   LOCAL: Client LOCAL is on the server; no network in the picture.
> 
>   WAN: Client WAN is a linux box sitting on an ethernet in my office at 
>   Georgia Tech -- no cable-modem between it & the Net.
> 
> I mention the cable-modemness of the client connection, because Jan has posted
> earlier saying that the asymmetry of cable-modem bandwidth confuses coda's
> bandwidth measurements -- it assumes incoming bandwidth equals outgoing.
> 
> -------------------------------------------------------------------------------
> Failure 1:
> 
> The first thing I did was copy 9Gb of music files into my coda fs on the LOCAL
> client -- that is, the files were copied from a local ext3 filesys into a coda
> filesys *on the machine where the coda server runs*. This worked, though it
> was a little weird to see "red zone -- stalling blah blah" messages on such
> net-less operation.
> 
> I was able to access these files from client CM & WAN in onesies & twosies
> with no problem. Then I tried, on client WAN (that is, the client that
> communicates with the server over a long-distance Internet connection, but
> doesn't have a wimpy cable modem connecting it to the Net):
> 
>     find . -type f -exec md5sum {} \;
> 
> At first, it ran like a champ. Then it didn't. Here's the tail of the
> recursive md5sum walk:
>     4c507b84f2191ad0c9e8921e0f543ac7  ./affection/cd.db
>     152c03eacd67ba3f28462abcacd85453  ./affection/track08.cdda.flac
>     8cd0cf505aa05c7bebbac9fa94560289  ./affection/track08.cdda.mp3
>     841ab15fa7bde3e019c06f7b0394351d  ./affection/audio_02.inf
>     9e6b48f2d7fe648f83bec2a221a8e5d8  ./affection/track11.cdda.flac
>     ae7ca0a4a359a8f92d9a079a7cc8e364  ./affection/audio_12.inf
>     0ccaf063cbf89f7345955257d96134ad  ./affection/cdp-q
>     4c1d9cc34e790c8fb52975a9973ce10d  ./affection/track13.cdda.mp3
>     4f9377b72053579bcb80e59bb5ad610e  ./affection/audio_10.inf
>     44e6c8a3596b59f589d934c624141e8d  ./affection/track07.cdda.flac
>     9bea35fbbb5f679a8de559bdfd37bf6c  ./affection/track10.cdda.mp3
>     4b0fb02ca5944804cc403b6ff1f3797a  ./affection/audio_01.inf
>     md5sum: ./affection/track05.cdda.flac: Connection timed out
>     find: ./affection/audio_08.inf: Connection timed out
>     find: ./affection/track05.cdda.mp3: Connection timed out
>     find: ./affection/audio_11.inf: Connection timed out
>     find: ./rampal1: Connection timed out
>     find: ./rampal2: Connection timed out
>     find: ./sleepbeauty+toyshop: Connection timed out
>     find: ./porter-on-mind: Connection timed out
>     find: ./th-md5s: Connection timed out
>     find: ./mozart-horn-concerti3: Connection timed out
>     find: ./th: Connection timed out
>     find: ./algreen: Connection timed out
>     find: ./bush-story: Connection timed out
>     find: ./mozart-wind-concerti: Connection timed out
>     find: ./oconor-piano: Connection timed out
>     find: ./eagles-hits: Connection timed out
>     find: ./anything-goes-yoyo: Connection timed out
>     find: ./beeth-piano1: Connection timed out
> 
> Again, note that this lossage occurred on a system with no cable modem, and
> presumably symmetric bandwidth to the server.
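
One way to keep a walk like the one quoted above from giving up on the
first timeout is to retry each file a few times and log whatever still
fails. This is only a minimal sketch, not something from the coda docs:
the helper names (retry, walk_md5), the retry count of 3 and the 5-second
pause are all made up for illustration, and it does not retry the
directory-level failures that find itself reported.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times, pausing $2 seconds
# between attempts. Returns 0 on the first success, 1 if every try fails.
retry() {
    tries=$1; delay=$2; shift 2
    n=1
    while :; do
        "$@" && return 0
        [ "$n" -ge "$tries" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}

# Hypothetical helper: checksum every regular file under $1 and append the
# names of files that still fail after the retries to the log file $2.
# Assumes file names contain no newlines, and that $2 lives outside $1.
walk_md5() {
    dir=$1; faillog=$2
    : > "$faillog"
    find "$dir" -type f | while IFS= read -r f; do
        retry 3 5 md5sum "$f" || printf '%s\n' "$f" >> "$faillog"
    done
}
```

Calling "walk_md5 . /tmp/md5-failures.log" from the music directory would
print checksums as before and leave a list of the files that never came
back, which can be re-run once the client is reconnected.
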
> 
> -------------------------------------------------------------------------------
> Failure 2:
> 
> On client CM -- the system connected to the Net via a home cable-modem --
> I attempted to copy a really large dir of media (about 200Gb) into my coda
> filesys:
> 
>     cp -prv . /coda/lambda.csail.mit.edu/shivers/music-npx
> 
> It managed to copy about 15 files, then codacon began blatting out 
> 
>     Red zone, stalling writer ( 00:33:35 )
> 
> messages, and then the client went write-disconnected.
> 
>   % cfs lv ~/c
>   Status of volume 0x7f000000 (2130706432) named "coda:root"
>   Volume type is ReadWrite
>   Connection State is WriteDisconnected
>   Minimum quota is 0, maximum quota is unlimited
>   Current blocks used are 10324696
>   The partition has 371998976 blocks available out of 382693232
>   Write-back is VIOC_STATUSWB: Invalid argument
> 
> Note the weirdo final line -- "VIOC_STATUSWB: Invalid argument"? What's that? 
> 
> During this time, the net connection was completely solid. I mean, I might
> have gotten less than my nominal 1Mb/sec, but the connection was always there.
> 
> So the real-world operation of coda here is that if you start writing a lot of
> data, you disconnect, and then your writes just fail. So you can't ever count
> on some operation actually working; it could very easily fail mid-stream.
> 
> -------------------------------------------------------------------------------
> Failure 3:
> 
> On client CM -- the one connected via cable-modem -- I also did a
> 
>     find . -type f -exec md5sum {} \;
> 
> in the coda dir holding the 9Gb of music. It won for a couple of files, then
> began to barf out msgs like this
> 
>     md5sum: ./thelonious/track03.cdda.flac: Connection timed out
>     find: ./thelonious/track03.cdda.mp3: Connection timed out
>     find: ./thelonious/audio_17.inf: Connection timed out
>     find: ./thelonious/track07.cdda.mp3: Connection timed out
>     .
>     .
>     .
> 
> -------------------------------------------------------------------------------
> 
> What I find, in general, is that I cannot rely on file ops completing. Apps
> that access my coda files sometimes win and sometimes seem to drive the
> system into disconnected state, and then I must go through a
>     cfs wr
>     cfs cs
>     cfs lv .
> dance to reconnect. This happens when I am on a client with a completely
> stable connection to the ethernet. We are not talking phone lines here.
> This essentially renders coda unusable.
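
The reconnect dance quoted above could be wrapped in a small script that
only runs it when needed. A rough sketch, under assumptions: the function
names (conn_state, reconnect) are made up, and it assumes a healthy volume
reports "Connection State is Connected" in the cfs lv output. Only the
WriteDisconnected value appears in the quoted output, so check what your
client actually prints before relying on that string.

```shell
#!/bin/sh
# Hypothetical helper: read "cfs lv" output on stdin and print only the
# value of the "Connection State is ..." line.
conn_state() {
    sed -n 's/^ *Connection State is //p'
}

# Hypothetical helper: run the cfs wr / cfs cs / cfs lv sequence for the
# volume containing $1 (default: current directory) if it is not connected.
# ASSUMPTION: the connected state is reported as "Connected".
reconnect() {
    vol=${1:-.}
    state=$(cfs lv "$vol" | conn_state)
    [ "$state" = "Connected" ] && return 0
    cfs wr
    cfs cs
    state=$(cfs lv "$vol" | conn_state)
    echo "state after reconnect attempt: $state"
    [ "$state" = "Connected" ]
}
```

This does nothing about why the client drops into write-disconnected mode
in the first place; it just saves typing the three commands by hand.
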
> 
> I tried jacking up the timeout & retry values on the client and server to
> see if that would help. Maybe it did, some. But I am still definitely losing.
> 
> I also tried doing a
>     cfs strong
> I don't have a super-clear idea of how this would affect my operation --
> the one-line description with the cfs doc is that it prevents the system
> from ever going into weak-connectivity mode, but that doesn't mean it would
> prevent the system from going write-disconnected. In any event, when I do
> this, my client becomes more or less coda-catatonic.
> 
> Some questions:
> 1. Am I doing something wrong?
> 
> 2. Do other people lose in this way? / Are other people winning?
>    I do not see similar reports on this mailing list. Is it that no one
>    is hammering on their servers with big files? Is it that no one is
>    connecting via cable modems? I don't have a good feeling for how many
>    people are really using coda and in what configurations.
> 
> 3. Is coda not ready for really big repositories (800Gb filesys, 1Gb rvm
>    metadata)?
> 
> 4. Any advice at all?
> 
> I'm a little dismayed to be losing at such a simple stage of usage. I'm
> not having problems with reintegration conflicts or any of the real voodoo.
> I'm getting hosed just reading & writing files *while connected*.
> 
> BTW, I'm also surprised that coda is having problems with asymmetric network
> connections like cable modems in 2004. The lion's share of mobile connections
> these days at private residences is through connections of this sort.
>     -Olin

-- 

steve simitzis : /sim' - i - jees/
          pala : saturn5 productions
 www.steve.org : 415.282.9979
  hath the daemon spawn no fire?
Received on 2004-07-21 01:57:19