On Tue, Feb 27, 2007 at 02:15:53PM +0100, Enrico Weigelt wrote:
> * Jan Harkes <jaharkes_at_cs.cmu.edu> wrote:
>
> <snip>
>
> > In theory the Coda client could start to prefetch the attributes
> > whenever an application reads the directory contents, which at
> > first may seem very beneficial. However in the long run this is
> > surprisingly often a bad idea.
>
> Well, if the directories are small, I don't see a big problem.
> When fetching a directory's status, there could be a bit for
> getting the whole contents too: we just need a list of filenames
> with their inode_id's and attributes. It should be well compressible.
> We could also use a timestamp (or something similar) to generate a
> difference datagram.

I'm not too concerned about the amount of data on the wire; adding the
attribute information of all children of a small directory along with
the directory data is possible. However, in the Coda cache those
attributes are not stored as directory data but as file objects that
carry only attribute information. The cache has a limited number of
file objects, so we would have to throw out existing objects to make
room for this (possibly never used) attribute information.

> > In many cases I don't really care about file attributes, for
> > instance I may try to run
> > /coda/coda.foo.bar/path/to/my/openoffice-installation/oowriter
> >
> > In which case, yes, the shell will open each directory in the path
> > while it walks towards the binary I want to run. But I don't want my
> > client to waste precious network bandwidth (and cache space) by
> > fetching all the siblings of each path component.
> > (such as /coda/coda.foo.bar/path/to/my/gimp-installation, etc.)
>
> Hmm, in this case we just need 4x dir_stat() instead of 4x dir_get()

No, we do need the directory contents to look up the object identifier
of the next path component, so we do need 4x dir_get() (a toy lookup
loop is sketched further down, after the lookaside example).

> > > How can I make it work more smoothly?
> >
> > 1. Use /bin/ls. 'ls' is probably an alias for 'ls --color', and
> >    calling the binary directly avoids the alias.
>
> How exactly would this help?

When you use /bin/ls (or unalias ls) you miss out on the pretty colors,
but ls will only call opendir/readdir/closedir and it is done. It
doesn't need to call stat on all the children (this is also illustrated
in a small example further down).

> > 2. Have patience, the Coda cache is persistent and it will try to
> >    keep the most useful files around, so the next time 'ls' is
> >    called on a directory you've been to before it will be
> >    considerably faster.
>
> Hmm, the problem is that I often have to reinit venus because it
> fails to start due to errors in the RVM :(

That is strange, I rarely need to reinit. Could you send me a log of
such a failing restart? Rotate the logfiles and restart venus with a
high log level:

    vutil -swap
    venus -d 100

(venus will not detach from the console)

Another trick for when you have to reinit venus but are behind a slow
link: it is possible to use lookaside to avoid having to refetch file
contents, so venus will only fetch directory data and attributes.

    # move the old cache directory aside
    mv /usr/coda/venus.cache /usr/coda/venus.old
    mkdir /usr/coda/venus.cache

    # create a lookaside index
    mklka /usr/coda/venus.lka /usr/coda/venus.old

    # reinitialize venus
    venus -init

    # tell venus where the lookaside index is located
    cfs lka +/usr/coda/venus.lka

(if you are using the Debian packages, replace /usr/coda with
/var/lib/coda)
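
To make the pathname lookup point above a bit more concrete, here is a
small standalone C toy. The names and data structures (dir_get, the
integer fids, the hard-coded tree) are made up for illustration and are
not venus internals; the point is only that resolving each component
needs the parent directory's contents, because that is where the
name-to-identifier mapping lives.

    #include <stdio.h>
    #include <string.h>

    struct entry   { const char *name; int fid; };
    struct codadir { int fid; struct entry entries[4]; };

    /* pretend "server": directory data for a tiny /coda-like tree */
    static struct codadir dirs[] = {
        { 1, { { "coda.foo.bar", 2 } } },
        { 2, { { "path", 3 } } },
        { 3, { { "oowriter", 4 }, { "gimp-installation", 5 } } },
    };

    /* fetch full directory data, i.e. the dir_get() of the discussion */
    static struct codadir *dir_get(int fid)
    {
        for (size_t i = 0; i < sizeof(dirs) / sizeof(dirs[0]); i++)
            if (dirs[i].fid == fid)
                return &dirs[i];
        return NULL;
    }

    static int resolve(int rootfid, const char *path)
    {
        char buf[256], *name;
        int cur = rootfid;

        snprintf(buf, sizeof(buf), "%s", path);
        for (name = strtok(buf, "/"); name; name = strtok(NULL, "/")) {
            struct codadir *d = dir_get(cur); /* contents, not just status */
            int next = -1;

            for (int i = 0; d && d->entries[i].name; i++)
                if (strcmp(d->entries[i].name, name) == 0)
                    next = d->entries[i].fid; /* name -> next object id */
            if (next < 0)
                return -1;                    /* no such component */
            cur = next;
        }
        return cur;
    }

    int main(void)
    {
        printf("oowriter resolves to fid %d\n",
               resolve(1, "coda.foo.bar/path/oowriter"));
        return 0;
    }

Note that the lookup loop never touches the attributes of any sibling
entry, which is exactly why prefetching those attributes during a path
walk tends to be wasted effort.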
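
Similarly, a minimal sketch of the /bin/ls point, using only standard
opendir/readdir/closedir and lstat calls. The '--attrs' flag is just
this toy's stand-in for what a colorized or long listing does; only
that second mode needs per-child attributes, which on /coda means an
attribute fetch (and a cache slot) for every child.

    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : ".";
        int want_attrs = argc > 2 && strcmp(argv[2], "--attrs") == 0;
        struct dirent *e;
        DIR *d = opendir(path);

        if (d == NULL) {
            perror("opendir");
            return 1;
        }
        while ((e = readdir(d)) != NULL) {
            if (!want_attrs) {
                /* plain listing: names come straight from the directory
                 * data, no per-child attributes are needed */
                printf("%s\n", e->d_name);
                continue;
            }
            /* the 'ls --color'-like case: one lstat() per child */
            char full[4096];
            struct stat st;
            snprintf(full, sizeof(full), "%s/%s", path, e->d_name);
            if (lstat(full, &st) == 0)
                printf("%10lld %s\n", (long long)st.st_size, e->d_name);
        }
        closedir(d);
        return 0;
    }

Compiled to, say, ./list, running './list /coda/some/dir' behaves like
plain /bin/ls, while './list /coda/some/dir --attrs' behaves more like
the aliased 'ls --color'.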

> > 3. Tell Coda what files you are actually interested in, that way it
> >    can prefetch them in the background, and also refresh them
> >    periodically if any of them have been updated on the servers.
> >    This is what we call 'hoarding' and it is controlled by the hoard
> >    command, see 'man hoard'. You may have to set the 'primaryuser'
> >    option in /etc/coda/venus.conf to match your local userid.
>
> Yeah, this works. But the fetch process is quite slow too.
> It took several minutes to get a 1.5 MB file (XUL.mfasl from
> mozilla's profile) via a DSL link. FTP and scp are several times
> faster. Codacon's output looks like each data chunk has to be
> acknowledged before the next is sent.

There is some windowing, but the window has a fixed size. We send 8 1KB
packets at a time and send the next set of 8 around the time we expect
the ACK (without actually waiting for the ack). We do this up to 4
times, and then retransmit the first unacknowledged packet until at
least some of those 32 packets have been acked. If any packets were
lost they are selectively retransmitted and we continue advancing the
window.

This is roughly what happens; the details may vary slightly (e.g. we
actually retransmit the first unacknowledged packet every time we
advance the window, etc.). But the maximum window is at most 32KB, and
if your link has high bandwidth combined with high latency we can't
fill the pipe as well as TCP does.

Still, a 32KB window with 100ms latency should be able to fill a
2.5Mbit/s link. I'm seeing closer to 10ms from home to CMU, so I'm able
to fully utilize the available bandwidth. Of course, if your latencies
are on the order of 2 seconds we will never be able to push more than
about 128Kbit/s, and on a faster link that would definitely make it
look like SFTP is doing a stop-and-wait data transfer (the window/RTT
arithmetic is spelled out in the small example below).

Jan
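
For completeness, the window/RTT arithmetic from the last paragraphs as
a tiny standalone C program (nothing Coda-specific, just the
back-of-the-envelope numbers): with a fixed window, at most one
window's worth of data can be in flight per round trip, so throughput
is bounded by window/RTT.

    #include <stdio.h>

    int main(void)
    {
        const double window_bits = 32.0 * 1024 * 8;    /* fixed 32KB window */
        const double rtt[] = { 0.010, 0.100, 2.000 };  /* 10ms, 100ms, 2s  */

        for (int i = 0; i < 3; i++)
            printf("rtt %6.0fms -> at most %8.0f kbit/s\n",
                   rtt[i] * 1000.0, window_bits / rtt[i] / 1000.0);
        return 0;
    }

This prints roughly 26214, 2621 and 131 kbit/s: the pipe is easily
filled at 10ms, a ~2.5Mbit/s link can just about be filled at 100ms,
and at 2 second latencies the transfer is capped around the ~128Kbit/s
mentioned above.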