On Thu, May 17, 2001 at 11:05:51AM +1200, Steve Wray wrote:
> > From: Shafeeq Sinnamohideen [mailto:shafeeq_at_cs.cmu.edu]
> > On Wed, 16 May 2001, Steve Wray wrote:
> >
> > It doesn't matter what kind of FS is used on the server. Only
> > the client
> [snip]
> > > I'm not sure how to interpret your comment about the client...?
> >
> > The client venus cache partition must be on an ext2, reiser, or ramfs
> > partition for it to work. This is because when the Coda kernel module
> > gets a request, it must be able, in the kernel, to forward it to the
> > file system that contains the container file so it can do the
> > operation.
>
> Which is the container? /vicepa?
> Is that the cache partition on the client?
> I'm still groping around the terminology here...

No, /vicepa is on the server. The server doesn't do any tricky stuff,
so it doesn't matter what type of filesystem is used.

On the client, the file data is stored in 'container files', which are
located under /usr/coda/venus.cache/. When an application opens a file
in /coda, the cache manager opens the associated container file and
passes the file descriptor (before 2.4.4 it was device/inode numbers)
back to the kernel. From that point on, all read/write and mmap
operations operate directly on the container file without sending
upcalls to userspace. This allows Coda to achieve the same performance
for read and write calls as the filesystem on which the container
files are stored.

However, the code that redirects the read/write/mmap operations assumes
that the container file can be accessed using the kernel's generic
read/write and mmap implementation. This assumption is valid for at
least ext2fs, reiserfs, and ramfs (I checked and tested these), but it
is broken for filesystems like tmpfs and vfat.
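To make the redirection concrete, here is a minimal sketch in the
style of a 2.4 kernel module. It is not the actual Coda source, and
the detail that the container file is stashed in ->private_data at
open time is an assumption made for the illustration:

    #include <linux/fs.h>

    /* Sketch only, not the real Coda code.  Assume that when venus
     * passed the container file's descriptor back at open time, the
     * module stashed the corresponding struct file in
     * coda_file->private_data. */
    static ssize_t coda_read_sketch(struct file *coda_file, char *buf,
                                    size_t count, loff_t *ppos)
    {
            struct file *cfile = coda_file->private_data;

            /* No upcall to venus; go straight to the container file.
             * Here is the assumption mentioned above: we call the
             * kernel's *generic* read implementation, which is fine
             * for ext2/reiserfs/ramfs but wrong for filesystems like
             * tmpfs or vfat that supply their own read method. */
            return generic_file_read(cfile, buf, count, ppos);
    }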
> On the toy client I was working with, all partitions
> were LVM/XFS except for those holding rvmlog and rvmdata,
> these were separate logical volumes and were unformatted.

LVM (like RAID) shouldn't have any influence because it operates on a
much lower layer, the block layer. If you didn't see any strange
behaviour, especially when writing to files in /coda, XFS must be
using the generic read/write calls.

> > The overall design of Coda assumes that writes are much less
> > frequent than reads, which is the experience from AFS. Thus Coda
> > is less suited for workloads that write heavily.
>
> ohhhhh so you wouldn't want /usr/*/src on it...
> :)

Just consider the fact that most object/dependency files are very
machine specific: CPU type, compiler version, installed libraries,
available include files, etc. I do tend to keep my source in /coda,
but typically have the object/build tree somewhere on the local disk:

    $ cd /home/jaharkes/build/x-obj
    $ /coda/usr/jaharkes/x/configure
    $ make x

All of the Coda sources and many other applications have no problem
with this.

> > Of course, the server doesn't do anything special for the client
> > running on the same machine, only the bulk data transfers go
> > faster across the "network".
>
> ok so WRT populating a coda directory, it doesn't really matter if
> it's done by the server or some client? (performance-wise)

I believe it doesn't make much difference, as long as your network is
100Base-T or faster.

> > > Does this mean that Linux is particularly bad for Coda?
> > > Is this fixable with any tweaking? Different filesystems?
> >
> > On the BSDs, one can place the RVM log file on a raw disk
> > partition, so accesses will not go through a file system.
>
> That's what I did in Linux...

Linux doesn't really provide raw access to the partition; all data
still goes through the page or buffer caches. Stephen Tweedie made
rawio patches for 2.2, but I haven't looked at those.

> > Generally, placing the log file on a separate physical disk will
> > help, since only the log needs to be appended to synchronously,
> > while the data file and /vicepa can be written lazily.
>
> It can be hard to arrange that on LVM...
> :)
> it kinda makes the disks transparent...

We really should do logging differently. For debugging it is useful
to log unbuffered (or to fsync after every fprintf), but for
production use and performance it would be better to allow both libc
and the OS to buffer writes. Performance really hasn't been a concern
yet, and there are probably many places where we can do a lot better
with simple changes, like reducing the amount of unnecessary
information in the logs and removing the fsyncs after fprintfs.
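As a sketch of that idea (not code from the Coda tree; the log_line
name and the DEBUG switch are made up for illustration):

    #include <stdio.h>
    #include <unistd.h>

    /* One log routine, two behaviours.  With DEBUG defined, every
     * line is flushed out of the stdio buffer and forced to disk,
     * so a crash loses nothing.  Without it, libc and the OS are
     * free to buffer, which is much cheaper. */
    static void log_line(FILE *log, const char *msg)
    {
            fprintf(log, "%s\n", msg);
    #ifdef DEBUG
            fflush(log);              /* stdio buffer -> kernel */
            fsync(fileno(log));       /* kernel buffers -> disk */
    #endif
    }

Jan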