Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu> Date: Fri, 1 Apr 2005 15:41:59 -0500

On Thu, Mar 31, 2005 at 10:36:06PM +0100, E. Rosten wrote:
> > On Thu, Mar 31, 2005 at 07:34:29PM +0100, E. Rosten wrote:
> > > Well, your comment just made me take another look at that block of code 
> > > (and I have already looked at it 1e99 times) and it turns out I was 
> > > forgetting to set the write size to sizeof(reply.coda_create). I think 
> > > that bit was missing because I had a nut loose on my keyboard.
> > 
> > Nice catch, 
> 
> Ah, thanks! At least something good came of it. How would this manifest 
> itself? As an error viewable in dmesg, or as an error from the returning 
> write() system call?

The easiest place to catch this in the kernel is actually the return
path to the application that requested the operation. The code that
passes messages up and down between the kernel and the cache manager
doesn't really know anything about the message contents, so any write
(even a short one) is considered valid. Well, I can at least make sure
the number of bytes is no smaller than the size of the output header.

So your cache manager would get some error back on the write() if it
writes < sizeof(struct coda_out_hdr). I'm not sure what a suitable error
would be though; most errors are because we write too much, or because
the 'other side' stopped reading. EIO seems the best one.
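
To make the header-size check concrete, here is a minimal user-space
sketch. The struct is only a stand-in for struct coda_out_hdr (its field
names and layout are assumptions for this example), and
sketch_check_reply_header() is a hypothetical helper, not the actual
kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the reply header; the fields are assumed for this sketch. */
struct sketch_out_hdr {
    uint32_t opcode;
    uint32_t unique;
    uint32_t result;
};

/* Three 32-bit fields should pack into 12 bytes with no padding. */
static_assert(sizeof(struct sketch_out_hdr) == 12, "unexpected header padding");

/*
 * The message path knows nothing about the payload, so the only thing it
 * can verify is that the write is long enough to contain a header.
 * Returns 0 if so, -EIO otherwise.
 */
static int sketch_check_reply_header(size_t nbytes)
{
    if (nbytes < sizeof(struct sketch_out_hdr))
        return -EIO;
    return 0;
}
```

With this, a 4-byte write would be rejected with -EIO, while anything of
12 bytes or more passes this first check regardless of the operation.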

Other errors, i.e. when we write at least sizeof(struct coda_out_hdr)
but less than the size we expect for the specific operation, would get
handled in the context of the requesting process. So in this case the
create(2), or whatever other operation the requesting application
performed, would see the error. Again there seems to be no suitable
error number in the SUS specification. I guess EIO might be the most
neutral one, as most others seem to indicate that the application did
something wrong.
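
The per-operation check could look roughly like the sketch below. All
names here are hypothetical: the opcodes, the reply structs and their
sizes are placeholders chosen for the example, not the real Coda
definitions. The point is only that the expected size depends on the
operation, and that the result is handed to the requesting process.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in reply header; the layout is an assumption for this sketch. */
struct sketch_out_hdr {
    uint32_t opcode;
    uint32_t unique;
    uint32_t result;
};

/* Hypothetical create reply: header plus operation-specific payload. */
struct sketch_create_out {
    struct sketch_out_hdr oh;
    uint8_t payload[64];    /* placeholder, not the real field list */
};

static_assert(sizeof(struct sketch_create_out) > sizeof(struct sketch_out_hdr),
              "create reply must be larger than a bare header");

/* Hypothetical opcodes, numbered only for this example. */
enum sketch_opcode { SKETCH_LOOKUP = 0, SKETCH_CREATE = 1, SKETCH_NOPS = 2 };

/* Minimum reply size assumed for each operation. */
static const size_t sketch_min_reply[SKETCH_NOPS] = {
    [SKETCH_LOOKUP] = sizeof(struct sketch_out_hdr),
    [SKETCH_CREATE] = sizeof(struct sketch_create_out),
};

/*
 * Runs in the context of the process that issued the operation: a reply
 * shorter than the operation expects becomes -EIO for that process.
 */
static int sketch_finish_upcall(enum sketch_opcode op, size_t reply_len)
{
    if ((int)op < 0 || op >= SKETCH_NOPS)
        return -EIO;
    if (reply_len < sketch_min_reply[op])
        return -EIO;
    return 0;
}
```

A create reply that carries only the header passes the transport-level
check but fails here, which is exactly the case of forgetting to set the
write size to sizeof(reply.coda_create).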

> If you're interested, another problem is that returning an invalid file 
> descriptor from CODA_OPEN_BY_FD causes all sorts of trouble: the 
> program issuing the open command hangs forever, making it impossible to 
> unmount the system. That said, I managed to develop my filesystem without 
> a single kernel panic, which sure beats developing kernel land 
> filesystems.

I'll look into this. One problem is that if the returned fd does
correspond to an open file in the cache manager (e.g. fd == 0 nicely maps
to stdin) there is no way to tell that it was an incorrect descriptor.
It will probably be possible to catch most cases where the cache manager
accidentally returns a libc stream (FILE*) instead of a file descriptor,
and hopefully that is the more common mistake.
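
As a rough user-space analogue of such a check (the kernel would use its
own descriptor lookup rather than fcntl(), and sketch_fd_is_open() is a
hypothetical helper), something like this tells whether a number names
an open descriptor at all:

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Returns 1 if fd names an open descriptor in the calling process and
 * 0 otherwise. fcntl(F_GETFD) fails with EBADF for a descriptor that is
 * not open.
 */
static int sketch_fd_is_open(int fd)
{
    if (fd < 0)
        return 0;
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

A FILE* squeezed into an int will usually not name an open descriptor,
so this kind of test rejects it. It cannot help with a small wrong value
such as 0, which is a perfectly valid descriptor for stdin.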

> > I guess I can add some sanity checks in the upcall reply path, although
> > it will be the user application that sees an error and not the cache
> > manager process that performed the short write.
> 
> I don't fully understand the whole of CODA yet; is the upcall when the 
> kernel sends a message down /dev/cfs? to the userland process?

Upcalls are calls initiated by the kernel that go to the cache manager.
They are typically triggered by an application performing an operation
on some object in /coda. Downcalls are initiated by the cache manager
and sent to the kernel. They are used to flush objects from the kernel's
caches when the cache manager receives a message from the servers that
something changed.

Jan

Re: Problem with CODA_LOOKUP (not anymore!!!)