On Thu, Mar 31, 2005 at 10:36:06PM +0100, E. Rosten wrote:
> > On Thu, Mar 31, 2005 at 07:34:29PM +0100, E. Rosten wrote:
> > > Well, your comment just made me take another look at that block of code
> > > (and I have already looked at it 1e99 times) and it turns out I was
> > > forgetting to set the write size to sizeof(reply.coda_create). I think
> > > that bit was missing because I had a nut loose on my keyboard.
> >
> > Nice catch,
>
> Ah, thanks! At least something good came of it. How would this manifest
> itself? As an error viewable in dmesg, or as an error from the returning
> write() system call?

The easiest place to catch this in the kernel is actually in the return
path to the application that requested the operation. The path that
passes messages up and down between the kernel and the cache manager
doesn't really know anything about the message contents, so any write
(even a short write) would be considered valid.

Well, I can make sure the number of bytes is at least the size of the
output header. So your cache manager would get an error back on the
write() if it writes < sizeof(struct coda_out_hdr). I'm not sure what a
suitable error would be though; most errors are because we write too
much, or because the 'other side' stopped reading. EIO seems the best
one.

Other errors, i.e. when we write at least sizeof(struct coda_out_hdr)
but less than the size we expected for the specific operation, would
get handled in the context of the requesting process. So in this case
the create(2), or any other operation performed by the requesting
application, would see the error. Again there seem to be no suitable
error numbers in the SUS specification. I guess EIO might be the most
neutral one, as most others seem to indicate that the application did
something wrong.
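To make the two checks above concrete, here is a rough userspace sketch
of the idea, not the actual kernel code. The struct layouts and the
function names check_write_size() and check_reply_size() are made up
for illustration; struct out_hdr only stands in for struct
coda_out_hdr, and the payload size of the create reply is an arbitrary
assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct coda_out_hdr. */
struct out_hdr {
    uint32_t opcode;
    uint32_t unique;
    uint32_t result;
};

/* Hypothetical stand-in for a create reply: header plus some payload. */
struct create_reply {
    struct out_hdr oh;
    uint32_t payload[4];   /* arbitrary size, for illustration only */
};

/*
 * Check 1: done in the write() path, so the cache manager itself sees
 * the error. Only the header size is known here, not the message type.
 */
static int check_write_size(size_t nbytes)
{
    return nbytes < sizeof(struct out_hdr) ? -EIO : 0;
}

/*
 * Check 2: done in the context of the requesting process, which knows
 * how large the reply for its operation should be. A short reply turns
 * into -EIO for the application, e.g. from create(2).
 */
static int check_reply_size(size_t nbytes, size_t expected)
{
    /* Every expected reply contains at least the output header. */
    assert(expected >= sizeof(struct out_hdr));
    return nbytes < expected ? -EIO : 0;
}
```

With this split, a write shorter than the header fails in the cache
manager, while a reply that has a complete header but a truncated body
only fails later in the application that triggered the upcall.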
> If you're interested, another problem is that returning an invalid file
> descriptor from CODA_OPEN_BY_FD causes all sorts of trouble: the
> program issuing the open command hangs forever, making it impossible to
> unmount the system. That said, I managed to develop my filesystem without
> a single kernel panic, which sure beats developing kernel land
> filesystems.

I'll look into this. One problem is that if the returned fd does
correspond to an open file in the cache manager (i.e. fd == 0 nicely
maps to stdin), there is no way to tell that it was an incorrect
descriptor. It will probably be possible to catch most cases where the
cache manager accidentally returns a libc stream (FILE*) instead of a
file descriptor, and hopefully that is the more common mistake.

> > I guess I can add some sanity checks in the upcall reply path, although
> > it will be the user application that sees an error and not the cache
> > manager process that performed the short write.
>
> I don't fully understand the whole of CODA yet; is the upcall when the
> kernel sends a message down /dev/cfs? to the userland process?

Upcalls are calls initiated by the kernel that go to the cache manager.
They are typically triggered by an application performing an operation
on some object in /coda.

Downcalls are initiated by the cache manager and sent to the kernel.
They are used to flush objects from the kernel's caches when the cache
manager receives a message from the servers that something changed.

Jan

Received on 2005-04-01 15:43:34