On Wed, May 16, 2001 at 07:46:55PM -0400, Greg Troxel wrote:
> I had problems before, but I now think it was the same one.

No, I'm very sure it was a real off-by-one bug before, and you sent in
the correct patch, which did get applied. Otherwise your client would
have crashed in dirbody.c.

> It would be nice if we packed the directories more efficiently, and
> allowed them to grow beyond 256KB. But a lot of the cruft in this area
> is legacy to avoid breaking backwards compatibility with the way servers
> have been storing their directory information in RVM.
>
> Well, that's unfortunate. I wonder how hard it would be to make this
> fail more gracefully, e.g. with ENOSPC instead of crashing. It might
> be worth having a server format change to remove the arbitrary limit
> [ducking!] (even though total RVM etc. is bounded, 4000 files in a
> directory doesn't seem all that weird...).

I'm trying to cut a 'fail graceful' path through the spaghetti, but I
don't know whether I can make it work nicely all the way. Perhaps we
need a function 'DIR_SpaceForName()' to check, and perhaps reserve, the
allocation before some point of no return.

256KB already takes up a lot of RVM, and I don't think any of it is
reclaimed when directory entries are deleted, until the directory itself
is removed. Should we keep storing directories in RVM, or move the
storage to on-disk container files, just like the data for regular
files?

There are currently 3 directory formats in use, and they pop up all over
the place.

The server stores directories in RVM, as a fixed-size array of possibly
allocated 2KB pages. This is probably done both for legacy reasons and
to avoid realloc and memory fragmentation; the RVM allocator isn't very
good at merging fragmented memory. There is also a small hash table to
improve lookups. (1st format)

When modifications are made to a directory on the server, the whole
structure is copied to regular memory. Once all operations have
succeeded, it is copied back into RVM. (still 1st format)

When a client requests a directory, a contiguous chunk of memory is
allocated, the contents of all pages are copied into it, and it is sent
over to the client. (2nd format)

The client then converts the received directory data back to an array of
pages in RVM for storage and local manipulation. (1st format again)

When userspace opens the directory for reading, the client writes the
directory data to a container file in a BSD-style directory format.
(3rd format)

So the 1st format is used to store and manipulate the directory, the 2nd
format is the on-the-wire version, and the 3rd format is used to pass
the directory data to the kernel. And there is a lot of copying going
on.

We could use the BSD-style format all the way, in which case we would
only need to realloc RVM memory (or grow the container file). It
probably complicates the direntry create/delete code, and we would lose
the hash-based directory lookup. However, the client could trivially
drop the received directory data into the container file and wouldn't
have to do anything special when the kernel opens it for reading.

If we want to retain the hash-based lookup (or use a tree-based one),
extending the BSD-style format would work, but then the client has to
either munge the directory before passing it to the kernel, or all the
kernel code needs to be taught the new directory structure.

In the end we'd still need some of the code to read/convert the
array-of-pages format, otherwise we can't restore volume dumps
(backups).

Jan

Received on 2001-05-17 10:10:10
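
The check-and-reserve idea behind the proposed 'DIR_SpaceForName()' could look roughly like the sketch below. It is not taken from the Coda sources. The names `struct dir_pages`, `dir_space_for_name`, `DIR_ENTRY_OVERHEAD`, and the per-page `used[]` accounting are all hypothetical, and the 128-page count is simply 256KB divided by the 2KB page size mentioned in the message. The point is only that the caller learns about ENOSPC before it has changed anything.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define DIR_PAGESIZE       2048  /* 2KB pages, as described in the message */
#define DIR_MAXPAGES       128   /* assumption: 256KB limit / 2KB per page */
#define DIR_ENTRY_OVERHEAD 16    /* hypothetical per-entry header size */

/* Hypothetical in-memory view of the "1st format": a fixed-size array of
 * possibly allocated pages, plus a byte count for each page. */
struct dir_pages {
    unsigned char *page[DIR_MAXPAGES]; /* NULL means "not allocated" */
    size_t used[DIR_MAXPAGES];         /* bytes consumed in each page */
};

/* Check that an entry for `name` would fit, allocating a fresh page if
 * that is what it takes. Returns 0 when the caller may go ahead, ENOSPC
 * when the directory is full (or the name can never fit in one page),
 * and ENOMEM when a new page could not be allocated. No existing entry
 * data is modified, so a failure leaves the directory as it was. */
int dir_space_for_name(struct dir_pages *d, const char *name)
{
    size_t need = DIR_ENTRY_OVERHEAD + strlen(name) + 1;
    int i;

    if (need > DIR_PAGESIZE)
        return ENOSPC;

    for (i = 0; i < DIR_MAXPAGES; i++) {
        if (d->page[i] == NULL) {
            /* Reserve a fresh page now, before the point of no return. */
            d->page[i] = calloc(1, DIR_PAGESIZE);
            if (d->page[i] == NULL)
                return ENOMEM;
            d->used[i] = 0;
            return 0;
        }
        if (d->used[i] + need <= DIR_PAGESIZE)
            return 0;   /* room in an already allocated page */
    }
    return ENOSPC;      /* all pages allocated and full */
}
```

A create or rename path would call this first and return the error to the client instead of crashing, and only start modifying the directory once it has returned 0.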