On Fri, Apr 29, 2005 at 10:14:00AM -0700, John Anderson wrote:
> >Some of these I know, but there are probably many I don't even know.
> >
> > directory size: 256KB
> >   - this doesn't easily translate to # of files, because it
> >     depends on the average filename size, padding, space for the
> >     file identifiers, etc.
> >     a ballpark figure would be between 2048 and 4096 files.
>
> Is there any way to modify or override this limitation? I think this was
> the wall I was running into while trying to load all those .gif's into a
> coda directory.

Not easily. Right now the directory format is something like a fixed array of 128 pointers to 2048-byte blocks (or maybe 256 pointers to 1KB blocks). It is somewhat similar to a Unix directory, but without the indirect blocks.

Scattered through the code are actually 3 different representations of the directory data.

The first is used between the Coda client and the kernel. It is mostly a flat file that I believe is identical to a directory on BSD FFS. I guess the first kernel implementations didn't parse the directory contents and let the kernel's readdir handle it. On Linux it had to be parsed from the start because of subtle differences, but it still uses the same layout.

Because there is no room for FIDs or hashed lookups, that flat layout is actually a suboptimal format, so the RVM representation on the client and the servers is the array of pointers to blocks, along with a hash to speed up lookup-by-name.

The final format is used between the client and server, where the indexing array is dropped and all the individual blocks are coalesced into one big chunk. This is because SFTP doesn't do scatter-gather, so this is the only way a directory can be sent through SFTP.

In any case, changing the basic directory structure will affect all servers and clients. There is no simple way to make things backward or forward compatible.

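As a rough illustration of the layout described above, here is a small Python model of the fixed pointer array, the resulting 256KB ceiling, and the coalescing step used for transfer. It is only a sketch: the function names, the 32-byte per-entry overhead and the 32-byte alignment are assumptions made for the example and are not taken from the Coda sources, and it assumes blocks are allocated contiguously from the start of the array.

```python
# Hypothetical model of the directory layout described in this message.
# All names and the per-entry overhead are illustrative assumptions,
# not Coda's actual code or on-disk format.

BLOCK_SIZE = 2048   # bytes per block (the "128 pointers x 2048 bytes" variant)
NUM_BLOCKS = 128    # fixed-size pointer array, no indirect blocks
MAX_DIR_BYTES = NUM_BLOCKS * BLOCK_SIZE   # the 256KB ceiling


def estimate_max_entries(avg_name_len, per_entry_overhead=32, align=32):
    """Rough upper bound on the number of entries in one directory.

    Assumes every entry stores its name plus a NUL terminator and a fixed
    overhead (file identifier, hash chain link, ...), padded up to `align`.
    """
    raw = avg_name_len + 1 + per_entry_overhead
    padded = -(-raw // align) * align      # round up to a multiple of align
    return MAX_DIR_BYTES // padded


def coalesce(blocks):
    """Drop the index array and pack the allocated blocks into one blob,
    which is what a transport without scatter-gather needs."""
    used = [b for b in blocks if b is not None]
    if any(len(b) != BLOCK_SIZE for b in used):
        raise ValueError("every allocated block must be BLOCK_SIZE bytes")
    return b"".join(used)


def split(blob):
    """Inverse of coalesce(): rebuild the fixed pointer array from a blob."""
    if len(blob) % BLOCK_SIZE or len(blob) > MAX_DIR_BYTES:
        raise ValueError("blob is not a valid directory image")
    blocks = [blob[i:i + BLOCK_SIZE] for i in range(0, len(blob), BLOCK_SIZE)]
    return blocks + [None] * (NUM_BLOCKS - len(blocks))
```

With these assumed numbers, `estimate_max_entries(30)` gives 4096 and `estimate_max_entries(90)` gives 2048, which is the same ballpark as the quoted figure. The real count depends on the actual entry layout. Note also that `coalesce()` has to build the whole directory image in memory before it can be sent.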
The other problem is that when directories do get significantly larger than 256KB, allocating, constructing and unpacking the blob that is sent by SFTP might become an issue.

Ideally we'd have a single format that is stored on disk on both the client and the servers, instead of in RVM, which can be fetched the same way as files (i.e. no special case packing of a directory into a memory blob or a local tmpfile), and that the kernel module would parse into whatever readdir needs to return to userspace. Of course any modifications to the directory file would still have to be logged in RVM to avoid inconsistencies, and some care has to be taken with how the directory files are updated wrt. concurrent readdir and block/page boundaries.

Jan

Received on 2005-04-29 14:40:33