On Wed, Aug 21, 2002 at 01:40:38PM -0400, Jan Harkes wrote:
> Yup, only the file metadata is stored in RVM, pathnames, version
> vectors, creator/owner/last author, references to copy on write
> versions of the file in backup volumes etc.

Do you know an average size per file?

> Ah, that's from the time that Coda servers used a special 'inodefs' hack
> to get direct access to the underlying filesystem. Nowadays we store
> files in a tree structure, which adds a bit of overhead, but is far more
> generic and fsck can't mess with us (too much) anymore.

Goodo.

> Ah, we do have directories, but store them as files.

I was actually referring to no directories being used on the underlying
filesystem, which you have said is no longer the case. The performance
hit I envisaged was having 10,000 files in a single ext2/ext3 directory.

> The only existing
> problem is that there are no double or triple indirect blocks in the
> in-file directory representation. As a result Coda's directories are
> limited to about 256KB in size. i.e. it is even impossible to have a
> single directory with all RFC's.

Ah, that limitation is probably worth adding to the docs with an update
on the filesystem access mode.

> It's in a way really funny, because the directory lookup code uses
> extensive hashes to ensure that a directory lookup can be done very
> quickly even when we have huge directories, but the actual directory
> data structure can't scale to such sizes. I'd rather have had a scalable
> structure with a dumb, but simple linear search because that would have
> been easier to fix and optimize.

Perhaps you could use more of the facilities of the underlying
filesystem, e.g. if it were ReiserFS just let the filesystem manage the
directory work.

> > Finally I arrived at rationale (d) which I hope you'll confirm ...
>
> Used to be the case, but I dropped the whole (device,inode) file access,
> it was causing too many problems when the underlying filesystem was
> trying to do journalling etc.

Ok, I guessed correctly what you used to have based on old docs :-)

> Simple, it is not a shared mapping, but an anonymous mapping. I.e. the
> code 'mallocs' as much space as the RVM-data partition and reads all
> of it into memory(/swap).

mmap's not my area, my brain just groups it with the black arts of
internationalization, the X protocol, glibc and satanism, so I'll just
take your word for it.

> Maildir works fine with Coda as long as you replace the 'link/unlink'
> with 'rename', we don't allow cross-directory links

Ah, I expected no cross-volume links, didn't realise it was more
restrictive.

> and our rename is
> atomic and Coda declares a 'conflict' whenever it notices that rename is
> trying to remove a file that the client didn't know about (update/delete
> sharing conflict).

You mean there will be a conflict when the server is too fast to create
and rename the message? Maildir writes to "new" subdir and when it is
done it moves (somehow) the file to "cur".

> > > Union mounts are available for linux, see e.g.
> > > http://kernelnewbies.org/status/latest.html
>
> Those are union mounts, not the union/overlay filesystem that people
> always talk about.

I guess it must be hard to do, if people always talk about it but never
do it. I imagine that the union filesystem code has to map the inode
numbers from its two (or more) underlying filesystems, into one inode
space for return to the kernel and user applications. Otherwise, tools
which traverse a filesystem might have a bad time as every time they
stat a file they find it has different major/minor, or maybe the same
inode number as another file, yet they are different files.

Nick.

Received on 2002-08-21 20:44:43
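
For reference, the 'rename' variant of Maildir delivery mentioned above can be sketched roughly as follows. This is only an illustration, not Coda or any mail agent's actual code: it assumes the usual Maildir layout (tmp/, new/, cur/), and the function name deliver and the file naming scheme are made up for the example.

```python
import os
import socket
import time


def deliver(maildir, message_bytes):
    """Write a message under tmp/, then move it into new/ with one rename."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    # Hypothetical unique name built from time, pid and hostname.
    name = "%d.%d.%s" % (int(time.time()), os.getpid(), socket.gethostname())
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)

    # O_EXCL makes creation fail instead of overwriting an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(message_bytes)
        f.flush()
        os.fsync(f.fileno())

    # A single rename() takes the place of link(tmp, new) + unlink(tmp),
    # so no cross-directory hard link is ever created.
    os.rename(tmp_path, new_path)
    return new_path
```

One difference worth keeping in mind: on POSIX systems rename() silently replaces an existing destination, whereas link() would fail, so this variant leans harder on the file names being unique.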