On Thu, Jul 19, 2001 at 01:43:58AM +0200, Cain Ransbottyn wrote:
> What about qmail ? Will coda or reiserfs have problems with qmail ? What
> should we do about it to get this fixed ?

Like Andrea said, qmail uses maildir, which tries to be NFS safe during
mail delivery. A rename replaces the target file if it already exists,
so on the off chance that maildir picks the same name for two different
email messages, it uses link()/unlink() to move files between
directories.

However, Coda doesn't allow cross-directory links; we fail those with
EXDEV.

The maildir (qmail) patch changes safe_rename to:

    safe_rename(old, new)
    {
        err = link(old, new);
    +   if (err == -1 && errno == EXDEV) {
    +       err = rename(old, new);
    +       return err;
    +   }
        if (err == -1) return err;
        err = unlink(old);
        return err;
    }

As non-Coda filesystems will never report EXDEV as an error result when
linking within the same FS, this does not break the existing guarantees.
With Coda there is a chance that names collide and that one message is
therefore lost, but the way these names are constructed makes that
chance very small indeed.

From the maildir specification (http://cr.yp.to/proto/maildir.html):

    Okay, so you're writing messages. A unique name has three pieces,
    separated by dots. On the left is the result of time(). On the right
    is the result of gethostname(). In the middle is something that
    doesn't repeat within one second on a single host. I fork a new
    process for each delivery, so I just use the process ID. If you're
    delivering several messages from one process, use
    starttime.pid_count.host, where starttime is the time that your
    process started, and count is the number of messages you've
    delivered. If your system provides a sequence number syscall, use
    that instead of the pid, preceded by #.

So names can only collide when the system uses a different process to
deliver each mail, and there are so many emails that the PID numbers
wrap within a second.
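
The two mechanisms discussed above, the time.pid.host naming scheme and
the patched safe_rename, can be written out as a minimal C sketch. This
is an illustration under assumptions, not the actual qmail or maildir
source: it assumes a POSIX system, and the function name
maildir_unique_name, the parameter names, and the buffer sizes are
hypothetical choices made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: build a maildir-style unique name, time.pid.host.
 * Returns 0 on success, -1 if the host name is unavailable or buf is
 * too small to hold the whole name. */
int maildir_unique_name(char *buf, size_t len)
{
    char host[256];
    int n;

    assert(buf != NULL);
    if (gethostname(host, sizeof host) == -1)
        return -1;
    host[sizeof host - 1] = '\0';  /* may be unterminated if truncated */

    n = snprintf(buf, len, "%ld.%ld.%s",
                 (long)time(NULL), (long)getpid(), host);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Move oldpath to newpath without clobbering an existing target.
 * link() fails with EEXIST if newpath already exists, which is the
 * guarantee maildir relies on. Only when link() reports EXDEV (as Coda
 * does for cross-directory links) do we fall back to rename(), which
 * would silently replace an existing target. */
int safe_rename(const char *oldpath, const char *newpath)
{
    assert(oldpath != NULL && newpath != NULL);

    if (link(oldpath, newpath) == -1) {
        if (errno == EXDEV)
            return rename(oldpath, newpath);
        return -1;
    }
    return unlink(oldpath);
}
```

On a filesystem that supports hard links between directories the EXDEV
branch is never taken, so the behaviour there is the same as without
the patch.
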
I don't believe Coda is fast enough to handle over 32768 (assuming PIDs
wrap at 2^15) * 3 (create, store, rename) ~= 100000 RPC calls (and
associated server transactions) per second.

Now, there are some problems. One serious problem is that the current
format of directories in RVM limits the maximum directory size to about
256KB. Because the maildir names are relatively long, and there is some
additional data stored in the directory for each entry, the directory
size will max out somewhere between 2000 and 4000 messages per mailbox.
I often get more than that from linux-kernel in a month!

The 'quick and dirty' patch is to redefine DIR_PAGESIZE from 2048 to
8192 bytes, which makes the max directory size approximately 1MB and
allows up to 16000 entries per directory (unless all names are longer
than 16 characters, in which case the max is 8000).

!!! This is not advised at all !!!

No one has ever tried it. It breaks both the in-memory and on-the-wire
formats of a directory so much that all clients and servers _have_ to
run the patched version. Also, backups can only be dumped/restored with
patched tools etc. And existing servers cannot be upgraded, they have to
be reinitialized.

Another solution is to redefine DIR_MAXPAGES from 128 to 512. This will
be on-the-wire compatible with existing clients/servers as long as all
directories are smaller than 256KB. However, when a larger directory is
passed to an unpatched client or server, it is likely to keel over and
die, or at least do unexpected things like not showing new files.
Because the in-memory format is different, an existing server cannot
simply be upgraded by running a patched version; the server will have to
be reinitialized.

Both solutions will push the limit a bit further, but they don't
reliably solve the problem. What if I now want 100000 files per
directory, reinit the whole server group again??? Also, the forced
server reinitialization is a PITA.
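
The size limits quoted above follow from multiplying the page size by
the page count. The sketch below only redoes that arithmetic. It is not
taken from the Coda sources: the constant values are the ones given in
this message, while the STOCK_/PATCHED_ prefixes and both function names
are hypothetical.

```c
#include <assert.h>

/* Values as quoted in this message; the prefixed names are
 * illustrative and do not come from the Coda headers. */
#define STOCK_DIR_PAGESIZE    2048UL
#define STOCK_DIR_MAXPAGES     128UL
#define PATCHED_DIR_PAGESIZE  8192UL  /* the 'quick and dirty' patch */
#define PATCHED_DIR_MAXPAGES   512UL  /* the alternative patch */

/* Maximum directory size in bytes for a page size and page count. */
unsigned long dir_max_bytes(unsigned long pagesize, unsigned long maxpages)
{
    assert(pagesize > 0 && maxpages > 0);
    return pagesize * maxpages;
}

/* Rough average bytes per entry implied by a size limit and an entry
 * count (integer division, so the result is rounded down). */
unsigned long dir_avg_entry_bytes(unsigned long max_bytes,
                                  unsigned long entries)
{
    assert(entries > 0);
    return max_bytes / entries;
}
```

With the stock values, dir_max_bytes(2048, 128) is 262144 bytes (256KB).
Either patch alone gives 1048576 bytes (1MB): 8192 * 128 or 2048 * 512.
Dividing the 256KB limit by the 2000 to 4000 messages mentioned above
suggests roughly 65 to 131 bytes per directory entry, which is only an
estimate derived from those figures.
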
I'm still looking for the real solution that will allow directories to
scale to similar sizes as files (about 2^31 bytes). Directories will
have to be stored in container files instead of RVM, and the current
'copy to VM, modify, copy back to RVM' way of manipulating directories
should change to 'log changes in VM, store log in RVM, reliably apply to
container file' etc.

Problem #2, server performance.

Coda servers have significant overhead to reliably commit changes, or
actually the kernel seems to have problems with its fsync
implementation, because that's where we're blocking a lot of the time. I
believe that at some point it was measured that we couldn't do more than
about 100 directory-modifying operations per second
(create/rename/delete). So handling an incoming email stream of 50 mails
per second is already pushing it.

Similarly, when a client has read new messages, moving them to 'cur' is
very slow. It is less noticeable when the mail client commits the change
whenever the user has read a message (e.g. pine), and more noticeable
when it batches these types of updates until it actually changes folders
(e.g. mutt).

I don't really know how to optimize this. A lot of this overhead could
be a result of the way directories are currently manipulated (allocate
VM / copy to VM / modify / copy back to RVM). But I could be wrong, as I
haven't actually profiled any clients or servers.

Jan

Received on 2001-07-19 09:19:46