On Fri, Mar 03, 2006 at 09:18:28AM -0500, Greg Troxel wrote:
> But that is not really true, the inode we return in the coda_open
> reply is not a Coda inode, but the device/inode pair of the container
> file, so it really is a true ino_t.
>
> I have changed NetBSD-current back to ino_t.  But this means an ABI
> change in the kernel/user protocol.  My inclination is to say "oh
> well, you really have to compile venus on the same system you run it
> on", but then I wonder if we should be bumping the number.  Presumably

I don't see it as an ABI change, since venus correctly uses ino_t. It
was an incorrect kernel-side change from ino_t to uint32_t that broke
things.

> the Linux ABI is different from the NetBSD ABI anyway - I wouldn't

Yes, although the difference is really small. For some reason the Linux
kernel module actually parses the same BSD directory format and
translates it into a Linux equivalent inside the kernel. I think the
only difference is that Linux uses CODA_OPEN_BY_FD, which expects an
open file descriptor for the container file.

This was needed because not all filesystems actually guarantee unique
inode numbers. ReiserFS was the first where inode numbers may collide
(I think the combination of the parent directory inode number and the
file inode number is still guaranteed unique). ramfs and tmpfs only
exist at the pagecache level, and some journalling filesystems associate
journalling with the file handle. So even if we manage to open a file
based on the device/inode number, we might actually lose updates since
they are never committed to the journal.

But really that should be the only change. Interestingly, even on Linux
venus will still accept the CODA_OPEN upcall, so you should mostly be
able to run a Linux-compiled binary on FreeBSD (and possibly NetBSD) if
you have the necessary run-time support for Linux binaries. You are
right that there may be some issues around the mounting of /coda.

> I have been having a lot of crashes on recent NetBSD with an old
> venus, in coda_readlink.  I wonder if this is related to the ino_t
> changes (venus compiled with old ino_t definition, kernel with wrong
> type in coda.h, so they mostly match).  But a system with a recent
> kernel still crashes.

I don't really think it can be related, since readlink uses its own
upcall and doesn't rely on CODA_OPEN to get the link contents. Is this a
kernel crash, or does venus die?

> NetBSD's coda.h has:
>
> static inline ino_t coda_f2i(CodaFid *fid)
> {
>     if (!fid) return 0;
>     return (fid->opaque[1] + (fid->opaque[2]<<10) + (fid->opaque[3]<<20));
> }
>
> This will not necessarily produce unique inode numbers, but one can't
> collapse the opaque fields down to inodes anyway.  I presume that's ok
> and these are just to provide an inode to user space.  But perhaps we
> should do better with the 64-bit space on NetBSD.

Correct, we have a 128-bit file identifier in venus, which we try to map
onto a 32-bit value. Linux still uses 32-bit inode numbers, but the
iget4 operation makes collisions not that critical anymore; we
effectively use the 32-bit space to identify hash buckets. The only
problem is userspace programs that try to keep track of inode numbers to
find hardlinks. I guess as long as we avoid using hardlinks, so that
every object has i_nlink == 1, this should be fine.

But back to the coda_f2i function. This one should really be kept in
sync between userspace and kernel space because of the way directory
contents are passed down to the kernel. Venus creates BSD-style
directory entries with (name,ino) pairs, and uses its own copy of
coda_f2i to map from fids to inode numbers. Now the kernel should use
the same function, otherwise we end up with different inode numbers
identifying the same object.
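
To make the mismatch concrete, here is a minimal sketch that puts the add-based mapping from the NetBSD coda.h quoted above next to the XOR-based form that venus appears to use (quoted later in this message). The struct layout (four 32-bit words) and the names f2i_kernel, f2i_venus and f2i_agree are assumptions made for illustration, and the result is fixed at 32 bits rather than ino_t so the behaviour does not depend on the platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed fid layout: realm, volume, vnode, uniquifier (32 bits each). */
struct CodaFid {
    uint32_t opaque[4];
};

/* Add-based mapping, following the NetBSD coda.h quoted above.
 * Note that opaque[0] (the realm) is not used at all. */
static inline uint32_t f2i_kernel(const struct CodaFid *fid)
{
    if (!fid)
        return 0;
    return fid->opaque[1] + (fid->opaque[2] << 10) + (fid->opaque[3] << 20);
}

/* XOR-based mapping that also folds in the realm, following the
 * venus version quoted later in this message. */
static inline uint32_t f2i_venus(const struct CodaFid *fid)
{
    if (!fid)
        return 0;
    return fid->opaque[3] ^ (fid->opaque[2] << 10) ^
           (fid->opaque[1] << 20) ^ fid->opaque[0];
}

/* Hypothetical helper: nonzero when both sides would report the same
 * inode number for this fid. */
static inline int f2i_agree(const struct CodaFid *fid)
{
    return f2i_kernel(fid) == f2i_venus(fid);
}
```

For a typical fid the two functions return different values, so an inode number that venus wrote into a directory entry would not match the one the kernel computes for the same object.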

This is really the only place where venus knows about Coda inodes, and
it would be a lot cleaner if it just sent down (name,fid) tuples for the
directory entries and left all the fid->ino_t mapping up to the kernel.

I was kind of surprised not to see the opaque[0] (realm) value being
used, and there is a pretty big difference: venus actually seems to use
a very different calculation, which seems to make a bit more sense for
trying to avoid collisions,

    static __inline__ ino_t coda_f2i(struct CodaFid *fid)
    {
        if (!fid) return 0;
        return (fid->opaque[3] ^ (fid->opaque[2]<<10) ^
                (fid->opaque[1]<<20) ^ fid->opaque[0]);
    }

The adds and shifts are somewhat intentional. The fid consists of Realm,
Volume, Vnode, Uniquifier. The realm is essentially a pointer, so it is
somewhat arbitrary, but will be within the range of venus's address
space (so a fairly limited 'random' value). The volume identifiers have
2 distinct parts: the top byte identifies the server (or 0x7f for
replicated), the lowest bytes the volume number, which is bumped by one
for every created volume. The vnode numbers are assigned by the server
and count up from 0 from the time the volume was created (first vnode
1), while the uniquifier is typically assigned by the client and is a
counter that is initialized with a random value when venus is
initialized.

Now if you have 64 bits available, I would expect it to be slightly
better to xor the randomizing ids (realm,unique) with the counter values
(volume,vnode),

    ino = (ino_t)(opaque[0] ^ opaque[2]) << 32 |
          (ino_t)(opaque[1] ^ opaque[3]);

You would have to make a corresponding change in userspace though.

Jan

Received on 2006-03-04 01:44:16
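
The 64-bit suggestion above could be written out as a function along the following lines. This is only a sketch under assumptions: the struct layout (four 32-bit words) and the name coda_f2i64 are hypothetical, and it returns uint64_t instead of ino_t so that it does not depend on the width of ino_t on any particular system.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed fid layout: realm, volume, vnode, uniquifier (32 bits each). */
struct CodaFid {
    uint32_t opaque[4];
};

/* Hypothetical 64-bit mapping: pair each "random" id with a counter.
 * High word: realm ^ vnode, low word: volume ^ uniquifier. */
static inline uint64_t coda_f2i64(const struct CodaFid *fid)
{
    uint32_t hi, lo;

    if (!fid)
        return 0;
    hi = fid->opaque[0] ^ fid->opaque[2];
    lo = fid->opaque[1] ^ fid->opaque[3];
    /* Widen before shifting so the high word is not lost. */
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}
```

As with the 32-bit versions, this would only be useful if the kernel and venus both adopted exactly the same function.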