On Thu, Jul 07, 2005 at 09:41:03AM -0600, Patrick Walsh wrote:
> # ls /root/pool_scm/r
> readline2.2.1-2.2.1-4.i386.rpm                rpm-4.0.4-7x.20.i386.rpm
> readline-4.2-2.i386.rpm                       rsh-0.17-18.AS21.2.i386.rpm
> restore-default-system-1.0-20031001.i386.rpm  rsh-0.17-18.AS21.4.i686.rpm
> rootfiles-7.2-1.noarch.rpm
> # du -s -h /root/pool_scm/r
> 2.6M    /root/pool_scm/r
> # ls r
> ls: r/readline-4.2-2.i386.rpm: No such device

Ok, 7 directory entries wouldn't be enough to fill a directory.

> At this point, venus has crashed.  The console.log file has the
> erroneous seeming errors that I pasted before, but to show again:
>
> ***LWP (0x810ec50): Select returns error: 4
>
> 09:28:28 worker::main Got a bogus opcode 36
> 09:29:30 readline-4.2-2.i386.rpm (606e1fc8.7f000003.1018.4de)
> inconsistent!
> 09:29:30 fatal error -- fsobj::dir_Create: (dir225,
> 606e1fc8.7f000003.fffffffc.80002) Create failed!

This is very strange. I looked at the source, and we are trying to add
a directory entry to some unknown directory (the name or fid of the
parent in which we are trying to create the entry is not logged). We do
know that the new entry has the name "dir225" and that it is pointing
at a fake object in the same volume as the inconsistent rpm file.

However, server-server conflicts do not in any way try to create names
or anything. The lookup or getattr operation returns EINCONS and this
is mapped to faked stat data right before we send the reply back to the
kernel. As far as I know there isn't even an actual filesystem object
associated with the inconsistent object, since the servers disagree
about its contents.

Only reintegration-related expansion changes directory contents, since
in that case we do have a locally cached copy of the object and it has
to be modified before we can show the global version. I also don't see
how anything in that volume would even have a name like 'dir225': there
are the [a-z] directories and a bunch of *.rpm files.

But somehow these two must be related, since they seem to happen so
reliably right after each other.

> I should have mentioned that I already tried this.  And as you can see
> from the above terminal transcript, it had little effect.
>
> Any other thoughts?

No idea, it just doesn't make sense. I don't see how a server-server
conflict could possibly get into the expansion code that is used when a
reintegration fails, if you are simply doing an 'ls'. I also don't
understand why it is trying to create a directory named 'dir225' when
all the names in the volume are either a single character 'a-z' or
'*.rpm'.

Maybe start venus with loglevel 100 (venus -init -d 100) and repeat the
same thing. At that point the log might show how we're getting to this
point and if those two events (the inconsistency and the crash) are
really related or not.

Jan

Received on 2005-07-07 14:23:33
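
One rough way to check whether the two events really follow each other in a log is to pair each "fatal error" entry with the most recent "inconsistent!" entry and look at the time gap. The Python sketch below is only an illustration: it assumes entries start with an HH:MM:SS timestamp as in the console.log excerpt quoted above, and it treats lines without a timestamp as continuations of the previous entry (an assumption made to cope with mail line wrapping). The function names are hypothetical, and the output at loglevel 100 may well look different.

```python
import re

# Assumed entry format, taken from the quoted excerpt: "HH:MM:SS message".
ENTRY = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\s+(.*)$")


def parse_entries(text):
    """Return a list of (seconds_since_midnight, message) tuples.

    Lines without a leading timestamp are appended to the previous entry.
    Lines that appear before the first timestamp are ignored.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = ENTRY.match(line)
        if m:
            h, mi, s, msg = m.groups()
            entries.append([int(h) * 3600 + int(mi) * 60 + int(s), msg])
        elif entries:
            entries[-1][1] += " " + line
    return [(t, msg) for t, msg in entries]


def gaps_before_fatal(entries):
    """For each 'fatal error' entry, seconds since the latest 'inconsistent!' entry."""
    gaps = []
    last_incons = None
    for t, msg in entries:
        if "inconsistent!" in msg:
            last_incons = t
        elif "fatal error" in msg and last_incons is not None:
            gaps.append(t - last_incons)
    return gaps


# The excerpt from the quoted message, with the mail quoting stripped.
SAMPLE = """\
***LWP (0x810ec50): Select returns error: 4
09:28:28 worker::main Got a bogus opcode 36
09:29:30 readline-4.2-2.i386.rpm (606e1fc8.7f000003.1018.4de)
inconsistent!
09:29:30 fatal error -- fsobj::dir_Create: (dir225,
606e1fc8.7f000003.fffffffc.80002) Create failed!
"""

entries = parse_entries(SAMPLE)
print(gaps_before_fatal(entries))
```

A gap of a second or less on every repeat would support the idea that the inconsistency and the crash are related; it would not by itself show how the code gets from one to the other.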