Hello Jan,

On Fri, Oct 08, 2010 at 01:34:51AM -0400, Jan Harkes wrote:
> What I did see was the following commit [1] which I believe may fix the
> problem either way, as it removes/replaces the test where ENOENT is
> returned when revalidation fails.
>
> It looks like that patch went into 2.6.36-rc2.

For about a week I could not observe the problem with 2.6.35, despite
heavily stressing the scripts which are known to trigger the bug (this
is actually way better than the behaviour with 2.6.32).

But now I have seen at least one occasion when the scripts failed, which
suggests that the bug is still there in 2.6.35.

Looking forward to testing 2.6.36! The bug is a real PITA when you happen
to hit it.

> I have not reliably reproduced your problem and for some reason am
> unable to reproduce on any machine with >1GB of main memory. It may be

Hmm. This really looks like a(n in-memory) caching issue.

> unusable for you, but it is a quite hard to trigger race condition that
> doesn't affect most people. It seems to require storing binaries and
> shared libraries in Coda which are accessed through a recursive and deep
> symlink forest.

I see. Actually, I tried to arrange simpler test cases before, with fewer
binaries/libraries on Coda, and sometimes I could trigger the bug, but it
was really hard to reproduce reliably.

Now I am just using my everyday production scripts, which rely on forests
of files and symlinks. This does not help the analysis, but at least it
gives some measure of stability/brokenness with different kernels.

> Other file system developers have occasionally hit on the same problem
> [2][3], Nick's patch seems to be the first one that has actually been
> accepted.
>
> [1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2e2e88ea8c3bd9e1bd6e42faf047a4ac3fbb3b2f
> [2] http://marc.info/?l=linux-fsdevel&m=121936252707440&w=2
> [3] http://marc.info/?l=linux-fsdevel&m=125378110215043&w=2

Thanks Jan!

Regards,
Rune

Received on 2010-10-13 03:48:23