On Mon, May 10, 2004 at 01:16:31AM -0700, Steve Simitzis wrote:
> are there any practical limits to the size of venus's cache? i've been
> running caches in the GB range, and i seem to have problems as soon as
> df -i /coda hits 94%. (it also seems impossible to go above 94% for
> reasons that are unknown to me.)

I don't know. My cache is typically on the order of 200-250MB.

> for example, with a 4GB cache, i've been getting these errors:
>
> [ W(74) : 0000 : 08:34:20 ] fsdb::FreeFsoCount: counts disagree (174762 - 163840 != 10921)
> [ W(74) : 0000 : 08:34:20 ] fsdb::FreeFsoCount: counts disagree (174762 - 163840 != 10921)

That does seem like a correct message (174762 - 163840 = 10922 and not
10921), but I don't know why it happened. Venus is telling us that it
has a maximum of 174762 files and 163840 are considered used, but there
are only 10921 on the freelist. So there is one object that is neither
considered in use nor found on the freelist. It is probably this
'missing object' that triggers the crash.

> if i bring the cache down to anywhere between 1.9GB - 3GB, i start
> seeing these errors, in addition to the others above:
>
> [ W(34) : 0000 : 00:53:52 ] fsobj::StatusEq: (5a684608.7f000008.93726.504e9), VVs differ

I believe this is somewhat normal; it simply means that when we are
fetching a file, we notice that it had changed in the meantime. So we
update the status to reflect this.

We count cache sizes in 'blocks', so overall the size could be
2^31 * 1024 (or 512), but I'm sure there are places where we actually
work with this number in bytes, in which case >2GB would overflow a
simple integer.

> venus crashed within an hour of hitting 94% on the 3GB venus. i haven't
> been able to crash the 1.9GB venus yet, though i'm still keeping my eye
> on it. (i purposely chose the number 1.9GB because it's less than 2GB.)
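
To make the blocks-versus-bytes concern concrete, here is a minimal sketch of what happens when a cache size counted in 1024-byte blocks is converted to bytes and kept in a wrapping signed 32-bit integer. The helper names (`as_int32`, `cache_bytes_as_int32`) are invented for illustration and are not Venus code, Python is used only to model the arithmetic, and nothing here confirms that Venus performs this conversion at any particular place.

```python
def as_int32(value):
    """Reduce value to what a wrapping signed 32-bit integer would hold.

    Assumption: two's-complement wraparound, which is what most C
    compilers produce in practice for a plain 32-bit int.
    """
    return (value + 2**31) % 2**32 - 2**31


def cache_bytes_as_int32(blocks, block_size=1024):
    """Byte size of a cache of `blocks` blocks, squeezed into 32 bits."""
    return as_int32(blocks * block_size)


GB_IN_BLOCKS = 1024 * 1024  # number of 1024-byte blocks in one GB

# The three cache sizes mentioned in the thread.
for label, blocks in [
    ("1.9GB", int(1.9 * GB_IN_BLOCKS)),
    ("3GB", 3 * GB_IN_BLOCKS),
    ("4GB", 4 * GB_IN_BLOCKS),
]:
    print(label, blocks, cache_bytes_as_int32(blocks))
```

Under these assumptions the 1.9GB cache still fits (2040109056 bytes), the 3GB cache comes out negative (-1073741824), and the 4GB cache wraps to 0, which would be consistent with trouble appearing only above 2GB.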
>
> also, as soon as i hit that magic 94% mark, i start seeing lots of these,
> whenever the cache is in the GB range:
>
> [ W(34) : 0000 : 01:05:01 ] fsobj::StatusEq: (5a684608.7f000003.1.1), LinkCount 10 != 266

Hmm, that first number is the local number, the second is what the
server reported. Locally we use an unsigned char, which maxes out at
255 and matches the datatype that the kernel is using. But the
client-server protocol uses an RPC2_Integer, which clearly can be
larger. 266 is precisely 256 + 10, which would match a wraparound on
the link count field.

But... we don't have cross-directory hardlinks, and I find it hard to
believe that you would have so many links to one object in a single
directory.

Jan

Received on 2004-05-10 15:05:14
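
The link count wraparound described above can be checked with a few lines of arithmetic. This is only a sketch: `stored_link_count` is a hypothetical helper, not a function in Venus, and it assumes the local field is an 8-bit unsigned value that silently keeps the low 8 bits of whatever the server reports.

```python
def stored_link_count(server_count):
    """Value left after storing a count in an 8-bit unsigned field.

    Assumption: the field keeps only the low 8 bits, i.e. the count
    wraps modulo 256, as a C unsigned char would.
    """
    return server_count & 0xFF


# Values from the log line: the server reported 266, the client held 10.
server_count = 266
local_count = stored_link_count(server_count)
print(local_count, "!=", server_count)

# Counts up to 255 survive unchanged; 256 is the first value that wraps.
print(stored_link_count(255), stored_link_count(256))
```

With these assumptions a server-side count of 266 is stored locally as 10, matching the `LinkCount 10 != 266` warning. It does not explain why the server would report 266 links in the first place.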