On Sat, Mar 13, 2004 at 12:37:00AM -0800, Steve Simitzis wrote:
> when running backups, codasrv dies with this:
>
> Assertion failed: SRV_RVM(VolumeList[rwIndex]).data.nlargeLists == SRV_RVM(VolumeList[backupIndex]).data.nlargeLists, file "/usr/src/redhat/BUILD/coda-6.0.3/coda-src/volutil/vol-backup.cc", line 449

This looks a lot like an older bug which was fixed in 6.0.3.

> and this in SrvLog:
>
> 00:06:50 GetVolObj: Volume (1000052) already write locked
> 00:06:50 GrabFsObj, GetVolObj error Resource temporarily unavailable
> 00:07:03 GetAttrPlusSHA: Computing SHA 1000052.44862.2615a, disk.inode=3fc4d
> 00:07:04 GetAttrPlusSHA: Computing SHA 1000004.236a.2456, disk.inode=37f6
> 00:07:05 GetVolObj: Volume (1000052) already write locked
> 00:07:05 GrabFsObj, GetVolObj error Resource temporarily unavailable
> 00:07:06 GetAttrPlusSHA: Computing SHA 1000005.126b2.97fa, disk.inode=d407
> 00:07:06 GrowVnodes: growing Large list from 15744 to 16000 for volume 0x1000059

Did these messages get logged around the same time as backups were active?

It looks like the original volume was grown at the same time as the backup volume was being cloned. So when the clone is done, it doesn't match the size of the original volume and the assertion triggers. This shouldn't happen all that often, and a server restart should fix it.

I'll have a look to see how I can avoid the race, either by temporarily blocking the growth of the original volume, or by restarting the clone when the numbers don't match up.

Jan

Received on 2004-03-14 21:31:40
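
To make the second option concrete, here is a rough, self-contained model of "restart the clone when the numbers don't match up". It is only a toy sketch in Python, not the code in vol-backup.cc: `Volume`, `clone_once`, `clone_with_retry` and the `interference` callbacks are all made-up names for illustration, and the only thing modelled is the large vnode list size that the failing assertion compares.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Volume:
    """Hypothetical stand-in for a volume; tracks only the large list size."""
    nlarge_lists: int


def clone_once(original: Volume,
               during_clone: Optional[Callable[[], None]] = None) -> Volume:
    # Snapshot the list size at the moment the clone starts.
    backup = Volume(original.nlarge_lists)
    # Whatever runs here models activity concurrent with the clone,
    # such as the server growing the original volume's large list.
    if during_clone is not None:
        during_clone()
    return backup


def clone_with_retry(original: Volume,
                     interference: Sequence[Callable[[], None]] = (),
                     max_attempts: int = 3) -> Tuple[Volume, int]:
    """Repeat the clone until its size matches the original.

    `interference` holds one optional callback per attempt (an assumption
    of this sketch, used to simulate the race deterministically).
    Returns the matching backup and the number of attempts it took.
    """
    for attempt in range(1, max_attempts + 1):
        hook = interference[attempt - 1] if attempt <= len(interference) else None
        backup = clone_once(original, hook)
        # Same comparison as the failing assertion, but a mismatch leads
        # to another attempt instead of aborting the server.
        if backup.nlarge_lists == original.nlarge_lists:
            return backup, attempt
    raise RuntimeError("clone did not match the original after retries")


if __name__ == "__main__":
    vol = Volume(15744)

    def grow() -> None:
        # Simulates the growth seen in SrvLog (15744 to 16000) during the clone.
        vol.nlarge_lists = 16000

    backup, attempts = clone_with_retry(vol, interference=[grow])
    print(backup.nlarge_lists, attempts)
```

The first option (temporarily blocking growth of the original) would instead hold off the growth for the duration of the clone, which avoids redoing work but makes writers wait while a backup is running.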