Coda File System

Re: backup breakage

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Sun, 14 Mar 2004 21:26:59 -0500

On Sat, Mar 13, 2004 at 12:37:00AM -0800, Steve Simitzis wrote:
> when running backups, codasrv dies with this:
> 
> Assertion failed: SRV_RVM(VolumeList[rwIndex]).data.nlargeLists == SRV_RVM(VolumeList[backupIndex]).data.nlargeLists, file "/usr/src/redhat/BUILD/coda-6.0.3/coda-src/volutil/vol-backup.cc", line 449

This looks a lot like an older bug which was fixed in 6.0.3.

> and this in SrvLog:
> 
> 00:06:50 GetVolObj: Volume (1000052) already write locked
> 00:06:50 GrabFsObj, GetVolObj error Resource temporarily unavailable
> 00:07:03 GetAttrPlusSHA: Computing SHA 1000052.44862.2615a, disk.inode=3fc4d
> 00:07:04 GetAttrPlusSHA: Computing SHA 1000004.236a.2456, disk.inode=37f6
> 00:07:05 GetVolObj: Volume (1000052) already write locked
> 00:07:05 GrabFsObj, GetVolObj error Resource temporarily unavailable
> 00:07:06 GetAttrPlusSHA: Computing SHA 1000005.126b2.97fa, disk.inode=d407
> 00:07:06 GrowVnodes: growing Large list from 15744 to 16000 for volume 0x1000059

Did these messages get logged around the same time as the backups were
active? It looks like the original volume was grown at the same time
as the backup volume was being cloned. So when the clone is done, its
large list count no longer matches that of the original volume, and the
assertion triggers.

This shouldn't happen very often, and a server restart should fix it.
I'll have a look to see how I can avoid the race, either by temporarily
blocking the growth of the original volume or by restarting the clone
when the numbers don't match up.
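
As a rough illustration of the second option (restarting the clone when
the numbers don't match), here is a minimal standalone sketch. It is not
the actual vol-backup.cc code: the VolumeLists struct, the cloneWithRetry
function, the nsmallLists field and the afterCopy hook are all
hypothetical names made up for this example. Only nlargeLists comes from
the assertion above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>

// Hypothetical stand-in for a volume's list sizes; not Coda's real RVM layout.
struct VolumeLists {
    std::size_t nlargeLists;
    std::size_t nsmallLists;
};

// Snapshot the original's list sizes, then re-check them once the copy is
// done. If the original grew in the meantime, discard the clone and retry.
// `afterCopy` is an illustrative hook that lets a caller simulate growth
// happening between the snapshot and the check.
std::optional<VolumeLists> cloneWithRetry(
    const VolumeLists &original, int maxAttempts,
    const std::function<void(int)> &afterCopy = {})
{
    assert(maxAttempts > 0);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        VolumeLists clone = original;       // sizes as seen when the clone starts
        if (afterCopy) afterCopy(attempt);  // window in which growth may occur
        if (clone.nlargeLists == original.nlargeLists &&
            clone.nsmallLists == original.nsmallLists)
            return clone;                   // sizes still agree, keep this clone
        // Sizes diverged while cloning: drop this attempt and start over.
    }
    return std::nullopt;                    // report failure instead of asserting
}
```

The point of the sketch is only that a mismatch becomes a retry (or a
reported failure) rather than an assertion that takes the server down.
The first option, blocking growth while the clone runs, would need a lock
shared with whatever grows the lists and is not shown here.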

Jan
Received on 2004-03-14 21:31:40