Coda File System

more backup woes

From: Steve Simitzis <steve_at_saturn5.com>
Date: Sat, 10 Apr 2004 02:43:47 -0700
When the disk fills up on the backup server, codasrv freezes and
hangs indefinitely until I kill it with -9, and the venus clients are
stuck waiting forever. Note that I'm still running 6.0.3 - perhaps
this isn't a problem in the latest rev.
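
In the meantime I may just guard the run with a free-space check
before the dumps start. A minimal sketch, assuming the dumps land
under /vice/backup (the path in the log below) and with
run_nightly_backup as a hypothetical stand-in for whatever actually
drives the dumps:

    #!/usr/bin/env python
    # Stopgap sketch: skip the nightly dump run when the backup spool
    # is nearly full, since a full disk wedges codasrv. The spool path
    # comes from the log below; run_nightly_backup is a hypothetical
    # stand-in, not a real Coda tool.
    import os
    import subprocess
    import sys

    SPOOL = "/vice/backup"        # where the NewDump files land
    MIN_FREE = 2 * 1024 ** 3      # refuse below 2 GB; tune to dump size

    def free_bytes(path):
        st = os.statvfs(path)     # f_bavail: blocks free to non-root
        return st.f_bavail * st.f_frsize

    if free_bytes(SPOOL) < MIN_FREE:
        sys.exit("only %d bytes free on %s; skipping backup run"
                 % (free_bytes(SPOOL), SPOOL))

    subprocess.call(["/vice/bin/run_nightly_backup"])  # hypothetical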

Here's what I found in the logs:

00:05:00 Worker4: Unbinding RPC connection 8135
00:05:00 Worker2: Unbinding RPC connection 15785
00:05:00 Worker0: Unbinding RPC connection 6270
00:05:00 RevokeWBPermit on conn 2679 returned 0
00:05:00 Worker5: Unbinding RPC connection 1644
00:05:00 Worker1: Unbinding RPC connection 9824
00:05:00 Worker3: Unbinding RPC connection 9412
00:05:00 RevokeWBPermit on conn 174a returned 0
00:05:00 RevokeWBPermit on conn 2679 returned 0
00:05:01 RevokeWBPermit on conn 174a returned 0
00:05:01 RevokeWBPermit on conn 174a returned 0
00:05:01 Worker4: Unbinding RPC connection 14089
00:05:01 Worker0: Unbinding RPC connection 8323
00:05:01 Worker3: Unbinding RPC connection 6765
00:05:01 RevokeWBPermit on conn 34e9 returned 0
00:05:01 GrowVnodes: growing Small list from 240896 to 242688 for volume 0x1000064
00:05:01 RevokeWBPermit on conn 34e9 returned 0
00:05:03 RevokeWBPermit on conn 34e9 returned 0
00:05:58 GetVolObj: Volume (1000052) already write locked
00:05:58 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:06:13 GetVolObj: Volume (1000052) already write locked
00:06:13 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:06:24 RevokeWBPermit on conn 2679 returned 0
00:06:28 GetVolObj: Volume (1000052) already write locked
00:06:28 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:06:43 GetVolObj: Volume (1000052) already write locked
00:06:43 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:06:58 GetVolObj: Volume (1000052) already write locked
00:06:58 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:07:13 GetVolObj: Volume (1000052) already write locked
00:07:13 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:07:18 GetVolObj: Volume (1000052) already write locked
00:07:18 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:07:28 GetVolObj: Volume (1000052) already write locked
00:07:28 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:07:33 GetVolObj: Volume (1000052) already write locked
00:07:33 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:07:44 GetVolObj: Volume (1000052) already write locked
00:07:44 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:07:48 GetVolObj: Volume (1000052) already write locked
00:07:48 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:07:59 GetVolObj: Volume (1000052) already write locked
00:07:59 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:08:03 GetVolObj: Volume (1000052) already write locked
00:08:03 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:08:14 GetVolObj: Volume (1000052) already write locked
00:08:14 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:08:18 GetVolObj: Volume (1000052) already write locked
00:08:18 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:08:22 VAttachVolumeById: vol 1000064 (sg.media.members.hashed.0.backu) attached and online
00:08:22 S_VolMakeBackups: backup (1000064) made of volume 1000052
00:08:22 VAttachVolumeById: vol 1000065 (sg.media.site.0.backup) attached and online
00:08:22 S_VolMakeBackups: backup (1000065) made of volume 1000006 
00:08:22 GrowVnodes: growing Small list from 83456 to 83712 for volume 0x1000066
00:08:27 RevokeWBPermit on conn 2679 returned 0
00:08:29 CopyOnWrite: Copying directory vnode = 36711
00:08:29 ****** WARNING entry at 0x81800f8 already has deqing set!

00:08:29 RevokeWBPermit on conn 174a returned 0
00:08:33 CopyOnWrite: Copying directory vnode = 22001
00:08:39 ****** WARNING entry at 0x81800f8 already has deqing set!

00:08:50 VGetVnode: memory vnode was snatched away
00:08:50 VAttachVolumeById: vol 1000066 (sg.media.girls.0.backup) attached and online
00:08:50 S_VolMakeBackups: backup (1000066) made of volume 1000005 
00:08:50 GrowVnodes: growing Small list from 39680 to 39936 for volume 0x1000061
00:09:00 GetVolObj: Volume (1000004) already write locked
00:09:00 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:09:14 VAttachVolumeById: vol 1000061 (sg.media.groups.0.backup) attached and online
00:09:14 S_VolMakeBackups: backup (1000061) made of volume 1000004 
00:09:15 CopyOnWrite: Copying directory vnode = 1
00:09:21 VAttachVolumeById: vol 1000062 (sg.media.0.backup) attached and online
00:09:21 S_VolMakeBackups: backup (1000062) made of volume 1000002 
00:09:21 VAttachVolumeById: vol 1000063 (sg.0.backup) attached and online
00:09:21 S_VolMakeBackups: backup (1000063) made of volume 1000001 
00:09:21 NewDump: file /vice/backup/7f000008.1000052.newlist volnum 7f000008 id 1000064 parent 1000052
00:09:31 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:31 Dump: DumpFile failed.
00:09:31 
00:09:32 Dump failed due to FlushBuf failure.
00:09:32 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:32 NewDump: file /vice/backup/7f000005.1000006.newlist volnum 7f000005 id 1000065 parent 1000006
00:09:33 S_VolNewDump: Incremental volume dump succeeded
00:09:33 NewDump: file /vice/backup/7f000004.1000005.newlist volnum 7f000004 id 1000066 parent 1000005
00:09:38 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:38 Dump: DumpFile failed.
00:09:38 
00:09:38 Dump failed due to FlushBuf failure.
00:09:38 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:38 NewDump: file /vice/backup/7f000003.1000004.newlist volnum 7f000003 id 1000061 parent 1000004
00:09:38 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:38 
00:09:38 Dump failed due to FlushBuf failure.
00:09:38 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:38 NewDump: file /vice/backup/7f000001.1000002.newlist volnum 7f000001 id 1000062 parent 1000002
00:09:38 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:38 
00:09:38 Dump failed due to FlushBuf failure.
00:09:38 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:38 NewDump: file /vice/backup/7f000000.1000001.newlist volnum 7f000000 id 1000063 parent 1000001
00:09:38 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:38 Dump failed due to FlushBuf failure.
00:09:38 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:38 MarkAsAncient succeeded
00:09:38 NewDump: file /vice/backup/7f000008.1000052.newlist volnum 7f000008 id 1000064 parent 1000052
00:09:38 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:38 DumpVnode: BumpByteString failed.
00:09:38 
00:09:38 Dump failed due to FlushBuf failure.
00:09:38 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:38 NewDump: file /vice/backup/7f000004.1000005.newlist volnum 7f000004 id 1000066 parent 1000005
00:09:40 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:40 Dump: DumpFile failed.
00:09:40 
00:09:40 Dump failed due to FlushBuf failure.
00:09:40 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:40 NewDump: file /vice/backup/7f000003.1000004.newlist volnum 7f000003 id 1000061 parent 1000004
00:09:40 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:40 
00:09:40 Dump failed due to FlushBuf failure.
00:09:40 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:40 NewDump: file /vice/backup/7f000001.1000002.newlist volnum 7f000001 id 1000062 parent 1000002
00:09:40 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:40 
00:09:40 Dump failed due to FlushBuf failure.
00:09:40 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:40 NewDump: file /vice/backup/7f000000.1000001.newlist volnum 7f000000 id 1000063 parent 1000001
00:09:40 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:40 Dump failed due to FlushBuf failure.
00:09:40 S_VolNewDump: Incremental volume dump failed with status = 200
00:10:00 Worker1: Unbinding RPC connection 13404
00:10:00 Worker2: Unbinding RPC connection 11893
00:10:00 RevokeWBPermit on conn 174a returned 0
00:10:01 Worker2: Unbinding RPC connection 13670
00:10:01 Worker1: Unbinding RPC connection 15384
00:10:01 Worker2: Unbinding RPC connection 2102
00:10:01 Worker1: Unbinding RPC connection 16233
00:10:01 Worker2: Unbinding RPC connection 400
00:10:01 RevokeWBPermit on conn 34e9 returned 0
00:10:01 RevokeWBPermit on conn 2679 returned 0
00:10:07 RevokeWBPermit on conn 174a returned 0
00:10:13 RevokeWBPermit on conn 34e9 returned 0
00:22:37 Unbinding RPC2 connection 9284
00:22:37 Unbinding RPC2 connection 15551
00:22:37 Unbinding RPC2 connection 6441
00:22:37 Unbinding RPC2 connection 4153
00:22:37 Unbinding RPC2 connection 521
00:22:37 Unbinding RPC2 connection 10607
00:22:37 Unbinding RPC2 connection 10176
00:22:37 Unbinding RPC2 connection 9521
00:22:37 Unbinding RPC2 connection 1709
00:22:37 Unbinding RPC2 connection 6931
00:22:37 Unbinding RPC2 connection 8475
00:22:37 Unbinding RPC2 connection 3482
00:22:37 Unbinding RPC2 connection 12772
00:22:37 Unbinding RPC2 connection 1300
00:22:37 Unbinding RPC2 connection 3028
00:22:37 Unbinding RPC2 connection 1494
00:22:37 Unbinding RPC2 connection 2446
00:22:37 Unbinding RPC2 connection 9010
00:22:37 Unbinding RPC2 connection 10614
00:22:37 Unbinding RPC2 connection 6074
00:22:37 Unbinding RPC2 connection 673
00:22:39 Callback failed RPC2_NAKED (F) for ws 209.237.248.36:16386
00:22:39 Unbinding RPC2 connection 16302
00:22:39 Unbinding RPC2 connection 541
00:22:39 Unbinding RPC2 connection 15982
00:22:39 Unbinding RPC2 connection 9597
00:22:39 Unbinding RPC2 connection 1055
00:22:39 Unbinding RPC2 connection 15853
00:22:39 Unbinding RPC2 connection 3250
00:22:39 Unbinding RPC2 connection 1789
00:22:39 Unbinding RPC2 connection 2726
00:22:39 Unbinding RPC2 connection 6904
00:22:39 Unbinding RPC2 connection 7823
00:22:39 Unbinding RPC2 connection 14134
00:22:39 Unbinding RPC2 connection 10805
00:22:39 Unbinding RPC2 connection 12746
00:22:39 Unbinding RPC2 connection 2324
00:22:39 Unbinding RPC2 connection 13376
00:22:39 Unbinding RPC2 connection 10309

Around two hours later, I noticed it was hung, and I killed it.
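
For what it's worth, my guess at the FlushBuf/WriteDump RPC2_SEFAIL3
failures is that the underlying writes to /vice/backup started
failing with ENOSPC once the disk filled, and codasrv never recovers
from that. Purely as an illustration of the kind of check I mean
(this is not the actual codasrv code), in Python:

    # Illustrative only; not the codasrv source. The idea is that a
    # dump writer should turn ENOSPC into a clean, immediate failure
    # so locks get released, instead of leaving the server wedged.
    import errno
    import os

    def write_dump(fd, buf):
        done = 0
        while done < len(buf):
            try:
                n = os.write(fd, buf[done:])
            except OSError as e:
                if e.errno == errno.ENOSPC:
                    raise RuntimeError("backup spool full; aborting dump")
                raise
            if n == 0:
                raise RuntimeError("short write to dump file; aborting")
            done += n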

-- 

steve simitzis : /sim' - i - jees/
          pala : saturn5 productions
 www.steve.org : 415.282.9979
  hath the daemon spawn no fire?
Received on 2004-04-10 05:49:31