When the disk fills up on the backup server, codasrv freezes and hangs indefinitely until I kill it with -9, and the venus clients are left waiting forever. Note that I'm still running 6.0.3 - perhaps this isn't a problem in the latest rev. Here's what I found in the logs:

00:05:00 Worker4: Unbinding RPC connection 8135
00:05:00 Worker2: Unbinding RPC connection 15785
00:05:00 Worker0: Unbinding RPC connection 6270
00:05:00 RevokeWBPermit on conn 2679 returned 0
00:05:00 Worker5: Unbinding RPC connection 1644
00:05:00 Worker1: Unbinding RPC connection 9824
00:05:00 Worker3: Unbinding RPC connection 9412
00:05:00 RevokeWBPermit on conn 174a returned 0
00:05:00 RevokeWBPermit on conn 2679 returned 0
00:05:01 RevokeWBPermit on conn 174a returned 0
00:05:01 RevokeWBPermit on conn 174a returned 0
00:05:01 Worker4: Unbinding RPC connection 14089
00:05:01 Worker0: Unbinding RPC connection 8323
00:05:01 Worker3: Unbinding RPC connection 6765
00:05:01 RevokeWBPermit on conn 34e9 returned 0
00:05:01 GrowVnodes: growing Small list from 240896 to 242688 for volume 0x1000064
00:05:01 RevokeWBPermit on conn 34e9 returned 0
00:05:03 RevokeWBPermit on conn 34e9 returned 0
00:05:58 GetVolObj: Volume (1000052) already write locked
00:05:58 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:06:13 GetVolObj: Volume (1000052) already write locked
00:06:13 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:06:24 RevokeWBPermit on conn 2679 returned 0
00:06:28 GetVolObj: Volume (1000052) already write locked
00:06:28 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:06:43 GetVolObj: Volume (1000052) already write locked
00:06:43 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:06:58 GetVolObj: Volume (1000052) already write locked
00:06:58 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:07:13 GetVolObj: Volume (1000052) already write locked
00:07:13 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:07:18 GetVolObj: Volume (1000052) already write locked
00:07:18 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:07:28 GetVolObj: Volume (1000052) already write locked
00:07:28 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:07:33 GetVolObj: Volume (1000052) already write locked
00:07:33 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:07:44 GetVolObj: Volume (1000052) already write locked
00:07:44 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:07:48 GetVolObj: Volume (1000052) already write locked
00:07:48 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:07:59 GetVolObj: Volume (1000052) already write locked
00:07:59 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:08:03 GetVolObj: Volume (1000052) already write locked
00:08:03 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:08:14 GetVolObj: Volume (1000052) already write locked
00:08:14 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:08:18 GetVolObj: Volume (1000052) already write locked
00:08:18 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:08:22 VAttachVolumeById: vol 1000064 (sg.media.members.hashed.0.backu) attached and online
00:08:22 S_VolMakeBackups: backup (1000064) made of volume 1000052
00:08:22 VAttachVolumeById: vol 1000065 (sg.media.site.0.backup) attached and online
00:08:22 S_VolMakeBackups: backup (1000065) made of volume 1000006
00:08:22 GrowVnodes: growing Small list from 83456 to 83712 for volume 0x1000066
00:08:27 RevokeWBPermit on conn 2679 returned 0
00:08:29 CopyOnWrite: Copying directory vnode = 36711
00:08:29 ****** WARNING entry at 0x81800f8 already has deqing set!
00:08:29 RevokeWBPermit on conn 174a returned 0
00:08:33 CopyOnWrite: Copying directory vnode = 22001
00:08:39 ****** WARNING entry at 0x81800f8 already has deqing set!
00:08:50 VGetVnode: memory vnode was snatched away
00:08:50 VAttachVolumeById: vol 1000066 (sg.media.girls.0.backup) attached and online
00:08:50 S_VolMakeBackups: backup (1000066) made of volume 1000005
00:08:50 GrowVnodes: growing Small list from 39680 to 39936 for volume 0x1000061
00:09:00 GetVolObj: Volume (1000004) already write locked
00:09:00 GrabFsObj, GetVolObj error Resource temporarily unavailable
00:09:14 VAttachVolumeById: vol 1000061 (sg.media.groups.0.backup) attached and online
00:09:14 S_VolMakeBackups: backup (1000061) made of volume 1000004
00:09:15 CopyOnWrite: Copying directory vnode = 1
00:09:21 VAttachVolumeById: vol 1000062 (sg.media.0.backup) attached and online
00:09:21 S_VolMakeBackups: backup (1000062) made of volume 1000002
00:09:21 VAttachVolumeById: vol 1000063 (sg.0.backup) attached and online
00:09:21 S_VolMakeBackups: backup (1000063) made of volume 1000001
00:09:21 NewDump: file /vice/backup/7f000008.1000052.newlist volnum 7f000008 id 1000064 parent 1000052
00:09:31 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:31 Dump: DumpFile failed.
00:09:31
00:09:32 Dump failed due to FlushBuf failure.
00:09:32 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:32 NewDump: file /vice/backup/7f000005.1000006.newlist volnum 7f000005 id 1000065 parent 1000006
00:09:33 S_VolNewDump: Incremental volume dump succeeded
00:09:33 NewDump: file /vice/backup/7f000004.1000005.newlist volnum 7f000004 id 1000066 parent 1000005
00:09:38 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:38 Dump: DumpFile failed.
00:09:38
00:09:38 Dump failed due to FlushBuf failure.
00:09:38 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:38 NewDump: file /vice/backup/7f000003.1000004.newlist volnum 7f000003 id 1000061 parent 1000004
00:09:38 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:38
00:09:38 Dump failed due to FlushBuf failure.
00:09:38 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:38 NewDump: file /vice/backup/7f000001.1000002.newlist volnum 7f000001 id 1000062 parent 1000002
00:09:38 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:38
00:09:38 Dump failed due to FlushBuf failure.
00:09:38 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:38 NewDump: file /vice/backup/7f000000.1000001.newlist volnum 7f000000 id 1000063 parent 1000001
00:09:38 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:38 Dump failed due to FlushBuf failure.
00:09:38 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:38 MarkAsAncient succeeded
00:09:38 NewDump: file /vice/backup/7f000008.1000052.newlist volnum 7f000008 id 1000064 parent 1000052
00:09:38 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:38 DumpVnode: BumpByteString failed.
00:09:38
00:09:38 Dump failed due to FlushBuf failure.
00:09:38 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:38 NewDump: file /vice/backup/7f000004.1000005.newlist volnum 7f000004 id 1000066 parent 1000005
00:09:40 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:40 Dump: DumpFile failed.
00:09:40
00:09:40 Dump failed due to FlushBuf failure.
00:09:40 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:40 NewDump: file /vice/backup/7f000003.1000004.newlist volnum 7f000003 id 1000061 parent 1000004
00:09:40 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:40
00:09:40 Dump failed due to FlushBuf failure.
00:09:40 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:40 NewDump: file /vice/backup/7f000001.1000002.newlist volnum 7f000001 id 1000062 parent 1000002
00:09:40 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:40
00:09:40 Dump failed due to FlushBuf failure.
00:09:40 S_VolNewDump: Incremental volume dump failed with status = 200
00:09:40 NewDump: file /vice/backup/7f000000.1000001.newlist volnum 7f000000 id 1000063 parent 1000001
00:09:40 FlushBuf: WriteDump failed RPC2_SEFAIL3 (F).
00:09:40 Dump failed due to FlushBuf failure.
00:09:40 S_VolNewDump: Incremental volume dump failed with status = 200
00:10:00 Worker1: Unbinding RPC connection 13404
00:10:00 Worker2: Unbinding RPC connection 11893
00:10:00 RevokeWBPermit on conn 174a returned 0
00:10:01 Worker2: Unbinding RPC connection 13670
00:10:01 Worker1: Unbinding RPC connection 15384
00:10:01 Worker2: Unbinding RPC connection 2102
00:10:01 Worker1: Unbinding RPC connection 16233
00:10:01 Worker2: Unbinding RPC connection 400
00:10:01 RevokeWBPermit on conn 34e9 returned 0
00:10:01 RevokeWBPermit on conn 2679 returned 0
00:10:07 RevokeWBPermit on conn 174a returned 0
00:10:13 RevokeWBPermit on conn 34e9 returned 0
00:22:37 Unbinding RPC2 connection 9284
00:22:37 Unbinding RPC2 connection 15551
00:22:37 Unbinding RPC2 connection 6441
00:22:37 Unbinding RPC2 connection 4153
00:22:37 Unbinding RPC2 connection 521
00:22:37 Unbinding RPC2 connection 10607
00:22:37 Unbinding RPC2 connection 10176
00:22:37 Unbinding RPC2 connection 9521
00:22:37 Unbinding RPC2 connection 1709
00:22:37 Unbinding RPC2 connection 6931
00:22:37 Unbinding RPC2 connection 8475
00:22:37 Unbinding RPC2 connection 3482
00:22:37 Unbinding RPC2 connection 12772
00:22:37 Unbinding RPC2 connection 1300
00:22:37 Unbinding RPC2 connection 3028
00:22:37 Unbinding RPC2 connection 1494
00:22:37 Unbinding RPC2 connection 2446
00:22:37 Unbinding RPC2 connection 9010
00:22:37 Unbinding RPC2 connection 10614
00:22:37 Unbinding RPC2 connection 6074
00:22:37 Unbinding RPC2 connection 673
00:22:39 Callback failed RPC2_NAKED (F) for ws 209.237.248.36:16386
00:22:39 Unbinding RPC2 connection 16302
00:22:39 Unbinding RPC2 connection 541
00:22:39 Unbinding RPC2 connection 15982
00:22:39 Unbinding RPC2 connection 9597
00:22:39 Unbinding RPC2 connection 1055
00:22:39 Unbinding RPC2 connection 15853
00:22:39 Unbinding RPC2 connection 3250
00:22:39 Unbinding RPC2 connection 1789
00:22:39 Unbinding RPC2 connection 2726
00:22:39 Unbinding RPC2 connection 6904
00:22:39 Unbinding RPC2 connection 7823
00:22:39 Unbinding RPC2 connection 14134
00:22:39 Unbinding RPC2 connection 10805
00:22:39 Unbinding RPC2 connection 12746
00:22:39 Unbinding RPC2 connection 2324
00:22:39 Unbinding RPC2 connection 13376
00:22:39 Unbinding RPC2 connection 10309

And then around two hours later, I noticed it was hung, and I killed it.

-- 
steve simitzis : /sim' - i - jees/
pala : saturn5 productions
www.steve.org : 415.282.9979
hath the daemon spawn no fire?

Received on 2004-04-10 05:49:31
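[Editor's note: the repeated "FlushBuf: WriteDump failed RPC2_SEFAIL3" entries are consistent with the dump files under /vice/backup failing to write once the disk fills. Until the server-side hang is fixed, one workaround is to refuse to start the nightly dumps unless the spool filesystem has headroom. A minimal sketch in Python, assuming a POSIX system; the path and the 1 GiB threshold are illustrative choices, not Coda defaults:]

```python
import os

def enough_space(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least
    `needed_bytes` available to an unprivileged writer."""
    st = os.statvfs(path)
    # f_bavail counts blocks available to non-root; f_frsize is the
    # fragment size those block counts are expressed in.
    return st.f_bavail * st.f_frsize >= needed_bytes

if __name__ == "__main__":
    # Hypothetical pre-flight check before kicking off volume dumps:
    # bail out early instead of letting WriteDump hit a full disk.
    spool = "/vice/backup"          # illustrative spool directory
    if not enough_space(spool, 1 << 30):
        raise SystemExit("backup spool nearly full; skipping dumps")
    print("spool ok, proceeding with dumps")
```

A cron wrapper could run this before the backup job and alert instead of dumping when it exits nonzero.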