I have two clients (my laptop and my desktop) and one server, all running 5.3.20 on Red Hat 8.0 (WITHOUT my IPv6 code :-).

Right now, I'm in a bad state: I can create files on my laptop that never get reintegrated properly, and my laptop also doesn't see files created on my desktop. The files from the desktop are getting correctly pushed to the server (they appear in /vicepa), but the ones from the laptop don't. The laptop moves around from network to network, but even rebooting it while it is connected to the same network as the other two doesn't help.

When I ran clog rdv, I got:

[root@localhost rdv]#
16:10:37 Checkpointing developers:rdv
16:10:37 to /usr/coda/spool/500/developers_rdv@_coda_rdv.tar
16:10:37 and /usr/coda/spool/500/developers_rdv@_coda_rdv.cml
16:10:37 Reintegrate: developers:rdv, 100/114 records, result = Unknown error 198

=====

[root@localhost rdv]# cfs lv /coda
  Status of volume 0x7f000000 (2130706432) named "codaroot"
  Volume type is ReadWrite
  Connection State is Connected
  Minimum quota is 0, maximum quota is unlimited
  Current blocks used are 3
  The partition has 22625968 blocks available out of 22626568
  Write-back is disabled

[root@localhost rdv]# cfs cs
Contacting servers .....
All servers up

=====

The tail of /usr/coda/etc/venus.log on my laptop:

Rename : sid = (ba26f17.562), time = 1041955171, uid = 500
    tid = -1 bytes = 265
    pred = (0, 0), succ = (0, 0)
    to_be_repaired = 0 repair_mutation = 0
    frozen = 0, cancel = 0, failed = 0, committed = 0
    spfid = (0x7f000001.0x3.0x42), sname = (labbook-200301.txt)
    tpfid = (0x7f000001.0x3.0x42), tname = (labbook-200301.txt~)
    sfid = (0x7f000001.0xfffffffe.0x800b8)
    spvv = [ 0 0 0 0 0 0 0 0 ] [ 0 0 ] [ 0 ]
    tpvv = [ 0 0 0 0 0 0 0 0 ] [ 0 0 ] [ 0 ]
    svv = [ 0 0 0 0 0 0 0 0 ] [ 0 0 ] [ 0 ]
Create : sid = (ba26f17.563), time = 1041955171, uid = 500
    tid = -1 bytes = 246
    pred = (0, 0), succ = (0, 0)
    to_be_repaired = 0 repair_mutation = 0
    frozen = 0, cancel = 0, failed = 0, committed = 0
    pfid = (0x7f000001.0x3.0x42), name = (labbook-200301.txt)
    cfid = (0x7f000001.0xfffffffe.0x800ba), mode = 664
    pvv = [ 0 0 0 0 0 0 0 0 ] [ 0 0 ] [ 0 ]
Store : sid = (ba26f17.650), time = 1042067193, uid = 500
    tid = -1 bytes = 228
    pred = (0, 0), succ = (0, 0)
    to_be_repaired = 0 repair_mutation = 0
    frozen = 0, cancel = 0, failed = 0, committed = 0
    fid = (0x7f000001.0xfffffffe.0x800ba), length = 3391
    vv = [ 0 0 0 0 0 0 0 0 ] [ 0 0 ] [ 0 ]
    rhandle = (0,0,0)
    ph = 0.0.0.0 (-1)
Create : sid = (ba26f17.651), time = 1042068719, uid = 500
    tid = -1 bytes = 234
    pred = (0, 0), succ = (0, 0)
    to_be_repaired = 0 repair_mutation = 0
    frozen = 0, cancel = 0, failed = 0, committed = 0
    pfid = (0x7f000001.0x1.0x1), name = (delme2)
    cfid = (0x7f000001.0xfffffffe.0x800d9), mode = 664
    pvv = [ 0 0 0 0 0 0 0 0 ] [ 0 0 ] [ 0 ]
Store : sid = (ba26f17.654), time = 1042069566, uid = 500
    tid = -1 bytes = 228
    pred = (0, 0), succ = (0, 0)
    to_be_repaired = 0 repair_mutation = 0
    frozen = 0, cancel = 0, failed = 0, committed = 0
    fid = (0x7f000001.0xfffffffe.0x800d9), length = 54
    vv = [ 0 0 0 0 0 0 0 0 ] [ 0 0 ] [ 0 ]
    rhandle = (0,0,0)
    ph = 0.0.0.0 (-1)

[ I(23) : 0000 : 16:10:37 ] IncReintegrate: (developers:rdv,-106) result = Unknown error 198, elapsed = 514.1 (80.8, 46.9, 386.4)
[ I(23) : 0000 : 16:10:37 ] new stats = [ 49, 10.9, 2885.5, 51, 12.0], [ 52, 11.6, 33.5, 60, 16.0]
[ W(19) : 0000 : 16:11:07 ] Cachefile::SetLength 1536
[ W(19) : 0000 : 16:11:07 ] fsobj::StatusEq: ((0x7f000001.0x54.0x2b)), Owner 500 != 82191
[ T(01) : 0124 : 16:11:20 ] BeginRvmFlush (1, 37356, T)
[ T(01) : 0124 : 16:11:20 ] EndRvmFlush
[ W(19) : 0000 : 16:11:29 ] Cachefile::SetLength 1536
[ W(19) : 0000 : 16:11:29 ] repvol::LogRemove: record cancelled, labbook-200301.txt~, size = 247
[ W(19) : 0000 : 16:11:29 ] Cachefile::SetLength 3690
[ W(19) : 0000 : 16:11:29 ] repvol::LogRemove: record cancelled, .#labbook-200301.txt, size = 248
[ W(19) : 0000 : 16:11:48 ] Cachefile::SetLength 4636
[ W(19) : 0000 : 16:11:48 ] repvol::LogRemove: record cancelled, .#labbook-200301.txt, size = 248
[ T(01) : 0131 : 16:12:13 ] BeginRvmFlush (1, 28436, T)
[ T(01) : 0131 : 16:12:13 ] EndRvmFlush
[ T(01) : 0135 : 16:12:48 ] BeginRvmTruncate (207, 66028, I)
[ T(01) : 0135 : 16:12:48 ] EndRvmTruncate
[ H(06) : 0003 : 16:13:13 ] HDBDaemon just woke up
[ H(06) : 0003 : 16:13:13 ] DataWalk: Restarting Iterator!!!! Reset availability status information.
[ H(06) : 0003 : 16:13:13 ] Tally for vuid=0:
[ H(06) : 0003 : 16:13:13 ] BeginRvmFlush (1, 160, F)
[ H(06) : 0003 : 16:13:13 ] EndRvmFlush
[ H(06) : 0003 : 16:13:13 ] Tally for vuid=0:
[ H(06) : 0003 : 16:13:13 ] Priority=600: Available=516 Unavailable=0 TotalSize=516 Unknown=0
[ H(06) : 0004 : 16:13:13 ] HDBDaemon about to sleep on hdbdaemon_sync
[ T(01) : 0138 : 16:13:18 ] BeginRvmTruncate (4, 320, I)
[ V(04) : 0139 : 16:13:18 ] repvol::CheckLocalSubtree: (developers:rdv)reset has_local_subtree flag!
[ T(01) : 0138 : 16:13:18 ] EndRvmTruncate
[ T(01) : 0138 : 16:13:18 ] BeginRvmFlush (1, 60, T)
[ T(01) : 0138 : 16:13:18 ] EndRvmFlush
[ T(01) : 0139 : 16:13:23 ] BeginRvmTruncate (0, 220, I)
[ T(01) : 0139 : 16:13:23 ] EndRvmTruncate

======

And then later, on the console:

[root@localhost rdv]#
16:23:27 volume developers:rdv has unrepaired local subtree(s), skip checkpointing CML!

This, despite the fact that I believe there should be no conflicts, and repair says:

[rdv@localhost rdv]$ repair
This repair tool can be used to manually repair server/server or local/global conflicts on files and directories. You will first need to do a "beginrepair" to start a repair session where messages about the nature of the conflict and the commands that should be used to repair the conflict will be displayed. Help message on individual commands can also be obtained by using the "help" facility. Finally, you can use the "endrepair" or "quit" to terminate the current repair session.
repair > beginrepair
Pathname of object in conflict? []: /coda/rdv
Could not allocate new repvol: Object not in conflict
beginrepair failed.
repair > beginrepair
Pathname of object in conflict? []: /coda/rdv/nokia
Could not allocate new repvol: Object not in conflict
beginrepair failed.
repair > quit

=====

Advice on how to go about debugging this? What should I be looking for, and where should I look for it?

--Rod

Received on 2003-01-08 19:44:41