(Illustration by Gaich Muramatsu)
Hi all, Running coda 6.0.14 on NetBSD/sparc64 back-end servers and a mix of NetBSD/sparc, NetBSD/sparc64, and NetBSD/macppc clients (well described in the list archives). I have recently taken note of the fact that the 'cfs lv' command does not work on any of the SPARC systems -- perhaps an endianness bug? On my macppc system, I get the expected behavour, # cfs lv /coda/diablonet.net/tmp Status of volume 7f00000b (2130706443) named "diablonet.tmp" Volume type is ReadWrite Connection State is Connected Reintegration age: 4294967295 sec, hogtime 4294967.295 sec Minimum quota is 0, maximum quota is unlimited Current blocks used are 49 The partition has 51484224 blocks available out of 63091568 but on any of the SPARC (sun4m or sun4u) systems, I just get something like, sonnet% cfs lv /coda/diablonet.net/tmp (just sits forever) ^C/coda/diablonet.net/tmp: Interrupted system call sonnet% (this happens both against my own servers, and against testserver.coda.cs.cmu.edu so I know its not anything wrong with my server configuration) After this point, venus is completely hosed -- if you go to CODA-space and try to do anything like list a directory, read/write a file, whatever, it is just hung up and will sit, until you kill it, always with the interrupted system call error. You have to kill venus off and re-invoke with venus -init to make it work again. If I ktruss it, I see it is hanging up on the system call, 803 cfs open("/coda/.CONTROL", 0, 0) = 3 and cranking venus up with debuglevel -d 100, I see this in the logs: [ W(15) : 0000 : 21:21:03 ] fsobj::Lookup: (diablonet.net/tmp), uid = 0 [ W(15) : 0000 : 21:21:03 ] fsobj::Access : (diablonet.net, 8, 0), uid = 0 [ W(15) : 0000 : 21:21:03 ] Realm::GetUser local uid '0' for realm 'diablonet.ne t' [ W(15) : 0000 : 21:21:03 ] PutUser: [ W(15) : 0000 : 21:21:03 ] fsdb::Get-mre: key = (5043bac8.7f000000.84.10a), uid = 0, rights = 3, comp = tmp [ W(15) : 0000 : 21:21:03 ] repvol::IsUnderRepair: vol = 7f000000, uid = -1 [ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f000000, /), refcnt = 3 [ W(15) : 0000 : 21:21:03 ] vdb::Get: vid = 5043bac8.7f000000 [ W(15) : 0000 : 21:21:03 ] fsobj::fsobj: fid = (5043bac8.7f000000.84.10a), comp = tmp [ W(15) : 0000 : 21:21:03 ] fsobj::ClearAcRights: (0.0.0.0), uid = -1 [ W(15) : 0000 : 21:21:03 ] fsobj::ClearRcRights: (5043bac8.7f000000.84.10a), ri ghts = 0 [ W(15) : 0000 : 21:21:03 ] fsobj::DemoteAcRights: (5043bac8.7f000000.84.10a), u id = -1 [ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f000000, /), refcnt = 4 [ W(15) : 0000 : 21:21:03 ] fsobj::GetAttr: (tmp), uid = 0 [ W(15) : 0000 : 21:21:03 ] vsgent::GetMgrp 0x120900 0 1 [ W(15) : 0000 : 21:21:03 ] mgrpent::GetHostSet: 0x10dc00, uid = 0, mid = 1 [ W(15) : 0000 : 21:21:03 ] volent::FlushCOP2(Piggy): vol = 7f000000 [ W(15) : 0000 : 21:21:03 ] RepOpCommCtxt::AllReplicasSupportSHA(): 0x10dc30 [ W(15) : 0000 : 21:21:03 ] fsobj::GetAttr: ViceGetAttrPlusSHA(0x7f000000.84.10a ) [ W(15) : 0000 : 21:21:03 ] CommQueue: insert pri 3 count = 1 [ W(15) : 0000 : 21:21:03 ] (Multi)GetAttrPlusSHA: start [ W(15) : 0000 : 21:21:03 ] CommQueue: remove pri 3 count = 0 [ W(15) : 0000 : 21:21:03 ] (Multi)GetAttrPlusSHA: code = -2001, elapsed = 1.1 [ W(15) : 0000 : 21:21:03 ] mgrpent::CheckNonMutating: acode = -2001 hosts = [0x45123a78 0x45123a70 0x45123a68 0x45123a60 0x45123a58 0x45123a50 0x45123a48 0x45123a40], retcodes = [0 -2002 -2002 -2002 -2002 -2002 -2002 -2002] 0: [ 1 0 0 0 0 0 0 0 ] [ 50167af.0 ] [ 0 ] [ W(15) : 0000 : 21:21:03 ] fsobj::SetRcRights: (5043bac8.7f000000.84.10a), righ ts = 3 [ W(15) : 0000 : 21:21:03 ] fsobj::UpdateStatus: (5043bac8.7f000000.84.10a), uid = 0 [ W(15) : 0000 : 21:21:03 ] fsobj::Matriculate: (5043bac8.7f000000.84.10a) [ W(15) : 0000 : 21:21:03 ] fsobj::AttachChild: (5043bac8.7f000000.1.1), (5043ba c8.7f000000.84.10a) [ W(15) : 0000 : 21:21:03 ] mgrpent::Put 0x10dc00, uid = 0, mid = 1, auth = 1, r efcount = 2, detached = 0 [ W(15) : 0000 : 21:21:03 ] mgrpent::PutHostSet: 0x10dc00 [ W(15) : 0000 : 21:21:03 ] fsobj::Reference: (5043bac8.7f000000.84.10a), old = 0, new = 5 [ W(15) : 0000 : 21:21:03 ] fsdb::Put: (5043bac8.7f000000.84.10a), refcnt = 1, r eaders = 1, writers = 0, openers = 0 [ W(15) : 0000 : 21:21:03 ] fsdb::Get-mre: key = (5043bac8.7f000000.84.10a), uid = 0, rights = 15, comp = tmp [ W(15) : 0000 : 21:21:03 ] repvol::IsUnderRepair: vol = 7f000000, uid = -1 [ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f000000, /), refcnt = 4 [ W(15) : 0000 : 21:21:03 ] The advice was to ReadDiscFetch --> Fetching. [ W(15) : 0000 : 21:21:03 ] fsobj::Fetch: (tmp), uid = 0 [ W(15) : 0000 : 21:21:03 ] vsgent::GetMgrp 0x120900 0 1 [ W(15) : 0000 : 21:21:03 ] mgrpent::GetHostSet: 0x10dc00, uid = 0, mid = 1 [ W(15) : 0000 : 21:21:03 ] volent::FlushCOP2(Piggy): vol = 7f000000 [ W(15) : 0000 : 21:21:03 ] CommQueue: insert pri 3 count = 1 [ W(15) : 0000 : 21:21:03 ] (Multi)Fetch: start [ W(15) : 0000 : 21:21:03 ] CommQueue: remove pri 3 count = 0 [ W(15) : 0000 : 21:21:03 ] (Multi)Fetch: code = -2001, elapsed = 1.3 [ W(15) : 0000 : 21:21:03 ] mgrpent::CheckNonMutating: acode = -2001 hosts = [0x45125f10 0x45125f08 0x45125f00 0x45125ef8 0x45125ef0 0x45125ee8 0x45125ee0 0x45125ed8], retcodes = [0 -2002 -2002 -2002 -2002 -2002 -2002 -2002] 0: [ 1 0 0 0 0 0 0 0 ] [ 50167af.0 ] [ 0 ] [ W(15) : 0000 : 21:21:03 ] (Multi)ViceFetch: fetched 15 bytes [ W(15) : 0000 : 21:21:03 ] fsobj::UpdateStatus: (5043bac8.7f000000.84.10a), uid = 0 [ W(15) : 0000 : 21:21:03 ] mgrpent::Put 0x10dc00, uid = 0, mid = 1, auth = 1, r efcount = 2, detached = 0 [ W(15) : 0000 : 21:21:03 ] mgrpent::PutHostSet: 0x10dc00 [ W(15) : 0000 : 21:21:03 ] fsobj::SetRcRights: (5043bac8.7f000000.84.10a), righ ts = 15 [ W(15) : 0000 : 21:21:03 ] fsobj::Reference: (5043bac8.7f000000.84.10a), old = 5, new = 6 [ W(15) : 0000 : 21:21:03 ] fsobj::TryToCover: fid = (5043bac8.7f000000.84.10a) [ W(15) : 0000 : 21:21:03 ] vdb::Get: volname = diablonet.tmp_at_diablonet.net [ W(15) : 0000 : 21:21:03 ] GetAdmConn: diablonet.net [ W(15) : 0000 : 21:21:03 ] GetServer: host = 64.7.160.150 [ W(15) : 0000 : 21:21:03 ] srvent::GetConn: host = harvest.diablonet.net, uid = -1, force = 0 [ W(15) : 0000 : 21:21:03 ] PutConn: host = harvest.diablonet.net, uid = -1, cid = 415605628, auth = 0 [ W(15) : 0000 : 21:21:03 ] PutServer: harvest.diablonet.net [ W(15) : 0000 : 21:21:03 ] CommQueue: insert pri 3 count = 1 [ W(15) : 0000 : 21:21:03 ] GetVolumeInfo: start [ W(15) : 0000 : 21:21:03 ] CommQueue: remove pri 3 count = 0 [ W(15) : 0000 : 21:21:03 ] GetVolumeInfo: code = 0, elapsed = 1.0 [ W(15) : 0000 : 21:21:03 ] connent::CheckResult: code = 0, vid = 0 [ W(15) : 0000 : 21:21:03 ] PutConn: host = harvest.diablonet.net, uid = -1, cid = 415605628, auth = 0 [ W(15) : 0000 : 21:21:03 ] vdb::Get: vid = 5043bac8.1000007 [ W(15) : 0000 : 21:21:03 ] vdb::Get: volname = 16777223_at_diablonet.net [ W(15) : 0000 : 21:21:03 ] GetAdmConn: diablonet.net [ W(15) : 0000 : 21:21:03 ] GetServer: host = 64.7.160.151 [ W(15) : 0000 : 21:21:03 ] srvent::GetConn: host = blossom.diablnet.net, uid = -1, force = 0 [ W(15) : 0000 : 21:21:03 ] PutConn: host = blossom.diablnet.net, uid = -1, cid = 391227550, auth = 0 [ W(15) : 0000 : 21:21:03 ] PutServer: blossom.diablnet.net [ W(15) : 0000 : 21:21:03 ] CommQueue: insert pri 3 count = 1 [ W(15) : 0000 : 21:21:03 ] GetVolumeInfo: start [ W(15) : 0000 : 21:21:03 ] CommQueue: remove pri 3 count = 0 [ W(15) : 0000 : 21:21:03 ] GetVolumeInfo: code = 0, elapsed = 1.0 [ W(15) : 0000 : 21:21:03 ] connent::CheckResult: code = 0, vid = 0 [ W(15) : 0000 : 21:21:03 ] PutConn: host = blossom.diablnet.net, uid = -1, cid = 391227550, auth = 0 [ W(15) : 0000 : 21:21:03 ] volent::volent: (1000007, 16777223) [ W(15) : 0000 : 21:21:03 ] volrep::volrep: host: 64.7.160.151 readonly: 0 paren t: 0x7f00000b) [ W(15) : 0000 : 21:21:03 ] GetServer: host = 64.7.160.151 [ W(15) : 0000 : 21:21:03 ] volent::volent: (7f00000b, diablonet.tmp) [ W(15) : 0000 : 21:21:03 ] repvol::repvol 5043a4c8 00000000 00000000 00000000 0 0000000 00000000 00000000 00000000 [ W(15) : 0000 : 21:21:03 ] vsgdb::GetVSG 451276c8 451276c0 451276b8 451276b0 45 1276a8 451276a0 45127698 45127690 [ W(15) : 0000 : 21:21:03 ] vdb::Put: (1000007, 16777223), refcnt = 2 [ W(15) : 0000 : 21:21:03 ] fsdb::Get-mre: key = (5043bac8.7f00000b.1.1), uid = 0, rights = 3, comp = tmp [ W(15) : 0000 : 21:21:03 ] repvol::IsUnderRepair: vol = 7f00000b, uid = -1 [ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f00000b, diablonet.tmp), refcnt = 2 [ W(15) : 0000 : 21:21:03 ] vproc::Begin_VFS(Vget): vid = 5043bac8.7f00000b, u.u _vol = 0, mode = -1 [ W(15) : 0000 : 21:21:03 ] vdb::Get: vid = 5043bac8.7f00000b [ W(15) : 0000 : 21:21:03 ] volent::TakeTransition: diablonet.tmp, state = Hoard ing, |AVSG| = 1, logv = 0 [ W(15) : 0000 : 21:21:03 ] CML = 0, Res = 0 [ W(15) : 0000 : 21:21:03 ] fsdb::Get-mre: key = (5043bac8.7f00000b.1.1), uid = 0, rights = 3, comp = tmp [ W(15) : 0000 : 21:21:03 ] repvol::IsUnderRepair: vol = 7f00000b, uid = -1 [ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f00000b, diablonet.tmp), refcnt = 3 [ W(15) : 0000 : 21:21:03 ] vdb::Get: vid = 5043bac8.7f00000b [ W(15) : 0000 : 21:21:03 ] fsobj::fsobj: fid = (5043bac8.7f00000b.1.1), comp = tmp [ W(15) : 0000 : 21:21:03 ] fsobj::ClearAcRights: (0.0.0.0), uid = -1 [ W(15) : 0000 : 21:21:03 ] fsobj::ClearRcRights: (5043bac8.7f00000b.1.1), right s = 0 [ W(15) : 0000 : 21:21:03 ] fsobj::DemoteAcRights: (5043bac8.7f00000b.1.1), uid = -1 [ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f00000b, diablonet.tmp), refcnt = 4 [ W(15) : 0000 : 21:21:03 ] fsobj::GetAttr: (tmp), uid = 0 [ W(15) : 0000 : 21:21:03 ] vsgent::GetMgrp 0x120900 0 1 [ W(15) : 0000 : 21:21:03 ] mgrpent::GetHostSet: 0x10dc00, uid = 0, mid = 1 [ W(15) : 0000 : 21:21:03 ] volent::FlushCOP2(Piggy): vol = 7f00000b [ W(15) : 0000 : 21:21:03 ] RepOpCommCtxt::AllReplicasSupportSHA(): 0x10dc30 [ W(15) : 0000 : 21:21:03 ] fsobj::GetAttr: ViceGetAttrPlusSHA(0x7f00000b.1.1) [ W(15) : 0000 : 21:21:03 ] CommQueue: insert pri 3 count = 1 [ W(15) : 0000 : 21:21:03 ] (Multi)GetAttrPlusSHA: start [ W(15) : 0000 : 21:21:03 ] CommQueue: remove pri 3 count = 0 [ W(15) : 0000 : 21:21:03 ] (Multi)GetAttrPlusSHA: code = -2001, elapsed = 1.1 [ W(15) : 0000 : 21:21:03 ] mgrpent::CheckNonMutating: acode = -2001 hosts = [0x451234e0 0x451234d8 0x451234d0 0x451234c8 0x451234c0 0x451234b8 0x451234b0 0x451234a8], retcodes = [0 -2002 -2002 -2002 -2002 -2002 -2002 -2002] 0: [ 57 0 0 0 0 0 0 0 ] [ ebcd120.0 ] [ 0 ] [ W(15) : 0000 : 21:21:03 ] fsobj::SetRcRights: (5043bac8.7f00000b.1.1), rights = 3 [ W(15) : 0000 : 21:21:03 ] fsobj::UpdateStatus: (5043bac8.7f00000b.1.1), uid = 0 [ W(15) : 0000 : 21:21:03 ] fsobj::Matriculate: (5043bac8.7f00000b.1.1) [ W(15) : 0000 : 21:21:03 ] Realm::GetUser local uid '0' for realm 'diablonet.ne t' [ W(15) : 0000 : 21:21:03 ] fsobj::SetAcRights: (5043bac8.7f00000b.1.1), uid = 0 , my_rights = 127, any_rights = 9 [ W(15) : 0000 : 21:21:03 ] PutUser: [ W(15) : 0000 : 21:21:03 ] mgrpent::Put 0x10dc00, uid = 0, mid = 1, auth = 1, r efcount = 2, detached = 0 [ W(15) : 0000 : 21:21:03 ] mgrpent::PutHostSet: 0x10dc00 [ W(15) : 0000 : 21:21:03 ] fsobj::Reference: (5043bac8.7f00000b.1.1), old = 0, new = 7 [ W(15) : 0000 : 21:21:03 ] vproc::End_VFS(Vget): code = 0 [ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f00000b, diablonet.tmp), refcnt = 3 [ W(15) : 0000 : 21:21:03 ] fsobj::MountRoot: fid = 5043bac8.7f00000b.1.1, mtptf id = 5043bac8.7f000000.84.10a [ W(15) : 0000 : 21:21:03 ] k_Purge: fid = (5043bac8.7f00000b.1.1), severely = 0 [ W(15) : 0000 : 21:21:03 ] k_Purge: CODA_ZAPDIR, returns 0 [ W(15) : 0000 : 21:21:03 ] fsobj::CoverMtPt: fid = (5043bac8.7f000000.84.10a), rootfid = (5043bac8.7f00000b.1.1) [ W(15) : 0000 : 21:21:03 ] k_Purge: fid = (5043bac8.7f000000.84.10a), severely = 1 [ W(15) : 0000 : 21:21:03 ] k_Purge: CODA_PURGEFID, returns 0 [ W(15) : 0000 : 21:21:03 ] fsdb::Put: (5043bac8.7f00000b.1.1), refcnt = 1, read ers = 0, writers = 1, openers = 0 [ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f00000b, diablonet.tmp), refcnt = 2 [ W(15) : 0000 : 21:21:03 ] fsobj::Lookup: crossing mount point [ W(15) : 0000 : 21:21:03 ] fsdb::Put: (5043bac8.7f000000.84.10a), refcnt = 1, r eaders = 1, writers = 0, openers = 0 [ W(15) : 0000 : 21:21:03 ] fsdb::Put: (5043bac8.7f000000.1.1), refcnt = 1, read ers = 1, writers = 0, openers = 0 [ W(15) : 0000 : 21:21:03 ] fsdb::Put: (5043bac8.7f00000b.1.1), refcnt = 1, read ers = 1, writers = 0, openers = 0 [ W(15) : 0000 : 21:21:03 ] vproc::End_VFS(Lookup): code = 0 [ W(15) : 0000 : 21:21:03 ] vdb::Put: (7f000000, /), refcnt = 3 [ W(15) : 0000 : 21:21:03 ] Lookup : returns SUCCESS, elapsed = 30.3 msec [ X(00) : 0000 : 21:21:03 ] MarinerMux: mask = 0xffffc7e0 [ W(15) : 0000 : 21:21:03 ] CODA_IOCTL: u.u_pid = 20088 u.u_pgid = 20088 [ W(15) : 0000 : 21:21:03 ] vproc::ioctl(0): fid = 5043bac8.7f00000b.1.1, com = Get VolStat [ W(15) : 0000 : 21:21:03 ] vdb::Get: vid = 5043bac8.7f00000b [ W(15) : 0000 : 21:21:03 ] volent::GetVolStat: vid = 7f00000b, uid = 0 [ W(15) : 0000 : 21:21:03 ] vsgent::GetMgrp 0x120900 0 1 [ W(15) : 0000 : 21:21:03 ] mgrpent::GetHostSet: 0x10dc00, uid = 0, mid = 1 [ W(15) : 0000 : 21:21:03 ] CommQueue: insert pri 3 count = 1 [ W(15) : 0000 : 21:21:03 ] (Multi)GetVolumeStatus: start [ W(15) : 0000 : 21:21:03 ] CommQueue: remove pri 3 count = 0 [ W(15) : 0000 : 21:21:03 ] (Multi)GetVolumeStatus: code = -2001, elapsed = 1.2 [ W(15) : 0000 : 21:21:03 ] mgrpent::CheckNonMutating: acode = -2001 hosts = [0x45120ed0 0x45120ec8 0x45120ec0 0x45120eb8 0x45120eb0 0x45120ea8 0x45120ea0 0x45120e98], retcodes = [0 -2002 -2002 -2002 -2002 -2002 -2002 -2002] [ W(15) : 0000 : 21:21:03 ] mgrpent::Put 0x10dc00, uid = 0, mid = 1, auth = 1, r efcount = 2, detached = 0 [ W(15) : 0000 : 21:21:03 ] mgrpent::PutHostSet: 0x10dc00 [ W(15) : 0000 : 21:21:03 ] ***** FATAL SIGNAL (10) ***** and GDB output from the hung venus process, 0x0008b7f0 in SigChoke (sig=1158837200) at sighand.cc:241 241 sigsuspend(&mask); (gdb) list 236 { 237 int living_dead = 1; 238 sigset_t mask; 239 sigemptyset(&mask); 240 while (living_dead) { 241 sigsuspend(&mask); 242 } 243 } 244 #endif 245 (gdb) bt #0 0x0008b7f0 in SigChoke (sig=1158837200) at sighand.cc:241 #1 <signal handler called> (gdb) up #1 <signal handler called> (gdb) list 241 sigsuspend(&mask); 242 } 243 } 244 #endif 245 246 SigExit(sig); 247 } 248 249 static void SigExit(int sig) 250 { (gdb) i frame Stack level 1, frame at 0x451274d0: pc = 0x403c9250 in __sigtramp_sigcontext_1; saved pc 0x82d68 caller of frame at 0x451273e8 Arglist at 0x451274d0, args: Locals at 0x451274d0, Previous frame's sp in sp Hopefully this all might be useful to someone in suggesting where the fault might be? If anyone would like to see more data, please let me know and I will provide it ASAP. Thanks, Sean scaron_at_umich.eduReceived on 2006-07-26 23:36:30