About a week ago I wrote about being unable to set up a volume on a second Coda server. That problem has been fixed (after upgrading the SCM and the second server to FreeBSD 4.3-STABLE and stopping the updatesrv process on the second server), but a related problem remains.

If the SCM is aware of the second server (i.e. it has entries for the second server in /vice/db/servers and /vice/db/VSGDB, and a volume entry for the new server in /vice/db/VSList), *no* client can access /coda anymore. If I stop the Coda services on the SCM, take the references to the second server out of those files, and restart the services, everything works fine on the clients again and they can access SCM-stored files.

The SCM is running Coda server version 5.3.13_1, and the second server has 5.3.14.

The SCM has the following processes: auth2, rpc2portmap, updatesrv, updateclnt, and codasrv.
The second server has the following processes: auth2, updateclnt, codasrv (these are the only ones started by the /usr/local/etc/rc.d/rc.vice script).

When the SCM is configured for both servers, the Coda processes start up fine, but as soon as a client machine tries to access /coda (even with an ls), the client returns the following error:

    : andria_at_doj; ls -la /coda
    ls: /coda: Device not configured

The second server doesn't seem to be logging anything of significance (in either the Update logs or the Srv logs), but the SCM has the following SrvLog:

16:39:40 New SrvLog started at Thu Jun 14 16:39:40 2001
16:39:40 Resource limit on data size are set to 536870912
16:39:40 RvmType is Rvm
16:39:40 Main process doing a LWP_Init()
16:39:40 Main thread just did a RVM_SET_THREAD_DATA
16:39:40 Setting Rvm Truncate threshhold to 5.
Partition /data: inodes in use: 12633, total: 2097152.
16:39:55 Partition /data: 5127996K available (minfree=7%), 4369168K free.
16:39:55 The server (pid 488) can be controlled using volutil commands
16:39:55 "volutil -help" will give you a list of these commands
16:39:55 If desperate, "kill -SIGWINCH 488" will increase debugging level
16:39:55 "kill -SIGUSR2 488" will set debugging level to zero
16:39:55 "kill -9 488" will kill a runaway server
16:39:55 Vice file system salvager, version 3.0.
16:39:55 SanityCheckFreeLists: Checking RVM Vnode Free lists.
16:39:55 DestroyBadVolumes: Checking for destroyed volumes.
16:39:55 Salvaging file system partition /data
16:39:55 Force salvage of all volumes on this partition
16:39:55 Scanning inodes in directory /data...
16:39:59 SFS: There are some volumes without any inodes in them
16:39:59 Entering DCC(0x37000001)
16:39:59 DCC: Salvaging Logs for volume 0x37000001
16:40:04 done: 13527 files/dirs, 514649 blocks
16:40:04 SFS:No Inode summary for volume 0x37000003; skipping full salvage
16:40:04 SalvageFileSys: Therefore only resetting inUse flag
16:40:04 SFS:No Inode summary for volume 0x37000004; skipping full salvage
16:40:04 SalvageFileSys: Therefore only resetting inUse flag
16:40:04 Entering DCC(0x37000006)
16:40:04 DCC: Salvaging Logs for volume 0x37000006
16:40:04 done: 135 files/dirs, 82148 blocks
16:40:04 SalvageFileSys completed on /data
16:40:04 VAttachVolumeById: vol 37000001 (/data.0) attached and online
16:40:04 VAttachVolumeById: vol 37000002 (u.andria.0) attached and online
16:40:04 VAttachVolumeById: vol 37000003 (u.adrian.0) attached and online
16:40:04 VAttachVolumeById: vol 37000004 (u.jmalone.0) attached and online
16:40:04 VAttachVolumeById: vol 37000006 (admin.0) attached and online
16:40:04 Attached 5 volumes; 0 volumes not attached
lqman: Creating LockQueue Manager.....LockQueue Manager starting .....
16:40:04 LockQueue Manager just did a rvmlib_set_thread_data()
done
16:40:04 CallBackCheckLWP just did a rvmlib_set_thread_data()
16:40:04 CheckLWP just did a rvmlib_set_thread_data()
16:40:04 ServerLWP 0 just did a rvmlib_set_thread_data()
16:40:04 ServerLWP 1 just did a rvmlib_set_thread_data()
16:40:04 ServerLWP 2 just did a rvmlib_set_thread_data()
16:40:04 ServerLWP 3 just did a rvmlib_set_thread_data()
16:40:04 ServerLWP 4 just did a rvmlib_set_thread_data()
16:40:04 ServerLWP 5 just did a rvmlib_set_thread_data()
16:40:04 ResLWP-0 just did a rvmlib_set_thread_data()
16:40:04 ResLWP-1 just did a rvmlib_set_thread_data()
16:40:04 VolUtilLWP 0 just did a rvmlib_set_thread_data()
16:40:04 VolUtilLWP 1 just did a rvmlib_set_thread_data()
16:40:04 Starting SmonDaemon timer
16:40:04 File Server started Thu Jun 14 16:40:04 2001
16:40:04 client_GetVenusId: got new host 10.0.0.21:2430
16:40:04 Building callback conn.
16:40:04 No idle WriteBack conns, building new one
16:40:04 Writeback message to 10.0.0.21 port 2430 on conn 38780c5b succeeded
16:40:04 ValidateVolumes: 0x7f000000 failed!
16:40:04 ValidateVolumes: 0x7f000005 failed!
16:40:04 GetVolObj: VGetVolume(7f000005) error 103
16:40:04 GrabFsObj, GetVolObj error Volume not online
16:40:06 GetVolObj: VGetVolume(7f000005) error 103
16:40:06 GrabFsObj, GetVolObj error Volume not online
16:40:06 GetVolObj: VGetVolume(7f000005) error 103
16:40:06 GrabFsObj, GetVolObj error Volume not online
16:40:10 GetVolObj: VGetVolume(7f000000) error 103
16:40:10 GrabFsObj, GetVolObj error Volume not online
16:40:10 GetVolObj: VGetVolume(7f000000) error 103
16:40:10 GrabFsObj, GetVolObj error Volume not online
16:41:17 client_GetVenusId: got new host 10.0.0.35:2430
16:41:17 Building callback conn.
16:41:17 No idle WriteBack conns, building new one
16:41:17 Writeback message to 10.0.0.35 port 2430 on conn c5ee03c succeeded
16:41:17 RevokeWBPermit on conn c5ee03c returned 0
16:41:17 GetVolObj: VGetVolume(7f000000) error 103
16:41:17 GrabFsObj, GetVolObj error Volume not online
16:46:07 Worker3: Unbinding RPC connection 808164070
16:46:31 RevokeWBPermit on conn 38780c5b returned 0
16:46:31 GetVolObj: VGetVolume(7f000003) error 103
16:46:31 GrabFsObj, GetVolObj error Volume not online
17:00:53 RevokeWBPermit on conn c5ee03c returned 0
17:00:53 GetVolObj: VGetVolume(7f000000) error 103
17:00:53 GrabFsObj, GetVolObj error Volume not online
17:00:56 GetVolObj: VGetVolume(7f000000) error 103
17:00:56 GrabFsObj, GetVolObj error Volume not online
17:00:56 GetVolObj: VGetVolume(7f000000) error 103
17:00:56 GrabFsObj, GetVolObj error Volume not online
17:00:56 GetVolObj: VGetVolume(7f000000) error 103
17:00:56 GrabFsObj, GetVolObj error Volume not online
17:01:08 RevokeWBPermit on conn 38780c5b returned 0

And on and on and on....

The failing volumes above (7f000000, 7f000003, 7f000005) are all the ones I tried to access from the client, and they are all singly-replicated volumes (from the SCM).

All the tokens are the same on both servers, the servers can otherwise communicate normally, and each has a single entry in its /vice/db/vicetab file, like the following:

    irs /data ftree width=128,depth=3

Any help, or suggestions for what to try next, would be greatly appreciated.

Thanks!
Andria

Received on 2001-06-14 17:28:06
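
For anyone triaging a SrvLog like the one above, here is a minimal sketch of how the repeated "VGetVolume(...) error 103" lines could be tallied per volume. It is not part of Coda; the function name count_volume_errors and the SAMPLE excerpt are hypothetical, and the regular expression assumes only the line format visible in the log quoted in this message.

```python
import re
from collections import Counter

# Matches lines such as:
#   16:40:04 GetVolObj: VGetVolume(7f000005) error 103
# (format assumed from the SrvLog excerpt in this message)
VGET_ERROR = re.compile(r"VGetVolume\(([0-9a-fA-F]+)\) error (\d+)")


def count_volume_errors(log_text):
    """Return a Counter keyed by (volume id, error code)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = VGET_ERROR.search(line)
        if match:
            volume = match.group(1).lower()
            code = int(match.group(2))
            counts[(volume, code)] += 1
    return counts


# A few lines copied from the SrvLog above, used only as sample input.
SAMPLE = """\
16:40:04 ValidateVolumes: 0x7f000000 failed!
16:40:04 GetVolObj: VGetVolume(7f000005) error 103
16:40:04 GrabFsObj, GetVolObj error Volume not online
16:40:10 GetVolObj: VGetVolume(7f000000) error 103
16:40:10 GetVolObj: VGetVolume(7f000000) error 103
16:46:31 GetVolObj: VGetVolume(7f000003) error 103
"""

if __name__ == "__main__":
    for (volume, code), n in sorted(count_volume_errors(SAMPLE).items()):
        print(f"{volume} error {code}: {n}")
```

Run against the full SrvLog instead of SAMPLE, this would show at a glance which volume IDs the server is refusing and how often.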