Coda File System

coda servers rebuild

From: Florin Grad <florin_at_mandrakesoft.com>
Date: 24 Jul 2000 17:31:48 +0200
Hello again,

I have some questions about server crashes and rebuilds.

I have been asking myself these questions from the very beginning and
haven't found a clear answer yet:


1. Coda backup coordinator

   Do we have to use the Coda backup coordinator machine, or will any
   other backup system do? We could, for example, restore the entire disk
   and mount it on /vicepa afterwards. Would that create too many
   inconsistencies?
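   (For comparison, the Coda-native approach dumps backup clones with
   volutil rather than copying /vicepa raw. A rough sketch, where the
   volume id and dump path are made-up examples:)

```shell
# Coda-native backup sketch (volume id and paths are illustrative examples):
volutil lock 0x1000004                    # lock the volume for backup
volutil backup 0x1000004                  # create a read-only backup clone
volutil dump 0x1000004 /backup/pub.dump   # dump it to a file for archiving
# A raw restore of /vicepa alone misses the matching volume metadata in RVM,
# which is presumably where the inconsistencies would come from.
```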

2. Non-SCM crash/rebuild

   The partition of one of the non-SCM servers, which I had on a
   removable hard drive, crashed because of some tests I did. It was a
   reiserfs partition.
   Anyway, I re-installed the non-SCM server, and it managed to get the
   /vice/db files.

   Question: how do I get the volumes to repopulate the non-SCM /vicepa
   directory? I ran bldvldb.sh <the non-SCM server> on the SCM; it
   reported success, but this still doesn't work.

   Actually, I get the same messages even when I connect the client to
   the SCM only. I can see the directory names under /coda, but when I
   cd into one I get an error message.

   So how do I add a new non-SCM server to a Coda cell and populate it
   with the existing volumes? Should I modify /vice/db/VSGDB and run
   bldvldb.sh new-non-scm again? But how will the other non-SCM servers
   know about it?
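   (What I understand the procedure to be, roughly; the server name
   "new-srv" and the server number below are invented examples, and I am
   not sure this is complete:)

```shell
# On the SCM -- rough sketch; "new-srv" and the server number are examples.
echo "new-srv       3" >> /vice/db/servers   # register an unused server number
# Add new-srv to the relevant VSG entry in /vice/db/VSGDB, e.g.:
#   E0000101  scm new-srv
bldvldb.sh new-srv                           # rebuild the volume location db
# Existing volumes do not migrate by themselves; replicas that should live
# on new-srv apparently have to be created with createvol_rep.
```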

   If I try to connect the client I get the error message:

cd /coda/pub
17:26:38 MaxRetries exceeded...returning EWOULDBLOCK
17:26:45 MaxRetries exceeded...returning EWOULDBLOCK
bash: cd: /coda/pub: Resource temporarily unavailable (or something like that)
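(To narrow this down, I suppose one can check from both sides with the
standard Coda tools; a hedged sketch:)

```shell
cfs checkservers        # from the client: which file servers does venus see up?
cfs listvol /coda/pub   # which volume backs this path, and its state
volutil getvolumelist   # on the server: which volumes it actually has attached
```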


On the non-SCM server I have in /vice/srv/SrvLog:
15:18:08 New SrvLog started at Mon Jul 24 15:18:08 2000

15:18:08 Resource limit on data size are set to 2147483647

15:18:08 Server etext 0x80c83c4, edata 0x8100008
15:18:08 RvmType is Rvm
15:18:08 Main process doing a LWP_Init()
15:18:08 Main thread just did a RVM_SET_THREAD_DATA

15:18:08 Setting Rvm Truncate threshhold to 5.

Partition /vicepa: inodes in use: 0, total: 16777216.
15:18:20 Partition /vicepa: 238255K available (minfree=5%), 238253K free.
15:18:20 The server (pid 704) can be controlled using volutil commands
15:18:20 "volutil -help" will give you a list of these commands
15:18:20 If desperate,
        "kill -SIGWINCH 704" will increase debugging level
15:18:20    "kill -SIGUSR2 704" will set debugging level to zero
15:18:20    "kill -9 704" will kill a runaway server
15:18:20 Vice file system salvager, version 3.0.
15:18:21 SanityCheckFreeLists: Checking RVM Vnode Free lists.
15:18:21 DestroyBadVolumes: Checking for destroyed volumes.
15:18:21 Salvaging file system partition /vicepa
15:18:21 Force salvage of all volumes on this partition
15:18:21 Scanning inodes in directory /vicepa...
15:18:21 SalvageFileSys completed on /vicepa
15:18:21 Attached 0 volumes; 0 volumes not attached
lqman: Creating LockQueue Manager.....LockQueue Manager starting .....
15:18:21 LockQueue Manager just did a rvmlib_set_thread_data()

done
15:18:21 CallBackCheckLWP just did a rvmlib_set_thread_data()

15:18:21 CheckLWP just did a rvmlib_set_thread_data()

15:18:21 ServerLWP 0 just did a rvmlib_set_thread_data()

15:18:21 ServerLWP 1 just did a rvmlib_set_thread_data()

15:18:21 ServerLWP 2 just did a rvmlib_set_thread_data()

15:18:21 ServerLWP 3 just did a rvmlib_set_thread_data()

15:18:21 ServerLWP 4 just did a rvmlib_set_thread_data()

15:18:21 ServerLWP 5 just did a rvmlib_set_thread_data()

15:18:21 ResLWP-0 just did a rvmlib_set_thread_data()

15:18:21 ResLWP-1 just did a rvmlib_set_thread_data()

15:18:21 VolUtilLWP 0 just did a rvmlib_set_thread_data()

15:18:21 VolUtilLWP 1 just did a rvmlib_set_thread_data()

15:18:21 Starting SmonDaemon timer
15:18:21 File Server started Mon Jul 24 15:18:21 2000

15:21:34 client_GetVenusId: got new host 192.168.1.214:2430
15:21:34 Building callback conn.
15:21:34 No idle WriteBack conns, building new one
15:21:34 Writeback message to 192.168.1.214 port 2430 on conn 159e92a0 succeeded
15:21:35 client_GetVenusId: got new host 192.168.1.27:2430
15:21:35 Building callback conn.
15:21:35 No idle WriteBack conns, building new one
15:21:35 Writeback message to 192.168.1.27 port 2430 on conn 17ccb899 succeeded
15:21:35 GetVolObj: VGetVolume(2000007) error 103
15:21:35 GrabFsObj, GetVolObj error Volume not online
15:21:35 GetVolObj: VGetVolume(2000007) error 103
15:21:35 GrabFsObj, GetVolObj error Volume not online
15:21:35 GetVolObj: VGetVolume(2000007) error 103
15:21:35 RS_LockAndFetch: Error 103 during GetVolObj for (0x2000007.0x1.0x1)
15:21:35 GetVolObj: VGetVolume(2000007) error 103
15:21:35 GrabFsObj, GetVolObj error Volume not online
15:21:35 GetVolObj: VGetVolume(2000007) error 103
15:21:35 RS_LockAndFetch: Error 103 during GetVolObj for (0x2000007.0x1.0x1)
15:21:35 RevokeWBPermit on conn 17ccb899 returned 0
15:21:35 GetVolObj: VGetVolume(2000001) error 103
15:21:35 GrabFsObj, GetVolObj error Volume not online
15:21:35 GetVolObj: VGetVolume(2000001) error 103
15:21:35 GrabFsObj, GetVolObj error Volume not online
15:21:35 GetVolObj: VGetVolume(2000001) error 103
15:21:35 RS_LockAndFetch: Error 103 during GetVolObj for (0x2000001.0x1.0x1)
15:21:35 GetVolObj: VGetVolume(2000001) error 103
15:21:35 GrabFsObj, GetVolObj error Volume not online
15:21:35 GetVolObj: VGetVolume(2000001) error 103
15:21:35 RS_LockAndFetch: Error 103 during GetVolObj for (0x2000001.0x1.0x1)
15:21:35 RevokeWBPermit on conn 17ccb899 returned 0
15:21:35 GetVolObj: VGetVolume(2000004) error 103
15:21:35 GrabFsObj, GetVolObj error Volume not online
15:21:35 GetVolObj: VGetVolume(2000004) error 103
15:21:35 RS_LockAndFetch: Error 103 during GetVolObj for (0x2000004.0x1.0x1)
15:21:35 GetVolObj: VGetVolume(2000004) error 103

15:31:36 Unbinding RPC2 connection 126131479
15:31:36 Unbinding RPC2 connection 109566470
15:31:36 Unbinding RPC2 connection 968266099
15:31:36 Unbinding RPC2 connection 62628995
15:31:36 Unbinding RPC2 connection 261667013
16:18:51 SmonDaemon timer expired
16:18:51 Entered CheckRVMResStat
16:18:51 Starting SmonDaemon timer
17:19:21 SmonDaemon timer expired
17:19:21 Entered CheckRVMResStat
17:19:21 Starting SmonDaemon timer
17:23:56 client_GetVenusId: got new host "a client ip address":2430



On the SCM I get:


15:19:07 New SrvLog started at Mon Jul 24 15:19:07 2000

15:19:07 Resource limit on data size are set to 2147483647

15:19:07 Server etext 0x80c83c4, edata 0x8100008
15:19:07 RvmType is Rvm
15:19:07 Main process doing a LWP_Init()
15:19:07 Main thread just did a RVM_SET_THREAD_DATA

15:19:07 Setting Rvm Truncate threshhold to 5.

Partition /vicepa: inodes in use: 167, total: 16777216.
15:19:11 Partition /vicepa: 255887K available (minfree=0%), 220169K free.
15:19:11 The server (pid 679) can be controlled using volutil commands
15:19:11 "volutil -help" will give you a list of these commands
15:19:11 If desperate,
        "kill -SIGWINCH 679" will increase debugging level
15:19:11    "kill -SIGUSR2 679" will set debugging level to zero
15:19:11    "kill -9 679" will kill a runaway server
15:19:11 Vice file system salvager, version 3.0.
15:19:11 SanityCheckFreeLists: Checking RVM Vnode Free lists.
15:19:11 DestroyBadVolumes: Checking for destroyed volumes.
15:19:11 Salvaging file system partition /vicepa
15:19:11 Force salvage of all volumes on this partition
15:19:11 Scanning inodes in directory /vicepa...
15:19:11 SFS: There are some volumes without any inodes in them
15:19:11 Entering DCC(0x1000001)
15:19:11 DCC: Salvaging Logs for volume 0x1000001

15:19:11 done:  5 files/dirs,   6 blocks
15:19:11 Entering DCC(0x1000002)
15:19:11 DCC: Salvaging Logs for volume 0x1000002

15:19:11 done:  22 files/dirs,  7421 blocks
15:19:11 Entering DCC(0x1000003)
15:19:11 DCC: Salvaging Logs for volume 0x1000003

15:19:11 done:  104 files/dirs, 1238 blocks
15:19:11 Entering DCC(0x1000004)
15:19:11 DCC: Salvaging Logs for volume 0x1000004

15:19:11 done:  8 files/dirs,   70 blocks
15:19:11 Entering DCC(0x1000005)
15:19:11 DCC: Salvaging Logs for volume 0x1000005

15:19:11 done:  3 files/dirs,   4 blocks
15:19:11 SFS:No Inode summary for volume 0x1000006; skipping full salvage
15:19:11 SalvageFileSys: Therefore only resetting inUse flag
15:19:11 SFS:No Inode summary for volume 0x1000007; skipping full salvage
15:19:11 SalvageFileSys: Therefore only resetting inUse flag
15:19:11 Entering DCC(0x1000008)
15:19:11 DCC: Salvaging Logs for volume 0x1000008
15:19:11 done:  3 files/dirs,   4 blocks
15:19:11 Entering DCC(0x1000009)
15:19:11 DCC: Salvaging Logs for volume 0x1000009

15:19:11 done:  18 files/dirs,  56676 blocks
15:19:11 Entering DCC(0x100000a)
15:19:11 DCC: Salvaging Logs for volume 0x100000a

15:19:11 done:  16 files/dirs,  44359 blocks
15:19:11 SalvageFileSys completed on /vicepa
15:19:11 VAttachVolumeById: vol 1000001 (coda_root.0) attached and online
15:19:11 VAttachVolumeById: vol 1000002 (coda.0) attached and online
15:19:11 VAttachVolumeById: vol 1000003 (doc.0) attached and online
15:19:11 VAttachVolumeById: vol 1000004 (pub.0) attached and online
15:19:11 VAttachVolumeById: vol 1000005 (users.0) attached and online
15:19:11 VAttachVolumeById: vol 1000006 (mandrake.0) attached and online
15:19:11 VAttachVolumeById: vol 1000007 (a.0) attached and online
15:19:11 VAttachVolumeById: vol 1000008 (mp3.0) attached and online
15:19:11 VAttachVolumeById: vol 1000009 (cubana.0) attached and online
15:19:11 VAttachVolumeById: vol 100000a (Golden_Gate.0) attached and online
15:19:11 Attached 10 volumes; 0 volumes not attached
lqman: Creating LockQueue Manager.....LockQueue Manager starting .....
15:19:11 LockQueue Manager just did a rvmlib_set_thread_data()

done
15:19:11 CallBackCheckLWP just did a rvmlib_set_thread_data()

15:19:11 CheckLWP just did a rvmlib_set_thread_data()

15:19:11 ServerLWP 0 just did a rvmlib_set_thread_data()

15:19:11 ServerLWP 1 just did a rvmlib_set_thread_data()

15:19:11 ServerLWP 2 just did a rvmlib_set_thread_data()

15:19:11 ServerLWP 3 just did a rvmlib_set_thread_data()

15:19:11 ServerLWP 4 just did a rvmlib_set_thread_data()

15:19:11 ServerLWP 5 just did a rvmlib_set_thread_data()

15:19:11 ResLWP-0 just did a rvmlib_set_thread_data()

15:19:11 ResLWP-1 just did a rvmlib_set_thread_data()

15:19:11 VolUtilLWP 0 just did a rvmlib_set_thread_data()

15:19:11 VolUtilLWP 1 just did a rvmlib_set_thread_data()

15:19:11 Starting SmonDaemon timer
15:19:11 File Server started Mon Jul 24 15:19:11 2000

15:22:48 client_GetVenusId: got new host 192.168.1.27:2430

15:31:56 Unbinding RPC2 connection 77674659
15:31:56 Unbinding RPC2 connection 607742238
15:31:56 Unbinding RPC2 connection 193043662
15:31:56 Unbinding RPC2 connection 146736843
15:31:56 Unbinding RPC2 connection 438581500
15:31:56 Unbinding RPC2 connection 909797812
15:31:56 Unbinding RPC2 connection 713149311
15:31:56 Unbinding RPC2 connection 495551406
16:19:41 SmonDaemon timer expired
16:19:41 Entered CheckRVMResStat
16:19:41 Starting SmonDaemon timer
17:19:41 SmonDaemon timer expired
17:19:41 Entered CheckRVMResStat
17:19:41 Starting SmonDaemon timer
17:25:21 client_GetVenusId: got new host 192.168.1.27:2430  (the client IP address)
17:25:21 Building callback conn.
17:25:21 No idle WriteBack conns, building new one
17:25:21 Writeback message to 192.168.1.27 port 2430 on conn 22c5d1a7 succeeded
17:25:21 RevokeWBPermit on conn 22c5d1a7 returned 0
17:25:24 RevokeWBPermit on conn 22c5d1a7 returned 0
17:25:24 CheckRetCodes: server 192.168.1.64 returned error 103 (a non scm IP address)
17:25:24 ViceResolve:Couldnt lock volume 7f000007 at all accessible servers
17:25:24 CheckRetCodes: server 192.168.1.64 returned error 103
17:25:24 RevokeWBPermit on conn 22c5d1a7 returned 0
17:25:24 ViceResolve:Couldnt lock volume 7f000007 at all accessible servers
17:25:24 RevokeWBPermit on conn 22c5d1a7 returned 0
17:25:24 CheckRetCodes: server 192.168.1.64 returned error 103
17:25:24 ViceResolve:Couldnt lock volume 7f000001 at all accessible servers
17:25:24 CheckRetCodes: server 192.168.1.64 returned error 103
17:25:24 ViceResolve:Couldnt lock volume 7f000001 at all accessible servers
17:25:24 RevokeWBPermit on conn 22c5d1a7 returned 0
17:25:25 CheckRetCodes: server 192.168.1.64 returned error 103
17:25:25 ViceResolve:Couldnt lock volume 7f000004 at all accessible servers
17:25:25 CheckRetCodes: server 192.168.1.64 returned error 103
17:25:25 ViceResolve:Couldnt lock volume 7f000004 at all accessible servers
17:25:25 RevokeWBPermit on conn 22c5d1a7 returned 0
17:25:25 CheckRetCodes: server 192.168.1.64 returned error 103
17:25:25 ViceResolve:Couldnt lock volume 7f000003 at all accessible servers
17:25:25 CheckRetCodes: server 192.168.1.64 returned error 103
17:25:25 ViceResolve:Couldnt lock volume 7f000003 at all accessible servers
17:25:27 CheckRetCodes: server 192.168.1.64 returned error 103
17:25:27 ViceResolve:Couldnt lock volume 7f000003 at all accessible servers
17:25:27 CheckRetCodes: server 192.168.1.64 returned error 103
17:25:27 ViceResolve:Couldnt lock volume 7f000003 at all accessible servers
17:25:28 CheckRetCodes: server 192.168.1.64 returned error 103
17:25:28 ViceResolve:Couldnt lock volume 7f000003 at all accessible servers
17:25:29 CheckRetCodes: server 192.168.1.64 returned error 103


3. SCM crash/rebuild

   Now, if the SCM crashes or if I turn it off, will the cell keep
   working, possibly after I restart the clients?


Sincerely,
-- 
Florin 
florin@mandrakesoft.com			http://www.linux-mandrake.com	
Received on 2000-07-24 11:34:04