Coda File System

Replication problem...

From: EXT / FOCAL CASSUTO Marc <ext.focal.cassuto_at_sncf.fr>
Date: Wed, 11 Aug 2004 17:35:27 +0200
Good evening again,
 
the setup :
========
2 coda servers (coda-debug-server-6.0.6-1) running on Fedora core1, kernel
2.4.20-8; let's say servint1 & proxytest
servint1 is the SCM.
2 coda clients (coda-debug-client-6.0.6-1) each running on the same previous
servers; lets say C1 & C2
 
# cat VRList
volROOT 7f000001 2 a000002 14000001 0 0 0 0 0 0 E0000103
volSHARE 7f000002 2 a000003 14000002 0 0 0 0 0 0 E0000103
 
# cat VSGDB
E0000100 servint1
E0000102 proxytest
E0000103 servint1 proxytest

The problem :
==========
I was just beginnig all my tests.
The 2 servers up. The first time I modified the file toto with the client on
the non SCM server 
I could not see the modifications with the client on the SCM server.
It sounds like suddently the replication process failed
 
I am playing and testing codaFS for a production environnement for 1.5 week.
This sort of unexplainable problem is rather annoying...
 
Please find an explanation
 
The logs :
========
 
SCM codacon:
(I manually remove all the bandwith land ValidateAttrsPlusSHA logs)
 
NewConnectFS servint1 ( 16:53:04 )
NewConnection servint1 ( 16:53:04 )
BackProbe servint1 ( 16:53:04 )
NewConnectFS servint1 ( 16:53:04 )
BackProbe servint1 ( 16:53:04 )
unreachable proxytest ( 16:53:09 )
bandwidth servint1 762195 1175088 2564102 ( 16:53:09 )
ValidateVols volROOT [1] ( 16:53:09 )
ValidateVols volROOT [1] ( 16:53:09 )
Probe ( 16:53:14 )
BackProbe servint1 ( 16:53:14 )
bandwidth servint1 642673 1066098 3125000 ( 16:53:14 )
DisconnectFS servint1 ( 16:53:16 )
DisconnectFS ( 16:53:16 )
NewConnectFS servint1 ( 16:53:16 )
BackProbe servint1 ( 16:53:16 )
bandwidth servint1 951474 1436781 2932551 ( 16:53:21 )
GetRootVolume @servint1.dsit.sncf.fr ( 16:53:27 )
GetVolumeInfo volROOT_at_servint1.dsit.sncf.fr
<mailto:volROOT_at_servint1.dsit.sncf.fr>  ( 16:53:27 )
NewConnectFS servint1 ( 16:53:27 )
BackProbe servint1 ( 16:53:27 )
000001.1.1) [0] ( 16:53:27 )
GetVolumeInfo /myMount_at_servint1.dsit.sncf.fr ( 16:53:27 )
COP2 volROOT ( 16:55:00 )
NewConnectFS proxytest ( 16:55:43 )
NewConnection proxytest ( 16:55:43 )
up proxytest ( 16:55:43 )
BackProbe proxytest ( 16:55:43 )
Probe ( 16:55:43 )
BackProbe proxytest ( 16:55:43 )
NewConnectFS proxytest ( 16:55:44 )
BackProbe proxytest ( 16:55:44 )
volROOT ( 16:55:44 )
Resolve (501be108.7f000001.1.1) ( 16:55:44 )
Callback servint1 (501be108.a000002.0.0) ( 16:55:44 )
Callback servint1 (501be108.7f000001.1.1) ( 16:55:44 )
Callback proxytest (501be108.7f000001.1.1) ( 16:55:44)
Fetch servint1.dsit.sncf.fr [2] ( 16:55:55 )
fetching (servint1.dsit.sncf.fr) 100% ( 16:55:55 )
Symlink #volSHARE. ( 16:56:30 )
NewConnectFS servint1 ( 16:56:30 )
BackProbe servint1 ( 16:56:30 )
NewConnectFS proxytest ( 16:56:30 )
BackProbe proxytest ( 16:56:30 )
GetVolVS volROOT ( 16:56:30 )
COP2 volROOT ( 16:56:30 )
GetVolumeInfo volSHARE_at_servint1.dsit.sncf.fr
<mailto:volSHARE_at_servint1.dsit.sncf.fr>  ( 16:56:40 )
GetVolumeInfo 167772163_at_servint1.dsit.sncf.fr
<mailto:167772163_at_servint1.dsit.sncf.fr>  ( 16:56:40 )
GetVolumeInfo 335544322_at_servint1.dsit.sncf.fr
<mailto:335544322_at_servint1.dsit.sncf.fr>  ( 16:56:40 )
GetAttr share ( 16:56:40 )
Fetch share [2] ( 16:56:40 )
fetching (share) 100% ( 16:56:40 )
AllocFids volSHARE ( 16:58:50 )
Create toto ( 16:58:50 )
Store toto [1] ( 16:58:50 )
COP2 volSHARE ( 16:59:00 )
Probe ( 17:01:30 )
BackProbe servint1 ( 17:01:30 )
BackProbe proxytest ( 17:01:30 )
BeginStatusWalk [8] ( 17:03:05 )
   [0, 0, 0, 0] [0] ( 17:03:05 )
EndStatusWalk [8] ( 17:03:05 )
   [0, 0, 0, 0] [0, 0, 0] [1, 0, 0.0] ( 17:03:05 )
BeginDataWalk [4] ( 17:03:05 )
EndDataWalk [4] ( 17:03:05 )
   [1, 0, 0.0] [0, 0, 0, 0] ( 17:03:05 )
Probe ( 17:04:04 )
BackProbe servint1 ( 17:04:04 )
BackProbe proxytest ( 17:04:04 )
GetVolumeInfo volSHARE_at_servint1.dsit.sncf.fr
<mailto:volSHARE_at_servint1.dsit.sncf.fr>  ( 17:05:54 )
GetVolumeInfo 167772163_at_servint1.dsit.sncf.fr
<mailto:167772163_at_servint1.dsit.sncf.fr>  ( 17:05:54 )
GetVolumeInfo 335544322_at_servint1.dsit.sncf.fr
<mailto:335544322_at_servint1.dsit.sncf.fr>  ( 17:05:54 )
GetAttr share ( 17:05:54 )
Fetch share [2] ( 17:05:57 )
fetching (share) 100% ( 17:05:57 )
GetAttr toto ( 17:05:57 )
Fetch toto [1] ( 17:05:57 )
fetching (toto) 100% ( 17:05:57 )

 
Non SCM codacon :
===============
NewConnectFS servint1 ( 16:57:22 )
NewConnection servint1 ( 16:57:22 )
BackProbe servint1 ( 16:57:22 )
NewConnectFS proxytest ( 16:57:22 )
NewConnectFS servint1 ( 16:57:22 )
NewConnection proxytest ( 16:57:22 )
BackProbe proxytest ( 16:57:22 )
BackProbe servint1 ( 16:57:22 )
Probe ( 16:57:22 )
GetRootVolume @servint1.dsit.sncf.fr ( 16:57:22 )
BackProbe proxytest ( 16:57:22 )
BackProbe servint1 ( 16:57:22 )
GetVolumeInfo volROOT_at_servint1.dsit.sncf.fr
<mailto:volROOT_at_servint1.dsit.sncf.fr>  ( 16:57:22 )
NewConnectFS servint1 ( 16:57:22 )
BackProbe servint1 ( 16:57:22 )
NewConnectFS proxytest ( 16:57:22 )
BackProbe proxytest ( 16:57:22 )
Fetch servint1.dsit.sncf.fr [2] ( 16:57:23 )
fetching (servint1.dsit.sncf.fr) 100% ( 16:57:23 )
GetAttr share ( 16:57:23 )
Fetch share [1] ( 16:57:23 )
fetching (share) 100% ( 16:57:23 )
GetVolumeInfo volSHARE_at_servint1.dsit.sncf.fr
<mailto:volSHARE_at_servint1.dsit.sncf.fr>  ( 16:57:23 )
GetVolumeInfo 167772163_at_servint1.dsit.sncf.fr
<mailto:167772163_at_servint1.dsit.sncf.fr>  ( 16:57:23 )
GetVolumeInfo 335544322_at_servint1.dsit.sncf.fr
<mailto:335544322_at_servint1.dsit.sncf.fr>  ( 16:57:23 )
GetAttr share ( 16:57:23 )
Fetch share [2] ( 16:57:26 )
fetching (share) 100% ( 16:57:26 )
DisconnectFS servint1 ( 16:57:53 )
DisconnectFS proxytest ( 16:57:53 )
DisconnectFS servint1 ( 16:57:53 )
DisconnectFS ( 16:57:53 )
NewConnectFS servint1 ( 16:57:53 )
BackProbe servint1 ( 16:57:53 )
Callback proxytest (501be108.7f000002.1.1) ( 16:59:03 )
Callback servint1 (501be108.7f000002.1.1) ( 16:59:03 )
NewConnectFS proxytest ( 17:01:33 )
BackProbe proxytest ( 17:01:33 )
Probe ( 17:01:33 )
BackProbe proxytest ( 17:01:33 )
BackProbe servint1 ( 17:01:33 )
NewConnectFS servint1 ( 17:01:39 )
BackProbe servint1 ( 17:01:39 )
NewConnectFS proxytest ( 17:01:39 )
BackProbe proxytest ( 17:01:39 )
Fetch share [2] ( 17:01:39 )
fetching (share) 100% ( 17:01:39 )
GetAttr toto ( 17:01:39 )
Fetch toto [1] ( 17:01:39 )
fetching (toto) 100% ( 17:01:39 )

SCM SrvLog:
==========
16:55:00 Incomplete host set in COP2.
16:55:44 Entering RecovDirResolve 7f000001.1.1
16:55:44 ComputeCompOps: fid(0x7f000001.1.1)
16:55:44 RS_ShipLogs - returning 0
16:55:44 Going to spool log entry for phase3
16:57:09 client_GetVenusId: got new host 10.40.51.19:32786
16:57:09 Building callback conn.
16:57:09 VGetVnode: vnode a000002.2 is not allocated
16:57:40 Worker1: Unbinding RPC connection 3489
16:57:40 Worker2: Unbinding RPC connection 12284
16:57:40 Worker4: Unbinding RPC connection 8810
16:58:50 VAllocFid: volume disk uniquifier being extended
17:01:26 GetAttrPlusSHA: Computing SHA a000003.4.2, disk.inode=2

 
non SCM SrvLog:
============
16:54:01 ResLWP-1 just did a rvmlib_set_thread_data()
16:54:01 VolUtilLWP 0 just did a rvmlib_set_thread_data()
16:54:01 VolUtilLWP 1 just did a rvmlib_set_thread_data()
16:54:01 Starting SmonDaemon timer
16:54:01 File Server started Wed Aug 11 16:54:01 2004
16:54:21 New Data Base received
16:54:51 New Data Base received
16:55:56 client_GetVenusId: got new host 10.40.51.52:32817
16:55:56 Building callback conn.
16:55:58 ComputeCompOps: fid(0x7f000001.1.1)
16:55:58 RS_ShipLogs - returning 0
16:55:58 Going to spool log entry for phase3
16:57:22 client_GetVenusId: got new host 10.40.51.19:32786
16:57:22 Building callback conn.
16:57:22 VGetVnode: vnode 14000001.2 is not allocated
16:57:53 Worker3: Unbinding RPC connection 11959
16:57:53 Worker4: Unbinding RPC connection 4902
16:59:03 VAllocFid: volume disk uniquifier being extended
17:01:39 GetAttrPlusSHA: Computing SHA 14000002.4.2, disk.inode=2
17:11:10 Entering RecovDirResolve 7f000002.1.1
17:11:10 RecovDirResolve: RegDirResolution succeeded

Sincerely yours,
Marc Cassuto.


 
Received on 2004-08-11 11:37:58