Coda File System

From: redirecting decoy <redirectingdecoy_at_yahoo.com> Date: Thu, 24 Feb 2005 15:00:11 -0800 (PST)

Hello again,

I have a new problem. Well, not really a new problem, just newly noticed.
I can't get my client to connect to my server.
Here's the situation:

1 Server:
     1) Called M1
     2) Has 2 Nics
        a) eth0: 10.1.2.1
        b) eth1: 192.168.10.101
     3) 1 Volume on the server called "test".
     4) No replication or anything like that.  

1 Client:
     1) Called M2
     2) Has 1 Nic
        a) eth0: 10.1.2.254
        b) M1(eth0) is the gateway for this machine.

Test Server:
     1) Called M3
     2) Has 2 Nics
        a) eth0: 10.1.3.1
        b) eth1: 192.168.10.102

I can ping M1 from M2 and vice versa. No network connectivity problems whatsoever, so far as
I can see.  In server.conf on M1,  I set "ipaddress=192.168.10.101", to tell the server to
listen on eth1.   

So I have 1 volume on 1 server, and I try to do "ls -alF /coda/realm/test" from my client.  That
doesn't work.  It just hangs there, or returns a broken link.  Looking at my codacon output:

##Codacon on M2##:   
------------------------------------------------------------------------------------------------
NewConnectFS M1 ( 17:18:59 )

I get nothing but this message repeatedly.  no bandwidth, or data walk or anything.

##/vice/srv/SrvLog##:
------------------------------------------------------------------------------------------------
17:06:30 Callback message to 10.1.2.254 port 1060 failed RPC2_NAKED (F)
17:06:30 Worker1: Unbinding RPC connection 9620
17:06:30 Deleting client entry for user System:AnyUser at 0.0.0.0.0 rpcid 9620
17:06:30 client_GetVenusId: got new host 10.1.2.254:1060
17:06:30 Building callback conn.
17:06:30 Callback message to 10.1.2.254 port 1060 failed RPC2_NOTCLIENT (F)
17:06:30 Worker3: Unbinding RPC connection 5991
17:06:30 Deleting client entry for user System:AnyUser at 0.0.0.0.0 rpcid 5991

I get this message repeatedly in my srvlog on M1

##/usr/local/venus.log##:
------------------------------------------------------------------------------------------------
[ X(00) : 0000 : 17:18:02 ] Coda Venus, version 6.0.7
[ X(00) : 0000 : 17:18:02 ] Logfile initialized with LogLevel = 0 at Thu Feb 24 17:18:02 2005
[ X(00) : 0000 : 17:18:02 ] E StatsInit()
[ X(00) : 0000 : 17:18:02 ] L StatsInit()
[ X(00) : 0000 : 17:18:02 ] BeginRvmFlush (1, 60, F)
[ X(00) : 0000 : 17:18:02 ] EndRvmFlush
[ X(00) : 0000 : 17:18:02 ] BeginRvmTruncate (1, 220, F)
[ X(00) : 0000 : 17:18:03 ] EndRvmTruncate
[ X(00) : 0000 : 17:18:03 ] BeginRvmFlush (1, 924, F)
[ X(00) : 0000 : 17:18:03 ] EndRvmFlush
[ X(00) : 0000 : 17:18:03 ] BeginRvmTruncate (7, 1084, F)
[ X(00) : 0000 : 17:18:03 ] EndRvmTruncate
[ X(00) : 0000 : 17:18:03 ] E adv_daemon::adv_daemon: AdviceServer
[ A(18) : 0000 : 17:18:03 ] adv_daemon::main()
[ H(07) : 0000 : 17:18:03 ] HDBDaemon about to sleep on hdbdaemon_sync
[ W(20) : 0000 : 17:18:03 ] FidToNodeid: called for volume root (51e90f88.ff000001)!!!
[ D(22) : 0000 : 17:18:12 ] WAITING(SRVRQ):
[ T(01) : 0001 : 17:18:17 ] BeginRvmFlush (1, 50056, T)
[ T(01) : 0001 : 17:18:17 ] EndRvmFlush
[ D(22) : 0000 : 17:18:34 ] WAIT OVER, elapsed = 21646.7
[ D(22) : 0000 : 17:18:34 ] WAITING(SRVRQ):
[ W(20) : 0000 : 17:18:34 ] WAITING(SRVRQ):
[ C(09) : 0000 : 17:18:34 ] callbackserver::main: can't find server (handle = 9811)
[ D(22) : 0000 : 17:18:34 ] WAIT OVER, elapsed = 8.2
[ W(20) : 0000 : 17:18:34 ] WAIT OVER, elapsed = 5.0
[ W(20) : 0000 : 17:18:34 ] WAITING(SRVRQ):
[ C(11) : 0000 : 17:18:34 ] callbackserver::main: can't find server (handle = 3222)
[ W(20) : 0000 : 17:18:34 ] WAIT OVER, elapsed = 5.3
[ D(24) : 0000 : 17:18:34 ] WAITING(SRVRQ):
[ D(22) : 0000 : 17:18:34 ] WAITING(SRVRQ):
[ C(09) : 0011 : 17:40:56 ] callbackserver::main: can't find server (handle = 7447)
[ C(11) : 0011 : 17:41:01 ] callbackserver::main: can't find server (handle = 12678)
[ C(08) : 0012 : 17:41:01 ] callbackserver::main: can't find server (handle = 7072)
[ C(10) : 0012 : 17:41:01 ] callbackserver::main: can't find server (handle = 9983)
[ C(12) : 0012 : 17:41:06 ] callbackserver::main: can't find server (handle = 9362)
[ C(09) : 0012 : 17:41:06 ] callbackserver::main: can't find server (handle = 9786)

And I get this output in my venus log.  

As a test, I tried using a different server (M3:eth1), and managed to connect from
M2 without any problems.  So I believe that my firewall/network setup is correct.

After examining ethereal logs, this is what I figured is happening:

M2(10.1.2.254)          =>    M1(192.168.10.101)
M1(192.168.10.101)      =>    M2(10.1.2.254)  From  M1(10.1.2.1) as source
M2(10.1.2.254)          =>    M1(10.1.2.1)

So M2 tries to connect to M1 on public interface, recieves a  reply
from the private interface, then M2 want's to respond to M1(10.1.2.1).
This is where I think it is breaking.   Since the coda server is not
listening on M1(eth0), then I'm guessing that the packets just disappear into network land.

I tried changing the server.conf file on M1 to "ipaddress=10.1.2.1", and attempted to connect
to the client.  This partially worked.   Codacon showed me that stuff was actually going on this
time, however I could not get past the "Resource Temporarily Unavailable" message.

I don't...think  this is a firewall issue, as I am not really doing much in my iptable rules
other than port blocking,

M1 does have this for masquerading:
-A POSTROUTING -o eth1 -s 10.1.2.0/24 -j MASQUERADE

and M2 isn't doing anything special. I have even tested with iptables turned off.
It appears that the reason I didn't notice this before is because I had 3 servers
running with replication turned on.  so my clients were able to connect to one 
of the other 2 servers, just not the coda server that also acts as the gateway for the client.

So what am I doing wrong?

Any help would be greatly appreciated, thanks in advance...

-R.D.

__________________________________ 
Do you Yahoo!? 
Yahoo! Mail - 250MB free storage. Do more. Manage less. 
http://info.mail.yahoo.com/mail_250

Coda File System

1 Server + 1 Client = Big Headache