Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu> Date: Fri, 6 May 2016 09:18:55 -0400

On Fri, May 06, 2016 at 10:07:35AM +0200, u-myfx_at_aetey.se wrote:
> On Thu, May 05, 2016 at 11:25:36AM -0400, Jan Harkes wrote:
> > On Thu, May 05, 2016 at 01:13:53PM +0200, u-myfx_at_aetey.se wrote:
> > > they verify that the other party belongs to the correct realm,
> > > but this might happen to be a different server in the same realm. I guess
> > > mixing the server id into the handshake would eliminate this uncertainty.
> > 
> > Eh? Server ids should not be exposed like that to begin with.
> > 
> > Aside from that a client isn't trying to connect to a server, it is
> > trying to bind to a volume. If you get connected to the wrong server
> > (how in the world is that even a thing that would 'happen'?) it wouldn't
> > be able to bind to the volume anyway and so the end result is the same
> > without needing to put serverids in the handshake.
> > 
> > A client should have no need to know a server id, ever.
> 
> I guess you are thinking about things which are unrelated.
> 
> Think of a "server" as of an "RPC2 server" (f.i. update?),
> the "server id" is the idea of the client about
> "which service instance I am to talk to".

We are talking about RPC2, which is a messaging protocol between clients
and servers that relies on shared secrets to get a common session key.

If client A wants to connect to server B and somehow finds out that it
is supposed to connect to address X port Y, and whoever 'picks up the
phone' at that address uses the correct shared secret to complete the
handshake, then there is no use for the instance id.
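
To make the point concrete, here is a minimal sketch of a generic shared-secret, mutual challenge-response handshake. It is not RPC2's actual wire protocol: the function names, the HMAC-SHA256 construction and the session-key derivation are assumptions chosen for illustration. What it shows is that such a handshake only establishes that the peer holds the shared secret, so any instance holding that secret completes it, and an instance id would add nothing to what is verified.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

# Hypothetical mutual challenge-response; NOT the actual RPC2 wire protocol.

def prove(secret: bytes, role: bytes, challenge: bytes) -> bytes:
    """MAC the peer's challenge, tagged with our role to prevent reflection."""
    return hmac.new(secret, role + challenge, hashlib.sha256).digest()

def derive_key(secret: bytes, c_nonce: bytes, s_nonce: bytes) -> bytes:
    """Session key both ends can compute once the nonces are exchanged."""
    return hmac.new(secret, b"key" + c_nonce + s_nonce, hashlib.sha256).digest()

class Server:
    """Any process holding the shared secret; note there is no instance id."""

    def __init__(self, secret: bytes):
        self.secret = secret
        self.s_nonce = b""

    def answer(self, c_nonce: bytes) -> Tuple[bytes, bytes]:
        self.s_nonce = secrets.token_bytes(16)
        return self.s_nonce, prove(self.secret, b"server", c_nonce)

    def confirm(self, c_nonce: bytes, c_proof: bytes) -> Optional[bytes]:
        expected = prove(self.secret, b"client", self.s_nonce)
        if not hmac.compare_digest(c_proof, expected):
            return None
        return derive_key(self.secret, c_nonce, self.s_nonce)

def handshake(secret: bytes, server: Server) -> Optional[bytes]:
    """Client side: returns the agreed session key, or None on failure."""
    c_nonce = secrets.token_bytes(16)
    s_nonce, s_proof = server.answer(c_nonce)
    # The only thing verified is that the peer knows the shared secret.
    if not hmac.compare_digest(s_proof, prove(secret, b"server", c_nonce)):
        return None
    server_key = server.confirm(c_nonce, prove(secret, b"client", s_nonce))
    client_key = derive_key(secret, c_nonce, s_nonce)
    return client_key if server_key == client_key else None
```

Two distinct `Server` objects built from the same secret both complete the handshake, which is exactly the situation described above: whoever picks up the phone with the right secret is, as far as the handshake can tell, the right party.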

Because if A somehow got connected to something that isn't B, then there
is no reliable way to resolve the issue of 'how do I connect to B'. And
no, disconnecting and reconnecting until we randomly hit the right
instance id is not a solution. This is not an RPC2 issue, this is an
issue with the application and adding that instance id in the RPC2
handshake is not going to solve it.

> This is not the case for other services. The server instances
> generally are not equal e.g. for update the scm is special, for

Only because of how some Coda server updates are stored and propagated.
They could be stored in etcd or zookeeper or a mysql database for all I
care and things would be very different. The problem is that this backend
detail shouldn't even have to propagate out to clients, there should be
no need for cpasswd to know that the auth2 daemon on 'server5681' is any
more or less special than any of the other auth2 daemons. The auth2
daemons should be aware how updates are propagated, so they could
delegate the password change request to the correct place.

That is how every other system that uses something like PAXOS or RAFT to
choose a master/coordinator handles such situations. Not by having
clients just reconnect to a random server in the hope to hit the one
that is special. How would a client know that instance '1', or maybe
'0', happens to be the special one, if there is a PAXOS-style master
selection where any of the servers could be the read/write replica, and
they can even revote and pick a new one in case the current master is
lost?
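
The server-side delegation described above can be sketched as follows: every instance accepts the request and forwards it to whichever instance currently holds the master role, so the client never needs to know which one that is. The `AuthDaemon` and `Cluster` classes are hypothetical stand-ins (not Coda's auth2 code), and the election itself, whether Paxos, Raft or something else, is reduced to a shared `current_master` reference for brevity.

```python
from typing import Dict, Optional

class Cluster:
    """Stand-in for the result of an election (Paxos, Raft, ...)."""

    def __init__(self) -> None:
        self.current_master: Optional["AuthDaemon"] = None

class AuthDaemon:
    """Hypothetical password service instance; every instance is a valid entry point."""

    def __init__(self, name: str, cluster: Cluster) -> None:
        self.name = name
        self.cluster = cluster
        self.records: Dict[str, str] = {}

    def change_password(self, user: str, new_password: str) -> str:
        """Accept the request on any instance; return the name of the one that applied it."""
        master = self.cluster.current_master
        if master is None:
            raise RuntimeError("no master elected yet")
        if master is not self:
            # Server-side delegation: the client is never told to go elsewhere.
            return master.change_password(user, new_password)
        self.records[user] = new_password  # placeholder for the real update path
        return self.name
```

If the election moves the master role, clients are unaffected: the next request sent to any instance simply lands on the new master.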

> resolution the coordinator is special.

Yes it is, and how is it chosen? The client connects to a random replica
for a volume and by sheer luck that server is the coordinator for the
resolution. Or maybe it wasn't luck after all, maybe any of the servers
can become the coordinator for the duration of a resolution and it is
whichever server the client connected to.
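
The second reading, where the coordinator is simply whichever replica the client happened to contact, can be sketched like this. The replica model (a single integer version per object) is a deliberately simplified assumption and not Coda's resolution algorithm: the contacted replica gathers its peers' states, picks the newest, and installs it everywhere, so no particular instance has to be "the" coordinator.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Replica:
    """Hypothetical replica holding one object with a simple version number."""
    name: str
    version: int
    data: str

    def resolve(self, peers: List["Replica"]) -> Tuple[str, int]:
        """Act as coordinator for this one resolution, whoever we are."""
        group = [self] + [p for p in peers if p is not self]
        newest = max(group, key=lambda r: r.version)
        for r in group:
            # Bring every replica up to the newest state found.
            r.version, r.data = newest.version, newest.data
        return self.name, newest.version
```

Whichever replica the client picks, the outcome is the same, which is why the choice can be left to whoever received the request.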

> I would not dare to analyze _all_ cases including possibly unknown future
> ones and be sure that talking to a wrong instance never ever can lead
> to a problem.

There is no such thing as a wrong instance, and if you think the client
application could have a better idea than the server instances I've got
some bad news for you.

Jan

Re: Coda development (rpc2 handshake / instance authentication)