On Fri, May 06, 2016 at 10:07:35AM +0200, u-myfx_at_aetey.se wrote:
> On Thu, May 05, 2016 at 11:25:36AM -0400, Jan Harkes wrote:
> > On Thu, May 05, 2016 at 01:13:53PM +0200, u-myfx_at_aetey.se wrote:
> > > they verify that the other party belongs to the correct realm,
> > > but this might happen to be a different server in the same realm. I guess
> > > mixing the server id into the handshake would eliminate this uncertainty.
> >
> > Eh? Server ids should not be exposed like that to begin with.
> >
> > Aside from that a client isn't trying to connect to a server, it is
> > trying to bind to a volume. If you get connected to the wrong server
> > (how in the world is that even a thing that would 'happen'?) it wouldn't
> > be able to bind to the volume anyway and so the end result is the same
> > without needing to put serverids in the handshake.
> >
> > A client should have no need to know a server id, ever.
>
> I guess you are thinking about things which are unrelated.
>
> Think of a "server" as of an "RPC2 server" (f.i. update?),
> the "server id" is the idea of the client about
> "which service instance I am to talk to".

We are talking about RPC2, which is a messaging protocol between clients
and servers that relies on shared secrets to get a common session key.

If client A wants to connect to server B and somehow finds out that it
is supposed to connect to address X port Y, and whoever 'picks up the
phone' at that address uses the correct shared secret to complete the
handshake, then there is no use for the instance id.

Because if A somehow got connected to something that isn't B, then there
is no reliable way to resolve the issue of 'how do I connect to B'. And
no, disconnecting and reconnecting until we randomly hit the right
instance id is not a solution.

This is not an RPC2 issue, this is an issue with the application, and
adding that instance id in the RPC2 handshake is not going to solve it.

> This is not the case for other services.
> The server instances generally are not equal e.g. for update the scm
> is special, for

Only because of how some Coda server updates are stored and propagated.
They could be stored in etcd or zookeeper or a mysql database for all I
care and things would be very different.

The problem is that this backend detail shouldn't even have to propagate
out to clients. There should be no need for cpasswd to know that the
auth2 daemon on 'server5681' is any more or less special than any of the
other auth2 daemons. The auth2 daemons should be aware of how updates
are propagated, so they could delegate the password change request to
the correct place.

That is how every other system that uses something like PAXOS or RAFT to
choose a master/coordinator handles such situations. Not by having
clients just reconnect to a random server in the hope of hitting the one
that is special. How would a client know that instance '1', or maybe
'0', happens to be the special one, if there is a PAXOS style master
selection where any of the servers could be the read/write replica and
they can even revote and pick a new one in case the current master is
lost?

> resolution the coordinator is special.

Yes it is, and how is it chosen? The client connects to a random replica
for a volume and by sheer luck that server is the coordinator for the
resolution. Or maybe it wasn't luck after all, maybe any of the servers
can become the coordinator for the duration of a resolution and it is
whichever server the client connected to.

> I would not dare to analyze _all_ cases including possibly unknown future
> ones and be sure that talking to a wrong instance never ever can lead
> to a problem.

There is no such thing as a wrong instance, and if you think the client
application could have a better idea than the server instances I've got
some bad news for you.

Jan

Received on 2016-05-06 09:19:06
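
To make the delegation argument above concrete, here is a minimal sketch
of the idea, not of how auth2 or RPC2 actually work. The class
AuthDaemon, its change_password method and the elect helper are
hypothetical names made up for this illustration, and elect merely
stands in for whatever PAXOS/RAFT style selection the servers would run.
The client talks to whichever instance it happened to reach, and the
servers route the write among themselves.

```python
# Illustrative sketch only: AuthDaemon, change_password and elect are
# hypothetical names, not part of Coda's auth2 daemon or of RPC2.
import random


class AuthDaemon:
    """One of several equivalent service instances sharing the same secret."""

    def __init__(self, name):
        self.name = name
        self.coordinator = None  # whichever instance currently accepts writes
        self.passwords = {}      # only written on the coordinator

    def change_password(self, user, new_password):
        """Handle a request from a client that connected to *any* instance."""
        if self.coordinator is not self:
            # Delegate on the server side; the client never learns
            # which instance is the special one.
            return self.coordinator.change_password(user, new_password)
        self.passwords[user] = new_password
        # Returned only so the example can show who performed the write.
        return self.name


def elect(daemons, new_coordinator):
    """Stand-in for a master election: every instance learns the result."""
    for daemon in daemons:
        daemon.coordinator = new_coordinator


daemons = [AuthDaemon(f"auth{i}") for i in range(3)]
elect(daemons, daemons[2])

# The client picks any instance; the write still lands on the coordinator.
entry_point = random.choice(daemons)
handled_by = entry_point.change_password("alice", "s3cret")
```

If the servers later pick a different coordinator, calling elect again is
all that changes. The client code stays the same, which is why an
instance id in the handshake would not buy anything here.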