On Tue, Oct 28, 2003 at 08:49:31AM -0000, josef.schwarz_at_bt.com wrote:
> > No you have to let the two servers started, running, in order them to
> > replicate all data....
> > When one is down for a while, everything work corectly
> > because there is
> > still one up that could hang the work...
> > And when the down server get back up, he will syncronize and get all
> > new data automaticaly...
>
> No, but the point is that I want it to be more flexible. The system
> shall not be dependant that the absent server comes back, another
> server should take over his position.
> So that's not possible with Coda, is it?

That is perfectly possible.

One thing is that although Coda clients are built with weak and
intermittent connectivity in mind and can switch IP addresses and such,
the servers really should not arbitrarily switch addresses and should be
'available' as much as possible.

Some of the reasons are implementation issues, i.e. how servers announce
themselves to clients and how they handle conflict resolution. The other
is that clients have only a limited cache space and, because of that,
only need a finite log for operations that were performed during
disconnection. But servers have to deal with possibly hundreds or
thousands of clients that might be gone for several weeks (or get
reinitialized and never return). So the server can't really keep all
that much state around. And because we use a similar operation log (the
resolution log) to resolve conflicts between servers, we would need an
infinite log to be able to deal with long disconnections. These
resolution logs are only truncated when we know that all replicas are in
sync. So having 3 servers, but only 2 available at any given time, won't
help either.

> > - STOP thinking in IP... you are going in such a complicated
> > way for a
> > couple of day , getting aroud modifying conf files / commands !! I
> > really think you are in a wrong way of work...
> > Try to make it work
> > correctly, before modifying things and go into no-ways !
> > ( there is enough to do !!! )
>
> Well, IP addresses rather than hostnames... It really seems that the
> set-up with IP addresses was no good idea. And that's one thing I
> don't understand - how one can design a system which can not handle IP
> addresses at the same time; it should not be that much effort, one
> check if it's an ip address and when it is one, skip the
> gethostbyname()?

The setup with IP addresses works just fine; in fact both clients and
servers internally only know about each other as a single IPv4 address.

However, using IP addresses is a problem because it makes it impossible
to deal with things like multihomed hosts, or to do failover when a
machine fails. Coda actually attempts to deal with these things by
mapping the 'realm' to a group of servers that will be queried for
volume information. But if you don't give it enough information to work
with and tell it the name or IP address of just a single server, then
the client won't be able to handle some failover cases.

> > 'cfs lv /coda/172.16.1.1' and 'cfs lv /coda/172.16.3.1'
> >
> > That mean you have two clusters !!!!!
>
> Sorry, I don't know what you mean. I am not working with clusters. The
> reason that the two clients seem to be in different subnets is that
> the underlying vpn infrastructure requires it.
>
> But this really should not matter, since the appropriate files are
> adapted, and it is working basically.

I'll try to explain....

Before 6.0.0, Coda clients had to be configured to talk to a group of
'root servers' for a specific installation. This group is a proper
subset of all available servers. When the client wants to locate some
volume it can ask any of the root servers; if a server is unreachable it
will simply try the next one until it either succeeds or runs out of
accessible servers.
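
As a rough illustration of that "try the next root server" behaviour
(a sketch only, not the actual client code): the function name
locate_volume, the fake_query stand-in, and the use of ConnectionError
to signal an unreachable server are all assumptions made for this
example.

```python
from typing import Callable, Optional, Sequence


def locate_volume(
    volume: str,
    root_servers: Sequence[str],
    query: Callable[[str, str], list],
) -> Optional[list]:
    """Ask each root server in turn for the location of `volume`.

    Returns the first answer obtained, or None once every server in the
    list has been tried without success.
    """
    for server in root_servers:
        try:
            return query(server, volume)
        except ConnectionError:
            # This server is unreachable: simply move on to the next one.
            continue
    return None


def fake_query(server: str, volume: str) -> list:
    """Hypothetical stand-in for a real volume location request."""
    if server == "172.16.1.1":
        raise ConnectionError(f"{server} is unreachable")
    # The answer names the servers that carry replicas of the volume.
    return ["172.16.1.1", "172.16.3.1"]


if __name__ == "__main__":
    print(locate_volume("some.volume", ["172.16.1.1", "172.16.3.1"], fake_query))
```

With two root servers configured, the lookup still succeeds while the
first one is down. With only one server in the list there is nothing
left to fall back on, and the lookup fails.
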
The volume location information contains information about which
servers really are carrying the various replicas of a volume.

With 6.0 the list of 'rootservers' is pulled out of DNS. The realm name
is used in a DNS query and the results are interpreted as all the hosts
that are able to respond to volume location queries. So you could use
several IN SRV records, or CNAME, or even IN A, so that we can map the
realm name into a set of IP addresses that will be used for volume
location information.

Now if you can't modify DNS data, there is an /etc/hosts-like solution
available in /etc/coda/realms. The /etc/coda/realms file of course has
the same problems as /etc/hosts: different clients might have different
entries, and you don't get a globally consistent view.

So what happens in your situation when you talk to /coda/172.16.1.1 is
that you are telling the Coda client that there is only a single server
usable to answer volume location queries for a realm named
'172.16.1.1'. Then when you access /coda/172.16.3.1, you tell the client
that there is a realm named '172.16.3.1' that also has only a single
responsible volume location server. There is no way for the client to
know that both of these are really the same; in a way you've even
explicitly told the client that they are in fact different by using
different 'realm names'.

So the client dutifully binds to both realms, and if you fetch an object
in one realm it will need to be refetched when you access it in the
other. And when either server goes down you completely lose access to
uncached volumes through the path related to that server.

Coda servers do get a bit confused by this, because they track clients
based on IP address. So the server just sees this one client fetching
the same files several times, which it doesn't mind all that much. But
when something changes it will only send one callback message to the
client, and that is exactly where the problems start.
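
The mismatch can be sketched with a toy model (a simplification under
stated assumptions, not how the client or server is really written).
The Server and Client classes, the client address 10.0.0.5 and the path
/doc are all hypothetical. The only points taken from the description
above are that the client keys its cache on the realm name, while the
server keys its callback list on the client's IP address.

```python
class Server:
    """Toy server that remembers callbacks per client IP address."""

    def __init__(self, files):
        self.files = dict(files)
        self.callbacks = {}  # path -> set of client IP addresses

    def fetch(self, client_ip, path):
        # Record that this client IP should be told when `path` changes.
        self.callbacks.setdefault(path, set()).add(client_ip)
        return self.files[path]

    def store(self, path, data):
        self.files[path] = data
        # One callback message per client IP, however often it fetched.
        return sorted(self.callbacks.pop(path, set()))


class Client:
    """Toy client whose cache is keyed on (realm name, path)."""

    def __init__(self, ip, server):
        self.ip = ip
        self.server = server
        self.cache = {}
        self.fetches = 0

    def read(self, realm, path):
        key = (realm, path)
        if key not in self.cache:
            # A different realm name means a different cache entry,
            # so the same object gets fetched again.
            self.cache[key] = self.server.fetch(self.ip, path)
            self.fetches += 1
        return self.cache[key]


if __name__ == "__main__":
    server = Server({"/doc": "v1"})
    client = Client("10.0.0.5", server)
    client.read("172.16.1.1", "/doc")
    client.read("172.16.3.1", "/doc")
    print(client.fetches, len(client.cache), server.store("/doc", "v2"))
```

In this model the client ends up holding two cached copies of the same
object, while the server has a single callback entry for it and so
produces a single notification when the object changes.
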
Because the client really approaches the server as 2 individual
instances, if you modify something under /coda/172.16.1.1/ the server
should really be sending a callback to the identical (but modified)
object in /coda/172.16.3.1/. If something is changed on the server by
another client, the server should be sending callbacks to both instances
instead of just one, etc.

So both instances on the client will quickly diverge, and then when you
try to write to the realm that hasn't seen any callbacks, the server
will reject the operation because it is performed on stale data, and the
client gives up and declares a conflict.

Jan

Received on 2003-10-28 12:19:57