Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu> Date: Sat, 28 May 2005 11:46:27 -0400

On Fri, May 27, 2005 at 11:00:04AM -0600, Patrick Walsh wrote:
> 	(I'm still waiting for more help on the problem of venus crashing with
> signal 11's, but in the meantime, I have some more general questions).
> 
> 	I'm writing tests to make sure coda handles various disconnected and
> failure states well.  One test simulates this scenario:
> 
> ,_______.         ,_______.       ,_______.        ,________.
> |Clnt 1 | <----> | Svr A | <---> | Svr B | <----> | Clnt 2 |
> `-------'         `-------'       `-------'        `--------'
> 
> (in case that doesn't come through somewhere, I describe it below)
> 
> 	Client 1 can see Server A, Client 2 can see Server B and the two
> servers can see each other.  Given a replicated volume, R, my
> understanding is that, in strongly connected mode, if Client 1 writes a
> file to R then Server A passes that file on to Server B before returning
> from the store call.  Correct?

No, that is not correct. All the smarts are in the clients, the servers
just store objects.

When client 1 writes, it writes to all replicas it can reach, which in
your setup is only server A. The server doesn't do anything special
here, so the data exists only on server A. Server A does send a callback
to all clients that have fetched a copy of the file or directory that
was involved in the update. If your 'filter' is good enough, this would
mean that client 2 doesn't even see the callback, but some client 3
might have received it.
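
As a rough illustration of the write path described above, here is a toy
model in Python. It is only a sketch and not Coda's code: the Server and
Client classes, their method names and the "stale" set are hypothetical,
and a real client does far more (version vectors, for one). It shows that
the writer pushes the update to the servers it can reach and that each of
those servers breaks callbacks for other clients holding a copy.

```python
class Server:
    """Toy replica server: stores objects and remembers who cached them."""

    def __init__(self, name):
        self.name = name
        self.data = {}       # path -> contents
        self.callbacks = {}  # path -> set of clients holding a cached copy

    def fetch(self, client, path):
        # Hand out the object and promise a callback if it changes.
        self.callbacks.setdefault(path, set()).add(client)
        return self.data.get(path)

    def store(self, writer, path, contents):
        # Store the object, then break callbacks held by other clients.
        self.data[path] = contents
        holders = self.callbacks.pop(path, set())
        for client in holders:
            if client is not writer:
                client.callback_broken(self, path)
        self.callbacks[path] = {writer}


class Client:
    """Toy client: all replication logic lives here, not in the server."""

    def __init__(self, name, reachable):
        self.name = name
        self.reachable = reachable  # servers this client can talk to
        self.stale = set()          # paths whose callback was broken

    def fetch(self, path):
        return [server.fetch(self, path) for server in self.reachable]

    def store(self, path, contents):
        # The client itself writes to every replica it can reach.
        for server in self.reachable:
            server.store(self, path, contents)

    def callback_broken(self, server, path):
        self.stale.add(path)


# The scenario from the question, plus a client 3 that sees both servers.
server_a, server_b = Server("A"), Server("B")
client1 = Client("client1", [server_a])
client2 = Client("client2", [server_b])
client3 = Client("client3", [server_a, server_b])

client2.fetch("R/file")
client3.fetch("R/file")
client1.store("R/file", "new contents")
```

After the store, only server A holds the new contents, client 3 has the
path marked stale, and client 2 has noticed nothing.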

The next access by one of the clients that received the callback
collects the attributes from all available replicas; part of the
attributes is the version vector. If server A and server B disagree
about the version, then the client sends a 'resolve' message to the
servers. Resolution then compares all replicas and forces an update to
bring all servers back in sync. This is a fairly heavyweight process:
first all replicas are locked, then the attributes are collected. For
files we know at this point whether there are conflicting updates; if
there are none, all servers are told to fetch the latest version. If
there was a conflict, the object is marked as inconsistent and needs to
be repaired by the user.
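
The version check for files can be sketched as a generic version vector
comparison. This is the textbook technique, not Coda's actual data
structure: the function name compare_vv, the dict representation and the
result strings are assumptions made for illustration.

```python
def compare_vv(a, b):
    """Compare two version vectors given as dicts of server -> counter.

    Returns "equal", "a_newer", "b_newer" or "conflict". Missing entries
    are treated as 0 (an assumption of this sketch).
    """
    servers = set(a) | set(b)
    a_ahead = any(a.get(s, 0) > b.get(s, 0) for s in servers)
    b_ahead = any(b.get(s, 0) > a.get(s, 0) for s in servers)
    if a_ahead and b_ahead:
        return "conflict"  # each replica saw an update the other missed
    if a_ahead:
        return "a_newer"   # a dominates: b can simply fetch a's copy
    if b_ahead:
        return "b_newer"
    return "equal"


# Server A saw one more update than server B: no conflict, B is behind.
print(compare_vv({"A": 2, "B": 1}, {"A": 1, "B": 1}))
# Each server saw an update the other did not: needs repair by the user.
print(compare_vv({"A": 2, "B": 1}, {"A": 1, "B": 2}))
```

When one vector dominates, the stale servers can fetch the latest
version; only the "conflict" case ends up marked inconsistent.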

Now if it was a directory and there were conflicting updates, we collect
logs of recent updates on all replicas, merge them and push out the
merged version. Each server applies the differences, and the final
directory contents are collected and compared. If they are still
different, the directory is marked inconsistent and requires user-guided
repair.
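
The directory case can be pictured with a much simplified log merge. This
is a toy sketch under stated assumptions, not the real resolution code:
the log entry format (op_id, action, name), the integer op ids, the two
actions "create" and "remove", and the function names are all
hypothetical.

```python
def apply_op(entries, action, name):
    """Apply one logged operation to a set of directory entries.

    Returns False if the operation cannot be applied cleanly.
    """
    if action == "create":
        if name in entries:
            return False
        entries.add(name)
        return True
    if action == "remove":
        if name not in entries:
            return False
        entries.remove(name)
        return True
    return False  # unknown action


def resolve_directory(replicas):
    """replicas: dict of server -> {"entries": set, "log": list of ops}."""
    # Collect and merge the logs from all replicas, keyed by operation id.
    merged = {}
    for replica in replicas.values():
        for op_id, action, name in replica["log"]:
            merged[op_id] = (action, name)

    # Each replica applies only the operations it has not seen yet.
    clean = True
    for replica in replicas.values():
        seen = {op_id for op_id, _, _ in replica["log"]}
        for op_id in sorted(merged):
            if op_id in seen:
                continue
            action, name = merged[op_id]
            clean = apply_op(replica["entries"], action, name) and clean
            replica["log"].append((op_id, action, name))

    # Collect and compare the final directory contents.
    contents = [frozenset(r["entries"]) for r in replicas.values()]
    same = all(c == contents[0] for c in contents)
    return "resolved" if clean and same else "inconsistent"


# Different names created on each server: the logs merge cleanly.
mergeable = {
    "A": {"entries": {"x", "a"}, "log": [(1, "create", "a")]},
    "B": {"entries": {"x", "b"}, "log": [(2, "create", "b")]},
}
print(resolve_directory(mergeable))

# The same name created independently on both servers: user repair needed.
clashing = {
    "A": {"entries": {"f"}, "log": [(1, "create", "f")]},
    "B": {"entries": {"f"}, "log": [(2, "create", "f")]},
}
print(resolve_directory(clashing))
```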

All in all, there are about 6 phases, and between 3 and 4 RPC calls
between the servers for every single object that is found to have a
version vector difference.

> 	If that's the case, then Client 2 should be able to see the new file
> fairly quickly, right?  Alas, in my tests, it takes between 5 and 10
> minutes before Client 2 can see the change, regardless of how many times
> I try to flush the cache, checkservers or checkvolumes.

The fact that client 2 sees the update at all means that there must be
some client 3 that happens to see both servers, which gets the callback.
Then during the next cache validation/hoard walk, which occurs about
once every 10 minutes, that client fetches the attributes from both
servers, notices the version skew and triggers the resolution process.

Jan

Re: replication