On Thu, Mar 08, 2007 at 07:33:22PM +0100, Enrico Weigelt wrote:
> hmm, right :(
>
> But we could transmit just the differences / changed fields (since last
> transmit). We add a bitmask field to the stat block which tells us which
> fields are actually filled. Empty fields are simply left out.
>
> The new stat request also contains a bitmask which tells the server which
> fields we're actually interested in. With some additional heuristics the
> server could send some more fields than the client requested.

I guess we strayed a bit from the original problem. The slowness was caused by high latency, not necessarily a lack of bandwidth. The only way to overcome the latency is to send more data than the client asks for, such as piggybacking all attributes along with a request for directory data. The drawbacks are that we may be sending information the client already has (wasting bandwidth), or, if we assume limited cache space, we may be sending information that is less useful than the data the client has to discard to make room for it.

Reducing the amount we send or store does not matter much, since the idea was to send more in order to hide the latency. So we would ideally want the client to store an (almost) infinite amount of information so that it never has to discard already cached data. But if we can cache everything without having to discard, we would either be resending data that is already cached, or, if it wasn't cached, it must have changed.

At this point we get to what Intermezzo tried to do: instead of sending callbacks to invalidate cached objects, forcing the client to refetch the ones it is interested in, the server would send a log of all recent changes. This approach works, but probably doesn't scale the same way; at some point the total amount of change in the system is limited by the bandwidth available to propagate those changes to all clients.

Coda's model assumes that we are not that interested in everything that happens on the server. So we are notified in case something may have become invalid (callback) and fetch only those updates we know we are interested in, based on the user's hoard profile or as a result of application requests. In a way everything is a tradeoff. And because application requests will always be serialized, we get penalized on high latency connections.

A hoard profile could potentially avoid the latency cost since it tells the client the complete set of files we are interested in. We're not really exploiting that information though, partly because hoarding is an asynchronous background process: nobody is really waiting for it to complete (most of the time), and by using only a single hoard walking thread we interfere less with any foreground (user) activity. The user may accept 50% of his bandwidth being used by the background hoard fetches, but would probably not appreciate it if his web browser becomes unusable every 10 minutes.
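Just to make the piggybacking idea concrete: on a link with a 100ms round trip time, getting the attributes of 200 directory entries one getattr at a time costs about 20 seconds, while a single reply that carries all 200 attribute blocks costs one round trip plus the extra transfer time. A rough sketch of what such a reply could look like (the names and layout here are made up for illustration, not what Coda/RPC2 actually sends):

    #include <stdint.h>

    /* Sketch: a directory-fetch reply that piggybacks the attributes and
     * version of every entry it lists, so the client pays one round trip
     * instead of one getattr RPC per entry. */
    struct piggy_attrs {
        uint32_t vnode;        /* object identifier within the volume */
        uint32_t uniquifier;
        uint32_t version;      /* lets the client drop entries it already caches */
        uint32_t mode;
        uint32_t owner;
        uint64_t length;
        uint64_t mtime;
    };

    struct piggy_dirent {
        char               name[256];
        struct piggy_attrs attrs;
    };

    struct dir_fetch_reply {
        uint32_t            nentries;  /* number of entries that follow */
        struct piggy_dirent entries[]; /* packed after the count */
    };

The per-entry version is what would let the client throw away blocks it already has, which is exactly the wasted-bandwidth drawback mentioned above.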
> > Also Coda clients use the getattr result to detect conflicting versions,
> > while file and directory contents are only fetched from a single server.
> > So if we piggyback file attributes with the directory data the client
> > would not see differences between replicas until we try to revalidate
> > the cached attributes, or fetch the data from another replica and notice
> > the version mismatch.
>
> Well, we could do this with a separate notification. For example, the
> client gives a list of files it has in cache, and the server then sends
> change notifications automatically. In other words: the client subscribes
> to certain filesystem objects.

That is already done in the form of callbacks. When we first fetch an object, the server remembers this and will send a callback notification whenever it changes. If the client got disconnected, it will first send the local object identifiers and versions with ValidateAttrs to check whether any of them changed during the disconnection, and reestablish callbacks for the unchanged objects.

What I was trying to describe is the replicated server case. When one server knows more (or less) than another and we only get information from that one server, we never get to see that there is a difference. Also, only the server we talked to will inform us about updates. So if some other client has a poor network connection and only updates the other server, we never get told about it. That is, until we disconnect/reconnect and start checking all of our cached objects with ValidateAttrs.

> BTW: TCP would be good for people sitting behind a firewall,
> stream encapsulation (i.e. ssh/ssl), etc.

Firewalls are often there for a reason. I can see a case for TCP as a means to offload data transfers to the kernel, and because people building network routers try to avoid breaking TCP connections. But for RPC operations, which are always request-response, UDP really isn't that bad; DNS uses it all the time. Our only difference is that we try to associate state with the host/port of an incoming UDP packet, but masquerading may change those. There are other solutions: a per-client random key/identifier, a polling 'has anything changed' query, or leases. None of those are insignificant changes.

We also don't need to tunnel Coda traffic through an ssh or ssl tunnel; the existing encryption code should do a pretty fine job already. RPC2's encryption follows the various IPsec RFCs as closely as possible. Some differences are that we operate at the UDP level instead of IP, and our session keys are not between hosts but between RPC2 endpoints. One significant difference is that we use the modified Andrew RPC handshake for key establishment instead of punting that to either a separate daemon or a static key. This handshake is, as far as I know, assumed to be secure; the original Andrew RPC protocol was analyzed by Burrows, Abadi and Needham, who identified a weakness. They suggest an alternate protocol that provides stronger guarantees, which is the 'modified Andrew RPC' protocol that we use.

A Logic of Authentication (1990)
http://citeseer.ist.psu.edu/burrows90logic.html
The Andrew RPC protocol analysis starts at page 26 (28 of the pdf)

Jan