Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu> Date: Wed, 4 Sep 2002 10:58:49 -0400

On Wed, Sep 04, 2002 at 02:38:10PM +0200, Stephan Kanthak wrote:
> After having a more or less detailed look into distributed file systems after
> reading a lot of manuals I am wondering a little bit about one single point
> in the current implementations (includes CODA):
> 
> Why is there the need of dedicated file servers? I think it would be easier to
> use and handle if there are only two types of machines:
> 
> - one central server that holds the complete list of files (not the files
>   itself) and a list of all connected clients
> - a lot of clients with cache(s) that hold the files' data

Disconnected operation

When a client goes offline, other clients would not be able to get
the data. It is also easier for system administrators to back up all data
from a 'single known location' compared to forcing everyone to bring
their laptops in once in a while. And I would hate to be connected over
a dialup line around the time backups would kick in :)

Cache replacement

When a client's cache fills up, we can make room without having to check
with the servers whether we happen to be the last client that has a
copy of the file.
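
As a rough illustration of the difference (a minimal sketch, not Coda's
actual cache manager; the class and function names are made up, and it
assumes every cached entry is a clean replica of data on a server):

```python
from collections import OrderedDict


class ServerBackedCache:
    """LRU cache whose entries are all replicas of data held on a file server."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # path -> bytes, least recently used first

    def put(self, path, data):
        """Store an entry and return the list of paths evicted to make room."""
        self.entries[path] = data
        self.entries.move_to_end(path)
        evicted = []
        while len(self.entries) > self.capacity:
            # Safe to drop without asking anyone: the server still holds
            # the authoritative copy (assumption: the entry is not dirty).
            old_path, _ = self.entries.popitem(last=False)
            evicted.append(old_path)
        return evicted


def can_evict_peer_only(path, holders, me):
    """Hypothetical server-less design: eviction needs a global check,
    because dropping the last remaining copy would lose the file.
    `holders` maps path -> set of client names that cache it."""
    others = holders.get(path, set()) - {me}
    return len(others) > 0
```

In the first case eviction is a purely local decision; in the second,
every eviction depends on up-to-date knowledge of who else holds the file.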

Security

Clients are considered to be untrusted; any permissions are ultimately
checked on the central server. If your client does not have the required
'token' that gives it permission to read or modify a file, it will not be
able to even touch that data.

With peer-to-peer cache sharing, how does one client know whether
another completely unknown client is really allowed to get that file?
Either it has to have access to the auth2 password so that it can check
the presented token, which would conflict with clients being untrusted.
Or it has to forward the token for verification to the central server,
which is a performance killer. Or it could only serve files that are
encrypted, with the keys obtainable only from the central server by
the client that wants to see the file contents.
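
To make the first option concrete, here is a minimal sketch using an HMAC
as a stand-in for the token check. This is not Coda's actual token format
or auth2 protocol; the function names and the claim layout are assumptions
for illustration only.

```python
import hashlib
import hmac


def issue_token(server_secret, user, path, right):
    """Central server: sign a (user, path, right) claim with its secret."""
    claim = f"{user}|{path}|{right}".encode()
    return hmac.new(server_secret, claim, hashlib.sha256).hexdigest()


def verify_token(secret, user, path, right, token):
    """Check a presented token. Whoever holds `secret` can verify tokens,
    but can also forge them, since verification recomputes the signature."""
    expected = issue_token(secret, user, path, right)
    return hmac.compare_digest(expected, token)
```

A peer that can call verify_token() locally must hold the server secret,
and so could just as well call issue_token() for any user and file. That
is why local verification conflicts with treating clients as untrusted.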

> Performance should be not worse than NFS in that case, even better
> because e.g. under linux the file system cache in memory will only hold
> directory inodes.

Actually it caches all inodes in the icache, directory lookups (both
successful and unsuccessful) in the dcache and file data in the
pagecache. So it doesn't really matter where the data is stored; even
when it is in a flat file it will be cached.

A more scalable solution would probably be a backend database, either
mysql, libdb3, or (if updates are infrequent) cdb.

> That system could work quite well even on server disconnect, because
> the local filesystem still works. And you don't need any administration
> of file servers or dedicated file servers. The server can also keep track
> of preconfigured minimum and maximum replication counts in order to

Ehh, there still is a central server for the file location lookups, so
there is some administration involved. Also, when there are multiple
servers, which ones are responsible for storing the location information
of an object? Do they also store data to validate the object (md5/sha
checksum, decryption keys)? What if the location server is offline and
an update occurs? Is this then stored at some other available server, so
that when the original server comes back we have 2 conflicting
'locations' for an object?
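
A small sketch of how that conflict could show up, assuming a hypothetical
design in which each location record carries a version counter and the set
of clients holding the object (none of this is an existing Coda structure):

```python
def reconcile(original, fallback):
    """Compare two location records for the same object.

    Each record is a dict with 'version' (int) and 'holders' (frozenset
    of client names). Returns the record to keep, or None when the two
    servers accepted different updates at the same version, which cannot
    be resolved without more information.
    """
    if original["version"] > fallback["version"]:
        return original
    if fallback["version"] > original["version"]:
        return fallback
    if original["holders"] == fallback["holders"]:
        return original
    return None  # same version, different contents: a conflict


# The original server last saw version 3, went offline, and on recovery
# replays a queued update as version 4. Meanwhile the fallback server
# recorded a different update, also as version 4.
original = {"version": 4, "holders": frozenset({"laptop"})}
fallback = {"version": 4, "holders": frozenset({"desktop"})}
conflict = reconcile(original, fallback) is None
```

A plain version counter is only enough to detect the conflict here;
deciding which 'location' is right still needs some other rule.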

And when a client updates a file, does the server force-feed the update
to other clients, even when they are not interested, just to maintain the
replication counts? I wouldn't want that when I'm connected over dialup
either.

I do think it would be a very interesting project, and quite feasible.
Intermezzo actually has some of these ideas, and others have come up in
Coda meetings, although we were mostly discussing clients 'stealing' from
other client caches when that gives them some performance benefit. The
authoritative copy would still be on the file servers.

> If I missed any implementation of a distributed file system that has exactly
> the features mentioned above, please tell me. I already tried CODA and

I guess you looked at typical P2P systems like FreeNet and Gnutella, etc.?
Then there are also Unison and Intermezzo.

Jan

Re: coda client/server concept