Coda File System

Re: Cache Overflow

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Thu, 12 Aug 1999 14:42:50 -0400
On Thu, Aug 12, 1999 at 01:42:40PM -0400, hagopiar_at_vuser.vu.union.edu wrote:
> The trouble is that it's a design decision that rules Coda out as a
> viable network filesystem for a (potentially large) number of cases. As
> it stands, every client must have a cache directory as large as the
> largest file it will use. This defeats what I see as a primary benefit
> of networked storage, which is the centralization of storage space.
> 
> I imagine everyone here has had occasion to work with files of a few
> hundred megabytes; with Coda you can't work with them remotely unless
> you have the local disk space. I shudder to think what would happen if
> you dialed up via PPP and tried to tail a 650MB log file.

:) Funny, you wouldn't see anything until the logfile is rotated on the
machine that is doing the logging; Coda keeps the file local until the
last writer closes it. I work a lot with large logs, but when I want
them on a centralized machine I either move them there once a day as
part of the standard logfile rotation, or I use syslog's remote logging
functionality.
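
As an aside, the remote logging setup is a one-liner in /etc/syslog.conf
on the machine generating the logs ("loghost" here is just a stand-in
for whatever central machine collects them):

    # forward everything to the central log host
    *.*			@loghost

The syslogd on loghost then has to be started with remote reception
enabled (the -r flag for Linux's sysklogd; check your syslogd manpage
for the equivalent).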

> I guess I'm just surprised this is the case, given the derivation from
> AFS, and considering that no other network file system I know of has
> similar requirements (of course, none has the other features of Coda).

AFS2 does the same thing. I believe later versions added block-level
caching, but with AFS you won't have read/write access when the network
is gone, and, I believe, there are no replicated servers.

> I don't mean to complain - you designed and built it based on your
> requirements, not mine - but do understand that this limitation eliminates
> coda as a possibility for us.

Coda was designed almost 12 years ago. If it were redesigned from
scratch now, it would probably be done differently. As for the
limitation, how much it hurts depends entirely on how you use your very
large files.

One use would be to read them once, filter out the important stuff
(for instance, generating statistics from web-access logs), and never
look at the thing again. No problem, but there I would advise using NFS.

But if you are re-processing the file more than once, Coda only has to
fetch it once. And when running in write-disconnected (or logging) mode,
the same observation holds when modifying the file: Coda optimizes away
duplicate store records in the log, and writes the data back only when
it has sufficiently aged.
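
To make that store optimization concrete, here is a minimal sketch of
the idea. This is not Coda's actual code, and the names (cml_record,
cml_optimize) are made up: when several STORE records for the same file
sit in the modification log, only the newest one per file needs to be
written back to the server.

    /* Sketch of client-modification-log store optimization.
     * NOT Coda's real data structures; names invented for
     * illustration. */
    #include <stdio.h>

    enum cml_op { CML_STORE, CML_CREATE, CML_REMOVE };

    struct cml_record {
        enum cml_op op;
        unsigned    fid;       /* file identifier */
        int         cancelled; /* set when the optimizer drops it */
    };

    /* A later STORE to the same file supersedes every earlier one,
     * so only the last STORE per fid survives reintegration. */
    static void cml_optimize(struct cml_record *log, int n)
    {
        int i, j;
        for (i = n - 1; i >= 0; i--) {
            if (log[i].op != CML_STORE || log[i].cancelled)
                continue;
            for (j = i - 1; j >= 0; j--)
                if (log[j].op == CML_STORE && log[j].fid == log[i].fid)
                    log[j].cancelled = 1;
        }
    }

    int main(void)
    {
        struct cml_record log[] = {
            { CML_STORE, 7, 0 },   /* superseded by the later store */
            { CML_STORE, 9, 0 },   /* kept: last store to fid 9 */
            { CML_STORE, 7, 0 },   /* superseded by the later store */
            { CML_STORE, 7, 0 },   /* kept: last store to fid 7 */
        };
        int i, n = sizeof(log) / sizeof(log[0]);

        cml_optimize(log, n);
        for (i = 0; i < n; i++)
            printf("record %d: fid %u %s\n", i, log[i].fid,
                   log[i].cancelled ? "(cancelled)" : "(kept)");
        return 0;
    }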

So you save the time to transfer the file _every_ time you read or
write it while it is cached, and that is a very big deal on a PPP
connection. As networks become faster, clients will have less use for
caching the data, as it takes only a fraction of a second to fetch the
whole big thing again. But... the server-side savings become more
obvious: a server can support more clients when they are not fetching
the same files over and over again.
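
To put a rough number on the PPP case (assuming a 33.6 kbps modem
actually running at its nominal rate):

    650 MB = 650 * 2^20 bytes * 8   ~= 5.5 * 10^9 bits
    5.5 * 10^9 bits / 33600 bits/s  ~= 162,000 s ~= 45 hours

so a cache hit on a file that size saves almost two days of transfer.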

> However... :-) would it be possible for Coda to cache/lock parts of the
> file locally? Then, when the cache fills, it could attempt to purge
> parts of the file back to the server?
> 								-Rob H.

It is probably easier to modify the kernel to swap the page and buffer
caches to disk, and to avoid shrinking them except when the system is
running low on swap space. That is far less code to modify, and you
don't need to remove useful features from Coda.

Jan
Received on 1999-08-12 14:43:37