Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu> Date: Mon, 4 Jun 2001 14:54:02 -0400

On Fri, Jun 01, 2001 at 12:46:59PM +0200, Peter Sch?ller wrote:
> There are two things that are bothering me however:
> 
> (1) RVM size.
> 
> Suppose we want 100 GB of data on a server. Using the 4% approximation, 
> that's more than 4 gig of RVM. This poses a problem because RVM is mapped to 
> virtual memory (I'm talking 32 bit architectures now). How is this problem 
> usually solved?

There are several approaches. First, however, note that the 4% rule only
applies when there is a 'typical filesize distribution'. Measurements on
Unix workstations have shown an average file size between 4 and 16KB. So
the 4% rule for RVM size would allow the server to store 100GB / 16KB ~=
6553600 files.

It's the limit on the number of files that is the problem. When your
average file is 10MB, there would only be about 10000 files, for which we
really only need 6.2MB of RVM space.
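
To make the arithmetic above easy to repeat for other data sizes, here is
a small sketch. It assumes that RVM use scales with the number of files,
and it derives a per-file cost from the 4% rule at a 16KB average file
size (4% of 16KB, about 655 bytes). That constant and the function names
are illustrative assumptions, not values taken from the Coda sources.

```python
# Rough RVM sizing from the figures in this message.
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

RVM_FRACTION = 0.04          # the "4% rule"
TYPICAL_FILE_SIZE = 16 * KB  # upper end of the 4-16KB average quoted above


def rvm_bytes_per_file():
    """RVM cost per file implied by the 4% rule (an assumed derivation)."""
    return RVM_FRACTION * TYPICAL_FILE_SIZE


def estimate_rvm_bytes(data_bytes, avg_file_bytes):
    """Return (file count, estimated RVM bytes) for a given data size."""
    n_files = data_bytes // avg_file_bytes
    return n_files, n_files * rvm_bytes_per_file()


# Typical distribution: 100GB of 16KB files.
files, rvm = estimate_rvm_bytes(100 * GB, 16 * KB)
print(files, "files,", round(rvm / GB, 2), "GB of RVM")

# Large files: 100GB of 10MB files.
files, rvm = estimate_rvm_bytes(100 * GB, 10 * MB)
print(files, "files,", round(rvm / MB, 2), "MB of RVM")
```

With the round figure of 10000 files used above, the same per-file cost
gives 10000 * 655.36 bytes, or roughly 6.25MB, which is where the 6.2MB
estimate comes from.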

If your server really needs to store several million files, the
current workaround is kind of a hack.

You can effectively run 2 or more coda-servers on the same hardware.
Each server exports part of the filespace so that each server requires
less RVM. The only problem is that clients want to connect to a fixed
port on a server-ip address. The machine has to have several ip-aliases
configured for the same interface, and the server has to be told to bind
to a specific ip-address.
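
The reason ip-aliases help is generic socket behaviour: two processes can
use the same port number as long as each binds to a different local
address rather than to the wildcard address. The sketch below only
illustrates that mechanism with plain UDP sockets. It is not Coda server
code, and the function name, addresses and port in the usage note are
hypothetical.

```python
import socket


def bind_per_address(addresses, port):
    """Open one UDP socket per local address, all on the same port.

    Binding to a specific address (instead of 0.0.0.0) is what allows
    several server processes to share one port number on one machine.
    """
    sockets = []
    try:
        for addr in addresses:
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            sockets.append(s)      # track first so it is closed on failure
            s.bind((addr, port))
    except OSError:
        for s in sockets:
            s.close()
        raise
    return sockets
```

For example, bind_per_address(["192.0.2.10", "192.0.2.11"], 5000) would
succeed only if both addresses are configured as aliases on the machine.
In a real setup each server process would bind just one of them.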

There are 2 other approaches that haven't been implemented, but are
being considered. The first is per-volume RVM segments that can be mapped
and unmapped independently. Basically the Coda server would start to do
some page-in/page-out style management with RVM segments, similar to how
the kernel handles multiple processes.

The other approach would 'demote' RVM to be an intermediate cache only
and the actual metadata would reside on disk along with the container
files.

> Also, do we actually need that much physical RAM to achieve high
> performance, or is RVM expected to be mostly accessed on disk? I
> realize all the data is mapped, but I mean in terms of caching the
> data in RAM.

No, the OS should manage the physical memory usage by swapping in active
pages and swapping out inactive pages of the RVM mapping.

> (2) How is user authentication handled in a server environment? That is, 
> assuming the Coda clients are servers with long-running processes. The 
> daemons need to be able to access the Coda file system. Given that 
> authentication tickets expire, how is this done?

It's easy to provide read-only access without tokens, simply give
"System:AnyUser rl" rights to the directories that the daemons need to
get data from.

For write access it is a bit more difficult, as in reality you want the
mail delivery agent, web server, etc. to run as a specific authenticated
process. However, the fork/exec and setsid/setgid/setuid tricks that are
used to drop root privileges confuse Coda's process identity tracking.
The solution that was introduced by AFS, the process authentication
group, wasn't accepted by Linus to become part of the kernel.

The token expiration part is trivially solved by creating a cron job
that runs "echo password | clog -pipe codausername".
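
If the daemon is already driven from a script, the same one-liner can be
wrapped so that failures are noticed. This is a minimal sketch: the
function name and the "run" parameter are made up for illustration, and
it assumes clog exits with a non-zero status when authentication fails.

```python
import subprocess


def renew_token(username, password, clog="clog", run=subprocess.run):
    """Pipe the password to `clog -pipe <username>`, like the cron job."""
    result = run(
        [clog, "-pipe", username],
        input=password + "\n",   # echo appends a newline as well
        text=True,
        capture_output=True,
    )
    # Assumption: a zero exit status means a token was obtained.
    return result.returncode == 0
```

The "run" argument exists only so the wrapper can be exercised without a
Coda client installed. In normal use it is left at its default.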

> And also, is there a PAM module available? If not, how do you solve the 
> problem inherent to having user home directories on Coda (i.e. the user 
> cannot immediately access his/her home directory after logging in)? Having 
> the user first log in, having access to nothing, and then log into coda 
> (second step) is somewhat "ugly" in my opinion.

There are several PAM modules. But since Coda currently still uses
weak-encoding instead of true encryption it isn't recommended to use the
Coda password for anything but Coda authentication.

Here is one from Borbely Zoltan (or is that Zoltan Borbely?), which was
extended by Adrian Pavlykevych, who visited our group last summer.

    ftp://ftp.coda.cs.cmu.edu:/pub/coda/src/pam_coda.tar.gz

> I'm looking to set up an LDAP + Coda style system with everything centralized 
> and dumb "pluggable" servers. With OpenAFS I can use kerberos for 
> authentication, for which there are PAM modules. Is there a similar solution 
> for Coda?

It is possible to authenticate using Kerberos, and then use the
authenticator to obtain a Coda token. I'm not sure how to completely
automate that.

We've tried to merge LDAP and Coda, but they are not a good match.
OpenLDAP relies on pthreads, which don't work well when combined with
Coda's userspace threads. And LDAP doesn't really have all the
replicated qualities it is supposed to have when the connections are
reused for a long time. That is, it works fine if you set up a new TCP
connection for every LDAP query, but when something fails while
processing multiple queries from several threads the recovery code
becomes very ugly.

Jan

Re: RVM size + authentication