
Re: General Coda Questions

From: Jan Harkes <jaharkes@cs.cmu.edu>
Date: Fri, 11 Feb 2005 00:28:46 -0500
On Wed, Feb 09, 2005 at 12:55:20PM -0800, redirecting decoy wrote:
> 1) /vice and /vicepX are where the coda server stores its important
> files, correct?  If so, then the total amount of space available for
> the coda filesystem would rely on the total amount of space available
> on the device /vice and /vicepX are stored on.  Would there be any
> drawbacks (performance-wise or other) to putting /vice and /vicepX
> inside of a single loopback file (using ext2/3, etc) of an arbitrary
> size?

    /vicepa   contents of all files.

    RVM data  information about all volumes on this server,
              contents of all directories and symlinks,
              metadata (attributes) of all files, directories and symlinks.

    /vice/db  all data that is shared among servers:
              user/group databases, Coda passwords,
              location and replication information for all volumes,
              the list of all servers and which one is the SCM.

The amount of data in /vice/db is fairly small and will not change
unless new volumes are created, or users/groups are added.

It shouldn't matter much if /vice and /vicepa are on the same device. It
does matter slightly whether the RVM log, RVM data and /vicepa are,
since any filesystem update will access all three, and if they are on
the same disk it has to seek back and forth when writing the updates to
disk. I haven't actually measured whether the extra seek overhead is
noticeable, though.
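
If you want to be careful, a layout along these lines keeps the three
write-hot areas on separate spindles (device names are purely
illustrative):

    /dev/sda    OS and /vice (including /vice/db, rarely written)
    /dev/sdb1   RVM log   (small partition)
    /dev/sdb2   RVM data
    /dev/sdc1   /vicepa   (file contents)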

> 2) To my understanding, the RVM data file needs to be at least 4% of
> the intended total storage in /vicepX. So for, say, 100GB of desired
> storage, I would need a 4GB RVM data file.  Is this correct? If so,
> does coda support RVM data files of this size?

No, that used to be a useful ballpark figure, but it wasn't very
realistic in most cases. Actual RVM usage depends mostly on directory
sizes and the number of files, and has nothing to do with file sizes.
There is a tool called 'rvmsizer', which should be installed along with
the Coda server; point it at a tree of data and it will report the
amount of RVM that would be required to store that file tree. Then
simply extrapolate that number to what you expect to need in the long
term.
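
For example (invocation from memory, run rvmsizer without arguments to
check the exact usage; the sample path is illustrative):

    # measure a tree that resembles the data you plan to store
    rvmsizer /var/www

    # if 10GB of sample data needs, say, 50MB of RVM, budget roughly
    # 10 * 50MB = 500MB of RVM data for 100GB of similar data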

> 3) Large files:  I was recently attempting to copy a 1.5GB file into
> coda.  It got to about 85%, then it died horribly.  Not sure what went
> wrong, but I think it had something to do with my venus cache and
> lack of local hard drive space on the client.

To copy a 1.5GB file into Coda you need a venus cache larger than
1.5GB. And since the defaults assume an average 16KB file size, simply
setting the venus cache size to something like 2GB would try to give
you a monster client that probably won't even be able to start. So you
also need to set the cachefiles= option in venus.conf to a more
realistic value (5000 or 10000).
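
Concretely, a venus.conf along these lines should do it ('cacheblocks'
is the option name as I remember it; the values are illustrative):

    cacheblocks=2097152   # 2GB cache, counted in 1KB blocks
    cachefiles=10000      # instead of the ~130000 the 16KB average implies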

The other caveat is that we want to allow people to reopen a file that
has not yet been written back to the server. So during disconnected or
weakly connected operation we actually make a temporary shadow copy of
the file after the close and before we send it on to the servers, which
means you really need twice the amount of diskspace compared to what
you have configured as local venus cache space (for the 1.5GB file
above, roughly 3GB free). I guess with such large files we should
really consider creating the shadow copy even during connected
operation, since we now block the processes that try to open the file
for a very long time.

The way Coda works is that we only get to see 'open' and 'close' and
none of the read/write/mmap stuff that comes in the middle. So you
really do need to have enough diskspace to store the whole thing in the
local cache.
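
One visible consequence (path illustrative):

    # even reading a single byte blocks until the whole file has been
    # fetched into the local venus cache
    dd if=/coda/example.com/big.iso of=/dev/null bs=1 count=1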

> a) Can coda work with really large files, 1GB and up, without major problems?

Don't really know; there is definitely a 2GB ceiling, since we only use
a 32-bit integer to represent file sizes in our RPC2 protocol.

> b) If I have a client machine with 50 megs set up for venus cache
> and a total of 200 megs of free space on the drive, and I attempt to
> access a large file of 1GB or more, will venus try to copy this file
> locally (stick it in cache), or can it simply read from the server as
> needed (like nfs/samba)?  Something like maintaining an active
> connection to the server.

No. We only see the 'open', then we (try to) fetch the whole file into
the local cache and pass the open file handle to the kernel; from that
point on it is hands-off until the kernel tells us that the user is
done and has closed the file, at which point venus may decide to
discard the file if it needs the space. If your client is set up with a
50MB cache it will simply refuse to even start fetching the 1GB file.
Interestingly enough, because of the coarse granularity you can
_create_ such a file if there is enough diskspace: the open/create only
creates a 0-byte file, and venus doesn't realize the file is huge until
we see the close upcall, at which point it tries to dump it to the
servers as quickly as possible and refuses to ever refetch it.
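
In other words, with a 50MB cache (paths illustrative):

    cat /coda/example.com/big.iso > /dev/null   # refused up front, 1GB > cache
    cp /tmp/big.iso /coda/example.com/tmp/      # may succeed: the create is
                                                # 0 bytes, the real size is only
                                                # seen at close, and the file is
                                                # then written straight back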

> 4) Given hardware availability, what is the maximum amount of storage
> space coda can handle?  Is there a difference if I had 100GB of small
> files instead of 100 1GB files, as far as RVM data or anything else is
> concerned?

Unknown. Based on that 4% rule it was assumed that we would max out
addressable memory for RVM on a 32-bit machine (2-3GB) when we hit
around 50GB. But one of our servers already stores more than 35GB while
only using about 165MB of RVM. Going from there I would put the maximum
capacity for that particular file set around 2.8TB, but we would
probably hit a lot of other problems before we get there.

Really it is the number of files which is the problem, which is where
rvmsizer is useful.

> 5) If I were running httpd services, can I rely on coda to act as my
> shared /www directory across all my (http) servers?  Given that there
> are minimal write operations, and the web services are mostly read
> only, I imagine some sort of write-caching system can be
> implemented, where 1 server writes to the /www dir when necessary (to
> prevent conflicts), but my concern is with the connection state of coda.
> I often find that my /coda volumes become disconnected, often for no
> obvious reason.  It is easy enough to get back online; the most drastic
> case involves stopping and re-initing venus, stopping venus again, then
> restarting normally.  If this is normal behavior, how can I expect coda
> to reliably serve out web pages?

I don't know why you have connection issues, especially now that the
timeouts have been relaxed to about 60 seconds. But if the caches on the
clients are large enough you could look into using 'hoard'. That is a
way of telling the client which files are important and it will
periodically (about once every 10 minutes) make sure that those files
are still cached and up to date.
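
A minimal sketch, assuming the standard hoard command syntax (the path
and priority are illustrative; check the hoard documentation for the
exact attribute flags):

    # cache the web tree with high priority, descendants included
    hoard add /coda/example.com/www 100:d+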

> 6) Server replication:  Using my test numbers, let's say I want 100GB
> total of coda storage space. If I want server replication, I assume
> that I would need 100GB + (total rvm data/log size) of space available
> on each of my coda servers.  First of all, is this correct, or does
> coda do some sort of weird parity deal like raid-5 does?  Also, I
> noticed that if I unplug one or more (out of 3) of my coda servers, my
> clients have trouble connecting; most of them eventually do reconnect
> to the existing server, but not without a long (sometimes several
> minute) delay.  Is this expected behavior?  Can I change this?

Nope, there is no parity scheme; each replica is a full copy, so for
100GB of triply replicated data each of the three servers needs the
full 100GB (plus RVM log/data). You could possibly lose every server
but one and still recover by bringing up empty servers and recreating
empty volume replicas; the clients will detect the differences and
trigger resolution, which repopulates the empty servers.
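
The recovery would look roughly like this (a sketch, not a tested
procedure; the path is illustrative):

    # after bringing up a freshly initialized server and recreating the
    # empty volume replicas on it, walk the tree from a client to
    # trigger resolution, which repopulates the new server
    ls -lR /coda/example.com > /dev/null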

The several-minute delay is there precisely because of complaints about
unexpected disconnections. I think there is something fishy in your
configuration, because if you unplug 1 server there should be a single
60-second delay until the client decides that that server is gone and
continues with the remaining servers. If the delay is longer than that,
connections to other servers are possibly timing out as well.

> 7) Volumes:  I'm guessing all the volumes/volume data are stored in
> /vicepX? If not, where is it? If so, then is there a way to
> distinguish what data belongs to what volume?  I only ask so I can
> explore backup options other than what is provided with coda.

It is all in RVM; there really isn't an easy way to figure out which
files in /vicepa belong to which volume.

> a) What is the maximum number of volumes I am allowed to have?

I think that was about 1024 per server.

> b) How much data can I store in each volume?  Is there a limit?

I don't think there is a hard limit, except that at some point the
vnode lookup table becomes unmanageable, probably somewhere between
100K and 200K files. I have some patches that change the lookup table
into a hash table, which allows for something like a million files in a
single volume, but they at least break backups/volume cloning and
possibly other things, and they change the layout of RVM enough that it
would require a complete server rebuild. I.e., a bit too experimental
at the moment to be merged into the main tree.

> 8) Assuming all of my actual coda data is stored in /vicepX;
> a) If I lost my RVM data/log files somehow (damn gremlins), can I
> rebuild them without losing my data?

No, the really important stuff is in RVM. But if you have replicated
servers you could lose everything on one server and rebuild without
losing data (as long as the replicas were consistent).

> b) Can I copy the /vicepX directory to another machine, build
> another completely different coda cell, and retain all the data from
> the original cell?

Yes, but only if you dump and restore the RVM data along with the
/vicepX contents.

On the old server:
    norton-reinit -rvm <log> <data> <datasize> -dump rvmcontents

On the new server:
    create an empty RVM log and data (vice-setup-rvm), then
    norton-reinit -rvm <log> <data> <datasize> -load rvmcontents
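
For example, with made-up devices and sizes (take the real log/data
values from your server.conf):

    norton-reinit -rvm /dev/sdb1 /dev/sdb2 209715200 -dump rvmcontents
    norton-reinit -rvm /dev/sdc1 /dev/sdc2 209715200 -load rvmcontents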

Jan