Coda File System

Re: problem reading files from coda when disconnected

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Tue, 1 May 2007 14:41:14 -0400
On Tue, May 01, 2007 at 12:50:29PM -0400, shivers_at_ccs.neu.edu wrote:
> - create a venus setup on an ubuntu x86_64 box with 
>   + 10Gb of cache
>   + big DATA & LOG:
>     % ls -l DATA LOG 
>     -rw------- 1 root root 1201865464 2007-05-01 12:07 DATA
>     -rw------- 1 root root  300468736 2007-05-01 12:09 LOG

Impressive; my caches tend to be 500MB or so at most. You may well be
the first person ever to use a 10GB client cache.

> - hoard it all with
>     hoard add /coda/lambda.csail.mit.edu d+

Do you use a single volume on the server, or are there multiple volumes?

If you run 'hoard list', you'll notice that hoard only fetches files
relative to a volume root. So if /user/shivers is a separate volume, this
hoard command would only fetch everything up to and including the
mountpoint information, but nothing from inside your user volume.
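As a rough illustration of that volume boundary, here is a toy Python model (not Venus code) of what a 'd+' hoard entry covers: the walk records a mountpoint entry but does not descend into it. The `Node` class, `hoard_walk`, and the sample tree (`src`, `a.c`, `research`) are hypothetical; only `/user/shivers` comes from the discussion above.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    """A toy namespace entry; is_mountpoint marks the root of another volume."""
    name: str
    is_mountpoint: bool = False
    children: list[Node] = field(default_factory=list)

def hoard_walk(volume_root: Node) -> list[str]:
    """Paths covered by a 'd+' hoard entry on volume_root, in this model.

    The walk records a mountpoint entry but does not descend into it,
    because everything below it belongs to a different volume.
    """
    covered = []
    stack = [(volume_root, volume_root.name)]
    while stack:
        node, path = stack.pop()
        covered.append(path)
        # Descend from the starting volume root and from ordinary directories only.
        if node is volume_root or not node.is_mountpoint:
            for child in reversed(node.children):
                stack.append((child, f"{path}/{child.name}"))
    return covered

# Hypothetical tree: /user/shivers is a separate volume mounted here.
root = Node("lambda.csail.mit.edu", children=[
    Node("src", children=[Node("a.c")]),
    Node("user", children=[
        Node("shivers", is_mountpoint=True, children=[Node("research")]),
    ]),
])
print(hoard_walk(root))
```

In this model the `shivers` mountpoint shows up in the result, but `research` beneath it never does; hoarding the user volume would need its own entry.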

>   This runs for a while, then terminates successfully.

Since it runs for a while, I'd guess everything is in a single volume.

> - Test it again by saying "cfs disconnect" and then redoing the find
>   tree-walk to read the whole subdir a second time.

First of all, cfs disconnect doesn't actually change any state in the
client; it simply inserts a 'filter' into the rpc2 layer so that all
outgoing and incoming packets get dropped.

>   This runs along fine for a while, with a silent codacon, then codacon
>   suddenly outputs
> 
>     ValidateAttrsPlusSHA CVS(4.7f000000.baf.28f8) [0] ( 11:56:43 )
>     Probe ( 11:57:21 )

Up to this point the client actually believed it was still connected,
and since it hadn't received any callbacks, it had no reason to check
anything with the servers.

At some point the client gets suspicious because there hasn't been any
communication with the server for a while, so it triggers a probe. We
send a ping to the server, which pings back, giving us an end-to-end
check of connectivity. However, the outgoing ping packets are dropped
on the floor, so we end up disconnected.
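To make the sequence concrete, here is a toy Python model (not Venus or RPC2 code) of the two pieces described above: a link filter that drops packets in both directions, which is all that cfs disconnect changes, and a client that only notices when an idle-triggered probe goes unanswered. `FilteredLink`, `ToyClient`, and `probe_interval` are hypothetical names, and the model ignores RPC2 retransmissions and timeouts (in the codacon output above, "unreachable" appears about 49 seconds after "Probe").

```python
class FilteredLink:
    """Toy stand-in for the RPC2 layer with an optional drop-everything filter."""

    def __init__(self):
        self.drop_all = False  # what 'cfs disconnect' toggles in this model

    def send(self, packet):
        """Return the server's reply, or None if the filter drops the exchange."""
        if self.drop_all:
            return None
        return "pong" if packet == "ping" else None


class ToyClient:
    """Client whose 'connected' flag the filter itself never touches."""

    def __init__(self, link, probe_interval):
        self.link = link
        self.connected = True
        self.last_heard = 0.0
        self.probe_interval = probe_interval  # hypothetical idle threshold, seconds

    def tick(self, now):
        """Probe after a quiet period; mark the server unreachable if no reply."""
        if self.connected and now - self.last_heard >= self.probe_interval:
            if self.link.send("ping") == "pong":
                self.last_heard = now
            else:
                self.connected = False


link = FilteredLink()
client = ToyClient(link, probe_interval=60.0)
link.drop_all = True   # "cfs disconnect": only the filter changes
client.tick(30.0)      # still believes it is connected
client.tick(60.0)      # probe gets no answer, so the client disconnects
```

The point of the model is the gap between the two ticks: until the probe fires, nothing in the client knows the packets are being dropped.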

I am not sure why it would want to revalidate the attributes of that CVS
directory. A revalidation like that indicates either that we don't have
cached access rights for the user who is trying to access it, or that
the object is considered 'demoted' (i.e., we got a callback for it or
just returned from disconnection).

Since there are no callbacks (incoming packets are dropped), I assume it
is trying to check whether the current user has access.

>   and the find tree walk hangs. After a minute or two, codacon says
>     unreachable lambda.csail.mit.edu ( 11:58:10 )
>   and the find walk resumes with the following output:
> 
>     find: ./research/mrlc/mrlc/spim/CVS: Permission denied
...
>     find: ./research/mrlc/mrlc/confpaper: Permission denied

Interesting. When we are disconnected, the permission check is extremely
permissive: even if the cached rights would otherwise be considered
stale, we still allow access, and if they are missing, we fall back on
the System:AnyUser rights.
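The disconnected rule above can be sketched as a small function. This is a toy model under stated assumptions, not the Venus implementation: `CachedRights`, `may_access`, the uid keys, and the `None` "ask the server" result are hypothetical. It also reflects that rights are cached per user who fetched the object, since the cache is keyed by uid.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class CachedRights:
    """Rights one user had when fetching an object, e.g. {"r", "l"}."""
    rights: set[str]
    stale: bool = False  # 'demoted', e.g. after a callback or reconnection

def may_access(cache: dict[int, CachedRights], uid: int, right: str,
               connected: bool, anyuser_rights: set[str]) -> bool | None:
    """Toy access check for one object.

    Returns True/False, or None when a connected client would have to
    contact the server before deciding.
    """
    entry = cache.get(uid)
    if connected:
        if entry is None or entry.stale:
            return None  # missing or stale rights: revalidate with the server
        return right in entry.rights
    # Disconnected: trust even stale rights, else fall back on System:AnyUser.
    if entry is not None:
        return right in entry.rights
    return right in anyuser_rights
```

With an ACL that grants System:AnyUser nothing, `may_access({}, 1000, "r", connected=False, anyuser_rights=set())` is False, which matches the "Permission denied" that find reported, while a stale entry for that (hypothetical) uid would still grant read access while disconnected.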

I guess your volume is protected by an ACL, so anonymous users cannot
read those files. And we don't actually remove cached rights; we only
mark them as stale (cached rights are only removed when you explicitly
drop tokens with cunlog).

So this would indicate that we never actually cached rights for those
objects, which is strange. That would require both that the earlier
connected tree walk didn't actually visit the complete tree and that
hoard didn't cache rights. Hoard not caching rights is quite possible;
it may only try to make sure the file attributes and data are 'fresh'.

I know GNU find has some optimizations that can make it skip
subdirectories when a directory's link count is off. You can disable
them with 'find -noleaf'. Maybe the workaround we use to avoid this
problem isn't working.
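Here is a simplified Python model of the leaf optimization that -noleaf turns off, to show why an undercounted link count loses subdirectories. It assumes the classic Unix convention that a directory's link count is 2 plus its number of subdirectories; once that many subdirectories have been seen, the walker assumes the remaining entries are plain files. `dirs_found` is a hypothetical name, treating a link count below 2 as "unknown" is this model's own choice, and real GNU find has other shortcuts (such as readdir type information) that the model ignores.

```python
def dirs_found(entries, nlink, noleaf=False):
    """Return the subdirectories a leaf-optimizing walker would recurse into.

    entries: (name, is_dir) pairs in readdir order.
    nlink:   the directory's reported hard-link count.
    """
    # Classic convention: nlink == 2 + number of subdirectories.
    # This model treats nlink < 2 (or -noleaf) as "count unknown".
    remaining = None if noleaf or nlink < 2 else nlink - 2
    found = []
    for name, is_dir in entries:
        if remaining == 0:
            break  # assume every remaining entry is a plain file; skip it
        if is_dir:
            found.append(name)
            if remaining is not None:
                remaining -= 1
    return found


# Two real subdirectories, so the correct link count would be 4.
listing = [("CVS", True), ("README", False), ("confpaper", True)]
```

With an accurate link count of 4 both subdirectories are visited, but if the client reports 3, `dirs_found(listing, 3)` stops after `CVS` and never descends into `confpaper`; passing `noleaf=True` restores the full walk, which is what 'find -noleaf' does for the real tool.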

> - Then I go poke around in the file system. I now have trouble accessing
>   the problem directories. For example:
> 
>     % ls -ld research/mrlc/mrlc/spim/CVS
>     drwxr-xr-x 1 shivers nogroup 2048 2007-04-30 16:04 research/mrlc/mrlc/spim/CVS
>     % ls research/mrlc/mrlc/spim/CVS
>     ls: research/mrlc/mrlc/spim/CVS: Permission denied
>     % cfs la research/mrlc/mrlc/spim/CVS
>     research/mrlc/mrlc/spim/CVS: Connection timed out
> 
>    Timed out? Hey, I hoarded the file and *disconnected*. Why is venus
>    even trying to connect at all?

cfs listacl and setacl only work while we are connected. The client
doesn't really know anything about users and groups, so caching ACL data
locally isn't useful. What the client caches instead is the set of
rights that the user who fetched the object had. If we don't have cached
rights for another user on the system, that user will have to refetch
at least the attributes.

> - Then I reconnect with
>     cfs reconnect
>   Now I can see the problem directories with no trouble. I redo the find
>   tree-walk a third time and it completes with no problems.
>   + codacon shows *no* server->client file motion. 

Were there one or more 'validateattr' or 'validatevols' calls after
reconnection? The client uses those calls to check the validity of
several objects, or a complete volume, at a time. On the other hand, it
is strange: I would have expected at least a GetAttr for the object that
previously returned permission denied. Maybe there is an inverted test
somewhere in the code that checks whether the current user has access to
a file.

Jan
Received on 2007-05-01 14:43:10