On Mon, Mar 14, 2005 at 10:05:31AM -0300, Gabriel B. wrote:
> > And once codasrv is started it asks if it can create the rootvolume.
>
> I'm using the .deb package from the CMU servers. It never asked me
> about the root volume. I even opened a thread asking if the docs were
> outdated because of this.

And in that thread I responded, "Things have changed in the hope to
simplify the initial setup."

    http://www.coda.cs.cmu.edu/maillists/codalist/codalist-2005/7198.html

The change happened with the release of Coda-6.0.7,

    "createvol_rep doesn't use the VSGDB anymore, instead of specifying
    a server group with the mystical E0000XXX numbers, the servers are
    now explicitly named; 'createvol_rep <volumename> <server>[/<partition>] ...'"

    http://www.coda.cs.cmu.edu/pub/coda/Announcement.6.0.7

> > > pdbtool
> > >   nu bossnes
> > >   ng www 1076 (bossnes id)
> > >
> > > createvol_rep / camboinha.servers/vicepa -3
> > > that didn't work. I waited 2 hours and Ctrl-C'ed.
> >
> > Where did you get that '-3'?
>
> from the list command
>
>     GROUP www OWNED BY bossnes
>       * id: -3
>       * owner id: 1076
>       * belongs to no groups
>       * cps: [ -3 ]
>       * has members: [ 1076 ]

Right, so createvol_rep interprets that as a server named '-3', and
because it is '-3' (grep takes a leading '-3' to be a command line
option rather than the pattern to search for) we fail to catch it with
the following test,

    # Validate the server
    grep $SERVER ${vicedir}/db/servers > /dev/null
    if [ $? -ne 0 ]; then
        echo Server $SERVER not in servers file
        exit 1
    fi

So we end up running

    volutil -h "$SERVER" getvolumelist "/tmp/vollist.$$"

which then tries to contact a server named '-3'. Now on my machine it
quickly returns with '-3 is not a valid hostname'.

> > > volutil create_rep /vicepa / 00001
> > > bldvldb.sh
> > > (a valid workaround?)
> >
> > No it is not, since this only creates the underlying volume replica
> > (which should be named "/.0"). And again, where does that strange
> > 00001 number come from? The createvol_rep script does this first,
> > but then creates the replicated volume by dumping the existing
> > (currently empty) VRDB into the /vice/db/VRList file, appending an
> > entry that describes which replicas are part of the replicated
> > volume, and recreating a new VRDB file from the data in the updated
> > /vice/db/VRList.
>
> hum... is it in binary form or human readable? can you send an example?

Again, where does that strange 00001 come from? That is setting the
replicated volume id to 1, but you can't actually have a replicated
volume with volume id 1, as replicated volumes are supposed to always
have a volume id that looks like 0x7f0000nn.

The first byte of the 4-byte volume id is used to map to the specific
server identifier in /vice/db/servers on which the volume replica is
located, and 0x7f (127) is reserved to indicate that this is a
replicated volume that isn't located on any particular single server
but represents a group of individual replicas. We need this because in
some cases we get just the volume id and we don't know whether it is
supposed to be a replicated volume or some underlying volume replica.

So although the VRList file is human readable and to a certain extent
can be edited by hand, I don't think it would be wise to do so without
really knowing how it is used to glue individual replicas together.
There are a lot of constraints on what is considered valid, and a
single misplaced character can break the parser that has to convert it
back to the VRDB file.
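To make that id layout concrete, here is a throwaway shell sketch (my
own illustration, not anything shipped with Coda) that splits out the
server byte as described above,

    # sketch: pull the server byte out of a 4-byte volume id
    volid=0x7f000001
    server=$(( (volid >> 24) & 0xff ))
    if [ $server -eq 127 ]; then
        echo "0x7f: a replicated volume, i.e. a group of replicas"
    else
        echo "a volume replica on server #$server in /vice/db/servers"
    fi

With a volume id of 1 the server byte is 0, not 0x7f, so nothing would
ever recognize it as a replicated volume.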
> > The only 'worst case' that I know of is when we initially contact a
> > realm, since we are hit by multiple RPC2 timeouts: one when we try
> > to get the rootvolume name, and one when we fall back on
> > getvolumeinfo. At this point the mount fails, but with a colorizing
> > ls we get an additional readlink and getattr on the mountpoint, both
> > of which also trigger an attempt to mount the realm (i.e. another 4
> > timeouts). So we end up blocking for about 6 minutes if the realm is
> > unreachable.
>
> I just left a "cfs lv /coda/camboinha.servers" running friday. It's
> monday and I had to control+c it.
> The server is still running though.

On the client run,

    strace -e trace=network -p `pidof venus`

(optionally add "-o strace.dump"). This should show all the network
related stuff that the client is doing. Here is what I get when I run
'cfs lv /coda/coda.cs.cmu.edu',

    # strace -e trace=network -p `pidof venus`
    Process 17369 attached - interrupt to quit
    sendto(8, ..., 92, 0, {... sin_port=htons(2432), sin_addr=inet_addr("128.2.222.111")}, 16) = 92
    sendto(8, ..., 92, 0, {... sin_port=htons(2432), sin_addr=inet_addr("128.2.209.199")}, 16) = 92
    sendto(8, ..., 92, 0, {... sin_port=htons(2432), sin_addr=inet_addr("128.2.191.192")}, 16) = 92
    recvfrom(8, ..., 4360, 0, {... sin_port=htons(2432), sin_addr=inet_addr("128.2.222.111")}, [16]) = 156
    recvfrom(8, ..., 4360, 0, {... sin_port=htons(2432), sin_addr=inet_addr("128.2.209.199")}, [16]) = 156
    recvfrom(8, ..., 4360, 0, {... sin_port=htons(2432), sin_addr=inet_addr("128.2.191.192")}, [16]) = 156
    Process 17369 detached

Now coda.cs.cmu.edu is mapped by an entry in /etc/coda/realms to a
group of 3 servers, so we're sending the request to all three servers
and then get three replies back. The Coda client decides which reply
to actually use.

If your server isn't responding you would only see sendto's at
exponentially increasing intervals for about 60 seconds, at which
point the RPC2 layer gives up. This then percolates up and we end up
returning ETIMEDOUT to userspace.

> now, i did a cunlog and a clog, and "time ls -la /coda/"
> <ctrl+c>
>
> real    14m44.403s
> user    0m0.001s
> sys     0m0.003s

You could try to use /bin/ls, which shouldn't be colorizing and as
such doesn't try to stat every entry in coda, readlink all unmounted
mountpoints, and then stat every link destination.

Also, is there anything in the venus.log file which could indicate
that we've already run out of worker threads? The message looks
something like,

    DispatchWorker: out of workers (max 20), queueing message

> > > i then created two more volumes. now venus report "2 volume replicas"
> >
> > Did you mount those volumes then? How would venus know about the
> > newly created volumes? Those 2 replicas are probably the one that is
> > at /coda and the one at /coda/camboinha.servers.
>
> It's shown in the venus startup. And I'm starting it with -init every
> time. cfs hangs as well, so I will never know which volumes it claims
> to have found.

Those volumes are 'CodaRoot@localhost' (the volume/directory that is
mounted at /coda) and 'Repair@localhost' (the volume that is used
during local/global repair). Both are internal volumes that always
exist. Both volumes are in the 'localhost' realm, which is an invalid
name since localhost represents 'this machine' and as such is not
usable in Coda's global volume naming scheme.
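For reference, here are the checks I suggested above in one place (the
venus.log location is an assumption on my part, it depends on how your
venus is configured, so adjust the path to wherever yours logs):

    # plain ls: no colorizing, so no extra stat/readlink per entry
    /bin/ls /coda/

    # in another terminal, log venus' network traffic to a file
    strace -e trace=network -o strace.dump -p `pidof venus`

    # look for signs that all worker threads were busy
    grep 'out of workers' /usr/coda/etc/venus.log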
> Did someone have success using the .deb version? I tried it with 3
> sets of servers/clients, each more troublesome than the other.

Not counting 'testserver.coda.cs.cmu.edu', the 6 servers responsible
for 'coda.cs.cmu.edu', and the 2 servers for 'isr.coda', all running
the server packages on debian-testing? I am using both client and
server debian packages on a machine at home, my laptop, and my desktop
at work (debian-unstable), although I tend to alternate with recent
CVS builds. Also one of the students in our group is using the Coda
client debian packages on something like 12-15 laptops to move data
for his experiments.

Now a lot of things depend on whether you have a traditional /dev,
devfs, or udev; whether your kernel is 2.4 or 2.6; whether your
machine has a static or dynamic IP address; whether the network
connection is permanent or intermittent; whether you have a
multi-homed machine and how exactly the multi-homing is set up, since
there are about 3-4 different variations on that theme; whether there
are (possibly masquerading) firewalls in your network; and much more.
There are literally thousands of combinations that might make or break
a seemingly simple setup.

Coda servers are by far the most sensitive, since they are expected to
be reachable through a single static IP address, servers are expected
to have reliable, fairly fast connections to each other (i.e. located
in the same machine room), and there are some assumptions like
'gethostbyname(hostname())' returning a usable IP address that we can
pass to a client instead of 127.0.0.1.

A Coda client is a lot less picky, as it is assumed to be mobile and
as such can hop from one network to another and possibly has
unreliable connections. My laptop switches quite a bit between various
wired and wireless networks, as well as running the Coda traffic
through an openvpn tunnel. I used to use dialup almost daily, but
nowadays it is a cable modem connection. But still, I've configured
everything so that it never tries to route to the Coda servers over
multiple networks at the same time, as the servers wouldn't know where
to send the replies to.
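By the way, a quick way to check that 'gethostbyname(hostname())'
assumption on a prospective server machine (just a sketch using
getent, which comes with glibc; other systems may need 'host' or
similar):

    # the server's own hostname should resolve to its static address,
    # and not to a loopback address that is useless to clients
    getent hosts `hostname`
    # a bad sign would be output like:  127.0.0.1  myserver

Jan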