On Mon, Mar 14, 2005 at 10:05:31AM -0300, Gabriel B. wrote:
> > And once codasrv is started it asks if it can create the rootvolume.
>
> I'm using the .deb package from the CMU servers. It never asked me
> about the root volume. I even opened a thread asking if the docs were
> outdated because of this.

And in that thread I responded, "Things have changed in the hope to
simplify the initial setup."

    http://www.coda.cs.cmu.edu/maillists/codalist/codalist-2005/7198.html

The change happened with the release of Coda-6.0.7,

    "createvol_rep doesn't use the VSGDB anymore, instead of specifying
    a server group with the mystical E0000XXX numbers, the servers are
    now explicitly named; 'createvol_rep <volumename> <server>[/<partition>] ...'"

    http://www.coda.cs.cmu.edu/pub/coda/Announcement.6.0.7

> > > pdbtool
> > >   nu bossnes
> > >   ng www 1076 (bossnes id)
> > >
> > > createvol_rep / camboinha.servers/vicepa -3
> > > that didn't work. I waited 2 hours and Ctrl-C'ed.
> >
> > Where did you get that '-3'?
>
> from the list command
>
>     GROUP www OWNED BY bossnes
>       * id: -3
>       * owner id: 1076
>       * belongs to no groups
>       * cps: [ -3 ]
>       * has members: [ 1076 ]

Right, so createvol_rep interprets that as a server named '-3', and
because it is '-3' (grep takes a leading '-3' to be a command line
option rather than the pattern to search for) we fail to catch it with
the following test,

    # Validate the server
    grep $SERVER ${vicedir}/db/servers > /dev/null
    if [ $? -ne 0 ]; then
        echo Server $SERVER not in servers file
        exit 1
    fi

So we end up running

    volutil -h "$SERVER" getvolumelist "/tmp/vollist.$$"

which then tries to contact a server named '-3'. Now on my machine it
quickly returns with '-3 is not a valid hostname'.

> > > volutil create_rep /vicepa / 00001
> > > bldvldb.sh
> > > (a valid workaround?)
> >
> > No it is not, since this only creates the underlying volume replica
> > (which should be named "/.0"). And again, where does that strange
> > 00001 number come from? The createvol_rep script does this first,
> > but then creates the replicated volume by dumping the existing
> > (currently empty) VRDB into the /vice/db/VRList file, appending an
> > entry that describes which replicas are part of the replicated
> > volume, and recreating a new VRDB file from the data in the updated
> > /vice/db/VRList.
>
> hum... is it in binary form or human readable? can you send an example?

Again, where does that strange 00001 come from? That is setting the
replicated volume id to 1, but you can't actually have a replicated
volume with volume id 1, as replicated volumes are supposed to always
have a volume id that looks like 0x7f0000nn.

The first byte of the 4-byte volume id is used to map to the specific
server identifier in /vice/db/servers on which the volume replica is
located, and 0x7f (127) is reserved to indicate that this is a
replicated volume that isn't located on any particular single server
but represents a group of individual replicas. We need this because in
some cases we get just the volume id and we don't know whether it is
supposed to be a replicated volume or some underlying volume replica.

So although the VRList file is human readable and to a certain extent
can be edited by hand, I don't think it would be wise to do so without
really knowing how it is used to glue individual replicas together.
There are a lot of constraints on what is considered valid, and a
single misplaced character can break the parser that has to convert it
back to the VRDB file.
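To make that id layout concrete, here is a throwaway shell sketch (my
own illustration, not anything shipped with Coda) that splits out the
server byte as described above,

    # sketch: pull the server byte out of a 4-byte volume id
    volid=0x7f000001
    server=$(( (volid >> 24) & 0xff ))
    if [ $server -eq 127 ]; then
        echo "0x7f: a replicated volume, i.e. a group of replicas"
    else
        echo "a volume replica on server #$server in /vice/db/servers"
    fi

With a volume id of 1 the server byte is 0, not 0x7f, so nothing would
ever recognize it as a replicated volume.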
> > The only 'worst case' that I know of is when we initially contact a
> > realm, since we are hit by multiple RPC2 timeouts: one when we try
> > to get the rootvolume name, and one when we fall back on
> > getvolumeinfo. At this point the mount fails, but with a colorizing
> > ls we get an additional readlink and getattr on the mountpoint, both
> > of which also trigger an attempt to mount the realm (i.e. another 4
> > timeouts). So we end up blocking for about 6 minutes if the realm is
> > unreachable.
>
> I just left a "cfs lv /coda/camboinha.servers" running friday. It's
> monday and I had to control+c it.
> The server is still running though.

On the client run,

    strace -e trace=network -p `pidof venus`

(optionally add "-o strace.dump"). This should show all the network
related stuff that the client is doing. Here is what I get when I run
'cfs lv /coda/coda.cs.cmu.edu',

    # strace -e trace=network -p `pidof venus`
    Process 17369 attached - interrupt to quit
    sendto(8, ..., 92, 0, {... sin_port=htons(2432), sin_addr=inet_addr("128.2.222.111")}, 16) = 92
    sendto(8, ..., 92, 0, {... sin_port=htons(2432), sin_addr=inet_addr("128.2.209.199")}, 16) = 92
    sendto(8, ..., 92, 0, {... sin_port=htons(2432), sin_addr=inet_addr("128.2.191.192")}, 16) = 92
    recvfrom(8, ..., 4360, 0, {... sin_port=htons(2432), sin_addr=inet_addr("128.2.222.111")}, [16]) = 156
    recvfrom(8, ..., 4360, 0, {... sin_port=htons(2432), sin_addr=inet_addr("128.2.209.199")}, [16]) = 156
    recvfrom(8, ..., 4360, 0, {... sin_port=htons(2432), sin_addr=inet_addr("128.2.191.192")}, [16]) = 156
    Process 17369 detached

Now coda.cs.cmu.edu is mapped by an entry in /etc/coda/realms to a
group of 3 servers, so we're sending the request to all three servers
and then get three replies back. The Coda client decides which reply
to actually use.

If your server isn't responding you would only see sendto's at
exponentially increasing intervals for about 60 seconds, at which
point the RPC2 layer gives up. This then percolates up and we end up
returning ETIMEDOUT to userspace.

> now, i did a cunlog and a clog, and "time ls -la /coda/"
> <ctrl+c>
>
> real    14m44.403s
> user    0m0.001s
> sys     0m0.003s

You could try to use /bin/ls, which shouldn't be colorizing and as
such doesn't try to stat every entry in coda, readlink all unmounted
mountpoints, and then stat every link destination.

Also, is there anything in the venus.log file which could indicate
that we've already run out of worker threads? The message looks
something like,

    DispatchWorker: out of workers (max 20), queueing message

> > > i then created two more volumes. now venus report "2 volume replicas"
> >
> > Did you mount those volumes then? How would venus know about the
> > newly created volumes? Those 2 replicas are probably the one that is
> > at /coda and the one at /coda/camboinha.servers.
>
> It's shown in the venus startup. And I'm starting it with -init every
> time. cfs hangs as well, so I will never know which volumes it claims
> to have found.

Those volumes are 'CodaRoot@localhost' (the volume/directory that is
mounted at /coda) and 'Repair@localhost' (the volume that is used
during local/global repair). Both are internal volumes that always
exist. Both volumes are in the 'localhost' realm, which is an invalid
name since localhost represents 'this machine' and as such is not
usable in Coda's global volume naming scheme.
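For reference, here are the checks I suggested above in one place (the
venus.log location is an assumption on my part, it depends on how your
venus is configured, so adjust the path to wherever yours logs):

    # plain ls: no colorizing, so no extra stat/readlink per entry
    /bin/ls /coda/

    # in another terminal, log venus' network traffic to a file
    strace -e trace=network -o strace.dump -p `pidof venus`

    # look for signs that all worker threads were busy
    grep 'out of workers' /usr/coda/etc/venus.log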
> Did someone have success using the .deb version? I tried it with 3
> sets of servers/clients, each more troublesome than the other.

Not counting 'testserver.coda.cs.cmu.edu', the 6 servers responsible
for 'coda.cs.cmu.edu', and the 2 servers for 'isr.coda', all running
the server packages on debian-testing? I am using both client and
server debian packages on a machine at home, my laptop, and my desktop
at work (debian-unstable), although I tend to alternate with recent
CVS builds. Also one of the students in our group is using the Coda
client debian packages on something like 12-15 laptops to move data
for his experiments.

Now a lot of things depend on whether you have a traditional /dev,
devfs, or udev; whether your kernel is 2.4 or 2.6; whether your
machine has a static or dynamic IP address; whether the network
connection is permanent or intermittent; whether you have a
multi-homed machine and how exactly the multi-homing is set up, since
there are about 3-4 different variations on that theme; whether there
are (possibly masquerading) firewalls in your network; and much more.
There are literally thousands of combinations that might make or break
a seemingly simple setup.

Coda servers are by far the most sensitive, since they are expected to
be reachable through a single static IP address, servers are expected
to have reliable, fairly fast connections to each other (i.e. located
in the same machine room), and there are some assumptions like
'gethostbyname(hostname())' returning a usable IP address that we can
pass to a client instead of 127.0.0.1.

A Coda client is a lot less picky, as it is assumed to be mobile and
as such can hop from one network to another and possibly has
unreliable connections. My laptop switches quite a bit between various
wired and wireless networks, as well as running the Coda traffic
through an openvpn tunnel. I used to use dialup almost daily, but
nowadays it is a cable modem connection. But still, I've configured
everything so that it never tries to route to the Coda servers over
multiple networks at the same time, as the servers wouldn't know where
to send the replies to.
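By the way, a quick way to check that 'gethostbyname(hostname())'
assumption on a prospective server machine (just a sketch using
getent, which comes with glibc; other systems may need 'host' or
similar):

    # the server's own hostname should resolve to its static address,
    # and not to a loopback address that is useless to clients
    getent hosts `hostname`
    # a bad sign would be output like:  127.0.0.1  myserver

Jan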