On Mon, Jun 23, 2003 at 01:06:35PM +0100, lou wrote:
> In some email I received from Ivan Popov <pin_at_math.chalmers.se> on
> Mon, 23 Jun 2003 13:58:52 +0200 (MEST), he wrote:
> > For the moment you still cannot grow or shrink volume groups,
> > so as soon as you want to change replication of a volume, you have to
> > recreate it.
>
> Yeah, shame, it would be quite useful :)
>
> > Jan is working on removing that limitation but I would not suggest
> > holding your breath.
>
> I'll try and see what I can find for myself... dig dig dig ...
> Jan, can I get a few pointers about this? Like what exactly limits it
> (in a few words)? Since it's really urgent I'd rather try to find a
> solution than hack a shell script to do the job..

Most of the redundancy that prevented us from doing this has been removed.
I redid a large part of the volume handling on clients a while ago, and the
final server bit dropped into place right before 6.0 was released: servers
internally no longer use the VSG numbers. The only missing bit is a way to
force clients to refetch the new replicated volume information when the
replication group is changed. I do know that this information is refetched
when a client is restarted, and it might also get updated after a
disconnection, but I'm not really sure about that one.

In any case, the steps would be something like this. Say I have a replicated
volume FOO which is currently replicated across 2 servers: server1 holds the
replica FOO.0 and server2 is responsible for FOO.1. Assume we have a new
server, server3, that is being set up as an additional replica.

First of all we need to make sure that all servers agree on who is who, so
the new server must have been given a unique number in /vice/db/hosts. We
then restart server1 and server2 to make sure they know about the new
machine, and we bring up server3. This is basically a problem any time we
try to bring a new server into the group; hopefully this can be made easier
over time as well.

(From here on, all commands should be run on the SCM.)

Then we can create a new underlying replica FOO.2 on server3. But first we
need to know the replicated volume-id of FOO, which can be found either by
sifting through the output of 'volutil -h server1 info FOO.0' or by grepping
/vice/db/VRList for FOO. The replicated-id starts with '7F'. Let's say the
replicated volume id is 7F000000 and we want to create the new volume on the
/vicepa partition,

  # volutil -h server3 create_rep /vicepa FOO.2 7F000000
  # bldvldb.sh server3

volutil create_rep probably spits out the volume-id of the newly created
replica; we want to write this down somewhere as it will be useful later on.
The bldvldb.sh script rebuilds the Volume Location DataBase (VLDB), which
all servers use to discover who is responsible for the new volume. This
information is used during resolution and when responding to
ViceGetVolumeInfo queries from clients.

So now we have 3 underlying replicas, but although servers (and clients)
know about the existence of the new replica, they don't know that it really
is part of the replicated volume named 'FOO'. This information is stored in
the Volume Replication DataBase (VRDB), and we haven't done anything to
update that yet.
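For reference, the steps up to this point could be scripted roughly as
below. This is only a sketch under the assumptions used above (volume FOO,
new server server3, partition /vicepa, replica name FOO.2); it pulls the
replicated-id out of /vice/db/VRList based on the line format described
further down, so sanity-check the id it finds before creating anything.
Run it as root on the SCM.

  #!/bin/sh
  # Sketch: create a new underlying replica for an existing replicated
  # volume and rebuild the VLDB. Run on the SCM.
  VOL=FOO            # replicated volume name
  NEWSRV=server3     # server that will hold the new replica
  PART=/vicepa       # partition on that server
  REPLICA=$VOL.2     # name of the new underlying replica

  # The replicated volume-id is the second field of the volume's line
  # in /vice/db/VRList; it starts with 7F.
  REPID=`awk -v v="$VOL" '$1 == v { print $2 }' /vice/db/VRList`
  echo "replicated volume-id of $VOL is $REPID"

  # Create the replica on the new server, then rebuild the VLDB so all
  # servers learn who is responsible for it.
  volutil -h $NEWSRV create_rep $PART $REPLICA $REPID
  bldvldb.sh $NEWSRV

Nothing in there is new compared to the commands above; it just saves
looking up and retyping the replicated-id.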
So we open /vice/db/VRList in our favorite editor and go to the line that
describes the replication group for volume FOO. The anatomy of these lines
is pretty simple,

  volume replicated-id #replicas replica-1 replica-2 ... replica-8 VSG-id

So we would probably have something like,

  FOO 7F000000 2 1000000 2000000 0 0 0 0 0 0 E0000101

And we have to change this into,

  FOO 7F000000 3 1000000 2000000 xxxxxx 0 0 0 0 0 E0000101

where xxxxxx is the volume-id of the new replica that we wrote down earlier.
(Also note that I replaced one of the '0' entries with the new volume id,
so we went from 6 zeros to 5 zeros.) A scripted sketch of this edit is
appended at the end of this message.

Once we have saved the new VRList, we have to tell the SCM to create a new
VRDB out of this file,

  # volutil makevrdb /vice/db/VRList

That should be pretty quick, and the update daemons will automatically
propagate the new VRDB file to all the other servers, which should each
show a 'New Data Base received' message in their logfiles.

The final step is to force clients to see the new information. As I said
earlier, this is really the only missing link, but I know for a fact that
volume information is refetched after a restart. When you restart a client
and access the volume, codacon should show a lot of Resolve-related
messages as the data is forced to the new server; a simple
'cfs strong ; ls -lR' should trigger the necessary resolution to force all
data in the expanded volume to the new server.

Jan
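And since lou mentioned scripting this: the VRList edit plus the makevrdb
step could also be done mechanically, along the lines of the sketch below.
Again this is only an illustration under the same assumptions as before
(volume FOO, the eight replica slots per line described above, and the new
replica id filled in as NEWID); double-check the modified line by hand
before running makevrdb.

  #!/bin/sh
  # Sketch: splice a new replica id into the VRList entry for a volume
  # and rebuild the VRDB. Run as root on the SCM.
  VOL=FOO
  NEWID=xxxxxx       # volume-id printed by volutil create_rep earlier

  cp /vice/db/VRList /vice/db/VRList.bak
  awk -v v="$VOL" -v id="$NEWID" '
      $1 == v {
          $3 = $3 + 1                  # one more replica
          for (i = 4; i <= 11; i++)    # the eight replica slots
              if ($i == "0") { $i = id; break }
      }
      { print }
  ' /vice/db/VRList.bak > /vice/db/VRList

  # Build the new VRDB; the update daemons propagate it to the other
  # servers, whose logs should show 'New Data Base received'.
  volutil makevrdb /vice/db/VRList

The backup copy and the manual check are there because makevrdb is what
actually commits the change; once it runs, the new VRDB is pushed to every
server, so the VRList line needs to be right first.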