On Sat, Feb 28, 2004 at 08:50:04PM -0500, Jan Harkes wrote:
> I'll detail the experimental way in another message when I have a bit of
> time. But it is worth a wait, because it might just be what you're
> looking for.

Alright, got some time now. The following trick is only possible with 6.0 or more recent clients and servers. The problem with earlier versions of the server is that they internally still rely on the VSG numbers to get the servers that are hosting replicas of a volume. This was at some point replaced by a lookup in the VRDB (volume replication database).

There are some additional issues. Resolution logs are turned off for singly replicated volumes, and turning them back on is often problematic. Also, the clients have to refetch volume information and recognize that there are suddenly more replicas available. Because there are several issues, and this is a relatively recent change, there are still a lot of places that are not well tested and you could actually lose the existing replica. So having an off-Coda copy of your data is very strongly recommended. In any case, if this works it should save a lot of time; if it doesn't, the 'recovery procedure' (destroying the old volume, creating a new replica and copying all the data from the backup) is what you would have had to do anyway.

Let's look at how a replicated volume works. There is a two-level lookup: first we get the 'replicated volume' information, which returns a list of 'volume replicas'. I guess we could call the higher-level replicated volumes 'logical volumes', because in reality we only store a logical mapping on a server. The volume replicas are 'physical volumes'; they actually have some tangible presence on a server. The createvol_rep script first creates all the physical volume replicas on all servers, and finally adds the logical replicated volume information to the VRList file (which forms the basis for the VRDB).

Now pretty much everything from here on would be done on the SCM. We need some of the information which can be found in the /vice/db/VRList file:

* volume name
* replicated volume id
* volume replica id

For a singly replicated volume, a line in the VRList would look like,

    <volumename> <replicated id> 1 <volumeid> 0 0 0 0 0 0 0 E0000xxx

The first difficulty is that when we created the singly replicated volume, we turned off resolution logging for the underlying volume replica, as it really isn't useful. Also, because there is no resolution with other servers, there is no way to truncate the log if it grows too much. So we need to turn resolution back on. This is something that can very easily go wrong, and might actually cause later problems: all directories normally have at least a single NULL resolution log entry, but without resolution logging those do not exist, and there might still be some places left over that expect at least one entry. I just tried it on a test server which is running the current CVS code and it worked fine.

    volutil <replicaid> setlogparms reson 4 logsize 8192

* 'reson 4' - turn resolution on
* 'logsize 8192' - allow for 8192 resolution log entries

Then we can create the additional volume replica on the new server.

    volutil -h <newserver> create_rep /vicepa <volumename>.1 <replicated id>

* creates a volume replica on the /vicepa partition.
* volume names for volume replicas are typically 'name'.number; there is no real reason for this except to keep them separate in the namespace. Interestingly, we pretty much only perform lookups on the volume replica id and not the name, except when we try to mount a replicated volume by name.
* resolution logging is already enabled for new volume replicas.

When the command completes it should have picked a new 'volume replica id' for the newly created volume and dumped it to stdout. We can then take this number and add it to the VRList record for the replicated volume. We leave most of the line the same,

    <volumename> <replicated id> 2 <volumeid> <new volid> 0 0 0 0 0 0 E0000xxx

* bump the replica count from 1 to 2
* replace the first 0 with the new volume replica id
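To make the procedure so far concrete, here is a sketch with made-up values; none of these names or ids come from a real setup, so substitute your own VRList entry and the replica id that create_rep prints. Assume the volume is called 'projects', the replicated volume id is 7f000004, the existing replica id is ca000012, and the new server is server2. The command lines simply mirror the forms given above.

    # existing VRList entry for the singly replicated volume
    projects 7f000004 1 ca000012 0 0 0 0 0 0 0 E0000104

    # turn resolution logging back on for the existing replica (on the SCM)
    volutil ca000012 setlogparms reson 4 logsize 8192

    # create the second volume replica on server2's /vicepa partition;
    # suppose this prints the new replica id cb000034
    volutil -h server2 create_rep /vicepa projects.1 7f000004

    # edited VRList entry: count bumped to 2, new id in the first free slot,
    # trailing E0000xxx field left as it was
    projects 7f000004 2 ca000012 cb000034 0 0 0 0 0 0 E0000104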
Basically we're done, except for the fact that no client or server actually knows that the new volume replica exists or that it is part of the replicated volume. For that we have to tell the server to update the VLDB and VRDB databases.

    bldvldb.sh <newserver>

* builds the volume location database. This gets the current list of volume replicas from <newserver> and builds a new volume location database.

    volutil makevrdb /vice/db/VRList

* builds a new VRDB file.

Once these databases are propagated by the updatesrv/updateclnt daemons, all servers are aware of the replicated volumes. Clients will not realize it until they refresh the locally cached volume information. For the root volume this normally happens only when the client is restarted. For all other volumes, you can alternatively use 'cfs checkvolumes', which invalidates the mountpoints so they will be looked up again. There really should be some sort of callback to automate this. Another gotcha here: before release 6.0.3, clients would segfault when the replication changed.

Finally, when all of this is done, a simple recursive ls (ls -lR) will trigger resolution and the new replica will get populated with all the files from the original.

Jan
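Rounding off the same made-up example (new server server2, VRList already edited as above), the remaining steps would look roughly like the following; the mountpoint /coda/example.com/projects is again just an illustrative name.

    # on the SCM: rebuild the volume location database, picking up server2's volumes
    bldvldb.sh server2

    # rebuild the VRDB from the edited VRList
    volutil makevrdb /vice/db/VRList

    # on a 6.0.3 or newer client: refresh the cached volume information
    cfs checkvolumes

    # walk the tree to trigger resolution and populate the new replica
    ls -lR /coda/example.com/projects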