On Mon, Mar 26, 2001 at 01:35:35PM -0500, Brad Clements wrote:
> I've looked through the archives and docs, can't find a ready answer to this.
>
> I have 3 servers replicating a variety of volumes.
>
> One of the servers has lost a hard drive that contained the RVM (both
> partitions) and /vicepa

Cool.

> Fortunately all of the volumes on the failed drive are on the SCM.
>
> What's the proper procedure to get this server up and running again?

We need to know exactly which volume replicas were stored on that server.
There should be a /vice/vol/BigVolumeList that contains all the important
information (and then some).

We need the following information for every volume that used to be on the
server.

    partition   replica name    replicated groupid   volume id
    /vicepa     xx:user.one.N

It shouldn't really matter whether we use the same partition or not, we're
rebuilding the lost replica anyways.

replica name is identical to the replicated volume name + ".<nr>"
    eg. replicated volume = "vm:u.jaharkes.mail", replica names on the
    different servers are "vm:u.jaharkes.mail.0" and "vm:u.jaharkes.mail.1".

Replicated groupid is simply the replicated volume id (eg. 0x7F0000A8).

Volume id is the underlying replica id. This is recorded in volume
replication database (VRDB/VRList), so it better be the same one ;)

Ok, get /vice/vol/BigVolumeList, it looks like....

P/vicepa Hmahler.coda.cs.cmu.edu T23dbe3 F584b8
P/vicepb Hmahler.coda.cs.cmu.edu T23dbe3 Fd5c18
Wvm:www.root.1 Ic8000001 Hc8 P/vicepa m0 M0 U7cbf Wc8000001 C388376d4 D388376d4 B3abed327 Af66
Wvm:www.public.1 Ic8000002 Hc8 P/vicepa m0 M0 U87a1 Wc8000002 C36b6097c D36b6097c B3abed2bc Af218c
Bvm:www.public.1.backup Ic8000003 Hc8 P/vicepa m0 M0 U8ac6 Wc8000002 C36b687c5 D36b687c5 B0 A0

It is pretty simple to extract the required info from here. The lines
starting with P identify partitions on servers, search down to the server
that is gone. Cut and paste everything up until the next "P".
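The cut-and-paste step can also be scripted. What follows is only a rough sketch in Python, assuming BigVolumeList looks like the sample above: partition lines start with "P" and carry the host name in an "H" field, and a server's volume lines follow its partition lines. The function names and the boundary rule (stop at the first "P" line that names a different host) are assumptions made for illustration, not part of the Coda tools.

```python
def server_section(lines, hostname):
    """Return the BigVolumeList lines that belong to `hostname`.

    Assumption: a server's block starts with its "P<partition> H<host> ..."
    lines and runs until a "P" line that names a different host.
    """
    section = []
    inside = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("P"):
            # Partition line: the host name is the field starting with "H".
            fields = line.split()
            host = next((f[1:] for f in fields[1:] if f.startswith("H")), None)
            inside = (host == hostname)
        if inside and line:
            section.append(line)
    return section


def replica_lines(section):
    """Keep only the volume replica entries, like "grep ^W"."""
    return [line for line in section if line.startswith("W")]
```

Reading the file with open("/vice/vol/BigVolumeList") and passing the lines plus the failed server's host name to server_section() should give the same text as the manual cut and paste; replica_lines() then drops the partition and backup entries.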
Now we should have the info for all volume replicas (and backup volumes).
Strip out the volume replica information only,

    "grep ^W volumes"

Now we have lines like:

    W<volume replica name> I<replica id> H?? P<partition name>

Either by hand or using awk, pull the info out.

Now we're only missing the replicated volume id's. Those are in
/vice/db/VRList.

"example VRList lines"
e:braam.rep2 7F00042E 2 e60000ed e500001e 0 0 0 0 0 0 E0000149
e:braam.tallis 7F0003F6 1 e500001a 0 0 0 0 0 0 0 E0000153

<replicated volume name> <replicated volume id> <nr of volumes> <volumeid's>

So we can get the replicated volume id's by doing;

    "grep -i <volumeid> /vice/db/VRList | cut -d' ' -f2"

Hopefully we'll end up with all the necessary information in a nice list
similar to the following;

    /vicepa userone.1 7F000125 e60000de
    /vicepa usertwo.1 7F000126 e60000df
    etc.

Once we have all this info we're in pretty good shape. Put a working drive
in the failed server, and reinstall an empty codaserver

    (rm -rf /vicepa/* ; vice-setup-rvm)
or even
    (rm -rf /vice ; rm -rf /vicepa/* ; vice-setup)

Then bring the servers up. Each of the lines of information we created
earlier exactly matches the arguments of

    volutil create_rep partition-path volumeName grpid [rw-volid]

which is what we need to run for every lost volume.

Once the volumes are created there is just one thing left to do, get the
data back onto the newly reinitialized server.

    cfs strong
    cd /coda/some/restored/volume ; volmunge -a `pwd`

Instead of using volmunge, "ls -lR" will also do the trick, but it will
cross volume boundaries.

It can take a few hours, but it should work.

Jan
Received on 2001-03-26 16:55:42
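The bookkeeping described in this message (pull the name, replica id and partition out of each "W" line, look up the replicated volume id in VRList, and line the result up as volutil create_rep arguments) could be scripted roughly as follows. This is a sketch in Python under the field layouts shown in the message; the helper names are hypothetical, and it only builds the command strings so they can be checked by eye before anything is run on the rebuilt server.

```python
def parse_replica(line):
    """Split a "W<name> I<replica id> H.. P<partition> ..." line."""
    fields = line.split()
    name = fields[0][1:]  # drop the leading "W"
    # Take the first I... and P... fields after the name.
    volid = next(f[1:] for f in fields[1:] if f.startswith("I"))
    partition = next(f[1:] for f in fields[1:] if f.startswith("P"))
    return partition, name, volid


def group_id(vrlist_lines, volid):
    """Return the replicated volume id whose VRList entry lists `volid`.

    Assumed layout: <name> <replicated volume id> <nr of volumes> <volumeid's>.
    The comparison ignores case, like "grep -i".
    """
    for line in vrlist_lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        count = int(fields[2])
        replicas = [f.lower() for f in fields[3:3 + count]]
        if volid.lower() in replicas:
            return fields[1]
    return None


def create_rep_commands(replica_lines, vrlist_lines):
    """Build one "volutil create_rep" command line per lost replica."""
    commands = []
    for line in replica_lines:
        partition, name, volid = parse_replica(line)
        grpid = group_id(vrlist_lines, volid)
        if grpid is None:
            continue  # not found in VRList; sort this one out by hand
        commands.append(
            f"volutil create_rep {partition} {name} {grpid} {volid}"
        )
    return commands
```

Feeding it the "W" lines saved from BigVolumeList and the lines of /vice/db/VRList gives one command per replica; anything that has no VRList match is skipped on purpose rather than guessed at.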