This happens when the resolution log of a volume is full. In the SrvLog you will usually be able to see which volume is affected; note down its volume id (you may need to consult /vice/vol/VRList on the SCM to find it). Kill the dead (zombied) server and restart it. The moment it is up, do:
# filcon isolate -s this_server
(we need to prevent clients from overwriting the log again)
# volutil setlogparms volid reson 4 logsize 16384
# filcon clear -s this_server
Unless you do "huge" things, a log size of 16k will be plenty.
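As a rough illustration of the steps above (the server name, volume name, and process id below are invented, and the exact SrvLog wording may differ between versions):

root@server# grep -i resolution /vice/srv/SrvLog
root@SCM# grep myvolume /vice/vol/VRList
root@server# kill -9 2315
root@server# startserver &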
If this happens you have several options. If the server has crashed during salvage, it will not come up simply by trying again; you must either repair the damaged volume or tell the server not to attach it.
Not attaching the volume is done as follows. Find the volume id of the damaged volume in the SrvLog. Create a file named /vice/vol/skipsalvage with the lines:

1
0xdd000123

The first line gives the number of volumes to skip; it is followed by one hex volume id per line.
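For instance, assuming the damaged volume is 0xdd000123 as above, the file could be created and the server restarted like this:

root@server# printf '1\n0xdd000123\n' > /vice/vol/skipsalvage
root@server# startserver &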
You can also try to repair the volume with norton. Norton is invoked as:
norton [LOG] [DATA] [DATA-SIZE]
These parameters can be found in /vice/srv.conf. See norton(8) for detailed information about norton's operation. Built-in help is also available while running norton.
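As a sketch, an invocation might look like the following; the RVM log device, data device, and data size are hypothetical and must be taken from your own /vice/srv.conf:

root@server# norton /dev/sdc1 /dev/sdc2 104857600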
On Tuesday I lost my email folder - the whole volume moose:braam.mail was corrupted on server moose and it wouldn't salvage. Here is how I got it back.
First I tried mounting moose:braam.mail.0.backup but this was corrupted too.
On the SCM, in /vice/vol/VRList, I found the replicated volume number 7f000427 and the volume number ce000011 (fictitious) for the volume.
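A quick way to look these up is a grep on the volume name, for example:

root@SCM# grep braam.mail /vice/vol/VRList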
I logged in as root to bison, our backup controller. I read the backup log for Tuesday morning in /vice/backuplogs/backuplog.DATE and saw that the incremental dump for August 31st had been fine. At the end of that log, I saw the name 7f000427.ce000011 listed as dumped under /backup (a mere symlink), with /backup2 as the spool directory holding the actual file. The backup log more or less shows how to move the tape to the correct place and invoke restore:
root@bison# cd /backup2
root@bison# mt -f /dev/nst0 rewind
root@bison# restore -b 500 -f /dev/nst0 -s 3 -i
restore > cd 31Aug1998
restore > add moose.coda.cs.cmu.edu-7f000427.ce000011
restore > extract
Specify volume #: 1

The value after -s depends upon which /backup[123] volume we pick to restore from.
In /vice/db/dumplist I saw that the last full backup had been on Friday, Aug 28. I went to the machine room and inserted that tape (recent tapes are kept above bison). This time 7f000427.ce000011 was a 200MB file (the last full dump) in /backup3. I extracted the file as above.
Then I merged the two dumps:
root@bison# merge /restore/peter.mail /backup2/28Aug1998/*7f000427.ce000011 \
> /backup3/31Aug1998/*7f000427.ce000011
This took a minute or two to create /restore/peter.mail. Now all that was needed was to upload that to a volume:
root@bison# volutil -h moose restore /restore/peter.mail /vicepa moose:braam.mail.restored
Back to the SCM, to update the volume databases:
root@SCM# bldvldb.sh moose
Now I could mount the restored volume:
root@SCM# cfs mkm restored-mail moose:braam.mail.restored
and copy it into a read-write volume using cpio or tar.
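For example, assuming the restored volume and the writable target volume are both mounted directly under /coda (the mount point names here are illustrative), the copy could be done with tar:

root@SCM# cd /coda/restored-mail
root@SCM# tar cf - . | (cd /coda/mail && tar xf -)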
When createvol_rep reports RPC2_NOBINDING while you are trying to create volumes, it is an indication that the server is not (yet) accepting connections.
It is useful to look at /vice/srv/SrvLog: the server performs the equivalent of fsck on startup, which might take some time. Only when the server logs Fileserver Started in SrvLog does it start accepting incoming connections.
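You can, for example, follow the log until that line appears:

root@server# tail -f /vice/srv/SrvLog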
Another reason could be that an old server is still around, blocking the new server from accessing the network ports.
Some process has the UDP port open that rpc2portmap or auth2 is trying to obtain. In most cases this is an already running copy of rpc2portmap or auth2. Kill all running copies of the program in question and restart them.
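One way to find the offending process is to check which program holds the ports; 369/udp (rpc2portmap) and 370/udp (auth2) are the usual defaults, but verify them against /etc/services on your system:

root@server# netstat -ulnp | grep -E ':(369|370) '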
Servers can crash when they are given inconsistent or bad data files. You should check whether updateclnt and updatesrv are both running on the SCM and on the machine that has crashed. If necessary, kill and restart them. Then restart codasrv and it should come up.
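A minimal check-and-restart sequence might look like this; the SCM hostname is made up, and the exact updatesrv and updateclnt arguments may differ for your installation:

root@server# ps ax | grep -E 'update(clnt|srv)'
root@server# updatesrv &
root@server# updateclnt -h scm.example.com &
root@server# startserver &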