Coda File System

Re: coda server crash

From: Lionix <wayne-cci_at_noos.fr>
Date: Sun, 18 May 2003 15:53:35 +0200
Hi,

I don't think I can be of much help with an RVM problem at this technical
level...
Anyway, I'm going to write a response until Jan is back! :-)
The volume is 0x1000001, so I assume you're dealing with your root volume...

During my tests of Coda (not finished at all), at one point I lost my
SCM: it was complaining about RVM: "No RVM type selected"... I
tried to get it back in several ways (rvmutl, rdsinit...) for a
couple of hours without success... That was quite interesting, because I
discovered there were a lot of things I had not understood or tried!

- I had not configured the root volume as replicated...
- I had not installed coda-backup to get a root-volume backup...

The last resort for me was to get my data (not much at all) out
of the data servers with venus and to reinstall all the servers... :-P

Since you are trying to get the root volume back too, if you don't have
replication or a backup, you're out of luck... Let's hope Jan or
someone else has a magical, highly technical solution!

This raises a paradox I noticed:

- I read in some docs that write-replicating the root volume was dangerous,
because a conflict on this volume would freeze the whole cluster... (I
have to try this too, to see...)!
I suppose this is out of date, because I read a mail from Jan telling
us that the root volume is 3-way replicated at CMU...

So I chose this "root-volume security policy":
- If the root volume is not replicated, you must have a backup
scheduled... (in case of trouble with the SCM)
OR
- Fully replicate the root volume and do as few operations as possible on
it: no user operations!
 Only Coda configuration stuff!
 (You can put administrators' work in a separate Admin volume, too!)

Definitely, avoiding operations on the root volume as much as possible is
a good idea in a production environment, I think...
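The either/or policy above can be written down as a small checklist. This is a minimal Python sketch, not a Coda tool: `RootVolumeSetup`, `policy_problems` and their fields are hypothetical names, and you would fill in the values by hand from your own setup (how many servers hold a replica of the root volume, whether backups are scheduled, whether users keep files there).

```python
from dataclasses import dataclass


@dataclass
class RootVolumeSetup:
    # Hypothetical description of a cell's root volume; these fields are
    # illustrative only and do not come from any Coda tool or API.
    replica_count: int       # number of servers holding a replica
    backup_scheduled: bool   # whether regular root-volume backups run
    user_data_on_root: bool  # whether users store their own files there


def policy_problems(setup: RootVolumeSetup) -> list:
    """Return the ways a setup breaks the either/or policy described above."""
    problems = []
    replicated = setup.replica_count > 1
    # Option 1: an unreplicated root volume must at least be backed up.
    if not replicated and not setup.backup_scheduled:
        problems.append("root volume is neither replicated nor backed up")
    # Option 2: a replicated root volume should see as few operations as
    # possible (conflict risk), so user data does not belong there.
    if replicated and setup.user_data_on_root:
        problems.append("replicated root volume also carries user data")
    return problems


# Example: a single-server root volume with no backup breaks the policy.
print(policy_problems(RootVolumeSetup(1, False, False)))
# ['root volume is neither replicated nor backed up']
```

An empty list means the setup satisfies one of the two branches; it says nothing about whether the volume itself is healthy.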

Hope these considerations will help in the future...
Hope somebody can help more...
Hope I didn't write too many stupid things...

Kaufmann Lionel
Coda newbie admin

Steve Simitzis wrote:

>well, now it's getting worse and worse, and i fear i may have lost
>everything. i just tried suggestions from this thread:
>
>http://www.coda.cs.cmu.edu/maillists/codalist/codalist-2003/4842.html
>
>and i tried restarting it by creating a file called /vice/vol/skipsalvage.
>now i'm getting this:
>
>[root_at_db srv]# more SrvLog
>02:21:11 New SrvLog started at Sat May 17 02:21:11 2003
>
>02:21:11 Resource limit on data size are set to -1
>
>02:21:11 RvmType is Rvm
>02:21:11 Main process doing a LWP_Init()
>02:21:11 Main thread just did a RVM_SET_THREAD_DATA
>
>02:21:11 Setting Rvm Truncate threshhold to 5.
>
>Error rvm_load_segment returns 210
>02:21:11 rds_load_heap error RVM_EOFFSET
>
>and this:
>
>[root_at_db srv]# more SrvErr
>Assertion failed: err == RVM_SUCCESS, file "/usr/src/redhat/BUILD/coda-5.3.20/coda-src/vice/srv.cc", line 1807
>EXITING! Bye!
>
>
>i really don't know what to do at this point. it seems to be about
>as broken as it can possibly get.
>
>On 05/17/03, Steve Simitzis <steve_at_saturn5.com> wrote: 
>
>>On 05/16/03, Jan Harkes <jaharkes_at_cs.cmu.edu> wrote: 
>>
>>>    volutil setlogparms <volume-replica-id> reson 4 logsize 16384
>>>
>>>reson 4       - log based resolution is enabled
>>>logsize 16384 - doubles the number of resolution log entries, I don't
>>>		 remember ever overflowing after growing it to this.
>>
>>well, i had resolution turned off previously, and then i ran the above
>>command.
>>
>>now codasrv is dead with no hope of restarting. here is what i get
>>in the logs:
>>
>>141 /vice/srv: db> tail SrvLog
>>01:45:19 Salvaging file system partition /vicepa
>>01:45:19 Force salvage of all volumes on this partition
>>01:45:19 Scanning inodes in directory /vicepa...
>>01:45:22 Entering DCC(0x1000001)
>>01:45:22 DCC: Salvaging Logs for volume 0x1000001
>>
>>01:45:22 done:  738 files/dirs, 1536 blocks
>>01:45:24 Entering DCC(0x1000002)
>>01:46:10 MarkLogEntries: loglist was NULL ... Not good
>>
>>142 /vice/srv: db> tail SrvErr
>>Assertion failed: 0, file "/usr/src/redhat/BUILD/coda-5.3.20/coda-src/volutil/vol-salvage.cc", line 851
>>EXITING! Bye!
>>
>>every time i restart codasrv, it dies in the same way. i tried to reinit
>>the log using rvmutl (i hope i was doing what i think i was doing) out
>>of desperation, but it does not come back up, each time with the same
>>error. ack!!
>>
>>-- 
>>
>>steve simitzis : /sim' - i - jees/
>>          pala : saturn5 productions
>> www.steve.org : 415.282.9979
>>  hath the daemon spawn no fire?
Received on 2003-05-18 10:01:35