On Fri, Dec 05, 2003 at 03:30:05AM +0100, Lionix wrote:
> I'm currently using rsync to transfer data from nfs to coda....
> Browsing the ML I know that it's not the best idea:
> http://www.coda.cs.cmu.edu/maillists/codalist/codalist-2003/5122.html

Well, as I said in that email...

>>>>> If everyone sends more email to codalist, it will cause more
>>>>> mailarchive rebuilds and I get more of these conflicts. Perhaps
>>>>> that will push this higher up in the list of things that really
>>>>> need to be looked at.

But in fact things have been improving steadily. I haven't really had
many problems lately.

> The change date of codaproc2.cc was 9 months ago, so it's in the 6.0.x builds.
> And I can't consider spamming the ML a "politically correct" solution.
> But I recognize I'm writing a lot these last weeks... :o)
>
> So I started playing with the cache size parameter, keeping an eye on
> the load average....
> Resolving problems as I could... practicing and trying to improve...
>
> For the first time I got a server-side problem during rsync.
>
> ====SrvErr====
> No waiters, dropped incoming sftp packet
> No waiters, dropped incoming sftp packet
> [....]
> No waiters, dropped incoming sftp packet
> Assertion failed: l, file "recov_vollog.cc", line 309
> EXITING! Bye!
> ==========
>
> OK! Some trouble recovering the volume log....
> Let me guess, and correct me if I'm wrong...
> Something like: this function returns a pointer into the volume log
> after it tried to grow it (grow the index?), because it wants to
> insert a new transaction log record?

That sure looks like it; the server logs an entry for every operation.
This entry is removed as a result of one of two things. One, the client
sends a 'COP2' message indicating success at all replicas of the
volume. Or, we just successfully triggered server-server resolution on
the object related to the logged operation.

The log has a fixed length. Although it can be made larger with
'volutil setlogparms', that typically won't help unless the real
problem is fixed.

Now, one question is: is this a replicated volume, or only a single
replica? Because singly replicated volumes are never resolved, we
disable resolution logging, but perhaps your server got into the
disabled-logging code anyway.

> ======SrvErr
> Assertion failed: size == s, file "recov_vollog.cc", line 386
> EXITING! Bye!
>
> Uhu.... the SalvageLog function... I don't understand it too much,
> but it seems to browse the volume log to track whether space could be
> freed, otherwise it tries to increase the log size, no?

There are more (or fewer) log records than expected; I'm not entirely
sure what is going on here.

> 00:43:14 SalvageIndex: Vnode 0x2ae has no inodeNumber
> 00:43:14 SalvageIndex: Creating an empty object for it

Hmm, the server was asked to create files, but the client never
actually stored data into them. Possibly not yet reintegrated.

> 00:43:14 Entering DCC(0x1000013)
> 00:43:22 DCC: Salvaging Logs for volume 0x1000013
>
> Reading this, it seems I wasn't syncing only one volume....
>
> Another restart failed with the same SrvErr message.... And the same
> story in SrvLog.
> Should I keep trying until it succeeds in starting? :-?

You could create an entry for this volume in /vice/srv/skipsalvage, or
was it /vice/vol/skipsalvage... In any case the content would be

    1 0x1000013

The 1 indicates that one volume id will follow, and then the volume id
that should be ignored during salvage. This should bring your server up
with everything but this one problematic volume.
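Off the top of my head, creating that file would look something like
the following. I'm guessing at the exact path, so double-check whether
your install uses /vice/srv or /vice/vol:

    # on the server that trips over this volume during salvage
    echo "1 0x1000013" > /vice/srv/skipsalvage

and then restart codasrv.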
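And in case you do want to grow the resolution log later, once whatever
is eating the log entries is fixed, that is a per-volume setting. From
memory the syntax is roughly the following, but check volutil's usage
output because I may have the arguments slightly off:

    volutil setlogparms 0x1000013 reson 4 logsize 8192

where 'reson 4' keeps resolution logging enabled and logsize should be
the maximum number of log entries. As said above, a bigger log only
buys time if entries never get trimmed.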
> I then went to the other server, restarting it, hoping to put it in
> the same state as its "brother", but it's still refusing to start,
> with no errors.. It freezes on:
> 02:55:33 Main thread just did a RVM_SET_THREAD_DATA
> 02:55:33 Setting Rvm Truncate threshhold to 5.

Hmm, maybe an old server is still running or something? killall -9
codasrv and retry.

If you manage to get this one up and running without a problem, and the
0x1000013 volume is part of a replicated volume, you can get everything
back up and running without too much hassle. You would have to delete
the corrupt replica (0x1000013), and then recreate it as an empty
volume. If everything is done right, server-server resolution will
simply copy everything back from the surviving replica.

You need to know...

The replicated volume id for 0x1000013, something like 0x7f0000XX.

The unique volume name; this depends a bit on what the other replica
happens to be named. Let's say your replicated volume is 'volume', then
one replica will be 'volume.0' and the other 'volume.1'. So you have to
check (volutil info?) what the name of the surviving replica is.

Finally, you need to know/decide where this volume should be stored
(vicepa).

With all of this info we can do,

    volutil purge 0x1000013 volume.X
    volutil create_rep /vicepa volume.X 0x7f0000XX 0x1000013

All of this info might still be listed in one of the files in
/vice/vol/remote/XXXX on the SCM.

Finally, ls -lR on the client in the related volume should trigger
runt resolution and bring all the data back.

Jan
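P.S. To make that recipe concrete, here is roughly what it would look
like with made-up names and ids; yours will differ, so pull the real
values from 'volutil info' and the files under /vice/vol/remote/ on the
SCM first:

    # find the surviving replica's name and the replicated (group) volume id
    volutil info volume.1

    # on the server that held the bad replica: throw it away,
    # then recreate it as an empty volume on /vicepa
    volutil purge 0x1000013 volume.0
    volutil create_rep /vicepa volume.0 0x7f000002 0x1000013

Then on the client, something like

    ls -lR /coda/path/to/that/volume > /dev/null

walks the whole tree and should kick off runt resolution.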