After the last problem I reported, I ran some more tests with a two-server Coda cell. My goal was to create a replicated volume and populate it with some files (mailboxes in maildir format, and some documentation I keep in HTML and PDF for offline access).

The SCM machine runs Coda 6.9.5. I started with the same version on the non-SCM, but recently upgraded it to the latest git sources to see if that changes anything. Both servers communicate over WiFi, so it's possible that they lose connectivity for a while.

I have no problems operating on small amounts of data. Manipulating a few files and a few megabytes works well. For now, however, I'm pretty sure that I just cannot do any "massive" copy operation against those servers, as a crash is just a matter of time.

I start by creating a new replicated volume on the SCM:

    createvol_rep docs otwieracz.localnet/vicepa kontrabanda.localnet/vicepa

On the client machine I can mount this volume without problems. I want to copy the manuals directory into it:

    cp -r manuals /coda/coda.localnet/docs/

manuals is not really big:

    $ du -sh manuals
    629M    manuals
    $ find manuals | wc -l
    31216

(I'm sure I don't exceed the 4k files per directory limit: there are no more than 100 files per directory.)
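The per-directory claim above can be double-checked with a short shell function. This is only a sketch: the name max_entries_per_dir is made up for illustration, it counts every direct entry (files and subdirectories alike), it relies on the -mindepth/-maxdepth options found in GNU and BSD find, and it assumes no path contains a newline.

```shell
# Hypothetical helper: print the largest number of direct entries
# found in any single directory under the tree rooted at $1.
max_entries_per_dir() {
    find "$1" -type d | while IFS= read -r dir; do
        # Count the direct children (files and subdirectories) of this directory.
        find "$dir" -mindepth 1 -maxdepth 1 | wc -l
    done | sort -n | tail -n 1 | tr -d ' '
}
```

Called as `max_entries_per_dir manuals`, it prints a single number that can be compared against the directory limit mentioned above.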
At some point during the copy operation both servers crash:

SCM SrvErr:

    No waiters, dropped incoming sftp packet
    No waiters, dropped incoming sftp packet
    (many lines like that)
    No waiters, dropped incoming sftp packet
    RVMLIB_ASSERT: Error in rvmlib_free
    Assertion failed: 0, file "rvmlib.c", line 258
    ***BackTrace***
    /usr/sbin/codasrv(coda_assert+0x5f)[0x4a4bff]
    /usr/sbin/codasrv(rvmlib_free+0x181)[0x4a2ab1]
    /usr/sbin/codasrv(_ZN5recle8FreeVarlEv+0x1aa)[0x4726da]
    /usr/sbin/codasrv(_Z11TruncateLogP6VolumeP5VnodeP7vmindex+0xf9)[0x471729]
    /usr/sbin/codasrv(_Z12InternalCOP2iP11ViceStoreIdP17ViceVersionVector+0x5fa)[0x4327ba]
    /usr/sbin/codasrv(FS_ViceCOP2+0xad)[0x432a5d]
    /usr/sbin/codasrv(srv_ExecuteRequest+0xf52)[0x454c62]
    /usr/sbin/codasrv[0x41f8b4]
    /usr/lib64/../lib64/liblwp.so.2(+0x5fe2)[0x7fa21e1bbfe2]
    /lib64/libc.so.6(+0x36aa0)[0x7fa21d91caa0]
    /lib64/libc.so.6(sigsuspend+0x16)[0x7fa21d91cd76]
    /usr/lib64/../lib64/liblwp.so.2(lwp_makecontext+0x10e)[0x7fa21e1bc13e]
    /lib64/libc.so.6(fflush+0x6b)[0x7fa21d95381b]
    /lib64/libc.so.6(_longjmp+0x2b)[0x7fa21d91c8ab]
    /usr/lib64/../lib64/liblwp.so.2(+0x5f72)[0x7fa21e1bbf72]
    /usr/lib64/../lib64/liblwp.so.2(lwp_swapcontext+0x22)[0x7fa21e1bc022]
    /usr/lib64/../lib64/liblwp.so.2(LWP_DispatchProcess+0x3bd)[0x7fa21e1baf7d]
    /usr/lib64/../lib64/liblwp.so.2(LWP_QWait+0x57)[0x7fa21e1bb987]
    /usr/sbin/codasrv(main+0xdc6)[0x41c286]
    /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa21d907a95]
    /usr/sbin/codasrv[0x41cee9]
    EXITING! Bye!
non-SCM SrvErr:

    No waiters, dropped incoming sftp packet
    No waiters, dropped incoming sftp packet
    (repeated many times)
    No waiters, dropped incoming sftp packet
    RVMLIB_ASSERT: Error in rvmlib_free
    Assertion failed: 0, file "rvmlib.c", line 258
    ***BackTrace***
    /usr/sbin/codasrv(coda_assert+0x5f)[0x4a4bef]
    /usr/sbin/codasrv(rvmlib_free+0x181)[0x4a2aa1]
    /usr/sbin/codasrv(_ZN5recle8FreeVarlEv+0x1aa)[0x4726da]
    /usr/sbin/codasrv(_Z11TruncateLogP6VolumeP5VnodeP7vmindex+0xf9)[0x471729]
    /usr/sbin/codasrv(_Z12InternalCOP2iP11ViceStoreIdP17ViceVersionVector+0x5fa)[0x4327ba]
    /usr/sbin/codasrv(FS_ViceCOP2+0xad)[0x432a5d]
    /usr/sbin/codasrv(srv_ExecuteRequest+0xf52)[0x454c62]
    /usr/sbin/codasrv[0x41f8b4]
    /usr/lib64/../lib64/liblwp.so.2(+0x5fe2)[0x7f4c1dc8ffe2]
    /lib64/libc.so.6(+0x36aa0)[0x7f4c1d3f0aa0]
    /lib64/libc.so.6(sigsuspend+0x16)[0x7f4c1d3f0d76]
    /usr/lib64/../lib64/liblwp.so.2(lwp_makecontext+0x10e)[0x7f4c1dc9013e]
    /lib64/libc.so.6(fflush+0x6b)[0x7f4c1d42781b]
    /lib64/libc.so.6(_longjmp+0x2b)[0x7f4c1d3f08ab]
    /usr/lib64/../lib64/liblwp.so.2(+0x5f72)[0x7f4c1dc8ff72]
    /usr/lib64/../lib64/liblwp.so.2(lwp_swapcontext+0x22)[0x7f4c1dc90022]
    /usr/lib64/../lib64/liblwp.so.2(LWP_DispatchProcess+0x3bd)[0x7f4c1dc8ef7d]
    /usr/lib64/../lib64/liblwp.so.2(LWP_QWait+0x57)[0x7f4c1dc8f987]
    /usr/sbin/codasrv(main+0xdc6)[0x41c286]
    /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f4c1d3dba95]
    /usr/sbin/codasrv[0x41cee9]
    EXITING! Bye!

Of course I can restart both servers and resume the copy operation (which continues to the local cache anyway), but this leads to server/server conflicts sooner or later.

I have to populate that volume in some way. Maybe I should just shut down the non-SCM and copy to the SCM only? Or maybe running a client directly on the SCM, just for that copy operation, would be a better option?

Piotr

Received on 2013-07-08 07:21:06