Hello Piotr,

On Tue, Jul 09, 2013 at 06:52:43PM +0200, Piotr Isajew wrote:
> As for now it seems that it's possible to crash both servers in
> repetitive way, even when copying small amount of files to single
> directory (both servers crashed on the same rvm assertion when
> copying 20 pdf files, 60M total).

Hmm. This looks much worse than my experience here. Our servers and
clients are not compiled from the official upstream git, but our
patches mostly touch the authentication part, not the file service.
We do, though, use more server threads than upstream. This might
make a certain difference.

> The most stable behaviour can be achieved by turning off non-SCM,
> performing copy operation, waiting for venus to reintegrate
> everything to SCM and then bringing up non-SCM and propagating
> changes to it. This, however, for larger sets of data gives "No
> space left in volume log." error on the SCM, and it crashes on

How large a log do you have? I was using 16384 for several years, but
since I once also got "No space left in volume log" I am now using
65536. (Apparently the data amounts have grown over time :)

Of course, a server being down during a massive update operation will
eventually lead to log overflow anyway - but your test operation is
not huge at all.

> another assertion. Turning on non-SCM in such situation leads to
> repeatable suicide at its start, and the whole situation starts
> to look like a dog trying to catch his own tail.

You wrote "Both servers communicate over WiFi, so it's possible that
they will lose connectivity for a while."

This may contribute to triggering odd bugs - Coda service was
developed with an assumption of servers having reliable contact with
each other. It is the clients that are supposed/allowed to have
intermittent connections. A server going down is supported, but not
as a regular situation (and this is not the same as servers being
partitioned from each other).

I am not aware of any "fundamental" reasons preventing the servers
from working properly if they lose contact with each other.
Nevertheless: regretfully or naturally, Coda does not [try to] cover
every possible situation, and servers with an unreliable network are
an unsupported configuration. The relevant code paths are in the best
case probably not fully tested and in the worst case non-existent
(among other things, leading to asserts).

I have successfully run Coda with geographically spread servers - but
the network between them was reliable, and such a setup is not
something inherently supported by the Coda design.

Jan will hopefully correct me if I am wrong on the above.

Would it be feasible for you, Piotr, to make a test with the servers
on a wired connection?

Regards,
Rune

Received on 2013-07-09 15:25:11