Hello everyone,

So tell me, how is it that I can't make this nice little filesystem function correctly? I "guess" the replication works, but now it seems that I am having trouble with my clients.

First of all, whenever I stop venus, I have to run "venus -init" in order to restart it; otherwise it turns into a zombie. Once I have done that, I can stop venus and restart it without the "-init". What is causing this?

Second, it seems that my clients do NOT like to stay connected. I can manage to get all of my clients actively connected for a few minutes. I test this by modifying a file on the server and then making sure the change made it to every other client. This works in the beginning, but after a while (the time seems random) the clients start disconnecting. Also, once any type of work is done inside the filesystem, the client I am using wants to disconnect.

In order for my changes to propagate, I have to run some combination of the following commands, each of which has its own problems.

Commands I try that eventually get me reconnected (in no particular order):

*NOTE: m1.public and m2.public act as both servers and clients.

1) cfs cs myrealm
   *Note: Sometimes this command works, other times it doesn't. My realm file looks like this: "myrealm m1.public m2.public". When I use the above command, it often tells me that the servers are still disconnected, which is wrong, because if I do "cfs cs m(1,2).public" it works just fine. Why would this happen? When the above command DOES work, my clients continue to function for a little while longer.

2) cfs reconnect
   *Note: I have never been able to get this command to work.

3) cfs fr /coda/myrealm/storage
   *Note: Sometimes I have to do this in order for changes to be made to the server. Isn't there a better, automatic way?

4) echo -n "pwd" | clog user
   *Note: I run this command on all my machines. Sometimes it reconnects the volume, sometimes it doesn't. Functionality is sporadic.
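For reference, the recovery commands listed above could be collected into one small script, roughly like the sketch below. It only uses the cfs invocations already mentioned in this message. The `run` and `recover` helpers and the DRY_RUN variable are names made up for illustration and are not part of Coda, and the clog step is left out so that no password ends up in a script. By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch of the manual recovery sequence described above.
# Assumptions: realm "myrealm", servers m1.public and m2.public,
# and the volume mounted at /coda/myrealm/storage, as in this message.
REALM="myrealm"
SERVERS="m1.public m2.public"
VOLUME="/coda/$REALM/storage"

# DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: the sequence of commands from the list above.
recover() {
    # Check the servers by realm name first.
    run cfs cs "$REALM"

    # Then check each server by host name, since that sometimes
    # works when the realm name does not.
    for s in $SERVERS; do
        run cfs cs "$s"
    done

    # Finally push pending changes for the volume back to the servers.
    run cfs fr "$VOLUME"
}

recover
```

Running it with DRY_RUN=0 would execute the real commands on a client where cfs is installed. It does not fix the underlying disconnects; it just makes the sequence repeatable while collecting logs.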
Trying the above commands in different orders sometimes gets me reconnected. However, once I try to continue moderate use of the filesystem, everything dies again. I cannot manage to make it stable.

Perhaps if I explain one application that I am using, everyone will get a better idea of what I am trying to make Coda do. Perhaps my work is incompatible with Coda. I am unsure.

Basically, one of the applications I am attempting to run is an MPI version of povray. MpiPovray works best with shared storage. If I have 8 machines total, then I am running povray on all of them, and each machine must be able to write to a particular "working directory" at the same time. I am attempting to use Coda as this "working directory". At most, the program instance running on each node will write a log file to the working dir. I am unsure whether this particular application sends data back to the master node for final output to file, or whether each worker node writes to the file from its respective machine.

So my problem seems to be that the filesystem can't keep up with the operations of the program, which seems wrong to me because it worked fine before I tried replication. I cannot find any useful information in my log files. When Coda works, it works great, but then it just stops working, with no obvious error entries.

If anyone is interested, I can clear my log files and repeat everything I do that gets me to the dead filesystem point. I can make the log files available to whoever would like them.

BTW: in my server.conf file on both servers I have "mapprivate=1" enabled. Would "dontuservm=1" in venus.conf make any difference to my situation? I am unsure of its proper use.

If anyone can offer any suggestions or help, it would be greatly appreciated.

Thanks in advance, I'm going to go hide under a rock.

-RD

Received on 2004-11-01 17:17:30