(Illustration by Gaich Muramatsu)
On Fri, Dec 05, 2003 at 11:17:50AM +0900, Stephen J. Turnbull wrote: > >>>>> "Lionix" == Lionix <lio_at_absium.com> writes: > > Lionix> And [Jan] always take the time one day to read the ml and > Lionix> write some responses. > > That's right, he'll jump in soon if there are mistakes or unanswered > questions. You shouldn't write him privately unless it's something > you know is uninteresting to the list (like megabytes of logs). Well, I went off to Vegas over thanksgiving and when I got back there were all those messages with correct and insightful answers, so I thought I'd better back off for a while longer ;) > Lionix> I get some troubles... Morevover I've observed that venus > Lionix> generaly writedisconnect from most volumes when one volume > Lionix> is heavy touched and turn into disconected state ... > > I would think you could improve availability by having several > servers, each serving only a fraction of the volumes. I've noticed that double replication is often the best solution. Clients will load balance reads, and on 100Base-T, writes are as fast as writing to a single server. This is probably because the bulk-data transfer is able to send the data to the second server before the first server has acknowledged. So we get to utilize the network a lot better. > Also, by reducing bandwidth needed on weak connections (eg, by using > rsync protocol for transport) probably this could be improved. That > would be a pretty big project though, I suppose. For reading this would only help if files are updated in-place. But in reality many applications try to shield us from losing data as a result of out-of-diskspace or other write errors and most files are updated as a sequence of mv file file.bak create file rm file.bak so the file before the update is a completely different object compared to the the file after the update. And rsync would be just as inefficient as fetching (or storing) the whole file. Maybe local CML optimizations could work around this problem by recognizing the sequence and when we in fact remove the backup file, simply log this as an over-write of the existing file. > Lionix> If I remember correctly venus fetch files to the fastest > Lionix> server to response his request. > > Lionix> And then servers replicate to themself data.... Venus fetches file and directory data from the fastest server. But writing (storing) a file depends on the estimated connectivity. If we believe there is a good network, it writes to all servers in parallel. But when the network is not so good, it writes to the fastest server and then triggers server-server resolution to propagate the update to all others. > Lionix> I think a risk arize if venus crash during fetching file > Lionix> to server in a rvm-less config.... All data in memory of > Lionix> venus machine would be lost ! Well, yeah, but if I have a client that only reads out of Coda the data loss isn't really a big deal. We can always refetch it as soon as the client is restarted. The problem areas really are, - crash while a file is open for writing This is always a problem, even for the existing venus because we don't know what happens to a file until it is closed. So the only thing a client can do is discard the "owrite" file. (it is actually moved to the /usr/coda/spool directory) - crash while we still have pending operations logged in the CML that haven't been sent to the server. This is where RVM helps us. If your client doesn't have recoverable memory (and a persistent cache for file data) this stuff can and will be lost when venus is shut down for any reason. We actually used the non-persistent caching on iPaq handhelds. To avoid wearing the flash we didn't use RVM and mounted a ramfs partition over the venus.cache directory. The local cache would be lost whenever the handheld loses power or reboots (or venus dies). But on the other hand, these PDA's run on battery for long periods of time and barring any crashing bugs really don't shutdown or reboot. JanReceived on 2003-12-05 18:10:25