On Sun, Aug 17, 2003 at 04:34:20PM -0700, Ronald Hilterfuge wrote:
> I am investigating whether or not coda is suitable for my application.
> I have two questions about the workings of coda. I would like to
> read-write mirror a rather large filesystem (about 800 gb) over a WAN
> to 4 distinct nodes. I have sufficient RAM to be able to create a RVM

Ok, in a way this really is a FAQ, but I guess no one has ever tried to
really write it down as one.

There are clearly different types of replication. One is the type you
describe, where multiple widely distributed sites need identical
replicas of a common dataset. I don't really know if there is a good
term for this, but site-mirroring comes close.

The other type is more a failover type of replication, where data is
replicated across multiple machines at a single site. This is similar to
how a RAID5 array works: the load of read accesses is shared across all
replicas, and when one replica becomes unavailable the others can
transparently take over. Some sort of High Availability replication.

Coda's server-server replication is of the second type. It really
doesn't perform well when the replicas are spread across a wide area.
The main reason for this is that most of the replication smarts are not
in the server, but in the client. It would almost be possible to give a
Coda client an HTTP backend and use a normal web server as fileserver.
The only modification that would be necessary is a way to securely
synchronize the web servers when a client detects a difference between
servers.

Now it is possible to approximate the site-mirror type of replication if
you keep all the servers in one location and only use Coda clients from
the other locations. This is a bit inefficient, because the clients do
not share state, so each one independently has to fetch identical files.
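
To make the "smarts are in the client" point a bit more concrete, here is
a toy sketch in Python. It is not Coda's actual protocol: the Replica
class, the integer version counter and client_fetch() are all
hypothetical names made up for illustration. It only shows the idea that
the client compares what the servers hold, reads from an up-to-date one,
and notices which servers would need to be synchronized.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical stand-in for one server's copy of a file."""
    name: str
    version: int   # assumed simple counter, only for this sketch
    data: bytes


def client_fetch(replicas):
    """Compare the available replicas on the client side.

    Returns (data, stale): the contents of the newest copy, and the
    names of replicas that lag behind it and would need synchronizing.
    """
    if not replicas:
        raise RuntimeError("no replica available")
    newest = max(replicas, key=lambda r: r.version)
    stale = [r.name for r in replicas if r.version != newest.version]
    return newest.data, stale


servers = [
    Replica("a", 3, b"new contents"),
    Replica("b", 3, b"new contents"),
    Replica("c", 2, b"old contents"),
]

# All three servers reachable: the client spots that "c" is behind.
data, stale = client_fetch(servers)

# Failover: server "a" is unavailable, the read still succeeds.
data_without_a, stale_without_a = client_fetch(servers[1:])
```

In this model the servers never talk to each other on their own; the
only server-side piece left out is the step that brings the stale
replica up to date once the client has flagged it.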
Some interesting research is being done on the use of staging servers
and client cache sharing, where a recent copy of (most of) the data is
placed on a local machine, or clients are allowed to borrow data from
each other's caches. Of course trust and security come into play here.

    http://portal.acm.org/citation.cfm?id=566775&coll=portal&dl=ACM&CFID=11728998&CFTOKEN=73925472
    http://www.eecs.umich.edu/~jflinn/papers/sigops02.ps

> metadata partition which is still 4% of the total filesystem size.
>
> 1) How would coda perform over such a situation? Can coda handle large
> filesystems?

Not very well, you might have to look at things like rsync, unison or
omirr.

Coda can handle extremely large filesystems as long as your files are
relatively large :) Our real limitation is the number of objects whose
metadata has to be stored in recoverable VM (basically RAM+SWAP). Each
object takes a few hundred bytes, but it quickly adds up.

> 2) Also how do the filesystem semantics work? if someone attempts to
> open a file at location A, and someone else tries to open the same
> file at location B, how does the filesystem respond in this situation?

For reading or for writing?

If both open for reading, no problem. Both get an identical copy.

If one reads and the other writes, the reader will not see any updates
until he closes and reopens the file _after_ the writer has closed his
file.

If both are writing, then the last one to close his file descriptor gets
a reintegration conflict which has to be repaired by the user.

Jan

Received on 2003-08-19 11:31:59
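
The open/close behaviour described in the answer to question 2 can be
sketched as a small toy model in Python. This is only an illustration
under stated assumptions, not how Coda is implemented: Server, Session
and Conflict are hypothetical names, and a plain integer version stands
in for whatever the real system uses to detect concurrent updates.

```python
class Conflict(Exception):
    """Raised when close() finds the server copy changed since open()."""


class Server:
    def __init__(self, data=""):
        self.data = data
        self.version = 0   # assumed counter, bumped on every store


class Session:
    """One open()..close() interval on a client (hypothetical model)."""

    def __init__(self, server):
        self.server = server
        # open(): take a private copy; later reads only see this copy.
        self.data = server.data
        self.base_version = server.version
        self.dirty = False

    def read(self):
        return self.data

    def write(self, data):
        self.data = data
        self.dirty = True

    def close(self):
        if not self.dirty:
            return
        # Someone else stored a new version after we opened the file.
        if self.server.version != self.base_version:
            raise Conflict("file changed on the server since open()")
        self.server.data = self.data
        self.server.version += 1


server = Server("v0")

# One reader, one writer: the reader keeps its old copy until it reopens.
reader, writer = Session(server), Session(server)
writer.write("v1")
writer.close()
stale_view = reader.read()
reader.close()
fresh_view = Session(server).read()

# Two writers: the last one to close gets the conflict.
first, second = Session(server), Session(server)
first.write("from A")
second.write("from B")
first.close()
try:
    second.close()
    conflict = False
except Conflict:
    conflict = True
```

In the model the conflict is simply raised as an exception and the
server keeps the first writer's data; repairing the conflict is left to
the user, as described above.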