On Mon, Feb 14, 2005 at 04:22:16PM -0800, Ray Taft wrote:
> - The website pushes in excess of 1.2Gbps of
>   bandwidth to the internet.
>
> - We use 30 FreeBSD 4.11 servers and load
>   balancing to balance web traffic to those servers.

I don't think bandwidth would be an issue for Coda. Individual reads and
writes are never bounced up to userspace, so once the file is opened,
accesses should be as fast as if the file were on the local disk.

Opening and closing do involve an upcall, though, and if we can't keep
the working set in the local cache, an open may involve pruning the
cache and fetching the missing data. So the rate of requests that
trigger a cache miss, and possibly even the request rate when we only
have cached copies, could be a bottleneck for Coda.

> - Each webserver has a 100Mbps connection to the NET,
>   and a private 100Mbps connection to a Storage Network.

Multihoming is still problematic, but as long as the Coda server(s)
only have a single interface connected to the private network I don't
foresee too much trouble.

> - Our file server is a DUAL AMD 2.4Ghz (64bit)
>   platform. It has 8GB RAM, 4 Intel 1Gbps Interfaces
>   (Trunked / Etherchannel) running SuSE 9.1. Lots of
>   Fast RAID 10 Storage. All 4 Gig ports are connected to
>   a private storage network.

Do those 4 trunked interfaces show up as a single aggregate 4Gbps link
with a single IP address?

> The client side servers have a max of 160 Gigabytes
> each (local ATA hard disk). The total volume of the
> site is 360 Gigabytes and growing on the file server.
> The web server / Coda client will never have enough
> storage to hold all of the files on the Coda server.

Depending on the number of files, 360GB is probably already pushing the
limits of a single Coda server. The best way to tell is to run
'rvmsizer' on the tree and see whether it reports problematically large
directories and how much metadata it thinks you'd need.
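As a rough sanity check, the numbers quoted in the message can be worked
through directly (this uses the ~100MB average file size mentioned later
in the message; the divisor in the last estimate is an assumption about
how venus sizes its file-slot table when it isn't told otherwise):

```python
# Back-of-the-envelope arithmetic from the figures quoted above.
GB = 1024 ** 3
MB = 1024 ** 2

total_data   = 360 * GB   # size of the whole site
working_set  = 200 * GB   # data requested on a typical day
client_cache = 160 * GB   # local disk available per web server
avg_file     = 100 * MB   # average file size quoted later in the message

total_files  = total_data // avg_file    # only a few thousand files site-wide
cached_files = client_cache // avg_file  # the ~1600 locally cached copies

# The working set only 'almost' fits: a client can hold about 80%
# of the ~200GB requested on any given day.
coverage = client_cache / working_set

# If the client instead guesses its file-slot count from the block
# count (one slot per 24 x 1KB blocks is an assumption here), a 160GB
# cache would imply roughly 7 million slots, which matches the figure
# mentioned later in the message.
guessed_slots = (client_cache // 1024) // 24

print(total_files, cached_files, coverage, guessed_slots)
```

With ~100MB files the metadata load is modest; the "pushing the limits"
concern only bites if the tree contains many more, much smaller files,
which is exactly what rvmsizer would reveal.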
My guess is that it is probably hard to get a reliably working server
with 2GB or more of metadata.

> The reason for the smaller storage was that not all
> 360 Gigabytes would be accessed at any one point in
> time that frequently. Roughly 200 gigabytes of storage
> would be requested on any particular day of the total.

Ok, so the working set 'almost' fits completely on the clients.

> Our hope is by implementing CODA, the most frequently
> requested files (average size is 100 Mbytes - big
> files), would be cached on the client side webserver
> alleviating some load from the NFS file server. That by
> caching the files locally, we would be able to achieve
> higher transfer rates from the webserver to the surfer
> as the file operation is a request on a local cached
> file system rather than on a backend NFS pipe.

Would you still keep the NFS server for smaller, less frequently
accessed files?

A 100MB average file size is very nice for Coda; a client would have no
trouble managing the ~1600 locally cached copies. You do need to tweak
some parameters in venus.conf, though; otherwise the client will try to
allocate memory to manage about 7 million locally cached files, which
it really can't do yet.

> How does an administrator set the cache size on the
> local hard disk? How does the flushing algorithm work?
> Does it have a flushing algorithm?

There is a flushing algorithm, but I'm not too happy with it. It
calculates a 'temperature' for each file, which seems to depend on how
recently the file was accessed, how long it has been in the cache (if
it stayed around for a while, it has probably proven itself useful),
how costly it would be to refetch the data, and probably a handful of
other parameters.
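To make the heuristic concrete, here is a toy sketch of a
temperature-style score in the spirit described above. This is purely
illustrative and is NOT Coda's actual formula; the weights, time scales,
and the function name are all invented for the example:

```python
import time

def temperature(last_access, fetch_time, size_bytes, now=None):
    """Toy cache 'temperature': warmer files (higher scores) are kept,
    the coldest are pruned first. Illustrative only, not Coda's code."""
    if now is None:
        now = time.time()
    # Recently accessed files are warm.
    recency = 1.0 / (1.0 + (now - last_access))
    # Files that have survived in the cache for a while have proven
    # themselves useful (capped at one day for this sketch).
    tenure = min(1.0, (now - fetch_time) / 86400.0)
    # Bigger files are more expensive to refetch, so they stay warmer.
    refetch_cost = size_bytes / (100 * 1024 * 1024)
    return recency + tenure + refetch_cost

# A small, just-fetched file scores far below a big file that was
# fetched a week ago and touched a minute ago -- which is exactly why
# fresh small files can get dumped in favor of an old tarball.
now = 1_000_000.0
fresh_small = temperature(now - 60, now - 60, 1 * 1024 * 1024, now=now)
old_big = temperature(now - 60, now - 7 * 86400, 100 * 1024 * 1024, now=now)
```

The sketch also shows the complaint that follows: once several factors
are summed, it is hard to reason about why any particular file was the
coldest at eviction time.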
But in real life this means there is no simple way to reason about
which file gets thrown out for what reason, and it seems like small,
recently fetched files that haven't been around long enough to build up
'temperature' get dumped because I unpacked some huge tarball a week
ago.

The local cache size on a client is a hard limit when fetching new data
from the servers, but a soft limit when writing locally. We don't see
individual writes, so a newly created file is 0 bytes and as such takes
no space; only when the file is closed do we suddenly realize that it
is in fact a bit larger, and only then do we start pruning. So it is
probably smart to reduce the local cache size by about the size of the
largest file that you expect to write.

> consumed? Will coda flush the least requested files on
> some sort of timer, keeping only frequently requested
> data in the local disk cache?

No, flushes are only performed when we need space to put something new
in the cache.

> When a surfer http requests a file that does not exist
> at the CODA client side cache, and the coda client has
> to go back to the coda server to fetch the file, does
> the surfer have to wait before the entire file has
> been cached on the coda client before taking delivery
> (before the file transfer starts)?

Yes, the application (apache thread) will be blocked in the open call
until the complete file is cached. Once Coda returns from the open, the
application can do anything it wants without having to block on
anything but local disk accesses.

Jan
Received on 2005-02-24 13:50:09