On Thu, Feb 08, 2007 at 07:54:10AM +0100, Reiner Dassing wrote:
> thank you very much for your detailed explanations.
>
> The mentioned transfer speeds are detected on coda-6.1.2.
>
> But I had used coda-6.9.0 and there I saw very bad responses.

Interesting, 6.9 could probably use some additional tweaking of the
writeback parameters. It always logs local changes and writes them back
asynchronously. The settings are tuned to minimize the impact of
write-back on user activity: the client uses up to 20% of the available
network bandwidth (write-back occurs in 1-second bursts once every 5
seconds).

So if the amount of dirty data fits in the client cache, this should be
near optimal. We hardly get any write delays, and updates invisibly
trickle back at some point. However, if our working set is
(considerably) larger than the client cache, we end up blocking even on
reads, because the whole cache is dirty and we can't discard anything
until reintegration has caught up. At this point the 'codacon' output
should be showing things like 'yellow zone, slowing down writer' or
'red zone, stalling writer'. Readers get a little more room, but will
eventually block as well.

So in a case where we copy a lot of data into the cache, it may be
useful to increase the length of the reintegration bursts. In fact, it
may be useful to allow for 30-60 seconds of reintegration to push a
large file back in a single transfer instead of trickling it back in
small chunks.

It isn't hard to tweak these settings; they are controlled through the
writedisconnect settings of cfs. The settings are persistent across
reboots, but only affect currently known (or explicitly specified)
volumes. The -age parameter defines how long an operation should stay
in the CML (client modification log) before it is eligible for
write-back; the longer we keep things around, the more chance we have
to optimize them away. The second setting, -hogtime, defines how long
reintegration is allowed to take. Once every 5 seconds a reintegration
thread kicks in and pushes back all eligible objects it can send within
the hogtime period; anything else has to wait for the next (multiple
of the) 5-second interval.

The default is

    cfs wd -age 0 -hogtime 1

so modifications are eligible for write-back immediately, but we only
want reintegration to use up to 1 second. A possible option would be

    cfs wd -age 0 -hogtime 5

This should reintegrate in 5-second bursts. However, if we
underestimate the time a burst takes, we might end up reintegrating a
bit longer and miss the next interval, so we'd only be reintegrating
maybe 50% of the time. So even better may be

    cfs wd -age 0 -hogtime 30   (or 60)

This should pretty much try to clear out as much as we can. It does
mean that write operations may be blocked for up to 30 (or 60) seconds
while reintegration is occurring.

Finally there is a magic setting,

    cfs wd -age 0 -hogtime 0    (or its alias, 'cfs strong')

This forces a synchronous reintegration after every operation, before
we return control to the application. The resulting behaviour should be
close to connected-mode operation in 6.1.2. I have been finding some
odd cases, though: under low-bandwidth conditions trickle reintegration
would not check for the synchronous-reintegration flag, and we end up
trying to reintegrate the file in chunks of 0 bytes, making no progress
on each successive reintegration attempt.
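As a concrete example of the tuning above, a one-off bulk copy could
temporarily raise the hogtime and restore the default afterwards. This
is only a sketch: the /coda path is a placeholder, and since the
settings only affect currently known (or explicitly specified) volumes
I am assuming a path inside the volume can be passed to cfs wd to
select it.

    # allow reintegration to run in 60-second bursts during the copy
    cfs wd -age 0 -hogtime 60 /coda/myserver/myvolume

    # the bulk copy; writes land in the CML and are pushed back in
    # long bursts instead of 1-second slices
    cp -r ~/big-dataset /coda/myserver/myvolume/

    # once codacon shows the CML has drained, restore the defaults
    cfs wd -age 0 -hogtime 1 /coda/myserver/myvolume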
> There must be something in coda-6.9.0 which makes it very slow.
> I now use coda-6.1.2 which is as fast as expected.

6.9 is a lot faster for the local application as long as the working
set is smaller than the client cache size. But yes, in the long run it
does take longer before updates actually reach the server, since we
only use about 20% of the available time/bandwidth for write-back
purposes.

The question is how the two clients compare when we scale up to tens or
hundreds of clients against a single server. Reintegration is
considerably more efficient, because operations are batched and the
server can commit the whole batch in a single transaction. If the 6.1
clients add too much load to the server, they fall back on
write-disconnection as well, but by default they assume the slowdown is
network related and use the conservative writedisconnect settings of
-age 30 and -hogtime 60.

My assumption is that overall, updates from the 6.9 clients would end
up on the servers faster, while keeping the same behaviour and
consistency guarantees we see with a single-client, single-server
setup. Compared to this, the 6.1 clients drastically change their
behaviour and consistency model as they bounce back and forth between
strong and weak connectivity states, depending on the activity of other
clients and the resulting server load.

Jan

Received on 2007-02-09 16:13:05