On Sun, Nov 25, 2012 at 06:35:37PM +0100, u-codalist-9wcu_at_aetey.se wrote:
> On Sun, Nov 25, 2012 at 08:19:41AM -0500, Mahadev Satyanarayanan wrote:
> > We regularly assign a class project that modifies Venus to perform
> > prefetching. Here is the most recent version of the project.
>
> It would be exciting to hear if the students have found a certain class of
> situations (file data access rate vs bandwidth vs cache size vs average
> file length and similar) and corresponding algorithms where prefetching
> would be consistently superior?
>
> I am pessimistic, but only actual tests and measurements can show the answer.

You are correct in that assumption. Prefetching is mostly useful for systems
that see a high communication latency. In Coda's case the cache plays a very
significant role and there are effectively two cases.

1) Not enough space in the cache. We have to throw something out of the cache
to accommodate the prefetched data. The prefetch decision is then not so much
'will the user need this object he did not even ask for', but rather 'is this
unknown object more important than any of the objects we know the user has
accessed (recently) and which have so far avoided being dropped from the
cache'.

2) There is enough space to cache all the data we need (the working set).
Easy: if we need something it should be cached already, unless it has recently
been changed on the server. But in that case we received a callback, and the
only challenge is to (hopefully) fetch the data before the user needs it, yet
avoid the unnecessary fetch that occurs when the file on the server changes
again before the user has accessed the local copy.

Some research was done on how frequently files are accessed and changed. I
believe that when a file has been written and has not seen any further write
or delete activity for a short time (about 5 minutes?), it will most likely
stay unchanged for a very long time. But my memory is a bit vague on that. I
think the following paper talks about these access patterns.

    Long Term Distributed File Reference Tracing: Implementation and Experience.
    Mummert, L.B., Satyanarayanan, M.
    Software-Practice and Experience, June 1996, Volume 26, No. 6
    Earlier version appeared as: CMU SCS Technical Report, Nov. 1994, CMU-CS-94-213

Hoarding fits this scenario quite well: the user can explicitly specify his
working set through a hoard profile, and Coda will periodically refetch any
files whose callbacks were broken. Some more automated mechanism that tries to
estimate or maintain the current working set in the cache could be useful.

Cache replacement in Coda uses various heuristics, which include file size
(the cost to refetch the data) and access frequency; hoard priority is just
one of the variables. A simpler LRU or ARC replacement algorithm may not be as
'smart', but it is easier to reason about. At the moment there are
counterintuitive situations. For example, copying a source tarball into Coda
and then unpacking it gives the tarball a much higher priority than the
extracted files, because of the tarball's larger size and the fact that it was
accessed at least twice (once to create, once to read), while the more
interesting extracted files are small and have only a single access.
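To see how such a multi-variable priority can end up favouring the tarball,
here is a minimal sketch in plain Python with made-up weights; it is not Venus
code, and the real calculation uses different variables and scaling.

```python
# Hypothetical cache-priority sketch (not Venus's actual formula): score an
# object by refetch cost (size), access count, and hoard priority, and see
# how a large, twice-accessed tarball outranks small once-read source files.

from dataclasses import dataclass

@dataclass
class CachedObject:
    name: str
    size: int                 # bytes; a proxy for the cost to refetch
    accesses: int             # times the object was read or written locally
    hoard_priority: int = 0   # 0 if not listed in any hoard profile

def priority(obj: CachedObject) -> float:
    # Illustrative weighting only; the real heuristic and weights differ.
    return obj.size * 0.001 + obj.accesses * 100 + obj.hoard_priority * 10

tarball = CachedObject("src.tar.gz", size=50_000_000, accesses=2)
source_file = CachedObject("src/main.c", size=20_000, accesses=1)

print(priority(tarball))      # ~50200: kept, even though it is no longer needed
print(priority(source_file))  # ~120:   an early eviction candidate

# A plain LRU would instead evict whichever object was touched least recently,
# which in this scenario tends to be the tarball.
```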
Now there is one case that I know of where prefetching helps, and that is in a
high latency (and high bandwidth) situation when a user is accessing an
uncached directory (outside of the working set). The client has to notice the
directory access followed by stat() calls, but also estimate which access
pattern for directory entries is used by the application ('normal' directory
order, alphabetic, by inode number). Once these conditions are met, it can
start prefetching attributes from the opposite end of the range, effectively
cutting the time to read the directory and the attributes of its children in
half. Of course other things will still have to be evicted from the cache, but
if the application is really going to fetch the attributes of all directory
entries, those evictions are going to happen either way.

Jan
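As an illustration of the opposite-end prefetch described above, here is a
minimal sketch in plain Python; the data structures, pattern detection, and
helper names are hypothetical and not how Venus actually implements it.

```python
# Sketch of opposite-end attribute prefetching (plain Python, hypothetical
# structure; not Venus code). The client watches the first few stat() calls,
# guesses the traversal order, and prefetches attributes from the far end so
# demand fetches and prefetches meet in the middle of the directory.

def guess_order(observed, entries, inodes):
    """Return the full entry list in the order the application appears to use,
    or None if the first few stat()ed names match no known pattern."""
    candidates = [
        entries,                                   # 'normal' directory order
        sorted(entries),                           # alphabetic
        sorted(entries, key=lambda e: inodes[e]),  # by inode number
    ]
    for order in candidates:
        if order[:len(observed)] == observed:
            return order
    return None

def prefetch_order(observed, entries, inodes):
    """Names whose attributes the client should prefetch, far end first."""
    order = guess_order(observed, entries, inodes)
    if order is None:
        return []          # pattern unclear; stay purely demand-driven
    return list(reversed(order[len(observed):]))

# Example: the application walks alphabetically, so prefetch starts at the
# alphabetically last entry and works back toward the demand fetches.
entries = ["main.c", "zlib.h", "Makefile", "util.c"]
inodes = {"main.c": 12, "zlib.h": 7, "Makefile": 31, "util.c": 5}
print(prefetch_order(["Makefile", "main.c"], entries, inodes))
# -> ['zlib.h', 'util.c']
```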