On Wed, May 02, 2007 at 01:01:55PM -0400, shivers_at_ccs.neu.edu wrote:
> From: Jan Harkes <jaharkes_at_cs.cmu.edu>
> Clearly the average file size is considerably larger and we are far more
> likely to see reasonable numbers for the number of cached files. If we
> have a 1TB cache we may see something in the order of 200K digital
> photos, 3000 whole-CD flacs, 1000 TV recordings, or a couple of hundred
> VM images.
>
> Actually, I *just now* checked my 20GB homedir:
>
>   % du -sk . ; find . -type f -print | wc -l
>   20088072 .
>   379371
>
> 20088072 / 379371 = 53
>
> So my average file size is 53KB (a little less, actually, if du includes
> the blocks used by directories).

There is a small program (/usr/bin/rvmsizer) included in the Coda server
package which is useful for estimating the amount of recoverable memory a
server needs to store a copy of a local tree. The RVM numbers it reports do
not really correspond to what is needed on the client, but it also flags
some known cases that a Coda client cannot handle, such as too many files
per directory.

    $ rvmsizer ~
    35875 directories, 603847 files, 48489 directory pages
    total file size 38499769202 bytes (36716.24MB)
    average file size 63757 bytes
    total directory size 163344384 bytes (155.78MB)
    average directory size 4553 bytes
    estimated RVM used by directory data, 99305472 bytes (94.71MB)
    estimated RVM usage based on object counts, 213477388 bytes (203.59MB)

So my average file size is also clearly not in the 'several MB' range. But
still, it is a factor of 2-3 larger than the value we currently use in
venus.

Initially BLOCKS_PER_FILE was 8KB; I guess that value was picked as an
appropriate average file size when development started, around 1987-1988.
In 1998 we bumped BLOCKS_PER_FILE up to 24KB. I think this was after
checking the average file size on various desktops, but we also looked at
the average size of files stored in /coda at the time. The measured average
may have been a little lower back then.
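To make the role of this constant concrete, here is a small illustrative sketch of how an assumed average file size turns a given cache size into an estimated number of cacheable files. The function name and arithmetic are my own for illustration, not taken from the venus sources:

```python
GB_KB = 1024 * 1024  # one gigabyte expressed in kilobytes

def estimated_cache_files(cache_size_kb, avg_file_kb):
    """Rough estimate of how many files fit in a cache of the given
    size, if every file is assumed to occupy avg_file_kb kilobytes."""
    return cache_size_kb // avg_file_kb

# A 20GB cache with the 24KB average assumed since 1998:
print(estimated_cache_files(20 * GB_KB, 24))  # 873813
# The same cache with the ~53KB average measured above:
print(estimated_cache_files(20 * GB_KB, 53))  # 395689
```

With the measured averages the same cache is expected to hold less than half as many files, which is why the choice of BLOCKS_PER_FILE matters for sizing the client's file-count estimate.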
Now it is almost 10 years later, and I don't find it surprising that the
average has gone up, especially considering that disk is cheap and disk
space has been growing exponentially. I am surprised that the average file
size only seems to have tripled over roughly 10 years, at least as far as
my personal files are concerned. On the other hand, I hardly ever throw
things away, so the average for new files must be higher.

> I had no idea it was so small, since I have down in that tree a couple
> of CD images and even a complete vmware virtual filesystem for a virtual
> WinXP image sitting in a pair of files, plus some music and probably a
> few video clips.

Right, I probably have digital photos, maybe some music, lots of sources,
tarballs, maybe a VM image or two. It also includes things like my web
browser cache.

> That's for a *single* user. For a ten-thousand user "campus", add 4
> zeroes.

Yeah, but a 10,000 user campus would hopefully use more than just a single
server for all of its users. I think the AFS2 goal was in the order of a
thousand clients per server; I'm not sure how close Coda gets to that goal.

Jan

Received on 2007-05-02 16:44:43