From: Jan Harkes <jaharkes_at_cs.cmu.edu>

> Clearly the average file size is considerably larger, and we are far
> more likely to see reasonable numbers for the number of cached files.
> If we have a 1TB cache we may see something on the order of 200K
> digital photos, 3000 whole-CD FLACs, 1000 TV recordings, or a couple
> of hundred VM images.

Actually, I *just now* checked my 20GB homedir:

    % du -sk . ; find . -type f -print | wc -l
    20088072    .
    379371

    20088072 / 379371 = 53

So my average file size is 53KB (a little less, actually, since du's total also counts the blocks used by directories). I had no idea it was so small, since down in that tree I have a couple of CD images and even a complete VMware virtual filesystem for a WinXP image sitting in a pair of files, plus some music and probably a few video clips.

Note that my email is stored in Babyl format, not maildir, so it's one file per mailbox, not one file per message. My mail base is about 1.7GB in 1200 files, going back to 1979. Clearly my average file size would drop a lot (by changing both the denominator and the numerator) if I flipped this over to maildir format. (If that 1.7GB were split into, say, 400K maildir messages, my file count would roughly double and the average would fall to about 25KB.)

So the moral is that Coda, for me in 2007, needs to handle caches with

- tens of GB to a small number of TB, and
- about a million files.

I.e., the small-average-file-size case still needs to be handled even when the file stores get much larger. And that's for a *single* user; for a ten-thousand-user "campus", add four zeroes.

Over on the server where I keep my music, I have 320GB in 41K files, for an average file size of 7.8MB.

    -Olin
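P.S. For anyone who wants to repeat this measurement on their own tree, here is a minimal one-pass sketch (it assumes GNU find's -printf; it sums apparent file sizes rather than allocated blocks, so its numbers can differ a little from du's):

    % find . -type f -printf '%s\n' | \
        awk '{ n++; bytes += $1 }    # count files, total their sizes in bytes
             END { if (n) printf "%d files, avg %.0f KB\n", n, bytes / n / 1024 }'

The file count should match the find | wc -l above, but the average need not match the du-based 53KB exactly, since apparent size ignores directory blocks, block rounding, and sparse files.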