This web page describes two things: the Coda Project traces and the
DFSTrace trace-gathering system. From February 1991 to March 1993, the
Coda Project collected traces of all system call activity on 33
different machines. The traces were gathered with DFSTrace, a system
originally developed by Lily Mummert and M. Satyanarayanan as part of
the Coda project at Carnegie Mellon University.
In the spring of 1998, Tom M. Kroeger became interested in using these
traces and in re-implementing DFSTrace on Linux. With the support of
Prof. M. Satyanarayanan, he spent the fall of 1998 reorganizing and
processing the trace data to make it available for general use. In
addition, with the support of the USENIX Association, he and his
adviser, Prof. Darrell D. E. Long, have hired Ben Gertzfield to carry
out this re-implementation at the Concurrent Systems Laboratory at the
University of California, Santa Cruz.
These pages are mirrored at two locations:
The Coda Project traces are currently stored on a set of 38 CDs, and six complete sets of these CDs were made. To read and analyze the traces, an extensive library that handles traces in several different formats was created. The DFSTrace reading library is included on each CD, along with summary results from the analysis programs built on it.
Complete details of the data in the traces are available from the library itself; a good place to start is DFSTrace/src/tracelib/tracelib.h. The library has been tested on Linux, FreeBSD, SunOS 5.5, and Digital Unix 4.0.
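The real interfaces of the reading library are declared in tracelib.h and are not reproduced here. As a rough illustration of what handling "traces in several formats" can involve, the Python sketch below assumes a hypothetical file layout: a 4-byte magic string, a 2-byte format version, then fixed-size records whose layout depends on that version. The magic value, version numbers, field names, and the `read_trace` function are all invented for illustration and do not describe the actual DFSTrace format.

```python
import io
import struct
from typing import BinaryIO, Dict, Iterator, Tuple

# Hypothetical file header: 4-byte magic string, then a little-endian uint16
# format version. Invented for illustration; not the real DFSTrace layout.
MAGIC = b"DFST"
HEADER = struct.Struct("<4sH")

# Hypothetical per-version record layouts, keyed by format version.
# v1: opcode (uint16), pid (uint16), timestamp seconds (uint32)
# v2: v1 fields plus timestamp microseconds (uint32) and a signed result (int32)
DECODERS: Dict[int, Tuple[struct.Struct, Tuple[str, ...]]] = {
    1: (struct.Struct("<HHI"), ("opcode", "pid", "sec")),
    2: (struct.Struct("<HHIIi"), ("opcode", "pid", "sec", "usec", "result")),
}


def read_trace(stream: BinaryIO) -> Iterator[dict]:
    """Yield each record as a dict, picking the decoder from the header version."""
    header = stream.read(HEADER.size)
    if len(header) != HEADER.size:
        raise ValueError("truncated header")
    magic, version = HEADER.unpack(header)
    if magic != MAGIC:
        raise ValueError("not a trace file")
    if version not in DECODERS:
        raise ValueError(f"unsupported trace version {version}")
    layout, names = DECODERS[version]
    while True:
        chunk = stream.read(layout.size)
        if not chunk:
            return  # clean end of trace
        if len(chunk) != layout.size:
            raise ValueError("truncated record")
        # Older versions simply lack the newer fields in the resulting dict.
        yield dict(zip(names, layout.unpack(chunk)))


if __name__ == "__main__":
    # Build a tiny in-memory version-2 trace and decode it.
    data = HEADER.pack(MAGIC, 2) + DECODERS[2][0].pack(5, 42, 100, 250, -1)
    for record in read_trace(io.BytesIO(data)):
        print(record)
```

Keeping the version-to-layout mapping in a single table means that, in this sketch, supporting another format only requires a new table entry rather than changes to the reading loop.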
Further details, and the primary reference on DFSTrace and these
traces, can be found in:
Lily B. Mummert and M. Satyanarayanan. Long Term Distributed File
Reference Tracing: Implementation and Experience. Software: Practice
and Experience, 26(6):705-736, 1996. Also available as Technical
Report CMU-CS-94-213 (PDF and PostScript).
It is strongly recommended that anyone intending to use these traces for research read this paper in detail.
Making the entire 24 GB of data available over the Internet is currently difficult. To provide a more extensive set of trace data than the samples included with the trace library, we have made available online traces from four different machines, each covering one month.
In addition to the actual trace data, an index CD contains a copy of the summary information for each of the 38 data CDs. We have also made this summary information available on-line.
If, after examining these samples, you need the entire set of 38 CDs, please contact Ethan Miller directly (elm@cs.ucsc.edu) to arrange to borrow the CD collection temporarily or to obtain a password for accessing the CDs online.
The Linux re-implementation of DFSTrace is still in progress. We have developed a loadable kernel module that traces all system calls; at present, this module only reports information via printk. We are extending the module to create a device, /dev/DFSTrace, and to send its output to that device in a binary format. Once this work is complete, the code for the module will be made publicly available.
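Because the binary format for /dev/DFSTrace was still being designed, the following Python sketch only suggests how a user-space reader might consume fixed-size binary records from such a device. It assumes a hypothetical 20-byte little-endian record (sequence number, system call number, pid, return value, seconds, microseconds); the record layout, `SyscallRecord`, and `iter_records` are invented for illustration and are not the module's actual format. One practical point it does capture: a read() on a device may return fewer bytes than requested, so the reader buffers data until a whole record is available instead of assuming each read yields exactly one record.

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple

# Hypothetical 20-byte record: "<" = little-endian, no padding.
# Fields: seq (uint32), syscall (uint16), pid (uint16), result (int32),
# sec (uint32), usec (uint32). Not the actual module's format.
RECORD = struct.Struct("<IHHiII")


class SyscallRecord(NamedTuple):
    seq: int      # sequence number assigned by the tracing module
    syscall: int  # system call number
    pid: int      # calling process ID (16 bits in this toy layout)
    result: int   # value returned by the system call
    sec: int      # timestamp, seconds
    usec: int     # timestamp, microseconds


def iter_records(dev: BinaryIO, chunk_size: int = 4096) -> Iterator[SyscallRecord]:
    """Yield complete records even when a read() returns a partial record."""
    pending = b""
    while True:
        data = dev.read(chunk_size)
        if not data:  # end of input (a live device might block here instead)
            break
        pending += data
        # Decode every complete record buffered so far; keep the remainder.
        whole = len(pending) - len(pending) % RECORD.size
        for offset in range(0, whole, RECORD.size):
            yield SyscallRecord(*RECORD.unpack_from(pending, offset))
        pending = pending[whole:]
    if pending:
        raise ValueError(f"{len(pending)} trailing bytes do not form a record")


if __name__ == "__main__":
    # Simulate a device with an in-memory stream and deliberately tiny reads.
    raw = RECORD.pack(1, 5, 100, 3, 912345678, 12) + RECORD.pack(2, 6, 100, -2, 912345679, 0)
    for rec in iter_records(io.BytesIO(raw), chunk_size=7):
        print(rec)
```

A fixed-size binary record like this can be decoded without parsing text, which is one plausible advantage of a binary device interface over printk output.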
The DFSTrace system was originally developed by Lily Mummert and Jay Kistler as part of Prof. M. Satyanarayanan's Coda Project. These traces were used to inform the design of the Coda File System.
Tom M. Kroeger used these traces in his research on predictive caching. This work was done at the Concurrent Systems Laboratory at the University of California, Santa Cruz, under the guidance of his adviser, Prof. Darrell D. E. Long.
The following is a list of other file system traces that we are aware of (updates to this list are very welcome; please send them to tmk@cs.ucsc.edu):