We describe a new algorithm for path profiling. This simple, fast algorithm selects and places profile instrumentation to minimize run-time overhead. Instrumented programs run with overhead comparable to the best previous profiling techniques. On the SPEC95 benchmarks, path profiling overhead averaged 31%, as compared to 16% for efficient edge profiling. Path profiling also identifies longer paths than a previous technique, which predicted paths from edge profiles (average of 88, versus 34 instructions). Moreover, profiling shows that the SPEC95 *train* input datasets covered most of the paths executed in the *ref* datasets.

This is joint work with James Larus of the University of Wisconsin--Madison. A paper on the work will appear in MICRO-29 and is available at ftp://ftp.cs.wisc.edu/wwt/micro96.ps.

We answer this question for another very expressive type system, namely,
the rank 2 intersection type system. Trevor Jim, in POPL '96, has shown
that this system can type more terms than ML while retaining the same
complexity of type inference. This makes it an attractive tool for
type-based program analysis.
We show that for the rank 2 intersection type system with subtyping, and
with types augmented by effects (i.e., control-flow information), there
exists a sound and complete type inference algorithm that can compute
control-flow information directly from the structure of the program text.
Subtyping is used to orient flow information; the *principal typing*
property of the type system lends *modularity* to the analysis,
so that program fragments with free variables can be separately analysed;
and the polymorphism inherent in the type system renders the analysis
*polyvariant*, so that the same function can be specified to have
different behaviours at its different uses.

We describe a polyvariant flow analysis framework for F_{pred},
the predicative fragment of system F. Our framework characterizes a
broad class of analyses, and we prove that any analysis in this class is
correct. We then exhibit an uncomputable analysis that respects types.
Crucially, this analysis uses types to control polyvariance. We show, however,
that our analysis *is* computable for a subset of F_{pred} sufficient to
express the core of ML. Finally, we provide a formal characterization of the
*precision* of an analysis, and show that the product of a type-respecting
analysis with any sound analysis still respects types.

To our knowledge, this is the first presentation of a polyvariant flow analysis framework for a polymorphic language that uses type information to characterize and control precision.

This is joint work with Suresh Jagannathan and Steve Weeks.

We describe a method that improves the cache locality of sequential programs by scheduling fine-grained threads. The algorithm relies upon hints provided at the time of thread creation to determine a thread execution order likely to reduce cache misses. This technique may be particularly valuable when compiler-directed tiling is not feasible. Experiments with several application programs, on two systems with different cache structures, show that our thread scheduling method can improve program performance by reducing second-level cache misses.

This is joint work with Jan Edler, Otto Anshus, Craig Douglas, and Kai Li.