State of the Art of Performance Visualization

This is a guest post by Kate Isaacs, UC Davis, who is one of the authors of the paper presented.

Software visualization for performance is about helping developers find inefficiencies slowing down their code. Performance can have significant effects on the usability, feasibility, and cost of running software. At EuroVis 2014, we presented a State-of-the-Art Report (STAR) on performance visualization, which I’ll go over below. See here for the full report, slides, and literature website.

Performance is generally measured in terms of time, e.g., time to complete or throughput. Power consumption is another performance measure of interest, but there are fewer tools for gathering such data. Since neither can be determined statically, some component of performance data must be gathered during execution (or possibly a simulation thereof). This includes calling contexts and state information from the software as well as performance counters like cycles, flops, packets, and cache misses from the hardware. The collected data generally takes one of two forms. Profiles aggregate the data over time, offering low overhead but less detail. Traces record each event separately as it occurs, so they grow quickly in size and must be limited in scope.
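To make the profile/trace distinction concrete, here is a minimal sketch of collapsing a trace into a profile. The record layout `(function, start, end)`, the sample data, and the name `profile_from_trace` are all assumptions made for illustration; real tools use their own formats.

```python
from collections import defaultdict

# Hypothetical trace: one (function, start_time, end_time) record per call.
trace = [
    ("main", 0.0, 10.0),
    ("solve", 1.0, 6.0),
    ("solve", 6.5, 9.5),
    ("io", 9.5, 10.0),
]


def profile_from_trace(events):
    """Aggregate per-call trace records into per-function totals.

    The result no longer says *when* each call happened, which is the
    detail a profile gives up in exchange for a much smaller size.
    """
    totals = defaultdict(lambda: {"calls": 0, "inclusive_time": 0.0})
    for name, start, end in events:
        totals[name]["calls"] += 1
        totals[name]["inclusive_time"] += end - start
    return dict(totals)


profile = profile_from_trace(trace)
for name, stats in profile.items():
    print(f"{name:8s} calls={stats['calls']} time={stats['inclusive_time']:.1f}")
```

The trace grows with every call, while the profile grows only with the number of distinct functions.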

Horizontal placement and width of a bar indicate the start time and duration of a function. Vertical placement indicates call stack depth.

Trace visualization showing call stack timeline, by Trümper et al.
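The encoding in such a call stack timeline can be sketched as a small computation over enter/exit events. This is only an illustration of the mapping described in the figure, not the method of Trümper et al.; the event format and the name `timeline_bars` are assumptions.

```python
# Hypothetical serial trace: (timestamp, kind, function), kind is "enter" or "exit".
events = [
    (0, "enter", "main"),
    (1, "enter", "parse"),
    (3, "exit", "parse"),
    (3, "enter", "compute"),
    (4, "enter", "kernel"),
    (7, "exit", "kernel"),
    (8, "exit", "compute"),
    (9, "exit", "main"),
]


def timeline_bars(events):
    """Turn enter/exit events into one bar per call.

    x and width encode start time and duration; depth is the number of
    callers still on the stack, which becomes the vertical position.
    """
    stack = []
    bars = []
    for time, kind, name in events:
        if kind == "enter":
            stack.append((name, time))
        else:
            entered_name, start = stack.pop()
            if entered_name != name:
                raise ValueError(f"unbalanced trace: exit {name} inside {entered_name}")
            bars.append(
                {"function": name, "x": start, "width": time - start, "depth": len(stack)}
            )
    return sorted(bars, key=lambda bar: (bar["x"], bar["depth"]))


bars = timeline_bars(events)
for bar in bars:
    print(bar)
```

Each resulting dictionary could be handed to any plotting library as a rectangle.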

We’ve broken down the use of visualization here into three main tasks. First, developers want to gain an overall understanding of what actions the software takes and how it uses resources during execution. Second, developers want help in detecting performance problems — they want to be able to quickly find anomalies, bottlenecks, load imbalance, and misuse of resources. Finally, they want to attribute these problems either to the software itself or some interaction between the software and the system on which it runs. Going beyond line-of-code attribution is a major challenge in helping developers truly understand causes of poor performance.

Though many tools employ visual analytics approaches to meet these tasks, we focused our STAR on the unique visualizations that may be part of these systems or stand alone. We categorized the visualizations by the context they provide to the performance measurements:

The software context is that of the code itself. Call graphs are a popular sub-context for performance visualization. The need to show time or counter data makes indented trees with attached tables, or color on node-link diagrams, popular choices.

Performance data has also been displayed on the code itself. Serial traces, like the one of Trümper et al. above, often focus on displaying the call stack in time or other code information, so they fall in the software context as well.

Call graph drawn as an indented tree, so each row is a unique call path. Data associated with that call path is in the same row of an adjoined table.

Indented tree call graph with tabular attributes, by Lin et al.
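As a rough text-only sketch of this layout, the snippet below flattens a call tree into rows, one per call path, with the metrics in adjoining columns. The tree data, the inclusive/exclusive time columns, and the function names are assumptions for illustration and are not taken from Lin et al.

```python
# Hypothetical call tree: each node is (function, inclusive_seconds, children).
call_tree = ("main", 10.0, [
    ("solve", 8.0, [
        ("kernel", 6.0, []),
        ("exchange", 1.5, []),
    ]),
    ("io", 1.0, []),
])


def indented_rows(node, depth=0):
    """Flatten the tree depth-first into (indented label, inclusive, exclusive) rows.

    Exclusive time is the node's inclusive time minus that of its children.
    """
    name, inclusive, children = node
    exclusive = inclusive - sum(child[1] for child in children)
    rows = [("  " * depth + name, inclusive, exclusive)]
    for child in children:
        rows.extend(indented_rows(child, depth + 1))
    return rows


rows = indented_rows(call_tree)
print(f"{'call path':<16}{'incl (s)':>10}{'excl (s)':>10}")
for label, inclusive, exclusive in rows:
    print(f"{label:<16}{inclusive:>10.1f}{exclusive:>10.1f}")
```

Indentation carries the caller/callee structure, so the table columns stay aligned no matter how deep the call path is.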

Threads and parallel processes are the fundamental units of the tasks context. Visualizing traces is a large area of research in this context, with challenges due to the sheer number of tasks. Representing the interactions between these tasks and their creation and deletion in time adds even more difficulty. Gantt-like representations and node-link diagrams are widely used here.

Each row represents a different process's timeline, with bars showing function time and duration. Lines drawn between rows show messages between processors.

Gantt-like per-process timelines with messages overlaid, from Vampir

The system on which software is run is the hardware context. This can be the individual CPU cores or GPUs running the code and their scheduling of instructions, traces of the memory hierarchy usage, and representations of compute nodes and their interconnection network. We also included the operating system in this context, as that is rarely changed by application developers. When possible, natural representations have been used, but the scale and complexity of modern architectures have largely removed this option.

Ports are spheroids colored by performance data. Nodes are surrounded by these ports. Network links are shown as lines between ports, colored to show traffic.

Multiprocessor nodes connected in a 2D plane network, from Haynes et al.

The application context is the domain of what is computed by the software. In scientific simulations, this is often a physical domain and in linear algebra libraries this would be the matrices involved. A lot of work has been done in the SciVis community for visualizing the former, but few tools have integrated a mapping of performance data onto those visualizations.

Memory accesses are shown both on the matrices (application context) and on the 1D arrays representing memory and caches (hardware context).

Matrix multiply visualization showing the computing matrices and their memory accesses, from Choudhury et al.

There are several challenges to address in performance visualization, the largest one being scale. Representing growing numbers of parallel operations or multivariate data from counters, function calls, and static context information is a major part of this problem. However, simply managing and compressing the large amount of data that can be collected is also a problem. Another challenge is handling ensembles of data taken from multiple executions, so developers can better determine the effects of their changes. As mentioned before, sophisticated attribution and depictions of complicated architectures are also in demand. These challenges demonstrate the pressing need for innovative performance visualization.

Interested? The first Workshop on Visual Performance Analysis will be held at Supercomputing 2014 — regular and short papers are due July 28th.
