From Fedora Project Wiki

Revision as of 02:52, 18 October 2009 by Sundaram.

Date: 2009-10-17
Status report: Podcast editing completed, posted below.
Next action: N/A, done other than publication.


Interviews

Podcast interview

Print interview

Conducted with Will Cohen of Red Hat.

Introduce yourself and, since not all readers may be familiar with SystemTap, tell us what it is and how it's used.
I am Will Cohen, one of the people who work on performance tools such as SystemTap and OProfile at Red Hat. I started working at Red Hat in the summer of 2000 as an "Enginerd" (that is what my original ID badge said), supporting GCC and porting it to various embedded processors. In 2002 I pushed to get some performance tools into the Red Hat Linux distribution, and I have been working on performance tools since then.
Being able to modify and instrument code to understand what is going on in open source software is cool, but having to recompile the code and restart the machine to run the modified code isn't so cool. SystemTap provides infrastructure to simplify that instrumentation process. It allows developers and system administrators to instrument the kernel and user-space programs without recompiling or restarting the program or system. SystemTap's tapset library provides probe points and functions that make it easier to examine what is happening in the code. SystemTap's scripting language includes constructs such as conditional tests, associative arrays, and statistical operations that allow data reduction within the script, so one doesn't have to hunt for a crucial piece of information hidden in tens of thousands of lines of log file.
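SystemTap scripts are written in SystemTap's own language, not Python. Still, the idea of reducing data inside the script can be sketched in Python. The sketch below assumes a hypothetical list of (process name, latency in microseconds) events. It keeps an associative array of per-process running statistics, roughly analogous to SystemTap's `<<<` aggregation operator and its `@count`, `@min`, `@max`, and `@avg` extractors. Only a short summary is printed instead of one log line per event. The names `Stats` and `summarize` are illustrative, not part of any SystemTap API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stats:
    """Running aggregate, similar in spirit to a SystemTap statistic."""
    count: int = 0
    total: int = 0
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def add(self, value: int) -> None:
        # Update running totals instead of storing every individual event.
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    @property
    def avg(self) -> int:
        # Integer division, since SystemTap scripts work with integer arithmetic.
        return self.total // self.count if self.count else 0


def summarize(events):
    """Reduce (name, latency_us) events to one Stats entry per name."""
    stats = defaultdict(Stats)  # associative array keyed by process name
    for name, latency_us in events:
        stats[name].add(latency_us)
    return dict(stats)


if __name__ == "__main__":
    # Hypothetical sample data, not real measurements.
    events = [("httpd", 12), ("postgres", 40), ("httpd", 8), ("postgres", 20)]
    for name, s in sorted(summarize(events).items()):
        print(f"{name}: count={s.count} min={s.minimum} max={s.maximum} avg={s.avg}")
```

Keeping only running totals means memory use grows with the number of distinct keys rather than with the number of events. That is the point of doing the reduction inside the script.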
SystemTap is a very versatile tool. Kernel developers have used SystemTap to debug issues with NFS and file systems. Developers can use some of the example scripts to tune their applications. For example, a short script can list details on page faults and the time required to handle each one (the pfaults.stp example), or trace the events in PostgreSQL.
Are there similar tools that a developer might use on other, non-Linux platforms? What about similar tools on Linux? How does SystemTap compare with those tools?
Solaris, Mac OS X, and BSD include DTrace. On Linux there are LTTng and ftrace. For a detailed comparison of SystemTap, DTrace, and LTTng, see http://sourceware.org/systemtap/wiki/SystemtapDtraceComparison.
All right, let's dive into some details. In Fedora 12 SystemTap has been extended to offer kernel tracepoint support. What are kernel tracepoints, and how does this feature help developers?
Earlier versions of SystemTap used kprobes to implement the instrumentation. Kprobes are implemented with a breakpoint instruction and require a relatively expensive trap operation. In some cases it was also difficult to specify portably in SystemTap where to place a kprobe, because of changes in the source code.
Tracepoints have much lower overhead than kprobes. Kernel developers place them in the kernel, so they do not suffer from some of the portability issues of kprobes-based instrumentation. Unlike kprobes, tracepoints do not need DWARF debuginfo to find where to place the probe. People will see a number of tracepoints available in the 2.6.31 kernel in Fedora 12. If you are curious to see which tracepoints are available in the running kernel, you can use the following SystemTap command:
stap -L 'kernel.trace("*")'|sort
We keep hearing about "DWARF debuginfo," and that SystemTap supports this format. What's changed from previous kinds of debuginfo?
Traditional debuggers (and SystemTap) need to map between the executable code and the original source code. This information is contained in DWARF debuginfo. The DWARF standard has been used for some time in Linux. It maps source line numbers and variable names to locations in memory. SystemTap uses this information so that a user can probe function "foo" and get the value of variable "x", rather than saying they want a probe at memory address 0x8000413 and reading the data out of 0xffff8000.
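The Python sketch below is a toy illustration of what that mapping buys. It uses hand-written, hypothetical tables in place of real DWARF data, with the example addresses from the answer above. One table maps function names to code addresses and one maps variable names to data addresses. The tables and the helpers `resolve_probe` and `read_variable` are assumptions for illustration only. Real DWARF is read from the debuginfo files, and a variable's location can be a register or an expression rather than a fixed address.

```python
# Hypothetical symbol tables standing in for real DWARF debuginfo.
FUNCTION_ADDRESSES = {"foo": 0x8000413}
VARIABLE_ADDRESSES = {"x": 0xFFFF8000}


def resolve_probe(function_name: str) -> int:
    """Translate a probe on a named function into a code address."""
    try:
        return FUNCTION_ADDRESSES[function_name]
    except KeyError:
        raise LookupError(f"no debuginfo for function {function_name!r}") from None


def read_variable(memory: dict, variable_name: str) -> int:
    """Read a named variable: look up its address, then read that address."""
    address = VARIABLE_ADDRESSES[variable_name]
    return memory[address]


if __name__ == "__main__":
    fake_memory = {0xFFFF8000: 42}  # hypothetical memory contents
    print(hex(resolve_probe("foo")))        # 0x8000413
    print(read_variable(fake_memory, "x"))  # 42
```

Without the tables (that is, without debuginfo), the user would have to supply the raw addresses directly, which is exactly what the symbolic probe avoids.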
As a normal part of the RPM build process, executables are built with debugging information. Towards the end of the build, the DWARF debuginfo is extracted from the executables and placed in a separate debuginfo RPM, and a link to the new DWARF debuginfo file is added to the original file. Thus, to use SystemTap on an executable, one just needs to install the debuginfo RPM associated with the binary RPM. If the debuginfo isn't installed on the system, the newest versions of SystemTap will even suggest the command line to run to install the appropriate debuginfo RPM.
One complaint we hear is that the debuginfo RPMs can be large. Roland McGrath is working on tools to eliminate redundancies in the debuginfo RPMs and reduce their size.
https://fedoraproject.org/wiki/Features/DebugInfoRevamp
There's also some work that's been done to make SystemTap work with Eclipse, a development environment that's used by many developers. How does that integration work, and how would a developer use it?
The Eclipse plugin, eclipse-systemtapgui, provides an Integrated Development Environment (IDE) for SystemTap scripts. It also provides tools for quickly generating graphs from a script's output.
We gather this isn't the final endpoint of SystemTap development, because the feature page mentions static probes. Can you explain what those are, how they'll work, and when we might see them in SystemTap?
There is a push for more user-space markers in applications and more kernel tracepoints, to make it even easier to understand system behavior. One example is instrumenting the threads library so people can understand what their multithreaded applications are doing. Some existing user-space programs in Fedora, such as Java and PostgreSQL, already have markers enabled, but there can and should be more.
Finer-grained control over which operations are allowed for which users, so that unprivileged users can probe their own user-space applications with SystemTap.
Use of hardware watchpoints, so that a script could determine when a variable is read or written. This could be used to diagnose stack corruption issues.
Where would developers read more about SystemTap? Is there much documentation on how to put it to practical use?
There is a lot of documentation for SystemTap, and we have been trying to make documentation, tutorials, and examples easily available. You can find links to this material at:
http://sourceware.org/systemtap/documentation.html
What do you do when you're not hacking on SystemTap or other tools?
I am a biking fool (not to be confused with a foolish biker :). I love to bike, but I take care to ensure that I don't end up as a hood ornament on an oversized SUV.
Thanks for your time, Will -- we're looking forward to seeing SystemTap among the many advancements in Fedora 12 in November!