Features/SystemtapTracingRefresh

= Systemtap Tracing Refresh =

Summary
This feature delivers a new and improved systemtap (the 1.0 release) with much better documentation, examples and tools. It is updated to take advantage of new features integrated into Fedora 12: gcc debuginfo (variable tracking assignment) output, kernel 2.6.31 tracepoints, and the Eclipse GUI. It also provides a static user space marker implementation for developers who want to expose high-level tracing events in their applications (already used for postgresql and java in Fedora 12, with more application integration scheduled for Fedora 13).

Owner

 * Name: Mark Wielaard


 * email: mjw@redhat.com

Current status

 * Targeted release: Fedora 12
 * Last updated: Sep 29 2009
 * Percentage of completion: 100%


 * The new upstream 1.0 release has been tested against the new kernel (2.6.31), gcc (with vta enabled) and elfutils (0.143) in rawhide. It has been pushed as systemtap-1.0-fc12 for Beta.
 * Only smaller bugfixes might be added from now on (tracking the kernel rebuild with gcc-vta enabled and rhbz#521991, and an issue with nfs probing when that happens, sw#10678).

Detailed Description
This feature packages a new version of systemtap that is tuned for updated gcc debuginfo (dwarf variable tracking assignments) output and kernel (2.6.31) tracepoints, and that ships better examples, tools, and development extensions that let programmers embed static probe markers in their sources. As a result, Fedora users will have much better observability of their whole system.

Benefit to Fedora
It will be easier for developers and users to observe what is really happening on their system.

Scope
Most of this work has been done upstream and by coordinating with the gcc and kernel maintainers for better debuginfo output and more tracepoints. Specific improvements that will be delivered through this feature are:


 * Better support for F12 GCC debuginfo, mainly the new, improved, more compact and more accurate variable tracking assignment output. This is basically done now except for some bug fixing here and there. It still needs coordination with elfutils for a new release.


 * Better support for F12 Kernel (2.6.31). The test results already look pretty OK. We are still bug hunting some stuff, but I don't believe anything really nasty is blocking.


 * Better kernel tracepoint support and, with thanks to Will, lots more documentation on the various tracepoints in the kernel.


 * Support for the unified kernel trace buffer.


 * Lots of passes are faster now, which especially helps those wanting to get a quick list of probes that can be set.


 * Simple GUI client to visualize some standard tracing/profiling issues. Open question: is it already enabled in the standard rpm spec package build?


 * Eclipse Systemtap plugin work.


 * User space backtracing and a more accurate kernel dwarf unwinder. We did provide that in an update for F11, but it is more robust now, although there is still some work to be done to make it really easy and reliable/intuitive to use.


 * Dwarfless syscall probing for kernel-debuginfo-less setups.


 * Module signing and a refresh of the systemtap-client-server setup (Dave is writing a whitepaper about it).


 * User space probing is much more robust and accurate (especially in the face of prelinking, separate debuginfo and 32-on-64 executable quirks), with support for symbol aliases, etc. Some more testing and bugfixing is still needed (especially for C++ code). Mark is writing a whitepaper about it.


 * Experimental utrace/kprobe sdt support. This is the basis for the next step, the Systemtap Static Probes Feature.


 * More tapset functions, more examples, more documentation, some stap language constructs extensions...

This feature will also be the basis for adding more static probing to Fedora packages in general. Some packages already have static probes enabled (java, postgresql), and we will coordinate with their maintainers to make those probes work seamlessly with the other systemtap improvements. Integrating more static probes into other packages is outside the scope of this feature, though. That will be done through the Systemtap Static Probes Feature.
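Several of the items above (kernel tracepoints, syscall probing, static user space markers) can be inspected from the command line by asking stap to list matching probe points. The following is a minimal sketch, not an official test: the three patterns are illustrative, and the /usr/bin/postgres path is an assumption about where a marker-enabled binary is installed.

```shell
#!/bin/sh
# Sketch: list a few probe points for each kind of probe mentioned above.
# The postgres path below is an assumption; adjust it for your system.
count=0
for p in 'kernel.trace("*")' 'syscall.*' 'process("/usr/bin/postgres").mark("*")'; do
    count=$((count + 1))
    echo "== $p"
    if command -v stap >/dev/null 2>&1; then
        # -l only lists matching probe points; nothing is loaded or run.
        stap -l "$p" | head -n 5
    else
        echo "(stap not installed, skipping)"
    fi
done
```

Listing is a cheap first check: if a pattern returns no probe points, a script that probes it will not compile either.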

How To Test
Whether systemtap works in general with the kernel or a user space application can be tested by installing systemtap, plus kernel-debuginfo and/or the application's debuginfo. There is also the systemtap-testsuite package: installing it and running sudo make installcheck in /usr/share/systemtap/testsuite gives an overview of how well tracing works in general on the system.
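As a rough sketch of that check, assuming the package names mentioned above and the debuginfo-install helper from yum-utils, a quick smoke test could look like this. The one-line hello-world script is only an illustration and is not part of the test suite; running it typically needs root and a kernel-devel package matching the running kernel.

```shell
#!/bin/sh
# Smoke test sketch. Installation steps are shown as comments because they
# need root; the package names are assumptions based on the text above.
#   yum install systemtap systemtap-testsuite
#   debuginfo-install kernel

smoke_status="skipped"
if command -v stap >/dev/null 2>&1; then
    # Translate, compile, load and run a trivial script. A begin probe
    # does not itself require kernel debuginfo.
    if stap -e 'probe begin { printf("systemtap works\n"); exit() }'; then
        smoke_status="ok"
    else
        smoke_status="failed"
    fi
else
    echo "stap not found; install the systemtap package first" >&2
fi
echo "smoke test: $smoke_status"

# Full check (slow, needs root), as described above:
#   cd /usr/share/systemtap/testsuite && sudo make installcheck
```

If the smoke test fails, the full installcheck run is unlikely to be informative, so it makes sense to fix the basic setup first.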

TODO: Add specific examples of interesting traces of the kernel and applications, and add them to a testing page listing:
 * Package install instructions.
 * Setup and sample run of the application
 * A reference to the probes and systemtap tapset functions used.
 * A simple example stap invocation listing markers that can be enabled.
 * Doing the same through the simple gui and/or eclipse plugin.

Question: Is there a convention/template for adding such test pages for test days?

Answer: See test day proposal process at QA/Test_Days/Create - jlaska 12:07, 20 July 2009 (UTC)

User Experience
When debuginfo for packages is installed, users will be able to trace at a low level what those applications (and/or the kernel) are doing.

Dependencies
This feature needs some coordination with gcc (to sync on debuginfo improvements), elfutils (for some new features taking advantage of the gcc debuginfo improvements) and the kernel (for the new tracepoints included). All this is already being done upstream; for Fedora we just need to make sure the latest versions are packaged. For packages that already have static markers enabled (java and postgresql), some testing of the results between package updates will be necessary to make sure the user experience is as smooth as can be.

Contingency Plan
Some of the features listed in the scope might not be fully completed, but that just means the end user's ability to observe certain behaviour will be limited. Apart from the risk that systemtap works less well than desirable, there is no impact on other packages.

Documentation

 * TODO expand (ref whitepapers?)

The upstream website has lots of documentation and examples.

Release Notes

 * TODO expand with more specifics

Systemtap has been extended to better support user space tracing and kernel tracepoints, to take advantage of modern gcc debuginfo (dwarf) output, and to provide a static user space marker implementation for developers who want to expose high-level tracing events in their applications. This gives users, developers and administrators a low-level view of what is going on in their kernel or deep down in a specific program or subsystem.

Systemtap comes with a tutorial, a language reference manual, a tapsets reference and an examples directory under /usr/share/doc/systemtap-?.?/

Comments and Discussion

 * See Talk:Features/SystemtapTracingRefresh