Features/CrashHandling

From FedoraProject

Latest revision as of 21:48, 26 January 2009

Handling program crashes in Fedora

Summary

  1. A crash handler which notifies the user when a program crashes and allows them to submit a report to the Fedora developers, and
  2. A server for collecting crash reports and mining useful data from them.

Owner

  • Name: [none currently]

Current status

  • Targeted release: the next Fedora release
  • Last modified: 2008-12-01
  • Percent complete: 0%

Benefit to Fedora

By providing an automated mechanism for tracking application crashes, we will be able to:

  • see bugs earlier, and fix them earlier
  • see which bugs are hit most often
  • get usage and crash data from people who are unable or unwilling to interact with Bugzilla

Better crash data leads to more crash fixes, which leads to a higher-quality distribution.

Scope

As of about Fedora 6, packages no longer include the "debuginfo" data necessary for local crash handlers to get a useful stack trace. See Packaging/Debuginfo and StackTraces for details.

What we want is a system that gets information about the crash to developers in a form with complete stack trace data.

The plan has two major parts: a crash handler that runs on the client, and a server that receives and aggregates crash reports.

Client

crash-handler

A program to catch crashing programs and write out a crash report / stack trace.

  • Catching the crash is trivial using the kernel's core pattern piping support, e.g.:
    • echo '|/usr/sbin/crash-handler --pid %p --rlimit %c' > /proc/sys/kernel/core_pattern
  • Write crashes to a (configurable) standard location, such as /var/crash
  • This crash handler should be able to produce Breakpad minidumps
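
The core pattern piping described above hands the core image to the handler on standard input. A minimal sketch of such a handler follows, assuming a hypothetical `--crash-dir` option, a made-up file naming scheme, and a policy of skipping processes whose core limit is 0. None of these come from the feature page. It stores the raw core only and does not produce a Breakpad minidump.

```python
import argparse
import os
import sys
import time

CHUNK_SIZE = 64 * 1024  # copy the core in pieces; cores can be large


def save_core(stream, crash_dir, pid, rlimit):
    """Copy a core image from `stream` into `crash_dir`.

    Returns the path written, or None if the dump was skipped.
    """
    if rlimit == 0:
        # Assumed policy: the process ran with core dumps disabled
        # (ulimit -c 0), so do not keep anything.
        return None
    os.makedirs(crash_dir, exist_ok=True)
    # Hypothetical naming scheme: core.<pid>.<unix time>
    path = os.path.join(crash_dir, f"core.{pid}.{int(time.time())}")
    with open(path, "wb") as out:
        while True:
            chunk = stream.read(CHUNK_SIZE)
            if not chunk:
                break
            out.write(chunk)
    return path


def main(argv):
    """Entry point for a script installed as the core_pattern pipe target."""
    parser = argparse.ArgumentParser(description="core_pattern pipe handler (sketch)")
    parser.add_argument("--pid", type=int, required=True)     # filled in from %p
    parser.add_argument("--rlimit", type=int, required=True)  # filled in from %c
    parser.add_argument("--crash-dir", default="/var/crash")  # hypothetical option
    args = parser.parse_args(argv)
    return save_core(sys.stdin.buffer, args.crash_dir, args.pid, args.rlimit)
```

An installed script would call `main(sys.argv[1:])`; the kernel runs the pipe target as root, so a real handler would also need to set ownership and permissions on the files it writes.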

crash-watcher

A small daemon to:

  • watch the crash location for new dumps
  • clean up old/unneeded dumps, based on user preferences (maximum age/disk space/etc.)

When a new dump is found, send notifications to the user allowing them to:

  • Send a report (iff the binary was provided by Fedora)
    • Optional "Always send report automatically" checkbox
  • Ignore further crashes of that program
  • Ignore all further crashes
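
The watcher's duties listed above could be sketched roughly as follows. This is an illustration only: the function names are hypothetical, the scan is a simple directory poll (a real daemon would more likely use inotify), and the two "ignore" choices are modelled as a set of program names plus a flag.

```python
import os
import time


def find_new_dumps(crash_dir, seen):
    """Return dump paths in crash_dir not yet in `seen`, and mark them as seen."""
    new = []
    for name in sorted(os.listdir(crash_dir)):
        path = os.path.join(crash_dir, name)
        if os.path.isfile(path) and path not in seen:
            seen.add(path)
            new.append(path)
    return new


def prune_dumps(crash_dir, max_age_seconds, max_total_bytes, now=None):
    """Remove dumps older than max_age_seconds, then remove the oldest
    remaining dumps until the directory fits in max_total_bytes.

    Returns the list of removed paths.
    """
    now = time.time() if now is None else now
    entries = []
    for name in os.listdir(crash_dir):
        path = os.path.join(crash_dir, name)
        if os.path.isfile(path):
            st = os.stat(path)
            entries.append((st.st_mtime, st.st_size, path))
    entries.sort()  # oldest first

    removed, kept = [], []
    for mtime, size, path in entries:
        if now - mtime > max_age_seconds:
            os.remove(path)
            removed.append(path)
        else:
            kept.append((mtime, size, path))

    total = sum(size for _, size, _ in kept)
    for _, size, path in kept:
        if total <= max_total_bytes:
            break
        os.remove(path)
        removed.append(path)
        total -= size
    return removed


def should_notify(program, ignored_programs, ignore_all):
    """Apply the user's 'ignore this program' / 'ignore all crashes' choices."""
    return not ignore_all and program not in ignored_programs
```

The daemon loop would call `find_new_dumps` periodically, notify the user for each new dump that passes `should_notify`, and run `prune_dumps` with the limits from the user's preferences.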

crash-submitter

Sends minidumps to the server to be retraced. bug-buddy might work for this.

  • Submit report to Socorro server (or similar)
    • Configured to use Fedora server by default, but allow user to set their own server
      • Future work: allow per-package overrides (so GNOME dumps go to GNOME, etc)
  • Save UUID for that report somewhere, as with kerneloops
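
The server selection and UUID bookkeeping above might look something like the sketch below. Everything named here is an assumption for illustration: the default URL is a placeholder rather than a real Fedora address, the configuration keys `server` and `package_servers` are invented, and the tab-separated log format is just one way to "save the UUID somewhere". The actual upload is left out.

```python
import os

# Placeholder only; the page does not name a real server address.
DEFAULT_SERVER = "https://crash-reports.example.org/submit"


def resolve_server(package, config):
    """Pick the submission URL for a package.

    Precedence: per-package override, then the user's server, then the default.
    """
    overrides = config.get("package_servers", {})
    if package in overrides:
        return overrides[package]
    return config.get("server", DEFAULT_SERVER)


def record_report_id(log_path, package, report_id):
    """Append the ID returned by the server so the report can be found later."""
    os.makedirs(os.path.dirname(log_path) or ".", exist_ok=True)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{package}\t{report_id}\n")


def read_report_ids(log_path):
    """Return the saved (package, report_id) pairs in submission order."""
    pairs = []
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            package, report_id = line.rstrip("\n").split("\t", 1)
            pairs.append((package, report_id))
    return pairs
```

With this precedence, a user who sets only `server` redirects every report, while the per-package table covers the "GNOME dumps go to GNOME" case listed as future work.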

Server

  • Get a Socorro server running in Fedora's infrastructure
  • Point the default Breakpad configuration to it (easy)

Open questions

  • Do symbol resolution on the client or the server?
  • How to do symbol resolution? FUSE? littlebottom?
  • How much backtracing can be done without debuginfo installed at the client?
  • Tie it to smolt profiles?
  • Run a separate kerneloops server?
  • Why not use Breakpad?
    • Breakpad is a library, and we don't want to use LD_PRELOAD everywhere to magically link it in when needed.

How To Test

Cause a program to crash and get a report submitted to Socorro. Test that Socorro correctly retraces it and gets enough information for a developer to identify the problem.
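
One way to cause a crash on demand for the first step is to have a throwaway process send itself SIGSEGV. The sketch below uses a child Python interpreter as the victim; the helper names are hypothetical, and it only confirms that the process died from the signal. Checking that a report reached the server is a separate, manual step.

```python
import signal
import subprocess
import sys


def crash_child():
    """Run a child Python process that kills itself with SIGSEGV.

    Returns the child's return code as reported by subprocess.
    """
    code = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    result = subprocess.run([sys.executable, "-c", code])
    return result.returncode


def died_from_segv(returncode):
    """subprocess reports death by signal as the negated signal number."""
    return returncode == -signal.SIGSEGV
```

After `crash_child()` returns, a tester would look for a new dump in the crash location and follow it through submission and retracing.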

User Experience

A program crashes. We display a dialog or notification that the program has crashed and save a useful stack trace to a well-known location.

Dependencies

Contingency plan

  1. Don't enable the agent
  2. Don't ship the agent
  3. Reinvestigate other options such as Apport.

Documentation

Simple documentation is needed on how to enable and disable crash reporting, and how to make report submission happen automatically.

Release Notes

(We will want to explain to developers of Free programs how to find crash dumps.)

Comments

  • See Talk:Features/CrashHandling
  • New development continues at Features/CrashCatcher