Features/CrashHandling

From FedoraProject

Latest revision as of 21:48, 26 January 2009

Handling program crashes in Fedora

Summary

This feature has two parts:

  1. A crash handler which notifies the user when a program crashes and allows them to submit a report to the Fedora developers, and
  2. A server for collecting crash reports and mining useful data from them.

Owner

  • Name: [none currently]

Current status

  • Last modified: 2008-12-01
  • Percent complete: 0%

Benefit to Fedora

By providing an automated mechanism for tracking application crashes, we will be able to:

  • see bugs earlier, and fix them earlier
  • see what bugs are hit most
  • get usage and crash data from people who are unable or unwilling to interact with bugzilla

Better crash data leads to more crash fixes, which leads to a higher-quality distribution.

Scope

As of about Fedora 6, packages no longer include the "debuginfo" data necessary for local crash handlers to get a useful stack trace. See Packaging/Debuginfo and StackTraces for details.

What we want is a system that gets information about the crash to developers in a form with complete stack trace data.

The plan has two major parts: a crash handler which runs on the client, and a server which receives and aggregates crash reports.

Client

crash-handler

A program to catch crashing programs and write out a crash report / stack trace.

  • Catching the crash is trivial using the kernel's core pattern piping support, e.g.:
    • echo '|/usr/sbin/crash-handler --pid %p --rlimit %c' > /proc/sys/kernel/core_pattern
  • Write crashes to a (configurable) standard location, such as /var/crash
  • This crash handler should be able to produce Breakpad minidumps
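
As a rough illustration of the handler side, here is a minimal Python sketch. It assumes the kernel pipes the core image to the handler's standard input, as it does for a piped core_pattern. The --crash-dir flag, the file naming scheme, and the function names are hypothetical and are not part of the proposal. It only saves the raw core; producing a Breakpad minidump is not shown.

```python
import argparse
import os
import time


def parse_args(argv):
    """Parse the flags used in the core_pattern line above."""
    parser = argparse.ArgumentParser(prog="crash-handler")
    parser.add_argument("--pid", type=int, required=True)     # %p: PID of the crashed process
    parser.add_argument("--rlimit", type=int, required=True)  # %c: core size soft limit
    # Assumed flag (not in the proposal): where to store dumps.
    parser.add_argument("--crash-dir", default="/var/crash")
    return parser.parse_args(argv)


def save_dump(stream, crash_dir, pid, now=None):
    """Copy the core image from `stream` into crash_dir and return its path."""
    stamp = int(time.time() if now is None else now)
    os.makedirs(crash_dir, exist_ok=True)
    # Hypothetical naming scheme: core.<pid>.<unix time>
    path = os.path.join(crash_dir, f"core.{pid}.{stamp}")
    with open(path, "wb") as out:
        while True:
            chunk = stream.read(64 * 1024)
            if not chunk:
                break
            out.write(chunk)
    return path


def main(argv, stream):
    """Entry point: call as main(sys.argv[1:], sys.stdin.buffer)."""
    args = parse_args(argv)
    return save_dump(stream, args.crash_dir, args.pid)
```

Keeping the copy loop separate from argument parsing makes it possible to exercise the handler with an in-memory stream instead of a real crash.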

crash-watcher

A small daemon to:

  • watch the crash location for new dumps
  • clean up old/unneeded dumps, based on user preferences (maximum age/disk space/etc.)
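
The cleanup policy could look something like the following Python sketch. It assumes two user preferences, a maximum age and a maximum total size; the function name and parameters are illustrative only. Dumps past the age limit are removed first, then the oldest remaining dumps are removed until the directory fits within the size limit.

```python
import os
import time


def prune_dumps(crash_dir, max_age_seconds, max_total_bytes, now=None):
    """Remove dumps that are too old, then the oldest remaining dumps
    until the directory fits in max_total_bytes. Returns removed paths."""
    now = time.time() if now is None else now
    entries = []
    for name in os.listdir(crash_dir):
        path = os.path.join(crash_dir, name)
        if os.path.isfile(path):
            st = os.stat(path)
            entries.append((st.st_mtime, st.st_size, path))
    entries.sort()  # oldest first

    removed = []
    kept = []
    # Pass 1: enforce the maximum age.
    for mtime, size, path in entries:
        if now - mtime > max_age_seconds:
            os.remove(path)
            removed.append(path)
        else:
            kept.append((mtime, size, path))

    # Pass 2: enforce the disk space limit, dropping the oldest dumps first.
    total = sum(size for _, size, _ in kept)
    for _, size, path in kept:
        if total <= max_total_bytes:
            break
        os.remove(path)
        removed.append(path)
        total -= size
    return removed
```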

When a new dump is found, send notifications to the user allowing them to:

  • Send a report (iff the binary was provided by Fedora)
    • Optional "Always send report automatically" checkbox
  • Ignore further crashes of that program
  • Ignore all further crashes
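
The notification choices above amount to a small decision table. The Python sketch below is one possible reading of it; the Preferences fields and the returned action names are hypothetical. How the watcher learns whether a binary was provided by Fedora (for example by querying the package database) is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Preferences:
    """User choices remembered from earlier notifications (hypothetical)."""
    always_send: bool = False   # "Always send report automatically"
    ignore_all: bool = False    # "Ignore all further crashes"
    ignored_programs: set = field(default_factory=set)


def decide(program, provided_by_fedora, prefs):
    """Return what the watcher should do for a new dump of `program`."""
    if prefs.ignore_all or program in prefs.ignored_programs:
        return "ignore"
    if not provided_by_fedora:
        # Reports are only offered for binaries provided by Fedora.
        return "notify-only"
    if prefs.always_send:
        return "send"
    return "ask"
```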

crash-submitter

Sends minidumps to the server to be retraced. bug-buddy might work for this.

  • Submit report to Socorro server (or similar)
    • Configured to use Fedora server by default, but allow user to set their own server
      • Future work: allow per-package overrides (so GNOME dumps go to GNOME, etc)
  • Save UUID for that report somewhere, as with kerneloops
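
The server selection and UUID bookkeeping could be sketched as below. Everything here is an assumption made for illustration: the placeholder URL, the config keys "server" and "package_servers", and the JSON file used to remember report IDs. The actual upload to the server is deliberately left out.

```python
import json
import os

# Placeholder URL, not a real Fedora service.
DEFAULT_SERVER = "https://crash.example.org/submit"


def choose_server(package, config):
    """Pick the submission URL: per-package override, then the user's
    own server, then the distribution default."""
    overrides = config.get("package_servers", {})
    if package in overrides:
        return overrides[package]
    return config.get("server", DEFAULT_SERVER)


def record_report_id(store_path, dump_path, report_id):
    """Remember which report ID the server assigned to which dump."""
    records = {}
    if os.path.exists(store_path):
        with open(store_path) as f:
            records = json.load(f)
    records[dump_path] = report_id
    with open(store_path, "w") as f:
        json.dump(records, f, indent=2)
    return records
```

For example, a config of {"package_servers": {"gedit": "https://gnome.example.org/submit"}} would route gedit dumps to the GNOME placeholder URL and everything else to the default.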

Server

  • Get a Socorro server running in Fedora's infrastructure
  • Point the default breakpad configuration to it (easy)

Open questions

  • Do symbol resolution on the client or the server?
  • How to do symbol resolution? FUSE? littlebottom?
  • How much backtracing can be done without debuginfo installed at the client?
  • Tie it to smolt profiles?
  • Run a separate kerneloops server?
  • Why not use breakpad?
    • Breakpad is a library - we don't want LD_PRELOAD everywhere to magically link the library in when needed.

How To Test

Cause a program to crash and get a report submitted to Socorro. Test that Socorro correctly retraces it and gets enough information for a developer to identify the problem.
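
One simple way to cause a crash on demand is to have a child process send itself SIGSEGV. The Python sketch below does that and checks that the child really died from the signal; the helper names are made up for this example. Whether a dump is then produced and submitted depends on the core_pattern and handler setup, which this sketch does not verify.

```python
import signal
import subprocess
import sys


def crash_child():
    """Run a child Python process that kills itself with SIGSEGV
    and return its exit status."""
    code = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    return subprocess.run([sys.executable, "-c", code]).returncode


def died_from_segv(returncode):
    """On POSIX, subprocess reports death by signal N as -N."""
    return returncode == -signal.SIGSEGV
```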

User Experience

A program crashes. We display a dialog or notification that the program has crashed and save a useful stack trace to a well-known location.

Dependencies

Contingency plan

  1. Don't enable the agent
  2. Don't ship the agent
  3. Reinvestigate other options such as Apport.

Documentation

We need some simple documentation on how to enable and disable crash reporting, and how to make report submission happen automatically.

Release Notes

(We will want to explain to developers of Free programs how to find crash dumps.)

Comments