From Fedora Project Wiki

Latest revision as of 12:31, 16 April 2021

Kernel and kdump

Kdump is a kernel crash dumping mechanism. It is very reliable because the crash dump is captured from the context of a freshly booted kernel and not from the context of the crashed kernel. Kdump uses kexec to boot into a second kernel whenever the system crashes. This second kernel, often called the capture kernel, boots with very little memory and captures the dump image.

The first kernel reserves a section of memory that the capture kernel uses to boot. Kexec boots the capture kernel without going through the BIOS, so the contents of the first kernel's memory are preserved. That preserved memory is essentially the kernel crash dump.

How to Use Kdump

Step 1: Configuring Kdump

  1. First, install the kexec-tools, crash and kernel-debuginfo packages using the following command line.
    dnf install --enablerepo=fedora-debuginfo --enablerepo=updates-debuginfo kexec-tools crash kernel-debuginfo
    NOTE: The crash and kernel-debuginfo packages are only required to examine the resulting kernel dump file. If you are setting up kdump on a machine simply to capture a dump file that will be analyzed by someone else or on a different machine, you can skip those packages.
  2. Next, edit /etc/default/grub and add the crashkernel=auto command line option to GRUB_CMDLINE_LINUX. The result might look like this:
    GRUB_CMDLINE_LINUX="rd.luks.uuid=luks-123e4567-e89b-12d3-a456-426614174000 rhgb quiet crashkernel=auto"
  3. Update the GRUB configuration file. For a UEFI installation of Fedora, run the following command.
    grub2-mkconfig -o /etc/grub2-efi.cfg
    NOTE: For a BIOS installation of Fedora, replace /etc/grub2-efi.cfg with /etc/grub2.cfg.
  4. Optionally, edit the kdump configuration file at /etc/kdump.conf. This will allow you to write the dump file over the network or to a location on the local system other than /var/crash. For additional information, consult the mkdumprd man page and the comments in /etc/kdump.conf.
  5. Next, enable the kdump system service at startup using the following command.
    systemctl enable kdump.service
  6. Finally, reboot your system.
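
After the reboot, it can help to confirm that the crashkernel option actually reached the running kernel. The following is a minimal sketch, not part of the kdump tooling: the helper name crashkernel_value is hypothetical, and it only parses a command-line string that you pass to it.

```shell
# Print the value of the crashkernel= option found in the kernel
# command line given as $1. Exits non-zero if the option is absent.
# (crashkernel_value is a hypothetical helper name.)
crashkernel_value() (
    set -f                      # no glob expansion while splitting words
    for word in $1; do
        case "$word" in
            crashkernel=*)
                printf '%s\n' "${word#crashkernel=}"
                exit 0
                ;;
        esac
    done
    exit 1
)
```

For example, crashkernel_value "$(cat /proc/cmdline)" prints the configured value on the running system, and systemctl is-enabled kdump.service reports whether the service will start at boot.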

Considerations:

  1. The crashkernel=auto command line option causes the kernel to calculate how much physical memory to reserve for preloading and running the capture kernel. You can specify how much memory to reserve by using a parameter such as crashkernel=256M instead.
  2. kdump.service takes care of pre-loading the capture kernel at system boot time.
  3. For testing purposes, it is recommended to either set up a serial console or switch to run level 3 (init 3). Kdump does not reset the console if you are in X or framebuffer mode, so no message might be visible on the console after a system crash. You may also see screen corruption in graphics mode during capture.
  4. Capturing a crash dump can take a long time, especially if the system has a lot of memory. Be patient. The system will reboot after the dump is captured.
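
To see how much memory was reserved and whether a capture kernel is loaded, you can read /sys/kernel/kexec_crash_size (in bytes) and /sys/kernel/kexec_crash_loaded (0 or 1). The sketch below assumes those two files are present; the function names bytes_to_mib and kdump_state are hypothetical, and the directory argument exists only so the function can be tried against a copy of the files.

```shell
# Convert a byte count to whole MiB.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Summarise the reservation size and load state found in a
# sysfs-style directory (default: /sys/kernel).
kdump_state() (
    dir="${1:-/sys/kernel}"
    size=$(cat "$dir/kexec_crash_size")
    loaded=$(cat "$dir/kexec_crash_loaded")
    echo "reserved=$(bytes_to_mib "$size")MiB loaded=$loaded"
)
```

Running kdump_state with no argument reports the live values. A result of reserved=0MiB usually means the crashkernel option did not take effect.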

Step 2: Capturing the Dump

Normally a kernel panic() triggers booting into the capture kernel, but for testing purposes you can simulate the trigger in one of the following ways.

  1. Enable SysRq then trigger a panic through /proc interface
    echo 1 > /proc/sys/kernel/sysrq
    echo c > /proc/sysrq-trigger
  2. Trigger by inserting a module which calls panic().

The system will boot into the capture kernel. A kernel dump will be automatically saved in /var/crash/<dumpdir> and the system will boot back into the regular kernel. The name of the dump directory depends on the date and time of the crash, for example /var/crash/2006-02-17-17:02/vmcore.
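
Because the SysRq trigger crashes the machine immediately, it may be worth wrapping it in a guard. The following is a sketch under stated assumptions: the function names and the --yes-crash-this-machine flag are hypothetical, and the check relies on /sys/kernel/kexec_crash_loaded containing 1 when a capture kernel is loaded.

```shell
# Return 0 only if the given flag file contains "1".
capture_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}

# Crash the system on purpose, but only when asked explicitly and only
# when a capture kernel is loaded. Must be run as root.
trigger_test_panic() {
    if [ "${1:-}" != "--yes-crash-this-machine" ]; then
        echo "refusing: pass --yes-crash-this-machine to confirm" >&2
        return 2
    fi
    if ! capture_kernel_loaded; then
        echo "refusing: no capture kernel loaded" >&2
        return 1
    fi
    sync                              # flush file systems before the panic
    echo 1 > /proc/sys/kernel/sysrq   # enable SysRq
    echo c > /proc/sysrq-trigger      # trigger the panic
}
```

Calling trigger_test_panic without the flag does nothing except print a refusal, which makes it harder to crash a machine by accident.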

Step 3: Dump Analysis

Once the system has returned from recovering the crash, you may wish to analyse the kernel dump file using the crash tool.

  1. First, locate the recent vmcore dump file:
    find /var/crash -type f -mtime -1
  2. Once you have located a vmcore dump file, call crash:
    crash /var/crash/2009-07-17-10\:36/vmcore /usr/lib/debug/lib/modules/`uname -r`/vmlinux
NOTE: Missing debuginfo? If you cannot find any files under /usr/lib/debug, make sure you have the kernel-debuginfo package installed.

For more information on using the crash tool, see the More Documentation section.
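
Locating the newest dump can also be scripted. This is a minimal sketch that assumes the dump file is named exactly vmcore, as in the examples above; the function name latest_vmcore is hypothetical.

```shell
# Print the path of the most recently modified file named "vmcore"
# under the given directory (default: /var/crash). Prints nothing if
# no such file exists.
latest_vmcore() {
    find "${1:-/var/crash}" -type f -name vmcore -exec ls -t {} + 2>/dev/null |
        head -n 1
}
```

It can then be combined with the crash invocation from step 2, for example crash "$(latest_vmcore)" /usr/lib/debug/lib/modules/$(uname -r)/vmlinux. This assumes the running kernel is the same version as the one that crashed.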

On versions

The versions of kexec-tools and crash depend closely on the version of the running kernel. On Fedora, the package versions can occasionally get out of sync, which can lead to partially working crash dumps. This may manifest as warning messages from crash such as

page excluded: kernel virtual address: ffff.........9d28  type: "..."

If you want to know specifically which kernel versions are supported, you can examine the source RPM (SRPM) for the version of kexec-tools you are running. In particular, makedumpfile.h will have something like

 #define OLDEST_VERSION          KERNEL_VERSION(2, 6, 15)/* linux-2.6.15 */
 #define LATEST_VERSION          KERNEL_VERSION(4, 5, 3)/* linux-4.5.3 */

If you run makedumpfile against an unsupported kernel version it will probably still mostly work. It will output an error message to the console, but it can be easy to miss in the kexec output.
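
Once you have read the two bounds from makedumpfile.h, comparing them with the running kernel is simple. The sketch below assumes GNU sort (for the -V version ordering); the function name version_in_range is hypothetical, and the bounds are passed in by hand rather than parsed from the header.

```shell
# Return 0 if version $1 lies within the inclusive range [$2, $3],
# using the version ordering of GNU sort -V.
version_in_range() (
    v="$1" oldest="$2" latest="$3"
    lowest=$(printf '%s\n%s\n' "$oldest" "$v" | sort -V | head -n 1)
    highest=$(printf '%s\n%s\n' "$v" "$latest" | sort -V | tail -n 1)
    [ "$lowest" = "$oldest" ] && [ "$highest" = "$latest" ]
)
```

With the values shown above, version_in_range "$(uname -r | cut -d- -f1)" 2.6.15 4.5.3 succeeds only if the running kernel is no older than 2.6.15 and no newer than 4.5.3.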

If the dump is behaving unexpectedly, you can modify kdump.conf to not filter any pages (perhaps except zero-filled pages with -d1) and only use makedumpfile to compress the vmcore with -c. This might result in a more useful vmcore. If that fails, you could take makedumpfile out of the picture entirely by changing the core_collector in kdump.conf to scp, which will simply copy /proc/vmcore to a permanent location.
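
Switching the core_collector setting back and forth while experimenting can be scripted. This is a sketch only: the function name set_core_collector is hypothetical, it assumes GNU sed (for -i), and the value must not contain the characters | or &. Try it on a copy of kdump.conf first.

```shell
# Replace the active core_collector line in a kdump.conf-style file
# ($1) with the remaining arguments, or append one if none is present.
set_core_collector() (
    conf="$1"; shift
    value="$*"
    if grep -q '^core_collector[[:space:]]' "$conf"; then
        sed -i "s|^core_collector[[:space:]].*|core_collector $value|" "$conf"
    else
        printf 'core_collector %s\n' "$value" >> "$conf"
    fi
)
```

For example, set_core_collector /tmp/kdump.conf.copy makedumpfile -c -d 1 keeps compression but only drops zero-filled pages. After changing the real /etc/kdump.conf, the kdump service most likely needs a restart (systemctl restart kdump.service) before the new setting is used.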

If you have further issues, you may also try building the latest crash tool from source. If you are at the point of debugging kernel crash dumps you can probably figure it out :) You might want to try something like:

$ sudo dnf builddep crash # quick way to get the right libraries
$ git clone https://github.com/crash-utility/crash.git
$ cd crash
$ make lzo # don't forget the lzo if you're using compressed dumps

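More Documentation

  * http://lse.sourceforge.net/kdump/
  * Using crash - http://people.redhat.com/anderson
  * https://www.golinuxhub.com/2018/08/how-to-configure-and-install-kdump-rhel7-crashkernel.html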