Common kernel problems

From FedoraProject

(Difference between revisions)
(How to set module options for boot drivers)
(How to set module options for boot drivers: altered reference to modprobe.conf to modprobe.d/)
 
(19 intermediate revisions by 5 users not shown)
Line 1: Line 1:
{{Needs love}}
 
 
 
This page documents common problems with the [http://www.kernel.org/ Linux kernel] in Fedora.
 
This page documents common problems with the [http://www.kernel.org/ Linux kernel] in Fedora.
  
 
== How to set kernel boot options ==
 
== How to set kernel boot options ==
Kernel boot options are contained in the file <code>/etc/grub.conf</code>. Each installed kernel has a group of lines called a stanza describing what kernel to boot, where to find the root file system, the name of the initrd to load, and additional kernel options. A typical stanza looks something like this:
+
Kernel boot options are contained in the file <code>/boot/grub/grub.conf</code>. Each installed kernel has a group of lines called a stanza describing:
 +
* the title of the operating system to load
 +
* where to find the boot partition (which GRUB calls ''root'')
 +
* what kernel (vmlinuz-*) to boot, with additional kernel options
 +
* the name of the initrd to load
 +
 
 +
A typical stanza looks something like this:
 
<pre>
 
<pre>
title Fedora Core (2.6.22.9-61.fc6)
+
title Fedora 13 (2.6.33.5-124.fc13.i686.PAE)
root (hd0,1)
+
        root (hd1,7)
kernel /vmlinuz-2.6.22.9-61.fc6 ro root=LABEL=/ rhgb quiet
+
        kernel /vmlinuz-2.6.33.5-124.fc13.i686.PAE ro root=/dev/mapper/VG_f13-LV_f13_root rd_LVM_LV=VG_f13/LV_f13_root
initrd /initrd-2.6.22.9-61.fc6.img
+
        rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us rhgb quiet
 +
        initrd /initramfs-2.6.33.5-124.fc13.i686.PAE.img
 +
 
 +
title CentOS 5 (2.6.18-194.3.1.el5)
 +
root (hd0,4)
 +
kernel /vmlinuz-2.6.18-194.3.1.el5 ro root=/dev/mapper/VG_CentOS-LV_CentOS_root
 +
        rd_LVM_LV=VG_CentOS/LV_CentOS_root rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16
 +
        KEYTABLE=us rhgb quiet
 +
initrd /initrd-2.6.18-194.3.1.el5.img
 +
 
 +
title Ubuntu 10.04 LTS
 +
root (hd0,6)
 +
chainloader (hd0,6)+1
 +
kernel /grub/core.img
 +
savedefault
 +
boot
 
</pre>
 
</pre>
Kernel options are placed at the end of the "kernel" line and are separated by spaces. When having problems, it is usually a good idea to remove the "quiet" option so that the full set of kernel messages is shown during boot. Here is an example with the "quiet" and "rhgb" options removed and some other options added:
+
In this example, there are three operating systems: '''Fedora 13''', whose boot partition is the eighth partition of the second hard disk (remember that GRUB numbers disks and partitions from 0); CentOS, on the fifth partition of the first disk; and Ubuntu, on the seventh partition of the first disk.
<pre>
+
 
title Fedora Core (2.6.22.9-61.fc6)
+
Kernel options are placed at the end of the ''kernel'' line and are separated by spaces. In the example:
root (hd0,1)
+
* ro: mounts root device read-only on boot
kernel /vmlinuz-2.6.22.9-61.fc6 ro root=LABEL=/ pci=nomsi,nommconf nohz=off
+
* root: root filesystem
initrd /initrd-2.6.22.9-61.fc6.img
+
* rd_LVM_LV: it activates the root filesystem in the logical volume LV_f13_root of the volume group VG_f13
</pre>
+
* rd_NO_LUKS: disables crypto LUKS detection
NOTE: The full list of kernel options is in the file /usr/share/doc/kernel-doc-<version>/Documentation/kernel-parameters.txt, which is installed with the kernel-doc package.
+
* rd_NO_MD: disables MD RAID detection
 +
* rd_NO_DM: disables DM RAID detection
 +
* LANG: is the system language, written to /etc/sysconfig/i18n in the initramfs
 +
* SYSFONT: is the console font, written to /etc/sysconfig/i18n in the initramfs
 +
* KEYTABLE: is the keytable filename, written to /etc/sysconfig/keyboard in the initramfs
 +
* rhgb: for graphical boot support
 +
* quiet: disables most log messages
 +
 
 +
For other options view also the wiki [[Dracut/Options#Dracut_kernel_command_line_parameters | Dracut kernel command line parameters]].
 +
 
 +
When having problems, it is usually a good idea to remove the ''quiet'' option so that the full set of kernel messages is shown during boot
 +
 
 +
=== Getting the Full List of Kernel Options ===
 +
{{Admon/note |
 +
The full list of kernel options is in the file ''/usr/share/doc/kernel-doc-<version>/Documentation/kernel-parameters.txt'', which is installed with the ''kernel-doc'' package.}}
  
 
== How to set module options for boot drivers ==
 
== How to set module options for boot drivers ==
Line 27: Line 60:
 
sata_nv.adma=0
 
sata_nv.adma=0
 
</pre>
 
</pre>
Alternatively, add this line to <code>/etc/modprobe.conf</code>:
+
Alternatively, add this line to a file in <code>/etc/modprobe.d/</code>:
 
<pre>
 
<pre>
 
options sata_nv adma=0
 
options sata_nv adma=0
 
</pre>
 
</pre>
To get options set in <code>/etc/modprobe.conf</code> into the initrd, run the <code>mkinitrd</code> program.  Usually this is just the command <code>mkinitrd /boot/initrd-$(uname -r).new.img $(uname -r)</code> to build a new initrd for the currently-running kernel without overwriting the exisitng one. (See <code>man mkinitrd</code> for help on additional options.) To test the new initrd, reboot the system and use the command line editing facilities to change the name of the initrd. Or, create a new stanza in the <code>/etc/grub.conf</code> file something like this (see above for the original):
+
To get options set in <code>/etc/modprobe.d/*</code> into the initrd, run the <code>mkinitrd</code> program.  Usually this is just the command <code>mkinitrd /boot/initrd-$(uname -r).new.img $(uname -r)</code> to build a new initrd for the currently-running kernel without overwriting the existing one. (See <code>man mkinitrd</code> for help on additional options.) To test the new initrd, reboot the system and use the command line editing facilities to change the name of the initrd. Or, create a new stanza in the <code>/etc/grub.conf</code> file something like this (see above for the original):
 
<pre>
 
<pre>
 
title Fedora Core [with new initrd]  (2.6.29-0.215.rc7.fc11.i586)
 
title Fedora Core [with new initrd]  (2.6.29-0.215.rc7.fc11.i586)
Line 39: Line 72:
 
</pre>
 
</pre>
 
This will let you boot with either the new or the old initrd by pressing the up arrow / down arrow keys on the very first boot screen. Once everything is tested, remove the original initrd and rename the new one to the same name as the old one, then remove the "[with new initrd] " stanza from <code>/etc/grub.conf</code>.
 
This will let you boot with either the new or the old initrd by pressing the up arrow / down arrow keys on the very first boot screen. Once everything is tested, remove the original initrd and rename the new one to the same name as the old one, then remove the "[with new initrd] " stanza from <code>/etc/grub.conf</code>.
 +
 +
== How to ensure you always have a working kernel installed ==
 +
=== Keeping more than the default number of kernels installed ===
 +
If you are having problems with some update kernels, you may want to increase the maximum number of kernels that yum will leave installed. Edit <code> /etc/yum.conf </code> and change the number on the <code> installonly_limit </code> line to do that. Note that you will need enough free space in /boot to keep the extra kernels installed.
 +
=== Uninstalling older non-working kernels ===
 +
You may also erase kernels that you know are not working in order to keep older, working kernels from being uninstalled on the next update.
  
 
== Can't find root filesystem / error mounting /dev/root ==
 
== Can't find root filesystem / error mounting /dev/root ==
Line 57: Line 96:
  
 
* Checking whether or not the CapsLock key (or NumLock or ScrollLock) causes the light on the keyboard to change state can be used as an indication of whether or not the kernel has hung completely, or if there is something else going on.
 
* Checking whether or not the CapsLock key (or NumLock or ScrollLock) causes the light on the keyboard to change state can be used as an indication of whether or not the kernel has hung completely, or if there is something else going on.
* For boot related issues we need as much info as possible, so removing <code>quiet</code> from the boot flags should be the first thing to ask for.
+
* For boot related issues we need as much info as possible, so removing <code>quiet</code> <code>rhgb</code> from the boot flags should be the first thing to ask for.
 
* Slowing down the speed of text output with <code>boot_delay=1000</code> (the number may need to be tweaked higher/lower to suit) may allow the user to take a digital camera photo of the last thing on screen.
 
* Slowing down the speed of text output with <code>boot_delay=1000</code> (the number may need to be tweaked higher/lower to suit) may allow the user to take a digital camera photo of the last thing on screen.
 
* Booting with <code>vga=791</code> (or even just vga=1 if the video card won't support 791) will put the framebuffer into high resolution mode to get more lines of text on screen, allowing more context for bug analysis.
 
* Booting with <code>vga=791</code> (or even just vga=1 if the video card won't support 791) will put the framebuffer into high resolution mode to get more lines of text on screen, allowing more context for bug analysis.
Line 87: Line 126:
  
 
== Can't find installation CD/DVD or hard drives ==
 
== Can't find installation CD/DVD or hard drives ==
* Try <code>pci=nomsi,nommconf</code>. This disables PCI Message Signaled Interrupts and MMCONFIG, which are only needed by a few systems at this time.
+
* Try <code>pci=nomsi,nommconf</code>. This disables PCI Message Signaled Interrupts and MMCONFIG.
  
 
* Try booting with <code>libata.dma=1</code> [use DMA only for hard drives]  or <code>libata.dma=0</code> [do not use DMA at all] . This can at least get the system installed, then the drivers can be updated.
 
* Try booting with <code>libata.dma=1</code> [use DMA only for hard drives]  or <code>libata.dma=0</code> [do not use DMA at all] . This can at least get the system installed, then the drivers can be updated.
 +
 +
* Try the boot option <code>pci=nocrs</code> on 2.6.34 and later kernels.
 +
 +
* The option <code>pcie_aspm=off</code> may be needed by some SCSI and RAID drivers (and some network drivers as well.)
 +
 +
* Try disabling the AHCI driver by adding <code>rdblacklist=ahci</code>. This forces the generic drivers to be used, which may work, but sometimes very slowly.
  
 
== Install runs very slowly ==
 
== Install runs very slowly ==
Line 107: Line 152:
 
* Hooking up serial console / [[netconsole]] can sometimes get debug info out of the machine.
 
* Hooking up serial console / [[netconsole]] can sometimes get debug info out of the machine.
 
* If the hang happened whilst in X, the machine may still respond to ssh logins from other machines. Try this to get a dmesg.
 
* If the hang happened whilst in X, the machine may still respond to ssh logins from other machines. Try this to get a dmesg.
* The magic sysrq key might work. Enable it with <code>sysctl kernel.sysrq=1</code> (or put <code>kernel.sysrq = 1</code> in your <code>/etc/sysctl.conf</code>). This will allow you to hit <code>ctrl-alt-sysrq</code> plus one of the following keys to get debugging info.
+
* The magic sysrq key might work. See [[QA/Sysrq]] for details.
<pre>m will dump information about the current state of memory
+
 
+
t will dump the state of every task the kernel knows about
+
 
+
p will dump the current processor state (useful when a process
+
is looping inside the kernel
+
 
+
s will sync all data pending writeback to disk. (This is useful
+
so that this debug info actually stands a chance of hitting the
+
log files.)</pre>
+
You can also trigger magic sysrq functions by <code>echo</code>'ing the relevant one letter command to <code>/proc/sysrq-trigger</code>
+
If the machine is hanging before the initscripts get to run (which would set the above sysctl), boot with sysrq_always_enabled=1
+
  
 
* booting with <code>nmi_watchdog=2</code> may cause a backtrace to occur when the lockup happens.
 
* booting with <code>nmi_watchdog=2</code> may cause a backtrace to occur when the lockup happens.
  
== Suspend/Resume to RAM failure ==
+
== Suspend/Resume failure ==
The most common failure mode is 'black screen on resuming'.
+
The most common failure mode is 'black screen on resuming', but the system may also hang while suspending.
 
* Laptops using the nv driver should be considered hibernate-only capable as per https://www.redhat.com/archives/fedora-test-list/2007-September/msg00365.html
 
* Laptops using the nv driver should be considered hibernate-only capable as per https://www.redhat.com/archives/fedora-test-list/2007-September/msg00365.html
* Find out if the system is locked up completely by hitting the caps lock key.
+
* If the system fails to resume, see if the system is locked up completely by hitting the caps lock key.
* If the capslock light doesn't toggle, the system is completely dead. Try again, but this time before suspending, activate the pm_trace functionality with <code>echo 1 > /sys/power/pm_trace</code>. This reprograms the real time clock to contain a few bytes of information which we can use to diagnose which driver failed to resume. After the hang, reboot, boot up again. Now use the command <code> dmesg | grep "hash matches" </code> and you will get a list of matches like this: <code> hash matches device 0000:05:06.1 </code> . The last device on the list is likely the one thats causing problems. To find out which driver is causing the problem you will have to look up the driver in <code> /sys/bus/pci/drivers/ </code>. This can be done using <code> find /sys/bus/pci/drivers/ -name "0000:05:06.1" </code>. It will return a path similar to this one: <code> /sys/bus/pci/drivers/firewire_ohci/0000:05:06.1 </code> which means that the firewire_ohci driver is causing troubles. Unloading the module using <code> modprobe -r firewire_ohci </code> should fix the suspend issues. Please also note that pm_trace uses the RTC for storing the data, which will result into a wrong system clock after boot. To fix it just use system-config-date to set the correct date.
+
* If the capslock light doesn't toggle, or the failure is during suspend, try again, but this time before suspending, activate the pm_trace functionality with <code>echo 1 > /sys/power/pm_trace</code>. This reprograms the real time clock to contain a few bytes of information which we can use to diagnose which driver failed to suspend or resume. After the hang, reboot. Now use the command <code> dmesg | grep "hash matches" </code> and you will get a list of matches like this: <code> hash matches device 0000:05:06.1 </code>. The last device on the list is likely the one that's causing problems. To find out which driver is responsible, look up the device in <code> /sys/bus/pci/drivers/ </code>. This can be done using <code> find /sys/bus/pci/drivers/ -name "0000:05:06.1" </code>. It will return a path similar to this one: <code> /sys/bus/pci/drivers/firewire_ohci/0000:05:06.1 </code>, which means that the firewire_ohci driver is causing trouble. Unloading the module using <code> modprobe -r firewire_ohci </code> should fix the suspend issues. Please also note that pm_trace uses the RTC for storing the data, which will result in a wrong system clock after boot. To fix it, just use system-config-date to set the correct date.
* If the capslock light does toggle, then the system did come back up, and it's possible that we just failed to reinitialise the video.  http://people.freedesktop.org/~hughsient/quirk may contain further useful information to diagnose this problem.  It may also be useful to initiate the suspend from a tty (<code>ctrl-alt-f1</code>) and run <code>pm-suspend ; dmesg > dmesg.out ; sync</code> by hand.  Upon resuming you'll now have some more debug info to sift through.  Additionally, this way when it resumes, you already have a console logged in from which you can type commands 'blind'. Trying <code>vbetool post</code> for example may bring things back to life.
+
* If the capslock light does toggle when resuming, then the system did come back up, and it's possible that we just failed to reinitialise the video.  It may be useful to initiate the suspend from a tty (<code>ctrl-alt-f1</code>) and run <code>pm-suspend ; dmesg > dmesg.out ; sync</code> by hand.  Upon resuming you'll now have some more debug info to sift through.  Additionally, this way when it resumes, you already have a console logged in from which you can type commands 'blind'. Trying <code>vbetool post</code> for example may bring things back to life.
 
* Proprietary 3d graphics driver users should test with respective open source drivers.
 
* Proprietary 3d graphics driver users should test with respective open source drivers.
 
* Try <code>rmmod</code>'ing various modules before doing the suspend. If this makes things work again, retry with a smaller set of modules unloaded. Keep retrying until you narrow down which module is to blame.
 
* Try <code>rmmod</code>'ing various modules before doing the suspend. If this makes things work again, retry with a smaller set of modules unloaded. Keep retrying until you narrow down which module is to blame.
Line 146: Line 179:
  
 
=== "High Definition Audio" devices ===
 
=== "High Definition Audio" devices ===
Many times the model can't be detected properly. Adding the correct model to the sound card driver's entry in /etc/modprobe.conf will force the driver to use that model, e.g. <code>options sound-card-0 model=3stack</code>. Options for this driver are documented in the file <code>/usr/share/doc/kernel-doc-<version>/Documentation/sound/alsa/ALSA-Configuration.txt</code> in the kernel-doc package.
+
Many times the model can't be detected properly. Adding the correct model to the sound card driver's entry in /etc/modprobe.d/dist.conf will force the driver to use that model, e.g. <code>options sound-card-0 model=3stack</code>. Options for this driver are documented in the file <code>/usr/share/doc/kernel-doc-<version>/Documentation/sound/alsa/ALSA-Configuration.txt</code> in the kernel-doc package.
  
 
== System hangs on reboot ==
 
== System hangs on reboot ==
Line 176: Line 209:
 
* Try the kernel parameter <code> pnpacpi=off </code>
 
* Try the kernel parameter <code> pnpacpi=off </code>
  
== Differences between the Fedora 7 and Fedora 8 kernels ==
+
== CPU stuck at the lowest frequency on ThinkPad machines ==
=== How to make Fedora 8 behave like Fedora 7 and vice versa ===
+
ThinkPad users who see their system throttled as soon as the processor module
==== In Fedora 8, USB autosuspend is enabled while it is disabled in Fedora 7 ====
+
gets loaded and without obvious reason should check the contents of this file:
* To make Fedora 8 behave like Fedora 7, add <code> usbcore.autosuspend=-1 </code> to the kernel options.
+
 
* To make Fedora 7 behave like Fedora 8, add <code> usbcore.autosuspend=2 </code> to the kernel options.
+
<code>/sys/devices/system/cpu/cpu0/cpufreq/bios_limit</code>
==== In Fedora 8, libata ACPI is enabled while it is disabled in Fedora 7 ====
+
 
* To make Fedora 8 behave like Fedora 7, add <code> noacpi=1 </code> to the libata module's options.
+
If it is set to the lowest value, you must pass <code>processor.ignore_ppc=1</code> boot parameter as a workaround.
* To make Fedora 7 behave like Fedora 8, add <code> noacpi=0 </code> to the libata module's options.
+
(See kernel.org bug #16382 for details.)
==== In Fedora 8, PCI message signaled interrupts (MSI) are enabled while they are disabled in Fedora 7 ====
+
 
* To make Fedora 8 behave like Fedora 7, add <code> pci=nomsi </code> to the kernel options.
+
== Can't decrypt drive / encryption password not accepted ==
* To make Fedora 7 behave like Fedora 8, add <code> pci=msi </code> to the kernel options.
+
Try adding the boot option <code>rdblacklist=aesni-intel</code>. You may also have to blacklist the aesni-intel driver by adding a blacklist entry in <code>/etc/modprobe.d</code>.
 +
 
 +
== Elantech trackpad not recognized as a trackpad ==
 +
You can add the option <code>psmouse.force_elantech=1</code> to force recognition. This requires at least kernel 2.6.34 to work.
 +
 
 +
== Systems with nVidia adapters using the nouveau driver lock up randomly ==
 +
Try adding the boot option <code>nouveau.noaccel=1</code>.
 +
 
 +
== Network drives using the CIFS filesystem get inconsistent data when reading files ==
 +
This is fixed in recent kernel updates, but can be worked around by adding <code>noserverino</code> to the mount options.
 +
 
 +
== PCI Devices Not Recognized / AHCI: "failed to stop engine" ==
 +
On kernel version 2.6.34 and later, ACPI is used to determine PCI resources. Some machines have bugs in their ACPI BIOS code and fail to configure resources properly. Try using <code> pci=nocrs </code> to disable use of ACPI for resource enumeration.
 +
 
 +
== Unable to Allocate Memory / page allocation failure ==
 +
Heavily-loaded network servers may have trouble allocating memory even though there is no shortage. Try setting the sysctl <code> vm.min_free_kbytes </code> to 65536 in order to keep additional memory free for allocation by network drivers.
  
 
== See Also... ==
 
== See Also... ==
  
 
* [[Common file system problems]]
 
* [[Common file system problems]]

Latest revision as of 21:24, 26 February 2012

This page documents common problems with the Linux kernel in Fedora.


How to set kernel boot options

Kernel boot options are contained in the file /boot/grub/grub.conf. Each installed kernel has a group of lines called a stanza describing:

  • the title of the operating system to load
  • where to find the boot partition (which GRUB calls root)
  • what kernel (vmlinuz-*) to boot, with additional kernel options
  • the name of the initrd to load

A typical stanza looks something like this:

title Fedora 13 (2.6.33.5-124.fc13.i686.PAE)
        root (hd1,7)
        kernel /vmlinuz-2.6.33.5-124.fc13.i686.PAE ro root=/dev/mapper/VG_f13-LV_f13_root rd_LVM_LV=VG_f13/LV_f13_root 
         rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us rhgb quiet
        initrd /initramfs-2.6.33.5-124.fc13.i686.PAE.img

title CentOS 5 (2.6.18-194.3.1.el5)
	root (hd0,4)
	kernel /vmlinuz-2.6.18-194.3.1.el5 ro root=/dev/mapper/VG_CentOS-LV_CentOS_root 
         rd_LVM_LV=VG_CentOS/LV_CentOS_root rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 
         KEYTABLE=us rhgb quiet
	initrd /initrd-2.6.18-194.3.1.el5.img

title Ubuntu 10.04 LTS
	root (hd0,6)
	chainloader (hd0,6)+1
	kernel /grub/core.img
	savedefault
	boot

In this example, there are three operating systems: Fedora 13, whose boot partition is the eighth partition of the second hard disk (remember that GRUB numbers disks and partitions from 0); CentOS, on the fifth partition of the first disk; and Ubuntu, on the seventh partition of the first disk.

Kernel options are placed at the end of the kernel line and are separated by spaces. In the example:

  • ro: mounts root device read-only on boot
  • root: root filesystem
  • rd_LVM_LV: activates the logical volume LV_f13_root of the volume group VG_f13, which holds the root filesystem
  • rd_NO_LUKS: disables crypto LUKS detection
  • rd_NO_MD: disables MD RAID detection
  • rd_NO_DM: disables DM RAID detection
  • LANG: the system language, written to /etc/sysconfig/i18n in the initramfs
  • SYSFONT: the console font, written to /etc/sysconfig/i18n in the initramfs
  • KEYTABLE: the keytable filename, written to /etc/sysconfig/keyboard in the initramfs
  • rhgb: for graphical boot support
  • quiet: disables most log messages

For other options, see also the wiki page Dracut kernel command line parameters.

When having problems, it is usually a good idea to remove the quiet option so that the full set of kernel messages is shown during boot.
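
As a rough illustration of that advice, the following shell sketch removes the quiet and rhgb words from a kernel option string. The function name strip_boot_noise is made up for this example and is not part of GRUB or Fedora; in practice the kernel line is usually edited by hand, either in the GRUB menu at boot or in grub.conf.

```shell
# Hypothetical helper: print a kernel option string without "quiet" and "rhgb".
strip_boot_noise() {
    out=""
    # Unquoted $1 is deliberately split on whitespace into individual options.
    for word in $1; do
        case "$word" in
            quiet|rhgb) ;;                     # drop the options that hide boot messages
            *) out="${out:+$out }$word" ;;     # keep everything else, space-separated
        esac
    done
    printf '%s\n' "$out"
}

strip_boot_noise "ro root=LABEL=/ rhgb quiet"
```

The sample call prints the option string with only ro and root=LABEL=/ left; the remaining options keep their original order.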

Getting the Full List of Kernel Options

The full list of kernel options is in the file /usr/share/doc/kernel-doc-<version>/Documentation/kernel-parameters.txt, which is installed with the kernel-doc package.

How to set module options for boot drivers

Module options are set in the file /etc/modprobe.conf (or, on newer releases, in a file under /etc/modprobe.d/), or (with versions of module-init-tools in F10+) on the kernel command line. Drivers that are needed to boot the system are put into an initrd, and their options are copied from the modprobe configuration by the mkinitrd script that builds the initrd. To change module options for those drivers, you can change the modprobe configuration and rebuild the initrd, or alternatively (in recent releases of Fedora) you can simply append them to the kernel command line.

For example, to disable adma mode on an nVidia SATA controller, add these options to the kernel command line (format is <modulename>.<option>=value):

sata_nv.adma=0

Alternatively, add this line to a file in /etc/modprobe.d/:

options sata_nv adma=0
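
The two forms carry the same information, so one can be derived from the other. The sketch below converts options lines in the modprobe style into the <modulename>.<option>=value form used on the kernel command line. The function name modopts_to_cmdline is invented for this example; it is not a tool shipped with module-init-tools.

```shell
# Hypothetical helper: read "options <module> <opt>=<val> ..." lines on stdin
# and print one <module>.<opt>=<val> word per option.
modopts_to_cmdline() {
    while read -r keyword module rest; do
        # Only "options" lines are converted; alias, blacklist, etc. are skipped.
        [ "$keyword" = "options" ] || continue
        for opt in $rest; do
            printf '%s.%s\n' "$module" "$opt"
        done
    done
}

echo "options sata_nv adma=0" | modopts_to_cmdline
```

For the sample line this prints sata_nv.adma=0, matching the command-line form shown earlier.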

To get options set in /etc/modprobe.d/* into the initrd, run the mkinitrd program. Usually this is just the command mkinitrd /boot/initrd-$(uname -r).new.img $(uname -r), which builds a new initrd for the currently-running kernel without overwriting the existing one. (See man mkinitrd for help on additional options.) To test the new initrd, reboot the system and use GRUB's command line editing facilities to change the name of the initrd. Or, create a new stanza in the /etc/grub.conf file something like this (see above for the original):

title Fedora Core [with new initrd]  (2.6.29-0.215.rc7.fc11.i586)
root (hd0,1)
kernel /vmlinuz-2.6.29-0.215.rc7.fc11.i586 ro root=LABEL=/ rhgb quiet
initrd /initrd-2.6.29-0.215.rc7.fc11.i586.new.img

This will let you boot with either the new or the old initrd by pressing the up arrow / down arrow keys on the very first boot screen. Once everything is tested, remove the original initrd and rename the new one to the same name as the old one, then remove the "[with new initrd] " stanza from /etc/grub.conf.

How to ensure you always have a working kernel installed

Keeping more than the default number of kernels installed

If you are having problems with some update kernels, you may want to increase the maximum number of kernels that yum will leave installed. Edit /etc/yum.conf and change the number on the installonly_limit line to do that. Note that you will need enough free space in /boot to keep the extra kernels installed.
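
A minimal sketch of that edit, assuming GNU sed and a file that already contains an installonly_limit line. The helper name set_installonly_limit and the values 3 and 5 are only examples. It is shown on a scratch copy so that the live /etc/yum.conf is not touched.

```shell
# Hypothetical helper: set installonly_limit in a yum.conf-style file.
# $1 = path to the file, $2 = number of kernels to keep installed.
set_installonly_limit() {
    # Rewrites an existing installonly_limit line in place (GNU sed -i).
    sed -i "s/^installonly_limit[[:space:]]*=.*/installonly_limit=$2/" "$1"
}

# Try it on a scratch file rather than the real configuration.
scratch=$(mktemp)
printf '[main]\ninstallonly_limit=3\n' > "$scratch"
set_installonly_limit "$scratch" 5
grep installonly_limit "$scratch"
rm -f "$scratch"
```

Once the result looks right, the same function can be pointed at /etc/yum.conf as root.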

Uninstalling older non-working kernels

You may also erase kernels that you know are not working in order to keep older, working kernels from being uninstalled on the next update.

Can't find root filesystem / error mounting /dev/root

  • A lot of these bugs turn out to be a broken initrd caused by bugs in mkinitrd. Get the user to attach the initrd for their kernel to the Bugzilla report, along with their /etc/modprobe.conf, or have them examine the contents themselves if they are capable of that.
  • Picking apart the initrd of a working and failing kernel and doing a diff of the init script can reveal clues.

To take apart an initrd, do the following:

mkdir initrd
cd initrd/
gzip -dc /boot/initrd-2.6.23-0.104.rc3.fc8.img | cpio -id

Another way to examine the initrd is with Midnight Commander. Add the extension .cpio.gz to the filename and then just place the cursor over the name and press Enter.

  • Unsupported mount options like "relatime" in /etc/fstab can cause problems. Removing any references to the "relatime" option and rebuilding the initrd will fix this.

Crashes/Hangs

  • Checking whether or not the CapsLock key (or NumLock or ScrollLock) causes the light on the keyboard to change state can be used as an indication of whether or not the kernel has hung completely, or if there is something else going on.
  • For boot related issues we need as much info as possible, so removing quiet rhgb from the boot flags should be the first thing to ask for.
  • Slowing down the speed of text output with boot_delay=1000 (the number may need to be tweaked higher/lower to suit) may allow the user to take a digital camera photo of the last thing on screen.
  • Booting with vga=791 (or even just vga=1 if the video card won't support 791) will put the framebuffer into high resolution mode to get more lines of text on screen, allowing more context for bug analysis.
  • initcall_debug shows the last thing the kernel tried to initialise before it hung.
  • Numerous switches that disable various features have at times proven useful for diagnosing failures:
  • acpi=off is a big hammer, and if that works, narrowing down by trying pci=noacpi instead may yield clues
  • nolapic and noapic are sometimes useful
  • nolapic_timer can be useful on i386; on x86_64 this option is called noapictimer
  • Because the tickless and high-resolution timer code is new and still seeing quite a few changes, nohz=off and/or highres=off may be worth testing (kernel 2.6.21 and above only).
  • If you get no output at all from the kernel, booting with earlyprintk=vga can sometimes yield something of interest.
  • If the kernel locks up with a 'soft lockup' report, booting with nosoftlockup will disable this check allowing booting to continue.
  • If the kernel locks up really early, booting with edd=skipmbr or edd=off may help
  • The system can hang because the clock isn't running properly, see System clock runs too fast/slow
  • Sometimes the system can hang because it is looking for nonexistent floppy drives. See Boot pauses probing floppy device
  • Sometimes multiple options are needed, e.g. clocksource=acpi_pm nohz=off highres=off
  • Try to narrow down the options needed to the absolute minimum. This helps the kernel maintainers find the underlying problem.
  • If it hangs after "Freeing unused kernel memory: 280k freed" you might have glibc.i686 installed on a processor that is not i686-capable. Replace it with glibc.i386 and make sure the "i686" and "nosegneg" directories are deleted.

Boot pauses probing floppy device

On some machines (mostly laptops with removable floppy drives), boot will pause while the nonexistent floppy device is probed. A series of the following messages will appear:

end_request: I/O error, dev fd0, sector 0
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0

This is caused by initrd's nash searching for filesystem labels on the floppy device. This problem can be avoided by adding floppy.allowed_drive_mask=0 to the kernel boot options.

Can't find installation CD/DVD or hard drives

  • Try pci=nomsi,nommconf. This disables PCI Message Signaled Interrupts and MMCONFIG.
  • Try booting with libata.dma=1 [use DMA only for hard drives] or libata.dma=0 [do not use DMA at all] . This can at least get the system installed, then the drivers can be updated.
  • Try the boot option pci=nocrs on 2.6.34 and later kernels.
  • The option pcie_aspm=off may be needed by some SCSI and RAID drivers (and some network drivers as well.)
  • Try disabling the AHCI driver by adding rdblacklist=ahci. This forces the generic drivers to be used, which may work, but sometimes very slowly.

Install runs very slowly

If the system runs very slowly, it may have a BIOS bug that causes part of the system memory to be uncached. Experimenting with the mem= parameter can work around this problem. For example, mem=1000M limits the system to 1000 megabytes of memory and may make the install run much faster.

Can't install

Sometimes, even with acpi=off or various other boot command line options, the kernel refuses to boot on some subsets of hardware. If none of the above tricks helps, then:

  • In rawhide bugs, if the report is something that would prevent someone from installing the next release (crashes during boot, doesn't find hard disks etc), mark the bug as blocking 'F9Blocker' (bug 235706).
  • if it's against the previously released version of Fedora, then it's possible that the problem was caused in a kernel bug that has since been fixed upstream. As Fedora constantly rebases to newer upstream kernels, they'll get picked up by the respins done by the folks at http://fedoraunity.org Suggest that the user tries an updated ISO if one is available.

Diagnosing "My machine locked up"

This can be a tricky one to diagnose. Most users don't have serial console capability, so we're mostly guessing in the dark.

  • For possible workarounds for this problem, see Crashes/Hangs
  • If it's repeatable, hooking up a serial cable to a second box can be useful for capturing kernel messages that may get printed just before the lockup. Configure the machine being debugged to boot with console=ttyS0,115200 console=tty0 and run a terminal program such as minicom on the other end. Configure the remote end to talk at the same baud rate (115200); in minicom: ctrl-a, p, i, enter. More info on setting up a serial terminal can be found at http://searchenterpriselinux.techtarget.com/tip/0,289483,sid39_gci1118136,00.html
  • Sometimes just getting lsmod output from users can yield enough clues if there are multiple reports with modules in common. (It also makes it possible to filter out reports from users of nvidia, vmware, etc.)
  • Hooking up a serial console can sometimes get debug info out of the machine.
  • If the hang happened whilst in X, the machine may still respond to ssh logins from other machines. Try this to get a dmesg.
  • The magic sysrq key might work. See QA/Sysrq for details.
  • Booting with nmi_watchdog=2 may cause a backtrace to be printed when the lockup happens.

Suspend/Resume failure

The most common failure mode is 'black screen on resuming', but the system may also hang while suspending.

  • Laptops using the nv driver should be considered hibernate-only capable as per https://www.redhat.com/archives/fedora-test-list/2007-September/msg00365.html
  • If the system fails to resume, see if the system is locked up completely by hitting the caps lock key.
  • If the capslock light doesn't toggle, or the failure is during suspend, try again, but this time activate the pm_trace functionality before suspending with echo 1 > /sys/power/pm_trace. This reprograms the real time clock to contain a few bytes of information that can be used to diagnose which driver failed to suspend or resume. After the hang, reboot, then run dmesg | grep "hash matches" to get a list of matches like this: hash matches device 0000:05:06.1. The last device on the list is likely the one that's causing problems. To find out which driver is responsible, look the device up in /sys/bus/pci/drivers/ using find /sys/bus/pci/drivers/ -name "0000:05:06.1". It will return a path similar to /sys/bus/pci/drivers/firewire_ohci/0000:05:06.1, which means that the firewire_ohci driver is causing trouble. Unloading the module using modprobe -r firewire_ohci should fix the suspend issues. Note that because pm_trace stores its data in the RTC, the system clock will be wrong after boot. To fix it, use system-config-date to set the correct date.
  • If the capslock light does toggle when resuming, then the system did come back up, and it's possible that we just failed to reinitialise the video. It may be useful to initiate the suspend from a tty (ctrl-alt-f1) and run pm-suspend ; dmesg > dmesg.out ; sync by hand. Upon resuming you'll now have some more debug info to sift through. Additionally, this way when it resumes, you already have a console logged in from which you can type commands 'blind'. Trying vbetool post for example may bring things back to life.
  • Proprietary 3d graphics driver users should test with respective open source drivers.
  • Try rmmod'ing various modules before doing the suspend. If this makes things work again, retry with a smaller set of modules unloaded. Keep retrying until you narrow down which module is to blame.
  • Another trick that sometimes works to force video to come back up is to enable the BIOS password. This makes the system resume in a VGA text mode that the kernel recovers from a lot easier. Not a real solution, but it can help to diagnose other problems.
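
The pm_trace lookup described above can be scripted. This is a sketch only: the function names last_hash_match and driver_for_device are made up for illustration, and the optional second argument exists so the lookup can be tried against a directory other than /sys/bus/pci/drivers.

```shell
# Print the device address from the last "hash matches device" line on stdin.
last_hash_match() {
    grep 'hash matches device' | tail -n 1 | awk '{ print $NF }'
}

# Print the name of the driver directory that contains the given device.
# $1: device address, $2: drivers directory (defaults to /sys/bus/pci/drivers)
driver_for_device() {
    dev=$1
    drivers_dir=${2:-/sys/bus/pci/drivers}
    find "$drivers_dir" -name "$dev" | while read -r path; do
        # The parent directory of the match is named after the driver.
        basename "$(dirname "$path")"
    done
}
```

After rebooting from a hang, driver_for_device "$(dmesg | last_hash_match)" should print the name of the suspect driver, such as firewire_ohci in the example above.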

System clock runs too fast/slow

  • Try a different clock source, e.g. clocksource=acpi_pm
  • Clock sources can be changed at runtime by writing the new clocksource name to the file /sys/devices/system/clocksource/clocksource0/current_clocksource, but be aware that changing to an unstable/broken clock source can hang the system. Changing tsc or jiffies to acpi_pm should be okay. (The list of available sources is in the file available_clocksource in the same directory.)
  • The kernel's tickless mode is enabled by default in Fedora 7 and 8, but can sometimes cause incorrect timekeeping. Using nohz=off highres=off will disable it.
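
Because switching to a broken clock source can hang the system, it may be worth checking the list of available sources before writing a new name. A minimal sketch follows; the names CS_DIR, clocksource_available and set_clocksource are hypothetical, and writing to the real sysfs file requires root.

```shell
# Directory holding the clocksource files named above (can be overridden).
CS_DIR=${CS_DIR:-/sys/devices/system/clocksource/clocksource0}

# Succeeds if the named clock source is listed in available_clocksource.
clocksource_available() {
    tr ' ' '\n' < "$CS_DIR/available_clocksource" | grep -qxF -- "$1"
}

# Switch to the named clock source, but only if it is listed as available.
set_clocksource() {
    if clocksource_available "$1"; then
        echo "$1" > "$CS_DIR/current_clocksource"
    else
        echo "clock source '$1' is not available" >&2
        return 1
    fi
}
```

For example, set_clocksource acpi_pm makes the change at runtime, and cat "$CS_DIR/current_clocksource" shows which source is in use.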

Sound card doesn't work

"High Definition Audio" devices

Many times the model can't be detected properly. Adding the correct model to the sound card driver's entry in /etc/modprobe.d/dist.conf will force the driver to use that model, e.g. options sound-card-0 model=3stack. Options for this driver are documented in the file /usr/share/doc/kernel-doc-<version>/Documentation/sound/alsa/ALSA-Configuration.txt in the kernel-doc package.

System hangs on reboot

Changing the reboot method can work around this problem. To force a reboot method other than the default, use the reboot= kernel option:

  • reboot=b forces reboot through the system BIOS.
  • reboot=w forces a "warm" reboot (no memory test).

These can be combined: reboot=b,w forces a warm reboot using the system BIOS.

Booting is slow

The first thing to do is isolate which part of the boot process is slow, to determine whether the fault lies in the kernel, the initrd scripts, or other parts of the boot process. One way to do this is with the bootchart application. Install it with yum; on the next reboot, profiling is done during boot. Running the command bootchart afterwards generates a .png file containing a graph that shows where the time was spent. If the kernel appears to stall during boot, booting with the parameter printk.time=1 inserts a timestamp before every message the kernel prints to its ring buffer. Retrieve these messages with dmesg and look for large deltas between two timestamps to isolate, for example, drivers that spend a long time initialising.
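
Scanning a long dmesg log for large deltas by eye is tedious, so a small filter can help. This is a sketch that assumes timestamps in the usual "[   12.345678]" form at the start of each line; the function name large_gaps and its threshold argument are hypothetical.

```shell
# Print kernel messages that were logged more than $1 seconds (default 1)
# after the previous timestamped message. Reads dmesg output from stdin.
large_gaps() {
    awk -v min="${1:-1}" '
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            # Strip the surrounding brackets and convert to a number.
            ts = substr($0, 2, RLENGTH - 2) + 0
            if (seen && ts - prev > min)
                printf "%.6f s gap before: %s\n", ts - prev, $0
            prev = ts
            seen = 1
        }'
}
```

For example, dmesg | large_gaps 0.5 lists every message that followed a pause of more than half a second.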

Creation of slab failed

In Rawhide/devel kernels (and in -debug flavors of released kernels), Fedora uses the SLUB allocator with full slab debugging enabled by default. The debugging might cause problems in some rare cases: memory allocations can fail, causing the system to panic. Slab debugging can be disabled with the option slub_debug=- (a single minus sign). Note that this option will hide an actual bug that really should be reported and fixed rather than worked around.

USB devices don't work

This can be caused by USB autosuspend stopping and starting devices repeatedly. To disable autosuspend globally, use the kernel option usbcore.autosuspend=-1.

Problems with PCMCIA / PC Card adapters

By default, the kernel only reserves a fairly small amount of memory and I/O space for PC Card adapters. Some adapters need more space, or will not work within the default range of addresses.

  • The amount of memory allocated can be set using the cbmemsize suboption of the pci= kernel option. The default is 64 megabytes, but it can be changed to e.g. 256 megabytes using pci=cbmemsize=256M. Going over 256M is not recommended.
  • The default for CardBus I/O space is 256 bytes, but it can be changed using the cbiosize suboption, e.g. to change the size to 4096 bytes, use pci=cbiosize=4096. Setting this to a value larger than 4096 may cause problems.

nVidia SATA controllers don't recognize all connected drives

  • Try the kernel parameter pnpacpi=off

CPU stuck at the lowest frequency on ThinkPad machines

ThinkPad users who see their system throttled as soon as the processor module gets loaded and without obvious reason should check the contents of this file:

/sys/devices/system/cpu/cpu0/cpufreq/bios_limit

If it is set to the lowest value, pass the processor.ignore_ppc=1 boot parameter as a workaround. (See kernel.org bug #16382 for details.)
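
One way to tell whether the limit is at the lowest value is to compare it with the minimum frequency the CPU reports. The sketch below assumes that the cpuinfo_min_freq file in the same cpufreq directory holds that minimum; the function name bios_limit_is_lowest is hypothetical.

```shell
# Succeeds if bios_limit equals the lowest frequency the CPU reports.
# $1: cpufreq directory (defaults to the one for cpu0)
bios_limit_is_lowest() {
    dir=${1:-/sys/devices/system/cpu/cpu0/cpufreq}
    # Assumption: cpuinfo_min_freq holds the lowest supported frequency.
    [ "$(cat "$dir/bios_limit")" = "$(cat "$dir/cpuinfo_min_freq")" ]
}
```

For example, bios_limit_is_lowest && echo "BIOS limit is at the minimum" prints the message only on an affected system.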

Can't decrypt drive / encryption password not accepted

Try adding the boot option rdblacklist=aesni-intel. You may also have to blacklist the aesni-intel driver by adding a blacklist entry in /etc/modprobe.d.

Elantech trackpad not recognized as a trackpad

You can add the option psmouse.force_elantech=1 to force recognition. This requires at least kernel 2.6.34 to work.

Systems with nVidia adapters using the nouveau driver lock up randomly

Try adding the boot option nouveau.noaccel=1.

Network drives using the CIFS filesystem get inconsistent data when reading files

This is fixed in recent kernel updates, but can be worked around by adding noserverino to the mount options.

PCI Devices Not Recognized / AHCI: "failed to stop engine"

On kernel version 2.6.34 and later, ACPI is used to determine PCI resources. Some machines have bugs in their ACPI BIOS code and fail to configure resources properly. Try using pci=nocrs to disable use of ACPI for resource enumeration.

Unable to Allocate Memory / page allocation failure

Heavily-loaded network servers may have trouble allocating memory even though there is no shortage. Try setting the sysctl vm.min_free_kbytes to 65536 in order to keep additional memory free for allocation by network drivers.
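
The value can be set for the running system with sysctl -w vm.min_free_kbytes=65536 (as root). To keep it across reboots it has to be written to a sysctl configuration file. The following sketch shows one way to do that; the function name persist_sysctl is hypothetical, and /etc/sysctl.conf as the default file is an assumption.

```shell
# Add or replace a "key = value" line in a sysctl configuration file.
# $1: key, $2: value, $3: file (defaults to /etc/sysctl.conf)
persist_sysctl() {
    key=$1
    value=$2
    conf=${3:-/etc/sysctl.conf}
    tmp=$(mktemp) || return 1
    # Copy every line except an existing setting for this key.
    # index() compares literally, so dots in the key are not wildcards.
    awk -v key="$key" '{
        line = $0
        sub(/^[ \t]+/, "", line)
        if (index(line, key) == 1 && substr(line, length(key) + 1) ~ /^[ \t]*=/) next
        print
    }' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    echo "$key = $value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

For example, persist_sysctl vm.min_free_kbytes 65536 followed by sysctl -p stores the setting and reloads the file.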

See Also...