KernelBugClassification

From FedoraProject

Revision as of 19:41, 29 April 2013 by Jwboyer

Kernel Subsystem classification

Although the kernel is a single component in Bugzilla, it is a large and diverse collection of drivers and subsystems. When doing bug triage, it can be difficult to determine exactly what is causing the problem being reported. This is an overview intended to help narrow things down.

The Backtrace

The main source of information we get for kernel problems is the backtrace from an oops, WARN_ON, or kernel panic. It lists the running kernel version, the modules that were loaded, and, most importantly, the area of the kernel that hit the issue. An example is below.

WARNING: at lib/list_debug.c:62 __list_del_entry+0x82/0xd0()
Hardware name: MacBookPro9,2
list_del corruption. next->prev should be ffff880107c69150, but was 4c4f56452d580a0d
Modules linked in: binfmt_misc tcp_lp ebtable_nat fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables rfcomm bnep be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nls_utf8 hfsplus snd_hda_codec_hdmi snd_hda_codec_cirrus btusb snd_hda_intel uvcvideo snd_hda_codec snd_hwdep arc4 videobuf2_vmalloc snd_seq b43 videobuf2_memops videobuf2_core videodev media usblp bcm5974 bluetooth snd_seq_device mac80211 snd_pcm cfg80211 rfkill ssb bcma iTCO_wdt iTCO_vendor_support lpc_ich mfd_core snd_page_alloc snd_timer snd i2c_i801 mei soundcore coretemp microcode joydev apple_gmux apple_bl applesmc input_polldev vhost_net tun macvtap macvlan kvm_intel kvm uinput crc32c_intel ghash_clmulni_intel i915 firewire_ohci tg3 sdhci_pci i2c_algo_bit sdhci drm_kms_helper firewire_core ptp mmc_core pps_core crc_itu_t drm i2c_core video sunrpc
Pid: 17477, comm: kworker/u:80 Not tainted 3.8.8-203.fc18.x86_64 #1
Call Trace:
 [<ffffffff8105e675>] warn_slowpath_common+0x75/0xa0
 [<ffffffff8105e756>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff8130d182>] __list_del_entry+0x82/0xd0
 [<ffffffff81492b92>] xhci_drop_ep_from_interval_table+0x82/0x1e0
 [<ffffffff814950bf>] xhci_discover_or_reset_device+0x23f/0x3d0
 [<ffffffff8145fd63>] hub_port_reset+0x293/0x570
 [<ffffffff81460993>] hub_port_init+0x73/0xaf0
 [<ffffffff81469025>] ? usb_hcd_reset_endpoint+0x25/0x70
 [<ffffffff8146bd75>] ? usb_enable_endpoint+0xa5/0xb0
 [<ffffffff81461906>] usb_reset_and_verify_device+0x106/0x6c0
 [<ffffffff8146b09c>] ? usb_get_status+0x9c/0xd0
 [<ffffffff81464318>] usb_port_resume+0x3e8/0x5e0
 [<ffffffff81097b08>] ? __enqueue_entity+0x78/0x80
 [<ffffffff8145da90>] ? usb_dev_thaw+0x20/0x20
 [<ffffffff81476b95>] generic_resume+0x15/0x30
 [<ffffffff8146d5c5>] usb_resume_both+0x105/0x150
 [<ffffffff8146e3bf>] usb_resume+0x1f/0xd0
 [<ffffffff8145da90>] ? usb_dev_thaw+0x20/0x20
 [<ffffffff8145daa3>] usb_dev_resume+0x13/0x20
 [<ffffffff813f53f8>] dpm_run_callback+0x58/0x90
 [<ffffffff813f5dce>] device_resume+0xde/0x200
 [<ffffffff813f5f11>] async_resume+0x21/0x50
 [<ffffffff81089920>] async_run_entry_fn+0xb0/0x1b0
 [<ffffffff8107a623>] process_one_work+0x163/0x480
 [<ffffffff8107ce5e>] worker_thread+0x15e/0x450
 [<ffffffff8107cd00>] ? busy_worker_rebind_fn+0x110/0x110
 [<ffffffff81081f30>] kthread+0xc0/0xd0
 [<ffffffff81010000>] ? perf_trace_xen_cpu_write_gdt_entry+0xa0/0x100
 [<ffffffff81081e70>] ? kthread_create_on_node+0x120/0x120
 [<ffffffff8165baac>] ret_from_fork+0x7c/0xb0
 [<ffffffff81081e70>] ? kthread_create_on_node+0x120/0x120

Let's break down the backtrace into components.

The Hardware name line lists the machine type. In this case, it is a MacBook Pro (MacBookPro9,2).

The Modules linked in line lists all of the modules loaded in the kernel at the time of the issue.

The line that starts with Pid: lists the ID of the process that was running when the issue happened, the name of that process, any kernel taint flags that were set ("Not tainted" if none), and the version of the kernel.
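
As a rough sketch of pulling these header lines out of a saved report, the Python below uses regular expressions written against the 3.8-era format shown on this page. The function name parse_header and its dictionary keys are illustrative choices, not part of any Fedora or kernel tool, and other kernel versions may lay these lines out differently.

```python
import re

# Patterns for the header lines of a 3.8-era report (assumed format).
HARDWARE_RE = re.compile(r"^Hardware name: (?P<hardware>.+)$", re.MULTILINE)
MODULES_RE = re.compile(r"^Modules linked in: (?P<modules>.*)$", re.MULTILINE)
# Taint text is either "Not tainted" or "Tainted: " plus flag letters padded
# with spaces; the kernel version follows, then " #<build number>".
PID_RE = re.compile(
    r"^Pid: (?P<pid>\d+), comm: (?P<comm>.+?) "
    r"(?P<taint>Not tainted|Tainted: [A-Z ]+?)\s+(?P<version>\S+) #",
    re.MULTILINE,
)


def parse_header(report: str) -> dict:
    """Return hardware name, module list, pid, process name, taint status
    and kernel version; fields whose line is missing keep their defaults."""
    info = {"hardware": None, "modules": [], "pid": None,
            "comm": None, "tainted": None, "version": None}
    m = HARDWARE_RE.search(report)
    if m:
        info["hardware"] = m.group("hardware").strip()
    m = MODULES_RE.search(report)
    if m:
        info["modules"] = m.group("modules").split()
    m = PID_RE.search(report)
    if m:
        info["pid"] = int(m.group("pid"))
        info["comm"] = m.group("comm")
        info["tainted"] = m.group("taint") != "Not tainted"
        info["version"] = m.group("version")
    return info


# Header of the first example, with the module list shortened.
report = """Hardware name: MacBookPro9,2
Modules linked in: binfmt_misc tcp_lp fuse i915 sunrpc
Pid: 17477, comm: kworker/u:80 Not tainted 3.8.8-203.fc18.x86_64 #1"""
info = parse_header(report)
print(info["hardware"], info["comm"], info["version"], "i915" in info["modules"])
# prints: MacBookPro9,2 kworker/u:80 3.8.8-203.fc18.x86_64 True
```

Checking whether a suspect module appears in the report is then a simple membership test on info["modules"].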

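The rest of the output is the backtrace (Call Trace). This lists the functions on the kernel stack that were executed on the way to this issue, starting with the most recently executed function. Each entry has the format:

Address within the function: [<XXXXXXXXXXXXXXXX>]
Function name: string_of_function_name
Offset within the function / total size of the function: +0xNNN/0xNNN

Entries with a ? before the function name are addresses that were found on the stack but could not be confirmed as part of the current call chain (often leftovers from earlier calls), so give them less weight than the unmarked entries.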
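
The offset/size pair is enough to locate the whole function in memory: subtracting the offset from the address gives the function's start, and adding the size to that gives its end. A minimal Python sketch, using the __list_del_entry frame from the trace above (function_bounds is a hypothetical helper name, not an existing tool):

```python
import re

# "name+0xOFFSET/0xSIZE" as printed in each call trace entry.
SYMBOL_RE = re.compile(
    r"(?P<name>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)")


def function_bounds(address: str, symbol: str) -> tuple[int, int]:
    """Return the (start, end) addresses of the function a frame falls in,
    given the hex digits from [<...>] and the name+offset/size text."""
    m = SYMBOL_RE.fullmatch(symbol)
    if m is None:
        raise ValueError(f"unexpected symbol format: {symbol!r}")
    start = int(address, 16) - int(m.group("offset"), 16)
    end = start + int(m.group("size"), 16)
    return start, end


# Frame from the first backtrace: [<ffffffff8130d182>] __list_del_entry+0x82/0xd0
start, end = function_bounds("ffffffff8130d182", "__list_del_entry+0x82/0xd0")
print(hex(start), hex(end))  # 0xffffffff8130d100 0xffffffff8130d1d0
```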

If a function in the backtrace belongs to a module loaded into the system rather than code built into the kernel, the line ends with an additional section giving the module name in brackets, e.g.

[<f82013dc>] intel_wait_for_pipe_off+0x14c/0x180 [i915]
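
Putting those pieces together, a single call trace entry can be split into its parts with one regular expression. This is a sketch that assumes the x86 format shown on this page; Frame and parse_frame are hypothetical names, and entries printed with a leading ? are kept but flagged as unreliable.

```python
import re
from typing import NamedTuple, Optional

# One call trace entry, e.g.
#  [<ffffffffa01cb47a>] btrfs_alloc_free_block+0x35a/0x370 [btrfs]
#  [<ffffffff81082920>] ? autoremove_wake_function+0x50/0x50
FRAME_RE = re.compile(
    r"^\s*\[<(?P<address>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<function>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?\s*$"
)


class Frame(NamedTuple):
    address: int
    function: str
    offset: int
    size: int
    module: Optional[str]  # None for code built into the kernel
    reliable: bool         # False for entries printed with a leading "?"


def parse_frame(line: str) -> Optional[Frame]:
    """Split one call trace line into its fields, or return None."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    return Frame(
        address=int(m.group("address"), 16),
        function=m.group("function"),
        offset=int(m.group("offset"), 16),
        size=int(m.group("size"), 16),
        module=m.group("module"),
        reliable=m.group("unreliable") is None,
    )


for line in ("[<f82013dc>] intel_wait_for_pipe_off+0x14c/0x180 [i915]",
             " [<ffffffff81082920>] ? autoremove_wake_function+0x50/0x50"):
    print(parse_frame(line))
```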

Determining the driver or subsystem

Let's look at some example backtraces.

WARNING: at fs/btrfs/extent-tree.c:6337 btrfs_alloc_free_block+0x35a/0x370 [btrfs]()
Hardware name: HP ENVY dv7 Notebook PC
Modules linked in: usb_storage ebtable_nat xt_CHECKSUM bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables rfcomm bnep be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq arc4 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core iwldvm snd_seq_device videodev snd_pcm mac80211 media snd_page_alloc rtsx_pci_sdmmc snd_timer rtsx_pci_ms snd mmc_core iTCO_wdt iwlwifi btusb iTCO_vendor_support cfg80211 bluetooth i2c_i801 soundcore coretemp hp_wmi joydev memstick sparse_keymap rfkill hp_accel rtsx_pci lpc_ich mfd_core microcode mei lis3lv02d input_polldev vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc uinput btrfs zlib_deflate libcrc32c i915 crc32c_intel i2c_algo_bit drm_kms_helper ghash_clmulni_intel drm r8169 mii i2c_core wmi video hid_logitech_dj sunrpc
Pid: 2636, comm: btrfs-transacti Not tainted 3.8.5-201.fc18.x86_64 #1
Call Trace:
 [<ffffffff8105e62f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff8105e726>] warn_slowpath_fmt+0x46/0x50
 [<ffffffffa01cb47a>] btrfs_alloc_free_block+0x35a/0x370 [btrfs]
 [<ffffffff81082920>] ? autoremove_wake_function+0x50/0x50
 [<ffffffffa01d40c0>] ? verify_parent_transid+0x170/0x170 [btrfs]
 [<ffffffffa01fb093>] ? read_extent_buffer+0xc3/0x120 [btrfs]
 [<ffffffffa01b7caa>] __btrfs_cow_block+0x12a/0x510 [btrfs]
 [<ffffffffa01b8234>] btrfs_cow_block+0x124/0x1c0 [btrfs]
 [<ffffffffa01bae86>] push_leaf_left+0x116/0x1a0 [btrfs]
 [<ffffffffa01d80e9>] ? btrfs_mark_buffer_dirty+0x99/0xf0 [btrfs]
 [<ffffffffa01be43f>] btrfs_del_items+0x31f/0x4a0 [btrfs]
 [<ffffffffa01d1b88>] btrfs_del_csums+0x298/0x310 [btrfs]
 [<ffffffffa01c526a>] __btrfs_free_extent+0x5fa/0x870 [btrfs]
 [<ffffffffa01c8dc9>] run_clustered_refs+0x2f9/0xb50 [btrfs]
 [<ffffffffa021d143>] ? find_ref_head+0x83/0xf0 [btrfs]
 [<ffffffffa01cce48>] btrfs_run_delayed_refs+0xc8/0x2f0 [btrfs]
 [<ffffffffa01dc6a6>] btrfs_commit_transaction+0x86/0xa70 [btrfs]
 [<ffffffff810828d0>] ? wake_up_bit+0x40/0x40
 [<ffffffffa01d57c5>] transaction_kthread+0x1a5/0x220 [btrfs]
 [<ffffffffa01d5620>] ? btree_readpage_end_io_hook+0x290/0x290 [btrfs]
 [<ffffffff81081fc0>] kthread+0xc0/0xd0
 [<ffffffff81010000>] ? ftrace_define_fields_xen_mc_entry+0xa0/0xf0
 [<ffffffff81081f00>] ? kthread_create_on_node+0x120/0x120
 [<ffffffff81658b6c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81081f00>] ? kthread_create_on_node+0x120/0x120

Here we see that code in the btrfs filesystem triggered a WARN on a condition it hit. The line

WARNING: at fs/btrfs/extent-tree.c:6337 btrfs_alloc_free_block+0x35a/0x370 [btrfs]()

tells us exactly where in the source tree this WARN_ON is located: line 6337 of the extent-tree.c file in the fs/btrfs/ directory. That is a clear indicator that this is a btrfs filesystem issue. However, many backtraces are not produced by a WARN/WARN_ON, so let's look at one of those.
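
Because the WARNING line names the source file, it can be parsed directly, and the file's directory (fs/btrfs here) is usually a good first guess at the subsystem. The sketch below assumes the 3.8-era layout shown above; newer kernels may print this line in a different form, and parse_warning and WarnSite are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# "WARNING: at <path>:<line> <function>+0x../0x.. [module]()" (assumed format)
WARNING_RE = re.compile(
    r"^WARNING: at (?P<path>\S+):(?P<line>\d+) "
    r"(?P<function>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>[\w-]+)\])?"
)


class WarnSite(NamedTuple):
    path: str
    line: int
    function: str
    module: Optional[str]  # None when the code is built into the kernel
    directory: str         # source directory, a rough pointer to the subsystem


def parse_warning(text: str) -> Optional[WarnSite]:
    """Return where a WARN fired, or None if text is not a WARNING line."""
    m = WARNING_RE.match(text)
    if m is None:
        return None
    path = m.group("path")
    return WarnSite(path=path, line=int(m.group("line")),
                    function=m.group("function"), module=m.group("module"),
                    directory=path.rsplit("/", 1)[0] if "/" in path else "")


site = parse_warning("WARNING: at fs/btrfs/extent-tree.c:6337 "
                     "btrfs_alloc_free_block+0x35a/0x370 [btrfs]()")
print(site.directory, site.line, site.module)  # fs/btrfs 6337 btrfs
```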

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8130aa06>] __list_add+0x26/0xd0
PGD 124017067 PUD 11b1a6067 PMD 0 
Oops: 0000 [#1] SMP 
Modules linked in: fuse ebtable_nat xt_CHECKSUM bridge stp llc ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi rfcomm bnep scsi_transport_iscsi vfat fat snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_intel snd_hda_codec rc_alink_dtu_m snd_hwdep mt2060 snd_seq snd_seq_device snd_pcm af9013 snd_page_alloc snd_timer dvb_usb_af9015 arc4 ath9k ath9k_common ath9k_hw ath mac80211 ath3k snd uvcvideo videobuf2_vmalloc dvb_usb_v2 videobuf2_memops btusb bluetooth dvb_core videobuf2_core cfg80211 videodev soundcore rc_core asus_nb_wmi asus_wmi sparse_keymap media rfkill k10temp i2c_piix4 microcode joydev vhost_net tun macvtap macvlan kvm_amd kvm uinput binfmt_misc dm_crypt ata_generic pata_acpi r8169 mii ums_realtek pata_atiixp usb_storage radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core wmi video sunrpc
CPU 0 
Pid: 26849, comm: kdvb-ad-0-fe-0 Not tainted 3.8.5-201.fc18.x86_64 #1 ASUSTeK COMPUTER INC. K53BE/K53BE
RIP: 0010:[<ffffffff8130aa06>]  [<ffffffff8130aa06>] __list_add+0x26/0xd0
RSP: 0018:ffff880106951ce8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880106951d20 RCX: 0000000000000000
RDX: ffff8802402969c0 RSI: 0000000000000000 RDI: ffff880106951d20
RBP: ffff880106951d08 R08: 0000000000000000 R09: 0000000000000174
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8802402969c0
R13: 0000000000000000 R14: 00000000ffffffff R15: ffff8802402969c0
FS:  00007f971cc4f7c0(0000) GS:ffff88024ec00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000013ea83000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kdvb-ad-0-fe-0 (pid: 26849, threadinfo ffff880106950000, task ffff8801898e4620)
Stack:
 ffffffff8164def3 ffff8802402969b8 ffff8802402969bc ffff8801898e4620
 ffff880106951d68 ffffffff8164df2e ffff880106951d48 ffffffff81097a40
 ffff88024ec13df0 ffff8801898e4668 ffff88024ec13df0 ffff8802402969b8
Call Trace:
 [<ffffffff8164def3>] ? __mutex_lock_interruptible_slowpath+0x63/0x170
 [<ffffffff8164df2e>] __mutex_lock_interruptible_slowpath+0x9e/0x170
 [<ffffffff81097a40>] ? account_entity_dequeue+0x80/0xa0
 [<ffffffff8164e042>] mutex_lock_interruptible+0x42/0x50
 [<ffffffffa050b268>] af9015_af9013_init+0x58/0xa0 [dvb_usb_af9015]
 [<ffffffffa0223d41>] dvb_usb_fe_init+0xb1/0x170 [dvb_usb_v2]
 [<ffffffffa03b7c9b>] dvb_frontend_init+0x2b/0xb0 [dvb_core]
 [<ffffffffa03bae45>] dvb_frontend_thread+0x85/0x700 [dvb_core]
 [<ffffffff8164ed86>] ? __schedule+0x3c6/0x7a0
 [<ffffffffa03badc0>] ? dvb_register_frontend+0x1d0/0x1d0 [dvb_core]
 [<ffffffff81081fc0>] kthread+0xc0/0xd0
 [<ffffffff81010000>] ? ftrace_define_fields_xen_mc_entry+0xa0/0xf0
 [<ffffffff81081f00>] ? kthread_create_on_node+0x120/0x120
 [<ffffffff81658b6c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81081f00>] ? kthread_create_on_node+0x120/0x120
Code: 00 00 00 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 48 89 fb 4c 89 6d f8 4c 8b 42 08 49 89 f5 49 89 d4 49 39 f0 75 31 <4d> 8b 45 00 4d 39 c4 75 6f 4c 39 e3 74 45 4c 39 eb 74 40 49 89 
RIP  [<ffffffff8130aa06>] __list_add+0x26/0xd0
 RSP <ffff880106951ce8>
CR2: 0000000000000000

Here we see an OOPS in the __list_add function, which is a common kernel facility for adding a member to a linked list. That doesn't mean that the list manipulation functions are broken; it simply means that this is the function that was executing when the kernel tripped up. Normally that is the result of some other driver or subsystem hitting an error while it was using these functions. If we look further down in the backtrace, we can see that the crash happened inside mutex_lock_interruptible, called from af9015_af9013_init, the initialization function of a driver in the DVB subsystem. The dvb_usb_af9015 driver and the DVB subsystem are the likely starting points for triaging this issue.
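
The reasoning above (skip generic helpers and ? entries until the first frame tagged with a module) can be approximated in a few lines of Python. It is only a heuristic: the first module in the trace is not always the culprit, since it may simply be the caller that tripped over someone else's bad data, so treat the result as a starting point for triage. first_module_frame is a hypothetical helper name.

```python
import re
from typing import Optional

# A reliable call trace entry ending in a [module] tag. Entries printed with
# a leading "?" do not match, because "?" is not a valid function-name char.
MODULE_FRAME_RE = re.compile(
    r"^\s*\[<[0-9a-f]+>\]\s+(?P<function>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"\s+\[(?P<module>[\w-]+)\]\s*$"
)


def first_module_frame(trace: str) -> Optional[tuple[str, str]]:
    """Return (function, module) for the topmost reliable frame that lives
    in a loadable module, or None if every frame is built into the kernel."""
    for line in trace.splitlines():
        m = MODULE_FRAME_RE.match(line)
        if m:
            return m.group("function"), m.group("module")
    return None


trace = """Call Trace:
 [<ffffffff8164def3>] ? __mutex_lock_interruptible_slowpath+0x63/0x170
 [<ffffffff8164df2e>] __mutex_lock_interruptible_slowpath+0x9e/0x170
 [<ffffffff81097a40>] ? account_entity_dequeue+0x80/0xa0
 [<ffffffff8164e042>] mutex_lock_interruptible+0x42/0x50
 [<ffffffffa050b268>] af9015_af9013_init+0x58/0xa0 [dvb_usb_af9015]
 [<ffffffffa0223d41>] dvb_usb_fe_init+0xb1/0x170 [dvb_usb_v2]"""
print(first_module_frame(trace))  # ('af9015_af9013_init', 'dvb_usb_af9015')
```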

Common problem areas

Some backtraces don't have a module in the Call Trace at all because the code involved is built into the kernel. In those cases the function-name prefix is usually the best clue. Some common areas we see are below.

xhci_*, ehci_*, ohci_*: USB subsystem
ext[234]_*: Ext4 filesystem (we use the ext4 code for ext2/ext3)
acpi_*: ACPI
efifb_*, vesafb_*: Framebuffer subsystem
ieee80211_*, mac80211_*: Wireless networking subsystem
snd_*: ALSA subsystem
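
When no module appears in the trace, the prefixes listed here can be checked mechanically. Python's fnmatch module happens to understand the same * and [234] wildcards used in this list, so the patterns can be copied as-is. The table below covers only these entries, guess_subsystem is a hypothetical name, and a prefix match is a hint rather than a diagnosis.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Function-name patterns for common built-in areas, copied from the list.
SUBSYSTEM_PATTERNS = [
    (("xhci_*", "ehci_*", "ohci_*"), "USB subsystem"),
    (("ext[234]_*",), "Ext4 filesystem"),
    (("acpi_*",), "ACPI"),
    (("efifb_*", "vesafb_*"), "Framebuffer subsystem"),
    (("ieee80211_*", "mac80211_*"), "Wireless networking subsystem"),
    (("snd_*",), "ALSA subsystem"),
]


def guess_subsystem(function: str) -> Optional[str]:
    """Return the subsystem whose pattern matches a function name, or None."""
    for patterns, subsystem in SUBSYSTEM_PATTERNS:
        # fnmatchcase is case-sensitive, matching how kernel symbols print.
        if any(fnmatchcase(function, p) for p in patterns):
            return subsystem
    return None


# A built-in frame from the first backtrace on this page.
print(guess_subsystem("xhci_drop_ep_from_interval_table"))  # USB subsystem
```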

There are many others as well. As you get familiar with the kernel and its various subsystems, you'll eventually be able to narrow down where an OOPS has come from.