From Fedora Project Wiki
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

Boot problem (jreiser)

[2011-08-06 0000 UTC] Upon boot of DVD, then dracut drops to shell with the message:

 Warning: No root device "live:/dev/disk/by-label/Fedora" found.

The file is present as

 /dev/disk/by-label/Fedora 16 x86_64 DVD

with those three spaces. Reboot with quoting (single, or double, or backslash) also fails. Altering the boot parameter to

 root=ID=ata_SONY_DVD_RW_AW_G-170A

also fails [that name is in my /dev/disk/by-id].

[2011-08-06 1700 UTC] After modifying pungi so that the label of the DVD has dashes "Fedora-16-x86_64-DVD" and changing the boot command line to include the dashes "root=live:CDLABEL=Fedora-16-x86_64-DVD", then dracut still complains "No root device \"live:/dev/disk/by-label/Fedora-16-x86_64-DVD\" found". The device is there, symlinked to ../../sr0 ==> /dev/sr0, and that is the correct name of the hardware drive.

[2011-08-07 0400 UTC] In the emergency shell after dracut cannot find the root, I can find it by:

 mkdir /mnt1 /mnt2 /mnt3
 mount -o ro,loop /dev/sr0 /mnt1
 mount -o ro,loop /mnt1/images/install.img /mnt2
 mount -o ro,loop /mnt2/LiveOS/rootfs.img /mnt3

but I cannot find how to specify that in a "root=live..." that dracut understands. It's also a mystery why so many mounts are necessary.

notes/solutions (wwoods)

  1. The DVD isn't built correctly - pungi changes the label of the image without changing the boot config file. Pungi will need to be fixed to handle that. (boot.iso should work fine.)
    • I believe dracut expects spaces to be escaped, so it would need to be: root=live:CDLABEL=Fedora%x2016%x20x86_64%x20DVD
    • As a workaround, you could just boot with root=live:/dev/sr0
  2. The mounts are necessary because treebuilder images are live images, and all live images are ext4 images inside squashfs images inside an iso (CD/DVD) or vfat (USB stick) filesystem.
    • Yes, that does seem needlessly complicated
    • No, there isn't a simpler way to do it right now


mount traps when systemd mounts /tmp (hamzy)

boot the 08/04/2011 iso image with the following options

  linux systemd.log_target=kmsg systemd.log_level=debug rd.break

Seen during booting:

...
[   11.992445] systemd[1]: tmp.mount mount process exited, code=killed status=11
[   12.029660] systemd[1]: Job loader.service/start failed with result 'dependency'.
[   12.029695] systemd[1]: Unit tmp.mount entered failed state.
...
bash-4.2# systemctl status tmp.mount
[   71.988204] systemd[1]: Accepted connection on private bus.
[   71.988860] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.LoadUnit() on /org/freedesktop/systemd1
[   71.989077] systemd[1]: Got D-Bus request: org.freedesktop.DBus.Properties.GetAll() on /org/freedesktop/systemd1/unit/tmp_2emount
tmp.mount - Runtime Directory
          Loaded: loaded (/lib/systemd/system/tmp.mount)
          Active: inactive (dead)
           Where: /tmp
            What: tmpfs
          CGroup: name=systemd:/system/tmp.mount
[   71.990451] systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local

FIX:

boot with the following options

  linux systemd.log_target=kmsg systemd.log_level=debug rd.break

run the following command in the shell

  sed -i -e 's/tmp\.mount //' /sysroot/lib/systemd/system/loader.service

notes (wwoods)

  • Only seems to happen on PPC64 systems
  • This workaround has been added to the treebuilder branch; should be fixed for images built on or after Aug. 9

Slow (jreiser)

Treebuilder is slower. The first culprit looks like dracut.

number execve
7606 /bin/egrep
2578 /bin/cp
1633 /lib64/ld-linux-x86-64.so.2
1434 /usr/bin/ldd
1413 /bin/ln
1303 /usr/local/bin/ln
1303 /usr/bin/ln
1087 /sbin/modinfo

The second culprit is not honoring $TMPDIR, and not putting yumroot-$PID and installroot-$PID inside it.

count filename
1693845 *
253387 $DESTDIR/yumroot/...
79977 $DESTDIR/installroot/...
44486 /etc/xattr.conf
41713 /proc/self/task/31818/attr/fscreate
25919 /proc/self/task/1070/attr/fscreate
20220 /usr
19487 /etc/localtime
18485 /lib64/libc.so.6
18294 /usr/share
18184 /etc/ld.so.cache
18074 /etc/ld.so.preload
17213 /usr/bin/ldd
14471 /bin/egrep
9861 /usr/share/locale
9090 /var/log/dracut.log
9013 /dev/null
7529 /sbin/modprobe
6110 .

commentary (wwoods)

  • This is pretty useless without some hard numbers. On my years-old test systems, I can complete a run in the following times:
Build type Arch Time Notes
treebuilder x86_64 19:15 install.img: 125M
treebuilder (bcj=off) x86_64 15:07 install.img: 127M
treebuilder ppc64 40:10 install.img: 134M
treebuilder (bcj=off) ppc64 37:43 install.img: 134M

(clearly I should turn bcj off by default on ppc64)

speedups

  • bcj=off: Add this to /etc/lorax/lorax.conf
[compression]
bcj=off

The anaconda installer starts but then hangs (hamzy)

boot the 08/04/2011 iso image with the following options

  linux systemd.log_target=kmsg systemd.log_level=debug rd.break

run the following command in the shell

  sed -i -e 's/tmp\.mount //' /sysroot/lib/systemd/system/loader.service
  exit

You will see and then nothing else

  Starting Anaconda version 16.14.

Change the file /lib/systemd/system/loader.service to start the loader like this

  ExecStart=/usr/bin/strace -f -e write=3,4,5 /sbin/loader

This will run strace on the loader and produce the following output

[pid   606] <... read resumed> "", 4096) = 0
[pid   606] --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=910, si_status=0, si_utime=0, si_stime=1} (Child exited) ---
[pid   606] read(11, "", 4096)          = 0
[pid   606] waitpid(910, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 910
[pid   606] close(9)                    = 0
[pid   606] close(11)                   = 0
[pid   606] access("/mnt/install/source/.treeinfo", R_OK) = -1 ENOENT (No such file or directory)
[pid   606] umount("/mnt/install/source", 0) = 0
[pid   606] time(NULL)                  = 1312818684
[pid   606] send(6, "<143>Aug  8 15:51:24 loader: in "..., 62, MSG_NOSIGNAL) = 62
[pid   606] write(4, "15:51:24,021 DEBUG loader: in do"..., 61) = 61
 | 00000  31 35 3a 35 31 3a 32 34  2c 30 32 31 20 44 45 42  15:51:24 ,021 DEB |
 | 00010  55 47 20 6c 6f 61 64 65  72 3a 20 69 6e 20 64 6f  UG loade r: in do |
 | 00020  4c 6f 61 64 65 72 4d 61  69 6e 2c 20 73 74 65 70  LoaderMa in, step |
 | 00030  20 3d 20 53 54 45 50 5f  4c 41 4e 47 0a            = STEP_ LANG.    |
[pid   606] select(3, [2], NULL, NULL, {0, 0}) = 0 (Timeout)
[pid   606] write(1, "\33[2;21H\33[m\17\33[30m\33[47m\342\224\214\342\224\200\342\224\200\342\224"..., 1926) = 1926
[pid   606] select(1, [0], [], [], NULL

The loader gets as far as outputting the following statement

  15:06:02,581 DEBUG loader: in doLoaderMain, step = STEP_LANG.

However, I think that there is still unbuffered strace output that did not make it to the screen.

TODO:

Try to set up debugging with gdbserver

  gdbserver :9999 /sbin/loader

$HOME? (jreiser)

During install from boot.iso DVD, then /mnt/sysimage has a "home/<<me>>" directory which contains a copy of my user $HOME at the time I ran pungi, even though I ran pungi within "su; cd" which set $CWD to /root.

SQUASHFS errors (hamzy)

boot the 08/04/2011 iso image with the following options

  linux systemd.log_target=kmsg systemd.log_level=debug rd.break

When you are in the shell, you will see the following errors:

switch_root:/# grep -r 'Starting Anaconda' / 2>/dev/null
[  112.508993] SQUASHFS error: xz_dec_run error, data probably corrupt
[  112.509009] SQUASHFS error: squashfs_read_data failed to read block 0x6e32d5a
[  112.509021] SQUASHFS error: Unable to read data cache entry [6e32d5a]
[  112.509028] SQUASHFS error: Unable to read page, block 6e32d5a, size 8ec4
[  112.509146] SQUASHFS error: Unable to read data cache entry [6e32d5a]
[  112.509152] SQUASHFS error: Unable to read page, block 6e32d5a, size 8ec4
[  112.541370] SQUASHFS error: Unable to read data cache entry [6e32d5a]
[  112.541379] SQUASHFS error: Unable to read page, block 6e32d5a, size 8ec4
[  112.541506] SQUASHFS error: Unable to read data cache entry [6e32d5a]
[  112.541512] SQUASHFS error: Unable to read page, block 6e32d5a, size 8ec4

or

bash-4.2# nslookup hamzy2.austin.ibm.com
[ 3082.076565] SQUASHFS error: xz_dec_run error, data probably corrupt
[ 3082.076587] SQUASHFS error: squashfs_read_data failed to read block 0x6e32d5a
[ 3082.076597] SQUASHFS error: Unable to read data cache entry [6e32d5a]
[ 3082.076604] SQUASHFS error: Unable to read page, block 6e32d5a, size 8ec4
Bus error

It seems that one of the blocks is corrupt?

missing files for networking (hamzy)

boot the 08/04/2011 iso image with the following options

  linux systemd.log_target=kmsg systemd.log_level=debug rd.break

When trying to bring up the networking, I noticed the following missing files:

bash-4.2# ifup eth0
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/sysconfig/network-scripts/ifup-ipv6: line 91: /sbin/sysctl: No such file or directory
/etc/sysconfig/network-scripts/ifup-ipv6: line 115: [: -ne: unary operator expected
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory

/bin/mountpoint and /sbin/sysctl are missing from the image.

networking does not work (hamzy)

boot the 08/04/2011 iso image with the following options

  linux systemd.log_target=kmsg systemd.log_level=debug rd.break

I am in the shell, I set up the networking by the following commands:

bash-4.2# cat << __EOF__ > /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
BOOTPROTO="static"
DNS1="9.10.244.100"
GATEWAY="9.5.250.1"
HWADDR="2A:0F:4A:82:98:04"
ONBOOT="yes"
IPADDR="9.5.250.146"
IPV6INIT="yes"
NETMASK="255.255.255.0"
NM_CONTROLLED="yes"
TYPE="Ethernet"
__EOF__
bash-4.2# cat << __EOF__ > /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=bluebill
__EOF__
bash-4.2# ifup eth0

I then saw the following errors:

bash-4.2# ifup eth0
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/sysconfig/network-scripts/ifup-ipv6: line 91: /sbin/sysctl: No such file or directory
/etc/sysconfig/network-scripts/ifup-ipv6: line 115: [: -ne: unary operator expected
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory

However, I can bypass the mountpoint errors by setting the SYSTEMCTL_SKIP_REDIRECT environment variable

bash-4.2# SYSTEMCTL_SKIP_REDIRECT=yes ifup eth0
/etc/sysconfig/network-scripts/ifup-ipv6: line 91: /sbin/sysctl: No such file or directory
/etc/sysconfig/network-scripts/ifup-ipv6: line 115: [: -ne: unary operator expected

Unfortunately, while networking looks like it is up, the machine is unresponsive to the network (not pingable, not sshable).

bash-4.2# ifconfig
eth0      Link encap:Ethernet  HWaddr 2A:0F:4A:82:98:04
          inet addr:9.5.250.146  Bcast:9.5.250.255  Mask:255.255.255.0
          inet6 addr: fe80::280f:4aff:fe82:9804/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:38 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:2460 (2.4 KiB)
          Interrupt:20

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

bash-4.2# nslookup hamzy2.austin.ibm.com
[ 3082.076565] SQUASHFS error: xz_dec_run error, data probably corrupt
[ 3082.076587] SQUASHFS error: squashfs_read_data failed to read block 0x6e32d5a
[ 3082.076597] SQUASHFS error: Unable to read data cache entry [6e32d5a]
[ 3082.076604] SQUASHFS error: Unable to read page, block 6e32d5a, size 8ec4
Bus error

possible recursive locking detected (hamzy)

boot the 08/04/2011 iso image with the following options

  linux systemd.log_target=kmsg systemd.log_level=debug rd.break

Sometimes, you will see the following:

[    7.860083] =============================================
[    7.860110] [ INFO: possible recursive locking detected ]
[    7.860121] 3.0.0-1.fc16.ppc64 #1
[    7.860130] ---------------------------------------------
[    7.860140] udevd/288 is trying to acquire lock:
[    7.860152]  (&mm->mmap_sem){++++++}, at: [<c000000000044560>] .drop_cop+0x70/0x13c
[    7.860185]
[    7.860186] but task is already holding lock:
[    7.860199]  (&mm->mmap_sem){++++++}, at: [<c00000000069c3c8>] .do_page_fault+0x244/0x72c
[    7.860227]
[    7.860228] other info that might help us debug this:
[    7.860241]  Possible unsafe locking scenario:
[    7.860242]
[    7.860253]        CPU0
[    7.860261]        ----
[    7.860267]   lock(&mm->mmap_sem);
[    7.860283]   lock(&mm->mmap_sem);
[    7.860302]
[    7.860303]  *** DEADLOCK ***
[    7.860304]
[    7.860323]  May be due to missing lock nesting notation
[    7.860324]
[    7.860341] 1 lock held by udevd/288:
[    7.860351]  #0:  (&mm->mmap_sem){++++++}, at: [<c00000000069c3c8>] .do_page_fault+0x244/0x72c
[    7.860390]
[    7.860391] stack backtrace:
[    7.860401] Call Trace:
[    7.860412] [c00000027250aab0] [c000000000015764] .show_stack+0x94/0x144 (unreliable)
[    7.860435] [c00000027250ab70] [c0000000006a3098] .dump_stack+0x24/0x2c
[    7.860455] [c00000027250abf0] [c0000000000ee25c] .__lock_acquire+0x176c/0x17f4
[    7.860475] [c00000027250ad40] [c0000000000ee858] .lock_acquire+0x10c/0x128
[    7.860498] [c00000027250ae30] [c000000000698f34] .down_read+0x78/0x12c
[    7.860519] [c00000027250aed0] [c000000000044560] .drop_cop+0x70/0x13c
[    7.860539] [c00000027250af70] [c0000000000448b4] .destroy_context+0x2c/0x50
[    7.860558] [c00000027250b000] [c0000000000a88d4] .__mmdrop+0x50/0x70
[    7.860577] [c00000027250b090] [c000000000096c2c] .finish_task_switch+0x10c/0x15c
[    7.860600] [c00000027250b130] [c000000000696284] .schedule+0x9f8/0xaf0
[    7.860620] [c00000027250b3b0] [c000000000696428] .io_schedule+0xac/0x118
[    7.860646] [c00000027250b440] [c00000000016f550] .sleep_on_page+0x1c/0x2c
[    7.860668] [c00000027250b4c0] [c00000000069717c] .__wait_on_bit_lock+0x88/0x108
[    7.860688] [c00000027250b580] [c00000000016f528] .__lock_page+0x6c/0x78
[    7.860712] [c00000027250b650] [c00000000016f69c] .find_lock_page+0x7c/0xc4
[    7.860735] [c00000027250b6f0] [c0000000001881b0] .shmem_getpage+0x9c/0xad8
[    7.860755] [c00000027250b800] [c000000000188c44] .shmem_fault+0x58/0xb8
[    7.860775] [c00000027250b8b0] [c000000000192e10] .__do_fault+0x78/0x650
[    7.860795] [c00000027250b9c0] [c000000000196400] .handle_pte_fault+0x3cc/0x9b0
[    7.860815] [c00000027250bac0] [c000000000197a4c] .handle_mm_fault+0x15c/0x16c
[    7.860840] [c00000027250bb80] [c00000000069c628] .do_page_fault+0x4a4/0x72c
[    7.860860] [c00000027250be30] [c00000000000676c] handle_page_fault+0x20/0x74