Important note -
For Red Hat Enterprise Linux 5.8+ and 6, the debuginfo packages are no longer
provided via the Red Hat public FTP site. They have instead moved to Red Hat
Network (RHN) and RHN Satellite for download.
This is a tested procedure for enabling crash dumps and analysing them with the crash command.
Testing crash dump analysis on lcls-opi30 -
Find your running kernel version with the "uname -a" command:
# uname -a
Linux lcls-opi30 2.6.18-274.17.1.el5PAE #1 SMP Wed Jan 4 22:49:48 EST 2012 i686 i686 i386 GNU/Linux
Open a browser and go to the URL ftp://ftp.redhat.com/redhat/linux/enterprise/5Server/en/os/i686/Debuginfo/ (for RHEL 5.8+ and 6, get the debuginfo packages from RHN instead, as noted above).
Download the following two RPMs, which match the running kernel version and its PAE flavour:
kernel-debuginfo-common-2.6.18-274.17.1.el5.i686.rpm
kernel-PAE-debuginfo-2.6.18-274.17.1.el5.i686.rpm
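The two file names above follow from the kernel release string: the flavour suffix (PAE here) moves in front of "-debuginfo", and the common package drops it. A minimal bash sketch of that mapping, assuming RHEL 5 style release strings that end in ".el5" plus an optional flavour; the function name debuginfo_rpms is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the debuginfo RPM names matching a RHEL 5 kernel
# release (as printed by "uname -r") and architecture (as printed by "uname -m").
debuginfo_rpms() {
    local release=$1 arch=$2 base flavor
    # Split e.g. 2.6.18-274.17.1.el5PAE into "2.6.18-274.17.1.el5" and "PAE".
    local re='^(.*\.el5)(.*)$'
    if [[ $release =~ $re ]]; then
        base=${BASH_REMATCH[1]}
        flavor=${BASH_REMATCH[2]}
    else
        echo "not a RHEL 5 kernel release: $release" >&2
        return 1
    fi
    # The common package carries no flavour suffix.
    echo "kernel-debuginfo-common-${base}.${arch}.rpm"
    # Flavoured kernels (PAE, xen, ...) have their own debuginfo package.
    if [ -n "$flavor" ]; then
        echo "kernel-${flavor}-debuginfo-${base}.${arch}.rpm"
    else
        echo "kernel-debuginfo-${base}.${arch}.rpm"
    fi
}
```

Usage: debuginfo_rpms "$(uname -r)" "$(uname -m)". For release 2.6.18-274.17.1.el5PAE on i686 it prints the two file names listed above.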
Install the RPMs:
rpm -ivh kernel-debuginfo-common-2.6.18-274.17.1.el5.i686.rpm
rpm -ivh kernel-PAE-debuginfo-2.6.18-274.17.1.el5.i686.rpm
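After installing the two RPMs, crash needs the uncompressed vmlinux that kernel-debuginfo places under /usr/lib/debug (the same path passed to crash later in this procedure). A small bash check, assuming that path layout; debug_vmlinux and its optional root prefix argument are hypothetical, the prefix existing only to make the check easy to test:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the debuginfo vmlinux path for a kernel release,
# or fail if the matching kernel-debuginfo package does not seem installed.
#   $1: kernel release (defaults to the running kernel)
#   $2: optional filesystem root prefix (defaults to /)
debug_vmlinux() {
    local release=${1:-$(uname -r)}
    local root=${2:-}
    local path="$root/usr/lib/debug/lib/modules/$release/vmlinux"
    if [ -r "$path" ]; then
        echo "$path"
    else
        echo "missing $path: install the matching kernel-debuginfo RPM" >&2
        return 1
    fi
}
```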
Install the crash utility:
yum install crash
Install kexec-tools:
yum install kexec-tools
Install the following RPMs (it is not known whether all of them are required):
kernel-debug-devel-2.6.18-348.1.1.el5 Fri 01 Feb 2013 12:23:57 PM PST
kernel-debug-2.6.18-348.1.1.el5 Fri 01 Feb 2013 12:23:25 PM PST
Issue the following command:
grubby --update-kernel=ALL --args="crashkernel=128M@16M"
The above command updates /boot/grub/grub.conf and appends the parameter to every kernel line, for example:
kernel /boot/vmlinuz-2.6.18-274.17.1.el5PAE ro root=/dev/sda1 rhgb quiet crashkernel=128M@16M
Note - In /boot/grub/grub.conf, "crashkernel=X@Y" is the kernel parameter for Kdump.
Kdump requires some memory to be reserved for the second kernel (capture kernel).
When kdump is enabled, X of physical memory is reserved, leaving (physical memory - X) to the running kernel.
X: denotes how much memory to reserve for the second kernel (capture kernel).
Y: denotes the physical address at which the reserved memory section starts.
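As a worked example of X@Y: crashkernel=128M@16M reserves 128 MB (0x08000000 bytes) starting at 16 MB (0x01000000), so the region should end at 0x01000000 + 0x08000000 - 1 = 0x08ffffff, which is the "Crash kernel" range that /proc/iomem reports later in this procedure. A minimal bash sketch of that arithmetic, assuming only K, M and G suffixes and the simple X@Y form (not the extended range syntax); all function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert 128M, 16M, 512K, 1G or a plain byte count to bytes.
size_to_bytes() {
    local n=${1%[KkMmGg]}
    case $1 in
        *[Kk]) echo $(( n * 1024 )) ;;
        *[Mm]) echo $(( n * 1024 * 1024 )) ;;
        *[Gg]) echo $(( n * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( n )) ;;
    esac
}

# Hypothetical helper: pull the crashkernel= value out of a kernel command line
# (a grub.conf "kernel" line or the contents of /proc/cmdline).
crashkernel_arg() {
    local word
    local -a words
    read -r -a words <<< "$1"
    for word in "${words[@]}"; do
        case $word in
            crashkernel=*) echo "${word#crashkernel=}"; return 0 ;;
        esac
    done
    return 1
}

# Hypothetical helper: print the /proc/iomem-style range for crashkernel=X@Y,
# where X is the size of the reservation and Y is its physical start address.
crashkernel_range() {
    local spec=$1 size start
    case $spec in
        *@*) ;;
        *) echo "expected X@Y, got '$spec'" >&2; return 1 ;;
    esac
    size=$(size_to_bytes "${spec%@*}")
    start=$(size_to_bytes "${spec#*@}")
    printf '%08x-%08x\n' "$start" $(( start + size - 1 ))
}
```

After the reboot, crashkernel_range "$(crashkernel_arg "$(cat /proc/cmdline)")" prints the range the kernel is expected to reserve, which can be compared with /proc/iomem.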
Tune /etc/kdump.conf. To compress the dump file, add the -c option; to exclude zero pages and free pages, add -d 17:
path /var/crash
core_collector makedumpfile -d 17 -c --message-level 1
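The -d value is a bitmask of page types that makedumpfile leaves out of the dump: 1 = zero pages, 2 = cache pages, 4 = cache private pages, 8 = user data pages, 16 = free pages. So 17 = 1 + 16 excludes only zero and free pages. A small bash sketch that decodes a dump level this way; dump_level_excludes is a hypothetical name, and the bit meanings are those documented for makedumpfile:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the page types excluded by a makedumpfile -d level.
dump_level_excludes() {
    local level=$1 bit
    local re='^[0-9]+$'
    # Bit value -> page type that makedumpfile drops when the bit is set.
    local -a names=([1]="zero pages" [2]="cache pages" [4]="cache private pages"
                    [8]="user data pages" [16]="free pages")
    if ! [[ $level =~ $re ]] || (( level > 31 )); then
        echo "dump level must be between 0 and 31" >&2
        return 1
    fi
    for bit in 1 2 4 8 16; do
        if (( level & bit )); then
            echo "${names[bit]}"
        fi
    done
}
```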
Enable the kdump service:
chkconfig kdump on
Reboot:
shutdown -r now
In case of a system crash, kexec boots the capture kernel without clearing the crashed kernel's memory and passes control to it. Kdump, in turn, captures the dump and saves it in a subdirectory of /var/crash named with the date and time the dump was created.
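Because each dump gets its own subdirectory ending in a YYYY-MM-DD-HH:MM:SS timestamp, the most recent dump can be picked by sorting on that suffix rather than on the whole name. A minimal sketch, assuming the directory names end in exactly that 19-character timestamp, as in the 127.0.0.1-2013-02-01-11:27:49 example later in this procedure; newest_dump_name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read dump directory names (one per line) on stdin and
# print the one with the latest trailing YYYY-MM-DD-HH:MM:SS timestamp.
newest_dump_name() {
    # Prefix each name with its last 19 characters, sort, keep the last entry.
    awk 'length($0) >= 19 { print substr($0, length($0) - 18) "\t" $0 }' |
        sort | tail -n 1 | cut -f 2
}
```

Usage: cd "/var/crash/$(ls /var/crash | newest_dump_name)".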
Verify that memory is reserved for the crash kernel and that kdump is operational:
[root@lcls-opi30 etc]# cat /proc/iomem |grep Crash
01000000-08ffffff : Crash kernel
[root@lcls-opi30 127.0.0.1-2013-02-01-11:27:49]# /etc/init.d/kdump status
Kdump is operational
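The reserved range can also be turned back into a size to confirm it matches X: 0x08ffffff - 0x01000000 + 1 = 0x08000000 bytes = 128 MB, i.e. crashkernel=128M. A bash sketch, to be run as root, assuming the "start-end : Crash kernel" line format shown above; crash_kernel_reserved_mb and its optional file argument are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the size, in MB, of the "Crash kernel" region
# listed in /proc/iomem (or in another file with the same format).
crash_kernel_reserved_mb() {
    local file=${1:-/proc/iomem} range start end
    # Nested entries are indented, so strip spaces from the address field.
    range=$(awk -F' : ' '$2 == "Crash kernel" { gsub(/ /, "", $1); print $1; exit }' "$file")
    if [ -z "$range" ]; then
        echo "no Crash kernel region in $file: is crashkernel= set?" >&2
        return 1
    fi
    start=$(( 0x${range%-*} ))
    end=$(( 0x${range#*-} ))
    echo $(( (end - start + 1) / 1024 / 1024 ))
}
```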
Trigger a kernel panic to test the crash dump:
- First enable SysRq:
echo "1" > /proc/sys/kernel/sysrq
The magic SysRq key is a key combination understood by the Linux kernel, which allows the user to perform various low-level commands regardless of the system's state. It is often used to recover from freezes, or to reboot a computer without corrupting the filesystem. If your Linux system freezes and becomes unresponsive, you can press Alt+SysRq+c (Alt+PrintScreen+c on keyboards without a separate SysRq key) on the console, or send the equivalent over a serial connection, to boot the kexec capture kernel and write a crash dump.
- Trigger the kernel crash dump:
echo "c" > /proc/sysrq-trigger
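Because writing c to /proc/sysrq-trigger crashes the machine immediately, it can be worth refusing to do so unless a capture kernel is actually loaded; otherwise you crash the machine and get no dump. A hedged bash sketch, assuming the kernel exposes /sys/kernel/kexec_crash_loaded (containing 1 when a capture kernel is loaded); if your kernel lacks that file, rely on "/etc/init.d/kdump status" instead. trigger_test_crash is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: trigger a test crash only if a capture kernel is loaded.
# WARNING: when the check passes, this really crashes the running system.
#   $1: optional path of the "capture kernel loaded" flag (for testing)
trigger_test_crash() {
    local flag=${1:-/sys/kernel/kexec_crash_loaded}
    if [ "$(cat "$flag" 2>/dev/null)" != "1" ]; then
        echo "capture kernel not loaded ($flag); not crashing the system" >&2
        return 1
    fi
    sync                                  # flush file system buffers first
    echo 1 > /proc/sys/kernel/sysrq       # enable all SysRq functions
    echo c > /proc/sysrq-trigger          # crash; kexec boots the capture kernel
}
```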
Kexec boots the crash kernel, which writes the core dump to the default location /var/crash; after that, the OS reboots. You can then analyse the core dump:
[root@lcls-opi30 127.0.0.1-2013-02-01-11:27:49]# cd /var/crash/127.0.0.1-2013-02-01-11\:27\:49/
[root@lcls-opi30 127.0.0.1-2013-02-01-11:27:49]# ls -l
total 202844
-rw------- 1 root root 207502458 Feb 1 11:29 vmcore
Use the crash command to analyse the dump file, passing it the vmcore and the matching debuginfo vmlinux.
The "bt" and "sys" crash commands are usually enough to
identify the offending process.
Troubleshooting the code from the backtrace
and fixing a potential bug is a separate challenge that requires programming
skills and broad knowledge of kernel internals.
[root@lcls-opi30 127.0.0.1-2013-02-01-11:27:49]# crash vmcore /usr/lib/debug/lib/modules/2.6.18-274.17.1.el5PAE/vmlinux
crash 5.1.8-2.el5_9
Copyright (C) 2002-2011 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
KERNEL: /usr/lib/debug/lib/modules/2.6.18-274.17.1.el5PAE/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 4
DATE: Fri Feb 1 11:27:27 2013
UPTIME: 00:03:39
LOAD AVERAGE: 0.16, 0.19, 0.09
TASKS: 154
NODENAME: lcls-opi30
RELEASE: 2.6.18-274.17.1.el5PAE
VERSION: #1 SMP Wed Jan 4 22:49:48 EST 2012
MACHINE: i686 (2992 Mhz)
MEMORY: 20.8 GB
PANIC: "SysRq : Trigger a crashdump"
PID: 4941
COMMAND: "bash"
TASK: f703b000 [THREAD_INFO: f38c9000]
CPU: 2
STATE: TASK_RUNNING (SYSRQ)
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 c06903c0 RU 0.0 0 0 [swapper]
> 0 1 1 d3707550 RU 0.0 0 0 [swapper]
0 1 2 d3707000 RU 0.0 0 0 [swapper]
> 0 1 3 d3723aa0 RU 0.0 0 0 [swapper]
1 0 3 d3707aa0 IN 0.0 2160 636 init
2 1 0 d3723550 IN 0.0 0 0 [migration/0]
3 1 0 d3723000 IN 0.0 0 0 [ksoftirqd/0]
4 1 0 d3728aa0 IN 0.0 0 0 [watchdog/0]
5 1 1 d3728550 IN 0.0 0 0 [migration/1]
6 1 1 d3728000 IN 0.0 0 0 [ksoftirqd/1]
7 1 1 d3735aa0 IN 0.0 0 0 [watchdog/1]
8 1 2 d3735550 IN 0.0 0 0 [migration/2]
9 1 2 d3735000 IN 0.0 0 0 [ksoftirqd/2]
10 1 2 d3761aa0 IN 0.0 0 0 [watchdog/2]
11 1 3 d3761550 IN 0.0 0 0 [migration/3]
12 1 3 d3761000 IN 0.0 0 0 [ksoftirqd/3]
13 1 3 d376caa0 IN 0.0 0 0 [watchdog/3]
14 1 0 d376c550 IN 0.0 0 0 [events/0]
15 1 1 d376c000 IN 0.0 0 0 [events/1]
16 1 2 f7fecaa0 IN 0.0 0 0 [events/2]
17 1 3 f7fec550 IN 0.0 0 0 [events/3]
18 1 1 f7fec000 IN 0.0 0 0 [khelper]
19 1 3 f7fddaa0 IN 0.0 0 0 [kthread]
25 19 0 f7facaa0 IN 0.0 0 0 [kblockd/0]
....
crash> log
Linux version 2.6.18-274.17.1.el5PAE (mockbuild@x86-003.build.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-51)) #1 SMP Wed Jan 4 22:49:48 EST 2012
BIOS-provided physical RAM map:
BIOS-e820: 0000000000010000 - 000000000009ec00 (usable)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000cfe0ac00 (usable)
BIOS-e820: 00000000cfe0ac00 - 00000000cfe0cc00 (ACPI NVS)
BIOS-e820: 00000000cfe0ec00 - 00000000cfe5cc00 (reserved)
BIOS-e820: 00000000cfe5cc00 - 00000000cfe5ec00 (ACPI data)
BIOS-e820: 00000000cfe5ec00 - 00000000d0000000 (reserved)
BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
BIOS-e820: 00000000fe000000 - 00000000ff000000 (reserved)
BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
BIOS-e820: 0000000100000000 - 0000000530000000 (usable)
20352MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000fe710
Using x86 segment limits to approximate NX protection
On node 0 totalpages: 5439488
DMA zone: 4096 pages, LIFO batch:0
Normal zone: 225280 pages, LIFO batch:31
HighMem zone: 5210112 pages, LIFO batch:31
DMI 2.3 present.
DMI: Dell Inc. Precision WorkStation 690 /0DT029, BIOS A08 04/25/2008
Using APIC driver default
ACPI: RSDP (v002 DELL ) @ 0x000febf0
ACPI: XSDT (v001 DELL B8K 0x00000015 ASL 0x00000061) @ 0x000fce4e
ACPI: FADT (v003 DELL B8K 0x00000015 ASL 0x00000061) @ 0x000fcf76
ACPI: SSDT (v001 DELL st_ex 0x00001000 INTL 0x20050624) @ 0xfff62b03
ACPI: MADT (v001 DELL B8K 0x00000015 ASL 0x00000061) @ 0x000fd06a
ACPI: BOOT (v001 DELL B8K 0x00000015 ASL 0x00000061) @ 0x000fd108
......
crash> sys
KERNEL: /usr/lib/debug/lib/modules/2.6.18-274.17.1.el5PAE/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 4
DATE: Fri Feb 1 11:27:27 2013
UPTIME: 00:03:39
LOAD AVERAGE: 0.16, 0.19, 0.09
TASKS: 154
NODENAME: lcls-opi30
RELEASE: 2.6.18-274.17.1.el5PAE
VERSION: #1 SMP Wed Jan 4 22:49:48 EST 2012
MACHINE: i686 (2992 Mhz)
MEMORY: 20.8 GB
PANIC: "SysRq : Trigger a crashdump"
crash> bt
PID: 4941 TASK: f703b000 CPU: 2 COMMAND: "bash"
#0 [f38c9ee8] crash_kexec at c04436e1
#1 [f38c9f30] __handle_sysrq at c054c56d
#2 [f38c9f58] write_sysrq_trigger at c04ab5fe
#3 [f38c9f64] proc_reg_write at c04a672d
#4 [f38c9f84] vfs_write at c0476b85
#5 [f38c9f9c] sys_write at c04771ac
#6 [f38c9fb8] system_call at c0404f44
EAX: 00000004 EBX: 00000001 ECX: b7f80000 EDX: 00000002
DS: 007b ESI: 00000002 ES: 007b EDI: b7f80000
SS: 007b ESP: bffa47e8 EBP: bffa4808
CS: 0073 EIP: 008b1402 ERR: 00000004 EFLAGS: 00000246
crash> mod
MODULE NAME SIZE OBJECT FILE
f8822c00 scsi_transport_sas 32321 (not loaded) [CONFIG_KALLSYMS]
f882c400 ehci_hcd 34381 (not loaded) [CONFIG_KALLSYMS]
f8833f80 ohci_hcd 25065 (not loaded) [CONFIG_KALLSYMS]
f883c180 uhci_hcd 25549 (not loaded) [CONFIG_KALLSYMS]
f8844080 sd_mod 25281 (not loaded) [CONFIG_KALLSYMS]
f884f180 mptscsih 37825 (not loaded) [CONFIG_KALLSYMS]
f885ef00 jbd 57705 (not loaded) [CONFIG_KALLSYMS]
f8883e00 scsi_mod 144021 (not loaded) [CONFIG_KALLSYMS]
f88a5900 ext3 125769 (not loaded) [CONFIG_KALLSYMS]
f88ad900 dm_message 6977 (not loaded) [CONFIG_KALLSYMS]
.....
crash> exit
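The same analysis can be run non-interactively and saved to a file, and the key lines pulled out of the saved report afterwards. A bash sketch, assuming your crash build supports the -s (silent start-up) and -i (read commands from a file) options (check crash -h); all function names are hypothetical, and the vmlinux path is the debuginfo location used in the session above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: the crash commands used in the session above.
crash_batch_cmds() {
    printf '%s\n' sys bt ps log mod exit
}

# Hypothetical helper: run those commands against DUMPDIR/vmcore and save the
# output.  Usage: run_crash_report DUMPDIR KERNEL_RELEASE OUTFILE
run_crash_report() {
    local dumpdir=$1 release=$2 out=$3 cmds rc
    local vmlinux=/usr/lib/debug/lib/modules/$release/vmlinux
    [ -r "$dumpdir/vmcore" ] || { echo "no vmcore in $dumpdir" >&2; return 1; }
    [ -r "$vmlinux" ] || { echo "missing $vmlinux" >&2; return 1; }
    cmds=$(mktemp) || return 1
    crash_batch_cmds > "$cmds"
    # stdin from /dev/null so crash cannot sit waiting for interactive input.
    crash -s -i "$cmds" "$dumpdir/vmcore" "$vmlinux" < /dev/null > "$out" 2>&1
    rc=$?
    rm -f "$cmds"
    return "$rc"
}

# Print the PANIC string from saved "sys" output read on stdin.
panic_string() {
    sed -n 's/^[[:space:]]*PANIC: "\(.*\)"[[:space:]]*$/\1/p' | head -n 1
}

# Print the function of each "#N [stack addr] function at addr" line of "bt".
backtrace_frames() {
    awk '$1 ~ /^#[0-9]+$/ && $4 == "at" { print $3 }'
}
```

Usage: run_crash_report /var/crash/127.0.0.1-2013-02-01-11:27:49 "$(uname -r)" /tmp/crash-report.txt, then panic_string < /tmp/crash-report.txt and backtrace_frames < /tmp/crash-report.txt.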
++++++++++++++++++++++++++++++++++++++++++++++++
Some theory -
● A new kernel, often called capture kernel, is booted after the crash
● Previous kernel's memory is preserved
● Dump is captured from the context of capture kernel
● Kernel to kernel boot loader enables booting a new kernel after a crash
● Kexec is the underlying kernel-to-kernel bootloader
Kdump is a kernel crash dumping mechanism and is very reliable because the
crash dump is captured from the context of a freshly booted kernel and not from
the context of the crashed kernel. Kdump uses kexec to boot into a second kernel
whenever the system crashes. This second kernel, often called the crash kernel,
boots with very little memory and captures the dump image.
The first kernel reserves a section of memory that the second kernel uses to
boot. Kexec enables booting the capture kernel without going through the BIOS,
so contents of the first kernel's memory are preserved, which is essentially the
kernel crash dump.
The "crashkernel=128M" parameter in grub.conf reserves 128 MB of physical memory. This reserved
memory is used to preload and run the capture kernel.
- Init scripts take care of pre-loading the capture kernel at system boot
time.
- It is recommended to either set up a serial console or switch to runlevel 3
(init 3) for testing purposes, because kdump does not reset the
console if you are in X or framebuffer mode, so no messages may be visible on the
console after a system crash. You may also see screen corruption in graphics mode
during capture.
- Capturing a crash dump can take a long time, especially if the system has a
lot of memory. Be patient. The system will reboot after the dump is captured.
In the event of a crash, the output of the "last" command looks like the following:
reboot system boot 2.6.18-274.17.1. Fri Feb 1 19:08 (00:14)
divekar pts/2 mcclogin.slac.st Fri Feb 1 18:21 - crash (00:46)
brobeck pts/2 mcclogin.slac.st Fri Feb 1 13:25 - 15:47 (02:21)
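Sessions cut short by a crash end in "- crash" instead of a logout time, so they can be filtered out of the last output. A one-function bash sketch, assuming the column layout shown above (user name and terminal in the first two columns); crashed_sessions is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "last" output on stdin and print the user and
# terminal of every session that ended because the system crashed.
crashed_sessions() {
    awk '/ - crash / { print $1, $2 }'
}
```

Usage: last | crashed_sessions.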