Kernel Debugging Tricks
2022-06-24 21:23:00 【already_ skb】
Debugging the kernel is not necessarily rocket science; in fact it can be achieved using very simple and straightforward techniques, plus some time, patience and perseverance. This page describes some tricks and techniques to help debug the kernel.
printk is your friend
The simplest, and probably most effective, way to debug the kernel is via printk(). This enables one to print messages to the console, and it is very similar to printf(). Note that printk() can slow down the execution of code, which can alter the way the code runs, for example by changing the way race conditions occur.
CHANGING THE RING BUFFER SIZE
The internal kernel console message buffer can sometimes be too small to capture all of the printk messages, especially when debug code generates a lot of them. If the buffer fills up, it wraps around and one can lose valuable debug messages.
To increase the internal buffer, use the kernel boot parameter:
log_buf_len=N
where N is the size of the buffer in bytes, and must be a power of 2.
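As a minimal sketch of the power-of-2 requirement, the shell helper below rounds a requested byte count up to the next power of two and prints the matching boot parameter. The function names next_pow2 and log_buf_len_param are hypothetical and exist only for this illustration.

```shell
# Round a requested byte count up to the next power of two.
# next_pow2 is a hypothetical helper, not a kernel or distro tool.
next_pow2() {
    want=$1
    size=1
    while [ "$size" -lt "$want" ]; do
        size=$((size * 2))
    done
    echo "$size"
}

# Print the boot parameter for a buffer of at least $1 bytes.
log_buf_len_param() {
    echo "log_buf_len=$(next_pow2 "$1")"
}

log_buf_len_param 1000000   # prints log_buf_len=1048576
```

The printed string can then be appended to the kernel boot line.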
CHANGING DEBUG LEVELS
One can specify the type of printk() log level by pre-pending the 1st printk() argument with one of the following:
KERN_EMERG    /* system is unusable */
KERN_ALERT    /* action must be taken immediately */
KERN_CRIT     /* critical conditions */
KERN_ERR      /* error conditions */
KERN_WARNING  /* warning conditions */
KERN_NOTICE   /* normal but significant condition */
KERN_INFO     /* informational */
KERN_DEBUG    /* debug-level messages */
e.g. printk(KERN_DEBUG "example debug message\n");
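If one does not specify the log level then the default log level of KERN_WARNING is used. The console only shows messages whose level is numerically lower than the console log level, so to enable all levels of console message, including KERN_DEBUG (level 7), set the console log level to 8:
echo 8 > /proc/sys/kernel/printk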
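The filtering rule behind the console log level can be sketched in shell. This assumes the rule given in the kernel's documentation: a message reaches the console only when its level number is lower than the console log level. The helpers level_num and shown_on_console are hypothetical names used only here.

```shell
# Numeric value of each printk level name (0 is the most severe).
level_num() {
    case "$1" in
        KERN_EMERG)   echo 0 ;;
        KERN_ALERT)   echo 1 ;;
        KERN_CRIT)    echo 2 ;;
        KERN_ERR)     echo 3 ;;
        KERN_WARNING) echo 4 ;;
        KERN_NOTICE)  echo 5 ;;
        KERN_INFO)    echo 6 ;;
        KERN_DEBUG)   echo 7 ;;
        *) return 1 ;;
    esac
}

# Succeeds if a message at level $1 reaches a console whose
# log level is $2 (assumed rule: message level < console level).
shown_on_console() {
    n=$(level_num "$1") || return 2
    [ "$n" -lt "$2" ]
}

shown_on_console KERN_INFO 7 && echo "KERN_INFO shown at console level 7"
shown_on_console KERN_DEBUG 7 || echo "KERN_DEBUG hidden at console level 7"
```

Under this rule a console log level of 7 still hides KERN_DEBUG messages, while 8 shows every level.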
To view console messages at boot, remove the quiet and splash boot parameters from the kernel boot line in grub. This will disable the usplash splash screen and re-enable console messages.
Serial Console
A serial console enables one to dump console messages over a serial cable. Most modern PCs do not have legacy serial ports, so one can use a USB serial dongle instead. A "null serial cable" or "universal file transfer cable" is needed to connect the target computer with the host. Most commonly this will be a DB9 female to DB9 female null serial cable. In addition, one needs to enable USB serial support in the kernel build configuration:
CONFIG_USB_SERIAL_CONSOLE=y
CONFIG_USB_SERIAL=y
and enable the appropriate driver, e.g.:
CONFIG_USB_SERIAL_PL2303=y
and boot this kernel with
console=ttyUSB0,9600n8
one may need to adjust the baud rate appropriately.
Note: Generally, there is NO hardware or software flow control on serial console drivers, which means one may get dropped characters when running very high speed tty baud rates, such as 115200 baud.
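To get a feel for the baud rate trade-off, here is a rough estimate of how long a burst of console output takes to cross the serial line. It assumes 8N1 framing as in the console= example, so each byte costs 10 bits on the wire (1 start bit, 8 data bits, 1 stop bit). serial_secs is a hypothetical helper name.

```shell
# Seconds needed to send $1 bytes at $2 baud with 8N1 framing
# (10 bits on the wire per byte), rounded up to a whole second.
serial_secs() {
    bytes=$1
    baud=$2
    echo $(( (bytes * 10 + baud - 1) / baud ))
}

serial_secs 131072 9600     # prints 137
serial_secs 131072 115200   # prints 12
```

A slow rate is less likely to drop characters, but a large log takes far longer to transfer.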
Console Messages
Kernel Oops messages generally contain a fair amount of information, including a register and process state dump and a stack dump. Unfortunately the stack dump can be more than 25 lines long and can scroll off the top of the 25 line Virtual Console. Hence to capture more of an Oops, try the following:
chvt 1
setfont /usr/share/consolefonts/Uni1-VGA8.psf.gz
Of course, one may still have a stack dump that scrolls the top of the Oops message off the console, so one trick is to rebuild the kernel with the stack dump removed, just to capture the initial Oops information. To do this, modify dump_stack in arch/x86/kernel/dumpstack_*.c and comment out the call to show_trace().
Slowing down kernel messages on boot
One may find that a machine hangs during the kernel boot process. One would like to see all the kernel messages, but unfortunately they scroll off the console too quickly. One can slow down kernel console messages at boot time by building the kernel with the following option enabled:
CONFIG_BOOT_PRINTK_DELAY=y
And boot the machine with the following kernel boot parameter:
boot_delay=N
where N = msecs delay between each console message.
Kernel panic during suspend
Debugging suspend/resume issues can be difficult if the kernel panics during suspend, especially late in the suspend because console messages are disabled. One can stop console messages from being suspended by using the kernel parameter no_console_suspend:
no_console_suspend=1
This will force the console not to suspend. Boot with this option, switch to console #1 with chvt 1, and suspend using pm-suspend.
Serial Console in VirtualBox
In some debug scenarios it can be helpful to debug the kernel running inside a virtual machine. This is useful for some classes of non-hardware-specific bugs, for example generic kernel core problems or file system driver bugs.
One can capture Linux console messages from a kernel running inside VirtualBox by setting the VirtualBox serial log to /tmp/vbox, then running a serial tty communications program such as minicom, configured to communicate with the named pipe tty called unix#/tmp/vbox
Boot the virtualised kernel with the boot line:
console=ttyS0,9600
and minicom will capture the console messages
Network Console
One can route console messages over a network using netconsole. Note that it is not useful for capturing kernel panics, as the kernel halts before the messages can be transmitted over the network. However, it can be useful for monitoring systems without the need for serial console cabling.
See Documentation/networking/netconsole.txt and also Kernel/Netconsole.
netconsole=[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]
where
src-port source for UDP packets (defaults to 6665)
src-ip source IP to use (interface address)
dev network interface (eth0)
tgt-port port for logging agent (6666)
tgt-ip IP address for logging agent
tgt-macaddr ethernet MAC address for logging agent (broadcast)
Examples:
linux [email protected]/eth1,[email protected]/12:34:56:78:9a:bc
The remote host can run either 'netcat -u -l -p <port>' or syslogd.
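As a sketch of how the fields above combine, the helper below assembles a netconsole= parameter from its parts. The function name netconsole_param is hypothetical, and the addresses are placeholder values from the ranges reserved for documentation, not values taken from this page.

```shell
# Build a netconsole= boot parameter from its parts.
# Arguments: src-port src-ip dev tgt-port tgt-ip [tgt-macaddr]
# An omitted MAC address leaves the last field empty.
netconsole_param() {
    echo "netconsole=$1@$2/$3,$4@$5/${6:-}"
}

# Placeholder addresses, for illustration only.
netconsole_param 6665 192.0.2.10 eth0 6666 192.0.2.20 00:00:5e:00:53:01
```

With these placeholder values, the logging agent at 192.0.2.20 would listen with netcat -u -l -p 6666.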
gdb on vmlinux
One can disassemble a built kernel using gdb on the vmlinux image. This is useful when one gets a kernel Oops message and a stack dump: one can then disassemble the object code and see where the Oops is occurring. For example:
gdb debian/build/build-generic/vmlinux
(gdb) disassemble printk
Dump of assembler code for function printk:
0xffffffff8023dce0 <printk+0>: sub $0xd8,%rsp
0xffffffff8023dce7 <printk+7>: lea 0xe0(%rsp),%rax
0xffffffff8023dcef <printk+15>: mov %rsi,0x28(%rsp)
0xffffffff8023dcf4 <printk+20>: mov %rsp,%rsi
0xffffffff8023dcf7 <printk+23>: mov %rdx,0x30(%rsp)
0xffffffff8023dcfc <printk+28>: mov %rcx,0x38(%rsp)
0xffffffff8023dd01 <printk+33>: mov %rax,0x8(%rsp)
0xffffffff8023dd06 <printk+38>: lea 0x20(%rsp),%rax
0xffffffff8023dd0b <printk+43>: mov %r8,0x40(%rsp)
0xffffffff8023dd10 <printk+48>: mov %r9,0x48(%rsp)
0xffffffff8023dd15 <printk+53>: movl $0x8,(%rsp)
0xffffffff8023dd1c <printk+60>: movl $0x30,0x4(%rsp)
0xffffffff8023dd24 <printk+68>: mov %rax,0x10(%rsp)
0xffffffff8023dd29 <printk+73>: callq 0xffffffff8023d980 <vprintk>
0xffffffff8023dd2e <printk+78>: add $0xd8,%rsp
0xffffffff8023dd35 <printk+85>: retq
End of assembler dump.
Objdump
If one has the built object code at hand, one can disassemble the object using objdump as follows:
objdump -SdCg debian/build/build-generic/fs/dcache.o
Using GDB to find the location where your kernel panicked or oopsed.
A quick and easy way to find the line of code where your kernel panicked or oopsed is to use the GDB list command. You can do this as follows.
Let's assume your panic/oops message says something like:
[ 174.507084] Stack:
[ 174.507163] ce0bd8ac 00000008 00000000 ce4a7e90 c039ce30 ce0bd8ac c0718b04 c07185a0
[ 174.507380] ce4a7ea0 c0398f22 ce0bd8ac c0718b04 ce4a7eb0 c037deee ce0bd8e0 ce0bd8ac
[ 174.507597] ce4a7ec0 c037dfe0 c07185a0 ce0bd8ac ce4a7ed4 c037d353 ce0bd8ac ce0bd8ac
[ 174.507888] Call Trace:
[ 174.508125] [<c039ce30>] ? sd_remove+0x20/0x70
[ 174.508235] [<c0398f22>] ? scsi_bus_remove+0x32/0x40
[ 174.508326] [<c037deee>] ? __device_release_driver+0x3e/0x70
[ 174.508421] [<c037dfe0>] ? device_release_driver+0x20/0x40
[ 174.508514] [<c037d353>] ? bus_remove_device+0x73/0x90
[ 174.508606] [<c037bccf>] ? device_del+0xef/0x150
[ 174.508693] [<c0399207>] ? __scsi_remove_device+0x47/0x80
[ 174.508786] [<c0399262>] ? scsi_remove_device+0x22/0x40
[ 174.508877] [<c0399324>] ? __scsi_remove_target+0x94/0xd0
[ 174.508969] [<c03993c0>] ? __remove_child+0x0/0x20
[ 174.509060] [<c03993d7>] ? __remove_child+0x17/0x20
[ 174.509148] [<c037b868>] ? device_for_each_child+0x38/0x60
[ 174.509241] [<c039938f>] ? scsi_remove_target+0x2f/0x60
[ 174.509393] [<d0c38907>] ? __iscsi_unbind_session+0x77/0xa0 [scsi_transport_iscsi]
[ 174.509699] [<c015272e>] ? run_workqueue+0x6e/0x140
[ 174.509801] [<d0c38890>] ? __iscsi_unbind_session+0x0/0xa0 [scsi_transport_iscsi]
[ 174.509977] [<c0152888>] ? worker_thread+0x88/0xe0
[ 174.510047] [<c01566a0>] ? autoremove_wake_function+0x0/0x40
Let's say you want to know which line of code corresponds to sd_remove+0x20/0x70. cd to the Ubuntu debian/build/build-generic directory in your kernel tree and run gdb on the ".o" file that contains the function; sd_remove() is in sd.o. Then use the gdb "list" command, (gdb) list *(function+0xoffset). In this case the function is sd_remove() and the offset is 0x20, and gdb should tell you the line number where you hit the panic or oops. This has worked for me very reliably in most cases.
[email protected]:~/devel/ubuntu/kernel/ubuntu-karmic-397906/debian/build/build-generic/drivers/scsi$ gdb sd.o
GNU gdb (GDB) 6.8.50.20090628-cvs-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
(gdb) list *(sd_remove+0x20)
0x1650 is in sd_remove (/home/manjo/devel/ubuntu/kernel/ubuntu-karmic-397906/drivers/scsi/sd.c:2125).
2120 static int sd_remove(struct device *dev)
2121 {
2122 	struct scsi_disk *sdkp;
2123
2124 	async_synchronize_full();
2125 	sdkp = dev_get_drvdata(dev);
2126 	blk_queue_prep_rq(sdkp->device->request_queue, scsi_prep_fn);
2127 	device_del(&sdkp->dev);
2128 	del_gendisk(sdkp->disk);
2129 	sd_shutdown(dev);
(gdb)
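The step from a call-trace entry to the gdb command is mechanical, so it can be scripted. The sketch below uses hypothetical helpers (trace_to_list and trace_to_gdb_cmd). It treats the value after the "/" in the entry as the function's length, which gdb does not need. The generated command relies on gdb's -batch and -ex options to run one command non-interactively.

```shell
# Turn a call-trace entry such as "sd_remove+0x20/0x70" into the
# matching gdb list expression by dropping the "/0x70" length part.
trace_to_list() {
    entry=${1%%/*}
    echo "list *($entry)"
}

# Print a non-interactive gdb command line for entry $1 and object file $2.
trace_to_gdb_cmd() {
    echo "gdb -batch -ex '$(trace_to_list "$1")' $2"
}

trace_to_list 'sd_remove+0x20/0x70'          # prints list *(sd_remove+0x20)
trace_to_gdb_cmd 'sd_remove+0x20/0x70' sd.o
```

The second call only prints the command line. Run it from the directory that holds the object file, which must have been built with debug information.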