Bug-Hunting Diary: kgdb set breakpoint at ppc64

DDD  Saturday, October 31, 2009 21:46 | 2644 views | 1 comment

The first doubleword of PPC64 ABI function descriptors contains the address of the entry point of the function.

The "module_event" function descriptor:
c0000000005b0360 <module_event>:
c0000000005b0360: c0 00 00 00 lfs f0,0(0)
c0000000005b0364: 00 09 5c 80 .long 0x95c80
*** c0 00 00 00 00 09 5c 80 -> the entry point of "module_event" (0xc000000000095c80, i.e. <.module_event>)

c000000000095c80 <.module_event>:
c000000000095c80: 7c 08 02 a6 mflr r0
c000000000095c84: fb c1 ff f0 std r30,-16(r1)

 

 


 

 

A: Steps to reproduce

Host:

1: Connect gdb to kgdb (GDB was configured as "--host=i686-pc-linux-gnu --target=powerpc-linux-gnu"):

(gdb) target remote udp:10.0.0.15:6443

 

2: Set a breakpoint at "module_event":

(gdb) b module_event

 

Target:

3: Insert a module. When execution reaches the "module_event" breakpoint, we get the following error:

 

root@atca6101:/root> insmod /tmp/dummy.ko

Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x7d82100800095c80
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT NUMA LTT NESTING LEVEL : 0
Maple
Modules linked in: dummy(+) kgdboe
NIP: 7d82100800095c80 LR: c00000000007a14c CTR: 7d82100800095c80
REGS: c00000017817b9b0 TRAP: 0400   Not tainted  (2.6.27.37-WR3.0.2as_standard-00080-gb14bbdf-dirty)
MSR: 9000000040009032 <EE,ME,IR,DR>  CR: 24002088  XER: 00000000
TASK = c00000017a183180[2223] 'insmod' THREAD: c000000178178000
GPR00: 7d82100800095c80 c00000017817bc30 c0000000005fa2e8 c000000000547c88
GPR04: 0000000000000001 d00000000002d100 0000000024002022 c000000000011770
GPR08: c00000017817b660 c0000000005b0360 c00000000060c300 0000000000000000
GPR12: 0000000044002088 c00000000060c300 0000000000000000 000000001008a334
GPR16: 00000000100b142c 00000000100ad5c0 00000000100eb278 00000000100ad72c
GPR20: 00000000100eb2c0 00000000100e5030 0000000000000000 00000000100ad5c4
GPR24: c000000000491940 0000000000000000 0000000000000001 d00000000002d100
GPR28: 0000000000000000 fffffffffffffffc c000000000593ae0 0000000000000000
NIP [7d82100800095c80] 0x7d82100800095c80
LR [c00000000007a14c] .notifier_call_chain+0xcc/0x120
Call Trace:
[c00000017817bc30] [c00000000007a15c] .notifier_call_chain+0xdc/0x120 (unreliable)
[c00000017817bce0] [c00000000007a520] .__blocking_notifier_call_chain+0x70/0xb0
[c00000017817bd90] [c000000000089ae0] .SyS_init_module+0x100/0x260
[c00000017817be30] [c00000000000852c] syscall_exit+0x0/0x40
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace ff196d014336a31d ]---
Segmentation fault

 

A concise description from a colleague of mine:

I disassembled the vmlinux file to see that the start of the module_event is as follows:
(gdb) i line *0xc0000000000a1840
Line 1622 of "/workspace/6101/build/linux/kernel/kgdb.c"
starts at address 0xc0000000000a1840 <module_event>
and ends at 0xc0000000000a1850 <kgdb_tasklet_bpt>.
And yet gdb insists on putting a breakpoint at a different location.
(gdb) i line *0xc0000000005b7b98
No line number information available for address
0xc0000000005b7b98 <module_event>
(gdb) i line module_event
Line 1622 of "/workspace/6101/build/linux/kernel/kgdb.c"
starts at address 0xc0000000000a1840 <module_event>
and ends at 0xc0000000000a1850 <kgdb_tasklet_bpt>.

[jl@prt-server5 linux-emer_atca6101-standard-build]$ grep module_event System.map
c0000000000a1840 t .module_event
c0000000005b7b98 d module_event
So gdb is picking the "d" one... I don't know what the "d" means in the System.map file though.
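On the colleague's question about the letter: System.map uses the same format as nm output, where the letter after the address is the symbol type. Lowercase means a local (static) symbol; "t" marks a symbol in the text (code) section and "d" one in an initialized data section, which on ppc64 is where function descriptors are kept (the .opd section). So "t .module_event" is the code and "d module_event" is the descriptor. A minimal C sketch that parses one such line and classifies it; struct map_sym, parse_map_line and sym_kind are hypothetical names, not part of any kernel or gdb API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed System.map entry: "<address> <type> <name>". */
struct map_sym {
    unsigned long long addr;
    char type;
    char name[128];
};

/* Parse one line in nm/System.map format. Returns 1 on success, 0 otherwise. */
static int parse_map_line(const char *line, struct map_sym *out)
{
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    return sscanf(line, "%llx %c %127s", &out->addr, &out->type, out->name) == 3;
}

/* 't'/'T' symbols live in the text (code) section; 'd'/'D' symbols live in
 * an initialized data section, where ppc64 keeps function descriptors. */
static const char *sym_kind(char type)
{
    switch (type) {
    case 't': case 'T': return "code";
    case 'd': case 'D': return "data";
    default:            return "other";
    }
}
```

Fed the two lines from the grep above, the sketch reports ".module_event" as code and "module_event" as data, which is exactly the distinction gdb failed to make.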

 

B: Analysis of the crash

"Unable to handle kernel paging request for instruction fetch

Faulting instruction address: 0x7d82100800095c80"

 

If we look closely at the address 0x7d82100800095c80, we can see that its leading "7d821008" is the instruction used on PPC to trigger a breakpoint.
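To confirm that 0x7d821008 is a trap, it can be decoded as a Power ISA X-form instruction. The field layout (primary opcode 31, TO, RA, RB, extended opcode 4 for tw) follows the Power ISA; the decoder itself, including struct xform, decode_xform and is_unconditional_tw, is an illustrative sketch. It shows the word is tw with TO=12 (greater-or-equal) and RA=RB=r2, i.e. "twge r2,r2", which always traps because r2 is always equal to itself. In the Linux powerpc kgdb code this value appears as BREAK_INSTR.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an X-form instruction; IBM bit 0 is the most significant bit. */
struct xform {
    unsigned opcd; /* bits 0-5:   primary opcode  */
    unsigned to;   /* bits 6-10:  trap conditions */
    unsigned ra;   /* bits 11-15                  */
    unsigned rb;   /* bits 16-20                  */
    unsigned xo;   /* bits 21-30: extended opcode */
};

static struct xform decode_xform(uint32_t insn)
{
    struct xform f;
    f.opcd = insn >> 26;
    f.to   = (insn >> 21) & 0x1f;
    f.ra   = (insn >> 16) & 0x1f;
    f.rb   = (insn >> 11) & 0x1f;
    f.xo   = (insn >> 1) & 0x3ff;
    return f;
}

/* tw is primary opcode 31 with extended opcode 4. When RA == RB the two
 * operands are equal, so the trap fires whenever the TO "equal" bit (4) is set. */
static int is_unconditional_tw(uint32_t insn)
{
    struct xform f = decode_xform(insn);
    assert(f.opcd <= 63); /* sanity: the opcode field is 6 bits wide */
    return f.opcd == 31 && f.xo == 4 && f.ra == f.rb && (f.to & 4) != 0;
}
```

By contrast, the original first instruction of .module_event, 0x7c0802a6 (mflr r0), also has primary opcode 31 but a different extended opcode, so it is not a trap.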

 

So the bug was most likely caused by gdb/kgdb modifying a pointer itself when it meant to modify the instruction that the pointer points to.

 

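For example:

void *ptr;                      (ptr itself is stored at 0xc0000000005b0360)
ptr  == 0xc000000000095c80      (its value: the function's entry point)
*ptr == 0x7c0802a6              (the instruction there: mflr r0)

The intent was to modify *ptr, the content ptr points to, i.e. to replace the instruction 0x7c0802a6 at 0xc000000000095c80 with 0x7d821008.

But some erroneous operation modified ptr itself instead: the first word stored at 0xc0000000005b0360 (0xc0000000, the upper half of the pointer) was overwritten with 0x7d821008, turning ptr into 0x7d82100800095c80.

So when the system fetched instructions through ptr, i.e. from 0x7d82100800095c80, it faulted.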
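The arithmetic can be checked with a short C sketch. It lays out the 8-byte pointer in big-endian order (as ppc64 stores it), overwrites its first 4 bytes with 0x7d821008 the way a 4-byte breakpoint write would, and reads the pointer back. load_be64, store_be32 and corrupted_entry are hypothetical helpers; only the addresses and the instruction value come from the dumps.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 64-bit big-endian value from 8 bytes of memory. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Store a 32-bit value big-endian, as a 4-byte instruction write would. */
static void store_be32(uint8_t *p, uint32_t v)
{
    for (int i = 3; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Simulate planting the trap at the pointer's own address instead of at
 * the address it points to; returns the pointer value read back afterwards. */
static uint64_t corrupted_entry(uint64_t entry, uint32_t trap)
{
    uint8_t desc[8];
    for (int i = 0; i < 8; i++)              /* lay out entry big-endian */
        desc[i] = (uint8_t)(entry >> (56 - 8 * i));
    assert(load_be64(desc) == entry);
    store_be32(desc, trap);                  /* overwrite the first word */
    return load_be64(desc);
}
```

corrupted_entry(0xc000000000095c80, 0x7d821008) yields 0x7d82100800095c80, the same value as the faulting NIP in the oops.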

 

 

 

C: Cause of the bug

 

 

After a long, painful round of print debugging inside kgdb, I found nothing abnormal in kgdb.

 

Having run out of ideas with kgdb, I turned to gdb.

 

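In general, gdb decides where to write and what value to write; kgdb only carries out the corresponding action. Since kgdb was behaving correctly, perhaps gdb had the wrong address: it looked up the address of the module_event function incorrectly, and that triggered the problem.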

 

So I used objdump to disassemble vmlinux, grepped for the module_event symbol, and found the following:

 

******************************************************************************************************

...

c0000000005b0360 <module_event>:

c0000000005b0360:       c0 00 00 00     lfs     f0,0(0)

c0000000005b0364:       00 09 5c 80     .long 0x95c80

c0000000005b0368:       c0 00 00 00     lfs     f0,0(0)

c0000000005b036c:       00 5f a2 e8     .long 0x5fa2e8

...

c000000000095c80 <.module_event>:

c000000000095c80:       7c 08 02 a6     mflr    r0

c000000000095c84:       fb c1 ff f0     std     r30,-16(r1)

...

******************************************************************************************************

 

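There are two module_event symbols, and their relationship is fairly clear: the first module_event looks like some kind of function table entry whose contents point to the real function address: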

 

(* 0xc0000000005b0360 <module_event>) -> c000000000095c80 <.module_event>

<.module_event> is the real entry point address of the function.
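The two doublewords dumped at <module_event> can be decoded directly. A small C sketch, assuming only ppc64's big-endian byte order: the bytes are copied from the objdump output above, and be64_at, desc_entry and desc_toc are hypothetical helpers. The first doubleword gives 0xc000000000095c80 (.module_event); the second gives 0xc0000000005fa2e8, which matches GPR02 (r2, the TOC pointer) in the oops register dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big-endian 64-bit load at byte offset off, matching ppc64 memory order. */
static uint64_t be64_at(const uint8_t *p, size_t off)
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++)
        v = (v << 8) | p[off + i];
    return v;
}

/* The first 16 bytes at <module_event>, copied from the objdump output. */
static const uint8_t module_event_desc[16] = {
    0xc0, 0x00, 0x00, 0x00, 0x00, 0x09, 0x5c, 0x80, /* doubleword 0 */
    0xc0, 0x00, 0x00, 0x00, 0x00, 0x5f, 0xa2, 0xe8, /* doubleword 1 */
};

/* Doubleword 0 of a descriptor: the function's entry point. */
static uint64_t desc_entry(const uint8_t *d) { assert(d != NULL); return be64_at(d, 0); }

/* Doubleword 1 of a descriptor: the function's TOC base. */
static uint64_t desc_toc(const uint8_t *d)   { assert(d != NULL); return be64_at(d, 8); }
```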

 

I checked the ppc64 ABI document and found the explanation. Here are the key parts:

 

******************************************************************************************************

The PPC64 ABI defines a function descriptor structure.

 

PPC64 ABI Function Descriptors

A function descriptor is a three doubleword data structure that contains the following values:

    * The first doubleword contains the address of the entry point of the function.

    * The second doubleword contains the TOC base address for the function.

    * The third doubleword contains the environment pointer for languages such as Pascal and PL/1.

 

For an externally visible function, the value of the symbol with the same name as the function is the address of the function descriptor. Symbol names with a dot (.) prefix are reserved for holding entry point addresses. The value of a symbol named ".FN" is the entry point of the function "FN".

 

The value of a function pointer in a language like C is the address of the function descriptor.

******************************************************************************************************
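The quoted definition maps onto a plain C struct. This sketch models the descriptor layout and what an indirect call through a C function pointer does with it on ppc64 (ELFv1): the pointer's value is the descriptor address, and the caller loads the entry point to branch to and the TOC base for r2. struct func_desc and breakpoint_address are hypothetical names for illustration, not kernel or gdb definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of a PPC64 function descriptor, per the ABI text above. */
struct func_desc {
    uint64_t entry; /* doubleword 0: address of the entry point (".FN") */
    uint64_t toc;   /* doubleword 1: TOC base address for the function  */
    uint64_t env;   /* doubleword 2: environment pointer                */
};

static_assert(sizeof(struct func_desc) == 24, "three doublewords");
static_assert(offsetof(struct func_desc, entry) == 0, "entry point comes first");

/* A C function pointer holds the descriptor's address, so code that wants
 * the first instruction of the function (for example, to plant a breakpoint)
 * must use d->entry, never the descriptor address itself. Optionally
 * returns the TOC base the caller would load into r2. */
static uint64_t breakpoint_address(const struct func_desc *d, uint64_t *toc_out)
{
    assert(d != NULL);
    if (toc_out != NULL)
        *toc_out = d->toc;
    return d->entry;
}
```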

 

For more information about the ppc64 ABI, see:

http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi-1.9.html#FUNC-DES

 

So "c0000000005b0360 <module_event>" is the function descriptor, and the address it holds, "c000000000095c80 <.module_event>", is the real function address.

 

 

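At this point everything became clear: that silly gdb had taken 0xc0000000005b0360 as the address of the module_event function and written the breakpoint instruction there.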

 

******************************************************************************************************

c0000000005b0360 <module_event>:

c0000000005b0360:       7d 82 10 08     ******----> modified here to "7d 82 10 08"

c0000000005b0364:       00 09 5c 80     .long 0x95c80

c0000000005b0368:       c0 00 00 00     lfs     f0,0(0)

c0000000005b036c:       00 5f a2 e8     .long 0x5fa2e8

...

c000000000095c80 <.module_event>:

c000000000095c80:       7c 08 02 a6     mflr    r0

c000000000095c84:       fb c1 ff f0     std     r30,-16(r1)

...

******************************************************************************************************

 

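As a result, when the kernel called module_event through its corrupted function descriptor, it loaded 0x7d82100800095c80 as the entry point and tried to fetch instructions from that invalid address, which caused the crash. The oops agrees: GPR09 holds the descriptor address c0000000005b0360, while CTR and NIP both hold 7d82100800095c80.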

 

 

What gdb should have done instead:

******************************************************************************************************

...

c0000000005b0360 <module_event>:

c0000000005b0360:       c0 00 00 00     lfs     f0,0(0)

c0000000005b0364:       00 09 5c 80     .long 0x95c80

c0000000005b0368:       c0 00 00 00     lfs     f0,0(0)

c0000000005b036c:       00 5f a2 e8     .long 0x5fa2e8

...

c000000000095c80 <.module_event>:

c000000000095c80:       7d 82 10 08     ********modify here to "7d 82 10 08"*******

c000000000095c84:       fb c1 ff f0     std     r30,-16(r1)

...

******************************************************************************************************

 

D: Solution

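Change gdb's function-symbol resolution rules for the ppc64 architecture so that it resolves a function name to the real entry point (the dot-prefixed symbol) instead of the function descriptor.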
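The resolution rule can be sketched as follows. This is not gdb's actual code: struct sym, struct mem_word, read_u64 and resolve_bp_addr are hypothetical, and target memory is modeled as a small lookup table. The rule prefers the dot-prefixed entry symbol ".FN" and, failing that, treats a data symbol "FN" as a function descriptor and uses its first doubleword, as the ABI text describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One symbol-table entry, using nm-style type letters. */
struct sym {
    const char *name;
    uint64_t addr;
    char type; /* 't'/'T' = code, 'd'/'D' = data */
};

/* One doubleword of target memory (stand-in for reading the target). */
struct mem_word {
    uint64_t addr;
    uint64_t value;
};

static int is_code(char t) { return t == 't' || t == 'T'; }
static int is_data(char t) { return t == 'd' || t == 'D'; }

static const struct sym *find_sym(const struct sym *tab, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

static int read_u64(const struct mem_word *mem, size_t n, uint64_t addr, uint64_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (mem[i].addr == addr) {
            *out = mem[i].value;
            return 1;
        }
    }
    return 0;
}

/* Address for a breakpoint on function "fn", or 0 if none is found:
 * 1. prefer the dot-prefixed entry symbol ".fn" if it is code;
 * 2. else, if "fn" is data, treat it as a function descriptor and
 *    return its first doubleword, the entry point. */
static uint64_t resolve_bp_addr(const struct sym *tab, size_t ntab, const char *fn,
                                const struct mem_word *mem, size_t nmem)
{
    char dotted[128];
    const struct sym *s;
    uint64_t entry;
    int len;

    assert(tab != NULL && fn != NULL);
    len = snprintf(dotted, sizeof dotted, ".%s", fn);
    if (len > 0 && (size_t)len < sizeof dotted) {
        s = find_sym(tab, ntab, dotted);
        if (s != NULL && is_code(s->type))
            return s->addr;
    }
    s = find_sym(tab, ntab, fn);
    if (s == NULL)
        return 0;
    if (is_code(s->type))
        return s->addr;                       /* already an entry point */
    if (is_data(s->type) && read_u64(mem, nmem, s->addr, &entry))
        return entry;                         /* dereference the descriptor */
    return 0;
}
```

With the symbols from this post, both paths land on 0xc000000000095c80 rather than on the descriptor at 0xc0000000005b0360.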

 

 

Comments

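劳永超  Saturday, January 9, 2010 15:11

This one is excellent. You first have to know that "7d821008" is the PPC instruction that triggers a breakpoint; only then is it easy to reach the conclusion that "the bug was most likely caused by gdb/kgdb modifying the pointer itself when it meant to modify the instruction it points to", which is what led to the analysis and verification that follow.

Hats off to the author.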
