phil
2005-11-23 09:51:55 UTC
Hello Gentlemen,
I analyzed a oops that occured on an "old" version of LiS : 2-16 in a
process which is closing a streams pipe. I've attached the oops and the
corresponding section of code hereafter.
The process is multithreaded and we've perheaps reached a race
condition where two threads are dealing with the closing of the same
pipe ...
I found similar errors reported on Linux archive and I understand that
such race conditions have been fixed with upper release of LiS. (i.e
2-18).
In particular, in the below mail there's a reference to a patch for
problem 3) "synchronization problem between pipes." And I think that it
could be the root cause of my problem.
http://www.mail-archive.com/linux-***@gsyc.escet.urjc.es/msg01855.html
My question is : beyond the fact that LiS 2-18 will certainly fixed
this issue, I was wondering if someone could point me to the specific
modification that has fixed this specific issue (perheaps the patch
mentioned in the email from Jeff)
Best regards,
-Philippe
-----------------------------------------------------------------------------------------------------------------------------
== Section of code where the oops occured
-----------------------------------------------------------------------------------------------------------------------------
int lis_safe_SAMESTR(queue_t *q, char *f, int l)
{
if ( lis_check_q_magic(q,f,l)
&& q->q_next != NULL
&& lis_check_q_magic(q->q_next,f,l)
)
return ((q->q_flag&QREADR) == (*q->q_next*->q_flag&QREADR));
^^^^^^^^^
/ksymoops
showed that q->q_next is null
in spite of
the condition test in if statement. /
return 0;
}
-----------------------------------------------------------------------------------------------------------------------------
== The OOPS :
-----------------------------------------------------------------------------------------------------------------------------
Unable to handle kernel NULL pointer dereference at virtual address
0000001c
printing eip:
f8b332b9
*pde = 32ba1001
*pte = 58e29067
Oops: 0000
netconsole nfs lockd sunrpc streams swrmm parport_pc lp parport autofs4
audit e1000 tg3 ipv6 floppy sg microcode keybdev mousedev hid input
usb-ohci usbcore e
CPU: 3
EIP: 0060:[<f8b332b9>] Tainted: P
EFLAGS: 00010002
EIP is at lis_safe_SAMESTR [streams] 0x41 (2.4.21-20.ELsmp/i686)
eax: 00000000 ebx: cb2852ac ecx: f8bd4ad4 edx: 00000020
esi: 0000068a edi: f8bd4ad4 ebp: d6f61ed8 esp: d6f61ebc
ds: 0068 es: 0068 ss: 0068
Process oracle (pid: 1131, stackpage=d6f61000)
Stack: cb2852ac 00000000 cb2852ac f8b32c80 cb2852ac f8bd4ad4 0000068a
00000202
d6f61f00 f4f272b4 00003a98 f4f2753c f8b291d1 cb2852ac f4f272b4
f8c0c960
00000286 00000282 f4f2753c d6f61f30 00000000 f4f272b4 f8b29639
f4f272b4
Call Trace: [<f8b32c80>] lis_qcountstrm [streams] 0x50 (0xd6f61ec8)
[<f8bd4ad4>] .rodata.str1.4 [streams] 0x4584 (0xd6f61ed0)
[<f8b291d1>] close_action [streams] 0x1c9 (0xd6f61eec)
[<f8c0c960>] lis_stdata_sem [streams] 0x0 (0xd6f61ef8)
[<f8b29639>] lis_doclose [streams] 0x2c9 (0xd6f61f14)
[<f8b29850>] lis_strclose [streams] 0x178 (0xd6f61f44)
[<c016472a>] __fput [kernel] 0xea (0xd6f61f78)
[<c01628ae>] filp_close [kernel] 0x8e (0xd6f61f94)
[<c0162956>] sys_close [kernel] 0x66 (0xd6f61fb0)
Code: 8b 40 1c 83 e0 10 83 e2 10 39 c2 0f 94 c0 0f b6 c0 eb d3 55
-----------------------------------------------------------------------------------------------------------------------------
I analyzed a oops that occured on an "old" version of LiS : 2-16 in a
process which is closing a streams pipe. I've attached the oops and the
corresponding section of code hereafter.
The process is multithreaded and we've perheaps reached a race
condition where two threads are dealing with the closing of the same
pipe ...
I found similar errors reported on Linux archive and I understand that
such race conditions have been fixed with upper release of LiS. (i.e
2-18).
In particular, in the below mail there's a reference to a patch for
problem 3) "synchronization problem between pipes." And I think that it
could be the root cause of my problem.
http://www.mail-archive.com/linux-***@gsyc.escet.urjc.es/msg01855.html
My question is : beyond the fact that LiS 2-18 will certainly fixed
this issue, I was wondering if someone could point me to the specific
modification that has fixed this specific issue (perheaps the patch
mentioned in the email from Jeff)
Best regards,
-Philippe
-----------------------------------------------------------------------------------------------------------------------------
== Section of code where the oops occured
-----------------------------------------------------------------------------------------------------------------------------
int lis_safe_SAMESTR(queue_t *q, char *f, int l)
{
if ( lis_check_q_magic(q,f,l)
&& q->q_next != NULL
&& lis_check_q_magic(q->q_next,f,l)
)
return ((q->q_flag&QREADR) == (*q->q_next*->q_flag&QREADR));
^^^^^^^^^
/ksymoops
showed that q->q_next is null
in spite of
the condition test in if statement. /
return 0;
}
-----------------------------------------------------------------------------------------------------------------------------
== The OOPS :
-----------------------------------------------------------------------------------------------------------------------------
Unable to handle kernel NULL pointer dereference at virtual address
0000001c
printing eip:
f8b332b9
*pde = 32ba1001
*pte = 58e29067
Oops: 0000
netconsole nfs lockd sunrpc streams swrmm parport_pc lp parport autofs4
audit e1000 tg3 ipv6 floppy sg microcode keybdev mousedev hid input
usb-ohci usbcore e
CPU: 3
EIP: 0060:[<f8b332b9>] Tainted: P
EFLAGS: 00010002
EIP is at lis_safe_SAMESTR [streams] 0x41 (2.4.21-20.ELsmp/i686)
eax: 00000000 ebx: cb2852ac ecx: f8bd4ad4 edx: 00000020
esi: 0000068a edi: f8bd4ad4 ebp: d6f61ed8 esp: d6f61ebc
ds: 0068 es: 0068 ss: 0068
Process oracle (pid: 1131, stackpage=d6f61000)
Stack: cb2852ac 00000000 cb2852ac f8b32c80 cb2852ac f8bd4ad4 0000068a
00000202
d6f61f00 f4f272b4 00003a98 f4f2753c f8b291d1 cb2852ac f4f272b4
f8c0c960
00000286 00000282 f4f2753c d6f61f30 00000000 f4f272b4 f8b29639
f4f272b4
Call Trace: [<f8b32c80>] lis_qcountstrm [streams] 0x50 (0xd6f61ec8)
[<f8bd4ad4>] .rodata.str1.4 [streams] 0x4584 (0xd6f61ed0)
[<f8b291d1>] close_action [streams] 0x1c9 (0xd6f61eec)
[<f8c0c960>] lis_stdata_sem [streams] 0x0 (0xd6f61ef8)
[<f8b29639>] lis_doclose [streams] 0x2c9 (0xd6f61f14)
[<f8b29850>] lis_strclose [streams] 0x178 (0xd6f61f44)
[<c016472a>] __fput [kernel] 0xea (0xd6f61f78)
[<c01628ae>] filp_close [kernel] 0x8e (0xd6f61f94)
[<c0162956>] sys_close [kernel] 0x66 (0xd6f61fb0)
Code: 8b 40 1c 83 e0 10 83 e2 10 39 c2 0f 94 c0 0f b6 c0 eb d3 55
-----------------------------------------------------------------------------------------------------------------------------