Thursday, July 30, 2009

How to interpret a Linux core dump file

http://www.dialogic.com/support/helpweb/dxall/tn957.aspx

Symptom:
This technote explains how to interpret the core dump file that Linux generates when an application running on the system terminates unexpectedly because of a segmentation fault.

Reason for the problem:
A segmentation fault generally occurs when a program attempts to access a memory location that it is not allowed to access, or attempts to access a memory location in a way that is not allowed (for example, writing to read-only memory). The operating system then kills the program and, if core dumps are enabled, creates a core file that programmers can use to figure out what went wrong. The core file records the state of the program at the moment of the crash, including the signal that terminated it and what the program was doing when it happened.

Fix / Solution: 
A core dump can be caused by any number of issues that may or may not be related to a Dialogic® problem. This technical note describes how to gather information to determine whether a Dialogic® API call may be the cause of a segmentation fault. The basic steps for determining the cause of the core dump are:

  1. To determine what program a core file came from, use the file command:
    <prompt> file core.1234
    core.1234: ELF 32-bit LSB core file of “app” (signal 11), Intel 80386, version 1, from “app”

    Note: The above output shows that program “app” is the executable which generated the core dump file.

  2. Then run GDB (the GNU Debugger) to view the contents of the core file, passing the application executable followed by the core dump file:
    <prompt> gdb app core.1234
    Here “app” is the application executable and “core.1234” is the core dump file.

  3. Once GDB has loaded, run the “bt” command to display a backtrace of the program stack.
    Example:
    (gdb) bt
    #0 0x40c8b6ec in s7_listen () from /usr/lib/libgcs7.so
    #1 0x40c8be4e in DlgcHost_GcSS7::s7_Listen () from /usr/lib/libgcs7.so
    #2 0x40278255 in gc_Listen () at eval.c:41
    #3 0x08049ac1 in route () at eval.c:41
    #4 0x0804986a in main () at eval.c:41
    #5 0x402c9507 in __libc_start_main (main=0x8049220 <main>, argc=2, ubp_av=0xbfffeae4, init=0x8048d64 <_init>, fini=0x804b4f0 <_fini>, rtld_fini=0x4000dc14 <_dl_fini>, stack_end=0xbfffeadc) at ../sysdeps/generic/libc-start.c:129


  4. The backtrace lists the most recently executed function first, so the last function executed is always frame 0. That is where the application experienced the segmentation fault:
    #0 0x40c8b6ec in s7_listen () from /usr/lib/libgcs7.so
    This frame names the function ‘s7_listen’ and the library ‘libgcs7.so’ in which the segmentation fault (core dump) occurred.
    Note that frame 0 doesn’t always tell the whole story, so it’s a good idea to check the next few frames in the listing for recognizable functions. The innermost functions are sometimes from an internal library linked to the application, so look for the last external function call the application made to see where the problem started. In this case the application called the gc_Listen API from within the route subroutine:
    #2 0x40278255 in gc_Listen () at eval.c:41
    #3 0x08049ac1 in route () at eval.c:41

Additional information:
Here are some further tips on narrowing down the cause of a segmentation fault (core dump):

  1. Recompile the program with the -g option and re-run the test. This adds extra debug information to the executable, which GDB can use when reading the stack in the core file. This option is available for both the gcc and g++ compilers:
    <prompt> gcc app.c -g -o app -l(libs to be linked with your application)
    The core dump file from the recompiled program now produces the following backtrace:
    (gdb) bt
    #0 0x40c8b6ec in s7_listen () from /usr/lib/libgcs7.so
    #1 0x40c8be4e in DlgcHost_GcSS7::s7_Listen () from /usr/lib/libgcs7.so
    #2 0x40278255 in gc_Listen () at eval.c:41
    #3 0x08049abd in route (i=1, tsinfo=0x804f1b0) at app.c:274
    #4 0x0804986a in main (argc=2, argv=0xbfffeee4) at app.c:190
    #5 0x402c9507 in __libc_start_main (main=0x8049220 <main>, argc=2, ubp_av=0xbfffeee4, init=0x8048d64 <_init>, fini=0x804b4f0 <_fini>, rtld_fini=0x4000dc14 <_dl_fini>, stack_end=0xbfffeedc) at ../sysdeps/generic/libc-start.c:129


    Note: The output now shows additional information, such as the parameters passed in each function call and the line number in the source file (.c) that each function was executing.

    Example: Frame 3 shows that the route function received two parameters, an integer and a pointer, and that it was executing line 274 of app.c when it made the call into gc_Listen.

  2. To step through the frames of the stack and view additional information, use the “up” and “down” commands:
    Example of moving up the stack to frame #3 (after executing “up” 3 times from frame #0):
    (gdb) up
    #3 0x08049abd in route (i=1, tsinfo=0x804f1b0) at app.c:274
    274 if (gc_Listen(port[i].ldev,tsinfo,EV_SYNC) != GC_SUCCESS ) {


    Note: Along with each frame, GDB shows the contents of the corresponding source line in the app.c file.

  3. At any point, use the “print” command to show the contents of the memory that a pointer passed to the function refers to:

    Example of printing the contents of pointer tsinfo:
    (gdb) print *tsinfo
    $1 = {sc_numts = 1, sc_tsarrayp = 0x0}


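    Notice the values of the two members of the structure passed to the route function call. Here sc_tsarrayp is 0x0, a null pointer, which makes it a likely place to start investigating, since any code that dereferences it would fault.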
