If you have symbols enabled in your IOC binaries, you can use GDB on the production side to examine core dumps, set breakpoints, and so on.
With some additional steps, you should be able to do anything in production that you can do on the development side.
The big difference is that the sources for your IOC's dependencies are not available on the production side; if you need them, you'll have to scp them over separately and point GDB to them.
Here are some examples of debugging a test IOC in production; note, for these examples, I am mimicking what cram does in a VM without any AFS access.
[mshankar@testarch-32-mshankar CATER_12345]$ ls -ltr /afs
ls: cannot access /afs: No such file or directory
[mshankar@testarch-32-mshankar CATER_12345]$
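The test IOC was built statically on lcls-dev2 and has symbols by default (statically here refers to EPICS base and the modules; as the ldd output below shows, only system libraries are linked dynamically). However, I turned off optimization using USR_CFLAGS=-O0.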
[mshankar@lcls-dev2 CATER_12345]$ file bin/linux-x86/testSeq
bin/linux-x86/testSeq: ELF 32-bit LSB ... dynamically linked (uses shared libs), not stripped
[mshankar@lcls-dev2 CATER_12345]$ ldd bin/linux-x86/testSeq
linux-gate.so.1 => (0x00ac4000)
libpthread.so.0 => /lib/libpthread.so.0 (0x007c8000)
libreadline.so.5 => /usr/lib/libreadline.so.5 (0x00d78000)
libncurses.so.5 => /usr/lib/libncurses.so.5 (0x009be000)
librt.so.1 => /lib/librt.so.1 (0x00909000)
libdl.so.2 => /lib/libdl.so.2 (0x00ef5000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x002ee000)
libm.so.6 => /lib/libm.so.6 (0x00700000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00465000)
libc.so.6 => /lib/libc.so.6 (0x00471000)
/lib/ld-linux.so.2 (0x006c0000)
The test IOC uses the sequencer module and has a bug in the sequence that should cause it to core dump.
[mshankar@testarch-32-mshankar CATER_12345]$ cat testSeqApp/src/sncExample.stt
program sncExample
double v;
double* q; <-- This variable is unassigned and thus should cause a null pointer dereference in the printf below
assign v to "mshankarHost:aiExample1";
monitor v;
ss ss1
{
state low
{
when(v>5.0)
{
printf("changing to high - %d\n", *q);
} state high
If I run this IOC on lcls-dev2, it should crash in 10 seconds and generate a core file.
[mshankar@lcls-dev2 iocmstest01]$ ulimit -c unlimited
[mshankar@lcls-dev2 iocmstest01]$ pwd
/u/cd/mshankar/workspace/mshankarTestSeq/CATER_12345/iocBoot/iocmstest01
[mshankar@lcls-dev2 iocmstest01]$ ./st.cmd
#!../../bin/linux-x86/testSeq
...
sncExample[0]: all channels connected & received 1st monitor
save_restore:readReqFile: unable to open file info_settings.req. Exiting.
epics> Segmentation fault (core dumped)
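The ulimit -c unlimited at the start of the session matters: if the core file size limit is 0, the crash leaves no core file behind. Here is a minimal sketch of a check one could run before starting the IOC. Note that core_limit_ok is a hypothetical helper name used only for illustration; it is not part of any IOC tooling.

```shell
#!/bin/sh
# core_limit_ok (hypothetical helper): succeeds if the given `ulimit -c`
# value allows a core file to be written.
core_limit_ok() {
    case "$1" in
        unlimited) return 0 ;;   # no size limit on core files
        0|'')      return 1 ;;   # core files are disabled
        *)         return 0 ;;   # a finite limit; large cores may be truncated
    esac
}

# Check the limit of the current shell before launching ./st.cmd.
if core_limit_ok "$(ulimit -c)"; then
    echo "core dumps enabled"
else
    echo "core dumps disabled; run: ulimit -c unlimited"
fi
```

The limit is per shell, so the check has to run in the same shell that starts the IOC.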
Still on lcls-dev2, here's me looking into the core file, looking at the stack trace and examining variables.
[mshankar@lcls-dev2 iocmstest01]$ gdb ../../bin/linux-x86/testSeq core.8103
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-45.el5)
...
Program terminated with signal 11, Segmentation fault.
#0 0x08051018 in A_ss1_low (ssId=0x87fd118, ... pNextState=0x87fd134) at ../sncExample.stt:13
13 printf("changing to high - %d\n", *q);
(gdb) bt
#0 0x08051018 in A_ss1_low (ssId=0x87fd118, ... pNextState=0x87fd134) at ../sncExample.stt:13
#1 0x08066d7b in ss_entry (arg=0x87fd118) at ../seq_task.c:367
#2 0x08066ff1 in sequencer (arg=0x87fcf90) at ../seq_task.c:105
#3 0x0811825b in start_routine (arg=0x87fd380) at ../../../src/libCom/osi/os/posix/osdThread.c:385
#4 0x004e6912 in start_thread () from /lib/libpthread.so.0
#5 0x0025e7ce in clone () from /lib/libc.so.6
(gdb) up
#1 0x08066d7b in ss_entry (arg=0x87fd118) at ../seq_task.c:367
367 st->actionFunc(ss, var, transNum, &ss->nextState);
(gdb) down
#0 0x08051018 in A_ss1_low (ssId=0x87fd118, ... pNextState=0x87fd134) at ../sncExample.stt:13
13 printf("changing to high - %d\n", *q);
(gdb) print q
$1 = (double *) 0x0
Now, we switch over to the production side (aka the VM) and generate a core dump.
[mshankar@testarch-32-mshankar iocmstest01]$ ulimit -c unlimited
[mshankar@testarch-32-mshankar iocmstest01]$ pwd
/home/mshankar/temp/test/mshankarTestSeq/CATER_12345/iocBoot/iocmstest01
[mshankar@testarch-32-mshankar iocmstest01]$ ./st.cmd
#!../../bin/linux-x86/testSeq
...
sncExample[0]: all channels connected & received 1st monitor
epics> Segmentation fault (core dumped)
[mshankar@testarch-32-mshankar iocmstest01]$
Here's me looking into the core file, looking at the stack trace and examining variables.
Note that this time, I have to tell GDB the location of my sources, as the folders in production are different from the folders in development.
[mshankar@testarch-32-mshankar iocmstest01]$ gdb ../../bin/linux-x86/testSeq core.4774
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
...
#0 0x08051018 in A_ss1_low (ssId=0x8c18d28, ... pNextState=0x8c18d44) at ../sncExample.stt:13
13 ../sncExample.stt: No such file or directory.
in ../sncExample.stt
Missing separate debuginfos, use: debuginfo-install...
(gdb) directory /home/mshankar/temp/test/mshankarTestSeq/CATER_12345/testSeqApp/src
Source directories searched: /home/mshankar/temp/test/mshankarTestSeq/CATER_12345/testSeqApp/src...
(gdb) bt
#0 0x08051018 in A_ss1_low (ssId=0x8c18d28, ... pNextState=0x8c18d44) at ../sncExample.stt:13
#1 0x08066d7b in ss_entry (arg=0x8c18d28) at ../seq_task.c:367
#2 0x08066ff1 in sequencer (arg=0x8c18b38) at ../seq_task.c:105
#3 0x0811825b in start_routine (arg=0x8c18f90) at ../../../src/libCom/osi/os/posix/osdThread.c:385
#4 0x00973a49 in start_thread () from /lib/libpthread.so.0
#5 0x008b0e1e in clone () from /lib/libc.so.6
(gdb) up
#1 0x08066d7b in ss_entry (arg=0x8c18d28) at ../seq_task.c:367
367 ../seq_task.c: No such file or directory.
in ../seq_task.c
(gdb) down
#0 0x08051018 in A_ss1_low (ssId=0x8c18d28, ... pNextState=0x8c18d44) at ../sncExample.stt:13
13 printf("changing to high - %d\n", *q);
(gdb) print q
$1 = (double *) 0x0
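If you look at core files often, the directory commands can be kept in a GDB command file and replayed in batch mode instead of being typed each time. The following is only a sketch: write_gdb_cmds is a hypothetical helper name, /tmp/bt.gdb is an arbitrary file name, and the paths in the example are the ones from the session above.

```shell
#!/bin/sh
# write_gdb_cmds (hypothetical helper): write a GDB command file that adds
# each given source directory to the search path and then prints a backtrace.
# Usage: write_gdb_cmds OUTFILE SRCDIR...
# Limitation: directory names containing whitespace are not handled.
write_gdb_cmds() {
    out=$1
    shift
    : > "$out"                                   # start with an empty file
    for dir in "$@"; do
        printf 'directory %s\n' "$dir" >> "$out" # same command as typed above
    done
    printf 'bt\n' >> "$out"
}

# Example, run from iocBoot/iocmstest01 (left commented out because it
# needs the binary and the core file):
# write_gdb_cmds /tmp/bt.gdb /home/mshankar/temp/test/mshankarTestSeq/CATER_12345/testSeqApp/src
# gdb -batch -x /tmp/bt.gdb ../../bin/linux-x86/testSeq core.4774
```

With -batch, GDB runs the commands in the file given to -x and then exits, so the backtrace can be captured to a file and attached to a CATER.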
One does not always need the sources. For example, at seq_task.c:367 there is a variable called st.
One can print the contents of st without the sequencer sources; thus, I know I am still in the "low" state.
(gdb) up
#1 0x08066d7b in ss_entry (arg=0x8c18d28) at ../seq_task.c:367
367 ../seq_task.c: No such file or directory.
in ../seq_task.c
(gdb) print *st
$1 = {stateName = 0x8121fd0 "low", actionFunc = 0x8051004 <A_ss1_low>...
delayFunc = 0x8050f69 <D_ss1_low>, entryFunc = 0, exitFunc = 0, eventMask = 0x8121fc8, options = 0}
One can also set breakpoints within the module. Here I am breaking in the sequencer at seq_task.c:367.
[mshankar@testarch-32-mshankar iocmstest01]$ gdb ../../bin/linux-x86/testSeq
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
...
(gdb) directory /home/mshankar/temp/test/mshankarTestSeq/CATER_12345/testSeqApp/src
Source directories searched: /home/mshankar/temp/test/mshankarTestSeq/CATER_12345/testSeqApp/src:...
(gdb) break seq_task.c:367
Breakpoint 1 at 0x8066d5a: file ../seq_task.c, line 367.
(gdb) tty /dev/pts/4
(gdb) set args st.cmd
(gdb) run
...
Breakpoint 1, ss_entry (arg=0x81e5118) at ../seq_task.c:367
367 ../seq_task.c: No such file or directory.
in ../seq_task.c
...
(gdb) print *st
$1 = {stateName = 0x8121fd0 "low", ... <D_ss1_low>,
entryFunc = 0, exitFunc = 0, eventMask = 0x8121fc8, options = 0}
(gdb) step
A_ss1_low (ssId=0x81e5118, pVar=0x0, transNum=1, pNextState=0x81e5134) at ../sncExample.stt:27
27 } state high
(gdb) cont
Continuing.
...
Of course, having the source of the module locally makes some of GDB's output more readable. If this is necessary, scp the sources of the module (the sources only) to a temporary folder on the production side and use a directory command.
[mshankar@testarch-32-mshankar iocmstest01]$ ls /home/mshankar/temp/test/modules/seq/
CVS seq_ca.c seqCom.h seq.h seq_mac.c seq_prog.c seq_qry.c seq_queue.h
seq_task.c Makefile seq_cmd.c seq_debug.h seq_if.c seq_main.c seqPvt.h seq_queue.c
seq_release.pl
[mshankar@testarch-32-mshankar iocmstest01]$ gdb ../../bin/linux-x86/testSeq
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
(gdb) directory /home/mshankar/temp/test/mshankarTestSeq/CATER_12345/testSeqApp/src
Source directories searched: ...
(gdb) directory /home/mshankar/temp/test/modules/seq
Source directories searched: ...
(gdb) tty /dev/pts/4
(gdb) set args st.cmd
(gdb) run
Starting program: /home/mshankar/temp/test/mshankarTestSeq/CATER_12345/bin/linux-x86/testSeq st.cmd
...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6af2b70 (LWP 6074)]
0x08051018 in A_ss1_low (ssId=0x81dbe18, ... pNextState=0x81dbe34) at ../sncExample.stt:13
13 printf("changing to high - %d\n", *q);
...
(gdb) bt
#0 0x08051018 in A_ss1_low (ssId=0x81dbe18, ... pNextState=0x81dbe34) at ../sncExample.stt:13
#1 0x08066d7b in ss_entry (arg=0x81dbe18) at ../seq_task.c:367
#2 0x08066ff1 in sequencer (arg=0x81dbc28) at ../seq_task.c:105
#3 0x0811825b in start_routine (arg=0x81dc080) at ../../../src/libCom/osi/os/posix/osdThread.c:385
#4 0x00973a49 in start_thread () from /lib/libpthread.so.0
#5 0x008b0e1e in clone () from /lib/libc.so.6
(gdb) up
#1 0x08066d7b in ss_entry (arg=0x81dbe18) at ../seq_task.c:367
367 st->actionFunc(ss, var, transNum, &ss->nextState);
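To copy the sources only, as suggested earlier, it helps to pick out just the source files on the development side before running scp. This is a sketch under assumptions: list_sources is a hypothetical helper name, the set of extensions (.c, .h, .stt) is my guess at what counts as sources for a sequencer-based IOC, and MODULE_SRC and the archive name are placeholders.

```shell
#!/bin/sh
# list_sources (hypothetical helper): print the .c, .h and .stt files found
# under the given directory, one per line, in a stable order.
# Limitation: file names containing newlines are not handled.
list_sources() {
    find "$1" -type f \( -name '*.c' -o -name '*.h' -o -name '*.stt' \) | LC_ALL=C sort
}

# Example (commented out; MODULE_SRC is a placeholder for the module's
# source folder, and -T - relies on GNU tar reading the file list from stdin):
# list_sources "$MODULE_SRC" | tar -czf seq-src.tar.gz -T -
# Then scp seq-src.tar.gz to the production host and unpack it in a
# temporary folder before pointing GDB at it with the directory command.
```

Object files, Makefiles and anything else in the folder are left behind, so only what GDB needs for source listings ends up on the production side.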
That's it. Theoretically, you should be able to do everything you can do in development; some use cases may require a couple of extra GDB commands.