Developing and debugging IOCs when deploying with cram

Motivation

cram is a tool used to deploy applications to multiple facilities (LCLS, FACET, SPEAR, etc.). It does this by minimizing the dependencies of the application on the environment; that is, by requiring the application to be reasonably self-contained. All development (compiling, building, linking, packaging) is done in the development environment (for example, lcls-dev2), and the application contents (binaries) are rsync'ed over to the various facilities. We strongly encourage you to do all testing/QA in the development environment; however, we recognize that this is not always possible for IOC applications. This document outlines some general approaches to developing and debugging IOCs when deploying with cram.

Developing/testing IOCs in development

If you can develop and test your IOC in the development environment, we strongly encourage you to do so. You are not restricted to PAMMs/POMMs, and you get a quick Make a change -> Compile -> Restart IOC -> Debug cycle. Assuming your IOC has been migrated to cram, here's an approach using soft links. On lcls-dev2:
  1. Check out your IOC application in your workspace using eco.
    
    [mshankar@lcls-dev2 workspace]$ pwd
    /u/cd/mshankar/workspace
    [mshankar@lcls-dev2 workspace]$ eco
    Enter name of module/package to checkout: mshankarTestSeq 
    Enter name of tag or [RETURN] to use HEAD>
    Using MAIN_TRUNK. The name of the directory will be 'MAIN TRUNK'.
    ...
    [mshankar@lcls-dev2 workspace]$ cd mshankarTestSeq/MAIN_TRUNK/
    [mshankar@lcls-dev2 MAIN_TRUNK]$ pwd
    /u/cd/mshankar/workspace/mshankarTestSeq/MAIN_TRUNK
    
    
  2. In the iocTop for your software package, create a softlink to the folder containing your application as created in the previous step.
    
    [mshankar@lcls-dev2 MAIN_TRUNK]$ pwd
    /u/cd/mshankar/workspace/mshankarTestSeq/MAIN_TRUNK
    [mshankar@lcls-dev2 MAIN_TRUNK]$ cd $EPICS_IOC_TOP/mshankarTestSeq
    [...]$ ln -s /u/cd/mshankar/workspace/mshankarTestSeq/MAIN_TRUNK mshankar_development
    [mshankar@lcls-dev2 mshankarTestSeq]$ 
    
    
  3. The softlink created in the previous step, mshankar_development, should show up as a valid release in cram. Upgrade your test IOC to use this release.
    
    [mshankar@lcls-dev2 MAIN_TRUNK]$ pwd
    /u/cd/mshankar/workspace/mshankarTestSeq/MAIN_TRUNK
    [mshankar@lcls-dev2 MAIN_TRUNK]$ cram up -f Dev -i sioc-b34-mstest01 mshankar_development
    Upgraded ioc sioc-b34-mstest01 in  package mshankarTestSeq in facilty Dev of type SIOC to \...
    	mshankar_development
    [mshankar@lcls-dev2 MAIN_TRUNK]$ cram ls
    Current versions on facility: LCLS 
    Current versions on facility: FACET 
    Current versions on facility: TestFac 
    Current versions on facility: Dev 
    Current master release => V_0_0_2
    IOC: sioc-b34-mstest01 => mshankar_development (*)
    [mshankar@lcls-dev2 MAIN_TRUNK]$ 
    
    
  4. That's it. You can now Make a change -> Compile -> Restart IOC -> Debug using the test IOC sioc-b34-mstest01.
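
The softlink step above can be wrapped in a small helper so that it is repeatable. This is a minimal sketch, not part of cram or eco: the function name link_dev_release is hypothetical, and it assumes that the package's iocTop directory and the checked-out workspace both already exist.

```shell
#!/bin/sh
# Hypothetical helper (not part of cram or eco): create or refresh a
# development softlink inside a package's iocTop directory.
#   $1 = iocTop directory of the package, e.g. $EPICS_IOC_TOP/mshankarTestSeq
#   $2 = checked-out workspace directory, e.g. .../mshankarTestSeq/MAIN_TRUNK
#   $3 = name of the softlink (this is what shows up as the release name)
link_dev_release() {
    ioc_top=$1
    workspace=$2
    link_name=$3

    [ -d "$ioc_top" ]   || { echo "no such iocTop: $ioc_top" >&2; return 1; }
    [ -d "$workspace" ] || { echo "no such workspace: $workspace" >&2; return 1; }

    # -n: treat an existing link as a plain file, so rerunning replaces it
    # instead of creating a nested link inside the old target.
    ln -sfn "$workspace" "$ioc_top/$link_name"
}
```

For the example above, the call would be link_dev_release "$EPICS_IOC_TOP/mshankarTestSeq" /u/cd/mshankar/workspace/mshankarTestSeq/MAIN_TRUNK mshankar_development.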

Compiling in dev and testing in production

In some cases, it is not possible to fully test an IOC in development. In these cases, one has to get some time from MCC operations on POMM/PAMM days and test in production. Here's a sample run with two windows; one in development and one in production.
  1. In the development window, use the -d option of eco to check out MAIN_TRUNK (or the appropriate branch) to a folder for this release.
    
    [mshankar@lcls-dev2 workspace]$ eco -d CATER_12345
    Enter name of module/package to checkout: mshankarTestSeq
    Enter name of tag or [RETURN] to use HEAD>
    Using MAIN_TRUNK. The name of the directory will be CATER_12345.
    ...
    [mshankar@lcls-dev2 workspace]$ cd mshankarTestSeq/CATER_12345/
    [mshankar@lcls-dev2 CATER_12345]$ pwd
    /u/cd/mshankar/workspace/mshankarTestSeq/CATER_12345
    
    
  2. The folder checked out in the previous step, CATER_12345, is a valid release that can be used in cram. After Make a change -> Compile, use cram push to push the release over to production.
  3. Upgrade the IOC you'll be using for testing using cram upgrade.
  4. In the production window, Restart IOC -> Debug.
  5. If you need to make changes to the release, in the development window, Make a change -> Compile, and then use cram push -f to update the already pushed contents of this release.
    
    [mshankar@lcls-dev2 CATER_12345]$ vim testSeqApp/src/sncExample.stt 
    [mshankar@lcls-dev2 CATER_12345]$ make
    ...
    make[1]: Leaving directory `/afs/slac.stanford.edu/u/cd/mshankar/workspace/mshankarTestSeq/...
    [mshankar@lcls-dev2 CATER_12345]$ cram push -f
    
    
  6. Repeat the previous two steps as often as necessary. cram uses rsync to copy over the contents of the folder, so each push is efficient: only the files that actually changed are updated.
  7. Once you have a satisfactory release, tag the release with the appropriate tag.
  8. To be consistent with department policy, use eco to generate a new tagged release with the tag created in the previous step and cram the newly tagged release.
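
The Compile -> push part of the loop above can be combined into one command. The following is only a sketch: build_and_push is a hypothetical name, and it assumes it is run from the top of the release folder (CATER_12345 in the example), where make and cram push -f are run in the transcript above.

```shell
#!/bin/sh
# Hypothetical wrapper for the compile/push part of the loop.
# "cram push -f" only runs when "make" succeeds, so a broken build
# is never pushed over an already pushed release.
build_and_push() {
    if make; then
        cram push -f
    else
        echo "build failed; nothing pushed" >&2
        return 1
    fi
}
```

Restarting the IOC and debugging still happen in the production window; the wrapper only guards the push.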

Debugging using GDB in production

If you have symbols enabled in your IOC binaries, you can use GDB on the production side to examine core dumps, set breakpoints, etc. You should be able to do anything in production that you can do on the development side, with some additional steps. The big difference is that the sources for your IOC's dependencies are not available on the production side; if you need them, you'll have to scp them separately and point GDB to them.
Here are some examples of debugging a test IOC in production. Note that for this example, I am mimicking what cram does in a VM without any AFS access.

[mshankar@testarch-32-mshankar CATER_12345]$ ls -ltr /afs
ls: cannot access /afs: No such file or directory
[mshankar@testarch-32-mshankar CATER_12345]$ 

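The test IOC was built on lcls-dev2 with the EPICS and module libraries linked in statically, and it has symbols by default; as the ldd output below shows, only system libraries are resolved dynamically. In addition, I turned off optimization using USR_CFLAGS=-O0.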

[mshankar@lcls-dev2 CATER_12345]$ file bin/linux-x86/testSeq 
bin/linux-x86/testSeq: ELF 32-bit LSB ... dynamically linked (uses shared libs), not stripped
[mshankar@lcls-dev2 CATER_12345]$ ldd bin/linux-x86/testSeq 
	linux-gate.so.1 =>  (0x00ac4000)
	libpthread.so.0 => /lib/libpthread.so.0 (0x007c8000)
	libreadline.so.5 => /usr/lib/libreadline.so.5 (0x00d78000)
	libncurses.so.5 => /usr/lib/libncurses.so.5 (0x009be000)
	librt.so.1 => /lib/librt.so.1 (0x00909000)
	libdl.so.2 => /lib/libdl.so.2 (0x00ef5000)
	libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x002ee000)
	libm.so.6 => /lib/libm.so.6 (0x00700000)
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00465000)
	libc.so.6 => /lib/libc.so.6 (0x00471000)
	/lib/ld-linux.so.2 (0x006c0000)
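
The "not stripped" marker in the file output above can be checked from a script before pushing a release. This is a small sketch with a hypothetical function name; note that "not stripped" only says that the symbol table is present, which is an assumption-level check and not by itself proof of full debug information.

```shell
#!/bin/sh
# Hypothetical check: succeeds when the description printed by file(1)
# reports an unstripped binary, i.e. one whose symbol table is present.
#   $1 = the text printed by "file <binary>"
is_not_stripped() {
    case "$1" in
        *"not stripped"*) return 0 ;;
        *)                return 1 ;;
    esac
}

# Usage sketch: is_not_stripped "$(file bin/linux-x86/testSeq)"
```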

The test IOC uses the sequencer module and has a bug in the sequence that should cause it to core dump.

[mshankar@testarch-32-mshankar CATER_12345]$ cat testSeqApp/src/sncExample.stt
program sncExample
double v;
double* q;  <-- This pointer is never assigned, so dereferencing it in the printf below should cause a segmentation fault
assign v to "mshankarHost:aiExample1";
monitor v;

ss ss1
{
	state low
	{
	    when(v>5.0)
	    {
		printf("changing to high - %d\n", *q);
	    } state high

If I run this IOC on lcls-dev2, it should crash in 10 seconds and generate a core file.

[mshankar@lcls-dev2 iocmstest01]$ ulimit -c unlimited
[mshankar@lcls-dev2 iocmstest01]$ pwd
/u/cd/mshankar/workspace/mshankarTestSeq/CATER_12345/iocBoot/iocmstest01
[mshankar@lcls-dev2 iocmstest01]$ ./st.cmd 
#!../../bin/linux-x86/testSeq
...
sncExample[0]: all channels connected & received 1st monitor
save_restore:readReqFile: unable to open file info_settings.req. Exiting.
epics> Segmentation fault (core dumped)

Still on lcls-dev2, here's me looking into the core file, looking at the stack trace and examining variables.

[mshankar@lcls-dev2 iocmstest01]$ gdb ../../bin/linux-x86/testSeq core.8103 
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-45.el5)
...
Program terminated with signal 11, Segmentation fault.
#0  0x08051018 in A_ss1_low (ssId=0x87fd118, ... pNextState=0x87fd134) at ../sncExample.stt:13
13			printf("changing to high - %d\n", *q);
(gdb) bt
#0  0x08051018 in A_ss1_low (ssId=0x87fd118, ... pNextState=0x87fd134) at ../sncExample.stt:13
#1  0x08066d7b in ss_entry (arg=0x87fd118) at ../seq_task.c:367
#2  0x08066ff1 in sequencer (arg=0x87fcf90) at ../seq_task.c:105
#3  0x0811825b in start_routine (arg=0x87fd380) at ../../../src/libCom/osi/os/posix/osdThread.c:385
#4  0x004e6912 in start_thread () from /lib/libpthread.so.0
#5  0x0025e7ce in clone () from /lib/libc.so.6
(gdb) up
#1  0x08066d7b in ss_entry (arg=0x87fd118) at ../seq_task.c:367
367			st->actionFunc(ss, var, transNum, &ss->nextState);
(gdb) down
#0  0x08051018 in A_ss1_low (ssId=0x87fd118, ... pNextState=0x87fd134) at ../sncExample.stt:13
13			printf("changing to high - %d\n", *q);
(gdb) print q
$1 = (double *) 0x0

Now, we switch over to the production side (aka the VM) and generate a core dump.

[mshankar@testarch-32-mshankar iocmstest01]$ ulimit -c unlimited
[mshankar@testarch-32-mshankar iocmstest01]$ pwd
/home/mshankar/temp/test/mshankarTestSeq/CATER_12345/iocBoot/iocmstest01
[mshankar@testarch-32-mshankar iocmstest01]$ ./st.cmd 
#!../../bin/linux-x86/testSeq
...
sncExample[0]: all channels connected & received 1st monitor
epics> Segmentation fault (core dumped)
[mshankar@testarch-32-mshankar iocmstest01]$ 

Here's me looking into the core file, looking at the stack trace and examining variables. Note that this time, I have to tell GDB the location of my sources, as the folders in production are different from the folders in development.

[mshankar@testarch-32-mshankar iocmstest01]$ gdb ../../bin/linux-x86/testSeq core.4774 
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
...
#0  0x08051018 in A_ss1_low (ssId=0x8c18d28, ... pNextState=0x8c18d44) at ../sncExample.stt:13
13	../sncExample.stt: No such file or directory.
	in ../sncExample.stt
Missing separate debuginfos, use: debuginfo-install...
(gdb) directory /home/mshankar/temp/test/mshankarTestSeq/CATER_12345/testSeqApp/src
Source directories searched: /home/mshankar/temp/test/mshankarTestSeq/CATER_12345/testSeqApp/src...
(gdb) bt
#0  0x08051018 in A_ss1_low (ssId=0x8c18d28, ... pNextState=0x8c18d44) at ../sncExample.stt:13
#1  0x08066d7b in ss_entry (arg=0x8c18d28) at ../seq_task.c:367
#2  0x08066ff1 in sequencer (arg=0x8c18b38) at ../seq_task.c:105
#3  0x0811825b in start_routine (arg=0x8c18f90) at ../../../src/libCom/osi/os/posix/osdThread.c:385
#4  0x00973a49 in start_thread () from /lib/libpthread.so.0
#5  0x008b0e1e in clone () from /lib/libc.so.6
(gdb) up
#1  0x08066d7b in ss_entry (arg=0x8c18d28) at ../seq_task.c:367
367	../seq_task.c: No such file or directory.
	in ../seq_task.c
(gdb) down
#0  0x08051018 in A_ss1_low (ssId=0x8c18d28, ... pNextState=0x8c18d44) at ../sncExample.stt:13
13			printf("changing to high - %d\n", *q);
(gdb) print q
$1 = (double *) 0x0
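
When the same core file inspection is repeated often, the directory and bt commands from the session above can be kept in a GDB command file and run non-interactively. This is a sketch under assumptions: write_gdb_cmds is a hypothetical helper, the /tmp/core.gdb path is only illustrative, and source directories are assumed not to contain spaces.

```shell
#!/bin/sh
# Hypothetical helper: write a GDB command file that adds source
# directories and then prints a backtrace.
#   $1 = command file to write; remaining arguments = source directories
write_gdb_cmds() {
    out=$1
    shift
    : > "$out"                      # start with an empty file
    for dir in "$@"; do
        echo "directory $dir" >> "$out"
    done
    echo "bt" >> "$out"
}

# Usage sketch (paths are the ones from the session above):
#   write_gdb_cmds /tmp/core.gdb \
#       /home/mshankar/temp/test/mshankarTestSeq/CATER_12345/testSeqApp/src
#   gdb -batch -x /tmp/core.gdb ../../bin/linux-x86/testSeq core.4774
```

Running gdb with -batch makes it exit after executing the commands in the file given with -x, which is convenient for capturing a stack trace into a log.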

One does not always need the sources. For example, at seq_task.c:367 there is a variable called st; one can print the contents of st without the sequencer sources, and thus I know I am still in the "low" state.

(gdb) up
#1  0x08066d7b in ss_entry (arg=0x8c18d28) at ../seq_task.c:367
367	../seq_task.c: No such file or directory.
	in ../seq_task.c
(gdb) print *st
$1 = {stateName = 0x8121fd0 "low", actionFunc = 0x8051004 <A_ss1_low>... 
delayFunc = 0x8050f69 <D_ss1_low>,  entryFunc = 0, exitFunc = 0, eventMask = 0x8121fc8, options = 0}

One can also set breakpoints within the module. Here I am breaking in the sequencer at seq_task.c:367.

[mshankar@testarch-32-mshankar iocmstest01]$ gdb ../../bin/linux-x86/testSeq 
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
...
(gdb) directory /home/mshankar/temp/test/mshankarTestSeq/CATER_12345/testSeqApp/src
Source directories searched: /home/mshankar/temp/test/mshankarTestSeq/CATER_12345/testSeqApp/src:...
(gdb) break seq_task.c:367
Breakpoint 1 at 0x8066d5a: file ../seq_task.c, line 367.
(gdb) tty /dev/pts/4
(gdb) set args st.cmd
(gdb) run
...
Breakpoint 1, ss_entry (arg=0x81e5118) at ../seq_task.c:367
367	../seq_task.c: No such file or directory.
	in ../seq_task.c
...
(gdb) print *st
$1 = {stateName = 0x8121fd0 "low", ... <D_ss1_low>, 
  entryFunc = 0, exitFunc = 0, eventMask = 0x8121fc8, options = 0}
(gdb) step
A_ss1_low (ssId=0x81e5118, pVar=0x0, transNum=1, pNextState=0x81e5134) at ../sncExample.stt:27
27		    } state high
(gdb) cont
Continuing.
...

Of course, having the source of the module locally makes some of GDB's output more readable. If this is necessary, scp the sources of the module (the sources only) to a temporary folder on the production side and add that folder with another directory command.

[mshankar@testarch-32-mshankar iocmstest01]$ ls /home/mshankar/temp/test/modules/seq/
CVS       seq_ca.c   seqCom.h     seq.h     seq_mac.c   seq_prog.c  seq_qry.c    seq_queue.h 
seq_task.c Makefile  seq_cmd.c  seq_debug.h  seq_if.c  seq_main.c  seqPvt.h    seq_queue.c  
seq_release.pl
[mshankar@testarch-32-mshankar iocmstest01]$ gdb ../../bin/linux-x86/testSeq 
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
(gdb) directory /home/mshankar/temp/test/mshankarTestSeq/CATER_12345/testSeqApp/src
Source directories searched: ...
(gdb) directory /home/mshankar/temp/test/modules/seq
Source directories searched: ...
(gdb) tty /dev/pts/4
(gdb) set args st.cmd
(gdb) run
Starting program: /home/mshankar/temp/test/mshankarTestSeq/CATER_12345/bin/linux-x86/testSeq st.cmd
...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6af2b70 (LWP 6074)]
0x08051018 in A_ss1_low (ssId=0x81dbe18, ... pNextState=0x81dbe34) at ../sncExample.stt:13
13			printf("changing to high - %d\n", *q);
...
(gdb) bt
#0  0x08051018 in A_ss1_low (ssId=0x81dbe18, ... pNextState=0x81dbe34) at ../sncExample.stt:13
#1  0x08066d7b in ss_entry (arg=0x81dbe18) at ../seq_task.c:367
#2  0x08066ff1 in sequencer (arg=0x81dbc28) at ../seq_task.c:105
#3  0x0811825b in start_routine (arg=0x81dc080) at ../../../src/libCom/osi/os/posix/osdThread.c:385
#4  0x00973a49 in start_thread () from /lib/libpthread.so.0
#5  0x008b0e1e in clone () from /lib/libc.so.6
(gdb) up
#1  0x08066d7b in ss_entry (arg=0x81dbe18) at ../seq_task.c:367
367			st->actionFunc(ss, var, transNum, &ss->nextState);
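
To copy "the sources only", the source files of a module can first be packed into a tarball on the development side and then transferred with scp. The following is a sketch, not an established procedure: pack_sources is a hypothetical name, the list of extensions (.c, .h, .stt) is an assumption that may need extending for other modules, and tar is assumed to support reading file names with -T.

```shell
#!/bin/sh
# Hypothetical helper: pack only the source files of a module into a
# tarball that can then be copied to production with scp.
#   $1 = module source directory, $2 = tarball to create
pack_sources() {
    src_dir=$1
    tarball=$2
    # Run find inside the directory so the archive holds relative paths;
    # tar reads the list of files to archive from standard input (-T -).
    (cd "$src_dir" &&
        find . -type f \( -name '*.c' -o -name '*.h' -o -name '*.stt' \) -print |
        tar -czf - -T -) > "$tarball"
}
```

On the production side, unpack the tarball into a temporary folder and point GDB at it with a directory command, as in the session above.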

That's it. Theoretically, you should be able to do everything you can do in development; some use cases may require a couple of extra GDB commands.