*************************************************************************

Instructions for building Elf, Bear and BgsApp, linking with Great Circle
then looking at leak reports.  Also included are other miscellaneous
procedures which have been found useful.

                                              C. L. Davis   11-9-03

**************************************************************************

Some of these instructions, particularly those directly involving GC,
can be seen at

http://heplibw3.slac.stanford.edu/BFROOT/www/Computing/Programming/QC/LeakCheck/GreatCircle/index.html

Many of the steps for building Elf, Bear and BgsApp are identical.
Differences are clearly indicated.

N.B. Commands to be entered are followed by ****

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

Instructions for building and running Bear and Elf
------------------------------------------------------------------------

Logon to SLAC
-------------

N.B. To determine whether a particular build is available for a specific
release, look at the directory

$BFDIST/releases/a.b.c/lib/

where a.b.c is the release number you are interested in.  This will tell
you which builds are available for this release, e.g. SunOS5,
SunOS5-noOptimize-Debug, etc.

Set up the environment
-----------------------

% Work space

setenv MYWORK /afs/slac.stanford.edu/g/babar/work/c/cldavis   ****

% I have set an alias, 'setmywork', which will set this environment
  variable at SLAC when logged in as cldavis

% Execute srtpath, and respond to the prompts with the relevant release
  number (e.g. 11.7.0) and operating system (e.g.
  SunOS5-noOptimize-Debug).  Note that it is necessary to use the
  noOptimize-Debug version in order to provide debugging information to
  locate memory leaks.

srtpath   ****

Create the release directory for Elf/Bear/BgsApp and add the packages you will need.
-------------------------------------------------------------------------

newrel -s $MYWORK -t 11.7.0 gc1170   ****
cd gc1170                            ****
addpkg Bear                          ****
addpkg Elf                           ****
addpkg BgsApp                        ****
addpkg Moose                         ****
addpkg workdir                       ****
addpkg -h HOWTO                      ****  <<<< Obtain the HOWTO packages

% Then 'addpkg' any further packages necessary to create a working
  release.

Note: 'newrel' and 'addpkg' need only be performed once for each release.
----- When running Elf/Bear/Bogus/etc a second time for the same release
      'cd' to the release directory then enter 'srtpath' and proceed as
      below.

Compile
-------

% Interactively (not at SLAC)

gmake lib   ****

% Batch queue submission

bsub -q bldrecoq -o lib.log gmake lib   ****

Special requirements for Great Circle usage
-------------------------------------------

% The maximum GC monitor file size defaults to 40MB.  For most BaBar
% applications this is insufficient.  It can be overridden as
% follows.  Be careful.  GC maps an amount of memory equal to
% GS_MAX_MONFILESIZE when the job starts.  Setting GS_MAX_MONFILESIZE
% too large can cause a job to run out of memory.  The best bet is to
% set GS_MAX_MONFILESIZE about 50 MB larger than the actual size of
% the GC monitor file.  A test run with GS_MAX_MONFILESIZE set to
% say 500 MB will be necessary to determine the typical size of the
% monitor file.

setenv GS_MAX_MONFILESIZE 1000   ****

% In order to redirect the GC monitor and log files to somewhere other
% than /tmp the environment variables GS_MONDIR and GS_LOG_FILE_PATH
% must be set.
% The example below redirects these files to the directory
% /afs/slac.stanford.edu/u/ec/cldavis/gc1023/workdir

setenv GS_MONDIR /afs/slac.stanford.edu/u/ec/cldavis/gc1023/workdir
setenv GS_LOG_FILE_PATH /afs/slac.stanford.edu/u/ec/cldavis/gc1023/workdir

% Alternatively it may be convenient to redirect these files to
% $MYWORK

setenv GS_MONDIR $MYWORK
setenv GS_LOG_FILE_PATH $MYWORK

% Set the GC license file environment variable (Sun)

setenv GS_LICENSE_FILE $BFROOT/package/Geodesic/gc7016/sun4x_58/greatcircle/vw6/license/gslicense.txt

% or under linux

setenv GS_LICENSE_FILE $BFROOT/package/Geodesic/gc7016/i386_linux24/greatcircle/gnu/license/gslicense.txt

% GC variable to prevent excessively large log files

setenv GS_MEM_FREED_BEFORE_NEXT_FOOTPRINT_REDUCE 1000000000

Build the executable
--------------------

% For Bear

gmake Bear.bin                                            **** Interactive
bsub -q bldrecoq -o bear.log gmake Bear.bin               **** Batch

% For Elf

gmake Elf.ElfUserXtcApp                                   **** Interactive
bsub -q bldrecoq -o elf.log gmake Elf.ElfUserXtcApp       **** Batch

% BgsApp instructions not tested since the switch to SunOS58

gmake BgsApp.bin                                          **** Interactive
bsub -q bldrecoq -o bgs.log gmake BgsApp.bin              ***** Batch

% For Moose

gmake Moose.bin                                           **** Interactive
bsub -q bldrecoq -o moose.log gmake Moose.bin             **** Batch

Special Note
------------

Sometimes it is necessary to explicitly specify the version when
compiling and linking e.g.

bsub -q bldrecoq -R sol8 -o elf.log gmake Elf.ElfUserXtcApp

Debugging issue
---------------

% To get the 'core' written to a scratch area enter the following
  commands

touch $MYWORK/core        ****
ln -s $MYWORK/core core   ****

Execution - Bear
----------------

% While you are waiting for the link to complete, do the following.
  They can also be done earlier, if you prefer

sp3analboot   ****   <<<< depends on which data you want to run
   or
analboot2     ****   <<<< on.  SP3 for MC

cd workdir    ****
gmake setup   ****

% The below was what was needed under 8.3.3 for running real data.
  As of 17-5-00 I don't know whether this is still needed.  However, as
  of the same date it appears that MC data can be analysed without this
  variable set.

setenv BearConfigPatchSet Run1   ****   <<<< this tells the executable
                                        <<<< that you will be running
                                        <<<< with real data, not MC

% The .tcl file to run Bear is ordinarily taken from the Bear
  subdirectory, and copied into the workdir subdirectory, as in

cp ../Bear/BearProduction.tcl .   ****

% and then it is edited to put in the name of the collection (data set)
  to read and a number of events to process

% and then run the job,

cd workdir   ****
$BFROOT/package/Geodesic/gc7016/sun4x_58/greatcircle/vw6/bin/gsinject -t -d -n -v ./bin/$BFARCH/BearApp BearProduction.tcl   ****

% or make the appropriate adjustments for linux

Execution - Elf
---------------

% Elf requires a 'personal' database in order to test all aspects
  successfully.  Therefore before execution your own db must exist.
  See instructions below for creation of personal db.  Elf reads xtc
  files, not from the database.  However, Elf writes to the database.

setboot                         ****  (Must be executed, unless already done during this session)
cd workdir                      ****
gmake setup                     ****
cp -p ../Elf/ElfPatches.tcl .   ****  (You may choose not to do this)
setenv ElfNoRun yes             ****

% Above is optional.  When set, Elf returns '>' and waits for exit before
  processing events

setenv ElfWriteDb yes                          ****
setenv ElfOutputCollection <collection name>   ****

% Where collection name is the name of 'your' db collection to be
  written to

setenv ElfConfigPatchSet Run1   ****

% As of September 2002 it appears that ElfPromptCalib must also be
% set

setenv ElfPromptCalib Default

$BFROOT/package/Geodesic/gc7016/sun4x_58/greatcircle/vw6/bin/gsinject -t -d -n -v ./bin/$BFARCH/ElfUserXtcApp -n 5 -f babar-0000.xtc ../Elf/ElfProduction.tcl

% or with appropriate changes for linux

% Where the above will run over 5 events from the xtc file
  'babar-0000.xtc'.  The exact location of the desired xtc file can be
  entered OR a soft link.
  To skip events at the beginning of an xtc file use "-s xxx" where xxx
  is the number of events to skip.  Note that skipped events still
  count towards the "-n" total.  For example, to analyse 500 events
  starting at event 100 you need "-n 600 -s 100".  Further details can
  be found in the Elf README.

% If the environment variable ElfNoRun has been set, upon execution of
  the above command the programme will eventually present a '>' prompt.
  At this prompt type 'exit'

Execution - Moose
-----------------

% Moose requires(?) a 'personal' database in order to test all aspects
  successfully.  Therefore before execution your own db must exist.
  See instructions below for creation of personal db.

setboot       ****  (Must be executed, unless already done during this session)
cd workdir    ****
gmake setup   ****
setenv CONDALIAS Sep2001
setenv RUNNUM 400000
setenv NEVENT 50

$BFROOT/package/Geodesic/gc7016/sun4x_58/greatcircle/vw6/bin/gsinject -t -d -n -v ./bin/$BFARCH/MooseApp PARENT/Moose/MooseProduction.tcl

% or with appropriate changes for linux

% Where the above will generate 50 events.

Staging xtc files (for Elf)
---------------------------

Use

tcstage 0012630-001

% The above xtc file number is one which Peter Elmer routinely uses.
  tcstage will return the absolute location of the xtc file after which
  you can set a soft link.  The location of the xtc file is given by

/nfs/tcstage/tcfiles/babar-0012630-001.xtc

% If the xtc is not on disk it will be staged to disk.  This can take a
  long time...

% N.B. 6-3-02.  On the linux machines RH6.2 files larger than 2GB
  cannot be seen.  This means Elf jobs reading typical xtc files will
  crash.  Need to use RH7.2 linux machines - noric-new.

Execution for BgsApp
--------------------

setboot   ******  (BgsApp needs a personal database to write its events to.
                  Create in the same way as for Elf)
cd workdir                               ******
gmake setup                              ******
setenv RUNNUM 400000                     ****** User chosen run number
setenv NEVENT 50                         ****** Events to generate
setenv CONDALIAS Mar2001                 ****** Conditions
setenv CONFIGALIAS Mar2001-Cfg           ****** Configuration
setenv BgsOutputCollection ryd21-test1   ****** Output collection name

% Execution.  Appending '>&! bogus.log' writes output to a file.  For GC,
% runs must be interactive.

$BFROOT/package/Geodesic/greatcircle/6.0.0.9/solaris/sparc/vw6/bin/gsinject -t -d -n -v BgsApp ../BgsApp/BgsUserExampleEvtGen.tcl

% or with appropriate changes for linux 6.0.0.9/linux/x86/gnu/...

% The above gsinject command has not yet been tested (12-2-02)

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

How to create your very own FDB
-------------------------------

% These instructions may be found in HOWTO/HOWTO-database-importing

% There are three options.  Elf usage writes to the event store but
% in general does not need its own conditions and configurations.
% This can be accomplished by creating a personal db then defining
% where standard conditions and configurations can be found.

% Go to the release for which you want to create the db.  Create a
  .bbobjy file, unless it already exists.  The .bbobjy file should
  contain the following line

FD_NUMBER=26016

% where 26016 is any valid FDID number.

setboot                 ****
gmake database.import   ****
gmake database.load BYPASS_CONDITIONS_LOAD=yes BYPASS_CONFIG_LOAD=yes   ****

% Then tell it to get the conditions/configurations from some other
% place:

BdbDomainBootNames con <BOOT file>
BdbDomainBootNames cfg <BOOT file>

% where <BOOT file> is the BOOT file for some FDB actually
% containing conditions/configurations.
% At this time (19 Aug 2002),
% you can access the con/cfg used by physboot at SLAC by typing:

BdbDomainBootNames con /nfs/objyboot2/objy/databases/production/boot/physics/V1/ana/conditions/BaBar.BOOT
BdbDomainBootNames cfg /nfs/objyboot2/objy/databases/production/boot/physics/V1/ana/conditions/BaBar.BOOT

% or alternatively you can use those used by simuboot at SLAC:

BdbDomainBootNames con /nfs/objyboot2/objy/databases/production/boot/simulation/V1/ana/conditions/BaBar.BOOT
BdbDomainBootNames cfg /nfs/objyboot2/objy/databases/production/boot/simulation/V1/ana/conditions/BaBar.BOOT

% It is not guaranteed that the BOOT files for the conditions used by
% physboot and simuboot will stay the same over time (and the values
% will be different at in2p3 and other sites).  To verify what FDB/Boot
% file is being used for conditions for any given production federation
% set your OO_FD_BOOT to point to it (e.g. by typing 'physboot') and
% then type 'BdbDomainBootNames' (with no arguments).  The printout
% will tell you which BOOT file is being used for the conditions/
% configurations.

% In general using the con/cfg FDB's used by physboot or simuboot
% should be a good bet for most situations.

% Note also that 'BdbDomainBootNames con <BOOT file>' will ask you
% for confirmation.  To skip the confirmation add the '-f' option:
% 'BdbDomainBootNames -f con <BOOT file>'.

% When using a new release, the same database can be used so long as the
  schema is updated.  Updating the schema is achieved via

ooschemaupgrade -infile /afs/slac.stanford.edu/g/babar/Bdb/Reference/current/schema.dmp $OO_FD_BOOT

% It may also be necessary to update the conditions and configuration.
  In this case proceed via the two commands below

BdbDmnImport -replace /nfs/objyserv3/objy/databases/snapshots/opr/latest/Conditions.tdf
BdbDmnImport -replace /nfs/objyserv3/objy/databases/snapshots/opr/latest/Configuration.tdf

% conditions and configurations can be updated from ir2 using

BdbDmnImport -replace /nfs/objyserv3/objy/databases/snapshots/ir2/current/Conditions.tdf
BdbDmnImport -replace /nfs/objyserv3/objy/databases/snapshots/ir2/current/Configuration.tdf

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

Some useful DB related commands
--------------------------------

% Deleting a created db is performed via

gmake database.deleteboot CONFIRM_DELETE=yes   ****

% returning the name of the db currently selected

echo $OO_FD_BOOT   ****

% return the size of a database

du -sk <path>   ****

% e.g.

du -sk /nfs/objyserv7/objy/databases/user1/cldavis/26016/BaBar.BOOT

% returns

8        /nfs/objyserv7/objy/databases/user1/cldavis/26016/BaBar.BOOT

% There are sub directories in each release e.g.
du -sk /nfs/objyserv7/objy/databases/user1/cldavis/26016/*

% returns

8        /nfs/objyserv7/objy/databases/user1/cldavis/26016/BaBar.BOOT
3592     /nfs/objyserv7/objy/databases/user1/cldavis/26016/BaBar.FDB
8744888  /nfs/objyserv7/objy/databases/user1/cldavis/26016/conditions
34728    /nfs/objyserv7/objy/databases/user1/cldavis/26016/configuration
118864   /nfs/objyserv7/objy/databases/user1/cldavis/26016/events
992      /nfs/objyserv7/objy/databases/user1/cldavis/26016/global.bdb
1048     /nfs/objyserv7/objy/databases/user1/cldavis/26016/management.bdb
3736     /nfs/objyserv7/objy/databases/user1/cldavis/26016/metadata
8        /nfs/objyserv7/objy/databases/user1/cldavis/26016/oo_BaBar_65535_shire01_27166_cldavis_1.JNL
8        /nfs/objyserv7/objy/databases/user1/cldavis/26016/oo_BaBar_65535_tersk03_21667_cldavis_1.JNL
8        /nfs/objyserv7/objy/databases/user1/cldavis/26016/oo_BaBar_65535_tersk03_28471_cldavis_1.JNL

% Available db free space

df -k <$OO_FD_BOOT>   ****

% To interrogate the database use

BdbLoader               ****
mod talk BdbInspector   ****  returns prompt 'BdbInspector>'

% responding with 'help' will list all possible options

Removing database locks
-----------------------

% First try

oocleanup -local   ****

% If this does not work, take a look at the current locks with

oolockmon | less   ****

% or locate your lock using

oolockmon | grep 'uid'   ****

% where 'uid' is your unix id found via the command

id   ****

% Assuming the PID of the process you are trying to remove the locks on
  no longer exists, try

oocleanup -deadowner -transaction <TransID>   ****

% where <TransID> is the TransID in the table of locks given by the lock
  monitor

In order to interpret the lock monitor information the result of the
following might be needed

id   ****

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

Which libraries are used in the link ?
--------------------------------------

% In order to see exactly which libraries are used in linking use

setenv VERBOSE yes   ****

% before performing the link statement (will it work for batch
  execution ?)
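
% A verbose link produces a lot of output.  The sketch below (Bourne
  shell, run as a stand-alone script) pulls the distinct library options
  out of a saved log.  It assumes that the link command is echoed into
  the log and that libraries appear as -l<name> tokens; neither has been
  checked against the real gmake output.  'list_link_libs' is a made-up
  helper name.

```shell
#!/bin/sh
# Sketch only: list the distinct -l<library> options found in a link log.
# Assumption: with VERBOSE set, the link command line is written to the
# log and libraries are given as -l<name> tokens.
list_link_libs() {
    # $1 = log file, e.g. the elf.log written by a batch build
    # Put every whitespace-separated token on its own line, keep the ones
    # starting with -l, then remove duplicates.
    tr ' \t' '\n\n' < "$1" | grep '^-l' | LC_ALL=C sort -u
}
```

% Usage: put the function in a script and call
  'list_link_libs elf.log' after the link has finished.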
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

Problems with running the chosen release ?
------------------------------------------

A) If the chosen release has problems (not due to GC) try the following:

   e.g. Bear 8.6.3a did not have a SunOS5-noOptimize-Debug build
   necessary for GC debugging.  8.6.3 did have a
   SunOS5-noOptimize-Debug build.  However, 8.6.3 has
   link/compile/run-time errors.  In order to create a usable
   SunOS5-noOptimize-Debug 8.6.3 build do the addpkg below.  This
   adds all the 8.6.3a tags (fixes) to the 8.6.3 build.  N.B. Using
   8.6.3b would include all the 8.6.3a tags as well as 8.6.3b tags.

   addpkg -f $BFROOT/dist/releases/8.6.3a/tagFile   ****

   After executing this proceed to the "gmake Elf.ElfUserXtcApp" (or
   similar) to create the new executable.

B) Note on "ddl" stuff

   Peter Elmer says... "ddl stuff can only be compiled against a
   database of your own."  Therefore a personal database must exist at
   compile time.  I presume the database created above would work - but
   this takes a long time.  A short cut appears to be simply:

   Create .bbobjy file
   setboot                                 ****
   gmake database.import BYPASS_LOAD=yes   ****

C) BFDIST environment variable

   This is by default set to /afs/slac.stanford.edu/g/babar/dist
   For some releases there is not enough space in the 'afs partition'
   for the noOptimize-Debug builds.  If the build fails, for example
   because certain libraries are not found, try resetting the BFDIST
   variable

   setenv BFDIST /nfs/farm/babar/bfdist   ****

D) Recompile and Link

   If it becomes necessary to recompile and link - for example you want
   to make sure you are starting from scratch - do

   gmake clean   *****

   This will remove all the bin/ lib/ tmp/ etc directories

E) Disk Space Issues

   Occasionally programs will crash due to insufficient disk space.
   This could be due to exceeding your personal quota OR a system wide
   problem (particularly for database storage).
   There is nothing you can do about a system wide shortage apart from
   sending the relevant individual an email.  If the problem is personal
   storage enter the command

   fs listquota   *****

   This will tell you how much space you have available.  Delete
   unnecessary files to solve the problem....(!)

F) tmp/ location

   /afs/slac.stanford.edu/g/babar/work is nfs disk space.  If problems
   are encountered during linking (e.g. undefined globals) try resetting
   tmp/ to afs.  Remove the existing tmp/ link and reinstall directories

   rm -r tmp/
   mkdir tmp
   gmake installdirs

   Finally, recompile and link...

G) TCL Path List

   In order to find out which modules are being called in your analysis
   look for the lines in your .tcl below

   #List the path for tcl debugging purposes, if required.
   #echo ---------------------------------------------------
   #path list
   #echo ---------------------------------------------------

   Uncomment the "path list".  When you run, the path list will be
   dumped into your log file.

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

To find out which versions exist for different architectures
------------------------------------------------------------

di $BFDIST/releases/a.b.c/bin/   ******

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

Debugging after a program crash
-------------------------------

% Look in "HOWTO-Basic-Debugging"

% Addendum: If a core file is not produced check that your shell is
  set to produce a core file.  Do 'limit'; if coredumpsize is
  unlimited you are OK.  If not, set it to unlimited - type
  'unlimit coredumpsize'

  If you still don't get a core file try adding the line

action enable NameAction   ****

% to the Patches.tcl in workdir

% If you still don't get a core file try running the analysis 'within'
  the debugger.  This will not produce a core file but it should return
  a trace of where the job crashed.
  This is achieved as follows:

dbx   ****  (for solaris)
gdb   ****  (for linux)

% At the dbx prompt

catch fpe

% this will 'catch' floating point exception bugs

run <...>

% where ... are the arguments that would normally be included to execute
  outside the debugger

% When the job crashes, type 'where' at the dbx prompt to get a
  traceback

% When running with GC it is also possible to use the debugger

gdb $BFROOT/package/Geodesic/gc7016/i386_linux24/greatcircle/gnu/bin/gsinject   (for linux)
dbx $BFROOT/package/Geodesic/gc7016/sun4x_58/greatcircle/vw6/bin/gsinject       (for solaris)

% then at the debug prompt

gdb> run -t -d -n -v ./bin/$BFARCH/ElfUserXtcApp -n 500 -f ......

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

Looking at the GC leak report !
--------------------------------

% gcmonitor must be running.  To start gcmonitor

cd $BFROOT/package/Geodesic/gc7016/sun4x_58/greatcircle/vw6/bin/   ****
   (for linux use $BFROOT/package/Geodesic/gc7016/i386_linux24/greatcircle/gnu/bin/)
./gcmonitor &   ****

% gcmonitor is started with the default port 50565.

% If gcmonitor is already running it does not need to be restarted.  To
  check if it is running do

ps -aef | grep gcmonitor   ****

% There is an issue here about the processes listed having "lost their
  token".  If jobs are old they probably have "lost their token".  All
  in all it seems best to start gcmonitor with a new port, for example,

./gcmonitorBBR -p 50566 &   ****  (from the Geodesic directory above)

% There seems to be no limit on the port numbers one can use.

% To look at the leak report use Netscape to access

http://tersk06.slac.stanford.edu:50565   ****

% (use the appropriate machine and/or port number)

% Once at the GC page click on "select program" to obtain the list of
  available reports.

% Reports are stored in /tmp/ (or in the directory to which GS_MONDIR
  points) with names of the form gc12345.mon.  Along with everything
  else in /tmp/ their lifetime is severely limited.
  The best (to date) way of saving the important information from a
  report appears to be to save the 'html' from netscape.  At SLAC it is
  convenient to save these html files in

$BFROOT/doc/HyperNews/hndocs/aux/cldavis/

  (where the user must first create his/her own directory after /aux/).
  They can be accessed through netscape via the URL

http://babar-hn.slac.stanford.edu:5090/hn/aux/cldavis/filename

  The disadvantage of these html files is that the number of leaks
  contained is defined when the report is saved.  Typically I have been
  saving the "top 25" leaks (file size about 150 KB).  The only way to
  look at more than this number of leaks is to go back to the original
  monitor report.

  The actual monitor files (e.g. gc12345.mon) are very large, typically
  200 MB unzipped, 30 MB gzipped.  As space allows certain of these
  monitor files will be gzipped and saved in the QA directory

$BFROOT/QA/GC/

  [Actual files are stored in $BFROOT/QA/etc/vol2/GC, (or ../vol1/..,
  ../vol3/..) with links to $BFROOT/QA/GC/]

  Monitor files may be archived.  Look at

http://www.slac.stanford.edu/comp/unix/unix-bkup/adsmarchive.htm
  and
http://www.slac.stanford.edu/comp/unix/unix-bkup/adsmrestore.htm

Interpreting the Leak Reports
-----------------------------

% This is hard to do unless you happen to be a C++ expert and are very
  familiar with all modules.  Sometimes the report itself is sufficient
  to locate the offending module/file name.  If not try using
  'srtglimpse' to locate the offending code.

srtglimpse -H a.b.c 'string'   ****

% searches BaBar code in release a.b.c for 'string'.  Output is a list
  of files in which 'string' occurs.  Note: 'string' cannot be more than
  32 characters long.  For further information on srtglimpse see the man
  page or type

srtglimpse -h   ****

% The latest version of Great Circle also provides a way to access the
  offending code for each memory leak, by specifying the directory in
  which the source is located.  As of 12-2-02, this method does not seem
  to be superior to srtglimpse.
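
% Symbol names copied out of a leak report are often longer than the
  32 character limit noted above.  A minimal Bourne shell sketch of a
  length check before searching follows.  'glimpse_ok' and
  'search_release' are made-up helper names, not part of SRT, and the
  sketch only prints the srtglimpse command (a dry run) rather than
  executing it.

```shell
#!/bin/sh
# Sketch only: guard against search strings longer than 32 characters.
# glimpse_ok and search_release are hypothetical helpers, not SRT tools.
glimpse_ok() {
    # ${#1} expands to the length of the first argument
    [ "${#1}" -le 32 ]
}

search_release() {
    # $1 = release number (a.b.c), $2 = search string
    if glimpse_ok "$2"; then
        # Dry run: print the command so the sketch can be tried anywhere
        echo "srtglimpse -H $1 '$2'"
    else
        echo "search string is longer than 32 characters, shorten it" >&2
        return 1
    fi
}
```

% Usage: 'search_release 11.7.0 BdbInspector' prints the srtglimpse
  command to run; an over-long string gives an error message instead.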
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

Using REMEDY to post leak problems
----------------------------------

% All leaks found in GC should be reported to package co-ordinators via
  the 'REMEDY' system.  It is appropriate to reference the location of
  the relevant leak report for use of the responsible package
  co-ordinator.  See section on looking at leak reports for suggested
  location of reports.

% Package co-ordinators are listed at

http://www.slac.stanford.edu/BFROOT/www/Computing/Offline/SoftwareAdmin/SoftwareAdmin.html

% then choose 'package'.

% Keep track of posted remedy reports.  If they are not fixed in a
  'reasonable' amount of time contact Peter Elmer.

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&

To obtain details about MC collections use bfreport.
----------------------------------------------------

bfreport -d 213341   *****

% to obtain detailed information about run 213341.

&&&&

In order to get additional data onto disk for a particular run, e.g.
Bear seems to need "raw", "sim" and "tru" data, use collstagein.

collstagein -include raw   *****

&&&&

In order to find out what part of a collection is actually on disk use

BdbDistScan -summary   *****

&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&