Using edg to run a BaBar job on the grid
This page provides the hints needed to run a simple BaBar job on the grid.
We assume that the user is logged on at SLAC, but these instructions are
easily adapted to any other site.
This project is in a fast-moving phase, so expect some changes
to occur soon.
The edg setup instructions come mainly from Gilbert.
Main topics:
- How to set up your EDG environment
- How to run a simple Beta application
- How to run a more complicated application using
a Storage Element and the Replica Catalog
Prerequisites
You need a valid certificate issued by a recognized certification
authority, and you must have registered it with the BaBar Virtual Organization
(VO). The following link
gives instructions for these preliminary steps.
Protecting your private key
Your private key (userkey.pem) must be stored so that nobody else can read
it. To do so:
- Create a ~/private/.globus directory. Because it is under ~/private, this
directory can only be read by you
- Copy userkey.pem into ~/private/.globus
- Remove the original ~/.globus/userkey.pem and create a soft link from ~/.globus/userkey.pem to ~/private/.globus/userkey.pem
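The steps above can be sketched as a small shell function. This is a minimal sketch, assuming a Bourne-type shell and that userkey.pem currently sits in ~/.globus as a regular file. The function name protect_userkey and the chmod calls are illustrative additions, not part of the original instructions.

```shell
# Hypothetical helper: move the private key under ~/private and link it back.
# Assumes $HOME/.globus/userkey.pem exists as a regular file.
protect_userkey() {
    key="$HOME/.globus/userkey.pem"
    safe_dir="$HOME/private/.globus"

    # Do nothing if the key is missing or is already a soft link.
    [ -f "$key" ] && [ ! -L "$key" ] || return 1

    mkdir -p "$safe_dir" || return 1
    chmod 700 "$safe_dir"                # assumption: extra owner-only precaution
    cp -p "$key" "$safe_dir/userkey.pem" || return 1
    chmod 400 "$safe_dir/userkey.pem"    # assumption: read-only for the owner

    # Replace the original file with a soft link to the protected copy.
    rm -f "$key"
    ln -s "$safe_dir/userkey.pem" "$key"
}
```

Calling protect_userkey a second time returns a non-zero status and changes nothing, because the key is then already a soft link.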
Setting up the edg environment
The edg User Interface (UI) is now correctly set up on all noric machines.
- Create a directory edg-test
- Create a directory /nfs/babar/grid01/the_first_letter_of_your_logonID/your_logonID
- Create a link from ~/.globus/.gass_cache to /nfs/babar/grid01/the_first_letter_of_your_logonID/your_logonID/.gass_cache.
This directory is used internally by globus/edg
- Define an alias (csh syntax): alias edgUI_env 'source /afs/slac.stanford.edu/package/bbrgrid/edgUI-env.csh'
- Execute this alias each time you connect to SLAC and want to use
edg
In order to run any globus or edg command, you need to get a valid proxy
with "grid-proxy-init".
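The scratch-directory convention in the list above (first letter of the logon ID, then the logon ID) is easy to mistype by hand. A minimal sketch, assuming a Bourne-type shell; the function names gass_cache_dir and setup_gass_cache are hypothetical, the optional root argument exists only so the logic can be tried outside SLAC, and creating the target .gass_cache directory before linking is an assumption.

```shell
# Print the per-user grid directory: <root>/<first letter>/<logon ID>.
# The default root is the one given in the instructions above.
gass_cache_dir() {
    user="$1"
    root="${2:-/nfs/babar/grid01}"
    first=$(printf '%s' "$user" | cut -c 1)
    printf '%s/%s/%s\n' "$root" "$first" "$user"
}

# Create the directory and link ~/.globus/.gass_cache to it.
# Fails if the link already exists.
setup_gass_cache() {
    dir=$(gass_cache_dir "$1" "${2:-}")
    mkdir -p "$dir/.gass_cache" "$HOME/.globus" || return 1
    ln -s "$dir/.gass_cache" "$HOME/.globus/.gass_cache"
}
```

A typical call would be: setup_gass_cache "$USER"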
A simple EDG application
This section explains how to set up a very
basic EDG application to run a Beta job. The executable, the log files and
the output n-tuples are transferred back to the submission site through the
Resource Broker (RB).
Building a Beta application
- Install a test release in the usual way, replacing "boutigny" by your
name and the release number by any suitable release
- newrel -t 12.2.1 12.2.1 -s $BFROOT/work/b/boutigny
- cd 12.2.1
- srtpath 12.2.1 Linux24
- addpkg BetaUser
- addpkg workdir
- gmake workdir.setup
- setenv BetaBdbMicro yes for Objy or setenv
BetaKanga yes for Kanga
- setenv BetaPhysMicro yes
- setenv PhysMicro yes
- setenv PhysAll yes
- setenv BetaRootTuple yes if you want to produce
ROOT-tuples instead of HBOOK
- gmake BetaUser.all
- Create a suitable tcl file in workdir: test.tcl
- Make a dry run of your executable in order to create a dumped tcl file
(dumped.tcl). This part now works, thanks to the hard work of
Asoka (see: http://babar-hn.slac.stanford.edu:5090/HyperNews/get/BaBarGrid/50.html)
- In release 12.2.2 you need the following tags:
- Framework ads07Aug02
- RooModules ads07Aug02
- BdbModules ads30Jul02
- In release 12.4.0 no tags are necessary
- Then type:
- physboot (for Objy)
- BetaApp
- ev dumpOnBegin dumped.tcl
- source test.tcl
- exit
- If you wish you can now copy the BetaApp executable and the dumped.tcl
file to another directory, ${HOME}/edg-test/Objy in the current example.
- By default the "input add collection name"
command in BdbEventInput checks that the collection exists
in the current federation. A real grid job should
not assume that the input collection exists locally. To
turn off this existence check, use the -novalidate option:
"input add -novalidate collection name"
Preparing the edg job
- In the edg-test directory, you need to create a wrapper shell script
Beta.sh containing the following lines:
- #!/bin/csh
- set pwd = `pwd`;
- cd $BFDIST/releases/12.2.1
- srtpath 12.2.1 Linux24
- physboot
- cd $pwd
- ln -s $BFDIST/releases/12.2.1 PARENT
- BetaApp dumped.tcl
- The link is necessary for the Beta job to retrieve some data files.
- This script assumes that the $BFDIST variable is correctly
set up by the remote host
- The physboot step correctly sets up LD_LIBRARY_PATH for Objy
Still in the edg-test directory, you need to create a
file Beta.jdl, written in the Job Description Language (JDL), that gives edg all the
instructions needed to run the application:
Executable = "Beta.sh";
InputSandbox = {"$HOME/edg-test/Beta.sh","$HOME/edg-test/Objy/BetaApp","$HOME/edg-test/Objy/dumped.tcl"};
StdOutput = "result.out";
StdError = "result.err";
OutputSandbox = {"result.out","result.err","framework.hbook"};
Rank = other.MaxCpuTime;
Requirements = Member(other.RunTimeEnvironment, "BABAR-Test-slac-01");
Environment = {"BetaBdbMicro=yes","BetaPhysMicro=yes","PhysMicro=yes","PhysAll=yes"};
- The BABAR-Test-slac-01 target corresponds to the SLAC Computing
Element (CE), "CCIN2P3" will target in2p3 and simply "BABAR" will target the
UK sites at large.
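Since only the RunTimeEnvironment tag changes between target sites, the JDL can be generated rather than edited by hand. A minimal sketch, assuming a Bourne-type shell; the function name write_beta_jdl is hypothetical, and the JDL attributes are copied from the example above.

```shell
# Write a Beta.jdl for the given RunTimeEnvironment tag to standard output.
write_beta_jdl() {
    tag="$1"                 # e.g. BABAR-Test-slac-01, CCIN2P3 or BABAR
    dir='$HOME/edg-test'     # kept literal, exactly as in the example above
    cat <<EOF
Executable = "Beta.sh";
InputSandbox = {"$dir/Beta.sh","$dir/Objy/BetaApp","$dir/Objy/dumped.tcl"};
StdOutput = "result.out";
StdError = "result.err";
OutputSandbox = {"result.out","result.err","framework.hbook"};
Rank = other.MaxCpuTime;
Requirements = Member(other.RunTimeEnvironment, "$tag");
Environment = {"BetaBdbMicro=yes","BetaPhysMicro=yes","PhysMicro=yes","PhysAll=yes"};
EOF
}
```

For example, write_beta_jdl CCIN2P3 > Beta.jdl produces a JDL targeting in2p3.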
Submitting the job
Job submission is done with the command:
dg-job-submit Beta.jdl
This returns a URL that serves as the unique job ID.
If you are not using the default Resource Broker (RB) you will have
to provide a special configuration file and pass it to dg-job-submit with
the -config parameter. The configuration file can be built from $EDG_LOCATION/etc/UI_Config_ENV.cfg
- You can then check your job status with:
dg-job-status 'URL'
Pay attention to the mandatory quotes
- Once you get the status "Output Ready" you can retrieve the output
sandbox with:
dg-job-get-output 'URL'
If things go wrong, it may be useful to check the status of the Resource Broker
(RB).
A More Complex EDG Application
Here we describe a more sophisticated application using the
Storage Element (SE) and the Replica Catalog (RC)
Preparing the BaBar job
The BaBar executable is prepared in the same way as above,
using a recent release (12.4.0c). Let's assume that we have:
- An executable: "MyAnalysisApp"
- An "all.tcl" tcl file generated
with the Framework "ev dumpOnBegin" command
Goal
We want to be able to do the following:
- Copy the executable to a Storage Element
- Record it in the Replica Catalog with a Logical File Name
(LFN)
- Submit the job through the RB
- The RB selects a Computing Element (CE) and checks whether
the executable is available on the closest SE
- If yes, the job is submitted to the CE
- The job produces an n-tuple
- The n-tuple is copied to the SE and the entry is recorded
in the RC
Recording the executable on the RC
At the moment we are using an RC maintained in Manchester by Alessandra Forti.
To use the RC you need an RC configuration file like the following:
RC_REP_CAT_MANAGER_DN=cn=Manager,dc=gridpp,dc=ac,dc=uk
RC_REP_CAT_MANAGER_PWD=xxxxxxxx (*)
RC_REP_CAT_URL=ldap://bfb.hep.man.ac.uk:9011/rc=BaBarReplicaCatalog,dc=gridpp,dc=ac,dc=uk
RC_LOGICAL_COLLECTION=ldap://bfb.hep.man.ac.uk:9011/lc=file collection,rc=BaBarReplicaCatalog,dc=gridpp,dc=ac,dc=uk
(*) Please ask for the password
Define an environment variable RC_CONFIG_FILE containing the
path name of the previous file
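Before pointing RC_CONFIG_FILE at the file, it can help to verify that each of the four keys starts a line of its own. A minimal sketch, assuming a Bourne-type shell; the function name check_rc_config is hypothetical, and the path in the usage comment is the one used later in the JDL Input Sandbox.

```shell
# Check that an RC configuration file defines the four keys shown above,
# each at the start of a line.
check_rc_config() {
    file="$1"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    for key in RC_REP_CAT_MANAGER_DN RC_REP_CAT_MANAGER_PWD \
               RC_REP_CAT_URL RC_LOGICAL_COLLECTION; do
        grep -q "^$key=" "$file" || { echo "missing $key in $file" >&2; return 1; }
    done
}

# Usage (Bourne-type shell; under csh use setenv instead of export):
#   check_rc_config "$HOME/edg-test/RC/RC.conf" && \
#       export RC_CONFIG_FILE="$HOME/edg-test/RC/RC.conf"
```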
Find the address and the mount point of the SE where you want
to record the executable. This information can be obtained with an LDAP query
to the site gatekeeper. For instance, targeting IN2P3:
ldapsearch -x -H ldap://ccgridli08.in2p3.fr:2135
-b 'Mds-Vo-name=local,o=grid' objectclass=CloseStorageElement CloseSE MountPoint
where ccgridli08.in2p3.fr is the name of the GateKeeper
This command returns information
like the following:
.....
ccgridli07.in2p3.fr, ccgridli08.in2p3.fr:2119/jobmanager-bqs-A, ccgridli08.in2p3.fr, local, Grid
dn: closeSE=ccgridli07.in2p3.fr,ceId=ccgridli08.in2p3.fr:2119/jobmanager-bqs-A,hn=ccgridli08.in2p3.fr,Mds-Vo-Name=local,o=Grid
CloseSE: ccgridli07.in2p3.fr
MountPoint: /edg/StorageElement/prod
.....
So we see that the name of the SE closest to ccgridli08.in2p3.fr
is ccgridli07.in2p3.fr and that the file system mount point is: /edg/StorageElement/prod
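Reading the two values out of the LDAP reply can be automated. A minimal sketch, assuming a Bourne-type shell with awk, and assuming that each attribute is printed on its own line as "CloseSE: ..." and "MountPoint: ..."; the function name close_se_and_mount is hypothetical.

```shell
# Read ldapsearch output on standard input and print "<CloseSE> <MountPoint>"
# for each entry that carries both attributes.
close_se_and_mount() {
    awk '
        $1 == "CloseSE:"    { se = $2 }
        $1 == "MountPoint:" { if (se != "") print se, $2; se = "" }
    '
}
```

It would be used by piping the ldapsearch command shown above into close_se_and_mount.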
We can create a directory on the storage element with the command:
edg-gridftp-mkdir gsiftp://ccgridli07.in2p3.fr/edg/StorageElement/prod/babar/boutigny
We can now copy and register the executable file with:
edg-replica-manager-copyAndRegisterFile -l boutigny/MyAnalysisApp -s bbr-gate01.slac.stanford.edu//afs/slac.stanford.edu/u/eb/boutigny/rel/12.4.0/bin/Linux24/MyAnalysisApp -d ccgridli07/edg/StorageElement/prod/babar/boutigny/MyAnalysisApp -e
One can check the presence of the file in the RC with:
ldapsearch -L -S "uc=*" "filename=*" uc path filename -b dc=gridpp,dc=ac,dc=uk -h bfb.hep.man.ac.uk -p 9011 -P 2 -x
JDL file
We suppose that the relevant files are stored at SLAC in the directory $HOME/edg-test/IN2P3
Executable = "Master.sh";
Arguments = "boutigny/MyAnalysisApp";
InputSandbox = {"$HOME/edg-test/IN2P3/job.sh","$HOME/edg-test/IN2P3/all.tcl","$HOME/edg-test/IN2P3/Master.sh","$HOME/edg-test/RC/RC.conf"};
StdOutput = "result.out";
StdError = "result.err";
OutputSandbox = {"result.out","result.err"};
InputData = "LF:boutigny/MyAnalysisApp";
ReplicaCatalog = "ldap://bfb.hep.man.ac.uk:9011/lc=file collection,rc=BaBarReplicaCatalog,dc=gridpp,dc=ac,dc=uk";
Rank = other.MaxCpuTime;
Requirements = Member(other.RunTimeEnvironment, "CC-IN2P3") && other.OpSys == "RH 7.2" && other.MaxCpuTime > 400 && other.MaxCpuTime < 50000;
Environment = {"BetaBdbMicro=yes","BetaPhysMicro=yes","PhysMicro=yes","PhysAll=yes","BetaRootTuple=yes","BdbMicro=yes","NEVENTS=1000","NTUPLE=test.root"};
DataAccessProtocol = "gridftp";
The combination of the "InputData", "ReplicaCatalog" and "DataAccessProtocol" instructions
allows the RB to select a CE whose closest SE holds the executable. If the executable
is not found, the job is rejected before being sent to any CE.
In this JDL the "Requirements" instruction is written to target a specific
CE at IN2P3. It can of course be made less selective, so that
any CE accessible to BaBar may be chosen.
Also notice that the RC.conf file must be sent through the Input Sandbox.
The scripts
We have chosen to split the execution script into two parts. The first contains
the pure BaBar analysis part and is very similar to the basic example:
job.sh
#!/bin/tcsh
set curdir = $PWD
cd $BFDIST/releases/12.4.0c
srtpath 12.4.0c Linux24
setboot
setenv OO_FD_BOOT /afs/in2p3.fr/group/babar/data/bootfiles/theData/physics-analysis/V1/9102/BaBar.BOOT
cd $curdir
ln -s $BFDIST/releases/12.4.0c PARENT
chmod +x Executable
Executable all.tcl
The second one (Master.sh in the JDL above) is responsible for fetching the executable from the SE and for storing
the output n-tuple on the SE:
#!/bin/bash
exe=$1
GSF=`edg-brokerinfo getSelectedFile $exe gridftp`
echo GSF: $GSF
TFN=`echo $GSF | cut -c 8-300`
echo TFN: $TFN
globus-url-copy gsiftp$TFN file://$PWD/Executable
chmod +x job.sh
./job.sh
CE=`edg-brokerinfo getCE | cut -d ":" -f 1`
echo CE: $CE
CSE=`edg-brokerinfo getCloseSEs`
echo CSE: $CSE
MP=`edg-brokerinfo getSEMountPoint $CSE`
echo MP: $MP
globus-url-copy file://$PWD/$NTUPLE gsiftp://$CSE$MP/babar/boutigny/Ntuples/$NTUPLE
edg-replica-manager-registerEntry -l boutigny/Ntuples/$NTUPLE -s $CSE$MP/babar/boutigny/Ntuples/$NTUPLE -c RC.conf
In this script, the closest SE node name and the mount point are automatically
determined using the "edg-brokerinfo" command, so the script is generic and
can run on any CE.
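The two "cut" manipulations in the script can also be written with shell parameter expansion, which makes their intent explicit. A minimal sketch, assuming a Bourne-type shell; the function names to_gsiftp_url and ce_host are hypothetical, and the "gridftp://" prefix of the value returned by "edg-brokerinfo getSelectedFile" is an assumption inferred from the "cut -c 8-300" step.

```shell
# Turn the value returned by "edg-brokerinfo getSelectedFile <LFN> gridftp"
# into a gsiftp URL. Assumption: the value starts with "gridftp://".
to_gsiftp_url() {
    case "$1" in
        gridftp://*) printf 'gsiftp://%s\n' "${1#gridftp://}" ;;
        *) echo "unexpected file name: $1" >&2; return 1 ;;
    esac
}

# Keep only the host name of a CE identifier such as host:port/jobmanager,
# equivalent to: cut -d ":" -f 1
ce_host() {
    printf '%s\n' "${1%%:*}"
}
```

Unlike the fixed character offsets, to_gsiftp_url reports an error when the returned value does not have the expected prefix.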
last modified March 11, 2003 by
Dominique Boutigny, <boutigny@in2p3.fr>