
"Task Management"

Status demo - Setting up tasks.

Introduction:

For an outline of 'Task Management', see http://www.slac.stanford.edu/~roethel/bookkeeping/TaskManagement/.
This demo shows how to set up (configure) a task. The steps to configure a task include:

  • setting up your job environment (executable, tcl-file, output and logfile locations etc.).
    -> defining a task setup.
  • if necessary, registering the software release with the database (only if this release is not registered already).
    -> registering a software release.
  • defining the input datasets and output streams.
    -> defining a task.
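
The steps above can be sketched as a sequence of command lines. This is only a rough sketch: it assembles the commands and prints them, it does not run anything. The script names and the --user/--db/--host options are the ones used in the demo on this page; the helper names connection_args and plan_task_setup are hypothetical, and the argument order simply copies the demo transcript.

```python
import shlex


def connection_args(user, db, host):
    """Connection options that every command in the demo repeats."""
    return ["--user", user, "--db", db, "--host", host]


def plan_task_setup(conn, release, precedence, setup, executable, tcl_file,
                    task, streams, datasets):
    """Return the demo's command lines as argv lists (nothing is executed)."""
    return [
        # register the software release (skip if it is registered already)
        ["createRelease.pl", *conn, release, str(precedence)],
        # define the task setup: executable and tcl-file
        ["createSetup.pl", *conn, setup, executable, tcl_file],
        # point the setup at the release we actually want
        ["editSetup.pl", *conn, "--release", release, setup],
        # define the task: setup plus output streams
        ["createTask.pl", *conn, "-c", setup, "-s", ",".join(streams), task],
        # add the input datasets to the task
        ["addDataset.pl", *conn, task, *datasets],
    ]


conn = connection_args("aforti", "aforti", "slac")
commands = plan_task_setup(
    conn, "12.6.0", 12060000, "MyTestSetup-1", "myExec", "run.tcl",
    "MyFirstTask", ["AllEvents", "Jpsitoll"], ["run1-dataset"],
)
for argv in commands:
    print(shlex.join(argv))
```

Keeping the connection options in one place avoids retyping them for every command, which is where most of the typing in the demo goes.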

Glossary:

  • Task: A task is used here to define the main objective of the processing job. 'Oct. 2003 SP skims' could be an example of a task. A task keeps information on what (i.e. which datasets) should be processed and how (i.e. with which common configuration). Very roughly, a task can be compared to a skim job in SkimTools, with the extension that a task can change its configuration (but can only have one configuration at any time).
  • Dataset: This is easy - it's any dataset in the meaning of (and as defined in) the new bookkeeping model.
  • Configuration/Setup: The configuration stores all the information to process the data, i.e. the executable, the logfile path, the output path, the software release etc.
  • Jobs: A job is loosely defined in the sense of a process, i.e. it resembles the actual job that will be submitted to the batch queue. This is different from the SkimTools definition of a skim job, but I think it's more coherent with the general use of the term job. A job stores the specific information that is required to run a process (in contrast to the configuration, which stores the common settings that apply to all jobs), e.g. the input collection, the output collections, the processing queue (if different from the default) etc.
  • TaskManager: The interface between the user and the skim task management. This additional layer was added to be able to extend the application with a graphical user interface.
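
To make the split between configuration and job a bit more concrete, here is a minimal sketch of the glossary terms as Python dataclasses. It is not the Task-Manager's actual schema: all class and field names are hypothetical, and the collection and queue names in the usage lines are made up for illustration. Only the division of responsibilities follows the glossary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Configuration:
    """Settings common to all jobs of a task."""
    name: str
    executable: str
    tcl_file: str
    release: str
    default_queue: str


@dataclass
class Job:
    """One batch process; stores job-specific information only."""
    input_collection: str
    output_collections: List[str]
    queue: Optional[str] = None  # None means: use the configuration default


@dataclass
class Task:
    """What to process (datasets) and how (one configuration at any time)."""
    name: str
    configuration: Configuration
    datasets: List[str] = field(default_factory=list)
    jobs: List[Job] = field(default_factory=list)

    def queue_for(self, job: Job) -> str:
        """Queue a job runs in: its own if set, else the common default."""
        return job.queue or self.configuration.default_queue


config = Configuration("MyTestSetup-1", "myExec", "run.tcl", "12.6.0", "bfobjy")
task = Task("MyFirstTask", config, datasets=["run1-dataset"])
task.jobs.append(Job("collection-A", ["AllEvents"]))                 # default queue
task.jobs.append(Job("collection-B", ["Jpsitoll"], queue="otherq"))  # override
print([task.queue_for(job) for job in task.jobs])
```

A task holds a single Configuration object, so changing the configuration means replacing it, which matches "can only have one configuration at any time".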

Running the Demo (BbkTaskManager V00-00-00):

runDemo is a shell script that contains the example listed here. Set up the environment:
  • > cd ~bbrskim/12.5.3c
  • > srtpath <cr> <cr>
It is recommended to reset the demo first by running resetDemo.pl . This erases any entries that might still exist from an earlier run of runDemo. To run the demo from the shell script instead of cutting and pasting the commands, type runDemo . After finishing the demo, please run resetDemo.pl so that others can try it.

Registering and listing database entries for the software release used:

  • Listing existing entries:
    roethel@noric05> listRelease.pl --user aforti --db aforti --host slac
    id     name             created               precedence
    --------------------------------------------------------------------------
    1      12.5.3c          2003/10/27 17:32:20   12050303
    
  • Look at details (Well - there aren't any yet, but maybe later we might add a 'description' column, where users can add comments to releases they created. This might be valuable since analyses in general use different sets of tags to override the default tags).
    roethel@noric05> listRelease.pl --user aforti --db aforti --host slac
    id     name             created               precedence
    --------------------------------------------------------------------------
    1      12.5.3c          2003/10/27 17:32:20   12050303
    
    roethel@noric05> listRelease.pl --user aforti --db aforti --host slac 12.5.3c
    name                          :    12.5.3c
    created at                    : 2003/10/27 17:32:20
    precedence                    :   12050303
    id                            :          1
    
  • Create a new entry
    roethel@noric05> createRelease.pl --user aforti --db aforti --host slac 12.6.0 12060000
    name                          :     12.6.0
    created at                    : 2003/10/28 12:32:26
    precedence                    :   12060000
    id                            :          2
    
  • ... is it there...
    roethel@noric05> listRelease.pl --user aforti --db aforti --host slac
    id     name             created               precedence
    --------------------------------------------------------------------------
    1      12.5.3c          2003/10/27 17:32:20   12050303
    2      12.6.0           2003/10/28 12:32:26   12060000
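
The precedence is passed to createRelease.pl by hand. The two values in the listing (12.5.3c -> 12050303 and 12.6.0 -> 12060000) look as if each part of the release name gets two digits, with a trailing letter counted as a=1, b=2, c=3. That rule is only a guess from these two entries, and the function name release_precedence is hypothetical, but under that assumption the number could be derived like this:

```python
import re


def release_precedence(name: str) -> int:
    """Guess the precedence of a release name like '12.5.3c'.

    Assumed rule (inferred from two examples only): two decimal digits each
    for major, minor and patch, then two digits for an optional letter
    suffix, with no suffix = 00, a = 01, b = 02, ...
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)([a-z]?)", name)
    if match is None:
        raise ValueError(f"unrecognised release name: {name!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    if minor > 99 or patch > 99:
        raise ValueError(f"component does not fit in two digits: {name!r}")
    suffix = match.group(4)
    letter = ord(suffix) - ord("a") + 1 if suffix else 0
    return ((major * 100 + minor) * 100 + patch) * 100 + letter


print(release_precedence("12.5.3c"))
print(release_precedence("12.6.0"))
```

With this encoding a later release always sorts after an earlier one, which is presumably what the precedence column is for.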
    

Defining a setup:

Setup parameters can be changed as long as a setup is not registered to a task. Any setup can only be registered to one task.
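
The two rules above (parameters are editable only until registration, and a setup belongs to at most one task) can be sketched as follows. This is a minimal illustration, not the Task-Manager code; the Setup and SetupError classes and their methods are hypothetical.

```python
class SetupError(Exception):
    """Raised when an operation would break one of the setup rules."""


class Setup:
    def __init__(self, name, **parameters):
        self.name = name
        self.parameters = dict(parameters)
        self.task = None  # name of the task this setup is registered to

    def edit(self, **changes):
        """Change parameters; allowed only while not registered to a task."""
        if self.task is not None:
            raise SetupError(
                f"{self.name} is registered to task {self.task}; "
                "its parameters can no longer be changed"
            )
        self.parameters.update(changes)

    def register(self, task_name):
        """Register with a task; a setup can be registered to one task only."""
        if self.task is not None:
            raise SetupError(
                f"{self.name} is already registered to task {self.task}"
            )
        self.task = task_name


setup = Setup("MyTestSetup-1", release="12.5.3c")
setup.edit(release="12.6.0")     # fine: not registered yet
setup.register("MyFirstTask")
try:
    setup.edit(release="12.5.3c")  # refused: setup is frozen now
except SetupError as err:
    print(err)
```

Freezing the parameters at registration means every job of a task is guaranteed to have been run with the same setup.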

  • list existing setups:
    roethel@noric05> listSetup.pl --user aforti --db aforti --host slac
    id     name             created               release          task
    --------------------------------------------------------------------------
    1      TestSetup-1      2003/10/27 17:41:57   12.5.3c
    3      TestSetup-2      2003/10/27 17:47:57   12.5.3c
    4      TestSetup-3      2003/10/27 17:49:40   12.5.3c
    
  • create a new setup:
    roethel@noric05> createSetup.pl --user aforti --db aforti --host slac MyTestSetup-1 myExec run.tcl
    successfully created new configuration
    name                          : MyTestSetup-1
    maintainer                    :    roethel
    created at                    : 2003/10/28 14:00:08
    id                            :          7
    associated with task          :          -
    task id                       :          0
    release                       :    12.5.3c
    runs                          : myExec run.tcl
    default queue                 :     bfobjy
    logfile path                  : /u/br/roethel/devel/12.5.3c/log/<TASKNAME>/<RUNNUMBER>/<PASS>.LOG
    output path                   : /u/br/roethel/devel/12.5.3c/results/<TASKNAME>/<RUNNUMBER>/<PASS>/<STREAM>-micro.root
    optional configuration file   : /u/br/roethel/devel/12.5.3c/workdir/.bbkTMConfig
    ready                         :         no
    

    Placeholders such as <TASKNAME> and <RUNNUMBER> will be replaced by the actual values when jobs are created/submitted. Keeping the task name and the run number in the logfile and output file paths guarantees that the files are unique.
    But, oops, in my example the setup was created with the default (current) release 12.5.3c. I want to use 12.6.0, so I ...
  • ... edit a setup (editSetup.pl --help gives the full list of options):
    roethel@noric05> editSetup.pl --user aforti --db aforti --host slac --release 12.6.0 MyTestSetup-1
    successfully edited setup.
    name                          : MyTestSetup-1
    maintainer                    :    roethel
    created at                    : 2003/10/28 14:44:33
    id                            :          7
    associated with task          :          -
    task id                       :          0
    release                       :     12.6.0
    runs                          : myExec run.tcl
    default queue                 :     bfobjy
    logfile path                  : /u/br/roethel/devel/12.5.3c/log/<TASKNAME>/<RUNNUMBER>/<PASS>.LOG
    output path                   : /u/br/roethel/devel/12.5.3c/results/<TASKNAME>/<RUNNUMBER>/<PASS>/<STREAM>-micro.root
    optional configuration file   : /u/br/roethel/devel/12.5.3c/workdir/.bbkTMConfig
    ready                         :         no
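
The placeholder replacement in the logfile and output paths shown above could work roughly like this. It is a sketch only: the function name expand_placeholders is hypothetical, and the run number and pass values are made up for illustration.

```python
import re


def expand_placeholders(template: str, values: dict) -> str:
    """Replace <NAME> placeholders in a path template by actual values."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            # refuse to produce a path with an unresolved placeholder
            raise KeyError(f"no value for placeholder <{key}>")
        return str(values[key])

    return re.sub(r"<([A-Z]+)>", substitute, template)


template = "/u/br/roethel/devel/12.5.3c/log/<TASKNAME>/<RUNNUMBER>/<PASS>.LOG"
path = expand_placeholders(
    template,
    {"TASKNAME": "MyFirstTask", "RUNNUMBER": 12345, "PASS": 1},  # made-up values
)
print(path)
```

Raising an error for an unknown placeholder, instead of leaving it in the path, keeps two jobs from silently writing to the same file.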
    

Defining a task:

The easiest way to define a task is to associate a setup with a list of output streams.

  • Create a task:
    roethel@noric05> createTask.pl --user aforti --db aforti --host slac -c MyTestSetup-1 -s "AllEvents,Jpsitoll" MyFirstTask
    successfully created new task
    name                          : MyFirstTask
    maintainer                    :    roethel
    created at                    : 2003/10/28 15:19:14
    id                            :          3
    current configuration         : MyTestSetup-1
    input datasets                :          -
    output stream(s)              : AllEvents,Jpsitoll
    ready                         :         no
    active                        :         no
    
  • Now we need to add the input datasets. This has to be done separately from creating a task, since it is not guaranteed that the Task-Manager and the (public) run-database will be in the same physical database. Therefore we need to provide connection information for the database the dataset resides in (yes - I know. Using the per-site defined 'public' database as default will simplify this):
    roethel@noric05> addDataset.pl --user aforti --db aforti --host slac MyFirstTask run1-dataset
    successfully added dataset(s)
    name                          : MyFirstTask
    maintainer                    :    roethel
    created at                    : 2003/10/28 15:19:14
    id                            :          3
    current configuration         : MyTestSetup-1
    input datasets                : run1-dataset
    output stream(s)              : AllEvents,Jpsitoll
    ready                         :        yes     <--- our task is ready now!
    active                        :         no
    
  • Let's see what we've got so far:
    roethel@noric05> listTask.pl --user aforti --db aforti --host slac
    name             created               setup            description
    --------------------------------------------------------------------------
    MyFirstTask      2003/10/28 15:19:14   MyTestSetup-1
    

     

Next things to do...

Task-Management Framework:

  • 'Synchronize' the Task-Manager with the offline. The offline now has better methods to pass 'on-the-fly' information to the executable.
  • Create the (simple) interface to the batch system. This is straightforward, since a similar system is already in use for skimming.
  • Testing and refining (adding options).

User Interface:

Using the Task-Manager from the command line is possible, but is not meant to be the default way of using it. In particular, the standard way to create and edit a task should be a menu-driven interface (using a terminal or a graphical user interface) with built-in help functions etc.

  • Adding and testing further commands (in particular catching 'nonsense' input).
  • Adding a terminal driven menu (started).
  • Adding a graphical user interface.