Computing Job Descriptions

This page describes a number of the jobs that need to be done in the computing system.

This page has not been updated since 2004. The job descriptions are still useful, but many of the names of people involved have changed.

Coordination

Computing Coordinator

The BABAR physics program involves very large data sets and many sophisticated parallel analyses, each of which presents significant computing challenges. The BaBar Computing Coordinator provides overall leadership and coordination of computing activities throughout the international BABAR Collaboration, as a full participant in the BABAR physics program. In this capacity, they are responsible for leadership and coordination of:

  • Operation and ongoing development activities on all aspects of BABAR computing, which includes offline software, offline operations, online software and computing hardware, and computing tools, and
  • Upgrades of BABAR Computing that address new requirements arising from PEP II luminosity improvements, from detector upgrades, and from experience gained via physics analysis efforts.

The BaBar Computing Coordinator ensures that BABAR Computing satisfies the needs of the BABAR physics program within the existing financial resources provided by the International Finance Committee, while maximizing the capability to perform computing and analysis activities in Tier-A centres and collaborating institutions. The coordinator serves as a member of the BABAR Management Group, which advises the Spokesperson. In this role, the BaBar Computing Coordinator must:

  • Coordinate BABAR computing activities with BABAR management, the BABAR Technical Board, BABAR analysis activities, BABAR collaborators, the Computing Steering Committee and SLAC Computing Services and management, and
  • Identify the resources, both material and human, required to accomplish the goals of BABAR computing activities and, with BABAR and SLAC management, ensure that necessary resources are available.

The role also requires management of a substantial staff of both computing professionals and collaborators working within the Computing Organisation. This includes direct and indirect supervision, and working with people in this highly distributed organisation.

Deputy Computing Coordinator

Assists the Computing Coordinator in all aspects of the role, as designated. May also be delegated responsibilities. Provides backup during the Computing Coordinator's absence.


Offline

The offline project is defined as all offline code that is centrally maintained, run and distributed by the BaBar experiment. This includes various aspects of simulation, reconstruction and physics analysis tools. Except for questions of interfacing, it does not include code that we don't develop and/or distribute, and it doesn't include code that is used by only a small number of localized physicists. Everything not part of the offline project belongs either in computing operations or online, but this note is not intended to list them all.

The offline project is organized by an "offline project manager" (OPM). He or she works with a deputy (DOPM), and through a number of "area managers". Currently, those areas are "simulation", "reconstruction", "database", "releases", "physics tools architecture" and "physics contacts".

The purpose of this note is to lay out, in some detail, expectations for who will be primarily doing what. We have a good team in the offline, and we know how to work together - this is not meant to change that. Rather, this note presents to the larger community a snapshot of how this is working now.

Each area manager has responsibilities and authority listed below. Each area manager is delegated whatever authority the offline project manager has within his or her specific area, except as listed. Each area manager is responsible (with the project manager and deputy) for setting goals & priorities within their area, and then accomplishing them.

In addition, there are likely to be various temporary special-case projects. Each will be explicitly somebody's responsibility: an area manager, the DOPM, or the OPM. An example would be the effort to re-support the HP machines.

Offline project manager

Responsible for the union of the area managers' and deputy's responsibilities. Individually responsible (i.e., can't get away from it) for overall schedule, resource allocation and manpower procurement. Responsible for overall balance of priorities within and among the area managers' efforts. Responsible for documenting progress or its lack, and explaining why.

As a specific responsibility, until delegated, the project manager will oversee the development of the event display(s) and related graphics.

See further below for things explicitly _not_ part of the offline project, hence not within the authority/responsibility of the OPM.

Authority: Makes and maintains own schedule, routinely reported. Most additional authority has been delegated, and resides with the area managers. Can replace deputy and area managers after consulting with the computing coordinator and computing standing committee (which is defined somewhere to be computing coordinator plus some set of people from the computing group).

Resources: No computing professionals, as these have been redirected to the area managers. Students and postdocs as they are attracted.

Reconstruction Manager

Responsible for development and deployment of the common aspects of reconstruction, including the reconstruction geometry & alignment calibration model & code, the overall organization of reconstruction executables, and the reco infrastructure packages. Responsible for the system-specific aspects of the reconstruction code to the level desired by the specific system managers (Note: This is not a shared responsibility. The reco manager is responsible for the technical aspects of the reco software within the five systems and their integration into the whole. The system manager's offline designee is responsible for organizing the people, setting priorities within their systems, and making decisions about which algorithms are most suitable, etc. The offline reco manager is responsible for setting up means of validating that the specific system code works when integrated into the whole). Responsible for the "tracking system" reconstruction and monitoring. Responsible for the checkout and development of the Bear package and executable.

As an additional responsibility, responsible for the basic offline infrastructure, including Framework, the transient event and environment model, SRT and the basic build structure, PackageList, etc., including the overall dependency structure of the code, but not the specific dependencies in any package outside common & reconstruction. (This is historically a reconstruction responsibility, and as reconstruction stresses this area, we chose to keep it one.) Responsible for negotiating changes to this infrastructure with the online organization to the extent they are sensitive to it.

Authority: Full authority over the reconstruction packages, including the associated infrastructure packages. Authority over the reconstruction and infrastructure contents of the development releases, subject to specific scheduling issues currently delegated to the deputy offline project manager. Authority over content and scheduling of changes to the basic offline infrastructure, including Framework, the transient event and environment model, SRT and the basic build structure, etc. Works directly with the QC and QA groups under operations.

Resources: Directs the work of Asoka Desilva (Framework and infrastructure), Terry Hung (SRT and script support) and Wouter Hulsbergen (Tracking).

Simulation Manager

Responsible for the fast, intermediate and detailed simulations, including development and validation. Responsible for development and deployment of the common aspects of simulation, including the simulation geometry & code, the overall organization of simulation executables, and the simulation infrastructure packages. Responsible for the system-specific aspects of the simulation code to the level desired by the specific system managers (Note: This is not a shared responsibility. The simulation manager is responsible for the technical aspects of the simulation software within the five systems and their integration into the whole. The system manager's offline designee is responsible for organizing the people, setting priorities within their systems, and making decisions about which algorithms are most suitable, etc. The offline simulation manager is responsible for setting up means of validating that the specific system code works when integrated into the whole). Responsible for integration of the generators, and the technical and consistency aspects of the generator internals. Responsible for offline aspects of background mixing, L1 and L3 trigger simulation.

Although the Simulation Manager is responsible for all of the above, most of those aspects require little attention (beyond deployment of new L1 simulation). BaBar has been running for more than four years now and we have learned a lot about our detector and the underlying physics. The Simulation Manager should work towards including these lessons in the simulation so that it can mirror the data more closely.

Authority: Full authority over the simulation packages, including the generators. Has authority over non-hardware-specific aspects of mixing & trigger simulation software development.

Resources: SLAC G4 developers and other developers doing Common Task work on Core Simulation. Helps set priorities for subsystem developers.

Release Manager

Responsible for the operation, maintenance and implementation of the release system used to build the various forms of common releases. Also responsible for the operation, maintenance and implementation of the testing systems.

Authority: Full authority over the construction and operation of the system. Must coordinate release build schedule, special cases, overall capacity with the deputy offline project manager. Must coordinate development of the general structure of the release build system with the reconstruction manager.

The Physics Contact and Physics Tools Architect roles are now combined as the Physics Software Manager.

Physics Tools Architect

Responsible for the architecture and implementation upon which specific physics analysis tools are built. This includes Beta, the structure of the physics tools packages, and relevant other packages & base classes. Responsible for the integration, either proactively or retroactively, of provided tools with the offline system support for physics tools. Not responsible for the provision, documentation and/or tuning of any specific tool; these are the responsibility of the respective physics tools groups via the physics contact.

Open question: What role does this manager have in the provision of "the PAW replacement"?

Authority: Within the general software structures previously established, full authority to design and implement software infrastructure for physics tools, including relevant modifications to pre-existing tools. Is expected to make decisions of priority for tool support development, is expected to chair decision-making meetings on tool architecture and support, and is expected to organize code migrations when necessary. Chairs a routine phone meeting where new and updated tools are presented and where their architecture and updates to Beta are discussed.

Resources: Second call on the time of Akbar Mohktarani.

Physics Contact

Responsible for the inclusion of physics-related software into the offline executables. Responsible for obtaining feedback from the physics analysis organization on event store use, generator quality, specific physics analysis tools, etc. Responsible for obtaining and deploying documentation of the performance of specific tools made part of the offline system, including recommendations on expected use. Responsible for event selections (cut values and algorithm selections) that will be run so that remote institutions can get access to the data they need. Will consult widely within the specific physics analysis and physics tools groups, and arrange the integration of appropriate tools into the offline system.

Authority: Final call on the inclusion of a specific tool or tool update in the offline system, within the parameters of the development cycle. Selects which tools will be run in the offline productions, and how they will be configured. Specifies how selections, etc, will be run.

Resources: None, unless provided out of the physics organization. But as this is primarily a communication role, it's not clear how much assistance is needed.

Note: There is a built-in tension between the "Physics Contact" and the Physics Tools Architect. In the case of a "useful, but technically incompatible" tool, there will be forces pressing for and against inclusion. The decision on inclusion belongs to the Physics Contact, but it is expected that the Physics Tools Architect's opinion will be weighed, and that both parties will work toward improvement of that particular tool after inclusion. This emphasis is consistent with our "physics first, but maintainability for the long term" approach.


Distributed

Database Manager

Responsible for the safety, consistency and efficiency of the data stored in the common offline databases. Responsible for the common database code, including utility programs and classes. Responsible for event catalogs and collection utilities, query support. Responsible for the overall operation of the databases, including technical oversight of the HPSS system. Responsible for testing integrity of storage systems, including backup provisions.

Authority: Makes technical decisions regarding the structure and implementation of the common database code. Makes operational decisions about clustering and placement, structuring data for transport, schema modifications and changes to the database build procedures. Determines operational procedures for use of the databases, by both individual users and production systems.

Resources: Directs the event store engineers (Yemi Adesanya and Daniel Wang) and the conditions database engineer (Igor Gaponenko), and aids the online database engineer (Andy Salnikov). (Titles may not be exactly right.)

Open issue: This is in the distributed organization, but there is significant online overlap. I don't intend to let the database manager play both ends against the middle. The online and operations managers should make really clear what they need from the database group and when; then it's the distributed organization's job to make sure they get it.


Operations

Operations Manager

The operations manager is responsible for the operational activities of BaBar Computing. This includes Simulation Production, Prompt Reconstruction Production, Skim Production, Kanga Production, Database Operations, Quality Control and the upkeep of BaBar Documentation. The role requires very close liaison with the Operations Managers of each Tier-A facility. Because the operations manager is also the Operations Manager for the SLAC Tier-A, they are also responsible for developing deployment plans for SLAC resources in cooperation with SCS. They are the primary conduit for information between BaBar and SCS.


Online

Online Event Processing

The Online Event Processing system provides the infrastructure for Level 3 triggering, online data quality monitoring, and parts of online calibrations. The OEP group develops and maintains the associated software, and provides on-call support for operational problems involving it. The Fast Monitoring Operations manager provides support and training for user and detector system interaction with the data quality monitoring system.

JAS and Java GUI support

Provides basic support for all uses of the Java language in the online system. In particular, supports the application in BaBar of the JAS ("Java Analysis Studio") program, and the BaBar-specific components that connect with it. Provides assistance to the OEP group in the maintenance of the user interfaces for data quality monitoring (on-call support is provided by the Fast Monitoring Operations manager).

DAQ Ops

The DAQ Operations Manager is responsible for maintaining the health of the BABAR Data-AcQuisition system (DAQ) and instructing others in its use. To perform this role, the DAQ Operations Manager must implement software upgrades to the "Production" (data-taking) software, coordinate hardware and software changes to the system amongst the many online groups (Dataflow, Event Processing, Level 3 Trigger, Run Control, and Detector Controls), train the shift-takers in the operation and troubleshooting of the data-acquisition system, and develop tools and documentation for the overall improvement of DAQ operations.