CM2 - Goals

Summary

CM2 builds on the experience gained with the original computing model and aims in particular to address issues that will become increasingly important as the BaBar dataset grows over the next few years.

The high-level strategy and requirements for CM2 were developed by the Computing Model Working Group 2 (CMWG2) committee in the summer and fall of 2002, and the new model was implemented during 2003. As implemented, the model can be broken down into four specific areas: the CM2 Kanga eventstore, a new analysis model, the new Micro and Mini, and bookkeeping and task management. A high-level description of and motivation for each is provided below (much greater detail is available in other sections of this document).

The CM2 Kanga Eventstore

Since 1999 BaBar has used two eventstores: Bdb/Objy and Kanga. While the Bdb/Objy eventstore in principle provided greater functionality, in practice it had a number of data access and scalability issues that proved very difficult to solve. In addition, the Bdb/Objy data was larger than foreseen, and there were a number of practical difficulties in using it for analysis at Tier C sites (i.e. universities).

The original Kanga implementation (often referred to as "classic Kanga") was much easier to access, export, and use at small Tier C sites. However, the data had to be converted from the Bdb/Objy data produced in Prompt Reconstruction (PR) and Simulation Production (SP), since PR and SP could not produce Kanga data directly, and only "micro" data was available in classic Kanga.

As part of CM2 we decided to build a next-generation Kanga eventstore that could be used at both Tier A and Tier C sites as well as on laptops and workstations. The main properties of the CM2 Kanga eventstore are:

To summarize, the new CM2 Kanga eventstore is meant to retain all of the advantages of classic Kanga while extending it with significant new functionality.

A new analysis model

In addition to the limitations described above, the two eventstore implementations had other disadvantages in the context of analysis:

These limitations led to the standard analysis method of running large "productions" over the data in the eventstore, in one of the two formats, and writing out ntuples in various AWG-specific custom formats.

This allowed analysis-specific information (composite candidates and any calculated quantities) to be stored, but because the ntuples had no connection to the eventstore it was also necessary to copy out some or all of the micro. Access to these ntuples was easy enough for analysis to be done, but at the cost of large AWG-organized ntuple productions into what was effectively a set of ad hoc eventstore formats.

As part of the CM2 Kanga eventstore implementation we decided to improve on this situation in three principal ways:

The upshot for the average analysis user is that, instead of doing a large ntuple production in each AWG, you provide the code and configuration for your skim, which is then run for you in a "production skim". You can then read the output back in Beta jobs much faster (typically by a factor of 2-10, as it is no longer necessary to redo the combinatorics) and also use it (at some level) interactively even more quickly.

These central skim productions are intended to be run every 3 months, so that new and updated skims can be introduced frequently, and are expected to significantly reduce the need for AWG-organized ntuple productions.
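
The idea that a skim saves later jobs from redoing combinatorics can be illustrated with a toy sketch. This is not BaBar code: the event layout, the pair-sum selection, and the names build_candidates, run_skim, and analyse are all hypothetical and chosen only to show the pattern of computing composite candidates once and storing them alongside the selected events.

```python
from itertools import combinations

# Hypothetical toy model: an "event" is (event_id, list of track values).
# This is an assumption for illustration, not the BaBar data model.

def build_candidates(tracks, lo, hi):
    """Combinatorics step: all track pairs whose summed value lies in [lo, hi]."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(tracks), 2)
        if lo <= a + b <= hi
    ]

def run_skim(events, lo, hi):
    """'Production skim': keep events with at least one candidate and
    store the candidates together with the event."""
    skim = []
    for event_id, tracks in events:
        candidates = build_candidates(tracks, lo, hi)
        if candidates:
            skim.append(
                {"event": event_id, "tracks": tracks, "candidates": candidates}
            )
    return skim

def analyse(skim):
    """Analysis job: reads the stored candidates instead of rebuilding them."""
    return sum(len(record["candidates"]) for record in skim)

events = [(1, [0.5, 1.0, 2.0]), (2, [0.1, 0.2]), (3, [1.4, 1.6, 3.0])]
skim = run_skim(events, 2.9, 3.1)
n_candidates = analyse(skim)
```

In this sketch the pair-building loop runs once, in run_skim; analyse only iterates over what was stored, which is the source of the speed-up described above.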

The new Micro and Mini

Still to finish....

Bookkeeping and Task Management

The average user doing analysis needs access to a variety of "bookkeeping" information in order to use the data from the eventstore. This can include luminosities, "good run" classifications, MC information like decay files, etc. For historical reasons this information was scattered in multiple places and not particularly well integrated. Users were required to put together what they needed from the various sources.

In addition, as the integrated luminosity increases, the sheer number of jobs that one needs to run, and the management of information about their success or failure, outputs, etc., can become quite significant.

The new bookkeeping and "task management" are intended to address these issues. The new dataset bookkeeping integrates in one place all information relevant for the analysis user. This replaces the functionality of "skimData" (with a much simplified interface), the GoodRuns package, spruns, and the lumi script. "Task Management" is the name given to a general replacement for the SkimTools package, allowing a "task" to be applied to a "dataset". The set of scripts and the task bookkeeping are intended to be much more flexible and powerful than the SkimTools package.
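
As a rough illustration of applying a "task" to a "dataset" while keeping track of job outcomes, here is a minimal sketch. It does not reproduce the actual CM2 bookkeeping or task management tools: the classes Job and TaskBookkeeping, the record fields run, lumi_pb, and good, and the simulated failure are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    job_id: int
    runs: list
    status: str = "pending"  # "pending", "done", or "failed"
    output: str = ""

@dataclass
class TaskBookkeeping:
    """Hypothetical bookkeeping for one task applied to one dataset."""
    name: str
    jobs: list = field(default_factory=list)

    def split(self, dataset_runs, runs_per_job):
        # Divide the dataset into jobs of at most runs_per_job runs each.
        for start in range(0, len(dataset_runs), runs_per_job):
            chunk = dataset_runs[start:start + runs_per_job]
            self.jobs.append(Job(len(self.jobs), chunk))

    def execute(self, task):
        # Run the task on every job that has not already succeeded.
        for job in self.jobs:
            if job.status == "done":
                continue
            try:
                job.output = task(job.runs)
                job.status = "done"
            except Exception as exc:
                job.status = "failed"
                job.output = str(exc)

    def summary(self):
        counts = {}
        for job in self.jobs:
            counts[job.status] = counts.get(job.status, 0) + 1
        return counts

# Hypothetical dataset records: run number, luminosity, good-run flag.
dataset = [{"run": r, "lumi_pb": 1.5, "good": r != 103} for r in range(100, 106)]
good_runs = [rec for rec in dataset if rec["good"]]
total_lumi = sum(rec["lumi_pb"] for rec in good_runs)

def flaky_task(runs):
    # Simulated failure for one run, to exercise the failure bookkeeping.
    if any(rec["run"] == 104 for rec in runs):
        raise RuntimeError("simulated failure")
    return f"{len(runs)} runs processed"

book = TaskBookkeeping("demo")
book.split(good_runs, 2)
book.execute(flaky_task)
first_pass = book.summary()

# Resubmit: only the failed job is rerun, this time with a task that succeeds.
book.execute(lambda runs: f"{len(runs)} runs processed")
second_pass = book.summary()
```

Keeping the run selection, the luminosity sum, and the per-job status in one place is the point of the sketch: a resubmission touches only the jobs that have not yet succeeded.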


Last modified 15-Aug-2004, Peter Elmer