Padova to SLAC data distribution

  • Following a discussion with Wilko, Teela and Artem, I wrote a coarse sequence diagram for Padova to SLAC data distribution. The discussion concentrated on a modular system with a clean set of interfaces, so that components can be replaced when required.
  • The important points from the sequence diagram are that each component should report success or failure to its caller and should update the Data Dist bookkeeping with its current state. The discussion also produced a requirement for a clear separation of responsibility: PR is responsible for producing the root files and for the mapping between collection and file; Data Dist is responsible for moving the files and making sure that they end up where intended. Note: the data dist system is currently coupled to the import system. It is not clear to me that import is part of the data dist system.
  • Wilko's comments following this diagram:
  • Would it make sense to put the XML description file into the tar
    file as well, so that only one file is transferred per collection?
    On the second page you have the TransferXportFile and ImportXportFile
    that talk to the DataDistBookkeeping, which collects info about the import
    and is partly specific to the method by which the import is done (tar files, XML...).
    Therefore I think there should maybe be two types of bookkeeping system:
    1) The first one is the general data distribution system. All BaBar files
    are registered in this system (root files, but also other files that
    are of interest, e.g. background triggers, conditions...).
    This system also keeps the mapping between a collection
    and the root files that belong to it. The system is used to distribute
    files/collections between BaBar sites (TierA and TierC). A user can use
    the system to find collections and download them.
    This system would be the DataDistributionBookkeeping.
    2) The second system is used only for imports, and its main purpose is to
    allow the export and import sites to talk to each other. Once files are
    successfully imported, backed up and registered in the data
    distribution system (#1), the information in the second system is no longer
    of any relevance. This system also depends more on the specific
    implementation choices (tar files), and is needed at the moment because
    we cannot write directly to the data distribution system.
    This system would be the ImportDataDistBookkeeping (called
    DataDistBookkeeping on page two of your ps-file)
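
As a rough illustration of the split Wilko proposes, the sketch below models the two systems as separate Python classes. The class names follow his comments; the method names (`register`, `files_for`, `announce`, `complete`) and the in-memory dictionaries are assumptions made for the example and do not describe any existing BaBar code.

```python
from dataclasses import dataclass, field


@dataclass
class DataDistributionBookkeeping:
    """System 1: long-lived registry mapping each collection to its files."""
    collections: dict = field(default_factory=dict)

    def register(self, collection, files):
        # Record that these files belong to the collection.
        self.collections.setdefault(collection, []).extend(files)

    def files_for(self, collection):
        # Return a copy so that callers cannot modify the registry.
        return list(self.collections.get(collection, []))


@dataclass
class ImportDataDistBookkeeping:
    """System 2: short-lived record of imports still in flight."""
    pending: dict = field(default_factory=dict)

    def announce(self, tar_name, collection, files):
        # The export site announces a tar file and what it contains.
        self.pending[tar_name] = (collection, list(files))

    def complete(self, tar_name, registry):
        # After a successful import the entry moves to system 1
        # and is dropped from system 2.
        collection, files = self.pending.pop(tar_name)
        registry.register(collection, files)
```

In this sketch the import bookkeeping holds nothing once `complete` has run, which matches the remark that its information is no longer relevant after registration in system 1.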
  • Adil's comments to Wilko:
  • Your splitting of the data dist bookkeeping makes eminent
    sense. I'm not too sure about the XML being within the tar file. Let me
    try to reason this through.
    When I receive a box of goods I usually get an inventory list with the
    contents of the box, separate from the box itself.
    This would argue that the XML file should be outside the tar file.
    However, when I go to Fry's and buy something, the inventory or
    contents list is within the box. But the box does usually carry a contents
    list on the outside as well.
    This would argue that the XML file should be in the tar file.
    There are 2 related questions that should determine whether the XML file
    should be outside or inside the tar file:
    1) Does the XML description file contain anything that's relevant to the
    tar file as a whole (i.e. is the XML file needed to handle the tar file as
    a file)?
    2) If we cannot untar the tar file, is there an easy way to know what
    information we have lost?
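
One way to explore the two questions above is to try both placements at once. The sketch below, in Python, puts a per-file manifest inside the tar and keeps a copy outside that also carries the checksum and size of the tar as a whole (a checksum of the tar cannot be stored inside the tar it describes). The element and attribute names (`export`, `file`, `sha256`, `tar_sha256`, `tar_size`), the `manifest.xml` member name and both function names are hypothetical; the real XML description format is not specified here.

```python
import hashlib
import io
import tarfile
import xml.etree.ElementTree as ET


def build_export(collection, files):
    """Return (tar_bytes, outer_manifest_bytes) for one collection.

    `files` maps a file name to its contents as bytes.
    """
    root = ET.Element("export", collection=collection)
    for name, data in files.items():
        ET.SubElement(root, "file", name=name, size=str(len(data)),
                      sha256=hashlib.sha256(data).hexdigest())
    inner = ET.tostring(root)  # the copy that travels inside the tar

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in [("manifest.xml", inner), *files.items()]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    tar_bytes = buf.getvalue()

    # Facts about the tar as a whole can only be added to the outer copy.
    root.set("tar_sha256", hashlib.sha256(tar_bytes).hexdigest())
    root.set("tar_size", str(len(tar_bytes)))
    return tar_bytes, ET.tostring(root)


def lost_files(outer_manifest):
    """List the files a tar should contain, without opening the tar."""
    return [f.get("name") for f in ET.fromstring(outer_manifest).iter("file")]
```

With the outer copy in hand, question 2 has an answer even when the tar is unreadable: `lost_files` lists what was in it. The price is that two files are transferred per collection instead of one.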
  • Gabriele pointed to a useful URL: Fulvio's export notes
  • Gabriele's SQL table definitions:
  • #
    # Table structure for table 'run'
    # Keeps info on the tar file (tar checksum, size, insert date, file name, oprid, ...)
    #
    CREATE TABLE run (
    id int(11) NOT NULL auto_increment,
    number int(11) NOT NULL default '0',
    oprid int(11) NOT NULL default '0',
    site_label varchar(128) NOT NULL default '',
    file_name varchar(128) NOT NULL default '',
    file_size int(11) NOT NULL default '0',
    file_chksum varchar(128) NOT NULL default '',
    prc_name varchar(128) NOT NULL default 'NULL',
    date datetime default NULL,
    locked int(11) NOT NULL default '0',
    prc_id int(11) NOT NULL default '0',
    lock_date datetime default NULL,
    type enum('pr','sp','ps') NOT NULL default 'pr',
    PRIMARY KEY (id)
    ) TYPE=MyISAM;

    #
    # Table structure for table 'site'
    # One entry for each site involved in the transfer.
    # This table contains the host name, user account, data path, ...
    #

    CREATE TABLE site (
    type enum('production','export','tape','tier') NOT NULL default 'production',
    label varchar(128) NOT NULL default '',
    host varchar(128) NOT NULL default '',
    user varchar(128) NOT NULL default '',
    path varchar(128) NOT NULL default '',
    links varchar(128) NOT NULL default 'any',
    timeout int(11) default '0',
    max_retries int(11) default '0',
    role enum('primary','secondary') NOT NULL default 'primary',
    status enum('enabled','disabled') NOT NULL default 'disabled',
    run_type enum('pr','sp','ps') NOT NULL default 'pr',
    hd_watermark int(11) default '90',
    copy_cmd varchar(128) NOT NULL default '',
    bbcp_cmd varchar(128) NOT NULL default '',
    ssh_cmd varchar(128) NOT NULL default '',
    path_list varchar(255) NOT NULL default '',
    PRIMARY KEY (label,run_type)
    ) TYPE=MyISAM;

    #
    # Table structure for table 'status'
    # For each run entry there are several entries (one for each site).
    # This table describes the status of the tar file at each site.
    #
    CREATE TABLE status (
    site_type varchar(128) NOT NULL default '',
    site_label varchar(128) NOT NULL default '',
    run_id int(11) NOT NULL default '0',
    status enum('empty','toBeCopied','copied','toBeImported','imported',
                'toBeRemoved','removed','toBeReprocessed') NOT NULL default 'empty',
    date datetime default NULL,
    error int(11) default '0',
    retries int(11) default '0',
    PRIMARY KEY (site_label,run_id)
    ) TYPE=MyISAM;
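
The requirement that each component report success or failure to its caller and record its state in the bookkeeping can be sketched against the 'status' table above. The Python below is only an illustration: a plain dictionary stands in for one table row, the status names are copied from the enum, and the helper names `new_row` and `run_step`, along with the retry policy, are assumptions. The source does not say which status transitions are allowed, so none are enforced here.

```python
# Status values copied from the enum in the 'status' table.
STATUSES = ("empty", "toBeCopied", "copied", "toBeImported", "imported",
            "toBeRemoved", "removed", "toBeReprocessed")


def new_row():
    """In-memory stand-in for one row of the 'status' table (assumption)."""
    return {"status": "empty", "error": 0, "retries": 0}


def run_step(row, done_status, action, max_retries=0):
    """Run one component, record the outcome in the row, and report it.

    `action` is any callable that raises an exception on failure.
    Returns True on success and False once all attempts have failed.
    """
    if done_status not in STATUSES:
        raise ValueError(f"unknown status: {done_status}")
    for attempt in range(max_retries + 1):
        row["retries"] = attempt      # number of re-attempts so far
        try:
            action()
        except Exception:
            row["error"] = 1          # failure is recorded, never swallowed
            continue
        row["status"] = done_status   # bookkeeping reflects the new state
        row["error"] = 0
        return True
    return False
```

A caller would chain such steps and stop at the first `False`, for example `run_step(row, "copied", copy_tar) and run_step(row, "imported", import_tar)`, where `copy_tar` and `import_tar` are placeholders for the real transfer and import components.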

Adil Hasan Last modified 7/Aug/03