
Using skimImport to Import Kanga Files

Installation

To use skimImport, you will need:
  1. A local BBRORA (skimData) database which has been updated from SLAC (or other mirror site) and has files to import selected with import_status=1 (or similar). See the BBRORA/MySQL documentation on how to do this.
  2. The SkimTools package, which includes the skimImport Perl script and associated Perl modules (libraries). You can use the same SkimTools setup you used for running skimSqlMirror, skimSqlSelect, etc.
  3. A file transfer tool. Currently supported "ftp" tools are:
    scp, which is part of the standard ssh distribution, and should therefore be available "everywhere";
    bbftp, which is specially optimised for bulk data transfer over wide-area networks; and
    sfcp, which is also optimised for bulk data transfer over wide-area networks.
    Support for other tools (eg. bbcp, GRID-ftp, rsync, or Unix ftp) may be added in future.

Running skimImport

See skimImport --help for a summary of all the skimImport options.

By default, skimImport scans the database for files with import_status like 1% and checks that these files don't already exist on disk (in which case their import_status can be changed to 0 immediately). Files are then copied and, when on disk, flagged as such with import_status=0.

The two positional parameters give the source (remote) and destination (local) root directories. Files listed in the database that are not below the destination directory will not be copied. Because the database lists file paths as $BFROOT/kanga/EventStore/groups/..., the local Kanga files must sit under the same directory tree (using the local definition of $BFROOT). This restriction could be lifted. The source can include a user and remote host specification (à la scp: user@host:dir) giving the account and server that the files will be copied from. By default, the parameters tersk.slac.stanford.edu:/afs/slac.stanford.edu/g/babar/kanga/EventStore/groups and $BFROOT/kanga/EventStore/groups (local $BFROOT definition) are used.
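
As an illustration of how a source specification and a file path relative to the root might turn into an scp call, here is a minimal sketch. It is written in Python rather than the Perl of skimImport itself, and the function names, the relative path, and the treatment of a specification without a colon are all assumptions made for the example. It also ignores the temporary file naming described later; only the scp -i option is standard scp.

```python
def split_source(spec):
    """Split '[user@]host:dir' into (user, host, dir); user and host may be None."""
    hostpart, colon, directory = spec.partition(":")
    if not colon:
        # Assumption: with no colon, the whole string is just a directory.
        return None, None, spec
    user, at, host = hostpart.rpartition("@")
    return (user if at else None), host, directory


def scp_command(spec, relpath, local_root, identity=None):
    """Build the scp argument list for one file below the two roots."""
    user, host, remote_root = split_source(spec)
    remote = f"{remote_root.rstrip('/')}/{relpath}"
    if host:
        remote = f"{host}:{remote}"
    if user:
        remote = f"{user}@{remote}"
    cmd = ["scp"]
    if identity:
        cmd += ["-i", identity]  # standard scp option for an identity file
    return cmd + [remote, f"{local_root.rstrip('/')}/{relpath}"]


# Hypothetical relative path, purely for illustration.
demo = scp_command(
    "adye@csfsun02.rl.ac.uk:/afs/rl.ac.uk/bfactory/kanga/EventStore/groups",
    "skims/example/Stream1Kanga-micro.root",
    "/local/kanga/EventStore/groups",
    identity="~/.ssh/identity.slac",
)
print(" ".join(demo))
```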

The file copy is ordered according to the --sort parameter. By default this copies files ordered by priority, job id (ensuring they are copied roughly in the same order they were created), then stream id. The priority is given as the second character of the import_status value: 1x, where 1 indicates that the file should be imported, and the ordinal value of x indicates the priority. Any priority letters can be used, but conventionally we use

  1A  Express priority
  1B  High priority
  1C  Normal priority
  1D  Low priority
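
The default ordering can be pictured as a sort on three keys. The following is only a sketch in Python (skimImport itself is Perl): the record layout and the field names job_id and stream_id are assumptions, as is the idea that earlier letters are copied first and that a bare import_status of 1 sorts ahead of any lettered priority.

```python
from typing import NamedTuple


class SkimFile(NamedTuple):
    path: str
    import_status: str  # eg. "1C"; the second character is the priority
    job_id: int
    stream_id: int


def import_order(files):
    """Return the files flagged for import in the assumed default copy order."""
    def key(f):
        # Priority letter first, then job id, then stream id.
        return (f.import_status[1:], f.job_id, f.stream_id)

    flagged = (f for f in files if f.import_status.startswith("1"))
    return sorted(flagged, key=key)


# Made-up records, purely for illustration.
files = [
    SkimFile("normal-late", "1C", 20, 1),
    SkimFile("express", "1A", 30, 1),
    SkimFile("already-imported", "0", 1, 1),
    SkimFile("normal-early-s2", "1C", 10, 2),
    SkimFile("normal-early-s1", "1C", 10, 1),
    SkimFile("high", "1B", 5, 1),
]
for f in import_order(files):
    print(f.import_status, f.job_id, f.stream_id, f.path)
```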

By default, skimImport uses scp for the file transfer. This requires no special setup, so it is a good way to start. bbftp can then be used to improve data throughput. sfcp offers similar functionality to bbftp (and is, currently, perhaps a little simpler to set up) but has problems (notably, it does not seem to gain from TCP/IP window size optimisation).

While the file transfer is in progress, the incomplete file is named something like .StreamNKanga-micro.root.skimImport.$$ with file permissions that only allow owner access, so it will not be mistaken for readable data (this file will usually be removed if the transfer fails). Once the transfer (or --multiple group of transfers) is complete, the file is renamed to its proper name (eg. StreamNKanga-micro.root) and the default file permissions are restored. If the transferred file size does not match the expected size in the database, an error is displayed. Otherwise, unless --noupdate-sql is specified, a connection is made to MySQL and the import_status is set to 0.
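
The temporary-name scheme described above can be sketched as follows. This is a Python illustration, not the skimImport Perl code: the function names are invented, the copy step is any callable standing in for scp, bbftp or sfcp, and both the use of the process ID for the $$ suffix and the placing of the size check after the rename are assumptions.

```python
import os
import shutil
import tempfile


def default_mode():
    """Permissions a newly created file would get under the current umask."""
    mask = os.umask(0)
    os.umask(mask)
    return 0o666 & ~mask


def import_file(copy, source, dest, expected_size):
    """Copy source to dest via a hidden, owner-only temporary file."""
    directory, name = os.path.split(dest)
    tmp = os.path.join(directory, f".{name}.skimImport.{os.getpid()}")
    # Create the temporary file with owner-only access before any data arrives.
    os.close(os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600))
    try:
        copy(source, tmp)
    except Exception:
        if os.path.exists(tmp):
            os.remove(tmp)  # do not leave a partial file behind
        raise
    os.chmod(tmp, default_mode())
    os.rename(tmp, dest)
    if os.path.getsize(dest) != expected_size:
        print(f"error: {dest} is not the expected {expected_size} bytes")
        return False
    return True  # the caller would now set import_status=0


# Local demonstration, with shutil.copyfile standing in for the transfer tool.
with tempfile.TemporaryDirectory() as work:
    src = os.path.join(work, "remote.root")
    with open(src, "wb") as handle:
        handle.write(b"12345")
    good = import_file(shutil.copyfile, src, os.path.join(work, "good.root"), 5)
    bad = import_file(shutil.copyfile, src, os.path.join(work, "bad.root"), 6)
    leftovers = [n for n in os.listdir(work) if ".skimImport." in n]
```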

For each file, skimImport displays the file to be copied before starting, and shows transfer statistics when done (you can tell it to shut up with the --quiet option). Each line includes a timestamp (unless --notimestamps is specified). More detail of its operation is available with the -v (or --verbose) option. This is probably a good thing to try when you first use skimImport (use, eg., --maxcopy 5, to copy just a few files with all this detail and then rerun without -v to get the rest). More -v options give even more detail, but that's probably going too far.

Most of the transfer tools use ssh authentication. This means that the user will be prompted for a password before copying each file (or --multiple group of files - currently only supported for bbftp), unless a passphraseless identity file is used. See the documentation on using an SSH identity file for more details. The identity file name can be passed to scp (or bbftp or sfcp) using the --identity option.

Note that skimImport needs to do two types of network authentication. First, it needs to access (and later update) the MySQL database. This is specified with the --host, --user, and --pwdfile options, much like the skimSqlMirror and skimSqlSelect commands. Second, it needs host and authentication information for the server from which the Kanga files will be transferred. This is specified by the --node, --remote-user, and --identity (or special bbftp authentication) options. Similar to the scp syntax, --node user@node or user@node:dir can be used instead, but the option syntax is useful for changing the user and/or node without changing the default directory root.

Examples

  1. skimImport --user bfactory --identity ~/.ssh/identity.slac
    
    This copies all files flagged for import from SLAC, using scp (the default transfer program) with identity file ~/.ssh/identity.slac. The bfactory username will be used to access the MySQL database (the account used to transfer the Kanga files remains the default, ie. your current login id).

  2. skimImport --user bfactory \
      adye@csfsun02.rl.ac.uk:/afs/rl.ac.uk/bfactory/kanga/EventStore/groups
    
    This copies data from RAL (username adye on csfsun02) using the default ssh identity file (~/.ssh/identity) or, if that doesn't exist, prompting for the password.

Using bbftp

The --ftp-type=bbftp option selects bbftp as the file transfer tool. See the bbftp installation documentation for help in setting up bbftp.

skimImport Configuration File

Configuration for the local filesystems may be specified in the skimImport configuration file. This file is loaded if specified with the --config FILE option.

[Note that this syntax for specifying the directory tree organisation is experimental. Is it enough? Is it too complicated?]

Each configuration option is specified on a line in this FILE. Comments, following a # character, are ignored. Currently two types of option are recognised:

root DIR
gives the local root directory. Paths in the rest of the configuration file can be given relative to this DIRectory.

symlink SOURCE TARGET [TARGET [...]]
specifies that new directories or files that match SOURCE will be created below TARGET and a symbolic link created from SOURCE to this new directory or file.

SOURCE can contain *, **, ?, or [chars] wildcards. Each * (or contiguous set of *s) in a TARGET is replaced by the matching text: the characters that matched the corresponding *, **, or group of contiguous ? or [] wildcards in the SOURCE specification. The special wildcard ** matches any part of the path, eg. /root/**/dir1/file matches /root/a/b/c/dir1/file (this syntax is borrowed from rsync's --include/--exclude syntax). A trailing / on the SOURCE forces a match only on directories.
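
One way to picture the wildcard substitution is to turn SOURCE into a regular expression with one capture group per wildcard (or contiguous group of ? and [] wildcards), then fill in the *s of the TARGET in order. The Python sketch below does this under several assumptions: the function names are invented, a single * and ? are taken not to match a /, [chars] is passed straight through as a regular-expression character class, and a trailing / on SOURCE is simply stripped rather than checked against the filesystem.

```python
import re

# One token per alternative: a run of two or more *s, a single *,
# a contiguous group of ? and [chars], or literal text.
_TOKEN = re.compile(r"(\*\*+)|(\*)|((?:\?|\[[^\]/]+\])+)|([^*?\[]+|\[)")
_SINGLE = re.compile(r"\?|\[[^\]/]+\]")


def source_to_regex(source):
    """Translate a SOURCE pattern into a compiled regular expression."""
    parts = []
    for match in _TOKEN.finditer(source):
        double_star, star, group, literal = match.groups()
        if double_star:
            parts.append("(.*)")  # ** may span directory separators
        elif star:
            parts.append("([^/]*)")  # assumption: * stops at a /
        elif group:
            inner = "".join("[^/]" if t == "?" else t for t in _SINGLE.findall(group))
            parts.append(f"({inner})")
        else:
            parts.append(re.escape(literal))
    return re.compile("".join(parts))


def expand_target(source, target, path):
    """Return TARGET with its *s filled in from path, or None if no match."""
    match = source_to_regex(source.rstrip("/")).fullmatch(path)
    if match is None:
        return None
    captured = iter(match.groups())
    return re.sub(r"\*+", lambda _: next(captured, ""), target)


print(expand_target("skims/isPhysicsEvents/????/",
                    "/kanga1/EventStore/groups/skims/isPhysicsEvents/*",
                    "skims/isPhysicsEvents/0100"))
```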

Lines with the same SOURCE specification can also be used to provide additional TARGETs (to prevent very long lines).
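
Reading such a file amounts to stripping comments, splitting each line into words, and collecting the TARGETs for each SOURCE. The Python sketch below shows the idea; the function name and the error handling are assumptions, and $BFROOT is deliberately left unexpanded since how skimImport resolves it is not described here.

```python
def parse_config(text):
    """Return (root, symlinks), where symlinks maps each SOURCE to its TARGETs."""
    root = None
    symlinks = {}
    for line in text.splitlines():
        words = line.split("#", 1)[0].split()  # drop comments, then split
        if not words:
            continue
        if words[0] == "root" and len(words) == 2:
            root = words[1]
        elif words[0] == "symlink" and len(words) >= 3:
            # Repeated SOURCE lines just add to the same list of TARGETs.
            symlinks.setdefault(words[1], []).extend(words[2:])
        else:
            raise ValueError(f"unrecognised configuration line: {line!r}")
    return root, symlinks


example = """
# isPhysicsEvents spread over three disks
root $BFROOT/kanga/EventStore/groups/
symlink skims/isPhysicsEvents/????/ /kanga1/EventStore/groups/skims/isPhysicsEvents/*
symlink skims/isPhysicsEvents/????/ /kanga2/EventStore/groups/skims/isPhysicsEvents/*
symlink skims/isPhysicsEvents/????/ /kanga3/EventStore/groups/skims/isPhysicsEvents/*
"""
root, symlinks = parse_config(example)
print(root)
for source, targets in symlinks.items():
    print(source, targets)
```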

From the list of TARGET filesystems specified, the one with the most free space is chosen. This choice is made just before the directory or file needs to be created, so it allows the filesystems to be filled up as fairly as possible. All required directories on both the SOURCE and TARGET filesystems are created automatically before the file transfer is initiated.
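
Choosing the filesystem with the most free space might look something like this in Python. It is a sketch only: the function names are invented, and measuring free space at the nearest existing parent directory (since the TARGET directory may not have been created yet) is an assumption about how one would do it, not a description of skimImport's code.

```python
import os
import shutil


def existing_ancestor(path):
    """Walk up from path to the nearest directory that already exists."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        path = os.path.dirname(path)
    return path


def free_bytes(path):
    """Free space on the filesystem that would hold path."""
    return shutil.disk_usage(existing_ancestor(path)).free


def pick_target(targets, free_space=free_bytes):
    """Return the TARGET whose filesystem currently has the most free space."""
    return max(targets, key=free_space)


# Made-up free-space figures, so that the example runs anywhere.
fake_free = {"/kanga1": 10_000, "/kanga2": 75_000, "/kanga3": 40_000}
print(pick_target(list(fake_free), free_space=fake_free.get))
```

Calling pick_target just before each directory or file is created, rather than once at start-up, is what lets the disks fill up evenly.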

Note, however, that skimImport never moves files between the filesystems (another utility must be used for that). A new symbolic link is only created if the path matching SOURCE doesn't already exist, or contains only empty directories (if there are only empty directories, and another filesystem has more space, these directories will be removed and recreated on the chosen filesystem to allow reoptimisation in this one easy case [is this a good idea?]).

Examples

  1. This first configuration will produce a directory tree similar to that currently (March 2001) at SLAC. The directory $BFROOT/kanga/EventStore/groups/skims/isPhysicsEvents contains symbolic links (with names of the form nn00 - actually, for simplicity, any 4-character directory name will match) to directories /kangax/EventStore/groups/skims/isPhysicsEvents/nn00 on disks /kanga1, /kanga2, or /kanga3. The one of these three disks with the most free space is chosen when the directory is first created.
    root $BFROOT/kanga/EventStore/groups/
    symlink skims/isPhysicsEvents/????/ /kanga1/EventStore/groups/skims/isPhysicsEvents/*
    symlink skims/isPhysicsEvents/????/ /kanga2/EventStore/groups/skims/isPhysicsEvents/*
    symlink skims/isPhysicsEvents/????/ /kanga3/EventStore/groups/skims/isPhysicsEvents/*
    

  2. An alternative configuration (similar in structure to the directory tree used in Rome) is to have an entire directory tree of symbolic links under $BFROOT/kanga/EventStore/groups, each pointing to an individual file on one of the filesystems. This way, skimImport can choose the filesystem with the most free space before each file is created.
    root $BFROOT/kanga/EventStore/groups/
    symlink **.root /kanga1/EventStore/groups/**.root
    symlink **.root /kanga2/EventStore/groups/**.root
    symlink **.root /kanga3/EventStore/groups/**.root
    


/BFROOT/www/Computing/Offline/DataDist/skimImport.html last modified 8th June 2001 by Tim Adye, <T.J.Adye@rl.ac.uk>