
[introduction] [rsync] [obtaining rsync] [using rsync] [selecting files] [remote centres] [tape] [links]


2 Apr 2001: Except for mirroring the Kanga conditions data ($BFROOT/kanga/CondDB/) and small or personal exports, the recommended method for managing exports of Kanga event data is now with the skimData database. This method (syncslac/rsync) still works for event data, though a full scan of the Kanga directory trees can take many hours.

Exporting KanGA Files to Remote Sites

Exporting KanGA (née NOTMA) files from SLAC should be much simpler than exporting Objectivity files. Each KanGA file contains a single run's data and is under about 100 MB. The files are logically arranged in a simple (if deep!) directory tree (though they may physically reside on different disks, referenced by Unix soft links).

A general purpose file transfer tool (ftp, scp, or what have you) can be used for network copies. For a few files this is probably the best course. For thousands of files it is too cumbersome, particularly when one needs to keep track of which files have already been exported.

rsync

For this reason, I suggest that the freely available rsync mirroring package be used for network transfers of KanGA files.

rsync can copy directory trees over the network (including or excluding certain files if need be), checking for files that have already been copied. If the connection is broken, it can easily pick up where it left off. During the transfer, files are hidden so that users at the remote site will not be confused by partially copied files. Although we don't expect to change already-written KanGA files (except to correct any screwups), rsync will check for modifications (efficiently recognising identical files, even if the modification date has inadvertently been changed). The transfers can be compressed, though this probably won't help us much as the ROOT files are already compressed.

Since rsync can use ssh, we needn't worry about getting through the SLAC firewall (a problem with ftp). Using ssh identity files removes the requirement for a login password, so the whole update could be performed in batch or as a cron job.

rsync has already been very successfully used for more than a year to provide a mirror of the BaBar web and CVS files at RAL (updated automatically every night).

Obtaining rsync

rsync needs to be available on both local and remote machines.

rsync is installed at SLAC at

/afs/slac.stanford.edu/public/software/rsync/bin/rsync
(This provides versions for Solaris, OSF, Linux, (AIX), and HP; these are accessible explicitly as /afs/slac.stanford.edu/public/software/rsync/2.4.3-tja4/*/bin/rsync .)

If AFS is available at your site, these binaries can of course be executed directly or copied to a local disk. Alternatively the rsync binaries (with man files) for your architecture can be downloaded from the appropriate one of:

rsync-2.4.3-tja4.Solaris26.sparc.tar.gz
rsync-2.4.3-tja4.OSF1V4.alpha.tar.gz
rsync-2.4.3-tja4.Linux22.i386.tar.gz
rsync-2.4.3-tja4.HPUX1020.tar.gz
rsync-2.4.3-tja1.AIX42.tar.gz (only version 2.4.3-tja1)
(or copied from /afs/slac.stanford.edu/public/software/rsync/dist/). Please contact me if you need some different build (eg. for another OS version). The standard distribution (rsync 2.4.3) already includes our fixes to rsync 2.3.2 for large file support (on Solaris at least). This is not currently required for KanGA files, but it makes rsync a useful general tool for other transfers as well. The SLAC version has additional patches (not yet in the official rsync distribution) to correctly handle soft links (--copy-unsafe-links bugs), to show better statistics, and a fix to the return code. See the top of the patch file for details.

I have created a wrapper script to specify some useful defaults for KanGA transfers. Copy

/afs/slac.stanford.edu/public/software/package/rsync/syncslac
to your machine (or execute directly from AFS if you prefer). Make sure that rsync is in your PATH (if this is not convenient, then modify the $rsync variable at the top of syncslac).

You can check which versions are in your PATH with

syncslac --version
which, with the latest versions, should show
syncslac V2.5 - Copies files from SLAC using rsync
rsync version 2.4.3-tja4  protocol version 25

For the record, you also need an ssh client and Perl installed on your machine. I have discovered that Perl 5.005 is required; syncslac does not work with Perl 5.004. I may have a fix, so if you want to get it running in Perl 5.001-5.004, please let me know.
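The prerequisites above can be checked with a few lines of shell. This is only a sketch: the check_tool helper is a name made up for this example (it is not part of syncslac), and it only tests whether each tool is visible in your PATH, not whether it is the patched SLAC build.

```shell
#!/bin/sh
# check_tool is a hypothetical helper for this example, not part of syncslac.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
        return 1
    fi
}

missing=
for tool in rsync ssh perl; do
    check_tool "$tool" || missing=1
done

# Perl can enforce its own minimum version: "require 5.005" dies on older Perls.
if command -v perl >/dev/null 2>&1; then
    perl -e 'require 5.005; print "perl version ok\n"'
fi

if [ -n "$missing" ]; then
    echo "Install the missing tools (or fix PATH) before running syncslac."
fi
```

If rsync is installed somewhere outside your PATH, the alternative is to edit the $rsync variable at the top of syncslac, as described above.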

Transferring KanGA Files

Run syncslac or rsync with no options for a summary of the command usage.

syncslac specifies as default (all these can be overridden): transfer via ssh (with the faster blowfish cipher), remote host tersk.slac.stanford.edu (this may change), location of rsync at SLAC, extra statistics, directory tree copy, correct handling of soft links, and preservation of file dates. The rsync command is printed out so you can see exactly what it's doing.

So, to copy all the KanGA data and the KanGA conditions files to a local directory /localdisk/kanga, you could say

syncslac /afs/slac.stanford.edu/g/babar/kanga/EventStore/groups/ \
                             /localdisk/kanga/EventStore/groups/

syncslac /afs/slac.stanford.edu/g/babar/kanga/CondDB/ \
                             /localdisk/kanga/CondDB/
Note that the trailing slash (/) on the first parameter is significant (see the rsync man page for the reason). To perform an update of the files (or to pick up again after a network outage), simply rerun the command.
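The effect of the trailing slash is easiest to see with a purely local copy. The following sketch uses plain rsync on throwaway directories; the file name file.root and the directory names with_slash and no_slash are invented for the demonstration, and it is skipped if rsync is not in your PATH.

```shell
#!/bin/sh
# Local illustration of rsync's trailing-slash rule (no network, no syncslac).
if command -v rsync >/dev/null 2>&1; then
    demo=$(mktemp -d)
    mkdir -p "$demo/src/CondDB"
    echo test > "$demo/src/CondDB/file.root"   # made-up file name

    # With the trailing slash, the CONTENTS of CondDB land in with_slash/
    rsync -a "$demo/src/CondDB/" "$demo/with_slash/"

    # Without it, a CondDB directory is created inside no_slash/
    rsync -a "$demo/src/CondDB" "$demo/no_slash/"

    # List the results relative to the demo directory
    (cd "$demo" && find with_slash no_slash -type f)
    # Remove "$demo" by hand when you have finished looking.
fi
```

The listing should show with_slash/file.root and no_slash/CondDB/file.root. Omitting the slash in the commands above would therefore leave you with an extra directory level locally.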

Currently (6 Jun 2000) $BFROOT/kanga/EventStore/groups/ contains 41400 files taking 645 GB. You may want to transfer a subtree if this is too much data or to give you an idea of the time the transfer will take (if you put it in the right place locally and later do the entire tree, then rsync won't have to copy these again).
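For a first idea of the time involved before timing a subtree, the figures above can be turned into a back-of-envelope estimate. The sustained transfer rates used here are assumptions chosen purely for illustration (measure your own link), and 1 GB is taken as 1024 MB.

```shell
#!/bin/sh
# Rough transfer-time estimate. estimate_hours is a hypothetical helper;
# the rates passed to it below are ASSUMED, not measured, values.
estimate_hours() {
    # $1 = data volume in GB, $2 = sustained rate in MB/s
    awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb * 1024 / rate / 3600 }'
}

# Average file size in MB for 645 GB spread over 41400 files
awk 'BEGIN { printf "average file size: %.1f MB\n", 645 * 1024 / 41400 }'

echo "hours at 1 MB/s:    $(estimate_hours 645 1)"
echo "hours at 0.25 MB/s: $(estimate_hours 645 0.25)"
```

At an assumed 1 MB/s the whole tree comes to roughly 183 hours, or a bit over a week of continuous transfer, which is why starting with a subtree is worth considering.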

Hints

  • The -n option does a "dry-run", just listing the files that would be copied.
  • You can specify --delete to also remove local files that are no longer at SLAC. This should be checked carefully (perhaps initially with the -n dry-run option) if updating an existing large local directory tree, because if the wrong SLAC directory specification is given, then the entire local tree will probably be removed.
  • If your SLAC username is different from your local username, you can change the default by specifying, eg., user@/afs/slac.stanford.edu/g/babar/kanga/EventStore/.
  • You can specify an ssh identity file (with syncslac -i file) to allow automatic logon (no password).
  • If you have an unreliable connection to SLAC, you can specify, eg., --retries=2. This causes syncslac to retry the rsync command (at most twice, with - by default - a 15 minute delay between each try) if it fails.

Selecting Skim Subsamples

Many sites may not want to mirror the entire $BFROOT/kanga/EventStore/groups/ directory tree. It is easy to select some subsamples by using a subtree, eg. $BFROOT/kanga/EventStore/groups/SP/ for Simulation Production. However, the files for the different skims are distributed throughout the $BFROOT/kanga/EventStore/groups/skims/ tree: each file sits in a directory named after the Analysis Working Group (AWG), and its filename identifies the stream.

There are several ways to select these files. To exclude the AWG directories you aren't interested in, create an exclusion file, eg. exclude.lis, listing the AWG directories and/or files you don't want copied, eg.

TauQED/
ClHBD/
PID/BPCElectronKanga-micro.root
excludes the Tau/QED and Charmless Hadronic B AWG skims and the Particle ID AWG's BPCElectronKanga stream. Then specify this file with the --exclude-from=exclude.lis option on the syncslac (or rsync) command line.
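You can try out an exclusion file locally before using it on the real transfer. The sketch below builds a miniature stand-in for the skims tree and copies it with plain rsync. The intermediate directory skims/2000 and the file names a.root, b.root and other.root are invented for the test, and the behaviour shown is that of a current rsync, which I assume (but have not checked) matches the 2.4.3 build for these patterns.

```shell
#!/bin/sh
# Local test of exclude.lis on a made-up miniature tree (no network needed).
if command -v rsync >/dev/null 2>&1; then
    exc_work=$(mktemp -d)
    for f in TauQED/a.root ClHBD/b.root \
             PID/BPCElectronKanga-micro.root PID/other.root; do
        mkdir -p "$exc_work/src/skims/2000/$(dirname "$f")"
        : > "$exc_work/src/skims/2000/$f"
    done

    # Same three lines as the exclude.lis example above
    printf '%s\n' 'TauQED/' 'ClHBD/' 'PID/BPCElectronKanga-micro.root' \
        > "$exc_work/exclude.lis"

    rsync -a --exclude-from="$exc_work/exclude.lis" \
        "$exc_work/src/" "$exc_work/dst/"

    # Only the files that survived the exclusions are listed
    (cd "$exc_work/dst" && find . -type f)
fi
```

Only ./skims/2000/PID/other.root should be listed: the two excluded AWG directories are not created at all, and the excluded PID stream is skipped.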

It's slightly more complicated to include just the AWG directories you are interested in. Eg., in a file include.lis use

*/
**/Charmonium/*
**/Dstar/KpiKanga-micro.root
- *
to copy only the Charmonium AWG skims and the Dstar AWG's KpiKanga stream. Specify this file with the --include-from=include.lis option on the syncslac (or rsync) command line.

You probably don't want to know why all those asterisks are necessary (they aren't in every case, but it's probably simplest to include them), but if you do, check out the rsync man page. The first line should be "*/" (to include all directories) and the last line should be "- *" (to exclude everything else). Unlike the exclusions, these selections create the full directory structure (ie. including deselected AWG directories), but since no data is copied, this probably isn't such a problem. (If anyone has a way round this restriction, I'd be interested to hear it.)

Note that these inclusions and exclusions can also be specified using multiple --include=name and/or --exclude=name options, but it's probably simpler to use a file as described above.
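The inclusion rules can be tried out locally in the same way. This sketch copies a made-up miniature tree with plain rsync and the include.lis lines from above, then lists both the copied files and the empty directories left behind. The skims/2000 level and the file names x.root, y.root and other.root are invented for the test, and a current rsync is assumed.

```shell
#!/bin/sh
# Local test of include.lis on a made-up miniature tree (no network needed).
if command -v rsync >/dev/null 2>&1; then
    inc_work=$(mktemp -d)
    for f in Charmonium/x.root Dstar/KpiKanga-micro.root \
             Dstar/other.root TauQED/y.root; do
        mkdir -p "$inc_work/src/skims/2000/$(dirname "$f")"
        : > "$inc_work/src/skims/2000/$f"
    done

    # Same four lines as the include.lis example above
    printf '%s\n' '*/' '**/Charmonium/*' '**/Dstar/KpiKanga-micro.root' '- *' \
        > "$inc_work/include.lis"

    rsync -a --include-from="$inc_work/include.lis" \
        "$inc_work/src/" "$inc_work/dst/"

    echo "files copied:"
    (cd "$inc_work/dst" && find . -type f | sort)
    echo "directories created:"
    (cd "$inc_work/dst" && find . -type d | sort)
fi
```

The TauQED directory appears in the second list although none of its files were copied, which is the restriction described above. Later rsync releases (2.6.7 onwards, so not the 2.4.3-tja4 build described here) have a --prune-empty-dirs (-m) option intended for this situation; I have not checked whether syncslac passes it through.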

Remote Centres

This procedure has been performed to transfer $BFROOT/kanga/EventStore to RAL (csfsun02.rl.ac.uk) and from there to a number of other UK sites (also Rome). European sites may prefer to copy from RAL, rather than directly from SLAC. Please contact me if you want to do this but don't have access to the RAL machines.

Tape Transfers

It is probably infeasible to copy the entire dataset from scratch across the internet. We plan to develop a method of transferring KanGA files by tape. This is more complicated than network transfers: the tape contents will need some form of catalogue and (for efficiency) may have to be combined into larger (eg. tar) files. Once a mirror is established at a remote site, it can probably keep up with additional data-taking/OPR via the network.

More Information

KanGA

Data Distribution

rsync and SSH

Please let me know if you have any comments, suggestions, or questions about exporting KanGA files.
/BFROOT/www/Computing/Offline/DataDist/kanga_export.html last modified 22nd May 2001 by
Tim Adye, <T.J.Adye@rl.ac.uk>