skimImport
To use skimImport, you will need:
- A local BBRORA (skimData) database which has been updated from SLAC (or another mirror site) and has the files to import selected with import_status=1 (or similar). See the BBRORA/MySQL documentation for how to do this.
- The SkimTools package, which includes the skimImport Perl script and associated Perl modules (libraries). You can use the same SkimTools setup you used for running skimSqlMirror, skimSqlSelect, etc.
- A file transfer tool. Currently supported "ftp" tools are:
  - scp, which is part of the standard ssh distribution and should therefore be available "everywhere";
  - bbftp, which is specially optimised for bulk data transfer over wide-area networks; and
  - sfcp, which is also optimised for bulk data transfer over wide-area networks.
Support for other tools (e.g. bbcp, GRID-ftp, rsync, or Unix ftp) may be added in future.
See skimImport --help for a summary of all the skimImport options.
By default, skimImport scans the database for files with import_status LIKE '1%' (i.e. any value beginning with 1) and checks whether these files already exist on disk. If a file is already there, its import_status can be set to 0 immediately. The remaining files are then copied and, once on disk, flagged as such with import_status=0.
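The default scan can be sketched in Python as below. This is an illustration under stated assumptions, not skimImport's own Perl code: sqlite3 stands in for MySQL, the table name files and its columns path and import_status are hypothetical, and the paths are assumed to be already translated to local file names.

```python
import os
import sqlite3


def select_files_to_copy(conn):
    """Return the paths that still need copying.

    Files with import_status LIKE '1%' that already exist on disk are
    flagged as imported (import_status = '0') straight away.
    Assumes a hypothetical table files(path TEXT, import_status TEXT).
    """
    rows = conn.execute(
        "SELECT path FROM files WHERE import_status LIKE '1%' ORDER BY path"
    ).fetchall()
    to_copy = []
    for (path,) in rows:
        if os.path.exists(path):
            # Already on disk: no transfer needed, mark it as imported.
            conn.execute(
                "UPDATE files SET import_status = '0' WHERE path = ?", (path,)
            )
        else:
            to_copy.append(path)
    conn.commit()
    return to_copy
```

Because MySQL also supports LIKE and parameterised UPDATE statements, the same queries would carry over to a real MySQL driver, though the actual skimData schema may differ.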
The two parameters give the source (remote) and destination (local) root directories. Files listed in the database that do not lie below the destination directory are not copied. Since the database lists file paths as $BFROOT/kanga/EventStore/groups/..., the local Kanga files must also be under this directory tree (though with the local definition of $BFROOT). This restriction could be lifted.
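One way to picture this restriction is the following Python sketch. It maps a database path to a (remote source, local destination) pair and returns None for files outside the destination tree. The function name plan_copy and the assumption that a file's path relative to the destination root is reused under the source root are illustrative. They are not taken from skimImport itself.

```python
import posixpath


def plan_copy(db_path, local_bfroot, src_root, dest_root):
    """Map a database path to (remote source, local destination), or None.

    db_path starts with the literal text "$BFROOT", as listed in the
    database; local_bfroot is the local value of $BFROOT. Files whose
    local path is not below dest_root are skipped (None).
    """
    prefix = "$BFROOT"
    if not db_path.startswith(prefix + "/"):
        return None
    local = posixpath.normpath(local_bfroot + db_path[len(prefix):])
    root = posixpath.normpath(dest_root)
    if not local.startswith(root + "/"):
        return None  # outside the destination tree: not copied
    relative = local[len(root) + 1:]
    # Assumption: the same relative path exists below the source root.
    return posixpath.join(src_root, relative), local
```

For example, with local $BFROOT set to /data/bfroot, the database entry $BFROOT/kanga/EventStore/groups/Skim/x.root maps to /data/bfroot/kanga/EventStore/groups/Skim/x.root. A path such as $BFROOT/other/y.root is skipped.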
The source can include a user and remote host specification (à la scp: user@host:dir) giving the account and server that the files will be copied from. By default, the source is tersk.slac.stanford.edu:/afs/slac.stanford.edu/g/babar/kanga/EventStore/groups and the destination is $BFROOT/kanga/EventStore/groups (using the local $BFROOT definition).
The order in which files are copied is set by the --sort parameter. By default, files are ordered by priority, then by job id (so they are copied in roughly the same order they were created), then by stream id. The priority is given by the second character of the import_status value: 1x, where 1 indicates that the file should be imported and the ordinal value of x indicates the priority. Any priority letters can be used, but conventionally we use:
- 1A: Express priority
- 1B: High priority
- 1C: Normal priority
- 1D: Low priority
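The default ordering amounts to a three-part sort key. Here is a minimal Python sketch, assuming a hypothetical Candidate record whose field names are illustrative, not the real database columns. Comparing the priority letters by their character codes puts 1A (express) ahead of 1D (low).

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    # Hypothetical record; field names are illustrative only.
    path: str
    import_status: str  # e.g. "1A", "1C"
    job_id: int
    stream_id: int


def default_sort_key(c):
    # Second character of import_status is the priority letter. Single
    # characters compare by ordinal value, so "A" sorts before "D".
    # A bare "1" gives "" and sorts first: this sketch's choice only.
    priority = c.import_status[1:2]
    return (priority, c.job_id, c.stream_id)


files = [
    Candidate("c", "1C", 5, 1),
    Candidate("a", "1A", 9, 2),
    Candidate("b", "1C", 2, 7),
    Candidate("d", "1C", 2, 3),
]
print([c.path for c in sorted(files, key=default_sort_key)])
```

Here the express file a is copied first despite its later job id. The normal-priority files then follow in job-id order, with stream id breaking the tie between d and b.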
By default, skimImport uses scp for the file transfer. This requires no special setup, so it is a good way to start; bbftp can then be used to improve data throughput. sfcp offers similar functionality to bbftp (and is perhaps, currently, a little simpler to set up) but has problems (notably, it does not seem to benefit from TCP/IP window size optimisation).
While the file transfer is in progress, the incomplete file is named something like .StreamNKanga-micro.root.skimImport.$$ (where $$ is the process id), with file permissions that allow only owner access, so it will not be mistaken for readable data (this file will usually be removed if the transfer fails). Once the transfer (or --multiple group of transfers) is complete, the file is renamed to its proper name (e.g. StreamNKanga-micro.root) and the default file permissions are restored.
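The "private temporary file, then rename" pattern described above can be sketched in Python as follows. The names temp_name and install are hypothetical, and write_data is a stand-in callback for whatever transfer tool fills the file. This illustrates the technique; it is not skimImport's own code.

```python
import os


def temp_name(final_path):
    """Hidden, process-specific name used while a transfer is in progress."""
    directory, base = os.path.split(final_path)
    return os.path.join(directory, ".%s.skimImport.%d" % (base, os.getpid()))


def install(final_path, write_data):
    """Fill a private temporary file via write_data, then rename it into place.

    write_data is a hypothetical callback standing in for the transfer tool;
    it receives the temporary path to write.
    """
    tmp = temp_name(final_path)
    # Create with owner-only permissions so it is not mistaken for data.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.close(fd)
    try:
        write_data(tmp)
    except BaseException:
        # Remove the partial file if the transfer fails.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    # Restore default permissions: 0o666 minus the current umask.
    mask = os.umask(0)
    os.umask(mask)
    os.chmod(tmp, 0o666 & ~mask)
    # On POSIX, a rename within one filesystem is atomic.
    os.rename(tmp, final_path)
```

Because the final name only appears after the rename, a reader scanning the directory never sees a half-written StreamNKanga-micro.root.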
If the transferred file size does not match the expected size in the database,
an error is displayed. Otherwise,
unless --noupdate-sql is specified, a connection is made
to MySQL and the import_status is set to 0.
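The size check and database update can be sketched as a single Python function. As before, this is a sketch under assumptions: sqlite3 stands in for MySQL, the files(path, import_status) table and the name finish_file are hypothetical, and update_sql=False plays the role of --noupdate-sql.

```python
import os
import sqlite3
import sys


def finish_file(conn, path, expected_size, update_sql=True):
    """Verify the copied size; on success, mark the file as imported.

    Returns True if the size matched the expected size from the database.
    Assumes a hypothetical table files(path TEXT, import_status TEXT).
    """
    actual = os.path.getsize(path)
    if actual != expected_size:
        print("ERROR: %s is %d bytes, expected %d"
              % (path, actual, expected_size), file=sys.stderr)
        return False  # import_status is left unchanged
    if update_sql:
        conn.execute(
            "UPDATE files SET import_status = '0' WHERE path = ?", (path,)
        )
        conn.commit()
    return True
```

A size mismatch leaves import_status untouched, so the file is still selected (LIKE '1%') on the next run.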
For each file, skimImport displays the
file to be copied before starting, and shows transfer statistics when done
(you can tell it to shut up with the --quiet option).
Each line includes a timestamp (unless --notimestamps is
specified).
More detail about its operation is available with the -v (or --verbose) option. This is probably a good thing to try when you first use skimImport (use, e.g., --maxcopy 5 to copy just a few files with all this detail, then rerun without -v to get the rest). More -v options give even more detail, but that's probably going too far.
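The way these output options interact can be sketched with Python's argparse, using the option names given above (-v/--verbose, --quiet, --notimestamps, --maxcopy). The function names parse_options and make_logger, and the timestamp format, are hypothetical. This shows one plausible design, not how the Perl script actually parses its options.

```python
import argparse
import time


def parse_options(argv):
    p = argparse.ArgumentParser(prog="skimImport-sketch")
    p.add_argument("-v", "--verbose", action="count", default=0)  # repeatable
    p.add_argument("--quiet", action="store_true")
    p.add_argument("--notimestamps", action="store_true")
    p.add_argument("--maxcopy", type=int)
    return p.parse_args(argv)


def make_logger(opts):
    """Return log(msg, level): level 0 is normal per-file output,
    level n appears only with n or more -v options; --quiet hides all."""
    threshold = -1 if opts.quiet else opts.verbose

    def log(msg, level=0):
        if level > threshold:
            return None
        if opts.notimestamps:
            line = msg
        else:
            line = time.strftime("%Y-%m-%d %H:%M:%S ") + msg
        print(line)
        return line

    return log


opts = parse_options(["-v", "--maxcopy", "5"])
log = make_logger(opts)
log("copying StreamNKanga-micro.root")
log("extra detail shown with -v", level=1)
```

Counting repeated -v flags into one integer keeps "more -v, more detail" a simple threshold comparison.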
Most of the transfer tools use ssh authentication. This means that the user will be prompted for a password before copying each file (or each --multiple group of files; --multiple is currently only supported for bbftp), unless a passphraseless identity file is used. See the documentation on using an SSH identity file for more details. The identity file name can be passed to scp (or bbftp or sfcp) using the --identity option.
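For scp, an identity file is given with scp's own -i option. The Python sketch below builds, but does not run, such a command line. The function name scp_command is hypothetical. How skimImport itself assembles the command, and what it passes to bbftp or sfcp, is not described here.

```python
def scp_command(source, dest, identity=None):
    """Build (but do not run) an scp command line for one file."""
    cmd = ["scp"]
    if identity:
        # scp's -i option selects the identity (private key) file.
        cmd += ["-i", identity]
    cmd += [source, dest]
    return cmd


cmd = scp_command(
    "alice@tersk.slac.stanford.edu:/afs/.../StreamNKanga-micro.root",
    "/data/.StreamNKanga-micro.root.skimImport.1234",
    identity="/home/alice/.ssh/id_skim",  # hypothetical key file
)
print(cmd)
```

Passing the command as a list (for example to subprocess.run(cmd, check=True)) avoids shell quoting problems with unusual file names. With a passphraseless key, no password prompt interrupts each copy.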
Note that skimImport needs to perform two types of network authentication. Firstly, it needs to access (and later update) the MySQL database. This is specified with the --host, --user, and --pwdfile options, much like the skimSqlMirror and skimSqlSelect commands.
Secondly, skimImport needs host and authentication information for the server from which the Kanga files will be transferred. This is specified by the --node, --remote-user, and --identity (or special bbftp authentication) options. Following the scp syntax, --node user@node or user@node:dir can be used instead, but the separate options are useful for changing the user and/or node without changing the default directory root.
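The following Python sketch shows how scp-style user@host:dir parts and --node/--remote-user style overrides could be combined. It keeps the default directory root unless a new one is given. The function names are hypothetical, the sketch handles only remote sources, and the precedence it applies (source parameter, then node, then remote user) is an assumption, not documented skimImport behaviour.

```python
DEFAULT_SOURCE = (
    "tersk.slac.stanford.edu:/afs/slac.stanford.edu/g/babar/kanga/EventStore/groups"
)


def split_spec(spec):
    """Split "[user@]host[:dir]" into (user, host, dir); missing parts are None."""
    head, sep, directory = spec.partition(":")
    user, at, host = head.rpartition("@")
    return (user if at else None), (host or None), (directory if sep else None)


def resolve_source(source=DEFAULT_SOURCE, node=None, remote_user=None):
    """Apply node / remote-user overrides to a remote source spec.

    Assumed precedence: source, then node, then remote_user.
    """
    user, host, directory = split_spec(source)
    if node is not None:
        n_user, n_host, n_dir = split_spec(node)
        user = n_user or user
        host = n_host or host
        directory = n_dir or directory  # keep the default root if none given
    if remote_user is not None:
        user = remote_user
    prefix = user + "@" if user else ""
    return "%s%s:%s" % (prefix, host, directory)


print(resolve_source(remote_user="carol"))
print(resolve_source(node="bob@mirror.example.org"))
```

Changing only the user or node leaves the long default directory root untouched. That is the convenience the separate options are meant to provide.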