This utility was developed as an interim solution to the problem of accommodating a large volume of production data that accumulates at a high rate under several system constraints. The constraints include the lack of a dedicated disk server with a RAID controller, a limit on adding disks to the PEPII gateway machines, and a restriction on logging data directly to the SCS NFS storage area, which arises from the standalone controls network and security concerns. It has been proposed to integrate an NFS system local to the controls network as the proper long-term storage solution. Until that is in place, this utility serves as the alternative. The basic idea is to stage the data in the production storage area and then distribute it to the SCS NFS area.
Take the acoustic sensor data from NLCTA as an example. The data accumulates at a rate of about 100 MB/day. More IOCs will be added, so a higher rate is expected. As shown in the diagram below, the data is sent from the NLCTA IOC and logged into /u1, the PEPII data storage area (see the dashed blue line for this data flow). The data is kept online for 24 hours and is then distributed to the SCS NFS storage area by a procedure named "distnlctadata", which runs daily on opi00gtw04, the NLCTA server. Once the distribution is completed and the data is archived, the data is removed from the staging area by another procedure named "cleanup", which runs on opi00gtw00, the PEPII gateway server machine. The dashed red line in the diagram indicates this flow.
Procedure "distnlctadata" is implemented with an emphasis on data integrity and reliability in data distribution. The basics include:
Procedure "cleanup" is triggered by distnlctadata via ssh once data distribution and archiving are completed and verified, and it executes on opi00gtw00. X forwarding is disabled in the ssh command so that the trigger fires as quickly as possible. Only data files older than one day are removed from the PEPII staging area, which eliminates the possibility of data loss if a fault trip occurs during data distribution and cleanup. As an additional safeguard against removing too much data, the cleanup age threshold is set back by the time that distnlctadata took to run. Backup via the SCS tape facility is not in place yet.
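The age rule described above can be illustrated with a minimal sketch. It is not the actual cleanup script: the function name removable_files, the idea of passing the elapsed time in as an argument, and the sample file names and times are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def removable_files(mtimes, now, elapsed, retention=timedelta(days=1)):
    """Return the names of files old enough to be removed (hypothetical helper).

    mtimes  : dict mapping file name -> datetime of last modification
    now     : datetime at which cleanup runs
    elapsed : timedelta that the distribution run took (assumed to be known)
    """
    # Set the cutoff back by the distribution run time, so that files which
    # only became one day old while distribution was running are kept.
    cutoff = now - retention - elapsed
    return sorted(name for name, mtime in mtimes.items() if mtime < cutoff)


# Illustrative values only.
now = datetime(2002, 3, 29, 4, 30)
elapsed = timedelta(minutes=29)
mtimes = {
    "a.dat": datetime(2002, 3, 27, 23, 0),  # well over one day old
    "b.dat": datetime(2002, 3, 28, 4, 15),  # crossed 24 hours during the run
    "c.dat": datetime(2002, 3, 29, 1, 0),   # still within 24 hours
}
print(removable_files(mtimes, now, elapsed))  # prints ['a.dat']
```

Without the set-back (an elapsed time of zero), b.dat would also be selected even though it was younger than one day when the distribution started.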
All messages are logged to /tmp/distnlctadata.log on opi00gtw04. Any failure disables the cleanup procedure and notifies the user by e-mail.
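The "verify first, clean up only on success" behavior can be sketched as follows. This is a sketch under stated assumptions, not the distnlctadata implementation: the use of SHA-256 checksums for verification, the function names, and the callback arguments trigger_cleanup and notify are all hypothetical. The only parts taken from the text are the host name, the cleanup location in /usr/local/bin, and the fact that X forwarding is disabled (the -x option of ssh).

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path):
    """Checksum of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def distribute(staging, archive, log):
    """Copy each staged file to the archive; return True only if all verify."""
    ok = True
    for src in sorted(Path(staging).iterdir()):
        if not src.is_file():
            continue
        dst = Path(archive) / src.name
        try:
            shutil.copy2(src, dst)
            # Assumed verification step: compare checksums of source and copy.
            if sha256(src) != sha256(dst):
                raise OSError("checksum mismatch")
            log.append(f"OK {src.name}")
        except OSError as err:
            log.append(f"FAILED {src.name}: {err}")
            ok = False
    return ok


def cleanup_command(host="opi00gtw00", program="/usr/local/bin/cleanup"):
    """Build the remote trigger; -x disables X11 forwarding in ssh."""
    return ["ssh", "-x", host, program]


def run(staging, archive, trigger_cleanup, notify):
    """Trigger cleanup only after a fully verified distribution."""
    log = []
    if distribute(staging, archive, log):
        trigger_cleanup()
    else:
        # Any failure leaves the staging area untouched and reports the log.
        notify("\n".join(log))
    return log
```

In real use, trigger_cleanup could run cleanup_command() through subprocess.run, and notify could send the log text by mail. Both are passed in as callables here so that the control flow can be exercised without a network.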
Procedure distnlctadata is in /afs/slac/g/cd/soft/scripts on the AFS system; cleanup is in /usr/local/bin on the PEPII gateway machine. distnlctadata runs daily as a trscron job. The trscron job is defined as
opi00gtw04;60 01 04 * * * (/afs/slac/g/cd/soft/scripts/distnlctadata) > /tmp/distnlctadata.log 2>&1
and is submitted from the shared account cddev. See all trscron job definitions for cddev in /afs/slac/g/cd/soft/scripts/trscrontab.cddev. The AFS token lifetime for this trscron job is extended to 60 minutes, longer than the default (15 minutes), so that the token does not expire before the data distribution is completed. For trscron and the trigger (i.e., remotely invoking cleanup on opi00gtw00 from opi00gtw04 via ssh) to work, "ssh-keygen" must be run on opi00gtw04 to generate a public/private RSA key pair (no passphrase!), and the public key must then be appended to the authorized_keys file on both opi00gtw00 (for triggering) and AFS (for the trscron job).
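The append step can be sketched as below, assuming the key pair has already been generated (for example with ssh-keygen -t rsa -N "" -f on a key file of your choice). The function name authorize_key and any file paths passed to it are hypothetical; the actual locations of the authorized_keys files on opi00gtw00 and AFS are not given here.

```python
from pathlib import Path


def authorize_key(public_key_file, authorized_keys_file):
    """Append a public key to authorized_keys unless it is already listed.

    Returns True if the key was added, False if it was already present.
    """
    key = Path(public_key_file).read_text().strip()
    target = Path(authorized_keys_file)
    text = target.read_text() if target.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    with open(target, "a") as handle:
        # Keep one key per line even if the file lacks a trailing newline.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    # sshd normally ignores key files that other users can write to.
    target.chmod(0o600)
    return True
```

Checking for an existing entry before appending makes the step safe to repeat, which helps when the same key has to be installed in more than one place.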
 Kristi Luchini. "Unix-based Data Storage Requirement"
 Jingchen Zhou. "Controls System UNIX Computing Environment and Data Management Facilities"
Contact: Jingchen Zhou (X4661, jingchen@slac). Last edited on 03/29/02