SLAC ESD Software Engineering Group
Stanford Linear Accelerator Center

Test Facilities Support Documentation

 
Subject: Computing Infrastructure Support for Test Facilities

Date: 11/11/2015

To: Test Facilities Management

From: Jingchen Zhou and Debbie Rogind

 

Listed below is the minimum ongoing sustaining-engineering and maintenance effort required to provide centralized computing infrastructure support for Test Facilities programs (as described in http://www.slac.stanford.edu/grp/cd/soft/unix/slaconly/testfac.html).

The computing infrastructure is fundamental and critical to Test Facilities operations. At any given time, it may be in use by facility users (experimenters), physicists, operators, developers (hardware and software), maintenance crews, safety crews, and others. It must be kept running even during program downtime so that other work (data analysis, accelerator maintenance, development, etc.) can proceed without interruption. Hardware, software packages, firmware, and operating systems must be kept up to date to ensure that the computing infrastructure remains maintainable and sustainable. The controls system is highly integrated, so an upgrade to one component (e.g., upgrading a software package to meet the DOE Cyber Security requirement) often requires interrelated components to be upgraded as well for compatibility. This effort is required to ensure a minimum level of integrity of the controls system and data for Test Facilities, and to comply with DOE requirements. At minimum, 20% of one FTE from the system group is required for this ongoing effort. The effort does not include high-availability computing services or 24/7 support such as that provided for LCLS.

 

 

  • MCC network management for Test Facilities computing
    • IP management
    • Network monitoring
    • Troubleshooting
    • Configuration update
    • Firmware upgrade
  • UPS management
    • UPS monitoring
    • Configuration update
    • Firmware upgrade
    • Routine graceful shutdown test
    • Battery check and replacement
  • Local console support
    • Configuration update
    • KVM Firmware upgrade
  • Remote console support
    • DRAC firmware upgrade
    • GUI check and plugin upgrade
    • Browser upgrade
  • Controls server management
    • Disk mirroring and manual failover test
    • RAID management
    • Open Management upgrade
    • Hardware upgrade
    • Server firmware upgrade and test
    • Kernel upgrade and test
    • OS and Security patching and testing
    • System hardware monitoring
    • System OS monitoring
  • Linux-based camera server management in the field
  • Printing management (CUPS, print queues, etc.)
  • Infrastructure support for controls applications: softIOCs, ChannelWatcher, PV gateways, application filesystem protection (ACLs)
  • Data management in NFS
  • Local backup and restore for controls application servers listed in the user guide
  • Computing infrastructure monitoring and analysis
    • System error logging (syslogd)
    • Unix Watchdog
    • Cron jobs
  • Controls system software (libraries and packages) management
    • Software build
    • Software upgrade
    • Software security management
  • Matlab license management
  • Controls production environment management
  • Elogs management and support
  • Archiver system management
  • Controls cyber security management
    • Security monitoring
    • Vulnerability fixing
    • Security report and review 
  • Resolve issues with SCCS on the AFS, NFS, DHCP, DNS, NIS, and Taylor services that the Test Facilities servers depend on
  • Facility downtime coordination
  • User support (troubleshooting, accounts, development, system automation, tools, OS, etc.)
    • facility users
    • developers
    • physicists
    • maintenance crews
    • etc.
  • Resolve operations-related issues
  • Troubleshooting
    • System hardware
    • Firmware
    • Linux
    • Networking
    • Controls infrastructure applications
    • Controls applications

 

 

Reference:

Computing Infrastructure for Test Facilities:  http://www.slac.stanford.edu/grp/cd/soft/unix/slaconly/testfac.html

 

 



Author: K. Brobeck

Modified: 12-Nov-2015