Stanford Linear Accelerator Center

  Ganglia Monitoring at SLAC

Ganglia Domains:
BaBar
Batch
Public-Login
KIPAC
Fileservers
MonSystems
GLAST
LSST
ATLAS
SIMES
SUNCAT

Updated: 12 Aug, 2010

Introduction

Ganglia is an open-source package for monitoring large UNIX clusters in real time. Each node in a Ganglia system runs a daemon that reports the state of its host as a set of performance metrics, including memory, CPU, load, disk and network statistics. In addition to these built-in metrics, custom metrics have been created specifically for SLAC applications, including the BaBar fileservers. Collectors gather the data produced by the daemons and store it in round-robin databases. The information is typically presented as plots via a web server. The current state of the data can also be obtained in XML format, suitable for export to external sites for Grid applications or for conversion to a traditional database format such as MySQL.
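As a rough illustration of the XML-to-database path mentioned above, the sketch below parses a small Ganglia-style XML document and loads the metrics into a table. It is a minimal sketch under stated assumptions, not a description of how SLAC does the conversion: the sample XML is hand-written in the CLUSTER/HOST/METRIC shape that Ganglia emits, the cluster and host names and all values are made up, and SQLite is swapped in for MySQL so the example runs with only the Python standard library. The function names parse_metrics and store_metrics are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Illustrative sample only: a trimmed-down document in the shape Ganglia emits.
# Cluster names, host names and values are made up.
SAMPLE_XML = """<GANGLIA_XML SOURCE="gmond">
  <CLUSTER NAME="example-cluster">
    <HOST NAME="node01.example.org" IP="192.0.2.1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="mem_free" VAL="1048576" TYPE="uint32" UNITS="KB"/>
    </HOST>
    <HOST NAME="node02.example.org" IP="192.0.2.2">
      <METRIC NAME="load_one" VAL="1.37" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def parse_metrics(xml_text):
    """Flatten Ganglia XML into (cluster, host, metric, value, units) rows."""
    root = ET.fromstring(xml_text)
    rows = []
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            for metric in host.iter("METRIC"):
                rows.append((
                    cluster.get("NAME"),
                    host.get("NAME"),
                    metric.get("NAME"),
                    metric.get("VAL"),        # kept as text; TYPE says how to cast
                    metric.get("UNITS", ""),
                ))
    return rows


def store_metrics(conn, rows):
    """Insert the flattened rows into a simple 'metrics' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics "
        "(cluster TEXT, host TEXT, metric TEXT, value TEXT, units TEXT)"
    )
    conn.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    store_metrics(conn, parse_metrics(SAMPLE_XML))
    for row in conn.execute("SELECT host, metric, value FROM metrics"):
        print(row)
```

Values are stored as text because each metric carries its own TYPE attribute; a real loader would likely cast according to that type and add a timestamp column.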

I want Ganglia monitoring information for host/cluster X

Adding new machines and clusters is easy. The installation process is automated and controlled via a central set of scripts. SLAC Ganglia web pages are now partitioned into distinct “Domains” for simple navigation and visibility control. To request monitoring for a host or cluster, please contact Yemi Adesanya (yemi@slac.stanford.edu, x.2863).

Ganglia Domains at SLAC

KIPAC

We are working closely with the Kavli Institute to install and administer KIPAC computing resources.

BaBar

The first SLAC Ganglia monitoring system was created for BaBar in 2004. It currently consists of more than 100 monitored nodes, including xrootd production fileservers. Several custom monitoring daemons were developed to provide detailed disk and tape statistics. The web pages are public and can be found here.
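This page does not say how the BaBar custom daemons publish their statistics. One common way to feed a custom metric into Ganglia is the gmetric command-line tool that ships with it, so the sketch below shows how a small daemon might build such a call. The helper name gmetric_command and the metric name and value are hypothetical, chosen only for illustration.

```python
import shlex


def gmetric_command(name, value, metric_type="float", units=""):
    """Build the argument list for one gmetric call publishing a custom metric."""
    cmd = ["gmetric", "--name", name, "--value", str(value), "--type", metric_type]
    if units:
        cmd += ["--units", units]
    return cmd


if __name__ == "__main__":
    # Hypothetical metric name and value, for illustration only.
    cmd = gmetric_command("disk_read_mb_per_s", 12.5, "float", "MB/s")
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To actually publish, on a host where Ganglia is installed and configured:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when a metric name or unit contains spaces or special characters.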

SCCS Batch machines

Plans are underway to extend Ganglia monitoring to cover a broader range of SLAC computing clusters. Stay tuned for more information. In the future, this page will be a jumping-off point for access to additional Ganglia domains.

SCCS Fileservers

The AFS fileservers are now part of the Ganglia system. You can keep track of IO activity on the "/vice*" disk partitions.

MonSystems

Ganglia is used to track the performance of its own central servers and of other hosts dedicated to monitoring activities, including Nagios and the Network group's traffic monitoring.

GLAST

Servers and nodes dedicated to the Gamma-ray Large Area Space Telescope project.

LSST

Servers and nodes dedicated to the Large Synoptic Survey Telescope project.

ATLAS

Servers and nodes dedicated to SLAC's ATLAS effort.

Sharing Ganglia data with other sites

SLAC is committed to the Open Science Grid (OSG), and SCCS has devoted time and resources to understanding how we can all benefit from Grid-related infrastructure and tools. Ganglia has already been accepted by other sites supporting the OSG, and we expect to export Ganglia data too.


Yemi Adesanya