Stanford Linear Accelerator Center

  Ganglia Monitoring at SLAC

Ganglia Domains:
BaBar
Batch
Public-Login
KIPAC
Fileservers
MonSystems
PetaCache
GLAST
LSST
ATLAS
SIMES

Updated: 02 Dec, 2005

Introduction

Ganglia is an open-source package used for monitoring large UNIX clusters in real time. Each node in a Ganglia system runs a daemon that reports on the state of its host in the form of performance metrics, including memory, CPU, load, disk and network statistics. In addition to these built-in metrics, custom metrics have been created specifically for SLAC applications, including the BaBar fileservers. Collectors gather the data produced by the daemons and store it in round-robin databases. The information is typically presented in the form of plots via a webserver. The current state of the data can also be obtained in XML format, suitable for export to external sites for Grid applications or for conversion to a traditional database format such as MySQL.
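
As a rough illustration of the XML export described above, the following Python sketch parses a small hand-written sample in the general shape of a Ganglia XML dump (CLUSTER, HOST and METRIC elements) and loads it into an in-memory SQLite table, which stands in here for a traditional database such as MySQL. The cluster name, host name, metric values and table layout are assumptions made for illustration; they do not describe the SLAC configuration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hand-written sample; the cluster, host and values are hypothetical.
SAMPLE_XML = """\
<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
  <CLUSTER NAME="example-cluster" LOCALTIME="1133481600">
    <HOST NAME="node01.example.org" IP="192.0.2.1" REPORTED="1133481590">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="mem_free" VAL="512000" TYPE="uint32" UNITS="KB"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""


def xml_to_rows(xml_text):
    """Flatten the XML into (cluster, host, metric, value, units) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            for metric in host.iter("METRIC"):
                rows.append((
                    cluster.get("NAME"),
                    host.get("NAME"),
                    metric.get("NAME"),
                    metric.get("VAL"),
                    metric.get("UNITS"),
                ))
    return rows


def load_rows(rows):
    """Store the rows in an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metrics "
        "(cluster TEXT, host TEXT, name TEXT, value TEXT, units TEXT)"
    )
    conn.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


rows = xml_to_rows(SAMPLE_XML)
conn = load_rows(rows)
```

In a live setup the XML text would be read from a running Ganglia daemon rather than from a string; values are kept as text here because each metric carries its own TYPE attribute.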

Why should I choose Ganglia?

A big advantage of Ganglia is that it is very easy to install and (re)configure. Introducing new hosts and metrics is painless, and the architecture supports a scalable hierarchy of data collectors. It is designed to be unobtrusive and uses only a tiny fraction of each host's resources. It is essentially a “fly on the wall”.

What does Ganglia not provide?

Ganglia does not attempt to address service monitoring or reporting (unlike Nagios). So far, we have not come across a single monitoring solution that addresses all of our needs effectively.

I want Ganglia monitoring information for host/cluster X

Adding new machines and clusters is easy. The installation process is automated and controlled via a central set of scripts. SLAC Ganglia web pages are now partitioned into distinct “Domains” for simple navigation and visibility control. To request monitoring for a host or cluster, please contact Yemi Adesanya (yemi@slac.stanford.edu, x.2863).

Ganglia Domains at SLAC

KIPAC

We are working closely with the Kavli Institute to install and administer KIPAC computing resources. This includes Mac OS X and SGI Altix servers.

BaBar

The first SLAC Ganglia monitoring system was created for BaBar in 2004. It currently consists of more than 100 monitored nodes, including xrootd production fileservers. Several custom monitoring daemons were developed to provide detailed disk and tape statistics. The web pages are public and can be found here.
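
Custom metrics of this kind are commonly injected into Ganglia with its gmetric command-line tool. The Python sketch below only builds the argument list for such a call; the metric name, value and units are hypothetical, and this is not the code used by the BaBar monitoring daemons.

```python
def gmetric_command(name, value, metric_type="float", units=""):
    """Build an argument list for Ganglia's gmetric tool (not executed here)."""
    cmd = [
        "gmetric",
        "--name", name,
        "--value", str(value),
        "--type", metric_type,
    ]
    # Units are optional, so only add the flag when a unit string is given.
    if units:
        cmd += ["--units", units]
    return cmd


# Hypothetical disk statistic; a real daemon would measure this value itself.
cmd = gmetric_command("disk_read_rate", 12.5, units="MB/s")
```

On a host where gmetric is installed and a Ganglia daemon is running, the list could be passed to subprocess.run(cmd, check=True) at a regular interval.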

SCCS Batch machines

Plans are underway to extend Ganglia monitoring to cover a broader range of SLAC computing clusters. Stay tuned for more information. In the future, this page will be a jumping-off point for access to additional Ganglia domains.

SCCS Fileservers

The AFS fileservers are now part of the Ganglia system. You can keep track of IO activity on the "/vice*" disk partitions.

MonSystems

Ganglia is used to track the performance of its own central servers and of other hosts dedicated to monitoring activities, including Nagios and the Network group's traffic monitoring.

PetaCache

Prototype large-memory data servers are here.

GLAST

Servers and nodes dedicated to the Gamma Ray Large Area Space Telescope project.

LSST

Servers and nodes dedicated to the Large Synoptic Survey Telescope project.

ATLAS

Servers and nodes dedicated to SLAC's ATLAS effort.

Sharing Ganglia data with other sites

SLAC is committed to the Open Science Grid (OSG), and SCCS has devoted time and resources to understanding how we can all benefit from Grid-related infrastructures and tools. Ganglia has already been accepted by other sites supporting the OSG, and we expect to export Ganglia data too.


Yemi Adesanya