Introduction
Ganglia is an open-source package used for monitoring large UNIX
clusters in real time. Each node in a Ganglia system runs a daemon that
reports on the state of its host in the form of performance metrics
including memory, CPU, load, disk and network statistics. In addition to
these built-in metrics, custom metrics have been created specifically
for SLAC applications including the BaBar fileservers. Collectors gather
the data produced by the daemons and store it in round-robin databases.
The information is typically presented in the form of plots via a
web server.
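As a rough illustration of what "round-robin" storage means, the sketch below keeps a fixed number of samples and overwrites the oldest one when the store is full. It is a simplified model only, not the on-disk format Ganglia uses; the class name RoundRobinSeries and the slot count are hypothetical.

```python
from collections import deque


class RoundRobinSeries:
    """Fixed-size sample store (hypothetical helper, for illustration only)."""

    def __init__(self, slots):
        # A deque with maxlen drops the oldest entry once it is full.
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        # Record one (timestamp, value) sample reported by a daemon.
        self.samples.append((timestamp, value))

    def fetch(self):
        # Return the retained samples, oldest first.
        return list(self.samples)


series = RoundRobinSeries(slots=3)
for t, load in enumerate([0.5, 0.7, 0.9, 1.1]):
    series.update(t, load)

print(series.fetch())
```

With three slots and four updates, the first sample is discarded, so storage never grows regardless of how long a host is monitored.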
The current state of the data can also be obtained in XML format,
suitable for export to external sites for Grid applications
or conversion to a traditional database format such as MySQL.
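A minimal sketch of such a conversion is shown below. It assumes the usual Ganglia XML layout of CLUSTER, HOST and METRIC elements with NAME, VAL and UNITS attributes; the sample document, host name, table name and the metric_rows helper are made up for illustration, and an in-memory SQLite database stands in for MySQL so the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed Ganglia XML layout (not real SLAC data).
SAMPLE_XML = """<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
  <CLUSTER NAME="example-cluster">
    <HOST NAME="node01.example.org" IP="192.0.2.1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="mem_free" VAL="812344" TYPE="uint32" UNITS="KB"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def metric_rows(xml_text):
    """Flatten the XML into (cluster, host, metric, value, units) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            for metric in host.iter("METRIC"):
                rows.append((
                    cluster.get("NAME"),
                    host.get("NAME"),
                    metric.get("NAME"),
                    metric.get("VAL"),
                    metric.get("UNITS"),
                ))
    return rows


rows = metric_rows(SAMPLE_XML)

# SQLite is used here only as a stand-in for a relational database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE metrics "
    "(cluster TEXT, host TEXT, name TEXT, value TEXT, units TEXT)"
)
db.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?, ?)", rows)
db.commit()

count = db.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
print(count, "metrics stored")
```

Values are kept as text because the XML carries a separate TYPE attribute per metric; a real loader would probably convert them according to that type.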
I want Ganglia monitoring information for host/cluster X
Adding new machines and clusters is easy. The installation process is
automated and controlled via a central set of scripts. SLAC Ganglia web
pages are now partitioned into distinct “Domains” for simple navigation
and visibility control. Please contact Yemi Adesanya
(yemi@slac.stanford.edu, x.2863).
Ganglia Domains at SLAC
We are working closely with the Kavli Institute to install and
administer KIPAC computing resources.
The first SLAC Ganglia monitoring system was created for BaBar in 2004.
It currently consists of more than 100 monitored nodes that include
xrootd production fileservers. Several custom monitoring daemons were
developed to provide detailed disk and tape statistics. The web pages
are public and can be found here.
Plans are underway to extend Ganglia monitoring to cover a broader
range of SLAC computing clusters. Stay tuned for more information. This
page will be a jumping-off point for access to additional Ganglia
domains in the future.
The AFS fileservers are now part of the Ganglia system. You can keep
track of IO activity on the "/vice*" disk partitions.
Ganglia is used to track the performance of its own central servers
and other hosts dedicated to monitoring activities including Nagios and
the Network group's traffic monitoring.
Servers and nodes dedicated to the Gamma Ray Large Area Space Telescope
project.
Servers and nodes dedicated to the Large Synoptic Survey Telescope
project.
Servers and nodes dedicated to SLAC's ATLAS effort.
Sharing Ganglia data with other sites
SLAC is committed to the Open Science Grid (OSG), and SCCS has devoted
time and resources to understanding how we can all benefit from
Grid-related infrastructures and tools. Ganglia has already been
accepted by other sites supporting the OSG, and we expect to export
Ganglia data too.
Yemi Adesanya