- Monitoring data is not just for end-to-end performance analysis
- Lots of Grid “middleware services” need monitoring data too:
  - Grid Schedulers
    - find the best match of CPUs and data sets for a given job
  - Grid Replica Selection
    - find the “best” copy of a data set to use
  - Reliable File Copy Service
    - detect failures and recover
  - Network-aware Applications
    - TCP buffer size tuning, number of parallel streams, etc.

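As one illustration of how a network-aware application might use monitoring data, the sketch below sizes a TCP buffer from the bandwidth-delay product (BDP) and estimates how many parallel streams are needed when the per-socket buffer is capped. The function names, the 4 MiB cap, and the sample measurements are assumptions chosen for illustration; they do not come from the slides or from any particular Grid tool.

```python
def tcp_buffer_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: (bits/s * seconds) / 8."""
    # rtt_ms / 1000 converts to seconds and / 8 converts bits to bytes,
    # so both are folded into a single integer division by 8000.
    return bandwidth_bps * rtt_ms // 8000


def parallel_streams(bdp_bytes: int, max_buffer_bytes: int) -> int:
    """Streams needed to cover the BDP when each socket buffer is capped."""
    # Integer ceiling division; always use at least one stream.
    return max(1, -(-bdp_bytes // max_buffer_bytes))


# Hypothetical measurements that a network sensor might report.
measured_bandwidth_bps = 1_000_000_000  # 1 Gbit/s path
measured_rtt_ms = 50                    # 50 ms round-trip time

bdp = tcp_buffer_bytes(measured_bandwidth_bps, measured_rtt_ms)
streams = parallel_streams(bdp, max_buffer_bytes=4 * 1024 * 1024)
print(bdp, streams)
```

With these sample values the BDP is 6,250,000 bytes, which does not fit in a single 4 MiB buffer, so the sketch suggests two streams. A real application would take the bandwidth and round-trip time from live sensor data rather than constants.
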
- Many of these components already exist or are in progress:
  - instrumentation tools
    - Pablo (UIUC), NetLogger (LBNL), Magnet (LANL), ARM (Open Group), log4j (Apache), Web100, etc.
  - host and network sensors
    - too many to list
  - sensor management tools
    - Ganglia, Nagios, NetLogger
  - event publication service
    - CIM (DMTF), MDS (Globus), NWS (UCSB), R-GMA (RAL), CODE (NASA), pyGMA (LBNL)
  - event archive service
    - netarchd (LBNL), NWS (UCSB)
  - event analysis and visualization tools
    - lots, but most only work for specific types of events:
      - NetLogger nlv (LBNL), Probe (Stazi), Autopilot (UIUC), etc.
- BUT, all use different event formats and protocols!
  - no interoperability
- Many of these tools are still in the “early prototype” stage
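
To make the interoperability problem concrete, the sketch below converts two event formats into one common record so that a single analysis tool could consume both. Both input formats (a `key=value` line and a comma-separated line), the `Event` fields, and the parser names are hypothetical; they are not the actual formats of any tool listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical common event record shared by all producers."""
    timestamp: float
    host: str
    name: str
    value: float


def from_keyvalue(line: str) -> Event:
    """Parse a hypothetical 'ts=... host=... event=... value=...' line."""
    fields = dict(pair.split("=", 1) for pair in line.split())
    return Event(
        timestamp=float(fields["ts"]),
        host=fields["host"],
        name=fields["event"],
        value=float(fields["value"]),
    )


def from_csv(line: str) -> Event:
    """Parse a hypothetical 'host, event, ts, value' line."""
    host, name, ts, value = [part.strip() for part in line.split(",")]
    return Event(timestamp=float(ts), host=host, name=name, value=float(value))


# The same measurement as two different tools might report it.
a = from_keyvalue("ts=1000.5 host=a.example.org event=cpu.load value=0.75")
b = from_csv("a.example.org, cpu.load, 1000.5, 0.75")
print(a == b)
```

Once both producers are mapped onto the same record, archive and visualization tools only need to understand one format. Without an agreed format, every consumer needs a separate adapter like these for every producer.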