Diagnose query requests

LSF provides mbatchd system query monitoring mechanisms to help administrators and support personnel diagnose problems with clusters. This is useful when query requests generate a heavy load on the system, slowing down LSF and preventing it from responding to requests. Possible causes of performance degradation from query requests include:

  • High network load caused by repeated query requests. For example, a script run by a user or administrator that runs the bqueues command frequently from one host.

  • Large query results sent from the master host that use up network bandwidth (for example, running bjobs -a -u all in a large cluster).

  • A large number of TCP requests generated by a single host.

This feature enables mbatchd to write query source information to a log file. The log file shows who issued each request, where the request came from, and the data size of the query, so that you can trace a problem back to its source.

There are two ways to enable this feature:

  • Statically, by setting both the ENABLE_DIAGNOSE and DIAGNOSE_LOGDIR parameters in lsb.params.

  • Dynamically, with the badmin diagnose -c query command.

The dynamic method overrides the static settings. However, if you restart or reconfigure mbatchd, it switches back to the static diagnosis settings.
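As a minimal sketch of the two methods, the fragment below shows what the static settings and the dynamic command might look like. The value query for ENABLE_DIAGNOSE is an assumption chosen to match the -c query argument of the dynamic command, and the directory /shared/lsf/diagnose is a hypothetical path; check the lsb.params reference for your LSF version before using either.

```text
# Static method: lsb.params, inside the Parameters section.
# "query" is an assumed value; /shared/lsf/diagnose is a hypothetical directory.
Begin Parameters
ENABLE_DIAGNOSE = query
DIAGNOSE_LOGDIR = /shared/lsf/diagnose
End Parameters

# Dynamic method: run from the command line.
# This overrides the static settings until mbatchd is restarted or reconfigured.
badmin diagnose -c query
```

Changes to lsb.params normally take effect only after mbatchd is reconfigured, so the static method suits continuous monitoring, while the dynamic command is the better fit for a short investigation on a running cluster.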