Power parameters in lsb.params

The power state management parameters in lsb.params enable the power management feature.

Suspend, Resume, Reset

To enable power state management, lsb.params must define at least one POWER_SUSPEND_CMD and POWER_RESUME_CMD pair. Each configured command must be specified with its full path so that it can be executed. For example:

  • POWER_SUSPEND_CMD = $LSF_SERVERDIR/../../util/eass3/rpower_suspend.sh
  • POWER_RESUME_CMD = $LSF_SERVERDIR/../../util/eass3/rpower_resume.sh
  • POWER_RESET_CMD = $LSF_SERVERDIR/../../util/eass3/rpower_reset.sh

The power parameters support the following power actions:

  • Suspend (POWER_SUSPEND_CMD) puts the host in an energy-saving state. It defines the suspend operation command that is called when LSF handles a host suspend power request. LSF invokes the command in the format:

    command host [host …]

    The command should treat all of its arguments as a host list. The command must return 0 if the power control action succeeds and 1 if the power control action fails. Each line of the output contains a host name and the return value for that host. For example:

    host1 0
    host2 1

    A host can be suspended manually or by the power policy. A pending job can resume a suspended host only if it was suspended by the power policy. If the host was suspended manually (badmin hpower suspend), the job cannot put the host back into working state (power resume).

  • Resume (POWER_RESUME_CMD) puts the host in working state. It defines the resume operation command that is called when LSF handles a host resume power request. It should be the opposite operation of POWER_SUSPEND_CMD.
  • Reset (POWER_RESET_CMD) resets the host. A reset is issued to the host if it fails to join the cluster within a specified time after the resume command is issued (either by a manual resume command or by a resume triggered by a pending job). The timeout is configured by the parameter POWER_SUSPEND_TIMEOUT in lsb.params and the default is 10 minutes.
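
The output format described for the power commands (one line per host, with the host name followed by its return value) can be sketched as a small shell wrapper. This is a minimal sketch under stated assumptions, not the script shipped with LSF: the function names power_suspend and suspend_one_host are hypothetical, the per-host action is a stub that must be replaced with the site's real power control call, and returning 1 when any host fails is an assumption about how the overall return value is derived.

```shell
#!/bin/sh
# Hypothetical sketch of a command suitable for POWER_SUSPEND_CMD.

# Placeholder for the site-specific power control call (assumption).
# For illustration only, host names starting with "bad" are treated as failures.
suspend_one_host() {
    case "$1" in
        bad*) return 1 ;;
        *)    return 0 ;;
    esac
}

# Treat every argument as a host name and print "host return_value" per line.
power_suspend() {
    overall=0
    for host in "$@"; do
        if suspend_one_host "$host"; then
            echo "$host 0"
        else
            echo "$host 1"
            overall=1   # assumption: any failed host makes the command return 1
        fi
    done
    return "$overall"
}

# Run the action only when host names are passed on the command line.
if [ "$#" -gt 0 ]; then
    power_suspend "$@"
fi
```

A resume or reset command could follow the same shape, with the stub replaced by the corresponding power action.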

The power parameters are applied cluster-wide, to all configured power policies and manual power operations performed by the administrator. Both POWER_SUSPEND_CMD and POWER_RESUME_CMD must be specified.
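
Because both POWER_SUSPEND_CMD and POWER_RESUME_CMD must be present, a quick check of the configuration file can catch a missing entry before the cluster is reconfigured. The following is a minimal sketch: check_power_params is a hypothetical helper rather than an LSF tool, and it only looks for uncommented lines that start with the parameter name followed by an equals sign, which is an assumption about how the file is laid out.

```shell
#!/bin/sh
# Hypothetical helper: report which required power parameters are missing
# from the lsb.params file given as the first argument.
check_power_params() {
    params_file=$1
    missing=0
    for name in POWER_SUSPEND_CMD POWER_RESUME_CMD; do
        # Match "NAME =" or "NAME=" at the start of a line; lines that
        # begin with "#" do not match and are therefore ignored.
        if ! grep -Eq "^[[:space:]]*${name}[[:space:]]*=" "$params_file"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

The function returns 0 when both parameters are found and 1 otherwise, so it can be used in a condition such as: if check_power_params /path/to/lsb.params; then ... fi (the path here is a placeholder).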

A host can enter a power saving (suspend) state only when it is idle (that is, no jobs are running; NJOBS=0) and in the “ok” state. For example, the following configuration defines all three power commands:

POWER_SUSPEND_CMD = rpower suspend
POWER_RESUME_CMD = rpower onstandby
POWER_RESET_CMD = rpower reset

Configuring event switching

The parameter POWER_STATUS_LOG_MAX in lsb.params sets a trigger value for event switching. The default value is 10000. This value takes effect only if PowerPolicy (in lsb.resources) is enabled.

If the number of finished jobs does not exceed the value of MAX_JOB_NUM, the event switch can also be triggered by POWER_STATUS_LOG_MAX, which works together with MIN_SWITCH_PERIOD.

Configuring a wait time after resume

The parameter POWER_ON_WAIT in lsb.params is used to configure a wait time (in seconds) after a host is resumed and enters ok status, before dispatching a job. This is to allow other services on the host to restart and enter a ready state. The default value is 0 and is applied globally.