Contents


Welcome

About This Guide
What's New in Platform LSF Version 6.1
What's New in Platform LSF Version 6.0
Upgrade and Compatibility Notes
Learning About Platform Products
Getting Technical Support

1 About Platform LSF

Cluster Concepts
Job Life Cycle

2 How the System Works

Job Submission
Job Scheduling and Dispatch
Host Selection
Job Execution Environment
Fault Tolerance

Part I: Managing Your Cluster

3 Working with Your Cluster

Viewing Cluster Information
Default Directory Structures
Cluster Administrators
Controlling Daemons
Controlling mbatchd
Reconfiguring Your Cluster

4 Working with Hosts

Host States
Viewing Host Information
Controlling Hosts
Adding a Host
Removing a Host
Adding Hosts Dynamically
Adding Host Types and Host Models to lsf.shared
Registering Service Ports
Host Naming
Hosts with Multiple Addresses
Host Groups
Tuning CPU Factors
Handling Host-level Job Exceptions

5 Working with Queues

Queue States
Viewing Queue Information
Controlling Queues
Adding and Removing Queues
Managing Queues
Handling Job Exceptions

6 Managing Jobs

Job States
Viewing Job Information
Changing Job Order Within Queues
Switching Jobs from One Queue to Another
Forcing Job Execution
Suspending and Resuming Jobs
Killing Jobs
Sending a Signal to a Job
Using Job Groups

7 Managing Users and User Groups

Viewing User and User Group Information
About User Groups
Existing User Groups as LSF User Groups
LSF User Groups

Part II: Working with Resources

8 Understanding Resources

About LSF Resources
How Resources are Classified
How LSF Uses Resources
Load Indices
Static Resources
Automatic Detection of Hardware Reconfiguration

9 Adding Resources

About Configured Resources
Adding New Resources to Your Cluster
Configuring lsf.shared Resource Section
Configuring lsf.cluster.cluster_name ResourceMap Section
Static Shared Resource Reservation
External Load Indices and ELIM
Modifying a Built-In Load Index

10 Managing Software Licenses with LSF

Using Licensed Software with LSF
Host Locked Licenses
Counted Host Locked Licenses
Network Floating Licenses

Part III: Scheduling Policies

11 Time Syntax and Configuration

Specifying Time Values
Specifying Time Windows
Specifying Time Expressions
Automatic Time-based Configuration

12 Deadline Constraint and Exclusive Scheduling

Deadline Constraint Scheduling
Exclusive Scheduling

13 Preemptive Scheduling

About Preemptive Scheduling
How Preemptive Scheduling Works
Configuring Preemptive Scheduling

14 Specifying Resource Requirements

About Resource Requirements
Queue-Level Resource Requirements
Job-Level Resource Requirements
About Resource Requirement Strings
Selection String
Order String
Usage String
Span String
Same String

15 Fairshare Scheduling

About Fairshare Scheduling
User Share Assignments
Dynamic User Priority
How Fairshare Affects Job Dispatch Order
Host Partition User-Based Fairshare
Queue-Level User-Based Fairshare
Cross-Queue User-Based Fairshare
Hierarchical User-Based Fairshare
Queue-Based Fairshare
Configuring Queue-Based Fairshare
Viewing Queue-Based Fairshare Allocations
Typical Slot Allocation Scenarios
Using Historical and Committed Run Time
Users Affected by Multiple Fairshare Policies
Ways to Configure Fairshare

16 Goal-Oriented SLA-Driven Scheduling

Using Goal-Oriented SLA Scheduling
Configuring Service Classes for SLA Scheduling
Viewing Information about SLAs and Service Classes
Understanding Service Class Behavior

Part IV: Job Scheduling and Dispatch

17 Resource Allocation Limits

About Resource Allocation Limits
Configuring Resource Allocation Limits
Viewing Information about Resource Allocation Limits

18 Reserving Resources

About Resource Reservation
Using Resource Reservation
Memory Reservation for Pending Jobs
Viewing Resource Reservation Information

19 Advance Reservation

About Advance Reservation
Configuring Advance Reservation
Using Advance Reservation

20 Dispatch and Run Windows

Dispatch and Run Windows
Run Windows
Dispatch Windows

21 Job Dependencies

Job Dependency Scheduling
Dependency Conditions

22 Job Priorities

User-Assigned Job Priority
Automatic Job Priority Escalation

23 Job Requeue and Job Rerun

About Job Requeue
Automatic Job Requeue
Reverse Requeue
Exclusive Job Requeue
User-Specified Job Requeue
Automatic Job Rerun

24 Job Checkpoint, Restart, and Migration

Checkpointing Jobs
Approaches to Checkpointing
Creating Custom echkpnt and erestart for Application-level Checkpointing
Checkpointing a Job
The Checkpoint Directory
Making Jobs Checkpointable
Manually Checkpointing Jobs
Enabling Periodic Checkpointing
Automatically Checkpointing Jobs
Restarting Checkpointed Jobs
Migrating Jobs

25 Chunk Job Dispatch

About Job Chunking
Configuring a Chunk Job Dispatch
Submitting and Controlling Chunk Jobs

26 Job Arrays

Creating a Job Array
Handling Input and Output Files
Redirecting Standard Input and Output
Passing Arguments on the Command Line
Job Array Dependencies
Monitoring Job Arrays
Controlling Job Arrays
Requeuing a Job Array
Job Array Job Slot Limit

Part V: Controlling Job Execution

27 Runtime Resource Usage Limits

About Resource Usage Limits
Specifying Resource Usage Limits
Supported Resource Usage Limits and Syntax
CPU Time and Run Time Normalization

28 Load Thresholds

Automatic Job Suspension
Suspending Conditions

29 Pre-Execution and Post-Execution Commands

About Pre-Execution and Post-Execution Commands
Configuring Pre- and Post-Execution Commands

30 Job Starters

About Job Starters
Command-Level Job Starters
Queue-Level Job Starters
Controlling Execution Environment Using Job Starters

31 External Job Submission and Execution Controls

Understanding External Executables
Using esub
Working with eexec

32 Configuring Job Controls

Default Job Control Actions
Configuring Job Control Actions
Customizing Cross-Platform Signal Conversion

Part VI: Interactive Jobs

33 Interactive Jobs with bsub

About Interactive Jobs
Submitting Interactive Jobs
Performance Tuning for Interactive Batch Jobs
Interactive Batch Job Messaging
Running X Applications with bsub
Writing Job Scripts
Registering utmp File Entries for Interactive Batch Jobs

34 Running Interactive and Remote Tasks

Running Remote Tasks
Interactive Tasks
Load Sharing Interactive Sessions
Load Sharing X Applications

Part VII: Running Parallel Jobs

35 Running Parallel Jobs

How LSF Runs Parallel Jobs
Preparing Your Environment to Submit Parallel Jobs to LSF
Submitting Parallel Jobs
Starting Parallel Tasks with LSF Utilities
Job Slot Limits for Parallel Jobs
Specifying a Minimum and Maximum Number of Processors
Specifying a Mandatory First Execution Host
Controlling Processor Allocation Across Hosts
Running Parallel Processes on Homogeneous Hosts
Using LSF Make to Run Parallel Jobs
Limiting the Number of Processors Allocated
Reserving Processors
Reserving Memory for Pending Parallel Jobs
Allowing Jobs to Use Reserved Job Slots
Parallel Fairshare
How Deadline Constraint Scheduling Works for Parallel Jobs
Optimized Preemption of Parallel Jobs

Part VIII: Monitoring Your Cluster

36 Achieving Performance and Scalability

Optimizing Performance in Large Sites
Tuning UNIX for Large Clusters
Tuning LSF for Large Clusters

37 Event Generation

Event Generation

38 Tuning the Cluster

Tuning LIM
Adjusting LIM Parameters
Load Thresholds
Changing Default LIM Behavior to Improve Performance
Tuning mbatchd on UNIX

39 Authentication

About User Authentication
About Host Authentication
About Daemon Authentication
LSF in Multiple Authentication Environments
User Account Mapping

40 Job Email and Job File Spooling

Mail Notification When a Job Starts
File Spooling for Job Input, Output, and Command Files

41 Non-Shared File Systems

About Directories and Files
Using LSF with Non-Shared File Systems
Remote File Access
File Transfer Mechanism (lsrcp)

42 Error and Event Logging

System Directories and Log Files
Managing Error Logs
System Event Log
Duplicate Logging of Event Logs
LSF Job Termination Reason Logging

43 Troubleshooting and Error Messages

Shared File Access
Common LSF Problems
Error Messages
Setting Daemon Message Log to Debug Level
Setting Daemon Timing Levels

Part IX: LSF Utilities

44 Using lstcsh

About lstcsh
Task Lists
Local and Remote Modes
Automatic Remote Execution
Differences from Other Shells
Limitations
Starting lstcsh
Using lstcsh as Your Login Shell
Host Redirection
Task Control
Built-in Commands
Writing Shell Scripts in lstcsh

Index

Date Modified: June 06, 2005
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

Copyright © 1994-2005 Platform Computing Corporation. All rights reserved.