Enabling Grids for E-sciencE
Approaches to batch system description, match-making and transfer of job control in the context of the EGEE project and the 'gLite' software collection.

Francesco Prelz - INFN Milano
HEPiX, October 12th 2005
INFSO-RI-508833


Contents
  • How to describe batch systems.
  • How to delegate job control to batch systems.
  • Options to select individual worker nodes within a batch system.
  • Options for making policy requirements known to batch system admins.



Computing center modeling
  • Agreeing on a common description convention avoids the proliferation of Rosetta stones and makes it easier to bring new resources online.
  • The resulting schema (GLUE) was evolved, through painstaking negotiation, into a verifiable, extensible form.


Batch system modeling


Classad-based matchmaking
  • At a 'global' scale we can barely afford to choose the batch system queue to submit to ==>
  • The schema is flattened; in particular, the host characteristics need to represent some form of lowest common denominator of the hosts serving the queue.
[ id = "atlfarm006.mi.infn.it:2119/blah-pbs-long"; update_time = 1128351882; expiry_time = 300; info = [ GlueCEInfoGatekeeperPort = 2119; GlueCEStateStatus = "Production"; GlueSubClusterUniqueID = "atlfarm006.mi.infn.it"; PurchasedBy = "ism_cemon_asynch_purchaser"; GlueCESEBindGroupSEUniqueID = { "gridftp05.cern.ch" }; GlueCESEBindGroupCEUniqueID = "atlfarm006.mi.infn.it:2119/blah-pbs-long"; GlueHostArchitectureSMPSize = 2; GlueHostMainMemoryVirtualSize = 2002; GlobusResourceContactString = "atlfarm006.mi.infn.it:2119/blah-pbs"; TTLCEinfo = 300; GlueHostOperatingSystemVersion = "3.0.3"; QueueName = "infinite"; GlueCEStateFreeCPUs = 1; GlueCEInfoTotalCPUs = 1; LRMSType = "pbs"; GlueHostProcessorModel = "PIII"; GlueCEStateWaitingJobs = 0; GlueSubClusterName = "atlfarm006.mi.infn.it"; GlueCEPolicyMaxWallClockTime = 172800; GlueCESEBindSEUniqueID = "gridftp05.cern.ch"; GlueCEStateTotalJobs = 0; GlueHostOperatingSystemRelease = "3.0.3"; GlueCEPolicyMaxCPUTime = 172800; CEid = "atlfarm006.mi.infn.it:2119/blah-pbs-long"; GlueHostProcessorVendor = "intel"; GlueHostMainMemoryRAMSize = 1001; AuthorizationCheck = member(other.CertificateSubject,GlueCEAccessControlBaseRule) || member(strcat("VO:",other.VirtualOrganisation), GlueCEAccessControlBaseRule); CloseStorageElements = { [ mount = "/mn/SE2"; name = gridftp05.cern.ch ] }; GlueCEStateRunningJobs = 0; GlueCEHostingCluster = "atlfarm006.mi.infn.it"; GlueHostBenchmarkSI00 = 400; GlueClusterUniqueID = "atlfarm006.mi.infn.it"; GlueForeignKey = { "GlueClusterUniqueID=atlfarm006.mi.infn.it", "GlueCEUniqueID=atlfarm006.mi.infn.it:2119/blah-pbs-long" }; GlueCEInfoLRMSType = "pbs"; GlueCEInfoHostName = "atlfarm006.mi.infn.it"; GlueHostOperatingSystemName = "SLC"; GlueCEStateEstimatedResponseTime = 0; GlueChunkKey = { "GlueClusterUniqueID=atlfarm006.mi.infn.it" }; GlueClusterService = { "atlfarm006.mi.infn.it" }; CloseOutputSECheck = IsUndefined(other.OutputSE) || member(other.OutputSE,GlueCESEBindGroupSEUniqueID); GlueHostApplicationSoftwareRunTimeEnvironment = { "GLITE_1_2", "APP1", "APP2", "APP3", "APP4", "APP5" }; GlueCEInfoLRMSVersion = "Torque_1.0"; GlueClusterName = "atlfarm006.mi.infn.it"; GlueHostNetworkAdapterOutboundIP = true; GlueHostProcessorClockSpeed = 1000; GlueCEAccessControlBaseRule = { "VO:EGEE" }; GlueCEPolicyMaxRunningJobs = 99999; GlueHostBenchmarkSF00 = 380; GlueCEName = "infinite"; GlueHostNetworkAdapterInboundIP = false; GlueCEStateWorstResponseTime = 0; GlueCEUniqueID = "atlfarm006.mi.infn.it:2119/blah-pbs-long"; GlueInformationServiceURL = { undefined, undefined, undefined }; GlueCEPolicyMaxTotalJobs = 999999; GlueCEPolicyPriority = 1 ] ]


Job control delegation

Most of the work goes into dealing with exceptions:

  • Lack of proper (two-phase commit) transactions to transfer job control.
  • Jobs disappearing in bottomless pits.
  • Lack of proper lease mechanisms to deal with lost and forgotten jobs.


How can individual WNs be selected?

If one cannot create queues that serve a uniform set of machines, there are fundamentally two options:

  1. Add an extra match-making step at the CE level, picking individual nodes (see the sketch after this list).
  2. Let the local sysadmin handle the mapping of job requirements.

  • Both options require measuring/knowing the characteristics of individual nodes.
  • Both entail a logical "partitioning" of the job queue: ordering can be lost, and the reasons why jobs remain on hold are not obvious when looking from the outside.
  • => any attempt at choosing the 'best' queue at the 'global' level, based on queue characteristics and status, becomes fundamentally flawed.
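
As a sketch of option 1, the CE-level match-making step would keep one ad per worker node and match the host-attribute part of the job requirements against it. The GlueHost* names below follow the schema shown earlier; the HostName attribute and all values are invented for illustration:

[
  HostName = "wn042.mi.infn.it";
  GlueHostBenchmarkSI00 = 800;
  GlueHostMainMemoryRAMSize = 2048;
  GlueHostProcessorClockSpeed = 2400;
  GlueHostOperatingSystemName = "SLC"
]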



Matchmaking approach

Pros:
  • The same matchmaking library and approach can be used at the 'global' and at the 'local' level.
  • Restricting submission to a subset of worker nodes is supported directly by some batch systems (e.g. LSF).
Cons:
  • Normally a matchmaker is not deployed on batch system head nodes.
  • Requires keeping the host descriptions up to date.
  • Submission to a subset of worker nodes requires extra configuration on some batch systems (e.g. on PBS/Torque).



DIY Approach

Pros:
  • Leaves batch system administrators free to choose queue configuration, node properties, etc.
  • Leaves batch system administrators free to choose which host attributes are supported for subcluster selection, and how.
Cons:
  • Requires developing an algorithm to reduce the job requirements expression and extract the Host requirements from it, or extracting the Host requirements by hand into a separate expression. A non-trivial example (see the rewrite after this list):

    Requirements = other.GlueHostBenchmarkSI00 > 500 ?
                   other.GlueCEPolicyMaxCPUTime > 1000 :
                   other.GlueCEPolicyMaxCPUTime > 2000

  • Requires batch system administrators to map the attribute requirements onto submit flags via local scripts.
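
To see why the example above is non-trivial, note that the ternary expression is equivalent to the following mechanical rewrite (modulo the classad handling of undefined attribute values):

Requirements = (other.GlueHostBenchmarkSI00 > 500 && other.GlueCEPolicyMaxCPUTime > 1000)
               || (other.GlueHostBenchmarkSI00 <= 500 && other.GlueCEPolicyMaxCPUTime > 2000)

The host attribute (the benchmark threshold) and the queue policy attribute (the CPU time limit) appear only in combination: a slower host is acceptable only on a queue granting a higher CPU time limit, so a host-only requirements expression cannot be factored out independently of the queue.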



Forwarding of queue policy requirements
  • Is there interest in any requirement other than CPU time and wall-clock time?
  • This could be a by-product of the 'DIY' approach to exposing Host requirements (see the sketch after this list).
  • We need to understand how people plan to use this info:
    while a call-out at the batch submission layer (BLAH) could suffice to map Host requirements onto batch system submission, policy requirements probably need to be 'queried' out of individual batch system jobs.
  • What support for generic job properties/metadata do we have in the current batch system matrix?
  • So far we have strived to keep the BLAHP service stateless.
  • Would this require a new component to monitor jobs?
  • The CREAM development ongoing at INFN Padova is fundamentally the coupling of a web service to submit and monitor jobs with BLAH job submission.
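
As a minimal sketch of the call-out idea (every attribute name here, including HostRequirements, is hypothetical, not an existing BLAH interface), the submit ad handed to the batch submission layer could carry the extracted host expression, which a local script would then translate into batch system submit flags:

[
  Cmd = "/bin/mysim";
  HostRequirements = other.GlueHostBenchmarkSI00 > 500
]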

