Enabling Grids for E-sciencE
Approaches to batch system description, match-making and transfer of job control in the context of the EGEE project and the 'glite' software collection.

Francesco Prelz - INFN Milano
HEPix, October 12th 2005

  • How to describe batch systems.
  • How to delegate job control to batch systems.
  • Options for selecting individual worker nodes within a batch system.
  • Options for making policy requirements known to batch system admins.


Computing center modeling
  • A common convention avoids the proliferation of Rosetta stones and makes it easier to bring new resources online.
  • It evolved through painstaking negotiations into a verifiable, extensible schema.

Batch system modeling

Classad-based matchmaking
  • At a 'global' scale, we can barely afford to choose which batch system queue to submit to ==>
  • The schema is flattened. In particular, the host characteristics have to represent some form of lowest common denominator of the hosts in the queue.
[
  id = "atlfarm006.mi.infn.it:2119/blah-pbs-long";
  update_time = 1128351882;
  expiry_time = 300;
  info = [
    GlueCEInfoGatekeeperPort = 2119;
    GlueCEStateStatus = "Production";
    GlueSubClusterUniqueID = "atlfarm006.mi.infn.it";
    PurchasedBy = "ism_cemon_asynch_purchaser";
    GlueCESEBindGroupSEUniqueID = { "gridftp05.cern.ch" };
    GlueCESEBindGroupCEUniqueID = "atlfarm006.mi.infn.it:2119/blah-pbs-long";
    GlueHostArchitectureSMPSize = 2;
    GlueHostMainMemoryVirtualSize = 2002;
    GlobusResourceContactString = "atlfarm006.mi.infn.it:2119/blah-pbs";
    TTLCEinfo = 300;
    GlueHostOperatingSystemVersion = "3.0.3";
    QueueName = "infinite";
    GlueCEStateFreeCPUs = 1;
    GlueCEInfoTotalCPUs = 1;
    LRMSType = "pbs";
    GlueHostProcessorModel = "PIII";
    GlueCEStateWaitingJobs = 0;
    GlueSubClusterName = "atlfarm006.mi.infn.it";
    GlueCEPolicyMaxWallClockTime = 172800;
    GlueCESEBindSEUniqueID = "gridftp05.cern.ch";
    GlueCEStateTotalJobs = 0;
    GlueHostOperatingSystemRelease = "3.0.3";
    GlueCEPolicyMaxCPUTime = 172800;
    CEid = "atlfarm006.mi.infn.it:2119/blah-pbs-long";
    GlueHostProcessorVendor = "intel";
    GlueHostMainMemoryRAMSize = 1001;
    AuthorizationCheck = member(other.CertificateSubject,GlueCEAccessControlBaseRule) || member(strcat("VO:",other.VirtualOrganisation), GlueCEAccessControlBaseRule);
    CloseStorageElements = { [ mount = "/mn/SE2"; name = gridftp05.cern.ch ] };
    GlueCEStateRunningJobs = 0;
    GlueCEHostingCluster = "atlfarm006.mi.infn.it";
    GlueHostBenchmarkSI00 = 400;
    GlueClusterUniqueID = "atlfarm006.mi.infn.it";
    GlueForeignKey = { "GlueClusterUniqueID=atlfarm006.mi.infn.it", "GlueCEUniqueID=atlfarm006.mi.infn.it:2119/blah-pbs-long" };
    GlueCEInfoLRMSType = "pbs";
    GlueCEInfoHostName = "atlfarm006.mi.infn.it";
    GlueHostOperatingSystemName = "SLC";
    GlueCEStateEstimatedResponseTime = 0;
    GlueChunkKey = { "GlueClusterUniqueID=atlfarm006.mi.infn.it" };
    GlueClusterService = { "atlfarm006.mi.infn.it" };
    CloseOutputSECheck = IsUndefined(other.OutputSE) || member(other.OutputSE,GlueCESEBindGroupSEUniqueID);
    GlueHostApplicationSoftwareRunTimeEnvironment = { "GLITE_1_2", "APP1", "APP2", "APP3", "APP4", "APP5" };
    GlueCEInfoLRMSVersion = "Torque_1.0";
    GlueClusterName = "atlfarm006.mi.infn.it";
    GlueHostNetworkAdapterOutboundIP = true;
    GlueHostProcessorClockSpeed = 1000;
    GlueCEAccessControlBaseRule = { "VO:EGEE" };
    GlueCEPolicyMaxRunningJobs = 99999;
    GlueHostBenchmarkSF00 = 380;
    GlueCEName = "infinite";
    GlueHostNetworkAdapterInboundIP = false;
    GlueCEStateWorstResponseTime = 0;
    GlueCEUniqueID = "atlfarm006.mi.infn.it:2119/blah-pbs-long";
    GlueInformationServiceURL = { undefined, undefined, undefined };
    GlueCEPolicyMaxTotalJobs = 999999;
    GlueCEPolicyPriority = 1
  ]
]
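
As a rough illustration of how a flattened CE description like the one above can take part in matchmaking, the sketch below evaluates a job's requirements and rank against it. It is plain Python, not the ClassAd library. The `ce` dictionary holds a few attributes copied from the ad; the `job` description and the function names `authorized`, `matches` and `best_match` are hypothetical, chosen only for this example.

```python
# A few attributes copied from the CE description shown above.
ce = {
    "GlueCEUniqueID": "atlfarm006.mi.infn.it:2119/blah-pbs-long",
    "GlueCEStateStatus": "Production",
    "GlueCEStateFreeCPUs": 1,
    "GlueCEPolicyMaxCPUTime": 172800,
    "GlueHostBenchmarkSI00": 400,
    "GlueHostMainMemoryRAMSize": 1001,
    "GlueCEAccessControlBaseRule": ["VO:EGEE"],
}

# Hypothetical job description (not taken from the talk).
job = {
    "CertificateSubject": "/C=IT/O=Example/CN=Some User",
    "VirtualOrganisation": "EGEE",
    # Requirements and rank are functions of the other party's attributes.
    "requirements": lambda other: (
        other["GlueCEStateStatus"] == "Production"
        and other["GlueHostMainMemoryRAMSize"] >= 512
        and other["GlueCEPolicyMaxCPUTime"] > 1000
    ),
    "rank": lambda other: other["GlueCEStateFreeCPUs"],
}


def authorized(ce, job):
    """Python rendering of the AuthorizationCheck expression in the ad."""
    rules = ce["GlueCEAccessControlBaseRule"]
    return (job["CertificateSubject"] in rules
            or "VO:" + job["VirtualOrganisation"] in rules)


def matches(ce, job):
    """A CE matches if the job is authorized and its requirements hold."""
    return authorized(ce, job) and job["requirements"](ce)


def best_match(ces, job):
    """Return the matching CE with the highest rank, or None."""
    candidates = [c for c in ces if matches(c, job)]
    return max(candidates, key=job["rank"], default=None)
```

Because the host attributes describe the queue as a whole, a requirement such as `GlueHostMainMemoryRAMSize >= 512` is only checked against the lowest common denominator of the hosts, not against the node that will actually run the job.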

Job control delegation

Most of the work goes into dealing with exceptions:

  • Lack of proper (doubly-committed) transactions to transfer job control.
  • Jobs disappearing in bottomless pits.
  • Lack of proper lease mechanisms to deal with lost and forgotten jobs.
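
To make the lease idea concrete, here is a minimal sketch of a lease table, assuming that whoever handed a job over must renew its lease periodically and that jobs whose lease has lapsed may be reclaimed. The class and method names (`LeaseTable`, `grant`, `renew`, `reap`) and the 300-second duration in the usage note are hypothetical; this is not a description of any existing gLite component.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    job_id: str
    expires_at: float


class LeaseTable:
    """Tracks leases on jobs; time is passed in explicitly for testability."""

    def __init__(self, duration):
        self.duration = duration
        self.leases = {}

    def grant(self, job_id, now):
        """Start a lease when job control is handed over."""
        self.leases[job_id] = Lease(job_id, now + self.duration)

    def renew(self, job_id, now):
        """Extend a lease; fails if the job is unknown or already expired."""
        lease = self.leases.get(job_id)
        if lease is None or lease.expires_at <= now:
            return False
        lease.expires_at = now + self.duration
        return True

    def reap(self, now):
        """Drop expired leases and return the ids of the forgotten jobs."""
        expired = sorted(j for j, l in self.leases.items()
                         if l.expires_at <= now)
        for job_id in expired:
            del self.leases[job_id]
        return expired
```

With `LeaseTable(300)`, a job granted at time 0 and never renewed is returned by `reap(350)`, so the receiving side can cancel it instead of letting it linger indefinitely.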

How can individual worker nodes (WNs) be selected?

If queues cannot be set up to serve uniform sets of machines, there are fundamentally two options:

  1. Extra match-making step at the CE level, picking individual nodes
  2. Let the local sysadmin handle mapping of job requirements

  • Both require measurement/knowledge of characteristics of individual nodes
  • Both entail a logical "partitioning" of the job queue: ordering can be lost, and the reasons why jobs remain on hold are not obvious when looking from the outside
  • => any attempt at choosing the 'best' queue at the 'global' level based on queue characteristics and status becomes fundamentally flawed.


Matchmaking approach

The same matchmaking library and approach can be used at both the 'global' and the 'local' level.
Restricting submission to a subset of worker nodes is supported by some batch systems (e.g. LSF) directly.
Normally a matchmaker is not deployed on batch system head nodes.
Requires keeping the host descriptions up to date.
Submission to a subset of worker nodes requires configuration on some batch systems (e.g. on PBS/Torque).
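
A local matchmaking step of this kind could look roughly like the sketch below, which filters per-node descriptions against the job's host requirements and skips descriptions that have not been refreshed recently. The node names, the `update_time` field on each node, the `max_age` threshold and the function name `select_nodes` are assumptions made for this example; the attribute names are borrowed from the GlueHost attributes of the CE description.

```python
def select_nodes(nodes, requirements, now, max_age=300):
    """Return the sorted names of worker nodes that satisfy `requirements`.

    `nodes` maps a host name to its attribute dictionary. Descriptions
    older than `max_age` seconds are ignored, since matching against
    stale host data is worse than not matching at all.
    """
    selected = []
    for name, ad in nodes.items():
        if now - ad["update_time"] > max_age:
            continue  # stale description, skip this node
        if requirements(ad):
            selected.append(name)
    return sorted(selected)


# Hypothetical worker node descriptions.
nodes = {
    "wn01.example.org": {"GlueHostBenchmarkSI00": 400,
                         "GlueHostMainMemoryRAMSize": 1001,
                         "update_time": 1000},
    "wn02.example.org": {"GlueHostBenchmarkSI00": 900,
                         "GlueHostMainMemoryRAMSize": 2048,
                         "update_time": 1000},
    "wn03.example.org": {"GlueHostBenchmarkSI00": 950,
                         "GlueHostMainMemoryRAMSize": 4096,
                         "update_time": 100},  # not refreshed recently
}

fast_nodes = select_nodes(
    nodes, lambda ad: ad["GlueHostBenchmarkSI00"] > 500, now=1100)
```

The resulting host list would then be handed to the batch system through whatever mechanism it offers for restricting a job to a subset of nodes.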


DIY Approach

Leaves batch system administrators free to choose queue configuration, node properties, etc.
Leaves batch system administrators free to choose which host attributes are supported for subcluster selection, and how.
Requires either developing an algorithm that reduces the job requirements expression and extracts the Host requirements from it, or extracting the Host requirements by hand into a separate expression.
Non-trivial example:
Requirements = other.GlueHostBenchmarkSI00 > 500 ? 
               other.GlueCEPolicyMaxCPUTime > 1000:
               other.GlueCEPolicyMaxCPUTime > 2000
Requires batch system administrators to map the attribute requirements onto submit flags via local scripts.
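
One possible way to reduce an expression like the non-trivial example is partial evaluation: substitute the attributes that are already known for the chosen queue (here the CE policy values) and simplify, leaving only the Host part. The sketch below does this for a tiny expression language with constants, attributes, `>` and the ternary operator. The tuple encoding and the name `reduce_expr` are assumptions made for this example, the `other.` prefix is dropped from attribute names, and ClassAd `undefined` semantics are ignored.

```python
def reduce_expr(expr, known):
    """Partially evaluate `expr`, substituting attributes found in `known`."""
    op = expr[0]
    if op == "const":
        return expr
    if op == "attr":
        name = expr[1]
        return ("const", known[name]) if name in known else expr
    if op == ">":
        left = reduce_expr(expr[1], known)
        right = reduce_expr(expr[2], known)
        if left[0] == "const" and right[0] == "const":
            return ("const", left[1] > right[1])
        return (">", left, right)
    if op == "?:":
        cond = reduce_expr(expr[1], known)
        yes = reduce_expr(expr[2], known)
        no = reduce_expr(expr[3], known)
        if cond[0] == "const":
            return yes if cond[1] else no
        if yes == no:
            return yes  # the condition no longer matters
        if yes == ("const", True) and no == ("const", False):
            return cond  # "c ? true : false" is just "c"
        return ("?:", cond, yes, no)
    raise ValueError(f"unknown operator: {op!r}")


# The non-trivial Requirements expression from the text.
example = (
    "?:",
    (">", ("attr", "GlueHostBenchmarkSI00"), ("const", 500)),
    (">", ("attr", "GlueCEPolicyMaxCPUTime"), ("const", 1000)),
    (">", ("attr", "GlueCEPolicyMaxCPUTime"), ("const", 2000)),
)

host_part = reduce_expr(example, {"GlueCEPolicyMaxCPUTime": 1500})
```

For a queue with a CPU time limit of 1500 the expression reduces to the pure Host requirement `GlueHostBenchmarkSI00 > 500`; for the limit of 172800 in the CE description it reduces to a constant `true`, so no Host requirement remains to be mapped.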


Forwarding of queue policy requirements
  • Is there interest in any requirement other than CPU time and wall clock time?
  • Could be a by-product of the 'DIY' approach to exposing Host requirements.
  • We need to understand how people plan to use this information:
    while a call-out at the batch submission layer (BLAH) could suffice to map Host requirements to batch system submission, policy requirements probably need to be 'queried' out of individual batch system jobs.
  • What support for generic job properties/metadata do we have in the current batch systems matrix?
  • So far we have striven to keep the BLAHP service stateless.
  • Would this require a new component to monitor jobs?
  • The CREAM development ongoing at INFN Padova essentially couples a web service for submitting and monitoring jobs with BLAH job submission.
