SLAC home page | IEPM home page
|Traveler||Roger L. A. Cottrell, Assistant Director SLAC Computing Services, SLAC, POB 4349, Stanford University, California 94309|
|Dates of Trip||February 3 - February 15, 2000|
|Purpose of Visit||To attend the Computing In High Energy Physics 2000 (CHEP00) conference, to present 3 talks, to share information on wide area networking and HENP networking with colleagues around the world; to attend the GRID workshop in Padova and learn about and share experiences with European Grid proponents.|
One issue that came up was the use of Objectivity. BaBar has embraced Objectivity and appears to be having considerable success. However, the learning curve is steep, and following a review they have also provided ROOT access for smaller groups. RHIC and FNAL have decided against Objectivity and are using ROOT. The next-generation experiments (ATLAS and CMS) are still evaluating their options, with a decision due in 2001.
Another big issue was the change from recommending commercial/costly products with vendor support towards using Open Source Software (OSS). OSS is becoming an accepted movement and has many attractions, especially given the problems encountered with non-OSS software. These problems include vendors not keeping current with OS changes (e.g. Objectivity and Red Hat 6), the cost of licensing to cover hundreds or thousands of machines (e.g. LSF), the complexity of the tools, etc. Maybe if Objectivity could move to an Open Source model it would help its acceptance in the HENP community.
In other areas, Linux is emerging as the default platform of choice for farms, servers, and triggers; commodity PCs are increasingly the hardware of choice; object-oriented languages are in universal use for large projects (1M lines of code or more), mainly written in C++ but with an increasing amount of Java, and nobody is talking of going back to C or FORTRAN. For networking, switched Ethernet has swept the day, replacing other technologies even down to the trigger levels. Storage Area Networks (SANs) based on the Fibre Channel Standard (FCS) are increasing in popularity for high-performance storage access. Other solutions for high-speed storage access are Gbit Ethernet, and CERN is looking at the Gigabyte System Network (GSN) and a new high-speed protocol called Scheduled Transfer (ST).
Two applications were considered. The first is remote control of an experiment's trigger hardware: low bandwidth, few transactions, high reliability. The second is analysis monitoring, where application results are downloaded from server to client: loss is tolerable, each client needs a minimum bandwidth guarantee, and the maximum overall capacity available is allocated dynamically. For hardware remote control, traffic has to be classified and marked at the edge (using source, destination, and port to classify): if traffic <= max && burst < 64 kbps then label it expedited, else label it best effort. For analysis monitoring, they use WFQ for scheduling, with traffic differentiation via WRED: if traffic <= min && burst <= 16 kBytes then label as AF11; else if traffic <= max then label as AF12; else drop the packet. The maximum service per client is 256 kbps. Priority Queuing (PQ) has a single queue and absolute priorities; WFQ has a set of queues, and the weight applied to each queue is configurable. They showed experimental results for packet size versus RTT. WFQ shows higher delays compared to PQ, though the graph showed the delay was about equal up to 512 bytes. They want to set up a test bed with several sites in Europe and with ESnet & FNAL.
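The two edge-marking rules above can be sketched as follows. The thresholds come from the text; the function and label names are illustrative, and in practice this logic would live in router configuration rather than application code:

```python
# Sketch of the two edge-marking policies described above (illustrative
# names; thresholds from the text).

EF, BE = "EF", "BE"  # expedited forwarding / best effort

def mark_remote_control(rate_bps, burst_bps, max_rate_bps):
    """Trigger remote control: expedited only if within profile."""
    if rate_bps <= max_rate_bps and burst_bps < 64_000:
        return EF
    return BE

def mark_analysis(rate_bps, burst_bytes, min_rate_bps, max_rate_bps=256_000):
    """Analysis monitoring: AF11/AF12 marking ahead of WFQ + WRED,
    with the 256 kbps maximum service per client."""
    if rate_bps <= min_rate_bps and burst_bytes <= 16_000:
        return "AF11"   # in profile: lowest drop precedence
    if rate_bps <= max_rate_bps:
        return "AF12"   # out of profile: higher drop precedence under WRED
    return "DROP"       # over the per-client cap
```

AF11 vs. AF12 here reflects the Assured Forwarding idea that in-profile packets get lower drop precedence, so WRED discards the AF12 traffic first under congestion.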
They want to measure resource utilization in a distributed environment - CPU, network throughput, and wall clock time of a single job - in order to locate system and network bottlenecks. The setup includes one AMS server with one client versus one AMS server with multiple clients; the network configurations include a LAN (GE, FE, E) and a WAN with different bandwidth capacities (2 Mbps to 8 Mbps), plus QoS/DiffServ services on a 2 Mbps WAN link.
They use vmstat for CPU usage; the application itself records the elapsed & CPU time.
Results for 1 client and 1 server with GE: the maximum use of CPU occurs at about 60 simultaneous jobs. Time to complete jobs goes up linearly from 30 to 60 jobs. The client CPU is 100% used with 5 jobs, the server 100% with 50 jobs. For BaBar, with 5 jobs the client is CPU saturated. The server reaches 60% of CPU with 30 clients, then CPU use decreases to 20% with more clients. Similarly, LAN bandwidth use gets up to 70% with 30 clients then drops to 30% with more clients. On the WAN with a 2 Mbps link, neither the client nor the server CPU reaches 100% utilization. Starting with 40 concurrent jobs in one client, jobs start crashing.
In summary: with GE the client saturates at 5 jobs, with FE the server saturates at 30 jobs, and with Ethernet the network is the saturation bottleneck.
The max GE throughput is 37 Mbps; with 100 Mbps Ethernet they get 80 Mbps, and with 10 Mbps Ethernet 9 Mbps. AMS/Objectivity 5.1 is not able to use a multi-CPU machine. The optimum of 30 connections from remote jobs is too small for a production environment, and performance degrades rapidly as one moves away from the optimal condition. Future work includes testing Objectivity 5.2 (multi-threaded server) features. They will test a dedicated WAN with 10 Mbps bandwidth links, and want to look at host tuning, RTT, and QoS impacts. See www.cern.ch/MONARC for more information. BaBar is running with over 200 client machines and is getting 90% utilization; they have 4 servers with 4 CPUs each.
They used Objy/AMS 5.1 & 5.2 on a LAN and a WAN; the WAN is between KEK & CERN. The Objy page size is 8192 bytes, and the test object is 40 bytes, or 175 objects/page. They write/read 1M objects on the LAN (570 pages) and 57 pages on the WAN.
The graphs show about 15% (5.1) to 20% (5.2) link utilization for write, and 5% (v5.1) to 10% (v5.2) for read. The KEK-CERN RTT is 300 ms and the bandwidth is 2 Mbps. For write they get about 1 MBytes/s on the LAN and about 100 kBytes/s on the WAN. For read they get 500 kBytes/s on the LAN and ~50 kBytes/s on the WAN. Write performance is better on the WAN than FTP (due to window sizes: AMS 33580 bytes, ftp 24829 bytes).
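The window-size explanation is roughly consistent with a simple window-limited throughput model (throughput ≈ window/RTT, ignoring slow start and losses); a quick back-of-envelope check using the numbers in the text:

```python
# Window-limited TCP moves at most one window per round trip.
rtt = 0.300                  # KEK-CERN RTT from the text, in seconds

ams = 33580 / rtt            # ~112 kBytes/s, close to the ~100 kBytes/s observed
ftp = 24829 / rtt            # ~83 kBytes/s

# Window needed to fill the 2 Mbps link (bandwidth-delay product):
bdp_bytes = (2_000_000 / 8) * rtt   # 75,000 bytes, larger than either window
```

Since both windows are well under the 75 kByte bandwidth-delay product, both transfers are window-limited, and the larger AMS window accounts for its edge over FTP.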
There used to be two systems: one, based on H.320 & ISDN, was circuit oriented; the other, vic/vat over IP, was packet oriented. H.323 tries to map over both of these systems. There was a suite developed at LBL and Xerox including the session directory, vic, rat, ....
The ITU standards are H.263 for video; for audio, G.711 (required) and G.723 are shipping in commercial products, and VoIP products are classed as H.323 compatible. Rat (v3.0.35) has no echo cancellation but supports full duplex. Vic is used for video. Other tools include the HEPNRC video bridge, sdr, and whiteboarding tools. If using rat, one needs hardware that supports echo cancellation (e.g. PolyCom SoundPoint). A problem is the many ways the audio level can be adjusted. It is unclear how W2K will work, how compatible it will be, etc.
For video, drivers are a problem; video capture cards are unnecessary if the PC supports USB devices. For a single user a headset is good (it blocks feedback and is kinder to office mates). Kipp recommends disabling the suppress-silence option on PCs. There is currently no bridge between H.323 end points and the packet tools. There is still life in the ISDN/H.320 codecs. New codecs need to support both H.320 and H.323 (e.g. PolyCom ViewStation). See www.hepnrc.net/video/video.html for recommendations.
Challenges include: trust, permanent forking, modifiability of code, and the bazaar vs. cathedral model. Don't publish code the instant it is written. Choose secrecy only if you have a well-developed argument (e.g. security); make the default policy cooperation, and choose secrecy for modules rather than whole projects when possible. Understand the long-term costs of forking. The maintainer's judgment is critical, e.g. in rejecting poor patches. Maintain forks as patches, not as modified source; Red Hat uses RPM to enable this, providing source, patches, and a script to patch & build.
When you make changes, the expectation is that you offer them to the maintainer of the package. Changes are sent as GNU-style diffs (diff -u); try to use the same coding style as the original code. Include changes to documentation if applicable. Separate functionality into different patches. Have an environment in which changes can be discussed (e.g. a newsgroup). Try to recreate patches against the current development version.
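As an aside, unified diffs of the kind described can also be generated programmatically; a minimal sketch using Python's standard difflib (the file names and contents are invented for illustration):

```python
import difflib

# Two versions of a (hypothetical) source file, as lists of lines.
old = ["def greet():\n", "    print('hello')\n"]
new = ["def greet(name):\n", "    print('hello ' + name)\n"]

# Produce a unified ("diff -u" style) patch, the form one would
# attach when offering a change back to the package maintainer.
patch = "".join(
    difflib.unified_diff(old, new, fromfile="a/greet.py", tofile="b/greet.py")
)
print(patch)
```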
Getting started right: make sure everyone knows who the technical leader of the project is, understand what non-leaders do, release early and often, clearly separate production and development releases, and make sure version numbers are unique. Provide public read access to CVS archives. Express coding standards explicitly (refer to a document). Encourage maintainability. Have an explicit productization process - test the build process, test the built product, integrate. A project may be too small to have a productized release and may only have development releases. Lurk on others' projects to learn from successes and failures, and encourage re-use. Consider publishing products, with peer review. Consider open source standards such as automake/autoconf, shared libs, etc. Use modular techniques.
Open source needs strong leadership. Maintenance changes form with open source, but still needs to happen. Participants need to respect each other, and have varied skills.
He is from the Samba development team, which has 14 members around the world. He discussed painful lessons learned from collaborative team efforts. Open source does not mean better programmers, rather better users (better feedback helps maintain high quality). They use jitterbug to track bug reports etc. They maintain strict revision control and can regenerate any version. They use CVS and ssh (the latter for access to the source code repository in Australia). Only core contributors have code write access; regular contributors are invited to join the team. Feedback is provided to patch contributors. Patch decisions are deferred to the team leader (which can lead to bruised egos). It takes about 1 year to go from alpha to a stable code release. They worry about security since the daemon runs as root, so they go through security reviews and avoid/ban certain C language functions that lead to buffer overflows. They promote ego-less programming: finding bugs is a group effort with users, and everybody writes buggy code. Integrating patches is dull work, so the core team does this, leaving outside contributors to do some of the exciting work. They are not afraid to heavily modify patches when integrating, but keep the algorithms. Infrastructure changes are best done by the core team, since only they have the broad overview. People management is hard: email discussions can suck up hours of time, and people get offended. Samba uses the IETF mantra of rough consensus and working code; if contributors disagree, let them both implement and take the best one. People fall into natural roles: they have a QA manager, an advanced-features developer, a printing-features developer, a documentation expert, etc. Man pages should always match the code for a current release. Provide a roadmap so users are aware of future plans. If the project is successful, commercial authors will help develop documentation.
They ship source code with all binaries (the GPL model). Samba provides a list of commercial support consultants with the software. Code ownership is with the community of users and authors. Samba does not make money; the developers live symbiotically off their employers, who want Samba to succeed since it helps sell the employer's product (e.g. Red Hat, VA Linux, HP etc.). They have about 1 MLOC, mainly in C. One of the largest open source projects is probably the PostgreSQL database; another is Mozilla.
See www.opensource.org and read "The Cathedral and the Bazaar", Eric S. Raymond, ISBN 1-56592-724-9. Is Open Source an emerging business model or a technologically related social movement? We now have commoditization of hardware, huge increases in Internet speed, and global familiarity via CNN, travel, etc. There is a realization that sharing ideas & listening to others' opinions is more valuable than carefully guarding ideas. Many people have similar problems; the value is in the ideas, but to fully collaborate you have to share the code. A vision & architecture are more effective than management at creating a following. Humility is a great quality. Bazaar-style development has 19 principles, including: 3) plan to throw one away, you will anyway; 5) when you lose interest, your last duty is to hand the project off to a competent successor; 6) treat users as co-developers; 7) release early & often & listen to customers; 11) the next best thing to having a good idea is recognizing good ideas; 12) often the most striking & innovative solutions come from realizing your concept of the problem was wrong. The role of the project leader requires individual vision & brilliance, amplified through the effective construction of voluntary communities; it also needs self-promotion and accepting merciless criticism.
So can HEP benefit from open source? Actually it already has; the real question is whether to do development this way. Already 2/3 of CERN's compute power is Linux based. But why does each lab write its own management software for tape/disk? Is there a need for event-oriented statistical analysis similar to something that might be used commercially by someone like WalMart?
There is a big commitment to deliver and maintain a high quality product. The product must be of sufficient interest to others and general enough; this can often result in an increase in resources for the original development team. It is said that open source taps the top 5% in quality of developers worldwide. Does HEP have such people, and are they allowed to contribute to the community? Could there be an HEP open software foundation? It is possible that the HEP community as a whole has not really absorbed the Internet way of doing things; there is over-emphasis on face-to-face meetings: tribes come together, show their stuff, then go back and do their own thing.
Manuel concludes that HEP can benefit, but needs to evaluate on a case by case basis, needs project leaders with vision and humility, needs continuous training aligned with the wider world, should build trust through e-communication, must face investment without clear direct benefit, and must convince management of the virtues of open source.
They acknowledge a big debt to SLAC/Andy Hanushevsky for the HPSS/Objectivity interface. The demands over the years have been for HPSS support, > 64 bit addressing (for accessing 10,000 PBytes), and Linux support. They are working on a workaround for supporting Red Hat 6. They often hire CERN programmers whose contracts have expired.
NAG started in 1971; they are a not-for-profit company (owned by the people who work for it, with any profit ploughed back into development etc.). The model is collaborative development (not quite open source) with experts and customers. The NAG library has a big overlap with HEP requirements. Feedback from HEP includes MINOS-type error calculation & reporting, some special CERNLIB functions, better 1D plotting (now part of IRIS Explorer 4.0), and porting to Linux (IRIS Explorer 4.0 is now available and is being ported to Red Hat 6).
The European infrastructure is not the highest performance in the world but is very solid; it bears global comparison in all aspects and leads in some. The TEN-155 topology (2000) backbone is mainly 622 Mbps and will peer with Abilene & ESnet in NY. There are problems with getting connections in NY. Bulgaria, Cyprus, Estonia, Israel, Latvia, Lithuania, Poland, Romania & Slovakia have not connected to TEN-155; Croatia does not show up. User requirements are general connectivity, distributed data and/or computing resource access, and remote use/control of experiments. The idea is to have a core which everyone can connect to (European Distributed Access) as opposed to the US model. The AUP is free for research & education.
Inter-regional connectivity can be done via a hub, parallel, or hybrid model. See http://www.kek.jp/~karita/chep2000-asianet.ppt
GSN consists of 3 specifications: HiPPI-6400 PH (the physical layer), Scheduled Transfer (ST), and something else. It offers GByte speeds over multi-wire copper/fiber connections, with 4 parallel channels: 2 with 64 kByte packets, and one control channel with small packets. There is a 50 meter limitation for copper connections; 10 km over single mode fiber is specified but not available. ST is a new, simple protocol, and SCSI can be encapsulated over ST.
Networks need to be open yet reasonably secure, and leading-edge high performance yet reliable. There was considerable interest in the talk, and afterwards in the monitoring tools that allow collection of many hours' worth of data on Gbps links and then allow replay and filtering.
FreeHEP started in 1992, using a Web/SPIRES interface to the FSU ftp site, and had about 200 entries in a year or so. They appointed editors who did a good job at first, but it got stale. Now, in the new era of "open-source" software, they are revisiting it with new web technology and more distributed management. New entries can be created using a web form; each entry has one or more maintainers who can update information via the web, and each entry must be updated or checked yearly. It will be implemented in pure Java. A Java servlet will provide the database access (i.e. interpret the user's request, validate it, and get information from the database). The view templates, the HTML that is sent back to the browser, will be created by other Java servlets. They decided not to use MS Access as the SQL database, since it was either unavailable or buggy; instead they used a JDBC interface. They intend to use XML for import/export. The first area they have set up is the Java interest area (it will be available at http://java.freehep.org/, check back in April); they have a CVS repository and an email list.
The biggest problem is passwords sent in the clear from off-site that are sniffed at another site. Irwin said ssh is inadequate and quoted the case of X terminal users (i.e. a user goes from an insecure machine, provides a clear-text password to another machine, and gets into a trusted domain, but the password has been sniffed). The system must allow trust relationships and has to be acceptable to the user community. Users will have a single sign-on, and it has to be ready for Run II. The primary goal is to prevent network disclosure of passwords; a secondary goal is single sign-on. Single sign-on simplifies account management, especially terminations, and enforces password policies (dictionary checks, aging etc.). The design is based on Kerberos 5. There will be 4 realms: a strengthened realm, required for all logins; an un-trusted realm of hosts on or offsite from which direct logins to the strengthened realm are not permitted; a trusted realm, i.e. an outside realm with which they cross-authenticate; plus a portal, which provides authentication and initial tickets for users who lack Kerberos software or a secure network channel (hardware tokens, or one-time passwords).
They have a pilot project in progress to develop and try out the concepts.
Grids started in the late 1980s and early 1990s. They really got going in the mid 1990s, in particular with the I-WAY project, when the middleware started to become available. Services include uniform high level access to a range of resources, security, policy, instrumentation, monitoring, discovery, and protocols. A current focus is on data grids. This includes storage, caching, distribution, scheduling (including deciding the appropriate site/mirror to get the data from), performance guarantees, security, policies etc. People are building resource brokers to discover/allocate compute power and data access. Another building block is the allocation of network bandwidth. Yet another is how to replicate data, discover the locations of replicas, and select the optimal one to access. HEP has computing/storage grid needs that are challenging; the scale of the data is bigger than most applications so far. The time is ripe for HEP to get involved: the requirements are there and the middleware is becoming available.
See www.globus.org, www.gridforum.org, www.egrid.org, www.mkp.com/grids
They use HPSS and STK silos. For Run II the need is for 0.5 PB/year and 155,000 MIPS/experiment. They push the use of commodity components. They are using 2 EMASS robot systems (1 for D0, 1 for CDF) and have encountered some reliability problems (Vickie mentioned arm problems). They are moving away from HPSS; D0 will use the DESY-derived ENSTORE. They also do not use Objectivity, but have opted for greater efficiency at the cost of less transparency and flexibility. D0 uses Gbit Ethernet to access data, whereas CDF uses FCS/SCSI.
They use HPSS and Objectivity, married to Objy/AMS via the SLAC-written OOFS. They use multiple data federations: one for data taking (containing conditions and configurations), one for analysis, and one for Online Prompt Reconstruction (OPR). Configurations are swept from one federation to another at daily and weekly intervals; while sweeping, which can take several hours, the databases are not accessible, and it is planned to reduce this outage time to less than 30 minutes. OPR could not keep up with the data (partially Objectivity), data analysis was painfully slow, linking took forever, and the Sun E10000 had low reliability & throughput (partially AFS). They attacked the Objy problems by setting up a standalone system; it is now acceptable and can use 200 nodes simultaneously. Extensive instrumentation is essential. The current challenge is to de-randomize disk access, with partial relief from making real copies of popular collections. KangaRoot, which provides an Objy-free ROOT-I/O based alternative, is nearly working (apart from memory leaks). They are exporting the data to IN2P3; delays are about 1 month via DSTs, and they want to move it via the network. Caspur will use Kanga, RAL will use Objy. Moving to over 1000 boxes performing tens of functions is a major challenge.
The major problems are scaling up in computing, data volumes, and networks. The LHC has 4 experiments with 5 PB/year each, and is much less centrally oriented than previous experiments. It will have hundreds or even thousands of users trying to access data transparently (location- and media-wise). Models of Networked Analysis at Regional Centres (MONARC) is a project to study how to do this. Coherence is a big problem, i.e. management & coordination of the data and analysis so that it is valid & correct. Tier 1 centers have 10-20% of CERN's capability. There will also be Tier 2 (e.g. a university) and possibly Tier 3 (a university group) centers. There are no database systems (object or otherwise) that work today across this level of distribution. Physicists want substantial parts of the functionality formerly planned for turn-on, so mock data challenges (MDC) start in 2000 with 1% of the data and ramp up. The Tier 1 centers are FNAL (CMS), BNL (ATLAS), France, Italy, and the UK; discussions are also underway in Japan, Finland, Russia, and Germany. It is expected that 33% of the computing will be at CERN and 66% elsewhere.
There may be several Tier 2 centers in countries with a Tier 1 center, and maybe one in some countries with no Tier 1. A Tier 2 is less production oriented than a Tier 1 and may consist of a small Linux farm.
Grids are about efficient resource use and increased responsiveness. This requires discovery, prediction (where to find things, speed of access - network, data, compute power available etc.) so that things can be scheduled optimally, and prioritization. A partnership is developing among the PPDG, GriPhyN, and ALDAP collaborations, which are cross-field.
They have a language for requesting and granting (matching) resources. It is scalable and symmetric, and follows a classified-advertising principle (i.e. people put in ads using a language that defines requirements and offerings; it was not officially formalized but is well understood). The user is currently responsible for moving the data. They need to extend the reporting and monitoring capabilities.
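The symmetric matching idea can be illustrated with a toy sketch (a drastic simplification of Condor's actual ClassAd language; the attribute names are only illustrative): each side publishes attributes plus a requirements predicate, and a match requires both predicates to hold.

```python
# Toy matchmaker in the spirit of classified-ad matching (much
# simplified): machine and job each advertise attributes plus a
# "Requirements" predicate over the other side's ad.

machine_ad = {
    "Memory": 512, "Arch": "INTEL", "OpSys": "LINUX",
    "Requirements": lambda other: other["ImageSize"] < 512,
}
job_ad = {
    "ImageSize": 256,
    "Requirements": lambda other: other["Arch"] == "INTEL"
                                  and other["Memory"] >= 128,
}

def matches(a, b):
    # Symmetric: both sides must be satisfied with the other.
    return a["Requirements"](b) and b["Requirements"](a)

print(matches(machine_ad, job_ad))  # True: each side satisfies the other
```

The symmetry is the key design point: resources express policy about jobs in exactly the same way jobs express needs about resources.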
The LHC has 1800 physicists at 150 institutions worldwide and will have tiered computer centers. They will need to impedance match, e.g. Tier0/Tier1 ~ Tier1/Tier2 ~ 5-6 (~ Tier2/Tier3?), with Sum(Tier1) = Tier0 and Sum(Tier2) ~ Tier1. The grid needs to be unified. Grids enable support of tiering, and tiers/grids enable matching with political reality: national labs can't handle 1000s of users, so divide & conquer (assert priorities regionally); new IT resources can be added "naturally"; funding can be by community (e.g. NSF vs. DoE, Italy vs. France); it provides a more flexible organization; and it is extensible, in that one can add a center in, say, Mexico for Central America. A typical Linux farm of 128 nodes costs about $1M/year ($200K of which is people cost), allowing regular upgrades, an Abilene connection, etc. R&D is needed; it will probably be based on Globus and also incorporate the Condor matchmaker. There is about $90M of money (not sure from whom) for proposals, with any one proposal being limited to $12M.
NetLogger is an instrumentation package (not just for networks; given today's grid hype maybe it should be called gridlogger), and instrumentation is very important for data grids to optimize moving data around. One cannot assume the problem is network congestion: even on a research network only 40% of the time goes to networking, 20% to the host, and 40% to application design (50% client, 50% server). NetLogger combines application, system, and network measurements, and contains 4 components: a message format, a client library, visualization tools, and monitoring tools. NTP is critical to synchronize clocks (NTP gets to within a msec, and to within microseconds if a GPS receiver is close by). They use the ULM format and will add XML. The libraries allow sending data to a file, memory, a host, etc., with open, write, flush, set-priority-level, and close calls; they support Perl, Python, and other language interfaces, and run on Solaris and Linux. They wrap vmstat, netstat, ping etc. They assign event IDs so an event can be traced through the system (called a "lifeline" plot): put calls to netlogger before & after disk or network I/O and before & after entering and leaving each distributed component. Use it for millisecond resolution. They have a tool for wrapping SNMP router commands, which could be interesting if one could look at queue lengths (editor's comment: one can't look at queue lengths via SNMP; however, such information is accessible by logging on to the router, so it would be accessible via an Expect-type script for example).
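The instrumentation pattern (timestamped events sharing an event ID, written before & after each I/O step) can be sketched as follows; note this is not NetLogger's actual API, and the field names only loosely echo ULM-style keyword=value records:

```python
import io
import time

# Illustrative NetLogger-style instrumentation: each record carries a
# timestamp and an event ID, so one transaction can be traced across
# components and plotted as a "lifeline".
log = []

def nl_write(event, event_id):
    log.append({"DATE": time.time(), "EVNT": event, "ID": event_id})

def transfer_block(event_id, data):
    # Bracket each I/O step with start/end events.
    nl_write("disk_read_start", event_id)
    block = data.read()          # stand-in for the real disk read
    nl_write("disk_read_end", event_id)
    nl_write("net_send_start", event_id)
    # ... the block would be sent over the network here ...
    nl_write("net_send_end", event_id)
    return block

block = transfer_block(42, io.BytesIO(b"event payload"))
```

Sorting the records for one ID by timestamp reproduces the lifeline: the gaps between consecutive events show where the time went (disk, host, or network).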
Met with Olivier Martin of CERN to explore how to coordinate high bandwidth network performance measurements between SLAC & CERN. We agreed to explore this to see the effects of parallel streams, increased buffer sizes, and the impact on other concurrent network applications of a high speed transfer.
Had discussions with Paul Jeffries of UK Particle Physics network support about exploring the use of QoS between RAL & SLAC via JANET, Abilene, ESnet. Apparently the support is by means of CAR (Committed Access Rate) applied at the edges and WRED (Weighted Random Early Drop) for prioritization in the core Internet. Also shared information on UK connectivity and use of managed bandwidth. There is a UK/JANET pilot between RAL, Manchester University and Imperial College using 2 Mbps of reserved (ATM/PVC) bandwidth.
Discussed PingER monitoring at the University Of Wisconsin with Miron Livny. It is installed and configured and we will start gathering data in the next week. Miron is also interested in measuring and archiving the routes between the PPDG sites. I will contact John MacAllister to see where he is with traceping. We also discussed how to tie in Condor with Objectivity. Later I discussed this with Andy who says that AMS/oofs provides access to files which is probably what Condor needs.
Networking is still critical for HEP: many applications have to use the net, there are new developments in the area of network measurement, and there is the emergence of an "Operating System" for the network (i.e. to support grids). The grid idea has 4 layers: applications; application toolkits; middleware; and the network layer. We are seeing an evolution to higher levels of services (e.g. grids, QoS).
Applications included: monitoring and managing distributed processes in BaBar using CORBA; there was a talk from Japan on the use of Objectivity on the WAN; QoS is being tested for remote control of experiments.
Tools for applications, in particular for collaborative work, are emerging. The VRVS (Virtual Room Videoconferencing System) is becoming increasingly popular. Voice over IP has seen some early work to see how it performs, but the scalability issues still need to be addressed. The web based source code navigator looks nice for looking through complex, large software systems. FreeHEP has been re-invigorated with the newer web tools. CERN's web server architecture, which redirects logical web addresses to different physical addresses, looks very promising for addressing issues such as broken links, simplifying link names, etc.
Middleware talks included using CORBA to build applications; Condor is being used to provide high-throughput computing, and a "classified ad" model is being applied to matching resources to needs. There was a report on NASA's Information Power Grid which talked about the problems of deploying grids to support distributed computing.
The networks are nowadays built on commodity technology; the only counter-example was GSN (successor to the old HiPPI network; see the brief section above for more details). There is a lot of interest in developing tools to measure performance.
Special issues include: security, which must protect resources while still allowing access, and which requires good systems management as well as strong authentication (see the FNAL presentation above); and QoS, in particular differentiated services in addition to "best effort". QoS is becoming critical for some tools and applications; these applications need a reservation mechanism, which in turn requires authentication to ensure you are who you say you are and that you have the right to use the service.
The next CHEP will be in Beijing China during the 2nd week of October 2001. It will be sponsored by IHEP/Beijing, the Chinese Academy of Sciences and KEK/Japan. The best guess as to the location of the meeting in 2004 is that it will be in California.
The PPDG does not provide links; they come from ESnet, NTON, MREN etc. Particle physics is a network-hungry application. The proposal is a collaboration of ANL, LBNL, BNL, Caltech, FNAL, JLab, SLAC, SDSC, and Wisconsin, and includes physicists, computing/network infrastructure support people, and computer scientists. It was funded in FY99 by DoE/NGI money, which it was hoped would continue. PPDG got $1.2M (60% of what was asked for). For the out-years DoE/NGI is not being considered, but we are looking for and hopeful of future money.
The requirements scale as: 100 MB/s for 2-5 physicists (raw data); 1000 MB/s for 10-20 physicists (scheduled reconstruction); 2000 MB/s for 200 physicists (scheduled production analysis); 4000 MB/s for ~300 physicists (chaotic individual analysis). The need is to access data across the WAN, enabling data analysis from hundreds of US universities. They will use QoS (not investing much effort in the 1st year, but planning to take advantage of the services as they become available) and distributed caching, and want robustness. First year deliverables include a high speed site-to-site replication service at 100 MB/s and a multi-site cached file access service (based on deployment of file cataloging, transparent cache management, and data movement middleware); the first year will have optimized cached read access to files in the range of 1-10 GBytes from a data set of order a PB, using middleware components already developed by the proponents. The first year deliverables will probably not be very useful to others.
We will need to play with windows and parallelism in order to get throughput, especially on the WAN. Part of the goal is to allow high speed bulk transport to occur while not preventing other work. We are working with ESnet and there is a research test bed. A typical site has several TB of disk space and hundreds of computers in a farm. Existing components include Objy/OOFS, SAM, the GC cache manager, HPSS, Enstore, SRB, Globus, Condor etc. They need to be integrated, which partially means defining and providing APIs.
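Since a single window-limited TCP stream delivers at most window/RTT, the parallelism needed for a target rate can be estimated; the window sizes and RTT below are purely illustrative, not PPDG's actual parameters:

```python
import math

def streams_needed(target_Bps, window_bytes, rtt_s):
    """Parallel TCP streams needed when each stream is window-limited
    (throughput per stream ~ window / RTT)."""
    per_stream_Bps = window_bytes / rtt_s
    return math.ceil(target_Bps / per_stream_Bps)

# e.g. 100 MB/s across a US WAN with 64 kB windows and a 60 ms RTT:
n = streams_needed(100e6, 65536, 0.060)          # ~92 streams
# versus the same target with 1 MB windows:
n_big = streams_needed(100e6, 1_048_576, 0.060)  # ~6 streams
```

This is why both knobs matter: bigger windows reduce the number of streams needed, while parallelism works around windows that can't be made large enough.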
The project started in August 1999. There have been demonstrations & tests of middleware; we have a glass path from Caltech to SLAC and are working on high speed transfer; SRB is in use by 3 sites (SDSC, Wisconsin, LBNL). Coordination is very important, especially as viewed by the funding agencies, so management is a very important early effort.
Existing achievements include 57 MB/s from SLAC to LBNL; Objy works with hundreds of workstations. We hope for 100 MB/s via NTON in California (SLAC, LBNL, Caltech, LLNL), and will use OC12 and OC3 over T3 as available. We need a bulk transfer service; latency is important, as is co-existence with other users. There are technical challenges, but a major challenge is political, e.g. how to get continued sources of funding and how to make the collaborations work (collaboration management).
The resources are data generation (detector), a tier 1 center (1TIPS (25KSI95)), and tier 2, 3 & 4 centers. The real problem is to provide access to all the data for hundreds of users. Part of the problem is to understand the scope of the problem and to be able to explain and justify it to others.
Grid structure includes HEP applications at the top; application toolkits (remote data toolkit, visualization, collaboration, sensors, comp); gridware (protocols, authentication, security, matching requests to resources, resource discovery, pre-fetch query estimation, forward prediction, prioritization, policy, caching, instrumentation, ...); all built on computers, networks etc. This requires a collaboration of HENP people and computer scientists. The OO software/data has to be integrated with the Grid middleware and be invisible to the users. Politically this needs inter-facility cooperation at a new level across world regions: agreement on the choice and implementation of standard grid components, services, security & authentication; interfacing the common services logically to match heterogeneous resources, performance levels and local generational requirements; and accounting and exchange of resources.
The solutions could be widely applicable to data problems in other scientific fields and industry by the time of LHC startup (2005).
This is one of the Computer Science components of PPDG. One question is how to build communities into effective organizations. There will probably be many grids and we should plan that way from the start, e.g. how do we access multiple grids from a single machine, which may itself also be a component of a grid. The main capabilities of Condor are the management of large collections of distributively owned heterogeneous resources, management of large collections (10K) of jobs, remote execution, remote I/O, finding out what happened to a job, check-pointing (i.e. freeing up computation when resources are not available and restarting later - e.g. if a user wants to get good interaction on their machine; part of allowing good co-existence), matchmaking, and system administration (upgrades, new release roll out).
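Condor's matchmaking pairs a job's requirements against resource descriptions ("ClassAds"). A toy sketch of the idea in Python - this is not Condor's actual ClassAd language, and the attribute names and values here are invented for illustration:

```python
# Toy matchmaking sketch: machines advertise attributes, a job carries a
# requirements predicate, and the matchmaker pairs the job with the first
# idle machine that satisfies it. All names and values are invented examples.

machines = [
    {"name": "node1", "memory_mb": 64,  "arch": "intel", "idle": True},
    {"name": "node2", "memory_mb": 256, "arch": "intel", "idle": False},
    {"name": "node3", "memory_mb": 256, "arch": "intel", "idle": True},
]

job = {
    "owner": "physicist",
    # A real ClassAd expresses this as an expression; a lambda stands in here.
    "requirements": lambda m: m["arch"] == "intel" and m["memory_mb"] >= 128,
}

def matchmake(job, machines):
    """Return the name of the first idle machine meeting the job's requirements."""
    for m in machines:
        if m["idle"] and job["requirements"](m):
            return m["name"]
    return None  # no match: the job stays queued

print(matchmake(job, machines))  # node3
```

The real system is symmetric - machines also express requirements and preferences about the jobs they will accept, which is how owners retain control of their desktops.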
Over 4000 machines are managed by Condor today: more than 1200 at UW (most often desktops, not locked in a computer center), more than 200 at INFN, and more than 800 in industry (e.g. Oracle does its daily build on a Condor cluster).
How does one get Condor to work on your problem, which consists of 600 jobs that would run in sequence for a couple of months? The first step is to get organized: turn your workstation into a personal Condor and write a script that creates an input file for each of the 600 calculations. Condor will keep an eye on the jobs, keep you posted on their progress, and implement your policy on when jobs run. Step 2 is to extend to other parts of the Condor grid, i.e. take advantage of your community friends. First you need the permission of your friends to use their pools of computers (referred to as flocking). Then you need to get access (accounts) + certificates for Globus managed Grid resources and submit 599 jobs "to Globus" as glide-in Condor jobs to your personal Condor; when all jobs are done, remove any pending jobs - and you may be done in an afternoon. The "to-Globus" glide-in job will transform itself into a Globus job, submit itself to a Globus managed Grid resource, and be monitored by your personal Condor; once the Globus job is allocated a resource, it will use a GSIFTP server to fetch the Condor agents, start them, & add the resource to your personal Condor, then vacate the resource before it is revoked by the scheduler.
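For step 1, the 600 jobs would be described to Condor in a submit description file. A sketch of what it might look like - the executable and file names are hypothetical; only the $(Process) macro and the `queue` statement are standard Condor submit syntax:

```
# Hypothetical submit description file for the 600-calculation run.
universe   = vanilla
executable = calc            # the user's analysis program (hypothetical name)
input      = in.$(Process)   # in.0 ... in.599, created by the setup script
output     = out.$(Process)
error      = err.$(Process)
log        = calc.log        # Condor keeps you posted on progress here
queue 600                    # one job per input file
```

A single `condor_submit` of this file queues all 600 jobs; Condor then runs them as machines become available and records their life cycle in the log file.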
The obstacles to Grids are: ownership distribution (sociology; this varies from community to community; from a technology viewpoint one needs to provide tools for the local administrator to retain control, including shutting things down; sociology is probably the most difficult obstacle to address), customer awareness (education; need to set the expectation that scientists can do excellent research on KMart type computers), size & uncertainties (robustness; dealing with thousands of computers, with no single reboot button to get things back into a clean state), technology evolution (portability; for the LHC for example, how is one able to move with technology over a period of 10-15 years where there are major changes - will WNT or Linux still be around etc.), and physical distribution (technology).
Looking at how to evaluate and integrate a collection of services into a high performance grid. Requirements include a lot of critical legacy code.
Project funded by the German Ministry to develop a prototype for seamless, intuitive and secure access to computing resources. The users are the German research centers and universities with high performance centers. Implementation is by Pallas & Genias, with partners/founders fecit & ESMWWF, plus various computer vendors as affiliates. The centers have different configurations of hardware, procedures etc., and it is hard to get things to run at multiple sites. They therefore wanted seamless access to the computing resources with an intuitive interface for batch submission, with the same look and feel independent of the target system and configuration. They have to map abstract UNICORE specs to specific functions and map UNICORE certificates to local ones. There are 3 tiers: a user interface for job preparation and management, a site security gateway, and a network job supervisor for job control.
Will cover some of the details of the middleware services and the decisions made; in particular will review security, info directory, resource discovery... The environment is heterogeneous in terms of groups, resources, interfaces, distance etc. They plan to deploy standard grid services that encapsulate heterogeneity (simple, non-coercive and uniform), providing resource discovery and application configuration and optimization.
Security requires the definition of uniform authentication and authorization mechanisms that allow cooperating sites to accept credentials while retaining local control. The benefit is that only one A/A infrastructure needs to be maintained at each site; this enables intersite resource sharing and interoperability. It requires A/A standards plus a certification authority and policies. They have single sign-on via global credentials (PKI), mapping to local credentials; they support delegation, have no plaintext passwords, and retain local control over policy. Deployed across PACI & NASA sites. The GSS-API bindings are used by ssh, SecureCRT, gsiftp, Globus, Condor and others. The GAA (Generic Authorization & Access controls) interface provides hooks for policy.
The Grid information services allow: effective resource use predicated on knowledge of system components - published structure and state info, dynamic performance info, software info etc.; and selection and scheduling of resources (find me an X with a Y at time T); gateways to other data sources are required. The infrastructure is based on common protocols (e.g. LDAP). Research questions include: unifying metadata representation, and how to support a range of access models.
Grid service management issues include: locating & selecting resources; allocating resources; advance reservations. The Globus Resource Allocation Manager (GRAM) provides a uniform interface to resource management and integration with security policy; co-allocation of services provides coordinated allocation across multiple resources; resource brokers, e.g. Condor.
Grid services also provide access to data via high speed file transfer, plus tools for the management of replicas of large data sets.
The current technology focus areas are: high end data intensive applications, interfaces to commodity technologies, and distance visualization.
The Grid Forum is an IETF-like community to discuss & define Grid infrastructure (http://www.gridforum.org/). There have been two meetings: June 16-18 & a 2nd meeting in Oct. There is now a European Grid Forum. There is a Beta-Grid proposal to NSF to plan & build a national infrastructure of computer systems dedicated to research, of a scale that permits reasonable experimentation, encouraging participation by adventurous application groups. The initial plan is for O(20) Linux clusters, a few hundred cpus/cluster, 2TB per site.
Today there is a solid technology base for security, resource management and information services. Globus 1.1 is completed with all core services complete, robust & documented; there are substantial deployment activities and application experiments, and many tool projects are leveraging this considerable investment.
This is a snapshot of today, but it is moving fast, with a lot of work following CHEP2000. The background is EU-NSF discussions on transatlantic collaboration on IT subjects: there was an EU-US workshop on large scientific databases & archives in the US last September; a meeting between the EU & HEPCCC last September; a project proposal initiated & led by HEPCCC; a kick-off meeting at CERN on Jan 11, 2000; and encouragement by the EU to submit a proposal by May 10th.
The organization consists of 2 task forces (see http://nicewww.cern.ch/~les/grid/welcome.html): a technical task force coordinated by Les Robertson and a proposal task force coordinated by Fabrizio, with participation by institutional representatives. The mission has a technical and a political side. Participants: CERN, Hungary, FAE (Spain), IN2P3, INFN, RAL; interested: NIKHEF, DESY, GSI & other German institutions. There are links to MONARC, the European Research Network, LHC computing management and CERN computing management; potentially Sweden (medical) may join. The project outline was reviewed at CHEP2000; comments/corrections are due by tomorrow (Feb 13, 2000) midnight.
Industrial participation: there have been preliminary contacts with GRIDware, Siemens and StorageTek. National institutions are to propose industrial candidates by mid March, for equipment provider and middleware developer roles.
HEP is pioneering due to its immediate needs. Connections with other sciences are being established: biology, life science, earth observation, and meteorology; they attended some of the workshops.
The project outline focuses on: management of large amounts of data; high throughput computing; automated management of both local computing fabrics and wide area GRID. Other strong foci are middleware development and test bed demonstrations.
The work program includes R&D required on adaptability; scalability & wide-area distribution of resources. Tentative packages include: computing fabric management; mass store management; wide area data management; work-load management; wide area application monitoring; application development adaptation.
The resources: the national/regional parts of the GRID are funded within countries; high-performance bandwidth across sites is to be provided by other institutions (GEANT); EU financial support covers development of middleware, overall integration and operation of testbeds, and support for exchange of staff and dissemination of information (workshops, conferences etc.).
The next actions are to get feedback on the outline. The project technical program is to be defined at an ITF workshop at CERN on March 7; the consortium and program of work need to be defined by March 25; EU proposals are to be finalized by end of April; approval/negotiation proceeds for the rest of this year. Possible project start early 2001; foreseen project duration till 2003-4. Second phase 2004-6??
The main issues are to define a valid & credible technical program of work in a short time. US participation is an issue, as are HEP & other sciences' priorities, HEP & industry priorities and work schedule, relations to other similar initiatives in the EU and US, project management (many partners, including industry, education, labs, countries, many languages etc.) and politics. Thinking of 20 FTEs (at 70Eu/year/FTE). Given the short time frame for the proposal, the US participation will come later.
History: the Cashmore panel in 1Q99 recognized that substantial resources would be needed for computing in 2005, expected to be based on a GRID model. In 2Q99 the UK government brought forward funding and HEP recognized the opportunity. Will have a 4Mbps link to SLAC ($1.2M), a big SMP at RAL (E6500, $0.3M) and six Linux farms at Birmingham, Bristol, Manchester ...
Pre-GRID activities include a BaBar bid to the Joint Research Equipment Initiative fund: got $800K ($650K funded kit worth $2M). A CDF request got 10TB disk and 4*SMP workstations; D0 is also putting in a bid.
LHC requested a prototype Tier 1 center to support all 4 experiments (100 physicists in 4 working groups): 14400 PC99 equivalents, 296 TB disk, 2.0PB tape; 10 staff at RAL and 10 at universities. Notification is due in Autumn 2000.
Want scientists to define the need and computer scientists to assist in the implementation; this is called an e-science grid. There was a meeting in Jan which scientists from many disciplines attended. PP was well prepared and recommended to go ahead, bringing on other disciplines later (next is bioscience). Paul Jeffries is to coordinate PP + others with the infrastructure communities (computer science and industry), with a meeting on 13 March 2000 which will include bio-science.
Want to define the actions: participate in testbed activities (EU grid), investigate QoS ...
A CLRC GRID team has been formed, within PP, squeezed out of existing activities. People will be hired; eventually they hope for 10+10 people. Need to commission the 4Mbps link to Chicago.
The UK has had an active interest in networking for many years; UKERNA uses IEPM PingER. There are a number of interesting network developments and tests, and hopefully a 50Mbps connection to CERN.
Exciting times in the UK: there is real money, support in high places for GRID development, and particle physics is encouraged to blaze the trail.
Study & develop a general INFN computing infrastructure based on GRID techniques with Regional Center prototypes. The proposal was submitted Jan 13, 2000 with a 3 year duration; the next meeting with INFN management is Feb 18. They have Globus tests and Condor on the WAN as a general purpose computing resource. A GRID WG is to analyze a viable way to proceed. Globus is being evaluated at Bologna, CNAF, LNL, Padova & Roma1, installed on 5 Linux PCs at 3 sites. GSI works. There are problems with timeouts accessing MDS data. Looking at other tools. Will test/implement interfaces to Condor and LSF; want a smart browser to implement a smart Global Resource Manager.
Condor is a large INFN project with ~20 sites, working with the Condor team at UWisc. A goal is Condor tuning on the WAN. A 2nd goal is to treat the network as a Condor resource (e.g. distributed dynamic check-pointing that itself uses minimal network traffic). There is a single INFN Condor pool; they also have sub-pools. There is central and local management for Condor and a steering committee. Maintenance comes from UWisc. They want application monitoring & management, will need to instrument systems with timing information etc., and also want resource usage accounting.
|Federico Ruggieri||CNAF/INFN||Collaborating on QoS measurements with INFN|
|David Williams||CERN||WAN performance worldwide|
|Alberto Pace||CERN||Web redirection|
|Yukio Karita||KEK||Japanese networking|
|Paul Jeffries||RAL||UK networking|
|Harvey Newman||Caltech||Grids & throughput|
|Don Petravic||FNAL||Switched networks|
|Miron Livny||U. Wisconsin||Condor & Grids|
|Thursday, February 3||Leave Menlo Park|
|Friday, February 4||Arrive Venice|
|Saturday, February 5||Vacation/recover from trip|
|Sunday, February 6||Train Venice to Padova, attend Conference reception|
|Monday, February 7 - Friday February 11||Attend CHEP00 in Padova|
|Saturday February 12||Attend Grid Workshop in Padova|
|Sunday February 13||Weekend, vacation|
|Monday February 14||Travel to Venice|
|Tuesday February 15||Leave Venice, arrive Menlo Park|