2003 DoE SciDAC PI Meeting, March 22-24, Charleston, SC

Rough Notes by Les Cottrell

http://www.csm.ornl.gov/workshops/DOE_SciDAC/

Introduction - Alan Laub, SciDAC Director

This meeting was called by the DoE Office of Science (SC) to bring people together and discuss the status and future of the Scientific Discovery through Advanced Computing (SciDAC) initiative. There were about 140 attendees. SciDAC has been very successful and has a lot of cachet in Washington. Bringing together teams of scientists (with their applications) with mathematicians and computer/computational scientists (with the tools/techniques) has been a very successful idea. The involvement of the Labs is critical to provide the multi-discipline applications (universities typically tend to be very compartmentalized along departmental lines).

The Grid: Essential Infrastructure for DoE Science - Foster/Kesselman

Science today is a team sport (teams of people, institutions, resources). This is especially true for the DoE/SC program. There is a need to dynamically address multidisciplinary, multi-institution (each organization has its own, sometimes unique, policies and management), international, complex, resource-intensive activities. The Grid tries to take distributed resources and integrate them to enable sharing and collaboration so that science gets done. It uses general-purpose protocols and infrastructure to achieve better-than-best-effort service. The Grid is an analogy to the power grid, i.e. each home does not have its own power source but taps into a power grid in an easy/transparent fashion. DoE science needs a DoE Grid (it cannot just borrow a neighbor's Grid); DoE has unique expertise (a large fraction of the extant human capital resides in the DoE Labs).

Discussions

Constantinos Dovrolis - GATech: we discussed a possible NSF proposal for a Middleware initiative. The original idea was to have 3 thrusts:

With the mindshare enjoyed by GridFTP, we have to recognize its importance, so the idea is to use bbcp to quickly develop/try/evaluate ideas and then recommend integration of the most appropriate techniques into GridFTP/bbftp etc. At the same time, GridFTP is not heavily used in many areas of HENP. This is partly due to the community getting started with other tools such as bbftp and bbcp, early difficulties with deployment (it requires the Globus toolkit to be pre-installed at both ends), the security requirements (certificates) and instability concerns. Today, there is also a large community of potential users who will not install Globus and need high-performance file copy capabilities (Electronic Arts and ???, both of whom use bbcp today, and the Loci project). Bbcp has most, if not all, the capabilities of GridFTP or bbftp, such as secure transfers, optional compression, parallel flows, large windows, third-party copies, incremental throughput reporting and restart after failure. In addition it is peer-to-peer (as opposed to client-server), it can MD5 checksum the file (becoming more important for integrity as file sizes increase, and with certain types of possible network device bugs), the user can limit the throughput/bandwidth, it is QoS ready, it supports time-limited copying (i.e. kill the copy if it does not complete in time), and it can perform memory-to-memory, disk-to-memory, memory-to-disk or disk-to-disk copies.
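
The MD5 integrity check is worth sketching, since it matters more as file sizes grow and subtle corruption can slip past TCP's checksums. Below is a minimal Python illustration (not bbcp code; the file paths are hypothetical) of how a copy tool can checksum a large file in fixed-size chunks and compare the source and destination copies.

    # Illustrative sketch (not bbcp code): verify end-to-end file integrity with
    # an MD5 digest computed in fixed-size chunks, so a large file never has to
    # be held in memory. The file paths below are hypothetical.
    import hashlib

    def md5_of_file(path, chunk_size=8 * 1024 * 1024):
        """Return the hex MD5 digest of a file, read in 8 MB chunks."""
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    if __name__ == "__main__":
        # Compare source and destination copies; a mismatch signals corruption
        # in transit (e.g. from the network device bugs mentioned above).
        src, dst = "/data/source/run0123.root", "/data/dest/run0123.root"
        if md5_of_file(src) != md5_of_file(dst):
            raise SystemExit("checksum mismatch: transfer corrupted")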

We could do with something at a bit higher level than file copy, which could be regarded as infrastructure rather than middleware. Some examples of classic middleware might be: portals, data federations (PPDG?), the Storage Resource Manager (Ari Shoshani of LBNL), data replication, and resource brokers (e.g. Reagan Moore of SDSC). We will work with the PPDG and HENP experiments such as BaBar to understand needs, and work on early deployment for real production needs (e.g. between the BaBar tier A sites (SLAC, IN2P3, Padova, FZK, RAL), NIKHEF and CERN).

For deployment in production environments it is critical to understand one's audience. For example in BaBar, the experiment site's (SLAC) data mover servers are exclusively Solaris, which today excludes the advanced TCP stacks that only run on Linux. Even assuming Linux is acceptable, the frequent (and often sudden) kernel updates required to meet security demands, together with the lack of integration of the advanced TCP stacks with the distributed kernel, put demands on the advanced TCP implementers to update their implementations for the latest Linux version. Often this results in a lag that can be unacceptable in a secure production environment. Thus, today, there is a critical need for non-kernel-space (i.e. user-space) transport implementations such as UDT and RBUDP.
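
To make the user-space point concrete, here is a toy Python sketch (it is not UDT or RBUDP, reliability and acknowledgements are omitted, and the host, port and rate are made-up placeholders): the sending rate is controlled entirely in the application over an ordinary UDP socket, so it keeps working across kernel updates that would break a patched TCP stack.

    # Toy sketch (not UDT or RBUDP): a user-space sender that paces UDP
    # datagrams to a target rate entirely in the application, so no kernel
    # TCP-stack changes are needed. Host, port and rate are hypothetical;
    # reliability (acks, retransmission) is omitted for brevity.
    import socket
    import time

    def paced_send(data, host="127.0.0.1", port=9000, rate_mbps=100.0, mtu=1400):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        interval = (mtu * 8) / (rate_mbps * 1e6)   # seconds between datagrams
        next_send = time.monotonic()
        for offset in range(0, len(data), mtu):
            # Waiting in the application keeps pacing in user space.
            while time.monotonic() < next_send:
                pass
            sock.sendto(data[offset:offset + mtu], (host, port))
            next_send += interval
        sock.close()

    if __name__ == "__main__":
        paced_send(b"x" * (10 * 1400))  # ten datagrams paced to ~100 Mbps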

Constantinos will get the latest version of SOBAS to SLAC by early next week. He will also approach Warren Matthews (also at GATech) to ascertain his interest. We may need to get a letter of support from Bill Allcock of ANL/GridFTP, since we want to be realistic about deploying applicable mechanisms in GridFTP. Les will write up the notes of our discussions. Constantinos will start to put together a draft proposal. We will probably need weekly phone meetings to get/keep things moving. The proposal is due in May, but we should plan to complete it by mid-April given travel schedules etc.

We need to recognize what it takes to go from, say, student-developed code that shows proof of principle to a robust, documented, deployable toolkit. Need end-to-end robustness.

Network Research Projects - Micah Beck

Covered all the network research projects of SciDAC. Goals: develop tools to support data-intensive SciDAC applications, including advanced network tools to enable SciDAC apps to efficiently measure, predict and diagnose E2E performance, and develop & deploy cyber security tools to support group collaborations. Phase II is to deploy the tools in production infrastructures to support data-intensive SciDAC apps.

Micah's project includes NWS, Logistical Networking (IBP) and the logistical runtime system. IBP overlay intermediate nodes provide storage resources and transfers with persistent sockets, optional authentication and usage logging. Transports include SABUL and TCP, with compression. Data transfer rates of 1-400 Mbps. Could have an impact on DUSN (DoE UltraScienceNet).

INCITE is for E2E monitoring/measurement, with several approaches to understanding the middle of the path by looking from the edges. Three thrusts: multiscale, multi-fractal and wavelet analysis; E2E path probing and modeling, topology discovery, and advanced high-speed protocols; data collection & monitoring. E2E bandwidth estimation by pathChirp (low intrusiveness). TCP-LP takes idle bandwidth (i.e. QoS without router involvement).
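
The chirp idea is easy to sketch. The Python below is a greatly simplified illustration (it is not the real pathChirp estimator, and the probe sizes, gaps and delays are synthetic): packets in a chirp are sent with exponentially shrinking gaps, and the available bandwidth is read off near the probing rate at which one-way delays start a sustained, self-induced rise.

    # Greatly simplified illustration of the chirp idea behind tools like
    # pathChirp (not the actual algorithm). Packets in a "chirp" are sent with
    # exponentially shrinking gaps, so the instantaneous probing rate rises
    # through the train; available bandwidth is estimated near the rate at
    # which one-way delays begin a sustained increase (self-induced queuing).
    # All inputs below are synthetic.

    def chirp_rates(pkt_bits, gaps):
        """Instantaneous probing rate (bps) implied by each inter-packet gap."""
        return [pkt_bits / g for g in gaps]

    def estimate_avail_bw(pkt_bits, gaps, delays, run=3):
        """Return the probing rate at which delays begin `run` straight rises."""
        rates = chirp_rates(pkt_bits, gaps)
        rising = 0
        for i in range(1, len(delays)):
            rising = rising + 1 if delays[i] > delays[i - 1] else 0
            if rising >= run:
                return rates[i - run]   # last rate before the sustained rise
        return rates[-1]                # no queuing seen: at least the top rate

    if __name__ == "__main__":
        pkt_bits = 1000 * 8                            # 1000-byte probes
        gaps = [80e-6 * 0.8 ** k for k in range(12)]   # shrinking inter-packet gaps
        delays = [1.0e-3] * 6 + [1.1e-3, 1.3e-3, 1.6e-3, 2.0e-3, 2.5e-3, 3.1e-3]
        bw = estimate_avail_bw(pkt_bits, gaps, delays)
        print("estimated available bandwidth ~ %.0f Mbps" % (bw / 1e6))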

Pathload/pathrate were developed by CAIDA/GATech. CAIDA is evaluating tools; bandwidth estimation is not a solved problem.

Security & policy for group collaboration addresses the need for scalable, fine-grained policy management for large dynamic collaborations. Have to model organizations and learn how each one does its policy/security, and create tools to enable integration (e.g. some sites use Kerberos, others X.509, etc.).

Other discussions

Nagi Rao of ORNL has a UDP-based transport called RUNAT that uses a stochastic approach based on control theory. He will get us a copy to evaluate.

Thomas Ndousse does not have much money for the base program networking proposals. He needs proposals that are shown to enable science (e.g. scientists who will speak up on behalf of the proposal). It would help existing proposals to be able to indicate some success in this. This would need to happen before the decisions on the existing proposals are made (in June).

Rich Baraniuk (Rice) says their new PathChirp will enable finding out where the bottleneck is. This would be useful following identification of a problem, to assist in diagnosing it. He acknowledges Jiri's email describing the need for an easy-to-use PathChirp and hopes to have it ready for us in a month.

Micah Beck of UTK leads the Logistical Computing and Internetworking project (see http://loci.cs.utk.edu/). Basically it has storage depots at various sites with high-speed network access. The idea is that an application can be modified so that instead of writing to a disk it writes the file(s), or parts of a file, to one or more depots. This is referred to as uploading. As part of this, metadata describing the file (where the parts are, the handle to get at them) is created as an xnode in XML. The xnode can be put into a directory or emailed or whatever (it is just an XML file). The directory can be local or at a managed site. Xnodes have a lease associated with them that is set by the creator but can be extended by the receiver. This simple mechanism enables breaking a file into parts that may be all at one site, at multiple sites, and/or replicated. This break-up in turn enables parallel transfers from multiple sites with recovery if a site goes down (assuming there is an alternate site with a copy of the missing part). The xnode information can be private or made public. The user does not need to have an account to access the data once they have the xnode information (or of course if the file is public), since it contains the necessary private information to access the file. With public access it can serve a similar purpose to anonymous FTP. Currently for transport/copying they can select TCP or RBUDP. They also support dynamic auto compression (i.e. deciding during transmission whether the achievable bandwidth/CPU power makes compression a good strategy for the moment). They have Java clients and one can download the code (go to http://promise.singrg.cs.utk.edu/lodn/). We talked about our bbcp/SOBAS proposal and, as part of it, aiming at providing an alternative transport layer for LOCI based on bbcp. Micah is very supportive of this since it does not need Globus etc. Micah is submitting a LOCI proposal to the NSF Middleware Initiative. We agreed to write supporting letters for each other.
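
To make the xnode idea concrete, here is a schematic Python sketch (this is NOT the real LoCI xnode schema; the element names, attributes, depot URLs and file names are invented for illustration) of the kind of metadata such a file might carry: a file split into extents, each mapped to one or more depots, with a lease expiry.

    # Schematic sketch of the kind of metadata an xnode carries. This is NOT
    # the real LoCI xnode schema; element names, attributes, depot URLs and
    # file names are invented for illustration. A file is split into extents,
    # each mapped to one or more depots (replicas allow recovery if a depot
    # goes down), with a lease expiry that the receiver may extend.
    import xml.etree.ElementTree as ET

    def make_xnode(filename, size, lease_expires, extents):
        """extents: list of (offset, length, [depot URLs]) tuples."""
        root = ET.Element("xnode", attrib={"file": filename, "size": str(size),
                                           "lease-expires": lease_expires})
        for offset, length, depots in extents:
            ext = ET.SubElement(root, "extent", attrib={"offset": str(offset),
                                                        "length": str(length)})
            for url in depots:
                ET.SubElement(ext, "depot", attrib={"url": url})
        return ET.tostring(root, encoding="unicode")

    if __name__ == "__main__":
        print(make_xnode("run0123.data", 2000000, "2003-06-01",
                         [(0, 1000000, ["ibp://depotA.example.org:6714"]),
                          (1000000, 1000000, ["ibp://depotB.example.org:6714",
                                              "ibp://depotC.example.org:6714"])]))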