DOE-2000 Retreat Jan 7 thru Jan 9 1998, Reston VA.
Hallway Discussions
DOE Non-weapons related High Performance Computing
Support for Tools beyond the Pilot Stage
Meeting Goals - Mary Anne Scott
Collaboratory R&D - Stu Loken
Collaboration Management - Debbie Payne, PNNL
Collaborative Interoperability Framework - Deb Agarwal, LBNL
Electronic Notebook - Al Geist, ORNL
Quality of Service - Van Jacobson, LBNL
Floor Control - Van Jacobson, LBNL
Security Infrastructure - Bill Johnston, LBNL
Shared Virtual Reality - Rick Stevens, ANL
Diesel Collaboratory - Christine Yang, SNL
The Materials Microcharacterization Collaboratory (MMC) Pilot Project - Michael Wright
Overview of ACTS Toolkit Projects
Introduction - Jim McGraw
Particles Simulation Toolkit (PST) - John Reynders, LANL
PETSc/Optim - Barry Smith, ANL
Aztec/Optimization - Juan Meza
LLNL Progress and Plans - Steve Ashby
More Numerics - Jim Demmel, ORNL
QMR Algorithm - Noel Nachtigal, ORNL
InDEPS Tools - Mike Koszykowski, SNL
Global Arrays - Jarek Nieplocha, PNNL
OO InterOperability - Dennis Gannon, Indiana University
Cumulvs - Al Geist, ORNL
Tau - Al Malony, University of Oregon
Evaluation/Test/PR - Bill Saphir, LBNL
Collaboratory Tools (CT) Parallel Working Group (PWG)
Summary of Collaboratory Framework & Management Parallel WG
Collaboratory Short Term Parallel WG
Futures (Long Term Architecture) Parallel WG
Security - Bill Johnston, LBL
ACTS WG Summary
Collaboratory Sessions Summary
There were about 80 attendees mainly from DOE and the large Labs including PNNL, ORNL, Sandia, ANL, LBNL, LLNL, LANL, ESnet & NERSC. There was nobody from FNAL or CEBAF. I was representing SLAC. There were also attendees from PSC, Indiana U, UIUC and the University of Oregon.
Roughly 50% of the attendees had laptops and used them during the meeting. Stu Loken had a Ricochet modem and was connected to the Internet during much of the time. There was no direct Internet hookup provided by the meeting organizers. I read my email using the data port on the phone in my room and dialing in to SLAC.
For me the most interesting/important subjects were:
Meeting goals and funding (see Meeting Goals - Mary Anne Scott); the ideas and progress on electronic notebooks; Van Jacobson's work on Quality of Service (see Quality of Service - Van Jacobson, LBNL) and floor control (see Floor Control - Van Jacobson, LBNL); the large interest in security (see Security Infrastructure - Bill Johnston, LBNL, and Security - Bill Johnston, LBL); the difficulty of providing support for tools across multiple platforms (see Multi-Platform Support); the need for multiway communications between all the players (toolmakers, infrastructure providers, clients); how to move from the research phase of developing a tool (such as video conferencing) to packaging and deployment so that the tools are usable by non-developers (there is often a big gap between proof of concept and shrink-wrap software, which traditionally has not been well addressed; see Support for Tools beyond the Pilot Stage for more information); and the hallway discussions (see Hallway Discussions).
There was interest in follow on face to face meetings at yearly intervals.
LBL/SLAC Network Research Projects
I talked to Van Jacobson. He was sorry to have missed the meetings between SLAC & LBL on possible network research projects, but in both cases had conflicts with other calls on his time. He is very enthusiastic about collaborating with SLAC & ESnet to experiment with QoS. It looks like the best next step would be to set up a meeting with Van at SLAC. He lives in Woodside and spends Tuesdays and Thursdays at home, so these days would be good for a visit. I will try to set up a meeting with Van, Mount, Millsom and Wendling in the near future.
I talked to Bill Johnston of LBL and updated him with the information that SLAC (as of Jan 6th) has a working NTON OC12 (622 Mbps) link to LBL. He has also heard from Bill Lennon about the problem with Pac Bell (who provide the NTON fiber to SLAC) pulling out of NTON. According to Bill, the LBL NTON fiber is from Pac Bell. They do have fibers from Sprint, but he said it would be a major effort for them to switch. Bill Johnston agrees that we (SLAC & LBL) need to get the demonstration application working ASAP while we still have the link in place. Bill said we (SLAC) need to work with Brian Tierney at LBL to get the client working at SLAC. Bill also pointed out that his group had been split up and he is now more involved with the security aspects and Brian Tierney is leading the high-speed network applications work.
SLAC/DOE IEPM Project
I asked George Seweryniak of DOE about getting ongoing funding for another year for the Internet End-to-end Performance Measurement (IEPM) project. He said that the budget request for FY99 has already gone to Dan Hitchcock. He also pointed out that DOE will be funding the Pittsburgh Supercomputing Center (PSC) next year to the tune of $10M, for which no new money has been made available, so the FY99 budget will be lean. Given this, any proposal we make will need to be very strong. George is very impressed with the progress so far and glad we have made it available on the Web. He also encouraged me to make a 30-minute presentation at the ESSC to be held in Washington later this month. I sent email to the ESCC chair to investigate getting onto the agenda.
DOE Non-weapons related High Performance Computing
The new under-secretary of energy (Ernie Moniz, ex MIT) believes that the DOE non-weapons Labs need to take an approach to supercomputing parallel to the $500M/yr Accelerated Strategic Computing Initiative (ASCI) at the weapons Labs. Moniz asked for a report early this year on the Labs' computing & simulation needs and how to meet them. As a result, there is a series of about 7 workshops in DOE, each addressing a particular area where high performance computing is an issue. One, organized by HEP, is to look at QCD. Of more interest to BaBar are one on Data Intensive Computing and Visualization (28/29 Jan in Washington), and a second on Modeling and Simulation (organized by Fusion Energy and to be held at Princeton). The person who is supposed to be ensuring the right representatives from HEP are at the above workshops is Dennis Kovar of the DOE Nuclear Physics office. Stu Loken will be sending out email with more details.
Support for Tools beyond the Pilot Stage
To help bridge the gap between research pilots showing proof of concept and usable production tools, NERSC is setting up a group to provide consulting and support for some of the ACTS tools (mainly involved in numerical analysis and the surrounding infrastructure). In addition Chuck MacParland of LBL is spending most of his time looking into how and what to provide in the way of support for the collaboratory tools. These efforts will be funded as part of the DOE-2000 effort.
Meeting Goals - Mary Anne Scott
Two efforts are in progress: one for Advanced Computational Testing and Simulation (ACTS), in particular for stockpile stewardship; the second for Collaboratories (collaboratory technology research and development). They want to change the way DOE accomplishes its mission, and improve the science done by DOE.
In FY97 DOE spent $8.5M, FY98 and FY99 have budgeted $11M for DOE-2000.
Collaboratory R&D - Stu Loken
Workplan developed in a series of workshops. Have funded 7 projects, on average with 2 people each, and each involving several Labs. Includes VR, software infrastructure. Have also brought in potential users of the tools, there were 17 proposals to use the tools. Two pilots were selected: diesel combustion, and materials.
Collaboration Management - Debbie Payne, PNNL
This is a joint PNNL, ANL project. It is developing a manager for real time tools such as experiment control and video conferencing. Have applied to NMR research between LBNL & PNNL. They are using Habanero from NCSA. This is a Java 1.02 application. I believe it provides scheduling, white board, shared screens etc. PNNL have added in support for Mbone, CuSeeMe and shared screens. They also have white board, multi white boards, chat, shared text editor, shared graph viewer, voting tool and a molecule modeler. Currently it runs on Windows 95. Future work includes support for Java 1.1, multi platform support including NT, Macs and AIX, adding SSL security, providing floor control, providing an Mbone/CuSeeMe bridge, LDAP, CORBA, Web tour (allows sharing of a Web session across multiple screens). They have 140 people registered for interest in the tools.
There is also something called ViNE (a Virtual Notebook Environment) which is an electronic notebook with a Web interface. I am unclear whether or how it is related to the PNNL effort. More information can be found at http://www.csi.uoregon.edu/nacse/vine/
Collaborative Interoperability Framework - Deb Agarwal, LBNL
This involves LBNL, ANL, SNL, and PNNL. See http://www.mics.anl.gov/cif. The objectives are to facilitate development & interoperability of the collaboratory environment by providing convenient access to unicast & multicast messaging, common communications API, CORBA, and appropriate security. Want to make it easy for non-computer scientists, provide a consistent API, support multicast, CORBA, provide plug & play etc. Starting to use the tools for electronic notebooks in the diesel collaboration. They have Web available documentation. They will have a Java version that should work on NT.
Electronic Notebook - Al Geist, ORNL
See http://www.epm.ornl.gov/enote/ for more information.
Includes people from ORNL, PNNL, and LBNL. Persistent information includes Email, news, papers, mail, electronic notebooks. For remote access to an experiment one needs a shared notebook which is always available for input or reading, can contain rich media types, can take input directly from other computers, can share information across multiple notebooks, and allows querying/searching and hyperlinks to other sources. The goals are to develop a common open notebook architecture that is extensible, interoperable & customizable, and to develop prototype implementations. They have a client/server Web-based architecture with a defined notebook object, an editor API, support for multiple media types, and a data storage interface. It is in use by the DOE pilots plus many groups around the world (over 70 groups from DOE Labs, universities, industry (Intel & Hughes were mentioned), and other organizations) who are providing feedback. It is Java & Perl5 object based. Both private & shared electronic notebooks have been deployed. They have explored reliable multicast and database management tools. They have also integrated several input tools including graphics and images.
Quality of Service - Van Jacobson, LBNL
Want to offer premium service equivalent to a leased line. Can be set up in microseconds. The goal is to provide it as a standard ESnet service on the normal ESnet infrastructure (routers etc.). There was a demo (non-production) at SC97. The next step is to install a service between ANL & LBNL. Van emphasizes the goal of making it scalable, so they are avoiding basing it on calls, which would require every router on the path to hold state information for each call. Instead they identify the service at the boundaries. They use the IP precedence field, which is set in the leaf router. Then the router at the site boundary queues the packets and prioritizes the more important packets. For SC97 they used an ATM SVC using CBR (Constant Bit Rate, 10 Mbps) to do the policing.
They found that:
The next steps are to work with ANL networking & applications people to get the test net deployed inside ANL, and deploy an ESnet link between the sites.
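The marking-and-queueing scheme described in this section can be illustrated with a minimal Python sketch. It is not Van's implementation: the `Packet` class, the flow names, and the precedence values 5 and 0 are assumptions made for illustration, and the sketch models only strict priority queueing at the site boundary, not the ATM policing. The point it shows is that the boundary router needs no per-call state; it looks only at the precedence value the leaf router wrote into each packet.

```python
import heapq
from dataclasses import dataclass
from itertools import count

# Hypothetical precedence values; the talk did not say which values were used.
PREMIUM = 5
BEST_EFFORT = 0


@dataclass
class Packet:
    flow: str
    payload: str
    precedence: int = BEST_EFFORT


def mark_at_leaf(packet, premium_flows):
    """Leaf router: set the precedence field according to the packet's flow."""
    packet.precedence = PREMIUM if packet.flow in premium_flows else BEST_EFFORT
    return packet


class BoundaryQueue:
    """Site-boundary router: strict priority by precedence, FIFO within a class."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def enqueue(self, packet):
        # Negate precedence so the highest precedence leaves the heap first.
        heapq.heappush(self._heap, (-packet.precedence, next(self._seq), packet))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def forward_order(packets, premium_flows):
    """Mark every packet at the leaf, queue at the boundary, return send order."""
    queue = BoundaryQueue()
    for packet in packets:
        queue.enqueue(mark_at_leaf(packet, premium_flows))
    return [queue.dequeue().payload for _ in range(len(queue))]
```

For example, with the flow `"video"` designated premium, `forward_order` sends all queued video packets ahead of bulk packets that arrived earlier, while keeping arrival order within each class.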
Floor Control - Van Jacobson, LBNL
Van has written an agent to allow a floor controller to mute & unmute mikes. The first version had an appalling GUI. The second version was based more on local audio activity (e.g. you unmute your mike, which then asks the moderator to allow you to talk; when you are done you mute your mike). This requires a talk detection algorithm which unambiguously recognizes speech (as opposed to most algorithms, which are more oriented to not clipping speech), so the talker has to say something specific which will not be confused with background noise (e.g. "break break") to request to talk.
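The request/grant/release cycle of the second version can be sketched as a small state machine. This is only an illustration under stated assumptions: the `FloorController` class and its method names are hypothetical, the first-come-first-served ordering of pending requests is an assumption (the talk did not say how the moderator chooses), and the speech detection step is left out entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FloorController:
    """Hypothetical moderator-side state: one speaker at a time, others muted."""

    speaker: Optional[str] = None
    pending: List[str] = field(default_factory=list)

    def request(self, who: str) -> None:
        """A participant unmutes locally, which asks the moderator for the floor."""
        if who != self.speaker and who not in self.pending:
            self.pending.append(who)

    def grant_next(self) -> Optional[str]:
        """Moderator grants the floor to the oldest request if the floor is free."""
        if self.speaker is None and self.pending:
            self.speaker = self.pending.pop(0)
        return self.speaker

    def release(self, who: str) -> None:
        """The current speaker mutes their mike, which frees the floor."""
        if who == self.speaker:
            self.speaker = None

    def is_muted(self, who: str) -> bool:
        """Everyone except the current speaker stays muted."""
        return who != self.speaker
```

In use, two participants calling `request` in turn are served one at a time: the second stays muted until the first calls `release` and the moderator calls `grant_next` again.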
Security Infrastructure - Bill Johnston, LBNL
Model provides for controlling access to resources via restrictions imposed by several types of conditions defined by stakeholders. This is implemented by means of access groups that are defined implicitly by requiring a set of attributes. At the lower levels they are using SSL on an Apache Web server. Sitting in between is a policy engine that gathers the use conditions from the stakeholders and compares them to the attributes. The stakeholder conditions are kept by means of digitally signed certificates that are network accessible. Note that a given user may assume different identities (e.g. could be a UC employee, could be a collaborator on a particular experiment, could be a staff member of LBNL, could be a DOE contractor employee etc.).
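The comparison step performed by the policy engine can be sketched in a few lines of Python. This is a simplified illustration, not the LBNL design: the function names, the stakeholder names, and the attribute strings are hypothetical, and the sketch assumes that the signed certificates have already been fetched and verified, so that use conditions and user attributes are available as plain sets.

```python
from typing import Dict, Set


def unmet_conditions(
    user_attributes: Set[str],
    use_conditions: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per stakeholder, the required attributes the user lacks."""
    return {
        stakeholder: required - user_attributes
        for stakeholder, required in use_conditions.items()
        if not required <= user_attributes
    }


def is_authorized(
    user_attributes: Set[str],
    use_conditions: Dict[str, Set[str]],
) -> bool:
    """Grant access only if every stakeholder's conditions are satisfied."""
    return not unmet_conditions(user_attributes, use_conditions)
```

The access group is implicit here, as in the model described above: nobody maintains a membership list, and a user belongs to the group exactly when their attributes cover every stakeholder's requirements. The same person may pass for one resource and fail for another depending on which of their identities the conditions ask for.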
Shared Virtual Reality - Rick Stevens, ANL
They are investigating the integration of collaboration technology with immersive virtual environments extending to tele-immersion (i.e. over the network). They are exploring the notion of persistence in shared collaborative spaces. They have a "Main Worlds" prototype with avatars etc., a "Voyager" media server, and are extending CAVE VR tools to the desktop using the Mbone tools. They are looking for innovative applications efforts that want to try VR & tele-immersion, with groups that want to become part of a DOE tele-immersion testbed. They showed several applications, but it was instructive that the most memorable was a 3D game of Pong.
Diesel Collaboratory - Christine Yang, SNL
Includes LANL, LBNL, LLNL, SNL, Cummins, Caterpillar, Detroit Diesel and U Illinois. The proposed capabilities are for video conferencing one-on-one collaborations & group meetings using desktop workstations; archiving of experimental data and use of electronic notebooks. They wanted to build on DOE & other tools already available (including ESnet), and provide appropriate security.
The Materials Microcharacterization Collaboratory (MMC) Pilot Project - Michael Wright
Includes ANL, ORNL, LBL, NIST, UIUC, plus industry (Microscope vendors and Sun).
Goals are to extend and improve the electronic lab environment. They set up videoconferencing, get facilities online, determine what is available, do demos, look at security, routinely use e-notebooks, and disseminate MMC technology. User base is 45% Mac, 4% PC, 10% Unix. They use CuSeeMe but with phone for voice. Live Web video works well (streaming JPEG). They have beamline support (LabView) with a Web interface to look at and control things. They support remote manipulation of a TV camera to get a sense of presence in the control room. The browser uses Java applets.
Overview of ACTS Toolkit Projects
Introduction - Jim McGraw
Charter is to make existing tools interoperate between themselves and across platforms. There have been two rounds of funding. The work falls into: visualization and analysis tools; application support tools; numerics; code development (portable performance, multi language etc.); code execution tools (e.g. code analysis, computational steering).
Particles Simulation Toolkit (PST) - John Reynders, LANL
PETSc/Optim - Barry Smith, ANL
Looking at how to provide a component kit with APIs and implementations, in particular driven by the need to provide a cross ANL framework.
Aztec/Optimization - Juan Meza
LLNL Progress and Plans - Steve Ashby
A lot of the work is on something called pre-conditioners and integrating them into packages used in the ASCI and ACTS fields (e.g. PETSc). Many of the problems are sociological: they need to get buy-in and shared ownership and get past the NIH (Not Invented Here) syndrome.
More Numerics - Jim Demmel, ORNL
QMR Algorithm - Noel Nachtigal, ORNL
This project is to interface the Quasi-Minimal Residual (QMR) technique for solving non-Hermitian linear systems into PETSc and AZTEC.
InDEPS Tools - Mike Koszykowski, SNL
InDEPS is a component architecture. There is a lot of psychology. It is important for the scientist to retain ownership (they understand the code, how it is implemented, its range of application, and will need to maintain it); you can't force your language on the user; make the components stand-alone; they do not have to be encapsulated. InDEPS is object oriented and uses Java Beans.
Global Arrays - Jarek Nieplocha, PNNL
They are working on extending the ACTS toolkit with global arrays. Global arrays is a portable shared memory environment. Modeled on NUMA (not for interprocessor communication, rather for moving data between non-homogeneous architectures).
OO InterOperability - Dennis Gannon, Indiana University
Goal is to allow scientific applications built with the ACTS toolkit to interoperate with conventional commercial applications. It is built on a component architecture as a system for building applications from reusable distributed objects. The metaphor is building software integrated circuits.
Cumulvs - Al Geist, ORNL
Provides an environment for easy integration of interactive visualization, collaborative computational steering and fault tolerance. It supports multiple viewers, integrates with existing Viz/VR interfaces, provides for dynamic attachment & detachment and works with MPI & PVM applications. Cumulvs coordinates the consistent collection and dissemination of information between parallel tasks and multiple viewers.
Tau - Al Malony, University of Oregon
TAU (Tuning and Analysis Utilities) provides comprehensive dynamic performance analysis capabilities through SciTL software layers. It profiles threads, nodes etc.
Evaluation/Test/PR - Bill Saphir, LBNL
Their goal is to improve the usability and accessibility of the ACTS tool kit, and take it to a larger audience. So NERSC will add technical support, quality assurance & marketing to existing research & development efforts. They want to increase the ACTS use, find new ER users, make the tools more effective, create ACTS information & resource center, make the ACTS tools available on the NERSC platforms, and improve the quality of the tools, provide a buffer between users & developers (if desired).
The resource center will include "Consumer Reports", will be user (as opposed to developer) oriented (e.g. what tools should I use, which are ready for prime time), documentation, tutorials FAQ's etc., "one-stop shopping".
There will be tool consulting and support. They will interact (buffer) between users and developers in a flexible way (some developers have their own support infrastructure in place already and so may not need this).
Collaboratory Tools (CT) Parallel Working Group (PWG)
Goals of this PWG:
To recall, there are two pilots Diesel & Materials and there are 7 areas: electronic notebooks; collaboration management; video conferencing (floor control, QoS); collaboratory virtual reality; collaboratory interoperability framework (CIF); security architecture. The latter 2 underlie the first areas. CIF includes video, audio
The integration areas include security, communications, directory services, data objects, plug-ins (real time & notebook tools).
Sessions proposed for the afternoon
It was decided to split the afternoon into two 2 hour sessions, the first devoted to short term issues, the second to futures (e.g. DOE2005). Each of these sessions will consist of 2 parallel sessions. The two short term sessions proposed were: requirements & priorities; architecture/APIs.
Summary of Collaboratory Framework & Management Parallel WG
They want to define what needs to be held in the directory services (e.g. LDAP), to allow a collaboration to function. Such objects might include what microscopes are available and what tools are needed to access them.
They recognize the need to provide an architectural definition (e.g. APIs) to others so they can develop tools using the framework. Types of things that need defining included the security API, & how to do checkpointing.
Collaboratory Short Term Parallel WG
Issues: making things known, e.g. notification of new releases, how does one know about and get on the email list to get these notifications.
The following working groups or areas of interest were identified as being needed:
We had a vote on which platforms the tools should be made available for. Each person had 2 votes. The results were as follows:
Electronics notebook requirements
Futures (Long Term Architecture) Parallel WG
They started out defining the chapters needed in the architecture report.
Then they looked at the fundamental functional requirements for the (any) system:
Then they considered what makes the system a Collaboratory system:
They are having only a single working group, with multiple foci of interest, rather than multiple working groups, since there is a lot of cross-area interest (e.g. sharable entities require sharing techniques). They expect to meet in person at regular intervals and break up at the meeting to address the various issues.
Security - Bill Johnston, LBL
Starting point was that they (LBL DOE2000 security project) would implement the access control on a Web server and not modify the server. So far they have done it for Apache (replacing the access control module in the Apache server, which requires "only" a recompilation & link of the server), and expect it to be easily doable for Netscape servers since Netscape has similar access control hooks. Tools are becoming available to implement the Unix run time environment for NT. This is blessed by Microsoft (they want to remove the users' need for Unix) and there are various implementations (e.g. OpenNT). Given this, it was asserted that porting Apache to WinNT is as easy as or easier than porting to a new flavor of Unix. Though it would be nice to provide the access control for the Netscape server, it is not available yet from LBL and is not one of their top priorities for the first implementation. LBL is working with the diesel collaboration and they are using an Apache server on some Unix variant. LBL expects to have a prototype version running for a digital library in about a month. The user uses a Web browser (MSIE, Netscape etc.) with SSL support and the signing is done from an applet.
ACTS WG Summary
There are about 6 projects being funded under the ACTS DOE numerics project. There is a linear solver interface working group. The issues they are grappling with are: using packages written in different languages; don't change each other's sources - work with interfaces; ensuring general interoperability; the data layout; hardware heterogeneity and architecture (CLUMPS, distributed memory, shared memory, serial); direct vs. iterative methods; need for documentation standards; deciding what the motivating applications are. They will produce position papers from the projects covering: applications; optimization package's view of linear algebra; work by other forums; etc.
The software WG want to ensure continuity (persistence of software), encourage open development, enable software cooperation, bring HPC to a wider audience, and define a component architecture for ACTS, a problem solving environment similar to something like Maple. They want the entry fee to be low (i.e. manage the learning phase, can use a component without requiring the full package), each effort to have a customer, and the application of components to work across multiple disciplines.
Collaboratory Sessions Summary
There were 3 working groups: pilots, architecture and tools.
There was a joint session to define the next groups and to discuss the management structure.
Parallel sessions were set up to address short-term requirements and long term architecture.
The pilots group has several gripes including difficulty of selecting/setting up/using video conferencing, the number of different tools for electronic notebooks (there is no obvious winner, MMC & Diesel collaboratories use different tools), the security requirements which are critical for proprietary data and are based on trusted users.
The tools group is grappling with how to integrate the tools, the implications for security when you integrate different security in tools, data integration and standards, reservations for instruments & facilities and the use of certificates to accomplish this.
The framework group is looking at how to bring collaboration management into the framework (need for standards process, documentation, an IETF like model), how the tools are integrated into the environment, and what the overall long-term architecture is that it fits in.
The requirements are: electronic notebooks, a web-accessible file system, security, and Web-servers.
The short-term issues are e-notebooks (as a paradigm for pointing to data and files), global data archives, real-time (video, shared screens, whiteboard, session director), security, remote execution of applications, and resource management (e.g. scheduling data, instruments, compute cycles).
The following working groups were proposed: real-time synchronous activities, e-notebooks, security, distribution & support and network services (e.g. directory services). But these are just the projects, so rather than separate WGs, what is needed is better communication.
There was a review of the security architecture including the requirements (Netscape based and don't modify commercial software). The first implementation will be for the Diesel collaboratory (next week) with a production prototype in about a month.
The long-term architecture group came up with a model for describing the functional requirements (see above).
The discussions were very useful, more discussion is needed, people are eager to continue, there is a need to communicate more frequently internally and with potential new users and including the ACTS projects.
A big issue was multi-platform support. Broadly speaking the developers want to use Unix, the users want to use today's common Windows (95 & NT) desktops, and many of the instruments are VAX based. It is actually more complex in that the participants are from universities, labs, industries, have multiple funding sources and accounting mechanisms, different computer support infrastructures, different levels of immersion/interest in computing, have no common guidelines for platform directions etc. Supporting the multiple platforms is very costly and if attempted typically results in reduced capability for any given platform, and much longer, more costly, more complex development cycles. There was a sentiment for more aggressive support for moving the requirements to target a single platform (at least at the client end) and encouraging people up-front to energetically migrate to the preferred platform. For a given centrally managed site (maybe even across the DOE Labs) this may be possible (e.g. by advertising and providing "free" premium support for preferred platforms/services while charging back and providing low priority for others); however, doing this across a widely distributed collaboration involving universities, possibly high schools etc., is a challenge.