NGI/PPDG planning meeting at SLAC Sep-99

SLAC 23 September, 1999
Rough notes by Les Cottrell, SLAC


Introduction

The following attended in person at SLAC: Harvey Newman (Caltech), Stu Loken (LBNL), Dave Millsom (SLAC), Arie Shoshani (LBNL), Davide Salomoni (SLAC), Chip Watson (CBAF), Andy Hanuschevsky (SLAC), Richard Mount (SLAC), Dan Gunter (LBNL), Jason R. Lee (LBNL), Luis Bernardo (LBNL), Alex Sim (LBNL). In addition there were video attendees from Wisconsin (Miron Livny), ANL (Larry Price + Dave Malon), FNAL (Ruth Pordes), Caltech (Julian Bunn).

Agenda

Reports from the Working groups
  • David Malon
  • Arie Shoshani
  • Dave Millsom
  • Miron Livny

Other reports of work in progress

PITAC review
Presentation to ESCC
NGI PI meeting
APOGEE
PPDG projects
Next steps

Applications working group - David Malon

http://gate.hep.anl.gov/malon/malon_ppdg_23sept99.ppt

Arie Shoshani

Particle Physics Data Grid Request Management working group

Network group - Dave Millsom

They are working on documenting the site LAN and WAN connections, down to the machine level, as they are today, plus the needs as relevant in the short term (6 months to a year) and the longer term (>= 2 years). To do this they are creating a survey. An example configuration will be given.

SLAC has been working with NTON to resurrect the connection to SLAC via Stanford. Fiber is in place from SLAC to Stanford, and they are identifying fibers across campus to connect to Sprint. NTON is very interested in getting a demo from SLAC for SC99. SLAC hopes to have an OC12 ATM connection in the next 3-4 weeks. After SC99 they hope to go to OC48 POS. Caltech also hopes to establish an OC48 POS in early 2000. LBNL is also waiting for a restoration of their link with NTON; the bandwidth and mode are to be decided.

Wisconsin - Miron Livny

Wisconsin is putting into place a new router to connect to the EMERGE network. The campus is putting into place peering with ESnet at Chicago. They are in the process of taking delivery of a cluster of 64 machines with 2 CPUs each and 500GB disk. They have had discussions with FNAL to share understandings. There are 2 modes to operate in: one is very controlled, the other is more chaotic and changing. They expect to need a hybrid. They plan on more discussions on how to proceed with the collaboration. Wisconsin has run an operational service using SRB or GAS. They ran into some interesting challenges in mapping the naming of legacy applications to the distributed naming/model (for example, every object in the system has to have a unique name which has to map to the legacy file naming hierarchy).

Reviews and presentations

PITAC, October 6, 1999

Mary Anne Scott is asking for information. Rick Stevens suggested mapping out the real high bandwidth paths that exist. It would be good if the maps were posted on the PPDG web site. This is politically charged in Congress etc. PITAC is also concerned about minorities and outreach (as well as technology-related matters). PITAC's concerns are: it wants applications that demonstrate NGI capabilities; long-term research funding seems minimal; Gbit to the desktop seems not to be happening; NGI goals (end-to-end bandwidth, latency, QoS) are not being systematically measured. Of these we can respond to the first, and we need to make the case that our applications do stretch the NGI capabilities.

PITAC also wants NGI testbeds, NGI applications, geographic reach (access to rural areas), minority & small college, technology transfer, agency coordination, IT leadership. We can help with NGI applications and agency coordination (LHC NSF/DoE collaboration).

PITAC is hoping to hear in October about progress in applications (exciting new applications enabled by the 100x & 1000x sites and other NGI attributes) and progress on testbeds (100x reach, 1000x reach, end-to-end measurements for 100x & 1000x sites/applications (latency, bandwidth, utilization ...)).

We need some volunteers to send information to Mary Anne. This should include site information, which can be as planned for 3 months from now. The volunteer is probably Harvey, Chip or Richard. Richard agreed to do this and will provide a couple of PowerPoint slides by Monday 9/27/99. All information should be sent to Richard by close of business tomorrow (Friday 9/24/99). Julian will try to provide information on the Caltech 100MByte/sec milestone. In addition we can include the 57MBytes/sec Clipper test.

NGI PI meeting October 4-5

The invitations say all 19 DOE NGI projects are required to send a PI to the meeting. Harvey can't make it. Chip is out of the country at the time. FNAL will see whether they can send someone, possibly Vicky White. Miron Livny will also be there. Miron is also involved in another project. Stu Loken is also planning to go. Ian Foster is prepared to present the PPDG at the meeting, which would probably meet the requirements. But we need a physicist who is leading the effort. Richard has reluctantly agreed to go unless someone else can make it. Probably a 20 minute presentation will be needed. There is a fairly prescriptive format for the talk, and an electronic version will be needed during the meeting. The second day is for breakout sessions on middleware, application specific toolkits, testbeds & infrastructure/connectivity issues, relationships to other programs, measurement/monitoring infrastructure (define requirements for a measurement/monitoring infrastructure, which may well be part of the "grid services package", that will enable evaluation of DOE NGI technologies), and applications goals & requirements.

ESCC Oct 19-21

We have been invited to make a presentation at JLAB on the 19th October. We need a volunteer. Les is in Georgia/Tbilisi. Chip is not familiar with the networking side of things. Harvey does not fancy travelling to JLAB. Larry will be there, but in his role as ESSC chairman.

APOGEE (& GriPhyN) - Richard Mount

These are designed to be put together with the PPDG, and there is a presentation at HEPAP in October. NSF/IT2 has been zeroed out of the budget, and the SSI has also run into funding problems. HEP has tried to get involved in SSI for data management, infrastructure, QCD, astrophysics etc. Since some of these HEP activities were needed regardless of the progress of SSI, it was decided that they be proposed regardless of whether SSI got funded. The initial PPDG effort was very minimal. A Physics Oriented Grid Environment for Experiments (APOGEE) was put forward to Peter Rosen of DoE as a follow on by Harvey Newman, Stu Loken and Richard Mount. It was proposed to merge the management of the PPDG and APOGEE if APOGEE is funded. See http://www.fnal.gov/ssi-henp/data_manage/ for more information (in particular the talks) on APOGEE.

APOGEE is aimed at full-scale grid design, optimization and prototyping. It is a follow on from RD45, Clipper, SLAC/OOFS, NILE/Condor, MONARC, PPDG and ALDAP. There will be a network instrumentation team, a simulation modeling team, and a system optimization/evaluation team. There are 17 FTEs in the project including $3.5M/year. Money from the programs (e.g. HEP) will be used, augmented by extra funding from elsewhere such as NGI. GriPhyN is a first production scale "Grid Physics Network" (see http://www.phys.ufl.edu/~avery/mre/ and http://www.phys.ufl.edu/mre/white_paper.html for more information). GriPhyN is more production oriented, APOGEE is more future oriented. GriPhyN will use APOGEE to provide a new level of rigor as the development proceeds. The time scale is being driven by LIGO, which needs to be in production by 2002. The main experiments driving it are LIGO & LHC (CMS/Atlas). There will be 17 tier 1 centers: 5 each in the US for LIGO, CMS, Atlas; 2 at CERN. In addition there will be less well defined university based regional tier 2 centers partnering with tier 1 centers. The tier 0 centers are the data generation sites (e.g. accelerator labs).

The activity will need strong support from HEPAP (27 October), so the presentation will need to make sense and not be viewed as a set of disjoint projects. APOGEE is a long way from being funded. The reaction from Rosen was that we should give more presentations, in particular to higher-ups (e.g. Martha Krebs) in the organization.

PPDG project leader - Richard Mount

There has been an email exchange over the job description for the PPDG leader. It is critical to get a real leader quickly to pull everything together; without one we are not, at the moment, making the progress we need to keep pace. The question is how to find this person and then, once the person is found, the collaborators have agreed, and the person has selected the environment to work in, how to fund the post (it will consume most of the PPDG funding any one site has). It is hard to move monies around. So Richard proposes that, in order to move quickly, the site where the leader sits pays for the person until APOGEE is funded. This presupposes that the PPDG will lead to a longer-term project. The reverse is also true: if there is no leader then there will be no future project. To attract the right person, the institution that hires this person has to guarantee a future of at least 2-3 years. This is a financial exposure that may rule out some sites.

There are some practical issues such as how to find a suitable person. Probably they are not looking for a job right now. The question was raised as to how the Climate & Combustion people found someone. They do have a couple of leaders, but their collaborations are more cohesive than the PPDG. We need to be more active in the interim period, e.g. we need someone to set up regular (weekly) phone/video meetings to keep things moving. Harvey & Richard are too overloaded to take this role.

CERN & International links - Harvey Newman

The US CERN link recently had the bandwidth increased to 20Mbps. There was a long delay (last mile problem from Ameritech) in connecting to STAR TAP. Now they are peering with Abilene at STAR TAP. There will be an increase in bandwidth (6-34Mbps) between INFN & CERN. There is a call for a market survey to upgrade to T3 (from 20Mbps) for CERN to the US. In preparation for the LHC computing board workshop there is a technology tracking report (continuation of a report put out in 1996) that reports predictions for the upgrade of transatlantic bandwidth. It will become public soon. The long awaited connection from Napoli to the DANTE router in New York at 45Mbps was just installed. This will probably be used for a general purpose Internet connection which will free up the other 43Mbps for science. A large fraction may be available for physics. The UK performance is also looking good at the moment with 2*OC3s.

All this improved performance may make it much easier to try high performance applications internationally.

We need to measure the packet losses more accurately (e.g. 1 part in 10**4) if we are to use the TCP bandwidth formula (TCP BW < MSS / (RTT * sqrt(loss))).
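
The bandwidth formula above can be tried out with a short Python sketch. This is an illustration only and was not presented at the meeting: the function names, the 1460-byte MSS and the 150 ms RTT are assumptions chosen for the example, and the probe-count estimate assumes that packet losses are independent (a simple binomial model), which real links may not satisfy.

```python
import math


def tcp_bw_bound(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """Upper bound on TCP throughput in bits/s: MSS / (RTT * sqrt(loss))."""
    if mss_bytes <= 0 or rtt_s <= 0:
        raise ValueError("mss_bytes and rtt_s must be positive")
    if not 0.0 < loss <= 1.0:
        raise ValueError("loss must be in (0, 1]")
    return 8.0 * mss_bytes / (rtt_s * math.sqrt(loss))


def probes_needed(loss: float, rel_err: float) -> int:
    """Probe packets needed so that the standard error of the measured
    loss fraction is rel_err * loss, assuming independent losses."""
    if not 0.0 < loss < 1.0 or rel_err <= 0:
        raise ValueError("need 0 < loss < 1 and rel_err > 0")
    return math.ceil((1.0 - loss) / (loss * rel_err ** 2))


if __name__ == "__main__":
    # Hypothetical transatlantic path: 1460-byte MSS, 150 ms round trip time.
    for loss in (1e-2, 1e-3, 1e-4):
        bw = tcp_bw_bound(1460, 0.150, loss)
        n = probes_needed(loss, 0.1)
        print(f"loss={loss:g}: bound {bw / 1e6:.2f} Mbit/s, "
              f"probes for 10% relative error: {n}")
```

Because the bound scales as 1/sqrt(loss), a factor of 100 reduction in loss raises the bound by a factor of 10, and under the stated model the number of probes needed for a given relative error grows roughly as 1/loss. That is why well performing links need many more probe packets to be measured usefully.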

Next Steps

SLAC is adding more students to the PPDG efforts. SLAC is also working on getting reconnected to NTON and wants to be able to demonstrate use of the link. Getting the link into place will absorb most of Dave Millsom's efforts for the PPDG. SLAC is also working on increasing the accuracy of packet loss measurements for links that perform well, and on understanding the impact of load on performance and how well QoS measures (CAR/WFQ) work in the presence of load.

Chip/JLAB will be working on the batch system and is interested in figuring out how to interact with Arie's system. They are going to replace the commercial software with Open Source software. Chip has an open position for a Java programmer, and expects the person to be involved in PPDG and APOGEE.

Caltech has hired a young person from Protvino. He officially starts on October 1st.

LBNL has people who worked on HENP type problems as part of a Grand Challenge project. They developed STACS, and can reuse some of this system in the PPDG collaboration. These people are now assigned to work on the 2 NGI projects one of which is PPDG.

Dave Malon of ANL encouraged people to make information available on the Caltech PPDG web site. He also requested that working group leaders who have not met quickly set up voice/phone conference call meetings to bring people together and to make information available publicly (either on the web or the mailing list). ANL has a postdoc who can work fulltime on the PPDG.

Regular (fortnightly) conference calls would be useful to get commitments, clarify who is doing what, set expectations etc. It was agreed to have regular phone conferences 1pm PST Wednesday every other week starting on Wednesday October 6th. Richard will set up a teleconference.

We came up with a list of two people that need to be hired. One is a project coordinator, the other an implementer. It is unclear where they will be located. Richard will put together the requirements (skills & duties). We need someone to coordinate the agenda. Stu volunteered to do it for the first month; send agenda items to loken@lbl.gov.

Everybody needs to post the job description for the PPDG leader.

PPDG information to be put in the PPDG web page should be sent to Julian at Caltech.

It would be useful to provide a calendar for activities such as meetings, papers, design documents, tests etc. Is there an obvious tool to provide a web based calendar? JLAB has purchased a web based tool and Chip will send information to Stu. This will be an agenda item for the phone conference.

