BaBar Collaboration Meeting, Dresden July 96, Trip Report

Les Cottrell, Assistant Director, SLAC Computer Services (SCS)

Stanford Linear Accelerator Center, Stanford University, Stanford, California 94309

July 13-16, 1996

Purpose of Trip:
The trip was arranged to allow me to give two invited plenary talks at the BaBar Collaboration Meeting in Dresden. These talks were on the SCS/BaBar Computing Support Plan, and on Internet Gridlock and WAN Monitoring. It also provided the opportunity for me to meet with a wider cross-section of the BaBar collaborators, and to share information and concerns about BaBar computing.

General Overview of Dresden BaBar Collaboration Meeting

This was an extremely valuable meeting for me (and hopefully for SCS and SLAC). It provided an opportunity to interact with many BaBar people and to get a clearer idea of who the major players are, how they interact, what their many concerns are (both overt and covert), and how they view SLAC, SCS, and the BaBar representatives who work with SCS. I believe this will be useful in the ongoing SCS/BaBar discussions, especially in understanding the pressures on, and the flexibility (or lack thereof) of, the BaBar representatives.

There was considerable interest in the SCS/BaBar support plan, both at the talk I gave and in other sessions and hallway discussions. People appeared to be comfortable with the proposals; the main concern was when the extra resources would become available. This is particularly relevant as BaBar tries to decide how centralized or distributed computing should be for its various activities.

I believe the talk I gave on WAN networking was valuable for setting realistic expectations for the WAN. There was much discussion on what might be expected in the future, especially as it relates to the collaboration environment.

The organizers set up about 16 X terminals. However, the networking to the U.S. was very poor: response times varied from 350 ms at best to more typically 1 second, and packet losses ranged from 20 to 40%. One had to be quite masochistic to use the network to read and reply to email at one's home site in the U.S.

BaBar Prompt Reconstruction Requirements

There were energetic discussions and breakout sessions. One very important session, led by Tom Glanzman, was on whether BaBar needs 100% "prompt" reconstruction of the on-line data. Many people were of the opinion that this is a requirement. The people responsible for the on-line farm (Gregory Dubois-Felsman from Caltech and Tom Glanzman of SLAC) are willing to accept this requirement but are unclear whether there are enough MIPS in the on-line farm (expected to have about 3000 MIPS) to accomplish this without interfering with the farm's critical data-filtering task. Further complicating this is the current uncertainty in how many events will need filtering and reconstructing (which depends on the efficiency of the early-level triggers), the number of Bhabha events, the luminosity, etc. Even if there are enough MIPS, there are still concerns about interference between filtering and reconstruction running on the same nodes.
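To make the capacity question concrete, a back-of-envelope check might look like the sketch below. Only the ~3000 MIPS farm size comes from the meeting; the event rate and per-event CPU costs are purely hypothetical placeholders for the quantities that are still uncertain (trigger efficiency, Bhabha rate, luminosity).

```python
# Back-of-envelope feasibility check: can the on-line farm do both
# filtering and "prompt" reconstruction?
# FARM_MIPS is the figure quoted at the meeting; all other numbers
# below are illustrative assumptions only, not BaBar estimates.

FARM_MIPS = 3000.0        # expected on-line farm capacity (from the meeting)

event_rate_hz = 100.0     # assumed rate of events passing early-level triggers
filter_cost = 5.0         # assumed filtering cost per event, in MIPS-seconds
recon_cost = 20.0         # assumed reconstruction cost per event, in MIPS-seconds

# Sustained capacity needed = event rate x total per-event cost
required_mips = event_rate_hz * (filter_cost + recon_cost)
headroom = FARM_MIPS - required_mips

print(f"required: {required_mips:.0f} MIPS, headroom: {headroom:.0f} MIPS")
```

With these assumed numbers the farm is only marginally adequate, which mirrors the uncertainty expressed at the meeting: modest changes in trigger efficiency or luminosity swing the answer either way.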

If this is accepted, then a possible way to do the "prompt" reconstruction would be to utilize the off-line (SCS) farm. This raises many questions, including:

The "prompt" reconstruction is needed soon after the data is taken. Tom Glanzman suggested quantizing the turnaround requirements, and Bob Jacobsen suggested figures roughly based on his recollection of what Aleph achieved last year (jobs that crash do not count toward the totals). Bob Jacobsen took an action item to poll the LEP experiments for their experience and goals and to propose something specific within a month.

Another requirement is to set aside streams of data for further reconstruction, e.g. mu pairs, Bhabhas, and psi-to-lepton-pair events. Such data may simply be labeled in the database, or actually recorded on separate tapes. At Aleph the complete reconstruction of this data is reviewed daily in order to see how the experiment is running. Several anecdotes were given on how such information has been critical to discovering serious flaws (e.g. the wrong gas in an Aleph chamber).

Another question raised was how much of the prompt reconstruction code would be used in the later off-line reconstruction (i.e. the reconstruction done possibly months later).

Summary:

Action items (Bob Jacobsen): It was agreed that "prompt reconstruction" is necessary. Someone needs to look at the expected event size, which has changed since the TDR; Dave Hitlin's first guess is that it may have decreased.

We will need to set up a production organization responsible for getting good data into the can. This group trains the shift takers, ensures there is documentation, etc.; it does not run shifts per se. Greg Dubois will set up a fast monitoring/coordination group with members/experts from the various subsystem groups.

We need to look at the requirements and see what is reasonable and possible. If there is enough capacity at IR2, the prompt reconstruction will be done there; otherwise we will need to look at using SCS, in which case there are many issues to be resolved first (see above).

We will write up a BaBar note on these agreements.

WAN Networking in Europe

Neil Geddes said that the national European networks are in general very good, often state of the art. However, they differ (in technologies and support models) among the UK, Germany, France, and Italy. ATM is growing in importance and promises much, such as bandwidth on demand, quality of service, virtual networks, and better controls. However, the standards are still emerging and there are already many options. Further, there appears to be no interest from the providers in pan-European solutions. High-bandwidth solutions also tend to be more expensive in Europe (by up to a factor of 10) than in the U.S. The hope is that the deregulation of the European telecommunications market in 1998 will help, but the PTTs are divulging little of their plans for deregulation, so it is hard to predict the effect. Lines to the U.S. are often cheaper than lines between European countries.

There is a proposal for an upgrade to the European academic/research networks (one per country). It is called TEN-34 since it connects 10 countries at 34 Mbps. It is planned to occur in two phases of 15 months each. Phase 1 was to start in Feb. '96, would support IP only, and was budgeted at 12M ECUs (which Neil said was about 15% short of the requirements). A major decision/step was supposed to occur in June '96; however, this did not happen. It is now expected that the first phase will not materialize until the end of 1996, yet it must still end in mid-1997 (15 months after the nominal Feb. '96 start). It is doubtful the money can be spent in this time, and it is unclear whether it can be carried over to phase 2. The plan is to connect CERN into the network. French labs such as IN2P3 are connected to PHYNET, which is not the national academic network (RENATER); similarly for the Italian labs. I am unclear how this affects the French and Italian labs.

The UK connectivity to the U.S. has improved markedly over the last year as the aggregate bandwidth increased from 4 Mbps (I think) to 19 Mbps. It is believed that this was the result of much pressure from the HEP and other UK research communities, with comprehensive monitoring results to back up the complaints.

Other plans include Germany getting 2x34 Mbps to the US "soon". There is also an ESnet pilot, proposed by Harvey Newman of Caltech, for a transatlantic ATM link with costs shared between ESnet, CERN, Saclay, and the United Nations. Besides the connections to CERN, IN2P3, and the United Nations, there also appear to be connections to DESY and Garching (a fusion site). There is no UK involvement, since British Telecom is not part of the Global-1 consortium which is providing the backbone. There is supposed to be a 60-day test this summer.

Remote Computer Centers

Neil Geddes presented the requirements for remote computer centers based on the tasks they wish to address. France (IN2P3) appears to want to maintain the option of doing full reconstruction, which would require a full copy of the data. They have not yet decided whether to upgrade their STK silos to Redwood drives, and are watching carefully what CERN is doing. CERN is assuming it will replace all of its current tertiary storage (mainly robot-mounted DLT, IBM 3590 (10 GByte cartridges), and older 3480/3490 (200 MB and 1 GB) media) and is going out to tender for a new "lights out" tertiary storage system based on robots and one of the modern high-density/high-bandwidth tape technologies. They hope to make a decision by the end of this year. For more details see: http://wwwcn.cern.ch/pdp/vm/tape_project.html.

Miscellaneous Items from BaBar Meeting

David Quarrie reported on his evaluations of Objectivity versus ObjectStore (Object Oriented Database Management Systems) and came down in favor of Objectivity. The licensing is by host, and he expects to buy 10 developer licenses plus 50 user licenses (run-time libraries). In order to maintain the agility to move to another vendor within the lifetime of BaBar, David wants to insist on staying compliant with the ODMG standard. This strategy is pretty much in line with what CERN is pursuing (according to Jamie Shiers of CERN).

The status of the various compilers, in particular vendor versus GNU, was unclear (to me). It appears that in order to support some products (e.g. Objectivity) it may be necessary to move more toward the vendor compilers. CERN is recommending the vendor compilers and is working on providing good deals and interfaces with the vendors. Though there is little pressure for the GNU compilers, they remain available through the CERN ASIS service.

BaBar currently supports IBM AIX 3.2 (moving to AIX 4.x), HPUX 9 (moving to HPUX 10), and Digital Unix 3.2 (moving to Digital Unix 4). There is a problem with AFS under HPUX 9 which is believed to be fixed in HPUX 10; AFS also does not yet work under AIX 4.2. The difficulty Transarc is already having tracking Unix vendors' system releases is very concerning. I doubt things will get better as vendors start to support DFS and the market for Transarc/AFS declines. Will Transarc have enough resources to continue tracking system releases across the major Unix systems?

According to Alan Silverman of CERN, an LHC committee set up by the CERN director of research has recommended that LHC support only 2 Unix platforms.


Appendix: Itinerary of Trip

July 10     Leave Menlo Park
July 11     Arrive Dresden
July 13-14  BaBar Online Computing Workshop
July 15-16  BaBar Collaboration Meeting
July 16     Leave Dresden, Arrive CERN, Geneva

Appendix: List of Persons Met During Trip

Since it was a large meeting, I met many BaBar collaborators from around the world who attended. In particular I should single out the following people whom I would not normally have had detailed discussions with.
Gerry Abrams LBL
Neil Geddes RAL
Bob Jacobsen LBL