Author: Les Cottrell. Created: Jan 30 '02
There were about 10 attendees. The meeting was at the Arizona State University Memorial Union. There was wireless access on the first day, but the signal strength was too poor to be usable. Some people attended remotely via VRVS. The working group charter was approved.
Original bandwidth estimates made in 1998, though considered very aggressive at the time, were found to be underestimates in 2001. A big issue was trans-Atlantic bandwidth requirements, where ICFA was instrumental in recommending improvements. Reviewed current and planned connectivity performance within and to/from regions of the world. A concern is that the planned slow growth of ESnet capacity looks unlikely to meet requirements. Summarized the report to be given to ICFA in February: networking is advancing rapidly, with big changes coming; Grid projects have attracted much funding; TCP/IP is 25 years old, built for 56Mbps, and Ethernet is 20 years old, built for 10Mbps; increased bandwidth has changed viewpoints; China, India, Pakistan, the FSU, S. America and Africa have poor connectivity and need assistance.
Performance on high latency*bandwidth networks. Looked at slow start and then congestion avoidance, then went over fast recovery, with a tcptrace illustration. Used UDP to find the maximum bandwidth without loss on the CERN-Caltech link. Linux TCP estimates the initial ssthresh from the previous connection. Showed the effect of overestimating the bandwidth*window size. Worked on reducing slow start time by modifying the slow start increment, which did not help much, then modified the congestion avoidance increment. Looked at this with a simulator with 1/10K loss. Need to limit the maximum cwnd size. Set the initial ssthresh to an appropriate value for the delay & bandwidth of the link; the initial ssthresh has to be larger than the delay*bandwidth product but not too much larger. Looked at QBSS: does it use all the bandwidth available, and does it back off? Showed QBSS limits itself quickly and does not affect other traffic. But QBSS was unable to use the maximum 120Mbps bandwidth even with no other traffic, probably due to the small queue size for the QBSS stream. Could use QBSS with UDP to measure unused bandwidth without affecting production traffic. Tried 2 ways in which Cisco implements load balancing (CEF). Found per-packet load balancing works well; per-destination does not work well for one pair. But per-packet load balancing resulted in 50% of packets being received out of order. Reached 192Mbits/s; 99.8% of ACKs were SACKs. Decided not to use load balancing since it impacts operational traffic.
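The ssthresh sizing rule above can be sketched as follows (my own illustration, not code from the talk; the headroom factor is an assumption): the initial ssthresh should sit just above the delay*bandwidth product of the link.

```python
# Hypothetical sketch (not from the talk): choosing an initial ssthresh
# from the delay*bandwidth product, as suggested above.
def bdp_bytes(bandwidth_bps, rtt_s):
    """Delay*bandwidth product in bytes."""
    return int(bandwidth_bps * rtt_s / 8)

def initial_ssthresh(bandwidth_bps, rtt_s, headroom=1.1):
    """Slightly larger than the BDP, but not too much larger
    (the 'headroom' factor is an illustrative assumption)."""
    return int(bdp_bytes(bandwidth_bps, rtt_s) * headroom)

# Example: a 120 Mbit/s path with a 170 ms round-trip time
# needs a window of about 2.55 MB before ssthresh kicks in.
print(bdp_bytes(120e6, 0.170))  # 2550000 bytes
```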
A practical distributed authorization system for GARA. The idea is to investigate the signaling required to set up QoS. So far the QoS implementation is done by hand on the Michigan campus. Security is vital, which is difficult due to cross-domain issues. GARA is PKI/GSI/Globus based. Many sites lack a PKI but have installed a Kerberos base. KX509 translates Kerberos (v4/5) credentials into short-lived (10 hours at Michigan) X.509 credentials, or "junk keys". Junk keys can be used by browsers for mutual SSL communication or by GSI for Globus authentication. Being short-term avoids revocation problems and supports mobile users. Not good for signing anything, but useful for identity. The cross-domain distributed authorization design allows authorization decisions when the requester and resource reside in separate domains. A policy engine applies a set of input attributes to a set of policies. A design goal is to avoid replication, i.e. use existing group information. Shared group names avoid user/group data replication in a central database; local groups can manage local databases. Performed tests between the CERN and UMich machine rooms, and moved GARA services onto a subset of the UMich GigE backbone in the physics department. Demonstrated a laptop with all services (Linux); next will schedule more tests between CERN and UMich.
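The policy-engine idea can be sketched as a toy (illustrative only, not the GARA implementation; the group names and rights are made up): input attributes are matched against local policies keyed on shared group names, so no central user/group database is needed.

```python
# Toy policy engine (illustrative, not the GARA code): apply a set of
# input attributes to a set of policies keyed on shared group names.
POLICIES = [
    # (attribute, required value, granted right) - hypothetical entries
    ("group", "umich-netadmins", "reserve_bandwidth"),
    ("group", "cern-users",      "submit_job"),
]

def authorize(attributes, requested_right):
    """True if any local policy grants the requested right
    for the presented attributes."""
    return any(attributes.get(attr) == value and right == requested_right
               for attr, value, right in POLICIES)
```

Because each domain keeps its own POLICIES table and only the group names are shared, local administrators manage their own databases, matching the design goal above.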
Grid monitoring does not mean all kinds of monitoring (e.g. farms); the focus is on grids. It means having information services that users can communicate with. The group formed in Oct 2001. Initial foci: gathering use cases & requirements; evaluating an initial set of sensors to deploy as part of the Virtual Data Toolkit 2.0; defining schema for interfacing sensors to information infrastructures (MDS, GMA, etc.); deploying an initial set of sensors on 1-3 experiment testbeds, then evaluating & updating; implementing monitoring/sensors/archives for 1-3 projects.
19 use cases from ~ 9 groups, fall into 4 categories: health of system, system upgrade evaluation, resource selection and application specific progress. Defined a template for the use cases, includes: description, contact; performance events/sensors required; how will info be used; what access is needed (last value, streaming of data, logs); size of data to be gathered; overhead constraints (for sensor); frequency data will be updated; frequency data will be accessed; how timely does data need to be; scale issues: how many producers will there be, how many consumers, what portion of this will be of interest to a specific query; security requirements; consistency or failure concerns; duration of logging (2 weeks is a good length); platform information.
From the use cases, gathered requirements, split by type: network, CPU, storage system, other. Host sensors: CPU load, available memory, disk; network: bandwidth & latency; storage system: available free storage. Next steps: what tools should we deploy?
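A minimal host sensor of the kind listed above might look like this (my stdlib-only sketch, not one of the tools under evaluation; available memory has no portable standard-library call on this era's Python, so it is omitted):

```python
import os
import shutil

# Illustrative host sensor (not one of the evaluated tools): reports
# CPU load and disk usage using only the Python standard library.
# os.getloadavg() is Unix-only.
def host_metrics(path="/"):
    load1, load5, load15 = os.getloadavg()   # 1/5/15-minute CPU load
    usage = shutil.disk_usage(path)          # bytes: total, used, free
    return {"cpu_load_1min": load1,
            "disk_free_bytes": usage.free,
            "disk_total_bytes": usage.total}
```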
Contact information http://www.mcs.anl.gov/~jms/pg-monitoring
Harvey raised issue of route changes and characterizing the new route or having a history of the new route.
Challenges: managing storage resources in an unreliable environment; heterogeneity (MSS, HPSS, Castor, Enstore, various disk systems and attachments, system-attached disks, NAS, parallel ...); optimization issues (avoiding extra file transfers, caches, multi-tier storage system organization). They are modifying GridFTP to use HRM in blocking mode.
Areas of the initiative: applications performance tuning ... Applications: work with specific application communities, e.g. HEP and the human genome; video conferencing and FTP were chosen as the first applications. Host/OS issues: Web100 & host tunings, performance packages from computer vendors; provide packages for various vendor OS' to check/validate configuration. Measurement infrastructure: establish common measurement parameters for all portions of the end-to-end path, develop analysis techniques to determine capabilities, and make the info available to a wide range of users. H.323 and FTP beacons (Ohio State is doing H.323 beacons and claims it could do FTP), so one can run FTP tests from one's site to a beacon. Are these the right tools, how to control access, where to deploy? Projects: a packet reflector; is this useful, and what about location & access control? A packet goes to a gateway, travels via a tunnel to the remote reflector, and the remote reflector sends it back over the regular network. Collection of experiences and tools: contribute to the pie, use the pie; is it useful? Internet2 wants to provide the glue.
Discussion for each objective: are there existing projects, who will lead, how, when, and what existing work is there?
Need to expand/broaden membership. In particular to address areas where we are weak.
Will poll people via email list for technical roles. Also will arrange next meeting via email.
Deploy testing & monitoring programs, link & site instrumentation, and a standard methodology in association with the Internet2 E2E initiative so that all of HENP's applications are supported.
Provide advice on the configuration of routers, switches, PCs & network interfaces, network testing, and problem resolution.
Showed how loss in slow start gives a very slow (linear) ramp up in throughput. Showed fractal behavior of jitter in message transfers. As the competing UDP load against TCP increases, the TCP behavior becomes chaotic. Showed how netlets can remove the end-to-end jitter of TCP, which should be useful for realtime applications.
Commodity high-performance distributed computing relies on the Internet. Will develop inference & analysis tools. Want fast, dynamic inference of available bandwidth, the location of bottlenecks, and the available bandwidth along a path. Internal network measurements are not available, so want an end-to-end model. Developing lightweight chirp and fatboy path probing; will be both active and passive. Want to do tomography to infer what is going on in the net cloud. Create a new generation of bandwidth protocols. One question is the probability that new protocols will be deployed.
Two tools: pathrate (capacity estimation) and pathload (available bandwidth estimation). There have been many attempts, starting with pathchar; the early ones did not work at current link speeds. These tools use variable-length packet streams, packet pairs and packet trains. Will develop a better GUI.
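The packet-pair principle behind pathrate can be illustrated as follows (my sketch, not the tool's code): back-to-back packets are spread apart by the narrow link, and that link's capacity is the packet size divided by the measured dispersion at the receiver.

```python
# Illustrative packet-pair capacity estimate (not pathrate itself):
# the narrow link spreads back-to-back packets apart; the arrival
# dispersion reveals that link's capacity.
def capacity_bps(packet_bytes, dispersion_s):
    """Capacity estimate in bits/s from a single packet pair."""
    return packet_bytes * 8 / dispersion_s

def median_capacity(packet_bytes, dispersions):
    """Real tools filter many pairs, since cross traffic distorts
    individual pairs; a median is the simplest robust summary."""
    estimates = sorted(capacity_bps(packet_bytes, d) for d in dispersions)
    return estimates[len(estimates) // 2]

# Example: a 1500-byte pair arriving 0.12 ms apart suggests ~100 Mbit/s.
```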
Goal is to develop a network aware operating system. Develop/deploy network probe/sensors, develop a network metrics data base, develop transport protocol optimizations, develop a network-tuning daemon. Will develop network tools analysis framework. Auto-tuning gets close to hand tuning. Concerned about overall impact of active probing.
Infrastructure for passive monitoring. Want to look into the interior of the network while minimizing the impact on the network. Uses a fiber splitter. Based on libpcap for packet capture and bro (used for hacker signature capture). Can only monitor one's own traffic. Want to put a monitoring box close to each router in ESnet. Activation is sent by UDP to all monitors along the path. The focus is on capture tools, not on analysis. Have a prototype setup at LBL and NERSC. The monitor host system is installed and maintained by the network administrator.
Re-examine protocol-stack issues and interactions in the context of cluster & grid computing. Adaptive flow control gives dynamic right-sizing; it is not TCP-unfriendly. Improved throughput by a factor of 7-8x at SC2001 from Denver to LANL. Applies to bulk-data transfers where the bandwidth*delay fluctuates. Have a kernel mod; will develop an application-layer/user-space version. An alpha for Linux 2.4 kernel space is available. Have a packet spacing algorithm. Wu does not have an RFC.
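The dynamic-right-sizing idea can be sketched like this (my illustration under assumed parameters, not the actual kernel mod): the receiver estimates the sender's window from the bytes seen in the last RTT and advertises enough window that it never becomes the bottleneck.

```python
# Illustrative dynamic right-sizing (not the LANL kernel code): the
# receiver advertises twice the bytes observed in the last RTT, so a
# slow-start sender doubling its cwnd each RTT is never
# receiver-limited, capped by the available buffer space.
def advertised_window(bytes_last_rtt, max_buffer_bytes):
    return min(2 * bytes_last_rtt, max_buffer_bytes)

# Example: 500 kB arrived last RTT with 4 MB of buffer free,
# so advertise a 1 MB window.
```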
Interactive QoS across heterogeneous hardware and software. http://www.cc.gatech.edu/systems/projects/IQEcho
Motivation: there is no terascale test network, so develop a simulator. SSFnet is the portable simulator, written in Java. It has a domain modeling language (DML), network elements and protocols. Renesys SSFnet is shared-memory, proprietary, not 64-bit clean, and has no scheduler. Will add POS and MPLS (NIST is doing ATM), the Web100 MIB, JavaNPI and namespace extensions, add hinting to DML, and build examples of ESnet and Internet2.
Web100 went into an IETF RFC at the last meeting. TCP will need to evolve (maybe via eXperimental TCP implementations (XTP)), e.g. new startup algorithms and support for new technologies such as lambda switching. N.b. the Internet only works due to the commons-sharing concept and fairness. It is a major activity to develop and deploy a high-quality TCP into standard operating systems; it can take years. There can be problems with research not evolving into an operational infrastructure (the "throw it over the wall" concept). Thomas asked how to maintain communication among ongoing projects. A bigger problem is the tie back to the middleware community. Middleware folks want to know what one can do with the monitoring tools, so we need to identify deliverables from network research to middleware. Need to continue the dialogue between applications and networking. The objective is to advance science.
Mailing lists will be set up: measurement & analysis focus group; transport protocols focus group; interacting with applications communities focus group.
Need a common CP (Certificate Policy) and CPS. Trust management is at the resource end. A certificate is like a DMV license or passport in that it gives reasonable assurance that someone is who they say they are; it does not say what they are entitled to (e.g. whether they can pay for something).
They are looking for production systems with long-term support for software to be put in the hands of users. They need heavy lifting (it may take days/weeks to move data), and there is large heterogeneity in OS', protocols, applications and mass storage systems. Metadata description is a challenge. Error propagation is a problem, i.e. how is one told that something did not work, what does one do about it, how does one tell the user, and how is the error passed up the hierarchy?
I met with Rolf Riedi of Rice to discuss INCITE, how to proceed with automated chirp measurements, and arranging visits for a student to SLAC and for Jiri Navratil to Rice. It appears the best time for Jiri to go to Rice will be the end of March (after March 25th, when Rolf returns from vacation). The student is Hong Kong Chinese, so I will work on seeing what is needed to prepare for her visit. She would like to come as soon as possible, since this quarter she has a light load. We agreed we need a C/perl analysis program that can be called from an application. This will be led by Rice, since they understand the analysis needed. Rolf does not feel this is very hard. This analysis code will be used to reduce the data so it is easy to report on (e.g. in a time series graph) or to compare with something else. Rolf will assist with coming up with reasonable parameters with which to call chirp. Typical optimum chirp sizes (# of packets sent in a chirp) are in the range 6-10. We should also save the results from one chirp run to use as input parameters to the next chirp run. SLAC will keep the raw chirp data for up to a month (about 30MBytes), and make it available (e.g. via FTP or HTTP) for Rice to pick up and keep a permanent copy. Some handshaking will be needed so SLAC will know Rice has got the data and it can be deleted. Rolf encouraged SLAC to make contact with Vinny while he is in the Bay Area (working as an intern for Sprint).
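To illustrate the chirp idea discussed above (a sketch with assumed numbers, not the Rice/INCITE code): a chirp sends a short train of packets with geometrically shrinking inter-packet gaps, so a single train sweeps a range of probing rates; the 6-10 packet count mentioned above sets the train length.

```python
# Illustrative chirp train generator (not the Rice/INCITE code): gaps
# shrink geometrically so the instantaneous probing rate ramps up
# within one train. The packet count follows the 6-10 range above;
# the initial gap and spread factor are assumptions.
def chirp_gaps(n_packets=8, first_gap_s=0.004, spread=2.0):
    """Return the n_packets-1 inter-packet send gaps in seconds."""
    return [first_gap_s / spread**i for i in range(n_packets - 1)]
```

Saving where in the gap sweep loss or queueing delay first appeared, as suggested above, would let the next run start with a better-centered first_gap_s.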
I met with Brian Tierney of LBL and Micah Beck of UTK to arrange visits to SLAC later this month. I had a long discussion with Constantinos Dovrolis of U Delaware about pathrate (for capacity measurements) and pathload (for available bandwidth measurements). I resolved some questions on pathrate, and we discussed how it should be used for automated long-term measurements. We will get an early beta release of pathload. I had shorter discussions with kc claffy of CAIDA and Matt Mathis of PSC. I worked with Guojun Jin of LBL to make progress in getting him an account at SLAC for assistance with Pipechar testing. Thomas Dunigan of ORNL and I talked about Web100, in particular porting webd to SLAC and setting up an appropriate host at SLAC with GE access to run it on. I had brief separate discussions with Thomas Ndousse, George Seweryniak and Mary Anne Scott, all of DoE, concerning funding. I talked to Jim Leighton about the need to get higher bandwidth to Renater. Jim and George Seweryniak are trying to get funding to upgrade the ESnet backbone, which is getting close to saturation as more sites get OC12 connections (the backbone at best is currently 2*OC12).
SciDAC wants to put together a monthly newsletter. SciDAC will have a booth at SC2002.