JLab 4/13/04 - 4/15/04
Rough Notes by Les Cottrell
Starting from HEP (block transfers), also eVLBI (real-time data streams), Fusion Energy (time-critical burst-data distribution), bio-informatics (GB on demand). "Collaboration on this scale would never have been attempted unless we had good networks", Larry Price. Bandwidth usage increased by 80-100% per year, faster than Moore's law. Current generation of 2-10Gbps backbones and major international links came into use in the last 2 years. Rapid advances in developed regions may drive the Digital Divide wider. CERN transatlantic link capacity grew by a factor of ~1 million between 1983 and 2004, consistent with a factor of 1000 growth/decade. Climate has similar needs to HENP (climate even greater), with 1-several Tbits/s by 2008. "Hybrid network services offering both circuits and packet-switched networks are needed for science". Removing regional, last-mile, and local bottlenecks & compromises in network quality is now on the critical path. Needs a global science network roadmap as a national priority (e.g. factor ~1000 improvement per decade, hybrid optical networks). We must close the digital divide, allowing scientists and students from all world regions to take part in discoveries at the frontiers of science.
Do not have own network; use other people's networks (split evenly between the commodity Internet and Internet2). Trust is extremely important for users (medical professionals); includes QoS, performance, reliability, low cost, expectations. Apps include biomedical & bibliographic information, the BIRN distributed database, telemedicine, remote surgery. Telemedicine requires low latency, low jitter, bandwidth on demand, security. Enabling technologies include: all-optical nets, new TCP-like protocols, teleconferencing, security, and the last mile.
Architected and motivated to serve the needs of the DoE Labs. A critical feature is reliability. Traffic has doubled each year for the last 14 years. Most of the international traffic is due to SLAC. Backbone located in telecom centers with high reliability. With dedicated wavelengths it may be possible to bypass the firewalls.
Primary network drivers: applications, policy, politics, competition, culture, security.
Five NASA centers, mix of OC3 & OC12 links. Network drivers: applications with increased bandwidth requirements (1Gbps by FY04, 1-10Gbps FY05, 10Gbps FY06), IPv6, net security, nomadic networking. Application drivers: real-time, space communications. Real-time requirements include: reduced collaborative interaction time, reduced latency.
DREN AMP - Phil Dykstra: deployment focused on testing and debugging high-performance flows. Ten high-speed machines (Dell 2650s) deployed with GPS for high-performance measurements. Delay and loss are pretty boring. Need to think about how to measure loss on a quality network. Four pings a minute; big delay changes are interesting, as are losses of reachability. Also hourly do invasive TCP load tests for 10 seconds per site, heavily weighted by slow-start. Use nuttcp (ftp://ftp.lcp.nrl.navy.mil/pub/nuttcp); it does not require an account at the far end, and the server returns all results to the client. Do not use any scheduling. Now moving to OC48, so will need faster NICs (Intel or S2io). Hope for a PCI Express solution by Fall 2004 for full 10Gbps testing.
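The "four pings a minute, flag big delay changes and lost reachability" approach above can be sketched as a small summarizer over probe results. This is an illustrative assumption, not DREN's actual code; the sample data and the 20 ms "interesting delay change" threshold are invented.

```python
# Sketch: summarize periodic ping samples the way the notes describe --
# losses and large delay changes are the interesting events; steady
# RTTs are "boring".  Thresholds and data are illustrative assumptions.

def summarize(rtts_ms, delay_jump_ms=20.0):
    """rtts_ms: one RTT per probe, or None for a lost probe."""
    sent = len(rtts_ms)
    lost = sum(1 for r in rtts_ms if r is None)
    events = []
    prev = None
    for i, r in enumerate(rtts_ms):
        if r is None:
            events.append((i, "loss"))
            continue
        if prev is not None and abs(r - prev) >= delay_jump_ms:
            events.append((i, "delay change"))
        prev = r
    return {"sent": sent, "lost": lost,
            "loss_pct": 100.0 * lost / sent, "events": events}

samples = [30.1, 30.3, None, 30.2, 55.0, 54.8]   # made-up probe results
print(summarize(samples))
# flags the lost probe at index 2 and the ~25 ms jump at index 4
```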
NASA - Joe Loiacono: Have flow graphs, live monitoring, flow analysis, multi-node active measurements.
WAN Monitoring - Les Cottrell: see http://www.slac.stanford.edu/grp/scs/net/talk03/jet-apr04.ppt
End-to-end Troubleshooting - Brian Tierney: Beyond NIC to NIC and memory to memory. Users often blame the network; need to monitor all points along the path including backbone, applications, hosts etc. Not only need to measure; also need to discover, request, respond, and have standard naming and protocols/schemas. Need to instrument code during the development phase, then need baselines to compare new values with. Need to correlate multiple sources of information, e.g. from CPU, TCP stack, I/O etc.
Abilene Measurement - Matt Zekauskas: AMP putting machines in the Abilene core. Four machines at each Abilene node: nms1 iperf throughput, nms2 ad-hoc on demand (+ndt+routing), nms3 stats collection (flow, SNMP), nms4 latency testing (owamp, traceroute), with a CDMA GPS timing source. nms1 and nms2 run Linux 2.4.20; nms3 and nms4 run FreeBSD 4.6-STABLE (buffers tuned).
Internet2 E2E piPEs: - Eric Boyd: collaborative effort with many other groups including AMP, CENIC, ESnet, LBL, SLAC, PSC, UDel etc.
JetNets to deploy measurement boxes at boundaries (interdomain trouble-shooting); recommend one box for users. Recommend a framework with a standard extensible schema for discovery request/response, data request/response, and name definitions. Need to experiment with network engineers to find out what was useful from the measurements. Define success & failure (not just a fix, but a fix that does not need repeating). Focus on communities that care, but also be useful for the little guy. Need a suite of on-demand measurements that can be made following a problem, useful to pin-point problem causes, and agreed as useful by engineers. Can one categorize institutions by their achievable performance and use peer pressure to get institutions to improve? Need measurements at computer centers to understand and publicize performance. Has to be relevant to end-users.
Low hanging fruit: test peering actively; expert peering utilization, global authentication; test points for TCP/UDP; automatic alarm generation.
Recommendation 1: Charter a technical committee to design and lead the deployment of network measurement infrastructures to measure end-to-end across this community of networks and their end-to-end users.
Recommendation 2: Contribute to and adopt the GGF NMWG schemas for sharing measurement data.
RDDP (also called RDMA) implementation: easy to do in proprietary protocols, not too hard in block-oriented protocols such as SCTP. TOE has worked in some cases since it implements a subset (toy implementations), but is hard in the general case where it has to provide all the operating system services. With standard MTUs, 1 ppm losses work at 100Mbits/s, but the tolerable loss rate scales as the inverse square of speed, so it gets very hard at 10Gbps. Overhead: moving the same data as 1G 1kB packets vs 1M 1MB packets puts the costs in different layers; the small packets have 1000 times more software overhead. Doubling HW costs for large packets still gives 500 times less overhead per unit cost. The LAN industry has optimized their part of the cost at the expense of other parts of the stack.
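The "losses scale as the square of speed" point follows from the Mathis et al. TCP throughput approximation BW ≈ C·MSS/(RTT·√p): inverting it, the loss rate p a flow can tolerate falls as 1/BW². A rough worked example, where the MSS, the 70 ms RTT, and the constant C are assumed values for illustration:

```python
# Mathis et al. approximation: BW ~= (MSS / RTT) * (C / sqrt(p)).
# Inverting: p_tolerable ~ 1/BW^2 -- the "losses scale as the square
# of speed" point in the notes.  Numbers below are illustrative.

C = 1.22                  # constant for periodic loss
mss = 1460 * 8            # bits per segment (1500B MTU minus headers)
rtt = 0.070               # assumed 70 ms transcontinental RTT

def tolerable_loss(bw_bps):
    """Max loss probability that still sustains bw_bps, per Mathis."""
    return (C * mss / (rtt * bw_bps)) ** 2

p100m = tolerable_loss(100e6)   # a few ppm at 100 Mbps
p10g = tolerable_loss(10e9)     # 10,000x smaller at 10 Gbps
print(p100m, p10g, p100m / p10g)
```

Whatever constants are assumed, the ratio between the two rates is exactly (10 Gbps / 100 Mbps)² = 10,000, which is why clean 10 Gbps paths are so much harder than clean 100 Mbps paths.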
Path MTU discovery (RFC 1191) does not work well. It requires ICMP messages from routers (many problems outlined in RFC 2923). When it fails, the symptom is hung connections. Matt has a new algorithm that does not rely on messages from the net and solves tunneled-protocol problems too.
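The no-ICMP idea sketched above (which later became RFC 4821, packetization-layer path-MTU discovery) can be illustrated as a search over probe sizes, using only whether a probe of a given size is delivered. This toy simulates the path rather than sending packets; the 4352-byte path MTU and the search bounds are invented for illustration.

```python
# Toy sketch of packetization-layer path-MTU probing: binary-search
# probe sizes, treating end-to-end delivery of a probe as the only
# signal (no ICMP needed).  The path and probe function are simulated.

def probe_search(send_probe, lo=1280, hi=9000):
    """Return largest size in [lo, hi] for which send_probe(size) succeeds."""
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if send_probe(mid):      # probe delivered end-to-end
            best = mid
            lo = mid + 1
        else:                    # silently dropped: assume too big
            hi = mid - 1
    return best

path_mtu = 4352                  # hypothetical path (e.g. an IP tunnel)
found = probe_search(lambda size: size <= path_mtu)
print(found)                     # converges to 4352
```

Because the search never interprets ICMP, it also behaves sensibly across tunnels that black-hole oversized packets, which is exactly the failure mode RFC 1191 handles badly.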
User mode vs. kernel mode. Non-TCP protocols are implemented in user mode. Pros: rapid prototyping & experimentation, easy debugging with less exotic tools, faster deployment. Cons: burns more total syscall overhead. Double hit for TCP: it is also mission critical, so if you screw up you screw up the whole machine. TCP loses.
Epilogue: TCP is easy to blame, but most bottlenecks apply to all protocols when deployed: not diagnosable due to the hourglass; remotely required congestion control; no direct data placement; tiny MTUs. These are often fixed in test environments.
Fix problems in all protocols: measurement via Web100.
Do we raise the "tent pole" or raise the skirt by pushing everyday systems (the millions of US R&E systems that might use > 100 Mbps)? Funders are more interested in glitzy prototypes (not suitable for global deployment), ignoring the unglamorous real problems.
TCP problems: slow recovery after loss; for short flows throughput is determined by slow-start. Rate-based UDP may be better: RBLAST, RBUDP (http://www.evl.uic.edu/cavern/quanta); Tsunami (last release Dec. 2002) uses UDP for the data stream and TCP for control, uses loss rate to determine sending rate, used for file transfer only; SABUL (Simple Available Bandwidth Utilization Library) is a streaming protocol that uses a window and AIMD rate control and is not RTT dependent (so potentially better fairness than TCP); UDT uses UDP for both data and control, still under active development. UDP protocol security: hijacking of sessions, corruption, encryption. Other higher-level work in abstracting the storage layer: P2P BitTorrent, I2 Logistical Networking, Dataspace, decoupling of LAN/WAN storage.
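The SABUL/UDT-style control described above (loss-driven AIMD on the sending rate, with no RTT term) can be sketched as a simple feedback loop. The constants and the loss pattern below are invented for illustration, not the protocols' actual parameters.

```python
# Minimal sketch of a SABUL/UDT-style rate controller: AIMD on the
# sending rate driven by reported loss, with no RTT dependence (which
# is what changes its fairness behavior relative to TCP).  Constants
# and the feedback pattern are illustrative assumptions.

def aimd_rates(loss_events, rate=100.0, add=10.0, mult=0.875, cap=1000.0):
    """Return the sending rate (Mbps) after each feedback interval."""
    rates = []
    for lost in loss_events:
        if lost:
            rate *= mult                  # multiplicative decrease on loss
        else:
            rate = min(rate + add, cap)   # additive increase when clean
        rates.append(rate)
    return rates

print(aimd_rates([False, False, True, False]))
# 100 -> 110 -> 120 -> 105 -> 115 under these assumed constants
```

Note there is no RTT anywhere in the update, so two such flows with very different RTTs converge toward similar rates, unlike TCP's window-per-RTT dynamics.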
Good protocol is one that can be widely deployed. UDT/SABUL can only do things at user level so cannot do all the sophisticated tuning that kernel level TCP is able to do. There may be a group who design protocols that are for long term and can run anywhere, as opposed to those developing protocols for dedicated more specialized purposes such as high performance data distribution/mining.
Measurements: the pervasive theme is end-to-end performance. Importance of cooperation for seamless performance, SNMP everywhere for the greater good, test points to divide and conquer, support for inter-domain trouble-shooting, what box to install, need standard protocols and schemas. Do an experiment to monitor 2+ Jetnets together with an explicit timeline; start with communities that care (e.g. Abilene, ESnet, EOS) but must be widely applicable; install the best interoperable platform; push boxes to end sites; see if it helps problems (define success/failure clearly). Unclear how JET can assist for international collaboration. Recommend chartering a technical committee to lead & design the deployment of measurement infrastructures to measure end-to-end performance among and across this community of networks and their users. Second recommendation: contribute to & adopt the GGF NMWG schemas or equivalent for sharing network data.
Transport protocols: TCP or not TCP; need instrumentation, better congestion control, remote direct placement to reduce overhead. UDP replacements: Netblt, RBUDP, Tsunami, SABUL/UDT ... Role of JET: no clear winner; encourage experimentation, lead/foster deployment of new protocols and network technologies. Recommendation: JET nets should foster the development and deployment of new protocols and network technologies.
International Overview: how are international connections coordinated overseas? Little coordination; need JET to improve coordination. Cost sharing is an option; can JET help in match-making between international partners or communities? Look at underserved regions of the world.
Big plug for Bro
Biology analogy to system defense: a new approach. Needs a model of the healthy state, self-learning state determination (infant immunology), notions of self vs. non-self (foreign bodies, as opposed to signature-based detection). Active response is part of the immune system; false positives have a high cost, so need to control active response and determine that damage is being done before active response is initiated. Notion of sacrificial cells. Advantages: can respond to new, unknown attacks; reduced false-positive rates; automated response. Disadvantages: sacrificial cells (less important systems) get sick; attacks against the immune system itself; can it be made practical? Recommend additional research and deployments in honeynets; diversified JET environments provide beneficial test sites.
Authenticated firewalls: deny all is unrealistic in open high performance research community. Need to open specific holes for specific time (short-term application specific holes in firewall). Social & political issues are daunting.
Need adequate AAA for implementing JET goals, needed for international connectivity, measurements. Globus security model is extensible & flexible seems to be capable of meeting most security policies. Security & performance are usually trade-offs.
Recommendations: connect security researchers & security policy makers with the requirements of JET networks; separate technical security issues from political security issues; can we adopt the Globus security model among JET networks (or something interoperable)? Is it time for another CIS-like program?