Authors: Les Cottrell. Created: September 27, 2000
|Introduction||Reduced Complexity Models||Analysis of Internet Topology||Backplane||Adaptive hierarchical net M&S||Online simulation & control||NetLets||UCLA modeling and control||Breakout on Interface models|
|Smart Networks||Multiscale traffic monitoring||Tracking metamorphic structure||Day 1 taking stock||Hybrid fluid models||TelScope||Breakout on monitoring||JavaSim||Discussions|
|Internet Theory||Understanding net performance||SSFnet||Cisco view of S&M||SAMAN||Distributed QoS control||Breakout on modeling||Information Assurance|
|Modeling & Simulation||XIWT||Maya||Fluid Methods||Scalable online Net M&S||Online simulation network control||Bellcore research||Breakout for experiments|
The goal is to create network modeling & simulation tools that are trustworthy enough to provide a basis for on-line prediction and control. Models need to be fluid, multi-scale, multi-resolution, and able to deal with fast-changing traffic and connectivity needs.
The tasks are
The idea is that models are integrated into experiment design and fast simulation, and these in turn feed back into modeling.
PowerPoint presentations will be made available.
Three people at Cisco (including Karl Auerbach) and about 12 at UCB.
Today only basic measurements are used for net ops. Use more advanced measurements to improve planning, provisioning, resource allocation to different classes of service, and fault isolation.
Main foci are: Diffserv for network planning, CoS provisioning and operations (e.g. bandwidth brokers, tuning); MPLS for route selection algorithms, robustness to link/node failures, and selection of backup routes; Diffserv/MPLS integration; and setting up a test bed to implement and try these out, providing proof of concept and small-scale experimentation.
Vision is to have network devices that can make measurements that feed info to tools that provide information to the operator and designer. Models use stochastics, applied statistics, simulation techniques & control systems. Measurements > model fitting > decisions.
Diffserv admission control policy is based on worst-case statistics.
MPLS labels packets; routers forward based on label/input port and assign a new label, allowing them to bypass the IP routing protocol. RSVP is the signaling protocol used to establish an MPLS path. Traffic engineering issues include what the right policy is for path selection; this requires knowledge of traffic statistics etc. The selection may be done online or offline, and it may be centralized or distributed. It needs to be robust to failures (e.g. reserve 2 disjoint paths, an active and a backup). How does one re-optimize (especially for mobile hosts)? How does one choose paths with multiple criteria?
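The active/backup pair above is often approximated by a simple two-step heuristic: compute a primary shortest path, prune its links, and compute a backup over what remains. A minimal sketch under that assumption (the graph, costs, and pruning rule are illustrative, not from the talk):

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over a dict {node: {neighbor: cost}}; returns a node list or None."""
    dist = {src: 0}
    prev = {}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    if dst not in dist:
        return None
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1]

def disjoint_pair(graph, src, dst):
    """Primary path, then a backup computed after pruning the primary's links."""
    primary = shortest_path(graph, src, dst)
    if primary is None:
        return None, None
    pruned = {u: dict(nbrs) for u, nbrs in graph.items()}
    for a, b in zip(primary, primary[1:]):
        pruned[a].pop(b, None)       # remove the link in both directions
        pruned.get(b, {}).pop(a, None)
    return primary, shortest_path(pruned, src, dst)

# Toy 4-node topology: A-B-D is cheapest, A-C-D survives as the backup.
net = {"A": {"B": 1, "C": 2}, "B": {"A": 1, "D": 1},
       "C": {"A": 2, "D": 2}, "D": {"B": 1, "C": 2}}
active, backup = disjoint_pair(net, "A", "D")
```

Note that successive pruning is only a heuristic; it can fail to find a pair that a joint (Suurballe-style) computation would find.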
Theory requirements need to address complex multi-scale phenomena; the new science of complexity lacks rigor. Fluid, solid, statistical & quantum mechanics all have major unresolved fundamental problems, e.g. shear flow turbulence. The research plan is to develop a more unified and versatile theory. Complex systems in biology and technology are driven by design to high scale and need to be robust to changes/perturbations. 1/(1+x/10). Log-log plots for cumulative probability. Fat-tail distributions are very common (e.g. power outages, forest fires, web files, Unix files, CPU utilization, word rank, populations of cities). Power laws are more likely than Gaussians (limit theorems produce both Gaussians and power laws). Fat tails are great for control if exploited, but hard to exploit if control is decentralized; one needs to design for fat tails. The origin of fat tails in web traffic is an intrinsic property of human information (i.e. not an intrinsic property of the web; it appears, for example, in libraries too). Alpha ~ 1/beta (beta is the dimension, alpha is the power-law slope). Fragilities happen in systems where there are users, e.g. in the garment industry the equivalent of IP is sewing and cloth, the applications are clothes, and changes from users such as fashion changes mean clothes go out of style rapidly (before they wear out); this leads to fragility for the clothing industry. Need feedback to control fragility (attenuate disturbances). Yet the more robust a system becomes by adding negative feedback in important areas, the more fragile it becomes elsewhere.
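The log-log cumulative-probability plots mentioned above can be turned into a numeric estimate of the power-law slope alpha by a straight-line fit to the empirical CCDF. A sketch on synthetic Pareto data (the sample size, tail index, and least-squares fit over the full range are all assumptions for illustration):

```python
import math, random

random.seed(1)

# Synthetic heavy-tailed sample: Pareto with tail index alpha = 1.5, standing
# in for the fat-tailed quantities in the talk (file sizes, outages, ...).
alpha_true = 1.5
data = sorted(random.paretovariate(alpha_true) for _ in range(20000))

# Empirical CCDF P(X > x) at each sample point, then a least-squares line
# through (log x, log CCDF); its slope estimates -alpha.
n = len(data)
xs, ys = [], []
for i, x in enumerate(data[:-1]):
    ccdf = 1.0 - (i + 1) / n
    if ccdf > 0:
        xs.append(math.log(x))
        ys.append(math.log(ccdf))
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / \
        sum((a - mx) ** 2 for a in xs)
alpha_hat = -slope
```

On clean Pareto data the recovered alpha_hat sits near 1.5; on real traffic data one would restrict the fit to the tail region rather than the whole range.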
Looking at a viable strategy for success in large-scale networks. Problems are explosive growth, an immense moving target, heterogeneity, and complex user behavior & dynamics. But it should be possible; in our favor are the highly engineered structure with a layered architecture, routing, and unique measurement capabilities. Computers can measure, but expect surprises; the weak point is exploiting the measurements, i.e. how to analyze them. Challenges are heterogeneity, constant change, and data volumes. Statistical inference is dead; the curse of traditional time-series analysis is too much focus on details. Need to focus on invariants (avoid tinkering with details) and on discovering surprises.
Modeling challenges: models must be robust under changes/different conditions. Need models that provide insight & physical understanding. The new approach is structural modeling: one exploits the context in which the data arise, aims for a physics-based description, and validates.
How does one validate without the classical approach of statistical fits? Instead, close the loop with a constructive model built from elementary mechanisms (math): the construction makes sense in the network context, the elementary mechanisms are validated empirically, and this leads to some answers and many new questions.
Self-similarity is an example. Measurement discovered an invariant (long-term correlations), so build structural models with heavy-tailed distributions and close the loop with heavy-tailed sessions/connections/flows. By comparison, the claim that TCP causes self-similarity (chaos) is based on simulated data and flawed analysis. Another example is inter-AS connectivity: the invariant is a power-law distribution (most ISPs connect to a few, a few ISPs connect to many); a first attempt at a structural model is preferential peering, but the loop has not been closed yet.
To scale up to large networks one needs to understand the user/application characteristics, which are likely here to stay, so build application-level workload generators. Want to know why heavy tails arise and where the network matters beyond self-similarity; at small time scales there are new, unexpected scaling phenomena which require knowledge of protocols and topology.
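A toy version of such a workload generator, assuming the classical structural construction (heavy-tailed Pareto ON periods alternating with exponential OFF periods; all parameters and the binning scheme are invented for illustration):

```python
import random

random.seed(7)

def on_off_source(horizon, alpha=1.5, rate=1.0):
    """One source: Pareto(alpha) ON periods (infinite variance for alpha < 2)
    alternating with exponential OFF periods; returns per-unit-time load bins."""
    load = [0.0] * horizon
    t, on = 0.0, True
    while t < horizon:
        dur = random.paretovariate(alpha) if on else random.expovariate(1.0)
        if on:
            # mark every time bin covered by this ON period
            for i in range(int(t), min(horizon, int(t + dur) + 1)):
                load[i] += rate
        t += dur
        on = not on
    return load

# Aggregate many sources; the heavy-tailed ON periods are the structural
# mechanism behind long-range dependence in the aggregate trace.
horizon, n_sources = 1000, 50
aggregate = [0.0] * horizon
for _ in range(n_sources):
    src = on_off_source(horizon)
    for i in range(horizon):
        aggregate[i] += src[i]
```

The resulting aggregate series is what one would feed into a self-similarity estimator to close the loop between mechanism and measured invariant.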
Future challenges for modeling & simulation (so far the focus has been a single link/router, exogenously generated open-loop traffic, and trivial network structure): traffic as a function of time, space, and network layer; traffic as the result of endogenous conditions (closed loop) that constantly change; realistic network structure (importance of routing).
New developments include: a new breed of network-wide measurement infrastructures, potentially with thousands of nodes (e.g. NIMI, Akamai); a new breed of measurements that are multi-point, high-volume, and high in semantic context; a new breed of network simulators; and a new breed of network experimental and theoretical research.
Goal is to develop simple models to accurately predict the performance of complex heterogeneous networks. Need to address large networks, hundreds of nodes, millions of flows, dynamics at various time scales, heterogeneous sources, different requirements, unicast vs. multicast, heterogeneous networks with wireless and wireline.
Their approach is to use user utility functions and add constraints (network available thruput, loss).
They are evaluating a queuing mechanism where they mark packets when the queue is busier than some threshold fraction (unlike RED they do not drop such packets); the marks allow end-nodes etc. to reduce thruput somehow.
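A minimal sketch of the mark-instead-of-drop idea, assuming a single FIFO with a threshold fraction of capacity (the capacity, threshold, and the deterministic mark/drop policy are assumptions; the actual mechanism under evaluation may differ):

```python
from collections import deque

class MarkingQueue:
    """Marks (rather than drops) arriving packets once occupancy exceeds a
    fraction of capacity, so end hosts can slow down without suffering loss."""
    def __init__(self, capacity=100, mark_fraction=0.5):
        self.q = deque()
        self.capacity = capacity
        self.threshold = mark_fraction * capacity

    def enqueue(self, pkt):
        if len(self.q) >= self.capacity:
            return "dropped"           # only a completely full buffer forces a drop
        if len(self.q) >= self.threshold:
            pkt["marked"] = True       # congestion signal instead of a drop
            self.q.append(pkt)
            return "marked"
        self.q.append(pkt)
        return "queued"

# 12 back-to-back arrivals into a 10-slot queue with a 50% mark threshold.
q = MarkingQueue(capacity=10, mark_fraction=0.5)
results = [q.enqueue({"seq": i}) for i in range(12)]
```

The first half of the buffer admits packets cleanly, the second half marks them, and only overflow drops; this is the same division of labor ECN later standardized.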
Objective: scalable edge-based tools for on-line network analysis, modeling & measurement, based on multi-fractal ideas. Want to be realistic and analytically tractable. The model uses probes to measure the RTT at different intervals (probes sent in pairs: the first pair back-to-back, the next separated by packet size / bottleneck bandwidth, then doubling 6 times), based on the time granularity of the fractal, so multiple time intervals are probed. The idea is that these probes are close enough together to measure the fractal structure of the cross-utilization (i.e. the competing traffic).
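Reading the pair separation as one packet transmission time at the bottleneck (size divided by bandwidth), doubled six times, the dyadic probe schedule could be computed as follows (the packet size and bottleneck rate are illustrative, not from the talk):

```python
def probe_spacings(packet_size_bytes=1500, bottleneck_bps=10e6, doublings=6):
    """Probe-pair separations: back-to-back first, then one transmission time
    at the bottleneck, doubled `doublings` times, giving a dyadic set of
    timescales for the multifractal cross-traffic analysis."""
    base = packet_size_bytes * 8 / bottleneck_bps  # seconds to serialize one packet
    return [0.0] + [base * 2 ** k for k in range(doublings + 1)]

spacings = probe_spacings()  # 8 pair separations, 0 then 1.2 ms up to 76.8 ms
```

Each doubling probes the queue at one coarser dyadic scale, which is what a wavelet/multifractal analysis of the cross-traffic needs.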
TI has an unannounced super high speed network measurement engine that will run at Gbit rates.
Goals: web based repository with access to real data with interactive graphical and statistical analyses that can be extended
It is difficult to get good measurements, since one needs permission from ISPs to measure in the core; the way in might be to build tools that would benefit them. Current tools lack well-defined traffic metrics (e.g. to support SLAs or billing). The taxonomy includes: topology, performance, workload, routing. One can get data in all categories but there is little correlation between categories.
Topology is being done in the grand scale by Lucent (Cheswick), and CAIDA/Skitter. Skitter has 22 monitors. The data analysis is difficult.
Traces for performance http://moat.nlanr.net/Traces/ (OC3/OC12 "real" networks), http://ita.ee.lbl.gov/html/traces.html (mainly campus/corporate nets).
Includes work from AT&T (includes Walt Willinger), Renesys, Dartmouth & ...
Hard problems are: drowning in data, need new sophisticated tools. How to manage thousands of probes. How to scale to future Internet. Need to know which details to shed.
SSFNet (Scalable Simulation Framework) is modeling & simulation infrastructure for the Internet. Goals: identify invariants to develop a new network theory; routing and E2E traffic dynamics; space-time correlations; scaling to 1000s of nodes. Defined terms: black box (don't know how the Internet works, just measure effects) and glass box (model the Internet). The modeling software (in Java) is available at www.ssfnet.org as open source.
This is work being done at UCLA & Caltech.
Objective: improve the operation of large-scale heterogeneous networks by orders of magnitude. Want to use on-line simulation for on-line/real-time control. They also want to understand network stability.
Increasing window size causes more collisions between data packets and ACKs traveling in opposite directions.
Want a box in your lab that can emulate the Internet. Then one can develop applications and experiment with live applications without needing a real network. They need large-scale detailed models of the emulated network for improved emulation accuracy and understanding of network behaviors (solution: parallel & distributed real-time execution of discrete-event models).
Need plug & play methodology for heterogeneous emulation/simulation tools.
Need repeatable execution capability for improved debugging and testing support.
Idea is to provide a software backplane to plug in various simulation packages (e.g. Opnet, PARSEC, ns2) so they will inter-operate.
Other examples of network emulation are NISTnet, the Ohio Network Emulator, and dummynet, with a focus on detailed simulation of the underlying network. They can get up to 256K nodes in the routing table of ns2 using NixVector to reduce the storage needed for complete BGP tables at all nodes.
Models: 3 models: SLA/Diffserv, moderate maturity (UCB); MF/adaptive probing, moderate maturity (Rice/ATT); markers, moderate/low maturity (UIUC).
Tools: power-law CDF based (UCR). Where do we take the power laws operationally; how does one exploit power laws vs. say exponential/Poisson? E.g. can a Cisco router put out the power-law coefficient rather than having to put out all the raw data, and does it make sense to develop routers that take advantage of power laws? The idea of parameterizing a complex system by a single parameter is a big simplification but can be very valuable; e.g. the Dow Jones index is used to characterize the economy and then as input on how to change other parameters (e.g. interest rates) that affect/control the economy.
Are the measurements sufficient? Are they familiar enough to modelers? Are people writing SLAs based on PingER data? CAIDA + passive data: predict end-to-end performance; passive GPS may come easily for DoD nets, so can this be exploited? Want more details on the ATT work (Fred True).
Large-scale simulations: how does one validate, does it scale, is it non-linear? For tomorrow, hook up a simulator/emulator with a specific experiment based on CONOPS and specific demonstrations. Simulation/emulation is important for protocol design, but Sri is unclear that this is a DARPA focus since the commercial folks will address it. One needs not just to simulate well; one also needs to understand the "physics" of what is going on, e.g. understanding self-similarity in terms of on-off effects etc.
Cisco does provide support for research on topics of current & future interest to Cisco. This provides a venue for risky or orphaned projects. They have 19 experts on a University Research Board. The awards are in the range $20-100K, they have 33 awards to 25 institutions. The grants can be used to leverage DARPA, NSF awards. See www.cisco.com/research/
Need real edge-to-edge coverage; past models were too simple (IP only, only vBNS). Need to incorporate business models, and technology that has not been factored in, e.g. ATM cell drop.
S&M used for analysis & design, real time configuration & control. Industry wants useful & realistic tools. Theory vs. experimentation e.g. Poisson vs. self-similar traffic. Need validation of models with real data & network topologies.
Need to understand nodes such as routers with 1000s of interfaces, how do they support 128bit IPv6 addresses at 10 Gbit speeds ...
Why modeling will get tougher:
It is going to get tougher with: SLAs cascading over clouds & ISPs; diffserv/intserv in all varieties; multi-plane IP over x, y, z; compare/contrast IP over lambdas, IP over ATM over SONET, over glass, ...; routing protocols (BGP, OSPF, ...); policy-based services (bandwidth brokers); not just best-effort but also less-than-best-effort traffic (scavenger QoS), streaming, latency-sensitive traffic, etc.; tunneling (IPSEC, MPLS, ...); multi-path IP & MPLS, load balancing; simulation/modeling in real-time loops; modeling for worst-case scenarios, e.g. for diffserv-like QoS support in clouds; data gathering at the high end, or sampling (what is a good sampling algorithm, and how does one validate it?). And then there is multicast, which used to be 1:N for large N and is now becoming A*(B:C) for large A, small B & C. Then add in mobile IP, a special case of tunneling between Home Agents and Foreign Agents: what happens when 50% of a network is mobile/nomadic? Add in micro/pico nets (e.g. a body network with a belt router providing access for PDA, cell phone, pager etc.). And active networks (networks where the routers are reconfigured on the fly, very interesting to the military): how do they affect S&M?
Issues for researchers:
Need to model SLAs: how to deal with cascading & dynamic SLAs; statistical sampling & verification of models vs. capturing all data; modeling edge-to-edge or e2e & how they interact with core clouds; models & tools to compare/analyze IP/ATM/SONET/glass vs. IP/glass.
Tools to compare/contrast/validate highly available systems, i.e. can we model cost of added HW/SW vs. benefit of higher availability. How to characterize convergence of 1 plane then multi-plane (MPLS, routing flaps, dynamic net reconfiguration). Can we determine heuristics to build simple yet meaningful models.
Super simulation centers: morph the Utah center, but there is a potential risk of non-realistic/unfair models. IPv6 vs. IPv4 routing and forwarding table lookup (128-bit vs. 32-bit), memory schemes etc., nodal analysis. Generic ISP topologies (edge & core). Real representative data for verification.
Can we model/characterize rapid convergence after flaps? Can we model content distribution & routing services and show optimization? How can we characterize the error between model and network so that we know how much effort to put into refining the model?
In summary, we need realistic, validatable models. Need real data; I2 nets won't cut it. New network models: intelligence at the edges, multi-plane, mobile, persistent presence, ubiquitous computing ... Edge, enterprise, campus, and core nets. It's only going to get tougher as networks get more complex.
Simulated a network with 5-10 routers, some having RED, and looked at queue lengths and thruput using 2 methods: one simulation via a fluid model, the other via a fixed-point method. Fluid flow was 60 to thousands of times faster (the exact factor depends on the number of packets, i.e. the speed of the links). Compared with ns for loss (ns is supposed to be an accurate though time-intensive simulation); it looks like they get good agreement. Looked at the effect of RED on queue lengths: it appears RED is too slow in providing feedback and so results in oscillations. Instead they propose a method that uses the queue growth (a differential method) to predict when to drop a packet; it looks much better since it has faster feedback.
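The queue-growth (differential) drop idea can be sketched as a threshold test on queue length plus a projected growth term, so a fast-growing queue triggers drops before the length itself crosses the threshold (the gain, threshold, and sample trace here are invented for illustration, not the authors' actual controller):

```python
def differential_drop(queue_samples, threshold=50, growth_gain=5.0):
    """Decide drops from queue length plus its growth rate: a rapidly growing
    queue triggers early drops, giving faster feedback than a pure threshold."""
    decisions = []
    prev = queue_samples[0]
    for q in queue_samples:
        growth = q - prev
        # effective congestion signal = current length + projected near-term growth
        decisions.append(q + growth_gain * growth > threshold)
        prev = q
    return decisions

# A ramp still below the 50-packet threshold, but growing fast enough
# that the differential term fires early.
ramp = [10, 20, 30, 40, 45]
decisions = differential_drop(ramp)
```

A pure length threshold would never fire on this trace; the derivative term is what supplies the earlier feedback the notes describe.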
Want to evaluate impact of models on performance etc. Thinking of real-time control. In order to do real time need to have a minimum complexity model. One of the models uses wavelets, will look at multi-fractals with different time scales.
Motivation: with new high-speed networks carrying compressed video and file transfer, network modeling & analysis technologies are urgently needed for network control (admission & congestion), planning, etc.
Current analytic models cannot capture burstiness & are overly optimistic. Simulation of complex networks is not feasible or takes too long and requires too much data to be stored.
Fluid-flow events correspond to rate changes, which occur far less frequently than discrete packet arrivals; this provides an aggregation simplicity that makes the approach good for large-scale simulations. The model uses "cans" with inputs and a hole for output: one models input flows, service rate, and capacity (size of can and holes).
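A minimal fluid "can" integrator along these lines, advancing the level only at rate-change events rather than per packet (the event list, rates, and capacity are made up; a real fluid simulator would also generate internal events at empty/full crossings):

```python
def fluid_can(events, service_rate=1.0, capacity=100.0):
    """'Can' with an output hole: integrate the fluid level between input-rate-
    change events instead of simulating every packet.
    events = [(time, new_input_rate)], times increasing."""
    level, t_prev, rate = 0.0, 0.0, 0.0
    trace = []
    for t, new_rate in events:
        # net fill rate is constant between events, so the level is linear in time
        level += (rate - service_rate) * (t - t_prev)
        level = min(max(level, 0.0), capacity)  # can neither underflows nor overflows
        trace.append((t, level))
        t_prev, rate = t, new_rate
    return trace

# Three rate-change events replace what would be thousands of per-packet events:
# fill at rate 3 for 10 s, drain slowly for 20 s, then shut off.
trace = fluid_can([(0.0, 3.0), (10.0, 0.5), (30.0, 0.0)])
```

With a unit service rate the level climbs to 20 by t=10 and drains back to 10 by t=30; three events capture the whole trajectory.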
Network robustness is a key challenge: understanding, predicting & avoiding failures, and planning failure recovery strategies. SAMAN is designed to address this. What if a link becomes overloaded (today this is discovered through manual monitoring)? Need a good model to assist in understanding. Loss of one router can cause a cascading failure. Need a simulator to explore several failure scenarios.
Specific failure conditions: fail-stop, e.g. back-hoe; traffic overload (maybe benign cause or DOS); cascading failures. Can simulate end user performance as packet loss increases.
Key results sought are failure prediction; increased protocol robustness (studying how protocols behave at the edges of their operating limits); network early-warning systems (tools to predict imminent network failures and trigger preventive or corrective actions); and a clear mapping from tools to specific failures.
Expect to be a set of model generation tools. Application driven models to capture application level dynamics (feedback, user behavior), validated and applicable across wide range of time scales.
Builds on the ns simulation environment, which has a broad community of support. Large simulations require abstraction (e.g. fluids, or centralized route computation rather than requiring each node to calculate distributed BGP tables).
Modeling real-audio traffic as an example of a streaming application. At gross scale it looks like a CBR (Constant Bit Rate) source, but the quartiles of packet separation show some quantization at 1.8 seconds, so this needs to be added to the model. Currently validating the model.
Want to scale simulation to multiple domains & hundreds of thousands of flows and look at 2nd-order traffic & routing control. Need to speed up simulation so results can be used for traffic management, i.e. to tune the parameters of routing and traffic management. Clone the current network, put the clone in the simulator, adjust parameters to find the optimum settings, then feed the parameters back into the real network. Compute farms can be used to calculate different parameter settings in the simulator. Showed the result of feedback on the impact of buffer-size variability with RED; the process converges in 5-10 iterations. Simulation time of a single iteration decreases super-linearly with the number of simulated domains. Different flavors of TCP implementation can be used in the models.
A single domain might be 200 links, each at 100 Mbps and 40% utilization, with 200-byte packets (header & data); each packet is a single event, so a 1-hour simulation creates 18 billion events, or 5M events/second.
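The back-of-envelope event count in the notes checks out; reproducing the arithmetic:

```python
# One simulated domain, as sized in the notes.
links = 200
link_bps = 100e6          # 100 Mbps per link
utilization = 0.40
packet_bits = 200 * 8     # 200-byte packets (header & data)

busy_bps = links * link_bps * utilization   # 8 Gbit/s of carried traffic
events_per_sec = busy_bps / packet_bits     # one event per packet: 5M events/s
events_per_hour = events_per_sec * 3600     # 18 billion events per simulated hour
```

This is the scale argument for fluid and other abstractions: even one modest domain generates billions of per-packet events per simulated hour.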
Obtained from real traffic experimental evidence of the presence of self-similarity in individual long TCP flows, with a direct relationship between loss rate and self-similarity. Developed a math model for self-similarity based on the micro-structure of TCP. Want to further validate the TCP micro-structure model's predictions of self-similarity over aggregate flows.
Potential effect of adaptive routing can result in a large amount of traffic for routing updates.
Big emphasis on National Security for event/attack detection and re-planning against attack. One of needs is for a rapidly deployed mobile evolving network for National Guard in case of a nerve gas attack on a community. For bigger operations it requires cross forces (Army, Navy, Marines, Air Force plus coalition) joint network management (e.g. for Mid-East), also network needs to rapidly move forward with advance of forces, needs to be secure and may be interrupted (e.g. by counter-attack, by dropping/destruction of equipment).
Product fielded by AFRL. Provides accurate communication route analysis against the user base, identifies alternate paths for user traffic in the event of communication failures, helps locate fragile portions of the network, and has a friendly user interface. Uses the OPNET wizard (thinking of dropping it because of cost); runs on NT & Solaris, fed from flat files; much faster; $8K/seat plus maintenance (20% of cost) to deploy. Can model 6K to 15K nodes. Input is network infrastructure and user lay-down. No users outside government and government contractors. Modeling input data is XML-like, exported from a DB to a flat file. Has web pages and on-line help, with help links back to the configuration management site.
Mainly looking at IntServ so far. Wants to use economic theory to minimize the amount of information that needs to be exchanged to provide QoS, so it talks of user benefit and costs. The products being purchased are effective bandwidth and buffer sizes.
Goal is to incorporate traffic simulation into real-time control. Assume persistent controlled sources, fluid traffic with a continuous transmission rate, periodic control epochs, non-negligible RTT, and high-priority cross traffic from a known Markovian model. The action (u) is the transmission rate; the state (x) is the queue size, cross-traffic state, and history of rates; and there is a reward structure:
R(x) = T(x) - aD(x) - bF(x)
T = thruput, D = delay, F = fairness penalty; want to find a policy that maximizes the reward. They estimate Q(x,u) for different future cross-traffic (i.e. the traffic that is not high priority) service rates and calculate the optimal reward. Seems to provide good results for loss, thruput etc.
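A toy rendering of the reward structure and of choosing the transmission rate u that maximizes it (the simulator stand-in, the weights a and b, and the candidate rates are all invented; the actual work estimates Q(x,u) over Markovian cross-traffic rather than this one-step lookahead):

```python
def reward(throughput, delay, fairness_penalty, a=0.5, b=0.1):
    """R(x) = T(x) - a*D(x) - b*F(x), the talk's reward structure."""
    return throughput - a * delay - b * fairness_penalty

def best_rate(candidate_rates, simulate):
    """One-step lookahead: simulate each candidate rate u, score the
    resulting (T, D, F), and pick the reward-maximizing action."""
    scored = [(reward(*simulate(u)), u) for u in candidate_rates]
    return max(scored)[1]

# Toy stand-in for the online simulator: higher rates raise throughput,
# but queueing delay grows once past an assumed available bandwidth of 10.
def toy_simulate(u):
    throughput = min(u, 10.0)
    delay = max(0.0, u - 10.0) ** 2
    return throughput, delay, 0.0

chosen = best_rate([2.0, 6.0, 10.0, 14.0], toy_simulate)
```

Under this toy model the delay penalty exactly cancels the throughput gain above the available bandwidth, so the policy settles on the 10.0 rate.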
Uses a daemon between the network and applications to choose the best routes. There is a server at the remote node to reflect the data back to the daemon. The data is sent as a TCP stream with random sizes (up to 10,000 * 50-byte lines); then plot RTT/2 vs. bytes sent. Assumes the best path is the one with the best bandwidth, or at least that is currently what he is optimizing; this is probably OK since he is using TCP flows, which are inherently affected by loss. The daemon measures RTT by sending multiple packet sizes, and the idea is that the application talks to the NetLet, which decides the best route to the data. The daemons at each host/site will have a built-in router which provides an infrastructure for the NetLets to choose alternate routes.
Major sources of information are:
XIWT has a history of data available and is looking for what to measure next to help emulation.
What to measure:
One way to get real data for comparison with simulations would be to make measurements on ESnet. ESnet is a manageable size with a known architecture, and we have access to the network admins, so we may be able to get router configs, SNMP utilization, and netflow information. Modelers want to validate their models. The kinds of information needed are: complete topology with routing & other settings for the network; traffic source/destination and average number of hops, on a 30-second basis, in packets and bytes (is there a privacy issue?); TCP/UDP port data at the source; and router and switch settings. For validation one needs thruput, RTT, and loss; traceroute is also wanted. There is no existing data showing all of the above, with the possible exception of ATT. Unsure how much granularity one needs: does one need complete flows, can the data be aggregated, and who will do the aggregation? If raw data is needed, it will require a large data warehouse and management etc. ATT (Fred True) has a lot of data, but is it available, and in what form? Unfortunately he was not at the meeting.
One of the goals is to be able to build network aware applications.
Guidelines for demonstrations: context & scenarios with drastically different workloads; removal of part of the infrastructure (for example, how long does it take to heal after a route change?); emergency battlefield operations; tickling a hidden fragility; improving the state of the art of S&M by showing improved capability.
Need to show a real problem with animation: operational, configuration (initial, planning, evolving), e.g. a cascading instability, needs realism, historical catastrophic data.
Validation & accuracy metrics are very difficult to define in the abstract; there is no single answer. Need to specify examples of what is needed to validate, such as cross-validation across multiple levels of detail and time-scales, and failure re-creation with a lab set-up.
Question asked was "where do we go from here?" Need multi-point measurements; structural adaptive data-driven user-models & multi-level abstractions; measure QoS; close the loop via planned networking experiments. Participants will prepare a list of demos.
Want to answer the question: at what timescale should measurements be collected for reliable estimation, assuming traffic is described by fractional Brownian motion (1-second sampling suffices for capacity estimation: 1-minute sampling gives an error of 20%, while 1 second gives 2-4%). Also doing work on MPLS and WFQ for differentiated SLAs; on delays seen by customer traffic vs. delay measured by probes (work in progress, with a paper hoped for in the next year; it is related to traffic arrival times, and they show that probes underestimate the delay unless they sample very frequently, the amount of underestimation depending on network utilization); and on estimation of the available bandwidth on a link as seen by probes from remote sites (this last is pending a patent application and so is under wraps).
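Under the fractional-Brownian-motion traffic assumption, the standard way to check the scaling from measurements is the aggregated-variance (variance-time) estimator of the Hurst parameter; a sketch on white noise (the series length, scales, and least-squares fit are arbitrary choices for illustration):

```python
import math, random

random.seed(3)

def hurst_aggregated_variance(series, scales=(1, 2, 4, 8, 16, 32)):
    """Variance-time estimator: for self-similar traffic the variance of
    m-aggregated means scales like m^(2H-2); fit H from a log-log line."""
    xs, ys = [], []
    for m in scales:
        k = len(series) // m
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(k)]
        mu = sum(means) / k
        var = sum((v - mu) ** 2 for v in means) / k
        xs.append(math.log(m))
        ys.append(math.log(var))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / \
            sum((a - mx) ** 2 for a in xs)
    return 1.0 + slope / 2.0  # slope = 2H - 2

# Sanity check: white noise has H ~ 0.5; long-range-dependent traffic
# measured at too coarse a timescale would show H well above 0.5.
noise = [random.gauss(0, 1) for _ in range(4096)]
H = hurst_aggregated_variance(noise)
```

The spread of H estimates across block sizes is one way to see why the sampling timescale matters: too few blocks at coarse aggregation is exactly where the 20%-class estimation errors come from.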
Developed a simulator in Java (called JavaSim) in which each entity is a component. They have modules for packets, packet headers, packet bodies, ports, routing tables, packet filters, a diffserv marker, OSPF, etc. A goal is to make the code more manageable (less spaghetti) by taking advantage of the component architecture, so one can plug components together. Tcl (integrated via Jacl, a pure-Java implementation of Tcl 8.0 with the Tcl/Java extension) is used as the glue language. They have a front end that provides an animated view of simulation results.
NSF has been supporting this; early on it was supported by DARPA. They hope to release the software into the public domain in 2 weeks and will continue to support it after release.
Is the information getting to the recipient in time, without being tampered with, and did it come from the person expected, i.e. is it trustworthy, or did the soldier get captured with the equipment? A lot of this has to do with intrusion detection, tolerance, etc. DARPA has a lot of money invested in this ($100M/year) but is concerned it is not making headway, so it is looking for fresh brains. There is a belief that network M&S can provide a big help. Big worries are cyber-terrorism and bio-warfare. The challenge is network complexity. Want to model the network during attack & recovery: how does the network change during attack & recovery; distributed DoS; Red team (which tries to determine system vulnerabilities and exploit them) evaluation and assessment; insider threat; intrusion detection datasets. Can one see what the red team is doing and develop models of it? Can one come up with ways to detect the activity of a compromised intruder? They want to simulate the development of an attack and come up with stealth, signatures, etc.
Another problem is multi-lateral security policies (Army, Navy, Marines, Air Force, various theatres etc.), how do they interact.
Output required from session:
There are many components that feed into one another: the real network generates data that goes into analysis, which goes into modeling & simulation, which goes into more analysis, which goes to control, which controls the real network. Also left out is the feedback about what new things need to be measured and how to aggregate them. Interoperability between all of these is becoming a critical issue. Another issue is a better understanding of the comparison between fluid-level and packet-level models.
What are the priorities: data collection, analysis, and model validation; finding common ground between fluid & packet simulation; how big a simulation is "big enough" (number of nodes, flows, flow class distribution). May need common disclosure.
Bob Aiken/Cisco (concerns over what happens when/if the Internet ceases to be TCP-friendly, either benignly (poor implementations of stacks, growth of UDP apps) or through deliberate denial-of-service attacks; concern over Cisco's variable quality in implementing SNMP/MIBs; LBE QoS), George Riley/Georgia Tech (modeling packet loss, streams vs. window size, and understanding the effect of reordering), Nagy Rao/ORNL (look at/mine PingER data; Thomas Ndousse is keen on ORNL doing this; run iperf servers; SLAC to run NetLet when the code is ready; compare PingER, iperf and NetLet data), Thomas Ndousse/DoE (interest in applying neural networks to analysis of network data and in using them for feedback; RFP in November), Kevin Mills/NIST (contacts for NISTnet firstname.lastname@example.org, email@example.com). Gary Warren/SAIC has been analyzing the PingER data from XIWT. Chuck Brownstein/CNRI (loss of people; looking for what to do, leaning towards measurements to assist in S&M; concern over PIs' involvement and how to assure involvement of key XIWT members; IPERF meeting in November), Rudolf H. Riedi/Rice (instrument PingER to look at fractal behavior; need to extend ping (sub-second intervals and sub-millisecond reporting) to provide wider-scale real network measurements; possible collaboration with the NIKHEF folks if ping can be extended), Bruce Hajek/UIUC (netflow measurements for ECN, ATQ).