|
|
|
|
The network is transparent by design, so it is hard to find out how it is
working. The GGF Grid High Performance Network group is trying to bring
together networkers, application writers and users by creating documents
such as “Top ten things network engineers wish grid programmers knew” and
vice versa. http://www.csm.ornl.gov/ghpn/
|
|
|
|
Understanding is hard: the network is immense and a moving target, and
traditional mathematical tools (e.g. Poisson distributions) don’t work.
Researchers are looking for invariants and need parsimonious models. See
Vern Paxson’s work, e.g.
http://www.icir.org/vern/talks/vp-painfully-hard.UCB-mig.99.ps.gz
|
|
|
|
The top three networking problems, according to a paper by Claudia DeLuna of
JPL, are Ethernet duplex, host configuration and bad media.
|
|
|
|
A breakdown of failure causes for 3 Internet sites indicated that 51% were
caused by operator error (“Self Repairing Computers”, Scientific American,
June 2003).
|
|
|
|
Reviewing user-reported, long-lasting WAN problems seen at SLAC over the
last two years (typically lasting days, i.e. not including router reboots or
time out for reconfiguration), the biggest contributors (30%) were a
combination of misconfigured routers (loose unicast RPF filters, wrong buffer
size, poorly chosen backup route), misconfigured switches (needed reboot, PVC
incorrectly rate limited) and firewalls (limiting throughput, resetting the
window scaling option). Note that these are mainly engineering problems or
bugs, as opposed to problems we need to research in order to know how to fix
each one individually. However, we do need to investigate how to accurately
and automatically identify and report the location and cause of such
problems for the end-user.
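
As a rough illustration of why a firewall that resets the window scaling
option limits throughput, the sketch below computes the usual window-limited
bound for a single TCP stream (at most one window per round trip). The
function names, the 150 ms round-trip time and the shift value of 7 are
hypothetical choices for illustration, not figures from the SLAC cases.

```python
# Upper bound on single-stream TCP throughput: one window per round trip.
MAX_UNSCALED_WINDOW = 65_535  # bytes; limit of the 16-bit TCP window field


def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Return the window-limited throughput bound in Mbit/s."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1e6


def effective_window(advertised: int, shift: int, scaling_ok: bool) -> int:
    """Return the usable window in bytes.

    If a middlebox strips or resets the window scale option, the shift is
    lost and the window is capped at the unscaled 16-bit value.
    """
    if not scaling_ok:
        shift = 0
    return min(advertised, MAX_UNSCALED_WINDOW) << shift


if __name__ == "__main__":
    rtt_ms = 150.0  # hypothetical transatlantic-style round-trip time
    for ok in (True, False):
        window = effective_window(65_535, shift=7, scaling_ok=ok)
        bound = max_throughput_mbps(window, rtt_ms)
        print(f"scaling_ok={ok}: window={window} bytes, bound={bound:.1f} Mbit/s")
```

Under these assumed values the bound drops from several hundred Mbit/s to
about 3.5 Mbit/s once scaling is lost, which is the kind of symptom an
end-user sees without being able to tell where on the path it was introduced.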
|
|
|
|
|