The network is transparent by design, so it is hard to find out how it is working. The GGF Grid High Performance Network group is trying to bring together network engineers, application writers and users by creating documents such as “Top ten things network engineers wish grid programmers knew” and vice versa. See http://www.csm.ornl.gov/ghpn/

Understanding the network is hard: it is immense and a moving target, and traditional mathematical tools (e.g. Poisson distributions) don’t work. We are looking for invariants and need parsimonious models. See Vern Paxson’s work, e.g. http://www.icir.org/vern/talks/vp-painfully-hard.UCB-mig.99.ps.gz

The top three networking problems, according to a paper by Claudia DeLuna of JPL, are Ethernet duplex, host configuration and bad media.

A breakdown of failure causes for 3 Internet sites indicated that 51% were caused by operator error (“Self Repairing Computers”, Scientific American, June 2003).

A review of user-reported, long-lasting WAN problems at SLAC over the last two years (typically lasting days, i.e. not including router reboots or time out for reconfiguration) showed that the biggest contributors (30%) were a combination of misconfigured routers (loose unicast RPF filters, wrong buffer size, poorly chosen backup route), misconfigured switches (needed reboot, PVC incorrectly rate limited) and firewalls (limited throughput, reset the window scaling option). Note that these are mainly engineering problems or bugs, as opposed to problems that need research to know how to fix each one individually. However, we do need to investigate how to accurately and automatically identify and report on the location and cause of such problems for the end-user.
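
To illustrate why a firewall that resets the TCP window scaling option can cap throughput on a WAN path, here is a minimal sketch of the standard window/RTT bound. The 150 ms round-trip time and the 1 Gbit/s link speed are assumed values chosen for illustration, not SLAC measurements, and the function names are hypothetical.

```python
# Sketch: throughput ceiling of one TCP stream when window scaling is disabled.
# Assumptions (not from the notes): 150 ms RTT, 1 Gbit/s path capacity.

MAX_UNSCALED_WINDOW_BYTES = 65535  # largest TCP window without the window scale option


def window_limited_throughput_bps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on throughput: at most one window of data per round trip."""
    return window_bytes * 8 / rtt_seconds


def required_window_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps * rtt_seconds / 8


rtt = 0.150   # assumed round-trip time in seconds
link = 1e9    # assumed path capacity in bits per second

capped_bps = window_limited_throughput_bps(MAX_UNSCALED_WINDOW_BYTES, rtt)
needed_bytes = required_window_bytes(link, rtt)

print(f"Ceiling without window scaling: {capped_bps / 1e6:.2f} Mbit/s")
print(f"Window needed to fill the path: {needed_bytes / 1e6:.2f} MB")
```

Under these assumptions a single stream is held to roughly 3.5 Mbit/s regardless of link speed, while filling the path would need a window of nearly 19 MB. The end-user sees only low throughput, with nothing to indicate where on the path the option was reset.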