Method
- Measurement
- Each collection site keeps a list of remote hosts to ping at the sites it is interested in
- Every 30 minutes, ping each remote host with 11 * 100-byte pings followed by 10 * 1000-byte pings
- Minimum separation between pings is 1 second; timeout is 20 seconds
- Throw away the first ping (commonly inflated by ARP and route-cache warm-up)
- Measure response time, packet loss, and host unreachability (no answer to any ping)
- Record the data and make it available
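The measurement cycle above can be sketched as follows. The function names, the summary dict, and the iputils `ping` flags are illustrative assumptions, not the actual pingdata code; `-s` sets the ICMP payload size, which is how the "100-byte" and "1000-byte" probes are approximated here.

```python
# Probe plan from the method: 11 x 100-byte pings (first discarded),
# then 10 x 1000-byte pings; 1 s between pings, 20 s timeout.
PROBES = [(100, 11), (1000, 10)]

def ping_cmd(host, size, count, interval=1, timeout=20):
    # Standard (iputils) ping flags: -c count, -s payload bytes,
    # -i seconds between pings, -W per-reply timeout in seconds.
    return ["ping", "-c", str(count), "-s", str(size),
            "-i", str(interval), "-W", str(timeout), host]

def summarize(rtts_ms, sent, drop_first=True):
    """Throw away the first ping, then compute loss and RTT stats."""
    if drop_first and rtts_ms:
        rtts_ms = rtts_ms[1:]
        sent -= 1
    received = len(rtts_ms)
    loss = 1.0 - received / sent if sent else 1.0
    stats = {"sent": sent, "received": received, "loss": loss}
    if rtts_ms:
        stats.update(min=min(rtts_ms),
                     avg=sum(rtts_ms) / received,
                     max=max(rtts_ms))
    return stats
```

A cycle would run `ping_cmd` once per entry in `PROBES` for each remote host, feeding the parsed RTTs into `summarize`; a host that answers no ping in either burst is recorded as unreachable.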
Notes:
Use standard ping. Looked at variations from NIKHEF & others; the differences were minor, and a modified ping means collection sites must install it & run it as root.
“Thou Shalt Jitter Your Timers” - Van Jacobson
Should run ping probes at random times, with a mean interval of 30 minutes (Poisson-distributed)
pingdata would have to run as a daemon
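Jittering per Van Jacobson's advice amounts to drawing exponentially distributed delays between probe cycles, which makes probe starts a Poisson process with the desired 30-minute mean. A minimal sketch (the daemon loop and `run_probes` callback are hypothetical):

```python
import random
import time

MEAN_INTERVAL_S = 30 * 60  # average of 30 minutes between probe cycles

def next_delay(rng=random):
    # Exponential inter-arrival times give a Poisson process of probe
    # starts, so independent sites cannot phase-lock onto the same
    # 30-minute boundaries and beat against each other.
    return rng.expovariate(1.0 / MEAN_INTERVAL_S)

def daemon_loop(run_probes, cycles):
    # Sketch of a pingdata-style daemon main loop; run_probes is a
    # hypothetical callback that fires one measurement cycle.
    for _ in range(cycles):
        time.sleep(next_delay())
        run_probes()
```

A fixed 30-minute cron schedule would instead sample the network at exactly the same phase every half hour, hiding any periodic behavior aligned with that schedule.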
The one-second separation reduces self-correlation effects from a packet queuing behind the previous one
A DNS lookup is done for each node in the remote-nodes list
If a whole site is down, then often the DNS server for that site is down as well
Need to cache DNS entries so we try pinging the last known good address instead of just failing when the DNS lookup fails
Issue: is it OK to treat DNS failures as 100% loss, or should we try pinging anyway?
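The caching fallback described above might look like this; the cache structure, the `(ip, fresh)` return convention, and the names are assumptions for illustration, not the pingdata implementation.

```python
import socket

# Last-known-good address cache: maps hostname -> last IP that resolved.
_addr_cache = {}

def resolve(host, lookup=socket.gethostbyname):
    """Resolve host, falling back to the cached address when DNS fails.

    Returns (ip, fresh): fresh is False when a stale cached entry was
    used because the lookup failed (e.g. the site's DNS server is down
    along with the site). Raises KeyError if the name has never resolved.
    """
    try:
        ip = lookup(host)
    except (socket.gaierror, OSError):
        if host in _addr_cache:
            return _addr_cache[host], False
        raise KeyError("no cached address for " + host)
    _addr_cache[host] = ip
    return ip, True
```

The `fresh` flag lets the caller distinguish a normal measurement from one made against a possibly stale address, which bears on the open issue of whether a DNS failure should be scored as 100% loss or probed anyway.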
Record SRC_NODE, SRC_IP, DEST_NODE, DEST_IP, TIME, PACKET_SIZE, PACKETS_SENT, PACKETS_RECEIVED, MIN_RT, AVG_RT, MAX_RT
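One record per probe burst could be serialized like this. The field order follows the list above (with PACKETS_RECEIVED spelled out in full); the space-separated line layout is an assumption, since the source specifies only the fields, not the archive format.

```python
# Field order from the record spec; one measurement per output line.
FIELDS = ["SRC_NODE", "SRC_IP", "DEST_NODE", "DEST_IP", "TIME",
          "PACKET_SIZE", "PACKETS_SENT", "PACKETS_RECEIVED",
          "MIN_RT", "AVG_RT", "MAX_RT"]

def format_record(rec):
    # Render one measurement dict as a space-separated line.
    return " ".join(str(rec[field]) for field in FIELDS)
```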