After the IP address of the iepm-maggie.niit.edu.pk host at NIIT was changed, nodes at SLAC could no longer reliably access content on it. Requests for web content from the server returned blank pages or 404 errors.
Initial investigation showed that one of SLAC's DNS servers was still returning the old address, 202.125.157.212, for iepm-maggie:
pinger@pinger $ dig @134.79.18.40 iepm-maggie.niit.edu.pk.
; <<>> DiG 9.4.1-P1 <<>> @134.79.18.40 iepm-maggie.niit.edu.pk.
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18449
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2
;; QUESTION SECTION:
;iepm-maggie.niit.edu.pk. IN A
;; ANSWER SECTION:
iepm-maggie.niit.edu.pk. 85951 IN A 202.125.157.212
;; AUTHORITY SECTION:
niit.edu.pk. 85951 IN NS ns.niit.edu.pk.
niit.edu.pk. 85951 IN NS ns2.niit.edu.pk.
;; ADDITIONAL SECTION:
ns.niit.edu.pk. 81253 IN A 202.125.157.194
ns2.niit.edu.pk. 77112 IN A 203.99.50.202
;; Query time: 1 msec
;; SERVER: 134.79.18.40#53(134.79.18.40)
;; WHEN: Thu Aug 23 10:42:45 2007
;; MSG SIZE rcvd: 124
Another SLAC DNS server returned the current A record, 202.125.157.206, which NIIT confirmed was correct:
pinger@pinger $ dig @134.79.18.45 iepm-maggie.niit.edu.pk.
; <<>> DiG 9.4.1-P1 <<>> @134.79.18.45 iepm-maggie.niit.edu.pk.
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64814
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2
;; QUESTION SECTION:
;iepm-maggie.niit.edu.pk. IN A
;; ANSWER SECTION:
iepm-maggie.niit.edu.pk. 81218 IN A 202.125.157.206
;; AUTHORITY SECTION:
niit.edu.pk. 81218 IN NS ns2.niit.edu.pk.
niit.edu.pk. 81218 IN NS mail.niit.edu.pk.
;; ADDITIONAL SECTION:
ns2.niit.edu.pk. 63458 IN A 203.99.50.202
mail.niit.edu.pk. 63458 IN A 202.125.157.195
;; Query time: 1 msec
;; SERVER: 134.79.18.45#53(134.79.18.45)
;; WHEN: Thu Aug 23 10:43:26 2007
;; MSG SIZE rcvd: 126
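The comparison between the two SLAC name servers can also be done mechanically. The following is a minimal sketch, not the procedure actually used: it assumes the dig output has been saved with its original line breaks, and the function names (`parse_answer_section`, `addresses_by_resolver`) and the abridged sample strings are illustrative only.

```python
def parse_answer_section(dig_output):
    """Return (name, ttl, rtype, value) tuples from dig's ANSWER SECTION."""
    records = []
    in_answer = False
    for raw in dig_output.splitlines():
        line = raw.strip()
        if line.startswith(";; ANSWER SECTION:"):
            in_answer = True
            continue
        if in_answer:
            # A blank line or the next ";;" header ends the section.
            if not line or line.startswith(";"):
                break
            name, ttl, _rclass, rtype, value = line.split(None, 4)
            records.append((name, int(ttl), rtype, value))
    return records


def addresses_by_resolver(outputs):
    """Map each resolver to the set of A-record addresses it returned."""
    return {
        server: {
            value
            for _, _, rtype, value in parse_answer_section(text)
            if rtype == "A"
        }
        for server, text in outputs.items()
    }


# Abridged copies of the two transcripts (hypothetical variable name).
SAMPLE_OUTPUTS = {
    "134.79.18.40": """\
;; ANSWER SECTION:
iepm-maggie.niit.edu.pk. 85951 IN A 202.125.157.212

;; AUTHORITY SECTION:
niit.edu.pk. 85951 IN NS ns.niit.edu.pk.
""",
    "134.79.18.45": """\
;; ANSWER SECTION:
iepm-maggie.niit.edu.pk. 81218 IN A 202.125.157.206

;; AUTHORITY SECTION:
niit.edu.pk. 81218 IN NS ns2.niit.edu.pk.
""",
}

answers = addresses_by_resolver(SAMPLE_OUTPUTS)
# The resolvers agree only if every one returned the same address set.
consistent = len(set(map(frozenset, answers.values()))) == 1

for server, addrs in sorted(answers.items()):
    print(server, sorted(addrs))
print("consistent:", consistent)
```

With the two answers shown, the resolvers disagree, so `consistent` is `False`.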
Machines at SLAC that used the name server with the wrong answer cached were being pointed to a different web server at NIIT, one that did not have the IEPM-BW content on it.
SLAC unix-admin was contacted and asked to flush the cache on the errant name server, but that had little effect: the bad answer was still cached upstream. We waited for the cached entries to expire.
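How long that wait should be can be estimated from the dig output: the TTL column of a cached answer is the number of seconds remaining before the resolver discards the entry, counted from the time of the query. The sketch below is an illustration only; `cache_expiry` is a hypothetical helper, and the timestamps are the WHEN values from the two queries, taken as local time on the querying host.

```python
from datetime import datetime, timedelta


def cache_expiry(observed_at, ttl_seconds):
    """Return the time at which a cached record with this remaining TTL expires."""
    return observed_at + timedelta(seconds=ttl_seconds)


# (resolver, time of query, remaining TTL of the A record in seconds)
observations = [
    ("134.79.18.40", datetime(2007, 8, 23, 10, 42, 45), 85951),
    ("134.79.18.45", datetime(2007, 8, 23, 10, 43, 26), 81218),
]

expiries = {
    server: cache_expiry(when, ttl) for server, when, ttl in observations
}

for server, expires in expiries.items():
    print(server, "expires", expires.isoformat(sep=" "))
# 134.79.18.40 expires 2007-08-24 10:35:16
# 134.79.18.45 expires 2007-08-24 09:17:04
```

If a resolver still returns the old address after its computed expiry time, the entry is presumably being refilled from a source that still serves the old record, which is worth checking before waiting any longer.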
A day later, we were still seeing inconsistent answers. We then noticed that the authoritative name servers for niit.edu.pk were not responding. NIIT was contacted and their admins brought the servers back up.
The second listed name server, ns2.niit.edu.pk, still had the old address for iepm-maggie, though:
dig @ns2.niit.edu.pk iepm-maggie.niit.edu.pk.
; <<>> DiG 9.4.1-P1 <<>> @ns2.niit.edu.pk iepm-maggie.niit.edu.pk.
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18386
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2
;; QUESTION SECTION:
;iepm-maggie.niit.edu.pk. IN A
;; ANSWER SECTION:
iepm-maggie.niit.edu.pk. 86400 IN A 202.125.157.212
;; AUTHORITY SECTION:
niit.edu.pk. 86400 IN NS ns.niit.edu.pk.
niit.edu.pk. 86400 IN NS ns2.niit.edu.pk.
;; ADDITIONAL SECTION:
ns.niit.edu.pk. 86400 IN A 202.125.157.194
ns2.niit.edu.pk. 86400 IN A 203.99.50.202
;; Query time: 1141 msec
;; SERVER: 203.99.50.202#53(203.99.50.202)
;; WHEN: Mon Aug 27 11:58:55 2007
;; MSG SIZE rcvd: 124
ns.niit.edu.pk was returning the correct A record for iepm-maggie at this point.
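What separates the ns2 reply from the earlier SLAC replies is the `aa` (authoritative answer) flag in the header: the old address was being served from ns2's own zone data, not from a cache, so no amount of waiting for TTLs would have cleared it. A minimal sketch of checking for that flag in saved dig output follows; the helper names and the one-line sample strings are illustrative assumptions.

```python
import re


def header_flags(dig_output):
    """Return the set of flags from dig's ';; flags: ...;' header line."""
    match = re.search(r";; flags:([^;]*);", dig_output)
    return set(match.group(1).split()) if match else set()


def is_authoritative(dig_output):
    """True if the reply carries the 'aa' (authoritative answer) flag."""
    return "aa" in header_flags(dig_output)


# Header lines copied from the transcripts (hypothetical variable names).
SLAC_CACHE_REPLY = ";; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2"
NS2_REPLY = ";; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2"

print("SLAC resolver authoritative:", is_authoritative(SLAC_CACHE_REPLY))
print("ns2.niit.edu.pk authoritative:", is_authoritative(NS2_REPLY))
```

A stale answer with `aa` set points at the zone data on that server, while a stale answer without it points at a cache somewhere along the resolution path.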
NIIT updated both servers, and once the TTL had expired on the bad cached entries, all the DNS servers at SLAC returned the correct values.
We suggested that NIIT lower the TTL on their DNS entries from 86400 seconds (24 hours) to somewhere between two and six hours (7200 to 21600 seconds), to shorten the delay between when an update is made and when it propagates to other DNS servers.