TULIP is a web application being developed by the
MAGGIE-NS team from the
National University of Sciences and Technology (NUST)
School of Electrical Engineering and Computer
Sciences (SEECS) (formerly known as NIIT)
and the Stanford Linear Accelerator Center
(SLAC)
Internet End-to-end Performance
Monitoring (IEPM) project. TULIP's purpose
is to geolocate a specified target host
(identified by IP name or address) using ping RTT delay
measurements to the target from reference landmark hosts whose positions are well known.
Since signals travel through fibre or copper at roughly 0.6c, we use the rule of thumb that 1 ms of RTT is equivalent to 100 km of distance.
The minimum RTT of 5 pings (see Ref
1 for the error distance
versus the number of probes)
from each landmark site then gives a rough estimate of the fibre + copper cable
distance of the landmark
from the target host. Lateration is applied on these distance estimates to estimate the
position of the specified host on the globe. We are focusing on a platform-agnostic, open, non-proprietary tool (cf.
Traceware from Digital Island or Edgescape from Akamai) that can be used to evaluate the
effectiveness of this technique for hosts outside the U.S. and Europe,
specifically in less developed countries. There is a
map and a
table of the TULIP landmarks.
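The distance estimate described above can be sketched in a few lines. This is a minimal illustration, not TULIP's code: it assumes the 100 km per ms of RTT rule of thumb, treats a lost ping as `None`, and the function names are hypothetical.

```python
KM_PER_MS_RTT = 100.0  # rule of thumb from the text: 1 ms of RTT ~ 100 km


def min_rtt(samples_ms):
    """Minimum RTT (ms) over the replies received; None marks a lost ping."""
    received = [s for s in samples_ms if s is not None]
    if not received:
        raise ValueError("no ping replies received")
    return min(received)


def rtt_to_distance_km(rtt_ms, km_per_ms=KM_PER_MS_RTT):
    """Rough landmark-to-target cable distance implied by an RTT."""
    return rtt_ms * km_per_ms


# Five pings from one landmark; taking the minimum discards queuing delay.
pings = [23.4, 21.9, None, 22.7, 25.1]
estimate_km = rtt_to_distance_km(min_rtt(pings))
print(f"estimated distance: {estimate_km:.0f} km")
```

Because cables rarely follow the great-circle path and the RTT also includes processing time at both ends, such estimates usually overstate the straight-line distance.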
Lateration is the calculation of position information
based on distance measurements. Calculating
an object's position in two dimensions requires distance measurements from 3
non-collinear points (hence
Trilateration).
Multilateration computes the position of an
object by measuring its distance from multiple reference positions. We
use
Multilateration
following
"Wireless Positioning Technologies and Applications" written by
Alan Bensky, 2008, British Library Cataloguing.
The multilateration algorithm was designed for wireless sensor networks;
we use it with a little tweaking of parameters such as time of arrival and distance based
on wireless sensor location.
Also see the
Problem of Apollonius
and
Descartes' theorem
for tangent circles.
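The exact formulation from Bensky's book that TULIP follows is not reproduced here. As a generic illustration of multilateration, the sketch below solves the circle equations by linearized least squares: subtracting the last landmark's equation from each of the others cancels the quadratic terms and leaves a linear system in the target's x and y. It assumes landmark positions have already been projected onto a local flat x/y plane in km (only reasonable over regional distances); all names are hypothetical.

```python
import math


def multilaterate(anchors, distances):
    """Least-squares (x, y) estimate from >= 3 non-collinear anchors.

    anchors: list of (x, y) positions in km on an assumed local flat projection.
    distances: estimated anchor-to-target distance for each anchor, in km.
    """
    if len(anchors) < 3 or len(anchors) != len(distances):
        raise ValueError("need at least 3 anchors with one distance each")
    xn, yn = anchors[-1]
    dn = distances[-1]
    # Accumulate the normal equations (A^T A) p = A^T b of the linearized system,
    # where each row is 2(xi - xn) x + 2(yi - yn) y = b_i.
    s_xx = s_xy = s_yy = t_x = t_y = 0.0
    for (xi, yi), di in zip(anchors[:-1], distances[:-1]):
        a1, a2 = 2 * (xi - xn), 2 * (yi - yn)
        b = xi**2 - xn**2 + yi**2 - yn**2 - di**2 + dn**2
        s_xx += a1 * a1
        s_xy += a1 * a2
        s_yy += a2 * a2
        t_x += a1 * b
        t_y += a2 * b
    det = s_xx * s_yy - s_xy * s_xy
    if abs(det) < 1e-9:
        raise ValueError("anchors are (nearly) collinear")
    # Invert the 2x2 normal matrix explicitly.
    x = (s_yy * t_x - s_xy * t_y) / det
    y = (s_xx * t_y - s_xy * t_x) / det
    return x, y


# Usage: three hypothetical landmarks (km) and exact distances to a target at (30, 40).
landmarks = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
target = (30.0, 40.0)
dists = [math.dist(p, target) for p in landmarks]
print(multilaterate(landmarks, dists))
```

With more than three landmarks and noisy RTT-derived distances, the least-squares solution is a compromise between inconsistent circles rather than an exact intersection, which is why the landmark geometry matters.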
Geolocation:
- If one knows where a host is located then one can choose what content to
send to the host, for example what language to use, what local services
to recommend etc. Typically this does not demand accurate geographical location;
often determining the state or country is enough.
- It can be useful for security to pinpoint the
location of a suspicious host (assuming it has not blocked pings).
- It can be used to help determine from where to get a replicated service.
- Applications that try to draw maps of host locations, such as
Visual Traceroute,
require accurate locations of routers.
- By determining the geographical path data travels on, one can
analyze the efficiency of a network. For example, by determining the route used
between countries in Africa, and even within countries in Africa,
one can see that traffic frequently
goes via Europe or North America, vastly increasing the RTTs and using
more expensive transcontinental links.
- It can be used to supplement or verify the
information in databases such as Whois, DNS, Geo IP Tools and PingER.
- The pings from multiple landmarks can help identify hosts that have
proxies. For example many web servers in developing countries have
proxy web servers in N. America or Europe.
- Hosts can move. For example, host names belonging to companies
that are acquired can move to new locations, as can host names associated with a
show (e.g. SuperComputing) that moves from venue to venue.
Then there are hosts (e.g. laptops, PDAs) that are inherently mobile.
- Projects such as Zooknic Internet Intelligence
study the geography of the Internet industry, providing maps of the Internet domains in the world
and their relation to economic growth.
- TULIP is being used in Phantom OS as a location estimation service for
making self-configuring sub-grids. Phantom OS is a Grid OS being developed at
SEECS (formerly NIIT), Pakistan in collaboration with UWE, UK.
TULIP
- TULIP can also be used to make ping requests from multiple landmarks to see
whether the target is accessible by ping from multiple sites. If it is accessible from
many landmarks but not all, then:
- It is possible that the landmark is not working. To test this try other
targets and see if they all fail from a given landmark.
- The target may be blocking ping access from some landmarks.
- There may be some network problems (e.g. routing) between the landmark and
target. Reviewing the ping output may assist in determining whether
the target host's name is known to the landmark.
- TULIP can reveal anomalies, e.g. a host masquerading as another host.
In this case multiple hosts may show up with
inconsistent min-RTTs. For example, we have seen a registered mail server in
Iran (Geo IP Tools and traceroutes placed it in Teheran) whose IP address
nevertheless had min-RTTs of less than a few tens of ms from US and Canadian landmarks.
- TULIP can help identify whether a host is connected via a satellite link. In this case
the minimum RTT from most or all landmarks will be >~ 400 ms.
- TULIP can help identify hosts that are replicated, for example
root name servers
(e.g. 193.0.14.129) or servers (e.g. gfx1.hotmail.com, yahoo.com) with identical
names or IP addresses that show up in many
different regions. In this case
the replicated host, as seen from multiple landmarks, will have impossibly
short minimum RTTs: the RTTs from two landmarks to the target can add up to less
than the RTT between the two landmarks themselves.
- By using the landmarks in the various regions, TULIP can be used to identify
which regions have large RTTs to which regions. This may be used to identify
where it would be advantageous
to place a server in a region to reduce the RTTs and hence improve service.
It may also be used to verify Service Level Agreements that involve RTTs.
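The satellite and replication checks above can be expressed as simple heuristics. In this sketch the 400 ms threshold comes from the text, while the 80% reading of "most landmarks", the data layout (dicts keyed by hypothetical landmark names) and the use of the triangle inequality on RTTs are our assumptions. Internet routing is not a true metric, so a flagged pair is a hint rather than proof.

```python
from itertools import combinations

SATELLITE_RTT_MS = 400.0  # geostationary-hop threshold quoted in the text


def looks_satellite(min_rtts, fraction=0.8):
    """True if at least `fraction` (assumed 80%) of landmarks see min RTT >= ~400 ms."""
    values = [r for r in min_rtts.values() if r is not None]
    if not values:
        return False
    slow = sum(1 for r in values if r >= SATELLITE_RTT_MS)
    return slow / len(values) >= fraction


def replication_suspects(min_rtts, inter_landmark_rtt):
    """Landmark pairs whose RTTs to the target sum to less than the RTT between them.

    For a single host, rtt(a, T) + rtt(T, b) should not be much below rtt(a, b),
    so a violation hints that the landmarks are reaching different replicas.
    """
    replied = {lm: r for lm, r in min_rtts.items() if r is not None}
    suspects = []
    for a, b in combinations(sorted(replied), 2):
        between = inter_landmark_rtt.get((a, b), inter_landmark_rtt.get((b, a)))
        if between is not None and replied[a] + replied[b] < between:
            suspects.append((a, b))
    return suspects


# Usage with hypothetical landmark names and RTTs in ms.
target_rtts = {"slac": 3.0, "cern": 4.0, "seecs": 250.0}
between = {("slac", "cern"): 150.0}
print(looks_satellite(target_rtts))                # False
print(replication_suspects(target_rtts, between))  # [('cern', 'slac')]
```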
There is no inherent tie between Internet architecture and geography.
For example, unlike the phone system, where phone numbers identify countries, areas and exchanges within
areas, the Internet IP address is not designed to provide any location information.
In fact, it needs to be
understood that the sources of information used to derive the location of Internet hosts were not originally designed for this purpose.
As mentioned previously, however, it can be important to know the location of a host.
It is also very useful to have multiple ways to find the location of a host,
both because all methods have their problems and
to look for agreements or discrepancies. The paper
Distributed Traceroute Approach to Geographically Locating IP Devices
investigates and evaluates existing (2003) methods and solutions.
Basically there are three major ways of locating a host:
- By using databases such as whois, DNS, or location specific databases.
- By using traceroutes and extracting locations from router names.
- By using ping Round Trip Times.
Databases may give the location of a router as that of its owner
rather than the location of the router itself.
For example, GeoIP Tools locates address 4.68.116.16 in Kansas, whereas it has < 5 ms minimum RTT
from Liverpool and < 1 ms (< 100 km) from Rutherford Lab near Oxford in the UK.
The router is in ASN 3356, which is Level 3 Communications, based in the U.S. GeoIP Tools also locates
mia2-fiu-1-us.mia.seabone.net as being in Europe (presumably since it is owned by
a European ISP, Seabone); however, it is located in the US. When trying to
map the topology of
a traceroute onto a map of the world, such errors cause problems.
For example a
traceroute from Brazil to Costa Rica apparently
(according to GeoIP Tools)
goes to Florida then to Italy, back to Florida and then to Costa Rica.
Actually it goes to Florida and then to Costa Rica. Hostip.info gets it right. Another
example is a traceroute from Brazil to a Venezuelan
(according to Geo IP Tools) node.
Again the route according to Geo IP Tools goes via Italy, and again Hostip.info gets it
right. Geo IP Tools also shows the end host www.unerg.edu.ve in Venezuela, while
Hostip.info says it is in the US. Other tests (RTT, Octant) lead us to believe it is in
the US, possibly Dallas (Octant) or Florida (TULIP).
We are starting to compare
Hostip with locations from the PingER database of known hosts,
and see how well TULIP does
for various regions (see for example
TULIP estimates from Europe to PingER hosts).
Examples include:
- Databases:
- The Domain Name System (DNS) may also help in locating a host.
The
DNS LOC (location)
resource record is designed to make this data available.
In addition, the names
of routers often contain their location (e.g. city), so a traceroute
may help identify what a host is near. Examples include
VisualRoute, NeoTrace and GTrace. See reference
1 for a comparison,
for the U.S., of the DNS method with ping RTTs and a cluster technique.
- Autonomous Systems (AS): Given an IP or host name you can use
Fixed Orbits to find the
relevant AS. Then using
a table of AS number to name
you can find out more about the AS (e.g. contacts, HQ site etc.)
- Whois databases:
Examples of sites that
provide information from such sources include:
IP2location (max 20 requests per day unless you sign up),
AntiOnline,
MaxMind
(max 25 requests per day unless you sign up),
DNSstuff Geolocation,
and Hostip.info.
Unfortunately the information is often missing, inaccurate or stale. Also, a large
block of geographically dispersed IP addresses may be assigned to a single entity, and the
Whois database may contain a single entry for all of them.
- Also see
NetGeo from CAIDA, which, though no longer maintained, has many useful links.
It has a database of hosts it has previously located; if a lookup there fails it uses DNS,
then performs a traceroute, with a WHOIS database lookup as a last resort.
It is now a commercial product from NetGeo Inc.
- Geo IP Tool
(also see the
explanation) and
IP-address.com display the location of a
selected host/address using
Google maps. Geo IP Tool uses a database and probably has the best overall coverage and accuracy.
However, it often fails for routers.
GeoBytes requires one to provide the
IP address (not the name), which is a slight inconvenience. It provides lat/long
as well as City, Country, population, currency etc.
- GeoTool seems a
promising new entry; it uses
MaxMind's database (see above).
- Networldmap determines geographical
information by acquiring location information from willing participants.
- Traceroute: Typically such methods use regular expressions to deduce
the location of a router from its name (e.g. the router
500.Serial3-11.GW8.BOS1.ALTER.NET uses the airport code for Boston, US (BOS) and is in
the city of Boston, Massachusetts).
- Round Trip Times: such methods typically use the minimum RTT
from several landmarks
to the target host to triangulate the position of the target host.
- TULIP uses a
Trilateration algorithm.
- Similar tools to TULIP are Constraint Based Geolocation of Hosts (reference 2) and
Octant from Cornell. Both of these focused on
the technique, only worked in the U.S. and are no longer providing a service.
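The traceroute approach in the list above, deducing a router's city from its name, can be sketched with a regular expression. This assumes one common convention (a hostname label consisting of a 3-letter airport code, optionally followed by digits) and uses a tiny hypothetical lookup table; real router naming varies widely between ISPs, so this is illustrative only.

```python
import re

# Tiny illustrative table; a real tool would need a full airport-code (IATA)
# database plus per-ISP naming conventions.
AIRPORT_CODES = {
    "BOS": "Boston, MA, US",
    "MIA": "Miami, FL, US",
}

# Assumed convention: a hostname label made of a 3-letter code, optionally
# followed by digits (e.g. "BOS1", "mia"). Many ISPs use other schemes.
CODE_LABEL = re.compile(r"([a-z]{3})\d*", re.IGNORECASE)


def guess_router_city(hostname):
    """Return the city for the first label that matches a known airport code."""
    for label in hostname.split("."):
        match = CODE_LABEL.fullmatch(label)
        if match:
            city = AIRPORT_CODES.get(match.group(1).upper())
            if city:
                return city
    return None


print(guess_router_city("500.Serial3-11.GW8.BOS1.ALTER.NET"))  # Boston, MA, US
print(guess_router_city("mia2-fiu-1-us.mia.seabone.net"))      # Miami, FL, US
```

The second example is the Seabone router discussed earlier: its name points to Miami even though a database placed it in Europe, which is why comparing methods is useful.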
If you want
to find the great circle distance and know the latitude and
longitude coordinates of the two ends then you can use the
Movable Type Scripts web page.
World Gazetteer
provides
access to data with lat/longs, cities, countries & populations
(download data). If you want to calculate the distance yourself then see
Deriving the Haversine Formula.
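For those calculating it themselves, the haversine great-circle distance is short to implement. This sketch assumes a spherical Earth with a mean radius of 6371 km, so it can differ from an ellipsoidal model by a fraction of a percent; the function name is our own.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing `a` slightly above 1 for antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# One degree of latitude along a meridian is about 111.2 km.
print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 1))
```

Comparing this distance with the RTT-derived distance for a landmark with known coordinates is a quick way to see how indirect its routes are.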
You can also make a
name
server lookup for a host, or if you don't know the exact name try
DomainSurfer. There is also an
Atlas of
Cyberspace that provides maps and graphic representations of the
geographies of the new electronic territories of the Internet,
the World-Wide Web and other emerging Cyberspaces, and the Corpex-sponsored
Cyber Geography Research.
If you need to find the latitude and longitude of a place whose location
you can find on a map, then try the
Latitude &
Longitude finder or
Latitude &
Longitude finder 2.
Versions of TULIP
We have developed two versions of TULIP. The first (TULIP1) was developed
to assess feasibility. The second (TULIP2) evolved from ideas and
experiences encountered with TULIP1.
TULIP1 is based on
Java, and Java
Web Start must be installed on your system.
The applet does all the work: it gets the request for the target from
the user, sends the requests to ping the target to the landmarks, gets
the results back, makes the analysis and displays the results in graphical form.
It requires a configuration file
that provides the name and location of each landmark and the URLs for its ping and
traceroute services. At SLAC
this is kept at /afs/slac/www/comp/net/wan-mon/tulip/Sites.txt.
TULIP2
The user's browser accesses a form to make a request to locate a target.
This is sent to
tulip-viz.cgi at www-wanmon.slac.stanford.edu. It
uses a Google visualization package to display a
sortable table of the
landmarks and their RTTs to the target. When it has gathered all the RTTs
from the responding landmarks, it uses
Google maps to display a map of the RTT circles used in the lateration
and various location estimates for the target.
The requests to the landmarks
to ping the target are made by
reflector.cgi
running on www-wanmon.slac.stanford.edu. Having a central reflector enables
more control over the
requests and their impact, as well as keeping logs.
It also enables us to use a single cookie to access
PlanetLab landmarks.
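The central reflector's fan-out can be sketched as follows. This is not `reflector.cgi` itself: `ping_via_landmark` is a hypothetical stand-in for the HTTP request to a landmark's ping server, the thread pool size is an assumption, and the 5-ping limit is the one quoted later on this page. Centralizing the requests is what makes the per-request logging below possible.

```python
import time
from concurrent.futures import ThreadPoolExecutor

PINGS_PER_LANDMARK = 5  # per-target ping limit quoted on this page


def reflect(target, landmarks, ping_via_landmark, log, max_workers=8):
    """Fan a ping request out to every landmark; return {landmark: min RTT or None}.

    `ping_via_landmark(landmark, target, count)` is a hypothetical callable that
    asks one landmark's ping server to ping `target` and returns RTTs in ms.
    """
    def one(landmark):
        try:
            rtts = ping_via_landmark(landmark, target, PINGS_PER_LANDMARK)
            best = min(rtts) if rtts else None
            error = None if rtts else "no reply"
        except Exception as exc:  # one failing landmark must not stop the others
            best, error = None, str(exc)
        # list.append is atomic in CPython, so worker threads can share `log`.
        log.append({"time": time.time(), "target": target, "landmark": landmark,
                    "min_rtt_ms": best, "error": error})
        return landmark, best

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(one, landmarks))


def fake_ping(landmark, target, count):
    """Hypothetical stand-in for a landmark's ping server."""
    if landmark == "down.example":
        raise TimeoutError("landmark unreachable")
    return [12.5, 11.8, 13.0][:count]


log = []
print(reflect("www.example.org", ["up.example", "down.example"], fake_ping, log))
# {'up.example': 11.8, 'down.example': None}
```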
For hosts in the world at large it is important to have landmarks that
enable the host to sit within a triangle of landmarks
(references 4, 5). Thus we are very
interested in getting more landmarks that cover the world.
We also need the
latitude and longitude (lat/long) of each landmark.
Most Internet hosts are
located
in developed countries of North America,
Europe and East Asia. Thus we need landmarks to cover these areas.
Many countries in the developing world do not have
direct access to other nearby countries (see for example the
Case Studies on S. Asia, Africa and Palestine) but go via Europe or the US.
Thus the
route is very indirect and extended so distances estimated by RTT will be
too long.
Thus we also need landmarks in such developing countries in particular
those with a
large Internet presence, e.g. countries with > 1 million connected Internet users
(see
Internet World Statistics).
We have three main sources of landmarks:
- One possible source of ping servers is the various public server directories, such as
Public Route Servers and Looking Glass
Sites,
Advanced Internet Routing Resources and
Cisco.net.
- A second source is Planetlab.
As of June 2007 there are about 370 PlanetLab hosts
with about 50% in the U.S., 25% in Europe, 5% in Japan etc. In many cases
there is more than one host at a site with the same lat/long, so we use only a
subset of the hosts.
Among the PlanetLab sites
there are about 18 covering China, 2 in Puerto Rico, 6 in Brazil and 2 in Uruguay.
To utilize them one sends a script to be executed. We have such a script and will
integrate it. We have a
map of the PlanetLab sites.
- We get landmarks installed at interesting places.
The requirement on such a landmark is to install a reverse traceroute/ping
server (see
Traceroute Servers for HENP & ESnet)
on a web server at the landmark site. Instructions for downloading
and installing the traceroute server are available at
http://www.slac.stanford.edu/comp/net/wan-mon/traceroute-srv.html#code.
After it is installed please let us know so we can add the landmark to the TULIP
configuration file (see the bottom of this page for our email addresses).
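Picking one PlanetLab host per site, as described above, amounts to de-duplicating by position. In this sketch the (name, lat, lon) layout, the host names and the 2-decimal rounding tolerance (about 1 km) for "same site" are all assumptions.

```python
def one_host_per_site(hosts, places=2):
    """Keep the first host seen at each lat/long, rounded to `places` decimals.

    `hosts` is an iterable of (name, lat, lon); rounding to 2 decimal places
    (roughly 1 km) is an assumed tolerance for "same site".
    """
    seen = set()
    chosen = []
    for name, lat, lon in hosts:
        key = (round(lat, places), round(lon, places))
        if key not in seen:
            seen.add(key)
            chosen.append(name)
    return chosen


planetlab = [  # hypothetical host names and coordinates
    ("node1.site-a.example", 37.4275, -122.1697),
    ("node2.site-a.example", 37.4275, -122.1697),
    ("node1.site-b.example", 46.2044, 6.1432),
]
print(one_host_per_site(planetlab))  # ['node1.site-a.example', 'node1.site-b.example']
```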
Currently (May 2007) we have landmarks in about 30 countries
(see
map),
so we have a way to go.
The SLAC traceroute/landmark server, which is frequently used by landmark sites:
- rejects attempts to traceroute to a broadcast address;
- does not allow a remote host name to be longer than 255 characters, to prevent
buffer overflow attempts;
- does not allow a remote host in a different domain to do a traceroute
to a host within the same domain as the web server;
- limits the maximum number of traceroute processes running in the server to
reduce the chance of a denial of service attack;
- starts the traceroute after 3 hops if the client/browser and server are in
different domains, in order to hide internal routing information from outsiders;
- has a blacklist of sites that are blocked.
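The server's safeguards listed above can be sketched as a single request check. This is not the SLAC server's code. The 255-character limit and the 3 hidden hops come from the text; the process cap of 10, the treatment of the blacklist as a set of client domains, and the "last octet is 255" broadcast test (used because the server cannot know the target's netmask) are assumptions.

```python
import ipaddress

MAX_NAME_LEN = 255   # from the text
MAX_RUNNING = 10     # hypothetical cap on concurrent traceroutes
HIDDEN_HOPS = 3      # hops skipped for clients outside the server's domain


def in_domain(name, domain):
    name, domain = name.lower().rstrip("."), domain.lower()
    return name == domain or name.endswith("." + domain)


def check_request(target, client_domain, server_domain, running, blacklist=frozenset()):
    """Return (True, hops_to_skip) if the request may run, else (False, reason)."""
    if client_domain.lower() in blacklist:
        return False, "client is blacklisted"
    if len(target) > MAX_NAME_LEN:
        return False, "target name too long"
    if running >= MAX_RUNNING:
        return False, "too many traceroutes running"
    try:
        addr = ipaddress.ip_address(target)
    except ValueError:
        addr = None  # a host name rather than a literal address
    # Assumption: without the netmask, treat any IPv4 address ending in .255 as broadcast.
    if addr is not None and addr.version == 4 and addr.packed[-1] == 255:
        return False, "broadcast address"
    outsider = not in_domain(client_domain, server_domain)
    if outsider and addr is None and in_domain(target, server_domain):
        return False, "outsiders may not trace to hosts inside the server's domain"
    return True, HIDDEN_HOPS if outsider else 0


print(check_request("www.cern.ch", "example.org", "slac.stanford.edu", running=0))
# (True, 3)
```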
The use of a central reflector to manage all the requests enables us to provide a single
IP address that landmarks can enable access from, while disabling requests from other
hosts.
A major concern is that the target is pinged simultaneously from multiple landmarks.
This can look like a scan of multiple hosts when the target host responds to the ping requests.
It can also look like a denial of service attack, especially for hosts with limited
available bandwidth, such as are found in developing countries. We thus limit the number
of pings from a landmark to a target to 5. We are also looking at tiering the
landmarks. The top tier will enable us to locate the region of the world and then the
second tier can be used to find the location in that region. This reduces the number of
landmarks used and divides them in time into two or more sets. We are thus studying
tiering the N. American and European landmark hosts.
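A minimal sketch of the two-tier idea, assuming each landmark is tagged with a region and that the region of the nearest responding top-tier landmark is a good guess for the target's region. The region assignments and landmark labels are hypothetical, and choosing a good top tier is the hard part this sketch does not address.

```python
def pick_region(tier1_min_rtts, region_of):
    """Region of the top-tier landmark with the smallest min RTT to the target."""
    replied = {lm: rtt for lm, rtt in tier1_min_rtts.items() if rtt is not None}
    if not replied:
        raise ValueError("no top-tier landmark reached the target")
    nearest = min(replied, key=replied.get)
    return region_of[nearest]


def second_tier(landmarks, region_of, region, already_used=()):
    """Landmarks in `region` that were not already used in the first round."""
    return [lm for lm in landmarks
            if region_of.get(lm) == region and lm not in already_used]


region_of = {"slac": "N. America", "fnal": "N. America",
             "cern": "Europe", "desy": "Europe"}  # hypothetical assignment
tier1 = {"slac": 80.0, "cern": 12.0}
region = pick_region(tier1, region_of)
print(region, second_tier(list(region_of), region_of, region, tier1))  # Europe ['desy']
```

Only the second-tier landmarks in one region ping the target in the second round, which spreads the probes out in time and reduces the total number of landmarks used.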
TULIP only allows one copy of the client to be running on a client host.
TULIP also hides the URLs used for the landmarks to reduce the possibility
of people gleaning the URLs for a denial of service attack. Editing the landmark
URLs requires a password known only to the developers.
We have also considered whether being able to accurately locate a machine, and possibly its usual owner,
may raise privacy concerns. This may require us to add some fuzz to the results.
So far this has not been done.
There is a centralized
log of < 100 Mbytes, with time stamped records of all requests,
the requesting host (client), the target, landmark, result (RTT, loss),
errors etc. The log is truncated to the last 20% of the records when it reaches its maximum size.
This is
analyzed
for response time of landmarks, abusing clients, and types of failures.
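The truncation policy described above can be sketched as follows. It keeps records in memory for clarity, whereas the real log is a file; the record strings are hypothetical placeholders for the time-stamped request records, and a file-based version would also need to rewrite the file safely.

```python
MAX_LOG_BYTES = 100 * 1024 * 1024  # the text's "< 100 Mbytes" cap
KEEP_FRACTION = 0.2                # keep the newest 20% of records


class TruncatingLog:
    """In-memory stand-in for the request log, truncated to its newest records."""

    def __init__(self, max_bytes=MAX_LOG_BYTES, keep_fraction=KEEP_FRACTION):
        self.max_bytes = max_bytes
        self.keep_fraction = keep_fraction
        self.records = []
        self.size = 0

    def append(self, record):
        line = record + "\n"
        self.records.append(line)
        self.size += len(line.encode("utf-8"))
        if self.size >= self.max_bytes:
            # Drop the oldest records, keeping the last 20% (at least one).
            keep = max(1, int(len(self.records) * self.keep_fraction))
            self.records = self.records[-keep:]
            self.size = sum(len(r.encode("utf-8")) for r in self.records)


log = TruncatingLog(max_bytes=30)
for i in range(10):
    log.append(f"r{i}")  # real records would carry time, client, target, landmark, RTT...
print(log.records)  # ['r8\n', 'r9\n']
```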
- If the target host is connected via a geostationary satellite (minimum RTTs > 400 ms),
then the triangulation will not be accurate.
- Some IP names and addresses actually refer to multiple hosts in very different locations.
Examples are some of the Internet's
root name servers and yahoo.com. Such hosts will appear to be close
to multiple locations, so TULIP will be confused.
- Sometimes name resolution can be very slow, causing the pings to time out. In such a case you can try
giving it the IP address instead of the name.
- If TULIP complains that it cannot load since another copy is already running,
then delete the file C:\Program Files\Mozilla Firefox\pres
- Some landmarks appear to be unable to resolve IP names and so you will need to provide the
IP address
rather than the name if these landmarks are to provide information.
- A single IP name may resolve to multiple IP addresses. In this case it is probably best if the user
is asked to choose which IP address they want.
- Sometimes routes are very indirect. This can add greatly to the RTT and hence give bad
distance estimates. Examples include:
- Between India and Pakistan the routes go
via the US or Canada.
- To get to E. Asia from Europe the
undersea fibre goes around Spain, through the Mediterranean,
the Red Sea, around India, then past Singapore and around the East coast of Asia. This is much more
indirect than a great circle route which crosses the Asia land mass.
On the other hand the route from Europe to Australia is more
direct via mainland US and Hawaii.
- The geographical distance from ICTP, Trieste in Italy to Ljubljana, the capital of Slovenia,
is ~ 60 miles.
However, the Internet route goes via Milan and Vienna and is a factor of ten longer.
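To see how much such detours distort the estimates, the Trieste to Ljubljana example can be worked through numerically by applying the 100 km per ms rule to a route ten times the ~60 mile straight-line distance. The RTTs below are derived from that factor of ten, not measured values, and the function name is hypothetical.

```python
KM_PER_MS_RTT = 100.0   # rule of thumb used on this page
KM_PER_MILE = 1.609344


def route_inflation(min_rtt_ms, great_circle_km):
    """Ratio of the RTT-implied distance to the true great-circle distance."""
    return (min_rtt_ms * KM_PER_MS_RTT) / great_circle_km


trieste_ljubljana_km = 60 * KM_PER_MILE               # ~96.6 km
direct_rtt_ms = trieste_ljubljana_km / KM_PER_MS_RTT  # ~1 ms if the route were direct
tenfold_rtt_ms = 10 * direct_rtt_ms                   # what a 10x longer route implies
print(f"{trieste_ljubljana_km:.1f} km, direct ~{direct_rtt_ms:.2f} ms, "
      f"via Milan/Vienna ~{tenfold_rtt_ms:.1f} ms, "
      f"inflation {route_inflation(tenfold_rtt_ms, trieste_ljubljana_km):.1f}x")
```

A landmark in Trieste would therefore draw a circle of roughly 970 km radius around a target in Ljubljana, which is why indirect routes give bad distance estimates.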
Development and More Information
Development is at School of Electrical Engineering and Computer Sciences (SEECS, formerly known as NIIT), National University of Sciences and
Technology (NUST), Pakistan and SLAC by Qasim Bilal Lone (SEECS and SLAC), Shahryar Khan (SEECS and SLAC) and Les Cottrell (SLAC).
More information may be found at:
Acknowledgements
We gratefully acknowledge the cooperation of the landmark sites, in particular PlanetLab, and those
installing the SLAC reverse traceroute/ping server in developing regions (not Australia, N. America,
Europe, Japan) of the world where there are fewer landmarks.
These include:
- Africa:
- South Africa: TENET (Cape Town).
- Democratic Republic of the Congo (Kinshasa).
- Burkina Faso (Ouagadougou).
- E. Asia
- China: IHEP (Beijing)
- Hong Kong: UST (Kowloon)
- Korea: KHU (Suwon)
- Singapore: NOC (Singapore)
- Taiwan: TWAREN and NCHC (Taipei)
- Thailand: UNINET (Bangkok)
- Latin America:
- Bolivia: University Mayor de San Simon (La Paz)
- Brazil: RNP (Brasilia), SPRACE and UNESP (Sao Paulo), UERJ (Rio de Janeiro)
- Mexico: CUDI (Juarez)
- Middle East
- Israel: ILAN (Tel Aviv)
- Palestine: AQU (Jerusalem), IUGAZA (Gaza City)
- Russia: BINP (Novosibirsk), ITEP, KIAE (Moscow)
- S. Asia
- India: CDAC (Mumbai and Pune), VSNL (Mumbai)
- Pakistan: SEECS, NUST (formerly NIIT) Islamabad, Micronet, NCP and PERN (Islamabad)
- Sri Lanka: LERN (Colombo)
References
1. "An Investigation of Geographic Mapping Techniques for Internet Hosts", V. N. Padmanabhan and L. Subramanian.
2. "Constraint-Based Geolocation of Internet Hosts", B. Gueye, M. Crovella, A. Ziviani, S. Fdida (2004).
3. "Constraint-Based Geolocation of Internet Hosts", B. Gueye, M. Crovella, A. Ziviani, S. Fdida (December 2006).
4. "An Empirical Evaluation of Landmark Placement on Internet Coordinate Schemes", Sridhar Srinivasan and Ellen Zegura.
5. "Geometric Exploration of the Landmark Selection Problem", Liying Tang and Mark Crovella.
Contacts:
Qasim Bilal Lone (SEECS and SLAC) <qasim.lone at gmail.com>, Faran Javed <faran.javed at gmail.com>, Shahryar Khan (SEECS and SLAC) <shahryar2001 at gmail.com> and
Les Cottrell (SLAC) <cottrell at slac.stanford.edu>, as part of the MAGGIE-NS team
NUST Institute of IT 2006