Archive for the 'Geolocation' Category

CAIDA’s 2022 Annual Report

Monday, July 10th, 2023 by kc

The CAIDA annual report summarizes CAIDA’s activities for 2022 in the areas of research, infrastructure, data collection and analysis. The executive summary is excerpted below:

Hoiho API (Holistic Orthography of Internet Hostname Observations)

Monday, February 13th, 2023 by Bradley Huffaker

In December 2021, CAIDA published a method and system to automatically learn rules that extract geographic annotations from router hostnames. This is a challenging problem, because operators use different conventions and different dictionaries when they annotate router hostnames. For example, in the following figure, operators have used IATA codes (“iad”, “was”), a CLLI prefix (“asbnva”), a UN/LOCODE (“usqas”), and even city names (“ashburn”, “washington”) to refer to routers in approximately the same location — Ashburn, VA, US. Note that “ash” (router #4) is an IATA code for Nashua, NH, US, that the operators of and used to label routers in Ashburn, VA, US. Some operators also encoded the country (“us”) and state (“va”).

Our system, Hoiho, released as open-source as part of scamper, uses CAIDA’s Macroscopic Internet Topology Data Kit (ITDK) and observed round trip times to infer regular expressions that extract these apparent geolocation hints from hostnames. The ITDK contains a large dataset of routers with annotated hostnames, which we used as input to Hoiho for it infer rules (encoded as regular expressions) that extract these annotations. CAIDA has released these inferred rulesets in recent ITDKs.

Today, CAIDA is launching an API ( and web front end ( which returns extracted geographic locations from a user-provided list of DNS names. The API uses the rules that CAIDA infers with each ITDK. For embedded IATA, UN/LOCODE, and city names, the API returns the city name and a lat/long representing the location. For embedded CLLI codes, the API returns the CLLI code; please contact iconectiv for a dictionary that maps CLLI codes to locations.

Try the API out, and let us know if you find it useful!

[HOIHO] Luckie, M., Huffaker, B., Marder, A., Bischof, Z., Fletcher, M., and claffy, k., 2021. “Learning to Extract Geographic Information from Internet Router Hostnames.” ACM SIGCOMM Conference on emerging Networking EXperiments and Technologies (CoNEXT),

Unintended consequences of submarine cable deployment on Internet routing

Tuesday, December 15th, 2020 by Roderick Fanou

Figure 1: This picture shows a line of floating buoys that designate the path of the long-awaited SACS (South-Atlantic Cable System). This submarine cable now connects Angola to Brazil (Source: G Massala,, Feb 2018.)

The network layer of the Internet routes packets regardless of the underlying communication media (Wifi, cellular telephony, satellites, or optical fiber). The underlying physical infrastructure of the Internet includes a mesh of submarine cables, generally shared by network operators who purchase capacity from the cable owners [2,11]. As of late 2020, over 400 submarine cables interconnect continents worldwide and constitute the oceanic backbone of the Internet. Although they carry more than 99% of international traffic, little academic research has occurred to isolate end-to-end performance changes induced by their launch.

In mid-September 2018, Angola Cables (AC, AS37468) activated the SACS cable, the first trans-Atlantic cable traversing the Southern hemisphere [1][A1]. SACS connects Angola in Africa to Brazil in South America. Most assume that the deployment of undersea cables between continents improves Internet performance between the two continents. In our paper, “Unintended consequences: Effects of submarine cable deployment on Internet routing”, we shed empirical light on this hypothesis, by investigating the operational impact of SACS on Internet routing. We presented our results at the Passive and Active Measurement Conference (PAM) 2020, where the work received the best paper award [11,7,8]. We summarize the contributions of our study, including our methodology, data collection and key findings.

[A1]  Note that in the same year, Camtel (CM, AS15964), the incumbent operator of Cameroon, and China Unicom (CH, AS9800) deployed the 5,900km South Atlantic Inter Link (SAIL), which links Fortaleza to Kribi (Cameroon) [17], but this cable was not yet lit as of March 2020.


CAIDA’s Annual Report for 2017

Tuesday, May 29th, 2018 by kc

The CAIDA annual report summarizes CAIDA’s activities for 2017, in the areas of research, infrastructure, data collection and analysis. Our research projects span Internet topology, routing, security, economics, future Internet architectures, and policy. Our infrastructure, software development, and data sharing activities support measurement-based internet research, both at CAIDA and around the world, with focus on the health and integrity of the global Internet ecosystem. The executive summary is excerpted below:

Geolocation Terminology: Vantage Points, Landmarks, and Targets

Thursday, November 17th, 2016 by Bradley Huffaker

While reviewing a recent paper, it occurred to me there is a pretty serious nomenclature inconsistency across Internet measurement research papers that talk about geolocation. Specifically, the term landmark is not well-defined. Some literature uses the term landmark to refer to measurement infrastructure (e.g., nodes that source active measurements) in specific known geographic locations [Maziku2013,Komosny2015]. In other literature the same term refers to locations with known Internet identifiers — such as IP addresses — against which one collects calibration measurements [Arif2010,Wang2011,Hu2012,Eriksson2012,Chen2015].

In pursuit of clarity in our field, we recommend the following terms and definitions:

  • A Vantage Point (VP) is a measurement infrastructure node with a known geographic location.
  • A Landmark is a responsive Internet identifier with a known location to which the VP will launch a measurement that can serve to calibrate other measurements to potentially unknown geographic locations.
  • A Target is an Internet identifier whose location will be inferred from a given method. Depending on the type of identifier and inference methodology, this may not be a single well defined location. Typically, some targets have known geographic locations (ground truth), which researchers can use to evaluate the accuracy of their geolocation methodology.
  • A Location is a geographic place that geolocation techniques attempt to infer for a given target. Examples include cities and ISP Points of Presences (PoPs).

Not all papers need to use all terms. Below we depict a simple constraint-based geolocation algorithm to show how we understand these terms in practice.

A simple constraint-based geolocation algorithm.

A simple constraint-based geolocation algorithm.

[Potential useful resource, although not actively maintained: CAIDA’s Geolocation Bibliography]