CAIDA’s 2016 Annual Report

May 9th, 2017 by kc

[Executive summary and link below]

The CAIDA annual report summarizes CAIDA’s activities for 2016, in the areas of research, infrastructure, data collection and analysis. Our research projects span Internet topology, routing, security, economics, future Internet architectures, and policy. Our infrastructure, software development, and data sharing activities support measurement-based internet research, both at CAIDA and around the world, with focus on the health and integrity of the global Internet ecosystem. The executive summary is excerpted below:

Mapping the Internet. We continued to expand our topology mapping capabilities using our Ark measurement infrastructure. We improved the accuracy and sophistication of our topology annotations, including classification of ISPs, business relationships between them, and geographic mapping of interdomain links that implement these relationships. We released two Internet Topology Data Kits (ITDKs) incorporating these advances.

Mapping Interconnection Connectivity and Congestion. We continued our collaboration with MIT to map the rich mesh of interconnection in the Internet in order to study congestion induced by evolving peering and traffic management practices of CDNs and access ISPs. We focused our efforts on the challenge of detecting and localizing congestion to specific points in between networks. We developed new tools to scale measurements to a much wider set of available nodes. We also implemented a new database and graphing platform to allow us to interactively explore our topology and performance measurements. We produced related data collection and analyses to enable evaluation of these measurements in the larger context of the evolving ecosystem: infrastructure resiliency, economic tussles, and public policy.

Monitoring Global Internet Security and Stability. We conducted infrastructure research and development projects that focus on security and stability aspects of the global Internet. We developed continuous fine-grained monitoring capabilities establishing a baseline connectivity awareness against which to interpret observed changes due to network outages or route hijacks. We released (in beta form) a new operational prototype service that monitors the Internet, in near-real-time, and helps identify macroscopic Internet outages affecting the edge of the network.

CAIDA also developed new client tools for measuring IPv4 and IPv6 spoofing capabilities, along with services that provide reporting and allow users to opt-in or out of sharing the data publicly.

Future Internet Architectures. We continued studies of IPv4 and IPv6 paths in the Internet, including topological congruency, stability, and RTT performance. We examined the state of security policies in IPv6 networks, and collaborated to measure CGN deployment in U.S. broadband networks. We also continued our collaboration with researchers at several other universities to advance development of a new Internet architecture: Named Data Networking (NDN) and published a paper on the policy and social implications of an NDN-based Internet.

Public Policy. Acting as an Independent Measurement Expert, we posted our agreed-upon revised methodology for measurement methods and reporting requirements related to AT&T Inc. and DirecTV merger (MB Docket No. 14-90). We published our proposed method and a companion justification document. Inspired by this experience and a range of contradicting claims about interconnection performance, we introduced a new model describing measurements of interconnection links of access providers, and demonstrated how it can guide sound interpretation of interconnection-related measurements regardless of their source.

Infrastructure operations. It was an unprecedented year for CAIDA from an infrastructure development perspective. We continued support for our existing active and passive measurement infrastructure to provide visibility into global Internet behavior, and associated software tools and platforms that facilitate network research and operational assessments.

We made available several data services that have been years in the making: our prototype Internet Outage Detection and Analysis service, with several underlying components released as open source; the Periscope platform to unify and scale querying of thousands of looking glass nodes on the global Internet; our large-scale Internet topology query system (Henya); and our Spoofer system for measurement and analysis of source address validation across the global Internet. Unfortunately, due to continual network upgrades, we lost access to our 10GB backbone traffic monitoring infrastructure. Now we are considering approaches to acquire new monitors capable of packet capture on 100GB links.

As always, we engaged in a variety of tool development, and outreach activities, including maintaining web sites, publishing 13 peer-reviewed papers, 3 technical reports, 4 workshop reports, one (our first) BGP hackathon report, 31 presentations, 20 blog entries, and hosting 6 workshops (including the hackathon). This report summarizes the status of our activities; details about our research are available in papers, presentations, and interactive resources on our web sites. We also provide listings and links to software tools and data sets shared, and statistics reflecting their usage. Finally, we report on web site usage, personnel, and financial information, to provide the public a better idea of what CAIDA is and does.

For the full 2016 annual report, see

Response to RFI for Future Needs for Advanced Cyberinfrastructure to Support Science and Engineering Research

April 18th, 2017 by kc

I sent the following to NSF in response to a recent Request for Information (RFI) for Future Needs for Advanced Cyberinfrastructure to Support Science and Engineering Research. (The format required an abstract and answers to 3 specific questions.)


As the Internet and our dependence on it have grown, the structure and dynamics of the network, and how it relates to the political economy in which it is embedded, have gathered increasing attention by researchers, operators and policy makers. All of these stakeholders bring questions that they lack the capability to answer themselves. Epistemological challenges lie in developing and deploying measurement instrumentation and protocols, expertise required to soundly interpret and use complex data, lack of tools to synthesize different sources of data to reveal insights, data management cost and complexity, and privacy issues. Although a few interdisciplinary projects have succeeded, the current mode of collaboration simply does not scale to the exploding interest in scientific study of the Internet, nor to complex and visionary scientific uses of CAIDA’s data by non-networking experts. We believe the community needs a new shared cyberinfrastructure resource that integrates active Internet measurement capabilities, multi-terabyte data archives, live data streams, heavily curated topology data sets revealing coverage and business relationships, and traffic measurements. Such a resource would enable a broad set of researchers to pursure new scientific directions, experiments, and data products that promote valid interpretations of data and derived inferences.

Read the rest of this entry »

Why IP source address spoofing is a problem and how you can help.

March 24th, 2017 by Bradley Huffaker

information: Software Systems for Surveying Spoofing Susceptibility

This material is based on research sponsored by the Department of Homeland Security (DHS) Science and Technology Directorate, Homeland Security Advanced Research Projects Agency, Cyber Security Division (DHS S&T/HSARPA/CSD) BAA HSHQDC-14-R-B0005, and the Government of United Kingdom of Great Britain and Northern Ireland via contract number D15PC00188. Views should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Department of Homeland Security, the U.S. Government, or the Government of United Kingdom of Great Britain and Northern Ireland.

Help save the Internet: Install the new Spoofer client (v1.1.0)!

December 18th, 2016 by Josh Polterock

The greatest security vulnerability of the Internet (TCP/IP) architecture is the lack of source address validation, i.e., any sender may put a fake source address in a packet, and the destination-based routing protocols that glue together the global Internet will get that packet to its intended destination. Attackers exploit this vulnerability by sending many (millions of) spoofed-source-address packets to services on the Internet they wish to disrupt (or take offline altogether). Attackers can further leverage intermediate servers to amplify such packets into even larger packets that will cause greater disruption for the same effort on the attacker’s part.

Although the IETF recommended best practices to mitigate this vulnerability by configuring routers to validate that source addresses in packets are legitimate, compliance with such practices (BCP38 and BCP84) are notoriously incentive-incompatible. That is, source address validation (SAV) can be a burden to a network who supports it, but its deployment by definition helps not that network but other networks who are thus protected from spoofed-source attacks from that network. Nonetheless, any network who does not deploy BCP38 is “part of the DDoS problem”.

Over the past several months, CAIDA, in collaboration with Matthew Luckie at the University of Waikato, has upgraded Rob Beverly’s original spoofing measurement system, developing new client tools for measuring IPv4 and IPv6 spoofing capabilities, along with services that provide reporting and allow users to opt-in or out of sharing the data publicly. To find out if your network provider(s), or any network you are visiting, implements filtering and allow IP spoofing, point your web browser at and install our simple client.

This newly released spoofer v1.1.0 client has implemented parallel probing of targets, providing a 5x increase in speed to complete the test, relative to v.1.0. Among other changes, this new prober uses scamper instead of traceroute when possible, and has improved display of results. The installer for Microsoft Windows now configures Windows Firewall.

For more technical details about the problem of IP spoofing and our approach to measurement, reporting, notifications and remediation, see the slides from Matthew Luckie’s recent slideset, “Software Systems for Surveying Spoofing Susceptibility”, presented to the Australian Network Operators Group (AusNOG) in September 2016.

The project web page reports recently run tests from clients willing to share data publicly, test results classified by Autonomous System (AS) and by country, and a summary statistics of IP spoofing over time. We will enhance these reports over the coming months.

This material is based on research sponsored by the Department of Homeland Security (DHS) Science and Technology Directorate, Homeland Security Advanced Research Projects Agency, Cyber Security Division (DHS S&T/HSARPA/CSD) BAA HSHQDC-14-R-B0005, and the Government of United Kingdom of Great Britain and Northern Ireland via contract number D15PC00188. Views should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Department of Homeland Security, the U.S. Government, or the Government of United Kingdom of Great Britain and Northern Ireland.

Henya: Large-Scale Internet Topology Query System

December 17th, 2016 by Josh Polterock

CAIDA’s Internet topology mapping experiment running on our Ark infrastructure has collected traceroute-like measurements of the Internet from nodes hosted in academic, commercial, transit, and residential networks around the globe since September 2007. Discovery of the full potential value of this raw data is best served by a rich, easy-to-use interactive exploratory interface. We have implemented a web-based query interface — henya — to allow researchers to find the most relevant data for their research, such as all traceroutes through a given region and time period toward/across a particular prefix/AS.

We hope that Henya’s large-scale topology query system will become a powerful tool in the researcher’s toolbox for remotely searching CAIDA’s traceroute data. Built-in analysis and visualization modules (still under development) will facilitate our understanding of route and prefix hijacking events as well as provide us with the means to conduct longitudinal analysis. Below we show a screenshot of Henya’s query interface, but Young Hyun, Henya’s creator, also created a useful short introduction video.


(A note about the name Henya: Jeju, an island off the coast of South Korea, has a long history of free diving. Over the past few centuries, skilled women divers called haenyeo (pronounced “HEN-yuh”, literally “sea women”) earned their living harvesting and selling various sea-life. Highly-trained haenyeo can dive up to 30 meters deep and can hold their breath for over three minutes. Through this tiring and dangerous work, these women became the breadwinners of their families. Sadly the tradition has nearly died out, with only a few thousand practitioners left, nearly all of the them elderly. Free diving is an inspiring metaphor for data querying, and we thought this name would serve to honor this dying tradition by preserving the name in the topo-query system.)

The work was funded by the Department of Homeland Security (DHS) Science and Technology Directorate, Cyber Security Division DHS S&T/CSD) Broad Agency Announcement 11-02 and SPAWAR Systems Center Pacific via contract number N66001-12-C-0130, and by Research and Development Canada (DRDC) pursuant to an Agreement between the U.S. and Canadian governments for Cooperation in and Technology for Critical Infrastructure Protection and Border Security. The work represents the position of the authors and necessarily that of DHS or DRDC.

bdrmap: Inference of Borders Between IP Networks

December 1st, 2016 by Josh Polterock

Matthew Luckie presented some recent topology research results of CAIDA’s NSF-funded Internet interconnection mapping project at the recent ACM 2016 Internet Measurement Conference in Santa Monica, CA. This measurement, data analysis, and software tool development project focused on automatic inference of network boundaries in traceroute. The paper explains why such a conceptually simple task is hard in the real world, and how lack of progress has impeded a wide range of research and development efforts for decades. We developed and validated a method that uses targeted traceroutes, knowledge of traceroute idiosyncrasies, and codification of topological constraints in a structured set of heuristics, to correctly identify interdomain links at the granularity of individual border routers. We limited our scope to those network boundaries we have most confidence we can accurately infer in the presence of inherent sampling bias: interdomain links attached to the network launching the traceroute. We developed a scalable implementation of our algorithm and validated it against ground truth information provided by four networks on 3,277 links, which showed 96.3% – 98.9% of our inferences for these links were correct. With 19 vantage points (VPs) distributed across a large U.S. broadband provider, we used our method to reveal the tremendous density of router-level interconnection between some ASes. For example, in January 2016, we observed 45 router-level links between one large U.S. broadband provider and one of its Tier-1 peers. We also quantified the VP deployment required to observe this ISP’s interdomain connectivity, finding that we needed 17 VPs to observe all 45 links. Our method forms the cornerstone of the system we are now building to map interdomain performance. We released our code as a new module of the open source scamper measurement tool.

Our approach begins with assembling routing and addressing data used to inform data collection and analysis. Then, we deploy an efficient variant of traceroute to trace the path from each VP to every routed prefix observed in the global BGP routing system. We apply alias resolution techniques to infer routers and point-to-point links used for interdomain interconnection. We then use this collected data to assemble constraints that guide our execution of heuristics to infer router ownership.

The data collected on the Ark infrastructure using this new methodology to construct an improved router-level map will provide input for development of applications and experiments in several research areas including studies on interdomain congestion, better solutions for AS path prediction and development of a reliable tool for AS-level traceroute.

This work was supported by NSF-funded grant: CNS-1414177, and the DHS S&T contracts N66001-12-C-0130 and HHSP 233201600012C.

BGPstream: a software framework for live and historical BGP data analysis

November 23rd, 2016 by kc

One of the three CAIDA papers presented at ACM’s 2016 Internet Measurement Conference this month punctuated years of work to develop an open-source software framework for the analysis of historical and real-time BGP (border gateway protocol, the Internet’s interdomain routing protocol) data. Although BGP is a crucial operational component of the Internet infrastructure, and is the subject of research in the areas of Internet performance, security, topology, protocols, economics, etc., until now there has been no efficient way of processing large amounts of distributed and/or live BGP measurement data. BGPStream fills this gap, enabling efficient investigation of events, rapid prototyping, and building complex tools large-scale monitoring applications (e.g., detection of connectivity disruptions or BGP hijacking attacks).

We released the BGPstream platform earlier in the year, and it has already served to support projects at three hackathons, starting with CAIDA’s First BGP hackathon in February 2016. Then in June 2016, John Kristoff led a project at NANOG’s first hackathon (June 2016); when he presented the project results he noted, “Ultimately, [BGPStream] was going to save us a tremendous amount of time because this provided us with an interface into routing data that CAIDA collects and aggregates from multiple places. That allowed us to build short pieces of code that would tie in pulling out information based on community tag or next hop address.” Most recently at the RIPE NCC IXP Tools Hackathon in October 2016, the Universal Looking Glass team based their analysis on BGPStream, and worked to add the BGP measurement data published by Packet Clearing House as a data source supported by BGPStream.

Other researchers have also already made use of BGPstream for Internet path prediction projects, including Sibyl: A Practical Internet Route Oracle (where they used BGPstream to extract AS paths for comparison against traceroute measurements), and PathCache: a path prediction toolkit.

The IMC paper describes the goals and architecture of BGPStream, and uses case studies to illustrate how to apply the components of the framework to different scenarios, including complex services for global Internet monitoring that we built on top of it.

It was particularly gratifying to hear the next speaker in the session at the conference begin his talk by saying that BGPstream would have made the work he was about to present a lot easier. That’s exactly the impact we hope BGPstream has on the community!

The work was supported by two NSF-funded grants: CNS-1228994 and CNS-1423659, and the DHS-funded contract N66001-12-C-0130.

Geolocation Terminology: Vantage Points, Landmarks, and Targets

November 17th, 2016 by Bradley Huffaker

While reviewing a recent paper, it occurred to me there is a pretty serious nomenclature inconsistency across Internet measurement research papers that talk about geolocation. Specifically, the term landmark is not well-defined. Some literature uses the term landmark to refer to measurement infrastructure (e.g., nodes that source active measurements) in specific known geographic locations [Maziku2013,Komosny2015]. In other literature the same term refers to locations with known Internet identifiers — such as IP addresses — against which one collects calibration measurements [Arif2010,Wang2011,Hu2012,Eriksson2012,Chen2015].

In pursuit of clarity in our field, we recommend the following terms and definitions:

  • A Vantage Point (VP) is a measurement infrastructure node with a known geographic location.
  • A Landmark is a responsive Internet identifier with a known location to which the VP will launch a measurement that can serve to calibrate other measurements to potentially unknown geographic locations.
  • A Target is an Internet identifier whose location will be inferred from a given method. Depending on the type of identifier and inference methodology, this may not be a single well defined location. Typically, some targets have known geographic locations (ground truth), which researchers can use to evaluate the accuracy of their geolocation methodology.
  • A Location is a geographic place that geolocation techniques attempt to infer for a given target. Examples include cities and ISP Points of Presences (PoPs).

Not all papers need to use all terms. Below we depict a simple constraint-based geolocation algorithm to show how we understand these terms in practice.

A simple constraint-based geolocation algorithm.

A simple constraint-based geolocation algorithm.

[Potential useful resource, although not actively maintained: CAIDA’s Geolocation Bibliography]

The Remote Peering Jedi

November 11th, 2016 by Josh Polterock

During the RIPE 73 IXP Tools Hackathon, Vasileios Giotsas, working with collaborators at FORTH/University of Crete, AMS-IX, University College, London, and NFT Consult, created the Remote Peering Jedi Tool to provide a view into the remote peering ecosystem. Given a large and diverse corpus of traceroute data, the tool detects and localizes remote peering at Internet Exchange Points (IXP).

To make informed decisions, researchers and operators desire to know who has remote peering at the various IXPs. For their RIPE hackathon project, the group created a tool to automate the detection using average RTTs from the RIPE Atlas’ massive corpus of traceroute paths. The group collected validation data from boxes inside the three large IXPs to compare to RTTs estimated via Atlas. The data suggests possible opportunities for Content Distribution Networks (CDN) to improve services for smaller IXPs. The project results also offer insights into how to interpret some of the information in PeeringDB. The project further examined how presence-informed RTT geolocation can contribute to identifying the location of resources. These results help reduce the problem space by exploiting the fact that the IP space of a given AS can appear where the AS has presence.

For more details, you can watch Vasileios’ presentation of the Remote Peering Jedi Tool. Or, visit the remote peering portal to see the tool in action.


NANOG68: PERISCOPE: Standardizing and Orchestrating Looking Glass Querying

November 4th, 2016 by Vasileios Giotsas

CAIDA’s Vasileios Giotsas had the opportunity to present PERISCOPE: Standardizing and Orchestrating Looking Glass Querying to the folks at NANOG68. The presentation covered his work on the Periscope Looking Glass API.

The work sets out to unify the heterogenous thousands of autonomously operated Looking Glass (LG) servers into a single unified standardized API for querying and executing experiments across the collective resource as a whole. From the beginning, we understood that while the hosting networks make these services public, usage policies varied and many LG services request clients rate limit their queries or impose rate limits and some forbid automated queries entirely. We do our best with Periscope administration to respect LG resources and implement conservative client rate limiting enforcing a per-user and per-LG rate limits. We identify our clients to provide transparency and accountability.

We believe the Periscope architecture brings several primary benefits. The LG data complements our current trace data and extends the topology coverage. It allows us to implement intelligent load design across all LG servers, uses caching to reduce the number of redundant queries, and makes more efficient use of the LG resources as a whole. Finally, Periscope improves troubleshooting capabilities (often the reason for supporting these services in the first place).

A webcast of the NANOG68 Periscope presentation is available, as well as the accompanying slideset presented at NANOG68.

Full paper:
V. Giotsas, A. Dhamdhere, and k. claffy, “Periscope: Unifying Looking Glass Querying“, in Passive and Active Network Measurement Workshop (PAM), Mar 2016.

Periscope Architecture v1.0

Periscope Architecture v1.0

This work was supported in part by the National Science Foundation, the DHS Science and Technology Directorate, Cyber Security Division (DHS S&T/CSD) and by Defence R&D Canada (DRDC).