bdrmap: Inference of Borders Between IP Networks

December 1st, 2016 by Josh Polterock

Matthew Luckie presented recent topology research results from CAIDA's NSF-funded Internet interconnection mapping project at the ACM 2016 Internet Measurement Conference in Santa Monica, CA. This measurement, data analysis, and software tool development project focused on automatic inference of network boundaries in traceroute. The paper explains why such a conceptually simple task is hard in the real world, and how lack of progress has impeded a wide range of research and development efforts for decades. We developed and validated a method that uses targeted traceroutes, knowledge of traceroute idiosyncrasies, and codification of topological constraints in a structured set of heuristics to correctly identify interdomain links at the granularity of individual border routers. We limited our scope to the network boundaries we have the most confidence we can accurately infer in the presence of inherent sampling bias: interdomain links attached to the network launching the traceroute.

We developed a scalable implementation of our algorithm and validated it against ground truth information provided by four networks on 3,277 links, which showed that 96.3%–98.9% of our inferences for these links were correct. With 19 vantage points (VPs) distributed across a large U.S. broadband provider, we used our method to reveal the tremendous density of router-level interconnection between some ASes. For example, in January 2016, we observed 45 router-level links between one large U.S. broadband provider and one of its Tier-1 peers. We also quantified the VP deployment required to observe this ISP's interdomain connectivity, finding that we needed 17 VPs to observe all 45 links.

Our method forms the cornerstone of the system we are now building to map interdomain performance. We released our code as a new module of the open source scamper measurement tool.

Our approach begins with assembling routing and addressing data used to inform data collection and analysis. Then, we deploy an efficient variant of traceroute to trace the path from each VP to every routed prefix observed in the global BGP routing system. We apply alias resolution techniques to infer routers and point-to-point links used for interdomain interconnection. We then use this collected data to assemble constraints that guide our execution of heuristics to infer router ownership.
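To sketch the flow in code: the heart of the analysis walks each traceroute path and looks for the point where interface ownership leaves the network hosting the VP. The snippet below is a greatly simplified illustration of that idea, not the published bdrmap heuristics; `ip_to_as` (longest-prefix match against BGP data) and `router_of` (alias-resolution output) are assumed helper functions.

```python
def infer_border_link(path, host_as, ip_to_as, router_of):
    """Given one traceroute path (a list of interface IPs) launched from
    host_as, return (router_id, neighbor_as) for the first hop pair we
    infer to straddle an interdomain link, or None if we see none."""
    for prev_ip, ip in zip(path, path[1:]):
        prev_as, cur_as = ip_to_as(prev_ip), ip_to_as(ip)
        # Simplified ownership constraint: the border appears where the
        # path leaves the address space of the AS hosting the VP.
        if prev_as == host_as and cur_as not in (None, host_as):
            return router_of(ip), cur_as
    return None
```

The real heuristics must also handle third-party addresses, unresponsive hops, and routers numbered from a neighbor's address space, which is why the constraint-solving machinery in the paper is considerably more involved.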

The data collected on the Ark infrastructure using this new methodology to construct an improved router-level map will provide input for the development of applications and experiments in several research areas, including studies of interdomain congestion, better solutions for AS path prediction, and development of a reliable tool for AS-level traceroute.

This work was supported by NSF grant CNS-1414177 and by DHS S&T contracts N66001-12-C-0130 and HHSP 233201600012C.

BGPStream: a software framework for live and historical BGP data analysis

November 23rd, 2016 by kc

One of the three CAIDA papers presented at ACM's 2016 Internet Measurement Conference this month punctuated years of work to develop an open-source software framework for the analysis of historical and real-time BGP (Border Gateway Protocol, the Internet's interdomain routing protocol) data. Although BGP is a crucial operational component of the Internet infrastructure, and is the subject of research in the areas of Internet performance, security, topology, protocols, economics, etc., until now there has been no efficient way of processing large amounts of distributed and/or live BGP measurement data. BGPStream fills this gap, enabling efficient investigation of events, rapid prototyping, and the building of complex tools and large-scale monitoring applications (e.g., detection of connectivity disruptions or BGP hijacking attacks).
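As a taste of what working with the framework looks like, here is a minimal sketch using the PyBGPStream Python bindings (API details vary across BGPStream versions, so treat this as illustrative): it reads an hour of BGP updates from a RouteViews collector and prints announced prefixes with their AS paths.

```python
import pybgpstream

# Read one hour of BGP updates from the RouteViews "route-views2"
# collector; BGPStream handles fetching and merging the dump files.
stream = pybgpstream.BGPStream(
    from_time="2016-11-01 00:00:00",
    until_time="2016-11-01 01:00:00",
    collectors=["route-views2"],
    record_type="updates",
)

for elem in stream:
    if elem.type == "A":  # "A" = announcement, "W" = withdrawal
        print(elem.fields["prefix"], elem.fields["as-path"])
```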

We released the BGPStream platform earlier in the year, and it has already supported projects at three hackathons, starting with CAIDA's first BGP hackathon in February 2016. In June 2016, John Kristoff led a project at NANOG's first hackathon; presenting the project results, he noted, “Ultimately, [BGPStream] was going to save us a tremendous amount of time because this provided us with an interface into routing data that CAIDA collects and aggregates from multiple places. That allowed us to build short pieces of code that would tie in pulling out information based on community tag or next hop address.” Most recently, at the RIPE NCC IXP Tools Hackathon in October 2016, the Universal Looking Glass team based their analysis on BGPStream, and worked to add the BGP measurement data published by Packet Clearing House as a data source supported by BGPStream.

Other researchers have already made use of BGPStream for Internet path prediction projects, including Sibyl: A Practical Internet Route Oracle (which used BGPStream to extract AS paths for comparison against traceroute measurements) and PathCache: a path prediction toolkit.

The IMC paper describes the goals and architecture of BGPStream, and uses case studies to illustrate how to apply the components of the framework to different scenarios, including complex services for global Internet monitoring that we built on top of it.

It was particularly gratifying to hear the next speaker in the session at the conference begin his talk by saying that BGPStream would have made the work he was about to present a lot easier. That's exactly the impact we hope BGPStream has on the community!

The work was supported by two NSF-funded grants: CNS-1228994 and CNS-1423659, and the DHS-funded contract N66001-12-C-0130.

Geolocation Terminology: Vantage Points, Landmarks, and Targets

November 17th, 2016 by Bradley Huffaker

While reviewing a recent paper, it occurred to me there is a pretty serious nomenclature inconsistency across Internet measurement research papers that talk about geolocation. Specifically, the term landmark is not well-defined. Some literature uses the term landmark to refer to measurement infrastructure (e.g., nodes that source active measurements) in specific known geographic locations [Maziku2013,Komosny2015]. In other literature the same term refers to locations with known Internet identifiers — such as IP addresses — against which one collects calibration measurements [Arif2010,Wang2011,Hu2012,Eriksson2012,Chen2015].

In pursuit of clarity in our field, we recommend the following terms and definitions:

  • A Vantage Point (VP) is a measurement infrastructure node with a known geographic location.
  • A Landmark is a responsive Internet identifier (such as an IP address) with a known geographic location, against which a VP launches measurements that can serve to calibrate other measurements toward identifiers with unknown locations.
  • A Target is an Internet identifier whose location a given method will infer. Depending on the type of identifier and the inference methodology, this may not be a single well-defined location. Typically, some targets have known geographic locations (ground truth), which researchers can use to evaluate the accuracy of their geolocation methodology.
  • A Location is a geographic place that geolocation techniques attempt to infer for a given target. Examples include cities and ISP Points of Presence (PoPs).

Not all papers need to use all terms. Below we depict a simple constraint-based geolocation algorithm to show how we understand these terms in practice.

A simple constraint-based geolocation algorithm.
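To make these terms concrete, here is a minimal sketch of constraint-based geolocation under simple assumptions: each VP yields an RTT toward the target, each RTT implies an upper bound on the target's distance from that VP, and the target is placed at the centroid of the region satisfying every bound. The grid search and the propagation-speed constant are illustrative choices of ours, not a production implementation; landmark-derived constraints plug in the same way.

```python
import math

SPEED_KM_PER_MS = 100.0  # ~2/3 of c in fiber, a common assumption

def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def geolocate(constraints, grid_step=1.0):
    """constraints: list of ((lat, lon), rtt_ms) pairs, one per VP
    measurement to the target. Returns the centroid of all grid points
    consistent with every distance constraint, or None."""
    feasible = []
    lat = -90.0
    while lat <= 90.0:
        lon = -180.0
        while lon < 180.0:
            point = (lat, lon)
            # One-way distance bound implied by each RTT sample.
            if all(haversine_km(point, vp) <= rtt / 2.0 * SPEED_KM_PER_MS
                   for vp, rtt in constraints):
                feasible.append(point)
            lon += grid_step
        lat += grid_step
    if not feasible:
        return None
    return (sum(p[0] for p in feasible) / len(feasible),
            sum(p[1] for p in feasible) / len(feasible))
```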

[Potentially useful resource, although not actively maintained: CAIDA's Geolocation Bibliography]

The Remote Peering Jedi

November 11th, 2016 by Josh Polterock

During the RIPE 73 IXP Tools Hackathon, Vasileios Giotsas, working with collaborators at FORTH/University of Crete, AMS-IX, University College London, and NFT Consult, created the Remote Peering Jedi Tool to provide a view into the remote peering ecosystem. Given a large and diverse corpus of traceroute data, the tool detects and localizes remote peering at Internet Exchange Points (IXPs).

To make informed decisions, researchers and operators want to know who peers remotely at the various IXPs. For their RIPE hackathon project, the group created a tool that automates this detection using average RTTs from RIPE Atlas's massive corpus of traceroute paths. They collected validation data from boxes inside three large IXPs to compare against the RTTs estimated via Atlas. The data suggests possible opportunities for Content Distribution Networks (CDNs) to improve services for smaller IXPs. The project results also offer insights into how to interpret some of the information in PeeringDB. The project further examined how presence-informed RTT geolocation can contribute to identifying the location of resources; these results help reduce the problem space by exploiting the fact that the IP space of a given AS can only appear where that AS has a presence.
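The core intuition is simple: a network that is physically colocated at an IXP should show near-zero latency to the IXP's peering LAN, while a remote peer cannot. A minimal sketch of that RTT test follows; the threshold and input format are our own assumptions, and the actual tool combines this signal with validation data and localization heuristics.

```python
def detect_remote_peers(rtt_samples, threshold_ms=2.0):
    """rtt_samples: dict mapping an IXP member's peering-LAN IP to a
    list of RTT samples (ms) measured from vantage points at or near
    the IXP, e.g. extracted from traceroutes crossing the peering LAN.
    A member whose minimum RTT exceeds the threshold cannot plausibly
    be colocated at the exchange, so we flag it as a remote peer."""
    remote = {}
    for peer_ip, samples in rtt_samples.items():
        best = min(samples)  # min filters out queuing-delay noise
        if best > threshold_ms:
            remote[peer_ip] = best
    return remote
```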

For more details, you can watch Vasileios’ presentation of the Remote Peering Jedi Tool. Or, visit the remote peering portal to see the tool in action.


NANOG68: PERISCOPE: Standardizing and Orchestrating Looking Glass Querying

November 4th, 2016 by Web Team

CAIDA’s Vasileios Giotsas had the opportunity to present PERISCOPE: Standardizing and Orchestrating Looking Glass Querying to the folks at NANOG68. The presentation covered his work on the Periscope Looking Glass API.

The work sets out to unify thousands of heterogeneous, autonomously operated Looking Glass (LG) servers behind a single standardized API for querying and executing experiments across the collective resource as a whole. From the beginning, we understood that although the hosting networks make these services public, usage policies vary: many LG services ask clients to rate-limit their queries or impose rate limits themselves, and some forbid automated queries entirely. We do our best in administering Periscope to respect LG resources, implementing conservative client rate limiting that enforces per-user and per-LG limits, and we identify our clients to provide transparency and accountability.
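To give a feel for the client side, here is a hypothetical sketch of submitting a query through such a unified API. The endpoint URL, parameter names, and header below are placeholders of our own invention, not the actual Periscope interface; see the paper and documentation for the real API.

```python
import requests

# Hypothetical endpoint for illustration only.
API = "https://periscope.example.org/api"

def run_lg_query(command, lg_nodes, api_key):
    """Submit one command (e.g. 'bgp' or 'traceroute') to a set of
    Looking Glass nodes through a single standardized interface,
    letting the service handle per-user and per-LG rate limits."""
    resp = requests.post(
        f"{API}/measurement",
        json={"command": command, "nodes": lg_nodes},
        # The key identifies the client, for transparency and accountability.
        headers={"X-API-Key": api_key},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```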

We believe the Periscope architecture brings several benefits. The LG data complements our current trace data and extends our topology coverage. Periscope allows us to distribute query load intelligently across all LG servers, uses caching to reduce the number of redundant queries, and makes more efficient use of the LG resources as a whole. Finally, Periscope improves troubleshooting capabilities (often the reason for supporting these services in the first place).

A webcast of the NANOG68 Periscope presentation is available, as well as the accompanying slideset presented at NANOG68.

Full paper:
V. Giotsas, A. Dhamdhere, and k. claffy, “Periscope: Unifying Looking Glass Querying”, in Passive and Active Network Measurement Workshop (PAM), March 2016.

Periscope Architecture v1.0

This work was supported in part by the National Science Foundation, the DHS Science and Technology Directorate, Cyber Security Division (DHS S&T/CSD) and by Defence R&D Canada (DRDC).

Fantastic NSF PI meeting for Future Internet Architecture program

October 8th, 2016 by kc

I had the honor and pleasure of participating in a fantastic PI meeting last month: the National Science Foundation's Future Internet Architecture (FIA) research program, held 20-21 September 2016. As the formal FIA program winds down, NSF wants to maximize the return on its investments in this program by helping connect principal investigators and researchers with other potential applied research and development funding sources. We are all well aware that, at least in the case of the NDN project (in which CAIDA participates), there are still huge open research challenges that will require years to conquer. But there are also tremendous opportunities to apply the ideas (and the code base) at this stage of the project's evolution.

Much credit goes to John Wroclawski and Craig Partridge, who led the organization of this meeting. They arranged short presentations by seven federal agency representatives, who outlined their agencies' strategic interests relevant to FIA technologies and how to engage those agencies effectively: Stu Wagner (DARPA/I2O), Joe Evans (DARPA/STO), Mark Laurri (DARPA/MTO), Rich Carlson (DOE SC-ACSR), Dan Massey (DHS S&T), Kevin Thompson (NSF), and Doug Montgomery (NIST). Each provided an overview of their programs, guidelines for proposing ideas to their agency, links to recent funding opportunities, and answers to any questions we had.

This firehose-of-information session was followed by lunch and then breakouts to prepare pitches to friendly external respondents for feedback and discussion. Each respondent brought broad experience with non-NSF government funding across agencies and technical areas. The FIA researchers got some priceless preparation from some of the best and brightest in the federal funding community. The next challenge for FIA PIs is to convince some of them to participate in the next round of investment into FIA research ideas and technologies. Kudos to NSF and to John and Craig for great assistance with this goal.

CRA Congressional visit to Washington D.C.

September 27th, 2016 by kc

As part of a Computing Research Association (CRA) effort to introduce policymakers to the contributions and power of IT research for the nation and the world, this month I had the honor of visiting the offices of four U.S. senators and a U.S. Representative.

Internet-specific topics I discussed included the importance of scientific measurement infrastructure to support empirical network and security research, broadband policy, and Internet governance.

We left them with a terrific infographic from the National Academies study “Continuing Innovation in Information Technology“, which shows the economic impact of different areas of fundamental IT research. The two-page flyer and the full report, Depicting Innovation in Information Technology, are available on the National Academies of Sciences, Engineering, and Medicine Computer Science and Telecommunications Board (CSTB) site.
Continuing Innovation in Information Technology

Even with many folks in Congress prioritizing passing a budget and getting back home to their districts to prepare for elections, all the staffers were gracious and genuinely interested in our field. (Who wouldn't be? 😉 )

Kudos to the Computing Research Association for providing a wonderful opportunity to engage with policy folks.

Adding geographic annotations to ISP interconnects

September 20th, 2016 by Bradley Huffaker
Geographic annotations on AS links.

The Internet arises from the interconnection of thousands of independently operated networks. Its structure is often modeled as a collection of Autonomous Systems (ASes), the nodes, exchanging traffic across interconnects, the links. These models are reductive by nature: a large international organization comprising thousands of machines and cables is reduced to a single node, and multiple exchange points are reduced to a single link.

We extended this model with the introduction of geographic locations attached to links between ISPs, represented by ASes. This extension maintains the simple node and link structure of the AS graph, and allows us to capture some of the geographic complexity in the topology.
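As a minimal sketch (our own illustrative data model, not CAIDA's published file format), the annotation can be as simple as mapping each AS pair to the set of locations where the two networks interconnect:

```python
from collections import defaultdict

# Each (unordered) AS pair maps to the set of cities where the two
# networks interconnect; the graph keeps its simple node-and-link
# structure while retaining geographic detail.
geo_links = defaultdict(set)

def add_interconnect(as_a, as_b, city):
    """Record that as_a and as_b interconnect in the given city."""
    geo_links[frozenset((as_a, as_b))].add(city)

add_interconnect(3356, 2914, "Los Angeles")  # Level 3 <-> NTT
add_interconnect(3356, 2914, "San Jose")
add_interconnect(3356, 174, "Las Vegas")     # Level 3 <-> Cogent

# How many distinct interconnects would have to fail to disconnect a pair:
print(len(geo_links[frozenset((3356, 2914))]))  # -> 2
```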

AS graph with geographic locations.

Consider the path from UCSD to U.Washington depicted in the illustration above. Level 3 has two possible paths: Level 3 ➡ Cogent ➡ U.Wash and Level 3 ➡ NTT ➡ U.Wash, both with the same AS path length. Assuming Level 3 uses hot-potato routing, it hands traffic off to the next network as early as possible, in order to spend as little money as possible on carrying it. In this example, NTT's Los Angeles interconnect is closer to San Diego than Cogent's Las Vegas interconnect, so Level 3 routes the traffic through NTT.
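The hot-potato choice in this example reduces to picking the egress interconnect nearest the traffic's origin; a toy sketch with illustrative distances:

```python
# Candidate egress interconnects for traffic originating in San Diego;
# distances are rough illustrative values in km.
egress_options = {
    "NTT": ("Los Angeles", 180),
    "Cogent": ("Las Vegas", 430),
}

# Hot-potato routing: hand off at the nearest interconnect.
next_as, (city, km) = min(egress_options.items(), key=lambda kv: kv[1][1])
print(f"hand off to {next_as} at {city} ({km} km)")  # -> NTT at Los Angeles
```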


In addition to supporting research on path prediction, this type of geographic annotation can provide a more realistic indication of the network's resilience to link failure. In the figure below, duplicate links between a pair of ASes reflect multiple interconnects: for example, the figure implies that a single link failure would disconnect UCSD from Level 3, while three links would have to fail to disconnect Level 3 from NTT.

Multiple links between ASes that connect in multiple locations.

Details on our geographic link annotation methods, and the data itself, are available at CAIDA's AS Relationships with geographic annotations page.

NSF WATCH series talk: Mapping Internet Interdomain Congestion

August 26th, 2016 by kc

Last week I gave a talk at NSF’s 39th Washington Area Trustworthy Computing Hour (WATCH) seminar series on CAIDA’s efforts to map internet interdomain congestion. A recorded webcast of the talk is available.

Abstract:

We used the Ark infrastructure to support an ambitious collaboration with MIT to map the rich mesh of interconnection in the Internet, with a focus on congestion induced by evolving peering and traffic management practices of CDNs and access ISPs, including methods to detect and localize the congestion to specific points in networks. We undertook several studies to pursue two dimensions of this challenge. First, we developed methods and tools to identify interconnection borders, and in some cases their physical locations, from comprehensive Internet topology measurements taken from many edge vantage points. Then, we developed and deployed scalable performance measurement tools to observe performance at thousands of interconnections, algorithms to mine the resulting data for evidence of persistent congestion, and a system to visualize the results. We also produce related data collections and analyses to enable evaluation of these measurements in the larger context of the evolving ecosystem: quantifying a given network service provider's global routing footprint, and business-related classifications of networks. In parallel, we examined the peering ecosystem from an economic perspective, exploring fundamental weaknesses and systemic problems of the currently deployed economic framework of Internet interconnection that will continue to cause peering disputes between ASes.
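As a simplified illustration of the congestion-mining step (not the published algorithm, and with thresholds that are our own assumptions), a link probed around the clock can be flagged when its RTTs sit well above the off-peak baseline for a sustained run of hours, the diurnal signature of a congested interconnect:

```python
from statistics import median

def shows_persistent_congestion(rtt_series, elevation_ms=10.0, min_hours=4):
    """rtt_series: list of (hour_of_day, rtt_ms) samples for one
    interdomain link, e.g. from periodic probes to the far side of
    the interconnect. Flags the link if median RTTs in consecutive
    hours sit well above the link's off-peak baseline."""
    by_hour = {}
    for hour, rtt in rtt_series:
        by_hour.setdefault(hour, []).append(rtt)
    baseline = min(median(v) for v in by_hour.values())  # off-peak level
    elevated = [h for h in sorted(by_hour)
                if median(by_hour[h]) - baseline >= elevation_ms]
    # Require a sustained run of elevated hours, not isolated spikes.
    run = best = 0
    prev = None
    for h in elevated:
        run = run + 1 if prev is not None and h == prev + 1 else 1
        best = max(best, run)
        prev = h
    return best >= min_hours
```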

The slides presented are posted on the CAIDA website: Mapping Internet Interdomain Congestion

CAIDA as Independent Measurement Expert for AT&T

August 18th, 2016 by kc

On August 6, 2016, AT&T sent a letter to the FCC regarding Applications of AT&T Inc. and DIRECTV for Consent To Assign or Transfer Control of Licenses and Authorizations (MB Docket No. 14-90), reporting that AT&T accepted an amended version of the methodology CAIDA proposed as independent measurement expert of AT&T's interconnection performance, addressing the concerns AT&T had with the original proposed methodology.

The amended report, First Amended Report of AT&T Independent Measurement Expert: Reporting requirements and measurement methods, is available online, along with the justification for the amendment.

CAIDA's work with AT&T is documented on CAIDA's Measuring Internet Interconnection Performance Metrics page.