IPv6 adoption as seen from an Internet backbone link

May 29th, 2018 by Paul Hick and Josh Polterock

For the last ten years (with some gaps due to network upgrades), CAIDA has captured monthly traffic samples on Internet backbone links in several large U.S[ cities (San Jose, Chicago, and since March this year, New York City).
We publish statistics for these traces at http://www.caida.org/data/passive/trace_stats/, which illustrates the growth in IPv6 traffic, relative to IPv4. Over the 10-year period covered by our traffic captures, the increase follows a steady exponential trend (linear on a log-lin graph), increasing 10-fold every 3 years. Currently the IPv6 fraction hovers around 1%. Were this trend to continue, the ratios would be roughly 50% each around October 2022 (for packets) September 2023 (for bytes). The byte fraction increases more slowly, reflecting a slightly smaller average IPv6 packet size compared to IPv4.

IPv6 Traffic Seen on a Backbone Link

We are not making any predictions, and note that CGN deployment is also increasing rapidly. We are just reporting the best available data we have.

CAIDA’s Program Plan 2018-2023

May 29th, 2018 by kc

We finally published our new Program Plan for 2018-2023. (Previous program plans are at http://www.caida.org/home/about/progplan.) Executive summary below:

For the last 20 years UC San Diego’s Center for Applied Internet Data Analysis (CAIDA) has been developing data-focused services, products, tools and resources to advance the study of the Internet, which has permeated disciplines ranging from theoretical computer science to political science, from physics to tech law, and from network architecture to public policy. As the Internet and our dependence on it have grown, the structure and dynamics of the network, and how it relates to the political economy in which it is embedded, is gathering increasing attention by researchers, operators and policy makers, all of whom bring questions that they lack the capability to answer themselves. CAIDA has spent years cultivating relationships across disciplines (networking, security, economics, law, policy) with those interested in CAIDA data, but the impact thus far has been limited to a handful of researchers. The current mode of collaboration simply does not scale to the exploding interest in scientific study of the Internet.

On a more operational dimension, large-scale Internet cyber-attacks and incidents — route hijacking, network outages, fishing campaigns, botnet activities, large-scale bug exploitation, etc. — represent a major threat to public safety and to both public and private strategic and financial assets. Mitigation and recovery, as well as prevention of further attacks of similar nature, are often impeded by the fact that such events can remain unnoticed or are hard to understand and characterize. Because of their macroscopic nature, identifying such events and understanding their scope and dynamics requires: (a) combining data of different type and origin; and (b) teamwork of experts with varied background and skills; (c) agile tools for rapid, cooperative, interactive analysis.

These two infrastructure research challenges will require high performance research infrastructure, and CAIDA will embark on a new stage in our infrastructure development endeavors to support these challenges, re-using and sharing software and data components wherever possible. We will integrate existing as well as develop new measurement and analysis components and capabilities into interactive online platforms, accessible via web interfaces as well as APIs. These novel developments will enable researchers from various disciplines including non-networking experts to access and productively use Internet data, thus advancing more complex and visionary scientific studies of the Internet ecosystem. We hope these efforts will enable us and others to widen access to and utility of the best possible Internet measurement data available to research, operational, and policy communities worldwide.

On the research side, we will continue our Internet cartography efforts, improving our IPv4 and IPv6 topology mapping capabilities, and our ability to measure and analyze interdomain congestion. We will also continue development our of Internet Topology Data Kit (ITDK) data sets, but shift our focus to simplified versions of the data and visual interfaces that are easier for researchers to use. We will undertake a new project that studies topological weaknesses from a nation-state security and stability perspective. We will explore implications of these analysis for network resiliency, economics, and policy. Among our new collaborations is an interdisciplinary project to model and design an ecosystem for market-mediated software defined communications infrastructure at the wireless edge. And in the intersection between research and infrastructure, we will start a new research project that explores an ambitious new way of designing measurement infrastructure platforms to facilitate broader deployment and sharing of nodes across scientific experimenters.

As always, we will lead and participate in tool development to support measurement, analysis, indexing, and dissemination of data from operational global Internet infrastructure. Our outreach activities will include peer-reviewed papers, workshops, blogging, presentations, educational videos, and technical reports.

Note that not all of the activities described in this program plan are fully funded yet; we are seeking additional support to enable us to accomplish our ambitious agenda.


Complete program plan for 2018-2023 at: http://www.caida.org/home/about/progplan/progplan2018/.

CAIDA’s Annual Report for 2017

May 29th, 2018 by kc

The CAIDA annual report summarizes CAIDA’s activities for 2017, in the areas of research, infrastructure, data collection and analysis. Our research projects span Internet topology, routing, security, economics, future Internet architectures, and policy. Our infrastructure, software development, and data sharing activities support measurement-based internet research, both at CAIDA and around the world, with focus on the health and integrity of the global Internet ecosystem. The executive summary is excerpted below:
Read the rest of this entry »

TCP Congestion Signatures

February 6th, 2018 by Srikanth Sundaresan

Roadsign: TCP Congestion Ahead

Congestion in the Internet is an age-old problem. With the rise of broadband networks, it had been implicitly accepted that congestion is most likely to occur in the ‘last mile’, that is, the broadband link between the ISP and the home customer. This is due to service plans or technical factors that limit the bandwidth in the last mile.

However, two developments have challenged this assumption: the improvement in broadband access speeds, and the exponential growth in video traffic.

Video traffic now consumes a significant fraction of bandwidth even in transit networks, to the extent that interconnection points between major networks can also be potential sources of congestion. A case in point is the widespread interconnection congestion reported between transit network Cogent and several US access ISPs, in 2014.

It is therefore important to understand where congestion occurs—if it occurs in the last mile, then users are limited by their service plan, and if it occurs elsewhere, they are limited by forces outside of their control.

Although there are many TCP forensic tools available, ranging from simple speed tests to more sophisticated diagnostic tools, they do not give information beyond available throughput or that the flow was limited by congestion or other factors such as latency.

Using TCP RTT to distinguish congestion types

In our paper ‘TCP Congestion Signatures‘, which we recently presented at the 2017 Internet Measurement Conference, we developed and validated techniques to identify whether a TCP flow was bottlenecked by:

  • (i) an initially unconstrained path (that the connection then fills), or
  • (ii) an already congested path.

Our method works without prior knowledge about the path, for example, the capacity of its bottleneck link. As a specific application of this general method, the technique can distinguish congestion experienced on interconnection links from congestion that naturally occurs when a last-mile link is filled to capacity. In TCP terms, we re-articulate the question: was a TCP flow bottlenecked by an already congested (possibly interconnect) link, or did it induce congestion in an otherwise lightly loaded (possibly a last-mile) link?

We use simple intuition based on TCP dynamics to answer this question: TCP’s congestion control mechanism affects the round-trip time (RTT) of packets in the flow. In particular, as TCP scales up to occupy a link that is initially lightly loaded, it gradually fills up the buffer at the head of that link, which in turn increases the flow’s RTT. This effect is most pronounced during the initial slow start period, as the flow throughput increases from zero.

On the contrary, for links that are operating at close to capacity, the buffer at the bottleneck is already occupied, and consequently the new TCP flow’s congestion control does not have a measurable impact on the RTT. In this case, the RTT is more or less constant over the duration of the TCP flow.

We identify two parameters based on flow RTT during TCP slow start that we use to distinguish these two cases: the coefficient of variation and the normalized difference between the minimum and maximum RTT. We feed these two parameters, which can be easily estimated for TCP flows, into a simple decision tree classifier. The figures below shows a simple example of these two metrics for a controlled experiment.

Graph

Figure 1. This figure shows the coefficient of variation of packet RTTs during slow start. Flows that are affected by self-induced congestion have higher coefficient of variation than those affected by external congestion.

Graph

Figure 2. This figure shows the difference between the maximum and minimum RTT of packets during slow start for flows that are affected by self-induced congestion (blue) and those affected by external congestion (red). Self-induced congestion causes a larger difference in the RTT.

For this experiment we set up an emulated ‘access’ link with a bandwidth of 20 Mbps and 100 ms buffer, and an ‘interconnect’ link of bandwidth 1 Gbps with a 50 ms buffer. We run throughput tests over the links under two conditions: when the interconnect link is busy (it becomes the bottleneck) and when it is not (the access link becomes the bottleneck), and compute the two metrics for the test flows.

The figures show the cumulative distribution function of the two parameters over 50 runs of the experiment. We see that the two cases are clearly distinguishable: both the coefficient of variation and the difference metrics are significantly higher for the case where the access link is the bottleneck.

We validate our techniques using a variety of controlled experiments and real-world datasets, including data from the Measurement Lab platform during and after the interconnection congestion episode between Cogent and various ISPs in early 2014 — for this case we show that the technique distinguishes the two cases of congestion with high accuracy.

Read TCP Congestion Signatures for more details on the experiment.

Uses and Limitations

Our technique distinguishes between self-induced congestion versus externally induced congestion and can be implemented by content providers (for example, video streaming services and speed test providers). The provider would only need to configure the servers to measure the TCP flow during slow start. While we currently use packet captures to extract the metrics we need, we are exploring lighter-weight techniques that require fewer resources.

Implementing such a capability would help a variety of stakeholders. Users would understand more about what limits the performance they experience, content providers could design better solutions to alleviate the effects of congestion, and regulators of the peering ecosystem could rule out consideration of issues where customers are limited by their own contracted service plan.

In terms of limitations, our technique depends on the existence of buffers that influence RTTs, and TCP variants that attempt to fill those buffers. Newer congestion control variants such as BBR that base their congestion management on RTT (and try to reduce buffering delays) may confound the method; we plan to study this, as well as how such congestion control mechanisms interact with older TCP variants, in future work.

Contributors: Amogh Dhamdhere, Mark Allman and kc Claffy

Srikanth Sundaresan’s research interests are in the design and evaluation of networked systems and applications. This work is based on a research paper written when he was at Princeton University. He is currently a software engineer at Facebook.

AS Rank (updated!)

January 16th, 2018 by Bradley Huffaker

CAIDA welcomes the new year with an update to one of our flagship services, http://as-rank.caida.org/. As part of our new NSF project “DIBBs: Integrated Platform for Applied Network Data Analysis (PANDA)” we will offer researchers more accessible calibrated user-friendly tools for collecting, analyzing, querying, and interpreting measurements of the Internet ecosystem. Our razing and redesign of the AS Rank v1 service represents the beginning of efforts to build toward this new platform. For this update we did a complete redesign with entirely new backend database (redis), web application framework (Symfony), and front-end web development environment (Bootstrap 4). The redesigned service focuses on optimizing query efficiency to serve a larger user population (We would like to have the capability to support concurrent queries from 30+ students in a classroom.) It also focuses on getting the data to researchers in a useful format (JSON) via a new programmatic interface to the AS Rank data. To see the details, check out the new RESTFUL API documentation at http://as-rank.caida.org/api/v1. September 2019 Update: AS Rank version 2 no longer uses the RESTFUL API.

Those who remember the old service may know the performance challenges we experienced and so understand the need to start fresh. The decision to start from scratch means that we will need to reimplement quite a number of features to get back to the full functionality provided by the previous server. We plan to expand its features over the coming months. Please send your ideas for features you would find especially useful to asrank-feedback@caida.org.

New and improved AS Rank.

 

CAIDA’s 2016 Annual Report

May 9th, 2017 by kc

[Executive summary and link below]

The CAIDA annual report summarizes CAIDA’s activities for 2016, in the areas of research, infrastructure, data collection and analysis. Our research projects span Internet topology, routing, security, economics, future Internet architectures, and policy. Our infrastructure, software development, and data sharing activities support measurement-based internet research, both at CAIDA and around the world, with focus on the health and integrity of the global Internet ecosystem. The executive summary is excerpted below:

Mapping the Internet. We continued to expand our topology mapping capabilities using our Ark measurement infrastructure. We improved the accuracy and sophistication of our topology annotations, including classification of ISPs, business relationships between them, and geographic mapping of interdomain links that implement these relationships. We released two Internet Topology Data Kits (ITDKs) incorporating these advances.

Mapping Interconnection Connectivity and Congestion. We continued our collaboration with MIT to map the rich mesh of interconnection in the Internet in order to study congestion induced by evolving peering and traffic management practices of CDNs and access ISPs. We focused our efforts on the challenge of detecting and localizing congestion to specific points in between networks. We developed new tools to scale measurements to a much wider set of available nodes. We also implemented a new database and graphing platform to allow us to interactively explore our topology and performance measurements. We produced related data collection and analyses to enable evaluation of these measurements in the larger context of the evolving ecosystem: infrastructure resiliency, economic tussles, and public policy.

Monitoring Global Internet Security and Stability. We conducted infrastructure research and development projects that focus on security and stability aspects of the global Internet. We developed continuous fine-grained monitoring capabilities establishing a baseline connectivity awareness against which to interpret observed changes due to network outages or route hijacks. We released (in beta form) a new operational prototype service that monitors the Internet, in near-real-time, and helps identify macroscopic Internet outages affecting the edge of the network.

CAIDA also developed new client tools for measuring IPv4 and IPv6 spoofing capabilities, along with services that provide reporting and allow users to opt-in or out of sharing the data publicly.

Future Internet Architectures. We continued studies of IPv4 and IPv6 paths in the Internet, including topological congruency, stability, and RTT performance. We examined the state of security policies in IPv6 networks, and collaborated to measure CGN deployment in U.S. broadband networks. We also continued our collaboration with researchers at several other universities to advance development of a new Internet architecture: Named Data Networking (NDN) and published a paper on the policy and social implications of an NDN-based Internet.

Public Policy. Acting as an Independent Measurement Expert, we posted our agreed-upon revised methodology for measurement methods and reporting requirements related to AT&T Inc. and DirecTV merger (MB Docket No. 14-90). We published our proposed method and a companion justification document. Inspired by this experience and a range of contradicting claims about interconnection performance, we introduced a new model describing measurements of interconnection links of access providers, and demonstrated how it can guide sound interpretation of interconnection-related measurements regardless of their source.

Infrastructure operations. It was an unprecedented year for CAIDA from an infrastructure development perspective. We continued support for our existing active and passive measurement infrastructure to provide visibility into global Internet behavior, and associated software tools and platforms that facilitate network research and operational assessments.

We made available several data services that have been years in the making: our prototype Internet Outage Detection and Analysis service, with several underlying components released as open source; the Periscope platform to unify and scale querying of thousands of looking glass nodes on the global Internet; our large-scale Internet topology query system (Henya); and our Spoofer system for measurement and analysis of source address validation across the global Internet. Unfortunately, due to continual network upgrades, we lost access to our 10GB backbone traffic monitoring infrastructure. Now we are considering approaches to acquire new monitors capable of packet capture on 100GB links.

As always, we engaged in a variety of tool development, and outreach activities, including maintaining web sites, publishing 13 peer-reviewed papers, 3 technical reports, 4 workshop reports, one (our first) BGP hackathon report, 31 presentations, 20 blog entries, and hosting 6 workshops (including the hackathon). This report summarizes the status of our activities; details about our research are available in papers, presentations, and interactive resources on our web sites. We also provide listings and links to software tools and data sets shared, and statistics reflecting their usage. Finally, we report on web site usage, personnel, and financial information, to provide the public a better idea of what CAIDA is and does.

For the full 2016 annual report, see http://www.caida.org/home/about/annualreports/2016/

Response to RFI for Future Needs for Advanced Cyberinfrastructure to Support Science and Engineering Research

April 18th, 2017 by kc

I sent the following to NSF in response to a recent Request for Information (RFI) for Future Needs for Advanced Cyberinfrastructure to Support Science and Engineering Research. (The format required an abstract and answers to 3 specific questions.)

Abstract

As the Internet and our dependence on it have grown, the structure and dynamics of the network, and how it relates to the political economy in which it is embedded, have gathered increasing attention by researchers, operators and policy makers. All of these stakeholders bring questions that they lack the capability to answer themselves. Epistemological challenges lie in developing and deploying measurement instrumentation and protocols, expertise required to soundly interpret and use complex data, lack of tools to synthesize different sources of data to reveal insights, data management cost and complexity, and privacy issues. Although a few interdisciplinary projects have succeeded, the current mode of collaboration simply does not scale to the exploding interest in scientific study of the Internet, nor to complex and visionary scientific uses of CAIDA’s data by non-networking experts. We believe the community needs a new shared cyberinfrastructure resource that integrates active Internet measurement capabilities, multi-terabyte data archives, live data streams, heavily curated topology data sets revealing coverage and business relationships, and traffic measurements. Such a resource would enable a broad set of researchers to pursure new scientific directions, experiments, and data products that promote valid interpretations of data and derived inferences.

Read the rest of this entry »

Why IP source address spoofing is a problem and how you can help.

March 24th, 2017 by Bradley Huffaker

video: http://www.caida.org/publications/animations/security/spoofer-sav-intro
information: Software Systems for Surveying Spoofing Susceptibility
download: https://spoofer.caida.org/

This material is based on research sponsored by the Department of Homeland Security (DHS) Science and Technology Directorate, Homeland Security Advanced Research Projects Agency, Cyber Security Division (DHS S&T/HSARPA/CSD) BAA HSHQDC-14-R-B0005, and the Government of United Kingdom of Great Britain and Northern Ireland via contract number D15PC00188. Views should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Department of Homeland Security, the U.S. Government, or the Government of United Kingdom of Great Britain and Northern Ireland.

Help save the Internet: Install the new Spoofer client (v1.1.0)!

December 18th, 2016 by Josh Polterock

The greatest security vulnerability of the Internet (TCP/IP) architecture is the lack of source address validation, i.e., any sender may put a fake source address in a packet, and the destination-based routing protocols that glue together the global Internet will get that packet to its intended destination. Attackers exploit this vulnerability by sending many (millions of) spoofed-source-address packets to services on the Internet they wish to disrupt (or take offline altogether). Attackers can further leverage intermediate servers to amplify such packets into even larger packets that will cause greater disruption for the same effort on the attacker’s part.

Although the IETF recommended best practices to mitigate this vulnerability by configuring routers to validate that source addresses in packets are legitimate, compliance with such practices (BCP38 and BCP84) are notoriously incentive-incompatible. That is, source address validation (SAV) can be a burden to a network who supports it, but its deployment by definition helps not that network but other networks who are thus protected from spoofed-source attacks from that network. Nonetheless, any network who does not deploy BCP38 is “part of the DDoS problem”.

Over the past several months, CAIDA, in collaboration with Matthew Luckie at the University of Waikato, has upgraded Rob Beverly’s original spoofing measurement system, developing new client tools for measuring IPv4 and IPv6 spoofing capabilities, along with services that provide reporting and allow users to opt-in or out of sharing the data publicly. To find out if your network provider(s), or any network you are visiting, implements filtering and allow IP spoofing, point your web browser at http://spoofer.caida.org/ and install our simple client.

This newly released spoofer v1.1.0 client has implemented parallel probing of targets, providing a 5x increase in speed to complete the test, relative to v.1.0. Among other changes, this new prober uses scamper instead of traceroute when possible, and has improved display of results. The installer for Microsoft Windows now configures Windows Firewall.

For more technical details about the problem of IP spoofing and our approach to measurement, reporting, notifications and remediation, see the slides from Matthew Luckie’s recent slideset, “Software Systems for Surveying Spoofing Susceptibility”, presented to the Australian Network Operators Group (AusNOG) in September 2016.

The project web page reports recently run tests from clients willing to share data publicly, test results classified by Autonomous System (AS) and by country, and a summary statistics of IP spoofing over time. We will enhance these reports over the coming months.

This material is based on research sponsored by the Department of Homeland Security (DHS) Science and Technology Directorate, Homeland Security Advanced Research Projects Agency, Cyber Security Division (DHS S&T/HSARPA/CSD) BAA HSHQDC-14-R-B0005, and the Government of United Kingdom of Great Britain and Northern Ireland via contract number D15PC00188. Views should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Department of Homeland Security, the U.S. Government, or the Government of United Kingdom of Great Britain and Northern Ireland.

Henya: Large-Scale Internet Topology Query System

December 17th, 2016 by Josh Polterock

CAIDA’s Internet topology mapping experiment running on our Ark infrastructure has collected traceroute-like measurements of the Internet from nodes hosted in academic, commercial, transit, and residential networks around the globe since September 2007. Discovery of the full potential value of this raw data is best served by a rich, easy-to-use interactive exploratory interface. We have implemented a web-based query interface — henya — to allow researchers to find the most relevant data for their research, such as all traceroutes through a given region and time period toward/across a particular prefix/AS.

We hope that Henya’s large-scale topology query system will become a powerful tool in the researcher’s toolbox for remotely searching CAIDA’s traceroute data. Built-in analysis and visualization modules (still under development) will facilitate our understanding of route and prefix hijacking events as well as provide us with the means to conduct longitudinal analysis. Below we show a screenshot of Henya’s query interface, but Young Hyun, Henya’s creator, also created a useful short introduction video.

henya-query

(A note about the name Henya: Jeju, an island off the coast of South Korea, has a long history of free diving. Over the past few centuries, skilled women divers called haenyeo (pronounced “HEN-yuh”, literally “sea women”) earned their living harvesting and selling various sea-life. Highly-trained haenyeo can dive up to 30 meters deep and can hold their breath for over three minutes. Through this tiring and dangerous work, these women became the breadwinners of their families. Sadly the tradition has nearly died out, with only a few thousand practitioners left, nearly all of the them elderly. Free diving is an inspiring metaphor for data querying, and we thought this name would serve to honor this dying tradition by preserving the name in the topo-query system.)

The work was funded by the Department of Homeland Security (DHS) Science and Technology Directorate, Cyber Security Division DHS S&T/CSD) Broad Agency Announcement 11-02 and SPAWAR Systems Center Pacific via contract number N66001-12-C-0130, and by Research and Development Canada (DRDC) pursuant to an Agreement between the U.S. and Canadian governments for Cooperation in and Technology for Critical Infrastructure Protection and Border Security. The work represents the position of the authors and necessarily that of DHS or DRDC.