IMAPS Workshop on Internet Measurements and Political Science: Network Outages

October 10th, 2014 by Josh Polterock

On Wednesday 1 October 2014, CAIDA hosted a small invitation-only workshop that brought together researchers working on large-scale Internet outage detection and characterization with researchers from the political sciences with specific expertise in Internet censorship, political violence (including Internet connectivity disruption ordered by authoritarian regimes for censorship), and Internet penetration. Participants viewed a demonstration of CAIDA’s current data analysis platform for the exploration of historical and realtime Internet measurement data (named “Charthouse”), and discussed both the platform and possible extensions to support political science research related to macroscopic Internet outages.

A primary use of our current platform is to detect and characterize large-scale Internet outages, i.e., entire regions or countries getting disconnected from the Internet for hours or days. We intend to extend the platform to enable more agile analysis, support larger datasets, and improve geographic-based exploration and visualization, based on use case scenarios defined together with political scientists.

The workshop also included experts from the San Diego Supercomputer Center’s Data Enabled Scientific Computing Group, who provided valuable insights into methods for scalable analysis of large data sets requiring high performance computing platforms.  We currently plan to implement part of the Charthouse platform using the Spark/Shark data analytics stack.

Dataset Comparison: IPv4 vs IPv6 traffic seen at the DNS Root Servers

October 1st, 2014 by Bradley Huffaker

As economic pressure imposed by IPv4 address exhaustion has grown, we seek methods to track deployment of IPv6, IPv4’s designated successor. We examine per-country allocation and deployment rates through the lens of the annual “Day in the Life of the Internet” (DITL) snapshots collected at the DNS roots by the DNS Operations, Analysis, and Research Center (DNS-OARC) from 2009 to 2014.

For more details of data sources and analysis, see:
http://www.caida.org/research/policy/dns-country/

Recent collections added to DatCat

September 29th, 2014 by Paul Hick

As announced in the CAIDA blog post “Further Improvements to the Internet Data Measurement Catalog (DatCat)” of August 26, 2014, the new Internet Measurement Data Catalog (DatCat) is now operational. New entries by the community are welcome, and about a dozen have been added so far. We plan to advertise new and interesting entries on a regular basis with a short entry in this blog. This is the first contribution in this series.

Added on July 31, 2014, was the collection “DNS Zone Files”.

http://imdc.datcat.org/collection/1-0718-Y=DNS-Zone-Files;
contributed 2014-07-31 by Tristan Halvorson:

This collection contains Zone files with NS and A records for all new (2013 and later) TLDs.

ICANN has opened up the TLD creation process to a large number of new registries with a centralized service for downloading all of this new data. Each TLD has a separate zone file, and each zone file contains entries for every registered domain. This data collection contains step-by-step instructions to acquire this data directly from the registries through ICANN. This method only works for TLDs released during 2013 or later.
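
As a rough illustration of working with such a zone file, here is a minimal Python sketch that lists the registered domains found in the NS records of one TLD zone. It assumes a simplified zone file layout (one record per line, fully qualified owner names, no multi-line records or $ORIGIN-relative names); the function name and the TLD "newtld" are made up for the example, and real zone files may need a proper DNS parsing library.

```python
def registered_domains(zone_text, tld):
    """Return the set of second-level domains that own NS records.

    Assumes one record per line with a fully qualified owner name,
    e.g. "example.newtld. 3600 IN NS ns1.example.com."
    """
    suffix = "." + tld.lower().strip(".") + "."
    domains = set()
    for line in zone_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("$"):  # skip blanks and directives
            continue
        fields = line.split()
        # The record type sits between the owner name and the record data.
        if "NS" not in [f.upper() for f in fields[1:-1]]:
            continue
        owner = fields[0].lower()
        if owner.endswith(suffix):
            # Keep only the label directly under the TLD.
            label = owner[: -len(suffix)].split(".")[-1]
            domains.add(label + suffix)
    return domains


# Hypothetical zone content for a made-up TLD.
SAMPLE_ZONE = """\
$TTL 3600
newtld. 3600 IN NS a.nic.newtld.
example.newtld. 3600 IN NS ns1.example.com.
example.newtld. 3600 IN NS ns2.example.com.
other.newtld. 3600 IN NS ns1.other.newtld.
ns1.other.newtld. 3600 IN A 192.0.2.1 ; glue record
"""

print(sorted(registered_domains(SAMPLE_ZONE, "newtld")))
```

The TLD's own NS records and the glue A record are skipped, so only delegated domains are counted.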

Comment In the Matter of Protecting and Promoting the Open Internet

September 22nd, 2014 by kc

From the executive summary of our public comment to FCC GN Docket No. 14-28, Approaches to transparency aimed at minimizing harm and maximizing investment (by David Clark, Steve Bauer, and kc claffy):

Embedded in a challenging legal and historical context, the FCC must act in the short term to address concerns about harmful discriminatory behavior. But its actions should be consistent with an effective, long-term approach that might ultimately reflect a change in legal framing and authority. In this comment we do not express a preference among short-term options, e.g., section 706 vs. Title II. Instead we suggest steps that would support any short-term option chosen by the FCC, but also inform debate about longer term policy options. Our suggestions are informed by recent research on Internet connectivity structure and performance, from technical as well as business perspectives, and our motivation is enabling fact-based policy. Our line of reasoning is as follows.

  1. Recent discourse about Internet regulation has focused on whether or how to regulate discrimination rather than on its possible harms and benefits. For four reasons, we advocate explicit attention to possible harms, their causes, and means to prevent them. First, the court has stated that while the FCC cannot ban traffic discrimination unless it reclassifies Internet access providers under Title II, the FCC does have the authority to remedy harms. Second, a focus on harms provides a possible way to govern specialized services, which are currently not subject to traffic management constraints. Third, if the FCC chooses Title II, it will open up many questions about which parts to enforce, which will require a discussion of the harms vs. benefits of selective forbearance. Fourth, any new regulatory framework would be well-served by a thorough understanding of potential harms and benefits that result from behavior of various actors.

DRoP: DNS-based Router Positioning

September 6th, 2014 by Bradley Huffaker

As part of CAIDA’s ongoing research into Internet topology mapping, we have been working on improving our ability to geolocate backbone router infrastructure. Determining the physical locations of Internet routers is crucial for characterizing Internet infrastructure and understanding geographic pathways of global routing, as well as for creating more accurate geographic-based maps. Current commercial geolocation services focus predominantly on geolocating clients and servers, that is, edge hosts rather than routers in the core of the network.

Figure 1: The inputs and steps used by the DRoP process to generate hostname decoding rules.

In a recent paper, DRoP: DNS-based Router Positioning, we presented a new methodology for extracting and decoding geography-related strings from fully qualified domain names (DNS hostnames). We first compiled an extensive dictionary associating geographic strings (e.g., airport codes) with geophysical locations. We then searched a large set of router hostnames for these strings, assuming each autonomous naming domain uses geographic hints consistently within that domain. We used topology and performance data continually collected by our global measurement infrastructure to ascertain whether a given hint appears to co-locate different hostnames in which it is found. Finally, we combined geolocation hints into domain-specific rule sets. We generated a total of 1,711 rules covering 1,398 different domains, and validated them using domain-specific ground truth we gathered for six domains. Unlike previous efforts that relied on labor-intensive domain-specific manual analysis, our process for inferring domain-specific heuristics is automated, representing a measurable advance in the state of the art in methods for geolocating Internet resources.
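
To make the dictionary-lookup step concrete, here is a minimal Python sketch of searching hostnames for geographic strings and grouping the matches by naming domain. It is an illustration only, not the DRoP implementation: the three-entry dictionary, the "last two labels" definition of a domain, the function names, and the example hostnames are all assumptions, and the validation against topology and performance data described above is omitted entirely.

```python
import re
from collections import defaultdict

# Hypothetical miniature dictionary: geographic string -> (place, lat, lon).
GEO_DICT = {
    "lax": ("Los Angeles, US", 33.94, -118.41),
    "sjc": ("San Jose, US", 37.36, -121.93),
    "fra": ("Frankfurt, DE", 50.03, 8.57),
}


def domain_of(hostname):
    """Crude naming-domain guess: the last two labels of the hostname."""
    return ".".join(hostname.lower().rstrip(".").split(".")[-2:])


def find_hints(hostname):
    """Return dictionary strings that appear as alphabetic tokens in the
    part of the hostname that precedes its naming domain."""
    name = hostname.lower().rstrip(".")
    domain = domain_of(hostname)
    # Strip the domain so that its own labels are never matched as hints.
    prefix = name[: -len(domain)] if name.endswith(domain) else name
    tokens = re.findall(r"[a-z]+", prefix)
    return [t for t in tokens if t in GEO_DICT]


def group_hints_by_domain(hostnames):
    """Map domain -> hint -> list of hostnames in which the hint was found."""
    grouped = defaultdict(lambda: defaultdict(list))
    for hostname in hostnames:
        for hint in find_hints(hostname):
            grouped[domain_of(hostname)][hint].append(hostname)
    return grouped


# Made-up router hostnames for illustration.
HOSTNAMES = [
    "xe-0-1.lax1.example.net",
    "ae2.fra3.example.net",
    "core1.sjc.example.org",
]

for domain, hints in group_hints_by_domain(HOSTNAMES).items():
    for hint, names in hints.items():
        print(domain, hint, GEO_DICT[hint][0], names)
```

Grouping by domain reflects the assumption stated above that each naming domain uses its hints consistently; a real rule set would keep only the hints that the measurement data support.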

Figure 2: How users interact with DDec to decode hostnames.

To provide a public interface and gather feedback on our inferences, we have developed DDec. DDec allows users to decode individual hostnames, examine rulesets for individual domains, and provide feedback on rulesets. In addition to DRoP’s inferences, we have also included undns rules.

For more details please review the paper or the slides.

Under the Telescope: Time Warner Cable Internet Outage

August 29th, 2014 by Vasco Asturiano and Alberto Dainotti

In the early hours of August 27th 2014, Time Warner Cable (TWC) suffered a major Internet outage, which started around 9:30am and lasted until 11:00am UTC (4:30am-6:00am EST). According to Time Warner, this disconnect was caused by an issue with its Internet backbone during a routine network maintenance procedure.

A few sources have documented the outage based on BGP and/or active measurements, including Renesys and RIPE NCC. Here we present a view from passive traffic measurement, specifically from the UCSD Network Telescope, which continuously listens for Internet Background Radiation (IBR) traffic. IBR is a constantly changing mix of traffic caused by benign misconfigurations, bugs, malicious activity, scanning, responses to spoofed traffic (backscatter), etc.  In order to extract a signal usable for our inferences, we count the number of unique source IP addresses (in IBR observed from a certain AS or geographical area) that pass a series of filters. Our filters try to remove (i) spoofed traffic, (ii) backscatter, and (iii) ports/protocols that generate significant noise.
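
The counting step can be sketched in a few lines of Python. This is a simplified illustration, not the telescope's actual pipeline: it assumes each packet has already been annotated with an origin AS and with hypothetical `spoofed` and `backscatter` flags, and the set of noisy ports is a placeholder rather than the real filter list.

```python
from collections import defaultdict

# Placeholder set of destination ports treated as noise (an assumption).
NOISY_PORTS = {445}


def passes_filters(pkt):
    """Apply the three kinds of filter named above to one packet record."""
    if pkt["spoofed"]:        # (i) spoofed traffic
        return False
    if pkt["backscatter"]:    # (ii) responses to spoofed traffic
        return False
    if pkt["dst_port"] in NOISY_PORTS:  # (iii) noisy ports/protocols
        return False
    return True


def unique_sources_per_minute(packets):
    """Return {(asn, minute_start): number of unique source IPs}."""
    buckets = defaultdict(set)
    for pkt in packets:
        if passes_filters(pkt):
            minute_start = int(pkt["ts"]) // 60 * 60
            buckets[(pkt["asn"], minute_start)].add(pkt["src_ip"])
    return {key: len(sources) for key, sources in buckets.items()}


def make_pkt(ts, src_ip, dst_port=23, spoofed=False, backscatter=False):
    """Build a made-up packet record attributed to one example AS."""
    return {"ts": ts, "src_ip": src_ip, "asn": 7843, "dst_port": dst_port,
            "spoofed": spoofed, "backscatter": backscatter}


SAMPLE_PACKETS = [
    make_pkt(0, "192.0.2.1"),
    make_pkt(30, "192.0.2.1"),                # same source, same minute
    make_pkt(45, "192.0.2.2"),
    make_pkt(50, "192.0.2.3", spoofed=True),  # filtered out
    make_pkt(61, "192.0.2.2"),
    make_pkt(70, "192.0.2.4", dst_port=445),  # filtered out
]

print(unique_sources_per_minute(SAMPLE_PACKETS))
```

Counting unique sources rather than packets keeps a single chatty host from dominating the signal, which is why each bucket is a set.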

Most of TWC’s Autonomous Systems seem to have been affected during the time of the reported outage. Our indicators from the telescope show a total absence of traffic from TWC’s ASes, indicating a complete network outage.

Figure 1: Number of unique IBR source IPs (after filtering) observed per minute for the TWC ASes

Figure 1 shows the number of unique source IPs originated by TWC ASes per minute, as observed by the network telescope; we plot only TWC ASes from which there was any IBR traffic observed just before and after the event. For reference, these ASes are: AS7843, AS10796, AS11351, AS11426, AS11427, AS11955, AS12271 and AS20001.

TWC is a large Internet access provider in the United States, and this IBR signal can also reveal insight into the impact of this outage across the country. Figure 2 shows the same metric as Figure 1, but for source IPs across the entire country, indicating a drop of about 12% in the number of (filtered) IBR sources, which suggests that during the incident, a significant fraction of the US population lost Internet access.

Figure 2: Number of unique IBR source IPs (after filtering) observed in the US 

Drilling down to a regional level shows which US states seem to have suffered a larger relative drop in traffic.

Figure 3: Decrease ratio of unique IBR source IPs per US state 

Figure 3 compares the number of IBR sources observed in the 5-minute interval just before the incident (9:25-9:30 UTC) to the 5-minute interval after it (9:30-9:35 UTC). The yellow-to-red color gradient represents the ratio by which a given state’s IBR sources decreased (redder means a larger drop). States that did not suffer a substantial relative decrease are shown in yellow. This geographical spread is likely correlated with market penetration of TWC connectivity within each state.
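
For readers who want to reproduce this kind of comparison, here is a minimal Python sketch of a per-state decrease ratio. The post does not give the exact formula, so the definition used here, (before - after) / before clamped at zero, is an assumption, and the state counts are invented.

```python
def decrease_ratio(before, after):
    """Return {state: relative drop in unique sources}, where 0.0 means
    no drop and 1.0 means every source disappeared.

    Assumed definition: (before - after) / before, clamped at 0 so that
    states whose counts grew are reported as "no decrease".
    """
    ratios = {}
    for state, n_before in before.items():
        if n_before == 0:
            continue  # no baseline, so the ratio is undefined
        n_after = after.get(state, 0)
        ratios[state] = max(0.0, (n_before - n_after) / n_before)
    return ratios


# Invented counts of unique IBR sources in the two 5-minute intervals.
BEFORE = {"NY": 200, "CA": 100, "TX": 80, "WY": 0}
AFTER = {"NY": 50, "CA": 95, "TX": 90}

for state, ratio in sorted(decrease_ratio(BEFORE, AFTER).items()):
    print(f"{state}: {ratio:.0%}")
```

A map like Figure 3 would then assign colors from yellow (ratio near 0) to red (ratio near 1).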

Further Improvements to the Internet Data Measurement Catalog (DatCat)

August 26th, 2014 by Josh Polterock

Internet researchers and metadata enthusiasts,

In response to feedback and guidance from contributors and users, we continue to refine the Internet Measurement Data Catalog (DatCat). To encourage additional contributions, we have streamlined the DatCat data model and minimized the number of required metadata fields. Specifically, we eliminated the Data and Package objects and merged their most important information into relevant Collections. We also made dozens of other little improvements all over the code base.

We invite folks to browse the catalog, create an account, and contribute some metadata to the catalog to help document the existence and availability of Internet measurement data.

Cheers.

CAIDA’s new program plan, and new name!

July 18th, 2014 by kc

We finally published our new Program Plan for 2014-2017. (Previous program plans are at http://www.caida.org/home/about/progplan.) Executive summary below:

Executive summary:

This program plan outlines CAIDA’s anticipated activities for 2014-2017, in the areas of research, infrastructure, data collection and analysis to support the research community. Our research projects span Internet topology, routing, security, economics, future Internet architectures, and policy. We will continue to pursue Internet cartography, improving our IPv4 and IPv6 topology mapping capabilities using our expanding and extensible Ark measurement infrastructure. We will improve the accuracy and sophistication of our topology annotation capabilities, including economic information and business relationships between ISPs. Using our evolving alias resolution measurement system, which integrates and improves on the best available technology for IP address alias resolution, we will continue to collect, curate, and release our Internet Topology Data Kit (ITDK), including simplified versions that are easier for researchers to use.

We will use this infrastructure and rich data sets to support a new project: Mapping Interconnection in the Internet: Colocation, Connectivity and Congestion. The goal of this project is to characterize the changing nature of the Internet’s topology and traffic dynamics, and to investigate the implications of these changes on network science, architecture, operations, and public policy. We will construct a new type of semantically rich Internet map to guide a study of congestion induced by evolving peering and traffic management practices of CDNs and ISPs, including methods to detect and localize the congestion to specific points in wired (and hopefully eventually mobile) networks. Ark will also support our ongoing (entering its third year) project to study large-scale disruptions of Internet connectivity via correlation of a variety of disparate sources of data; we will have an outage-detection system operational by the end of 2015. Finally, we will extend our participation in future Internet research in two dimensions: measuring and modeling IPv6 deployment; and an expanded role in the Named Data Networking project, one of the NSF-funded future Internet architecture projects headed into its fourth year.

Our infrastructure activities include developing, deploying, and operating an active measurement platform that cost-effectively supports global Internet research and security vulnerability analysis. We will expand our software infrastructure activities to include a system for measuring compliance with BCP38 (ingress filtering best practices) across government, research, and commercial networks, and for analyzing the resulting data. We will expand our data sharing efforts, making public older topology and some traffic data sets that used to be restricted to academic researchers. As always, we will lead and participate in tool development to support measurement, analysis, indexing, and dissemination of data from operational global Internet infrastructure. Our outreach activities will include peer-reviewed papers, workshops, blogging, presentations, and technical reports.

Note that not all of the activities described in this program plan are fully funded yet; we are seeking additional support to enable us to accomplish our ambitious agenda.

Finally, we are taking this opportunity of reflection and strategic planning to change the expansion of CAIDA’s acronym to more accurately match what we do. Effective this month we will be the Center for Applied Internet Data Analysis.

Our annual reports are at http://www.caida.org/home/about/annualreports/. This program plan is available at http://www.caida.org/home/about/progplan/. Feedback and questions are welcome at info at caida.org.


Complete program plan for 2014-2017 at: http://www.caida.org/home/about/progplan/progplan2014/.

Hot interconnection links: a HOT topic

June 22nd, 2014 by kc

We’re seeing unprecedented interest in the debate around whose responsibility it is to upgrade the Internet to handle current and impending demand. The carriers have expressed their positions (Verizon, Comcast, AT&T), as have intermediate content providers (e.g., Cogent, Level3), and large content providers such as Netflix. And while Netflix defends its version of transparency, there is clearly room for improvement (each side emphasizes the need for more transparency from the other side).

A few more timely and related developments this week:

  1. The FCC finally begins to pursue more transparency.
  2. Independent industry group BITAG is undertaking its own effort to improve transparency about how Internet interconnection works.
  3. This past week the MIT CSAIL Information Policy Project and the Congressional Internet Caucus Advisory Committee hosted a briefing introducing our (CAIDA/MIT) research developing methods to detect interdomain congestion at specific locations (presented two weeks ago to BITAG). (Audio available here (almost 2 hours).) Plenty of press reports followed.

Stay tuned, much more to say here.

presentation at BITAG meeting on internet interdomain congestion

June 13th, 2014 by kc

I had the honor of being invited to the most recent BITAG (Broadband Internet Technical Advisory Group) meeting, to present some recent research (a collaboration with MIT’s CSAIL group) on identifying and analyzing instances of Internet interdomain congestion (an earlier version of which Matthew presented at a NANOG lightning talk in February).

Per their web site, BITAG’s mission is to “bring together engineers and other similar technical experts to develop consensus on broadband network management practices or other related technical issues that can affect users’ Internet experience”. (Their web site also hosts summaries of Silicon Flatirons workshop discussions that inspired the establishment of BITAG.)

It was gratifying to present to such an interested audience, who provided plenty of constructive feedback as well as an invitation to join the technical working group (TWG). I look forward to future interactions with BITAG; they seem a potentially potent means of bringing much-needed transparency to increasingly compelling aspects of the Internet ecosystem.