Archive for the 'Measurement' Category

1st CAIDA BGP Hackathon brings students and community experts together

Thursday, February 18th, 2016 by Josh Polterock

We set out to conduct a social experiment of sorts, to host a hackathon to hack streaming BGP data. We had no idea we would get such an enthusiastic reaction from the community and that we would reach capacity. We were pleasantly surprised at the response to our invitations when 25 experts came to interact with 50 researchers and practitioners (30 of whom were graduate students). We felt honored to have participants from 15 countries around the world and experts from companies such as Cisco, Comcast, Google, Facebook and NTT, who came to share their knowledge and to help guide and assist our challenge teams.

Having so many domain experts from so many institutions and companies with deep technical understanding of the BGP ecosystem together in one room greatly increased the kinetic potential for what we might accomplish over the course of our two days.


IMAPS Workshop on Internet Measurements and Political Science: Network Outages

Friday, October 10th, 2014 by Josh Polterock

On Wednesday 1 October 2014, CAIDA hosted a small invitation-only workshop that brought together researchers working on large-scale Internet outage detection and characterization with researchers from the political sciences with specific expertise in Internet censorship, political violence (including Internet connectivity disruption ordered by authoritarian regimes for censorship), and Internet penetration. Participants viewed a demonstration of CAIDA’s current data analysis platform for the exploration of historical and realtime Internet measurement data (named “Charthouse”), then discussed the platform and possible extensions to support political science research related to macroscopic Internet outages.

A primary use of our current platform is to detect and characterize large-scale Internet outages, i.e., entire regions or countries getting disconnected from the Internet for hours or days. We intend to extend the platform to enable more agile analysis, support larger datasets, and improve geographic-based exploration and visualization, based on use case scenarios defined together with political scientists.

The workshop also included experts from the San Diego Supercomputer Center’s Data Enabled Scientific Computing Group, who provided valuable insights into methods for scalable analysis of large data sets requiring high performance computing platforms.  We currently plan to implement part of the Charthouse platform using the Spark/Shark data analytics stack.

Dataset Comparison: IPv4 vs IPv6 traffic seen at the DNS Root Servers

Wednesday, October 1st, 2014 by Bradley Huffaker


As economic pressure imposed by IPv4 address exhaustion has grown, we seek methods to track deployment of IPv6, IPv4’s designated successor. We examine per-country allocation and deployment rates through the lens of the annual “Day in the Life of the Internet” (DITL) snapshots collected at the DNS roots by the DNS Operations, Analysis, and Research Center (DNS-OARC) from 2009 to 2014.

For more details of data sources and analysis, see:

DRoP: DNS-based Router Positioning

Saturday, September 6th, 2014 by Bradley Huffaker

As part of CAIDA’s ongoing research into Internet topology mapping, we have been working on improving our ability to geolocate backbone router infrastructure. Determining the physical locations of Internet routers is crucial for characterizing Internet infrastructure and understanding geographic pathways of global routing, as well as for creating more accurate geographic-based maps. Current commercial geolocation services focus predominantly on geolocating clients and servers, that is, edge hosts rather than routers in the core of the network.

Figure 1: The inputs and steps used by the DRoP process to generate hostname decoding rules.

In a recent paper, DRoP: DNS-based Router Positioning, we presented a new methodology for extracting and decoding geography-related strings from fully qualified domain names (DNS hostnames). We first compiled an extensive dictionary associating geographic strings (e.g., airport codes) with geophysical locations. We then searched a large set of router hostnames for these strings, assuming each autonomous naming domain uses geographic hints consistently within that domain. We used topology and performance data continually collected by our global measurement infrastructure to ascertain whether a given hint appears to co-locate the different hostnames in which it is found. Finally, we combined geolocation hints into domain-specific rule sets. We generated a total of 1,711 rules covering 1,398 different domains, and validated them using domain-specific ground truth we gathered for six domains. Unlike previous efforts that relied on labor-intensive domain-specific manual analysis, our process for inferring domain-specific heuristics is automated, representing a measurable advance in the state of the art of methods for geolocating Internet resources.
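To make the dictionary-matching step concrete, here is a minimal sketch in Python. It is not the DRoP implementation: the three-entry dictionary, the `find_hints` function, and the `example.net` hostname are all hypothetical, and the sketch skips the validation against topology and performance data described above.

```python
import re

# Hypothetical, tiny stand-in for the geographic dictionary:
# geographic string -> (place name, latitude, longitude).
GEO_DICT = {
    "lax": ("Los Angeles, US", 33.94, -118.41),
    "fra": ("Frankfurt, DE", 50.03, 8.57),
    "sjc": ("San Jose, US", 37.36, -121.93),
}

def find_hints(hostname, geo_dict=GEO_DICT):
    """Return (label_index, token, place) for every dictionary match.

    label_index is the position of the dot-separated label in which the
    token was found, counted from the left.
    """
    hints = []
    labels = hostname.lower().split(".")
    for index, label in enumerate(labels):
        # Keep only runs of letters, so 'lax1' yields the token 'lax'.
        for token in re.findall(r"[a-z]+", label):
            if token in geo_dict:
                hints.append((index, token, geo_dict[token][0]))
    return hints

print(find_hints("ae-1.r01.lax1.example.net"))
```

A real rule set would also need to record where in the hostname a hint appears for each domain, since the same three letters can be a location in one naming scheme and an unrelated abbreviation in another.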

Figure 2: How users interact with DDec to decode hostnames.

To provide a public interface and gather feedback on our inferences, we have developed DDec. DDec allows users to decode individual hostnames, examine rulesets for individual domains, and provide feedback on rulesets. In addition to DRoP’s inferences, we have also included undns rules.

For more details please review the paper or the slides.

Under the Telescope: Time Warner Cable Internet Outage

Friday, August 29th, 2014 by Vasco Asturiano

In the early hours of August 27th 2014, Time Warner Cable (TWC) suffered a major Internet outage, which started around 9:30am and lasted until 11:00am UTC (4:30am-6:00am EST). According to Time Warner, this disconnect was caused by an issue with its Internet backbone during a routine network maintenance procedure.

A few sources have documented the outage based on BGP and/or active measurements, including Renesys and RIPE NCC. Here we present a view from passive traffic measurement, specifically from the UCSD Network Telescope, which continuously listens for Internet Background Radiation (IBR) traffic. IBR is a constantly changing mix of traffic caused by benign misconfigurations, bugs, malicious activity, scanning, responses to spoofed traffic (backscatter), etc.  In order to extract a signal usable for our inferences, we count the number of unique source IP addresses (in IBR observed from a certain AS or geographical area) that pass a series of filters. Our filters try to remove (i) spoofed traffic, (ii) backscatter, and (iii) ports/protocols that generate significant noise.
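The counting step can be sketched as follows. This is an illustration only, not the telescope's actual pipeline: the packet record layout, the `spoofed` and `backscatter` flags (assumed to have been set by earlier classification stages), and the contents of `NOISY_PORTS` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical set of destination ports treated as too noisy to count.
NOISY_PORTS = {23, 445}

def keep(pkt):
    """Placeholder filters mirroring (i) spoofing, (ii) backscatter, (iii) noisy ports."""
    if pkt.get("spoofed"):
        return False
    if pkt.get("backscatter"):
        return False
    if pkt.get("dst_port") in NOISY_PORTS:
        return False
    return True

def unique_sources_per_minute(packets, keep_fn=keep):
    """Count distinct source IPs per one-minute bin among packets passing keep_fn.

    packets: iterable of dicts with 'ts' (epoch seconds) and 'src' keys.
    Returns {minute_start_epoch: unique_source_count}, ordered by time.
    """
    buckets = defaultdict(set)
    for pkt in packets:
        if keep_fn(pkt):
            minute = int(pkt["ts"]) // 60 * 60
            buckets[minute].add(pkt["src"])
    return {minute: len(srcs) for minute, srcs in sorted(buckets.items())}
```

Restricting `packets` to sources mapped to one AS or one geographic area before calling the function gives the per-AS or per-region signal; a minute with no surviving packets simply has no entry in the result.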

Most of TWC’s Autonomous Systems seem to have been affected during the time of the reported outage. Our indicators from the telescope show a total absence of traffic from TWC’s ASes, indicating a complete network outage.

Figure 1: Number of unique IBR source IPs (after filtering) observed per minute for the TWC ASes

Figure 1 shows the number of unique source IPs originated by TWC ASes per minute, as observed by the network telescope; we plot only TWC ASes from which there was any IBR traffic observed just before and after the event. For reference, these ASes are: AS7843, AS10796, AS11351, AS11426, AS11427, AS11955, AS12271 and AS20001.

TWC is a large Internet access provider in the United States, and this IBR signal can also reveal insight into the impact of this outage across the country. Figure 2 shows the same metric as Figure 1, but for source IPs across the entire country, indicating a drop of about 12% in the number of (filtered) IBR sources, which suggests that during the incident, a significant fraction of the US population lost Internet access.

Figure 2: Number of unique IBR source IPs (after filtering) observed in the US 

Drilling down to a regional level shows which US states seem to have suffered a larger relative drop in traffic.

Figure 3: Decrease ratio of unique IBR source IPs per US state 

Figure 3 compares the number of IBR sources observed in the 5-minute interval just before the incident (9:25-9:30 UTC) to the 5-minute interval after it (9:30-9:35 UTC). The yellow-to-red color gradient represents the ratio by which a given state’s IBR sources decreased (redder means a larger drop). States that did not suffer a substantial relative decrease are shown in yellow. This geographical spread is likely correlated with the market penetration of TWC connectivity within each state.
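One plausible way to compute such a per-state ratio is sketched below. The exact formula behind the figure is not given here, so the relative-decrease definition, the clamping of increases to zero, and the function name are assumptions.

```python
def decrease_ratio(before, after):
    """Relative drop in unique IBR sources per state between two intervals.

    before, after: dicts mapping state -> unique source count.
    Returns state -> ratio in [0, 1]; 1.0 means every source disappeared.
    States with no sources in the 'before' interval are skipped, because
    the ratio is undefined for them. Increases are clamped to 0.0
    (an assumption of this sketch).
    """
    ratios = {}
    for state, n_before in before.items():
        if n_before == 0:
            continue
        n_after = after.get(state, 0)
        ratios[state] = max(0.0, (n_before - n_after) / n_before)
    return ratios
```

For instance, a state that goes from 200 filtered sources to 100 gets a ratio of 0.5, while a state that goes from 100 to 95 gets 0.05; these counts are made up for illustration and are not taken from the telescope data.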



network mapping and measurement conference

Tuesday, May 28th, 2013 by kc

I had the honor of presenting an overview of CAIDA’s recent research activities at the Network Mapping and Measurement Conference hosted by Sean Warnick and Daniel Zappala. Talk topics included: social learning behavior in complex networks, re-routing based on expected network outages along current paths, and Twitter data mining to analyze suicide risk factors and political sentiments (three different talks). James Allen Evans gave a sociology of science talk, an interview form of which seems to be archived by the Oxford Internet Institute. The organizers even arranged a talk from a local startup, NUVI, doing some fascinating real-time visualization and analytics of social network data (including Twitter, Facebook, Reddit, Youtube).

The workshop was held at Sundance, Utah, one of the most beautiful places I’ve ever been for a workshop. This workshop series was originally DoD-sponsored with lots of government attendees interested in Internet infrastructure protection, but sequester and travel freezes this year yielded only two USG attendees, and budget constraints may keep this workshop from happening again next year. I hope they do not; it was really a unique environment and exposed me to a range of work I would not otherwise have discovered anytime soon. Kudos to the organizers and sponsors.

Carna botnet scans confirmed

Monday, May 13th, 2013 by Alistair King

On March 17, 2013, the authors of an anonymous email to the “Full Disclosure” mailing list announced that last year they conducted a full probing of the entire IPv4 Internet. They claimed they used a botnet (named “carna” botnet) created by infecting machines vulnerable due to use of default login/password pairs (e.g., admin/admin). The botnet instructed each of these machines to execute a portion of the scan and then transfer the results to a central server. The authors also published a detailed description of how they operated, along with 9TB of raw logs of the scanning activity.

Online magazines and newspapers reported the news, which triggered some debate in the research community about the ethical implications of using such data for research purposes. A more fundamental question received less attention: since the authors went out of their way to remain anonymous, and the only data available about this event is the data they provide, how do we know this scan actually happened? If it did, how do we know that the resulting data is correct?



Tuesday, January 22nd, 2013 by Robert Beverly

[This blog entry is guest written by Robert Beverly at the Naval Postgraduate School.]

In many respects, the deployment, adoption, use, and performance of IPv6 has received more recent attention than IPv4. Certainly the longitudinal measurement of IPv6, from its infancy to the exhaustion of ICANN v4 space to native 1% penetration (as observed by Google), is more complete than that of IPv4. Indeed, there are many parties vested in either the success or the failure of IPv6, and numerous IPv6 measurement efforts afoot.

Researchers from Akamai, CAIDA, ICSI, NPS, and MIT met in early January 2013, first to share and make sense of current measurement initiatives, and second to plot a path forward for the community in measuring IPv6. A specific objective of the meeting was to understand which aspects of IPv6 measurement are “done” (in the sense that there exists a sound methodology, even if measurement should continue), and which IPv6 questions/measurements remain open research problems. The meeting agenda and presentation slides are archived online.


Packet Loss Metrics from Darknet Traffic

Thursday, January 17th, 2013 by Karyn Benson

At the CoNEXT Student Workshop, in Nice, France on December 10, 2012, CAIDA shared recent research on Internet outages in a poster entitled “Gaining Insight Into AS-Level Outages through Analysis of Internet Background Radiation.”


Syria disappears from the Internet

Wednesday, December 5th, 2012 by Alistair King and Alberto Dainotti

On the 29th of November, shortly after 10am UTC (12pm Damascus time), the Syrian state telecom (AS29386) withdrew the majority of BGP routes to Syrian networks (see reports from Renesys, Arbor, CloudFlare, BGPmon). Five prefixes allocated to Syrian organizations remained reachable for another several hours, served by Tata Communications. By midnight UTC on the 29th, as reported by BGPmon, these five prefixes had also been withdrawn from the global routing table, completing the disconnection of Syria from the rest of the Internet.