Archive for the 'Measurement' Category

Toward a Congestion Heatmap of the Internet

Friday, June 3rd, 2016 by Amogh Dhamdhere

In the past year, we have made substantial progress on a system to measure congestion on interdomain links between networks. This effort is part of our NSF-funded project on measuring interdomain connectivity and congestion. The basic nugget of our technique is to send TTL-limited probes from a vantage point (VP) within a network, toward the near and the far end of an interdomain (border) link of that network, and to monitor diurnal patterns in the near and far-side time series. We refer to this method as “Time-Series Latency Probing”, or TSLP. Our hypothesis is that a persistently elevated RTT to the far end of the link, but no corresponding RTT elevation to the near side, is a signal of congestion at the interdomain link.
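To illustrate the idea, here is a minimal sketch of a TSLP probe pair in Python, assuming the scapy library and root privileges; the target address and hop counts are hypothetical, and the production system uses the scamper probing engine rather than a script like this.

```python
# Minimal TSLP sketch (hypothetical target and hop counts).
import time
from scapy.all import IP, ICMP, sr1

TARGET = "192.0.2.1"    # hypothetical destination whose path crosses the border link
NEAR_TTL = 5            # hypothetical hop distance to the near side of the link
FAR_TTL = NEAR_TTL + 1  # the far side is one hop beyond

def probe_rtt(ttl):
    """Send one TTL-limited probe; the router at hop `ttl` returns an
    ICMP Time Exceeded message, whose round-trip time we record."""
    t0 = time.time()
    reply = sr1(IP(dst=TARGET, ttl=ttl) / ICMP(), timeout=2, verbose=0)
    return (time.time() - t0) * 1000 if reply is not None else None

while True:
    near, far = probe_rtt(NEAR_TTL), probe_rtt(FAR_TTL)
    print("%d near=%s far=%s" % (time.time(), near, far))
    time.sleep(60)  # repeated samples build the diurnal time series;
                    # elevated far-side RTT with a flat near-side RTT
                    # is the hypothesized congestion signal
```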

It turns out that identifying interdomain links from a VP inside a network is surprisingly challenging, for several reasons: the lack of standard IP address assignment practices for interdomain links; address space that ISPs do not advertise; and the myriad things that can go wrong with traceroute measurements (third-party addresses, unresponsive routers). See our paper at the 2014 Internet Measurement Conference (IMC) for a description of these issues. To overcome those challenges and identify network borders from within a network, we have developed bdrmap, an active measurement tool to accurately identify interdomain links between networks. A paper describing the bdrmap algorithms is currently under submission to IMC 2016.

Our second major activity in the last year has been to develop a backend system that manages TSLP probing from our set of distributed vantage points, collects and organizes data, and presents that data for easy analysis and visualization. A major goal of the backend system is to be adaptive, i.e., the probing state should adapt to topological and routing changes in the network. To this end, we run the bdrmap topology discovery process continuously on each VP. Every day, we process completed bdrmap runs from each monitor and add newly discovered interdomain links or update the probing state for existing links (i.e., destinations we can use to probe those links, and the distance of those links from our VP). We then push updated probing lists to the monitor. This adaptive process ensures that we always probe a relatively current state of thousands of interdomain links visible from our VPs.
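As a rough illustration of the daily update step, here is a sketch in Python; the field names and record layout are hypothetical, not bdrmap's actual output format.

```python
# Hypothetical sketch of merging a completed bdrmap run into a VP's probing state.
def refresh_probing_state(state, bdrmap_run):
    """state: {link_id: {"dests": [...], "near_ttl": int}}.
    bdrmap_run: iterable of dicts describing discovered interdomain links,
    each with usable probe destinations and hop distance from the VP."""
    for link in bdrmap_run:
        state[link["id"]] = {
            "dests": link["dests"],        # destinations whose paths traverse this link
            "near_ttl": link["hop_dist"],  # TTL that expires at the near-side router
        }
    return state  # the updated probing list pushed back to the VP
```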

Third, we have greatly expanded the scale of our measurement system. We started this project in 2014 with an initial set of approximately ten VPs in 5-6 access networks mostly in the United States. We are now running congestion measurements from over sixty Archipelago VPs in 39 networks and 26 countries around the world. Our Ark VPs have sufficient memory and compute power to run both the border mapping process and the TSLP probing without any issues. However, when we looked into porting our measurements to other active measurement platforms such as Bismark or the FCC’s measurement infrastructure operated by SamKnows, we found that the OpenWRT-based home routers were too resource-constrained to run bdrmap and TSLP directly. To overcome this challenge, we developed a method to move the bulk of the resource-intensive processing from the VPs to a central controller at CAIDA, so the VP only has to run an efficient probing engine (scamper) with a small memory footprint and low CPU usage. We have deployed a test set of 15 Bismark home routers in this type of remote configuration, with lots of help from the folks at the Bismark Project. Our next target deployment will be a set of >5000 home routers that are part of the FCC-SamKnows Measuring Broadband America infrastructure.

A fourth major advance we have made in the last year is in visualization and analysis of the generated time series data. We were on the lookout for a time series database to store, process, and visualize the TSLP data. After some initial experimentation, we found influxDB to be well-suited to our needs, due to its ability to scale to millions of time series, its scalable and usable read/write API, and its SQL-like querying capability. We also discovered Grafana, a graphing frontend that integrates seamlessly with the influxDB database to provide interactive querying and graphing capability. Visualizing time series from a given VP to various neighbor networks and browsing hundreds of plots is now possible with a few mouse clicks in the Grafana UI. The figure below shows RTT data for 7 interdomain links between a U.S. access provider and a content provider over the course of a week. This graph took a few minutes to produce with influxDB and Grafana; previously this data exploration would have taken hours using data stored in standard relational databases.


Figure: Grafana dashboard showing RTT time series for 7 interdomain links between a U.S. access provider and a content provider over one week.
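To give a concrete sense of the read/write API, here is a hedged sketch using the influxdb-python client; the database name, measurement, and tag layout below are hypothetical, not our actual schema.

```python
# Hypothetical schema: one point per TSLP probe, tagged by VP, link, and side.
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="tslp")
client.write_points([{
    "measurement": "rtt",
    "tags": {"vp": "vp1", "link": "as64496-as64511", "side": "far"},
    "time": "2016-05-30T12:00:00Z",
    "fields": {"rtt_ms": 42.7},
}])

# SQL-like query of the kind that feeds a Grafana panel
result = client.query(
    "SELECT mean(rtt_ms) FROM rtt WHERE vp = 'vp1' AND time > now() - 7d "
    "GROUP BY time(1h), link, side")
```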

As the cherry on the cake, we have set up the entire system to provide a near real-time view of congestion events. TSLP data is pulled off our VPs and indexed into the influxDB database within 30 minutes of being generated. Grafana provides an auto-refresh mode wherein we can set up a dashboard to periodically refresh when new data is available. There is no technical barrier to shortening the 30-minute duration to an arbitrarily short duration, within reason. The figure below shows a pre-configured dashboard with the real-time congestion state of interdomain links from 5 large access networks in the US to 3 different content providers/CDNs (network names anonymized). Several graphs on that dashboard show a diurnal pattern that signals evidence of congestion on the interdomain link. While drawing pretty pictures and having everything run faster is certainly satisfying, it is neither the goal nor the most challenging aspect of this project. A visualization is only as good as the data that goes into it. Drawing graphs was the easy part; developing a sustainable and scalable system that will keep producing meaningful data was infinitely more challenging. We are delighted with where we are at the moment, and look forward to opening up the data exploration interface for external users.

Figure: Dashboard showing the near real-time congestion state of interdomain links from 5 large U.S. access networks to 3 content providers/CDNs (network names anonymized).

So what happens next? We are far from done here. We are currently working on data analysis modules for time series data with the goal of producing alarms, automatically and without human intervention, that indicate evidence of congestion. Those alarms will be input to a reactive measurement system that we have developed to distribute on-demand measurement tasks to VPs. We envision different types of reactive measurement tasks, e.g., confirming the latency-based evidence of congestion by launching probes to measure loss rate, estimating the impact on achievable throughput by running NDT tests, or estimating potential impacts to user Quality of Experience (QoE). The diagram below shows the various components of the measurement system we are developing. The major piece that remains is continuous analysis of the TSLP data, generating alarms, and pushing on-demand measurements to the reactive measurement system. Stay tuned!

Figure: Components of the measurement system, from continuous TSLP probing and topology discovery on the VPs to data analysis, alarm generation, and reactive measurement.
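As a toy illustration of the alarm logic we are working toward, consider the sketch below; the far-minus-near RTT input and the thresholds are illustrative assumptions, not our finalized detection method.

```python
# Toy congestion alarm over hourly far-minus-near RTT differences (hypothetical thresholds).
def congestion_alarm(diff_series, baseline_quantile=0.1, elevation_ms=20, min_hours=4):
    """diff_series: list of (hour, far_rtt_ms - near_rtt_ms) samples.
    Fires when the difference stays well above a low-quantile baseline
    for a sustained stretch, the TSLP signature of a congested link."""
    values = sorted(v for _, v in diff_series)
    baseline = values[int(len(values) * baseline_quantile)]
    run = 0
    for hour, v in diff_series:
        run = run + 1 if v > baseline + elevation_ms else 0
        if run >= min_hours:
            return hour  # first hour at which the alarm fires
    return None  # no sustained elevation observed
```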

The team: Amogh Dhamdhere, Matthew Luckie, Alex Gamero-Garrido, Bradley Huffaker, kc claffy, Steve Bauer, David Clark

Online course “Internet Measurements: a Hands-on Introduction”

Wednesday, March 30th, 2016 by kc

We just learned our colleagues Renata Teixeira (INRIA) and Timur Friedman (UPMC) are teaching a new course: “Internet Measurements: a Hands-on Introduction.” The course will be available from May 23rd to June 19th, 2016 on the platform France Université Numérique (FUN).

Screenshot: the course page on the France Université Numérique (FUN) platform.

This free online course, taught in English, will cover internet measurement basics, including network topology and routes; connectivity, losses, latency, and geolocation; bandwidth; and traffic measurements, with hands-on exercises on PlanetLab Europe.
Students of this course will ideally have a level of understanding of internet technology that comes from an advanced undergraduate course or a first Masters course in networking, or equivalent professional experience.


Registration and details available at https://www.fun-mooc.fr/courses/inria/41011/session01/about

1st CAIDA BGP Hackathon brings students and community experts together

Thursday, February 18th, 2016 by Josh Polterock

We set out to conduct a social experiment of sorts: hosting a hackathon to hack on streaming BGP data. We had no idea we would get such an enthusiastic reaction from the community, or that we would reach capacity. We were pleasantly surprised by the response to our invitations: 25 experts came to interact with 50 researchers and practitioners (30 of whom were graduate students). We felt honored to have participants from 15 countries around the world, and experts from companies such as Cisco, Comcast, Google, Facebook, and NTT, who came to share their knowledge and to help guide and assist our challenge teams.

Having so many domain experts from so many institutions and companies with deep technical understanding of the BGP ecosystem together in one room greatly increased the kinetic potential for what we might accomplish over the course of our two days.

(more…)

IMAPS Workshop on Internet Measurements and Political Science: Network Outages

Friday, October 10th, 2014 by Josh Polterock

On Wednesday 1 October 2014, CAIDA hosted a small invitation-only workshop that brought together researchers working on large-scale Internet outage detection and characterization with researchers from the political sciences who have specific expertise in Internet censorship, political violence (including Internet connectivity disruption ordered by authoritarian regimes for censorship), and Internet penetration. Participants viewed a demonstration of, and discussed, CAIDA’s current data analysis platform for the exploration of historical and realtime Internet measurement data (named “Charthouse”), as well as possible extensions of the platform to support political science research related to macroscopic Internet outages.

A primary use of our current platform is to detect and characterize large-scale Internet outages, i.e., entire regions or countries getting disconnected from the Internet for hours or days. We intend to extend the platform to enable more agile analysis, support larger datasets, and improve geographic-based exploration and visualization, based on use-case scenarios defined together with political scientists.

The workshop also included experts from the San Diego Supercomputer Center’s Data Enabled Scientific Computing Group, who provided valuable insights into methods for scalable analysis of large data sets requiring high performance computing platforms.  We currently plan to implement part of the Charthouse platform using the Spark/Shark data analytics stack.

Dataset Comparison: IPv4 vs IPv6 traffic seen at the DNS Root Servers

Wednesday, October 1st, 2014 by Bradley Huffaker


As economic pressure imposed by IPv4 address exhaustion has grown, we seek methods to track deployment of IPv6, IPv4’s designated successor. We examine per-country allocation and deployment rates through the lens of the annual “Day in the Life of the Internet” (DITL) snapshots collected at the DNS roots by the DNS Operations, Analysis, and Research Center (DNS-OARC) from 2009 to 2014.

For more details of data sources and analysis, see:
http://www.caida.org/research/policy/dns-country/

DRoP: DNS-based Router Positioning

Saturday, September 6th, 2014 by Bradley Huffaker

As part of CAIDA’s ongoing research into Internet topology mapping, we have been working on improving our ability to geolocate backbone router infrastructure. Determining the physical locations of Internet routers is crucial for characterizing Internet infrastructure and understanding geographic pathways of global routing, as well as for creating more accurate geographic-based maps. Current commercial geolocation services focus predominantly on geolocating clients and servers, that is, edge hosts rather than routers in the core of the network.

Figure 1: Inputs and steps used by the DRoP process to generate hostname decoding rules.

In a recent paper, DRoP: DNS-based Router Positioning, we presented a new methodology for extracting and decoding geography-related strings from fully qualified domain names (DNS hostnames). We first compiled an extensive dictionary associating geographic strings (e.g., airport codes) with geophysical locations. We then searched a large set of router hostnames for these strings, assuming each autonomous naming domain uses geographic hints consistently within that domain. We used topology and performance data continually collected by our global measurement infrastructure to ascertain whether a given hint appears to co-locate the different hostnames in which it is found. Finally, we combined the geolocation hints into domain-specific rule sets. We generated a total of 1,711 rules covering 1,398 different domains, and validated them using domain-specific ground truth we gathered for six domains. Unlike previous efforts that relied on labor-intensive domain-specific manual analysis, our process for inferring domain-specific heuristics is automated, representing a measurable advance in the state of the art of methods for geolocating Internet resources.
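To give a flavor of what a learned rule looks like, here is a hypothetical sketch; the domain, rule pattern, and dictionary entries below are invented for illustration and are not DRoP’s actual rules.

```python
# Hypothetical DRoP-style rule: an airport code embedded in a router hostname.
import re

# tiny invented fragment of the geographic-string dictionary
AIRPORT_CODES = {"lax": (33.94, -118.41), "jfk": (40.64, -73.78), "ord": (41.97, -87.91)}

# invented per-domain rule for the fictitious domain example.net
RULE = re.compile(r"^.*?([a-z]{3})\d*\.example\.net$")

def geolocate(hostname):
    """Return (lat, lon) if a known airport code is embedded in the name."""
    m = RULE.match(hostname)
    if m and m.group(1) in AIRPORT_CODES:
        return AIRPORT_CODES[m.group(1)]
    return None

print(geolocate("core1-lax7.example.net"))  # -> (33.94, -118.41)
```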

Figure 2: How users interact with DDec to decode hostnames.

In order to provide a public interface and gather feedback on our inferences, we have developed DDec. DDec allows users to decode individual hostnames, examine rulesets for individual domains, and provide feedback on rulesets. In addition to DRoP’s inferences, we have also included undns rules.

For more details please review the paper or the slides.

Under the Telescope: Time Warner Cable Internet Outage

Friday, August 29th, 2014 by Vasco Asturiano

In the early hours of August 27th, 2014, Time Warner Cable (TWC) suffered a major Internet outage, which started around 9:30am UTC and lasted until 11:00am UTC (4:30am-6:00am EST). According to Time Warner, the disconnect was caused by an issue with its Internet backbone during a routine network maintenance procedure.

A few sources have documented the outage based on BGP and/or active measurements, including Renesys and RIPE NCC. Here we present a view from passive traffic measurement, specifically from the UCSD Network Telescope, which continuously listens for Internet Background Radiation (IBR) traffic. IBR is a constantly changing mix of traffic caused by benign misconfigurations, bugs, malicious activity, scanning, and responses to spoofed traffic (backscatter), among other sources. In order to extract a signal usable for our inferences, we count the number of unique source IP addresses, in the IBR observed from a given AS or geographic area, that pass a series of filters. Our filters try to remove (i) spoofed traffic, (ii) backscatter, and (iii) ports/protocols that generate significant noise.
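As a simplified sketch of this counting step, assuming a hypothetical record format for telescope traffic (the real filters are considerably more involved):

```python
# Count distinct filtered IBR sources per minute (hypothetical record format).
from collections import defaultdict

NOISY_PORTS = {1900, 5353}  # hypothetical examples of ports filtered as noise

def unique_sources_per_minute(records):
    """records: iterable of (unix_ts, src_ip, dport, spoofed, backscatter)."""
    buckets = defaultdict(set)
    for ts, src, dport, spoofed, backscatter in records:
        if spoofed or backscatter or dport in NOISY_PORTS:
            continue  # filters (i)-(iii) from the text
        buckets[int(ts) // 60].add(src)
    # one point per minute: the signal plotted in Figures 1 and 2
    return {minute * 60: len(srcs) for minute, srcs in sorted(buckets.items())}
```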

Most of TWC’s Autonomous Systems seem to have been affected during the time of the reported outage. Our indicators from the telescope show a total absence of traffic from TWC’s ASes, indicating a complete network outage.

Figure 1: Number of unique IBR source IPs (after filtering) observed per minute for the TWC ASes

Figure 1 shows the number of unique source IPs originated by TWC ASes per minute, as observed by the network telescope; we plot only TWC ASes from which there was any IBR traffic observed just before and after the event. For reference, these ASes are: AS7843, AS10796, AS11351, AS11426, AS11427, AS11955, AS12271 and AS20001.

TWC is a large Internet access provider in the United States, and this IBR signal can also reveal insight into the impact of this outage across the country. Figure 2 shows the same metric as Figure 1, but for source IPs across the entire country, indicating a drop of about 12% in the number of (filtered) IBR sources, which suggests that during the incident, a significant fraction of the US population lost Internet access.

Figure 2: Number of unique IBR source IPs (after filtering) observed in the US 

Drilling down to a regional level shows which US states seem to have suffered a larger relative drop in traffic.

Figure 3: Decrease ratio of unique IBR source IPs per US state 

Figure 3 compares the number of IBR sources observed in the 5-minute interval just before the incident (9:25-9:30 UTC) to the 5-minute interval just after it began (9:30-9:35 UTC). The yellow-to-red color gradient represents the ratio by which a given state’s IBR sources decreased (redder means a larger drop). States that did not suffer a substantial relative decrease are shown in yellow. This geographical spread is likely correlated with the market penetration of TWC connectivity within each state.
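The per-state ratio reduces to a one-line computation; the counts below are invented for illustration:

```python
# Decrease ratio per state between the two 5-minute windows (invented counts).
before = {"CA": 9120, "NY": 8410, "OH": 3300}  # sources in 9:25-9:30 UTC
after  = {"CA": 8100, "NY": 5900, "OH": 2100}  # sources in 9:30-9:35 UTC

decrease = {state: 1 - after[state] / before[state] for state in before}
for state, ratio in sorted(decrease.items(), key=lambda kv: -kv[1]):
    print("%s: %.0f%% drop" % (state, ratio * 100))  # redder = larger drop
```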


network mapping and measurement conference

Tuesday, May 28th, 2013 by kc

I had the honor of presenting an overview of CAIDA’s recent research activities at the Network Mapping and Measurement Conference, hosted by Sean Warnick and Daniel Zappala. Talk topics included: social learning behavior in complex networks, re-routing based on expected network outages along current paths, and Twitter data mining to analyze suicide risk factors and political sentiments (three different talks). James Allen Evans gave a sociology of science talk, an interview form of which seems to be archived by the Oxford Internet Institute. The organizers even arranged a talk from a local startup, NUVI, doing some fascinating real-time visualization and analytics of social network data (including Twitter, Facebook, Reddit, and YouTube).

The workshop was held at Sundance, Utah, one of the most beautiful places I’ve ever been for a workshop. This workshop series was originally DoD-sponsored, with lots of government attendees interested in Internet infrastructure protection, but the sequester and travel freezes this year yielded only two USG attendees, and budget constraints may keep this workshop from happening again next year. I hope not; it was a truly unique environment and exposed me to a range of work I would not otherwise have discovered anytime soon. Kudos to the organizers and sponsors.

Carna botnet scans confirmed

Monday, May 13th, 2013 by Alistair King

On March 17, 2013, the authors of an anonymous email to the “Full Disclosure” mailing list announced that, during the previous year, they had conducted a full probing of the entire IPv4 Internet. They claimed they used a botnet (named the “Carna” botnet) created by infecting machines vulnerable due to their use of default login/password pairs (e.g., admin/admin). The botnet instructed each of these machines to execute a portion of the scan and then transfer the results to a central server. The authors also published a detailed description of how they operated, along with 9TB of raw logs of the scanning activity.

Online magazines and newspapers reported the news, which triggered some debate in the research community about the ethical implications of using such data for research purposes. A more fundamental question received less attention: since the authors went out of their way to remain anonymous, and the only data available about this event is the data they provide, how do we know this scan actually happened? If it did, how do we know that the resulting data is correct?

(more…)

2001:deba:7ab1:e::effe:c75

Tuesday, January 22nd, 2013 by Robert Beverly

[This blog entry is guest written by Robert Beverly at the Naval Postgraduate School.]

In many respects, the deployment, adoption, use, and performance of IPv6 has received more recent attention than IPv4. Certainly the longitudinal measurement of IPv6, from its infancy through the exhaustion of ICANN v4 space to 1% native penetration (as observed by Google), is more complete than that of IPv4. Indeed, many parties are vested in either the success or failure of IPv6, and numerous IPv6 measurement efforts are afoot.

Researchers from Akamai, CAIDA, ICSI, NPS, and MIT met in early January 2013, first to share and make sense of current measurement initiatives, and second to plot a path forward for the community in measuring IPv6. A specific objective of the meeting was to understand which aspects of IPv6 measurement are “done” (in the sense that there exists a sound methodology, even if measurement should continue), and which IPv6 questions/measurements remain open research problems. The meeting agenda and presentation slides are archived online.

(more…)