Archive for the 'Topology' Category

Henya: Large-Scale Internet Topology Query System

Saturday, December 17th, 2016 by Josh Polterock

CAIDA’s Internet topology mapping experiment running on our Ark infrastructure has collected traceroute-like measurements of the Internet from nodes hosted in academic, commercial, transit, and residential networks around the globe since September 2007. Discovery of the full potential value of this raw data is best served by a rich, easy-to-use interactive exploratory interface. We have implemented a web-based query interface — henya — to allow researchers to find the most relevant data for their research, such as all traceroutes through a given region and time period toward/across a particular prefix/AS.

We hope that Henya’s large-scale topology query system will become a powerful tool in the researcher’s toolbox for remotely searching CAIDA’s traceroute data. Built-in analysis and visualization modules (still under development) will facilitate our understanding of route and prefix hijacking events as well as provide us with the means to conduct longitudinal analysis. Below we show a screenshot of Henya’s query interface, but Young Hyun, Henya’s creator, also created a useful short introduction video.

henya-query

(A note about the name Henya: Jeju, an island off the coast of South Korea, has a long history of free diving. Over the past few centuries, skilled women divers called haenyeo (pronounced “HEN-yuh”, literally “sea women”) earned their living harvesting and selling various sea-life. Highly-trained haenyeo can dive up to 30 meters deep and can hold their breath for over three minutes. Through this tiring and dangerous work, these women became the breadwinners of their families. Sadly the tradition has nearly died out, with only a few thousand practitioners left, nearly all of the them elderly. Free diving is an inspiring metaphor for data querying, and we thought this name would serve to honor this dying tradition by preserving the name in the topo-query system.)

The work was funded by the Department of Homeland Security (DHS) Science and Technology Directorate, Cyber Security Division DHS S&T/CSD) Broad Agency Announcement 11-02 and SPAWAR Systems Center Pacific via contract number N66001-12-C-0130, and by Research and Development Canada (DRDC) pursuant to an Agreement between the U.S. and Canadian governments for Cooperation in and Technology for Critical Infrastructure Protection and Border Security. The work represents the position of the authors and necessarily that of DHS or DRDC.

The Remote Peering Jedi

Friday, November 11th, 2016 by Josh Polterock

During the RIPE 73 IXP Tools Hackathon, Vasileios Giotsas, working with collaborators at FORTH/University of Crete, AMS-IX, University College, London, and NFT Consult, created the Remote Peering Jedi Tool to provide a view into the remote peering ecosystem. Given a large and diverse corpus of traceroute data, the tool detects and localizes remote peering at Internet Exchange Points (IXP).

To make informed decisions, researchers and operators desire to know who has remote peering at the various IXPs. For their RIPE hackathon project, the group created a tool to automate the detection using average RTTs from the RIPE Atlas’ massive corpus of traceroute paths. The group collected validation data from boxes inside the three large IXPs to compare to RTTs estimated via Atlas. The data suggests possible opportunities for Content Distribution Networks (CDN) to improve services for smaller IXPs. The project results also offer insights into how to interpret some of the information in PeeringDB. The project further examined how presence-informed RTT geolocation can contribute to identifying the location of resources. These results help reduce the problem space by exploiting the fact that the IP space of a given AS can appear where the AS has presence.

For more details, you can watch Vasileios’ presentation of the Remote Peering Jedi Tool. Or, visit the remote peering portal to see the tool in action.

remote-peering-jedi

NANOG68: PERISCOPE: Standardizing and Orchestrating Looking Glass Querying

Friday, November 4th, 2016 by Vasileios Giotsas

CAIDA’s Vasileios Giotsas had the opportunity to present PERISCOPE: Standardizing and Orchestrating Looking Glass Querying to the folks at NANOG68. The presentation covered his work on the Periscope Looking Glass API.

The work sets out to unify the heterogenous thousands of autonomously operated Looking Glass (LG) servers into a single unified standardized API for querying and executing experiments across the collective resource as a whole. From the beginning, we understood that while the hosting networks make these services public, usage policies varied and many LG services request clients rate limit their queries or impose rate limits and some forbid automated queries entirely. We do our best with Periscope administration to respect LG resources and implement conservative client rate limiting enforcing a per-user and per-LG rate limits. We identify our clients to provide transparency and accountability.

We believe the Periscope architecture brings several primary benefits. The LG data complements our current trace data and extends the topology coverage. It allows us to implement intelligent load design across all LG servers, uses caching to reduce the number of redundant queries, and makes more efficient use of the LG resources as a whole. Finally, Periscope improves troubleshooting capabilities (often the reason for supporting these services in the first place).

A webcast of the NANOG68 Periscope presentation is available, as well as the accompanying slideset presented at NANOG68.

Full paper:
V. Giotsas, A. Dhamdhere, and k. claffy, “Periscope: Unifying Looking Glass Querying“, in Passive and Active Network Measurement Workshop (PAM), Mar 2016.

Periscope Architecture v1.0

Periscope Architecture v1.0

This work was supported in part by the National Science Foundation, the DHS Science and Technology Directorate, Cyber Security Division (DHS S&T/CSD) and by Defence R&D Canada (DRDC).

Adding geographic annotations to ISP interconnects

Tuesday, September 20th, 2016 by Bradley Huffaker
AS links  annotated geographic locations.

Geographic annotations on AS links.

The Internet arises from the interconnection of thousands of independently operated networks. Its structure is often modeled as a collection of Autonomous Systems (ASes), nodes, exchanging traffic across interconnects, links. These models are reductive by nature, with large international organizations made up of thousands of machines and cables reduced to a single node, and multiple exchange points reduced to a single link.

We extended this model with the introduction of geographic locations attached to links between ISPs, represented by ASes. This extension maintains the simple node and link structure of the AS graph, and allows us to capture some of the geographic complexity in the topology.

AS graphic with geographic locations.

AS graphic with geographic locations.

Consider the path from UCSD to U.Washington depicted in the illustration above. Level 3 has two possible paths: Level 3 ➡ Cogent ➡ U.Wash and Level 3 ➡ NTT ➡ U.Wash. Both paths have the same AS path length. Assuming Level 3 uses hot-potato routing, in order to spend as little money on carrying traffic as possible, it transfers the traffic as soon as possible onto another provider. In this example, NTT’s Los Angeles connection is closer to San Diego than Cogent’s Las Vegas connection, so Level 3 chooses to route the traffic through NTT.

AS links path

In addition to supporting research on path prediction, these type of geographic annotations of links can provide a more realistic indication of the network’s resilience to link failure. In the figure below, duplicate links between ASes reflect multiple interconnects between ASes. e.g., this figure implies that a single link failure would disconnect UCSD from Level 3, while three links would have to fail for Level 3 and NTT to become disconnected.

 Shows multiple links between ASes that connect in multiple locations.

Shows multiple links between ASes that connect in multiple locations.

Details on our geographic link annotation methods and this data is available at CAIDA’s AS Relationships with geographic annotations page.

NSF WATCH series talk: Mapping Internet Interdomain Congestion

Friday, August 26th, 2016 by kc

Last week I gave a talk at NSF’s 39th Washington Area Trustworthy Computing Hour (WATCH) seminar series on CAIDA’s efforts to map internet interdomain congestion. A recorded webcast of the talk is available.

Abstract:

We used the Ark infrastructure to support an ambitious collaboration with MIT to map the rich mesh of interconnection in the Internet, with a focus on congestion induced by evolving peering and traffic management practices of CDNs and access ISPs, including methods to detect and localize the congestion to specific points in networks. We undertook several studies to pursue two dimensions of this challenge. First, we developed methods and tools to identify interconnection borders, and in some cases their physical locations, from comprehensive Internet topology measurements from many edge vantage points. Then, we developed and deployed scalable performance measurement tools to observe performance at thousands of interconnections, algorithms to mine for evidence of persistent congestion in the resulting data; and a system to visualize the results. We produce other related data collection and analysis to enable evaluation of these measurements in the larger context of the evolving ecosystem: quantifying a given network service providers’ global routing footprint; and business-related classifications of networks. In parallel, we examined the peering ecosystem from an economic perspective, exploring fundamental weaknesses and systemic problems of the currently deployed economic framework of Internet interconnection that will continue to cause peering disputes between ASes.

The slides presented are posted on the CAIDA website: Mapping Internet Interdomain Congestion

CAIDA’s 2015 Annual Report

Tuesday, July 19th, 2016 by kc

[Executive summary and link below]

The CAIDA annual report summarizes CAIDA’s activities for 2015, in the areas of research, infrastructure, data collection and analysis. Our research projects span Internet topology, routing, security, economics, future Internet architectures, and policy. Our infrastructure, software development, and data sharing activities support measurement-based internet research, both at CAIDA and around the world, with focus on the health and integrity of the global Internet ecosystem. The executive summary is excerpted below:

Mapping the Internet. We continued to pursue Internet cartography, improving our IPv4 and IPv6 topology mapping capabilities using our expanding and extensible Ark measurement infrastructure. We improved the accuracy and sophistication of our topology annotation capabilities, including classification of ISPs and their business relationships. Using our evolving IP address alias resolution measurement system, we collected curated, and released another Internet Topology Data Kit (ITDK).

Mapping Interconnection Connectivity and Congestion.
We used the Ark infrastructure to support an ambitious collaboration with MIT to map the rich mesh of interconnection in the Internet, with a focus on congestion induced by evolving peering and traffic management practices of CDNs and access ISPs, including methods to detect and localize the congestion to specific points in networks. We undertook several studies to pursue different dimensions of this challenge: identification of interconnection borders from comprehensive measurements of the global Internet topology; identification of the actual physical location (facility) of an interconnection in specific circumstances; and mapping observed evidence of congestion at points of interconnection. We continued producing other related data collection and analysis to enable evaluation of these measurements in the larger context of the evolving ecosystem: quantifying a given ISP’s global routing footprint; classification of autonomous systems (ASes) according to business type; and mapping ASes to their owning organizations. In parallel, we examined the peering ecosystem from an economic perspective, exploring fundamental weaknesses and systemic problems of the currently deployed economic framework of Internet interconnection that will continue to cause peering disputes between ASes.

Monitoring Global Internet Security and Stability. We conduct other global monitoring projects, which focus on security and stability aspects of the global Internet: traffic interception events (hijacks), macroscopic outages, and network filtering of spoofed packets. Each of these projects leverages the existing Ark infrastructure, but each has also required the development of new measurement and data aggregation and analysis tools and infrastructure, now at various stages of development. We were tremendously excited to finally finish and release BGPstream, a software framework for processing large amounts of historical and live BGP measurement data. BGPstream serves as one of several data analysis components of our outage-detection monitoring infrastructure, a prototype of which was operating at the end of the year. We published four other papers that either use or leverage the results of internet scanning and other unsolicited traffic to infer macroscopic properties of the Internet.

Future Internet Architectures. The current TCP/IP architecture is showing its age, and the slow uptake of its ostensible upgrade, IPv6, has inspired NSF and other research funding agencies around the world to invest in research on entirely new Internet architectures. We continue to help launch this moonshot from several angles — routing, security, testbed, management — while also pursuing and publishing results of six empirical studies of IPv6 deployment and evolution.

Public Policy. Our final research thrust is public policy, an area that expanded in 2015, due to requests from policymakers for empirical research results or guidance to inform industry tussles and telecommunication policies. Most notably, the FCC and AT&T selected CAIDA to be the Independent Measurement Expert in the context of the AT&T/DirecTV merger, which turned out to be as much of a challenge as it was an honor. We also published three position papers each aimed at optimizing different public policy outcomes in the face of a rapidly evolving information and communication technology landscape. We contributed to the development of frameworks for ethical assessment of Internet measurement research methods.

Our infrastructure operations activities also grew this year. We continued to operate active and passive measurement infrastructure with visibility into global Internet behavior, and associated software tools that facilitate network research and security vulnerability analysis. In addition to BGPstream, we expanded our infrastructure activities to include a client-server system for allowing measurement of compliance with BCP38 (ingress filtering best practices) across government, research, and commercial networks, and analysis of resulting data in support of compliance efforts. Our 2014 efforts to expand our data sharing efforts by making older topology and some traffic data sets public have dramatically increased use of our data, reflected in our data sharing statistics. In addition, we were happy to help launch DHS’ new IMPACT data sharing initiative toward the end of the year.

Finally, as always, we engaged in a variety of tool development, and outreach activities, including maintaining web sites, publishing 27 peer-reviewed papers, 3 technical reports, 3 workshop reports, 33 presentations, 14 blog entries, and hosting 5 workshops. This report summarizes the status of our activities; details about our research are available in papers, presentations, and interactive resources on our web sites. We also provide listings and links to software tools and data sets shared, and statistics reflecting their usage. sources. Finally, we offer a “CAIDA in numbers” section: statistics on our performance, financial reporting, and supporting resources, including visiting scholars and students, and all funding sources.

For the full 2015 annual report, see http://www.caida.org/home/about/annualreports/2015/

Toward a Congestion Heatmap of the Internet

Friday, June 3rd, 2016 by Amogh Dhamdhere

In the past year, we have made substantial progress on a system to measure congestion on interdomain links between networks. This effort is part of our NSF-funded project on measuring interdomain connectivity and congestion. The basic nugget of our technique is to send TTL-limited probes from a vantage point (VP) within a network, toward the near and the far end of an interdomain (border) link of that network, and to monitor diurnal patterns in the near and far-side time series. We refer to this method as “Time-Series Latency Probing”, or TSLP. Our hypothesis is that a persistently elevated RTT to the far end of the link, but no corresponding RTT elevation to the near side, is a signal of congestion at the interdomain link.

It turns out that identifying interdomain links from a VP inside a network is surprisingly challenging, for several reasons: lack of standard IP address assignment practices for inter domain links; unadvertised address space by ISPs; and myriad things that can go wrong with traceroute measurements (third-party addresses, unresponsive routers). See our paper at the 2014 Internet Measurement Conference (IMC) for a description of these issues. To overcome those challenges and identify network borders from within a network, we have developed bdrmap, an active measurement tool to accurately identify interdomain links between networks. A paper describing the bdrmap algorithms is currently under submission to IMC 2016.

Our second major activity in the last year has been to develop a backend system that manages TSLP probing from our set of distributed vantage points, collects and organizes data, and presents that data for easy analysis and visualization. A major goal of the backend system is to be adaptive, i.e., the probing state should adapt to topological and routing changes in the network. To this end, we run the bdrmap topology discovery process continuously on each VP. Every day, we process completed bdrmap runs from each monitor and add newly discovered interdomain links or update the probing state for existing links (i.e., destinations we can use to probe those links, and the distance of those links from our VP). We then push updated probing lists to the monitor. This adaptive process ensures that we always probe a relatively current state of thousands of interdomain links visible from our VPs.

Third, we have greatly expanded the scale of our measurement system. We started this project in 2014 with an initial set of approximately ten VPs in 5-6 access networks mostly in the United States. We are now running congestion measurements from over sixty Archipelago VPs in 39 networks and 26 countries around the world. Our Ark VPs have sufficient memory and compute power to run both the border mapping process and the TSLP probing without any issues. However, when we looked into porting our measurements to other active measurement platforms such as Bismark or the FCC’s measurement infrastructure operated by SamKnows, we found that the OpenWRT-based home routers were too resource-constrained to run bdrmap and TSLP directly. To overcome this challenge, we developed a method to move the bulk of the resource-intensive processing from the VPs to a central controller at CAIDA, so the VP only has to run an efficient probing engine (scamper) with a small memory footprint and low CPU usage. We have deployed a test set of 15 Bismark home routers in this type of remote configuration, with lots of help from the folks at the Bismark Project. Our next target deployment will be a set of >5000 home routers that are part of the FCC-SamKnows Measuring Broadband America infrastructure.

A fourth major advance we have made in the last year is in visualization and analysis of the generated time series data. We were on the lookout for a time series database to store, process and visualize the TSLP data. After some initial experimentation, we found influxDB to be well-suited to our needs, due to its ability to scale to millions of time series, scalable and usable read/write API, and SQL-like querying capability. We also discovered Grafana, a graphing frontend that integrates seamlessly with the influxDB database to provide interactive querying and graphing capability. Visualizing time series plots from a given VP to various neighbor networks and browsing hundreds of time series plots is now possible with a few mouse clicks on the Grafana UI. The figure below shows RTT data for 7 interdomain links between a U.S. access provider and a content provider over the course of a week. This graph took a few minutes to produce with influxDB and Grafana; previously this data exploration would have taken hours using data stored in standard relational databases.

 

dashboard_agg

As the cherry on the cake, we have set up the entire system to provide a near real-time view of congestion events. TSLP data is pulled off our VPs and indexed into the influxDB database within 30 minutes of being generated. Grafana provides an auto-refresh mode wherein we can set up a dashboard to periodically refresh when new data is available. There is no technical barrier to shortening the 30-minute duration to an arbitrarily short duration, within reason. The figure below shows a pre-configured dashboard with the real-time congestion state of interdomain links from 5 large access networks in the US to 3 different content providers/CDNs (network names anonymized). Several graphs on that dashboard show a diurnal pattern that signals evidence of congestion on the interdomain link. While drawing pretty pictures and having everything run faster is certainly satisfying, it is neither the goal nor the most challenging aspect of this project. A visualization is only as good as the data that goes into it. Drawing graphs was the easy part; developing a sustainable and scalable system that will keep producing meaningful data was infinitely more challenging. We are delighted with where we are at the moment, and look forward to opening up the data exploration interface for external users.

dashboard-ac

So what happens next? We are far from done here. We are currently working on data analysis modules for time series data with the goal of producing alarms, automatically and without human intervention, that indicate evidence of congestion. Those alarms will be input to a reactive measurement system that we have developed to distribute on-demand measurement tasks to VPs. We envision different types of reactive measurement tasks, e.g., confirming the latency-based evidence of congestion by launching probes to measure loss rate, estimating the impact on achievable throughput by running NDT tests, or estimating potential impacts to user Quality of Experience (QoE). The diagram below shows the various components of the measurement system we are developing. The major piece that remains is continuous analysis of the TSLP data, generating alarms, and pushing on-demand measurements to the reactive measurement system. Stay tuned!

system-diagram

The team: Amogh Dhamdhere, Matthew Luckie, Alex Gamero-Garrido, Bradley Huffaker, kc claffy, Steve Bauer, David Clark

1st CAIDA BGP Hackathon brings students and community experts together

Thursday, February 18th, 2016 by Josh Polterock

We set out to conduct a social experiment of sorts, to host a hackathon to hack streaming BGP data. We had no idea we would get such an enthusiastic reaction from the community and that we would reach capacity. We were pleasantly surprised at the response to our invitations when 25 experts came to interact with 50 researchers and practitioners (30 of whom were graduate students). We felt honored to have participants from 15 countries around the world and experts from companies such as Cisco, Comcast, Google, Facebook and NTT, who came to share their knowledge and to help guide and assist our challenge teams.

Having so many domain experts from so many institutions and companies with deep technical understanding of the BGP ecosystem together in one room greatly increased the kinetic potential for what we might accomplish over the course of our two days.

(more…)

What’s in a Ranking? comparing Dyn’s Baker’s Dozen and CAIDA’s AS Rank

Thursday, July 2nd, 2015 by Bradley Huffaker

The Internet infrastructure is composed of thousands of independent networks (Autonomous Systems, or ASes) that engage in typically voluntary bilateral interconnection (“peering”) agreements to provide reachability to each other. Underlying these peering relationships, are business relationships between networks, although whether and how much money ASes exchange when they interconnect is not generally published. Some of these business relationships are relatively easy to infer with a high degree of confidence using a basic economic assumption that commercial providers do not give away traffic transit services (i.e., route announcements) for free.

For several years CAIDA has used publicly available BGP data to infer business relationships among ASes and, consequently, rank Autonomous Systems based on a measure of their influence in the global routing system, specifically the size of their customer cone. (An AS’s customer cone is the set of ASes, IPv4 prefixes, or IPv4 addresses that the AS can reach via its customers, i.e., by crossing only customer links.) The methodology behind our ranking is described in detail in our IMC2013 paper (“AS Relationships, Customer Cones, and Validation”). By default, CAIDA’s AS Rank sorts by the number of other ASes in each AS’s customer cone (an AS granularity), but the AS Rank web interface also supports sorting by the number of IPv4 prefixes or IPv4 addresses observed in each AS’s customer cone (which the web interface calls prefix or IP address granularities).

Other organizations also provide rankings of ASes; the most well-known is Dyn’s IP Transit Intelligence AS ranking. Since both CAIDA’s and Dyn’s rankings aim to use a metric that reflects some notion of “predominant role in the global Internet routing system”, we have received several inquiries on how our ranking methodology and results differ from theirs. In this essay we try to answer this question to the best of our ability, acknowledging that their methodology is proprietary and we do not know exactly what they are doing beyond what they have released publicly. This 2013 MENOG presentation (Dyn bought the Renesys company in 2014) states that their ranking is based on quantity of transited IP space, so the closest possible comparison to what we currently do would be to compare their ranking with our IP-address-based customer cone ranking (which is not currently our default). For this exercise we will compare CAIDA’s 1st January 2015 AS ranking by customer cone with the chronologically last value on Dyn’s 2014 Baker’s Dozen, which is based on data observed around the same date.

Dyn’s web site provides the following image showing their rankings throughout 2014: Dyn-Bakers-Dozen-2014-All

In order to compare not only the computed ranking, but the values of the metrics being ranked (i.e., transited IPv4 space vs. number of addresses in customer cone), we create a mapping between the two spaces. Dyn does not put numbers on their y-axis, and they plot only the top 13 ranked ASes, so we do not know the range of y-values represented. In order to make the comparison possible, we will (make a leap of faith and) assume that the top thirteen ranked ASes for each metric cover roughly the same rank of values. (We caution that this assumption may be unjustified and are trying to validate it with Dyn.) So we map the top ranked ASes in Dyn (Level 3 AS3356), to the top ranked AS in CAIDA (also Level 3 AS3356), and map the 13th-ranked AS in Dyn, (Hurricane AS6939), to the 13th ranked AS in CAIDA, (Korea Telecom AS4766). These upper and lower thresholds result in the following mapping between the transited IPv4 space and number of IPv4 addresses in customer cone:

ASdyn_i.dyn_y = ASdyn_i.transit_ip – ASdyn_13.transit_ip + AScaida_13.number_addresses
AScaida_0.num_addresses – AScaida_13.num_addresses

Dyn vs CAIDA's AS Ranking
An AS’s rank is based on the number of ASes with a value (of the ranked metric) greater than the given AS. CAIDA’s 8th, 11th, and 13th ranked ASes are gray because we do not know their Dyn ranking.
as-prefix-percentage Hilbert map visulization shows utilization of IPv4 address space, rendered in two dimensions using as space-filling continous fractal Hilbert curve of order 12. Each pixel in the full resolution image represents a /24 block; red indicates used blocks, green unassigned blocks and blue RFC special blocks. Routed unused blocks are grey and unrouted assigned black

Although their order changes, the top nine ASes are the same in both rankings. Three of Dyn’s top-ranked ASes — China Telecom (AS4134), Beyond (AS3491), and Level 3 (AS3549) — are not in CAIDA’s top 14 ranked ASes; instead CAIDA’s top 14 includes AT&T (AS7018), Deutsche Telecom (AS3320), and Korea Telecom (AS4766). Some of this discrepancy can be explained by Dyn’s curation of the data, including “dealing with anomalies, discounting pre-CIDR allocations, ignoring short-lived announcements, counting remaining prefixes (non-linearly) based on size (/8 – /24 only), etc“. We assume these heuristics aim to make the number of transited addresses a closer approximation to the amount of transited traffic, which Dyn suggests is the more interesting ranking (in the same 2013 MENOG presentation).

We agree with Dyn that the number of IP addresses is not representative of traffic, and have always emphasized that we are not in a position to rank ASes by traffic transited. Not only is there huge variation in traffic to/from different IP addresses (e.g., home user versus popular web servers), but many announced IP addresses are not even assigned to any hosts. In an October 2013 study, CAIDA researchers found that of the 10.4M addresses announced in that month, only 5.3M (51%) were observed sending traffic (these “used” address blocks are shown as red in the Hilbert map on the right). This observation suggests another arguably more meaningful (but computationally expensive) method to rank ASes: normalizing by the amount of observably actively used address space.

July, August, and September 2013



Since we do not yet have census information for January 2015, we use July, August, and September 2013 usage data to compare Dyn’s 2013 ranking with CAIDA’s AS ranking weighted by the number of observably used /24 IPv4 prefixes in the customer cone. (A /24 is defined as “used” if the census observed it as in use.)

The results of this ranking by “observably used IPv4 address /24 blocks”-based customer cone (i.e., the number of apparently used /24 blocks in an AS’s customer cone) look more similar to the Dyn rankings, consistent with the fact that this method of calculating customer cones accounts for some of the effect Dyn captures by discounting pre-CIDR blocks, which are less likely to be fully utilized.

Dyn vs CAIDA's AS Ranking
An AS’s ranking is based on the number of ASes with a value greater than the given AS. The CAIDA’s 8th, 12th, and 13th ranked AS are colored gray to indicate that we do not have a known their Dyn ranking.
 2015   2013 
 dyn   address   address   used   dyn 
 2015 

dyn 1.00 0.82 0.83 0.86 0.82
address 0.82 1.00 0.74 0.66 0.49
 2013 

address 0.83 0.74 1.00 0.96 0.86
used 0.86 0.66 0.96 1.00 0.90
dyn 0.82 0.49 0.86 0.90 1.00

We computed the Pearson correlation coefficient between the results of the two ranking methods. A value of 1 shows perfect correlation or that the two systems have identical rankings. A 0 means there is no correlation or that the two rankings are completely different. Outside the comparison with themselves, which by definition produces 1.00, the two most similar rankings are Dyn’s 2013 transit addresses and CAIDA’s 2013 used /24s with a correlation of 0.90.

This approach improves the correlation between Dyn’s and CAIDA’s ranking (e.g., the Pearson correlation coefficient increases from 0.82 to 0.90, see Table), but it amplifies the dominance of the top-ranked AS (Level 3 AS3356) for CAIDA’s census-derived customer cone ranking.

If we correlate how the rankings have changed over the last two years — which we cannot do for the census-based ranking since we only have 2013 data — we find that Dyn’s ranking showed greater consistency (a correlation between the 2013 and 2015 rankings of 0.82 compared with CAIDA’s 0.74), perhaps due to their data curation process.

In summary, CAIDA’s IPv4 address-based customer cone and Dyn’s transited IPv4 address space roughly agree on the top ASes, although their relative weighting diverges.


CAIDA Delivers More Data To the Public

Wednesday, February 12th, 2014 by Paul Hick

As part of our mission to foster a collaborative research environment in which data can be acquired and shared, CAIDA has developed a framework that promotes wide dissemination of our datasets to researchers. We classify a dataset as either public or restricted based on a consideration of privacy issues involved in sharing it, as described in our data sharing framework document Promotion of Data Sharing (http://www.caida.org/data/sharing/).

Public datasets are available for downloaded from our public dataserver (http://data.caida.org) subject to conditions specified in our Acceptable Use Agreement (AUA) for public data (http://www.caida.org/home/legal/aua/public_aua.xml). CAIDA provides access to restricted datasets conditionally to qualifying researchers of academic and CAIDA-member institutions agreeing to a more restrictive AUA (http://www.caida.org/home/legal/aua/).

In January 2014 we reviewed our collection of datasets in order to re-evaluate their classification. As a result, as of February 1, we have converted several popular restricted CAIDA datasets into public datasets, including most of one of our largest and most popular data collections: topology data from the (now retired) skitter measurement infrastructure (operational between 1998 and 2008), and its successor, the Archipelago (or Ark) infrastructure (operational since September 2007). We have now made all IPv4 measurements older than two years (which includes all skitter data) publicly available. In addition to the raw data, this topology data includes derived datasets such as the Internet Topology Data Kits (ITDKs). Further, to encourage research on IPv6 deployment, we made our IPv6 Ark topology and performance measurements, from,December 2008 up to the present, publicly available as a whole. We have added these new public data to the existing category of public data sets, which includes AS links data inferred from traceroute measurements taken by skitter and Ark platforms.

Several other datasets remain under consideration for public release, so stay tuned. For an overview of all datasets currently provided by CAIDA (both public and restricted) see our data overview page (http://www.caida.org/data/overview/).

Support for this data collection and sharing provided by DHS Science and Technology Directorate’s PREDICT project via Cooperative Agreement FA8750-12-2-0326 and NSF’s Computing Research Infrastructure Program via CNS-0958547.