CAIDA’s Annual Report for 2012
July 31st, 2013 by kc
[Executive Summary from our annual report for 2012.]
This annual report covers CAIDA’s activities in 2012, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our research projects span Internet topology, routing, traffic, economics, future Internet architectures, and policy. Our infrastructure activities continue to support measurement-based studies of the Internet’s core infrastructure, with focus on the health and integrity of the global Internet’s topology, routing, addressing, and naming systems. In 2012 we increased our participation in future Internet research in two dimensions: measuring and modeling IPv6 deployment; and an expanded (management) role in the Named Data Networking project, one of the NSF-funded future Internet architecture projects headed into its third year. We also began a project to study large-scale Internet outages via correlation of a variety of disparate sources of data.
We continued to make advances in Internet topology research, supported by our expanding Ark measurement infrastructure. We collect and share the largest Internet topology data sets (IPv4 and IPv6) available to academic researchers, and we share many aggregated annotated derivative data sets publicly, including rankings of ISPs annotated with (our estimated) business relationships between autonomous networks. Our topology measurement platform supports IPv6: by the end of 2012, 28 of our 64 Ark hosting sites provided IPv6 connectivity and topology measurements. Using our new alias resolution measurement system, which integrates and improves on the best available technology for IP address alias resolution, we collected, analyzed, processed and released our fifth published Internet Topology Data Kit (ITDK), reflecting measurements taken in July 2012. The July 2012 ITDK includes two related router-level topologies; router-to-AS assignments; geographic location of each router; and DNS lookups of all observed IP addresses. After an extensive exercise with our validation data via AS Rank, we also spent many months this year overhauling our AS relationship inference algorithm so that we can add AS relationship annotations to future ITDKs.
On the theoretical side of topology research, we developed a new model in which new connections optimize certain trade-offs between popularity and similarity of nodes, instead of simply preferring popular nodes. This framework has a geometric interpretation in which popularity preference emerges from local optimization. In contrast to standard preferential attachment, our optimization framework accurately describes the large-scale evolution of technological (the Internet), social (trust relationships between people) and biological (Escherichia coli metabolic) networks, accurately predicting the probability of new links. We developed a related framework to support mapping a real network into a hyperbolic plane in a way congruent with this model of network growth. Perhaps our most exciting theoretical result was our discovery of structural similarity (power-law graph with strong clustering) between a causal network representing the large-scale structure of spacetime in our accelerating universe, and complex networks such as the Internet, social, or biological networks. We collaborated with supercomputing experts at SDSC to run HPC simulations that provided evidence that this structural similarity is due to asymptotic equivalence in large-scale growth dynamics of complex networks and spacetime in the universe.
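The popularity-versus-similarity trade-off mentioned above can be illustrated with a small sketch. It is not the code behind the results in the report. It rests on several assumptions made purely for illustration: a node's birth time stands in for popularity (older nodes count as more popular), a random angle on a circle stands in for similarity, and each new node links to the `m` existing nodes that minimize the product of birth time and angular distance. The names `angular_distance` and `grow_network` are hypothetical.

```python
import math
import random


def angular_distance(a, b):
    """Shortest distance between two angles on a circle, in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def grow_network(n, m, seed=0):
    """Grow a network of n nodes, one node per time step.

    Assumption (illustrative only): node t, born at time t with a random
    angle, links to the m existing nodes s that minimize
    s * angular_distance(angle_s, angle_t). A small s means an older,
    "popular" node; a small angular distance means a "similar" node.
    """
    rng = random.Random(seed)
    angles = []  # angles[s - 1] is the angle of the node born at time s
    edges = []   # (older_node, newer_node) pairs
    for t in range(1, n + 1):
        theta = rng.uniform(0.0, 2 * math.pi)
        # Rank every existing node by the popularity x similarity product.
        ranked = sorted(
            range(1, t),
            key=lambda s: s * angular_distance(angles[s - 1], theta),
        )
        # Early nodes have fewer than m candidates and link to all of them.
        for s in ranked[:m]:
            edges.append((s, t))
        angles.append(theta)
    return angles, edges


if __name__ == "__main__":
    angles, edges = grow_network(n=200, m=2)
    print(len(edges), "edges among", len(angles), "nodes")
```

Nothing in this sketch prefers high-degree nodes explicitly. Each new node only minimizes a local product, which is the sense in which a preference for popular nodes can emerge from local optimization rather than being built in.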
In 2012 we continued applying our theoretical, empirical, and practical understandings of the Internet’s evolution to the challenge of enabling dramatically more scalable global Internet routing. We continued our partnership in the Named Data Networking project, a 12-university collaboration funded by NSF’s Future Internet Architecture (FIA) Research program to explore a generalization of the Internet architecture that allows naming not just communication endpoints (i.e., the source and destination IP addresses) but also data (content) itself. This approach shifts the focus from where (addresses and hosts in today’s Internet) to what (the content that users and applications care about). By naming data instead of locations, the new architecture transforms data into a first-class entity while addressing the known technical challenges of today’s Internet: routing scalability, network security, content protection and privacy. In 2012 we investigated combinations of name-space structure and network topology that optimize the efficiency of NDN algorithms and participated in NDN testbed development and evaluation. The most challenging part of this routing research as it pertains to the Internet still lies ahead, and will require a broader community of engaged thinkers: application of these and other theoretical results to real-world Internet security, economic, and policy contexts.
A more immediate architectural need of the global Internet has inspired us to study the transition to IPv6. The two main lessons we can glean from the scant data available are: (i) architectural transitions, even those deemed minor but essential, are slow; (ii) the U.S. is behind other regions of the world in IPv6 deployment, and has not thus far invested in shedding quantitative light on this problem, despite making attempts to lightly nudge the market toward wider IPv6 adoption. With support from NSF, we collaborated with the Naval Postgraduate School (Rob Beverly) to study the deployment of IPv6 at the Autonomous System (AS) level using historical BGP data and recent active measurements, comparing IPv4 and IPv6 topology structure and adoption trends. While most core Internet transit providers have deployed IPv6, edge networks are lagging. IPv6 deployment is stronger in Europe and the Asia-Pacific region than in North America. The IPv6 topology is characterized by a single dominant player, Hurricane Electric, which appears in a large fraction of IPv6 AS paths, and is more dominant in IPv6 than the most dominant player in IPv4. Routing dynamics in the IPv6 topology are largely similar to those in IPv4, and churn in both networks grows at the same rate as the underlying topologies. We found that performance over IPv6 paths is comparable to that over IPv4 paths if the AS-level paths are the same, but can be much worse than IPv4 if the AS-level paths differ. To support a separate but related modeling effort, we developed and conducted a survey of network operators to gauge IPv6 deployment patterns and plans. Based on the results we hope to refine and re-issue a survey next year, to inform and parameterize a predictive model of possible IPv6 future trajectories.
We made significant progress on our Internet economics research, one goal of which is to create a scientific basis for modeling Internet interdomain interconnection and dynamics, capturing relevant interactions between network business relations, internetwork topology, routing policies, and resulting interdomain traffic flow. We developed and published a holistic cost model that can help operators evaluate the costs of various routing and peering decisions, among other network operation costs. Using traffic data from a large carrier network, our model revealed how network operators can significantly reduce the cost of carrying traffic in their networks by adjusting routing for a small fraction of total traffic. We also published a paper on our GENESIS simulator, which embodies a computational model of interdomain network formation that captures key factors influencing network formation dynamics: highly skewed traffic matrix, policy-based routing, geographic co-location constraints, and the costs of transit/peering agreements. This simulator enables us to study “what-if” questions, such as asking how open peering strategies affect networks in terms of topology, traffic flow, and financial health. We continued studying available interdomain traffic matrix (ITM) data, and discovered that we can model the traffic sent by an AS as either a log-normal or Pareto distribution, depending on congestion levels. We found correlations between different ASes mostly due to relatively few highly popular prefixes. We also held a successful interdisciplinary Workshop on Internet Economics (WIE) in December 2012 (co-hosted with MIT’s Dave Clark), focused on reaching consensus on definitions and data to support a regulatory framework for a converged communications infrastructure.
In early 2012, we undertook a new three-year research effort to study large-scale Internet outages, under an exciting new Transition to Practice area of NSF’s Secure and Trustworthy Cyberspace research program. In this project we are applying our successful results in studying the Egypt and Libya censorship-induced outages (our IMC2011 paper) to the development, testing, and deployment of an operational capability to detect, monitor, and characterize future episodes of Internet connectivity disruptions. In early 2012, we published a study that used the UCSD darknet traffic data to analyze other outages caused by geophysical disasters — the earthquakes in Christchurch and Tohoku in 2011 — which won an ACM SIGCOMM CCR award for one of the best CCR papers of 2012.
We continued to dedicate resources to support the infrastructure measurement and data sharing interests and needs of two U.S. federal agency programs: the National Science Foundation’s International Research Network Connections (IRNC) program, and the Department of Homeland Security’s Protected Repository of Data on Internet CyberThreats (PREDICT) data-sharing project (http://www.predict.org). The PREDICT funding provides essential support for deployment and operations of our measurement infrastructure, and the collection, curation, and sharing of several unprecedented data sets available to researchers (http://www.caida.org/data/). We are responsive to researcher requests for additional/different Internet data sets, to the extent possible given our resources. We have found researchers from an increasing number of disciplines (physics, sociology, biology) interested in our Internet measurement data sets and research results as they apply to the structure, behavior, and evolution of other complex networks.
Finally, as always, we engaged in a variety of tool development, data-sharing, and outreach activities, including web sites, 16 peer-reviewed papers, 5 technical and workshop reports, 47 presentations, 13 blog entries, 7 animations, 6 workshops, and a seminar series.
Full annual report:
http://www.caida.org/home/about/annualreports/2012/
Program plan for 2010-2013:
http://www.caida.org/home/about/progplan/progplan2010/
We will be creating a new 3-year program plan in 2013. Please do not hesitate to send comments or questions to info at caida dot org.