[Executive Summary from our annual report for 2011.]
This annual report covers CAIDA’s activities in 2011, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our current research projects span topology, routing, traffic, economics, future Internet architectures, and policy. Our infrastructure activities continue to support measurement-based studies of the Internet’s core infrastructure, with focus on the health and integrity of the global Internet’s topology, routing, addressing, and naming systems. We are also dedicating resources to support the infrastructure measurement and data sharing interests and needs of two U.S. federal agency programs: the National Science Foundation’s International Research Network Connections (IRNC) program, and the Department of Homeland Security’s Protected Repository of Data on Internet CyberThreats (PREDICT) data-sharing project.
We continue to expand our Internet active measurement platform Ark in scale and functionality, and use this platform to collect and share the largest Internet topology data sets (IPv4 and IPv6) available to academic researchers, and share many aggregated annotated derivative data sets publicly. Our topology measurement platform supports IPv6: by the end of 2011, 28 of our 57 Ark hosting sites provided IPv6 connectivity and topology measurements. We have dramatically improved existing techniques for IP address alias resolution for large Internet graphs; in late 2011 we submitted a paper describing and evaluating the performance of our algorithms, which we hope will be published in 2012. (A preliminary technical report is available on the web site now; see the Topology section of the report.) Using these new techniques, we collected, analyzed, processed and released two Internet Topology Data Kit (ITDK) data sets, reflecting measurements taken in April and October 2011. Each 2011 ITDK includes two related router-level topologies; router-to-AS assignments; the geographic location of each router; and DNS lookups of all observed IP addresses. We are still working on improving and validating our AS relationship inference algorithm so that we can add further annotations to future ITDKs.
On the theoretical side of topology research, we continued investigation of the geometric model we developed last year to study the structure and function of complex networks. This model assumes that hyperbolic geometry underlies many complex networks, which if true provides a natural explanation for the heterogeneous degree distributions and strong clustering that characterize so many complex networks, i.e., they are simple reflections of the negative curvature and metric property of the underlying hyperbolic geometry. We also showed that not only popularity but also similarity acts as a strong force in shaping complex network structure and dynamics. We developed a framework where new connections, instead of preferring popular nodes, optimize certain trade-offs between popularity and similarity. The optimization framework more accurately describes large-scale Internet evolution (new links) than previous models, e.g., preferential attachment. The mathematically inclined will appreciate our related recent investigation of random bipartite networks using a hidden variable formalism that facilitates study of the structure and function of complex networks, as well as inference of individual characteristics, attributes, and annotations of nodes in real bipartite networks. Particular applications of interest are network geometry and navigability.
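To make the popularity-versus-similarity idea concrete, here is a minimal sketch of a growing network in which each new node links to the existing nodes that give the best combined score. The summary above does not specify the trade-off, so the scoring rule used here (birth order multiplied by angular distance on a circle) is an assumption, as are the function names `angular_distance` and `grow_network` and the parameters `n`, `m`, and `seed`.

```python
import math
import random


def angular_distance(a, b):
    """Smallest angle between two points on a circle, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def grow_network(n, m, seed=0):
    """Grow a network of n nodes; each newcomer links to up to m existing nodes.

    Assumption (not taken from the report): the trade-off is scored as
    birth_order * angular_distance. A low birth order stands in for
    popularity (older nodes), and a small angular distance stands in for
    similarity. Lower scores are better.
    """
    rng = random.Random(seed)
    angles = []  # angles[i] is the similarity coordinate of node i
    edges = []   # (new_node, existing_node) pairs
    for t in range(n):
        theta = rng.uniform(0.0, 2 * math.pi)
        # Rank existing nodes by score; birth order is 1-based so that
        # the very first node does not get a score of zero for free.
        ranked = sorted(
            range(t),
            key=lambda s: (s + 1) * angular_distance(theta, angles[s]),
        )
        for s in ranked[:m]:
            edges.append((t, s))
        angles.append(theta)
    return angles, edges


if __name__ == "__main__":
    angles, edges = grow_network(n=50, m=2)
    print(len(angles), "nodes,", len(edges), "edges")
```

Under this rule an old node can win a link despite being far away in angle, and a young node can win one by being very close, which is the sense in which the newcomer balances popularity against similarity rather than preferring popular nodes outright.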
We gained momentum on our economics and policy research agenda, focused primarily on explanatory and predictive modeling of the economics of transit and peering interconnections in the Internet. Two historical developments contribute to a persistent disconnect between economic models and actual operational practices on the Internet. First, the Internet became too complex – in traffic dynamics, topology, and economics – for currently available analytical tools to allow realistic modeling. Second, the data needed to parameterize more realistic models is simply not available. The problem is fundamental, and familiar: simple models are not valid, and complex models cannot be validated. We are making progress in both dimensions: creating more powerful, empirically parameterized computational tools, and enabling broader validation than previously possible. We also held the second interdisciplinary Workshop on Internet Economics (WIE) in December, connecting academic researchers, commercial Internet facilities and service providers, theorists, policy makers, and pundits of Internet economics to frame an Internet economics research agenda, and more specifically to improve the realism, utility, and predictive power of economic models of Internet topology and dynamics.
In the first months of 2011, Internet communications were disrupted in several North African countries in response to civilian protests and threats of civil war. We analyzed episodes of these disruptions in two countries: Egypt and Libya. Using both control plane and data plane data sets in combination allowed us to narrow down which forms of Internet access disruption were implemented in a given region over time. Among other insights, we detected what we believe were Libya’s attempts to test firewall-based blocking before they executed more aggressive BGP-based disconnection. Our methodology could be used, and automated, to detect outages or similar macroscopically disruptive events in other geographic or topological regions.
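The idea of combining control plane and data plane observations can be illustrated with a small sketch. It assumes two per-interval inputs for a region: the fraction of the region's prefixes still visible in BGP and the fraction of baseline traffic still observed from it. The function names, labels, and thresholds below are hypothetical and are not the methodology used in the analysis; they only show why the two signals together distinguish cases that neither distinguishes alone.

```python
def classify_interval(bgp_visible_fraction, traffic_fraction,
                      bgp_threshold=0.5, traffic_threshold=0.5):
    """Label one time interval for one region.

    bgp_visible_fraction: share of the region's prefixes still announced
        in BGP, relative to a baseline (control plane signal).
    traffic_fraction: traffic observed from the region relative to a
        baseline (data plane signal).
    Both thresholds are hypothetical and would need calibration.
    """
    if bgp_visible_fraction < bgp_threshold:
        # Routes are gone, so traffic loss is explained by the control plane.
        return "bgp-withdrawal"
    if traffic_fraction < traffic_threshold:
        # Routes are still announced but traffic has dropped, which is
        # consistent with packet filtering somewhere on the path.
        return "filtering-suspected"
    return "normal"


def label_timeline(samples):
    """Apply classify_interval to (timestamp, bgp, traffic) tuples."""
    return [(ts, classify_interval(bgp, traffic)) for ts, bgp, traffic in samples]


if __name__ == "__main__":
    timeline = [("t0", 1.0, 1.0), ("t1", 0.95, 0.1), ("t2", 0.05, 0.0)]
    for ts, label in label_timeline(timeline):
        print(ts, label)
```

A sequence such as normal, then filtering-suspected, then bgp-withdrawal is the kind of pattern that would match a trial of firewall-based blocking followed by BGP-based disconnection.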
We are applying our theoretical, empirical, and practical understandings of the Internet’s evolution to engage in the NSF’s exciting Future Internet Architecture (FIA) Research program. In 2011 we participated in the Named Data Networking (NDN) project, a 12-university collaboration funded by the FIA program to explore a generalization of the Internet architecture that allows naming not just communication endpoints, i.e., the source and destination IP addresses, but also the data (content) itself. This approach shifts the focus from where (addresses and hosts in today’s Internet) to what (the content that users and applications care about). By naming data instead of locations, the new architecture transforms data into a first-class entity while addressing known technical challenges of today’s Internet: routing scalability, network security, content protection, and privacy. In 2011 we investigated combinations of name-space structure and network topology that optimize the efficiency of NDN algorithms and participated in NDN testbed development and evaluation.
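As a rough illustration of what forwarding on names rather than addresses can look like, the sketch below does a longest-prefix match over hierarchical, slash-separated name components. The class `NameFib`, the helper `split_name`, the example names, and the "face" labels are all hypothetical; this is a toy lookup table under those assumptions, not the NDN project's forwarding code.

```python
def split_name(name):
    """Split a slash-separated name into a tuple of components."""
    return tuple(part for part in name.split("/") if part)


class NameFib:
    """Toy forwarding table keyed by name prefixes (hypothetical)."""

    def __init__(self):
        self._routes = {}

    def add_route(self, prefix, next_hop):
        self._routes[split_name(prefix)] = next_hop

    def lookup(self, name):
        """Return the next hop for the longest matching prefix, or None."""
        components = split_name(name)
        # Try the full name first, then drop one trailing component at a time.
        for length in range(len(components), -1, -1):
            next_hop = self._routes.get(components[:length])
            if next_hop is not None:
                return next_hop
        return None


if __name__ == "__main__":
    fib = NameFib()
    fib.add_route("/example/videos", "face-1")
    fib.add_route("/example", "face-2")
    print(fib.lookup("/example/videos/demo/seg0"))
    print(fib.lookup("/example/papers/report"))
    print(fib.lookup("/other/data"))
```

Because lookups walk the name hierarchy, the way names are structured affects how many table entries are needed and how well they aggregate, which is one reason the interplay of name-space structure and topology matters.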
Finally, as always, we engaged in a variety of tool development, data-sharing, and outreach activities, including web sites, peer-reviewed papers, technical reports, presentations, blogging, animations, and (six) workshops. Details of our activities are below. CAIDA’s program plan for 2010-2013 is available at http://www.caida.org/home/about/progplan/progplan2010/. Please do not hesitate to send comments or questions to info at caida dot org.