Archive for the 'Updates' Category

CAIDA Delivers More Data To the Public

Wednesday, February 12th, 2014 by paul

As part of our mission to foster a collaborative research environment in which data can be acquired and shared, CAIDA has developed a framework that promotes wide dissemination of our datasets to researchers. We classify each dataset as either public or restricted based on the privacy issues involved in sharing it, as described in our data sharing framework document Promotion of Data Sharing (http://www.caida.org/data/sharing/).

Public datasets are available for download from our public data server (http://data.caida.org), subject to the conditions specified in our Acceptable Use Agreement (AUA) for public data (http://www.caida.org/home/legal/aua/public_aua.xml). CAIDA provides access to restricted datasets to qualifying researchers at academic and CAIDA-member institutions who agree to a more restrictive AUA (http://www.caida.org/home/legal/aua/).

In January 2014 we reviewed our collection of datasets to re-evaluate their classification. As a result, as of February 1 we have converted several popular restricted CAIDA datasets into public datasets, including most of one of our largest and most popular data collections: topology data from the (now retired) skitter measurement infrastructure (operational between 1998 and 2008) and its successor, the Archipelago (Ark) infrastructure (operational since September 2007). All IPv4 measurements older than two years (which includes all skitter data) are now publicly available. In addition to the raw data, this topology data includes derived datasets such as the Internet Topology Data Kits (ITDKs). Further, to encourage research on IPv6 deployment, we have made our IPv6 Ark topology and performance measurements, from December 2008 up to the present, publicly available as a whole. We have added these newly public datasets to our existing public data, which already includes AS links data inferred from traceroute measurements taken by the skitter and Ark platforms.
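The classification rule in the paragraph above can be sketched as a small Python predicate. This is a hypothetical illustration, not CAIDA’s code: it assumes the two-year cutoff is measured from the February 1, 2014 conversion date and that every IPv6 Ark measurement from December 2008 onward is public.

```python
from datetime import date

# Hypothetical sketch of the February 2014 public/restricted rule; not CAIDA's code.
CONVERSION_DATE = date(2014, 2, 1)
IPV6_PUBLIC_START = date(2008, 12, 1)  # IPv6 Ark data from December 2008 onward


def two_years_before(d: date) -> date:
    """Same calendar day two years earlier; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - 2)
    except ValueError:
        return d.replace(year=d.year - 2, day=28)


def is_public(protocol: str, measured_on: date, as_of: date = CONVERSION_DATE) -> bool:
    """Return True if a topology measurement falls in the public category."""
    if protocol == "ipv6":
        # IPv6 measurements are public as a whole, from December 2008 on.
        return measured_on >= IPV6_PUBLIC_START
    if protocol == "ipv4":
        # IPv4 measurements (skitter and Ark) are public once older than two years.
        return measured_on < two_years_before(as_of)
    raise ValueError(f"unknown protocol: {protocol!r}")
```

Passing a later `as_of` date models a rolling two-year window, which is itself an assumption; the post describes only the February 2014 conversion.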

Several other datasets remain under consideration for public release, so stay tuned. For an overview of all datasets currently provided by CAIDA (both public and restricted), see our data overview page (http://www.caida.org/data/overview/).

Support for this data collection and sharing is provided by the DHS Science and Technology Directorate’s PREDICT project via Cooperative Agreement FA8750-12-2-0326 and NSF’s Computing Research Infrastructure Program via CNS-0958547.

(Re)introducing the Internet Measurement Data Catalog (DatCat)

Monday, October 7th, 2013 by josh

In 2002, we began to create a metadata catalog where operators and other data owners could index their datasets. We hoped the catalog would serve several goals: enabling data providers to share data with researchers; helping researchers find data to support specific research questions; promoting reproducibility of scientific results based on Internet data; and correlating heterogeneous measurement data to analyze macroscopic Internet trends. This last goal was perhaps the most ambitious: we imagined a scenario in which enough data would be indexed in enough detail that the metadata itself would reveal macroscopic trends in Internet traffic characteristics (e.g., average packet size over time, or the average fraction of traffic carried via HTTP) without needing to touch the underlying traffic data (NetFlow or pcap files).
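To make that last goal concrete, here is a minimal Python sketch, assuming each catalog entry records a start time plus total packet and byte counts. These field names are hypothetical, not DatCat’s actual schema. The function computes average packet size per year from the metadata alone, without opening any pcap or NetFlow files.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TraceMetadata:
    # Hypothetical metadata fields; the real DatCat schema may differ.
    name: str
    start_time: datetime
    total_packets: int
    total_bytes: int


def avg_packet_size_by_year(entries):
    """Average packet size (bytes) per year, computed only from catalog metadata."""
    packets = defaultdict(int)
    bytes_ = defaultdict(int)
    for e in entries:
        year = e.start_time.year
        packets[year] += e.total_packets
        bytes_[year] += e.total_bytes
    # Divide pooled totals, so large traces weigh more than small ones.
    return {y: bytes_[y] / packets[y] for y in sorted(packets) if packets[y] > 0}
```

Pooling bytes and packets per year, rather than averaging each dataset’s own average, keeps a few small traces from skewing the yearly figure.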

To support this variety of uses, we developed a rich metadata model that supported extremely precise descriptions of indexed data sets. For a given data set, one could specify: a description of a collection of files with a similar purpose; scholarly papers, articles, or other publications that make use of the data; descriptions of the files containing the actual data and their format; the package format used for download; contact information; the location of the data; a list of keywords; the size of the files or collection; the geographic, network, and logical location of the data; the platform used to collect the data; the start time, end time, and duration; and free-form user notes. We allowed searching on any of these fields.
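A minimal Python sketch of a record in the spirit of this metadata model, with a search that works on any one field. The class and field names are hypothetical paraphrases of the list above, not DatCat’s schema; duration is derived from the start and end times rather than stored.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime
from typing import Optional


@dataclass
class DatasetEntry:
    """Hypothetical catalog record paraphrasing the metadata fields listed above."""
    collection: str                       # collection of files with a similar purpose
    description: str
    publications: list[str] = field(default_factory=list)
    file_format: Optional[str] = None     # format of the files holding the data
    package_format: Optional[str] = None  # packaging used for download
    contact: Optional[str] = None
    data_location: Optional[str] = None   # where the data can be obtained
    keywords: list[str] = field(default_factory=list)
    size_bytes: Optional[int] = None
    geographic_location: Optional[str] = None
    network_location: Optional[str] = None
    logical_location: Optional[str] = None
    platform: Optional[str] = None        # platform used to collect the data
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    notes: str = ""                       # free-form user notes

    @property
    def duration(self):
        """Derived from start and end time when both are known."""
        if self.start_time and self.end_time:
            return self.end_time - self.start_time
        return None


SEARCHABLE = {f.name for f in fields(DatasetEntry)}


def search(entries, field_name, term):
    """Case-insensitive substring match on any single metadata field."""
    if field_name not in SEARCHABLE:
        raise KeyError(f"unknown field: {field_name}")
    term = term.lower()
    hits = []
    for entry in entries:
        value = getattr(entry, field_name)
        values = value if isinstance(value, list) else [value]
        if any(v is not None and term in str(v).lower() for v in values):
            hits.append(entry)
    return hits
```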

The catalog allows users not only to index data but also to flexibly group data sets into collections, link collections to entries describing the tools used to collect the data, and link collections to publications that used the data. We considered many corner cases and implemented our complex metadata model in an industrial-strength relational database. We released the Internet Measurement Data Catalog (DatCat) in June 2006, prepopulated with our own data sets and introduced via a hands-on workshop where we helped create scripts to assist other researchers in indexing their own data for contribution to the catalog.

In retrospect, we over-complicated the data model and the process of data indexing. Having undertaken data collection for years ourselves, we were familiar with the jargon used to describe precise characteristics of data and the variety of scenarios in which users collect Internet data. We tried to cover each and every possible case. We overshot. The result was a cumbersome and time-consuming interface. Based on feedback from those who took the time to contribute to DatCat, it became clear that we needed to streamline the submission interface. Further, we had built the original service atop an expensive, proprietary database that incurred unnecessary licensing costs.

In August 2011, NSF’s SDCI program provided support for three additional tasks building on what we learned: (1) reduce the burden on those contributing data via a streamlined interface and tools for easier indexing, annotation, and navigation of relevant data; (2) convert from a proprietary database backend (Oracle) to a completely open-source solution (PostgreSQL); and (3) expand DatCat’s relevance to the cybersecurity and other research communities via forums.

The new database objects have drastically fewer required fields, so contributors can more easily enter new dataset collections: a streamlined collection requires only a collection name, a short description, and a summary. The new DatCat web site is back online, running on the open-source PostgreSQL database backend with the streamlined interface. We also developed a public forums interface to host discussions of data sharing issues and to answer frequently asked questions about DatCat and the information it contains.
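By contrast with the original model, a streamlined collection needs only three fields. A minimal validation sketch in Python, with hypothetical key names (`name`, `short_description`, `summary`) standing in for whatever the real submission form uses:

```python
# Hypothetical keys for the three required fields of a streamlined collection.
REQUIRED_FIELDS = ("name", "short_description", "summary")


def missing_required(submission: dict) -> list[str]:
    """Return the required fields that are absent or blank, in a fixed order."""
    return [f for f in REQUIRED_FIELDS
            if not str(submission.get(f) or "").strip()]
```

Every other field is treated as optional, so a contributor can submit as soon as this list comes back empty.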

We hope that DatCat will evolve into a lightweight mechanism supporting operators and researchers who want to announce the existence and availability of datasets relevant to (cybersecurity) research. It could also help NSF PIs meet the new requirement that every proposal include a data management plan documenting the types of data, data and metadata standards, policies for access and sharing, provisions for re-use, re-distribution, and derivative works, and the location of archives. Finally, we hope the DatCat service will facilitate collaboration among cybersecurity and network researchers and operators around the world.

We now invite you to take a(nother) look at the Internet Measurement Data Catalog (DatCat). Please point your browser at http://imdc.datcat.org/, browse the catalog, run a few searches, crawl the keywords, create an account, and index your favorite dataset. Please send any questions or feedback to info at datcat dot org.

IPv4 and IPv6 AS Core 2013

Friday, August 9th, 2013 by bradley

We recently released a visualization at http://www.caida.org/research/topology/as_core_network/ that represents our macroscopic snapshots of IPv4 and IPv6 Internet topology samples captured in 2013. The plots illustrate both the extensive geographical scope and the rich interconnectivity of nodes participating in the global Internet routing system.

IPv4 and IPv6 AS Core Graph, Jan 2013

This AS core visualization addresses one of the goals of CAIDA’s topology mapping project: to develop techniques that illustrate structural relationships and depict critical components of the Internet infrastructure. These IPv4 and IPv6 graphs show the relative growth of the two Internet topologies, in particular the steady continued growth of the IPv6 topology. Both the IPv4 and IPv6 topologies experienced considerable churn; the net change in the number of ASes was 3,290 (10.7%) in our IPv4 graph and 495 (25.7%) in our IPv6 graph.
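The percentages also imply approximate graph sizes. Assuming each percentage is the net change relative to the previous year’s graph (an assumption; the post does not state the baseline), the implied counts are:

```latex
\begin{aligned}
N_{\text{IPv4,prev}} &\approx \frac{3290}{0.107} \approx 30{,}748,
  & N_{\text{IPv4,2013}} &\approx 30{,}748 + 3{,}290 = 34{,}038,\\
N_{\text{IPv6,prev}} &\approx \frac{495}{0.257} \approx 1{,}926,
  & N_{\text{IPv6,2013}} &\approx 1{,}926 + 495 = 2{,}421.
\end{aligned}
```

Because the percentages are rounded to one decimal place, the IPv4 figures are only good to within about 150 ASes and the IPv6 figures to within a few ASes. On the same assumption, the IPv6 graph grew roughly 25.7 / 10.7 ≈ 2.4 times faster, in relative terms, than the IPv4 graph.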

To improve our AS core visualization over previous years’ versions, this year we made two major refinements to our graphing methodology, including how we rank individual ASes. First, we now rank ASes by their transit degree rather than their outdegree. Second, we now infer links across Internet eXchange (IX) point address space, rather than treating the IX itself as a node to which various ISPs attach. Details are available at http://www.caida.org/research/topology/as_core_network/.
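Both refinements can be illustrated on toy traceroute-derived AS paths. The following Python sketch is hypothetical, not CAIDA’s implementation: it marks IX hops with a placeholder token, drops them so the ASes on either side are linked directly, and then compares transit degree (the number of distinct neighbors seen on either side of an AS when it appears in the middle of a path) with a simple outdegree (assumed here to be the number of distinct next-hop ASes).

```python
from collections import defaultdict

IX = "IX"  # hypothetical placeholder for a hop mapped to IX address space


def infer_links(paths):
    """Drop IX hops so the ASes on either side of an exchange link directly."""
    cleaned = []
    for path in paths:
        hops = [h for h in path if h != IX]
        # Also collapse repeated consecutive ASes (several router hops in one AS).
        cleaned.append([h for i, h in enumerate(hops) if i == 0 or h != hops[i - 1]])
    return cleaned


def transit_degree(paths):
    """Distinct neighbors on either side of each AS seen in the middle of a path."""
    neighbors = defaultdict(set)
    for path in paths:
        for i in range(1, len(path) - 1):
            neighbors[path[i]].update((path[i - 1], path[i + 1]))
    return {asn: len(n) for asn, n in neighbors.items()}


def out_degree(paths):
    """Distinct next-hop ASes observed after each AS (assumed outdegree definition)."""
    nxt = defaultdict(set)
    for path in paths:
        for a, b in zip(path, path[1:]):
            nxt[a].add(b)
    return {asn: len(n) for asn, n in nxt.items()}


raw = [["1", IX, "2", "3"], ["4", "2", "5"], ["1", "1", "2", "3"]]
paths = infer_links(raw)
print(transit_degree(paths))  # {'2': 4}
print(out_degree(paths))      # {'1': 1, '2': 2, '4': 1}
```

In the toy data, AS 1 has a nonzero outdegree but no transit degree, because it only appears at the start of paths, while computing outdegree on the raw paths would give the IX placeholder a node of its own.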

[For details on a more sophisticated methodology for ranking AS interconnectivity, based on inferring AS relationships from BGP data, see http://www.caida.org/data/active/as-relationships/.]

CAIDA’s Annual Report for 2012

Wednesday, July 31st, 2013 by kc

[Executive Summary from our annual report for 2012.]

This annual report covers CAIDA’s activities in 2012, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our research projects span Internet topology, routing, traffic, economics, future Internet architectures, and policy. Our infrastructure activities continue to support measurement-based studies of the Internet’s core infrastructure, with focus on the health and integrity of the global Internet’s topology, routing, addressing, and naming systems. In 2012 we increased our participation in future Internet research in two dimensions: measuring and modeling IPv6 deployment; and an expanded role (in management) of the Named Data Networking project, one of the NSF-funded future Internet architecture projects headed into its third year. We also began a project to study large-scale Internet outages via correlation of a variety of disparate sources of data.

network mapping and measurement conference

Tuesday, May 28th, 2013 by kc

I had the honor of presenting an overview of CAIDA’s recent research activities at the Network Mapping and Measurement Conference hosted by Sean Warnick and Daniel Zappala. Talk topics included social learning behavior in complex networks, re-routing based on expected network outages along current paths, and Twitter data mining to analyze suicide risk factors and political sentiments (three different talks). James Allen Evans gave a sociology of science talk, an interview version of which appears to be archived by the Oxford Internet Institute. The organizers even arranged a talk from a local startup, NUVI, which is doing some fascinating real-time visualization and analytics of social network data (including Twitter, Facebook, Reddit, and YouTube).

The workshop was held at Sundance, Utah, one of the most beautiful places I’ve ever been for a workshop. This workshop series was originally DoD-sponsored, with many government attendees interested in Internet infrastructure protection, but this year the sequester and travel freezes left only two USG attendees, and budget constraints may keep the workshop from happening again next year. I hope not; it was a truly unique environment that exposed me to a range of work I would not otherwise have discovered anytime soon. Kudos to the organizers and sponsors.

Third Workshop on Internet Economics (WIE2012)

Friday, April 19th, 2013 by kc

As part of our NSF-funded network research project on modeling Internet interconnection dynamics, David Clark (MIT) and I hosted the third Workshop on Internet Economics (WIE2012) on December 12-13, 2012. The goal of the workshop was to provide a forum for researchers, commercial Internet facilities and service providers, technologists, economists, theorists, policy makers, and other stakeholders to empirically inform emerging regulatory and policy debates. The theme for this year’s workshop was “Definitions and Data”. The final report describes the discussions and presents relevant open research questions identified by workshop participants. Slides presented at the workshop are available at the workshop home page, and the full report (6-page PDF) is worth reading.

2001:deba:7ab1:e::effe:c75

Tuesday, January 22nd, 2013 by rob

[This blog entry is guest written by Robert Beverly at the Naval Postgraduate School.]

In many respects, the deployment, adoption, use, and performance of IPv6 have received more attention recently than those of IPv4. Certainly the longitudinal measurement of IPv6, from its infancy, through the exhaustion of ICANN’s IPv4 space, to 1% native penetration (as observed by Google), is more complete than that of IPv4. Indeed, many parties have a vested interest in either the success or the failure of IPv6, and numerous IPv6 measurement efforts are afoot.

Researchers from Akamai, CAIDA, ICSI, NPS, and MIT met in early January 2013, first to share and make sense of current measurement initiatives, and second to plot a path forward for the community in measuring IPv6. A specific objective of the meeting was to understand which aspects of IPv6 measurement are “done” (in the sense that there exists a sound methodology, even if measurement should continue), and which IPv6 questions/measurements remain open research problems. The meeting agenda and presentation slides are archived online.

CAIDA at the NSF Secure and Trustworthy Cyberspace (SaTC) Principal Investigators’ Meeting

Tuesday, December 4th, 2012 by Alberto

Last week CAIDA researchers (Alberto and kc) visited National Harbor, Maryland, for the first NSF Secure and Trustworthy Cyberspace (SaTC) Principal Investigators’ Meeting. The National Science Foundation’s SaTC program is an interdisciplinary expansion of the old Trustworthy Computing program sponsored by CISE, extended to include the SBE, OCI, MPS, and EHR directorates. The SaTC program also includes a bold new Transition to Practice category of project funding, which addresses the challenge of moving from research to capability; we are excited and honored to be part of it.

two recent workshop reports

Friday, July 27th, 2012 by kc

This month CCR published final reports from two of our workshops: our BGP/traceroute workshop in July 2011 (final report here or here) and AIMS-4 last February (final report here or here).

CAIDA’s Annual Report for 2011

Thursday, July 12th, 2012 by kc

[Executive Summary from our annual report for 2011.]

This annual report covers CAIDA’s activities in 2011, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our current research projects span topology, routing, traffic, economics, future Internet architectures, and policy. Our infrastructure activities continue to support measurement-based studies of the Internet’s core infrastructure, with focus on the health and integrity of the global Internet’s topology, routing, addressing, and naming systems. We are also dedicating resources to support the infrastructure measurement and data sharing interests and needs of two U.S. federal agency programs: the National Science Foundation’s International Research Network Connections (IRNC) program, and the Department of Homeland Security’s Protected Repository of Data on Internet CyberThreats (PREDICT) data-sharing project.
