Archive for the 'Updates' Category

Further Improvements to the Internet Measurement Data Catalog (DatCat)

Tuesday, August 26th, 2014 by josh

Internet researchers and metadata enthusiasts,

In response to feedback and guidance from contributors and users, we continue to refine the Internet Measurement Data Catalog (DatCat). To encourage additional contributions, we have streamlined the DatCat data model and minimized the number of required metadata fields. Specifically, we eliminated the Data and Package objects and merged their most important information into relevant Collections. We also made dozens of other little improvements all over the code base.

We invite folks to browse the catalog, create an account, and contribute some metadata to the catalog to help document the existence and availability of Internet measurement data.

Cheers.

NSF Future Internet Architecture (Next Phase) PI Meeting

Thursday, June 5th, 2014 by josh

On 19-20 May 2014, the NSF Computer and Network Systems (CNS) Core Programs hosted a kickoff meeting in Washington D.C. for the next phase of the Future Internet Architectures Program. The program funds three projects for an additional two years each to create and demonstrate prototype implementations of their architecture protocol suites and test and evaluate them in one or more relevant application environments. The meeting allowed the projects to present overviews of their architectures and the environments in which they plan to test them, as well as their thoughts on how their architecture may shift the balance of power among players in the Internet ecosystem, and other ideas on how to evaluate their architecture’s benefits and incentives to deploy. CAIDA participates in the Named-Data Networking Project (NDN), one of the three projects that receive funding from the FIA NP Program. The NDN team’s presentations at this meeting are posted at http://named-data.net/publications/presentations/.

CAIDA’s Annual Report for 2013

Tuesday, June 3rd, 2014 by kc

[Executive Summary from our annual report for 2013:]

This annual report covers CAIDA’s activities in 2013, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our research projects span Internet topology, routing, traffic, security and stability, future Internet architecture, economics and policy. Our infrastructure activities support measurement-based Internet studies, both at CAIDA and around the world, with focus on the health and integrity of the global Internet ecosystem.
(more…)

NDN for humans

Tuesday, April 22nd, 2014 by admin

Recently posted to the Named-Data Networking site:

In an attempt to lower the barriers to understanding this revolutionary (as well as evolutionary) way of looking at networking, three recently posted documents are likely to answer many of your questions (and inspire a few more):

(1) Almost 5 years ago, Van gave a 3+ hour tutorial on Content-Centric Networking for the Future Internet Summer School (FISS 09) hosted by the University of Bremen in Germany. We finally extracted an approximate transcript of this goldmine and are making it available, along with pointers to the slides and (4-part) video of his tutorial hosted by U. Bremen.

(Our FAQ answers the commonly asked question, “How does NDN differ from Content-Centric Networking (CCN)?”)

(2) A short (8-page) technical report, Named Data Networking, introducing the Named Data Networking architecture. (A version of this report will appear soon in ACM Computer Communications Review.)

(3) Another technical report exploring the potential social impacts of NDN: A World on NDN: Affordances & Implications of the Named Data Networking Future Internet Architecture. This paper highlights four departures from today’s TCP/IP architecture, which underscore the social impacts of NDN: the architecture’s emphases on enabling semantic classification, provenance, publication, and decentralized communication. These changes from TCP/IP could expand affordances for free speech, and produce positive outcomes for security, privacy and anonymity, but raise new challenges regarding data retention and forgetting. These changes might also alter current corporate and law enforcement content regulation mechanisms by changing the way data is identified, handled, and routed across the Web.

We welcome feedback on these and any NDN publications.

CAIDA Delivers More Data To the Public

Wednesday, February 12th, 2014 by paul

As part of our mission to foster a collaborative research environment in which data can be acquired and shared, CAIDA has developed a framework that promotes wide dissemination of our datasets to researchers. We classify a dataset as either public or restricted based on a consideration of privacy issues involved in sharing it, as described in our data sharing framework document Promotion of Data Sharing (http://www.caida.org/data/sharing/).

Public datasets are available for download from our public dataserver (http://data.caida.org) subject to conditions specified in our Acceptable Use Agreement (AUA) for public data (http://www.caida.org/home/legal/aua/public_aua.xml). CAIDA provides conditional access to restricted datasets for qualifying researchers at academic and CAIDA-member institutions who agree to a more restrictive AUA (http://www.caida.org/home/legal/aua/).

In January 2014 we reviewed our collection of datasets in order to re-evaluate their classification. As a result, as of February 1, we have converted several popular restricted CAIDA datasets into public datasets, including most of one of our largest and most popular data collections: topology data from the (now retired) skitter measurement infrastructure (operational between 1998 and 2008), and its successor, the Archipelago (or Ark) infrastructure (operational since September 2007). We have now made all IPv4 measurements older than two years (which includes all skitter data) publicly available. In addition to the raw data, this topology data includes derived datasets such as the Internet Topology Data Kits (ITDKs). Further, to encourage research on IPv6 deployment, we made our IPv6 Ark topology and performance measurements, from December 2008 up to the present, publicly available as a whole. We have added these new public data to the existing category of public datasets, which includes AS links data inferred from traceroute measurements taken by the skitter and Ark platforms.

Several other datasets remain under consideration for public release, so stay tuned. For an overview of all datasets currently provided by CAIDA (both public and restricted) see our data overview page (http://www.caida.org/data/overview/).

Support for this data collection and sharing is provided by DHS Science and Technology Directorate’s PREDICT project via Cooperative Agreement FA8750-12-2-0326 and NSF’s Computing Research Infrastructure Program via CNS-0958547.

(Re)introducing the Internet Measurement Data Catalog (DatCat)

Monday, October 7th, 2013 by josh

In 2002, we began to create a metadata catalog where operators and other data owners could index their datasets. We were motivated by several goals we hoped the catalog would enable: data providers sharing data with researchers; researchers finding data to support specific research questions; promoting reproducibility of scientific results using Internet data; and correlating heterogeneous measurement data to analyze macroscopic Internet trends. This last goal was perhaps the most ambitious: we imagined a scenario where enough data would be richly enough indexed that the metadata itself would reveal macroscopic trends about Internet (traffic) characteristics, e.g., average packet size over time, average fraction of traffic carried via HTTP, without even needing to touch the underlying traffic data (netflow or pcap files).

To support this variety of uses of the catalog, we developed a rich metadata model that supported extremely precise descriptions of indexed data sets. For a given data set, one could specify: a description of a collection of files with similar purpose; scholarly papers, articles, or publications that make use of the data; descriptions of the files containing the actual data and its format; the package format used for download; contact information; location of the data; a list of keywords; the size of the files/collection; the geographic, network, and logical location of the data; the platform used to collect the data; the start time, end time, and duration; and free-form user notes. We allowed searching on any of these fields.

The catalog allows the user not only to index data but also to flexibly group data sets into collections, link collections to entries describing the tools used to collect the data, and link collections to publications that used the data. We considered many corner cases and implemented our complex metadata model in an industrial-strength relational database. We released the Internet Measurement Data Catalog (DatCat) in June of 2006, prepopulated with our own data sets and introduced via a hands-on workshop where we helped create scripts to assist other researchers in indexing their own data for contribution to the catalog.

In retrospect, we over-complicated the data model and the process of data indexing. Having undertaken data collection for years ourselves, we were familiar with the jargon used to describe precise characteristics of data and the variety of scenarios in which users collect Internet data. We tried to cover each and every possible case. We overshot. The result was a cumbersome and time-consuming interface. Based on feedback from those who took the time to contribute to DatCat, it became clear that we needed to streamline the submission interface. Further, we had built the original service atop an expensive, proprietary database that incurred unnecessary licensing costs.

In August 2011, NSF’s SDCI program provided support for three additional tasks building on what we learned: (1) reduce the burden on those contributing data via a streamlined interface and tools for easier indexing, annotation and navigation of relevant data; (2) convert from use of a proprietary database backend (Oracle) to a completely open source solution (PostgreSQL); and (3) expand DatCat’s relevance to the cybersecurity and other research communities via forums.

The new database objects have drastically fewer required fields so that contributors can more easily enter new dataset collections. The new streamlined collections require only collection name, short description, and summary fields. The new DatCat web site is back online, served by the new open-source PostgreSQL database backend and streamlined interface. We also developed a public forums interface to hold discussions of data sharing issues and to answer frequently asked questions regarding DatCat and the information it contains.
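To give a feel for how small the required metadata now is, here is a minimal sketch of a streamlined collection record in Python. Only the three required fields (collection name, short description, summary) come from the description above; the class name, the optional `keywords` field, and the `missing_required` helper are hypothetical illustrations and do not reflect DatCat's actual schema or code.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical stand-in for a streamlined DatCat collection entry."""

    # The three required fields named in the post.
    name: str
    short_description: str
    summary: str
    # Assumed optional extra, shown only to contrast with the required fields.
    keywords: list = field(default_factory=list)

    def missing_required(self):
        """Return the names of required fields that are empty or blank."""
        required = {
            "name": self.name,
            "short_description": self.short_description,
            "summary": self.summary,
        }
        return [key for key, value in required.items() if not value.strip()]


entry = Collection(
    name="Example traceroute collection",
    short_description="Traceroute paths toward sampled destinations",
    summary="",
)
print(entry.missing_required())
```

In this sketch a contributor who fills in all three fields gets an empty list back and could submit the entry; everything else is optional.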

We hope that DatCat evolves to become a lightweight mechanism supporting operators and researchers who want to announce the availability and existence of datasets relevant to (cybersecurity) research. It could also assist NSF PIs with the new requirement that every proposal must include a data management plan for documenting types of data, data and metadata standards, policies for access, sharing, and provisions for re-use, re-distribution, and derivative works and location of archives. Finally, we hope the DatCat service will facilitate collaboration among cybersecurity and network researchers and operators around the world.

We now invite you to take a(nother) look at the Internet Measurement Data Catalog (DatCat). Please point your browser at http://imdc.datcat.org/, browse the catalog, run a few searches, crawl the keywords, create an account, and index your favorite dataset. Please send any questions or feedback to info at datcat dot org.

IPv4 and IPv6 AS Core 2013

Friday, August 9th, 2013 by bradley

We recently released a visualization at http://www.caida.org/research/topology/as_core_network/ that represents our macroscopic snapshots of IPv4 and IPv6 Internet topology samples captured in 2013. The plots illustrate both the extensive geographical scope as well as rich interconnectivity of nodes participating in the global Internet routing system.

IPv4 and IPv6 AS Core Graph, Jan 2013

This AS core visualization addresses one of CAIDA’s topology mapping project goals: to develop techniques to illustrate structural relationships and depict critical components of the Internet infrastructure. These IPv4 and IPv6 graphs show the relative growth of the two Internet topologies, and in particular the steady continued growth of the IPv6 topology. Although both IPv4 and IPv6 topologies experienced a lot of churn, the net change in number of ASes was 3,290 (10.7%) in our IPv4 graph and 495 (25.7%) in our IPv6 graph.

In order to improve our AS Core visualization over previous years, this year we made two major refinements to our graphing methodology, including how we rank individual ASes. First, we now rank ASes based on their transit degree rather than their outdegree. Second, we now infer links across Internet eXchange (IX) point address space, rather than considering the IX itself a node to which various ISPs attach. Details at http://www.caida.org/research/topology/as_core_network/.
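As a rough illustration of the first refinement, the sketch below ranks ASes by transit degree computed from a list of observed AS paths. It assumes a commonly used definition that this post does not spell out: an AS's transit degree is the number of distinct neighbors seen on either side of it in paths where it appears in the middle, so an AS that only ever appears at the end of a path gets no transit degree at all. The function names and the sample paths are hypothetical, and this is not the code behind the visualization.

```python
from collections import defaultdict


def transit_degrees(as_paths):
    """Count, per AS, the distinct neighbors seen while it sits mid-path."""
    neighbors = defaultdict(set)
    for path in as_paths:
        # Collapse consecutive repeats of the same AS (e.g. path prepending).
        hops = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        # Every interior AS is observed providing transit between its two neighbors.
        for prev_as, mid_as, next_as in zip(hops, hops[1:], hops[2:]):
            neighbors[mid_as].update((prev_as, next_as))
    return {asn: len(seen) for asn, seen in neighbors.items()}


def rank_by_transit_degree(as_paths):
    """Return ASes ordered by descending transit degree, ties broken by AS number."""
    degrees = transit_degrees(as_paths)
    return sorted(degrees, key=lambda asn: (-degrees[asn], asn))


# Made-up AS numbers, for illustration only.
sample_paths = [[1, 2, 3], [4, 2, 5], [1, 2, 2, 3, 6]]
print(rank_by_transit_degree(sample_paths))
```

Unlike a plain degree count, this measure ignores links on which an AS is only ever an endpoint, so stub networks with many providers do not rank alongside ASes that actually carry traffic for others.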

[For details on a more sophisticated methodology for ranking AS interconnectivity, based on inferring AS relationships from BGP data, see http://www.caida.org/data/active/as-relationships/.]

CAIDA’s Annual Report for 2012

Wednesday, July 31st, 2013 by kc

[Executive Summary from our annual report for 2012.]

This annual report covers CAIDA’s activities in 2012, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our research projects span Internet topology, routing, traffic, economics, future Internet architectures, and policy. Our infrastructure activities continue to support measurement-based studies of the Internet’s core infrastructure, with focus on the health and integrity of the global Internet’s topology, routing, addressing, and naming systems. In 2012 we increased our participation in future Internet research in two dimensions: measuring and modeling IPv6 deployment; and an expanded role (in management) of the Named Data Networking project, one of the NSF-funded future Internet architecture projects headed into its third year. We also began a project to study large-scale Internet outages via correlation of a variety of disparate sources of data.

(more…)

network mapping and measurement conference

Tuesday, May 28th, 2013 by kc

I had the honor of presenting an overview of CAIDA’s recent research activities at the Network Mapping and Measurement Conference hosted by Sean Warnick and Daniel Zappala. Talk topics included social learning behavior in complex networks, re-routing based on expected network outages along current paths, and Twitter data mining to analyze suicide risk factors and political sentiments (three different talks). James Allen Evans gave a sociology of science talk, an interview form of which seems to be archived by the Oxford Internet Institute. The organizers even arranged a talk from a local startup, NUVI, doing some fascinating real-time visualization and analytics of social network data (including Twitter, Facebook, Reddit, YouTube).

The workshop was held at Sundance, Utah, one of the most beautiful places I’ve ever been for a workshop. This workshop series was originally DoD-sponsored with lots of government attendees interested in Internet infrastructure protection, but sequester and travel freezes this year yielded only two USG attendees, and budget constraints may keep this workshop from happening again next year. I hope not; it was really a unique environment and exposed me to a range of work I would not otherwise have discovered anytime soon. Kudos to the organizers and sponsors.

Third Workshop on Internet Economics (WIE2012)

Friday, April 19th, 2013 by kc

As part of our NSF-funded network research project on modeling Internet interconnection dynamics, David Clark (MIT) and I hosted the third Workshop on Internet Economics (WIE2012) last December 12-13. The goal of the workshop was to provide a forum for researchers, commercial Internet facilities and service providers, technologists, economists, theorists, policy makers, and other stakeholders to empirically inform emerging regulatory and policy debates. The theme for this year’s workshop was “Definitions and Data”. The final report describes the discussions and presents relevant open research questions identified by workshop participants. Slides presented at the workshop are available at the workshop home page. From the intro (but the full report (6-page pdf) is worth reading):
(more…)