DHS S&T PREDICT PI Meeting, Marina del Rey, CA

June 6th, 2014 by Josh Polterock

On 28-29 May 2014, DHS Science and Technology Directorate (S&T) held a meeting of the Principal Investigators of the PREDICT (Protected Repository for the Defense of Infrastructure Against Cyber Threats) Project, an initiative to make computer and network operational data more accessible for cybersecurity defensive R&D. The project is a three-way partnership among government, critical information infrastructure providers, and security development communities (both academic and commercial), all of whom seek technical solutions to protect the public and private information infrastructure. The primary goal of PREDICT is to bridge the gap between producers of security-relevant network operations data and the technology developers and evaluators who can leverage this data to accelerate the design, production, and evaluation of next-generation cybersecurity solutions.

In addition to presenting project updates, each PI presented on a special topic suggested by Program Manager Doug Maughan. I presented some reflective thoughts on 10 Years Later: What Would I Have Done Differently? (Or what would I do today?). In this presentation, I revisited my 2008 top ten list of things lawyers should know about the Internet to frame some proposed forward-looking strategies for the PREDICT project in 2014.

Also noted at the meeting, DHS recently released a new broad agency announcement (BAA) that will contractually require investigators to contribute to PREDICT any data created or used in testing and evaluation of the funded work (provided the investigator has redistribution rights, and subject to appropriate disclosure control).

NSF Future Internet Architecture (Next Phase) PI Meeting

June 5th, 2014 by Josh Polterock

On 19-20 May 2014, the NSF Computer and Network Systems (CNS) Core Programs hosted a kickoff meeting in Washington, D.C. for the next phase of the Future Internet Architectures program. The program funds three projects for an additional two years each to create and demonstrate prototype implementations of their architecture protocol suites, and to test and evaluate them in one or more relevant application environments. The meeting allowed the projects to present overviews of their architectures and the environments in which they plan to test them, their thoughts on how their architectures may shift the balance of power among players in the Internet ecosystem, and other ideas on how to evaluate each architecture’s benefits and incentives to deploy. CAIDA participates in the Named Data Networking (NDN) project, one of the three projects funded by the FIA-NP program. The NDN team’s presentations at this meeting are posted at http://named-data.net/publications/presentations/.

CAIDA’s Annual Report for 2013

June 3rd, 2014 by kc

[Executive Summary from our annual report for 2013:]

This annual report covers CAIDA’s activities in 2013, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our research projects span Internet topology, routing, traffic, security and stability, future Internet architecture, economics and policy. Our infrastructure activities support measurement-based Internet studies, both at CAIDA and around the world, with focus on the health and integrity of the global Internet ecosystem.

NDN for humans

April 22nd, 2014 by kc

Recently posted to the Named-Data Networking site:

In an attempt to lower the barriers to understanding this revolutionary (as well as evolutionary) way of looking at networking, we recently posted three documents that are likely to answer many of your questions (and inspire a few more):

(1) Almost 5 years ago, Van gave a 3+ hour tutorial on Content-Centric Networking for the Future Internet Summer School (FISS 09) hosted by the University of Bremen in Germany. We finally extracted an approximate transcript of this goldmine and are making it available, along with pointers to the slides and (4-part) video of his tutorial hosted by U. Bremen.

(Our FAQ answers the commonly asked question: How does NDN differ from Content-Centric Networking (CCN)?)

(2) A short (8-page) technical report, Named Data Networking, introducing the Named Data Networking architecture. (A version of this report will appear soon in ACM Computer Communications Review.)

(3) Another technical report exploring the potential social impacts of NDN: A World on NDN: Affordances & Implications of the Named Data Networking Future Internet Architecture. This paper highlights four departures from today’s TCP/IP architecture that underscore the social impacts of NDN: the architecture’s emphases on enabling semantic classification, provenance, publication, and decentralized communication. These changes from TCP/IP could expand affordances for free speech, and produce positive outcomes for security, privacy, and anonymity, but they raise new challenges regarding data retention and forgetting. These changes might also alter current corporate and law enforcement content regulation mechanisms by changing the way data is identified, handled, and routed across the Web.

We welcome feedback on these and any NDN publications.

CAIDA Delivers More Data To the Public

February 12th, 2014 by Paul Hick

As part of our mission to foster a collaborative research environment in which data can be acquired and shared, CAIDA has developed a framework that promotes wide dissemination of our datasets to researchers. We classify a dataset as either public or restricted based on the privacy issues involved in sharing it, as described in our data-sharing framework document, Promotion of Data Sharing (http://www.caida.org/data/sharing/).

Public datasets are available for download from our public data server (http://data.caida.org), subject to the conditions specified in our Acceptable Use Agreement (AUA) for public data (http://www.caida.org/home/legal/aua/public_aua.xml). CAIDA provides conditional access to restricted datasets for qualifying researchers at academic and CAIDA-member institutions who agree to a more restrictive AUA (http://www.caida.org/home/legal/aua/).

In January 2014 we reviewed our collection of datasets to re-evaluate their classification. As a result, as of February 1 we have converted several popular restricted CAIDA datasets into public datasets, including most of one of our largest and most popular data collections: topology data from the (now retired) skitter measurement infrastructure (operational between 1998 and 2008) and from its successor, the Archipelago (Ark) infrastructure (operational since September 2007). All IPv4 measurements older than two years (which includes all skitter data) are now publicly available. In addition to the raw data, this topology data includes derived datasets such as the Internet Topology Data Kits (ITDKs). Further, to encourage research on IPv6 deployment, we have made our IPv6 Ark topology and performance measurements, from December 2008 up to the present, publicly available as a whole. These new public data join the existing public datasets, which include the AS links data inferred from traceroute measurements taken by the skitter and Ark platforms.

Several other datasets remain under consideration for public release, so stay tuned. For an overview of all datasets currently provided by CAIDA (both public and restricted) see our data overview page (http://www.caida.org/data/overview/).

Support for this data collection and sharing was provided by the DHS Science and Technology Directorate’s PREDICT project via Cooperative Agreement FA8750-12-2-0326 and by NSF’s Computing Research Infrastructure program via CNS-0958547.

(Re)introducing the Internet Measurement Data Catalog (DatCat)

October 7th, 2013 by Josh Polterock

In 2002, we began to create a metadata catalog where operators and other data owners could index their datasets. We were motivated by several goals we hoped the catalog would enable: data providers sharing data with researchers; researchers finding data to support specific research questions; reproducibility of scientific results based on Internet data; and correlation of heterogeneous measurement data to analyze macroscopic Internet trends. This last goal was perhaps the most ambitious: we imagined a scenario where enough data would be indexed, richly enough, that the metadata itself would reveal macroscopic trends in Internet (traffic) characteristics, e.g., average packet size over time, or the average fraction of traffic carried via HTTP, without ever needing to touch the underlying traffic data (netflow or pcap files).
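
As a toy illustration of that metadata-only ambition, the sketch below derives an average-packet-size trend purely from per-capture summary metadata, never opening a pcap or netflow file. All field names and numbers are invented for illustration; they do not reflect DatCat's actual schema or any real dataset.

```python
# Hypothetical illustration of the "metadata-only trend" idea: if each indexed
# capture's metadata records total packets and total bytes, average packet size
# over time falls out of the catalog alone. All values here are made up.
records = [
    {"year": 2004, "total_bytes": 1.2e12, "total_packets": 2.1e9},
    {"year": 2005, "total_bytes": 2.9e12, "total_packets": 4.6e9},
    {"year": 2006, "total_bytes": 6.5e12, "total_packets": 9.8e9},
]

for r in sorted(records, key=lambda r: r["year"]):
    avg_size = r["total_bytes"] / r["total_packets"]
    print(f'{r["year"]}: average packet size ~{avg_size:.0f} bytes')
```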

To support this variety of uses of the catalog, we developed a rich metadata model that supported extremely precise descriptions of indexed data sets. For a given data set, one could specify: a description of a collection of files with similar purpose; scholarly papers, articles, or publications that make use of the data; descriptions of the files containing the actual data and its format; the package format used for download; contact information; location of the data; a list of keywords; the size of the files/collection; the geographic, network, and logical location of the data; the platform used to collect the data; the start time, end time, and duration; and free-form user notes. We allowed searching on any of these fields.
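
To make the flavor of that model concrete, here is a minimal sketch in Python of what such a record might have looked like; the field names are ours for illustration and do not reproduce DatCat's actual schema.

```python
# Hypothetical sketch of the kind of rich metadata record described above.
# Field names are illustrative only; they are not DatCat's real schema.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CollectionRecord:
    name: str                                   # collection of files with a similar purpose
    description: str
    publications: List[str] = field(default_factory=list)  # papers using the data
    data_format: Optional[str] = None           # format of the files holding the data
    package_format: Optional[str] = None        # download packaging, e.g., tarball
    contact: Optional[str] = None
    location: Optional[str] = None              # where the data can be fetched
    keywords: List[str] = field(default_factory=list)
    size_bytes: Optional[int] = None
    geo_location: Optional[str] = None          # geographic location of collection
    net_location: Optional[str] = None          # network / logical vantage point
    platform: Optional[str] = None              # measurement platform used
    start_time: Optional[str] = None            # ISO 8601 timestamps
    end_time: Optional[str] = None
    duration_sec: Optional[float] = None
    notes: str = ""                             # free-form user notes
```

Every one of these fields was searchable, which hints at why the submission interface eventually grew cumbersome.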

The catalog allows the user not only to index data but also to flexibly group data sets into collections, link collections to entries describing the tools used to collect the data, and link collections to publications that used the data. We considered many corner cases and implemented our complex metadata model in an industrial-strength relational database. We released the Internet Measurement Data Catalog (DatCat) in June 2006, prepopulated with our own data sets, and introduced it via a hands-on workshop where we helped create scripts to assist other researchers in indexing their own data for contribution to the catalog.

In retrospect, we over-complicated the data model and the process of data indexing. Having undertaken data collection for years ourselves, we were familiar with the jargon used to describe precise characteristics of data and the variety of scenarios in which users collect Internet data. We tried to cover each and every possible case. We overshot. The result was a cumbersome and time-consuming interface. Based on feedback from those who took the time to contribute to DatCat, it became clear that we needed to streamline the submission interface. Further, we had built the original service atop an expensive, proprietary database that incurred unnecessary licensing costs.

In August 2011, NSF’s SDCI program provided support for three additional tasks building on what we learned: (1) reduce the burden on those contributing data via a streamlined interface and tools for easier indexing, annotation, and navigation of relevant data; (2) convert from a proprietary database backend (Oracle) to a completely open-source solution (PostgreSQL); and (3) expand DatCat’s relevance to the cybersecurity and other research communities via forums.

The new database objects have drastically fewer required fields, so contributors can enter new dataset collections more easily; a streamlined collection requires only name, short description, and summary fields. The new DatCat web site is back online, serving from the open-source PostgreSQL backend through the streamlined interface. We also developed a public forums interface to host discussions of data-sharing issues and to answer frequently asked questions about DatCat and the information it contains.
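
For instance, indexing a collection under the streamlined model might look like the minimal sketch below (assuming Python with psycopg2 and a local PostgreSQL database named datcat_demo; the table and column names are invented for illustration, not DatCat's actual schema):

```python
# Minimal sketch of a streamlined collection record (invented schema).
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS collection (
    id         SERIAL PRIMARY KEY,
    name       TEXT NOT NULL,   -- collection name
    short_desc TEXT NOT NULL,   -- short description
    summary    TEXT NOT NULL    -- summary
);
"""

with psycopg2.connect(dbname="datcat_demo") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
        cur.execute(
            "INSERT INTO collection (name, short_desc, summary) VALUES (%s, %s, %s)",
            ("Ark IPv4 Routed /24 Topology",
             "Traceroute-based IPv4 topology measurements",
             "Raw topology traces collected by CAIDA's Ark infrastructure."),
        )
# Exiting the outer 'with' block commits the transaction (psycopg2 semantics).
```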

We hope that DatCat evolves into a lightweight mechanism supporting operators and researchers who want to announce the availability of datasets relevant to (cybersecurity) research. It could also assist NSF PIs with the new requirement that every proposal include a data management plan documenting the types of data; data and metadata standards; policies for access and sharing; provisions for re-use, re-distribution, and derivative works; and the location of archives. Finally, we hope the DatCat service will facilitate collaboration among cybersecurity and network researchers and operators around the world.

We now invite you to take a(nother) look at the Internet Measurement Data Catalog (DatCat). Please point your browser at http://imdc.datcat.org/, browse the catalog, run a few searches, crawl the keywords, create an account, and index your favorite dataset. Please send any questions or feedback to info at datcat dot org.

IPv4 and IPv6 AS Core 2013

August 9th, 2013 by Bradley Huffaker

We recently released a visualization at http://www.caida.org/research/topology/as_core_network/ that represents our macroscopic snapshots of the IPv4 and IPv6 Internet topologies captured in 2013. The plots illustrate both the extensive geographical scope and the rich interconnectivity of nodes participating in the global Internet routing system.

IPv4 and IPv6 AS Core Graph, Jan 2013

This AS core visualization addresses one of the goals of CAIDA’s topology mapping project: to develop techniques that illustrate structural relationships and depict critical components of the Internet infrastructure. These IPv4 and IPv6 graphs show the relative growth of the two Internet topologies, in particular the steady continued growth of the IPv6 topology. Although both the IPv4 and IPv6 topologies experienced considerable churn, the net change in the number of ASes was 3,290 (10.7%) in our IPv4 graph and 495 (25.7%) in our IPv6 graph.
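
As a quick back-of-the-envelope check on those figures (assuming the percentages are measured relative to the January 2012 snapshots), the implied sizes of the previous year's graphs can be recovered directly:

```python
# Back-of-the-envelope check: if the net-change percentages are relative to the
# January 2012 graphs, the implied 2012 AS counts follow directly. These
# derived totals are our inference, not published figures.
ipv4_change, ipv4_pct = 3290, 0.107
ipv6_change, ipv6_pct = 495, 0.257

print(round(ipv4_change / ipv4_pct))  # ~30,700 ASes in the 2012 IPv4 graph
print(round(ipv6_change / ipv6_pct))  # ~1,900 ASes in the 2012 IPv6 graph
```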

To improve on previous years’ AS Core visualizations, this year we made two major refinements to our graphing methodology, including how we rank individual ASes. First, we now rank ASes by their transit degree rather than their outdegree. Second, we now infer links across Internet eXchange (IX) point address space, rather than treating the IX itself as a node to which various ISPs attach. Details at http://www.caida.org/research/topology/as_core_network/.
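
The distinction matters because the two metrics count different neighbors. Below is a conceptual sketch (ours, not CAIDA's production code) contrasting the two, computed from AS paths such as those derived from traceroute measurements:

```python
# Conceptual sketch (not CAIDA's production code): outdegree counts an AS's
# unique next-hop neighbors, while transit degree counts the unique neighbors
# seen on either side of the AS when it appears mid-path, i.e., when it
# actually transits traffic.
from collections import defaultdict

def degrees(as_paths):
    out_neighbors = defaultdict(set)      # next-hop ASes only
    transit_neighbors = defaultdict(set)  # neighbors seen while AS is mid-path
    for path in as_paths:
        for i, asn in enumerate(path):
            if i < len(path) - 1:
                out_neighbors[asn].add(path[i + 1])
            if 0 < i < len(path) - 1:     # AS transits traffic at this hop
                transit_neighbors[asn].add(path[i - 1])
                transit_neighbors[asn].add(path[i + 1])
    return ({a: len(s) for a, s in out_neighbors.items()},
            {a: len(s) for a, s in transit_neighbors.items()})

# AS 2 forwards between (1,3) and (4,3): outdegree 1, transit degree 3.
# Stub ASes 1 and 4 originate paths but transit nothing.
outdeg, transitdeg = degrees([[1, 2, 3], [4, 2, 3]])
print(outdeg.get(2), transitdeg.get(2, 0))  # 1 3
print(outdeg.get(1), transitdeg.get(1, 0))  # 1 0
```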

[For details on a more sophisticated methodology for ranking AS interconnectivity, based on inferring AS relationships from BGP data, see http://www.caida.org/data/active/as-relationships/.]

CAIDA’s Annual Report for 2012

July 31st, 2013 by kc

[Executive Summary from our annual report for 2012.]

This annual report covers CAIDA’s activities in 2012, summarizing highlights from our research, infrastructure, data-sharing and outreach activities. Our research projects span Internet topology, routing, traffic, economics, future Internet architectures, and policy. Our infrastructure activities continue to support measurement-based studies of the Internet’s core infrastructure, with focus on the health and integrity of the global Internet’s topology, routing, addressing, and naming systems. In 2012 we increased our participation in future Internet research in two dimensions: measuring and modeling IPv6 deployment; and an expanded role (in management) of the Named Data Networking project, one of the NSF-funded future Internet architecture projects headed into its third year. We also began a project to study large-scale Internet outages via correlation of a variety of disparate sources of data.


network mapping and measurement conference

May 28th, 2013 by kc

I had the honor of presenting an overview of CAIDA’s recent research activities at the Network Mapping and Measurement Conference, hosted by Sean Warnick and Daniel Zappala. Talk topics included: social learning behavior in complex networks; re-routing based on expected network outages along current paths; and Twitter data mining to analyze suicide risk factors and political sentiment (three different talks). James Allen Evans gave a sociology of science talk, an interview form of which appears to be archived by the Oxford Internet Institute. The organizers even arranged a talk from a local startup, NUVI, which is doing some fascinating real-time visualization and analytics of social network data (including Twitter, Facebook, Reddit, and YouTube).

The workshop was held at Sundance, Utah, one of the most beautiful places I have ever been for a workshop. This workshop series was originally DoD-sponsored, with many government attendees interested in Internet infrastructure protection, but the sequester and travel freezes this year yielded only two USG attendees, and budget constraints may keep the workshop from happening again next year. I hope not; it was a truly unique environment and exposed me to a range of work I would not otherwise have discovered anytime soon. Kudos to the organizers and sponsors.

Carna botnet scans confirmed

May 13th, 2013 by Alistair King and Alberto Dainotti

On March 17, 2013, the authors of an anonymous email to the “Full Disclosure” mailing list announced that they had conducted a full probing of the entire IPv4 Internet during the previous year. They claimed to have used a botnet (the “Carna” botnet) created by infecting machines vulnerable due to their use of default login/password pairs (e.g., admin/admin). The botnet instructed each of these machines to execute a portion of the scan and then transfer the results to a central server. The authors also published a detailed description of how they operated, along with 9TB of raw logs of the scanning activity.

Online magazines and newspapers reported the news, which triggered some debate in the research community about the ethical implications of using such data for research purposes. A more fundamental question received less attention: since the authors went out of their way to remain anonymous, and the only data available about this event is the data they provide, how do we know this scan actually happened? If it did, how do we know that the resulting data is correct?
