Archive for the 'Data Collection' Category

DatCat and DITL (day-in-the-life) data used in classroom curriculum — anonymization revisited

Friday, January 23rd, 2009 by kc

I was delighted to see Sid Faber and Tim Shimeall co-teaching a “Network situational awareness” course at Carnegie Mellon University last semester using DatCat and DITL data; they even put the class projects online. Not only did some of the students use DITL data (contributed by Japanese academics) and Internet2’s netflow data, but they also used DatCat to find both data sets. To quote Sid:

“About three weeks into the class, we finally got across one of the key features to the students: we were looking at how things really work on the internet, not just a theoretical discussion of RFCs. The data sets were invaluable, but we had challenges dealing with anonymization, sampling, and the overall volume of the data sets — kind of understandable for the first offering of the course.”


proposition: International Bureau of Internet Statistics

Friday, January 9th, 2009 by kc

Last month I submitted two proposals to the National Cyber Leap Year call for input from the U.S. Networking and Information Technology Research and Development (NITRD) Program: the International Bureau of Internet Statistics, and Cooperative Measurement and Modeling of Open Networked Systems (COMMONS, a two-year-old idea). The Bureau of Internet Statistics still strikes some as batty, but over the holidays I caught up on some panicky OECD papers on the state of the malware landscape, which lament how uninformed we are and how little data we have. Meanwhile, the only concrete recommendation in the ITU’s “study on the financial aspects of network security: malware and spam” report was:

Although the financial aspects of malware and spam are increasingly documented, serious gaps and inconsistencies exist in the available information. This sketchy information base also complicates finding meaningful and effective responses. For this reason, more systematic efforts to gather more reliable information would be highly desirable.


the inevitable conflict between data privacy and science

Sunday, January 4th, 2009 by kc

Balancing individual privacy against other needs, such as national security, critical infrastructure protection, or even science, has long been a challenge for law enforcement, policymakers, and scientists. It’s good news when regulations prevent unauthorized people from examining the contents of your communications, but current privacy laws often make it hard, sometimes impossible, to provide academic researchers with the data needed to study the Internet scientifically. Our critical dependence on the Internet has rapidly outgrown our comprehension of its underlying structure, performance limits, dynamics, and evolution, and current privacy law is unfortunately part of the problem: legal constraints intended to protect the privacy of individual communications also leave researchers and policymakers trying to analyze the global Internet ecosystem essentially in the dark. To make matters worse, the few data points we do have paint a dire picture, casting doubt on the Internet’s ability to sustain its role as the world’s preferred communications substrate. Meanwhile, Internet science struggles to make progress with far less empirical data available than most fields of scientific inquiry.


Internet2 launching its own “IRB”

Friday, October 10th, 2008 by kc

I (and others) have spent a bit of time over the last year encouraging Internet2 to take a more proactive role in supporting network research, so I was delighted to see the proposal of a new network research review council, which I reckon will amount to a network-research-dedicated IRB for Internet2. For most researchers, Internet2 is the closest they will get to a real large-scale network operator. Internet2 operators are more willing to expose the pain points and obstacles they encounter, and Internet2 provides more data about itself to the public, than any other network I know, public or private. Even better, Internet2 management is better positioned than the private sector to foster effective, cross-disciplinary, scientific Internet research, simply by virtue of its incentive structure.


apostle of a new faith “whose miracles can be seen in front of people”

Sunday, August 24th, 2008 by kc

In April 2007 I was invited to David Isenberg’s Freedom to Connect (F2C) conference to participate on a panel about Yochai Benkler’s new book, The Wealth of Networks (amazon, pdf chapters). In The Wealth of Networks, Yochai first observes that two phenomena, communication and computation, are becoming affordable and ubiquitous at the same time that each is becoming fundamental as both an input to and an output of our economic systems. He then provides empirical evidence [wikipedia] that this ubiquitous availability of information technology (communication and computational resources, or in math speak, links and nodes) among actors enables forms of collaboration so enormously effective that they offer an alternative to traditional models of production, i.e., market-based or government-backed systems.


top ten things lawyers should know about the Internet: #7

Wednesday, April 23rd, 2008 by kc

[Jump to a Top Ten item: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10]

#7: The traditional way of getting data from public infrastructures to inform policymaking, namely regulating its collection, is a quixotic path, since government regulatory agencies have as much reason as providers to be reluctant to disclose how the Internet is engineered, used, and financed.


top ten things lawyers should know about the Internet: #5

Sunday, April 20th, 2008 by kc


#5: Thus the research community is in the absurd situation of not being able to do the most basic network research even on the networks established explicitly to support academic network research.


top ten things lawyers should know about the Internet: #4

Saturday, April 19th, 2008 by kc


#4: The data dearth is not a new problem in the field; many public and private sector efforts have tried and failed to solve it.


top ten things lawyers should know about the Internet: #2

Thursday, April 17th, 2008 by kc


#2: Our scientific knowledge about the Internet is weak, and the obstacles to progress are primarily issues of economics, ownership, and trust (EOT), rather than technical.


top ten things lawyers should know about the Internet: #1

Wednesday, April 16th, 2008 by kc

[Originally written as a series of blog entries, this document was later converted to a booklet/pamphlet; see “Top Ten Things Lawyers Should Know About the Internet”.]

Last year Kevin Werbach invited me to his Supernova 2007 conference to give a 15-minute vignette on the challenge of getting empirical data to inform telecom policy. They posted the video of my talk last year, and my favorite tech podcast, ITConversations, posted the mp3 as an episode last week. I clearly needed more than 15 minutes.

In response to my “impassioned plea”, I was invited to attend a meeting in March 2008 hosted by Google and Stanford Law School, Legal Futures, a “conversation between some of the world’s leading thinkers about the future of privacy, intellectual property, competition, innovation, globalization, and other areas of the law undergoing rapid change due to technological advancement.” There I had five minutes to convey the most important data points I knew about the Internet to lawyers thinking about how to update legal frameworks to best accommodate information technologies in the 21st century. Google will be posting the talks from this meeting too, but since I probably left even more out at that meeting, I will post my top ten list of the most important things we need lawyers to understand about the Internet, one per day for the next ten days.

#1: Updating legal frameworks to accommodate technological advancement requires first updating other legal frameworks to accommodate empirically grounded research into what we have built, how it is used, and what it costs to sustain.
