Archive for April, 2008

top ten things lawyers should know about Internet research: #7

Wednesday, April 23rd, 2008

[Jump to a Top Ten item: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10]

#7: The traditional mode of getting data from public infrastructures to inform policymaking — regulating its collection — is a quixotic path, since the government regulatory agencies have as much reason to be reluctant as providers regarding disclosure of how the Internet is engineered, used, and financed.

For every other critical infrastructure in society we have devoted a government agency to its stewardship. The Internet was designed for a cooperative rather than competitive policy architecture, so its designers did not consider regulatory aspects. But because it is a communications infrastructure serving the public, most regulatory aspects of the Internet fall under the jurisdiction of the agency that regulates the tubes it typically runs atop: in the United States that means the FCC. Unfortunately, the FCC is not completely up to speed on the Internet, and does not even approve of its own way of measuring broadband penetration. The FCC has neither an empirical basis nor apparent authority in a conversation about traffic, structure, pricing, or vulnerabilities on the network, since it has no access to data from Internet infrastructure beyond what providers volunteer. And yet little data is needed to reveal that the Internet’s underlying network architecture, implementation, and usage are fundamentally inconsistent with almost every aspect of our current communications and media policy architecture. The Internet casts deep doubt on current legal frameworks for copyright, wiretapping, and privacy, and is transforming or destroying dozens of industries that hold great economic and political power today.

The national security components of Internet regulation, from wiretapping to disaster recovery to unstable leadership lamenting its budgetary and policy handicaps, inspire more concern than hope. That over 1% of observed web pages are modified in flight without our knowledge is no source of comfort either.

Hence it should be no surprise if solutions to measurement, like other persistent problems of the Internet, require engaging deeply with economics, ownership and trust issues. Alas, Internet economics research is one of the few fields worse off than Internet traffic or topology research with regard to the ability to validate any models or assumptions. (If you think tcpdump and traceroute are replete with measurement error, you should try analyzing the economics of network infrastructure companies. And if you think packet header and internal topology data is hard to get, you should try to get financial numbers from the same companies broken out by service offered so you could see how the economics are actually evolving.)

Unfortunately (again), understanding the economics of the system is not where spare private or public sector capital is going. In the 1990s the telecoms spent their capital suing each other and the government over laws so vaguely written as to defy consistent interpretation, much less measurable enforcement, across any two constituencies in the ecosystem. This decade we are spending our capital suing the telecoms for not suing the government after 9/11 when the government asked them to break laws that are just as outdated as the copyright laws. Thomas Jefferson would no doubt recommend rewriting all of it from scratch. Unfortunately the timing is bleak: these developments are occurring at a time when sustaining Internet growth (which, no, we still do not have good ways to measure) will require extraordinary investment of capital, as well as realignment of incentives to promote cooperation among competitive players. Where will that capital and that incentive to cooperate come from?

top ten things lawyers should know about Internet research: #6

Monday, April 21st, 2008

#6: While the looming problems of the Internet indicate the need for a closer objective look, a growing number of segments of society have access to, and use, private network information on individuals, gathered through network measurement, for purposes we might not approve of if we knew how the data was being used.

To the extent that we are investing public or private sector dollars in trying to measure the Internet, they are not in pursuit of answers to questions related to the overall network infrastructure’s health, system efficiency or end-to-end performance, or any of the questions that engineers would recommend knowing about a communications system. The measurements happening today are either for national security or business purposes, which both have an incentive to maximize the amount of personal information they extract from the data. No one is investing in technology to learn about networks while minimizing the amount of privacy compromised in the process.

This inherent information asymmetry of the industry is at the root of our inability to verify claims regarding either security or bandwidth crises justifying controversial business practices that threaten an admittedly fuzzy, but increasingly popular concept of Internet access rights. Although the little data that researchers can scrape together, most of it from outside the U.S., do not support the “p2p is causing a bandwidth problem” claim, the press releases we see as a popular substitute for real data in the U.S. do support the claim that the current Internet transit business model is broken.

Whether the growth in traffic is due to http transport of user-generated video, or radically distributed peer-to-peer file sharing (also often video), there is strong evidence from network providers themselves that the majority of bytes on the network are people moving files from machine to machine, often the same files moving from a few sources to many users. Unfortunately, this evidence implies that the current network and policy architectures are astonishingly inefficient, and that clean slate Internet researchers should be thinking about how to create truly scalable interdomain routing and policy architectures that are content-centric, leverage our best understanding of the structure of complex networks, and still manage to respect privacy. No easy trick, especially with no viable deployment path for such a new architecture, at least in the U.S. where we have jettisoned the policy framework that allowed innovations like the Internet.

It should be no surprise if the status quo is unsustainable, since we are using the network quite differently from how it was intended. But if a new network architecture is needed, that’s a discussion that needs to include some validated empirical analysis of what we have already built. So long as the network infrastructure companies have such strong counterincentives to sharing data, we will continue having to make trillion-dollar communication and technology policy decisions in the dark.

top ten things lawyers should know about Internet research: #5

Sunday, April 20th, 2008

#5: Thus the research community is in the absurd situation of not being able to do the most basic network research even on the networks established explicitly to support academic network research.

This inability to do research on our own research networks leads to unresolvable contradictions in our field of “science”, including on the most politically relevant network research questions of the decade: what are the costs and benefits of using QoS to support multiple service classes, to users as well as providers, and how should these service classes be determined? Two research papers on this same topic contradict each other: Why Premium IP Service Has Not Deployed (and Probably Never Will) from Internet2 (the U.S. research and education backbone) and The Evolving Internet - Traffic, Engineering, and Roles from AT&T. Neither paper offers actual network data, although the Internet2 paper claims to be based on data from the Internet2 backbone. The AT&T paper uses unsubstantiated numbers from unvalidated sources on the web and a model and simulation construction with parameters arranged to prove the need for the kind of traffic management behavior that AT&T lobbyists are trying to justify to regulators and their customers.

As with many other questions about network architecture, behavior, and usage, there are valid (i.e., empirically validated) inferences to make regarding QoS versus the alternatives, which could immediately inform telecom and media policy, but researchers are not in a position to make them.

top ten things lawyers should know about Internet research: #4

Saturday, April 19th, 2008

#4: The data dearth is not a new problem in the field; many public and private sector efforts have tried and failed to solve it.

  1. Information Sharing and Analysis Centers, such as those that exist for the financial services industry, have been attempted several times, but there is no research activity or channel to share data with the research community, nor any independent analysis of the performance or progress of such a group.
  2. The National Science Foundation has spent at least $1M on CAIDA’s Internet measurement data catalog to support sharing of Internet measurements, but as a science and engineering funding agency, NSF could only fund the technical aspects of the data sharing activity: developing a database to support curation, indexing, and annotation of Internet data collected by researchers and providers. Since the real obstacles have to do with economic, ownership (legal), and trust (privacy) constraints rather than technology issues, this catalog has been less utilized than we hoped.
  3. Recognizing that the data sharing problem constitutes a threat to national security, the U.S. Department of Homeland Security (specifically, HSARPA) has spent four years developing a project, PREDICT, to facilitate protected sharing of realistic network data that will enable cybersecurity researchers to validate the network security research and technologies they develop. Unfortunately, after four years the PREDICT project has not yet launched, and when it does it will not be able to include data on networks that serve the public, since the legal territory is too muddy for DHS lawyers to navigate while EFF lawsuits have everyone in the U.S. government skittish about acknowledging surveillance of any kind. Even the private networks that PREDICT can serve immediately, such as Internet2 (the research backbone in the U.S. serving a few hundred educational, commercial, government, and international partners), have lamented that the PREDICT framework does not solve their two biggest problems: sketchy legal territory, and fear of RIAA subpoenas and/or lawsuits. Meanwhile, other accounts (from non-objective parties, with no data sources) claim that the vast majority of traffic on the Internet is illegal by current laws, and ISPs should be held accountable for preventing this traffic. Given the exposure to copyright lawsuits for file-sharing (ironically, what the Internet was originally designed to do), the counterincentives to sharing data on operational networks grow stronger by the day.

top ten things lawyers should know about Internet research: #3

Friday, April 18th, 2008

#3: Despite the methodological limitations of Internet science today, the few data points available suggest a dire picture:

  1. We’re running out of IPv4 addresses that can be allocated (there are many allocated addresses that are not in observed use, but there is no policy support (yet) for reclamation or reuse), and the purported technology solution (IPv6) requires investment that most ISPs are not prepared to make. Regardless of whether Internet growth is supported by IPv6 or a concerted effort to scrape more lifetime out of the current IPv4 protocol, it will induce growth of core Internet routing tables relying on a routing system that is increasingly inappropriate for the Internet’s evolving structure. So while it’s fair to say that we need a new routing system, no institution or agency has responsibility for developing one, much less for the global economic and political challenge of deploying it.
  2. Pervasively distributed end-to-end peering to exchange information is threatening not only the integrity of the routing system, but also the business models of the ISPs. It bears noting, though, that the business models for moving Internet traffic around have long been suspect: the network infrastructure companies that survived the bubble did so by spending the last fifteen years manipulating the network architecture and the regulatory architecture away from the Internet architecture (smart endpoints) toward something they can control (smart network) in order to more effectively monetize their assets. Since the Internet architecture was originally designed to be a government-sponsored file-sharing network with no support for usage-based (or any) billing, its failure as a platform for a purely competitive telecommunication industry is unsurprising. But we are going to be so surprised.
  3. There are demonstrated vulnerabilities in the most fundamental layers of the infrastructure (naming and routing) for which technological solutions have been developed but have failed to gain traction under the political and economic constraints of real-world deployment. In the meantime, over 98% of traffic sent to root domain name servers is pollution.
  4. The common lawyerly assumption that “the Internet security situation must not be so bad because the network is still pretty much working” discounts the fact that criminals using the Internet need it to work just as well as the rest of us. Although we admit we don’t know how to measure the exact size of botnets, what we know for sure is that millions of compromised (Windows) systems are taking advantage of network and host software vulnerabilities to support unknown (though underground estimates abound) billions of dollars per year of criminal activities (or activities that would be criminal if lawmakers understood enough to legislate against them), with no incentive framework to support their recovery. Although ICANN is trying to set policies to counter some of the malfeasance that arguably falls under its purview (domain names and IP addresses), ICANN lacks the architecture and legitimacy it needs to enforce any regulations, and continues to struggle more than succeed at its own mission.

We don’t have a lot of data about the Internet, but what little we have is unequivocally cause for concern.

top ten things lawyers should know about Internet research: #2

Thursday, April 17th, 2008

#2: Our scientific knowledge about the Internet is weak, and the obstacles to progress are primarily issues of economics, ownership, and trust (EOT), rather than technical.

Economically, network research is perpetually behind network evolution — basic instrumentation can increase in cost 10X with one network upgrade, while network research budgets are lucky to stay even. But the ownership and trust obstacles are even greater: policy support for scientific Internet research has deteriorated along several dimensions since the National Science Foundation left the scene in 1995, and further when DARPA pulled out of funding academic networking research after 9/11. Some data points exposing the state of “Internet science”:

  1. Two decades of Internet research have failed to produce generally usable tools for bandwidth estimation, traffic modeling, usage characterization, traffic matrix estimation, topology mapping, or realistic Internet simulation, with progress primarily blocked on the ability to test them out in realistic network and traffic scenarios. A few researchers who do manage to get data via relationships of mutual trust (including CAIDA) are not allowed to share data with other researchers, inhibiting reproducibility of any result. Compared to established fields of science, it is hard to defend what happens in the field of Internet research as science at all.
  2. U.S. (and other) government agencies continue to spend hundreds of millions of dollars per year on network research — with cybersecurity research being the most fashionable this decade — funding researchers who almost never have any data from realistic operational networks. An illustrative example: the National Science Foundation’s program for Internet security research spends ~$35M/year on dozens of research projects, none of which have data from operational Internet infrastructure.
  3. Not only is traffic data off limits, but sharing data on the structure of the network is forbidden too — commercial ISPs are typically not even allowed to disclose the existence of peering agreements, much less their terms. So when developing tools for accurate Internet mapping, researchers cannot validate the connectivity inferences they make, since the information is typically intended to be secret.
  4. The OECD published a 53-page report: Measuring security and trust in the online environment: a view using official data. As you may have guessed by now, the report about ‘measuring security’ is based on no measurements from any networks, only survey data reflecting user perceptions of their own security, which other studies have shown to be uncorrelated with reality. Another caveat: most security-related studies are published or funded by companies trying to sell more security software, so their objectivity is also in dispute. Again, EOT factors render truth elusive.

top ten things lawyers should know about Internet research: #1

Wednesday, April 16th, 2008

last year Kevin Werbach invited me to his Supernova 2007 conference to give a 15-minute vignette on the challenge of getting empirical data to inform telecom policy. They posted the video of my talk last year, and my favorite tech podcast, ITConversations, posted the mp3 as an episode last week. i clearly needed more than 15 minutes.

in response to my “impassioned plea”, i was invited to attend Legal Futures, a meeting in March 2008 hosted by Google and Stanford Law School and billed as a “conversation between some of the world’s leading thinkers about the future of privacy, intellectual property, competition, innovation, globalization, and other areas of the law undergoing rapid change due to technological advancement.” there i had 5 minutes to convey the most important data points I knew about the Internet to lawyers thinking about how to update legal frameworks to best accommodate information technologies in the 21st century. Google will be posting the talks from this meeting too, but since I probably left even more out at that meeting, I will post my top ten list of the most important things we need lawyers to understand about the Internet, one per day for the next ten days.

#1: updating legal frameworks to accommodate technological advancement requires first updating other legal frameworks to accommodate empirically grounded research into what we have built, how it is used, and what it costs to sustain.

there is increasing recognition that various legal frameworks (from copyright to privacy to wiretapping to common carriage) need updating in light of technological developments of the last few decades. unfortunately, the light is too dim to really understand Internet behavior, usage patterns, architectural limitations, and economic constraints, because current legal frameworks for network provisioning also prevent sharing of data with researchers to scientifically investigate any of these questions. even for data that is legal to share, there are overwhelming counterincentives to sharing any data at all in the competitive environment we have chosen — although not achieved — for the network provisioning industry.

so while i support updating legal frameworks to be congruent with reality, i think we need to first confront that we have no basis for claiming what reality is yet.

no aphorism is more frequently repeated…than that we must ask Nature few questions, or ideally, one question at a time. The writer is convinced that this view is wholly mistaken. Nature, he suggests, will best respond to a logically and carefully thought out questionnaire; indeed if we ask her a single question, she will often refuse to answer until some other topic has been discussed.
Sir Ronald A. Fisher, Perspectives in Medicine and Biology, 1973.

-k.