what we can’t measure on the Internet

August 26th, 2007 by kc

As the era of the NSFnet Backbone Service came to a close in April 1995, the research community, and the U.S. public, lost the only set of publicly available statistics for a large national U.S. backbone. The transition to the commercial sector essentially eliminated the public availability of statistics and analyses that would allow scientific understanding of the Internet at a macroscopic level.

In 2004 I compiled an (incomplete) list of what we generally can’t measure on the Internet, from a talk I gave on our NSF-funded project correlating heterogeneous measurement data to achieve system-level analysis of Internet traffic trends:

  1. for the most part we really have no idea what’s on the network
  2. can’t figure out where an IP address is
  3. can’t measure topology effectively in either direction, at any layer
  4. can’t track the propagation of a routing update across the Internet
  5. can’t get a router to send you all available routes, just best routes
    (prevents realistic simulation of what-if scenarios)
  6. can’t get precise one-way delay between two places on the Internet (see the sketch after this list)
  7. can’t get an hour of packets from any backbone
  8. can’t get accurate flow counts from any backbone
  9. can’t get anything at all from the backbones [we used to have anonymized traces]
  10. can’t get topology information from providers
  11. can’t get accurate bandwidth or capacity info. not even along a path, much less per link
  12. can’t trust whois registry data
  13. no general tool for “what’s causing my problem now?”
  14. privacy/legal issues deter research (& it was hard in an enlightened monarchy)
  15. privacy/legal issues deter measurement

    kc, 2004 NSF SCI PI meeting
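
To make item 6 concrete, here is a minimal sketch (in Python; the port number, UDP framing, and function names are my own illustrative choices, not part of the original talk) of why precise one-way delay is so hard to obtain: the receiver can only subtract its own clock from the sender’s timestamp, so the measurement silently absorbs whatever offset exists between the two clocks.

    # Illustrative sketch only: why one-way delay resists precise measurement.
    import socket
    import struct
    import time

    PORT = 9999  # arbitrary example port

    def send_probe(receiver_host):
        """Send one UDP probe carrying the sender's wall-clock timestamp."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.sendto(struct.pack("!d", time.time()), (receiver_host, PORT))
        sock.close()

    def receive_probe():
        """Receive one probe and return the *apparent* one-way delay in seconds."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", PORT))
        payload, _addr = sock.recvfrom(64)
        (sent_ts,) = struct.unpack("!d", payload)
        apparent_delay = time.time() - sent_ts
        sock.close()
        # apparent_delay = true one-way delay + (receiver clock - sender clock).
        # Unless both hosts are disciplined to a common reference (e.g. GPS),
        # that unknown clock offset can dwarf the delay itself; RTT/2 is the
        # usual fallback, and it is wrong whenever the path is asymmetric.
        return apparent_delay

Even in this toy version, the quantity of interest never appears by itself on either host; it is always entangled with the clock offset between sender and receiver, which is the crux of item 6.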

Some caveats are in order:

  1. Although some of these phenomena can be partially or imprecisely measured under certain instrumented circumstances, or within a single company, the resulting data is not generally available for research use.
  2. There are a few small efforts underway that attempt to share existing data, e.g., PREDICT, Datapository, Datcat, and the Media Research Hub, but they all rely on voluntary data submissions and scant operational budgets, which limits their use and impact.
  3. After 9/11, national security concerns led to an increase in measurement and access capability for law enforcement officials at both tax and consumer expense, but none of this measurement has (yet) been made available (even in anonymized form) for research use.
  4. After the telecom crash, ISPs also started to deploy more measurement capability, motivated by security concerns and perhaps even more by the need to better understand and manipulate their own traffic profiles to increase the return on their infrastructure investments.
  5. The academic network research community has a few (but loud) examples of egregiously poor judgment, e.g., deanonymizing anonymized traces without consulting those who provided them, violating the trust model of those who shared data, and giving providers even more reason to keep data taps closed.

So I don’t mean to imply that Internet measurement is not occurring; on the contrary, it has become clear that a growing number of segments of society have access to — and use — sensitive private network information on individuals for purposes we might not approve of if we knew how the data was being used. But the scientific research community, as well as the public, remains severely under-informed regarding any macroscopic characteristics of the Internet. And although the Internet seems to survive quite well without macroscopic measurement, I also note a few reasons to worry.

  1. The growing gap between operations and scientific research, and the continuing opacity of the sector to consumers, auditors, regulators, and the public, illustrate Stiglitz’s information asymmetry: the telecom bubble, crashes, restatements, and indictments of this decade are just the beginning of this systemic weakness unless the imbalance is corrected.
  2. Legislators, regulators, and politicians are engaged in deep public policy debate regarding our communications fabric, a conversation rooted in empirical questions that we cannot answer well with the current state of data availability.
  3. While the core of the Internet continues its relentless evolution, scientific measurement and modeling of its systemic characteristics have largely stalled. What little measurement is occurring reveals some disturbing realities about the ability of the Internet’s architecture to serve society’s needs and expectations.

It is eye-opening to note that even throughout the several decades of U.S. government stewardship of the early Internet, the only statistics collected regularly were those required by government contract. Since the privatization of the Internet in 1994-5, the United States has embraced a policy (and others have followed) that has sacrificed this data access in exchange for other public policy goals, such as Internet market expansion unfettered by the kind of regulatory reporting requirements applied to telephone companies. In fact, one can attribute much of the recent industry angst to the growth success of the 90s that rendered data transport so affordable.

But Internet growth in this country has started to slow according to OECD rankings, and in particular the differentiating parameter between the U.S. and those countries ahead of us in the rankings (Denmark, Netherlands, Iceland, Korea, Switzerland, Norway, Finland, Sweden, Canada, Belgium, UK, Luxembourg, France, and Japan) has been government policy, specifically regulations governing cooperative shared use of critical communication facilities.

So now, in addition to the data/science crisis inside the ivory tower, we have a set of public policy crises out in the real world: how do we most cost-effectively improve — and measure — high-speed access to the Internet for Americans? Incumbent duopolists promise that their proprietary QoS innovations will help, but they want to charge a heavy price: not sharing infrastructure facilities. That is, the proposed solution of the incumbent telcos and cablecos is to take the United States in the opposite policy direction from every nation with greater broadband penetration than we have, in order to achieve greater broadband penetration in the U.S. And they want us to accept this strategy with no empirical data from their networks upon which to base a discussion. This level of discourse makes the prospect of regulation seem less surprising, even less disconcerting, to those seeking a healthy competitive network environment.

Of course, the first question that comes up in the discussion of broadband penetration and growth is: what and how do we measure this? And it turns out that no one is happy with how the U.S. FCC measures broadband — not even the FCC. My goodness, what a long road we have ahead of us.

k.
