Archive for the 'Commentaries' Category

top ten things lawyers should know about Internet research: #3

Friday, April 18th, 2008

[Jump to a Top Ten item: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10]

#3: Despite the methodological limitations of Internet science today, the few data points available suggest a dire picture:

  1. We’re running out of IPv4 addresses that can be allocated (there are many allocated addresses that are not in observed use, but there is no policy support (yet) for reclamation or reuse), and the purported technology solution (IPv6) requires investment that most ISPs are not prepared to make. Regardless of whether Internet growth is supported by IPv6 or a concerted effort to scrape more lifetime out of the current IPv4 protocol, it will induce growth of core Internet routing tables relying on a routing system that is increasingly inappropriate for the Internet’s evolving structure. So while it’s fair to say that we need a new routing system, no institution or agency has responsibility for developing one, much less for the global economic and political challenge of deploying it.
  2. Pervasively distributed end-to-end peering to exchange information is threatening not only the integrity of the routing system, but also the business models of the ISPs. It bears noting, though, that the business models for moving Internet traffic around have long been suspect, since the network infrastructure companies that have survived the bubble have done so by spending the last fifteen years manipulating the network architecture and the regulatory architecture away from the Internet architecture (smart endpoints) toward something they can control (smart network) in order to more effectively monetize their assets. Since the Internet architecture was originally designed to be a government-sponsored file-sharing network with no support for usage-based (or any) billing, its failure as a platform for a purely competitive telecommunication industry is unsurprising. But we are going to be so surprised..
  3. There are demonstrated vulnerabilities in the most fundamental layers of the infrastructure (naming and routing) for which technological solutions have been developed but have failed to gain traction under the political and economic constraints of real-world deployment. In the meantime, over 98% of traffic sent to root domain name servers is pollution.
  4. The common lawyerly assumption that “the Internet security situation must not be so bad because the network is still pretty much working” discounts the fact that criminals using the Internet need it to work just as well as the rest of us. Although we admit we don’t know how to measure the exact size of botnets, what we know for sure is that millions of compromised (Windows) systems are taking advantage of network and host software vulnerabilities to support unknown (but underground estimates are many) billions of dollars per year of criminal activities (or activities that would be criminal if lawmakers understood enough to legislate against them) with no incentive framework to support their recovery. Although ICANN is trying to set policies to counter some of the malfeasance that arguably falls under its purview (domain names and IP addresses), ICANN lacks the architecture and legitimacy it needs to enforce any regulations, and continues to struggle more than succeed at its own mission.

We don’t have a lot of data about the Internet, but what little we have is unequivocally cause for concern..
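The routing-table concern in item 1 comes down to aggregation: contiguous address blocks can be announced as a single route, while scattered blocks each need their own entry. A minimal sketch of that effect, using Python's standard `ipaddress` module; the prefixes and the helper name `aggregate` are made up for illustration and do not come from any real routing table:

```python
import ipaddress

def aggregate(prefixes):
    """Collapse a list of CIDR prefixes into the fewest covering routes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Four contiguous /24 blocks (hypothetical) fit under one /22 announcement.
contiguous = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
# Three scattered /24 blocks (hypothetical) cannot be merged at all.
scattered = ["10.0.0.0/24", "10.0.2.0/24", "10.0.5.0/24"]

print(aggregate(contiguous))  # ['10.0.0.0/22']
print(aggregate(scattered))   # ['10.0.0.0/24', '10.0.2.0/24', '10.0.5.0/24']
```

The more that allocation and topology drift away from contiguous blocks, the closer every table gets to the second case.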

top ten things lawyers should know about Internet research: #2

Thursday, April 17th, 2008

#2: Our scientific knowledge about the Internet is weak, and the obstacles to progress are primarily issues of economics, ownership, and trust (EOT), rather than technical.

Economically, network research is perpetually behind network evolution — basic instrumentation can increase in cost 10X with one network upgrade, while network research budgets are lucky to stay even. But the ownership and trust obstacles are even greater: policy support for scientific Internet research has deteriorated along several dimensions since the National Science Foundation left the scene in 1995, and further when DARPA pulled out of funding academic networking research after 9/11. Some data points exposing the state of “Internet science”:

  1. Two decades of Internet research have failed to produce generally usable tools for bandwidth estimation, traffic modeling, usage characterization, traffic matrix estimation, topology mapping, or realistic Internet simulation, with progress primarily blocked on the ability to test them out in realistic network and traffic scenarios. A few researchers who do manage to get data via relationships of mutual trust (including CAIDA) are not allowed to share data with other researchers, inhibiting reproducibility of any result. Compared to established fields of science, it is hard to defend what happens in the field of Internet research as science at all.
  2. U.S. (and other) government agencies continue to spend hundreds of millions of dollars per year on network research — with cybersecurity research being the most fashionable this decade — funding researchers who almost never have any data from realistic operational networks. An illustrative example: the National Science Foundation’s program for Internet security research spends ~$35M/year on dozens of research projects, none of which have data from operational Internet infrastructure.
  3. Not only is traffic data off limits, but sharing data on the structure of the network is forbidden too — commercial ISPs are typically not even allowed to disclose the existence of peering agreements, much less their terms. So when developing tools for accurate Internet mapping, researchers cannot validate the connectivity inferences they make, since the information is typically intended to be secret.
  4. OECD published a 53-page report: Measuring security and trust in the online environment: a view using official data. As you may have guessed by now, the report about ‘measuring security’ is based on no measurements from any networks, only survey data reflecting user perceptions of their own security, which other studies have shown to be uncorrelated with reality. Another caveat: most security-related studies are published or funded by companies trying to sell more security software, so their objectivity is also in dispute. Again, EOT factors render truth elusive.

top ten things lawyers should know about Internet research: #1

Wednesday, April 16th, 2008

last year Kevin Werbach invited me to his Supernova 2007 conference to give a 15-minute vignette on the challenge of getting empirical data to inform telecom policy. They posted the video of my talk last year, and my favorite tech podcast, ITConversations, posted the mp3 as an episode last week. i clearly needed more than 15 minutes..

in response to my “impassioned plea”, i was invited to attend a meeting in March 2008 hosted by Google and Stanford Law School (Legal Futures), a “conversation between some of the world’s leading thinkers about the future of privacy, intellectual property, competition, innovation, globalization, and other areas of the law undergoing rapid change due to technological advancement.” there i had 5 minutes to convey the most important data points I knew about the Internet to lawyers thinking about how to update legal frameworks to best accommodate information technologies in the 21st century. Google will be posting the talks from this meeting too, but since I probably left even more out at that meeting, I will post my top ten list of the most important things we need lawyers to understand about the Internet..one per day for the next ten days.

#1: updating legal frameworks to accommodate technological advancement requires first updating other legal frameworks to accommodate empirically grounded research into what we have built, how it is used, and what it costs to sustain.

there is increasing recognition that various legal frameworks (from copyright to privacy to wiretapping to common carriage) need updating in light of technological developments of the last few decades. unfortunately, the light is too dim to really understand Internet behavior, usage patterns, architectural limitations, and economic constraints, because current legal frameworks for network provisioning also prevent sharing of data with researchers to scientifically investigate any of these questions. even for data that is legal to share, there are overwhelming counterincentives to sharing any data at all in the competitive environment we have chosen — although not achieved — for the network provisioning industry.

so while i support updating legal frameworks to be congruent with reality, i think we need to first confront that we have no basis for claiming what reality is yet.

no aphorism is more frequently repeated…than that we must ask Nature few questions, or ideally, one question at a time. The writer is convinced that this view is wholly mistaken. Nature, he suggests, will best respond to a logically and carefully thought out questionnaire; indeed if we ask her a single question, she will often refuse to answer until some other topic has been discussed.
Sir Ronald A. Fisher, Perspectives in Medicine and Biology, 1973.

-k.

measuring broadband penetration

Sunday, March 30th, 2008

the U.S. FCC is trying to improve the way it measures broadband penetration, though the primary mode of measurement is still gathering data from the providers themselves. some meta-data on how the big three (verizon, att, tw) track penetration of their network infrastructures for the last year:

  1. every month verizon sends my mom a bill for her landline service in rural north carolina, containing a glossy flyer: “Get DSL in your area! call now!” every year she calls only to find out that verizon still doesn’t serve her house with broadband. it should not be a big shock that even verizon does not know who verizon serves with broadband, since just one merger ago they were emitting $9B accounting errors (counting doesn’t seem to be one of their strengths), but i don’t think verizon is the nuttiest one on stage here if the fcc is relying on them for broadband penetration numbers. i hope the census bureau is cogitating.
  2. in my area ATT charges you $25/month less for DSL if you have a $5/month ATT landline. how many landline customers are just trying to subsidize their DSL costs, but would rather have $5/month more Internet bandwidth instead?
  3. self-measurement of cabletv penetration is no better: when i tried to cancel my cable tv but just keep my cable modem service, my cable company offered to drop my monthly bill by $25 if i would just keep the tv content streaming to my wall. i asked if i could pay them $25/month for more Internet bandwidth instead of the tv bandwidth. that option is not even on their todo list..

europe is promising a quantum leap ahead of what the US is even attempting to measure:

“(…) by summer in the mid-term review of the i2010 strategy, I will publish a new indicator of broadband take-up in Europe that compares national performance, not only on broadband penetration but also geographic coverage, speed, competition and price.” This is important, since penetration only doesn’t tell the whole story. Compare the OECD Broadband Portal

a strategy review based on empirical data? i wish we had thought of that.

k.

DITL 2008: phase one complete.

Friday, March 28th, 2008

CAIDA, ISC, OARC, and The Measurement Factory managed to repeat our annual Day in the Life of the Internet data collection experiment this year — using a 2-day window of 18-19 March 2008. As with last year’s DITL (DITL2007 announcement, DITL2007 summary), we tried to capture a complete 48-hour interval of traffic to as many DNS root nameservers as could participate, and also invited other data providers to participate on terms compatible with their data sharing policies. if you engage in ongoing measurement of an operational network, and collected data for some or all of 18-19 mar 2008, it’s not too late to contribute data or metadata to DITL2008!

we gathered much more data than last year (2.5X more in bytes) with considerably less pain. So although it reflects only a slice of Internet activity, we believe this is (again) the largest synchronized data set about the Internet ever made available to the research community. the focus was again on DNS data, thanks to an NSF grant supporting measurement and analysis of the DNS and the tremendous cooperation of DNS operators around the world, including the rootops. So far we have DNS data from:

  • Root operators: A, C, E, F, H, old-J, K, & L (with B and M to come)
  • ccTLD operators: at, br, cl, cz, uk, se (& hopefully one more)
  • RIR in-addr operators: APNIC, LACNIC
  • gTLD operators: org
  • AS112 operators: Camel/8086, ISC, NaMEX, NIX.CZ, Qwest, WIDE
  • ORSN operators: Brave

Other types of DITL data we will index: caching resolver DNS logs (Level3); topology data (NetDIMES, CAIDA); UniRoma; BGP data (from CERT); Open DNS resolver survey (TMF); BGP tables (Routeviews, RIPE); anonymized packet headers (universities in Korea).

If we’ve missed anyone and/or you have data to submit, please let us know ASAP at ditl-info (@ this domain). Our DITL coverage map reflects 2 Terabytes of data so far, and continues to update as data comes in.

OARC is supporting the DITL experiment with disk space, networking and sysadmin resources, and handling the acceptable use policy (AUP) process for research use of the DNS root nameserver data. for datasets uploaded to the OARC servers, we will do some cleaning, timestamp correction and other curation such as binning the data in hourly files to make it easier to index and use. CAIDA and TMF will then work with the data providers to index the collected data so that researchers can correlate these heterogeneous datasets through the DatCat internet measurement data catalog. we hope to accomplish as much of the indexing work as we can prior to requesting input and review from the collectors, but interested data providers should point their browsers at DatCat, create an account, and familiarize themselves with the metadata fields. if you are interested in anonymizing the data yourself before contributing, we recommend Crypto-PAn, or ask us if you need help.
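for readers unfamiliar with the idea behind the recommended anonymizer: the tool is prefix-preserving, meaning two addresses that share their first k bits still share exactly their first k bits after anonymization, so subnet structure survives while real addresses do not. the sketch below is a toy illustration of that property only. it is not the real tool: it swaps in HMAC-SHA256 from the python standard library as the keyed function, and the function names and the key are made up for this example.

```python
import hashlib
import hmac
import ipaddress

def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Toy prefix-preserving anonymizer (illustration, not a vetted tool)."""
    original = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        # The i most significant bits of the original address (0 when i == 0).
        prefix = original >> (32 - i)
        # Keyed pseudo-random bit that depends only on those leading bits.
        msg = i.to_bytes(1, "big") + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (original >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))

def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - x.bit_length()

key = b"hypothetical-secret-key"        # assumption: any secret byte string
a, b = "192.0.2.17", "192.0.2.200"      # documentation addresses sharing a /24
print(common_prefix_len(a, b))          # 24
print(common_prefix_len(anonymize_ipv4(a, key), anonymize_ipv4(b, key)))  # 24
```

the same key must be used across a whole dataset, or the anonymized addresses will not line up between files.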

We recently completed a geeky comparison of DNS measurements for DITL2006 and DITL2007 which has some nice graphics as well as compelling conclusions, pasted here:

  1. The anycast deployment of DNS root nameservers appears stable, efficient, and responsive to clients’ needs. Anycast instances cover all continents bringing a better service to the worldwide population of users.
  2. The overall query traffic experienced by the roots continues to grow. The observed 2007 query rate and client rate were 1.5-3X above their observed values in 2006.
  3. The proportion of invalid traffic, i.e., DNS pollution, hitting the roots is still high: over 99% of the queries should not even be sent to the root servers. We found an extremely strong correlation both years: the higher the query rate of a client, the lower the fraction of valid queries.
  4. Repeated, identical and “referral-not-cached” queries constituted 69% of the total load on the roots during the 2007 observations. We are not in a position to evaluate the cost of this pollution to the root operators or to the Internet, nor the cost of cleaning it up. Some sources of this pollution could be mitigated by DNS operators locally serving common zones. For the March 2008 DITL experiment we will further investigate the patterns of these invalid queries and identify other ways to reduce the DNS pollution at the roots.
  5. About 40% of clients observed in 2006 and 2007 support EDNS, an extension mechanism that enables DNS to support larger queries needed for IPv6 and DNSSEC deployment.
  6. ORSN servers are subject to similar traffic composition and anomalies seen at the official DNS roots, in proportion to the reduced workload served.
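to make the "repeated, identical" category in conclusion 4 concrete, here is a minimal sketch of how one might count such queries in a log. it is a simplification and not the method used in the comparison: it ignores time windows, TTLs and the "referral-not-cached" class, and the record format, sample data and function name are assumptions made for illustration.

```python
from collections import Counter

def repeated_query_fraction(queries):
    """Fraction of queries that repeat an identical earlier query.

    queries: iterable of (client_ip, qname, qtype) tuples (assumed format).
    """
    counts = Counter(queries)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every occurrence beyond the first of an identical tuple is a repeat.
    repeated = sum(n - 1 for n in counts.values())
    return repeated / total

# Hypothetical sample: one client asks the same question three times.
sample = [
    ("192.0.2.1", "example.com.", "A"),
    ("192.0.2.1", "example.com.", "A"),
    ("192.0.2.1", "example.com.", "A"),
    ("192.0.2.1", "example.org.", "A"),
    ("192.0.2.2", "example.com.", "A"),
]
print(repeated_query_fraction(sample))  # 0.4
```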

we hope to have a comparison to 2008 data up sooner this year. if you have additional questions you want answered with DITL data, please comment or send mail to ditl-info (@ this domain).

thank you again for your willingness to contribute to this experiment. please don’t hesitate to ask any questions.

regards, caida, isc, tmf

“we should be able to do a much better job at modeling Internet attacks”

Tuesday, March 25th, 2008

one of my favorite program managers was posed the following question by senior management at his defense-related funding agency: “we should be able to do a much better job modeling internet attacks. what research can we fund that would enable us to do a better job at modeling internet attacks?”

because i happened to be reading a recent paper by Aaron Burstein of UC Berkeley, “Toward a Culture of Cybersecurity Research”, i was familiar with this quote:

(5) Accordingly, Federal investment in computer and network security research and development must be significantly increased to -

  (A) improve vulnerability assessment and technological and systems solutions;
  (B) expand and improve the pool of information security professionals, including researchers, in the United States workforce; and
  (C) better coordinate information sharing and collaboration among industry, government, and academic research projects.

http://caselaw.lp.findlaw.com/casecode/uscodes/15/chapters/100/sections/section_7401.html

which almost hits on the two biggest problems with cybersecurity research today: the research community is not allowed to study the network, and they are not allowed to study the software that runs on the majority of the components (hosts and routers) on the network. networks are generally not allowed to share data with each other; these are all considered proprietary systems on which independent research (by those who do not work for the corporation) is illegal.

it would be nice to be able to turn the cybersecurity research agenda into a technology agenda so we can throw technology R&D money at the problem. so i am sympathetic to the question: “what R&D can we fund?”
but ten years of little measurable progress in this area has made it clear that to the extent that we can fund technology to help, it will be technology that improves our ability to do (A), (B), and (C) above. to do “(A) vulnerability assessment”, we need to analyze the software running on the systems that compose the network: that’s a problem with software ownership, i.e., current law (copyright, trade secrets, EULAs). to do “(C) coordinated information sharing”, we need it to be legal as well as incentive-compatible for networked organizations to share data with each other. that’s also a policy rather than technology problem. “(B) expand the security and research workforce” is more obviously a policy problem, but spending tax dollars to incent scholarship will be wasted if the funded researchers are not able to study the real system.

the government can certainly fund technical activities to facilitate useful data sharing: technology needed to collect, analyze, catalog, and correlate datasets to delineate baseline from anomalous internet traffic and routing patterns; tools that empower users to measure their own networks and automatically contribute data to aggregated, anonymized repositories with legal protection; reputation management systems to support scalable information sharing across vast administrative boundaries. but these are all going to be impotent weapons against the growing illicit activity on the network if we don’t give ourselves the advantage the criminal actors have had from the beginning: data sharing (in their case, also selling to each other). so there is reason to believe that we are learning more slowly than they are.

k.

internet infrastructure economics: top ten things i have learned so far

Sunday, October 7th, 2007

[ in sept 2007 i was privileged to attend an invitation-only intensely interactive workshop on the topic of Internet infrastructure economics. participants included economists, network engineers, infrastructure providers, network service providers, regulatory experts, investment analysts, application designers, academic researchers/professors, entrepreneurs/inventors, biologists, oceanographers. almost everyone in more than one category. lots of bloggers. we were all asked to write up a summary of what we learned over the 2.5 days. with permission to anonymize workshop sources of my learnings and post them here. -k. ]


  1. there is dismal progress on the topic of network economics for the same reason there is dismal progress on network science: measurement. lack of empirically grounded provisioning models is also what killed the first round of munis (“whew.”)
  2. the biggest reason we don’t have support for understanding the internet’s ominous structural problems is that we don’t yet have a sustainable business model for internet transport itself, so we’ve no capital to study it much less invest in solutions. in other universes this is pronounced ‘public utility’, but neither the netizens nor the network architecture seem prepared for the implications of the phrase. (the tubes are agnostic. alas, their owners are not.) [see (5) below]
  3. edge vs core economics:
    1. in 1984 att’s ceo decided to break off the edges and keep long distance (the core) because at the time it looked like 1/3 of cost and 2/3 revenue were in long distance. (“oops.”)
    2. in 1994 when the U.S. govt got out of the business of providing a core IP infrastructure [nsfnet], the first thing the academic edge (universities) did was get together and buy a core [internet2.edu]. out in the ‘real world’, it took another decade for the edge to buy the core (sbc << att, verizon << mci), accompanied by an Economist cover story "how the Internet killed the phone business" [still considered premium content by economist.com. the irony pierces.]
    3. by 2004 any edge who could afford it built/bought a core because when you have enough edge, the core starts to look significant. and affordable.
  4. opex has become completely uncorrelated to capex (as percentage of total cost of provisioning infrastructure), which is bad news for sustainable economic modeling. “playing the last century’s game” doesn’t work any better than “fighting the last century’s war”. [unless the game is capturing regulators..]
  5. the way the conversation is framed determines which goals get respect: (1) engineers: protocols/architecture (2) telcos: tubes (3) netizens: conversation/relationships. to effect change, you need to understand the issues that the people in your current conversation care about.
  6. people are afraid of governments getting involved in internet provisioning because ‘the technology is changing so fast’. despite that neither the network nor transport layer [the “internet” layers] have changed in the last 25 years. fact: the last time the internet’s network layer ‘innovated’, it was not only under government control, it was under U.S. military (DOD) control. (how NSF’s geni.net plans to pull it off this century still eludes me.)
  7. even given the demonstrated low correlation between funding levels for schools and student performance, many believe that investing more in our failing educational system is a better use of tax dollars than making sure every child is connected to all the world’s knowledge with an affordable, reliable, secure open source platform they can read, modify, and share. confusing.
  8. i had some sense knocked into my head regarding the danger of using the technical meaning of “hierarchical” when the world predominantly hears the political meaning. the technical reality is that scalability of the current internet routing system does rely on aggregation, currently implemented via hierarchical allocation and controlling announcement of address space into the global routing system. given that the network topology is naturally evolving away from the type of structure that makes the current routing system efficient, it’s fair to say that we need a new routing system. but, i should be more careful w loaded words, since (5) above.
  9. policies are outputs and inputs to co-evolving complex systems.
    1. the “invisible hand” effect of the market is an emergent property that depends on legal infrastructure to support it. e.g., sustainable property rights, contract law, reasonable/non-discriminatory access to infrastructure. [related reading: david brin’s essay on “accountability arenas”]
    2. historically common carriage had nothing to do with monopoly or public utility. (public utility law is a derivative of common carriage law).
  10. in the coming decade we face ominous problems under the hood of the internet architecture (running out of ipv4 addresses; the only purported technology solution will create an even worse problem if it manages to deploy; no serious r&d attention to the issue; demonstrated vulnerabilities in the most fundamental layers of the infrastructure (naming and routing); tens of millions of compromised windows systems taking advantage of these and other vulnerabilities to support unknown billions of dollars per year of criminal or shady activities with no incentive framework to support their recovery; massive amounts of dormant legacy address space with no known ownership and no way to regulate or execute reclamation/reuse; a “government” (icann) that can’t call itself that so it struggles to apply principles of good governance); and it turns out that in the last 5 years the United States, home of the creativity, inspiration and enlightened government forces (across several different agencies) that gave rise to the Internet in the first place, has thoroughly jettisoned 8 centuries of common carriage law that we critically relied on to guide public policy in equitably provisioning this kind of good in society, including jurisprudence and experience in determining ‘unreasonable discrimination’.

    and our justification for this abandonment of eight centuries of common law is that our “government”, and it turns out most of our underinformed population (see (1) above), believes that market forces will create an open network on their own. which is a particularly suspicious prediction given how the Internet got to where it is today:

    in the 1960s the US government funded people like vint cerf and steve crocker to build an open network architected around the ‘end to end principle’, the primary intended use of which was CPU and file sharing among government funded researchers. [yes, the U.S. government fully intended to design, build, and maintain a peer-to-peer file-sharing network!]

    it was not until 1994 when the USG threw the architecture over the fence to the private sector to commercialize it that we saw what market forces would do to this open network. within ten years of this famous policy decision that the rest of the world followed, amidst much irrational exuberance, misled capital markets, and outright fraud clouding reality, but still, within the short span of ten years it became clear that, even if you were completely honest, there was no economically sustainable way to provide open end-to-end IP connectivity in a competitive free market. So, now, ten years later, agog with market forces, we see the open network architecture going away.

    and in response the government is still insisting that we should further deregulate the infrastructure provisioning models so that “market forces will create an open network” [– john kneuer, director of ntia.gov, at supernova 2007]

the power of myths is astounding. it’s as if chips have been implanted in our heads to prevent us from seeing facts right in front of us.

What We Believe about the world Matters to how we pursue political, economic, social, and science goals, so it really is worth investing energy to make sure that we believe things that are, according to the best available data, true. which is why i care so much about measurement.


measurement accuracy is the only fail-safe means of distinguishing what is true from what one imagines, and even of defining what true means.
..this simple idea captures the essence of the physicist’s mind and explains why they are always so obsessed with mathematics and numbers: through precision, one exposes falsehood.
a subtle but inevitable consequence of this attitude is that truth and measurement technology are inextricably linked. — robert b laughlin, a different universe

renewing u.s. telecommunications research

Tuesday, September 18th, 2007

as part of my interest in solving problems of the internet [as related to me by several dozen engineers of operational commercial Internet infrastructure], i pay attention to proposals to improve the conditions of telecommunications research, such as in april 2007 when a UCSD professor testified in front of the U.S. Senate Commerce Subcommittee about the results of a 2006 National Academy of Sciences workshop on Renewing U.S. Telecommunications Research. i looked inside the report for answers to the data sharing problem. i think they’re postponing that for later. instead i found these recommendations:

Our report’s first major recommendation reflected the view that a strong, effective telecommunications R&D program for the United States will require a greater role for government-sponsored and university research, and more funding of long-term research by industry.

To underscore the seriousness with which the study committee viewed the challenge, we made a bold recommendation, that the federal government establish a new research program with the objective of stimulating and coordinating research across industry, academia, and government. This proposed research program, called the Advanced Telecommunications Research Activity (ATRA), was envisioned as a hybrid of activities of the sort historically associated with DARPA (which through the ARPANET program managed a research portfolio, developed a vision, and convened industry and academia to build what would become the internet) and SEMATECH (which brought the semiconductor industry together, initially with some federal support to complement industry dollars, to fund joint research, development, and roadmapping activities).

[bizarre… i thought SEMATECH, and their other example EPRI, were examples of failures of industry-wide research consortiums.]

ATRA’s mission would be to (1) identify, coordinate, and fund telecommunications R&D, (2) foster major architectural advances, and (3) strengthen the U.S. telecommunications research capability. Key suggested steps for implementing ATRA are (1) establishment of mechanisms for carrying out project-based research; (2) establishment of advisory committees with high-level industry participation; (3) exploration of the need for R&D centers; and (4) establishment of a forum for key parties to discuss critical technology development, legal and policy issues.

the report also recommends that NSF, DARPA, and all segments of the U.S. telecommunications industry increase their support for fundamental research, using some entity like ATRA which would be significantly funded by industry as a way “to pool funds and other resources, spread risk, and share beneficial results.” the report mentions a drop in industry research participation but does not really discuss the measurement or data sharing problems. ironic, since the most legitimate justification for the need for an activity such as ATRA would be empirical data to support concerns about the characteristics and evolution of current telecommunication systems, especially the internet which seems to be catching on, but the committee, like scientists in general, does not have access to such data. i’m reminded of Licklider’s 1968 paper, “The Computer as a Communication Device”, which forty years ago acknowledged the cause of and proposed a cost-effective solution to “renewing telecommunications research”:

It is perhaps not surprising that there are incompatibilities between the requirements of computer systems and the services supplied by the common carriers, for most of the common-carrier services were developed in support of voice rather than digital communication. Nevertheless, the incompatibilities are frustrating. It appears that the best and quickest way to overcome them — and to move forward the development of interactive communities of geographically separated people– is to set up an experimental network of multiaccess computers. Computers would concentrate and interleave the concurrent, intermittent messages of many users and their programs so as to utilize wide-band transmission channels continuously and efficiently, with marked reduction in overall cost. [p.30]

..In the half-dozen communities, the computer systems research and development and the development of substantive applications mutually support each other. They are producing large and growing resources of programs, data, and know-how. But we have seen only the beginning. There is much more programming and data collection — and much more learning how to cooperate — to be done before the full potential of the concept can be realized. [p.31]

Plus ça change..

in the meantime, as more people worry about the core internet architecture reaching fundamental limits (running out of IPv4 addresses, routing system scalability limits which IPv6 will trigger, transport protocol performance in the face of vast ranges of link characteristics, insecure transactions to negotiate the routing (BGP) and naming (DNS) information), the question becomes more relevant each day: how do we effect innovation in the Internet core when we’ve chosen a policy framework that not only intentionally drives any profits available for innovation down to zero (go competition go!) but also pits the innovators in a competitive death match with the organizations they would need to cooperate with in order to innovate in the core?

the policy issues are harder than the technology issues, and yet again ignored in this report. so my guess is we will see a disaster that costs billions of dollars (no one’s counting spam, which easily costs billions of dollars a year, not counting the profit made by spammers) before we see serious discussion of an ATRA-like activity. and then it would have to be framed internationally in order to be relevant.

k

what we can’t measure on the Internet

Sunday, August 26th, 2007

As the era of the NSFnet Backbone Service came to a close in April 1995, the research community, and the U.S. public, lost the only set of publicly available statistics for a large national U.S. backbone. The transition to the commercial sector essentially eliminated the public availability of statistics and analyses that would allow scientific understanding of the Internet at a macroscopic level.

In 2004 I compiled an (incomplete) list of what we generally can’t measure on the Internet, from a talk I gave on our NSF-funded project correlating heterogeneous measurement data to achieve system-level analysis of Internet traffic trends:

  1. for the most part we really have no idea what’s on the network
  2. can’t figure out where an IP address is
  3. can’t measure topology effectively in either direction, at any layer
  4. can’t track the propagation of a routing update across the Internet.
  5. can’t get a router to send you all available routes, just best routes
    (prevents realistic simulation of what-if scenarios)
  6. can’t get precise one-way delay from two places on the Internet
  7. can’t get an hour of packets from any backbone
  8. can’t get accurate flow counts from any backbone
  9. can’t get anything at all from the backbones [we used to have anonymized traces]
  10. can’t get topology information from providers
  11. can’t get accurate bandwidth or capacity info. not even along a path, much less per link
  12. can’t trust whois registry data
  13. no general tool for ‘what’s causing my problem now?’
  14. privacy/legal issues deter research (& it was hard in an enlightened monarchy)
  15. privacy/legal issues deter measurement

    kc, 2004 NSF SCI PI meeting

Some caveats are in order:

  1. Although some of these phenomena are possible to partially or imprecisely measure under certain instrumented circumstances, or within a single company, this data is not generally available for research use.
  2. There are a few small efforts underway that attempt to share existing data, e.g., PREDICT, Datapository, DatCat, Media Research Hub, but they all rely on voluntary data submissions and scant operational budgets, which limit their use and impact.
  3. After 9/11, national security concerns led to an increase in measurement and access capability for law enforcement officials at both tax and consumer expense, but none of this measurement has (yet) been made available (even in anonymized form) for research use.
  4. After the telecom crash, ISPs also started to deploy more measurement capability, motivated by security concerns and perhaps even more by the need to better understand and manipulate their own traffic profiles to increase the return on their infrastructure investments.
  5. The academic network research community has (few, but loud) examples of egregiously poor judgment, e.g., deanonymizing anonymized traces without consulting those who gave you the data, violating the trust model of those who shared data, and giving providers even more reason to keep data taps closed.

So I don’t mean to imply that Internet measurement is not occurring; on the contrary, it has become clear that a growing number of segments of society have access to, and use, sensitive private network information on individuals for purposes we might not approve of if we knew how the data was being used. But the scientific research community as well as the public remains severely underinformed regarding any macroscopic characteristics of the Internet. And although the Internet seems to survive quite well without macroscopic measurement, I also note a few reasons to worry.

  1. The growing gap between operations and scientific research, and the continuing opacity of the sector to consumers, auditors, regulators, and the public, illustrate Stiglitz’s information asymmetry: the telecom bubble, crashes, restatements, and indictments of this decade are just the beginning of this systemic weakness unless the imbalance is corrected.
  2. Legislators, regulators, and politicians are engaged in deep public policy debate regarding our communications fabric, a conversation rooted in empirical questions that we cannot answer well with the current state of data availability.
  3. While the core of the Internet continues its relentless evolution, scientific measurement and modeling of its systemic characteristics have largely stalled. What little measurement is occurring reveals some disturbing realities about the ability of the Internet’s architecture to serve society’s needs and expectations.

It is eye-opening to note that even throughout the several decades of U.S. government stewardship of the early Internet, the only statistics collected regularly were those required by government contract. Since the privatization of the Internet in 1994-5, the United States has embraced a policy (and others have followed) that has sacrificed this data access in exchange for other public policy goals, such as Internet market expansion unfettered by the kind of regulatory reporting requirements applied to telephone companies. In fact one can attribute much of the recent industry angst to the growth success of the 90s that rendered data transport so affordable.

But Internet growth in this country has started to slow according to OECD rankings, and in particular the differentiating parameter between the U.S. and those countries ahead of us in the rankings (Denmark, Netherlands, Iceland, Korea, Switzerland, Norway, Finland, Sweden, Canada, Belgium, UK, Luxembourg, France, and Japan) has been government policy, specifically regulations governing cooperative shared use of critical communication facilities.

So now, in addition to the data/science crisis inside the ivory tower, we have a set of public policy crises out in the real world: how to most cost-effectively improve (and measure) high-speed access to the Internet for Americans? Incumbent duopolists promise that their proprietary QoS innovations will help, but they want to charge a heavy price: not sharing infrastructure facilities. That is, the proposed solution of the incumbent telcos and cablecos is to take the United States in the opposite policy direction from every nation with greater broadband penetration than we have, in order to achieve greater broadband penetration in the U.S. And they want us to accept this strategy with no empirical data from their networks upon which to base a discussion. This level of discourse makes the prospect of regulation seem less surprising, even less disconcerting, to those seeking a healthy competitive network environment.

Of course, the first question that comes up in the discussion of broadband penetration and growth is: what and how do we measure this? And it turns out that no one is happy with how the U.S. FCC measures broadband — not even the FCC. My goodness, what a long road we have ahead of us.

k.

It is fair to say that we need a new routing system

Wednesday, August 8th, 2007

i get this question a lot:

at the current churn rate/ratio, at what size does the FIB need to be before it will not converge? (also sometimes pronounced ‘when will the current Internet routing architecture break?’)

a good question, has been asked many times, and afaik no one has provided any empirically grounded answer.

a few realities hinder our ability to answer this question.

  1. there are technology factors we can’t predict, e.g., moore’s law effects on hardware development.
  2. there are economics and policy and social factors we can’t predict, e.g., how much convergence-capable hardware will providers/vendors be able to afford, how those costs will affect consumer prices, how that will affect consumer uptake, network growth, and industry dynamics, how regulation affects all of the above.
  3. we have no data from providers on the dynamics of BGP and IGP interactions, much less network-wide convergence, so the research community can’t provide any empirically grounded input into an answer.

note, however, that like the ‘when do we run out of address space?’ question, uncertainties in both technology progress and human behavior render any prediction of an actual convergence apocalypse timestamp rather sketchy, and i reckon someone with an agenda could devise parameters and ‘observe correlations’ to match their agenda.

also note that this does not mean we don’t have a problem, just like not having a validated ipv4 address exhaustion timestamp does not mean we don’t have a problem with address exhaustion.

the reason we know we have a problem, and that it’s only a matter of time before we’ll need another approach to routing, is that the current system is inherently not scalable indefinitely, and in particular is inherently a poor fit to the topology and traffic engineering practices that underlie the ‘natural’ operations and evolution of the infrastructure.

this is why the IAB still has workshops about the issue even though they don’t actually have any empirical data in the workshop report, and whenever the report touches on this question ‘how long do we have?’, they add “Editor’s note: This is an area of much controversy/debate, so further investigation/community input is required” (those types of words are in the report many times, sometimes before and after the same paragraph; see section 4 on the scaling problem):

http://tools.ietf.org/group/iab/draft-iab-raws-report/draft-iab-raws-report-02.txt

with neither a ‘macroscopic data analysis’ directorate of the IETF (or IRTF) nor an industry structure that could give rise to such an activity, the IAB punts on the ‘supporting empirical data’ aspect of the issue, and instead focuses on what it can contribute: engineers discussing/establishing/documenting what we do know about ‘fundamental problems w scalability and proposed engineering approaches to solving them’. if you were at the nov06 ietf plenary when the IAB presented this workshop summary, you may recall a few people in the audience got up and said ‘what data are you even basing this sky-is-falling stuff on?’ and the IAB again acknowledged the data gap, said “we’ll get back to you” and afaik at no point did they provide any data. (if they did please let me know, we’ll publish the numbers..)

we’re not alone in wanting better quantitative data on this topic, but such data is not essential to recognizing or understanding the problem. better data would assist those trying to get attention and resources invested in a better routing system, but that’s (we are) a small and highly unprofitable market segment for those who have the data. i’m not giving up on the data challenge, but, in the meantime, it is fair to say that we need a new routing system.

k