Archive for March, 2008

measuring broadband penetration

Sunday, March 30th, 2008

the U.S. FCC is trying to improve the way it measures broadband penetration, though its primary mode of measurement is still gathering data from the providers themselves. some meta-data on how well the big three (verizon, att, tw) tracked penetration of their own network infrastructure over the last year:

  1. every month verizon sends my mom a bill for her landline service in rural north carolina, containing a glossy flyer: “Get DSL in your area! call now!” every year she calls, only to find out that verizon still doesn’t serve her house with broadband. it should not be a big shock that verizon does not know whom it serves with broadband, since just one merger ago they were emitting $9B accounting errors (counting doesn’t seem to be one of their strengths). but if the fcc is relying on them for broadband penetration numbers, i don’t think verizon is the nuttiest one on stage here. i hope the census bureau is cogitating.
  2. in my area ATT charges you $25/month less for DSL if you also have a $5/month ATT landline. how many landline customers keep that line only to cut their DSL bill, and would rather spend the $5/month on more Internet bandwidth instead?
  3. self-measurement of cable tv penetration is no better: when i tried to cancel my cable tv but keep my cable modem service, my cable company offered to drop my monthly bill by $25 if i would just keep the tv content streaming to my wall. i asked if i could pay them $25/month for more Internet bandwidth instead of the tv bandwidth. that option is not even on their to-do list.

europe is promising a quantum leap ahead of what the US is even attempting to measure:

“(…) by summer in the mid-term review of the i2010 strategy, I will publish a new indicator of broadband take-up in Europe that compares national performance, not only on broadband penetration but also geographic coverage, speed, competition and price.” This is important, since penetration only doesn’t tell the whole story. Compare the OECD Broadband Portal.

a strategy review based on empirical data? i wish we had thought of that.

k.

DITL 2008: phase one complete.

Friday, March 28th, 2008

CAIDA, ISC, OARC, and The Measurement Factory managed to repeat our annual Day in the Life of the Internet data collection experiment this year, using a 2-day window of 18-19 March 2008. As with last year’s DITL (DITL2007 announcement, DITL2007 summary), we tried to capture a complete 48-hour interval of traffic to as many DNS root nameservers as could participate, and also invited other data providers to participate on terms compatible with their data sharing policies. if you engage in ongoing measurement of an operational network, and collected data for some or all of 18-19 mar 2008, it’s not too late to contribute data or metadata to DITL2008!

we gathered much more data than last year (2.5X more in bytes) with considerably less pain. So although it reflects only a slice of Internet activity, we believe this is (again) the largest synchronized data set about the Internet ever made available to the research community. the focus was again on DNS data, thanks to an NSF grant supporting measurement and analysis of the DNS and the tremendous cooperation of DNS operators around the world, including the rootops. So far we have DNS data from:

  • Root operators: A, C, E, F, H, old-J, K, & L (with B and M to come)
  • ccTLD operators: at, br, cl, cz, uk, se (& hopefully one more)
  • RIR in-addr operators: APNIC, LACNIC
  • gTLD operators: org
  • AS112 operators: Camel/8086, ISC, NaMEX, NIX.CZ, Qwest, WIDE
  • ORSN operators: Brave

Other types of DITL data we will index: caching resolver DNS logs (Level3); topology data (NetDIMES, CAIDA); UniRoma; BGP data (from CERT); Open DNS resolver survey (TMF); BGP tables (Routeviews, RIPE); anonymized packet headers (universities in Korea).

If we’ve missed anyone and/or you have data to submit, please let us know ASAP at ditl-info (@ this domain). Our DITL coverage map reflects 2 Terabytes of data so far, and continues to update as data comes in.

OARC is supporting the DITL experiment with disk space, networking and sysadmin resources, and handling the acceptable use policy (AUP) process for research use of the DNS root nameserver data. for datasets uploaded to the OARC servers, we will do some cleaning, timestamp correction, and other curation such as binning the data into hourly files to make it easier to index and use. CAIDA and TMF will then work with the data providers to index the collected data so that researchers can correlate these heterogeneous datasets through the DatCat internet measurement data catalog. we hope to do as much of the indexing work as we can before requesting input and review from the collectors, but interested data providers should point their browsers at DatCat, create an account, and familiarize themselves with the metadata fields. if you are interested in anonymizing the data yourself before contributing, we recommend Crypto-PAn; ask us if you need help.
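the property that makes Crypto-PAn useful for data like this is that it is prefix-preserving: two addresses that share their first k bits map to anonymized addresses that also share exactly their first k bits, so subnet structure survives anonymization. below is a minimal sketch of that construction for IPv4, under one loudly stated substitution: Crypto-PAn itself uses AES as its pseudorandom function, while this sketch uses HMAC-SHA256 from the Python standard library so it runs without extra packages. its output therefore will not match Crypto-PAn's, and the function names are hypothetical; treat it as an illustration of the idea, not a replacement for the real tool.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Prefix-preserving IPv4 anonymization in the style of Crypto-PAn.

    Assumption: HMAC-SHA256 stands in for Crypto-PAn's AES-based
    pseudorandom function, so results differ from the real tool.
    """
    orig = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        # The first i bits of the original address (0 when i == 0).
        prefix = orig >> (32 - i)
        # Encode (prefix length, prefix value) so each distinct prefix
        # gets a distinct PRF input.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        # One pseudorandom bit that depends only on the preceding prefix.
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] >> 7
        bit = (orig >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

because each output bit is the input bit XORed with a value that depends only on the bits before it, addresses agree in the output exactly as long as they agree in the input. for any key k, `common_prefix_len(anonymize_ipv4("192.0.2.1", k), anonymize_ipv4("192.0.2.200", k))` equals `common_prefix_len("192.0.2.1", "192.0.2.200")`. the key must stay secret and be reused across a whole dataset so that the same address always maps to the same anonymized address.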

We recently completed a geeky comparison of DNS measurements for DITL2006 and DITL2007, which has some nice graphics as well as compelling conclusions, pasted here:

  1. The anycast deployment of DNS root nameservers appears stable, efficient, and responsive to clients’ needs. Anycast instances cover all continents, bringing better service to the worldwide population of users.
  2. The overall query traffic experienced by the roots continues to grow. The observed 2007 query rate and client rate were 1.5-3X higher than their observed values in 2006.
  3. The proportion of invalid traffic, i.e., DNS pollution, hitting the roots is still high: over 99% of the queries should not even be sent to the root servers. We found an extremely strong correlation both years: the higher the query rate of a client, the lower the fraction of valid queries.
  4. Repeated, identical and “referral-not-cached” queries constituted 69% of the total load on the roots during the 2007 observations. We are not in a position to evaluate the cost of this pollution to the root operators or to the Internet, nor the cost of cleaning it up. Some sources of this pollution could be mitigated by DNS operators locally serving common zones. For the March 2008 DITL experiment we will further investigate the patterns of these invalid queries and identify other ways to reduce the DNS pollution at the roots.
  5. About 40% of clients observed in 2006 and 2007 support EDNS, an extension mechanism that lets DNS carry messages larger than the original 512-byte UDP limit, as needed for IPv6 and DNSSEC deployment.
  6. ORSN servers are subject to similar traffic composition and anomalies seen at the official DNS roots, in proportion to the reduced workload served.

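conclusions 3 and 4 rest on labeling each query that reaches a root server. here is a minimal sketch of three of those labels, assuming a simplified reading of the categories: an identical query repeats the client, name, type, class, and DNS message ID of an earlier query; a repeated query matches on everything except the ID; and a referral-not-cached query asks about a TLD whose referral the client received less than one referral TTL ago. the `Query` record, the 2-day `REFERRAL_TTL`, and the helper names are hypothetical, and this is not the exact classification used in the DITL analysis.

```python
from collections import Counter
from dataclasses import dataclass

REFERRAL_TTL = 172800  # assumption: seconds a TLD referral stays cached (2 days)


@dataclass(frozen=True)
class Query:
    ts: float    # seconds since the start of the trace
    client: str  # source IP address
    qname: str
    qtype: str
    qclass: str
    qid: int     # 16-bit DNS message ID


def classify(queries):
    """Return (query, label) pairs in timestamp order."""
    seen_with_id = set()     # (client, name, type, class, id)
    seen_without_id = set()  # (client, name, type, class)
    referral_fetched = {}    # (client, tld) -> time of last fresh referral
    labeled = []
    for q in sorted(queries, key=lambda q: q.ts):
        key = (q.client, q.qname.lower(), q.qtype, q.qclass)
        tld = q.qname.lower().rstrip(".").rsplit(".", 1)[-1]
        fetched = referral_fetched.get((q.client, tld))
        should_be_cached = fetched is not None and q.ts - fetched < REFERRAL_TTL
        if key + (q.qid,) in seen_with_id:
            label = "identical"
        elif key in seen_without_id:
            label = "repeated"
        elif should_be_cached:
            label = "referral-not-cached"
        else:
            label = "other"
        seen_with_id.add(key + (q.qid,))
        seen_without_id.add(key)
        if not should_be_cached:
            # The root hands out a fresh referral the client should now cache.
            referral_fetched[(q.client, tld)] = q.ts
        labeled.append((q, label))
    return labeled


def polluted_fraction_by_client(labeled):
    """Fraction of each client's queries that received one of the three labels."""
    total, polluted = Counter(), Counter()
    for q, label in labeled:
        total[q.client] += 1
        if label != "other":
            polluted[q.client] += 1
    return {client: polluted[client] / total[client] for client in total}
```

for simplicity the identical/repeated sets never expire within a trace, and "other" still includes queries that would be invalid for other reasons (e.g., nonexistent TLDs). pairing each client's polluted fraction with its total query count is one way to look at the rate-versus-validity correlation in conclusion 3.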
we hope to have a comparison to 2008 data up sooner this year. if you have additional questions you want answered with DITL data, please comment or send mail to ditl-info (@ this domain).

thank you again for your willingness to contribute to this experiment. please don’t hesitate to ask any questions.

regards, caida, isc, tmf

“we should be able to do a much better job at modeling Internet attacks”

Tuesday, March 25th, 2008

one of my favorite program managers is posed the following question by senior management at his defense-related funding agency: “we should be able to do a much better job modeling internet attacks. what research can we fund that would enable us to do a better job at modeling internet attacks?”

because i happened to be reading a recent paper by Aaron Burstein of UC Berkeley, “Toward a Culture of Cybersecurity Research”, i was familiar with this quote:

(5) Accordingly, Federal investment in computer and network security research and development must be significantly increased to -

  (A) improve vulnerability assessment and technological and systems solutions;
  (B) expand and improve the pool of information security professionals, including researchers, in the United States workforce; and
  (C) better coordinate information sharing and collaboration among industry, government, and academic research projects.

http://caselaw.lp.findlaw.com/casecode/uscodes/15/chapters/100/sections/section_7401.html

which almost hits on the two biggest problems with cybersecurity research today: the research community is not allowed to study the network, and it is not allowed to study the software that runs on the majority of the components (hosts and routers) on the network. networks are generally not allowed to share data with each other; they are all considered proprietary systems on which independent research (by those who do not work for the corporation) is illegal.

it would be nice to be able to turn the cybersecurity research agenda into a technology agenda so we can throw technology R&D money at the problem. so i am sympathetic to the question: “what R&D can we fund?”
but ten years of little measurable progress in this area has made it clear that to the extent that we can fund technology to help, it will be technology that improves our ability to do (A), (B), and (C) above. to do “(A) vulnerability assessment”, we need to analyze the software running on the systems that compose the network: that’s a problem with software ownership, i.e., current law (copyright, trade secrets, EULAs). to do “(C) coordinated information sharing”, we need it to be legal as well as incentive-compatible for networked organizations to share data with each other. that’s also a policy rather than technology problem. “(B) expand the security and research workforce” is more obviously a policy problem, but spending tax dollars to incent scholarship will be wasted if the funded researchers are not able to study the real system.

the government can certainly fund technical activities to facilitate useful data sharing: technology needed to collect, analyze, catalog, and correlate datasets to distinguish baseline from anomalous internet traffic and routing patterns; tools that empower users to measure their own networks and automatically contribute data to aggregated, anonymized repositories with legal protection; reputation management systems to support scalable information sharing across vast administrative boundaries. but these are all going to be impotent weapons against the growing illicit activity on the network if we don’t give ourselves the advantage the criminal actors have had from the beginning: data sharing (in their case, also selling to each other). so there is reason to believe that we are learning more slowly than they are.

k.