DITL 2008: phase one complete.

March 28th, 2008 by kc

CAIDA, ISC, OARC, and The Measurement Factory managed to repeat our annual Day in the Life of the Internet data collection experiment this year, using a 2-day window of 18-19 March 2008. As with last year’s DITL (DITL2007 announcement, DITL2007 summary), we tried to capture a complete 48-hour interval of traffic to as many DNS root nameservers as could participate, and also invited other data providers to participate on terms compatible with their data sharing policies. If you engage in ongoing measurement of an operational network and collected data for some or all of 18-19 March 2008, it’s not too late to contribute data or metadata to DITL2008!

We gathered much more data than last year (2.5X more in bytes) with considerably less pain. Although it reflects only a slice of Internet activity, we believe this is (again) the largest synchronized data set about the Internet ever made available to the research community. The focus was again on DNS data, thanks to an NSF grant supporting measurement and analysis of the DNS and the tremendous cooperation of DNS operators around the world, including the rootops. So far we have DNS data from:

  • Root operators: A, C, E, F, H, old-J, K, & L (with B and M to come)
  • ccTLD operators: at, br, cl, cz, uk, se (& hopefully one more)
  • RIR in-addr operators: APNIC, LACNIC
  • gTLD operators: org
  • AS112 operators: Camel/8086, ISC, NaMEX, NIX.CZ, Qwest, WIDE
  • ORSN operators: Brave

Other types of DITL data we will index: caching resolver DNS logs (Level3); topology data (NetDIMES, CAIDA); UniRoma; BGP data (from CERT); Open DNS resolver survey (TMF); BGP tables (Routeviews, RIPE); anonymized packet headers (universities in Korea).

If we’ve missed anyone and/or you have data to submit, please let us know ASAP at ditl-info (@ this domain). Our DITL coverage map reflects 2 Terabytes of data so far, and continues to update as data comes in.

OARC is supporting the DITL experiment with disk space, networking and sysadmin resources, and is handling the acceptable use policy (AUP) process for research use of the DNS root nameserver data. For datasets uploaded to the OARC servers, we will do some cleaning, timestamp correction and other curation, such as binning the data into hourly files to make it easier to index and use. CAIDA and TMF will then work with the data providers to index the collected data so that researchers can correlate these heterogeneous datasets through the DatCat Internet measurement data catalog. We hope to accomplish as much of the indexing work as we can before requesting input and review from the collectors, but interested data providers should point their browsers at DatCat, create an account, and familiarize themselves with the metadata fields. If you are interested in anonymizing the data yourself before contributing, we recommend Crypto-PAn, or ask us if you need help.
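For providers curious what the hourly binning and timestamp correction might look like, here is a minimal sketch in Python. It is not the actual OARC curation tooling: the function names, the `ditl-YYYYMMDD-HH00.pcap` naming scheme, and the single constant `clock_offset` correction are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_bin_name(epoch_seconds, prefix="ditl"):
    """Return a (hypothetical) file name for the UTC hour containing the timestamp."""
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"{prefix}-{t:%Y%m%d-%H}00.pcap"


def bin_by_hour(records, clock_offset=0.0):
    """Group (timestamp, payload) records into per-hour bins.

    clock_offset is an assumed constant correction, in seconds, added to
    every timestamp before binning (e.g. for a collector whose clock was
    known to be running behind).
    """
    bins = defaultdict(list)
    for ts, payload in records:
        corrected = ts + clock_offset
        bins[hourly_bin_name(corrected)].append((corrected, payload))
    return dict(bins)
```

Binning in UTC rather than local time keeps files from collectors in different time zones aligned, which matters when correlating datasets from many sites.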

We recently completed a geeky comparison of DNS measurements for DITL2006 and DITL2007, which has some nice graphics as well as compelling conclusions, pasted here:

  1. The anycast deployment of DNS root nameservers appears stable, efficient, and responsive to clients’ needs. Anycast instances cover all continents bringing a better service to the worldwide population of users.
  2. The overall query traffic experienced by the roots continues to grow. The observed 2007 query rate and client rate were 1.5-3X above their observed values in 2006.
  3. The proportion of invalid traffic, i.e., DNS pollution, hitting the roots is still high: over 99% of the queries should not even be sent to the root servers. We found an extremely strong correlation both years: the higher the query rate of a client, the lower the fraction of valid queries.
  4. Repeated, identical and “referral-not-cached” queries constituted 69% of the total load on the roots during the 2007 observations. We are not in a position to evaluate the cost of this pollution to the root operators or to the Internet, nor the cost of cleaning it up. Some sources of this pollution could be mitigated by DNS operators locally serving common zones. For the March 2008 DITL experiment we will further investigate the patterns of these invalid queries and identify other ways to reduce the DNS pollution at the roots.
  5. About 40% of clients observed in 2006 and 2007 support EDNS, an extension mechanism that enables DNS to support larger queries needed for IPv6 and DNSSEC deployment.
  6. ORSN servers are subject to similar traffic composition and anomalies seen at the official DNS roots, in proportion to the reduced workload served.
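To give a feel for the "repeated, identical" category in point 4, here is a rough Python sketch that measures what fraction of queries exactly repeat an earlier query from the same client. This is a simplification and not the method used in the comparison: the `(client, qname, qtype)` tuple format is assumed, there is no time window, and it does not try to detect "referral-not-cached" queries at all.

```python
def repeated_fraction(queries):
    """Fraction of queries that exactly repeat an earlier one.

    queries: iterable of (client_ip, qname, qtype) tuples (assumed format).
    Names are compared case-insensitively and without the trailing dot.
    """
    seen = set()
    total = 0
    repeats = 0
    for client, qname, qtype in queries:
        key = (client, qname.lower().rstrip("."), qtype)
        if key in seen:
            repeats += 1
        else:
            seen.add(key)
        total += 1
    return repeats / total if total else 0.0
```

For example, if one client asks for the `A` record of the same name twice, the second query counts as a repeat, while a different client or a different query type does not.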

We hope to have a comparison to 2008 data up sooner this year. If you have additional questions you want answered with DITL data, please comment or send mail to ditl-info (@ this domain).

Thank you again for your willingness to contribute to this experiment. Please don’t hesitate to ask any questions.

Regards, CAIDA, ISC, TMF

For a history of the Day In the Life (DITL) project, see the list of DITL Collection Events.

One Response to “DITL 2008: phase one complete.”

  1. Tim Says:

    It would be good to have a posting that ran through history of these day in the life research projects. Let me know if you have done one.

    Thanks

    Tim
