CAIDA’s 2022 Annual Report

July 10th, 2023 by kc

The CAIDA annual report summarizes CAIDA’s activities for 2022 in the areas of research, infrastructure, data collection and analysis. The executive summary is excerpted below:
Read the rest of this entry »

Hoiho API (Holistic Orthography of Internet Hostname Observations)

February 13th, 2023 by Bradley Huffaker

In December 2021, CAIDA published a method and system to automatically learn rules that extract geographic annotations from router hostnames. This is a challenging problem, because operators use different conventions and different dictionaries when they annotate router hostnames. For example, in the following figure, operators have used IATA codes (“iad”, “was”), a CLLI prefix (“asbnva”), a UN/LOCODE (“usqas”), and even city names (“ashburn”, “washington”) to refer to routers in approximately the same location — Ashburn, VA, US. Note that “ash” (router #4) is an IATA code for Nashua, NH, US, that the operators of he.net and seabone.net used to label routers in Ashburn, VA, US. Some operators also encoded the country (“us”) and state (“va”).

Our system, Hoiho, released as open source as part of scamper, uses CAIDA’s Macroscopic Internet Topology Data Kit (ITDK) and observed round-trip times to infer regular expressions that extract these apparent geolocation hints from hostnames. The ITDK contains a large dataset of routers with annotated hostnames, which we used as input for Hoiho to infer rules (encoded as regular expressions) that extract these annotations. CAIDA has released these inferred rulesets in recent ITDKs.
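To make the idea of a rule concrete, here is a minimal sketch of applying one regular expression and a small dictionary to a hostname. The rule, the example.net hostnames, and the two-entry dictionary are hypothetical illustrations, not rules that Hoiho inferred or the format it uses; the locations follow the Ashburn example above.

```python
import re

# Hypothetical rule (not one inferred by Hoiho): the second label holds a
# three-letter code followed by digits, e.g. "ae-1.iad02.example.net".
RULE = re.compile(r"^[^.]+\.([a-z]{3})\d+\.example\.net$")

# Tiny illustrative dictionary; a real one would cover IATA codes,
# UN/LOCODEs, CLLI prefixes, and city names.
GEO_HINTS = {
    "iad": "Ashburn, VA, US",
    "was": "Ashburn, VA, US",
}


def extract_location(hostname):
    """Return (hint, location) if the rule matches and the hint is known, else None."""
    match = RULE.match(hostname.lower())
    if match is None:
        return None
    hint = match.group(1)
    location = GEO_HINTS.get(hint)
    return (hint, location) if location else None


print(extract_location("ae-1.iad02.example.net"))
print(extract_location("core1.example.net"))
```

A hint that matches the rule but is missing from the dictionary returns None here, which is one reason the choice of dictionary matters as much as the regular expression.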

Today, CAIDA is launching an API (api.hoiho.caida.org) and web front end (hoiho.caida.org) that return extracted geographic locations from a user-provided list of DNS names. The API uses the rules that CAIDA infers with each ITDK. For embedded IATA, UN/LOCODE, and city names, the API returns the city name and a lat/long representing the location. For embedded CLLI codes, the API returns the CLLI code; please contact iconectiv for a dictionary that maps CLLI codes to locations.

Try the API out, and let us know if you find it useful!

[HOIHO] Luckie, M., Huffaker, B., Marder, A., Bischof, Z., Fletcher, M., and claffy, k., 2021. “Learning to Extract Geographic Information from Internet Router Hostnames.” ACM SIGCOMM Conference on emerging Networking EXperiments and Technologies (CoNEXT),
https://catalog.caida.org/paper/2021_learning_extract_geographic_information

Studying Conformance of MANRS Members

January 21st, 2023 by Ben Du

In November 2022, 85% of MANRS members were conformant to Action #1 and Action #4.

 

The Mutually Agreed Norms for Routing Security (MANRS) initiative is an industry-led effort to improve Internet routing security. MANRS encourages participating networks to implement a series of routing security practices. In our paper, Mind Your MANRS: Measuring the MANRS Ecosystem, we at CAIDA (UC San Diego), in collaboration with Georgia Tech and IIJ Research Lab, provided the first independent look into the MANRS ecosystem by using publicly available data to analyze the routing behavior of participant networks. MANRS membership has increased significantly in recent years, but our research goal was to get more clarity on the impact of the MANRS initiative on the state of overall Internet routing security. In this post, we summarize how we characterized the growth of MANRS members, explain our process of analyzing ISP conformance with the MANRS practices we studied, compare RPKI ROA registration status between MANRS and non-MANRS members, and reflect on implications of our analysis for the future of MANRS.

 

We first analyzed what types of networks have joined MANRS over time, and whether MANRS members are properly implementing the routing security practices (MANRS conformance).  The two practices (which MANRS calls actions) we focused on in our study are: 

  1. Participating ISPs will register their IP prefixes in a trusted routing database (either Resource Public Key Infrastructure (RPKI) or one of the databases of the Internet Routing Registry (IRR)). This practice is “MANRS Action #4”.
  2. Participating ISPs will use such information to prevent propagation of invalid routing information. This practice is “MANRS Action #1”.

 

Our paper analyzed the MANRS ecosystem in May 2022. Since MANRS is a growing community, for this post we have updated our analysis using data collected in November 2022 to capture a more recent view of the MANRS ecosystem. We have also published our analysis code here for interested readers to reproduce the analysis using the latest available data.

 

MANRS growth

We first downloaded a list of MANRS members. The Internet Society kindly provided us the dates when each MANRS participant joined the programs. We found that between 2015 and November 2022, 863 ASes joined MANRS. Over this 7-year period, an additional 12.1% of routed IPv4 address space was originated by MANRS ASes. Plotting growth by ASes and by address space (Figure 1) shows that most of these new ASes were based in the LACNIC region, but that those ASes originated little or no address space into BGP.   

(a)

(b)

Figure 1 – MANRS participation grew between 2015 and 2022, but the picture looks quite different if measured by number of ASes vs. % of routed address space. 

MANRS Conformance 

We examined whether MANRS (ISP and CDN) members properly implemented MANRS Action #4 and #1 according to the MANRS requirements:

  • To conform to Action #4, members must register at least 90% (100% for CDNs) of their IP prefixes in IRR or RPKI.
  • To conform to Action #1, members must filter out customer BGP announcements that do not match IRR or RPKI records.
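The Action #4 threshold above can be sketched as a small check. This is an illustration under stated assumptions, not the published analysis code: the function name and its inputs (counts of registered and originated prefixes per member) are hypothetical.

```python
def conforms_to_action4(registered_prefixes, total_prefixes, is_cdn=False):
    """Check the registration threshold: at least 90% for ISPs, 100% for CDNs."""
    if total_prefixes == 0:
        # Assumption: a member that originates nothing has nothing to register.
        return True
    if is_cdn:
        return registered_prefixes == total_prefixes
    # Integer comparison avoids floating-point rounding at the 90% boundary.
    return registered_prefixes * 10 >= registered_prefixes * 0 + total_prefixes * 9


# An ISP with 9 of 10 prefixes registered meets the threshold.
print(conforms_to_action4(9, 10))
# A CDN with a single unregistered or invalid prefix does not.
print(conforms_to_action4(7000, 7001, is_cdn=True))
```

The stricter CDN threshold means that a single bad prefix is enough to make a large CDN non-conformant.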

 

We downloaded BGP prefixes and their IRR/RPKI status from the Internet Health Report (IHR) maintained by IIJ Research Lab. We found that in November 2022, 893 (95.9%) of all 931 MANRS ASes conformed to MANRS Action #4 (prefix registration). Figure 2 shows that in November 2022, 3.7% of the address space originated by MANRS ASes was contained in prefixes that either were not registered or were incorrectly registered in IRR or RPKI. We also conducted case studies of non-conformant MANRS CDN members and found that one large CDN was not conformant because one of its 7000+ prefixes was RPKI-invalid. Please refer to section 8.4 of the paper for more details.

 

(a)

 

(b)

Figure 2 – Most ASes participating in MANRS conformed with Action #4, and correspondingly, most of the address space those ASes originated into BGP was IRR or RPKI valid, i.e., had records that matched observed BGP announcements. 

 

To evaluate whether MANRS members filtered out customer BGP announcements that do not match IRR or RPKI records (Action #1), we downloaded BGP prefixes, their IRR and RPKI statuses, and their upstream ASes from the Internet Health Report. We then calculated the prevalence of IRR/RPKI Invalid prefixes propagated through each MANRS network. 
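The prevalence calculation can be sketched as follows. The record layout (prefix, status, list of upstream ASNs) is a hypothetical simplification for illustration and is not the Internet Health Report's actual schema; the prefixes and ASNs are documentation values.

```python
from collections import defaultdict


def invalid_share_by_upstream(records):
    """Return, per upstream ASN, the fraction of propagated prefixes that are invalid.

    records: iterable of (prefix, status, upstream_asns), where status is
    "valid" or "invalid" (hypothetical layout).
    """
    totals = defaultdict(int)
    invalid = defaultdict(int)
    for _prefix, status, upstream_asns in records:
        # Count each upstream once per prefix, even if it is listed twice.
        for asn in set(upstream_asns):
            totals[asn] += 1
            if status == "invalid":
                invalid[asn] += 1
    return {asn: invalid[asn] / totals[asn] for asn in totals}


records = [
    ("192.0.2.0/24", "valid", [64500, 64501]),
    ("198.51.100.0/24", "invalid", [64500]),
    ("203.0.113.0/24", "valid", [64501]),
]
print(invalid_share_by_upstream(records))
```

This sketch counts prefixes; weighting each prefix by the address space it covers would give the address-space view used in the figures.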

 

Figure 3 shows that in November 2022, 790 (84.9%) of 931 MANRS ASes conformed to MANRS Action #1, and the remaining 141 (15.1%) did not. However, not all of the address space propagated by these ASes was incorrectly registered in RPKI or IRR. In fact, those 141 ASes propagated 96.7% of the address space propagated by MANRS ASes, but only 1.5% of that total was incorrectly registered. In addition, we found that 25 out of 27 MANRS members that are large transit providers (i.e., had > 180 customer ASes) did not fully conform with MANRS Action #1, suggesting that conformance was hard to achieve for networks with complex routing relationships.

 

(a)

 

(b)

Figure 3 – MANRS ASes that did not conform to MANRS Action #1 only propagated a small fraction of address space announced by MANRS ASes that was not IRR or RPKI Valid. (b) shows 95.2% of MANRS-propagated address space was IRR/RPKI Valid despite being propagated by Action #1 non-conformant members.

 

Are MANRS members more likely to register in RPKI? 

Our study found that, except for a few cases, MANRS organizations tended to conform with the two actions we studied. However, to estimate the impact of the MANRS initiative on the state of routing security, we compared the behavior of MANRS and non-MANRS ASes. 

 

We first compared these two subsets of ASes in terms of registration of RPKI ROAs of prefixes announced in BGP.  In November 2022, 60.1% of routed IPv4 address space originated by MANRS ASes was covered by RPKI ROAs, compared with only 38.8% of all routed IPv4 addresses covered by ROAs. Figure 5 shows that in November 2022, IPv4 address space originated by MANRS ASes was more likely to be registered in RPKI in all RIR regions except APNIC. In the APNIC region, we found significant RPKI registration by non-MANRS networks from JPNIC and TWNIC, possibly due to local RPKI outreach efforts.  Overall, this difference suggests a positive influence of MANRS members on the adoption of RPKI. 

 

Similarly, changing the view from routed address space to the originating ASes, we found that in November 2022, MANRS members were more likely to originate at least 80% RPKI Valid prefixes in BGP compared to their non-MANRS counterparts in all RIR regions (Figure 6).

 

Figure 5 – In November 2022, IPv4 address space originated by MANRS ASes was more likely to be registered in RPKI in all RIR regions except APNIC.

 

Figure 6 – In November 2022, MANRS ASes were more likely to originate RPKI Valid prefixes than non-MANRS ASes.

 

Future for MANRS

In November 2022, we found 71 MANRS ASes that registered their prefixes only in IRR but not RPKI. Registering only in an IRR database is less reliable than registering in RPKI, since some IRR databases may contain inaccurate records due to looser validation standards (see our paper IRR Hygiene in the RPKI Era). We recommend that in the future, MANRS members register in RPKI in addition to IRR databases. We also recommend that MANRS add a conformance checker to its existing observatory to further motivate its members to maintain good routing security practices. We have published our analysis code to facilitate such conformance checking.

New CAIDA Prefix-to-AS Mapping Data Set

November 14th, 2022 by Bradley Huffaker

Since May 9th, 2005, CAIDA has produced a data set that maps IPv4 prefixes (and later also IPv6 prefixes) to the AS (Autonomous System) originating that prefix into the global BGP routing system, as observed via a single BGP data collector of the Route Views data collection system. We have called this data set “RouteViews Prefix to AS”. We used CAIDA’s straighten_rv script to filter the RIB (routing information base) file used as input data. We will discontinue this data set on December 31st, 2022 and replace it with a new, more complete data set that we call CAIDA’s Prefix-to-AS data set.

CAIDA will use the BGPStream software package (and in particular the bgpview library) to include data from all available BGP collectors from both of the primary global publicly available collection systems: Route Views and RIPE NCC Routing Information Service. We will backfill Prefix-to-AS data to 2000. As part of this transition, CAIDA will no longer use straighten_rv to preprocess AS paths. We will create two files: an annotated file with all the data observed in BGP, and a simple file that filters out data of no interest to many researchers as described below.

Annotated files. The annotated file will include information about the stability and visibility of prefixes by different peers and collectors. Individuals who wish to produce a more refined mapping can fairly easily filter this data. The table below compares the older “Routeviews2” (a single Route Views collector) and the new annotated CAIDA Prefix-to-AS dataset (all collectors from both RIPE RIS and Route Views) for 1 June 2022. Most ASes (99.6%) and prefixes (87.2%) appeared in both datasets. Note that multiple ASNs announced the prefix 0.0.0.0/0; we exclude it since it covers the entire IPv4 address space.

ASN              filtered  Routeviews2 only   Routeviews+RIPE    both               total
Multiorigin/set            128      4.10%     1552    49.73%     1441    46.17%     3121
public                     0        0.00%     295      0.40%     73294   99.60%     73589
reserved         X         0        0.00%     1379    88.97%     171     11.03%     1550

Prefix           filtered  Routeviews2 only   Routeviews+RIPE    both               total
larger than /8   X         0        0.00%     1      100.00%     0        0.00%     1
private          X         0        0.00%     504     84.85%     90      15.15%     594
public                     0        0.00%     138498  12.81%     942469  87.19%     1080967

Simple files. The simple file will exclude very large prefixes (e.g., with mask lengths < 8), private addresses (RFC 1918), and prefixes announced exclusively by reserved ASNs (Special-Purpose ASN). The resulting simple prefix-to-ASN mapping covers 99.7% of the address space captured by the annotated file. In the table below (also reflecting 1 June 2022), 0.94% of prefixes and 0.42% of addresses had an additional origin AS that was not also observed in the Routeviews2-only dataset. This reflects the expanded visibility of more collectors and peers. 4.92% of CAIDA’s prefixes and 1.82% of addresses were not covered by the Routeviews2-only prefix2as. Overall, the combined data set provides visibility of 5.86% of prefixes and 2.24% of addresses not covered by Routeviews2-only.

CAIDA’s Prefix to AS “simple” (99.7% of addresses observed in annotated files)

                            ASN type                       prefixes                    addresses
source           agreement  Routeviews2 only  Routeviews+RIPE  number  group %  all %   number      group %  all %
both             different  multiorigin       multiorigin      626     11.43%   0.11%   1241088     9.65%    0.04%
                            public            multiorigin      4816    87.95%   0.82%   11442617    88.93%   0.37%
                            set               multiorigin      34      0.62%    0.01%   183039      1.42%    0.01%
                            (subtotal)                         5476    100.00%  0.94%   12866744    100.00%  0.42%
both             same       multiorigin       multiorigin      9869    1.79%    1.69%   12609229    0.42%    0.41%
                            public            public           540032  98.20%   92.45%  2988739528  99.58%   97.35%
                            set               set              8       0.00%    0.00%   9216        0.00%    0.00%
                            (subtotal)                         549909  100.00%  94.14%  3001357973  100.00%  97.76%
Routeviews+RIPE  N/A        -                 multiorigin      1884    6.55%    0.32%   908601      1.63%    0.03%
                            -                 public           26856   93.44%   4.60%   54919321    98.37%   1.79%
                            -                 set              2       0.01%    0.00%   2816        0.01%    0.00%
                            (subtotal)                         28742   100.00%  4.92%   55830738    100.00%  1.82%
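The exclusion rules for the simple file can be sketched as a filter over (prefix, origin ASNs) pairs. This is an illustration under stated assumptions rather than CAIDA's implementation: the function names are hypothetical, and the reserved ASN ranges are our reading of the IANA Special-Purpose AS Numbers registry.

```python
import ipaddress

# RFC 1918 private IPv4 blocks.
RFC1918 = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_reserved_asn(asn):
    """Assumed special-purpose ranges: 0, AS_TRANS, documentation, private, last ASNs."""
    return (
        asn == 0
        or asn == 23456
        or 64496 <= asn <= 65551
        or 4200000000 <= asn <= 4294967295
    )


def keep_in_simple_file(prefix, origin_asns):
    """Return True if the prefix survives the three exclusion rules."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen < 8:
        return False  # very large prefix
    if net.version == 4 and any(net.subnet_of(block) for block in RFC1918):
        return False  # private address space
    if origin_asns and all(is_reserved_asn(asn) for asn in origin_asns):
        return False  # announced exclusively by reserved ASNs
    return True


print(keep_in_simple_file("10.1.0.0/16", [1000]))
print(keep_in_simple_file("192.0.2.0/24", [64512, 1000]))
```

The last rule drops a prefix only when every origin is reserved, so a prefix with at least one public origin ASN is kept.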

You can find the new CAIDA Prefix-to-AS Mapping Data Set here.

CAIDA contributions to ACM’s Internet Measurement Conference (IMC) 2022

October 18th, 2022 by Elena Yulaeva

ACM’s Internet Measurement Conference (IMC) is an annual, highly selective venue for the presentation of Internet measurement and analysis research. The average acceptance rate for papers is around 25%. CAIDA researchers co-authored five papers and three posters that will be presented at this conference in Nice, France on October 25-27, 2022. We link to these publications below.

Investigating the impact of DDoS attacks on DNS infrastructure. Raffaele Sommese, kc claffy, Roland van Rijswijk-Deij, Arnab Chattopadhyay, Alberto Dainotti, Anna Sperotto, and Mattijs Jonker. 2022. This paper describes a newly developed scalable method to map DDoS attacks targeting or affecting DNS infrastructure. The measurements reveal evidence that millions of domains experienced DDoS attacks during the recent 17-month observation window. Most attacks did not observably harm DNS performance, but in some cases, a 100-fold increase in DNS resolution time was observed. This research corroborates the value of known best practices to improve DNS resilience to attacks, including the use of anycast and topological redundancy in nameserver infrastructure.

Mind Your MANRS: Measuring the MANRS Ecosystem. Ben Du, Cecilia Testart, Romain Fontugne, Gautam Akiwate, Alex C. Snoeren, and kc claffy. 2022. Mutually Agreed Norms for Routing Security (MANRS) is an industry-led initiative to improve Internet routing security by encouraging participating networks to implement a set of recommended actions. The goal of the paper is to evaluate the current state of the MANRS initiative in terms of its participants, their routing behavior, and its impact on the broader routing ecosystem, and to discuss potential improvements. The findings confirm that MANRS participants are more likely to follow best practices than other similar networks on the Internet. However, within MANRS, not all networks take the MANRS mandate with the same rigor. This study demonstrates the need to continually assess the conformance of members for the prosperity of the MANRS initiative, and the challenges in automating such conformance checks.

Retroactive Identification of Targeted DNS Infrastructure Hijacking. Gautam Akiwate, Raffaele Sommese, Mattijs Jonker, Zakir Durumeric, kc claffy, Geoffrey M. Voelker, and Stefan Savage. 2022. DNS infrastructure tampering attacks are particularly challenging to detect because they can be very short-lived, bypass the protections of TLS and DNSSEC, and are imperceptible to users. Identifying them retroactively is further complicated by the lack of fine-grained Internet-scale forensic data. This paper is the first attempt to make progress toward this latter goal. Combining a range of longitudinal data from Internet-wide scans, passive DNS records, and Certificate Transparency logs, the authors constructed a methodology for identifying potential victims of sophisticated DNS infrastructure hijacking and used it to identify a range of victims (primarily government agencies). The authors analyze possible best practices in terms of their measurability by third parties, including a review of DNS measurement studies and available data sets.

Stop, DROP, and ROA: Effectiveness of Defenses through the lens of DROP. Leo Oliver, Gautam Akiwate, Matthew Luckie, Ben Du, and kc claffy. 2022. Malicious use of the Internet address space has been a persistent threat for decades. Multiple approaches to prevent and detect address space abuse include the use of blocklists and the validation against databases of address ownership such as the Internet Routing Registry (IRR) databases and the Resource Public Key Infrastructure (RPKI). The authors undertook a study of the effectiveness of these routing defenses through the lens of one of the most respected blocklists on the Internet: Spamhaus’ Don’t Route Or Peer (DROP) list. The authors show that attackers are subverting multiple defenses against malicious use of address space, including creating fraudulent Internet Routing Registry records for prefixes shortly before using them. Other attackers disguised their activities by announcing routes with spoofed origin ASes consistent with historic route announcements. The authors quantify the substantial and actively-exploited attack surface in unrouted address space, which warrants reconsideration of RPKI eligibility restrictions by RIRs, and reconsideration of AS0 policies by both operators and RIRs.

Where .ru? Assessing the Impact of Conflict on Russian Domain Infrastructure. Mattijs Jonker, Gautam Akiwate, Antonia Affinito, kc claffy, Alessio Botta, Geoffrey M. Voelker, Roland van Rijswijk-Deij, and Stefan Savage. 2022. The hostilities in Ukraine have driven unprecedented forces, both from third-party countries and in Russia, to create economic barriers. On the Internet, these manifest both as internal pressures on Russian sites to (re-)patriate the infrastructure they depend on (e.g., naming and hosting) and external pressures arising from Western providers disassociating from some or all Russian customers. This paper describes longitudinal changes in the makeup of naming, hosting, and certificate issuance for domains in the Russian Federation due to the war in Ukraine.

 

CAIDA also contributed to three extended abstracts:

“Observable KINDNS: Validating DNS Hygiene.” Sommese, Raffaele, Mattijs Jonker, kc claffy. ACM Internet Measurement Conference (IMC) Poster, 2022.

“PacketLab – Tools Alpha Release and Demo.” Yan, Tzu-Bin, Yuxuan Chen, Anthea Chen, Zesen Zhang, Bradley Huffaker, Ricky K. P. Mok, Kirill Levchenko, kc claffy. ACM Internet Measurement Conference (IMC) Poster, 2022.

“A Scalable Network Event Detection Framework for Darknet Traffic.” Gao, Max, Ricky K. P. Mok, kc claffy. ACM Internet Measurement Conference (IMC) Poster, 2022.

CAIDA’s 2021 Annual Report

May 30th, 2022 by kc

The CAIDA annual report summarizes CAIDA’s activities for 2021 in the areas of research, infrastructure, data collection and analysis. Our research projects span: Internet cartography and performance; security, stability, and resilience studies; economics; and policy. Our infrastructure, software development, and data sharing activities support measurement-based Internet research, both at CAIDA and around the world, with focus on the health and integrity of the global Internet ecosystem.
The executive summary is excerpted below:
Read the rest of this entry »

IRR Hygiene in the RPKI Era

April 1st, 2022 by Ben Du

The Border Gateway Protocol (BGP) is the protocol that networks use to exchange (announce) routing information across the Internet. Unfortunately, BGP has no mechanism to prevent the propagation of false announcements such as hijacks and misconfigurations. The Internet Routing Registry (IRR) and Resource Public Key Infrastructure (RPKI) emerged as two different solutions to improve routing security and operation in BGP by allowing networks to register information and develop route filters based on information other networks have registered.

The IRR was first introduced in 1995 and has remained a popular tool for BGP route filtering. However, route origin information in the IRR suffers from inaccuracies due to the lack of incentive for registrants to keep information up to date and the use of non-standardized validation procedures across different IRR database providers.

Over the past few years, the Resource Public Key Infrastructure (RPKI), a system providing cryptographically attested route origin information, has seen steady growth in its deployment and has become widely used for Route Origin Validation (ROV) among large networks.

Some networks are unable to adopt RPKI filtering for technical or administrative reasons and continue using only existing IRR-based route filtering. Such networks may not be able to construct correct routing filters due to IRR inaccuracies, and thus compromise routing security.

In our paper IRR Hygiene in the RPKI Era, we at CAIDA (UC San Diego), in collaboration with MIT, study the scale of inaccurate IRR information by quantifying the inconsistency between IRR and RPKI. In this post, we will succinctly explain how we compare records and then focus on the causes of such inconsistencies and provide insights on what operators could do to keep their IRR records accurate.

IRR and RPKI trends

For our study we downloaded IRR data from 4 IRR database providers (RADB, RIPE, APNIC, and AFRINIC) and RPKI data from all Trust Anchors published by the RIPE NCC. Figure 1 shows that IRR databases cover more IPv4 address space than RPKI, but RPKI grew faster than IRR, having doubled its coverage over the past 6 years.

Figure 1. IPv4 coverage of IRR and RPKI databases. RADB, the largest IRR database, has records representing almost 60% of routable IPv4 address space. In contrast, the RPKI covers almost 30% of that address space but has been steadily growing in the past few years.

Checking the consistency of IRR records

 

We classified IRR records following the procedure in Figure 2. First, we check whether a Route Origin Authorization (ROA) record in RPKI covers the IRR record. If one does, we check whether the ASN is consistent. Finally, if the ASN is consistent, we compare the prefix length with the maximum length attribute of the RPKI records. This procedure yields 4 categories:

  1. Not In RPKI: The prefix in the IRR record has no matching or covering prefix in RPKI.
  2. Inconsistent ASN: The IRR record has matching RPKI records, but none has the same ASN.
  3. Inconsistent Max Length: The IRR record has the same ASN as its matching RPKI records, but the prefix length in the IRR record is larger than the Max Length field in the RPKI records.
  4. Consistent: The IRR record has a matching RPKI record with the same ASN and a prefix length within the Max Length attribute.

Figure 2. Classification of IRR records
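The classification procedure can be sketched in a few lines. This is a minimal illustration rather than the code used in the paper: the function name and the ROA tuple layout (prefix, max length, ASN) are assumptions made for the example.

```python
import ipaddress


def classify_irr_record(irr_prefix, irr_asn, roas):
    """Classify one IRR route record against a list of ROAs.

    roas: list of (roa_prefix, max_length, asn) tuples (hypothetical layout).
    """
    net = ipaddress.ip_network(irr_prefix)
    covering = []
    for roa_prefix, max_length, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # A ROA covers the IRR prefix if the IRR prefix equals or falls inside it.
        if roa_net.version == net.version and net.subnet_of(roa_net):
            covering.append((max_length, asn))
    if not covering:
        return "Not In RPKI"
    same_asn = [max_length for max_length, asn in covering if asn == irr_asn]
    if not same_asn:
        return "Inconsistent ASN"
    if any(net.prefixlen <= max_length for max_length in same_asn):
        return "Consistent"
    return "Inconsistent Max Length"


roas = [("10.0.0.0/16", 20, 64500)]
print(classify_irr_record("10.0.0.0/20", 64500, roas))
print(classify_irr_record("10.0.1.0/24", 64500, roas))
print(classify_irr_record("10.0.0.0/20", 64501, roas))
print(classify_irr_record("10.1.0.0/16", 64500, roas))
```

The order of the checks mirrors the procedure above: coverage first, then ASN, then prefix length against Max Length.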

Which is more consistent? RADB vs RIPE, APNIC, AFRINIC

 

As of October 2021, we found that only 38% of RADB records with matching ROAs were consistent with RPKI, meaning that RADB contained more inconsistent records than consistent ones (see Figure 3, left). In contrast, 73%, 98%, and 93% of RIPE, APNIC, and AFRINIC IRR records were consistent with RPKI, a much higher consistency than RADB (see Figure 3, right).

 

We attribute the big difference in consistency to a few reasons. First, the IRR databases we collected from the RIRs are their respective authoritative databases, meaning the RIRs manage all the prefixes and verify the registration of IRR objects against address ownership information. This verification process is stricter than that of RADB and leads to higher-quality IRR records. Second, APNIC provides its registrants a management platform that automatically creates IRR records for a network when it registers its prefixes in RPKI. This platform contributes to a larger number of consistent records compared to other RIRs.

 

Figure 3. RIR-managed IRR databases have higher consistency with RPKI compared to RADB.

 

What caused the inconsistency?

In our analysis we found that inconsistent max length was mostly caused by IRR records that are too specific, as in the example shown in Figure 4, and to a lesser extent by a misconfigured max length attribute in RPKI. We also found that inconsistent ASN records are largely caused by customer networks failing to remove records after returning address space to their provider network, as in the example in Figure 5.

Inconsistent Max Length (Figure 4)

  • 713 caused by misconfigured RPKI Max Length.
  • 39,968 caused by too-specific IRR record.

Figure 4. IRR record with inconsistent max length: the IRR prefix length exceeds the RPKI max length value.

 

Inconsistent ASN (Figure 5)

  • 4,464 caused by customer network failing to remove RADB records after returning address space to provider network.

Figure 5. IRR record with inconsistent ASN: the IRR record ASN differs from the RPKI record ASN.

 

To Improve IRR accuracy

 

Although RPKI is becoming more widely deployed, we do not see a decrease in IRR usage, and therefore we should improve the accuracy of information in the IRR. We suggest that networks keep their IRR information up to date and IRR database providers implement policies that promote good IRR hygiene.

Networks currently using IRR for route filtering can avoid the negative impact of inaccurate IRR information by using IRRd version 4, which validates IRR information against RPKI, to ignore incorrect IRR records.

Response to NSTC’s JCORE

May 19th, 2021 by David Clark and kc claffy

In January 2020, kc claffy, CAIDA Director and UCSD Adjunct Professor of Computer Science and Engineering, responded with collaborator David Clark, Senior Scientist at MIT’s Computer Science and Artificial Intelligence Laboratory, to the Request for Information (RFI) put out by the National Science and Technology Council’s (NSTC) Joint Committee on the Research Environment (JCORE). The response establishes the critical importance of the Internet to the infrastructure of society. It argues that governments, and specifically the U.S. government, need to send a strong signal to the private sector through high-level policy making: the only path to understanding the characteristics of the Internet is data sharing, and responsible sharing of data for documented scientific research will not generate corporate liability.

Another focus and benefit of the policy we call for is the development and delivery of academic training for professionals to work with large data sets focused on communications and networking.

You can see the complete response and more related material posted in the CAIDA resource catalog.

Guiding principles for a “Bureau of Cyber Statistics”

April 24th, 2021 by David Clark and kc claffy

The recent Cyberspace Solarium Commission report (1) set out a strategic plan to improve the security of cyberspace. Among its many recommendations is that the government establish a Bureau of Cyber Statistics, to provide the government with the information that it needs for informed planning and action. A recent report from the Aspen Institute echoed this call. (2) Legal academics and lobbyists have already started to consider its structure. (3) The Internet measurement community needs to join this conversation.

The Solarium report proposed some specific characteristics: they recommend a bureau located in the Department of Commerce, and funded and authorized to gather necessary data. The report also says that “the center should be funded and equipped to host academics as well as private sector and independent security researchers as a part of extended exchanges”. We appreciate that the report acknowledges the value of academic researchers and that this objective requires careful thought to achieve. The report specifically mentions “purchasing private or proprietary data repositories”. Will “extended exchanges” act as the only pattern of access, where an academic would work under a Non-Disclosure Agreement (NDA), unable to publish results that relied on proprietary data? Would this allow graduate students to participate, i.e., how would they publish a thesis? The proposal does not indicate deep understanding of how academic research works. As an illustrative example, CAIDA/UCSD and MIT were hired by AT&T as “independent measurement experts” to propose and oversee methods for AT&T to satisfy FCC reporting requirements imposed as a merger condition. (4) AT&T covered all the data we received by an NDA, and we were not able to publish any details about what we learned. This sort of work does not qualify as academic research. It is consulting.

In our view, the bureau must be organized in such a way that academics are able and incentivized to utilize the resources of the bureau for research on questions that motivate the creation of the bureau in the first place. But this requires that when the U.S. government establishes the bureau, it makes apparent the value of academic participation and the modes of operation that will allow it.

These reports focus on cybersecurity, and indeed, security is the most prominent national challenge of the Internet. But the government needs to understand many other issues related to the character of the Internet ecosystem, many of which are inextricably related to security. We cannot secure what we do not understand, and we cannot understand what we do not measure. Measurement of the Internet raises epistemological challenges that span many disciplines, from network engineering and computer science to economics, sociology, ethics, law, and public policy. The following guiding principles can help accommodate these challenges, and the sometimes conflicting incentives across academic, government, commercial, and civic stakeholders.

  1. Incentivize academic participation. A national infrastructure must be organized in such a way that academics are able and incentivized to utilize its resources. This requires designing and implementing modes of operation that will incentivize independent researcher participation.
  2. Demonstrate innovation and value through real projects that address national-scale problems with data-intensive science and engineering research. To justify substantial U.S. government investment in cyberinfrastructure, the research community must demonstrate its value as an independent voice with important results that help to inform the future of the Internet. This demonstration will not be effective if it is hypothetical. Real projects are tricky, because the data does not necessarily exist yet, and if it does, may be proprietary. So researchers must overcome the chicken-and-egg problem of how to demonstrate the value of an independent research community before the Bureau exists.
  3. Start with public data and shared community infrastructure. The starting point must be to work with public data, and translate research results into forms that are meaningful to a constituency broader than the research community. But this path reveals more specific barriers: Who would fund such research? What are the incentives of the academic research community to undertake it? Yet if we do not recognize and overcome this challenge, the independent research community may essentially be written out of the story, as more and more data is proprietary and hidden away.
  4. Make specific and concrete calls for data of national importance. In our view, the community needs a focal point for discussion about collection and use of data, presenting an opportunity and responsibility to transform abstract calls for access to data into more specific and concrete articulations.
  5. Prioritize a framework for research access to proprietary data. Sharing of proprietary data must address the reasons that the data is considered proprietary. Understanding these reasons is required to design approaches that allow reasonable access for research purposes.
  6. Integrate a focus on workforce training, with metrics to evaluate those efforts. The other risk of continuing on the current path, rather than confronting the data access problem, is the lost opportunity to train students to interpret complex operational data about Internet infrastructure, which is crucial to developing a globally competitive U.S. cybersecurity workforce capable of securing Internet infrastructure.

Other parts of the globe have moved to regularize cybersecurity data, and they have explicitly recognized the importance of engaging and sustaining the academic research establishment in developing cybersecurity tools to secure network infrastructure (5). If the U.S. does not take coherent steps to support its research community, that community risks being sidelined in shaping the future of the Internet. The European Union’s proposed regulation for Digital Services (6) also discusses the importance of ensuring access to proprietary data by the academic research community:

Investigations by researchers on the evolution and severity of online systemic risks are particularly important for bridging information asymmetries and establishing a resilient system of risk mitigation, informing online platforms, Digital Services Coordinators, other competent authorities, the Commission and the public. This Regulation therefore provides a framework for compelling access to data from very large online platforms to vetted researchers.

They clarify what they mean by “vetted researchers”:


In order to be vetted, researchers shall be affiliated with academic institutions, be independent from commercial interests, have proven records of expertise in the fields related to the risks investigated or related research methodologies, and shall commit and be in a capacity to preserve the specific data security and confidentiality requirements corresponding to each request.

This regulation emphasizes a structure that allows the academic community to work with proprietary data, sending an important signal that the EU intends to make its academic research establishment a recognized part of shaping the future of the Internet. The U.S. needs to take a similar proactive stance.


References

  1. Cyberspace Solarium Commission report
  2. The Aspen Institute: A National Cybersecurity Agenda for Digital Infrastructure
  3. Lawfare: Considerations for the Structure of the Bureau of Cyber Statistics
  4. CAIDA: First Amended Report of AT&T Independent Measurement Expert: Reporting requirements and measurement methods
    CAIDA: Report of AT&T Independent Measurement Expert Background and supporting arguments for measurement and reporting requirements
  5. Directive of the European Parliament and of the Council on measures for a high common level of cybersecurity across the Union, repealing Directive (EU) 2016/1148
  6. Regulation of the European Parliament and of the Council on a Single Market For Digital Services

Unintended consequences of submarine cable deployment on Internet routing

December 15th, 2020 by Roderick Fanou, Ricky Mok, Bradley Huffaker and kc

Figure 1: This picture shows a line of floating buoys that designate the path of the long-awaited SACS (South-Atlantic Cable System). This submarine cable now connects Angola to Brazil (Source: G Massala, https://www.menosfios.com/en/finally-cable-submarine-sacs-arrived-to-brazil/, Feb 2018.)

The network layer of the Internet routes packets regardless of the underlying communication media (Wi-Fi, cellular telephony, satellites, or optical fiber). The underlying physical infrastructure of the Internet includes a mesh of submarine cables, generally shared by network operators who purchase capacity from the cable owners [2,11]. As of late 2020, over 400 submarine cables interconnect continents worldwide and constitute the oceanic backbone of the Internet. Although they carry more than 99% of international traffic, little academic research has isolated the end-to-end performance changes induced by their launch.

In mid-September 2018, Angola Cables (AC, AS37468) activated the SACS cable, the first trans-Atlantic cable traversing the Southern hemisphere [1][A1]. SACS connects Angola in Africa to Brazil in South America. Most assume that the deployment of undersea cables between continents improves Internet performance between the two continents. In our paper, “Unintended consequences: Effects of submarine cable deployment on Internet routing”, we shed empirical light on this hypothesis by investigating the operational impact of SACS on Internet routing. We presented our results at the Passive and Active Measurement Conference (PAM) 2020, where the work received the best paper award [11,7,8]. Here we summarize the contributions of our study, including our methodology, data collection, and key findings.

[A1]  Note that in the same year, Camtel (CM, AS15964), the incumbent operator of Cameroon, and China Unicom (CN, AS9800) deployed the 5,900 km South Atlantic Inter Link (SAIL), which links Fortaleza to Kribi (Cameroon) [17], but this cable was not yet lit as of March 2020.
