Archive for the 'Commentaries' Category

Developing active Internet measurement software locally to run on Ark

Wednesday, January 24th, 2024 by Matthew Luckie

In the first part of our blog series, we introduced our brand-new python module for scamper, the packet-prober underpinning much of Ark’s ongoing measurements. One aspect that we highlighted was the ability for potential users of Ark to develop their code locally, before running it on the Ark platform. When I develop measurement applications, I use a couple of local Raspberry Pis and my own workstation to get the code correct, and then copy the code to the CAIDA system to run the experiment using available Ark vantage points. The goal of this blog article is to describe different ways to locally develop your measurement experiment code.

Example #1: Starting small with one scamper process.

The easiest way to begin is with one scamper process running on the same system where you develop your python code. With scamper installed (we recommend that you use a package listed on the scamper website), start a scamper process, and make it available for measurement commands on a Unix domain socket. For example, you might run scamper as follows:

$ scamper -U /tmp/scamper -p 100

This will create a Unix domain socket to drive scamper at /tmp/scamper, and tell scamper that it can probe at up to 100 packets/second. You can adjust these parameters to what is appropriate locally.

You can then develop and debug your measurement code in Python. To use this scamper process, your Python code might begin as follows:

01 from scamper import ScamperCtrl
03 # use the scamper process available at /tmp/scamper
04 ctrl = ScamperCtrl(unix="/tmp/scamper")
06 # do a simple ping to and print the outcome
07 o = ctrl.do_ping("")
08 if o.min_rtt is not None:
09   print(f"{o.min_rtt.total_seconds()*1000):.1f} ms")
10 else:
11   print("no reply")

Example #2: Coordinating measurements among VPs.

Once you are comfortable using the python module with a single local scamper instance, you might want to test your code with multiple scamper instances, each representing a distinct vantage point. The scamper software includes the sc_remoted interface to support that. sc_remoted has features for authenticating endpoints with TLS, but you might choose to initially operate endpoints without the complexity of TLS.

sc_remoted listens on a port for inbound scamper connections, and makes Unix domain sockets — one for each VP — available in a nominated directory. The best idea is to create an empty directory for these sockets. You might run sc_remoted as follows:

$ mkdir -p /path/to/remote-sockets
$ sc_remoted -U /path/to/remote-sockets -P 50265

The first command creates the directory, because sc_remoted will not create that directory for you. The second command starts sc_remoted listening on port 50265 for incoming scamper connections, and will place Unix domain sockets in /path/to/remote-sockets as they arrive.  Note, we use /path/to as a placeholder to the actual path in your local file system that is appropriate for your environment; you might put these sockets in a directory somewhere in your home directory, for example.

Then, on the systems that you want to act as vantage points, the following command:

$ scamper -p 100 -R -M

will (1) start a scamper process, (2) tell it that it can probe at up to 100 packets-per-second, (3) connect it to the specified IP address and port to receive measurement commands from, and (4) tell it to identify itself as “” to the remote controller. If you go into /path/to/remote-sockets, you might see the following:

$ cd /path/to/remote-sockets
$ ls -l
total 0
srwx------ 1 mjl mjl 0 Jan 22 16:57

This socket represents the scamper process you just started. The filename begins with, the parameter that you gave to scamper to identify itself. After the dash is the IP address and port number that the remote controller observed the remote system coming from. You can connect as many additional scamper instances as you like, and you will see them listed in the directory individually. You should name each differently with something meaningful to you (, bar.baz, etc) so that you can identify them on the system on which you’re writing your python code.

The python code we wrote in Example #1 above might be modified as follows:

01 import sys
02 from scamper import ScamperCtrl
04 if len(sys.argv) != 2:
05   print("specify path to unix domain socket")
06   sys.exit(-1)
08 # use the remote scamper process available at the specified location
09 ctrl = ScamperCtrl(remote=sys.argv[1])
11 # do a simple ping to and print the outcome
12 o = ctrl.do_ping("")
13 if o.min_rtt is not None:
14   print(f"{o.min_rtt.total_seconds()*1000):.1f} ms")
15 else:
16   print("no reply")

And run as:

$ python /path/to/remote-sockets/\:12369

If you have multiple remote-sockets in the directory, you can add them individually, or use all sockets in the directory. For example:

01 import sys
02 from datetime import timedelta
03 from scamper import ScamperCtrl
05 if len(sys.argv) != 3:
06   print("usage: $dir $ip")
07   sys.exit(-1)
09 ctrl = ScamperCtrl(remote_dir=sys.argv[1])
10 for i in ctrl.instances():
11   ctrl.do_ping(sys.argv[2], inst=i)
13 min_rtt = None
14 min_vp = None
15 for o in ctrl.responses(timeout=timedelta(seconds=10)):
16   if o.min_rtt is not None and (min_rtt is None or min_rtt > o.min_rtt):
17     min_rtt = o.min_rtt
18     min_vp = o.inst
20 if min_rtt is not None:
21   print(f"{} {(min_rtt.total_seconds()*1000):.1f} ms")
22 else:
23   print(f"no responses for {sys.argv[2]}")

and run this command:

$ python /path/to/remote-sockets

We encourage you to reach out via email if you have questions about using the module. In the first instance, you can email ark-info at

CAIDA contributions to ACM’s Internet Measurement Conference (IMC) 2023

Tuesday, November 14th, 2023 by CAIDA Webmaster

ACM’s Internet Measurement Conference (IMC) is an annual highly selective venue for the presentation of Internet measurement and analysis research. The average acceptance rate for papers is around 25%. CAIDA researchers co-authored three papers and one poster that was be presented at the IMC conference in Montreal, Quebec on October 26 – 28, 2023. We link to these publications below.

On the Importance of Being an AS: An Approach to Country-Level AS Rankings
Bradley Huffaker, Alexander Marder, Romain Fontugne, kc claffy. ACM Internet Measurement Conference (IMC), 2023.
Recent geopolitical events demonstrate that control of Internet infrastructure in a region is critical to economic activity and defense against armed conflict. This geopolitical importance necessitates novel empirical techniques to assess which countries remain susceptible to degraded or severed Internet connectivity because they rely heavily on networks based in other nation states. Currently, two preeminent BGP-based methods exist to identify influential or market-dominant networks on a global scale-network-level customer cone size and path hegemony–but these metrics fail to capture regional or national differences.

We adapt the two global metrics to capture country-specific differences by restricting the input data for a country-specific metric to destination prefixes in that country. Although conceptually simple, our study required tackling methodological challenges common to most Internet measurement research today, such as geolocation, incomplete data, vantage point access, and lack of ground truth. Restricting public routing data to individual countries requires substantial downsampling compared to global analysis, and we analyze the impact of downsampling on the robustness and stability of our country-specific metrics. As a measure of validation, we apply our country-specific metrics to case studies of Australia, Japan, Russia, Taiwan, and the United States, illuminating aspects of concentration and interdependence in telecommunications markets. To support reproducibility, we will share our code, inferences, and data sets with other researchers.

IRRegularities in the Internet Routing Registry
Ben Du, Gautam Akiwate, Cecilia Testart, Alex C. Snoeren, kc claffy, Katherine Izhikevich, Sumanth Rao. ACM Internet Measurement Conference (IMC), 2023.
The Internet Routing Registry (IRR) is a set of distributed databases used by networks to register routing policy information and to validate messages received in the Border Gateway Protocol (BGP). First deployed in the 1990s, the IRR remains the most widely used database for routing security purposes, despite the existence of more recent and more secure alternatives. Yet, the IRR lacks a strict validation standard and the limited coordination across diferent database providers can lead to inaccuracies. Moreover, it has been reported that attackers have begun to register false records in the IRR to bypass operators’ defenses when launching attacks on the Internet routing system, such as BGP hijacks. In this paper, we provide a longitudinal analysis of the IRR over the span of 1.5 years. We develop a workflow to identify irregular IRR records that contain conflicting information compared to different routing data sources. We identify 34,199 irregular route objects out of 1,542,724 route objects from November 2021 to May 2023 in the largest IRR database and find 6,373 to be potentially suspicious.

Coarse-grained Inference of BGP Community Intent
Thomas Krenc, Alexander Marder, Matthew Luckie, kc claffy. ACM Internet Measurement Conference (IMC), 2023.
BGP communities allow operators to influence routing decisions made by other networks (action communities) and to annotate their network’s routing information with metadata such as where each route was learned or the relationship the network has with their neighbor (information communities). BGP communities also help researchers understand complex Internet routing behaviors. However, there is no standard convention for how operators assign community values, and significant efforts to scalably infer community meanings have ignored this high-level classification. We discovered that doing so comes at a significant cost in accuracy, of both inference and validation. To advance this narrow but powerful direction in Internet infrastructure research, we design and validate an algorithm to execute this first fundamental step: inferring whether a BGP community is action or information. We applied our method to 78,480 community values observed in public BGP data for May 2023. Validating our inferences (24,376 action and 54,104 informational communities) against available ground truth (6,259 communities) we find that our method classified 96.5% correctly. We found that the precision of a state-of-the-art location community inference method increased from 68.2% to 94.8% with our classifications. We publicly share our code, dictionaries, inferences, and datasets to enable the community to benefit from them.

CAIDA also contributed to one extended abstract:

Empirically Testing the PacketLab Model
Tzu-Bin Yan, Zesen Zhang, Bradley Huffaker, Ricky K. P. Mok, kc claffy, Kirill Levchenko. ACM Internet Measurement Conference (IMC) Poster, 2023.
PacketLab is a recently proposed model for accessing remote vantage points. The core design is for the vantage points to export low-level network operations that measurement researchers could rely on to construct more complex measurements. Motivating the model is the assumption that such an approach can overcome persistent challenges such as the operational cost and security concerns of vantage point sharing that researchers face in launching distributed active Internet measurement experiments. However, the limitations imposed by the core design merit a deeper analysis of the applicability of such model to real-world measurements of interest. We undertook this analysis based on a survey of recent Internet measurement studies, followed by an empirical comparison of PacketLab-based versus native implementations of common measurement methods. We showed that for several canonical measurement types common in past studies, PacketLab yielded similar results to native versions of the same measurements. Our results suggest that PacketLab could help reproduce or extend around 16.4% (28 out of 171) of all surveyed studies and accommodate a variety of measurements from latency, throughput, network path, to non-timing data.

A summary of the EU’s Digital Services Act and the role of academic research

Tuesday, August 29th, 2023 by Elena Yulaeva

A new working paper from the CAIDA GMI3S project summarizes key aspects of the European Union’s Digital Services Act (DSA), providing insight into how the EU plans to regulate large online platforms and services.

The DSA is intended to create a safer and more transparent online environment for consumers by imposing new obligations and accountability on companies like Meta, Google, Apple, and Amazon. The regulations target issues like illegal content, disinformation, political manipulation, and harmful algorithms. Systemic societal risks, like disinformation and political influence campaigns, are a major focus of the regulations.

Some key takeaways from the DSA overview:

  • Very large online platforms and search engines face the most stringent requirements, like risk assessment audits, transparency reporting, and independent oversight.
  • The rules apply to any company offering services within the EU market, regardless of where they are based. This prevents tech giants from circumventing the law.
  • The Act establishes clear legal definitions and penalties around illegal content like hate speech, terrorist propaganda, and child sexual abuse material.
  • Covered platforms must assess and mitigate these and other systemic threats. They are expected to assess how the design and implementation of their service, and manipulation and use of it, including violations of terms of service, contribute to such risks.
  • Providers serving advertising must clearly identify its sponsor (who paid for it on and on whose behalf), and the main parameters used to its target audience, and where applicable, how to change them.
  • Providers of recommendation systems must provide the main parameters including the most important criteria determining what is recommended, and the reasons for the importance of these criteria.
  • A critical component is requiring platforms to provide access to data for vetted independent researchers studying systemic risks and harms. This enables evidence-based analysis of problems and solutions.

The DSA signals that the EU is taking a hard line on enforcing accountability and responsible practices for dominant online platforms that have largely operated without oversight. The success of the regulations will depend on effective coordination and enforcement across the EU’s member states. However the Act provides a potential model for balancing innovation and consumer protection in the digital marketplace. Consumers may benefit from transparency and accountability of manipulative algorithms, more control over data, and protections against online harms.

Importantly, the data access mandate signals the EU’s commitment to leveraging academic expertise in shaping a healthier digital ecosystem. Researchers will be able to investigate systemic issues like algorithmic bias, political polarization, misinformation dynamics, and the mental health impacts of social media.

This DSA working paper provides valuable context on this ambitious attempt to regulate the digital economy.

To read further:

CAIDA’s 2022 Annual Report

Monday, July 10th, 2023 by kc

The CAIDA annual report summarizes CAIDA’s activities for 2022 in the areas of research, infrastructure, data collection and analysis. The executive summary is excerpted below:

Hoiho API (Holistic Orthography of Internet Hostname Observations)

Monday, February 13th, 2023 by Bradley Huffaker

In December 2021, CAIDA published a method and system to automatically learn rules that extract geographic annotations from router hostnames. This is a challenging problem, because operators use different conventions and different dictionaries when they annotate router hostnames. For example, in the following figure, operators have used IATA codes (“iad”, “was”), a CLLI prefix (“asbnva”), a UN/LOCODE (“usqas”), and even city names (“ashburn”, “washington”) to refer to routers in approximately the same location — Ashburn, VA, US. Note that “ash” (router #4) is an IATA code for Nashua, NH, US, that the operators of and used to label routers in Ashburn, VA, US. Some operators also encoded the country (“us”) and state (“va”).

Our system, Hoiho, released as open-source as part of scamper, uses CAIDA’s Macroscopic Internet Topology Data Kit (ITDK) and observed round trip times to infer regular expressions that extract these apparent geolocation hints from hostnames. The ITDK contains a large dataset of routers with annotated hostnames, which we used as input to Hoiho for it infer rules (encoded as regular expressions) that extract these annotations. CAIDA has released these inferred rulesets in recent ITDKs.

Today, CAIDA is launching an API ( and web front end ( which returns extracted geographic locations from a user-provided list of DNS names. The API uses the rules that CAIDA infers with each ITDK. For embedded IATA, UN/LOCODE, and city names, the API returns the city name and a lat/long representing the location. For embedded CLLI codes, the API returns the CLLI code; please contact iconectiv for a dictionary that maps CLLI codes to locations.

Try the API out, and let us know if you find it useful!

[HOIHO] Luckie, M., Huffaker, B., Marder, A., Bischof, Z., Fletcher, M., and claffy, k., 2021. “Learning to Extract Geographic Information from Internet Router Hostnames.” ACM SIGCOMM Conference on emerging Networking EXperiments and Technologies (CoNEXT),

New CAIDA Prefix-to-AS Mapping Data Set

Monday, November 14th, 2022 by Bradley Huffaker

Since May 9th, 2005, CAIDA has produced a data set that maps IPv4 prefixes (and later also IPv6 prefixes) to the AS (Autonomous System) originating that prefix into the global BGP routing system, as observed via a single BGP data collector of the Route Views data collection system. We have called this data set “RouteViews Prefix to AS”. We used CAIDA’s straighten_rv script to filter the RIB (routing information base file used as input data. We will discontinue this data set on December 31st, 2022 an replace it with a new more complete data set that we call CAIDA’s Prefix-to-AS data set.

CAIDA will use the BGPStream software package (and in particular the bgpview library) to include data from all available BGP collectors from both of the primary global publicly available collection systems: Route Views and RIPE NCC Routing Information Service. We will backfill Prefix-to-AS data to 2000. As part of this transition, CAIDA will no longer use straighten_rv to preprocess AS paths. We will create two files: an annotated file with all the data observed in BGP, and a simple file that filters out data of no interest to many researchers as described below.

Annotated files. The annotated file will include information about the stability and visibility of prefixes by different peers and collectors. Individuals who wish to produce a more refined mapping can fairly easily filter this data. The table below compares the older “Routeviews2” (a single Route Views collector) and the new annotated CAIDA Prefix-to-AS dataset (all collectors from both RIPE RIS and Route Views) for 1 June 2022. Most (99.6%) ASes and (87.2%) prefixes appeared in both datasets. Note that multiple ASNs announced the prefix, we exclude it since it covers the entire IPv4 address space.

ASN filtered Routeviews2 only Routeviews+RIPE both total
Multiorigin/set 128 4.10% 1552 49.73% 1441 46.17% 3121
public 0 0.00% 295 0.40% 73294 99.60% 73589
reserved X 0 0.00% 1379 88.97% 171 11.03% 1550
Prefix filtered Routeviews2 only Routeviews+RIPE both total
larger then /8 X 0 0.00% 1 100.00% 0 0.00% 1
private X 0 0.00% 504 84.85% 90 15.15% 594
public 0 0.00% 138498 12.81% 942469 87.19% 1080967

Simple files. The simple file will exclude very large prefixes, e.g., with mask lengths < 8, private addresses (RFC 1918), or prefixes announced exclusively by reserved ASNs (Special-Purpose ASN). The resulting simple prefix-to-ASN mapping covers 99.7% of the address space captured by the annotated file. In the table below (also reflecting 1 June 2022), 0.94% of prefixes and 0.42% of addresses had an additional origin AS that was not also observed in the Routeviews2-only dataset. This reflects the expanded visibility of more collectors and peer. 4.92% of CAIDA’s prefixes and 1.82% of addresses were not covered by Routeviews2-only prefix2as. Overall the combined data set provides visibility of 5.86% of prefixes and 2.24% of addresses not covered by routeviews2-only.

CAIDA’s Prefix to AS “simple” (99.7% of addresses observed in annotated files)

ASN type prefixes addressses
source agreement Routeviews2
number group % all % number group % all %
both different multiorigin multiorigin 626 11.43% 0.11% 1241088 9.65% 0.04%
public multiorigin 4816 87.95% 0.82% 11442617 88.93% 0.37%
set multiorigin 34 0.62% 0.01% 183039 1.42% 0.01%
5476 100.00% 0.94% 12866744 100.00% 0.42%
both same multiorigin multiorigin 9869 1.79% 1.69% 12609229 0.42% 0.41%
public public 540032 98.20% 92.45% 2988739528 99.58% 97.35%
set set 8 0.00% 0.00% 9216 0.00% 0.00%
549909 100.00% 94.14% 3001357973 100.00% 97.76%
Routeviews+RIPE N/A multiorigin 1884 6.55% 0.32% 908601 1.63% 0.03%
public 26856 93.44% 4.60% 54919321 98.37% 1.79%
set 2 0.01% 0.00% 2816 0.01% 0.00%
28742 100.00% 4.92% 55830738 100.00% 1.82%

You can find the new CAIDA Prefix-to-AS Mapping Data Set here.

CAIDA contributions to ACM’s Internet Measurement Conference (IMC) 2022

Tuesday, October 18th, 2022 by Elena Yulaeva

ACM’s Internet Measurement Conference (IMC) is an annual highly selective venue for the presentation of Internet measurement and analysis research. The average acceptance rate for papers is around 25%. CAIDA researchers co-authored five papers and 3 posters that will be presented at this conference in Nice, France on October 25 – 27, 2022. We link to these publications below.

Investigating the impact of DDoS attacks on DNS infrastructure. Rafaele Sommese, KC Claffy, Roland van Rijswijk-Deij, Arnab Chattopadhyay, Alberto Dainotti, Anna Sperotto, and Mattijs Jonker. 2022.  This paper describes a newly developed scalable method to map DDoS attacks targeting or affecting DNS infrastructure. The measurements reveal evidence that millions of domains experienced  DDoS attacks during the recent 17-month observation window. Most attacks did not observably harm DNS performance, but in some cases, a 100-fold increase in DNS resolution time was observed. This research corroborates the value of known best practices to improve DNS resilience to attacks, including the use of anycast and topological redundancy in nameserver infrastructure.

Mind Your MANRS: Measuring the MANRS Ecosystem. Ben Du, Cecilia Testart, Romain Fontugne, Gautam Akiwate, Alex C. Snoeren, and kc claffy. 2022. Mutually Agreed on Norms on Routing Security (MANRS) is an industry-led initiative to improve Internet routing security by encouraging participating networks to implement a set of recommended actions. The goal of the paper is to evaluate the current state of the MANRS initiative in terms of its participants, their routing behavior, and its impact on the broader routing ecosystem, and discuss potential improvements. The findings confirm that MANRS participants are more likely to follow best practices than other similar networks on the Internet. However, within MANRS, not all networks take the MANRS mandate with the same rigor. This study demonstrates the need to continually assess the conformance of members for the prosperity of the MANRS initiative, and the challenges in automating such conformance checks.

Retroactive Identification of Targeted DNS Infrastructure HijackingGautam Akiwate, Rafaele Sommese, Mattijs Jonker, Zakir Durumeric, kc Claffy, Geofrey M. Voelker, and Stefan Savage. 2022. DNS infrastructure tampering attacks are particularly challenging to detect because they can be very short-lived, bypass the protections of TLS and DNSSEC, and are imperceptible to users. Identifying them retroactively is further complicated by the lack of fine-grained Internet-scale forensic data. This paper is the first attempt to make progress toward this latter goal. Combining a range of longitudinal data from Internet-wide scans, passive DNS records, and Certificate Transparency logs, we have constructed a methodology for identifying potential victims of sophisticated DNS infrastructure hijacking and have used it to identify a range of victims (primarily government agencies). The authors analyze possible best practices in terms of their measurability by third parties, including a review of DNS measurement studies and available data sets.

Stop, DROP, and ROA: Effectiveness of Defenses through the lens of DROPLeo Oliver, Gautam Akiwate, Matthew Luckie, Ben Du, and kc claffy. 2022. Malicious use of the Internet address space has been a persistent threat for decades. Multiple approaches to prevent and detect address space abuse include the use of blocklists and the validation against databases of address ownership such as the Internet Routing Registry (IRR) databases and the Resource Public Key Infrastructure (RPKI). The authors undertook a study of the effectiveness of these routing defenses through the lens of one of the most respected blocklists on the Internet: Spamhaus’ Don’t Route Or Peer (DROP) list. The authors show that attackers are subverting multiple defenses against malicious use of address space, including creating fraudulent Internet Routing Registry records for prefixes shortly before using them. Other attackers disguised their activities by announcing routes with spoofed origin ASes consistent with historic route announcements. The authors quantify the substantial and actively-exploited attack surface in unrouted address space, which warrants reconsideration of RPKI eligibility restrictions by RIRs, and reconsideration of AS0 policies by both operators and RIRs.

Where .ru? Assessing the Impact of Conflict on Russian Domain Infrastructure Mattijs Jonker, Gautam Akiwate, Antonia Afnito, kc claffy, Alessio Botta, Geofrey M. Voelker, Roland van Rijswijk-Deij, and Stefan Savage. 2022. The hostilities in Ukraine have driven unprecedented forces, both from third-party countries and in Russia, to create economic barriers. In the Internet, these manifest both as internal pressures on Russian sites to (re-)patriate the infrastructure they depend on (e.g., naming and hosting) and external pressures arising from Western providers disassociating from some or all Russian customers. This paper describes longitudinal changes in the makeup of naming, hosting, and certificate issuance for domains in the Russian Federation due to the war in Ukraine.


CAIDa also contributed to three extended abstracts:

“Observable KINDNS: Validating DNS Hygiene.” Sommese, Raffaele, Mattijs Jonker, kc claffy. ACM Internet Measurement Conference (IMC) Poster, 2022.

“PacketLab – Tools Alpha Release and Demo. Yan, Tzu-Bin, Yuxuan Chen, Anthea Chen, Zesen Zhang, Bradley Huffaker, Ricky K. P. Mok, Kirill Levchenko, kc claffy. ACM Internet Measurement Conference (IMC) Poster, 2022.

“A Scalable Network Event Detection Framework for Darknet Traffic.”Gao, Max, Ricky K. P. Mok, kc claffy. ACM Internet Measurement Conference (IMC) Poster, 2022.

CAIDA’s 2021 Annual Report

Monday, May 30th, 2022 by kc

The CAIDA annual report summarizes CAIDA’s activities for 2021 in the areas of research, infrastructure, data collection and analysis. Our research projects span: Internet cartography and performance; security, stability, and resilience studies; economics; and policy. Our infrastructure, software development, and data sharing activities support measurement-based Internet research, both at CAIDA and around the world, with focus on the health and integrity of the global Internet ecosystem.
The executive summary is excerpted below:

IRR Hygiene in the RPKI Era

Friday, April 1st, 2022 by Ben Du

The Border Gateway Protocol (BGP) is the protocol that networks use to exchange (announce) routing information across the Internet. Unfortunately, BGP has no mechanism to prevent the propagation of false announcements such as hijacks and misconfigurations. The Internet Route Registry (IRR) and Resource Public Key Infrastructure (RPKI) both emerged as different solutions to improve routing security and operation in the Border Gateway Protocol (BGP) by allowing networks to register information and develop route filters based on information other networks have registered.

The Internet Routing Registry (IRR) was first introduced in 1995 and remained a popular tool for BGP route filtering. However, route origin information in the IRR suffers from inaccuracies due to the lack of incentive for registrants to keep information up to date and the use of non-standardized validation procedures across different IRR database providers.

Over the past few years, the Resource Public Key Infrastructure (RPKI), a system providing cryptographically attested route origin information, has seen steady growth in its deployment and has become widely used for Route Origin Validation (ROV) among large networks.

Some networks are unable to adopt RPKI filtering due to technical or administrative reasons and continue using only existing IRR-based route filtering. Such networks may not be able to construct correct routing filters due IRR inaccuracies and thus compromise routing security.

In our paper IRR Hygiene in the RPKI Era, we at CAIDA (UC San Diego), in collaboration with MIT, study the scale of inaccurate IRR information by quantifying the inconsistency between IRR and RPKI. In this post, we will succinctly explain how we compare records and then focus on the causes of such inconsistencies and provide insights on what operators could do to keep their IRR records accurate.

IRR and RPKI trends

For our study we downloaded IRR data from 4 IRR database providers: RADB, RIPE, APNIC, and AFRINIC, and RPKI data from all Trust Anchors published by the RIPE NCC. Figure 1 shows IRR cover more IPv4 address space than RPKI, but RPKI grew faster than IRR, having doubled its coverage over the past 6 years.

Figure 1. IPv4 coverage of IRR and RPKI databases. RADB, the largest IRR database, has records representing almost 60% of routable IPv4 address space. In contrast, the RPKI covers almost 30% of that address space but has been steadily growing in the past few years.

Checking the consistency of IRR records


We classified IRR records following the procedure in Figure 2: first we check if there is a Route Origin Authorization (ROA) record in RPKI covering the IRR record, then in case there is one if the ASN is consistent, and finally, if the ASN is consistent, we check the prefix length compared to the maximum length attribute of RPKI records. Using this procedure we are left with 4 categories:

  1. Not In RPKI: If the prefix in an IRR record has no matching or covering prefix in RPKI.
  2. Inconsistent ASN: If the IRR record has matching RPKI records but none has the same ASN..
  3. Inconsistent Max Length category: If the IRR record has the same ASN as its matching RPKI records but the prefix length in the IRR record is larger than the Max Length field in the RPKI records.
  4. Consistent: If the IRR records have a matching RPKI record with the same ASN and according maximum length attribute.

Figure 2. Classification of IRR records

Which is more consistent? RADB vs RIPE, APNIC, AFRINIC


As of October 2021, we found only 38% of RADB records with matching ROAs were consistent with RPKI, meaning that there were more inconsistent records than consistent records in RADB, see Figure 3 (left) . In contrast, 73%, 98%, and 93% of RIPE, APNIC, and AFRINIC IRR records were consistent with RPKI, showing a much higher consistency than RADB, see Figure 3 (right).


We attribute the big difference in consistency to a few reasons. First, the IRR database we collected from the RIRs are their respective authoritative databases, meaning the RIRs manages all the prefixes, and verifies the registration of IRR objects with address ownership information. This verification process is stricter than that of RADB and leads to the higher quality of IRR records. Second, APNIC provides its registrants a management platform that automatically creates IRR records for a network when it registers its prefixes in RPKI. This platform contributes to a larger number of consistent records compared to other RIRs.


Figure 3. RIR-managed IRR databases have higher consistency with RPKI compared to RADB.


What caused the inconsistency?

In our analysis we found that inconsistent max length was mostly caused by IRR records that are too specific, as the example shown in Figure 4, and to a lesser extent by misconfigured max length attribute in RPKI.  We also found that inconsistent ASN records are largely caused by customer networks failing to remove records after returning address space to their provider network, such as the example in Figure 5.

Inconsistent Max Length (Figure 4)

  • 713 caused by misconfigured RPKI Max Length.
  • 39,968 caused by too-specific IRR record.

Figure 4. IRR record with inconsistent max length: the IRR prefix length exceeds the RPKI max length value.


Inconsistent ASN (Figure 5)

  • 4,464 caused by customer network failing to remove RADB records after returning address space to provider network.

Figure 5. IRR record with inconsistent ASN: the IRR record ASN differs from the RPKI record ASN.


To Improve IRR accuracy


Although RPKI is becoming more widely deployed, we do not see a decrease in IRR usage, and therefore we should improve the accuracy of information in the IRR. We suggest that networks keep their IRR information up to date and IRR database providers implement policies that promote good IRR hygiene.

Networks currently using IRR for route filtering can avoid the negative impact of inaccurate IRR information by using IRRd version 4, which validates IRR information against RPKI, to ignore incorrect IRR records.

Response to NSTC’s JCORE

Wednesday, May 19th, 2021 by David Clark and kc claffy

A year ago January 2020, k claffy, CAIDA Director and UCSD Adjunct Professor of Computer Science and Engineering responded with collaborator David Clark, Senior Scientist at MIT’s Computer Science and Artificial Intelligence Laboratory to the Request for Information (RFI) put out by the National Science and Technology Council’s (NSTC) Joint Committee on the Research Environment (JCORE). The response establishes the critical importance of the internet to the infrastructure of society and the need for governments, and specifically the U.S. government, to send a strong signal to the private sector through high-level policy making that the only path to understanding the characteristics of the internet comes via data sharing and that responsible sharing of data for documented scientific research will not generate corporate liability.

Another focus and benefit to the policy for which we call comes with the development and delivery of academic training of professionals to work with large data sets focused on communications and networking.

You can see the complete response and more related material posted in CAIDA resource catalog.