According to the Best Available Data

CAIDA’s Annual 2024 Report

August 15th, 2025 by kc

The CAIDA annual report summarizes CAIDA’s activities for 2024 in the areas of research, infrastructure, data collection and analysis. The executive summary is excerpted below:

Read the rest of this entry »

Posted in Commentaries | No Comments »

New Developments in 100 GB Anonymized Passive Traces

March 18th, 2025 by Elena Yulaeva

We want to share important updates regarding our 100 GB anonymized passive trace initiative. The Los Angeles–San Jose link has been upgraded out of the range of this monitoring capability, so the monitor has transitioned to capture traffic traces on a different link.

1. Transition to a New 100 GB Link

Beginning in January 2025, we shifted our trace collection to a 100 GB link between Los Angeles and Dallas. This change opens up opportunities to study network dynamics across different infrastructures and geographies. For additional details, please visit https://catalog.caida.org/dataset/passive_2025_pcap_100g.

2. Complementary Datasets Based on User Feedback

Based on a survey of 100 GB anonymized trace users, we have created and shared two complementary datasets to better serve the research community (1) a compilation of metadata statistics about the trace; and (2) a smaller (5-second) subset of data, which is less than 1GB of data compared to the full one-hour capture of about 600GB.

Passive 100G Metadata Dataset: a compilation of statistics about the traffic trace
This publicly available dataset provides key statistics for our restricted anonymized data. It includes:
trace date and time; trace duration (hours, minutes, and seconds); total packets and bytes captured; mean packet rate (packets per second); mean bit rate (bits per second); and mean link load as a fraction of the nominal maximum link capacity. You can access this dataset here.
Restricted Anonymized Two-Way Traffic Packet Header Traces Sampler
Part of the 2024 Anonymized Traces 100 GB dataset, this resource consists of a 5-second snapshot of bidirectional traffic captured in November 2024. This dataset allows researchers to evaluate the usability of the data before committing to downloading larger volumes. The size of this sampler dataset is less than 1 GB, making it a lightweight option for quick assessments. This sampler dataset is available here.

Posted in Commentaries | No Comments »

Observing the DDoS Landscape Requires Collaboration

December 21st, 2024 by Raphael Hiesgen

Distributed denial-of-service (DDoS) attacks are an ever-present phenomenon on the Internet. Over the years, many organizations and groups have undertaken efforts to reduce the feasibility and effectiveness of DDoS attacks, such as by disabling attack vectors (e.g., NTP’s get monlist), deploying source address validation (ingress & egress filtering), and enlisting law enforcement (booter takedowns). In addition, an industry of DDoS protection companies sells attack mitigation services. While these approaches have had some impact–who knows how dire the situation would be without such efforts?–DDoS remains a persistent threat.

A clear understanding and view of the DDoS landscape is the basis for developing and improving countermeasures. Our recent study comparatively evaluated long-term DDoS trends in academia and industry to better understand the current limitations. We focused on two classes of DDoS attacks: direct-path (DP) attacks and reflection-amplification (RA) attacks. In a direct-path attack, packets are sent directly to the target of the attack. One group of DP attacks establishes connections to abuse application layer protocols, while others use randomly spoofed source addresses. In a reflection-amplification attack, requests are spoofed to contain the source address of the attack target and sent to a reflective third party service (e.g., DNS), which then sends the replies to the victim.

Collecting DDoS Datasets

Our study analyzed longitudinal DDoS trends across academia and industry. We collected 10 datasets from seven observatories listed in Table 1. Each observatory shared 4.5 years of weekly attack counts for our long-term trend analysis. The observatories from academia additionally shared raw DDoS event data, which enabled us to analyze the visibility of targets across observatories. We further collected and analyzed 24 DDoS threat reports from 22 companies for the year 2022. We published the detailed analysis as an artifact at https://ddoscovery.github.io.

Observatory	Type	Coverage	DP Attack Trends	RA Attack Trends
UCSD NT	Network Telescope	12M IPs	Increase 🔺	(not applicable)
ORION NT	Network Telescope	500k IPs	Increase 🔺	(not applicable)
Netscout Atlas	On-path Network	Proprietary	Increase 🔺	Increase 🔺
Akamai Prolexic	On-path Network	Proprietary	Neutral 🔴	Neutral 🔴
IXP Blackholing	On-path Network	Proprietary	Increase 🔺	Decrease 🔻
AmpPot	Honeypot	~30 IPs	(not applicable)	Neutral 🔴
Hopscotch	Honeypot	65 IPs	(not applicable)	Decrease 🔻
Industry Reports	PDF/website/etc.	22 Companies	Increase 🔺	Increase 🔺 and Decrease 🔻

Long-term Attack Trends Depend on the Viewpoint

Our analysis of attack trends revealed that even observatories that agree on long-term trends (Table 1) exhibit many differences in short-term patterns, reflecting different views of the DDoS landscape. For the analysis, we normalized the weekly attack counts to the median of the first 15 weeks. We plot the exponentially weighted moving average (EWMA) with a 12-week window and linear regressions starting in 2019 and ending in 2022.

Direct-path Attack Trends

Both network telescopes (Fig. 1) observed an increase in attacks during the measurement period. They repeatedly saw short peaks that at least tripled attack counts, but did not coincide across both observatories. ORION saw its largest peaks in 2022Q1 and Q2, with smaller peaks in 2019Q2 and mid-2021. In contrast, UCSD saw its largest peak in 2023, with small peaks in each year. While ORION observed a decline in 2023 compared to 2022, UCSD trends remained positive.

Figure 1 a): The long-term trends of (randomly-spoofed) direct-path attacks observed by UCSD NT.

Figure 1 b): The long-term trends of (randomly-spoofed) direct-path attacks observed by ORION NT

The time series from one of our industry observatories (Figure 2) did not show large peaks. Netscout Atlas (Fig. 2a) experienced stable growth, except in 2021. Akamai Prolexic (Fig. 2b) fluctuated around its baseline with a slight decrease in attacks overall. Both companies likely have stable customer bases and are less affected by sudden bursts in attacks. Both companies saw a rise in attacks in 2020, followed by a decline in 2021. Netscout saw a rise throughout 2022, while Akamai saw peaks in 2022 but no persistent increase. Both companies detected a rise in attacks in 2023.

Figure 2 a): Long-term trends of direct-path attacks observed by Netcout Atlas.

Figure 2 b): Long-term trends of direct-path attacks observed by Akamai Prolexic.

Reflection-amplification Attack Trends

The honeypots in our study (Fig. 3) showed a significant increase in attacks in 2020 after a decline in 2019Q4. Hopscotch recorded most attacks early in 2020, while AmpPot saw its highest peaks later, coinciding with a decline in attack counts at Hopscotch. Both HPs detected a continued decline in 2021, aligning with industry efforts to deploy SAV (see discussion in the paper, Section 2.3). While both time series shared a peak in mid-2022, the peak was much more pronounced in the Hopscotch data.

Figure 3 a): Long-term trends of reflection-amplification attacks observed by Hopscotch. The red dashed lines mark DDoS takedown efforts by law enforcement.

Figure 3 b): Long-term trends of reflection-amplification attacks observed by AmpPot. The red dashed lines mark DDoS takedown efforts by law enforcement.

Akamai Prolexic (Fig. 4 a) experienced only small variations in attacks until 2020Q3 before they surged above 2x its baseline in 2021Q1. This peak coincided with a peak in the IXP time series (Fig. 4 b). However, the IXP already saw a steep rise in attacks starting in 2019Q4, with peaks in 2021Q1 and Q2. Both time series declined until the end of 2022, with more pronounced peaks in the Akamai time series. They both detected an increase in attacks in 2023 but had a neutral to negative trend overall.

Figure 4 a): Long-term trends of reflection-amplification attacks observed by Akamai Prolexic. The red dashed lines mark DDoS takedown efforts by law enforcement.

Figure 4 b): Long-term trends of reflection-amplification attacks observed in IXP Blackholing. The red dashed lines mark DDoS takedown efforts by law enforcement.

Booter takedowns by law enforcement. We marked known booter takedowns by law enforcement with red dashed lines in Figures 3 and 4. Booters offer DDoS-as-a-service usually relying on reflection-amplification attacks. The first takedown in late 2022Q4 led to immediate, small valleys in all graphs. In contrast, the 2023Q2 takedown did not affect the AmpPot time series (Fig. 3 b). Instead, attack counts increased. While we do not know how trends would have evolved without interference, the impact on DDoS trends appears limited in our time series.

Why do Views on DDoS Differ?

We investigated the cause of these differences by comparing DDoS targets across observatories (Section 7 in our paper). The analysis revealed that our four observatories from academia saw a substantial share of targets that were not seen by the other three. While the overlap among observatories of the same type, i.e., either honeypots or network telescopes, was considerable, each observatory provided a unique view into the DDoS attack landscape. This highlights the limitations of individual datasets and the root cause of different views of the DDoS landscape. Overlap between observatories from academia and industry was similarly limited. Thus, cooperation with industry partners is a valuable–and potentially necessary–source for improved visibility.

Recommendations

Our analysis of 10 longitudinal datasets from 7 observatories revealed the limited view that researchers have as a basis for DDoS research. Without an accurate view we can neither accurately plan actions nor evaluate their outcome.

Advice for researchers: DDoS research tries to make global inferences based on local views. Accepting and acknowledging the limitations of available datasets is important for accurate interpretation and comparison. When possible, researchers should analyze multiple datasets, which generally requires collaboration with operators. In parallel, stakeholders across academia and industry need to converge on specific frameworks for data sharing to facilitate comparison. Unexplored details include the definition of incidents and their impact, data formats to accommodate comparisons, disclosure control technologies, and access policies to allow rigorous independent analyses.

Advice for threat-intelligence companies: Collaborate with researchers. Gathering reliable data on DDoS attacks is challenging. Getting additional data from different vantage points–especially those that academia usually has no access to–is invaluable for researchers. We found that many DDoS reports are only available after providing email addresses and are not archived for long-term access. Lowering the effort to read reports and archiving historical reports increases visibility and perspective on trends over time. Since language is often not consistent across companies and since vantage points and methodologies differ, comparisons to previous reports from the same company are especially relevant to analyze long-term changes in the DDoS landscape.

Advice for operators: Spoofing is an integral mechanism abused in many DDoS attacks, including all reflection-amplification attacks and a significant subset of direct-path attacks. Source address validation (SAV) is an effective tool to stop these attacks. Supporting ongoing research and extending measurement systems to quantify the deployment of SAV and reveal persistent sources of spoofed packets is a challenging but worthwhile undertaking.

Let’s collaborate to achieve a comprehensive view of DDoS!

Paper Reference: Raphael Hiesgen, Marcin Nawrocki, Marinho Barcellos, Daniel Kopp, Oliver Hohlfeld, Echo Chan, Roland Dobbins, Christian Doerr, Christian Rossow, Daniel R. Thomas, Mattijs Jonker, Ricky Mok, Xiapu Luo, John Kristoff, Thomas C. Schmidt, Matthias Wählisch, KC Claffy, The Age of DDoScovery: An Empirical Comparison of Industry and Academic DDoS Assessments, In: Proc. of ACM Internet Measurement Conference (IMC), p. 259–279, ACM : New York, 2024. https://doi.org/10.1145/3646547.3688451

Posted in Commentaries | No Comments »

AS Reachability Visualization

December 4th, 2024 by Bradley Huffaker

The AS Reach Visualization provides a geographic breakdown of the number of ASes reachable through an AS’s customer, peers, providers, or an unknown neighbor. The interactive interface to the visualization can be found at https://www.caida.org/catalog/media/visualizations/as-reach/

Independent networks (Autonomous Systems, or ASes) engage in typically voluntary bilateral interconnection (“peering”) agreements to provide reachability to each other for some subset of the Internet. The implementation of these agreements introduces a non-trivial set of constraints regarding paths over which Internet traffic can flow, with implications for network operations, research, and evolution. Realistic models of Internet topology, routing, workload, and performance
must account for the underlying economic dynamics.

Although these business agreements between ISPs can be complicated, the original model introduced by Gao (On inferring autonomous system relationships in the Internet), abstracts business relationships into the following three most common types:

customer-to provider: in which a customer network gets access to the internet from a provider network
provider-to-customer: the reverse of the customer-to-provider, the provider provides access to it’s customer
peer-to-peer: where both ASes exchange traffic between their customers

An AS’s Reach is defined as the set of ASes the target AS can reach through its customers, peers, providers, or unknown. The Customer Reach is the set of ASes reachable through the AS’s customers. The Peer Reach is the set of ASes that are not in the Customer Reach, but reachable through the AS’s peers. The Provider Reach is the set of ASes not in the Customer or Peer Reach, but reachable through the AS’s provider. The Unknown Reach is the set of ASes not in the Customer, Peer, or Provider Reach. More details at https://www.caida.org/catalog/media/visualizations/as-reach/

Posted in Commentaries, Routing, Topology, Visualization | No Comments »

CAIDA’s 2023 Annual Report

October 23rd, 2024 by kc

The CAIDA annual report (quite a bit later than usual this year due to an unprecedented level of activity in 2024 which we will report on earlier next year!) summarizes CAIDA’s activities for 2023 in the areas of research, infrastructure, data collection and analysis. The executive summary is excerpted below:
Read the rest of this entry »

Posted in Commentaries, Data Collection, Domain Name System (DNS), Economics, Future, Geolocation, Measurement, Meetings, Network Telescope, Performance, Policy, Review, Routing, Security, Topology, Updates, Visualization | No Comments »

Streamlining Access to BGP Routing Data

October 7th, 2024 by Elena Yulaeva

Users can now request access to the CAIDA BGP2GO (https://bgp2go.caida.org) platform. BGP2GO lets users find the MRT files that contain a specific resource and thus avoid the download and processing of unrelated data. Users can compile a customize list of relevant MRT files, share that exact list with others, or stream the matching MRT files (e.g., using BGPStream).

Finding the needle in the haystack

Public BGP route collectors receive update messages from over 1,000 BGP routers worldwide. These updates are archived and made available for download and analysis. However, the data is organized in a way that often requires downloading vast amounts of unrelated information making it increasingly difficult to focus on the needles in the haystack.

Imagine you’re a network operator announcing a new prefix and you want to analyze its propagation. You may not know exactly which collector or MRT file contains the relevant data, forcing you to download and sift through unnecessary information.

BGP2GO solves this problem by allowing users to easily select only the files they need, saving significant time and effort.

We have developed a comprehensive index of all prefixes, ASNs, and communities across RouteViews update files (https://archive.routeviews.org), along with BGP2GO, a user interface that allows you to easily select the specific files you need (https://bgp2go.caida.org)

Use Case: Analyzing Prefix Propagation with PEERING Testbed, BGP2GO, and BGPStream

Let’s walk through a real-world example. Suppose you’re a network operator advertising a prefix and want to examine how it propagates across RouteViews collectors. Using the PEERING testbed (https://peering.ee.columbia.edu/), we performed a controlled advertisement of the prefix 184.164.246.0/24 during August 2024.

Step 1: Looking up the prefix and setting filters

In this case, we search for the prefix 184.164.246.0/24 in BGP2GO, filtering for data from August 2024. The platform identifies 33,251 announcements and withdrawals, spread across 746 files (2.32GB) from 26 collectors. This curated selection lets us focus only on the data we need, saving time and resources.

https://bgp2go.caida.org/details?pre=184.164.246.0/24&years=2024&months=8

Step 2: Stream selected files in the terminal

Once the relevant MRT files are selected, you can stream them for further processing using BGPStream. Clicking the BGPSTREAM button in the top right corner, right above the “collectors” chart gives instructions on how to stream the files in your terminal.

For this example, we use the following command (see step 4a above) in the terminal to process the relevant files:

bgpreader -k 184.164.246.0/24 -d csvfile -o csv-file="bgp2go.csv"

This command leverages bgpreader to read the files and output the lines that pertain to the resource (prefix 184.164.246.0/24) . The following screenshot shows an excerpt of routes related to the prefix 184.164.246.0/24 extracted from all files that contain this prefix in August 2024.

To learn more about the bgpreader command and its options, visit the BGPStream website (https://bgpstream.caida.org/) .

Getting Access to BGP2GO

To request access to the BGP2GO platform, a user should first create an account with the CAIDA Services Single Sign On (SSO) system ( https://auth.caida.org/realms/CAIDA/account ) by providing basic information and undergoing authentication via Keycloak. After authentication, a user can request access to the BGP2Go platform (which requires a CAIDA-authorized account with bgp2go-api:read role) by going to https://bgp2go.caida.org/ and filling out the request form. If you have any questions or problems, please contact us at data-info@caida.org

Posted in Review | No Comments »

Seeking Beta Users for 100 GB link Anonymized Passive Traces

August 11th, 2024 by Elena Yulaeva

We are seeking beta users for our new Anonymized Two-Way Passive Trace dataset, captured on a 100 GB link between Los Angeles and San Jose. Beginning in April 2024, we have been capturing a one-hour trace each month. To protect privacy, we strip all packet payloads after the layer 4 headers, and anonymize IP (v4 and v6) addresses with CryptoPan. The monthly data is provided in two separate files, one for each direction of traffic.

This dataset includes the following metadata fields:

Monitor Name
Year and month (including a link to a graphical display of breakup by protocol, application, and country)
Start time of trace (UTC)
Stop time of trace (UTC)
Number of IPv4 packets
Number of IPv6 packets
Unknown packets (as a fraction of the total number of packets)
Transmission rate in packets per second
Transmission rate in bits per second
Link load (as a fraction of the nominal maximum load for a 100 GB link)
Average packet size (bytes) (including a link to a graph of the packet size distribution).

The data is stored in our Swift OpenStack object storage. Each one-directional anonymized pcap file captured monthly is approximately 1TB in size, so users will need more than 2TB of space to download the entire one-hour capture. For those without access to such storage and/or processing capacity, contact us and we will discuss other alternatives. We are also releasing statistical information for each hourly trace.

Academic researchers can request access to the data by filling out and submitting the request form.

We will prioritize users who:

Have significant experience with network traffic analysis
Demonstrate a clear plan for how they will use the dataset
Can commit to regular feedback and participation throughout the beta testing period

Posted in Review | No Comments »

Help CAIDA Refine and Enhance the FANTAIL Traceroute Analytics platform.

August 9th, 2024 by Elena Yulaeva

We are excited to announce the beta testing phase of the Facilitating Advances in Network Topology Analysis (FANTAIL) platform (https://www.caida.org/projects/fantail/), a cutting-edge topology query system designed to search vast archives of raw Internet end-to-end path (traceroute) measurement data. FANTAIL is poised to support and advance various research domains within the Computer and Information Science and Engineering (CISE) field that heavily rely on the emerging sub-discipline of Internet cartography. Key areas of focus include:

Understanding the intricate ownership and interconnection structures and dynamics of Internet infrastructure.
Exploring methods for device identification and characterization within the digital landscape.
Enhancing the ability to detect and respond to network outages and route hijacking incidents.
Investigating network congestion patterns and their impact on data flow and quality of service.
Identifying and mitigating vulnerabilities within network topologies.

FANTAIL consists of four components:

Interactive Web Interface: FANTAIL Web Interface
Application Programming Interface (API): Built on web standards (FANTAIL API Documentation)
Full-Text Search System
Big Data Processing System

The system’s central data type is the traceroute path, representing the inferred IP-level Internet path that network traffic would take between two hosts, from the measurement vantage point to the destination, as determined with the traceroute technique by scamper (https://catalog.caida.org/software/scamper). FANTAIL leverages annotated and indexed data generated through the utilization of Spark, SQLite, and Elasticsearch, originating from CAIDA Internet traceroute probing data dating back to 2015.

Academic researchers interested in accessing the platform can request access by emailing fantail-info@caida.org.

We will prioritize users who can commit to regular feedback and participation throughout the beta testing period.

Posted in Review | 1 Comment »

Understanding the deployment of public recursive resolvers

May 6th, 2024 by Matthew Luckie

This is the third in a series of essays (two earlier blog posts [1, 2]) about CAIDA’s new effort to reduce the barrier to performing a variety of Internet measurements.

We were recently asked about running Trufflehunter, which infers the usage properties of rare domain names on the Internet by cache snooping public recursive resolvers, on Ark. The basic idea of Trufflehunter is to provide a lower bound of the use of a domain name by sampling caches of large recursive resolvers. One component of Trufflehunter is to identify the anycast instances that would answer a given vantage point’s DNS queries. Trufflehunter uses a series of TXT queries to obtain the anycast instance of a given public recursive resolver, and considers four large public recursive resolvers: Cloudflare (1.1.1.1), Google (8.8.8.8), Quad9 (9.9.9.9), and OpenDNS (208.67.220.220). The original paper describes the queries in section 4.1, which we summarize below, highlighting the interesting parts of each response with blue color.

Cloudflare returns an airport code representing the anycast deployment location used by the VP:$ host -c ch -t txt id.server 1.1.1.1 id.server descriptive text "AKL"

Google returns an IP address representing the anycast deployment used by the VP, which can be mapped to an anycast deployment location with a second query:
$ host -t txt o-o.myaddr.l.google.com 8.8.8.8 o-o.myaddr.l.google.com descriptive text "172.253.218.133" o-o.myaddr.l.google.com descriptive text "edns0-client-subnet <redacted>/24"
$ host -t txt locations.publicdns.goog locations.publicdns.goog descriptive text "34.64.0.0/24 icn " ... "172.253.218.128/26 syd " "172.253.218.192/26 cbf " ...

Quad9 returns the hostname representing the resolver that provides the answer:
$ host -c ch -t txt id.server 9.9.9.9 id.server descriptive text "res100.akl.rrdns.pch.net"

and finally, OpenDNS returns a bunch of information in a debugging query, which includes the server that handles the query:
$ host -t txt debug.opendns.com 208.67.220.220 debug.opendns.com descriptive text "server r2004.syd" debug.opendns.com descriptive text "flags 20040020 0 70 400180000000000000000007950800000000000000" debug.opendns.com descriptive text "originid 0" debug.opendns.com descriptive text "orgflags 2000000" debug.opendns.com descriptive text "actype 0" debug.opendns.com descriptive text "source <redacted>:42845"

Each of these queries requires a slightly different approach to extract the location of the anycast instance. Our suggestion is for experimenters to use our newly created python library, which provides programmatic access to measurement capabilities of Ark VPs (described in two earlier blog posts [1, 2]). The code for querying the recursive resolvers from all VPs is straight forward, and is shown below. First, on lines 15-27, we get the Google mapping, so that we can translate the address returned to an anycast location. This query requires TCP to complete, as the entry is larger than can fit in a UDP payload (the response is 9332 bytes at the time of writing this blog). This query is synchronous (we wait for the answer before continuing, because the google queries that follow depend on the mapping) and is issued using a randomly selected Ark VP. Then, on lines 29-38, we issue the four queries to each of the anycasted recursive resolvers from each VP. These queries are asynchronous; we receive the answers as each Ark VP obtains a response. On lines 40-82, we process the responses, storing the results in a multi-dimensional python dictionary, which associates each Ark VPs with their anycast recursive resolver location, as well as the RTT between asking the query and obtaining the response. Finally, on lines 84-96, we dump the results out in a nicely formatted table that allows us to spot interesting patterns.

01 import argparse 02 import datetime 03 import ipaddress 04 import random 05 import re 06 from scamper import ScamperCtrl 07 08 def _main(): 09 parser = argparse.ArgumentParser(description='get public recursive locs') 10 parser.add_argument('sockets') 11 args = parser.parse_args() 12 13 ctrl = ScamperCtrl(remote_dir=args.sockets) 14 15 # pick an ark VP at random to issue the query that gets the mapping 16 # of google recursive IP to location 17 goog_nets = {} 18 obj = ctrl.do_dns('locations.publicdns.goog', 19 inst=random.choice(ctrl.instances()), 20 qtype='txt', tcp=True, sync=True) 21 if obj is None or len(obj.ans_txts()) == 0: 22 print("could not get google mapping") 23 return 24 for rr in obj.ans_txts(): 25 for txt in rr.txt: 26 net, loc = txt.split() 27 goog_nets[ipaddress.ip_network(net)] = loc 28 29 # issue the magic queries to get the instance that answers the query 30 for inst in ctrl.instances(): 31 ctrl.do_dns('o-o.myaddr.l.google.com', server='8.8.8.8', qtype='txt', 32 attempts=2, wait_timeout=2, inst=inst) 33 ctrl.do_dns('id.server', server='1.1.1.1', qclass='ch', qtype='txt', 34 attempts=2, wait_timeout=2, inst=inst) 35 ctrl.do_dns('id.server', server='9.9.9.9', qclass='ch', qtype='txt', 36 attempts=2, wait_timeout=2, inst=inst) 37 ctrl.do_dns('debug.opendns.com', server='208.67.220.220', qtype='txt', 38 attempts=2, wait_timeout=2, inst=inst) 39 40 # collect the data 41 data = {} 42 for obj in ctrl.responses(timeout=datetime.timedelta(seconds=10)): 43 vp = obj.inst.name 44 if vp not in data: 45 data[vp] = {} 46 dst = str(obj.dst) 47 if dst in data[vp]: 48 continue 49 data[vp][dst] = {} 50 data[vp][dst]['rtt'] = obj.rtt 51 52 for rr in obj.ans_txts(): 53 for txt in rr.txt: 54 if dst == '8.8.8.8': 55 # google reports an IPv4 address that represents 56 # the site that answers the query. we then map 57 # that address to a location using the mapping 58 # returned by the locations TCP query. 59 try: 60 addr = ipaddress.ip_address(txt) 61 except ValueError: 62 continue 63 for net, loc in goog_nets.items(): 64 if addr in net: 65 data[vp][dst]['loc'] = loc 66 break 67 elif dst == '1.1.1.1': 68 # Cloudflare replies with a single TXT record 69 # containing an airport code 70 data[vp][dst]['loc'] = txt 71 elif dst == '9.9.9.9': 72 # Quad9 reports a hostname with an embedded 73 # airport code. 74 match = re.search("\\.(.+?)\\.rrdns\\.pch\\.net", txt) 75 if match: 76 data[vp][dst]['loc'] = match.group(1) 77 elif dst == '208.67.220.220': 78 # opendns reports multiple TXT records; we want the one 79 # that looks like "server r2005.syd" 80 match = re.search("^server .+\\.(.+?)$", txt) 81 if match: 82 data[vp][dst]['loc'] = match.group(1) 83 84 # format the output 85 print("{:15} {:>13} {:>13} {:>13} {:>13}".format( 86 "# vp", "google", "couldflare", "quad9", "opendns")) 87 for vp, recs in sorted(data.items()): 88 line = f"{vp:15}" 89 for rec in ('8.8.8.8', '1.1.1.1', '9.9.9.9', '208.67.220.220'): 90 cell = "" 91 if 'loc' in recs[rec]: 92 rtt = recs[rec]['rtt'] 93 loc = recs[rec]['loc'] 94 cell = f"{loc} {rtt.total_seconds()*1000:5.1f}" 95 line += f" {cell:>13}" 96 print(line) 97 98 if __name__ == "__main__": 99 _main()

The code runs quickly — no longer than 10 seconds (if one of the VPs is slow to report back), illustrating the capabilities of the Ark platform.

The output of running this program is shown below. We have highlighted cells where the RTT was at least 50ms larger than the minimum RTT to any of the large recursive resolvers for the given VP. These cells identify low-hanging fruit for operators, who could examine BGP routing policies with the goal of selecting better alternative paths.

# vp google couldflare quad9 opendns abz-uk.ark lhr 24.4 LHR 16.5 lhr 16.9 lon 16.4 abz2-uk.ark lhr 23.4 MAN 8.7 man 8.4 man1 8.5 acc-gh.ark jnb 246.7 JNB 167.3 acc 0.5 cpt1 236.6 adl-au.ark mel 28.1 ADL 5.5 syd 22.6 mel1 14.0 aep-ar.ark scl 29.4 EZE 8.0 qaep2 4.6 sao1 34.0 aep2-ar.ark scl 38.2 EZE 6.6 qaep2 5.7 sao1 42.5 akl-nz.ark syd 26.9 AKL 2.6 akl2 4.0 syd 26.1 akl2-nz.ark syd 29.2 AKL 4.9 akl2 4.0 syd 26.9 ams-gc.ark grq 5.0 FRA 6.3 ams 0.8 ams 1.1 ams3-nl.ark grq 6.0 AMS 2.1 ams 1.3 ams 1.9 ams5-nl.ark grq 5.1 AMS 1.2 ams 1.9 ams 1.7 ams7-nl.ark grq 6.0 AMS 2.4 ams 1.8 ams 2.3 ams8-nl.ark grq 14.8 AMS 11.5 fra 17.7 ams 11.8 arn-se.ark lpp 9.7 ARN 2.0 arn 0.8 cph1 10.5 asu-py.ark scl 55.4 EZE 22.6 asu 0.8 sao1 57.0 atl2-us.ark atl 9.8 DFW 23.6 qiad3 20.5 atl1 4.6 atl3-us.ark atl 22.7 ATL 13.1 atl 17.0 atl1 19.7 aus-us.ark dfw 80.7 DFW 56.6 dfw 46.6 dfw 17.6 avv-au.ark mel 142.3 MEL 9.4 syd 22.6 mel1 8.5 bcn-es.ark mad 38.0 BCN 13.3 bcn 0.5 mad1 9.8 bdl-us.ark iad 13.3 EWR 5.0 lga 4.9 ash 10.6 bed-us.ark iad 30.6 BOS 15.2 bos 13.5 bos1 19.8 beg-rs.ark mil 62.5 BEG 1.2 beg 0.7 otp1 12.9 bfi-us.ark dls 11.7 SEA 3.8 xsjc1 96.9 sea 10.5 bjl-gm.ark bru 87.5 CDG 71.1 bjl 6.7 cdg1 73.4 bna-us.ark atl 11.4 BNA 9.0 atl 9.5 atl1 9.3 bna2-us.ark cbf 25.5 ORD 19.7 ord 12.8 chi 12.8 bos6-us.ark iad 17.6 BOS 5.6 qiad3 15.5 bos1 5.7 bre-de.ark grq 21.4 TXL 23.1 ber 17.2 ams 23.7 btr-us.ark dfw 20.3 ORD 29.5 dfw 13.2 dfw 11.4 bwi2-us.ark iad 12.7 IAD 4.4 qiad3 3.6 rst1 3.1 cdg-fr.ark mil 24.9 MRS 1.5 mrs 1.0 mrs1 11.6 cdg3-fr.ark bru 8.5 CDG 3.9 ams 13.7 cdg1 5.5 cgs-us.ark iad 4.5 EWR 7.9 iad 2.2 ash 2.1 cjj-kr.ark hkg 63.4 qhnd2 70.7 hkg 47.3 cld4-us.ark lax 22.7 LAX 15.7 bur 104.1 lax 14.3 cld5-us.ark lax 52.6 LAX 20.7 bur 103.1 lax 21.9 cld6-us.ark lax 14.5 LAX 5.2 qlax1 7.0 lax 6.0 cos-us.ark cbf 22.9 DEN 7.3 ord 36.0 den1 12.6 dar-tz.ark jnb 224.6 NBO 14.0 dar 1.0 jnb 51.1 dar2-tz.ark jnb 223.0 DAR 0.9 dar 0.9 jnb 48.0 dmk-th.ark sin 32.2 SIN 28.0 bkk 2.4 nrt2 95.0 dtw2-us.ark cbf 18.1 DTW 4.6 dtw 4.2 chi 10.7 dub-ie.ark lhr 18.9 DUB 1.2 dub 0.7 dub1 0.7 dub2-ie.ark lhr 18.7 DUB 1.8 dub 1.1 dub1 1.2 dub3-ie.ark lhr 35.5 ZRH 45.6 dub 16.9 dub1 11.6 ens-nl.ark grq 9.3 AMS 5.2 ams 4.0 ams 4.9 eug-us.ark dls 15.7 SJC 18.3 sea 6.5 sea 6.5 fra-gc.ark fra 9.1 FRA 1.1 fra 1.1 fra 1.1 gig-br.ark gru 125.2 GIG 2.2 qrio1 1.4 rio1 1.1 gva-ch.ark ZRH 5.3 gva 1.0 fra 10.8 gye-ec.ark chs 128.7 MIA 66.8 quio2 10.7 mia 77.1 ham-de.ark grq 19.7 HAM 8.5 ber 11.0 fra 17.7 her2-gr.ark mil 64.9 ath 7.5 mil1 32.6 hkg4-cn.ark tpe 13.6 HKG 1.0 qhkg3 4.9 hkg 9.5 hkg5-cn.ark hkg 16.3 HKG 2.6 qhkg3 3.9 hkg 3.6 hlz2-nz.ark syd 40.9 AKL 16.9 akl 13.8 syd 38.2 hnd-jp.ark nrt 6.5 NRT 3.4 qhnd2 80.3 nrt 3.6 hnl-us.ark dls 82.3 HNL 1.1 sea 75.2 sea 75.3 iev-ua.ark waw 45.5 FRA 29.8 qwaw2 14.3 wrw1 14.9 igx2-us.ark iad 16.1 EWR 16.9 qiad3 11.5 atl1 11.8 ind-us.ark atl 25.1 IND 1.1 ord 8.5 atl1 11.0 ixc-in.ark del 78.8 DEL 0.8 qsin1 0.7 mum2 0.7 jfk-us.ark iad 9.6 EWR 1.5 lga 0.6 nyc 1.0 kgl-rw.ark jnb 231.6 NBO 15.0 kgl 3.3 jnb 73.6 ktm-np.ark del 98.6 DEL 30.4 ktm 7.6 mum1 94.4 las-us.ark lax 15.5 LAX 8.4 pao 15.9 lax 8.2 lax3-us.ark lax 10.5 LAX 3.0 bur 2.0 lax 2.0 lcy2-uk.ark lhr 11.5 MAN 7.4 lhr 11.2 lon 2.2 lej-de.ark fra 21.2 FRA 9.7 fra 12.5 fra 9.6 lex-us.ark iad 17.9 IAD 15.7 iad 17.3 atl1 25.9 lgw-uk.ark lhr 31.3 LHR 24.2 lhr 24.6 lon 23.6 lhe2-pk.ark dia 195.0 KHI 35.0 qsin4 113.1 sin 130.6 lis-pt.ark mad 33.0 LIS 4.5 lis 3.9 mad1 18.0 lke2-us.ark dls 21.7 SEA 21.4 sea 15.1 sea 11.7 lun-zm.ark jnb 189.9 JNB 32.2 jnb 23.3 jnb 23.3 lwc-us.ark tul 18.0 MCI 1.4 xsjc1 106.3 dfw 10.4 lwc2-us.ark dfw 24.4 DFW 15.7 qlax1 46.9 dfw 16.5 mdw-us.ark cbf 25.7 ORD 7.0 ord 3.4 chi 4.4 med2-co.ark mrn 96.4 MIA 50.7 qbog1 22.7 mia 48.9 mhg-de.ark fra 30.7 FRA 19.1 fra 21.4 fra 18.4 mia-gc.ark mrn 19.7 MIA 1.1 mia 1.0 mia 0.6 mnl-ph.ark hkg 87.9 MNL 1.8 qsin1 50.8 hkg 75.8 mnz-us.ark iad 13.4 IAD 5.8 qiad3 5.2 rst1 4.0 mru-mu.ark jnb 206.1 JNB 43.1 mru 1.0 jnb 43.3 msy-us.ark dfw 34.4 DFW 21.9 qlax1 55.2 dfw 23.0 mty-mx.ark tul 44.1 MFE 4.5 qiad3 39.6 dfw 16.7 muc-de.ark fra 30.3 FRA 9.1 fra 8.8 fra 8.9 muc3-de.ark zrh 25.9 MUC 6.1 fra 11.0 fra 12.2 nap2-it.ark mil 37.8 MXP 19.8 fra 38.0 mil1 15.2 nbo-ke.ark jnb 220.5 NBO 2.4 nbo 2.2 jnb 60.7 nic-cy.ark mil 61.3 LCA 1.0 mrs 35.2 mrs1 35.4 nrn-nl.ark grq 8.6 AMS 5.5 ams 2.9 ams 3.1 nrt-jp.ark nrt 4.8 NRT 1.2 hkg 101.9 nrt2 1.0 nrt3-jp.ark nrt 6.5 NRT 3.5 pao 101.7 nrt2 3.5 oak5-us.ark lax 19.3 SJC 5.9 sjc0 4.4 pao 4.4 okc-us.ark dfw 9.6 MCI 9.6 dal 8.2 dfw 7.3 ord-us.ark cbf 13.0 ORD 2.9 iad 19.0 chi 1.6 ory4-fr.ark bru 6.5 CDG 2.0 cdg 1.4 cdg1 1.7 ory6-fr.ark bru 6.4 LHR 8.8 lhr 8.4 lon 8.6 ory7-fr.ark bru 7.7 CDG 4.0 cdg 5.0 cdg1 4.0 ory8-fr.ark bru 6.3 CDG 2.0 lhr 25.7 lon 9.4 osl-no.ark lpp 46.5 OSL 0.9 osl 25.8 sto1 9.0 pbh2-bt.ark bom 105.2 MAA 92.0 pbh 1.8 sin 91.7 per-au.ark syd 48.9 PER 2.0 per 1.2 syd 45.3 per2-au.ark syd 140.0 PER 2.5 per 1.5 syd 44.9 phl-us.ark iad 15.9 iad 14.0 rst1 15.9 pna-es.ark mad 33.9 BCN 13.5 bcn 13.3 mad1 19.3 prg-cz.ark fra 16.9 PRG 1.1 qbts1 5.9 prg1 0.6 prg2-cz.ark fra 29.8 PRG 1.7 qfra3 14.0 prg1 1.5 pry-za.ark jnb 167.3 JNB 1.2 jnb 0.6 cpt1 18.4 puw-ru.ark lpp 18.2 DME 1.5 beg 67.1 fra 34.3 pvu-us.ark lax 25.2 SLC 3.8 slc 2.7 sea 18.4 rdu-us.ark iad 9.5 IAD 8.1 iad 7.7 ash 8.6 rdu2-us.ark iad 11.3 IAD 8.1 iad 8.5 ash 8.6 rdu3-us.ark iad 27.5 IAD 18.6 iad 18.4 atl1 23.8 rkv-is.ark lhr 49.0 KEF 2.0 kef 0.8 dub1 23.0 san-us.ark lax 10.4 LAX 4.7 bur 87.2 lax 3.1 san2-us.ark lax 39.0 LAX 6.5 qlax1 7.5 lax 5.6 san4-us.ark lax 26.6 LAX 20.2 bur 106.8 lax 17.3 sao-br.ark gru 117.3 GRU 2.5 qgru1 1.3 rio1 9.6 scq-es.ark mad 31.7 MAD 11.0 mad 10.8 mad1 11.3 sea3-us.ark dls 10.1 SEA 3.9 pao 24.6 sea 3.0 sin-gc.ark sin 4.0 SIN 1.5 qsin1 93.5 sin 1.1 sin-sg.ark sin 3.2 SIN 2.3 qsin1 12.2 sin 0.7 sjc2-us.ark lax 15.2 SJC 1.4 sjc0 0.5 pao 2.1 sjj-ba.ark waw 71.1 BUD 57.5 vie 213.3 mil1 89.4 sjo-cr.ark chs 63.0 SJO 2.2 mia 51.8 mia 78.6 snn-ie.ark lhr 20.4 DUB 4.4 dub 3.9 dub1 3.9 sql-us.ark lax 19.6 SJC 2.2 pao 1.2 pao 0.9 stx-vi.ark chs 38.3 MIA 26.4 mia 25.5 mia 25.1 svo2-ru.ark lpp 22.4 DME 5.8 fra 41.1 fra 38.2 swu-kr.ark hkg 77.6 ICN 6.5 qsin1 76.9 nrt2 38.5 syd3-au.ark syd 4.8 SYD 0.9 syd 1.0 syd 0.5 tij-mx.ark lax 15.8 LAX 6.6 qlax1 6.6 lax 5.6 tlv-il.ark mil 80.2 MRS 43.6 tlv 2.0 mil1 50.5 tlv3-il.ark mil 64.2 TLV 2.1 tlv 2.1 tlv1 3.0 tnr-mg.ark JNB 215.7 jnb 199.1 tpe-tw.ark tpe 10.3 TPE 5.2 tpe 3.1 sin 74.4 vdp-dk.ark lpp 21.9 CPH 5.6 arn 13.4 cph1 5.4 vie-at.ark fra 22.5 FRA 13.8 vie 2.1 prg1 6.5 waw-pl.ark waw 20.1 HAM 31.3 qwaw2 1.3 wrw1 0.9 wbu-us.ark cbf 13.4 DEN 2.7 den 1.8 den1 1.5 wlg2-nz.ark syd 35.3 AKL 17.0 akl2 16.5 syd 32.4 ygk-ca.ark yyz 29.5 YYZ 16.7 iad 35.5 yyz 16.1 yyc-ca.ark dls 26.5 YYC 4.6 sea 21.8 yvr 16.7 zrh-ch.ark zrh 14.5 ZRH 1.1 zrh2 0.6 mil1 8.1 zrh2-ch.ark zrh 16.2 ZRH 1.1 zrh2 1.0 fra 6.8 zrh4-ch.ark zrh 15.8 ZRH 1.5 zrh2 0.8 mil1 9.0

Posted in Commentaries | No Comments »

ITDK 2024-02

April 23rd, 2024 by Matthew Luckie

CAIDA has released the 2024-02 Internet Topology Data Kit (ITDK), the 24th ITDK in a series published over the past 14 years. In the year since the 2023-03 release, CAIDA has expanded its Ark platform with both hardware and software vantage points (VPs), and re-architected the ITDK probing software. We have been busy modernizing the software to enable us to collect ITDK snapshots more regularly, as well as annotate the router-level Internet topology graph with more features.

For IPv4, the ITDK probing software is based primarily around two reliable alias resolution techniques. The first, MIDAR, probes for IPID behavior that suggests that responses from different IP addresses had IPID values derived from a single counter, and thus the addresses are assigned to the same router. This inference is challenging because of the sheer number of router addresses observed in macroscopic Internet topologies, and the IPID value is held in a 16-bit field, requiring sophisticated probing techniques to identify distinct counters. The second, iffinder, probes for common source IP addresses in responses to probes sent to different target IP addresses.

In the past few months, we have replaced the MIDAR and iffinder probing component on the Ark VPs to use alias resolution primitives present in scamper (specifically, the midarest, midardisc, and radargun primitives). We used the recently released scamper python module, and 902 lines of python, which executes on a single machine at CAIDA to coordinate the probing from many VPs.

The following table provides statistics illustrating the growth of the ITDK over the past year, driven by the expansion of Ark VPs. Overall, we increased the number of Ark VPs providing topology data from 93 to 142, the number of addresses probed from 2.6 to 3.6M, doubled the number of VPs that we use for alias resolution probing, and found aliases for 50% more addresses than a year ago. Note that we use the term “node” to distinguish between our router inferences, and the actual routers themselves. By definition all routers have at least two IP addresses; our “nodes with at least two IPs” are the subset of routers we were able to observe with that property.

	2023-03	2024-02
Input:
Number of addresses probed:	2.64M	3.58M
Number of ark VPs:	93	142
Number of countries:	37	52
Alias resolution:
Number of ark VPs for MIDAR:	55	101
Number of ark VPs for iffinder:	46	101
MIDAR + iffinder Output:
Nodes with at least two IPs:	75,660	107,976
Addresses in nodes with at least two IPs:	284,479	425,964
MIDAR, iffinder, SNMP Output:
Nodes with at least two IPs:	–	124,857
Addresses in nodes with at least two IPs:	–	515,524

For 2024-02, we also evaluated the gains provided by SNMPv3 probing, following a paper published in IMC 2021 that showed many routers return a unique SNMP Engine ID in response to a SNMPv3 request; the basic idea is that different IP addresses returning the same SNMPv3 Engine ID are likely aliases. Of the 3.58M addresses we probed, 672K returned an SNMPv3 response. We inferred that IP addresses belonged to the same router when they return the same SNMP Engine ID, the size of the engine ID was at least 4 bytes, the number of engine boots was the same, and the router uptime was the same; we did not use the other filters in section 4.4 of the IMC paper. This inferred 47,770 nodes with at least two IPs, many of which were shared with existing nodes found with MIDAR + iffinder. In total, when we combined MIDAR, iffinder, and SNMP probing, we obtained a graph with 124,857 nodes with at least two IPs, covering 515,524 addresses. We are including both the MIDAR + iffinder and MIDAR + iffinder + SNMP graphs in ITDK 2024-02.

Our ITDK also includes an IPv6 graph derived from speedtrap, which infers that IPv6 addresses belong to the same router if the IPID values in fragmented IPv6 responses appear to be derived from a single counter, and a graph derived from speedtrap and SNMP. For IPv6, the gains provided by SNMP are more significant, as the effectiveness of the IPv6 IPID as an alias inference vector wanes. Of the 929K IPv6 addresses we probed, 68K returned an SNMPv3 response.

	2023-03	2024-02
Input:
Number of addresses probed:	592K	929K
Number of ark VPs:	36	54
Number of countries:	18	25
Speedtrap output:
Nodes with at least two IPs:	4,945	4,129
Addresses in nodes with at least two IPs:	12,638	10,886
Speedtrap + SNMP output:
Nodes with at least two IPs:	–	8,935
Addresses in nodes with at least two IPs:	–	35,164

Beyond the alias resolution, the nodes are also annotated with their bdrmapIT-inferred operator (expressed as an ASN) as well as an inferred geolocation. We look up the PTR records of all router IP addresses with zdns, following CNAMEs where they exist, and provide these names as part of the ITDK. For router geolocation, we used a combination of DNS-based heuristics, IXP geolocation (routers connected to an IXP are likely located at that IXP), and Maxmind GeoLite2.

We inferred DNS-based geolocation heuristics using RTT measurements from 148 Ark VPs in 52 countries to constrain Hoiho, which automatically infers naming conventions in PTR records as regular expressions, and covered 819 different suffixes (e.g., ^.+\.([a-z]+)\d+\.level3\.net$ and ^.+\.([a-z]{3})\d+\.[a-z\d]+\.cogentco\.com$ extract geolocation hints in hostnames for Level3 and Cogent in the above figure). There is no dominant source of geohint observed in these naming conventions; 443 (54.1%) embedded IATA airport codes (e.g. IAD, WAS for the Washington D.C. area), 310 (37.9%) embedded place names (e.g. Ashburn for Ashburn, VA, US), 87 (10.6%) embedded the first six characters of a CLLI code (e.g. ASBNVA for Ashburn), and 12 (1.4%) embedded locodes (e.g. USQAS for Ashburn, VA, US). Interestingly, the operators that used CLLI and locodes had conventions that were more congruent with observed RTT values than operators that used IATA codes or place names. For the nodes in the ITDK, hoiho provided a geolocation inference for 127K, IXP provided a geolocation inference for 14K, and maxmind covered the remainder. The rules we inferred are usable via CAIDA’s Hoiho API.

ITDKs older than one year are publicly available, and ITDK 2024-02 is available to researchers and CAIDA members, after completing a simple form for access.

Acknowledgment: We are grateful to all of the Ark hosting sites, MaxMind’s freely available geolocation database, and academic research access to Iconectiv’s CLLI database to support this work.

Posted in Commentaries | No Comments »

CAIDA’s Annual 2024 Report

New Developments in 100 GB Anonymized Passive Traces

1. Transition to a New 100 GB Link

2. Complementary Datasets Based on User Feedback

Observing the DDoS Landscape Requires Collaboration

Collecting DDoS Datasets

Long-term Attack Trends Depend on the Viewpoint

Direct-path Attack Trends

Reflection-amplification Attack Trends

Why do Views on DDoS Differ?

Recommendations

AS Reachability Visualization

CAIDA’s 2023 Annual Report

Streamlining Access to BGP Routing Data

Finding the needle in the haystack

Use Case: Analyzing Prefix Propagation with PEERING Testbed, BGP2GO, and BGPStream

Step 1: Looking up the prefix and setting filters

Step 2: Stream selected files in the terminal

Getting Access to BGP2GO

Seeking Beta Users for 100 GB link Anonymized Passive Traces

Help CAIDA Refine and Enhance the FANTAIL Traceroute Analytics platform.

Understanding the deployment of public recursive resolvers

ITDK 2024-02

Recent Posts

Pages

Archives

CAIDA

Other recommended related feeds