{"id":3567,"date":"2016-06-03T18:02:39","date_gmt":"2016-06-04T01:02:39","guid":{"rendered":"http:\/\/blog.caida.org\/best_available_data\/?p=3567"},"modified":"2016-06-03T18:02:39","modified_gmt":"2016-06-04T01:02:39","slug":"toward-a-congestion-heatmap-of-the-internet","status":"publish","type":"post","link":"https:\/\/blog.caida.org\/best_available_data\/2016\/06\/03\/toward-a-congestion-heatmap-of-the-internet\/","title":{"rendered":"Toward a Congestion Heatmap of the Internet"},"content":{"rendered":"<p>In the past year, we have made substantial progress on a system to\u00a0measure congestion on interdomain links between networks. This effort is part of our <a href=\"http:\/\/www.caida.org\/funding\/nets-congestion\/\">NSF-funded project<\/a> on measuring interdomain connectivity and congestion. The basic\u00a0nugget of our technique is to send TTL-limited probes from a vantage point (VP) within a network, toward the near\u00a0and the far end of an interdomain (border) link of that network, and to monitor diurnal patterns in the\u00a0near and far-side time series. We refer to this method as &#8220;Time-Series Latency Probing&#8221;, or TSLP. Our hypothesis is that a persistently elevated RTT\u00a0to the far end of the link, but no corresponding RTT elevation to the near side, is a signal of\u00a0congestion at the interdomain link.<\/p>\n<p>It turns out that identifying interdomain links from a\u00a0VP inside a network is surprisingly challenging, for several reasons: lack of standard\u00a0IP address assignment practices for interdomain links; unadvertised address\u00a0space by ISPs; and myriad things that can go wrong with traceroute\u00a0measurements (third-party addresses, unresponsive routers). See our\u00a0<a href=\"http:\/\/www.caida.org\/publications\/papers\/2014\/challenges_inferring_interdomain_congestion\/\">paper at the 2014 Internet Measurement Conference<\/a> (IMC) for a description of\u00a0these issues. 
To overcome those challenges and identify network\u00a0borders from within a network, we have developed <em>bdrmap<\/em>, an\u00a0active measurement tool to accurately identify interdomain links\u00a0between networks. A paper describing the bdrmap algorithms is\u00a0currently under submission to IMC 2016.<\/p>\n<p>Our second major activity in the last year has been to develop a\u00a0backend system that manages TSLP probing from our set of distributed\u00a0vantage points, collects and organizes data, and presents that data\u00a0for easy analysis and visualization. A major goal of the backend\u00a0system is to be <em>adaptive<\/em>, i.e., the probing state should adapt\u00a0to topological and routing changes in the network. To this end, we run\u00a0the <em>bdrmap<\/em> topology discovery process continuously on each VP. Every\u00a0day, we process completed <em>bdrmap<\/em> runs from each monitor and add newly\u00a0discovered interdomain links or update the probing state for existing\u00a0links (i.e., destinations we can use to probe those links, and the\u00a0distance of those links from our VP). We then push updated probing\u00a0lists to the monitor. This adaptive process ensures that we always\u00a0probe a relatively current state of thousands of interdomain links visible\u00a0from our VPs.<\/p>\n<p>Third, we have greatly expanded the scale of our measurement system. We started this project in 2014 with an initial set of approximately ten VPs in 5-6\u00a0access networks mostly in the United States. We are now running congestion measurements from over sixty Archipelago VPs in 39 networks and 26 countries around the\u00a0world. Our Ark VPs have sufficient memory and compute power to run\u00a0both the border mapping process and the TSLP probing without any\u00a0issues. 
However, when we looked into porting our measurements to other\u00a0active measurement platforms such as <a href=\"http:\/\/projectbismark.net\/\">Bismark<\/a> or the FCC&#8217;s measurement\u00a0infrastructure operated by SamKnows, we found that the OpenWRT-based home routers were too\u00a0resource-constrained to run <em>bdrmap<\/em> and TSLP directly. To overcome this challenge, we developed a method to move the bulk of the resource-intensive processing from the VPs to a central controller at CAIDA,\u00a0so the VP only has to run an efficient probing engine (<a href=\"https:\/\/www.caida.org\/tools\/measurement\/scamper\/\">scamper<\/a>) with a small memory footprint and low CPU usage. We have deployed a test set of 15 Bismark home routers in this type of <em>remote<\/em> configuration, with lots of help from the folks at the Bismark Project. Our next target deployment will be a set of &gt;5000 home routers\u00a0that are part of the FCC-SamKnows Measuring Broadband America\u00a0infrastructure.<\/p>\n<p>A fourth major advance we have made in the last year is in visualization and\u00a0analysis of the generated time series data. We were on the lookout for\u00a0a time series database to store, process and visualize the TSLP data.\u00a0After some initial experimentation, we found <a href=\"https:\/\/influxdata.com\/\">influxDB<\/a> to be\u00a0well-suited to our needs, due to its ability to scale to millions of\u00a0time series, scalable and usable read\/write API, and SQL-like querying\u00a0capability. We also discovered <a href=\"http:\/\/grafana.org\/\">Grafana<\/a>, a graphing frontend that\u00a0integrates seamlessly with the influxDB database to provide\u00a0interactive querying and graphing capability. Visualizing\u00a0time series plots from a given VP to various neighbor networks and\u00a0browsing hundreds of time series plots is now possible with a few mouse\u00a0clicks on the Grafana UI. The figure below shows RTT data for 7 interdomain links between a U.S. 
access provider and a content provider over the course of a week. This graph took a few minutes to produce with influxDB and Grafana; previously this data exploration would have taken hours using data stored in standard relational databases.<\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"http:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2016\/05\/dashboard_agg1.png\"><img loading=\"lazy\" class=\"alignnone  wp-image-3588\" src=\"http:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2016\/05\/dashboard_agg1-1024x446.png\" alt=\"dashboard_agg\" width=\"717\" height=\"312\" srcset=\"https:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2016\/05\/dashboard_agg1-1024x446.png 1024w, https:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2016\/05\/dashboard_agg1-300x130.png 300w\" sizes=\"(max-width: 717px) 100vw, 717px\" \/><\/a><\/p>\n<p>As the cherry on the cake, we have set up the entire system to provide\u00a0a near real-time view of congestion events. TSLP data is pulled off\u00a0our VPs and indexed into the influxDB database within 30 minutes of\u00a0being generated. Grafana provides an <em>auto-refresh<\/em> mode wherein we can\u00a0set up a dashboard to periodically refresh when new data is\u00a0available. There is no technical barrier to shortening the 30-minute\u00a0duration to an arbitrarily short duration, within reason. The figure below shows a pre-configured dashboard with the real-time\u00a0congestion state of interdomain links from 5 large access networks in\u00a0the US to 3 different content providers\/CDNs (network names\u00a0anonymized). Several graphs on that dashboard show a diurnal pattern\u00a0that signals evidence of congestion on the interdomain link. While\u00a0drawing pretty pictures and having everything run faster is certainly satisfying,\u00a0it is neither the goal nor the most challenging aspect of this\u00a0project. A visualization is only as good as the data that goes into\u00a0it. 
Drawing graphs was the easy part; developing a sustainable and\u00a0scalable system that will keep producing meaningful data was infinitely\u00a0more challenging. We are delighted with where we are at the moment,\u00a0and look forward to opening up the data exploration interface for\u00a0external users.<\/p>\n<p><a href=\"http:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2016\/05\/dashboard-ac.png\"><img loading=\"lazy\" class=\"alignnone  wp-image-3568\" src=\"http:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2016\/05\/dashboard-ac-986x1024.png\" alt=\"dashboard-ac\" width=\"723\" height=\"751\" srcset=\"https:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2016\/05\/dashboard-ac-986x1024.png 986w, https:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2016\/05\/dashboard-ac-289x300.png 289w, https:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2016\/05\/dashboard-ac-32x32.png 32w\" sizes=\"(max-width: 723px) 100vw, 723px\" \/><\/a><\/p>\n<p>So what happens next? We are far from done here. We are currently\u00a0working on data analysis modules for time series data with the goal of\u00a0producing alarms, automatically and without human intervention, that\u00a0indicate evidence of congestion. Those alarms will be input to a <em>reactive measurement system<\/em> that we have developed to distribute\u00a0on-demand measurement tasks to VPs. We envision different types of\u00a0reactive measurement tasks, e.g., confirming the latency-based\u00a0evidence of congestion by launching probes to measure loss rate,\u00a0estimating the impact on achievable throughput by running NDT tests,\u00a0or estimating potential impacts to user Quality of Experience\u00a0(QoE). The diagram below shows the various components of the\u00a0measurement system we are developing. 
The major piece that remains is\u00a0continuous analysis of the TSLP data, generating alarms, and pushing\u00a0on-demand measurements to the reactive measurement system. Stay tuned!<\/p>\n<p style=\"text-align: center;\"><a href=\"http:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2016\/06\/system-diagram.jpg\"><img loading=\"lazy\" class=\"alignnone  wp-image-3597\" src=\"http:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2016\/06\/system-diagram.jpg\" alt=\"system-diagram\" width=\"637\" height=\"478\" srcset=\"https:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2016\/06\/system-diagram.jpg 1024w, https:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2016\/06\/system-diagram-300x225.jpg 300w\" sizes=\"(max-width: 637px) 100vw, 637px\" \/><\/a><\/p>\n<p>The team: Amogh Dhamdhere, Matthew Luckie, Alex Gamero-Garrido,\u00a0Bradley Huffaker, kc claffy, Steve Bauer, David Clark<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the past year, we have made substantial progress on a system to\u00a0measure congestion on interdomain links between networks. This effort is part of our NSF-funded project on measuring interdomain connectivity and congestion. 
The basic\u00a0nugget of our technique is to send TTL-limited probes from a vantage point (VP) within a network, toward the near\u00a0and the [&hellip;]<\/p>\n","protected":false},"author":10,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1,5,10,9],"tags":[],"coauthors":[40],"_links":{"self":[{"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/posts\/3567"}],"collection":[{"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/comments?post=3567"}],"version-history":[{"count":26,"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/posts\/3567\/revisions"}],"predecessor-version":[{"id":3602,"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/posts\/3567\/revisions\/3602"}],"wp:attachment":[{"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/media?parent=3567"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/categories?post=3567"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/tags?post=3567"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/coauthors?post=3567"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}