{"id":1597,"date":"2012-04-04T08:42:05","date_gmt":"2012-04-04T15:42:05","guid":{"rendered":"http:\/\/blog.caida.org\/best_available_data\/?p=1597"},"modified":"2012-04-10T16:59:53","modified_gmt":"2012-04-10T23:59:53","slug":"targeted-serendipity-the-search-for-storage","status":"publish","type":"post","link":"https:\/\/blog.caida.org\/best_available_data\/2012\/04\/04\/targeted-serendipity-the-search-for-storage\/","title":{"rendered":"Targeted Serendipity: the Search for Storage"},"content":{"rendered":"<p>On the heels of our <a href=\"http:\/\/blog.caida.org\/best_available_data\/2012\/03\/28\/internet-censorship-revealed-through-the-haze-of-malware-pollution\/\">recent press release<\/a> regarding fresh publications that\u00a0 make use of the UCSD Network Telescope data, we would like to take a moment to thank the institutions that have helped preserve this data over the last eight years. Though we recently received an <a title=\"(NSF CNS-1059439) CRI-Telescope: A Real-time Lens into Dark Address Space of the Internet \" href=\"http:\/\/www.caida.org\/funding\/cri-telescope\/\">NSF award<\/a> to enable\u00a0 near-real-time sharing of this data as well as improved classification, the award does not cover the cost to maintain this historic archive. At current <a href=\"http:\/\/rci.ucsd.edu\/services\/storage.html\">UCSD rates<\/a>, the 104.66 TiB would cost us approximately $40,000 per year to store. This does not take into account the metadata we have collected which adds roughly 20 TB to the original data. \u00a0As a result, we had spent the last several months indexing this data in preparation for deleting it forever.<\/p>\n<p>Then, last month, I had the opportunity to attend the <a title=\"Security at the Cyberborder Workshop\" href=\"https:\/\/scholarworks.iu.edu\/dspace\/handle\/2022\/14070\">Security at the Cyberborder Workshop<\/a> in Indianapolis. 
This workshop focused on how the <a href=\"http:\/\/www.nsf.gov\/pubs\/2009\/nsf09564\/nsf09564.htm\">NSF-funded<\/a> <a href=\"http:\/\/irnclinks.net\/\">IRNC networks<\/a> might (1) capture and articulate technical and policy cybersecurity considerations related to international research network connections, and (2) capture opportunities and challenges for those connections to foster cybersecurity research. I did not expect to find a new benefactor for storage of our telescope data at the workshop, though in fact I did.<\/p>\n<p><!--more--><br \/>\nDuring the workshop, I mentioned to the group that we were preparing to purge historic darknet data for lack of funds to pay for storage. Upon hearing of our plans to delete the data, a NERSC System Administrator offered to store the data in NERSC&#8217;s tape archive. He understood the relevance of the data to cybersecurity research and the value of longitudinal analysis on this fairly rare and unique data type. In less than a month, we had accounts and began the work of moving the data.<\/p>\n<p>CAIDA would like to thank the San Diego Supercomputer Center for archiving the UCSD Network Telescope data since 2003. The IBM HPSS and, more recently, Sun SamQFS archival storage systems dutifully preserved and delivered the 100+ terabytes of raw pcap traces we have archived over the last eight years.<\/p>\n<p>We would also like to thank the National Energy Research Scientific Computing Center (NERSC) and ESnet for the resources that allowed us to continue to preserve this data. On 22 March 2012, we started the transfer via ESnet to the NERSC HPSS facilities, shown in flight in Figure 1. 
The transfer completed in roughly one week&#8217;s time and sustained an average of 1.52 Gbps, limited by local host disk I\/O.<\/p>\n<div id=\"attachment_1698\" style=\"width: 310px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2012\/03\/120TB-transfer-March2012-SDSC-to-NERSC.png\"><img aria-describedby=\"caption-attachment-1698\" loading=\"lazy\" class=\"size-medium wp-image-1698 \" title=\"120 TB Transfer from SDSC to NERSC via ESnet\" src=\"http:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2012\/03\/120TB-transfer-March2012-SDSC-to-NERSC-300x187.png\" alt=\"120 TB Transfer from SDSC to NERSC via ESnet\" width=\"300\" height=\"187\" srcset=\"https:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2012\/03\/120TB-transfer-March2012-SDSC-to-NERSC-300x187.png 300w, https:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2012\/03\/120TB-transfer-March2012-SDSC-to-NERSC-1024x640.png 1024w, https:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2012\/03\/120TB-transfer-March2012-SDSC-to-NERSC.png 1356w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><p id=\"caption-attachment-1698\" class=\"wp-caption-text\">Figure 1. 100+ TB transfer of UCSD Network Telescope data from SDSC to NERSC via ESnet.<\/p><\/div>\n<p>Figure 2 below presents a heat map visualization of the data collection volume. Each vertical bar represents one day of data, while the horizontal bands represent the size of a compressed file (in pcap format) of captured traffic for an hour of the day. We color each data point based on its deviation from the median hourly captured traffic file size. Specifically, we color an hour whose file size equals the median red, hours with (compressed) traffic volumes of twice the median or more yellow, and hours with no data black. So, hotter colors mean more data. 
Data collection on the telescope is a best-effort service &#8212; outages show up as vertical black bars. This plot also reveals an increase in the amount of data stored after April 2009 due to the removal of an upstream rate limit filter on incoming packets. (We removed that filter after the advent of the Conficker worm in order to <a href=\"http:\/\/www.caida.org\/research\/security\/ms08-067\/conficker.xml\">study it<\/a>.) The color changes in the heat map also show the diurnal variation in traffic volume, although since this type of traffic originates from most time zones of the world, the &#8220;busy hour&#8221; is not sharply delineated.<\/p>\n<div id=\"attachment_1654\" style=\"width: 410px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2012\/03\/2012-03-29.nersc_.telescope.pcap_archive.heatmap.png\"><img aria-describedby=\"caption-attachment-1654\" loading=\"lazy\" class=\"size-medium wp-image-1654 \" title=\"Heatmap of the UCSD Network Telescope data\" src=\"http:\/\/blog.caida.org\/best_available_data\/wp-content\/uploads\/2012\/03\/2012-03-28.samqfs.telescope.pcap_archive.heatmap2-300x120.png\" alt=\"Heatmap of the UCSD Network Telescope data\" width=\"400\" height=\"200\" \/><\/a><p id=\"caption-attachment-1654\" class=\"wp-caption-text\">Figure 2. Heatmap of the UCSD Network Telescope data.<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>On the heels of our recent press release regarding fresh publications that\u00a0 make use of the UCSD Network Telescope data, we would like to take a moment to thank the institutions that have helped preserve this data over the last eight years. 
Though we recently received an NSF award to enable\u00a0 near-real-time sharing of this [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1,5,12],"tags":[],"coauthors":[],"_links":{"self":[{"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/posts\/1597"}],"collection":[{"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/comments?post=1597"}],"version-history":[{"count":81,"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/posts\/1597\/revisions"}],"predecessor-version":[{"id":1900,"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/posts\/1597\/revisions\/1900"}],"wp:attachment":[{"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/media?parent=1597"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/categories?post=1597"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/tags?post=1597"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.caida.org\/best_available_data\/wp-json\/wp\/v2\/coauthors?post=1597"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}