what percentage of traffic on the Internet is peer-to-peer file sharing?
February 8th, 2009 by kc
I get this question as often as I get any question about the Internet. Finally, a visiting intern, Mia Zhang from Beijing Jiaotong University, has done a thorough literature roundup, extracting the best available data pertinent to this question that she could find in the public domain.
Although we co-authored two papers [P-5] [P-6] on P2P traffic classification, we cannot give a single numeric answer to this question for Internet traffic today, for a variety of reasons. First, traffic is highly variable across time, space, and users. Second, as the papers above suggest, there is no single standard method for determining that traffic is “peer-to-peer file sharing”. Finally, it’s nearly impossible to get representative traffic samples to a scientific forum where peer review can occur. As a result, there are a lot of claims about peer-to-peer traffic levels and percentages, but none that could be used to justify changes in policies, regulations, or pricing models.
We do consider it scientifically relevant that the observed fractions of peer-to-peer file sharing traffic range from 1.2% to 93% across the eighteen (of the 64) papers that provide such numbers, and that the average fraction reported in studies increased considerably from 2002 to 2006 (Table 1). Table 2 and Table 3 show that results also vary considerably according to link location and time (though according to the few studies reporting numbers, peer-to-peer traffic is heavier at night). One study [34] suggests that users use peer-to-peer applications more often at home than in the office. Finally, a European study found a considerably higher fraction of P2P traffic on a university link than some Canadian academics [34] found on their campus.
Many of these numbers are based on P2P detection via statistical or behavioral classification, not the most reliable method of detecting an application. More accurate methods involve examination of traffic contents, which is fraught with legal and privacy issues. Thus there is far too little data available to allow conclusive claims beyond “There is a wide range of P2P traffic on Internet links; see your specific link of interest and classification technique you trust for more details.”
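To make the dependence on classification technique concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the flow records are invented (they are not measurements from any study cited here), the port-based rule is a deliberately simple stand-in rather than one of the statistical or behavioral classifiers discussed above, and the payload rule checks only for the BitTorrent handshake prefix as a stand-in for content inspection. The names `FLOWS`, `by_port`, `by_payload`, and `p2p_fraction` are hypothetical.

```python
# Toy flow records: (destination port, bytes transferred, first payload bytes).
# All values are invented for illustration; they are not real measurements.
FLOWS = [
    (6881, 700, b"\x13BitTorrent protocol"),   # BitTorrent on a default port
    (80,   500, b"\x13BitTorrent protocol"),   # BitTorrent running on port 80
    (80,   500, b"GET /index.html HTTP/1.1"),  # ordinary web traffic
    (6881, 100, b"GET /status HTTP/1.1"),      # non-P2P service on a "P2P" port
    (25,   200, b"EHLO mail.example.org"),     # mail
]

# Assumed port list: BitTorrent defaults (6881-6889), eDonkey (4662), Gnutella (6346).
P2P_PORTS = set(range(6881, 6890)) | {4662, 6346}


def by_port(flow):
    """Label a flow as P2P purely from its destination port."""
    return flow[0] in P2P_PORTS


def by_payload(flow):
    """Label a flow as P2P if its payload starts with the BitTorrent handshake."""
    return flow[2].startswith(b"\x13BitTorrent protocol")


def p2p_fraction(flows, is_p2p):
    """Fraction of total bytes carried by flows the classifier labels as P2P."""
    total = sum(f[1] for f in flows)
    p2p = sum(f[1] for f in flows if is_p2p(f))
    return p2p / total if total else 0.0


if __name__ == "__main__":
    print(f"port-based:    {p2p_fraction(FLOWS, by_port):.1%}")     # 40.0%
    print(f"payload-based: {p2p_fraction(FLOWS, by_payload):.1%}")  # 60.0%
```

On the same five made-up flows, the two rules disagree by twenty percentage points, because one misses P2P traffic on port 80 and miscounts a non-P2P flow on port 6881. Real links and real classifiers are far messier, so this only illustrates the mechanism, not the size of the effect.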
[Note: Ipoque, which sells proprietary products that help classify Internet traffic, just released their third annual Internet traffic study, with the first study (in 2006) focused only on P2P traffic. In this latest 2008/2009 study, Ipoque seems to see a much narrower range of P2P traffic on their customers’ networks than indicated in publicly available Internet research papers. One cause of this discrepancy could be that Ipoque systems are more accurately identifying P2P traffic because they are examining traffic contents, which academic studies are typically unable to do. Since none of the data sets or classification algorithms used in the Ipoque study are available, we cannot reproduce their analyses or verify their claims.]