Computer Science Department
School of Computer Science, Carnegie Mellon University
Eliminating Machine Duplicity in Traceroute-based
One hurdle for Internet topology measurement is that a single machine can appear in the induced topology many times, each time under a different IP address and sometimes many hops apart. Most topology measurements are based on traceroute, in which the same machine may respond with different IP addresses in different traces. Three major techniques are known for detecting pairs of IP addresses that belong to the same machine. However, two of the three, applied naively, require a number of packets that is quadratic in the number of IP addresses under test. This paper presents practical, scalable algorithms for each technique, using three novel methods of dividing the input set so that the quadratic techniques become practical on large sets. For each technique, the error is analyzed in terms of both its source and its magnitude, along with how responsive machines on the Internet are to the technique.
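To illustrate why dividing the input set matters, the following minimal sketch compares the number of candidate pairs under naive all-pairs testing with the number left when only addresses in the same group are compared. The grouping key used here (the third octet of a synthetic address) is a placeholder chosen for illustration and is not one of the three division methods this paper presents. The function names and the sample data are likewise hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def naive_pair_count(n: int) -> int:
    """Number of unordered pairs among n addresses: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def partitioned_pairs(addresses, bucket_key):
    """Yield only the pairs whose two addresses fall in the same bucket."""
    buckets = defaultdict(list)
    for addr in addresses:
        buckets[bucket_key(addr)].append(addr)
    for members in buckets.values():
        yield from combinations(members, 2)


def third_octet(addr: str) -> str:
    """Placeholder bucket key (an assumption, not the paper's method)."""
    return addr.split(".")[2]


# Hypothetical input: 1,000 synthetic addresses spread evenly over 50 buckets.
addresses = [f"10.0.{i % 50}.{i // 50}" for i in range(1000)]

naive = naive_pair_count(len(addresses))
reduced = sum(1 for _ in partitioned_pairs(addresses, third_octet))

print(f"naive pairs:       {naive}")    # 499500
print(f"partitioned pairs: {reduced}")  # 9500
```

With 1,000 addresses in 50 equal groups, the candidate pairs drop from 499,500 to 9,500. The saving is only sound if addresses belonging to the same machine are placed in the same group, since pairs split across groups are never tested.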