The CAIDA DNS root/gTLD RTT Dataset
This dataset contains DNS round-trip time (RTT) information useful for studying conditions within the global Internet and how those conditions have changed over the collection period. DNS RTTs are influenced by several factors, including remote server load, congestion along Internet routes, route changes, and local effects such as link or equipment failures.
Domain nameservers such as BIND use various algorithms to select which of the 13 root or gTLD servers they will query to resolve a top-level domain name. By watching queries to the root/gTLD servers at the boundary of a campus (i.e. large enterprise) network, together with the responses to them, one can passively measure the RTT to every root/gTLD server to which the site has a route.
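The passive measurement idea can be illustrated with a short sketch: pair each outgoing query with the response that carries the same transaction ID and the reversed address pair, then take the difference in capture times. This is not NeTraMet's implementation (the meter obtains RTTs through its ToTurnaround attribute); `DnsPacket` and `passive_rtts` are hypothetical names, assuming packets have already been captured and decoded.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DnsPacket:
    # Hypothetical record of one captured DNS packet (not a NeTraMet format).
    timestamp: float   # capture time, seconds
    src: str           # source IP address
    dst: str           # destination IP address
    query_id: int      # 16-bit DNS transaction ID
    is_response: bool  # True for a response, False for a query


def passive_rtts(packets: List[DnsPacket]) -> Dict[str, List[float]]:
    """Pair each response with its outstanding query; return RTTs in ms per server."""
    pending: Dict[Tuple[str, str, int], float] = {}
    rtts: Dict[str, List[float]] = {}
    for p in sorted(packets, key=lambda pkt: pkt.timestamp):
        if not p.is_response:
            # Outgoing query: remember when it crossed the campus boundary.
            pending[(p.src, p.dst, p.query_id)] = p.timestamp
        else:
            # Incoming response: source and destination are swapped relative to the query.
            sent = pending.pop((p.dst, p.src, p.query_id), None)
            if sent is not None:
                rtts.setdefault(p.src, []).append((p.timestamp - sent) * 1000.0)
    # Queries that never got an answer are simply left unmatched.
    return rtts
```

Keying on (resolver, server, transaction ID) is a simplification: a retransmitted query reusing the same ID would reset its start time.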
This data was collected at several sites using NeTraMet, which produced daily data files in NeTraMet's flow data file format. The dataset contains historical data from four sites:
Location | Start date | End date |
---|---|---|
University of California, San Diego (UCSD) | 8 Jan 2002 | 9 Aug 2003 |
University of Colorado (CU) | 8 Jan 2003 | 4 Apr 2008 |
WIDE (Keio Univ. Shonan Fujisawa Campus) | 13 Jul 2005 | 15 Jun 2009 |
WIDE (The University of Tokyo) | 1 Jun 2004 | 27 Sep 2009 |
Currently one site is actively collecting data:
Location | Start date | End date |
---|---|---|
University of Auckland (UA) | 11 Nov 2002 | 5 Jan 2012 |
University of Auckland (UA) | 19 Feb 2014 | (ongoing) |
The data was collected at five-minute intervals using an SRL (RFC 2723) ruleset called dns-root.srl (available with the data).
A web page allowing you to examine 5-minute median RTT values over periods of 1 to 7 days appears at: https://cgi.caida.org/cgi-bin/dns_perf/main.pl
The NeTraMet flow data file layout is described in https://www.caida.org/catalog/software/netramet/download/ntm43.pdf
Flow data files contain two types of lines:
- control records, which start with # (pound sign)
- flow data records
Each file begins with a header giving details of when the file was created, and the format of its data records. The format is described by a #Format record, listing the RTFM attributes requested in the dns-root ruleset.
In this dataset, the NeTraMet meters were read every five minutes; flow data for each five-minute interval appears between #Time and #Enddata records in the files.
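A minimal sketch of reading a flow data file into five-minute intervals follows. It assumes that a control record consists of `#`, a keyword, and an optional colon; that `#Format` lists attribute names separated by whitespace; and that `#Time` and `#EndData` bracket each interval. These spellings and layouts are assumptions to check against the NeTraMet manual (ntm43.pdf), and `read_intervals` is a hypothetical helper, not one of the distributed tools.

```python
import re
from typing import List, Optional, Tuple

# Assumed control-record shape: '#', a keyword, an optional ':', then the rest.
CONTROL = re.compile(r"#\s*([A-Za-z]+):?\s*(.*)")


def read_intervals(lines) -> Tuple[List[str], List[dict]]:
    """Split a flow data file into its #Format attributes and per-interval flow records."""
    attrs: List[str] = []
    intervals: List[dict] = []
    current: Optional[dict] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            m = CONTROL.match(line)
            if m is None:
                continue
            tag, rest = m.group(1).lower(), m.group(2)
            if tag == "format":
                attrs = rest.split()          # attribute names, in record order
            elif tag == "time":
                current = {"time": rest, "flows": []}
            elif tag == "enddata" and current is not None:
                intervals.append(current)     # one five-minute reading complete
                current = None
            # Other header/control records are ignored in this sketch.
        elif current is not None:
            current["flows"].append(line.split())  # raw fields; map onto attrs as needed
    return attrs, intervals
```

Flow records are kept as raw field lists rather than zipped onto the attribute names, since a distribution attribute may not occupy exactly one whitespace-separated field.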
The data was collected using NeTraMet's ToTurnaround attribute, i.e. each flow record contains an RTT distribution for one root or gTLD server, as described in https://catalog.caida.org/paper/2002_nsrtd/
If there are fewer than 100 (later 120) data points, their actual values are saved in the flow data record. Otherwise the data is binned, using bounds computed from the data. This approach preserves as much RTT accuracy as possible.
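The store-or-bin rule can be sketched as below. The threshold follows the 120-value limit of later versions, and the equal-width bins between the sample minimum and maximum are an assumption: the text only says the bounds are computed from the data, so NeTraMet's actual bound computation may differ. `summarise` and `median_rtt` are hypothetical names; the median function shows why raw values are kept when possible, since a binned median can only be located to within one bin.

```python
import statistics
from typing import List


def summarise(values: List[float], max_raw: int = 120, nbins: int = 10) -> tuple:
    """Keep raw RTTs for small samples; otherwise bin them between the sample min and max."""
    if len(values) <= max_raw:
        return ("values", sorted(values))
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins or 1.0       # guard against all-equal samples
    counts = [0] * nbins
    for v in values:
        i = min(int((v - lo) / width), nbins - 1)   # the maximum falls in the last bin
        counts[i] += 1
    return ("binned", lo, hi, counts)


def median_rtt(summary: tuple) -> float:
    """Exact median for raw values; bin-midpoint estimate for binned data."""
    if summary[0] == "values":
        return statistics.median(summary[1])
    _, lo, hi, counts = summary
    width = (hi - lo) / len(counts)
    half, running = sum(counts) / 2, 0
    for i, c in enumerate(counts):
        running += c
        if running >= half:
            return lo + (i + 0.5) * width   # midpoint of the bin holding the median
    return hi
```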
NeTraMet versions 51b4 (21 Sep 04) through 51b6 (15 Mar 05) had a bug in their handling of dynamic distributions.
If such a distribution had 120 or fewer data values, it was written to the flow data file as a type 5 distribution, i.e. a list of actual values. These are correct.
However, if there were more than 120 values, NeTraMet computed lower and upper limits and used them to produce a binned distribution, type 6. The values are bin counts, and those counts are correct. Unfortunately, the bug corrupted the computed upper limit, producing distributions that always had an upper limit of 7000 (700 ms). Computing actual values for the bin upper edges requires correct limit values; these limits are incorrect and should NOT be used. The corrupted type 6 distributions remain in the .dif files; they have not been removed.
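The sketch below shows why correct limits matter: bin edges are derived from the lower and upper limits, so a corrupted upper limit produces wrong edges even though the counts are right. Two assumptions are made: limits are in units of 0.1 ms (inferred from 7000 corresponding to 700 ms), and bins are equally spaced between the limits. The version set also assumes 51b5 is the only release between 51b4 and 51b6, and how the caller learns the meter version is left open. The function names are hypothetical.

```python
from typing import List, Optional

UNITS_PER_MS = 10                          # inferred from "7000 (700 ms)": 1 unit = 0.1 ms
BUGGY_VERSIONS = {"51b4", "51b5", "51b6"}  # assumed to be every release from 51b4 to 51b6


def bin_upper_edges_ms(lower: int, upper: int, nbins: int) -> List[float]:
    """Upper edge of each bin in ms, ASSUMING equal-width bins between the limits."""
    width = (upper - lower) / nbins
    return [(lower + (i + 1) * width) / UNITS_PER_MS for i in range(nbins)]


def type6_edges(lower: int, upper: int, nbins: int, version: str) -> Optional[List[float]]:
    """Return bin edges, or None if the limits were written by an affected NeTraMet version."""
    if version in BUGGY_VERSIONS:
        return None   # counts are usable, but the limits (hence the edges) are not
    return bin_upper_edges_ms(lower, upper, nbins)
```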
The affected date ranges are:
Location | Affected dates |
---|---|
UC San Diego | (not affected) |
Auckland and Colorado | Mon 15 Sep 2004 through Wed 16 Mar 2005 |
Tokyo and Fujisawa | Mon 15 Sep 2004 through Wed 9 Nov 2005 |
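When the meter version is not known, the affected date ranges in the table above can be used instead to decide whether a type 6 distribution's limits should be ignored. The site keys below are hypothetical labels, and treating both ends of each range as inclusive is an assumption.

```python
from datetime import date

# Affected ranges from the table above; both ends assumed inclusive.
AFFECTED = {
    "Auckland": (date(2004, 9, 15), date(2005, 3, 16)),
    "Colorado": (date(2004, 9, 15), date(2005, 3, 16)),
    "Tokyo": (date(2004, 9, 15), date(2005, 11, 9)),
    "Fujisawa": (date(2004, 9, 15), date(2005, 11, 9)),
}


def type6_limits_unreliable(site: str, day: date) -> bool:
    """True if type 6 limits recorded at this site on this day may be corrupted."""
    rng = AFFECTED.get(site)   # sites not listed (e.g. UC San Diego) are unaffected
    return rng is not None and rng[0] <= day <= rng[1]
```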
The Auckland and Colorado data contain few type 6 distributions, so the bug has little impact there. For the WIDE data (Tokyo and Fujisawa), the root distributions appear mostly unaffected, but the gTLD distributions have very high counts per five-minute reading interval, so they were almost always written as type 6. As a result, there is almost no usable gTLD RTT data from the WIDE NeTraMet meter while the buggy versions were running.
A sample Perl program, dnsroot_to_dat.pl, is provided to demonstrate one approach to extracting RTT distributions from the flow data files.
Acceptable Use Agreement
Please read the terms of the CAIDA Acceptable Use Agreement (AUA) for Publicly Accessible Datasets.
When referencing this data (as required by the AUA), please use:
The CAIDA UCSD DNS root/gTLD RTT Dataset, https://www.caida.org/catalog/datasets/dns_root_gtld_rtt_dataset
You are required to report your publications using this dataset to CAIDA.
Data Access
- Access the publicly available CAIDA DNS root/gTLD RTT Dataset
References
For more information on NeTraMet, see:
Acknowledgments
Thanks to:
- The University of Auckland, for their support in developing early versions of NeTraMet and the standards effort leading up to the RTFM system.
- CAIDA for their ongoing support of NeTraMet development, making it a more versatile research tool.
- The WAND group at Waikato University for support in making NeTraMet work with early versions of DAG cards.
- Endace Technologies for support with newer versions of DAG cards.
Special thanks (mostly for their help in setting up and maintaining the NeTraMet meters) to:
- Russell Fulton (U Auckland)
- CAIDA (Duane Wessels, Dan Andersen)
- U Colorado (Robert Roybal)
- WIDE (Yuji Sekiya, Kenjiro Cho)