Sorted String Tables: ISC mtbl and ISC dnstable

Robert Edmonds

22 October 2012

Introduction

The Google SSTable file format is used internally to store Bigtable data. An SSTable provides a persistent, ordered immutable map from keys to values, where both keys and values are arbitrary byte strings. Operations are provided to look up the value associated with a specified key, and to iterate over all key/value pairs in a specified key range.
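The two operations described above (point lookup and iteration over a key range) can be illustrated with a small in-memory sketch. This is not the SSTable or mtbl file format; the class name `SortedTable`, its method names, and the half-open range convention are assumptions made for illustration only.

```python
import bisect


class SortedTable:
    """Immutable, ordered map from byte-string keys to byte-string values."""

    def __init__(self, items):
        # Sort once at construction time; the table is never modified afterwards.
        pairs = sorted(dict(items).items())
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def get(self, key):
        """Return the value stored under key, or None if the key is absent."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def iter_range(self, start, stop):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for i in range(lo, hi):
            yield self._keys[i], self._values[i]


table = SortedTable([(b"b", b"2"), (b"a", b"1"), (b"c", b"3")])
print(table.get(b"a"))
print(list(table.iter_range(b"a", b"c")))
```

Because the keys are kept sorted, both operations reduce to a binary search followed by a sequential scan, which is the property an on-disk sorted table exploits as well.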

Implementations

Problem

No standalone, reusable SSTable implementation for C and Python programmers

mtbl

mtbl file layout

mtbl interfaces

mtbl: writer

mtbl: reader

mtbl: sorter

mtbl: merger

mtbl interface summary

dnstable

dnstable: command-line tools

dnstable: Python extension module

>>> import dnstable
>>> d = dnstable.reader('dns.fileset')
>>> q = dnstable.query(dnstable.RDATA_IP, '149.20.64.42')
>>> for res in d.query(q):
...     print(res.to_json())
...
{"count": 1, "time_first": 1326465963, "rrtype": "A", "rrname": "mydots.net.", "rdata": ["149.20.64.42"], "time_last": 1326465963}
{"count": 6400537, "time_first": 1277382144, "rrtype": "A", "rrname": "isc.org.", "rdata": ["149.20.64.42"], "time_last": 1339929008}
{"count": 586095, "time_first": 1277353744, "rrtype": "A", "rrname": "www.isc.org.", "rdata": ["149.20.64.42"], "time_last": 1339928783}
{"count": 30, "time_first": 1288046214, "rrtype": "A", "rrname": "blog.isc.org.", "rdata": ["149.20.64.42"], "time_last": 1338907006}
{"count": 2508, "time_first": 1277492207, "rrtype": "A", "rrname": "f.root-servers.org.", "rdata": ["149.20.64.42"], "time_last": 1339920341}
>>>
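Each result in the session above is one JSON object per line, so it can be post-processed with the standard library alone. The sketch below copies three of the lines shown above, ranks them by observation count, and renders the timestamps as UTC dates. It does not use the dnstable module; the helper name `as_utc` is an assumption for illustration.

```python
import json
from datetime import datetime, timezone

# Three of the JSON lines printed by the query above, copied verbatim.
lines = [
    '{"count": 1, "time_first": 1326465963, "rrtype": "A", "rrname": "mydots.net.", "rdata": ["149.20.64.42"], "time_last": 1326465963}',
    '{"count": 6400537, "time_first": 1277382144, "rrtype": "A", "rrname": "isc.org.", "rdata": ["149.20.64.42"], "time_last": 1339929008}',
    '{"count": 30, "time_first": 1288046214, "rrtype": "A", "rrname": "blog.isc.org.", "rdata": ["149.20.64.42"], "time_last": 1338907006}',
]

records = [json.loads(line) for line in lines]

# Most frequently observed RRsets first.
records.sort(key=lambda r: r["count"], reverse=True)


def as_utc(ts):
    """Format a Unix timestamp as a UTC calendar date."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")


for r in records:
    print(r["rrname"], r["count"], as_utc(r["time_first"]), as_utc(r["time_last"]))
```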

dnstable

Inspecting a single MTBL file with the mtbl_info utility:

sql1c3:/srv/dnstable/mtbl# mtbl_info dns.201209.M.mtbl
file name:             dns.201209.M.mtbl
file size:             29,439,550,003
index bytes:           196,620,025 (0.67%)
data block bytes       29,242,929,466 (99.33%)
data block size:       8,192
data block count       8,775,560
entry count:           1,570,389,949
key bytes:             95,944,795,146
value bytes:           13,276,862,800
compression algorithm: zlib
compactness:           26.95%

sql1c3:/srv/dnstable/mtbl# 
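The percentages in the mtbl_info listing can be re-derived from the raw byte counts. The listing does not define "compactness"; the numbers are consistent with file size divided by the total of key bytes and value bytes, and that is the assumption used in this sketch. The variable names are illustrative.

```python
# Figures copied from the mtbl_info output above.
file_size = 29_439_550_003
index_bytes = 196_620_025
data_block_bytes = 29_242_929_466
data_block_count = 8_775_560
entry_count = 1_570_389_949
key_bytes = 95_944_795_146
value_bytes = 13_276_862_800

# Share of the file taken by the index and by the data blocks.
index_pct = 100.0 * index_bytes / file_size
data_pct = 100.0 * data_block_bytes / file_size

# Assumed definition: on-disk size relative to the raw key and value bytes.
compactness = 100.0 * file_size / (key_bytes + value_bytes)

# Bytes not attributed to either the index or the data blocks.
unaccounted = file_size - index_bytes - data_block_bytes

# Average number of entries stored per data block.
entries_per_block = entry_count / data_block_count

print("index bytes:       %.2f%%" % index_pct)      # 0.67%
print("data block bytes:  %.2f%%" % data_pct)       # 99.33%
print("compactness:       %.2f%%" % compactness)    # 26.95%
print("unaccounted bytes: %d" % unaccounted)        # 512
print("entries per block: %.1f" % entries_per_block)
```

The 512 bytes left over after the index and data blocks presumably hold fixed-size file metadata; the listing itself does not say.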

dnstable

A single formatted key-value entry, from the dnstable_dump utility:

sql1c3:/srv/dnstable/mtbl# dnstable_dump -j -r dns.201209.M.mtbl | head -1 | fmt
{"bailiwick": ".", "rrname": ".", "time_last": 1349031299,
"time_first": 1346432688, "count": 19087899, "rrtype": 2, "rdata":
["a.root-servers.net.", "b.root-servers.net.", "c.root-servers.net.",
"d.root-servers.net.", "e.root-servers.net.", "f.root-servers.net.",
"g.root-servers.net.", "h.root-servers.net.", "i.root-servers.net.",
"j.root-servers.net.", "k.root-servers.net.", "l.root-servers.net.",
"m.root-servers.net."]}

Same entry, raw byte values, from the mtbl_dump utility:

sql1c3:/srv/dnstable/mtbl# mtbl_dump dns.201209.M.mtbl | head -1 | fmt
"\x00\x00\x02\x00\x14\x01a\x0croot-servers\x03net\x00\x14\x01b\x0croot-servers\x03net\x00\x14\x01c\x0croot-servers\x03net\x00\x14\x01d\x0croot-servers\x03net\x00\x14\x01e\x0croot-servers\x03net\x00\x14\x01f\x0croot-servers\x03net\x00\x14\x01g\x0croot-servers\x03net\x00\x14\x01h\x0croot-servers\x03net\x00\x14\x01i\x0croot-servers\x03net\x00\x14\x01j\x0croot-servers\x03net\x00\x14\x01k\x0croot-servers\x03net\x00\x14\x01l\x0croot-servers\x03net\x00\x14\x01m\x0croot-servers\x03net\x00"
"\xb0\xdd\x83\x82\x05\x83\xab\xa2\x83\x05\x9b\x84\x8d\x09"
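The raw bytes can be matched against the JSON form of the same entry. The sketch below assumes a layout inferred only from this one entry, not from format documentation: the key is an entry-type byte, the owner name in DNS wire format, the rrtype as a varint, the bailiwick in wire format, then length-prefixed rdata values; the value is three little-endian base-128 varints. Since both names in this key are the root, nothing can be concluded here about how longer owner names are laid out. All function names are hypothetical.

```python
def read_varint(buf, pos):
    """Decode one little-endian base-128 varint; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_name(buf, pos):
    """Decode one uncompressed DNS wire-format name; return (text, next_pos)."""
    labels = []
    while buf[pos] != 0:
        length = buf[pos]
        labels.append(buf[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return ".".join(labels) + ".", pos + 1


def decode_key(key):
    """Split a key according to the assumed layout described above."""
    pos = 1  # skip the leading entry-type byte
    rrname, pos = read_name(key, pos)
    rrtype, pos = read_varint(key, pos)
    bailiwick, pos = read_name(key, pos)
    rdata = []
    while pos < len(key):
        length, pos = read_varint(key, pos)
        name, _ = read_name(key, pos)
        rdata.append(name)
        pos += length
    return {"rrname": rrname, "rrtype": rrtype, "bailiwick": bailiwick, "rdata": rdata}


def decode_value(value):
    """Decode every varint in the value, in order."""
    fields = []
    pos = 0
    while pos < len(value):
        number, pos = read_varint(value, pos)
        fields.append(number)
    return fields


# The key shown above, rebuilt from its repeating a..m pattern.
key = b"\x00\x00\x02\x00" + b"".join(
    b"\x14\x01" + bytes([letter]) + b"\x0croot-servers\x03net\x00"
    for letter in b"abcdefghijklm"
)
value = b"\xb0\xdd\x83\x82\x05\x83\xab\xa2\x83\x05\x9b\x84\x8d\x09"

key_fields = decode_key(key)
time_first, time_last, count = decode_value(value)
print(key_fields["rrname"], key_fields["rrtype"], key_fields["rdata"][0])
print(time_first, time_last, count)  # 1346432688 1349031299 19087899
```

The three decoded integers equal the time_first, time_last, and count fields in the dnstable_dump output, which supports the assumed value layout for this entry.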

Summary