Robert Edmonds ()
22 October 2012
The Google SSTable file format is used internally to store Bigtable data. An SSTable provides a persistent, ordered immutable map from keys to values, where both keys and values are arbitrary byte strings. Operations are provided to look up the value associated with a specified key, and to iterate over all key/value pairs in a specified key range.
No standalone, reusable implementation for C and Python programmers
mtbl
ISC mtbl
: "immutable string table library"
Heavily inspired by SSTable implementation in LevelDB
Primitives for creating, searching, merging "MTBL
format" SSTables
Data block compression (choice of zlib
or snappy
)
libmtbl
: C shared library
pymtbl
: Python extension module
mtbl
Key/value entries are arbitrary byte arrays
No duplicate keys permitted
User provides custom entry merging function
Intended to be a component in custom sort-and-merge based workflows
mtbl
file layoutmtbl
interfaceswriter
reader
sorter
merger
mtbl
: writer
MTBL
" file on diskmtbl
: reader
MTBL
" filemtbl
: sorter
mtbl
: merger
mtbl
interface summarysorter
+ writer
to convert raw data to MTBL
filesmerger
+ reader
to query MTBL
filesmerger
+ writer
to periodically aggregate data, if possible or desireddnstable
ISC dnstable
: "encoding format, library, and utilities for passive DNS data"
Concrete application built on top of libmtbl
Used in ISC DNSDB service
dnstable
Search by name:
*.example
example.*
Inverse search:
dnstable
Each passive DNS payload contains multiple fields:
rrname
rrtype
rdata[]
bailiwick
time_first
time_last
count
(See ISC Passive DNS Architecture for details.)
dnstable
Serialize payloads into multiple key/value pairs
First byte of key specifies the entry type (1
, 2
, etc.):
ENTRY_TYPE_RRSET
ENTRY_TYPE_RRSET_NAME_FWD
ENTRY_TYPE_RDATA
ENTRY_TYPE_RDATA_NAME_REV
Keys packed for efficient lookups
(See dnstable-encoding(5)
manpage for details.)
dnstable
RRtype handling:
dnstable
MX
and SRV
handled specially for ENTRY_TYPE_RDATA
NS
, CNAME
, DNAME
, PTR
, MX
, SRV
generate ENTRY_TYPE_RDATA_NAME_REV
entriesDS
, RRSIG
, NSEC
, DNSKEY
, NSEC3
, NSEC3PARAM
, DLV
) stored in separate MTBL filesdnstable
: command-line toolsdnstable_convert
NMSG
) to key/value entries (MTBL
)dnstable_merge
dnstable
-encoded MTBL
files togetherdnstable_dump
dnstable
-encoded MTBL
file to text or JSON
dnstable_lookup
dnstable
: Python extension module>>> import dnstable
>>> d = dnstable.reader('dns.fileset')
>>> q = dnstable.query(dnstable.RDATA_IP, '149.20.64.42')
>>> for res in d.query(q):
... print res.to_json()
...
{"count": 1, "time_first": 1326465963, "rrtype": "A", "rrname": "mydots.net.", "rdata": ["149.20.64.42"], "time_last": 1326465963}
{"count": 6400537, "time_first": 1277382144, "rrtype": "A", "rrname": "isc.org.", "rdata": ["149.20.64.42"], "time_last": 1339929008}
{"count": 586095, "time_first": 1277353744, "rrtype": "A", "rrname": "www.isc.org.", "rdata": ["149.20.64.42"], "time_last": 1339928783}
{"count": 30, "time_first": 1288046214, "rrtype": "A", "rrname": "blog.isc.org.", "rdata": ["149.20.64.42"], "time_last": 1338907006}
{"count": 2508, "time_first": 1277492207, "rrtype": "A", "rrname": "f.root-servers.org.", "rdata": ["149.20.64.42"], "time_last": 1339920341}
>>>
dnstable
Inspecting a single MTBL
file with the mtbl_info
utility:
sql1c3:/srv/dnstable/mtbl# mtbl_info dns.201209.M.mtbl
file name: dns.201209.M.mtbl
file size: 29,439,550,003
index bytes: 196,620,025 (0.67%)
data block bytes 29,242,929,466 (99.33%)
data block size: 8,192
data block count 8,775,560
entry count: 1,570,389,949
key bytes: 95,944,795,146
value bytes: 13,276,862,800
compression algorithm: zlib
compactness: 26.95%
sql1c3:/srv/dnstable/mtbl#
dnstable
A single formatted key-value entry, from the dnstable_dump
utility:
sql1c3:/srv/dnstable/mtbl# dnstable_dump -j -r dns.201209.M.mtbl | head -1 | fmt
{"bailiwick": ".", "rrname": ".", "time_last": 1349031299,
"time_first": 1346432688, "count": 19087899, "rrtype": 2, "rdata":
["a.root-servers.net.", "b.root-servers.net.", "c.root-servers.net.",
"d.root-servers.net.", "e.root-servers.net.", "f.root-servers.net.",
"g.root-servers.net.", "h.root-servers.net.", "i.root-servers.net.",
"j.root-servers.net.", "k.root-servers.net.", "l.root-servers.net.",
"m.root-servers.net."]}
Same entry, raw byte values, from the mtbl_dump
utility:
sql1c3:/srv/dnstable/mtbl# mtbl_dump dns.201209.M.mtbl | head -1 | fmt
"\x00\x00\x02\x00\x14\x01a\x0croot-servers\x03net\x00\x14\x01b\x0croot-servers\x03net\x00\x14\x01c\x0croot-servers\x03net\x00\x14\x01d\x0croot-servers\x03net\x00\x14\x01e\x0croot-servers\x03net\x00\x14\x01f\x0croot-servers\x03net\x00\x14\x01g\x0croot-servers\x03net\x00\x14\x01h\x0croot-servers\x03net\x00\x14\x01i\x0croot-servers\x03net\x00\x14\x01j\x0croot-servers\x03net\x00\x14\x01k\x0croot-servers\x03net\x00\x14\x01l\x0croot-servers\x03net\x00\x14\x01m\x0croot-servers\x03net\x00"
"\xb0\xdd\x83\x82\x05\x83\xab\xa2\x83\x05\x9b\x84\x8d\x09"
No standalone, reuseable "SSTable" implementation available
Built "mtbl
? based on open source Google code
Used mtbl
as the file format for "dnstable
?to store and index passive DNS data