Active DNS

https://www.activednsproject.org/

DNS | Dataset | IP | Network

Historical DNS database. Access can be requested for academic use.

Actively queries many DNS records, e.g., the .com zone. It can contain information that is not in DNSDB, namely records that were never seen by a resolver. It is not complete, however: domains that are unknown to the project cannot be crawled. Its domain seeds come from popular zones, domain lists (e.g., Alexa, blacklists), and other domain feeds.

The project normally maintains a rolling 14-day window of data.

Copy files (for date 2017-10-05) (ddos@gladbeck):

sftp -B1024000 -C -rp "activedns@kokino.gtisc.gatech.edu:active-dns/20171005/" .
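
To fetch other dates, the command above can be generated per day. The following is only a sketch: the host, path, and flags are copied from the command above, while the YYYYMMDD directory naming is an assumption inferred from the "20171005" example. The helper names (sftp_command, window_days) are hypothetical, and whether the 14-day window includes the current day is also an assumption.

```python
from datetime import date, timedelta

# Host and base path are copied from the sftp command in these notes.
REMOTE = "activedns@kokino.gtisc.gatech.edu:active-dns"


def sftp_command(day):
    """Return the sftp argument list that copies one day's directory to '.'.

    Assumption: directories are named YYYYMMDD, as in "20171005".
    """
    return ["sftp", "-B1024000", "-C", "-rp", f"{REMOTE}/{day:%Y%m%d}/", "."]


def window_days(today, days=14):
    """Dates covered by a rolling window of `days` days ending at `today`.

    Assumption: the window includes `today` itself.
    """
    return [today - timedelta(days=offset) for offset in range(days)]
```

The returned list can be passed to subprocess.run(...) or joined with spaces to obtain a command line for a shell.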

The data is encoded in Avro format, which can also be parsed as JSONL. Python has an Avro library. Avro schema:

{
    "namespace": "astrolavos.avro",
    "type": "record",
    "name": "ActiveDns",
    "fields": [
        {"name": "date", "type": "string"},
        {"name": "qname", "type": "string"},
        {"name": "qtype", "type": "int"},
        {"name": "rdata", "type": ["string", "null"]},
        {"name": "ttl", "type": ["int", "null"]},
        {"name": "authority_ips", "type": "string"},
        {"name": "count", "type": "long"},
        {"name": "hours", "type": "int"},
        {"name": "source", "type": "string"},
        {"name": "sensor", "type": "string"}
    ]
}
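
A minimal sketch of reading such a file in Python and emitting JSONL. It assumes the third-party fastavro package, which was picked here for illustration and is not necessarily the library these notes refer to. The function names are hypothetical, and the records are assumed to be stored in regular Avro container files that carry the schema above.

```python
import json


def iter_records(path):
    """Yield each record of an Avro container file as a Python dict."""
    # Assumption: fastavro is installed (pip install fastavro).
    # Imported lazily so the JSON helper below works without it.
    from fastavro import reader

    with open(path, "rb") as fo:
        for record in reader(fo):
            yield record


def to_jsonl_line(record):
    """Serialize one record as a single JSON line (no trailing newline)."""
    return json.dumps(record, sort_keys=True)


def dump_jsonl(path, out):
    """Write every record of the Avro file at `path` to the text stream `out`."""
    for record in iter_records(path):
        out.write(to_jsonl_line(record) + "\n")
```

For example, dump_jsonl("some-file.avro", sys.stdout) would print one JSON object per record; the file name is a placeholder.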

Some more information about the fields that are unique to this schema:

The IPs in the authority_ips field are the authoritative name server IPs that replied to our query. We gather all the IPs that gave us the same answer over an entire day and concatenate them in one field, mostly to reduce the number of records that we have to keep.

The only field that might be slightly confusing is the "hours" field. This is a 24-bit integer that encodes the times of day at which we saw this RR on the given date (for example, 000000000000000001000010 = 18:00 and 23:00).

Another important thing to keep in mind is NXDOMAINs. A queried QNAME does not exist when both the rdata and ttl fields are null. If rdata exists but ttl is null, then the record was part of the glue of the DNS packet and not in the answer section.
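
The field semantics described above can be sketched in Python as follows. The bit-to-hour mapping is an assumption inferred from the single example given: the k-th binary digit from the left (1-based) is taken to mean hour k:00, which reproduces "18:00 and 23:00". The function names and the "unknown" label are hypothetical; the case of a null rdata with a non-null ttl is not described in these notes.

```python
def decode_hours(hours):
    """Return the hours of day encoded in the 24-bit `hours` field.

    Assumption: the k-th digit from the left (1-based) of the 24-digit
    binary representation marks hour k:00, matching the documented example.
    """
    bits = format(hours, "024b")  # zero-padded to 24 binary digits
    return [pos for pos, bit in enumerate(bits, start=1) if bit == "1"]


def classify(record):
    """Classify a record by its rdata/ttl combination."""
    rdata, ttl = record["rdata"], record["ttl"]
    if rdata is None and ttl is None:
        return "nxdomain"  # the queried QNAME does not exist
    if rdata is not None and ttl is None:
        return "glue"      # seen in the glue, not in the answer section
    if rdata is not None:
        return "answer"    # regular record with rdata and ttl
    return "unknown"       # rdata null but ttl set: not covered by the notes
```

With the example value from the text, decode_hours(int("000000000000000001000010", 2)) gives [18, 23].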

A similar active DNS project is OpenINTEL, which seems to be larger in scope and whose data is publicly available.