All about Datasets

Content

.nl stats and data - SIDN Labs

DNS | DNSSEC | IP | Networks

Historic datasets (from 2014 onwards) for the .nl TLD. Datasets are available in JSON format.

Datasets cover information about:

  • DNS
    • Domain Names
    • Query Type
    • Response Codes
    • IPv6 Support
  • Resolvers
    • Location
    • Number of IP addresses
    • Validating Resolvers
    • Popular Networks
    • Port Randomness
  • DNSSEC
    • Validating Queries
    • DANE
    • Used Algorithms
  • Mail
    • Mail RRs
    • SPF Information

0day "In the Wild"

0day

Google Project Zero tracks a list of zero day exploits discovered in the wild. They track public resources to find uses of zero days and collect them in this spreadsheet. The spreadsheet contains data since 2014. Their blog provides an introduction and explanation of the spreadsheet.


Active DNS

DNS | IP | Networks

Historical DNS database. Access can be requested for academic use.

Actively queries many DNS records, e.g., for the .com zone. It can contain information that is not in DNSDB if that information was never seen by a resolver. It does not contain all information, as some domains may be unknown to the project and thus cannot be crawled. It uses popular zones, domain lists (e.g., Alexa, blacklists), and other domain feeds.

They normally maintain a rolling 14-day window.

Copy files (for date 2017-10-05) (ddos@gladbeck):

sftp -B1024000 -C -rp "activedns@kokino.gtisc.gatech.edu:active-dns/20171005/" .

The data is encoded in AVRO format, which can also be parsed as JSONL. Python has an AVRO library. AVRO schema:

{
    "namespace": "astrolavos.avro",
    "type": "record",
    "name": "ActiveDns",
    "fields": [
        {"name": "date", "type": "string"},
        {"name": "qname", "type": "string"},
        {"name": "qtype", "type": "int"},
        {"name": "rdata", "type": ["string", "null"]},
        {"name": "ttl", "type": ["int", "null"]},
        {"name": "authority_ips", "type": "string"},
        {"name": "count", "type": "long"},
        {"name": "hours", "type": "int"},
        {"name": "source", "type": "string"},
        {"name": "sensor", "type": "string"}
    ]
}

Some fields are unique to this schema. The authority_ips field holds the IPs of the authoritative name servers that replied to our query. We gather all the IPs that gave us the same answer over an entire day and concatenate them in the same field, mostly to reduce the number of records that we have to keep.

The only field that might be slightly confusing is the "hours" field. This is a 24-bit integer that encodes the times of day at which we saw this RR on the given date (for example, 000000000000000001000010 = 18:00 and 23:00).

Another important thing to keep in mind is NXDOMAINs. A queried QNAME does not exist when both the rdata and ttl fields are null. If rdata exists but ttl is null, then the record was part of the glue of the DNS packet and not in the answer section.
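The field semantics described above can be sketched in Python. This is a minimal illustration, not the project's own tooling: the function names are hypothetical, records are assumed to be already decoded into dictionaries matching the schema, and the bit order of the hours field is an assumption chosen so that the example value decodes to 18:00 and 23:00 (the n-th bit from the left is read as hour n).

```python
def decode_hours(hours: int) -> list[int]:
    """Return the hours encoded in the 24-bit "hours" field.

    Assumption: the n-th bit from the left (1-based) stands for hour n.
    This reading reproduces the example in the text; the project's
    actual bit order may differ.
    """
    bits = format(hours, "024b")  # zero-padded 24-character bit string
    return [pos for pos, bit in enumerate(bits, start=1) if bit == "1"]


def classify(record: dict) -> str:
    """Classify a record by its rdata/ttl fields as described in the text."""
    rdata, ttl = record.get("rdata"), record.get("ttl")
    if rdata is None and ttl is None:
        return "nxdomain"  # the queried name does not exist
    if rdata is not None and ttl is None:
        return "glue"      # seen in the glue, not in the answer section
    return "answer"        # everything else is treated as a regular answer


example = int("000000000000000001000010", 2)
print(decode_hours(example))
print(classify({"rdata": None, "ttl": None}))
```

If the bit order turns out to differ, only the `enumerate` call in `decode_hours` needs to change.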


AVR Instruction Set

Cheatsheet | CTF

These websites provide reference documentation of the AVR instruction set, which is used for Arduino boards.


BGPlay

BGP | Networks | Tools

BGPlay shows a graph of the observed BGP routes. It can replay historical BGP announcements and displays route changes.

Documentation
Github


BGPmon Archive

BGP | Networks

Downloadable dataset of historic BGP information from different vantage points.


BGPStream (CAIDA)

BGP | Networks | Tools

An open-source software framework for live and historical BGP data analysis, supporting scientific research, operational monitoring, and post-event analysis.

BGP streams are freely accessible and provided by Route Views, RIPE, and BGPmon.


BGPStream (OpenDNS)

BGP | Networks

BGP Stream is a free resource for receiving alerts about hijacks, leaks, and outages in the Border Gateway Protocol.

BGP Stream provides real-time information about BGP events. It includes information about affected IPs and ASNs, and even a replay feature that shows how the BGP announcements changed.

A live alert bot also exists on Twitter.


CAIDA Datasets Overview

BGP | IP | Networks

Overview of datasets, monitors, and reports produced and organized by CAIDA. Also contains links to other datasets.


Censys

Certificates | DNS | IP | Networks

Censys performs regular scans for common protocols (e.g., DNS, HTTP(S), SSH). Provides a search for TLS certificates.

Access is free, but requires registration.

@InProceedings{censys15,
    author = {Zakir Durumeric and David Adrian and Ariana Mirian and Michael Bailey and J. Alex Halderman},
    title = {A Search Engine Backed by {I}nternet-Wide Scanning},
    booktitle = {Proceedings of the 22nd {ACM} Conference on Computer and Communications Security},
    month = oct,
    year = 2015
}

Certificate Search crt.sh

Certificates

Certificate search engine. crt.sh is based on the certificate transparency logs and provides wildcard search for domains.


Collection of "bad" packets in PCAPs

DNS | IP | Networks | PCAPs

Collection of "bad" packets in PCAPs that can be used for testing software.


Common Crawl

Networks

The Common Crawl project builds an openly accessible database of crawled websites. The index can be searched.


Computer Security Conference Ranking and Statistic

Paper | Security

This website offers a ranking of many computer security conferences. The ranking is accompanied by a yearly acceptance ratio statistic.


DDoS Mon

Amplification | Denial-of-Service | Networks

Provides a search interface for domain names and IP addresses under attack. Shows results for the last 30 days. Provides an API, which requires special authorization.


DMAP Domain Mapper by SIDN Labs

DNS | Networks | Tools

DMAP is a scalable web scanning suite that supports DNS, HTTPS, TLS, and SMTP. It works based on domain names and crawls each domain for all supported protocols. Its advantages over other tools are the unified SQL data model with 166 features and the easy scalability across many crawling machines.


DNS Authoritative Server Benchmarks

DNS | DNSSEC

The website is an ongoing project by Knot DNS to measure the performance of various DNS servers. Four open source servers are tested, namely BIND, Knot DNS, NSD, and PowerDNS. The benchmark includes different zone configurations matching root zones, TLD zones, or hosting zones, as well as different DNSSEC configurations.


DNS Privacy Project

DNS | DNSSEC

The DNS Privacy Project aims to improve privacy for users on the Internet.

The project is split into different groups working on DNS privacy:

The project focusses mostly on DNS over TLS. They provide overviews of the implementation status, configurations for test servers, and ongoing monitoring of which features the servers provide.


DNS Quality/Overview Tools

DNS | DNSSEC | Networks | Tools

Check My DNS

Browser-based DNS resolver quality measurement tool. Uses the browser to generate many resolver queries and tests for features they should have, such as EDNS support, IPv6, QNAME Minimisation, etc.

This test is also available as a CLI tool: https://github.com/DNS-OARC/cmdns-cli

DNSSEC Debugger

Analyze DNSSEC deployment for a zone and show errors in the configuration.

DNSViz

Gives an overview of DNSSEC delegations, response sizes, and name servers.

Github: https://github.com/dnsviz/dnsviz

DNS X-Ray

The website has an online test, which performs DNS lookups. These DNS lookups test if certain resource records are overwritten in the cache. The tool can then determine what DNS software is used, where the server is located, how many caches there are, etc.

EDNS Compliance Tester

Tests the name servers of a zone for correct EDNS support.

The Transitive Trust and DNS Dependency Graph Portal

Shows the trust dependencies in DNS. Given a domain name it can show how zones delegate to each other and why. The delegation is done between IP addresses and zones.

Root Canary Project

The project monitors the KSK rollover.

It provides statistics about support for DNSSEC algorithms. It has a web-based test for your own resolver and provides live monitoring using the RIPE Atlas.


DNS Queries to Authoritative DNS Server at SURFnet by Google's Public DNS Resolver

DNS | Networks

This dataset covers approximately 3.5 billion DNS queries that were received at one of SURFnet's authoritative DNS servers from Google's Public DNS Resolver. The queries were collected over 2.5 years. The dataset contains only those queries that contained an EDNS Client Subnet.

The dataset covers data from 2015-06 through 2018-01.


DNSDB

DNS | Networks

Historical DNS database. Contains information recorded at recursive resolvers about domain names, first/last seen, and current bailiwick. It shows the lifetime of resource records and can be used as a large database.


DNSMON

DNS | Networks

Historical information about the reachability of root and some TLD name servers.


DNSSEC Deployment Reports

DNS | DNSSEC | Networks

Regularly updated reports about current DNSSEC deployment. Contains information per TLD and global distribution.


DNSSEC Early Warning System

DNS | DNSSEC

The website keeps track of all DNSSEC keys in the top level domains (TLDs) and informs when the signatures are about to expire. The time before some RRSIGs expire is color coded. It also shows errors that occurred during validation.


dnsstream (Twitter)

DNS | Networks

@dnsstream is a Twitter bot that sends out notifications about important DNS changes for domains:

  • Potential DDoS attacks
  • Domains which link to known malicious IPs
  • Name server changes for a domain

dnsthought

DNS | DNSSEC | Networks

Dnsthought lists many statistics about the resolvers visible to the .nl-authoritative name servers.


Domain Crawling Lists

DNS

Domain popularity lists provide a starting point for crawling domains with the most users. The most commonly used list for security research is the Alexa list.

  • Alexa
    The list is updated daily and contains one million websites. The ranking is based on page views, but is very volatile.
  • Cisco Umbrella
    The list is updated daily and contains one million websites. The ranking is based on traffic seen on the OpenDNS resolvers.
  • Majestic
    The list is updated daily and contains one million websites. The ranking is based on backlinks from other websites.
  • Tranco
    A Research-Oriented Top Sites Ranking Hardened Against Manipulation
    The Tranco list aims to provide a better list for security research. The authors explain on their website and in their paper what the flaws in the existing lists are.
  • Quantcast
    The list is updated daily and contains around 500,000 websites. It is based on users visiting the site within the previous month and is highly US focussed.

gitignore Templates

These websites provide templates for good .gitignore files:


Google Transparency Report

Certificates

Google's Transparency Report contains various information. It provides information about email encryption, HTTPS encryption, information about potentially harmful applications in Android, and live reports of traffic disruptions, such as censorship.

It provides a certificate search based on the certificate transparency logs, similar to crt.sh: https://transparencyreport.google.com/https/certificates


IETF Official RFC Bibtex Downloads

Paper | TeX

The IETF now provides official BibTeX entries for download. They work for RFCs, BCPs, and drafts.

The BibTeX entries for BCPs work, but only if the BCP consists of a single RFC. If the BCP consists of multiple RFCs, the entry will only show the first one.

For drafts, the draft version number (the last two digits) has to be removed from the URL.

Examples:

Available entries can be found in the RFC Index and the BCP Index.
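The draft rule above (dropping the two-digit version number) can be sketched as follows. The helper name and the draft name are hypothetical, and the sketch only handles the draft name itself, not the download URL.

```python
import re


def strip_draft_version(name: str) -> str:
    """Remove a trailing two-digit version ("-NN") from a draft name."""
    return re.sub(r"-\d{2}$", "", name)


# "draft-example-dnsop-sample" is a made-up draft name for illustration.
print(strip_draft_version("draft-example-dnsop-sample-03"))
```

Names without a trailing version are returned unchanged, so the helper can be applied to either form.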


Intel Management Engine Partitions

The document lists and describes a large part of the Intel Management Engine Partitions. This is useful as a general resource to learn about the features of Intel ME.


Internet Maps (RIPE NCC)

DNS | Networks

Maps of measurements done with the RIPE Atlas.


IP to ASN Mapping (CIRCL LU)

Autonomous Systems Number | IP | Networks

Historical dataset about IP to ASN mappings.


IP to ASN Mapping (Cymru)

Autonomous Systems Number | IP | Networks

Historical dataset about IP to ASN mappings.


IPv6 Deployment Reports

IP | Networks

RIPE Report

Per continent, region, or country measurements of IPv6 deployment and preference. Historical data is also available.

APNIC Report

Per continent, region, or country measurements of IPv6 deployment and preference.


IPv6 Hitlist Collection

IP | Networks

A curated list of IPv6 hosts, gathered by crawling different lists. Includes:

  • Alexa domains
  • Cisco Umbrella
  • CAIDA DNS names
  • Rapid7 DNS ANY and rDNS
  • Various zone files

Access to the full list requires registration by email.

Based on the paper "Scanning the IPv6 Internet: Towards a Comprehensive Hitlist".


IXP Pricing Overview

BGP | Networks

Contains a list of pricing information for different IXPs.


libc database search

Reverse Engineering

Online interface to find a libc version by function offsets. It is powered by the libc-database repository.


Linux System Call Table

Cheatsheet | CTF | x86

These websites provide an overview of the Linux system call interface by listing the syscall numbers, their meanings, and their arguments.


List of Amplification Protocols

Amplification | Denial-of-Service | Networks

Contains a list of UDP-based protocols, which can be used for amplification attacks.


List of BGP Routing Datasets

BGP | Networks

Isolario

Isolario provides historical routing data in MRT format for their route collectors. The data contains snapshots every two hours and updates with a granularity of five minutes.

Packet Clearing House (PCH)

The Packet Clearing House (PCH) publishes BGP data collected at more than 100 internet exchange points (IXP). The snapshot dataset contains the state of the routing tables in daily intervals.

PCH also provides raw routing data in MRT format. These files contain all the update information, sorted by time.

Routing Information Service (RIS)

The RIS is the main resource from RIPE featuring all kinds of datasets about AS assignments and connectivity.

Routeviews

Routeviews is a project by the University of Oregon to provide live and historical BGP routing data.


List of Chrome CLI Switches

Cheatsheet

Most command line switches of Google Chrome are entirely missing from the official documentation. This website offers a list of all known switches with a single-sentence description of what each one does.


List of Default Passwords

CTF | Passwords

The website features a large list of default passwords found in routers and IoT devices. The data is sorted by manufacturer and can be searched.


List of Looking Glasses Providing Traceroutes

Networks

The website shows links to different looking glasses which provide either traceroute information or are usable as route servers.


Lists of DNS Blacklists

DNS | IP | Networks | Spam | Tools

These projects either operate DNS-based Real-time Blackhole Lists (RBL) or allow checking if an IP is contained in one. The Multi-RBL websites are helpful for finding a large number of RBLs.


Netlab 360 OpenData Project

Amplification | Networks | Tools

The Netlab of 360.com provides some open data streams.

One dataset concerns the number of abused reflectors per protocol.


netray.io Internet Observatory

Certificates | DNS | Networks

The Internet Observatory is a project by the RWTH Aachen University. It combines different scanning projects.

As of writing it contains information about:

  • DNS
  • HTTP2 and Server Push
  • QUIC
  • TCP Initial Window
  • Certificate Authority Authorization (CAA)

NetworkScan Mon

Amplification | Networks | Tools

Overview of IP addresses scanning the internet and which ports are scanned.


Open Resolver Scan

DNS | Networks

Open Resolver scanning project by the Shadowserver Foundation.


OpenIPmap RIPE

BGP | Networks | Tools

An IP geolocation service that draws on geolocation databases, user-provided locations, and, most importantly, active RTT measurements based on the RIPE Atlas system. It also provides an API to query the location. It provides a breakdown of where the results stem from and how much each source contributes to the overall result.


Over The Wire: Wargames

CTF | Tools

Over The Wire provides many different challenges in its wargames for learning various exploitation techniques. There are different wargames based on skill and required tooling. In each level the user has to retrieve a flag to proceed to the next level.


Passive DNS (CIRCL)

DNS | IP | Networks

Passive DNS dataset from circl.lu.


Passive SSL (CIRCL)

Certificates

Historical certificate dataset. Allows querying based on IP address or certificate.


PeeringDB

BGP | Networks

Contains peering information for some networks. This includes peering partners, transfer speeds, peering requirements, and similar.

Documentation


Privilege Escalation Cheatsheet (Vulnhub)

CTF

The repo contains a curated list of various ways to perform privilege escalation. It is sorted by different attack vectors.


Public DNS Server List

DNS

The website provides a curated list of various public DNS resolver operators and the IP addresses of the DNS servers.


Public Suffix List

DNS | Networks

The public suffix list gives a way to easily determine the effective second level domain, i.e., the domain that a domain owner registered. Domains directly below a public suffix can belong to different owners.
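Determining the registered domain from a suffix list can be sketched as follows. The rule set below is a tiny hand-picked excerpt used only for illustration, the function name is hypothetical, and the wildcard and exception rules of the real list are deliberately left out.

```python
# Tiny, hand-picked excerpt of suffix rules for illustration only; a real
# application would load the complete Public Suffix List.
SUFFIXES = {"com", "nl", "uk", "co.uk"}


def registrable_domain(hostname: str, suffixes=SUFFIXES):
    """Return the public suffix plus one label, or None if there is none.

    Simplifications: wildcard and exception rules are not handled, and
    names that match no rule return None instead of falling back to the
    default rule of the real algorithm.
    """
    labels = hostname.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the name itself is a public suffix
            return ".".join(labels[i - 1:])
    return None


print(registrable_domain("www.example.co.uk"))
print(registrable_domain("mail.example.nl"))
```

Matching the longest suffix first is what keeps "example.co.uk" from being cut down to "co.uk".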


RFC 8145 Root Trust Anchor Reports

DNS | DNSSEC

The root trust anchor reports show statistics on how widely the different root signing keys are supported in the resolver population. The data is collected using the trust anchor reporting specified in RFC 8145. There are graphs showing the distribution over time, combined for all root servers or split per letter, and a list of IP addresses that only report support for outdated key signing keys (KSK).
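As background on how such reports can be derived: with the key tag query method of RFC 8145, a resolver signals its trust anchors through a query name that starts with "_ta-", followed by the key tags as four-digit hexadecimal values separated by hyphens. The following sketch parses such a label. The function name is hypothetical, and this is not the code used by the report website.

```python
def parse_key_tags(label: str) -> list[int]:
    """Extract key tags from an RFC 8145 "_ta-xxxx[-xxxx...]" label."""
    prefix = "_ta-"
    if not label.lower().startswith(prefix):
        raise ValueError("not a trust anchor key tag label")
    # Each hyphen-separated part is one key tag in hexadecimal notation.
    return [int(part, 16) for part in label[len(prefix):].split("-")]


# 4a5c and 4f66 are the hexadecimal forms of key tags 19036 and 20326.
print(parse_key_tags("_ta-4a5c-4f66"))
```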


RIPE Atlas

Certificates | DNS | IP | Networks

RIPE operates a set of probes, which can be used to send pings or similar measurements. The probes are mainly placed in Europe, but some are also on other continents.

All the collected measurements can be found in the RIPE Atlas Daily Archives. The blog post gives some more details.


Root Servers

DNS | Tools

Overview page for the DNS root servers. It contains links to general news and all the supporting organizations.

The website features a map with all geographic locations. It contains information about locations, IPv4/IPv6 reachability and IP addresses.

Each root server has its own subdomain in the form of http://a.root-servers.org. It contains access to historical performance data like:

  • Size and time of zone updates
  • RCODE volume
  • query and response sizes for UDP and TCP
  • traffic volume (packets per time)
  • Unique sources

Routing Information Service (RIS)

BGP | DNS | Networks | Tools

Various information regarding the reachability and connectivity of ASs.


ROV Deployment Monitor

BGP | Networks

The Route Origin Validation (ROV) Deployment Monitor measures how many ASs have deployed ROV. It uses PEERING for BGP announcements and uses BGP monitors to see in which ASs the wrong announcements are filtered. A blog post at APNIC describes it in more detail.


RPKI Browsers

Networks | RPKI | Tools

These websites allow you to browse the valid RPKI announcements. They show which address ranges are covered by RPKI and who the issuing authority is.


scans.io Internet-Wide Scan Data Repository

Certificates | DNS | IP | Networks

A list of Internet scans that are free to download. Some of the data is historical, some scans are still actively updated.

Links to a downloadable list of the Alexa top 1 million.


Shodan

Certificates | DNS | IP | Networks

Shodan performs regular scans on common ports.

Access is free, but requires registration. More results can be gained with a paid account.


System Security Circus

Paper | Security

The System Security Circus by Davide Balzarotti presents many statistics about the Top-4 security conferences, such as authors and affiliations.


TeleGeography Map Gallery

Networks

TeleGeography provides different maps about the Internet. They contain information about submarine cables, global traffic volume, latency, and internet exchange points. The data for the Submarine Map and the Internet Exchange Map can also be found on Github in text format.


vizAS

BGP | Networks | Tools

vizAS by APNIC shows the connectivity between different ASs, split by country. It is useful for finding the ASs which are most central in the graph.


Vulnerable (Docker) Containers

CTF | Docker

The website lists docker containers from Docker Hub with known vulnerabilities in them. The top 1000 docker containers from Docker Hub are regularly scanned with Trivy and the results are reported here.

A similar tool to scan for vulnerable containers is Clair scanner.


World Country Information

Various metadata about countries, such as name, country code, and languages. It also has a GeoJSON outline of each country and its flag.


x86 Instruction Set

Cheatsheet | x86 | CTF

These websites provide reference documentation of the x86 instruction set: