All about Dataset


.nl stats and data - SIDN Labs

 https://stats.sidnlabs.nl/en/

DNS | DNSSEC | Dataset | IP | Network

Historic datasets (from 2014 onwards) for the .nl TLD. Datasets are available in JSON format.

Datasets cover information about:

  • DNS
    • Domain Names
    • Query Type
    • Response Codes
    • IPv6 Support
  • Resolvers
    • Location
    • Number of IP addresses
    • Validating Resolvers
    • Popular Networks
    • Port Randomness
  • DNSSEC
    • Validating Queries
    • DANE
    • Used Algorithms
  • Mail
    • Mail RRs
    • SPF Information




APNIC Labs Stats

 https://stats.labs.apnic.net/

Autonomous System | BGP | DNS | DNSSEC | Dataset | IP

APNIC gathers many statistics and publishes them on its website. It provides considerably more data than first appears, because many of the datasets are not linked from the main page.







BGPStream (OpenDNS)

 https://bgpstream.com/

Autonomous System | BGP | Dataset | Network

BGP Stream is a free resource for receiving alerts about hijacks, leaks, and outages in the Border Gateway Protocol.

BGP Stream provides real-time information about BGP events. It includes information about affected IPs and ASNs, and a replay feature that shows how the BGP announcements changed.

A live alert bot also exists on Twitter.



Binary Hardening in IoT Products

 https://cyber-itl.org/2019/08/26/iot-data-writeup.html

Dataset

Detailed analysis of a 10-year dataset of IoT binaries and their security features. The Cyber ITL focussed on which compiler and toolchain hardening features the vendors use.

CITL identified a number of important takeaways from this study:

  • On average, updates were more likely to remove hardening features than add them.
  • Within our 15 year data set, there have been no positive trends from any one vendor.
  • MIPS is both the most common CPU architecture and least hardened on average.
  • There are a large number of duplicate binaries across multiple vendors, indicating a common build system or toolchain.




CIRCL hashlookup

 https://hashlookup.circl.lu/

Dataset | Malware

Look up files by their MD5 or SHA-1 hashes. The response contains information such as the filename, the size, and where the file was found, for example in a Linux package. The website hosts the API documentation, which can be tried directly from the browser.
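
As a minimal sketch of preparing such a lookup, the snippet below hashes some bytes and builds a query URL. The `/lookup/<algorithm>/<digest>` path layout and the helper name `lookup_url` are assumptions made for illustration; the API documentation on the website is the authoritative reference.

```python
import hashlib

# Assumed base URL and path layout of the lookup endpoints; verify against the API documentation.
HASHLOOKUP_BASE = "https://hashlookup.circl.lu/lookup"


def lookup_url(data: bytes, algorithm: str = "sha1") -> str:
    """Return the (assumed) lookup URL for the digest of `data`."""
    if algorithm not in ("md5", "sha1"):
        raise ValueError("this sketch only handles md5 and sha1")
    digest = hashlib.new(algorithm, data).hexdigest()
    return f"{HASHLOOKUP_BASE}/{algorithm}/{digest}"
```

The resulting URL can then be fetched with any HTTP client; no request is sent by the sketch itself.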



CZ.NIC Statistics

 https://stats.adam.nic.cz/dashboard/en/index.html

DNS | Dataset

The website contains information about the .cz TLD operated by CZ.NIC. It covers the query volume, query types, round-trip time (RTT), and geographic location of the traffic sources. It also has information about the registry functions, such as registrar information, domain transfers, and whois requests. Lastly, information about mojeID accounts, a login provider operated by CZ.NIC, is also available.



Censored Planet

 https://data.censoredplanet.org/raw

Censorship | Dataset

Censored Planet is a censorship measurement platform that collects data using multiple remote measurement techniques in more than 200 countries.

The website provides access to many recent scans. The scans are performed using different techniques to find different censors.


Censys

 https://censys.io/

Certificate | DNS | Dataset | IP | Network

Censys performs regular scans for common protocols (e.g., DNS, HTTP(S), SSH) and provides a search for TLS certificates.

Access is free, but requires registration. The website no longer provides free bulk access. Bulk access requires a commercial or a research license. The free access is limited to 1000 API calls per day.

@InProceedings{censys15,
    author = {Zakir Durumeric and David Adrian and Ariana Mirian and Michael Bailey and J. Alex Halderman},
    title = {A Search Engine Backed by {I}nternet-Wide Scanning},
    booktitle = {Proceedings of the 22nd {ACM} Conference on Computer and Communications Security},
    month = oct,
    year = 2015
}



Citizenlab Censorship Test Lists

 https://github.com/citizenlab/test-lists

Censorship | Dataset

The GitHub repository contains multiple lists for finding website censorship. The lists are organized by country and contain URLs specific to each of them. The URLs are also categorized and cover four broad themes:

  • Political, e.g., governmental views or human rights
  • Social, e.g., sexuality or gambling
  • Conflicts, e.g., armed conflicts or border disputes
  • Internet tools, e.g., hosting providers or circumvention methods.


Cloudflare Radar

 https://radar.cloudflare.com/

DDoS | Dataset | IP | Network

Cloudflare Radar is Cloudflare's reporting website about internet trends and general traffic statistics. The website shows information about observed attacks and attack types and links to the DDoS report. It also reports general traffic statistics, such as the browsers used, the fraction of human traffic, and the IP, HTTP, and TLS versions.

The website also provides more detailed information for domains and IP addresses. Domains have information about age, popularity, and visitors. IP addresses have ASN and geolocation information.

More information about Cloudflare Radar is available in the introductory blog post.





Corona Dashboards for Germany and Europe

Dataset

Robert Koch-Institut: Official German dashboard.

Robert Koch-Institut Lagebericht: Daily situation report about the state in Germany. Contains additional information and statistics.

COVID Trends Germany: Daily updated dashboard with many graphs for Germany.

Berliner Morgenpost: Shows sub-country numbers for Europe and worldwide.

WHO European Region: Country-level information for Europe.

WHO European Region Subnational Explorer: Subnational information for Europe with incidence rates over the last 7/14 days.

Johns Hopkins University: Contains worldwide information.

ECDC COVID-19 Country Overviews: Very detailed breakdown for countries worldwide.

ECDC Europe: Weekly updated incidence and test positivity rates within Europe.

ECDC Worldwide: Daily updated worldwide numbers with by-region breakdowns.

Reuters: Provides per-country and regionally aggregated information.









DNS Coffee

 https://dns.coffee/

DNS | Dataset | IP | Network | Search

DNS Coffee collects and archives stats from DNS Zone files in order to provide insights into the growth and changes in DNS over time.

The website includes information such as the size of different zones. It tracks over 1200 zone files.

It supports searching through the zone files based on domain names, name servers, or IP addresses. It can also visualize the relationship between a domain, the parent zones, and the name servers in what they call a "Trust Tree".



DNS Quality/Overview Tools

DNS | DNSSEC | Dataset | Network | Tool

Check My DNS

Browser-based DNS resolver quality measurement tool. Uses the browser to generate many resolver queries and tests for features they should have, such as EDNS support, IPv6, QNAME Minimisation, etc.

This test is also available as a CLI tool: https://github.com/DNS-OARC/cmdns-cli

DNSSEC Debugger

Analyze DNSSEC deployment for a zone and show errors in the configuration.

DNSViz

Gives an overview over DNSSEC delegations, response sizes, and name servers.

GitHub: https://github.com/dnsviz/dnsviz

DNS X-Ray

The website has an online test, which performs DNS lookups. These DNS lookups test if certain resource records are overwritten in the cache. The tool can then determine what DNS software is used, where the server is located, how many caches there are, etc.

EDNS Compliance Tester

Tests the name servers of a zone for correct EDNS support.

The Transitive Trust and DNS Dependency Graph Portal

Shows the trust dependencies in DNS. Given a domain name it can show how zones delegate to each other and why. The delegation is done between IP addresses and zones.

Root Canary Project

The project was used to monitor the first root KSK rollover. Now it hosts the paper "Roll, Roll, Roll your Root: A Comprehensive Analysis of the First Ever DNSSEC Root KSK Rollover", which describes the experiences of that rollover.

Additionally, it includes a tester for DNSSEC algorithm support, which shows the algorithms supported by the currently used recursive resolver. It provides statistics about support for DNSSEC algorithms. It has a web-based test for your own resolver and provides live monitoring using RIPE Atlas.

DNSSEC algorithms resolver test


DNS Queries to Authoritative DNS Server at SURFnet by Google's Public DNS Resolver

 https://data.4tu.nl/articles/dataset/DNS_Queries_to_Authoritative_DNS_Server_at_SURFnet_by_Google_s_Public_DNS_Resolver/12682040

DNS | Dataset | Network

This dataset covers approximately 3.5 billion DNS queries that were received at one of SURFnet's authoritative DNS servers from Google's Public DNS Resolver. The queries were collected over 2.5 years. The dataset contains only those queries that contained an EDNS Client Subnet.

The dataset covers data from 2015-06 through 2018-01.

DOI Identifier


DNSDB

 https://scout.dnsdb.info/

DNS | Dataset | Network

Historical DNS database. It contains information recorded at recursive resolvers about domain names, including first/last seen timestamps and the current bailiwick. It shows the lifetime of resource records and can be used as a large database.






Domain Crawling Lists

DNS | Dataset

Domain popularity lists provide a starting point for crawling domains with the most users. The most commonly used list for security research is the Alexa list.

  • Alexa
    The list is updated daily and contains one million websites. The ranking is based on page views, but is very volatile.
  • Cisco Umbrella
    The list is updated daily and contains one million websites. The ranking is based on traffic seen on the OpenDNS resolvers.
  • Majestic
    The list is updated daily and contains one million websites. The ranking is based on backlinks from other websites.
  • Tranco
    A Research-Oriented Top Sites Ranking Hardened Against Manipulation
    The Tranco list aims to provide a better list for security research. The authors explain on their website and in their paper what the flaws in the existing lists are.
  • Quantcast
    The list is updated daily and contains around 500,000 websites. It is based on users visiting the site within the previous month and highly US focussed.
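
Such lists are often distributed as CSV files with one `rank,domain` pair per line. The sketch below assumes that layout, with no header row and rows sorted by rank; the function name `read_top_list` and the sample data are made up for illustration.

```python
import csv
import io


def read_top_list(csv_text: str, limit: int) -> list[str]:
    """Return the first `limit` domains from 'rank,domain' CSV text."""
    domains: list[str] = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(domains) == limit:
            break  # enough entries collected
        if len(row) != 2:
            continue  # skip blank or malformed lines
        _rank, domain = row
        domains.append(domain.strip().lower())
    return domains


# Illustrative sample in the assumed format, not real ranking data.
sample = "1,example.com\n2,example.org\n3,example.net\n"
crawl_targets = read_top_list(sample, 2)
```

Cutting the list at a fixed size keeps a crawl bounded. Given the volatility noted above, recording the download date of the list alongside the results helps with reproducibility.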



DuckDuckGo Tracker Radar

 https://github.com/duckduckgo/tracker-radar

Dataset | Network

Tracker Radar collects common third party domains and rich metadata about them. The data is collected from the DuckDuckGo crawler. More details are in this blogpost.

This is not a block list, but a data set of the most common third party domains on the web with information about their behavior, classification and ownership. It allows for easy custom solutions with the significant metadata it has for each domain: parent entity, prevalence, use of fingerprinting, cookies, privacy policy, and performance. The data on individual domains can be found in the domains directory.


Forward DNS Rapid7

 https://opendata.rapid7.com/sonar.fdns_v2/

DNS | Dataset | IP | Network

This dataset contains the responses to DNS requests for all forward DNS names known by Rapid7's Project Sonar. Until early November 2017, all of these were for the 'ANY' record, with a fallback to A and AAAA requests if necessary. After that, the ANY study represents only the responses to ANY requests, and dedicated studies were created for the A, AAAA, CNAME, and TXT record lookups with appropriately named files.

The data is updated every month. Historic data can be downloaded after creating a free account.
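
A minimal sketch for filtering such a study by record type follows. It assumes the files are JSON lines with `name`, `type`, and `value` fields; this layout, the function name, and the sample records are assumptions, so check a downloaded file before relying on them.

```python
import json
from typing import Iterable, Iterator


def records_of_type(lines: Iterable[str], rtype: str) -> Iterator[tuple[str, str]]:
    """Yield (name, value) pairs for records whose type matches `rtype`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("type", "").lower() == rtype.lower():
            yield record["name"], record["value"]


# Hypothetical sample records using documentation addresses.
sample_lines = [
    '{"timestamp": "1510000000", "name": "example.com", "type": "a", "value": "192.0.2.1"}',
    '{"timestamp": "1510000000", "name": "example.com", "type": "aaaa", "value": "2001:db8::1"}',
]
a_records = list(records_of_type(sample_lines, "A"))
```

Because the function takes any iterable of lines, a compressed download could be streamed through it with `gzip.open(path, "rt")` instead of being loaded into memory.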









IETF Official RFC BibTeX Downloads

Dataset | Paper Writing | TeX

The IETF now provides official BibTeX entries for download. They are available for RFCs, BCPs, and drafts.

The BibTeX entries for BCPs work only if the BCP consists of a single RFC. If the BCP consists of multiple RFCs, the entry only shows the first one.

For drafts, the draft version number (the last two digits) has to be removed from the URL.

Examples:

Available entries can be found in the RFC Index and the BCP Index.
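
The rule about draft version numbers can be sketched as a small helper. The URL layout in `BIBTEX_URL` is an assumption about the IETF Datatracker and should be verified, and the draft name in the usage line is hypothetical.

```python
import re

# Assumed URL layout on the IETF Datatracker; verify before relying on it.
BIBTEX_URL = "https://datatracker.ietf.org/doc/{name}/bibtex/"


def bibtex_url(document: str) -> str:
    """Build a BibTeX URL for an RFC, BCP, or draft name."""
    name = document.strip().lower()
    if name.startswith("draft-"):
        # Drop the trailing two-digit version number, e.g. "-07".
        name = re.sub(r"-\d{2}$", "", name)
    return BIBTEX_URL.format(name=name)


# Hypothetical draft name, used only to show the version stripping.
draft_url = bibtex_url("draft-example-foo-07")
```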






IPmap RIPE

 https://ipmap.ripe.net/

BGP | Dataset | Map | Network | Tool

IP geolocation service that draws on geolocation databases, user-provided locations, and, most importantly, active RTT measurements based on the RIPE Atlas system. It also provides an API to query the location, along with a breakdown of where the results stem from and how much each source contributes to the overall result.



IPv6 Hitlist Collection

Dataset | IP | Network

https://www.net.in.tum.de/projects/gino/ipv6-hitlist.html

A curated list of IPv6 hosts, gathered by crawling different lists. Includes:

  • Alexa domains
  • Cisco Umbrella
  • CAIDA DNS names
  • Rapid7 DNS ANY and rDNS
  • Various zone files

Access to the full list requires registration by email.

Based on the paper "Scanning the IPv6 Internet: Towards a Comprehensive Hitlist".

https://ipv6hitlist.github.io/

The website contains the additional material of the IMC paper Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists. The IPv6 addresses can be downloaded from the website. The website has three lists, responsive IPv6 addresses, aliased prefixes, and non-aliased prefixes. Additionally, the website also has a list of tools used during the data creation.





Is BGP safe yet?

 https://isbgpsafeyet.com/

BGP | Dataset | Network | RPKI

"Is BGP safe yet?" is an effort by Cloudflare to track the deployment of RPKI filtering across different ISPs. They provide a tester on the website with which each user can test whether the current ISP is filtering RPKI-invalid announcements. The website includes a list of networks and whether and how they use RPKI (signing and/or filtering).

More details for this project can be found in Cloudflare's blog or on the GitHub project.





List of BGP Routing Datasets

BGP | Dataset | Network

Isolario

Isolario provides historical routing data in MRT format for its route collectors. The data contains snapshots every two hours and updates with a granularity of five minutes.

Packet Clearing House (PCH)

The Packet Clearing House (PCH) publishes BGP data collected at more than 100 internet exchange points (IXP). The snapshot dataset contains the state of the routing tables in daily intervals.

PCH also provides raw routing data in MRT format. These files contain all the update information, sorted by time.

Routing Information Service (RIS)

The RIS is the main resource from RIPE featuring all kinds of datasets about AS assignments and connectivity.

Routeviews

Routeviews is a project by the University of Oregon to provide live and historical BGP routing data.






Lists of DNS Blacklists

DNS | Dataset | IP | Network | Spam | Tool

These projects either operate DNS-based Real-time Blackhole Lists (RBL) or allow checking whether an IP address is listed. The Multi-RBL websites are helpful in finding a large quantity of RBLs.
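
DNS-based lists are conventionally queried by reversing the address and appending the list's zone, then resolving the resulting name. The sketch below builds that query name only; the zone `dnsbl.example.org` is a placeholder and not a real list.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to resolve when checking `ip` against a DNSBL zone."""
    address = ipaddress.ip_address(ip)
    # reverse_pointer yields e.g. "1.2.0.192.in-addr.arpa"; drop the two arpa labels.
    reversed_labels = address.reverse_pointer.rsplit(".", 2)[0]
    return f"{reversed_labels}.{zone.strip('.')}"


# Placeholder zone and a documentation address, for illustration only.
query = dnsbl_query_name("192.0.2.1", "dnsbl.example.org")
```

Resolving the name is left out on purpose. Typically an answer means the address is listed and a name error means it is not, but each list documents its own return codes.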


Malware Bazaar

 https://bazaar.abuse.ch/

Dataset | Malware

The Malware Bazaar is a project by abuse.ch to create an open repository of malware samples. The repository is small in size, but anyone can freely download from it and contribute to it. It only contains malicious files, in contrast to common malware feeds like VirusTotal.


Manchester Academic Phrasebank

 https://www.phrasebank.manchester.ac.uk/

Dataset

The Academic Phrasebank is a general resource for academic writers. It aims to provide you with examples of some of the phraseological ‘nuts and bolts’ of writing organised according to the main sections of a research paper or dissertation.

The data bank contains the categories “Introducing Work”, “Referring to Sources”, “Describing Methods”, “Reporting Results”, “Discussing Findings”, and “Writing Conclusions”.




NIST RPKI Monitor

 https://rpki-monitor.antd.nist.gov/

BGP | Dataset | RPKI

The NIST RPKI Monitor shows different statistics about RPKI adoption and about the validation status. It shows the number of validating prefixes, their history, the autonomous systems with the most VALID and INVALID prefixes and how validation changes over time.
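
The validation states can be illustrated with a simplified route origin validation in the spirit of RFC 6811. This is a sketch under stated assumptions, not the monitor's implementation: the `ROA` tuple, the function name, and the sample data (documentation prefixes and private-use ASNs) are made up for illustration.

```python
import ipaddress
from typing import NamedTuple


class ROA(NamedTuple):
    prefix: str      # authorized prefix, e.g. "192.0.2.0/24"
    max_length: int  # longest prefix length the origin may announce
    asn: int         # authorized origin AS


def validate_origin(prefix: str, origin_asn: int, roas: list[ROA]) -> str:
    """Classify an announcement as VALID, INVALID, or NOT_FOUND."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue  # this ROA does not cover the announced prefix
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "VALID"
    # Covered by at least one ROA but matched by none: INVALID.
    return "INVALID" if covered else "NOT_FOUND"


sample_roas = [ROA("192.0.2.0/24", 24, 64500)]
state = validate_origin("192.0.2.0/25", 64500, sample_roas)
```

In the sample, the /25 announcement is INVALID even though the origin AS matches, because it is more specific than the ROA's maximum length allows.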





Open INTEL

 https://www.openintel.nl/

DNS | Dataset | IP | Network

Open INTEL is an active DNS database. It gathers information from public zone files, domain lists (Alexa, Umbrella), and reverse DNS entries. Once every 24 hours, data is collected for a set of DNS RRsets (SOA, NS, A, AAAA, MX, TXT, DNSKEY, DS, NSEC3, CAA, CDS, CDNSKEY). The data is openly available as AVRO files and dates back to 2016.

The data can be freely downloaded. There is documentation on the layout of the AVRO files.

The project is similar to Active DNS but seems to be larger in scope.












RIPEstat: Providing open data and insights for Internet resources

 https://stat.ripe.net/

Autonomous System | BGP | DNS | Dataset | Network | Tool

RIPEstat is a network statistics platform by RIPE. The platform shows data for IP addresses, networks, ASNs, and DNS names. This includes information such as the registration information, abuse contacts, blocklist status, BGP information, geolocation lookups, or reverse DNS names. Additionally, the website links to many other useful tools, such as an address space hierarchy viewer, historical whois information, and routing consistency checks.






Root Servers

 https://root-servers.org

DNS | Dataset | Tool

Overview page for the DNS root servers. It contains links to general news and all the supporting organizations.

The website features a map with all geographic locations. It contains information about locations, IPv4/IPv6 reachability and IP addresses.

Each root server has its own subdomain in the form of https://a.root-servers.org. It gives access to historical performance data like:

  • Size and time of zone updates
  • RCODE volume
  • query and response sizes for UDP and TCP
  • traffic volume (packets per time)
  • Unique sources
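
The per-server subdomains follow the root server letters. A minimal sketch that enumerates them, assuming the pattern shown above holds for all thirteen letters a through m:

```python
import string


def root_server_urls() -> list[str]:
    """Return the per-letter overview URLs for root servers a through m."""
    letters = string.ascii_lowercase[:13]  # "a" .. "m"
    return [f"https://{letter}.root-servers.org" for letter in letters]


urls = root_server_urls()
```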



Shadowserver Scanning Project

 https://scan.shadowserver.org/

DNS | Dataset | Malware | Network

The Shadowserver Scanning project performs regular Internet-wide scans for many protocols. They scan for four main types of protocols:

  1. Amplification protocols, e.g., DNS or NTP
  2. Botnet protocols, e.g., Gameover Zeus or Sality
  3. Protocols that should not be exposed, e.g., Elastic Search, LDAP, or RDP
  4. Vulnerable Protocols, e.g., SSLv3

The website is a great resource to get general statistics about the protocols, like the number of hosts speaking the protocol, their geographic distribution, associated ASNs, and the historic information.





Transient Execution Attacks

 https://transient.fail/

Dataset | Security

The website lists all known speculation side channel attacks. Each attack contains information about the attacked buffer, the affected vendors, and working state. They are sorted into a hierarchy. Each attack is also linked to proof-of-concepts and the academic papers.




WAND Active Measurement Project

 https://amp.wand.net.nz/

Autonomous System | DNS | Dataset | Network | Tool | Traceroute

AMP is a system designed to continuously perform active network measurements between a mesh of specialist monitor machines, as well as to other targets of interest. These measurements are used to provide both a view of long-term network performance as well as to detect notable network events when they happen.

The project is run with a custom client and server software. The measurement results can be viewed on the website. It includes traceroutes, latencies (DNS, HTTP, ICMP, TCP), HTTP page sizes, and packet loss. The software is available as open source.



W³Techs Surveys

 https://w3techs.com/technologies

Dataset

W³Techs crawls a large part of the web with over 10 million sites (Alexa). It focusses on the technologies used to implement the websites. The website offers a variety of statistics, such as the most used languages, frameworks, web servers, and hosting information.