Rapid7 Open Data offers researchers and community members open access to data from Project Sonar, which conducts internet-wide surveys to gain insights into global exposure to common vulnerabilities.
Project Sonar is a security research project by Rapid7 that conducts internet-wide surveys across different services and protocols to gain insights into global exposure to common vulnerabilities. The data collected is available to the public in an effort to enable security research.
This page contains a condensed version of the project activities. Please visit the following posts for further details and the motivation behind Project Sonar:
Project Sonar gathers data in two stages. The first stage scans all public IPv4 addresses to determine which have the relevant service port open. In the second stage, once an IP address has been identified as having that port open, collection activities take place, which involve connecting to and communicating with the service.
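The two-stage flow described above can be sketched roughly as follows. This is an illustrative Python sketch, not Sonar's implementation: the function names `port_is_open`, `grab_banner`, and `survey` are hypothetical, and the first stage here uses an ordinary TCP connect as a stand-in rather than the scanning technique ZMap uses.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Stage 1 stand-in: report whether a plain TCP connect to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def grab_banner(host, port, timeout=1.0, max_bytes=256):
    """Stage 2 stand-in: connect and read whatever the service sends first."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return conn.recv(max_bytes)
    except OSError:
        # Timeouts and resets are treated as "no data collected".
        return b""


def survey(hosts, port, is_open=port_is_open, collect=grab_banner):
    """Run the collection stage only against hosts that passed the first stage."""
    results = {}
    for host in hosts:
        if is_open(host, port):
            results[host] = collect(host, port)
    return results
```

The two stages are passed in as parameters so that each can be swapped out or exercised separately, for example with stub functions instead of real network probes.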
Project Sonar performs its scans from several different subnets, which can be whitelisted or blacklisted at your preference:
Note that prior to August 23, 2018, Sonar scanned from two /27s. These are no longer used by Sonar and should not be blacklisted or otherwise attributable to Sonar or Rapid7 going forward:
Project Sonar performs its collection activities from AWS EC2 nodes in us-west-1, us-west-2, and us-east-1. These nodes have non-static IP addresses and therefore cannot readily be whitelisted or blacklisted themselves; however, it is sufficient to blacklist or whitelist the scan ranges listed above.
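For readers who want to check whether traffic in their own logs came from a listed scan range, a minimal Python sketch using the standard `ipaddress` module might look like this. The CIDR blocks below are placeholders from the reserved documentation address space, not Sonar's actual subnets, and `is_from_scan_range` is a hypothetical helper name.

```python
import ipaddress

# Placeholder ranges from the documentation address space (RFC 5737).
# These are NOT Sonar's real subnets; substitute the published ranges.
SCAN_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/27"),  # a /27 covers 32 addresses
]


def is_from_scan_range(address, ranges=SCAN_RANGES):
    """Return True if the address falls inside any of the given networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in ranges)
```

For example, `is_from_scan_range("192.0.2.17")` is true for the placeholder ranges, while `is_from_scan_range("198.51.100.40")` is false because the /27 only covers addresses .0 through .31.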
At no point does Sonar bypass any technical barriers or otherwise access non-public-facing computers. We do everything possible to reduce the impact on remote networks, and we follow the best practices outlined by the ZMap developers.
Sonar collects all SSL certificates visible on public IPv4 HTTPS web servers. This data can be used to detect changes such as the malicious replacement of a certificate, or to reveal the revocation of an earlier, compromised certificate. Other uses include detecting insecurely reused certificates and revoked certificates that are still in active use. The data is complementary to the Electronic Frontier Foundation's SSL Observatory project. In addition, the Sonar data shows all IP addresses and services that claim to represent a particular domain, which in turn can be used for asset identification and for detecting malicious certificate usage. In specific situations, the certificate fields can also be used to identify software and hardware. The SSL work is being expanded to encompass non-HTTP services, such as SSL and STARTTLS-enabled email services like SMTP, IMAP, and POP.
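As an illustration of the change-detection use case, the sketch below compares two certificate snapshots by SHA-256 fingerprint. The snapshot shape (a dictionary mapping an IP address to DER-encoded certificate bytes) and the function names are assumptions made for this example; they do not describe the published dataset format.

```python
import hashlib


def fingerprint(cert_der):
    """SHA-256 fingerprint of a DER-encoded certificate, as a hex string."""
    return hashlib.sha256(cert_der).hexdigest()


def changed_certificates(previous, current):
    """Compare two snapshots of the assumed form {ip: cert_der_bytes}.

    Returns {ip: (old_fingerprint, new_fingerprint)} for every host that
    appears in both snapshots but presents a different certificate.
    """
    changes = {}
    for ip, new_der in current.items():
        old_der = previous.get(ip)
        if old_der is None:
            continue  # host not seen before; nothing to compare against
        old_fp, new_fp = fingerprint(old_der), fingerprint(new_der)
        if old_fp != new_fp:
            changes[ip] = (old_fp, new_fp)
    return changes
```

A changed fingerprint is only a starting point for investigation, since routine renewals also replace certificates.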
Sonar gathers the reverse DNS records for all IPv4 addresses. This data enables organizational asset discovery and can help identify misconfigurations and possibly DNS hijacking attempts.
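To make the reverse DNS idea concrete, here is a small Python sketch. It builds the PTR query name for an IPv4 address with the standard `ipaddress` module and filters a set of PTR results down to one organization's domain. The record shape (IP address mapped to hostname) and the helper names are assumptions for illustration only.

```python
import ipaddress


def ptr_name(address):
    """Return the reverse DNS (PTR) query name for an IP address."""
    return ipaddress.ip_address(address).reverse_pointer


def hosts_under_domain(ptr_records, domain):
    """Filter records of the assumed form {ip: hostname} to one domain.

    Matches the domain itself and its subdomains, but not lookalikes
    such as "badexample.com" when searching for "example.com".
    """
    domain = domain.lower().rstrip(".")
    matches = {}
    for ip, hostname in ptr_records.items():
        name = hostname.lower().rstrip(".")
        if name == domain or name.endswith("." + domain):
            matches[ip] = name
    return matches
```

Here `ptr_name("192.0.2.1")` yields `1.2.0.192.in-addr.arpa`, the name a resolver would query for that address.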
Sonar uses the domain names gathered from the above processes as well as certain TLD zone files to conduct DNS "ANY" record requests. This data is also useful for asset discovery and the identification of phishing portals, as well as new malicious domains matching algorithmic patterns.
Sonar scans a growing number of TCP and UDP services. TCP studies include SSH, SMB, Telnet, RDP, Mongo, Redis, CouchDB, and more. UDP studies include NetBIOS, DNS, NTP, IPMI, NAT-PMP, BACnet, SIP, SNMP, mDNS, and quite a few others. We use the metadata from these publicly exposed services to identify large-scale misconfigurations and vulnerabilities in consumer, enterprise, and critical infrastructure systems.
All datasets gathered are post-processed and published in compressed form for public use. You can find the data on opendata.rapid7.com.
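Because the datasets are published in compressed form, it is usually more practical to stream them than to decompress them fully first. The sketch below assumes a gzip-compressed file with one JSON object per line; that layout, the field names in the usage note, and the helper name `iter_records` are assumptions for illustration, so check the actual format of the dataset you download.

```python
import gzip
import json


def iter_records(source):
    """Yield one parsed record per non-empty line of a gzip-compressed file.

    `source` may be a file path or a binary file object. The
    one-JSON-object-per-line layout is an assumption of this sketch.
    """
    with gzip.open(source, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)
```

A typical use would be a loop such as `for record in iter_records("dataset.json.gz"): ...`, where the file name is a placeholder; records are decoded one at a time, so memory use stays flat regardless of file size.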
Project Sonar employs a range of open-source tools, most notably the ZMap software developed by Zakir Durumeric, Eric Wustrow, and J. Alex Halderman at the University of Michigan. We publish a few of our own tools as well, including DAP and Recog, both of which are used in the processing stage of our scanning system.
Use of the Open Data research datasets available on this website ("Open Data datasets") is subject to the following terms. By accessing or using Open Data datasets, you accept these terms of service. If you are using Open Data datasets on behalf of another organization or entity, you represent that you have authority to accept these terms on behalf of the organization or entity and that the organization or entity accepts these terms. Subject to these terms, Rapid7 grants you a worldwide, non-exclusive, non-transferable license to use or reproduce Open Data datasets for internal or research purposes. Open Data datasets are published on this website to enhance cybersecurity by providing insights into global exposure to common vulnerabilities. The data may not be used:
To the extent you wish to license this data for incorporation into a commercial offering, please contact us at research[at]rapid7.com.
If this data is to be used as part of research or other non-commercial efforts, you must credit Rapid7 Labs. For any use or redistribution of the database, or works produced from it, you must make clear to others the license of the database and keep intact any notices on the original database.
You agree to abide by all applicable laws when using Open Data datasets. You are responsible at all times for the consequences of your use of Open Data datasets. Rapid7 is not responsible for the actions of third parties, and you agree to hold harmless and indemnify Rapid7 and its affiliates, officers, employees, and agents from any claim, action, or damages, known and unknown, related to the use of Open Data datasets. Rapid7 does not make any representations or warranties of any kind regarding Open Data datasets. If any portion of these terms is found to be unenforceable, the remaining portion shall remain in effect. If Rapid7 does not enforce these terms, it shall not be considered a waiver of the terms. Rapid7 reserves the right to update and modify these terms from time to time.
Feel free to contact research[at]rapid7.com regarding further questions. We also appreciate any community analysis results and hope for your collaboration.
If you would like to be excluded from some or all of our probes, please let us know at research[at]rapid7.com. Make sure to mention your CIDR blocks or list of IP addresses and your affiliation.
Please note that as part of the opt-out process we attempt to verify that the requester has been delegated or otherwise controls the network addresses in the opt-out request. We typically perform this verification via WHOIS and other tools. If we cannot verify delegation or ownership, we are unlikely to opt out the requested addresses. Similarly, if the WHOIS delegation changes, we may remove the opt-out. It can be requested again in the future.
Security data feeds available to Practitioners, Academics, and Researchers