HTTPS Ecosystem Scans
Source: DataCite Commons | Updated: 2020-09-20 | Indexed: 2025-04-09
Download link:
https://www.impactcybertrust.org/dataset_view?idDataset=1093
Description:
This dataset is composed of four parts: parsed certificates, raw certificates, individual scans (the status of each responsive host in a single complete scan of the IPv4 address space), and raw ZMap output of TCP SYN scans on port 443. Although we have split these into individual parts, the data is optimized for use in a relational database such as PostgreSQL or MySQL.
The files certificates.csv.gz, public_keys.csv.gz, and extraneous_extensions.csv.gz contain parsed data from all certificates we have encountered over the course of our scanning. The certificates relation contains all common data found in a certificate (e.g. subject, issuer). The relation is keyed on "id" and is also unique on the SHA-1 fingerprint. The issuer_id attribute is a self-referential attribute that points back to the id of the parent certificate.
Certificates are validated using OpenSSL and recently downloaded root stores. We attempt to validate each certificate against the browser store along with any previously seen intermediate certificates in order to account for missing certificate chains. The result of this validation is represented in the is-*-trusted attributes. We also check each certificate for other issues (e.g. expiration, invalid signature) that do not involve the trust chain; these results are stored in the is-valid and validation-error attributes.
The public_keys relation contains unique parsed RSA and DSA keys and is linked by certificates.public_key_id == public_keys.id. Other types of keys are noted in the certificates relation but are not parsed further. All other non-binary X.509 extensions are stored in the extraneous_extensions relation.
The scan files contain data about every host that completed a successful TLS handshake on port 443 during a single comprehensive scan of the IPv4 address space. For each host we include the host IP address, the certificate ID, the SHA-1 fingerprint of the certificate, and the timestamp at which the TLS handshake was completed.
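The relationships described above (the self-reference through issuer_id and the link certificates.public_key_id == public_keys.id) can be sketched with a small in-memory database. This is only an illustration: it uses SQLite instead of PostgreSQL so that it runs without a server, the columns sha1_fingerprint, subject, and key_type are assumed names, the sample rows are invented, and storing NULL as the issuer_id of a root certificate is an assumption as well. The authoritative column list is in schema.txt.

```python
import sqlite3

# In-memory stand-in for the real database. Only id, issuer_id and
# public_key_id are named in the dataset description; every other
# column name here is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE public_keys (
    id INTEGER PRIMARY KEY,
    key_type TEXT                      -- assumed column
);
CREATE TABLE certificates (
    id INTEGER PRIMARY KEY,
    sha1_fingerprint TEXT UNIQUE,      -- assumed column name
    subject TEXT,                      -- assumed column name
    issuer_id INTEGER REFERENCES certificates(id),
    public_key_id INTEGER REFERENCES public_keys(id)
);
-- Invented sample rows; a NULL issuer_id for the root is an assumption.
INSERT INTO public_keys VALUES (1, 'RSA'), (2, 'RSA');
INSERT INTO certificates VALUES
    (10, 'aa', 'Example Root CA', NULL, 1),
    (11, 'bb', 'www.example.com', 10, 2);
""")

# Self-join on issuer_id to find each parent certificate, then join
# to public_keys through public_key_id.
rows = conn.execute("""
SELECT c.subject, parent.subject, k.key_type
FROM certificates AS c
LEFT JOIN certificates AS parent ON c.issuer_id = parent.id
LEFT JOIN public_keys AS k ON c.public_key_id = k.id
ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN keeps certificates whose parent is not present in the relation, which seems a reasonable default when some chains are incomplete.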
The data originates from a PostgreSQL 9.2 database, which we recommend for hosting this dataset; its schema is available in schema.txt. Strings are delimited with double quotes, and newlines are replaced with \n. Information about specific fields can be found in schema.txt. Contact: https-team@umich.edu
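For readers who want to inspect a file outside a database, the encoding described above (double-quoted strings, newlines written as \n) can be read with the Python standard library. The following is a minimal sketch under stated assumptions: the helper read_rows is hypothetical, UTF-8 encoding is assumed, no header row is assumed, and the demo record is invented rather than taken from the dataset. Column order must come from schema.txt.

```python
import csv
import gzip
import os
import tempfile


def read_rows(path):
    """Yield each record of a gzipped CSV file as a list of strings.

    Hypothetical helper: assumes UTF-8 text, double-quote delimited
    strings, and newlines stored as the two characters backslash + n.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        for record in csv.reader(handle, quotechar='"'):
            # Turn the literal two-character sequence \n back into a newline.
            yield [field.replace("\\n", "\n") for field in record]


# Demo on an invented one-line file, not on real dataset content.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv.gz")
with gzip.open(demo_path, "wt", encoding="utf-8", newline="") as handle:
    handle.write('1,"CN=example.com, O=Example","line one\\nline two"\n')

demo_rows = list(read_rows(demo_path))
print(demo_rows)
```

Note that this simple replacement cannot distinguish an encoded newline from a field that originally contained a backslash followed by the letter n; if that matters, loading the files into PostgreSQL as recommended is likely the safer route.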
Provider:
IMPACT
Created:
2018-10-25



