OpenSonarDatasets | Sonar Technology Datasets | Underwater Research Datasets
OpenSonarDatasets 🌊
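Dataset Overview
OpenSonarDatasets is a repository that collects open-source sonar datasets to make underwater research and development easier. Researchers are encouraged to extend the current collection, which increases the visibility of open-source sonar datasets and makes them easier to find and compare.
Dataset Origin
This dataset comparison originates from the journal paper "Sonar-based DL in Underwater Robotics: Overview, Robustness, and Challenges", submitted to the IEEE Journal of Oceanic Engineering. The initial datasets are included in that paper. Note, however, that datasets contributed by the community in the future will not be part of the original paper's comparison.
Dataset Comparison Table
The table below compares state-of-the-art underwater sonar datasets. It covers sonar type, data type, number of data samples, and object labels in the data. It also shows whether the data are annotated and for which deep learning (DL) task, and whether the dataset describes settings such as sonar frequency and altitude. The last two columns give the release year and whether a related paper exists. Sonar types are SSS (side-scan sonar), FLS (forward-looking sonar), SAS (synthetic aperture sonar), MBES (multibeam echosounder), and MSIS (mechanically scanned imaging sonar). In the Annotation and Settings columns, ✓/✗ mean yes/no; in the other columns, ✗ means the information is not available.
Dataset | Sonar Type | Data Type | No. of Samples | Object Labels | Annotation (DL Task) | Settings Described | Release Year | Related Paper |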
---|---|---|---|---|---|---|---|---|
Northern Adriatic Reefs | SSS | GeoTIFF | 7 | Reefs | ✗ | ✓ | 2010 | ✗ |
Lago Grey | SSS | Raw | ✗ | Glacier, Walls | ✗ | ✓ | 2019 | Paper |
UCI ML | ✗ | Raw | 211 | Mines, Rocks | Classification | ✗ | ✗ | ✗ |
SeabedObjects-KLSG | SSS | Images | 1190 | Wrecks, Humans, Mines | Classification | ✗ | 2020 | Paper |
Marine_PULSE | SSS | Images | 627 | Pipes, Mounds, Platforms | Classification | ✗ | 2023 | Paper |
NKSID | FLS | Images | 2617 | Infrastructures, Propellers, Tires | Classification | ✓ | 2024 | Paper |
UATD | FLS | Images | 9200 | Tires, Mannequins, Boxes | Object Detection | ✓ | 2022 | Paper |
SSS for Mine Detection | SSS | Images | 1170 | Mines | Object Detection | ✗ | 2024 | Paper |
SWDD | SSS | Images | 7904 | Walls | Object Detection | ✓ | 2024 | Paper |
SubPipe | SSS * | Images | 10030 | Pipelines | Object Detection | ✓ | 2024 | Paper |
UXO | FLS | Images/Raw | 74437 | Unexploded Ordnances | Object Detection | ✓ | 2024 | Paper |
MDT | FLS | Images | 2471 | Infrastructures, Debris | Segmentation | ✓ | 2021 | Paper |
SASSED | SAS | Images | 129 | Muds, Sea Grass, Rocks, Sands | Segmentation | ✗ | 2023 | ✗ |
Seafloor Sediments | SSS | Images | 434164 | Rocks, Marine life | Segmentation | ✓ | 2023 | Paper |
DIDSON | FLS | Images | 1000 | Fish Species | Segmentation | ✓ | 2022 | Paper |
AI4Shipwreck | SSS | Images | 286 | Shipwrecks | Segmentation | ✓ | 2024 | Paper |
Cave Sonar | MSIS * | Rosbag | 500 meters | Cave Seabed | SLAM | ✓ | 2017 | Paper |
Aurora | MBES, SSS * | Raw | MBES: 81km, SSS: 15h | Seabed, Marine habitats | SLAM | ✓ | 2020 | Paper |
MBES-Slam | MBES | Rosbag | 4 missions | Seabed | SLAM | ✓ | 2022 | Paper |
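To compare the datasets programmatically, the table rows can be loaded into simple records and filtered. Below is a minimal Python sketch. It assumes the rows are copied as plain pipe-separated text. The `SonarDataset` type and the `parse_row` and `find` helpers are hypothetical illustrations, not part of the OpenSonarDatasets repository.

```python
from dataclasses import dataclass
from typing import List, Optional

# Rows copied from the comparison table (9 pipe-separated columns).
TABLE = """\
Northern Adriatic Reefs | SSS | GeoTIFF | 7 | Reefs | ✗ | ✓ | 2010 | ✗
Lago Grey | SSS | Raw | ✗ | Glacier, Walls | ✗ | ✓ | 2019 | Paper
UCI ML | ✗ | Raw | 211 | Mines, Rocks | Classification | ✗ | ✗ | ✗
SeabedObjects-KLSG | SSS | Images | 1190 | Wrecks, Humans, Mines | Classification | ✗ | 2020 | Paper
Marine_PULSE | SSS | Images | 627 | Pipes, Mounds, Platforms | Classification | ✗ | 2023 | Paper
NKSID | FLS | Images | 2617 | Infrastructures, Propellers, Tires | Classification | ✓ | 2024 | Paper
UATD | FLS | Images | 9200 | Tires, Mannequins, Boxes | Object Detection | ✓ | 2022 | Paper
SSS for Mine Detection | SSS | Images | 1170 | Mines | Object Detection | ✗ | 2024 | Paper
SWDD | SSS | Images | 7904 | Walls | Object Detection | ✓ | 2024 | Paper
SubPipe | SSS * | Images | 10030 | Pipelines | Object Detection | ✓ | 2024 | Paper
UXO | FLS | Images/Raw | 74437 | Unexploded Ordnances | Object Detection | ✓ | 2024 | Paper
MDT | FLS | Images | 2471 | Infrastructures, Debris | Segmentation | ✓ | 2021 | Paper
SASSED | SAS | Images | 129 | Muds, Sea Grass, Rocks, Sands | Segmentation | ✗ | 2023 | ✗
Seafloor Sediments | SSS | Images | 434164 | Rocks, Marine life | Segmentation | ✓ | 2023 | Paper
DIDSON | FLS | Images | 1000 | Fish Species | Segmentation | ✓ | 2022 | Paper
AI4Shipwreck | SSS | Images | 286 | Shipwrecks | Segmentation | ✓ | 2024 | Paper
Cave Sonar | MSIS * | Rosbag | 500 meters | Cave Seabed | SLAM | ✓ | 2017 | Paper
Aurora | MBES, SSS * | Raw | MBES: 81km, SSS: 15h | Seabed, Marine habitats | SLAM | ✓ | 2020 | Paper
MBES-Slam | MBES | Rosbag | 4 missions | Seabed | SLAM | ✓ | 2022 | Paper
"""


@dataclass
class SonarDataset:
    name: str
    sonar: str            # e.g. "SSS", "FLS", "MBES, SSS *"
    data_type: str
    samples: str          # kept as text: units differ ("500 meters", "4 missions")
    labels: str
    task: str             # "✗" when the data are not annotated
    settings: bool        # True if settings (frequency, altitude, ...) are described
    year: Optional[int]   # None when the release year is unknown
    paper: bool           # True if a related paper exists


def parse_row(line: str) -> SonarDataset:
    """Split one pipe-separated row (leading/trailing pipes allowed) into a record."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != 9:
        raise ValueError(f"expected 9 columns, got {len(cells)}: {line!r}")
    name, sonar, dtype, samples, labels, task, settings, year, paper = cells
    return SonarDataset(
        name=name, sonar=sonar, data_type=dtype, samples=samples, labels=labels,
        task=task, settings=(settings == "✓"),
        year=None if year == "✗" else int(year),
        paper=(paper != "✗"),
    )


DATASETS = [parse_row(line) for line in TABLE.splitlines() if line.strip()]


def find(task: Optional[str] = None, sonar: Optional[str] = None,
         min_year: Optional[int] = None) -> List[SonarDataset]:
    """Filter by exact task, sonar-type substring, and minimum release year."""
    result = []
    for d in DATASETS:
        if task is not None and d.task != task:
            continue
        if sonar is not None and sonar not in d.sonar:
            continue
        if min_year is not None and (d.year is None or d.year < min_year):
            continue
        result.append(d)
    return result


print([d.name for d in find(task="Object Detection", sonar="FLS")])  # ['UATD', 'UXO']
```

The sample counts stay as strings on purpose. The table mixes image counts, distances, durations, and mission counts, so forcing them into one numeric type would be misleading.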
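Contribution Guidelines
Community contributions are welcome! If you have an open-source sonar dataset that you would like to add to this repository, open a pull request containing the dataset's table entry and a link to the dataset (and to any related paper). The repository does not host datasets. Instead, it serves as a central directory of links to most of the available datasets. By contributing, you help build a central place where researchers can easily access and compare sonar datasets, ultimately advancing the field of underwater robotics.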
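Before opening a pull request, a contributor might sanity-check a new table row locally. The sketch below is a hypothetical helper, not a check the repository is known to run. It assumes the nine English column names of the comparison table and the ✓/✗ conventions used in it.

```python
from typing import List

# Assumed column names, matching the English header of the comparison table.
COLUMNS = ["Dataset", "Sonar Type", "Data Type", "No. of Samples", "Object Labels",
           "Annotation (DL Task)", "Settings Described", "Release Year", "Related Paper"]


def check_row(row: str) -> List[str]:
    """Return problems found in a proposed table row; an empty list means it looks OK."""
    # Accept rows with or without leading/trailing pipes.
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    if len(cells) != len(COLUMNS):
        return [f"expected {len(COLUMNS)} columns, got {len(cells)}"]
    fields = dict(zip(COLUMNS, cells))
    problems = []
    if any(c == "" for c in cells):
        problems.append("empty cell; use ✗ for missing information")
    if fields["Settings Described"] not in ("✓", "✗"):
        problems.append("'Settings Described' must be ✓ or ✗")
    year = fields["Release Year"]
    if year != "✗" and not (len(year) == 4 and year.isdigit()):
        problems.append("'Release Year' must be a four-digit year or ✗")
    return problems


print(check_row("UATD | FLS | Images | 9200 | Tires, Mannequins, Boxes | Object Detection | ✓ | 2022 | Paper |"))  # []
```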

VoxBox
VoxBox is a large-scale speech corpus built from diverse open-source datasets for training text-to-speech (TTS) systems.
Source: GitHub
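China Judgements Online (中国裁判文书网)
China Judgements Online is the official website established by the Supreme People's Court of China to publish judgment documents from courts at all levels. The dataset contains a large number of legal documents, such as judgments, rulings, and mediation statements. These cover civil, criminal, administrative, intellectual property, and other areas of law.
Source: wenshu.court.gov.cn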
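ChEMBL
ChEMBL is a cheminformatics database containing a large amount of bioactivity data on chemical entities relevant to drug discovery and development. The dataset includes compound structures, bioactivity data, target information, and more.
Source: www.ebi.ac.uk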
Canadian Census
**Overview** The data package provides demographics for Canadian population groups across multiple geographic categories: Forward Sortation Areas (FSAs), Census Metropolitan Areas (CMAs) and Census Agglomerations (CAs), Federal Electoral Districts (FEDs), Health Regions (HRs), and provinces.

**Description** The data come from the Canadian Census and the National Household Survey (NHS), either separately or combined. The main demographic indicators are stratified by location and, for most groups, also by demographic and socioeconomic characteristics. They are total population, females and males, usual residents, and private dwellings.

The primary use of the Health Region data is health surveillance and population health research. Federal and provincial departments of health and human resources, social service agencies, and other government agencies use the information to monitor, plan, implement, and evaluate programs that improve the health of Canadians and the efficiency of health services. Researchers from various fields use it to conduct health research. Non-profit health organizations and the media use the health region data to raise awareness about health, an issue of concern to all Canadians.

The Census population count for a given geographic area is the number of Canadians whose usual place of residence is in that area, regardless of where they happened to be on Census Day. It also includes Canadians who were staying in that area on Census Day and had no usual place of residence elsewhere in Canada, as well as those considered to be 'non-permanent residents'. The NHS provides demographic data for various levels of geography. These include provinces and territories, census metropolitan areas/census agglomerations, census divisions, census subdivisions, census tracts, federal electoral districts, and health regions.

To give a comprehensive overview of an area, this product presents data from both the NHS and the Census. NHS data topics include immigration and ethnocultural diversity; Aboriginal peoples; education and labor; mobility and migration; language of work; and income and housing. 2011 Census data topics include population and dwelling counts; age and sex; families, households, and marital status; structural type of dwelling and collectives; and language. The data are collected for private dwellings occupied by usual residents. A private dwelling is one in which a person or a group of persons permanently resides. NHS information does not cover collective dwellings, which are dwellings used for commercial, institutional, or communal purposes, such as hotels, hospitals, or work camps.

**Benefits**
- Useful for Canadian public health stakeholders, public health specialists, specialized audiences, and other interested parties.
- Supports health surveillance and population health research.
- Supports monitoring, planning, implementation, and evaluation of health-related programs.
- Media agencies may use the health region data to raise awareness about health, an issue of concern to all Canadians.
- Because some of the datasets include longitude and latitude, the values can be mapped to geographic representations.
- The field descriptions, together with the dataset description, help users quickly understand the data.

**License Information** The use of John Snow Labs datasets is free for personal and research purposes. For commercial use, please subscribe to the [Data Library](https://www.johnsnowlabs.com/marketplace/) on the John Snow Labs website. The subscription allows you to use all John Snow Labs datasets and data packages for commercial purposes.
**Included Datasets**
- [Canadian Population and Dwelling by FSA 2011](https://www.johnsnowlabs.com/marketplace/canadian-population-and-dwelling-by-fsa-2011) - This Canadian Census dataset covers population, total private dwellings, and private dwellings occupied by usual residents by forward sortation area (FSA). It is enriched with the percentage of the population or dwellings relative to the national total, as well as the geographical area, province, and latitude and longitude. The population of Canada as a whole is set to 100, i.e. 100% for the percentages.
- [Detailed Canadian Population Statistics by CMAs and CAs 2011](https://www.johnsnowlabs.com/marketplace/detailed-canadian-population-statistics-by-cmas-and-cas-2011) - This dataset covers the population statistics of Canada by Census Metropolitan Areas (CMAs) and Census Agglomerations (CAs). It is also categorized by citizenship/immigration status, ethnic origin, religion, mobility, education, language, work, housing, income, etc. The detailed characteristics within these categories are organized in 5 layers.
- [Detailed Canadian Population Statistics by FED 2011](https://www.johnsnowlabs.com/marketplace/detailed-canadian-population-statistics-by-fed-2011) - This dataset covers the 2011 population statistics of Canada by Federal Electoral District, according to the 2013 Representation Order. It is also categorized by citizenship/immigration status, ethnic origin, religion, mobility, education, language, work, housing, income, etc. The detailed characteristics within these categories are organized in 5 layers.
- [Detailed Canadian Population Statistics by Health Region 2011](https://www.johnsnowlabs.com/marketplace/detailed-canadian-population-statistics-by-health-region-2011) - This dataset covers the population statistics of Canada by health region. It is also categorized by citizenship/immigration status, ethnic origin, religion, mobility, education, language, work, housing, income, etc. The detailed characteristics within these categories are organized in 5 layers.
- [Detailed Canadian Population Statistics by Province 2011](https://www.johnsnowlabs.com/marketplace/detailed-canadian-population-statistics-by-province-2011) - This dataset covers the population statistics of Canada by provinces and territories. It is also categorized by citizenship/immigration status, ethnic origin, religion, mobility, education, language, work, housing, income, etc. The detailed characteristics within these categories are organized in 5 layers.

**Data Engineering Overview**

**We deliver high-quality data**
- Each dataset goes through 3 levels of quality review:
  - 2 manual reviews are done by domain experts.
  - Then, an automated set of 60+ validations ensures every datum matches the metadata and defined constraints.
- Data is normalized into one unified type system:
  - All dates, units, codes, and currencies follow the same format.
  - All null values are normalized to the same value.
  - All dataset and field names are SQL and Hive compliant.
- Data and Metadata:
  - Data is available in both CSV and Apache Parquet formats, optimized for high read performance on distributed Hadoop, Spark, and MPP clusters.
  - Metadata is provided in the open Frictionless Data standard, and every metadata field is normalized and validated.
- Data Updates:
  - Data updates support replace-on-update: outdated foreign keys are deprecated, not deleted.

**Our data is curated and enriched by domain experts**
Each dataset is manually curated by our team of doctors, pharmacists, and public health and medical billing experts:
- Field names, descriptions, and normalized values are chosen by people who actually understand their meaning.
- Healthcare and life science experts add categories, search keywords, descriptions, and more to each dataset.
- Both manual and automated data enrichment are supported for clinical codes, providers, drugs, and geo-locations.
- The data is always kept up to date, even when the source requires manual effort to get updates.
- Support for data subscribers is provided directly by the domain experts who curated the datasets.
- Every data source's license is manually verified to allow royalty-free commercial use and redistribution.

**Need Help?** If you have questions about our products, contact us at [info@johnsnowlabs.com](mailto:info@johnsnowlabs.com).
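The FSA dataset expresses each area's population or dwelling count as a percentage of the national total, with Canada as a whole equal to 100. Below is a minimal sketch of that computation. It assumes the counts are held in a plain dict keyed by area. The function name and the toy figures are hypothetical and are not taken from the dataset.

```python
from typing import Dict


def percent_of_total(counts: Dict[str, int], national_total: int) -> Dict[str, float]:
    """Express each area's count as a percentage of the national total (Canada = 100)."""
    if national_total <= 0:
        raise ValueError("national_total must be positive")
    # Each share is count / total scaled to 100; the shares sum to 100
    # only when the areas together cover the whole country.
    return {area: 100.0 * count / national_total for area, count in counts.items()}


# Toy figures, NOT real census values.
shares = percent_of_total({"area_1": 250, "area_2": 750}, national_total=1000)
print(shares)  # {'area_1': 25.0, 'area_2': 75.0}
```

Passing the national total explicitly, rather than summing the supplied areas, keeps the percentages correct when only a subset of areas is loaded.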
Source: Databricks
MedDialog
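The MedDialog dataset (Chinese) contains conversations in Chinese between doctors and patients. It has 1.1 million dialogues and 4 million utterances. The data are still growing, and more dialogues will be added. The original dialogues were collected from Haodf (好大夫网).
Source: GitHub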