Data from: Egg morphology fails to identify nests parasitized by conspecifics in common pochard: a test based on protein fingerprinting and including female relatedness

Source: DataONE. Updated: 2016-06-15. Indexed: 2024-06-26.
Download link: https://search.dataone.org/view/null
Description:
Conspecific brood parasites lay eggs in the nests of other females of the same species. A variety of methods have been developed and used to detect conspecific brood parasitism (CBP). Traditional methods may be inaccurate in detecting CBP and in revealing its true frequency. On the other hand, more accurate molecular methods are expensive and time-consuming. Eadie developed a method for revealing CBP based on differences in egg morphology. That method is based on Euclidean distances calculated for pairs of eggs within a clutch using standardized egg measurements (length, width and weight); the largest of these distances within a clutch is the maximum Euclidean distance (MED). We tested the applicability of this method in the common pochard Aythya ferina using nests that were identified as parasitized (39 nests) or non-parasitized (16 nests) based on protein fingerprinting of eggs. We also analyzed whether parasitic and host eggs within a nest can be distinguished. We found that variation in MED can be explained by parasitism, but there was a large overlap in MED between parasitized and non-parasitized nests. MED also increased with clutch size. Using discriminant function analysis (DFA), we found that only 76.4% of nests were correctly assigned as parasitized or non-parasitized and only 68.3% of eggs as parasitic or host eggs. Moreover, we found that MED in parasitized nests increased with relatedness of the females that laid eggs in the nest. This finding was supported by a positive correlation between MED and estimated relatedness in female–female pairs. Although variation in egg morphology is associated with CBP, it does not provide a reliable clue for distinguishing parasitized nests from non-parasitized nests in common pochard.
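The distance measure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names, the choice to z-score each measurement across all eggs pooled over nests, and the example measurements are assumptions made for the sake of the example.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def standardize(eggs):
    """Z-score each measurement (length, width, weight) across all eggs.

    Assumption: standardization is done over the pooled sample of eggs.
    A column with zero variance is mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*eggs))
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    return [
        tuple((v - m) / s if s else 0.0 for v, m, s in zip(egg, mus, sds))
        for egg in eggs
    ]


def max_euclidean_distance(clutch):
    """Largest Euclidean distance over all pairs of eggs in one clutch.

    Requires at least two eggs; max() raises ValueError otherwise.
    """
    return max(dist(a, b) for a, b in combinations(clutch, 2))


def med_per_nest(nests):
    """Return {nest_id: MED}, standardizing over the pooled eggs of all nests."""
    pooled = [egg for eggs in nests.values() for egg in eggs]
    z = iter(standardize(pooled))
    # Standardized eggs come back in the same order they were pooled.
    return {
        nest: max_euclidean_distance([next(z) for _ in eggs])
        for nest, eggs in nests.items()
    }


# Hypothetical measurements: (length mm, width mm, weight g) per egg.
example_nests = {
    "nest_A": [(61.2, 43.5, 64.0), (60.8, 43.9, 63.1), (61.5, 43.2, 64.4)],
    "nest_B": [(59.0, 42.1, 58.7), (63.4, 45.0, 69.2), (60.1, 43.0, 61.5)],
}
for nest, med in med_per_nest(example_nests).items():
    print(nest, round(med, 3))
```

A nest whose MED is large relative to other nests would be flagged as a candidate for parasitism under this approach; the abstract's point is that, for common pochard, the MED distributions of parasitized and non-parasitized nests overlap too much for such a threshold to be reliable.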
Created: 2016-06-15