
RosettaCommons/2J-Protein-Couplings

Source: Hugging Face. Updated 2026-03-19. Indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/RosettaCommons/2J-Protein-Couplings
Resource description:
---
license: other
license_name: non-commercial-license-dyna1
license_link: https://github.com/WaymentSteeleLab/Dyna-1/blob/main/LICENSE.txt
tags:
- proteins
- nmr
pretty_name: 2J_Coupling
size_categories:
- n<1K
viewer: true
configs:
- config_name: main
  data_files:
  - split: 2J_couplings
    path: 2J_couplings.csv
---

<h1>2J-Protein-Coupling Dataset</h1>

This dataset was curated from the paper listed below, with the data accessed through the Biological Magnetic Resonance Data Bank (BMRB). It contains a total of 3999 2J couplings taken from 5 different proteins and up to 10 different experiments. It records the PDB ID, the sequence, and 2J coupling data for 15N, 13C, and 1H. The data was curated and organized from the paper below, with the addition of the sequence taken from the Protein Data Bank.

<h2>Raw Data Source</h2>

<p>This dataset was curated from the following research paper:</p>

<ul>
<li>Schmidt, Jürgen M.; Hua, Yixun; Löhr, Frank. "Correlation of (2)J couplings with protein secondary structure." Proteins 78, 1544-1562 (2010). https://doi.org/10.1002/prot.22672</li>
</ul>

<h2>Quickstart Usage</h2>

<h3>1. Install the datasets Library</h3>

<p>From the terminal / command line:</p>

<pre><code>pip install datasets
</code></pre>

<p>In Jupyter Notebook, prefix with <code>!</code>:</p>

<pre><code>!pip install datasets
</code></pre>

<h3>2. Load the Dataset in Python</h3>

<pre><code>from datasets import load_dataset

# Repository ID, config name ("main"), and split name ("2J_couplings")
# follow the card metadata at the top of this page.
dataset_protein = load_dataset(
    "RosettaCommons/2J-Protein-Couplings",
    "main"
)
</code></pre>

<h3>3. Access Dataset Columns</h3>

<pre><code># Example: Get protein sequences and lengths
sequences = dataset_protein["2J_couplings"]["sequence"]
lengths = dataset_protein["2J_couplings"]["sequence_length"]

print(sequences[:5])
print(lengths[:5])
</code></pre>

<h2>Dataset Description</h2>

<p>Each row represents a single polymer chain within a larger macromolecular assembly. Key fields include:</p>

<ul>
<li><strong>entry_id</strong>: ID from the Biological Magnetic Resonance Data Bank (BMRB)</li>
<li><strong>file_name</strong>: Named by experiment and entry_id</li>
<li><strong>sequence_length</strong>: Length of the protein sequence</li>
<li><strong>sequence</strong>: Protein sequence in single-letter amino acid code</li>
<li><strong>experiment_code</strong>: Type of experiment</li>
<li><strong>num_measurements</strong>: Number of experiments recorded for each protein</li>
</ul>

### Citation

```
@article{Schmidt2010,
  title = {Correlation of 2J couplings with protein secondary structure},
  volume = {78},
  ISSN = {1097-0134},
  url = {http://dx.doi.org/10.1002/prot.22672},
  DOI = {10.1002/prot.22672},
  number = {6},
  journal = {Proteins: Structure, Function, and Bioinformatics},
  publisher = {Wiley},
  author = {Schmidt, J\"{u}rgen M. and Hua, Yixun and L\"{o}hr, Frank},
  year = {2010},
  month = feb,
  pages = {1544--1562}
}
```

<i>This dataset was curated by Nicolas Langdon (<a href="mailto:nblangdon@wesleyan.edu">nblangdon@wesleyan.edu</a>) from the original paper listed above.</i>
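<p>The key fields listed above lend themselves to a quick sanity check once the CSV is on disk. The following is a minimal sketch using only the Python standard library. It assumes the file exposes exactly the six documented column names. The rows in <code>SAMPLE_CSV</code> are invented placeholders that mimic that layout and are not values from the dataset, and the helper names (<code>read_rows</code>, <code>length_mismatches</code>, <code>measurements_per_experiment</code>) are hypothetical.</p>

```python
import csv
import io
from collections import defaultdict

# Placeholder rows that only mimic the documented column layout.
# The values are invented for illustration and are NOT taken from the dataset.
SAMPLE_CSV = """entry_id,file_name,sequence_length,sequence,experiment_code,num_measurements
0001,expA_0001,5,ACDEF,expA,4
0001,expB_0001,5,ACDEF,expB,3
0002,expA_0002,4,GHIK,expA,6
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, converting numeric fields to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["sequence_length"] = int(row["sequence_length"])
        row["num_measurements"] = int(row["num_measurements"])
        rows.append(row)
    return rows


def length_mismatches(rows):
    """Return file_name values whose sequence_length disagrees with len(sequence)."""
    return [r["file_name"] for r in rows if r["sequence_length"] != len(r["sequence"])]


def measurements_per_experiment(rows):
    """Sum num_measurements for each experiment_code."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["experiment_code"]] += r["num_measurements"]
    return dict(totals)


rows = read_rows(SAMPLE_CSV)
print(length_mismatches(rows))
print(measurements_per_experiment(rows))
```

<p>To run the same checks on the real data, one option is to replace <code>SAMPLE_CSV</code> with the contents of a locally downloaded <code>2J_couplings.csv</code>, provided its header matches the documented fields.</p>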
Provided by:
RosettaCommons