issdandavis/scbe-life-science-research-training-demo
Source: Hugging Face. Updated 2026-04-10, indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/issdandavis/scbe-life-science-research-training-demo
Resource description:
---
pretty_name: SCBE Research Training Package
language:
- en
license: mit
tags:
- research
- pubmed
- instruction-tuning
- citation-generation
task_categories:
- text-generation
- question-answering
size_categories:
- n<1K
---
# SCBE Research Training Package
This package was generated from live `pubmed` pulls for the query `protein structure prediction`. It is meant for
lightweight Hugging Face dataset and supervised fine-tuning (SFT) experiments.
## Files
- `papers.jsonl`: normalized raw research records
- `sft_train.jsonl`: train split for instruction-style tasks
- `sft_validation.jsonl`: validation split
- `dataset_manifest.json`: build metadata
- `research_report.md`: generated paper/report artifact
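The JSONL files listed above can be read with the Python standard library alone. The following is a minimal sketch, assuming the files sit in a local directory after download; `read_jsonl` and `load_package` are hypothetical helper names and are not part of the package.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return a list of dicts, one per non-empty line of a JSON Lines file."""
    rows = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, such as a trailing newline
                rows.append(json.loads(line))
    return rows


def load_package(root):
    """Load the three JSONL files named in the list above.

    Files that are absent from `root` are skipped rather than raising,
    so a partial download still loads.
    """
    names = ["papers.jsonl", "sft_train.jsonl", "sft_validation.jsonl"]
    root = Path(root)
    return {name: read_jsonl(root / name) for name in names if (root / name).exists()}


if __name__ == "__main__":
    # Assumption: the package was downloaded into the current directory.
    for name, rows in load_package(".").items():
        print(f"{name}: {len(rows)} rows")
```

An empty file loads as an empty list, which is the expected result for a split with no rows.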
## Counts
- Raw papers: `5`
- Train rows: `10`
- Validation rows: `0`
- Tasks per paper: `metadata-extraction`, `citation-generation`
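The counts above are consistent with one SFT row per paper per task: 5 papers times 2 tasks gives 10 rows, all of which landed in the train split. The sketch below encodes that arithmetic as a sanity check. The one-row-per-task rule is an inference from the counts, not something the package states, and the function names are hypothetical.

```python
# Task names as listed under "Tasks per paper".
TASKS = ("metadata-extraction", "citation-generation")


def expected_sft_rows(num_papers, tasks=TASKS):
    """Rows expected if every paper yields exactly one row per task (assumed)."""
    return num_papers * len(tasks)


def counts_are_consistent(num_papers, train_rows, validation_rows, tasks=TASKS):
    """True when the two splits together account for every expected row."""
    return train_rows + validation_rows == expected_sft_rows(num_papers, tasks)


print(expected_sft_rows(5))             # 10
print(counts_are_consistent(5, 10, 0))  # True
```

Because the validation split is empty, evaluation during fine-tuning would need rows held out from the train split or a larger rebuild of the package.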
## Lead Papers
- [Evolutionary-scale prediction of atomic-level protein structure with a language model.](https://pubmed.ncbi.nlm.nih.gov/36927031/) (`36927031`)
- [Protein Structure Prediction: Conventional and Deep Learning Perspectives.](https://pubmed.ncbi.nlm.nih.gov/34050498/) (`34050498`)
- [Integrated structure-based protein interface prediction.](https://pubmed.ncbi.nlm.nih.gov/35879651/) (`35879651`)
- [Protein structure prediction using the evolutionary algorithm USPEX.](https://pubmed.ncbi.nlm.nih.gov/36780132/) (`36780132`)
- [The trRosetta server for fast and accurate protein structure prediction.](https://pubmed.ncbi.nlm.nih.gov/34759384/) (`34759384`)
## Schema
Raw records include `id`, `title`, `authors`, `primary_category`, `categories`, `published`,
`updated`, `summary`, `url`, `pdf_url`, and a `text` field suitable for indexing.
SFT rows include `instruction`, `response`, `messages`, `source_id`, and `task`.
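A quick way to confirm that loaded rows match the schema described above is to compare each row's keys against the listed field names. This is a minimal sketch: the field sets are copied from the description, `missing_fields` is a hypothetical helper, and the sample row is made up for illustration. The internal shape of `messages` is not documented here, so the check covers key presence only.

```python
# Field names copied from the schema description.
RAW_FIELDS = {
    "id", "title", "authors", "primary_category", "categories",
    "published", "updated", "summary", "url", "pdf_url", "text",
}
SFT_FIELDS = {"instruction", "response", "messages", "source_id", "task"}


def missing_fields(row, required):
    """Return the sorted names in `required` that are absent from `row`."""
    return sorted(required - set(row))


# Hypothetical, deliberately incomplete row for illustration only.
sample_row = {"instruction": "Extract the metadata.", "response": "..."}
print(missing_fields(sample_row, SFT_FIELDS))
# ['messages', 'source_id', 'task']
```

An empty result means the row carries every listed field; extra keys are ignored.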
Provider:
issdandavis



