
issdandavis/scbe-life-science-research-training-demo

Source: Hugging Face. Updated: 2026-04-10. Indexed: 2026-04-12.
Download link:
https://hf-mirror.com/datasets/issdandavis/scbe-life-science-research-training-demo
Resource description:
---
pretty_name: SCBE Research Training Package
language:
- en
license: mit
tags:
- research
- pubmed
- instruction-tuning
- citation-generation
task_categories:
- text-generation
- question-answering
size_categories:
- n<1K
---

# SCBE Research Training Package

This package was generated from live `pubmed` pulls for the query `protein structure prediction` and is meant for lightweight Hugging Face dataset and SFT experiments.

## Files

- `papers.jsonl`: normalized raw research records
- `sft_train.jsonl`: train split for instruction-style tasks
- `sft_validation.jsonl`: validation split
- `dataset_manifest.json`: build metadata
- `research_report.md`: generated paper/report artifact

## Counts

- Raw papers: `5`
- Train rows: `10`
- Validation rows: `0`
- Tasks per paper: `metadata-extraction`, `citation-generation`

## Lead Papers

- [Evolutionary-scale prediction of atomic-level protein structure with a language model.](https://pubmed.ncbi.nlm.nih.gov/36927031/) (`36927031`)
- [Protein Structure Prediction: Conventional and Deep Learning Perspectives.](https://pubmed.ncbi.nlm.nih.gov/34050498/) (`34050498`)
- [Integrated structure-based protein interface prediction.](https://pubmed.ncbi.nlm.nih.gov/35879651/) (`35879651`)
- [Protein structure prediction using the evolutionary algorithm USPEX.](https://pubmed.ncbi.nlm.nih.gov/36780132/) (`36780132`)
- [The trRosetta server for fast and accurate protein structure prediction.](https://pubmed.ncbi.nlm.nih.gov/34759384/) (`34759384`)

## Schema

Raw records include `id`, `title`, `authors`, `primary_category`, `categories`, `published`, `updated`, `summary`, `url`, `pdf_url`, and a `text` field suitable for indexing.

SFT rows include `instruction`, `response`, `messages`, `source_id`, and `task`.
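
A minimal sketch of how the SFT files might be read and checked against the Schema section, using only the Python standard library. It assumes each `.jsonl` file holds one JSON object per line; the helper names `read_jsonl` and `missing_sft_fields` are hypothetical and not part of the package.

```python
import json

# Field names copied from the Schema section; nothing else is assumed
# about the contents or types of each field.
SFT_FIELDS = {"instruction", "response", "messages", "source_id", "task"}


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def missing_sft_fields(row):
    """Return the documented SFT field names absent from a row, sorted."""
    return sorted(SFT_FIELDS.difference(row))
```

For example, `[missing_sft_fields(r) for r in read_jsonl("sft_train.jsonl")]` would flag any train row that lacks a documented field. Since the card lists `0` validation rows, iterating over `sft_validation.jsonl` may simply yield nothing.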
Provider:
issdandavis