GilbertKrantz/scientific_papers-cleaned
Source: Hugging Face. Updated 2024-11-28; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/GilbertKrantz/scientific_papers-cleaned
Description:
The *Scientific Papers - Cleaned* dataset is a cleaned and curated collection of scientific papers intended for research and natural language processing. It consists of input-output pairs suitable for tasks such as text summarization, paraphrasing, and scientific language understanding. The dataset was curated by Wilbert Chandra, is in English, and is released under the MIT License. It has three splits (train, validation, and test), and each record carries a unique identifier, an input text, and the corresponding output text. The dataset size is 34.1 MB, with a download size of 17.5 MB. It was created to support research on generating and summarizing scientific text, with an emphasis on clarity and conciseness. The data may carry biases inherent in the source material, so users should validate any findings drawn from it.
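As a rough illustration of the split and record layout described above, the following Python sketch loads the dataset with the Hugging Face `datasets` library and runs two basic checks on a record. The column names `id`, `input`, and `output` are assumptions: the description only states that each record has an identifier, an input text, and an output text, so the actual names should be confirmed after loading. The helper functions are hypothetical and not part of the dataset or of any library.

```python
from typing import List, Mapping, Sequence

REPO_ID = "GilbertKrantz/scientific_papers-cleaned"

# Assumed column names; the description does not spell them out.
ASSUMED_COLUMNS = ("id", "input", "output")


def load_splits(repo_id: str = REPO_ID):
    """Download all splits from the Hugging Face Hub.

    Requires the third-party `datasets` package and network access.
    The import is done lazily so the helpers below also work offline.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)


def missing_columns(
    record: Mapping[str, object],
    columns: Sequence[str] = ASSUMED_COLUMNS,
) -> List[str]:
    """Return the assumed columns that are absent from a record."""
    return [name for name in columns if name not in record]


def length_ratio(
    record: Mapping[str, object],
    input_col: str = "input",
    output_col: str = "output",
) -> float:
    """Ratio of output length to input length, counted in whitespace tokens."""
    n_in = len(str(record[input_col]).split())
    n_out = len(str(record[output_col]).split())
    return n_out / n_in if n_in else 0.0
```

A typical session would call `splits = load_splits()`, take a record such as `splits["train"][0]`, and pass it to `missing_columns` before relying on the assumed field names. `length_ratio` then gives a quick sense of how much shorter the output text is than its input.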
Provider:
GilbertKrantz



