laion/Pes2oX-fulltext
Hugging Face | Updated: 2024-09-29 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/laion/Pes2oX-fulltext
Description:
Pes2oX Full Text is a restructured version of Allen AI's Pes2o dataset. The goal was to reorganize the original Pes2o dataset so that it is easier to use for training artificial intelligence models and for fine-tuning on domain-specific tasks. The restructured dataset works out of the box: research groups can either stream it from Hugging Face or download it directly, with no tedious extraction step. It preserves the structure and content of the original Pes2o dataset. No text-cleaning preprocessing was applied, because some papers in the dataset are not in English and cleaning could corrupt their Unicode text. Due to schema and data-type discrepancies, 162 rows are absent from this dataset.
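The streaming option mentioned in the description could look roughly like the sketch below, which uses the Hugging Face `datasets` library. This is an illustration and not taken from the dataset card: the split name `"train"` is an assumption, and the helper names `stream_split` and `first_rows` are hypothetical. No column names are assumed, since the listing does not give the schema.

```python
from itertools import islice

# Dataset ID as given in the listing above.
REPO_ID = "laion/Pes2oX-fulltext"


def first_rows(rows, n):
    """Return the first n items of any iterable without consuming the rest."""
    return list(islice(rows, n))


def stream_split(repo_id=REPO_ID, split="train"):
    """Open one split in streaming mode, so records are fetched lazily.

    Assumptions: the `datasets` package is installed (pip install datasets)
    and the repository exposes a split named "train"; check the dataset
    card if this call fails.
    """
    # Imported lazily so the helper above works without the dependency.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split, streaming=True)
```

With network access, `first_rows(stream_split(), 3)` would fetch three records without downloading the whole dataset. Printing `row.keys()` on one of them is a quick way to see the actual schema.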
Provided by:
laion



