specter

ModelScope community, updated 2025-11-12, indexed 2025-01-11
Download link:
https://modelscope.cn/datasets/sentence-transformers/specter
Description:
# Dataset Card for Specter

This dataset is a collection of (title, related title, unrelated title) triplets built from the scientific publication titles used by Specter. See [Specter](https://github.com/allenai/specter) for additional information. This dataset can be used directly with Sentence Transformers to train embedding models.

## Dataset Subsets

### `triplet` subset

* Columns: "anchor", "positive", "negative"
* Column types: `str`, `str`, `str`
* Examples:
    ```python
    {
      'anchor': "Integrating children's contributions in the interaction design process",
      'positive': 'Designing for or designing with? Informant design for interactive learning environments',
      'negative': 'Power Operation in ISD: Technological Frames Perspectives.',
    }
    ```
* Collection strategy: Reading the Specter dataset from [embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data), followed by deduplication.
* Deduplicated: Yes

### `pair` subset

* Columns: "anchor", "positive"
* Column types: `str`, `str`
* Examples:
    ```python
    {
      'anchor': 'Time-dependent trajectory regression on road networks via multi-task learning',
      'positive': 'Convex multi-task feature learning',
    }
    ```
* Collection strategy: Reading the Specter dataset from [embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data), taking only the title and related title, and then performing deduplication.
* Deduplicated: Yes
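The collection strategy for the `pair` subset (keep only the title and related title, then deduplicate) can be sketched in plain Python. This is a minimal illustration, not the maintainers' actual preprocessing script: the helper name `triplets_to_pairs` is hypothetical, exact-match deduplication that keeps the first occurrence is an assumption, and the second toy row uses a made-up negative title so that the deduplication step is visible.

```python
from typing import Dict, Iterable, List


def triplets_to_pairs(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop the 'negative' column and remove exact duplicate (anchor, positive) pairs.

    Hypothetical helper: assumes exact string matching and keeps the
    first occurrence of each pair.
    """
    seen = set()
    pairs = []
    for row in rows:
        key = (row["anchor"], row["positive"])
        if key in seen:
            continue  # exact duplicate pair, skip it
        seen.add(key)
        pairs.append({"anchor": row["anchor"], "positive": row["positive"]})
    return pairs


# Toy input: the first row is the example from the card; the second repeats the
# same anchor/positive with a placeholder negative (not real data).
triplet_rows = [
    {
        "anchor": "Integrating children's contributions in the interaction design process",
        "positive": "Designing for or designing with? Informant design for interactive learning environments",
        "negative": "Power Operation in ISD: Technological Frames Perspectives.",
    },
    {
        "anchor": "Integrating children's contributions in the interaction design process",
        "positive": "Designing for or designing with? Informant design for interactive learning environments",
        "negative": "Placeholder unrelated title",
    },
]

pair_rows = triplets_to_pairs(triplet_rows)
print(len(pair_rows), sorted(pair_rows[0].keys()))
```

With real data, the rows would come from the dataset itself rather than a toy list. Assuming the Hugging Face copy exposes the same subsets, something like `datasets.load_dataset("sentence-transformers/specter", "triplet", split="train")` should yield rows with the same three columns.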

Provider:
maas
Created:
2025-01-06