npr
Source: ModelScope community. Updated 2025-12-04; indexed 2025-01-11.
Download link:
https://modelscope.cn/datasets/sentence-transformers/npr
Description:
# Dataset Card for NPR
This dataset contains title-body pairs collected from NPR using Pushshift. See [embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data) for additional information.
This dataset can be used directly with Sentence Transformers to train embedding models.
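As a rough illustration of what training on these pairs might look like, the sketch below loads the `pair` subset and fine-tunes a model with an in-batch-negatives loss. Several things here are assumptions rather than part of this card: the Hugging Face Hub id `sentence-transformers/npr`, the base model `all-MiniLM-L6-v2`, the choice of `MultipleNegativesRankingLoss`, and the helper names `has_pair_columns`, `load_npr_pairs`, and `train_on_pairs`. It also assumes a recent `sentence-transformers` release that provides `SentenceTransformerTrainer`.

```python
def has_pair_columns(column_names):
    """Return True if the columns match the `pair` subset: title first, then body."""
    return list(column_names)[:2] == ["title", "body"]


def load_npr_pairs(split="train"):
    """Load the `pair` subset.

    Assumption: the dataset is mirrored on the Hugging Face Hub under
    "sentence-transformers/npr"; adjust the id if you load it from elsewhere.
    """
    from datasets import load_dataset  # lazy import; needs `pip install datasets`

    return load_dataset("sentence-transformers/npr", "pair", split=split)


def train_on_pairs(train_dataset, base_model="all-MiniLM-L6-v2"):
    """Fine-tune `base_model` on (title, body) pairs.

    Assumption: each title is treated as the anchor and its body as the
    positive; other bodies in the same batch serve as negatives.
    """
    from sentence_transformers import (
        SentenceTransformer,
        SentenceTransformerTrainer,
        losses,
    )

    if not has_pair_columns(train_dataset.column_names):
        raise ValueError("expected columns 'title' and 'body', in that order")

    model = SentenceTransformer(base_model)
    loss = losses.MultipleNegativesRankingLoss(model)
    trainer = SentenceTransformerTrainer(
        model=model, train_dataset=train_dataset, loss=loss
    )
    trainer.train()
    return model


# Example usage (downloads data and a model, so it is left commented out):
# model = train_on_pairs(load_npr_pairs())
```

The column check is there because the trainer is expected to read inputs by column order, so a reordered dataset would silently swap anchor and positive.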
## Dataset Subsets
### `pair` subset
* Columns: "title", "body"
* Column types: `str`, `str`
* Examples:
```python
{
'title': "Al-Awlaki's Death Raises Questions About U.S. Tactics",
'body': "A joint CIA and U.S. military operation targeted and killed the cleric Anwar al-Awlaki in an air strike this week. Awlaki had been linked to terrorist attacks against the United States and was a key target for several years. NPR's Rachel Martin shares the latest with host Scott Simon.",
}
```
* Collection strategy: Reading the NPR dataset from [embedding-training-data](https://huggingface.co/datasets/sentence-transformers/embedding-training-data).
* Deduplicated: No
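Because the subset is not deduplicated, you may want to drop repeated pairs before training. The following is a minimal sketch, assuming exact-match deduplication on whitespace-stripped `title` and `body` is sufficient; the helper name `dedupe_pairs` and the sample records are hypothetical and not part of the dataset.

```python
def dedupe_pairs(records):
    """Drop exact duplicate (title, body) pairs, keeping the first occurrence."""
    seen = set()
    unique = []
    for rec in records:
        # Compare on stripped text so trailing whitespace does not hide duplicates.
        key = (rec["title"].strip(), rec["body"].strip())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical sample records, for illustration only.
sample = [
    {"title": "Headline A", "body": "Summary of story A."},
    {"title": "Headline B", "body": "Summary of story B."},
    {"title": "Headline A ", "body": "Summary of story A."},  # duplicate of the first
]

deduped = dedupe_pairs(sample)
print(len(sample), "->", len(deduped))
```

Near-duplicates (for example, the same story with a lightly edited summary) would pass through this filter; catching those would need a fuzzier comparison.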
Provider:
maas
Created:
2025-01-06



