
lxyuan/Synthetic-NuNER

Source: Hugging Face. Updated 2024-06-07. Indexed 2024-06-15.
Download link:
https://hf-mirror.com/datasets/lxyuan/Synthetic-NuNER
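Resource summary:
The `Synthetic-NuNER` dataset was created by iteratively sampling from the existing NuNER dataset. Each sample is used to guide the TheBloke/Mistral-7B-Instruct-v0.1-GPTQ model in generating 10 new synthetic entries. These entries reflect the structure, characteristics, and statistics of the original dataset, so that Synthetic-NuNER preserves the integrity of the original numind/NuNER dataset while providing new data points suitable for testing, experimentation, and further research.
Provided by: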
lxyuan
Summary of original information

Dataset overview

Basic information

  • License: Apache-2.0
  • Task category: token-classification
  • Language: English
  • Dataset name: Synthetic-NuNER
  • Dataset size: 100K < n < 1M

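Dataset description

The Synthetic-NuNER dataset was created by iteratively sampling from the existing "nuner" dataset. Each sample is used to guide the TheBloke/Mistral-7B-Instruct-v0.1-GPTQ model in generating 10 new synthetic entries. These entries reflect the structure, characteristics, and statistics of the original dataset. This process is meant to ensure that Synthetic-NuNER preserves the integrity of the original numind/NuNER dataset while providing new data points well suited to testing, experimentation, and further research.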
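
The generation loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's actual pipeline: the prompt wording, the function names `build_prompt` and `synthesize`, and the `generate` callable (a stand-in for a call to the Mistral-7B-Instruct model) are all hypothetical. Only the overall shape (pick a seed sample, ask for 10 new entries, repeat) comes from the description.

```python
import random
from typing import Callable, Dict, List

Sample = Dict[str, str]


def build_prompt(seed: Sample, n_new: int = 10) -> str:
    """Hypothetical prompt: show one seed example and ask for n_new similar ones."""
    return (
        "Here is an example of an annotated sentence.\n"
        f"[INPUT]: {seed['input']}\n"
        f"[OUTPUT]: {seed['output']}\n"
        f"Write {n_new} new examples in the same format."
    )


def synthesize(
    seeds: List[Sample],
    generate: Callable[[str], List[Sample]],
    n_seeds: int,
    n_new: int = 10,
    rng_seed: int = 0,
) -> List[Sample]:
    """Repeatedly sample a seed entry and collect up to n_new generated entries for it."""
    rng = random.Random(rng_seed)
    synthetic: List[Sample] = []
    for _ in range(n_seeds):
        seed = rng.choice(seeds)
        synthetic.extend(generate(build_prompt(seed, n_new))[:n_new])
    return synthetic


def fake_generate(prompt: str) -> List[Sample]:
    """Stand-in for the language model call (assumption): returns 10 placeholder entries."""
    return [{"input": f"placeholder sentence {i}", "output": "[]"} for i in range(10)]


seeds = [
    {
        "input": "The first iPhone was released in 2007.",
        "output": "[first iPhone <> Electronics device, released in <> Year, 2007 <> Year]",
    }
]
result = synthesize(seeds, fake_generate, n_seeds=3)
print(len(result))  # 3 seeds x 10 entries each = 30
```

In a real run, `fake_generate` would be replaced by code that sends the prompt to the model and parses its reply into `input`/`output` pairs.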

Usage

import random

from datasets import load_dataset

# Load the dataset from the Hugging Face Hub
my_dataset = load_dataset("lxyuan/Synthetic-NuNER")
print(my_dataset)

# Expected output:
# DatasetDict({
#     train: Dataset({
#         features: ['input', 'output'],
#         num_rows: 967700
#     })
# })

# Print 100 randomly chosen samples from the train split
for _ in range(100):
    random_index = random.randrange(len(my_dataset["train"]))
    sample = my_dataset["train"][random_index]

    print(f"[INDEX]: {random_index}")
    print(f"[INPUT]: {sample['input']}")
    print(f"[OUTPUT]: {sample['output']}")
    print()

[INPUT]: The Great Depression, World War II, the Cold War, these are just some of the most significant events in world history.
[OUTPUT]: [The Great Depression <> Event, World War II <> Event, The Cold War <> Event]

[INPUT]: The Cold War was a state of political and military tension between the Western powers (led by the United States) and the Eastern powers (led by the Soviet Union) after World War II until 1991.
[OUTPUT]: [Cold War <> Historical period, political and military tension between Western and Eastern powers <> International conflict, Western powers <> Group of countries, Eastern powers <> Group of countries, arms race <> Arms competition, space race <> Space exploration, 1945 to 1991 <> Time period]

[INPUT]: I know it can be challenging--I have a lot of great responses to this question, and I offer classes specifically on this topic and consulting with organizations to do it.
[OUTPUT]: [classes <> education, consulting <> service, organizations <> business]

[INPUT]: The gift was wrapped in white paper and tied with a bow. No sticky tape, no fuss, no stress!
[OUTPUT]: [gift <> object, paper <> object, bow <> object, stress <> emotion]

[INPUT]: The French Revolution was a period of political and social upheaval in France. It was characterized by the overthrow of the monarchy and the establishment of a republic, as well as by a series of reforms and social changes.
[OUTPUT]: [French Revolution <> historical period, political and social upheaval <> political and social change, overthrow of the monarchy <> political change, establishment of a republic <> political institution, series of reforms and social changes <> social change]

[INPUT]: The New York Stock Exchange (NYSE) has announced a new trading platform.
[OUTPUT]: [New York Stock Exchange (NYSE) <> Stock exchange, Trading platform <> Technology]

[INPUT]: The first iPhone was released in 2007.
[OUTPUT]: [first iPhone <> Electronics device, released in <> Year, 2007 <> Year]
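
Because the task category is token-classification, the `[OUTPUT]` strings usually need to be turned into structured annotations before training. The sketch below shows one possible way to do that. It assumes the `[entity <> type, ...]` format inferred from the samples above; the helper names `parse_output` and `locate_spans` are hypothetical, and the naive split on `", "` will mis-handle entities that themselves contain a comma followed by a space.

```python
from typing import List, Tuple


def parse_output(output: str) -> List[Tuple[str, str]]:
    """Parse '[entity <> type, entity <> type]' into (entity, type) pairs."""
    body = output.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    pairs: List[Tuple[str, str]] = []
    for item in body.split(", "):
        if " <> " not in item:
            continue  # skip malformed items rather than guessing
        entity, label = item.rsplit(" <> ", 1)
        pairs.append((entity.strip(), label.strip()))
    return pairs


def locate_spans(text: str, pairs: List[Tuple[str, str]]) -> List[Tuple[int, int, str]]:
    """Return (start, end, type) for the first case-insensitive match of each entity.

    Entities that do not occur in the text (which can happen in synthetic data)
    are dropped. Assumes lowercasing does not change string length, which holds
    for ASCII text.
    """
    lowered = text.lower()
    spans: List[Tuple[int, int, str]] = []
    for entity, label in pairs:
        start = lowered.find(entity.lower())
        if start != -1:
            spans.append((start, start + len(entity), label))
    return spans


text = "The first iPhone was released in 2007."
output = "[first iPhone <> Electronics device, released in <> Year, 2007 <> Year]"

pairs = parse_output(output)
spans = locate_spans(text, pairs)
for start, end, label in spans:
    print(f"{text[start:end]!r} -> {label}")
```

The character offsets returned by `locate_spans` can then be aligned with a tokenizer's offsets to produce per-token labels.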
