
mehr4n-m/autotrain-data-nllb_600_ft

Source: Hugging Face · Updated: 2022-09-22 · Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/mehr4n-m/autotrain-data-nllb_600_ft
Resource description:
---
task_categories:
- conditional-text-generation
---

# AutoTrain Dataset for project: nllb_600_ft

## Dataset Description

This dataset has been automatically processed by AutoTrain for project nllb_600_ft.

### Languages

The BCP-47 code for the dataset's language is `unk` (unknown).

## Dataset Structure

### Data Instances

A sample from this dataset looks as follows:

```json
[
  {
    "feat_id": "772",
    "feat_URL": "https://en.wikivoyage.org/wiki/Apia",
    "feat_domain": "wikivoyage",
    "feat_topic": "Travel",
    "feat_has_image": "0",
    "feat_has_hyperlink": "0",
    "text": "All the ships were sunk, except for one British cruiser. Nearly 200 American and German lives were lost.",
    "target": "\u0628\u0647\u200c\u062c\u0632 \u06cc\u06a9 \u06a9\u0634\u062a\u06cc \u062c\u0646\u06af\u06cc \u0627\u0646\u06af\u0644\u06cc\u0633\u06cc \u0647\u0645\u0647 \u06a9\u0634\u062a\u06cc\u200c\u0647\u0627 \u063a\u0631\u0642 \u0634\u062f\u0646\u062f\u060c \u0648 \u0646\u0632\u062f\u06cc\u06a9 \u0628\u0647 200 \u0646\u0641\u0631 \u0622\u0645\u0631\u06cc\u06a9\u0627\u06cc\u06cc \u0648 \u0622\u0644\u0645\u0627\u0646\u06cc \u062c\u0627\u0646 \u062e\u0648\u062f \u0631\u0627 \u0627\u0632 \u062f\u0633\u062a \u062f\u0627\u062f\u0646\u062f."
  },
  {
    "feat_id": "195",
    "feat_URL": "https://en.wikinews.org/wiki/Mitt_Romney_wins_Iowa_Caucus_by_eight_votes_over_surging_Rick_Santorum",
    "feat_domain": "wikinews",
    "feat_topic": "Politics",
    "feat_has_image": "0",
    "feat_has_hyperlink": "0",
    "text": "Bachmann, who won the Ames Straw Poll in August, decided to end her campaign.",
    "target": "\u0628\u0627\u062e\u0645\u0646\u060c \u06a9\u0647 \u062f\u0631 \u0645\u0627\u0647 \u0622\u06af\u0648\u0633\u062a \u0628\u0631\u0646\u062f\u0647 \u0646\u0638\u0631\u0633\u0646\u062c\u06cc \u0622\u0645\u0633 \u0627\u0633\u062a\u0631\u0627\u0648 \u0634\u062f\u060c \u062a\u0635\u0645\u06cc\u0645 \u06af\u0631\u0641\u062a \u06a9\u0645\u067e\u06cc\u0646 \u062e\u0648\u062f \u0631\u0627 \u062e\u0627\u062a\u0645\u0647 \u062f\u0647\u062f."
  }
]
```

### Dataset Fields

The dataset has the following fields (also called "features"):

```json
{
  "feat_id": "Value(dtype='string', id=None)",
  "feat_URL": "Value(dtype='string', id=None)",
  "feat_domain": "Value(dtype='string', id=None)",
  "feat_topic": "Value(dtype='string', id=None)",
  "feat_has_image": "Value(dtype='string', id=None)",
  "feat_has_hyperlink": "Value(dtype='string', id=None)",
  "text": "Value(dtype='string', id=None)",
  "target": "Value(dtype='string', id=None)"
}
```

### Dataset Splits

This dataset is split into a training split and a validation split, with the following sizes:

| Split name | Num samples |
| ---------- | ----------- |
| train      | 1608        |
| valid      | 402         |
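The `target` strings in the sample are JSON `\uXXXX` escape sequences rather than readable text. A minimal stdlib sketch of decoding them, assuming one record is available as raw JSON text (the `RAW_RECORD` literal reuses the second sample, trimmed to three of its eight fields): `json.loads` performs the decoding, and `unicodedata` names a few of the resulting characters. The decoded text appears to be Persian, since it uses letters such as U+06A9 KEHEH and U+06CC FARSI YEH, even though the card lists the language as `unk`.

```python
import json
import unicodedata

# Second sample record from the card (three of its eight fields), kept as raw JSON.
# The r-string leaves the \uXXXX escapes untouched so that json.loads decodes them.
RAW_RECORD = r'''{
  "feat_id": "195",
  "text": "Bachmann, who won the Ames Straw Poll in August, decided to end her campaign.",
  "target": "\u0628\u0627\u062e\u0645\u0646\u060c \u06a9\u0647 \u062f\u0631 \u0645\u0627\u0647 \u0622\u06af\u0648\u0633\u062a \u0628\u0631\u0646\u062f\u0647 \u0646\u0638\u0631\u0633\u0646\u062c\u06cc \u0622\u0645\u0633 \u0627\u0633\u062a\u0631\u0627\u0648 \u0634\u062f\u060c \u062a\u0635\u0645\u06cc\u0645 \u06af\u0631\u0641\u062a \u06a9\u0645\u067e\u06cc\u0646 \u062e\u0648\u062f \u0631\u0627 \u062e\u0627\u062a\u0645\u0647 \u062f\u0647\u062f."
}'''

record = json.loads(RAW_RECORD)
target = record["target"]  # now real Arabic-script code points, no backslashes left

# Report a few characters by code point and Unicode name (ASCII-only output).
for ch in ("\u06a9", "\u06cc", "\u060c"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} present={ch in target}")
# U+06A9 ARABIC LETTER KEHEH present=True
# U+06CC ARABIC LETTER FARSI YEH present=True
# U+060C ARABIC COMMA present=True
```

Printing only code points and names keeps the output readable in terminals that are not configured for UTF-8; in a UTF-8 terminal, `print(target)` shows the sentence itself.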
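All eight features are declared as strings, including the `feat_has_image` and `feat_has_hyperlink` flags (both `"0"` in the sample). If a fine-tuning script needs typed values or a plain source/target pair, a small validation step is one option. The `Row` dataclass and `parse_row` helper below are hypothetical, not part of AutoTrain, and treating the flags as `"0"`/`"1"` is an assumption, since only `"0"` appears in the sample.

```python
from dataclasses import dataclass

# The eight features listed in the card; each is declared as Value(dtype='string').
EXPECTED_FIELDS = (
    "feat_id", "feat_URL", "feat_domain", "feat_topic",
    "feat_has_image", "feat_has_hyperlink", "text", "target",
)


@dataclass(frozen=True)
class Row:
    """Hypothetical typed view of one record."""
    id: int
    url: str
    domain: str
    topic: str
    has_image: bool
    has_hyperlink: bool
    source: str  # from the "text" column (English in the samples)
    target: str  # from the "target" column


def parse_row(record: dict) -> Row:
    """Check that every expected field is present and is a str, then convert types."""
    for name in EXPECTED_FIELDS:
        if name not in record:
            raise KeyError(f"missing field: {name}")
        if not isinstance(record[name], str):
            raise TypeError(f"{name} should be str, got {type(record[name]).__name__}")
    return Row(
        id=int(record["feat_id"]),
        url=record["feat_URL"],
        domain=record["feat_domain"],
        topic=record["feat_topic"],
        # Assumption: flags are "0"/"1". Compare explicitly, since bool("0") is True.
        has_image=record["feat_has_image"] == "1",
        has_hyperlink=record["feat_has_hyperlink"] == "1",
        source=record["text"],
        target=record["target"],
    )


# First sample from the card; the target is truncated to its first words for brevity.
sample = {
    "feat_id": "772",
    "feat_URL": "https://en.wikivoyage.org/wiki/Apia",
    "feat_domain": "wikivoyage",
    "feat_topic": "Travel",
    "feat_has_image": "0",
    "feat_has_hyperlink": "0",
    "text": "All the ships were sunk, except for one British cruiser. "
            "Nearly 200 American and German lives were lost.",
    "target": "\u0628\u0647\u200c\u062c\u0632 \u06cc\u06a9 \u06a9\u0634\u062a\u06cc",
}
row = parse_row(sample)
print(row.id, row.domain, row.has_image)  # 772 wikivoyage False
```

Failing loudly on a missing or non-string field catches schema drift early, before a malformed row reaches tokenization.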
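The two splits total 1608 + 402 = 2010 examples, and 1608 / 2010 = 0.8, so the sizes match an 80/20 train/validation split. The card does not say how AutoTrain assigned rows to each split; the sketch below only illustrates one way to produce splits of these sizes with a seeded shuffle. `split_80_20` and its default seed are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_20(rows: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of rows with a fixed seed, then cut it 80% train / 20% valid."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 4 // 5  # integer arithmetic avoids float rounding
    return shuffled[:n_train], shuffled[n_train:]


# 2010 stand-in row ids, the combined size of the card's two splits.
train, valid = split_80_20(range(2010))
print(len(train), len(valid))  # 1608 402
```

Shuffling a copy leaves the caller's data untouched, and a dedicated `random.Random(seed)` instance makes the split reproducible without touching the global random state.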
Provided by:
mehr4n-m