mehr4n-m/autotrain-data-nllb_600_ft
AutoTrain Dataset for project: nllb_600_ft
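Dataset Description
This dataset has been automatically processed by AutoTrain for project nllb_600_ft.
Languages
The BCP-47 code for the dataset's language is unk.
Dataset Structure
Data Instances
A sample from this dataset looks as follows: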
```json
[
  {
    "feat_id": "772",
    "feat_URL": "https://en.wikivoyage.org/wiki/Apia",
    "feat_domain": "wikivoyage",
    "feat_topic": "Travel",
    "feat_has_image": "0",
    "feat_has_hyperlink": "0",
    "text": "All the ships were sunk, except for one British cruiser. Nearly 200 American and German lives were lost.",
    "target": "\u0628\u0647\u200c\u062c\u0632 \u06cc\u06a9 \u06a9\u0634\u062a\u06cc \u062c\u0646\u06af\u06cc \u0627\u0646\u06af\u0644\u06cc\u0633\u06cc \u0647\u0645\u0647 \u06a9\u0634\u062a\u06cc\u200c\u0647\u0627 \u063a\u0631\u0642 \u0634\u062f\u0646\u062f\u060c \u0648 \u0646\u0632\u062f\u06cc\u06a9 \u0628\u0647 200 \u0646\u0641\u0631 \u0622\u0645\u0631\u06cc\u06a9\u0627\u06cc\u06cc \u0648 \u0622\u0644\u0645\u0627\u0646\u06cc \u062c\u0627\u0646 \u062e\u0648\u062f \u0631\u0627 \u0627\u0632 \u062f\u0633\u062a \u062f\u0627\u062f\u0646\u062f."
  },
  {
    "feat_id": "195",
    "feat_URL": "https://en.wikinews.org/wiki/Mitt_Romney_wins_Iowa_Caucus_by_eight_votes_over_surging_Rick_Santorum",
    "feat_domain": "wikinews",
    "feat_topic": "Politics",
    "feat_has_image": "0",
    "feat_has_hyperlink": "0",
    "text": "Bachmann, who won the Ames Straw Poll in August, decided to end her campaign.",
    "target": "\u0628\u0627\u062e\u0645\u0646\u060c \u06a9\u0647 \u062f\u0631 \u0645\u0627\u0647 \u0622\u06af\u0648\u0633\u062a \u0628\u0631\u0646\u062f\u0647 \u0646\u0638\u0631\u0633\u0646\u062c\u06cc \u0622\u0645\u0633 \u0627\u0633\u062a\u0631\u0627\u0648 \u0634\u062f\u060c \u062a\u0635\u0645\u06cc\u0645 \u06af\u0631\u0641\u062a \u06a9\u0645\u067e\u06cc\u0646 \u062e\u0648\u062f \u0631\u0627 \u062e\u0627\u062a\u0645\u0647 \u062f\u0647\u062f."
  }
]
```
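The `target` values in the sample are non-ASCII text serialized as JSON `\uXXXX` escapes, which any standard JSON parser decodes back into characters. The following is a minimal sketch of that decoding using Python's built-in `json` module. The record is a shortened copy of the second sample (only two fields, and only the first two words of `target`), and the observation that the decoded script looks like Persian is an assumption based on the code points, since the card itself lists the language as `unk`.

```python
import json

# Shortened copy of the second sample record (illustrative, not the full row).
# The raw string keeps the backslashes so the JSON parser sees the escapes.
raw = r'''
{
  "text": "Bachmann, who won the Ames Straw Poll in August, decided to end her campaign.",
  "target": "\u0628\u0627\u062e\u0645\u0646\u060c \u06a9\u0647"
}
'''

record = json.loads(raw)

# json.loads turns each \uXXXX escape into the corresponding character.
target = record["target"]
print(target)

# Inspect the first few code points to confirm the decoding.
print([hex(ord(ch)) for ch in target[:5]])
```

Printing the code points is a quick way to check that a scraped copy of the data has not lost its backslashes, in which case the string would contain literal `u0628` text rather than decoded characters.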
Dataset Fields
The dataset has the following fields (also called "features"):
```json
{
  "feat_id": "Value(dtype=string, id=None)",
  "feat_URL": "Value(dtype=string, id=None)",
  "feat_domain": "Value(dtype=string, id=None)",
  "feat_topic": "Value(dtype=string, id=None)",
  "feat_has_image": "Value(dtype=string, id=None)",
  "feat_has_hyperlink": "Value(dtype=string, id=None)",
  "text": "Value(dtype=string, id=None)",
  "target": "Value(dtype=string, id=None)"
}
```
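Every field is stored as a string, including the numeric-looking `feat_id` and the `feat_has_image` and `feat_has_hyperlink` flags. A consumer may want typed values instead. The sketch below shows one possible conversion; the `Record` class and `parse_record` helper are hypothetical names, and treating `"1"` as true is an assumption, since the samples shown only contain `"0"`.

```python
from dataclasses import dataclass

# Field names as listed in the dataset card; all are strings in the raw data.
EXPECTED_FIELDS = {
    "feat_id", "feat_URL", "feat_domain", "feat_topic",
    "feat_has_image", "feat_has_hyperlink", "text", "target",
}


@dataclass
class Record:
    """Hypothetical typed view of one dataset row."""
    feat_id: int
    url: str
    domain: str
    topic: str
    has_image: bool
    has_hyperlink: bool
    text: str
    target: str


def parse_record(row: dict) -> Record:
    """Convert a raw all-string row into a typed Record."""
    missing = EXPECTED_FIELDS.difference(row)
    if missing:
        raise KeyError(f"missing fields: {sorted(missing)}")
    return Record(
        feat_id=int(row["feat_id"]),
        url=row["feat_URL"],
        domain=row["feat_domain"],
        topic=row["feat_topic"],
        # Assumption: flags are encoded as "0" / "1".
        has_image=row["feat_has_image"] == "1",
        has_hyperlink=row["feat_has_hyperlink"] == "1",
        text=row["text"],
        target=row["target"],
    )


# Usage with values taken from the first sample (target shortened).
example = parse_record({
    "feat_id": "772",
    "feat_URL": "https://en.wikivoyage.org/wiki/Apia",
    "feat_domain": "wikivoyage",
    "feat_topic": "Travel",
    "feat_has_image": "0",
    "feat_has_hyperlink": "0",
    "text": "All the ships were sunk, except for one British cruiser. "
            "Nearly 200 American and German lives were lost.",
    "target": "\u0628\u0647\u200c\u062c\u0632",
})
print(example.feat_id, example.domain, example.has_image)
```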
Dataset Splits
This dataset is split into a train and a validation split. The split sizes are as follows:
| Split name | Num samples |
|---|---|
| train | 1608 |
| valid | 402 |
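As a quick check on the table, the two splits add up to 2010 samples, which corresponds to an 80/20 train/validation ratio. The short sketch below reproduces that arithmetic; the `splits` dictionary simply restates the table and is not read from the dataset itself.

```python
# Split sizes copied from the table above.
splits = {"train": 1608, "valid": 402}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name}: {count} ({fractions[name]:.1%})")
print(f"total: {total}")
```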



