ltg/en-wiki-paraphrased
Source: Hugging Face. Updated: 2025-01-23. Indexed: 2024-06-11.
Download link:
https://hf-mirror.com/datasets/ltg/en-wiki-paraphrased
Description:
This dataset consists of paraphrase pairs built from English Wikipedia. The source text is the 10% of articles with the most visits over the preceding year (about 400 million words). The paraphrases were generated by an instruction-tuned Mistral 7B language model using nucleus sampling and top-k sampling, with a repetition penalty to avoid low-quality outputs. The generation template encourages creative rephrasing that does not change the original meaning or information.
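The decoding setup named in the description (top-k sampling, nucleus sampling, and a repetition penalty) can be sketched in plain Python as follows. This is a minimal illustration and not the dataset authors' generation code: the function names are hypothetical, the default values for `top_k`, `top_p`, and `penalty` are assumptions (the listing does not state the values used), and the penalty follows one common formulation in which positive logits are divided by the penalty and negative logits are multiplied by it.

```python
import math
import random


def apply_repetition_penalty(logits, generated_ids, penalty):
    """Lower the scores of tokens that were already generated."""
    out = list(logits)
    for tok in set(generated_ids):
        # Dividing a positive logit or multiplying a negative one
        # both make the token less likely when penalty > 1.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix of them
    whose cumulative probability reaches top_p; renormalize the result."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for tok in ranked:
        kept.append(tok)
        cumulative += probs[tok]
        if cumulative >= top_p:
            break
    total = sum(probs[t] for t in kept)
    return {t: probs[t] / total for t in kept}


def sample_next_token(logits, generated_ids, top_k=50, top_p=0.9,
                      penalty=1.2, rng=random):
    """Sample one token id. The default values are illustrative assumptions."""
    penalized = apply_repetition_penalty(logits, generated_ids, penalty)
    dist = filter_top_k_top_p(softmax(penalized), top_k, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

In this sketch, `logits` is a list of scores indexed by token id and `generated_ids` holds the token ids produced so far. Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible.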
Provided by:
ltg
Summary of original information
Dataset license
- License type: Apache 2.0



