legmlai/finefrench-v1
Source: Hugging Face | Updated: 2025-06-30 | Indexed: 2025-07-05
Download link:
https://hf-mirror.com/datasets/legmlai/finefrench-v1
Resource summary:
Fine-French is a French web corpus curated entirely by automated AI processes: GPT-4 assesses text quality, and low-quality content is filtered out. It is intended for training high-quality language models and was curated by Mohamad Alhajar. Each record includes the text, an ID, the language, a language score, and a field indicating whether the content was filtered out. The dataset is licensed under ODC-By 1.0 and is recommended for text generation tasks. The README gives examples of both kept and filtered content and describes the filtering methodology. It also provides usage instructions, performance metrics, and contact information for support.
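Because each record carries both a language score and a flag marking content that was filtered out, a typical first step is to keep only unflagged, confidently French records. The sketch below shows that selection on plain dictionaries. It is not taken from the README: the column names (`text`, `language`, `language_score`, `filtered`), the language code `"fr"`, and the 0.9 threshold are all hypothetical, so check the dataset card for the real schema before using it.

```python
from typing import Any, Iterable, Iterator, Mapping, Optional

# Hypothetical column names; the real schema is documented in the dataset card.
TEXT_FIELD = "text"
LANG_FIELD = "language"
SCORE_FIELD = "language_score"
FILTERED_FIELD = "filtered"  # hypothetical name for the "was filtered out" flag


def keep(record: Mapping[str, Any], min_score: float = 0.9,
         lang: Optional[str] = "fr") -> bool:
    """Return True if the record was not flagged as filtered, matches the
    expected language code (assumed "fr"), and meets the score threshold."""
    if record.get(FILTERED_FIELD, False):
        return False
    if lang is not None and record.get(LANG_FIELD) != lang:
        return False
    return float(record.get(SCORE_FIELD, 0.0)) >= min_score


def clean_texts(records: Iterable[Mapping[str, Any]],
                min_score: float = 0.9) -> Iterator[str]:
    """Lazily yield the text of every record that passes `keep`."""
    for record in records:
        if keep(record, min_score):
            yield record[TEXT_FIELD]


if __name__ == "__main__":
    sample = [
        {"id": "a", "text": "Bonjour à tous.", "language": "fr",
         "language_score": 0.98, "filtered": False},
        {"id": "b", "text": "Cliquez ici !!!", "language": "fr",
         "language_score": 0.95, "filtered": True},
        {"id": "c", "text": "Hello world.", "language": "en",
         "language_score": 0.99, "filtered": False},
        {"id": "d", "text": "Texte court", "language": "fr",
         "language_score": 0.60, "filtered": False},
    ]
    print(list(clean_texts(sample)))
```

Since `clean_texts` accepts any iterable of dictionaries, it should also work on records streamed with the Hugging Face `datasets` library, for example `load_dataset("legmlai/finefrench-v1", split="train", streaming=True)`. The split name `"train"` is an assumption here.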
Provided by:
legmlai



