
lilacai/lilac-opus100-en-us-validation

Hugging Face · Updated 2023-08-21 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/lilacai/lilac-opus100-en-us-validation
Resource description:
This dataset is generated by [Lilac](http://lilacml.com) for a HuggingFace Space: [huggingface.co/spaces/lilacai/lilac](https://huggingface.co/spaces/lilacai/lilac).

Original dataset: [https://huggingface.co/datasets/opus100](https://huggingface.co/datasets/opus100)

Lilac dataset config:

```
embeddings:
- embedding: gte-small
  path: [translation, en]
- embedding: gte-small
  path: [translation, es]
name: opus100-en-us-validation
namespace: lilac
settings:
  preferred_embedding: gte-small
  ui:
    media_paths:
    - [translation, es]
    - [translation, en]
signals:
- path: [translation, es]
  signal: {signal_name: near_dup}
- path: [translation, es]
  signal: {signal_name: pii}
- path: [translation, es]
  signal: {signal_name: lang_detection}
- path: [translation, es]
  signal: {signal_name: text_statistics}
- path: [translation, en]
  signal: {signal_name: near_dup}
- path: [translation, en]
  signal: {signal_name: text_statistics}
- path: [translation, en]
  signal: {signal_name: pii}
- path: [translation, en]
  signal: {signal_name: lang_detection}
source: {config_name: en-es, dataset_name: opus100, source_name: huggingface, split: validation}
tags: [machine-learning]
```
Provider:
lilacai
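Summary of original information

Dataset overview

Dataset configuration

  • Name: opus100-en-us-validation
  • Namespace: lilac
  • Source configuration:
    • Dataset name: opus100
    • Config name: en-es
    • Source name: huggingface
    • Split: validation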
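
The source configuration above could be mapped onto a call to the Hugging Face `datasets` library roughly as follows. This is a minimal sketch, not Lilac's own loading code: the helper names `load_dataset_kwargs` and `load_source` are hypothetical, and it assumes the `opus100` identifier from the config still resolves on the Hub.

```python
# Source block copied from the Lilac dataset config.
SOURCE = {
    "config_name": "en-es",
    "dataset_name": "opus100",
    "source_name": "huggingface",
    "split": "validation",
}


def load_dataset_kwargs(source):
    """Translate a Lilac `source` block into keyword arguments for
    `datasets.load_dataset` (hypothetical helper, for illustration only)."""
    if source["source_name"] != "huggingface":
        raise ValueError(f"unsupported source: {source['source_name']}")
    return {
        "path": source["dataset_name"],   # dataset identifier on the Hub
        "name": source["config_name"],    # language-pair configuration
        "split": source["split"],         # which split to load
    }


def load_source(source):
    """Download the split described by `source` (needs network access)."""
    # Imported lazily so the mapping helper works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(**load_dataset_kwargs(source))
```

Calling `load_source(SOURCE)` would then fetch the en-es validation split, assuming the `datasets` package is installed and the Hub is reachable.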

Embedding settings

  • Embedding: gte-small
  • Paths:
    • [translation, en]
    • [translation, es]

UI settings

  • Media paths:
    • [translation, es]
    • [translation, en]

Signal settings

  • Path: [translation, es]
    • Signals:
      • near_dup
      • pii
      • lang_detection
      • text_statistics
  • Path: [translation, en]
    • Signals:
      • near_dup
      • text_statistics
      • pii
      • lang_detection
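
The per-path grouping shown above can be derived from the flat `signals` list in the config. The sketch below uses only the Python standard library; the function name `signals_by_path` is hypothetical, and the `SIGNALS` literal is transcribed by hand from the config rather than parsed from YAML.

```python
from collections import defaultdict

# The `signals` list from the Lilac config, transcribed as Python literals.
SIGNALS = [
    {"path": ["translation", "es"], "signal": {"signal_name": "near_dup"}},
    {"path": ["translation", "es"], "signal": {"signal_name": "pii"}},
    {"path": ["translation", "es"], "signal": {"signal_name": "lang_detection"}},
    {"path": ["translation", "es"], "signal": {"signal_name": "text_statistics"}},
    {"path": ["translation", "en"], "signal": {"signal_name": "near_dup"}},
    {"path": ["translation", "en"], "signal": {"signal_name": "text_statistics"}},
    {"path": ["translation", "en"], "signal": {"signal_name": "pii"}},
    {"path": ["translation", "en"], "signal": {"signal_name": "lang_detection"}},
]


def signals_by_path(signals):
    """Group signal names by field path, preserving the config order."""
    grouped = defaultdict(list)
    for entry in signals:
        # Lists are unhashable, so use a tuple of path segments as the key.
        grouped[tuple(entry["path"])].append(entry["signal"]["signal_name"])
    return dict(grouped)


for path, names in signals_by_path(SIGNALS).items():
    print(list(path), names)
```

Both text fields receive the same four signals; only the order in which they are listed differs.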

Tags

  • Tags: [machine-learning]