lilacai/lilac-opus100-en-us-validation
Source: Hugging Face. Updated 2023-08-21; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/lilacai/lilac-opus100-en-us-validation
Description:
This dataset is generated by [Lilac](http://lilacml.com) for a HuggingFace Space: [huggingface.co/spaces/lilacai/lilac](https://huggingface.co/spaces/lilacai/lilac).
Original dataset: [https://huggingface.co/datasets/opus100](https://huggingface.co/datasets/opus100)
Lilac dataset config:
```
embeddings:
- embedding: gte-small
  path: [translation, en]
- embedding: gte-small
  path: [translation, es]
name: opus100-en-us-validation
namespace: lilac
settings:
  preferred_embedding: gte-small
  ui:
    media_paths:
    - [translation, es]
    - [translation, en]
signals:
- path: [translation, es]
  signal: {signal_name: near_dup}
- path: [translation, es]
  signal: {signal_name: pii}
- path: [translation, es]
  signal: {signal_name: lang_detection}
- path: [translation, es]
  signal: {signal_name: text_statistics}
- path: [translation, en]
  signal: {signal_name: near_dup}
- path: [translation, en]
  signal: {signal_name: text_statistics}
- path: [translation, en]
  signal: {signal_name: pii}
- path: [translation, en]
  signal: {signal_name: lang_detection}
source: {config_name: en-es, dataset_name: opus100, source_name: huggingface, split: validation}
tags: [machine-learning]
```
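Each `path` in the config above names a nested field of a record, for example `[translation, en]` for the English side of a sentence pair. The following is a minimal sketch of how such a path could be resolved against one record. The helper `resolve_path` and the sample record are hypothetical and exist only for illustration; they are not part of Lilac's API, and the sentences are invented rather than taken from the dataset.

```python
from typing import Any, Sequence


def resolve_path(record: dict[str, Any], path: Sequence[str]) -> Any:
    """Follow each key in `path` into the nested record and return the leaf value."""
    value: Any = record
    for key in path:
        value = value[key]
    return value


# Hypothetical record, invented for illustration; not taken from the dataset.
# It assumes the layout implied by the config: a `translation` field keyed by language code.
example_record = {"translation": {"en": "Good morning.", "es": "Buenos días."}}

english = resolve_path(example_record, ["translation", "en"])
spanish = resolve_path(example_record, ["translation", "es"])
print(english)
print(spanish)
```

Under this reading, the two `embeddings` entries and the two `media_paths` entries point at the same pair of text fields, one per language.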
Provider:
lilacai
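Summary of original information
Dataset overview
Dataset configuration
- Name: opus100-en-us-validation
- Namespace: lilac
- Source configuration:
  - Dataset name: opus100
  - Configuration name: en-es
  - Source name: huggingface
  - Split: validation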
Embedding settings
- Embedding: gte-small
- Paths:
  - [translation, en]
  - [translation, es]
UI settings
- Media paths:
  - [translation, es]
  - [translation, en]
Signal settings
- Path: [translation, es]
  - Signals:
    - near_dup
    - pii
    - lang_detection
    - text_statistics
- Path: [translation, en]
  - Signals:
    - near_dup
    - text_statistics
    - pii
    - lang_detection
Tags
- Tags: [machine-learning]
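The config lists its `signals` as a flat sequence of path and signal pairs, while a per-path view is easier to scan. As a rough sketch, the flat entries can be regrouped with plain Python. The `signals` list below is transcribed by hand from the config, and `signals_by_path` is a hypothetical helper, not a Lilac function.

```python
from collections import defaultdict

# Transcribed by hand from the `signals` section of the dataset config.
signals = [
    {"path": ["translation", "es"], "signal": {"signal_name": "near_dup"}},
    {"path": ["translation", "es"], "signal": {"signal_name": "pii"}},
    {"path": ["translation", "es"], "signal": {"signal_name": "lang_detection"}},
    {"path": ["translation", "es"], "signal": {"signal_name": "text_statistics"}},
    {"path": ["translation", "en"], "signal": {"signal_name": "near_dup"}},
    {"path": ["translation", "en"], "signal": {"signal_name": "text_statistics"}},
    {"path": ["translation", "en"], "signal": {"signal_name": "pii"}},
    {"path": ["translation", "en"], "signal": {"signal_name": "lang_detection"}},
]


def signals_by_path(entries):
    """Group signal names by field path, keeping the order in which they were listed."""
    grouped = defaultdict(list)
    for entry in entries:
        # Lists are not hashable, so the path is converted to a tuple for use as a key.
        grouped[tuple(entry["path"])].append(entry["signal"]["signal_name"])
    return dict(grouped)


grouped = signals_by_path(signals)
for path, names in grouped.items():
    print(list(path), names)
```

Both language fields end up with the same four signals; only the order in which they were listed differs.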



